diff --git a/AmazonBedrock/anthropic/03_Assigning_Roles_Role_Prompting.ipynb b/AmazonBedrock/anthropic/03_Assigning_Roles_Role_Prompting.ipynb
index 6aff6b1..5ddf70d 100755
--- a/AmazonBedrock/anthropic/03_Assigning_Roles_Role_Prompting.ipynb
+++ b/AmazonBedrock/anthropic/03_Assigning_Roles_Role_Prompting.ipynb
@@ -124,7 +124,7 @@
"source": [
"You can use role prompting as a way to get Claude to emulate certain styles in writing, speak in a certain voice, or guide the complexity of its answers. **Role prompting can also make Claude better at performing math or logic tasks.**\n",
"\n",
- "For example, in the example below, there is a definitive correct answer, which is yes. However, Claude gets it wrong and thinks it lacks information, which it doesn't:"
+ "For example, there is a definitive correct answer to the puzzle below: yes. **On newer models, Claude may already answer correctly without extra prompting**, so focus on how the role changes the model's **reasoning style and reliability**, not just whether the first answer is wrong."
]
},
{
@@ -144,9 +144,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer? \n",
+ "Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer?\n",
"\n",
- "It turns out that with this new role assignment, Claude gets it right. (Although notably not for all the right reasons)"
+ "On current models, the biggest difference is often **how explicitly Claude reasons through the cases**, even when both versions land on the right final answer. That is still a useful prompt engineering win, because it makes the behavior more legible and repeatable."
]
},
{
@@ -189,9 +189,9 @@
"metadata": {},
"source": [
"### Exercise 3.1 - Math Correction\n",
- "In some instances, **Claude may struggle with mathematics**, even simple mathematics. Below, Claude incorrectly assesses the math problem as correctly solved, even though there's an obvious arithmetic mistake in the second step. Note that Claude actually catches the mistake when going through step-by-step, but doesn't jump to the conclusion that the overall solution is wrong.\n",
+ "Even simple arithmetic checks can benefit from better prompting. **Depending on the model version, Claude may already notice the mistake below**, or it may explain the arithmetic issue without clearly labeling the solution as incorrect.\n",
"\n",
- "Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. \n"
+ "Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. The goal is to practice using role prompting to make the verdict **more explicit and robust**, not to rely on a guaranteed baseline failure."
]
},
{
diff --git a/AmazonBedrock/anthropic/04_Separating_Data_and_Instructions.ipynb b/AmazonBedrock/anthropic/04_Separating_Data_and_Instructions.ipynb
index 010a847..b89de00 100755
--- a/AmazonBedrock/anthropic/04_Separating_Data_and_Instructions.ipynb
+++ b/AmazonBedrock/anthropic/04_Separating_Data_and_Instructions.ipynb
@@ -140,18 +140,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Here, **Claude thinks \"Yo Claude\" is part of the email it's supposed to rewrite**! You can tell because it begins its rewrite with \"Dear Claude\". To the human eye, it's clear, particularly in the prompt template where the email begins and ends, but it becomes much less clear in the prompt after substitution."
+ "Here, **the prompt boundary is ambiguous**. On older models, Claude often treated \"Yo Claude\" as part of the email and began the rewrite with \"Dear Claude\". **Newer models may sometimes recover anyway**, but the input and instruction are still mixed together in a brittle way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "How do we solve this? **Wrap the input in XML tags**! We did this below, and as you can see, there's no more \"Dear Claude\" in the output.\n",
+ "How do we solve this? **Wrap the input in XML tags**. That makes the data boundary explicit and usually gives more reliable behavior across model versions, even when the untagged prompt happens to work once by luck.\n",
"\n",
- "[XML tags](https://docs.anthropic.com/claude/docs/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.\n",
+ "[XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.\n",
"\n",
- "**Note:** While Claude can recognize and work with a wide range of separators and delimeters, we recommend that you **use specifically XML tags as separators** for Claude, as Claude was trained specifically to recognize XML tags as a prompt organizing mechanism. Outside of function calling, **there are no special sauce XML tags that Claude has been trained on that you should use to maximally boost your performance**. We have purposefully made Claude very malleable and customizable this way."
+ "**Note:** While Claude can recognize and work with a wide range of separators and delimiters, we recommend that you **use XML tags as separators** because they make the structure obvious to both the model and the human reading the prompt."
]
},
{
@@ -177,9 +177,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's see another example of how XML tags can help us. \n",
+ "Let's see another example of how XML tags can help.\n",
"\n",
- "In the following prompt, **Claude incorrectly interprets what part of the prompt is the instruction vs. the input**. It incorrectly considers `Each is about an animal, like rabbits` to be part of the list due to the formatting, when the user (the one filling out the `SENTENCES` variable) presumably did not want that."
+ "In the following prompt, the instruction and the list items are formatted almost the same way. **Depending on the model version, Claude may or may not fail every time**, but the hyphen before `Each is about an animal, like rabbits.` still makes the prompt structure unnecessarily confusing."
]
},
{
@@ -243,7 +243,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we had to include the hyphen to get Claude to respond incorrectly in the way we wanted to for this example. This is an important lesson about prompting: **small details matter**! It's always worth it to **scrub your prompts for typos and grammatical errors**. Claude is sensitive to patterns (in its early years, before finetuning, it was a raw text-prediction tool), and it's more likely to make mistakes when you make mistakes, smarter when you sound smart, sillier when you sound silly, and so on.\n",
+ "**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we intentionally include the leading hyphen to make the formatting misleading. This is an important lesson about prompting: **small details matter**. Even when a stronger model recovers, cleaner structure makes the prompt easier to debug and more reliable across model updates.\n",
"\n",
"If you would like to experiment with the lesson prompts without changing any content above, scroll all the way to the bottom of the lesson notebook to visit the [**Example Playground**](#example-playground)."
]
@@ -317,7 +317,7 @@
"metadata": {},
"source": [
"### Exercise 4.2 - Dog Question with Typos\n",
- "Fix the `PROMPT` by adding XML tags so that Claude produces the right answer. \n",
+ "Fix the `PROMPT` by adding XML tags so that Claude produces the right answer **reliably**, not just by accident on one model version.\n",
"\n",
"Try not to change anything else about the prompt. The messy and mistake-ridden writing is intentional, so you can see how Claude reacts to such mistakes."
]
@@ -373,7 +373,7 @@
"### Exercise 4.3 - Dog Question Part 2\n",
"Fix the `PROMPT` **WITHOUT** adding XML tags. Instead, remove only one or two words from the prompt.\n",
"\n",
- "Just as with the above exercises, try not to change anything else about the prompt. This will show you what kind of language Claude can parse and understand."
+ "Just as with the above exercises, try not to change anything else about the prompt. The point is to see which parts of the surrounding noise actually make the instruction harder to parse, even if a newer model sometimes answers correctly anyway."
]
},
{
diff --git a/AmazonBedrock/boto3/03_Assigning_Roles_Role_Prompting.ipynb b/AmazonBedrock/boto3/03_Assigning_Roles_Role_Prompting.ipynb
index e78f86c..edf0a76 100755
--- a/AmazonBedrock/boto3/03_Assigning_Roles_Role_Prompting.ipynb
+++ b/AmazonBedrock/boto3/03_Assigning_Roles_Role_Prompting.ipynb
@@ -128,7 +128,7 @@
"source": [
"You can use role prompting as a way to get Claude to emulate certain styles in writing, speak in a certain voice, or guide the complexity of its answers. **Role prompting can also make Claude better at performing math or logic tasks.**\n",
"\n",
- "For example, in the example below, there is a definitive correct answer, which is yes. However, Claude gets it wrong and thinks it lacks information, which it doesn't:"
+ "For example, there is a definitive correct answer to the puzzle below: yes. **On newer models, Claude may already answer correctly without extra prompting**, so focus on how the role changes the model's **reasoning style and reliability**, not just whether the first answer is wrong."
]
},
{
@@ -148,9 +148,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer? \n",
+ "Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer?\n",
"\n",
- "It turns out that with this new role assignment, Claude gets it right. (Although notably not for all the right reasons)"
+ "On current models, the biggest difference is often **how explicitly Claude reasons through the cases**, even when both versions land on the right final answer. That is still a useful prompt engineering win, because it makes the behavior more legible and repeatable."
]
},
{
@@ -193,9 +193,9 @@
"metadata": {},
"source": [
"### Exercise 3.1 - Math Correction\n",
- "In some instances, **Claude may struggle with mathematics**, even simple mathematics. Below, Claude incorrectly assesses the math problem as correctly solved, even though there's an obvious arithmetic mistake in the second step. Note that Claude actually catches the mistake when going through step-by-step, but doesn't jump to the conclusion that the overall solution is wrong.\n",
+ "Even simple arithmetic checks can benefit from better prompting. **Depending on the model version, Claude may already notice the mistake below**, or it may explain the arithmetic issue without clearly labeling the solution as incorrect.\n",
"\n",
- "Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. \n"
+ "Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. The goal is to practice using role prompting to make the verdict **more explicit and robust**, not to rely on a guaranteed baseline failure."
]
},
{
diff --git a/AmazonBedrock/boto3/04_Separating_Data_and_Instructions.ipynb b/AmazonBedrock/boto3/04_Separating_Data_and_Instructions.ipynb
index bce6608..bff2b06 100755
--- a/AmazonBedrock/boto3/04_Separating_Data_and_Instructions.ipynb
+++ b/AmazonBedrock/boto3/04_Separating_Data_and_Instructions.ipynb
@@ -144,18 +144,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Here, **Claude thinks \"Yo Claude\" is part of the email it's supposed to rewrite**! You can tell because it begins its rewrite with \"Dear Claude\". To the human eye, it's clear, particularly in the prompt template where the email begins and ends, but it becomes much less clear in the prompt after substitution."
+ "Here, **the prompt boundary is ambiguous**. On older models, Claude often treated \"Yo Claude\" as part of the email and began the rewrite with \"Dear Claude\". **Newer models may sometimes recover anyway**, but the input and instruction are still mixed together in a brittle way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "How do we solve this? **Wrap the input in XML tags**! We did this below, and as you can see, there's no more \"Dear Claude\" in the output.\n",
+ "How do we solve this? **Wrap the input in XML tags**. That makes the data boundary explicit and usually gives more reliable behavior across model versions, even when the untagged prompt happens to work once by luck.\n",
"\n",
- "[XML tags](https://docs.anthropic.com/claude/docs/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.\n",
+ "[XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.\n",
"\n",
- "**Note:** While Claude can recognize and work with a wide range of separators and delimeters, we recommend that you **use specifically XML tags as separators** for Claude, as Claude was trained specifically to recognize XML tags as a prompt organizing mechanism. Outside of function calling, **there are no special sauce XML tags that Claude has been trained on that you should use to maximally boost your performance**. We have purposefully made Claude very malleable and customizable this way."
+ "**Note:** While Claude can recognize and work with a wide range of separators and delimiters, we recommend that you **use XML tags as separators** because they make the structure obvious to both the model and the human reading the prompt."
]
},
{
@@ -181,9 +181,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's see another example of how XML tags can help us. \n",
+ "Let's see another example of how XML tags can help.\n",
"\n",
- "In the following prompt, **Claude incorrectly interprets what part of the prompt is the instruction vs. the input**. It incorrectly considers `Each is about an animal, like rabbits` to be part of the list due to the formatting, when the user (the one filling out the `SENTENCES` variable) presumably did not want that."
+ "In the following prompt, the instruction and the list items are formatted almost the same way. **Depending on the model version, Claude may or may not fail every time**, but the hyphen before `Each is about an animal, like rabbits.` still makes the prompt structure unnecessarily confusing."
]
},
{
@@ -247,7 +247,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we had to include the hyphen to get Claude to respond incorrectly in the way we wanted to for this example. This is an important lesson about prompting: **small details matter**! It's always worth it to **scrub your prompts for typos and grammatical errors**. Claude is sensitive to patterns (in its early years, before finetuning, it was a raw text-prediction tool), and it's more likely to make mistakes when you make mistakes, smarter when you sound smart, sillier when you sound silly, and so on.\n",
+ "**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we intentionally include the leading hyphen to make the formatting misleading. This is an important lesson about prompting: **small details matter**. Even when a stronger model recovers, cleaner structure makes the prompt easier to debug and more reliable across model updates.\n",
"\n",
"If you would like to experiment with the lesson prompts without changing any content above, scroll all the way to the bottom of the lesson notebook to visit the [**Example Playground**](#example-playground)."
]
@@ -321,7 +321,7 @@
"metadata": {},
"source": [
"### Exercise 4.2 - Dog Question with Typos\n",
- "Fix the `PROMPT` by adding XML tags so that Claude produces the right answer. \n",
+ "Fix the `PROMPT` by adding XML tags so that Claude produces the right answer **reliably**, not just by accident on one model version.\n",
"\n",
"Try not to change anything else about the prompt. The messy and mistake-ridden writing is intentional, so you can see how Claude reacts to such mistakes."
]
@@ -377,7 +377,7 @@
"### Exercise 4.3 - Dog Question Part 2\n",
"Fix the `PROMPT` **WITHOUT** adding XML tags. Instead, remove only one or two words from the prompt.\n",
"\n",
- "Just as with the above exercises, try not to change anything else about the prompt. This will show you what kind of language Claude can parse and understand."
+ "Just as with the above exercises, try not to change anything else about the prompt. The point is to see which parts of the surrounding noise actually make the instruction harder to parse, even if a newer model sometimes answers correctly anyway."
]
},
{
diff --git a/Anthropic 1P/03_Assigning_Roles_Role_Prompting.ipynb b/Anthropic 1P/03_Assigning_Roles_Role_Prompting.ipynb
index 3b67ae3..389c5f1 100644
--- a/Anthropic 1P/03_Assigning_Roles_Role_Prompting.ipynb
+++ b/Anthropic 1P/03_Assigning_Roles_Role_Prompting.ipynb
@@ -118,7 +118,7 @@
"source": [
"You can use role prompting as a way to get Claude to emulate certain styles in writing, speak in a certain voice, or guide the complexity of its answers. **Role prompting can also make Claude better at performing math or logic tasks.**\n",
"\n",
- "For example, in the example below, there is a definitive correct answer, which is yes. However, Claude gets it wrong and thinks it lacks information, which it doesn't:"
+ "For example, there is a definitive correct answer to the puzzle below: yes. **On newer models, Claude may already answer correctly without extra prompting**, so focus on how the role changes the model's **reasoning style and reliability**, not just whether the first answer is wrong."
]
},
{
@@ -138,9 +138,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer? \n",
+ "Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer?\n",
"\n",
- "It turns out that with this new role assignment, Claude gets it right. (Although notably not for all the right reasons)"
+ "On current models, the biggest difference is often **how explicitly Claude reasons through the cases**, even when both versions land on the right final answer. That is still a useful prompt engineering win, because it makes the behavior more legible and repeatable."
]
},
{
@@ -183,9 +183,9 @@
"metadata": {},
"source": [
"### Exercise 3.1 - Math Correction\n",
- "In some instances, **Claude may struggle with mathematics**, even simple mathematics. Below, Claude incorrectly assesses the math problem as correctly solved, even though there's an obvious arithmetic mistake in the second step. Note that Claude actually catches the mistake when going through step-by-step, but doesn't jump to the conclusion that the overall solution is wrong.\n",
+ "Even simple arithmetic checks can benefit from better prompting. **Depending on the model version, Claude may already notice the mistake below**, or it may explain the arithmetic issue without clearly labeling the solution as incorrect.\n",
"\n",
- "Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. \n"
+ "Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. The goal is to practice using role prompting to make the verdict **more explicit and robust**, not to rely on a guaranteed baseline failure."
]
},
{
diff --git a/Anthropic 1P/04_Separating_Data_and_Instructions.ipynb b/Anthropic 1P/04_Separating_Data_and_Instructions.ipynb
index 7d81fd1..fe67f48 100644
--- a/Anthropic 1P/04_Separating_Data_and_Instructions.ipynb
+++ b/Anthropic 1P/04_Separating_Data_and_Instructions.ipynb
@@ -134,18 +134,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Here, **Claude thinks \"Yo Claude\" is part of the email it's supposed to rewrite**! You can tell because it begins its rewrite with \"Dear Claude\". To the human eye, it's clear, particularly in the prompt template where the email begins and ends, but it becomes much less clear in the prompt after substitution."
+ "Here, **the prompt boundary is ambiguous**. On older models, Claude often treated \"Yo Claude\" as part of the email and began the rewrite with \"Dear Claude\". **Newer models may sometimes recover anyway**, but the input and instruction are still mixed together in a brittle way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "How do we solve this? **Wrap the input in XML tags**! We did this below, and as you can see, there's no more \"Dear Claude\" in the output.\n",
+ "How do we solve this? **Wrap the input in XML tags**. That makes the data boundary explicit and usually gives more reliable behavior across model versions, even when the untagged prompt happens to work once by luck.\n",
"\n",
- "[XML tags](https://docs.anthropic.com/claude/docs/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.\n",
+ "[XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.\n",
"\n",
- "**Note:** While Claude can recognize and work with a wide range of separators and delimeters, we recommend that you **use specifically XML tags as separators** for Claude, as Claude was trained specifically to recognize XML tags as a prompt organizing mechanism. Outside of function calling, **there are no special sauce XML tags that Claude has been trained on that you should use to maximally boost your performance**. We have purposefully made Claude very malleable and customizable this way."
+ "**Note:** While Claude can recognize and work with a wide range of separators and delimiters, we recommend that you **use XML tags as separators** because they make the structure obvious to both the model and the human reading the prompt."
]
},
{
@@ -171,9 +171,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's see another example of how XML tags can help us. \n",
+ "Let's see another example of how XML tags can help.\n",
"\n",
- "In the following prompt, **Claude incorrectly interprets what part of the prompt is the instruction vs. the input**. It incorrectly considers `Each is about an animal, like rabbits` to be part of the list due to the formatting, when the user (the one filling out the `SENTENCES` variable) presumably did not want that."
+ "In the following prompt, the instruction and the list items are formatted almost the same way. **Depending on the model version, Claude may or may not fail every time**, but the hyphen before `Each is about an animal, like rabbits.` still makes the prompt structure unnecessarily confusing."
]
},
{
@@ -237,7 +237,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we had to include the hyphen to get Claude to respond incorrectly in the way we wanted to for this example. This is an important lesson about prompting: **small details matter**! It's always worth it to **scrub your prompts for typos and grammatical errors**. Claude is sensitive to patterns (in its early years, before finetuning, it was a raw text-prediction tool), and it's more likely to make mistakes when you make mistakes, smarter when you sound smart, sillier when you sound silly, and so on.\n",
+ "**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we intentionally include the leading hyphen to make the formatting misleading. This is an important lesson about prompting: **small details matter**. Even when a stronger model recovers, cleaner structure makes the prompt easier to debug and more reliable across model updates.\n",
"\n",
"If you would like to experiment with the lesson prompts without changing any content above, scroll all the way to the bottom of the lesson notebook to visit the [**Example Playground**](#example-playground)."
]
@@ -311,7 +311,7 @@
"metadata": {},
"source": [
"### Exercise 4.2 - Dog Question with Typos\n",
- "Fix the `PROMPT` by adding XML tags so that Claude produces the right answer. \n",
+ "Fix the `PROMPT` by adding XML tags so that Claude produces the right answer **reliably**, not just by accident on one model version.\n",
"\n",
"Try not to change anything else about the prompt. The messy and mistake-ridden writing is intentional, so you can see how Claude reacts to such mistakes."
]
@@ -367,7 +367,7 @@
"### Exercise 4.3 - Dog Question Part 2\n",
"Fix the `PROMPT` **WITHOUT** adding XML tags. Instead, remove only one or two words from the prompt.\n",
"\n",
- "Just as with the above exercises, try not to change anything else about the prompt. This will show you what kind of language Claude can parse and understand."
+ "Just as with the above exercises, try not to change anything else about the prompt. The point is to see which parts of the surrounding noise actually make the instruction harder to parse, even if a newer model sometimes answers correctly anyway."
]
},
{
diff --git a/scripts/check_model_drift_notes.py b/scripts/check_model_drift_notes.py
new file mode 100644
index 0000000..4d07644
--- /dev/null
+++ b/scripts/check_model_drift_notes.py
@@ -0,0 +1,24 @@
+from pathlib import Path
+
+checks = {
+ '03_Assigning_Roles_Role_Prompting.ipynb': [
+ 'On newer models, Claude may already answer correctly without extra prompting',
+ 'Depending on the model version, Claude may already notice the mistake below',
+ ],
+ '04_Separating_Data_and_Instructions.ipynb': [
+ 'Newer models may sometimes recover anyway',
+ 'Depending on the model version, Claude may or may not fail every time',
+ 'produces the right answer **reliably**',
+ ],
+}
+
+for path in Path('.').rglob('*.ipynb'):
+ snippets = checks.get(path.name)
+ if not snippets:
+ continue
+ text = path.read_text(encoding='utf-8')
+ for snippet in snippets:
+ if snippet not in text:
+ raise SystemExit(f'missing snippet in {path}: {snippet}')
+
+print('model drift notes present in all notebook variants')
diff --git a/scripts/update_model_drift_notes.py b/scripts/update_model_drift_notes.py
new file mode 100644
index 0000000..bc77738
--- /dev/null
+++ b/scripts/update_model_drift_notes.py
@@ -0,0 +1,56 @@
+import json
+from pathlib import Path
+
+replacements = {
+ '03_Assigning_Roles_Role_Prompting.ipynb': {
+ 7: """You can use role prompting as a way to get Claude to emulate certain styles in writing, speak in a certain voice, or guide the complexity of its answers. **Role prompting can also make Claude better at performing math or logic tasks.**
+
+For example, there is a definitive correct answer to the puzzle below: yes. **On newer models, Claude may already answer correctly without extra prompting**, so focus on how the role changes the model's **reasoning style and reliability**, not just whether the first answer is wrong.""",
+ 9: """Now, what if we **prime Claude to act as a logic bot**? How will that change Claude's answer?
+
+On current models, the biggest difference is often **how explicitly Claude reasons through the cases**, even when both versions land on the right final answer. That is still a useful prompt engineering win, because it makes the behavior more legible and repeatable.""",
+ 13: """### Exercise 3.1 - Math Correction
+Even simple arithmetic checks can benefit from better prompting. **Depending on the model version, Claude may already notice the mistake below**, or it may explain the arithmetic issue without clearly labeling the solution as incorrect.
+
+Modify the `PROMPT` and / or the `SYSTEM_PROMPT` to make Claude grade the solution as `incorrectly` solved, rather than correctly solved. The goal is to practice using role prompting to make the verdict **more explicit and robust**, not to rely on a guaranteed baseline failure.""",
+ },
+ '04_Separating_Data_and_Instructions.ipynb': {
+ 8: """Here, **the prompt boundary is ambiguous**. On older models, Claude often treated \"Yo Claude\" as part of the email and began the rewrite with \"Dear Claude\". **Newer models may sometimes recover anyway**, but the input and instruction are still mixed together in a brittle way.""",
+ 9: """How do we solve this? **Wrap the input in XML tags**. That makes the data boundary explicit and usually gives more reliable behavior across model versions, even when the untagged prompt happens to work once by luck.
+
+[XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags) are angle-bracket tags like `<tag>`. They come in pairs and consist of an opening tag, such as `<tag>`, and a closing tag marked by a `/`, such as `</tag>`. XML tags are used to wrap around content, like this: `<tag>content</tag>`.
+
+**Note:** While Claude can recognize and work with a wide range of separators and delimiters, we recommend that you **use XML tags as separators** because they make the structure obvious to both the model and the human reading the prompt.""",
+ 11: """Let's see another example of how XML tags can help.
+
+In the following prompt, the instruction and the list items are formatted almost the same way. **Depending on the model version, Claude may or may not fail every time**, but the hyphen before `Each is about an animal, like rabbits.` still makes the prompt structure unnecessarily confusing.""",
+ 15: """**Note:** In the incorrect version of the \"Each is about an animal\" prompt, we intentionally include the leading hyphen to make the formatting misleading. This is an important lesson about prompting: **small details matter**. Even when a stronger model recovers, cleaner structure makes the prompt easier to debug and more reliable across model updates.
+
+If you would like to experiment with the lesson prompts without changing any content above, scroll all the way to the bottom of the lesson notebook to visit the [**Example Playground**](#example-playground).""",
+ 21: """### Exercise 4.2 - Dog Question with Typos
+Fix the `PROMPT` by adding XML tags so that Claude produces the right answer **reliably**, not just by accident on one model version.
+
+Try not to change anything else about the prompt. The messy and mistake-ridden writing is intentional, so you can see how Claude reacts to such mistakes.""",
+ 25: """### Exercise 4.3 - Dog Question Part 2
+Fix the `PROMPT` **WITHOUT** adding XML tags. Instead, remove only one or two words from the prompt.
+
+Just as with the above exercises, try not to change anything else about the prompt. The point is to see which parts of the surrounding noise actually make the instruction harder to parse, even if a newer model sometimes answers correctly anyway.""",
+ },
+}
+
+for path in Path('.').rglob('*.ipynb'):
+ name = path.name
+ if name not in replacements:
+ continue
+ nb = json.loads(path.read_text(encoding='utf-8'))
+ changed = False
+ for idx, new_src in replacements[name].items():
+ cell = nb['cells'][idx]
+ if cell['cell_type'] != 'markdown':
+ raise RuntimeError(f'{path}:{idx} expected markdown, got {cell["cell_type"]}')
+ lines = new_src.split('\n')
+ cell['source'] = [line + ('\n' if i < len(lines) - 1 else '') for i, line in enumerate(lines)]
+ changed = True
+ if changed:
+ path.write_text(json.dumps(nb, ensure_ascii=False, indent=1) + '\n', encoding='utf-8')
+ print(f'updated {path}')
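
The cell rewrite in `update_model_drift_notes.py` depends on nbformat's convention that a markdown cell's `source` is a list of lines in which every line except the last keeps its trailing `\n`, so joining the list reproduces the text exactly. A minimal sketch of that round trip, isolated from the script (the helper name `to_source_lines` is ours, introduced only for this illustration):

```python
# Sketch of the source-list round trip assumed by update_model_drift_notes.py.
# Notebook JSON stores markdown as a list of lines; all lines except the last
# retain their trailing newline, so ''.join(source) recovers the original text.
def to_source_lines(text):
    lines = text.split('\n')
    return [line + ('\n' if i < len(lines) - 1 else '')
            for i, line in enumerate(lines)]

src = to_source_lines('### Exercise 4.3 - Dog Question Part 2\n\nFix the `PROMPT`.')
assert src == ['### Exercise 4.3 - Dog Question Part 2\n', '\n', 'Fix the `PROMPT`.']
assert ''.join(src) == '### Exercise 4.3 - Dog Question Part 2\n\nFix the `PROMPT`.'
```

Because the check script matches raw snippets against the serialized JSON, it is insensitive to how the list is split, but the update script must preserve this newline convention or the notebook will render with joined or doubled lines.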