Prompting best practices
Effective prompting means shaping model behavior with clear instructions, examples, structure, and constraints, so the model can reason, act, verify, and respond reliably.
Prompting That Actually Works
Prompt engineering is no longer just about getting nicer text. In Anthropic’s guide, prompting becomes a way to control behavior: how a model interprets instructions, uses tools, manages long context, reasons, persists state, and acts inside agentic systems. OpenAI’s guide reinforces the same foundation from a simpler angle: put instructions first, be specific, show the desired format, and tighten vague language. Read together, the message is clear: good prompts reduce ambiguity at every layer of the task.
1. General Principles: Make the Task Executable
Anthropic’s first principle is the right starting point: write prompts as if you were briefing a brilliant new teammate. If a person with little context would hesitate, the model will too. OpenAI complements this with the practical rule to put instructions first and separate them from context clearly. The shared idea is simple: models do better when the task, constraints, and desired outcome are easy to parse.
Be clear and direct
Less effective ❌:
Create an analytics dashboard.
Better ✅:
Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics to create a fully-featured implementation.
Why: vague prompts force the model to guess the quality bar. Better prompts define ambition, scope, and depth up front. Anthropic emphasizes explicitness; OpenAI similarly recommends being specific about outcome, length, style, and format.
Add context that explains the constraint
Less effective ❌:
NEVER use ellipses.
Better ✅:
Your response will be read aloud by a text-to-speech engine, so never use ellipses since the text-to-speech engine will not know how to pronounce them.
Why: a rule becomes much stronger when the model understands the reason behind it. Anthropic’s point is not just “add context,” but “add motivation,” because the model can generalize from the explanation.
Put instructions first and separate the input
Less effective ❌:
Summarize the text below as a bullet point list of the most important points. {text input here}
Better ✅:
Summarize the text below as a bullet point list of the most important points.
Text: """
{text input here}
"""
Why: OpenAI’s version is a lightweight formatting discipline. Anthropic pushes the same idea further with XML tags. In both cases, the model performs better when the task and the data are cleanly separated.
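The instructions-first pattern is easy to enforce in code. A minimal sketch, assuming a templating helper of our own (the function name is illustrative, not from either guide):

```python
# Assemble a prompt with the task stated first and the input clearly
# delimited, following OpenAI's triple-quote convention.
def build_summary_prompt(text: str) -> str:
    return (
        "Summarize the text below as a bullet point list "
        "of the most important points.\n\n"
        'Text: """\n'
        f"{text}\n"
        '"""'
    )

prompt = build_summary_prompt("Quarterly revenue grew 12% year over year...")
```

Keeping the delimiter in the template, rather than trusting callers to add it, means the task/data separation survives no matter what the input contains.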
Use examples as anchors, not decoration
Less effective ❌:
Extract the entities mentioned in the text below. Extract the following 4 entity types:
company names, people names, specific topics and themes. Text: {text}
Better ✅:
Extract the important entities mentioned in the text below. First extract all company names, then extract all people names, then extract specific topics which fit the content and finally extract general overarching themes.
Desired format:
Company names: <comma_separated_list_of_company_names>
People names: <comma_separated_list_of_people_names>
Specific topics: <comma_separated_list_of_specific_topics>
General themes: <comma_separated_list_of_general_themes>
Text: {text}
Why: OpenAI frames this as “show, and tell.” Anthropic adds that examples should be relevant, diverse, and clearly wrapped, ideally in <example> tags, because examples do not just show format — they teach the model what distinctions matter.
Structure prompts with XML when tasks get complex
Instead of piling instructions, data, examples, and variables into one blob, Anthropic recommends explicit structure such as <instructions>, <context>, <input>, and nested <documents>. This is one of the most useful ideas in the whole guide: XML is not cosmetic, it is a way to reduce misinterpretation. OpenAI’s delimiter advice is the simpler version of the same principle.
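The XML-structuring advice can be captured with a small helper. The tag names mirror those Anthropic suggests; the builder function itself is an illustrative sketch, not an official API:

```python
# Wrap each prompt section in an explicit XML tag so the model can
# distinguish instructions from context and input.
def tag(name: str, body: str) -> str:
    return f"<{name}>\n{body}\n</{name}>"

def build_structured_prompt(instructions: str, context: str, user_input: str) -> str:
    return "\n\n".join([
        tag("instructions", instructions),
        tag("context", context),
        tag("input", user_input),
    ])

prompt = build_structured_prompt(
    "Answer using only the provided context.",
    "Our refund window is 30 days from delivery.",
    "Can I return a product after six weeks?",
)
```

Because each section is generated from the same helper, the structure stays well-formed even as sections grow or get reordered.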
Give the model a role
A short role prompt can compress many downstream choices. Anthropic’s example, “You are a helpful coding assistant specializing in Python,” works because it narrows tone, domain, and default assumptions at once. The know-how here is that a role should define judgment, not just personality.
2. Long Context Prompting: Put Evidence Before the Question
Anthropic goes much deeper than OpenAI here. For large inputs, it recommends placing long documents at the top, leaving the actual query later, and structuring each document with metadata. The reason is practical: the model should build its understanding of the evidence before it is pushed into answer mode. It also recommends asking the model to quote relevant passages first, which turns long-context work into an evidence-first workflow rather than a guess-first workflow.
Structure multi-document inputs
Less effective ❌:
What are the key takeaways from these files?
[paste a pile of documents]
Better ✅:
<documents>
<document index="1">
<source>annual_report_2023.pdf</source>
<document_content>
{{ANNUAL_REPORT}}
</document_content>
</document>
<document index="2">
<source>competitor_analysis_q2.xlsx</source>
<document_content>
{{COMPETITOR_ANALYSIS}}
</document_content>
</document>
</documents>
Analyze the annual report and competitor analysis. Identify strategic advantages and recommend Q3 focus areas.
Why: structured inputs reduce source confusion. Anthropic explicitly recommends <document>, <document_content>, and <source> tags for multi-document prompts.
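Generating the recommended structure programmatically keeps multi-document prompts consistent. A sketch, where the tag and field names follow Anthropic's example and the loop itself is an illustrative helper:

```python
# Build Anthropic's <documents> structure from (source, content) pairs,
# numbering each document with an explicit index attribute.
def build_documents_block(docs: list[tuple[str, str]]) -> str:
    parts = ["<documents>"]
    for i, (source, content) in enumerate(docs, start=1):
        parts.append(
            f'<document index="{i}">\n'
            f"<source>{source}</source>\n"
            f"<document_content>\n{content}\n</document_content>\n"
            f"</document>"
        )
    parts.append("</documents>")
    return "\n".join(parts)

block = build_documents_block([
    ("annual_report_2023.pdf", "Revenue grew 12%..."),
    ("competitor_analysis_q2.xlsx", "Competitor A lost share in Q2..."),
])
```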
Ask for quotes before synthesis
Less effective ❌:
Diagnose the patient from these records.
Better ✅:
Find quotes from the patient records and appointment history that are relevant to diagnosing the patient's reported symptoms. Place these in <quotes> tags. Then, based on these quotes, list all information that would help the doctor diagnose the patient's symptoms. Place your diagnostic information in <info> tags.
Why: quoting first forces the model to ground its later reasoning. This is one of Anthropic’s strongest practical tricks for noisy long-context tasks.
Model self-knowledge matters when identity or API strings matter
Anthropic includes a small but useful section on telling the model exactly how to identify itself and which model string to use. This is not a general creativity trick; it is a precision tool for applications where naming and routing must be correct.
3. Output and Formatting: Define the Surface, Not Just the Content
Anthropic observes that newer models are more concise, more conversational, and less likely to narrate every step unless asked. It also gives a much richer set of formatting controls than OpenAI: tell the model what to do instead of what not to do, match prompt style to desired output style, and use detailed formatting instructions when needed. OpenAI supports the same direction with its advice to reduce fluffy wording and articulate the output format explicitly.
Ask for summaries after tool use if you want visibility
If a model feels too silent after tool calls, Anthropic suggests asking for a quick summary of the work done. The deeper lesson is that verbosity is itself promptable. If you want more observability, ask for it explicitly.
Say what to do, not just what to avoid
Less effective ❌:
Do not use markdown in your response.
Better ✅:
Your response should be composed of smoothly flowing prose paragraphs.
Why: negative rules block behavior; positive rules define a target. OpenAI makes the same point in a customer-support example: instead of only banning PII questions, tell the agent what safe path to follow instead.
Reduce fluffy descriptions
Less effective ❌:
The description for this product should be fairly short, a few sentences only, and not too much more.
Better ✅:
Use a 3 to 5 sentence paragraph to describe this product.
Why: “short” is subjective. “3 to 5 sentences” is executable. OpenAI’s advice here is small but fundamental: convert fuzzy preferences into measurable constraints.
Use stronger formatting controls when needed
Anthropic adds three advanced moves: use XML format indicators, make the prompt visually resemble the output you want, and provide detailed style instructions when output quality really matters. This is especially useful for long-form writing, technical explanations, and reports. It also includes a dedicated plain-text math instruction for cases where LaTeX is undesirable, and a document-creation prompt for presentations or visual docs.
Migrate away from prefills
Anthropic’s prefill section is really about modernization. Older workflows used prefilled assistant text to force schema, skip preambles, continue partial outputs, or rehydrate context. Anthropic’s message is that newer models usually do not need this. Use direct instructions, structured outputs, tools, retries, or user-turn continuations instead. This is less a prompt trick than a change in system design philosophy.
4. Tool Use: Prompt for Action, Not Just Advice
This section is one of Anthropic’s clearest contributions. If you ask vaguely, the model may explain. If you ask concretely, it may act. The operational question is: do you want a consultant or an operator? Anthropic shows how to set that default explicitly.
Be explicit when you want action
Less effective ❌:
Can you suggest some changes to improve this function?
Better ✅:
Change this function to improve its performance.
Why: “suggest” implies discussion; “change” implies execution. Anthropic treats this as a major source of confusion in tool-using systems.
Set the default agency level
Less effective ❌:
Help with code changes when appropriate.
Better ✅:
<default_to_action>
By default, implement changes rather than only suggesting them. If the user's intent is unclear, infer the most useful likely action and proceed, using tools to discover any missing details instead of guessing. Try to infer the user's intent about whether a tool call (e.g., file edit or read) is intended or not, and act accordingly.
</default_to_action>
Better, if you want caution ✅:
<do_not_act_before_instructions>
Do not jump into implementation or change files unless clearly instructed to make changes. When the user's intent is ambiguous, default to providing information, doing research, and providing recommendations rather than taking action. Only proceed with edits, modifications, or implementations when the user explicitly requests them.
</do_not_act_before_instructions>
Why: the important insight is not the exact wording but the existence of a default policy. Agentic behavior becomes much more reliable when the model knows whether ambiguity should resolve toward action or toward caution.
Use parallel tools when independence exists
Anthropic notes that modern models are good at parallel tool calling, but only when dependencies do not exist. The deep rule is simple: parallelize independent work, sequence dependent work. More parallelism is not always smarter; it is only smarter when the work is truly independent.
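The parallelize-independent, sequence-dependent rule maps directly onto async code. A sketch where the tool functions are hypothetical stand-ins for real tool calls:

```python
import asyncio

# Two independent lookups: neither needs the other's result.
async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a real API call
    return f"weather({city})"

async def fetch_news(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a real API call
    return f"news({city})"

async def briefing(city: str) -> str:
    # Independent work: dispatch in parallel.
    weather, news = await asyncio.gather(fetch_weather(city), fetch_news(city))
    # Dependent work: combining must wait for both results.
    return f"Briefing for {city}: {weather}; {news}"

result = asyncio.run(briefing("Oslo"))
```

The same split applies when prompting a model to call tools: tell it which calls may be issued together and which must wait for earlier results.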
5. Thinking and Reasoning: Control Search, Not Just Output
Anthropic’s reasoning section is really about controlling exploration. Stronger models can overthink, oversearch, or expand into unnecessary branches. The fix is not merely “use fewer tokens.” It is to guide how the model commits, reflects, and escalates reasoning. OpenAI does not go this far, but its advice to start with zero-shot, then few-shot, and only escalate when needed reflects the same philosophy of controlled escalation.
Prevent overthinking by constraining decision behavior
Less effective ❌:
Think very thoroughly about every possible approach.
Better ✅:
When you're deciding how to approach a problem, choose an approach and commit to it. Avoid revisiting decisions unless you encounter new information that directly contradicts your reasoning. If you're weighing two approaches, pick one and see it through. You can always course-correct later if the chosen approach fails.
Why: this prompt limits branching. It shapes the search process, which is often more important than limiting output length.
Use adaptive thinking where the workload is uneven
Anthropic recommends adaptive thinking for tasks that mix easy and hard steps, especially multi-step tool use, complex coding, and long-horizon loops. The key idea is that the model should decide when deeper reasoning is worth the cost. It also notes that general instructions often work better than hand-written reasoning scripts, that few-shot reasoning examples can teach thinking style, and that self-check instructions can improve reliability.
Start simple, then escalate guidance
OpenAI’s zero-shot → few-shot → fine-tune ladder fits naturally here. It is a practical reminder not to overengineer the prompt on day one. Start with the simplest instruction that could work, then add examples or heavier methods only if results fall short.
6. Agentic Systems: Prompting Becomes Workflow Design
This is where Anthropic’s guide becomes much richer than a typical prompt article. Once the model is working across many steps, tools, files, or context windows, the prompt is no longer just a text instruction. It becomes a policy for persistence, delegation, and verification.
Long-horizon work needs resumable state
Anthropic recommends telling the model not to wrap up early as the context limit approaches, but to save state and continue systematically. The important idea is that long tasks should survive interruption. In practice, that means external memory, progress files, tests, and checkpoints.
Multi-window workflows should be designed, not improvised
Anthropic keeps several useful patterns here: use the first window differently from later ones, ask the model to write tests in structured form, create setup scripts for repeated workflows, sometimes prefer starting fresh over compacting, provide verification tools, and encourage the model to use the full context window productively. These are not just prompt tweaks. They are habits for building reliable multi-window systems.
State management should separate structure from narrative
Structured state like test results works best in JSON or similar schemas. General progress notes work well as freeform text. Git works well as a timeline of work and checkpoints. Anthropic’s deeper point is that good agents do not just act; they leave a usable record of what they did.
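The structure-versus-narrative split is simple to put into practice. A sketch assuming hypothetical file names and fields:

```python
import json
import pathlib
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())

# Structured state: machine-readable, easy to diff and resume from.
state = {"tests_passed": 12, "tests_failed": 1, "current_step": "fix flaky test"}
(workdir / "state.json").write_text(json.dumps(state, indent=2))

# Narrative state: freeform progress notes for the next context window.
(workdir / "notes.md").write_text(
    "Refactored the parser; one flaky test remains in test_dates.py.\n"
)

# Resuming later: parse the structured file, read the notes as prose.
resumed = json.loads((workdir / "state.json").read_text())
```

Keeping the two files separate means the agent can update counts without rewriting prose, and vice versa.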
Use reversibility as the safety rule
Less effective ❌:
Be careful with risky actions.
Better ✅:
Consider the reversibility and potential impact of your actions. You are encouraged to take local, reversible actions like editing files or running tests, but for actions that are hard to reverse, affect shared systems, or could be destructive, ask the user before proceeding.
Why: “be careful” is vague. “Escalate irreversible actions” is a usable policy. This is one of the strongest safety formulations in either guide.
Research should manage uncertainty, not just retrieve answers
Less effective ❌:
Research this topic and give me the answer.
Better ✅:
Search for this information in a structured way. As you gather data, develop several competing hypotheses. Track your confidence levels in your progress notes to improve calibration. Regularly self-critique your approach and plan. Update a hypothesis tree or research notes file to persist information and provide transparency. Break down this complex research task systematically.
Why: Anthropic reframes research as iterative hypothesis management. That is much more robust than treating it as single-shot retrieval.
Subagents are useful only when decomposition helps
Anthropic notes that newer models may delegate naturally, but warns against overuse. Subagents make sense when work can happen in parallel or in isolated contexts. They do not help when the task is simple, sequential, or context-coupled. That is an important design insight: decomposition is not automatically beneficial.
Prompt chaining still matters when you need checkpoints
Even if the model can reason internally, explicit chains are useful when you want to inspect the intermediate draft, critique, and revision. Anthropic’s self-correction pipeline — generate, review, refine — is really about observability.
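The generate → review → refine chain can be sketched as three explicit calls with inspectable checkpoints. Here `call_model` is a hypothetical stand-in for a real API call:

```python
# Stub for illustration: a real implementation would call a chat API.
def call_model(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"

def self_correction_pipeline(task: str) -> dict:
    draft = call_model(f"Write a first draft: {task}")
    critique = call_model(f"List concrete flaws in this draft:\n{draft}")
    final = call_model(f"Revise the draft to fix these flaws:\n{draft}\n{critique}")
    # Returning every stage keeps the chain observable: each checkpoint
    # can be logged, inspected, or retried independently.
    return {"draft": draft, "critique": critique, "final": final}

result = self_correction_pipeline("a release announcement")
```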
Good coding agents need restraint
Anthropic includes several practical controls for coding agents: clean up temporary files if created, avoid overengineering, do not optimize only for passing tests, and never speculate about code without reading it. Together, these recommendations say something important: stronger agents do not just need more power, they need tighter restraint.
7. Capability-Specific Tips: Prompt the Modality, Not Just the Task
Anthropic adds two specialized areas that are easy to overlook. First, improved vision performance: models handle images, screenshots, and extracted visual information better, and can benefit from a crop or zoom-like tool. Second, frontend design: without guidance, models drift toward generic “AI slop,” so the prompt should explicitly push typography, color, motion, backgrounds, and distinctive aesthetic choices. The insight is that multimodal and design tasks need domain-specific prompting.
Use leading words for code and other structured outputs
OpenAI adds a smaller but useful version of this idea: hints like import for Python or SELECT for SQL can nudge the model into the right pattern. This works because the prompt is not only telling the model what to do; it is also showing how the output should begin.
8. Migration Considerations: Retune Prompts for Newer Models
Anthropic’s migration section is a strong reminder that prompt advice expires. Newer models are more proactive, more capable, and often more eager. Instructions that once fixed under-triggering may now cause over-triggering. It recommends being more specific about desired behavior, asking for extra polish or features explicitly, updating from manual thinking to adaptive thinking, moving away from prefills, and dialing back older anti-laziness language. It also gives concrete effort guidance for Sonnet 4.6 and explains when each setting is appropriate.
OpenAI’s “use the latest model” advice belongs here as a supplement. It sounds basic, but it matters because newer models usually need less prompt scaffolding, not more. You should not assume the old prompt remains optimal just because the model changed.
Use parameters deliberately
OpenAI closes with a small but important reminder: model choice, temperature, max completion tokens, and stop sequences affect output behavior. Temperature especially matters: higher temperature can increase creativity, but not truthfulness; for factual extraction or Q&A, lower temperature is generally safer. This fits Anthropic’s broader theme that output quality comes from controlling behavior, not just word choice.
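Choosing parameters per task, rather than reusing one default, can be sketched as two request profiles. The field names mirror common chat-API parameters, but exact names vary by provider, so treat them and the placeholder model name as illustrative:

```python
# Deterministic settings: safer for factual extraction and Q&A.
factual_extraction = {
    "model": "latest-model",   # placeholder, not a real model string
    "temperature": 0.0,        # low temperature: consistent, grounded output
    "max_tokens": 300,         # cap output length for a bounded answer
    "stop": ["\n\n"],          # stop sequence to end the answer cleanly
}

# Exploratory settings: more diverse phrasing, not more truthfulness.
creative_writing = {
    "model": "latest-model",
    "temperature": 0.9,        # higher temperature widens the sampling
    "max_tokens": 800,
}
```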
Final Insight: Prompting Is Now Behavioral Design
The deepest lesson from Anthropic’s article is that prompting is not just wording. It is behavioral design. You are deciding how the model should read instructions, how much context it should gather, whether it should act or advise, when it should reflect, how it should persist progress, when it should ask for confirmation, and how it should avoid common failure modes. OpenAI’s guide strengthens the foundation by showing how specificity, examples, formatting, and measurable constraints improve even ordinary prompts. Together, the two guides suggest a mature standard for prompt engineering: *be explicit about the task, explicit about the format, and explicit about the model’s behavior.*