A good prompt shapes the model's behavior — role, context, format, constraints. Specificity and examples do most of the work. CoT, ReAct, RAG, and PAL handle the hard cases. Know which one to reach for.
Prompt engineering is the practice of designing and optimizing inputs so that a model produces what you actually want. The same task phrased differently produces dramatically different results, and the gap between a careless prompt and a deliberate one is often bigger than the gap between two models.
This guide covers the full stack: the anatomy, the generation knobs, and every major technique — with notes on where each one is worth the cost and where it isn't.
Anatomy of a prompt
Most well-structured prompts have four parts. You don't always need all of them:
- Instruction — what needs to be done
- Context — background the model needs to make a good decision
- Input data — the specific item the task operates on
- Output indicator — the format or shape of the expected answer
Specificity beats cleverness
Bad: "Write something about marketing."
Good: "Write three email campaign ideas for a B2B SaaS targeting small businesses. Each idea: 2–3 sentences."
Role instructions
Assigning a role reliably improves quality in specialized domains:
You are an expert marketing copywriter with 10 years of experience
in B2B SaaS. Write a compelling product description for...
Format enforcement
State the output shape explicitly. "Respond in JSON with fields name, description, price." "Return a bulleted list." "Use a Markdown table for comparison." The model will mostly comply.
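A quick way to make that contract machine-checkable, sketched with a hypothetical llm() placeholder standing in for whatever chat API you actually call:

import json

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API of choice."""
    ...

prompt = (
    "Describe the product in one short paragraph. "
    "Respond only with JSON containing the fields name, description, price."
)
data = json.loads(llm(prompt))   # fails loudly if the model ignored the format
assert {"name", "description", "price"} <= data.keys()

If the model wraps the JSON in a code fence, strip the fence before parsing, or use your provider's JSON or structured-output mode where one exists.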
Generation parameters
- Temperature (0.0–0.3): focused, near-deterministic output. Use for facts, code, classification.
- Temperature (0.7–1.0): creative output. Use for brainstorming and writing.
- Top-p: limits sampling to tokens whose cumulative probability fits under p. top_p=0.9 is a safe default.
- Max tokens: cap the output. Set it deliberately in production; a cap that's too low truncates answers mid-sentence when someone asks a big question.
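How those knobs map onto an API call, as a minimal sketch assuming the OpenAI Python SDK (most providers' SDKs expose the same or similarly named parameters):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",     # assumption: swap in whichever model you actually use
    messages=[{"role": "user", "content": "Classify this ticket: 'My invoice is wrong.'"}],
    temperature=0.0,         # classification, so stay low
    top_p=0.9,               # nucleus sampling cutoff
    max_tokens=200,          # explicit output cap
)
print(response.choices[0].message.content)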
Zero-shot and few-shot
Zero-shot
Ask the model to do the task with no examples. Works well for things the model has clearly seen in training — sentiment, basic classification, summarization. For anything with a specific format or domain nuance, you'll need examples.
Few-shot
A few examples ("shots") sharply raise quality. The rules:
- Make examples diverse and representative
- Order matters — later examples carry more weight
- 3–8 examples is the sweet spot for most tasks; more isn't always better
Review: "This is awesome!" Sentiment: Positive
Review: "This is bad!" Sentiment: Negative
Review: "The movie was okay." Sentiment: Neutral
Review: "The food here is exceptional!" Sentiment:
Chain-of-Thought (CoT)
Induce the model to think aloud before answering. Critical for tasks that need reasoning — arithmetic, logic, multi-hop questions.
Without CoT:
Q: Roger has 5 tennis balls. He buys 2 more cans of balls.
Each can has 3 balls. How many does he have now?
A: 11
With CoT:
A: Roger started with 5. 2 cans x 3 balls = 6 new balls.
5 + 6 = 11. The answer is 11.
Zero-shot CoT
A single phrase — "Let's think step by step" — significantly improves math and logic results without any example prompts. Cheap and effective.
Self-Consistency
Run the same CoT prompt multiple times with temperature > 0, then take a majority vote. Expensive in tokens, but worth it for high-stakes reasoning tasks where correctness matters more than cost.
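The voting loop is short. A sketch assuming a hypothetical llm(prompt, temperature) helper that returns the model's text and an extract_answer() that pulls the final answer out of the chain of thought:

from collections import Counter

def llm(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def extract_answer(completion: str) -> str:
    # Hypothetical parser: grab whatever follows "The answer is".
    return completion.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistency(prompt: str, n: int = 5) -> str:
    answers = [extract_answer(llm(prompt, temperature=0.8)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]   # majority vote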
Beyond CoT: structured reasoning
Tree of Thoughts (ToT)
The model explores several reasoning paths in parallel — generate multiple candidate "thoughts" at each step, evaluate which are promising, continue the best. Useful for planning, complex code, strategic choices.
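A compressed sketch of that search loop (expand each partial path, score the candidates, keep the best few, repeat). The llm() helper and the 0–10 scoring prompt are stand-ins, not a fixed recipe:

def llm(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def tree_of_thoughts(problem: str, breadth: int = 3, depth: int = 3, keep: int = 2) -> str:
    frontier = [""]                                    # partial reasoning paths
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for _ in range(breadth):                   # propose several next thoughts per path
                thought = llm(f"Problem: {problem}\nReasoning so far: {path}\nNext step:")
                candidates.append(path + "\n" + thought)
        # Have the model rate each candidate, keep only the most promising.
        scored = [(float(llm(f"Rate 0-10 how promising this partial solution is:\n{c}\nScore:")), c)
                  for c in candidates]
        frontier = [c for _, c in sorted(scored, reverse=True)[:keep]]
    return frontier[0]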
Generated Knowledge Prompting
Two-step: first ask the model to generate facts about the topic, then use those facts as context for the real question. Reduces hallucinations and improves accuracy in domain-specific questions.
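The two calls back to back, with a placeholder llm() and a hypothetical example question:

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

question = "Is part of golf trying to get a higher point total than others?"
facts = llm(f"Generate three short, factual statements relevant to: {question}")
answer = llm(f"Facts:\n{facts}\n\nUsing only these facts, answer: {question}")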
Prompt Chaining
Break a hard task into a sequence of prompts — output of one becomes input of the next. Classic example: document QA in two steps — extract relevant quotes, then form the answer from those quotes. Easier to debug, more transparent, often cheaper than one giant prompt.
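The document-QA chain from that example, sketched with a placeholder llm():

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def answer_from_document(document: str, question: str) -> str:
    # Step 1: pull out only the quotes that matter.
    quotes = llm(
        f"Extract the quotes from the document below that are relevant to the question.\n"
        f"Question: {question}\n\nDocument:\n{document}"
    )
    # Step 2: answer strictly from those quotes.
    return llm(
        f"Answer the question using only these quotes, and cite them.\n"
        f"Question: {question}\n\nQuotes:\n{quotes}"
    )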
Tool-augmented prompting
ReAct — Reasoning + Acting
The model alternates between reasoning (Thought) and actions (Action), observing the result before the next step:
Thought: I need to look up Colorado orogeny...
Action: Search[Colorado orogeny]
Observation: [result]
Thought: Eastern sector extends into the High Plains.
Action: Search[High Plains elevation]
Observation: 1,800 to 7,000 ft.
Action: Finish[1,800 to 7,000 ft]
ReAct sharply reduces hallucinations by grounding reasoning in external observations. It's the basis of most agent frameworks.
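The loop behind a trace like that is short: parse the Action, run the tool, feed the Observation back. A rough sketch assuming a placeholder llm() and a search() tool:

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def search(query: str) -> str:
    """Placeholder: whatever search or tool backend you use."""
    ...

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # In practice the prompt also includes ReAct few-shot examples and a stop
        # sequence so the model halts after emitting each Action.
        step = llm(transcript + "Thought:")
        transcript += "Thought:" + step + "\n"
        if "Finish[" in step:
            return step.split("Finish[", 1)[1].split("]", 1)[0]
        if "Search[" in step:
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            transcript += f"Observation: {search(query)}\n"
    return transcript    # ran out of steps without a Finish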
RAG — Retrieval Augmented Generation
Combine vector search with generation. The model receives relevant context from your documents before answering. The flow: user query → vector DB search → top-K docs → prompt with context → answer.
- Up-to-date data without retraining
- Source citations out of the box
- Full control over what the model "knows"
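The flow above in code, as a sketch: embed the query, pull the top-K chunks, prepend them to the prompt. Names like vector_db.search() and embed() are placeholders for whatever embedding model and vector store you run:

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model here."""
    ...

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def rag_answer(query: str, vector_db, k: int = 4) -> str:
    docs = vector_db.search(embed(query), top_k=k)     # hypothetical vector-store API
    context = "\n\n".join(d.text for d in docs)
    return llm(
        f"Answer using only the context below. Cite the passages you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )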
PAL — Program-Aided Language Models
Instead of answering in text, the model writes a program (usually Python) that produces the answer. Calculation errors disappear because you delegate math to an interpreter.
# I have 5 rows x 8 columns of plants. 3 are diseased. How many remain?
total = 5 * 8 # 40
remaining = total - 3
print(remaining) # 37
Function calling
Modern models detect when a function should be called and return structured JSON with the arguments. "What's the weather in London?" → the model returns get_current_weather(location="London") → your code calls the API → you pass the result back for the final answer. This is how production LLM apps connect to the outside world.
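A sketch of that round trip using the OpenAI Python SDK's tools parameter; the function name, the schema, and the get_weather() stub are stand-ins for your own tool:

import json
from openai import OpenAI

client = OpenAI()

def get_weather(location: str) -> dict:
    """Placeholder: call a real weather API here."""
    return {"location": location, "temp_c": 18}

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in London?"}]
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)              # e.g. {"location": "London"}
messages.append(first.choices[0].message)                # keep the tool call in history
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": json.dumps(get_weather(**args))})
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)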
Advanced and niche techniques
Reflexion
The agent tries a task, gets feedback (error or score), writes a "lesson" into a verbal memory, and retries. Dramatically improves results on multi-step coding and reasoning loops.
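The retry loop, sketched for a coding task with placeholder llm() and run_tests() functions:

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def run_tests(code: str) -> tuple[bool, str]:
    """Placeholder: execute the candidate code against your test suite."""
    ...

def reflexion(task: str, max_tries: int = 3) -> str:
    lessons = []                                        # verbal memory across attempts
    for _ in range(max_tries):
        code = llm(f"Task: {task}\nLessons from earlier attempts:\n" + "\n".join(lessons))
        ok, feedback = run_tests(code)
        if ok:
            return code
        # Turn the raw failure into a short lesson for the next attempt.
        lessons.append(llm(f"The attempt failed with:\n{feedback}\nWrite a one-sentence lesson."))
    return code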
Directional Stimulus Prompting
Add a "stimulus" — a hint or keyword — to steer generation. "Summarize the article. Focus on: climate impact, economic costs, policy recommendations." Small change, large effect on relevance.
Automatic Prompt Engineer (APE)
Use an LLM to generate and evaluate prompts automatically: generate candidates → score them on a validation set → pick the best → iterate. Worth setting up for any prompt you'll run at scale.
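The generate-score-select loop in miniature. llm() is a placeholder, and score() assumes you have a small labeled validation set of (input, expected output) pairs:

def llm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def score(candidate_prompt: str, val_set: list[tuple[str, str]]) -> float:
    # Fraction of validation examples the candidate prompt gets right.
    hits = sum(llm(candidate_prompt + "\n" + x).strip() == y for x, y in val_set)
    return hits / len(val_set)

def ape(task_description: str, val_set: list[tuple[str, str]], n_candidates: int = 10) -> str:
    candidates = [llm(f"Write an instruction for this task: {task_description}")
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda p: score(p, val_set))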
Prompt Functions
Package prompts as reusable functions with a name, input, and rule — then compose them: fix_english(expand_word(trans_word('original text'))). Turns ad-hoc prompts into composable building blocks.
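In practice each "function" is just a named prompt template wrapped in ordinary code, so the composition above runs as written. A sketch with a placeholder llm() and hypothetical templates:

def llm(prompt: str) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

def trans_word(text: str) -> str:
    return llm(f"Translate the following text to English:\n{text}")

def expand_word(text: str) -> str:
    return llm(f"Expand this into a richer, more descriptive passage:\n{text}")

def fix_english(text: str) -> str:
    return llm(f"Fix the grammar and improve the fluency of:\n{text}")

polished = fix_english(expand_word(trans_word("original text")))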
LLM applications
Code generation
A system message sets the behavior: "You are a helpful code assistant. Language: Python. Don't explain — return only the code block." From there, the model handles generation from comments, function completion, SQL from schema descriptions, and code explanation.
Synthetic data
LLMs are excellent at generating evaluation and training data. The key for dataset diversity: define variable parameters (vocabulary, themes, features), randomize the combinations, and keep temperature above default. Cost: one study produced 50,000 RAG query pairs for about $55 — far cheaper than manual labeling.
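A sketch of the randomize-the-parameters idea for a diverse dataset; the topic and tone lists are hypothetical, and llm() is a placeholder for your chat API:

import random

def llm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder: wire this up to your chat-completion API."""
    ...

TOPICS = ["billing", "shipping", "returns", "account access"]   # hypothetical parameters
TONES = ["frustrated", "neutral", "polite"]

def synthetic_ticket() -> str:
    topic, tone = random.choice(TOPICS), random.choice(TONES)
    return llm(
        f"Write a short customer-support ticket about {topic} in a {tone} tone.",
        temperature=1.0,        # keep it above default so the outputs vary
    )

dataset = [synthetic_ticket() for _ in range(100)]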
Context caching
Modern API features let you load a large context once and run many queries against it cheaply. Useful for analyzing hundreds of documents: load summaries once, then query interactively without re-sending the payload every time.
The bottom line
Prompt engineering isn't magic syntax. It's the same skill as writing clear technical specs: state the goal, give context, show the format, and provide examples when the model might guess wrong.
Reach for advanced techniques only when the simple ones hit a wall. CoT for reasoning. RAG for fresh or private knowledge. ReAct when tools are involved. PAL when math matters. Everything else is usually solved by being more specific.
For quick-reference prompts and Claude-specific shortcuts, check the prompt engineering cheatsheet or the full tools guide.