Key Points
GPT-4.1 Characteristics: More literal instruction-following, improved coding, and 1M token context. Migrate existing prompts carefully.
Agentic Workflow Essentials: Add three reminders to system prompts: (1) Persistence — keep going until fully resolved, (2) Tool-calling — use tools instead of guessing, (3) Planning — reflect before and after each tool call.
Long Context: Up to 1M tokens. Place instructions at both start and end for best results. Specify whether to use only provided context or allow internal knowledge.
Chain of Thought: GPT-4.1 doesn't reason internally but responds well to explicit step-by-step prompting. This improved SWE-bench scores by 4%.
Instruction Following: Be explicit — implicit rules aren't inferred. Use "Response Rules" sections. Later instructions take priority in conflicts.
Common Failures: Forced tool calls with insufficient info, repeating example phrases, over-formatting without clear instructions.
Prompt Structure: Use Markdown headers (Role, Instructions, Reasoning Steps, Output Format, Examples, Context). XML works well for structured/nested data.
Diff Generation: Use apply_patch format with context lines and @@ markers, or SEARCH/REPLACE blocks.