Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

Paper: 2602.11988

Authors: Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev

Published: February 2026

Summary

This paper evaluates whether repository-level context files (like AGENTS.md) actually help coding agents perform better on real-world software engineering tasks.

Key Findings

Performance Impact

Cost Impact

Behavioral Changes

AGENTBENCH

The authors created a new benchmark called AGENTBENCH consisting of:

AGENTBENCH complements SWE-BENCH LITE (which uses popular repositories without context files).

Experimental Setup

Coding Agents Evaluated

Datasets

Settings Evaluated

  1. NONE: No context files
  2. LLM: LLM-generated context files (using agent-developer recommendations)
  3. HUM: Developer-provided context files
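The three settings amount to a comparison against the NONE baseline. The sketch below shows one way such results could be tallied; it is not the authors' harness. The `TaskResult` record and the `success_rates` and `delta_vs_none` functions are hypothetical names. The sketch assumes each task outcome is a simple resolved/unresolved flag and reports differences in absolute percentage points.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    setting: str    # hypothetical labels: "NONE", "LLM", or "HUM"
    task_id: str
    resolved: bool  # True if the agent's patch resolved the task


def success_rates(results):
    """Return the fraction of tasks resolved for each setting."""
    totals = defaultdict(int)
    solved = defaultdict(int)
    for r in results:
        totals[r.setting] += 1
        solved[r.setting] += int(r.resolved)
    return {s: solved[s] / totals[s] for s in totals}


def delta_vs_none(results):
    """Return each setting's change versus the NONE baseline, in percentage points."""
    rates = success_rates(results)
    baseline = rates["NONE"]
    return {s: 100 * (rate - baseline) for s, rate in rates.items() if s != "NONE"}


# Toy outcomes for illustration only (not data from the paper).
toy = [
    TaskResult("NONE", "t1", True), TaskResult("NONE", "t2", False),
    TaskResult("LLM", "t1", True), TaskResult("LLM", "t2", True),
    TaskResult("HUM", "t1", False), TaskResult("HUM", "t2", False),
]
print(delta_vs_none(toy))  # {'LLM': 50.0, 'HUM': -50.0}
```

Keeping the per-task records, rather than only aggregate rates, would also allow paired comparisons on the same task across settings.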

Key Insights

1. Context Files Make Tasks Harder

Instructions in context files increase reasoning tokens by 14-22%, suggesting tasks become more complex.
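As a worked example of what a relative increase in this range means, the hypothetical helper `relative_increase_pct` computes the percent change in mean reasoning tokens between runs without and with context files. The per-task token counts in the usage line are made up for illustration and are not figures from the paper.

```python
from statistics import fmean


def relative_increase_pct(baseline_tokens, treatment_tokens):
    """Percent change in mean reasoning tokens from baseline (NONE) to treatment."""
    base = fmean(baseline_tokens)
    treat = fmean(treatment_tokens)
    return 100 * (treat - base) / base


# Made-up per-task token counts: mean 1000 without context files, 1180 with them.
print(relative_increase_pct([900, 1100], [1100, 1260]))  # 18.0
```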

2. Context Files Are Redundant

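When documentation files are removed from repositories, LLM-generated context files actually improve performance by 2.7% on average. This suggests that, in unmodified repositories, generated context files largely duplicate information the existing documentation already provides.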
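The ablation above depends on stripping documentation from a repository checkout. The summary does not say how the authors chose which files to remove, so this sketch assumes that Markdown and reStructuredText files plus top-level `docs/` or `doc/` folders count as documentation. `strip_documentation` and its constants are hypothetical. It deletes files in place, so it should only be pointed at a disposable copy.

```python
import shutil
from pathlib import Path

# Assumed definition of "documentation" (not taken from the paper).
DOC_SUFFIXES = {".md", ".rst"}
DOC_DIRS = ("docs", "doc")
KEEP = {"AGENTS.md"}  # never delete the context file itself


def strip_documentation(repo_root):
    """Delete documentation from a disposable checkout and return the removed paths."""
    root = Path(repo_root)
    removed = []
    # Drop whole top-level documentation folders first.
    for name in DOC_DIRS:
        folder = root / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder)
    # Then delete remaining doc files anywhere in the tree; list() snapshots the walk.
    for path in list(root.rglob("*")):
        if ".git" in path.relative_to(root).parts:
            continue  # leave version-control internals untouched
        if path.is_file() and path.suffix.lower() in DOC_SUFFIXES and path.name not in KEEP:
            path.unlink()
            removed.append(path)
    return removed
```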

3. Stronger Models Don't Generate Better Context Files

Using GPT-5.2 to generate context files improves SWE-BENCH LITE performance by 2% but degrades AGENTBENCH performance by 3%.

4. Context Files Encourage Exploration

Agents use more repository-specific tools and run more tests when context files are present.

Recommendations

  1. Omit LLM-generated context files for now, contrary to agent-developer recommendations
  2. Include only minimal requirements in context files (e.g., specific tooling to use)
  3. Human-written context files should describe only essential information
  4. Future work: Improve automatic generation of concise, task-relevant guidance
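To illustrate the second recommendation, here is a hypothetical helper that renders a context file containing nothing but required commands and refuses to grow past a small line budget. The `uv` commands and the 10-line budget are illustrative assumptions, not guidance from the paper.

```python
def render_minimal_agents_md(requirements, max_lines=10):
    """Render a context file that lists only required commands.

    `requirements` maps a task to the exact command the agent must use.
    `max_lines` is an illustrative budget, not a value from the paper.
    """
    lines = ["# AGENTS.md", ""]
    lines += [f"- {task}: `{command}`" for task, command in requirements.items()]
    if len(lines) > max_lines:
        raise ValueError(f"context file has {len(lines)} lines; budget is {max_lines}")
    return "\n".join(lines) + "\n"


print(render_minimal_agents_md({
    "Install dependencies": "uv sync",  # hypothetical tooling choice
    "Run tests": "uv run pytest",
}))
```

Generating the file from a short mapping makes it easy to check that every line states a hard requirement rather than a repository overview.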

Limitations

Conclusion

Context files have only a marginal effect on coding agent behavior. While they encourage broader exploration and instruction following, they don't provide effective repository overviews and often make tasks harder. The authors recommend omitting LLM-generated context files and including only minimal requirements in human-written ones.


Tags: #agents #context-files #evaluation #SWE-bench #LLM-agents


Revision #7
Created 3 March 2026 09:24:52 by Clive
Updated 3 March 2026 09:59:42 by James