# Hierarchical Reasoning Model

Paper URL: https://arxiv.org/pdf/2506.21734
Code Repo: https://github.com/sapientinc/HRM

HRM is an alternative to the transformer architecture that is better able to reason. It outperforms transformer-based LLMs on ARC-AGI-2 with only 27M parameters.

## Training a 27M-Parameter Model with 1000 Examples

In the paper the authors note that they use only between 1,000 and 10,000 examples for each problem domain:

- **Sudoku-Extreme**: 1,000 training examples (used in the main experiments)
- **Sudoku-Extreme-Full**: ~10,000 examples (used in the analysis experiments for convergence guarantees)
- **ARC-AGI**: ~1,000 examples from the official dataset, heavily augmented with translations, rotations, flips, and color permutations

This may seem quite low for a 27M-parameter neural network; with so few examples, one would expect the network to simply memorize the training set and fail to generalize. The authors provide some additional clarifications around this point:

1. Data augmentation is used to functionally boost the size of the training set.
2. Deep supervision augments the training process, rather than relying on a single end-to-end back-propagation pass.
3. The problem domains are simpler than language: Sudoku and ARC-AGI tasks are structured, grid-based problems.
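The augmentation style described above (rotations, flips, and color permutations over small integer grids) can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the function name and parameters are assumptions, and translations are omitted for brevity. Note that one sampled transform is returned as a function, since for ARC-style tasks the same transform must be applied to both the input and the target grid of an example.

```python
import numpy as np

def sample_augmentation(n_colors=10, rng=None):
    """Sample one random augmentation (dihedral transform + color
    permutation) and return it as a callable, so the SAME transform
    can be applied consistently to an input/output grid pair.
    Hypothetical helper, not the authors' implementation."""
    rng = rng or np.random.default_rng()
    k = int(rng.integers(4))          # number of 90-degree rotations
    flip = bool(rng.integers(2))      # whether to mirror horizontally
    perm = rng.permutation(n_colors)  # random relabelling of colors

    def apply(grid):
        g = np.rot90(np.asarray(grid), k=k)
        if flip:
            g = np.fliplr(g)
        return perm[g]                # remap each cell's color

    return apply
```

Because each ARC example admits 8 dihedral symmetries and up to 10! color permutations, even a ~1,000-example dataset expands into a vastly larger effective training set.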
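Deep supervision, as described in the paper, runs the recurrent model for several "segments" per example, applies a loss after each segment, and detaches the hidden state between segments so gradients stay local to one segment. A minimal PyTorch sketch of that training-step shape, assuming a hypothetical `model(x, state) -> (logits, state)` interface (not the repo's actual API):

```python
import torch

def deep_supervision_step(model, x, y, loss_fn, optimizer, n_segments=4):
    """One training step with deep supervision (sketch).
    The model is run for several segments; a loss is applied and an
    optimizer update is taken after EACH segment, and the carried
    state is detached so no gradient flows across segment boundaries."""
    state = None
    total = 0.0
    for _ in range(n_segments):
        logits, state = model(x, state)
        loss = loss_fn(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = state.detach()  # "1-step" gradient: truncate across segments
        total += loss.item()
    return total / n_segments
```

Each segment thus acts as its own supervised training step, which gives the network many more gradient updates per example than a single end-to-end backward pass would.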