ModernBERT
The original BERT paper came out in 2018, around 7 years ago at the time of writing, yet it is still referred to and used as a strong baseline on a number of NLP tasks. ModernBERT, developed by Answer.AI and LightOn and released through Hugging Face, is a drop-in replacement for problems where BERT may previously have been used and, like the original, comes in a base and a large variant.
ModernBERT also outperforms DeBERTa-v3-base, which has been a favourite of NLP practitioners for a few years thanks to its strong few-shot and zero-shot capabilities.
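
As a quick illustration of the "drop-in" claim, here is a minimal sketch using the Hugging Face transformers library: ModernBERT is loaded exactly as BERT would be, with only the checkpoint name changed. It assumes the publicly hosted answerdotai/ModernBERT-base checkpoint and a transformers version recent enough to include ModernBERT support.

```python
from transformers import pipeline

# Where BERT might previously have been used for masked-token prediction...
bert_mlm = pipeline("fill-mask", model="bert-base-uncased")

# ...ModernBERT drops in with only the model name swapped
# (assumes a transformers release that includes ModernBERT support).
modernbert_mlm = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

print(modernbert_mlm("Paris is the [MASK] of France."))
```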
Comparison Table
| Metric | BERT Base | ModernBERT Base | BERT Large | ModernBERT Large |
|---|---|---|---|---|
| # Params | 110M | 149M | 340M | 395M |
| Context Size | 512 | 8192 | 512 | 8192 |
| BEIR | 38.9 | 41.6 | 38.9 | 44.0 |
| MLDR OOD | 23.9 | 27.4 | 23.3 | 34.3 |
| MLDR ID | 32.2 | 44.0 | 31.7 | 48.6 |
| BEIR (ColBERT) | 49.0 | 51.3 | 49.5 | 52.4 |
| MLDR OOD (ColBERT) | 28.1 | 80.2 | 28.5 | 80.4 |
| GLUE | 84.7 | 88.5 | 85.2 | 90.4 |
| CSN | 41.2 | 56.4 | 41.6 | 59.5 |
| SQA | 59.5 | 73.6 | 60.8 | 83.9 |
OOD = out of domain, ID = in domain.
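
The jump from a 512-token to an 8192-token context window shown in the table can be checked directly. The sketch below is a minimal example, again assuming the answerdotai/ModernBERT-base checkpoint; BERT would truncate the same input at 512 tokens.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Build an input far longer than BERT's 512-token limit.
long_text = " ".join(["long document retrieval"] * 1500)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=8192)
print(inputs.input_ids.shape)  # several thousand tokens, not capped at 512

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # one hidden state per input token
```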
mmBERT is a modern multi-lingual encoder-only model based on ModernBERT.