ModernBERT

The original BERT paper came out in 2018, around 7 years ago at the time of writing, yet BERT is still cited and used as a strong baseline in a number of NLP tasks. ModernBERT, released in December 2024 by Answer.AI, LightOn, and collaborators and distributed through Hugging Face, is intended as a drop-in replacement wherever BERT would previously have been used. Like the original, it comes in a base and a large variant.
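To make the "drop-in replacement" idea concrete, here is a minimal sketch using the Hugging Face `transformers` Auto classes, where switching models is just a matter of changing the checkpoint ID. The helper names (`CHECKPOINTS`, `load_masked_lm`, `top_prediction`) and the `<mask>` placeholder are hypothetical, chosen only for this illustration. It assumes `torch` and a recent `transformers` release (4.48 or later for ModernBERT support) are installed.

```python
# Hypothetical helper names; the values are Hugging Face Hub checkpoint IDs.
CHECKPOINTS = {
    "bert-base": "google-bert/bert-base-uncased",
    "modernbert-base": "answerdotai/ModernBERT-base",
    "modernbert-large": "answerdotai/ModernBERT-large",
}


def load_masked_lm(name):
    """Load the tokenizer and masked-LM model for one of the checkpoints above."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_id = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


def top_prediction(name, text):
    """Return the most likely token for the first `<mask>` placeholder in `text`."""
    import torch

    tokenizer, model = load_masked_lm(name)
    # Substitute the tokenizer's own mask token so the same call works for both families.
    inputs = tokenizer(text.replace("<mask>", tokenizer.mask_token), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the first mask token in the (single) input sequence.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    best_id = logits[0, mask_pos].argmax().item()
    return tokenizer.decode([best_id]).strip()
```

Calling `top_prediction("bert-base", "The capital of France is <mask>.")` and then the same call with `"modernbert-base"` exercises both models through identical code. The checkpoints are downloaded on first use.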

ModernBERT also outperforms DeBERTa-v3-base, which has been a favourite of NLP practitioners for a few years thanks to its few-shot and zero-shot capabilities.

Comparison Table

 
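| | BERT Base | ModernBERT Base | BERT Large | ModernBERT Large |
| --- | --- | --- | --- | --- |
| # Params | 110M | 149M | 340M | 395M |
| Context Size | 512 | 8192 | 512 | 8192 |
| BEIR | 38.9 | 41.6 | 38.9 | 44.0 |
| MLDR (OOD) | 23.9 | 27.4 | 23.3 | 34.3 |
| MLDR (ID) | 32.2 | 44.0 | 31.7 | 48.6 |
| BEIR (ColBERT) | 49.0 | 51.3 | 49.5 | 52.4 |
| MLDR (OOD, ColBERT) | 28.1 | 80.2 | 28.5 | 80.4 |
| GLUE | 84.7 | 88.5 | 85.2 | 90.4 |
| CSN | 41.2 | 56.4 | 41.6 | 59.5 |
| SQA | 59.5 | 73.6 | 60.8 | 83.9 |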

OOD = out of domain; ID = in domain.
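As a rough aid to reading the comparison table, the sketch below computes the absolute gain of ModernBERT over BERT on each benchmark, plus the context-length ratio. The scores are copied from the table as given. The names `SCORES`, `gains`, and `CONTEXT_RATIO` are hypothetical, introduced only for this illustration.

```python
# (BERT score, ModernBERT score) per model size, copied from the comparison table.
SCORES = {
    "BEIR": {"base": (38.9, 41.6), "large": (38.9, 44.0)},
    "MLDR OOD": {"base": (23.9, 27.4), "large": (23.3, 34.3)},
    "MLDR ID": {"base": (32.2, 44.0), "large": (31.7, 48.6)},
    "BEIR (ColBERT)": {"base": (49.0, 51.3), "large": (49.5, 52.4)},
    "MLDR OOD (ColBERT)": {"base": (28.1, 80.2), "large": (28.5, 80.4)},
    "GLUE": {"base": (84.7, 88.5), "large": (85.2, 90.4)},
    "CSN": {"base": (41.2, 56.4), "large": (41.6, 59.5)},
    "SQA": {"base": (59.5, 73.6), "large": (60.8, 83.9)},
}

# ModernBERT context size divided by BERT context size (8192 vs 512 tokens).
CONTEXT_RATIO = 8192 // 512


def gains(size):
    """Absolute ModernBERT-minus-BERT gain per benchmark, rounded to one decimal."""
    result = {}
    for benchmark, by_size in SCORES.items():
        bert, modern = by_size[size]
        result[benchmark] = round(modern - bert, 1)
    return result


for size in ("base", "large"):
    # Sort benchmarks from largest to smallest gain for easier reading.
    ranked = sorted(gains(size).items(), key=lambda item: item[1], reverse=True)
    print(size, ranked)
```

On these figures, the largest gains are on the long-context MLDR benchmark with ColBERT-style retrieval, which fits with the context window being 16 times longer.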

 

mmBERT is a modern multilingual encoder-only model based on ModernBERT.