European LLM Leaderboard

This is a collection of multilingual evaluation results obtained using our fork of the LM-evaluation-harness (https://github.com/OpenGPTX/lm-evaluation-harness), based on V1 of the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). Benchmarks are currently available in 21 European languages; Irish, Maltese, and Croatian are missing.
