LLM Evaluation Framework and Data Curation
Background
Participated at the lab level in the World Best LLM project, an ongoing national initiative for Sovereign AI.
My Role
- Developed a unified framework for evaluating LLMs across math, code, human-alignment, and agent-capability benchmarks, closely replicating the evaluation methods and environments of the original papers (see the harness sketch after this list).
- During data preprocessing, scored query-response pairs with a reward model and filtered the data by reward score; significantly reduced wall-clock time by parallelizing the reward-score computation (see the second sketch after this list).
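
A minimal sketch of what such a unified evaluation harness could look like. The `Task` interface, registry, and function names here are illustrative assumptions, not the project's actual code; the key idea is that each benchmark keeps its own prompting and scoring logic faithful to the original paper behind a shared interface.

```python
"""Sketch of a unified task interface for LLM evaluation (illustrative)."""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    prompt: str
    reference: str  # gold answer or test cases, depending on the task


class Task(ABC):
    name: str  # set by each benchmark subclass

    @abstractmethod
    def load_samples(self) -> list[Sample]:
        """Load the benchmark data in its original format."""

    @abstractmethod
    def score(self, sample: Sample, response: str) -> float:
        """Apply the scoring rule from the original paper, e.g.
        exact match for math, unit-test pass rate for code."""


TASKS: dict[str, Task] = {}


def register(task: Task) -> None:
    TASKS[task.name] = task


def evaluate(generate: Callable[[str], str], task_name: str) -> float:
    """Run one model (a prompt -> response callable) on one registered
    task and return the mean score."""
    task = TASKS[task_name]
    samples = task.load_samples()
    scores = [task.score(s, generate(s.prompt)) for s in samples]
    return sum(scores) / len(scores)
```

Keeping the model behind a plain `generate` callable lets the same harness wrap local checkpoints and API-served models alike.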
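
A sketch of one common way to parallelize reward scoring, assuming one reward-model replica per GPU with the dataset sharded across worker processes. The model id, threshold, and batch size below are placeholders, and the `transformers` text-classification pipeline stands in for whatever reward model was actually used.

```python
"""Sketch of data-parallel reward scoring and filtering (illustrative)."""
import multiprocessing as mp

from transformers import pipeline

REWARD_MODEL = "your-org/reward-model"  # placeholder model id
REWARD_THRESHOLD = 0.5                  # assumed cutoff; tuned empirically


def _score_shard(args):
    gpu_id, shard = args
    # Each worker holds its own reward-model replica on one GPU.
    rm = pipeline("text-classification", model=REWARD_MODEL, device=gpu_id)
    texts = [f"{query}\n{response}" for query, response in shard]
    outputs = rm(texts, batch_size=32, truncation=True)
    return [out["score"] for out in outputs]


def filter_by_reward(pairs, num_gpus=8):
    """Shard (query, response) pairs across GPUs, score them in
    parallel, and keep only pairs at or above the reward threshold."""
    shards = [pairs[i::num_gpus] for i in range(num_gpus)]
    ctx = mp.get_context("spawn")  # CUDA requires spawn, not fork
    with ctx.Pool(num_gpus) as pool:
        per_shard_scores = pool.map(_score_shard, list(enumerate(shards)))
    kept = []
    for shard, scores in zip(shards, per_shard_scores):
        kept.extend(p for p, s in zip(shard, scores) if s >= REWARD_THRESHOLD)
    return kept
```

Because each pair is scored independently, this pattern scales nearly linearly with the number of GPUs, which is where the efficiency gain over sequential scoring comes from.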