LLM Evaluation Framework and Data Curation

Background

Participated at the lab level in the World Best LLM project, an ongoing national initiative for Sovereign AI.

My Role

  • Developed a unified framework for evaluating LLMs across task families such as math, code generation, human alignment, and agent capabilities, closely replicating the evaluation methods and environments of each benchmark's original paper (a sketch of the harness design follows this list).
  • During data preprocessing, scored model responses to queries with a reward model and filtered the data by reward score; parallelized the reward-score computation, substantially reducing preprocessing time (see the second sketch below).
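
The snippet below is a minimal sketch of one way such a unified evaluation harness could be organized, assuming a per-benchmark task abstraction behind a shared registry. All names here (EvalTask, REGISTRY, run_task, generate_fn) are illustrative assumptions, not the project's actual API.

```python
"""Minimal sketch of a unified LLM evaluation harness (illustrative only)."""
from abc import ABC, abstractmethod
from typing import Callable


class EvalTask(ABC):
    """One benchmark: owns its own data loading, prompting, and scoring,
    so each task can mirror its original paper's evaluation setup."""

    @abstractmethod
    def load_examples(self) -> list[dict]: ...

    @abstractmethod
    def build_prompt(self, example: dict) -> str: ...

    @abstractmethod
    def score(self, example: dict, response: str) -> float:
        """Return a per-example score using the original paper's metric."""


REGISTRY: dict[str, type[EvalTask]] = {}


def register(name: str):
    """Class decorator that adds a concrete task to the shared registry."""
    def deco(cls: type[EvalTask]) -> type[EvalTask]:
        REGISTRY[name] = cls
        return cls
    return deco


def run_task(name: str, generate_fn: Callable[[str], str]) -> float:
    """Evaluate one model (wrapped as a prompt -> response function)
    on one registered task and return the mean score."""
    task = REGISTRY[name]()
    examples = task.load_examples()
    scores = [task.score(ex, generate_fn(task.build_prompt(ex)))
              for ex in examples]
    return sum(scores) / len(scores)
```

Keeping each benchmark behind the same small interface is what lets math, code, alignment, and agent tasks share one runner while each still replicates its own paper's protocol.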
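For the reward-score filtering step, the following sketch shows a chunk-and-score pattern with a process pool. The reward_score function is a dummy stand-in for a real reward-model forward pass, and the function names, chunking scheme, and threshold are assumptions for illustration, not the actual pipeline.

```python
"""Sketch of reward-score filtering with parallel scoring (illustrative only)."""
from concurrent.futures import ProcessPoolExecutor


def reward_score(pair: tuple[str, str]) -> float:
    """Placeholder: a real implementation would run a reward model on
    the (query, response) pair and return its scalar reward."""
    query, response = pair
    return float(len(response)) / (len(query) + 1)  # dummy score


def score_chunk(chunk: list[tuple[str, str]]) -> list[float]:
    # Each worker scores one chunk; with a GPU reward model, each worker
    # would instead pin a device and batch its forward passes.
    return [reward_score(p) for p in chunk]


def filter_by_reward(pairs: list[tuple[str, str]],
                     threshold: float,
                     workers: int = 4) -> list[tuple[str, str]]:
    """Score all (query, response) pairs in parallel and keep only those
    whose reward meets the threshold."""
    chunks = [pairs[i::workers] for i in range(workers)]  # round-robin split
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_chunk, chunks))
    kept = []
    for chunk, scores in zip(chunks, scored):
        kept.extend(p for p, s in zip(chunk, scores) if s >= threshold)
    return kept


if __name__ == "__main__":
    data = [("What is 2+2?", "4"), ("Explain gravity.", "Gravity is...")]
    print(filter_by_reward(data, threshold=0.1, workers=2))
```

Round-robin chunking keeps worker loads roughly balanced when response lengths vary, which is one plausible way the parallelization could cut scoring time relative to a sequential loop.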