LLM Evaluation Framework and Data Curation

Background

Participated at the lab level in the World Best LLM project, an ongoing national initiative for Sovereign AI.

My Role

  • Developed a unified framework for evaluating LLMs across task families such as math, code generation, human alignment, and agent capabilities, closely replicating the evaluation methods and environments of each benchmark's original paper (a sketch of the harness design follows this list).
  • During data preprocessing, scored model responses to queries with a reward model and filtered the data by reward score; parallelized the reward-score computation, substantially reducing preprocessing time (see the second sketch below).
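
The snippet below is a minimal sketch of one way such a unified evaluation harness could be organized, assuming a per-benchmark task abstraction behind a shared registry. All names here (EvalTask, REGISTRY, run_task, generate_fn) are illustrative assumptions, not the project's actual API.

```python
"""Minimal sketch of a unified LLM evaluation harness (illustrative only)."""
from abc import ABC, abstractmethod
from typing import Callable


class EvalTask(ABC):
    """One benchmark: owns its own data loading, prompting, and scoring,
    so each task can mirror its original paper's evaluation setup."""

    @abstractmethod
    def load_examples(self) -> list[dict]: ...

    @abstractmethod
    def build_prompt(self, example: dict) -> str: ...

    @abstractmethod
    def score(self, example: dict, response: str) -> float:
        """Return a per-example score using the original paper's metric."""


REGISTRY: dict[str, type[EvalTask]] = {}


def register(name: str):
    """Class decorator that adds a concrete task to the shared registry."""
    def deco(cls: type[EvalTask]) -> type[EvalTask]:
        REGISTRY[name] = cls
        return cls
    return deco


def run_task(name: str, generate_fn: Callable[[str], str]) -> float:
    """Evaluate one model (wrapped as a prompt -> response function)
    on one registered task and return the mean score."""
    task = REGISTRY[name]()
    examples = task.load_examples()
    scores = [task.score(ex, generate_fn(task.build_prompt(ex)))
              for ex in examples]
    return sum(scores) / len(scores)
```

Keeping each benchmark behind the same small interface is what lets math, code, alignment, and agent tasks share one runner while each still replicates its own paper's protocol.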
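For the reward-score filtering step, the following sketch shows a chunk-and-score pattern with a process pool. The reward_score function is a dummy stand-in for a real reward-model forward pass, and the function names, chunking scheme, and threshold are assumptions for illustration, not the actual pipeline.

```python
"""Sketch of reward-score filtering with parallel scoring (illustrative only)."""
from concurrent.futures import ProcessPoolExecutor


def reward_score(pair: tuple[str, str]) -> float:
    """Placeholder: a real implementation would run a reward model on
    the (query, response) pair and return its scalar reward."""
    query, response = pair
    return float(len(response)) / (len(query) + 1)  # dummy score


def score_chunk(chunk: list[tuple[str, str]]) -> list[float]:
    # Each worker scores one chunk; with a GPU reward model, each worker
    # would instead pin a device and batch its forward passes.
    return [reward_score(p) for p in chunk]


def filter_by_reward(pairs: list[tuple[str, str]],
                     threshold: float,
                     workers: int = 4) -> list[tuple[str, str]]:
    """Score all (query, response) pairs in parallel and keep only those
    whose reward meets the threshold."""
    chunks = [pairs[i::workers] for i in range(workers)]  # round-robin split
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_chunk, chunks))
    kept = []
    for chunk, scores in zip(chunks, scored):
        kept.extend(p for p, s in zip(chunk, scores) if s >= threshold)
    return kept


if __name__ == "__main__":
    data = [("What is 2+2?", "4"), ("Explain gravity.", "Gravity is...")]
    print(filter_by_reward(data, threshold=0.1, workers=2))
```

Round-robin chunking keeps worker loads roughly balanced when response lengths vary, which is one plausible way the parallelization could cut scoring time relative to a sequential loop.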