Question: How to run itas algorithm for each benchmark besides mt_bench and arena? For example gsm8k? #6

oandreeva-nv · 2025-02-04T00:50:59Z

[Note: edited for clarification]

Dear authors,

I was trying to run the ITAS algorithm on the GSM8K benchmark to get a task-specific ARCHON architecture. Unfortunately, I'm stuck because the benchmark appears to be unsupported.

I can see that the scripts provided under benchmarks/ and benchmarks/gsm8k can generate and evaluate answers.
However, the itas_algorithm script in the currently released version seems to support only "mt_bench" and "arena_hard_auto":

Archon/src/archon/itas_algorithms/itas_algorithm.py, line 150 in d45892c:

    if self.search_config["benchmark"] in ["mt_bench", "arena_hard_auto"]:

Please let me know if I'm wrong, and what steps are necessary to get a task-specific ARCHON architecture.

My intuition is that I need to add an entry to the question map used in power_ranker:

Archon/src/archon/itas_algorithms/power_ranker.py, lines 24 to 27 in d45892c:

    QUESTION_MAP = {
        "arena_hard_auto": "archon/benchmarks/arena_hard_auto/arena_questions.jsonl",
        "mt_bench": "archon/benchmarksmt_bench/FastChat/fastchat/llm_judge/data/mt_bench/question.jsonl",
    }

as well as add some logic to compare a generated answer against the correct one. Is my intuition correct? Do you plan to update the code with this logic by any chance?
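To make the second part concrete, here is a minimal sketch of the kind of comparison logic I have in mind. It relies on the fact that GSM8K reference solutions end with a line of the form `#### <answer>`. The function names `extract_final_number` and `is_correct` are my own and are not part of Archon, and taking the last number in a generated answer as the prediction is an assumption that may not fit every prompt format.

```python
import re
from typing import Optional

# Matches integers and decimals, optionally negative, with optional
# thousands separators (for example "18", "-3.5", "1,000").
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[str]:
    """Return the final numeric answer found in `text`, or None.

    Hypothetical helper, not part of Archon.
    """
    # GSM8K references mark the final answer with "####"; if the marker
    # is present, only look at what follows it.
    if "####" in text:
        text = text.split("####")[-1]
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Assumption: the last number mentioned is the final answer.
    value = matches[-1].replace(",", "")
    # Normalize decimals so that "18.0" and "18" compare equal.
    if "." in value:
        value = value.rstrip("0").rstrip(".")
    return value


def is_correct(generated: str, reference: str) -> bool:
    """Exact match on the normalized final number (hypothetical helper)."""
    predicted = extract_final_number(generated)
    expected = extract_final_number(reference)
    return predicted is not None and predicted == expected
```

For example, `is_correct("The answer is 18.0", "#### 18")` returns `True`, while a generated answer that contains no number at all is counted as incorrect.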

Thanks in advance!

shloknatarajan · 2025-02-06T04:22:18Z

You're correct; currently, Archon only supports arena_hard_auto and mt_bench for sampling. The bulk of the work is getting Power Ranker to support new benchmarks, since ITAS relies on Power Ranker to decide which configurations work best. We don't have a timeline for supporting other benchmarks, but it is on the agenda. If you implement this for your own use case and put up a PR, that would definitely help us get the integration working sooner.
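As a rough illustration of what ranking configurations on a ground-truth benchmark such as GSM8K could look like, here is a sketch that scores each candidate configuration by accuracy and sorts them. This is not Power Ranker's actual interface: `rank_configs`, `exact_match`, the `Scorer` type, and the shape of `answers_by_config` are all hypothetical names chosen for the example, and the scorer is pluggable so a benchmark-specific comparison can be swapped in.

```python
from typing import Callable, Dict, List, Tuple

# A scorer takes (generated answer, reference answer) and returns True
# when the generated answer counts as correct. Hypothetical type alias.
Scorer = Callable[[str, str], bool]


def exact_match(generated: str, reference: str) -> bool:
    """Simplest possible scorer: string equality after trimming."""
    return generated.strip() == reference.strip()


def rank_configs(
    answers_by_config: Dict[str, List[str]],
    references: List[str],
    scorer: Scorer = exact_match,
) -> List[Tuple[str, float]]:
    """Return (config name, accuracy) pairs, best configuration first.

    Hypothetical sketch, not Archon's Power Ranker.
    """
    if not references:
        raise ValueError("references must not be empty")
    ranking = []
    for name, answers in answers_by_config.items():
        if len(answers) != len(references):
            raise ValueError(f"{name}: expected {len(references)} answers")
        correct = sum(scorer(a, r) for a, r in zip(answers, references))
        ranking.append((name, correct / len(references)))
    # Sort by accuracy, highest first; ties keep insertion order.
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking
```

Given two configurations where one answers both questions correctly and the other only one, the first is ranked on top with accuracy 1.0 and the second follows with 0.5.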
