EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

reproduce llama 3 evals

#2557 opened Dec 10, 2024 by baberabb

Open 6

387 Open 959 Closed

Issues list

TypeError: top_k_top_p_sampling_from_probs() missing 1 required positional argument: 'top_p'

#2936 opened Apr 28, 2025 by rangehow

np.NaN

#2935 opened Apr 27, 2025 by upunaprosk

How do I evaluate mmlu with local-chat-completions or local-completions?

#2934 opened Apr 26, 2025 by amdslgl

Are there any Safety Alignment Benchmarks ready for use?

#2933 opened Apr 26, 2025 by JovialYip

Some LongBench scores on Llama-3.1-8B-Instruct are far off

#2932 opened Apr 25, 2025 by cameronshinn

llama3-8b-instruct 0-shot mbpp pass@1=0

#2931 opened Apr 25, 2025 by LafouCC

Support for OlympiadBench and AMC

#2930 opened Apr 25, 2025 by Edric-Zhao

What is the correct way to save result file?

#2927 opened Apr 24, 2025 by rangehow

Do we have plan to add BFCL into the tasks?

#2926 opened Apr 23, 2025 by zzhangncsu

eval bbh failed

#2922 opened Apr 18, 2025 by godlikehhd

clean up process results (label: bug)

#2920 opened Apr 16, 2025 by baberabb

GPQA Preprocessing Function Results in Incorrect Physics Equations (labels: bug, validation)

#2907 opened Apr 14, 2025 by ShayekhBinIslam

Filter not extracting choice selection correctly (label: validation)

#2905 opened Apr 14, 2025 by 1jamesthompson1

Hang with VLLM backend when data_parallel_size > 1

#2904 opened Apr 14, 2025 by rangehow

Does lm-eval currently support testing for models in the DeepSeek R1 category?

#2903 opened Apr 13, 2025 by Polarisamoon

Caching requests does not work for model gguf

#2901 opened Apr 11, 2025 by falkbene

How to use *chat_template* with .gguf models ?

#2900 opened Apr 11, 2025 by Bobchenyx

Improve the behavior of progress bars when using huggingface

#2898 opened Apr 11, 2025 by Zephyr271828

Additional data for evaluation

#2896 opened Apr 9, 2025 by harshakokel

RuntimeError: 500 Server Error for URL During LM-Eval with gguf Model

#2894 opened Apr 9, 2025 by amjh83

Injecting custom variables/values into a task via YAML

#2891 opened Apr 8, 2025 by j0ma

low eval accuracy with gguf

#2887 opened Apr 7, 2025 by jerryzh168

Hellaswag filenotfound error

#2886 opened Apr 7, 2025 by dsvilarkovic

LLaMA-4 evaluation

#2885 opened Apr 7, 2025 by jybbjybb

TypeError from missing yaml_path in lm_eval.utils.load_yaml_config when task uses include

#2884 opened Apr 6, 2025 by MarieRoald
