[BUG] #862

Open
2 tasks done
Xamli-hzn opened this issue Feb 25, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@Xamli-hzn

Prerequisites

  • I have read the documentation.
  • I have checked other issues for similar problems.

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Error Logs

Error type 1 log:

94%|█████████▍| 64/68 [06:57<00:22, 5.73s/it]

96%|█████████▌| 65/68 [07:03<00:17, 5.81s/it]

97%|█████████▋| 66/68 [07:10<00:12, 6.02s/it]

99%|█████████▊| 67/68 [07:16<00:06, 6.18s/it]

100%|██████████| 68/68 [07:18<00:00, 4.94s/it]ERROR | 2025-02-25 10:16:37 | autotrain.trainers.common:wrapper:215 - train has failed due to an exception: Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 212, in wrapper
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/main.py", line 244, in train
trainer.train()
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2625, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval, start_time)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3071, in _maybe_log_save_evaluate
metrics = self._evaluate(trial, ignore_keys_for_eval)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3025, in _evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/app/env/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 197, in evaluate
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 4076, in evaluate
output = eval_loop(
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 4365, in evaluation_loop
metrics = self.compute_metrics(
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/utils.py", line 43, in _seq2seq_metrics
decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
File "/app/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3811, in batch_decode
return [
File "/app/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3812, in <listcomp>
self.decode(
File "/app/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3851, in decode
return self._decode(
File "/app/env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 668, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted

ERROR | 2025-02-25 10:16:37 | autotrain.trainers.common:wrapper:216 - out of range integral type conversion attempted
INFO | 2025-02-25 10:16:37 | autotrain.trainers.common:pause_space:156 - Pausing space...
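
For reference, a minimal sketch of one possible workaround for the `OverflowError` above. It assumes (the log does not confirm this) that the predictions reaching `tokenizer.batch_decode` contain IDs the fast tokenizer cannot convert, for example the `-100` padding value that `Trainer` may leave in gathered predictions. The helper name `sanitize_token_ids` and all sample values are hypothetical.

```python
def sanitize_token_ids(batch, pad_token_id, vocab_size):
    """Replace every ID outside [0, vocab_size) with pad_token_id."""
    return [
        [tok if 0 <= tok < vocab_size else pad_token_id for tok in seq]
        for seq in batch
    ]


# Hypothetical values: -100 is a padding marker, 40000 exceeds the vocabulary.
preds = [[5, 17, -100, -100], [9, 40000, 3, 1]]
clean = sanitize_token_ids(preds, pad_token_id=0, vocab_size=32000)
print(clean)  # [[5, 17, 0, 0], [9, 0, 3, 1]]
```

The cleaned batch could then be passed to `tokenizer.batch_decode(clean, skip_special_tokens=True)`, as in the traceback. In practice `pad_token_id` and `vocab_size` would come from the tokenizer in use.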

Error type 2 log:

100%|██████████| 68/68 [24:39<00:00, 18.16s/it]ERROR | 2025-02-24 13:33:50 | autotrain.trainers.common:wrapper:215 - train has failed due to an exception: Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 212, in wrapper
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/main.py", line 244, in train
trainer.train()
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 2625, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval, start_time)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3071, in _maybe_log_save_evaluate
metrics = self._evaluate(trial, ignore_keys_for_eval)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 3025, in _evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/app/env/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 197, in evaluate
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 4076, in evaluate
output = eval_loop(
File "/app/env/lib/python3.10/site-packages/transformers/trainer.py", line 4365, in evaluation_loop
metrics = self.compute_metrics(
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/utils.py", line 48, in _seq2seq_metrics
decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/utils.py", line 48, in <listcomp>
decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
File "/app/env/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
tokenizer = _get_punkt_tokenizer(language)
File "/app/env/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
return PunktTokenizer(language)
File "/app/env/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
self.load_lang(lang)
File "/app/env/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
File "/app/env/lib/python3.10/site-packages/nltk/data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:


Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
- '/app/nltk_data'
- '/app/env/nltk_data'
- '/app/env/share/nltk_data'
- '/app/env/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'


ERROR | 2025-02-24 13:33:50 | autotrain.trainers.common:wrapper:216 -


Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
- '/app/nltk_data'
- '/app/env/nltk_data'
- '/app/env/share/nltk_data'
- '/app/env/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'


INFO | 2025-02-24 13:33:50 | autotrain.trainers.common:pause_space:156 - Pausing space...
INFO: 10.16.41.172:4130 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.27.28:32660 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.27.28:36086 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.41.172:64719 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.28.141:59511 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.27.28:53898 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.41.172:60829 - "GET /?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NWIwYzY2Y2M5YTVhNzY4MGY4Y2E3NmQiLCJ1c2VyIjoiaGFzc2FtbmlhejcifSwiaWF0IjoxNzQwNDA0MDUwLCJzdWIiOiIvc3BhY2VzL2hhc3NhbW5pYXo3L2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTc0MDQ5MDQ1MCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.MjbokpvYHrz07PN4vwpXo2HTTQ2hZQ3pcmIHJv8f9CiMVbpvyd-zRyNTxN-rxzDXyPbbRyaCucGLapEMiXBzCQ HTTP/1.1" 307 Temporary Redirect
ERROR | 2025-02-24 13:34:11 | autotrain.app.ui_routes:load_index:381 - Failed to get user and orgs: object of type '_TemplateResponse' has no len()
INFO: 10.16.28.141:23805 - "GET /ui/?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NWIwYzY2Y2M5YTVhNzY4MGY4Y2E3NmQiLCJ1c2VyIjoiaGFzc2FtbmlhejcifSwiaWF0IjoxNzQwNDA0MDUwLCJzdWIiOiIvc3BhY2VzL2hhc3NhbW5pYXo3L2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTc0MDQ5MDQ1MCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.MjbokpvYHrz07PN4vwpXo2HTTQ2hZQ3pcmIHJv8f9CiMVbpvyd-zRyNTxN-rxzDXyPbbRyaCucGLapEMiXBzCQ HTTP/1.1" 200 OK
INFO: 10.16.27.28:53898 - "GET /?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NWIwYzY2Y2M5YTVhNzY4MGY4Y2E3NmQiLCJ1c2VyIjoiaGFzc2FtbmlhejcifSwiaWF0IjoxNzQwNDA0MDUwLCJzdWIiOiIvc3BhY2VzL2hhc3NhbW5pYXo3L2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTc0MDQ5MDQ1MCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.MjbokpvYHrz07PN4vwpXo2HTTQ2hZQ3pcmIHJv8f9CiMVbpvyd-zRyNTxN-rxzDXyPbbRyaCucGLapEMiXBzCQ HTTP/1.1" 307 Temporary Redirect
ERROR | 2025-02-24 13:34:12 | autotrain.app.ui_routes:load_index:381 - Failed to get user and orgs: object of type '_TemplateResponse' has no len()
INFO: 10.16.27.28:53898 - "GET /ui/?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NWIwYzY2Y2M5YTVhNzY4MGY4Y2E3NmQiLCJ1c2VyIjoiaGFzc2FtbmlhejcifSwiaWF0IjoxNzQwNDA0MDUwLCJzdWIiOiIvc3BhY2VzL2hhc3NhbW5pYXo3L2F1dG90cmFpbi1hZHZhbmNlZCIsImV4cCI6MTc0MDQ5MDQ1MCwiaXNzIjoiaHR0cHM6Ly9odWdnaW5nZmFjZS5jbyJ9.MjbokpvYHrz07PN4vwpXo2HTTQ2hZQ3pcmIHJv8f9CiMVbpvyd-zRyNTxN-rxzDXyPbbRyaCucGLapEMiXBzCQ HTTP/1.1" 200 OK
INFO: 10.16.28.141:23805 - "GET /static/logo.png HTTP/1.1" 200 OK
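
The `LookupError` above asks for the `punkt_tab` resource. A minimal sketch of checking for it and downloading it only when missing, using `nltk.data.find` and `nltk.download`. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical; they exist only so the logic can be exercised with stubs.

```python
def ensure_nltk_resource(resource_path, package_id, download_dir=None,
                         find=None, download=None):
    """Return True if the resource is available, downloading it if needed."""
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested with stubs
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
        return True
    except LookupError:
        # nltk.download returns True on success and False on failure.
        return bool(download(package_id, download_dir=download_dir, quiet=True))
```

A call such as `ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab", download_dir="/app/nltk_data")` would target the first directory in the "Searched in" list, assuming that directory is writable and the container has network access.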

Error type 3 log:

To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
INFO: 10.16.47.22:33211 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Traceback (most recent call last):
File "/app/env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/app/env/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/main.py", line 33, in <module>
from autotrain.trainers.seq2seq import utils
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/seq2seq/utils.py", line 8, in <module>
ROUGE_METRIC = evaluate.load("rouge")
File "/app/env/lib/python3.10/site-packages/evaluate/loading.py", line 748, in load
evaluation_module = evaluation_module_factory(
File "/app/env/lib/python3.10/site-packages/evaluate/loading.py", line 681, in evaluation_module_factory
raise FileNotFoundError(
FileNotFoundError: Couldn't find a module script at /app/rouge/rouge.py. Module 'rouge' doesn't exist on the Hugging Face Hub either.
Traceback (most recent call last):
File "/app/env/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1174, in launch_command
simple_launcher(args)
File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/app/env/bin/python', '-m', 'autotrain.trainers.seq2seq', '--training_config', 'autotrain-8r2be-1jpag/training_params.json']' returned non-zero exit status 1.
INFO: 10.16.17.134:18967 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO | 2025-02-24 12:23:05 | autotrain.app.utils:get_running_jobs:40 - Killing PID: 168
INFO | 2025-02-24 12:23:05 | autotrain.app.utils:kill_process_by_pid:90 - Sent SIGTERM to process with PID 168

Additional Information

Kindly provide a stable, working version of the AutoTrain Space for seq2seq training, for either a multilingual or an English model. I have been unable to train a single seq2seq model since the January updates. Even the previous versions, which worked fine with mt5 small, mt5 base, and even facebook bart large 50, now produce these errors. I have debugged the datasets as well: no token IDs that are negative or larger than the model vocabulary size are generated. I have also tried manually installing the evaluate rouge metric and the NLTK punkt_tab resource, with paths from GitHub, using Dockerfiles, but one of these three errors still persists.

I tested Transformers 4.43.1 with autotrain 0.8.11, and Transformers 4.45.0 through 4.48.0 with autotrain 0.8.24. I tested many versions, including the stable AutoTrain releases, but still get these errors on different datasets (for example Spanish-English) and on different small or base models that were working fine during the December-January period. I restored the same versions of the AutoTrain Space and still get this error. With the older autotrain 0.8.11 and Transformers 4.43.1 or later, I get the evaluate rouge error instead. I tried installing the required resources, but the same three errors have kept recurring since January 2025. I even get errors on datasets that I previously trained successfully on HF AutoTrain Spaces.
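
Since the behaviour depends on the exact package combination, a minimal sketch for recording what is actually installed inside the Space, using the standard-library `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the distribution names in the list (in particular `autotrain-advanced`) are assumptions about how the packages are published.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Assumed distribution names; adjust to match the environment.
for name, ver in installed_versions(
    ["autotrain-advanced", "transformers", "evaluate", "nltk", "tokenizers"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```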

Xamli-hzn added the bug label Feb 25, 2025
@John6666cat

A Transformers issue that seems to be related.
huggingface/transformers#36330

@Xamli-hzn
Author

A Transformers issue that seems to be related. huggingface/transformers#36330

Thanks for the mention. There are three issues. The overflow error usually occurs at random, even on smaller datasets, with the same Space on :latest with Transformers 4.48. The older version gives the NLTK error, and once the resource is installed manually it gives the overflow error, as mentioned above.

@abhishekkrthakur can you confirm whether it's an account limit issue?
