Can you try our fork (available via pip install coqui-tts)? This repo is not maintained anymore.
The following code runs fine for me there:
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Life is so beautiful",
    speaker="Uta Obando",
    language="en",
)
Describe the bug
I frequently hit an index out of range error when cloning a voice from a speaker sample and a text script. It is not always related to the number of tokens: a text with a larger number of tokens may succeed while one with a smaller number of tokens fails.
Successful Text: Nature has a delicate and complex structure that allows thousands of creatures to maintain a delicate balance.
Tokenized length: 60
Tokens: [259, 467, 1375, 18, 2, 1221, 2, 14, 2, 636, 91, 186, 2, 53, 2, 884, 25, 169, 2, 32, 1951, 2766, 861, 2, 73, 2, 14, 84, 69, 32, 2, 40, 206, 43, 2864, 2, 58, 2, 814, 18, 14, 1375, 61, 2, 51, 2, 845, 33, 137, 2, 14, 2, 636, 91, 186, 2, 15, 1821, 3263, 9]
Failed Text: Life is so beautiful
Tokenized length: 13
Tokens: [259, 25, 140, 18, 2, 54, 2, 123, 2, 67, 847, 140, 167]
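For reference, token counts like the ones above can presumably be reproduced with the model's own BPE tokenizer; a minimal sketch, assuming the tokenizer is reachable through the synthesizer as in the current XTTS code:

from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
# Assumed attribute path to the VoiceBpeTokenizer used by XTTS.
tokenizer = tts.synthesizer.tts_model.tokenizer
tokens = tokenizer.encode("Life is so beautiful", "en")
print(len(tokens), tokens)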
Code:
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text=full_text,
    speaker_wav=speaker_wav_path,
    language=target_language,
    file_path=output_audio_path,
    speed=speed_tts,
    split_sentences=True
)
Error message:
ERROR:main:❌ Error during voice cloning:
ERROR:main:Traceback (most recent call last):
File "/home/user/app/app.py", line 485, in generate_voiceover_clone
tts.tts_to_file(
File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 334, in tts_to_file
wav = self.tts(
File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 276, in tts
wav = self.synthesizer.tts(
File "/usr/local/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 386, in tts
outputs = self.tts_model.synthesize(
File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 419, in synthesize
return self.full_inference(text, speaker_wav, language, **settings)
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 488, in full_inference
return self.inference(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 541, in inference
gpt_codes = self.gpt.generate(
File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 590, in generate
gen = self.gpt_inference.generate(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt_inference.py", line 94, in forward
emb = emb + self.pos_embedding.get_fixed_embedding(
File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 40, in get_fixed_embedding
return self.emb(torch.tensor([ind], device=dev)).unsqueeze(0)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/usr/local/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
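The traceback ends in an nn.Embedding lookup on a position index (get_fixed_embedding), which suggests the generated position falls outside the positional-embedding table. A self-contained sketch of that failure mode, with made-up table sizes purely for illustration:

import torch
import torch.nn as nn

emb = nn.Embedding(608, 1024)   # hypothetical positional-embedding table
emb(torch.tensor([607]))        # last valid position: works
try:
    emb(torch.tensor([608]))    # one past the end of the table
except IndexError as err:
    print(err)                  # "index out of range in self"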
To Reproduce
Code:
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text=full_text,
    speaker_wav=speaker_wav_path,
    language=target_language,
    file_path=output_audio_path,
    speed=speed_tts,
    split_sentences=True
)
Expected behavior
No response
Logs
Environment
Additional context
No response