[Bug] Index out of range issue with voice cloning #4179

qqwjq1981 · 2025-04-01T01:31:01Z

Describe the bug

I frequently hit an "index out of range" error when cloning a voice from a speaker sample and a text script. It does not appear to depend on the number of tokens: a text with more tokens may succeed while a text with fewer tokens fails.

Successful Text: Nature has a delicate and complex structure that allows thousands of creatures to maintain a delicate balance.
Tokenized length: 60
Tokens: [259, 467, 1375, 18, 2, 1221, 2, 14, 2, 636, 91, 186, 2, 53, 2, 884, 25, 169, 2, 32, 1951, 2766, 861, 2, 73, 2, 14, 84, 69, 32, 2, 40, 206, 43, 2864, 2, 58, 2, 814, 18, 14, 1375, 61, 2, 51, 2, 845, 33, 137, 2, 14, 2, 636, 91, 186, 2, 15, 1821, 3263, 9]
Failed Text: Life is so beautiful
Tokenized length: 13
Tokens: [259, 25, 140, 18, 2, 54, 2, 123, 2, 67, 847, 140, 167]

Code:
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text=full_text,
    speaker_wav=speaker_wav_path,
    language=target_language,
    file_path=output_audio_path,
    speed=speed_tts,
    split_sentences=True,
)

Error message:
ERROR:main:❌ Error during voice cloning:
ERROR:main:Traceback (most recent call last):
  File "/home/user/app/app.py", line 485, in generate_voiceover_clone
    tts.tts_to_file(
  File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 334, in tts_to_file
    wav = self.tts(
  File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 276, in tts
    wav = self.synthesizer.tts(
  File "/usr/local/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 488, in full_inference
    return self.inference(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 541, in inference
    gpt_codes = self.gpt.generate(
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 590, in generate
    gen = self.gpt_inference.generate(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt_inference.py", line 94, in forward
    emb = emb + self.pos_embedding.get_fixed_embedding(
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 40, in get_fixed_embedding
    return self.emb(torch.tensor([ind], device=dev)).unsqueeze(0)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
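
Note on reading the traceback: the exception is raised by the embedding lookup called from get_fixed_embedding, inside the _sample loop, so it occurs while output tokens are being generated rather than while the input text is being encoded. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the position index of the generated sequence grows past the number of rows in the positional embedding table when generation does not stop early enough. That would be consistent with the failure not tracking the input token count. The toy sketch below illustrates only that mechanism in plain Python; FixedPositionTable and generate are hypothetical names, not TTS or PyTorch code.

```python
from typing import Optional


class FixedPositionTable:
    """Stand-in for a positional embedding table with a fixed number of rows."""

    def __init__(self, max_positions: int) -> None:
        # One placeholder value per position; a real table would hold vectors.
        self.rows = [float(i) for i in range(max_positions)]

    def get_fixed_embedding(self, ind: int) -> float:
        # Mimic the failure in the traceback: positions past the table size fail.
        if not 0 <= ind < len(self.rows):
            raise IndexError("index out of range in self")
        return self.rows[ind]


def generate(table: FixedPositionTable, stop_at: Optional[int], hard_limit: int = 1000) -> int:
    """Simulate step-by-step decoding and return the number of tokens emitted.

    `stop_at` is the step at which the simulated model emits its stop token;
    None means it never does.
    """
    for step in range(hard_limit):
        if stop_at is not None and step == stop_at:
            return step
        table.get_fixed_embedding(step)  # raises once step >= table size
    return hard_limit
```

With a table of 8 positions, generate(table, stop_at=5) returns normally, while generate(table, stop_at=None) raises IndexError at step 8, regardless of how long the input text was.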

To Reproduce

Code:
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text=full_text,
    speaker_wav=speaker_wav_path,
    language=target_language,
    file_path=output_audio_path,
    speed=speed_tts,
    split_sentences=True,
)

Expected behavior

No response

Logs

Environment

# Coqui TTS (XTTS v2)
TTS==0.22.0
torch==2.1.0  # Or the version best suited for your GPU/CPU
CPU

Additional context

No response

eginhard · 2025-04-02T15:08:23Z

Can you try our fork (available via pip install coqui-tts)? This repo is not maintained anymore.

The following code runs fine for me there:

from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Life is so beautiful",
    speaker="Uta Obando",
    language="en",
)
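
If switching packages is not immediately possible, one stopgap is to retry the call when it raises IndexError. The traceback passes through _sample, which suggests sampled decoding, so a second attempt may end differently; that is an assumption, and a retry does not address the underlying cause. The sketch below is generic Python: synthesize_with_retry and its synthesize argument are hypothetical names, not part of the TTS API.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def synthesize_with_retry(synthesize: Callable[[], T], max_attempts: int = 3) -> T:
    """Call `synthesize` until it succeeds or `max_attempts` is used up.

    Only IndexError is retried; any other exception propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[IndexError] = None
    for _ in range(max_attempts):
        try:
            return synthesize()
        except IndexError as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"synthesis failed after {max_attempts} attempts") from last_error
```

To use it with the call from the report, wrap that call in a zero-argument function, for example synthesize_with_retry(lambda: tts.tts_to_file(text=full_text, speaker_wav=speaker_wav_path, language=target_language, file_path=output_audio_path)).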

qqwjq1981 added the bug label Apr 1, 2025
