Using the model to generate Carnatic music (Indian classical music) #93

Open
krylor opened this issue Mar 10, 2025 · 7 comments

Comments

@krylor

krylor commented Mar 10, 2025

Namastey,

I'm very impressed with the capabilities of the YuE model. I'm writing to request a feature: the ability to generate Carnatic music.

Carnatic music is a classical music tradition from South India with a rich and complex structure, including:

Ragas: Melodic frameworks with specific rules and scales.
Talams: Rhythmic cycles with intricate patterns.
Improvisation: Extensive scope for improvisation within the established frameworks.

It would be incredibly valuable to see if YuE could be adapted to generate music in this style.

Specifically, I'm interested in knowing:

Whether the model's architecture could be extended to understand and generate the nuances of ragas and talams.

https://www.researchgate.net/publication/313237218_Modeling_and_Analysis_of_Indian_Carnatic_Music_Using_Category_Theory

Whether you have any plans to incorporate datasets of Carnatic music for training.
Whether there are any potential challenges in achieving Carnatic music generation with YuE.

Thank you for your time and consideration.

@a43992899
Collaborator

Hi, do you have any suggestions for data?

Yes, we are definitely interested in extending YuE to a more diverse set of human audio art.

@krylor
Author

krylor commented Mar 11, 2025

Hi @a43992899

When considering extending the YuE model to Indian music, the significant difference in rhythmic complexity becomes apparent. Indian taals often involve cycles of 8, 12, or more beats, which may pose a challenge for the existing model.

Would fine-tuning be enough to address this, or would the model's architecture need to be modified to effectively learn and generate these longer and more intricate rhythmic patterns?
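
To make the rhythmic-cycle point concrete, here is a minimal sketch of how a beat index maps onto a position within a cycle. The `Tala` class, its field names, and the 4 + 2 + 2 grouping used for the 8-beat example are illustrative assumptions, not part of YuE or of any dataset format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tala:
    """Hypothetical description of a rhythmic cycle (not a YuE structure)."""
    name: str
    sections: Tuple[int, ...]  # beats per section, e.g. (4, 2, 2)

    @property
    def cycle_length(self) -> int:
        # Total beats in one full cycle.
        return sum(self.sections)

    def position(self, beat_index: int) -> Tuple[int, int, int]:
        """Map a 0-based global beat index to (cycle, section, beat_in_section)."""
        cycle, offset = divmod(beat_index, self.cycle_length)
        for section, size in enumerate(self.sections):
            if offset < size:
                return cycle, section, offset
            offset -= size
        raise AssertionError("unreachable: offset always falls in a section")


def cycle_starts(tala: Tala, total_beats: int) -> List[int]:
    """Return the global beat indices at which a new cycle begins."""
    return list(range(0, total_beats, tala.cycle_length))


# Assumed example: an 8-beat cycle grouped as 4 + 2 + 2.
eight_beat = Tala(name="8-beat cycle", sections=(4, 2, 2))
```

Positions like these could, in principle, be used to check whether generated audio keeps a consistent cycle; whether the model needs such annotations at all is the open question above.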

Datasets are available for this, for example Saraga (https://github.com/MTG/saraga/tree/master/dataset), along with a few others.

Thank you for your guidance.

@a43992899
Collaborator

Hi, I think 50 hours of data is worth a try. I think fine-tuning is enough; no need for architecture modification.

But I did not find any lyrics in the dataset you recommended.
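
As a rough way to check a local collection against the 50-hour figure, the sketch below sums audio durations with Python's standard `wave` module. It assumes the audio has already been converted to `.wav` files under a single directory; the function names and the directory layout are hypothetical and say nothing about how the dataset above is actually organized.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Duration of one WAV file, from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_hours(root: Path) -> float:
    """Sum the durations of all .wav files under `root`, in hours.

    Assumption: every audio file is an uncompressed WAV readable by `wave`.
    """
    seconds = sum(wav_duration_seconds(p) for p in sorted(root.rglob("*.wav")))
    return seconds / 3600.0
```

Calling `total_hours(Path("path/to/audio"))` on a placeholder directory returns the total in hours, which can then be compared with the 50-hour figure.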

@krylor
Author

krylor commented Mar 12, 2025

Thanks for your quick response!

Regarding the lyrics, you're right. The initial dataset focuses primarily on the instrumental and melodic aspects of the music. We're approaching this iteratively, starting with the music structure and then layering in the vocal components, which will include lyrics. This is especially important for Indian music, where complex beat cycles and rhythmic variations play a crucial role.

We're really excited to see how your model handles this. Do you have an estimated timeline for when the fine-tuning code or a more complete implementation might be available? We're eager to start experimenting and contributing back to the project.

@a43992899
Collaborator

a43992899 commented Mar 12, 2025

Give me one or two weeks. I just finished the YuE paper (https://arxiv.org/abs/2503.08638). I will start working on Hugging Face-based fine-tuning. We support Megatron-LM fine-tuning internally, but it is super heavy.

And BTW, are you guys interested in contributing to the first open-access, royalty-free lyrics2song dataset? We are working on the fine-tuning recipe, which includes a data recipe.

@krylor
Author

krylor commented Mar 13, 2025

Yes, we are interested in doing that. I think the Saraga dataset mightn't be enough.
We will explore other royalty-free sources to complement the main dataset and also try to generate phonetic lyrics in English.
One of the main challenges is that the lyrics aren't in a single language.

@a43992899
Collaborator

Feel free to join our channel on royalty-free data collection: https://discord.gg/vWjHTfvP
