Replies: 2 comments 2 replies
-
Just got merged! :) https://github.com/lucidrains/audiolm-pytorch/blob/main/audiolm_pytorch_demo.ipynb
-
Hi. You do seem to have collected some valuable information, though: 10 days on an NVIDIA RTX A6000 with a 23 GB dataset is a concrete number for #1 and #2. (So it is not like Stable Diffusion, where I throw in 10 images, train for half an hour, and I'm done!)

But training without annotations doesn't make sense to me. If I want to use the text conditioning, I need annotated files, so that in the end I can use more than just one word for my 23 GB of training data. The model needs to know which words correspond to which sound files, or it can only produce unguided recompilations.

All I want to say is that the examples are awesome but still hard to follow if you are new. Maybe @lucidrains reads this discussion and adds these things!
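For reference, here is a minimal sketch of what text-conditioned training could look like. It is based on the text-conditioning example in the audiolm-pytorch README at the time of writing; the checkpoint paths, the caption string, and the `CaptionedAudioDataset` class are placeholders, and parameter names like `has_condition` may differ in the current version, so treat it as illustrative rather than definitive.

```python
# a minimal sketch, assuming the text-conditioning interface from the
# audiolm-pytorch README: the dataset yields (caption, audio) pairs so the
# model can learn which words correspond to which sounds
import torch
from torch.utils.data import Dataset

from audiolm_pytorch import HubertWithKmeans, SemanticTransformer, SemanticTransformerTrainer

# pretrained HuBERT checkpoints (paths are placeholders)
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert/hubert_base_ls960.pt',
    kmeans_path = './hubert/hubert_base_ls960_L9_km500.bin'
)

semantic_transformer = SemanticTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    dim = 1024,
    depth = 6,
    has_condition = True    # assumption: enables text conditioning
)

# toy annotated dataset: each item pairs a text caption with its audio tensor.
# for real data you would load your 23 GB of audio and the matching captions here.
class CaptionedAudioDataset(Dataset):
    def __init__(self, length = 100, audio_length = 320 * 32):
        self.length = length
        self.audio_length = audio_length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        audio = torch.randn(self.audio_length)
        caption = 'a dog barking in the distance'   # placeholder annotation
        return caption, audio

trainer = SemanticTransformerTrainer(
    transformer = semantic_transformer,
    wav2vec = wav2vec,
    dataset = CaptionedAudioDataset(),
    batch_size = 4,
    grad_accum_every = 8,
    data_max_length = 320 * 32,
    num_train_steps = 1
)

trainer.train()
```

The key point is that the annotation lives in the dataset: one caption per audio clip. If the README's interface is still the same, the coarse and fine transformer trainers follow the same pattern.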
-
Hello,
is it possible to add an example of how to use this to the readme? A line like: