Problem Overview
I am working with the Emilia dataset and have been using the 101k-hour Emilia release. I recently noticed that the dataset has been expanded with an additional 114k hours of data in the Emilia-YODAS section. I would like to obtain the raw URLs for the Emilia-YODAS data so that I can process it with emilia-pipe, but I have not found a direct way to retrieve them.
Steps Taken
Checked the Hugging Face dataset page for any listed download URLs.
Used huggingface_hub.snapshot_download(repo_id="amphion/Emilia-Dataset", allow_patterns=["Emilia-YODAS/*"]) to fetch the new data, but this does not give direct access to the raw URLs.
Explored the dataset structure and metadata for references to source URLs, but could not locate any.
Searched previous issues and discussions for information on extracting the original dataset URLs, but did not find a solution.
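For reference, the download step above can be sketched as follows. This is a minimal sketch, not a confirmed recipe: the function name download_emilia_yodas and the local_dir default are hypothetical, and repo_type="dataset" is added on the assumption that the repository is hosted as a dataset repo (snapshot_download otherwise defaults to model repos). Depending on the repository's access settings, a prior login or access token may also be needed.

```python
from pathlib import Path


def download_emilia_yodas(local_dir: str = "emilia_yodas") -> Path:
    """Download only the Emilia-YODAS files from the Hugging Face repo.

    The function name and the local_dir default are illustrative only.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="amphion/Emilia-Dataset",
        repo_type="dataset",                # assumption: hosted as a dataset repo
        allow_patterns=["Emilia-YODAS/*"],  # same pattern as in the step above
        local_dir=local_dir,
    )
    return Path(path)
```

Calling download_emilia_yodas() returns the local folder containing the downloaded files; as noted above, those files do not include the raw source URLs.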
Expected Outcome
I would like to:
Obtain the raw URLs of the Emilia-YODAS 114k hours dataset.
Use these URLs to feed data into emilia-pipe for further processing.
Understand if there is a recommended way to extract or generate these URLs from the Hugging Face dataset.
Screenshots
N/A
Environment Information
N/A
Additional Context
If there is an existing way to extract the URLs or if they are stored in a metadata file, please let me know. Any guidance on accessing this data efficiently would be greatly appreciated!
Hi, thank you so much for your attention to our work! Please refer to the original YODAS dataset for the raw data and meta information: https://huggingface.co/datasets/espnet/yodas2
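One possible way to follow this pointer is to list the files in the espnet/yodas2 repository and build a direct download URL for each one. The sketch below rests on some assumptions: the helper names dataset_file_urls and list_yodas2_urls are hypothetical, the prefix filter is optional because the repository layout is not described in this thread, and the URL pattern is the standard Hugging Face "resolve" form. It yields URLs of files inside that repository only; whether those are the inputs emilia-pipe expects is not confirmed here.

```python
from typing import Iterable, List

HF_BASE = "https://huggingface.co"


def dataset_file_urls(repo_id: str, filenames: Iterable[str],
                      revision: str = "main") -> List[str]:
    """Build direct download URLs for files in a Hugging Face dataset repo."""
    return [
        f"{HF_BASE}/datasets/{repo_id}/resolve/{revision}/{name}"
        for name in filenames
    ]


def list_yodas2_urls(prefix: str = "") -> List[str]:
    """List file URLs in espnet/yodas2, optionally filtered by a path prefix.

    Requires network access and the huggingface_hub package.
    """
    # Imported lazily so the URL helper above works without the dependency.
    from huggingface_hub import list_repo_files

    files = list_repo_files("espnet/yodas2", repo_type="dataset")
    selected = [f for f in files if f.startswith(prefix)]
    return dataset_file_urls("espnet/yodas2", selected)
```

For example, list_yodas2_urls() returns one URL per file in the repository, which can then be inspected to find the metadata files that carry the source information.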