[Help]: How to Obtain Emilia-YODAS 114k Hours Raw URLs for Processing with emilia-pipe #402

zhuangweiji · 2025-02-28T02:59:51Z

Problem Overview

I am working with the Emilia dataset and have been using the Emilia 101k hours dataset. Recently, I noticed that the dataset has been expanded with an additional 114k hours of data under the Emilia-YODAS section. I would like to obtain the raw URLs for the Emilia-YODAS data so that I can process it using emilia-pipe. However, I haven't been able to find a direct way to retrieve these URLs.

Steps Taken

Checked the Hugging Face dataset page for any listed download URLs.

Used huggingface_hub.snapshot_download(repo_id="amphion/Emilia-Dataset", allow_patterns=["Emilia-YODAS/*"]) to fetch the new data, but this does not provide direct access to raw URLs.

Explored the dataset structure and metadata to find potential references to source URLs but couldn't locate them.

Searched previous issues and discussions for information related to extracting original dataset URLs but did not find a solution.

Expected Outcome

I would like to:

Obtain the raw URLs of the Emilia-YODAS 114k hours dataset.

Use these URLs to feed data into emilia-pipe for further processing.

Understand if there is a recommended way to extract or generate these URLs from the Hugging Face dataset.

Screenshots

N/A

Environment Information

N/A

Additional Context

If there is an existing way to extract the URLs or if they are stored in a metadata file, please let me know. Any guidance on accessing this data efficiently would be greatly appreciated!

HarryHe11 · 2025-03-08T07:11:07Z

Hi, thank you so much for your attention to our work! Please refer to the original YODAS dataset for the raw data and meta information: https://huggingface.co/datasets/espnet/yodas2
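For readers who want plain download URLs from that repository, the following is a minimal sketch, not a confirmed workflow from the Emilia authors. It assumes the `huggingface_hub` package is installed and relies on the standard Hugging Face `resolve` URL layout for dataset repos. The helper names (`list_repo_paths`, `select_paths`, `resolve_url`) and the `data/en*` pattern are hypothetical and chosen only for illustration; check the actual file layout of `espnet/yodas2` before filtering, and note that whether emilia-pipe accepts these files directly is not stated in this thread.

```python
from fnmatch import fnmatchcase
from urllib.parse import quote

YODAS2_REPO = "espnet/yodas2"  # dataset repo named in the reply above


def list_repo_paths(repo_id: str = YODAS2_REPO) -> list[str]:
    """List every file path in a Hugging Face dataset repo (needs network access)."""
    # Imported lazily so the pure helpers below work without huggingface_hub.
    from huggingface_hub import list_repo_files

    # repo_type="dataset" is required; the default repo type is "model".
    return list_repo_files(repo_id, repo_type="dataset")


def select_paths(paths: list[str], pattern: str) -> list[str]:
    """Keep only the paths matching a shell-style pattern (case-sensitive)."""
    return [p for p in paths if fnmatchcase(p, pattern)]


def resolve_url(repo_id: str, path: str, revision: str = "main") -> str:
    """Build the direct download URL for one file in a dataset repo."""
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return (
        f"https://huggingface.co/datasets/{repo_id}"
        f"/resolve/{revision}/{quote(path)}"
    )
```

A possible usage, assuming the repo stores per-language shards under a `data/` directory: `urls = [resolve_url(YODAS2_REPO, p) for p in select_paths(list_repo_paths(), "data/en*")]`. The resulting list can be written to a text file and passed to any downloader before running emilia-pipe on the fetched audio.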
