Commit 4261b84

Release Amphion v0.2 technical report (#392)
1 parent fc1bf88 commit 4261b84

3 files changed, +23 -2 lines changed

README.md

+11 -1
@@ -34,6 +34,7 @@
 In addition to the specific generation tasks, Amphion includes several **vocoders** and **evaluation metrics**. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent metrics in generation tasks. Moreover, Amphion is dedicated to advancing audio generation in real-world applications, such as building **large-scale datasets** for speech synthesis.
 
 ## 🚀 News
+- **2025/01/30**: We release [Amphion v0.2 Technical Report](https://arxiv.org/abs/2501.15442), which provides a comprehensive overview of the Amphion updates in 2024. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2501.15442)
 - **2025/01/23**: [MaskGCT](https://arxiv.org/abs/2409.00750) and [Vevo](https://openreview.net/pdf?id=anQDiQZhDP) got accepted by ICLR 2025! 🎉
 - **2024/12/22**: We release the reproduction of **Vevo**, a zero-shot voice imitation framework with controllable timbre and style. Vevo can be applied into a series of speech generation tasks, including VC, TTS, AC, and more. The released pre-trained models are trained on [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset) dataset and achieve SOTA zero-shot VC performance. [![arXiv](https://img.shields.io/badge/OpenReview-Paper-COLOR.svg)](https://openreview.net/pdf?id=anQDiQZhDP) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-model-yellow)](https://huggingface.co/amphion/Vevo) [![WebPage](https://img.shields.io/badge/WebPage-Demo-red)](https://versavoice.github.io/) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](models/vc/vevo/README.md)
 - **2024/10/19**: We release **MaskGCT**, a fully non-autoregressive TTS model that eliminates the need for explicit alignment information between text and speech supervision. MaskGCT is trained on [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset) dataset and achieves SOTA zero-shot TTS performance. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2409.00750) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-model-yellow)](https://huggingface.co/amphion/maskgct) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-demo-pink)](https://huggingface.co/spaces/amphion/maskgct) [![ModelScope](https://img.shields.io/badge/ModelScope-space-purple)](https://modelscope.cn/studios/amphion/maskgct) [![ModelScope](https://img.shields.io/badge/ModelScope-model-cyan)](https://modelscope.cn/models/amphion/MaskGCT) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](models/tts/maskgct/README.md)
@@ -182,7 +183,16 @@ We appreciate all contributions to improve Amphion. Please refer to [CONTRIBUTIN
 Amphion is under the [MIT License](LICENSE). It is free for both research and commercial use cases.
 
 ## 📚 Citations
-
+Amphion v0.2:
+```bibtex
+@article{amphion_v0.2,
+title = {Overview of the Amphion Toolkit (v0.2)},
+author = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
+year = {2025},
+journal = {arXiv preprint arXiv:2501.15442},
+}
+```
+Amphion v0.1:
 ```bibtex
 @inproceedings{amphion,
 author={Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Jiaqi Li and Haorui He and Chaoren Wang and Ting Song and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},

models/tts/maskgct/README.md

+6 -1
@@ -202,7 +202,12 @@ If you use MaskGCT in your research, please cite the following paper:
 publisher = {OpenReview.net},
 year = {2025}
 }
-
+@article{amphion_v0.2,
+title = {Overview of the Amphion Toolkit (v0.2)},
+author = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
+year = {2025},
+journal = {arXiv preprint arXiv:2501.15442},
+}
 @inproceedings{amphion,
 author={Zhang, Xueyao and Xue, Liumeng and Gu, Yicheng and Wang, Yuancheng and Li, Jiaqi and He, Haorui and Wang, Chaoren and Song, Ting and Chen, Xi and Fang, Zihao and Chen, Haopeng and Zhang, Junan and Tang, Tze Ying and Zou, Lexiao and Wang, Mingxuan and Han, Jun and Chen, Kai and Li, Haizhou and Wu, Zhizheng},
 title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},

models/vc/vevo/README.md

+6
@@ -92,6 +92,12 @@ If you use Vevo in your research, please cite the following papers:
 publisher = {OpenReview.net},
 year = {2025}
 }
+@article{amphion_v0.2,
+title = {Overview of the Amphion Toolkit (v0.2)},
+author = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
+year = {2025},
+journal = {arXiv preprint arXiv:2501.15442},
+}
 
 @inproceedings{amphion,
 author={Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Jiaqi Li and Haorui He and Chaoren Wang and Ting Song and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
