Jiaqi Li's Homepage

jiaqili3[at]link.cuhk.edu.cn
I am a first-year Ph.D. student in the School of Data Science (SDS) at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Professor Zhizheng Wu. Before that, I received my B.S. degree from CUHK-Shenzhen.
My research interests include text-to-speech synthesis and neural audio coding. I am one of the main contributors to and leaders of the open-source Amphion toolkit. Currently, I am working on open-sourcing DualCodec, a new neural audio codec, along with TTS systems built on it.
news
Feb 01, 2025 | We released the Amphion v0.2 technical report, summarizing our development of Amphion in 2024.
Dec 03, 2024 | I presented our new paper, Investigating neural audio codecs for speech language model-based speech generation, at SLT 2024.
Aug 25, 2024 | 🎉 Our papers, Amphion and Emilia, were accepted at IEEE SLT 2024!
Jul 28, 2024 | 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, which contains 101k hours of speech in six languages with diverse speaking styles.
Apr 19, 2024 | I presented our paper, An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification, at ICASSP 2024 in Korea!
Nov 26, 2023 | 🔥 We released Amphion v0.1.
selected publications
- Tech Report | Overview of the Amphion Toolkit (v0.2). arXiv preprint arXiv:2501.15442, 2025. TL;DR: This is the technical report for the second version of the Amphion toolkit.
- SLT 2024 | Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation. In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024. TL;DR: We collect a 100k-hour in-the-wild speech dataset for speech generation.
- SLT 2024 | Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit. In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024. TL;DR: We develop a unified toolkit for audio, music, and speech generation.
internships
Baidu Inc. | Research Intern · Shenzhen, China · 2024.06 - 2025.01 | Efficient, High-Quality Neural Audio Codec for Zero-Shot TTS (Neural Audio Codec, Speech Synthesis)
Microsoft Research | Research Intern · Redmond, Washington (Remote) · 2023.10 - 2024.04 | Zero-Shot TTS with Neural Audio Codecs (Neural Audio Codec, Speech Synthesis)