Jiaqi Li's Homepage
jiaqili3[at]link.cuhk.edu.cn
I am a third-year Ph.D. student at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), SDS, supervised by Professor Zhizheng Wu. Before that, I received B.S. degree at CUHK-Shenzhen.
My research interest includes speech language models, text-to-speech synthesis, and neural audio coding. I am one of the main contributors and leaders of the open-source Amphion toolkit. My recent work includes DualCodec and FlexiCodec, two neural audio codec systems for low-frame-rate speech generation.
news
| Apr 26, 2026 | I presented our FlexiCodec paper at ICLR 2026 in Brazil. |
|---|---|
| May 17, 2025 | Our DualCodec paper was accepted to InterSpeech 2025! |
| Feb 01, 2025 | We released the Amphion v0.2 technical report, summarizing our development of Amphion in 2024. |
| Dec 03, 2024 | I presented our new paper, Investigating neural audio codecs for speech language model-based speech generation in SLT 2024. |
| Aug 25, 2024 | 馃帀 Our papers, Amphion and Emila, got accepted by IEEE SLT 2024! |
| Jul 28, 2024 | 馃敟 We released Emila: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, with 101k hours of speech in six languages and features diverse speech with varied speaking styles. |
| Apr 19, 2024 | I presented our paper, An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification in ICASSP 2024 in Korea! |
| Nov 26, 2023 | 馃敟 We released Amphion v0.1 |
selected publications
- Tech ReportOverview of the Amphion Toolkit (v0. 2)arXiv preprint arXiv:2501.15442, 2025TL;DR: This is the technical report for the second version of the Amphion toolkit.
- SLT 2024Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech GenerationIn 2024 IEEE Spoken Language Technology Workshop (SLT), 2024TL;DR: We collect a 100k hours in-the-wild speech dataset for speech generation.
- SLT 2024Amphion: an Open-Source Audio, Music, and Speech Generation ToolkitIn 2024 IEEE Spoken Language Technology Workshop (SLT), 2024TL;DR: We develop a unified toolkit for audio, music, and speech generation.
internships
| Microsoft Research | Research Intern 路 Redmond, Washington (Remote) 路 2025.04 - 2025.08 Dynamic Frame Rate Neural Audio Coding Dynamic Frame Rate Neural Audio Coding |
|---|---|
| Baidu Inc., Shenzhen | Research Intern 路 Shenzhen, China 路 2024.6 - 2025.01 Efficient, High-Quality Neural Audio Codec for Zero-Shot TTS Neural Audio Codec Speech Synthesis |
| Microsoft Research | Research Intern 路 Redmond, Washington (Remote) 路 2023.10 - 2024.04 Zero-Shot TTS with Neural Audio Codecs Neural Audio Codec Speech Synthesis |