Jiaqi Li's Homepage

jiaqili3[at]link.cuhk.edu.cn

I am a third-year Ph.D. student in the School of Data Science (SDS) at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Professor Zhizheng Wu. Before that, I received my B.S. degree from CUHK-Shenzhen.

My research interests include speech language models, text-to-speech synthesis, and neural audio coding. I am one of the main contributors to, and leaders of, the open-source Amphion toolkit. My recent work includes DualCodec and FlexiCodec, two neural audio codec systems for low-frame-rate speech generation.

news

Apr 26, 2026 I presented our FlexiCodec paper at ICLR 2026 in Brazil.
May 17, 2025 Our DualCodec paper was accepted to Interspeech 2025!
Feb 01, 2025 We released the Amphion v0.2 technical report, summarizing our development of Amphion in 2024.
Dec 03, 2024 I presented our new paper, "Investigating neural audio codecs for speech language model-based speech generation", at SLT 2024.
Aug 25, 2024 🎉 Our papers on Amphion and Emilia were accepted to IEEE SLT 2024!
Jul 28, 2024 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, which contains 101k hours of speech in six languages and features diverse speech with varied speaking styles.
Apr 19, 2024 I presented our paper, "An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification", at ICASSP 2024 in Korea!
Nov 26, 2023 🔥 We released Amphion v0.1, an open-source toolkit for audio, music, and speech generation.

selected publications

  1. ICLR 2026
    FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
    Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, Xiaofei Wang, Heng Lu, Manthan Thakker, Jinyu Li, Sheng Zhao, and Zhizheng Wu
    In International Conference on Learning Representations (ICLR), 2026
    TL;DR: We develop a dynamic neural audio codec with controllable low frame rates for efficient speech generation.
  2. Interspeech 2025
    DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
    Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, and Zhizheng Wu
    In Proceedings of Interspeech 2025, 2025
    TL;DR: We propose a low-frame-rate neural audio codec that strengthens semantic information for speech generation.
  3. Tech Report
    Overview of the Amphion Toolkit (v0.2)
    Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, Li Wang, Huan Liao, Junyi Ao, Zeyu Xie, Yiqiao Huang, and others
    arXiv preprint arXiv:2501.15442, 2025
    TL;DR: This is the technical report for the second version of the Amphion toolkit.
  4. SLT 2024
    Investigating neural audio codecs for speech language model-based speech generation
    Jiaqi Li, Dongmei Wang, Xiaofei Wang, Yao Qian, Long Zhou, Shujie Liu, Midia Yousefi, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, and others
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
  5. SLT 2024
    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
    Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
    TL;DR: We collect a 100k-hour in-the-wild speech dataset for speech generation.
  6. SLT 2024
    Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit
    Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Jiaqi Li, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
    TL;DR: We develop a unified toolkit for audio, music, and speech generation.
  7. ICASSP 2024
    An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification
    Jiaqi Li, Li Wang, Liumeng Xue, Lei Wang, and Zhizheng Wu
    In 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  8. ICASSP 2024
    AdvSV: An over-the-air adversarial attack dataset for speaker verification
    Li Wang, Jiaqi Li, Yuhao Luo, Jiahao Zheng, Lei Wang, Hao Li, Ke Xu, Chengfang Fang, Jie Shi, and Zhizheng Wu
    In 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

internships

Microsoft Research
Research Intern Redmond, Washington (Remote) 2025.04 - 2025.08
Dynamic Frame Rate Neural Audio Coding
Baidu Inc.
Research Intern Shenzhen, China 2024.06 - 2025.01
Efficient, High-Quality Neural Audio Codec for Zero-Shot TTS
Neural Audio Codec Speech Synthesis
Microsoft Research
Research Intern Redmond, Washington (Remote) 2023.10 - 2024.04
Zero-Shot TTS with Neural Audio Codecs
Neural Audio Codec Speech Synthesis