Jiaqi Li's Homepage

jiaqili3[at]link.cuhk.edu.cn

I am a first-year Ph.D. student at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), SDS, supervised by Professor Zhizheng Wu. Before that, I received my B.S. degree from CUHK-Shenzhen.

My research interests include text-to-speech synthesis and neural audio coding. I am one of the main contributors and leaders of the open-source Amphion toolkit. Currently, I am working on open-sourcing a new neural audio codec named DualCodec, along with TTS systems built on DualCodec.

news

Feb 01, 2025 We released the Amphion v0.2 technical report, summarizing our development of Amphion in 2024.
Dec 03, 2024 I presented our new paper, Investigating neural audio codecs for speech language model-based speech generation, at SLT 2024.
Aug 25, 2024 🎉 Our papers, Amphion and Emilia, got accepted by IEEE SLT 2024!
Jul 28, 2024 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, featuring 101k hours of speech in six languages with varied speaking styles.
Apr 19, 2024 I presented our paper, An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification, at ICASSP 2024 in Korea!
Nov 26, 2023 🔥 We released Amphion v0.1, an open-source toolkit for audio, music, and speech generation.

selected publications

  1. Tech Report
    Overview of the Amphion Toolkit (v0.2)
    Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, Li Wang, Huan Liao, Junyi Ao, Zeyu Xie, Yiqiao Huang, and others
    arXiv preprint arXiv:2501.15442, 2025
    TL;DR: This is the technical report for the second version of the Amphion toolkit.
  2. SLT 2024
    Investigating neural audio codecs for speech language model-based speech generation
    Jiaqi Li, Dongmei Wang, Xiaofei Wang, Yao Qian, Long Zhou, Shujie Liu, Midia Yousefi, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, and others
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
  3. SLT 2024
    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
    Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, and Zhizheng Wu
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
    TL;DR: We collect a 100k-hour in-the-wild speech dataset for speech generation.
  4. SLT 2024
    Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit
    Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Jiaqi Li, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, and Zhizheng Wu
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
    TL;DR: We develop a unified toolkit for audio, music, and speech generation.
  5. ICASSP 2024
    An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification
    Jiaqi Li, Li Wang, Liumeng Xue, Lei Wang, and Zhizheng Wu
    In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  6. ICASSP 2024
    Advsv: An over-the-air adversarial attack dataset for speaker verification
    Li Wang, Jiaqi Li, Yuhao Luo, Jiahao Zheng, Lei Wang, Hao Li, Ke Xu, Chengfang Fang, Jie Shi, and Zhizheng Wu
    In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

internships

Baidu Inc., Shenzhen
Research Intern · Shenzhen, China · 2024.06 - 2025.01
Efficient, High-Quality Neural Audio Codec for Zero-Shot TTS
Tags: Neural Audio Codec · Speech Synthesis
Microsoft Research
Research Intern · Redmond, Washington (Remote) · 2023.10 - 2024.04
Zero-Shot TTS with Neural Audio Codecs
Tags: Neural Audio Codec · Speech Synthesis