Hi! I am a fourth-year undergraduate student at Sun Yat-sen University, majoring in Artificial Intelligence. My research interests include, but are not limited to, large language models, AI security, and learning theory.

I am actively seeking a Fall 2025 Ph.D. position in the USA. Please feel free to email me if you are interested in working with me.

📝 Publications / Selected Preprints

\* denotes equal contribution, with authors listed in alphabetical order

  • Timothy Chu*, Zhao Song*, and Chiwun Yang*. How to protect copyright data in optimization of large language models? AAAI 2024 Poster, arXiv preprint arXiv:2308.12247
  • Yingyu Liang*, Zhenmei Shi*, Zhao Song*, and Chiwun Yang*. Towards infinite-long prefix in transformer. ICLR 2025 SCOPE Workshop Oral (Top 9% papers), arXiv preprint arXiv:2406.14036
  • Zhao Song*, Jing Xiong*, and Chiwun Yang*. How sparse attention approximates exact attention? Your attention is naturally $n^C$-sparse. ICLR 2025 SLLM Workshop Poster, arXiv preprint arXiv:2404.02690
  • Yang Cao*, Zhao Song*, and Chiwun Yang*. Video latent flow matching: Optimal polynomial projections for video interpolation and extrapolation. ICLR 2025 DeLTa Workshop Poster, arXiv preprint arXiv:2502.00500
  • Majid Daliri*, Zhao Song*, and Chiwun Yang*. Unlocking the theory behind scaling 1-bit neural networks. CPAL 2025 Poster, arXiv preprint arXiv:2411.01663
  • Yekun Ke*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, and Chiwun Yang*. Curse of attention: A kernel-based perspective for why transformers fail to generalize on time series forecasting and beyond. CPAL 2025 Poster, arXiv preprint arXiv:2412.06061
  • Yichuan Deng*, Zhao Song*, Shenghao Xie*, and Chiwun Yang*. Unmasking transformers: A theoretical approach to data recovery via attention weights. arXiv preprint arXiv:2310.12462, 2023
  • Jing Xiong, Jianghan Shen, Chuanyang Zheng, Zhongwei Wan, Chenyang Zhao, Chiwun Yang, Fanghua Ye, Hongxia Yang, Lingpeng Kong, Ngai Wong. ParallelComp: Parallel long-context compressor for length extrapolation. arXiv preprint arXiv:2502.14317, 2025
  • Jiangxuan Long*, Zhao Song*, and Chiwun Yang*. Theoretical foundation of flow-based time series generation: provable approximation, generalization, and efficiency. arXiv preprint arXiv:2503.14076, 2025

📖 Education

  • 2021.09 – 2025.06, School of Artificial Intelligence, Sun Yat-sen University

💻 Research Experience

📊 Service

  • Reviewer: ICLR 2025, ICLR 2025 Workshop BuildingTrust, COLM 2025