Publications

ISCA
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

The 52nd International Symposium on Computer Architecture (ISCA'25), 2025
2025
arXiv
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Zhiwen Mo, Guoyu Li, Hao (Mark) Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan

arXiv preprint arXiv:2604.04750, 2026
2026
Internal
A GPU Kernel Research Project

Zhiwen Mo, et al.

Research in progress, 2026
2026
ASPLOS
Democratizing Agentic AI with Fast Test-Time Scaling on the Edge

Hao (Mark) Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan

The 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'26), 2026
2026
ICLR Oral
TileLang: A Composable Tiled Programming Model for AI Systems

Lei Wang, Yu Cheng, Yining Shi, Zhiwen Mo, Zhengju Tang, Wenhao Xie, Tong Wu, Lingxiao Ma, Yuqing Xia, Jilong Xue, Fan Yang, Zhi Yang

The Fourteenth International Conference on Learning Representations (ICLR'26 Oral), 2026
2026
MLSys
Flash3DGS: Algorithm and System Co-Optimization for Fast 3D Gaussian Splatting on GPUs

Lingjun Gao, Zhican Wang, Zhiwen Mo, Hongxiang Fan

The Ninth Annual Conference on Machine Learning and Systems (MLSys'26), 2026
2026
ISCA
Coset Ensemble Decoder for Quantum Error Correction with Algorithm-Hardware Co-Design

Shuang Liang, Jubo Xu, Giulio Bassanino, Qianzhou Wang, Yidong Zhou, Yuncheng Lu, Zhiwen Mo, Paul Kelly, Bo Yuan, Wayne Luk, Hongxiang Fan

The 53rd International Symposium on Computer Architecture (ISCA'26), 2026
2026
ISCA
PLENA: Breaking the Memory Walls for Agentic LLM Inference

Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T.H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Chengyang Ai, Timi Adeniran, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao

The 53rd International Symposium on Computer Architecture (ISCA'26), 2026
2026
arXiv
Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs

Hao (Mark) Chen, Zhiwen Mo, Royson Lee, Qianzhou Wang, Da Li, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan

arXiv preprint arXiv:2602.00879, 2026
2026
OSDI
PipeThreader: Software-Defined Pipelining for Efficient DNN Execution

Yu Cheng, Lei Wang, Yining Shi, Yuqing Xia, Lingxiao Ma, Jilong Xue, Yang Wang, Zhiwen Mo, Feiyang Chen, Fan Yang, Mao Yang, Zhi Yang

The 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI'25), 2025
2025
NeurIPS
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling

Hao (Mark) Chen, Guanxi Lu, Yasuyuki Okoshi, Zhiwen Mo, Masato Motomura, Hongxiang Fan

The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS'25), 2025
2025
DAC
Enabling Multiple Tensor-wise Operator Fusion for Transformer Models on Spatial Accelerators

Lei Xu, Zhiwen Mo, Qin Wang, Jianfei Jiang, Naifeng Jing

The 61st ACM/IEEE Design Automation Conference (DAC'24), 2024
2024