Publications

ISCA

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

The 52nd International Symposium on Computer Architecture (ISCA'25), 2025

2025

arXiv

DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Zhiwen Mo, Guoyu Li, Hao (Mark) Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan

arXiv preprint arXiv:2604.04750, 2026

2026

Internal

A GPU Kernel Research Project

Zhiwen Mo, et al.

Research in progress, 2026

2026

ASPLOS

Democratizing Agentic AI with Fast Test-Time Scaling on the Edge

Hao (Mark) Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan

The 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'26), 2026

2026

ICLR Oral

TileLang: A Composable Tiled Programming Model for AI Systems

Lei Wang, Yu Cheng, Yining Shi, Zhiwen Mo, Zhengju Tang, Wenhao Xie, Tong Wu, Lingxiao Ma, Yuqing Xia, Jilong Xue, Fan Yang, Zhi Yang

The Fourteenth International Conference on Learning Representations (ICLR'26 Oral), 2026

2026

MLSys

Flash3DGS: Algorithm and System Co-Optimization for Fast 3D Gaussian Splatting on GPUs

Lingjun Gao, Zhican Wang, Zhiwen Mo, Hongxiang Fan

The Ninth Annual Conference on Machine Learning and Systems (MLSys'26), 2026

2026

ISCA

Coset Ensemble Decoder for Quantum Error Correction with Algorithm-Hardware Co-Design

Shuang Liang, Jubo Xu, Giulio Bassanino, Qianzhou Wang, Yidong Zhou, Yuncheng Lu, Zhiwen Mo, Paul Kelly, Bo Yuan, Wayne Luk, Hongxiang Fan

The 53rd International Symposium on Computer Architecture (ISCA'26), 2026

2026

ISCA

PLENA: Breaking the Memory Walls for Agentic LLM Inference

Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T.H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Chengyang Ai, Timi Adeniran, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao

The 53rd International Symposium on Computer Architecture (ISCA'26), 2026

2026

arXiv

Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs

Hao (Mark) Chen, Zhiwen Mo, Royson Lee, Qianzhou Wang, Da Li, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan

arXiv preprint arXiv:2602.00879, 2026

2026

OSDI

PipeThreader: Software-Defined Pipelining for Efficient DNN Execution

Yu Cheng, Lei Wang, Yining Shi, Yuqing Xia, Lingxiao Ma, Jilong Xue, Yang Wang, Zhiwen Mo, Feiyang Chen, Fan Yang, Mao Yang, Zhi Yang

The 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI'25), 2025

2025

NeurIPS

Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling

Hao (Mark) Chen, Guanxi Lu, Yasuyuki Okoshi, Zhiwen Mo, Masato Motomura, Hongxiang Fan

The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS'25), 2025

2025

DAC

Enabling Multiple Tensor-wise Operator Fusion for Transformer Models on Spatial Accelerators

Lei Xu, Zhiwen Mo, Qin Wang, Jianfei Jiang, Naifeng Jing

The 61st ACM/IEEE Design Automation Conference (DAC'24), 2024

2024