Faculty Directory
Email: liang-xy@sjtu.edu.cn
Research institute: Institute of Scalable Computing
Biography
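Xiaoyao Liang (梁晓峣) is a professor. He received his Ph.D. from Harvard University, is a recipient of the Shanghai Youth May Fourth Medal, and previously worked as a senior GPU chip architect at NVIDIA headquarters in the United States. His research focuses on computer architecture, particularly GPUs and integrated circuits. He has published more than one hundred papers in international conferences and journals, including the four top computer architecture conferences: ISCA, HPCA, MICRO, and ASPLOS. He has been inducted into the MICRO Hall of Fame, has twice been selected for IEEE Micro Top Picks (the annual selection of best computer architecture papers), and won the Best Paper Award at DATE, a top design automation conference, in both 2023 and 2024. His book 《昇腾AI处理器架构与编程》 (Ascend AI Processor Architecture and Programming) is the first reference book on the architecture of the "Ascend" AI chip and received the Tsinghua University Press 2020 bestseller award. His book 《通用图形处理器设计-GPGPU编程模型与架构原理》 (General-Purpose Graphics Processor Design: GPGPU Programming Model and Architecture Principles) is the first professional textbook in China covering GPGPU architecture and hardware/software design. The open-source platform he initiated, "青花瓷" (Qinghuaci, "blue-and-white porcelain"), provides a complete CUDA-compatible modern GPGPU chip design, including the instruction set, microarchitecture, compiler, and hardware description, and offers practical guidance for designing next-generation AI compute chips.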
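Education
(1) 2005-02 to 2009-02, Harvard University, Computer Architecture, Ph.D.
(2) 2003-09 to 2004-12, State University of New York at Stony Brook, Integrated Circuits, M.S.
(3) 1996-09 to 2000-06, Fudan University, Communication Engineering, B.S.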
Work Experience
(1) 2013-01 to present, Professor
(2) 2009-01 to 2012-12, NVIDIA (USA), GPU Architecture Group, Senior Architect
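Courses Taught
Parallel and Distributed Programming
GPU Computing and Deep Learning
Multicore Computing and Parallel Processing
Computer Systems
Publications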
2025
Zhuoran Song, Jiabei Long, Li Jiang, Naifeng Jing, Xiaoyao Liang:
GCNTrain+: A Versatile and Efficient Accelerator for Graph Convolutional Neural Network Training. ACM Trans. Archit. Code Optim. 22(1): 22:1-22:22 (2025)
Xuhang Wang, Zhuoran Song, Chunyu Qi, Fangxin Liu, Naifeng Jing, Li Jiang, Xiaoyao Liang:
RTSA: A Run-Through Sparse Attention Framework for Video Transformer. IEEE Trans. Computers 74(6): 1949-1962 (2025)
Xuhang Wang, Qiyue Huang, Xing Li, Haozhe Jiang, Qiang Xu, Xiaoyao Liang, Zhuoran Song:
Vision Transformer Acceleration via a Versatile Attention Optimization Framework. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 44(6): 2398-2411 (2025)
Cheng Gu, Gang Li, Xiaolong Lin, Jiayao Ling, Jian Cheng, Xiaoyao Liang:
BMP-SD: Marrying Binary and Mixed-Precision Quantization for Efficient Stable Diffusion Inference. DATE 2025: 1-7
Jiayao Ling, Gang Li, Qinghao Hu, Xiaolong Lin, Cheng Gu, Jian Cheng, Xiaoyao Liang:
SBQ: Exploiting Significant Bits for Efficient and Accurate Post-Training DNN Quantization. DATE 2025: 1-7
Cheng Gu, Gang Li, Xuan Zhang, Jiayao Ling, Xiaolong Lin, Zhuoran Song, Jian Cheng, Xiaoyao Liang:
Light-DiT: An Importance-Aware Dynamic Compression Framework for Diffusion Transformers. Euro-Par (2) 2025: 349-364
Houshu He, Gang Li, Fangxin Liu, Li Jiang, Xiaoyao Liang, Zhuoran Song:
GSArch: Breaking Memory Barriers in 3D Gaussian Splatting Training via Architectural Support. HPCA 2025: 366-379
Ruiyang Chen, Xing Li, Xiaoyao Liang, Zhuoran Song:
GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity. IPDPS 2025: 1286-1297
2024
Zhuoran Song, Zhongkai Yu, Xinkai Song, Yifan Hao, Li Jiang, Naifeng Jing, Xiaoyao Liang:
Environmental Condition Aware Super-Resolution Acceleration Framework in Server-Client Hierarchies. ACM Trans. Archit. Code Optim. 21(4): 65:1-65:26 (2024)
Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Xiaoyao Liang, Li Jiang:
ERA-BS: Boosting the Efficiency of ReRAM-Based PIM Accelerator With Fine-Grained Bit-Level Sparsity. IEEE Trans. Computers 73(9): 2320-2334 (2024)
Chen Nie, Chenyu Tang, Jie Lin, Huan Hu, Chenyang Lv, Ting Cao, Weifeng Zhang, Li Jiang, Xiaoyao Liang, Weikang Qian, Yanan Sun, Zhezhi He:
VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations. IEEE Trans. Computers 73(10): 2378-2390 (2024)
Xing Li, Zhuoran Song, Rachata Ausavarungnirun, Xiao Liu, Xueyuan Liu, Xuan Zhang, Xuhang Wang, Jiayao Ling, Gang Li, Naifeng Jing, Xiaoyao Liang:
Janus: A Flexible Processing-in-Memory Graph Accelerator Toward Sparsity. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 43(12): 4813-4826 (2024)
Zhuoran Song, Chunyu Qi, Fangxin Liu, Naifeng Jing, Xiaoyao Liang:
CMC: Video Transformer Acceleration via CODEC Assisted Matrix Condensing. ASPLOS (2) 2024: 201-215
Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng:
FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale. ASPLOS (4) 2024: 94-110
Xueyuan Liu, Zhuoran Song, Hao Chen, Xing Li, Xiaoyao Liang:
MoC: A Morton-Code-Based Fine-Grained Quantization for Accelerating Point Cloud Neural Networks. DAC 2024: 42:1-42:6
Zhuoran Song, Chunyu Qi, Yuanzheng Yao, Peng Zhou, Yanyi Zi, Nan Wang, Xiaoyao Liang:
TSAcc: An Efficient Tempo-Spatial Similarity Aware Accelerator for Attention Acceleration. DAC 2024: 68:1-68:6
Xuhang Wang, Zhuoran Song, Xiaoyao Liang:
InterArch: Video Transformer Acceleration via Inter-Feature Deduplication with Cube-based Dataflow. DAC 2024: 99:1-99:6
Xueyuan Liu, Zhuoran Song, Guohao Dai, Gang Li, Can Xiao, Yan Xiang, Dehui Kong, Ke Xu, Xiaoyao Liang:
FusionArch: A Fusion-Based Accelerator for Point-Based Point Cloud Neural Networks. DATE 2024: 1-6
Xueyuan Liu, Zhuoran Song, Xiang Liao, Xing Li, Tao Yang, Fangxin Liu, Xiaoyao Liang:
Sava: A Spatial- and Value-Aware Accelerator for Point Cloud Transformer. DATE 2024: 1-6
Xuan Zhang, Zhuoran Song, Xing Li, Zhezhi He, Naifeng Jing, Li Jiang, Xiaoyao Liang:
Watt: A Write-Optimized RRAM-Based Accelerator for Attention. Euro-Par (2) 2024: 107-120
Zeyu Zhu, Fanrong Li, Gang Li, Zejian Liu, Zitao Mo, Qinghao Hu, Xiaoyao Liang, Jian Cheng:
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization. HPCA 2024: 124-138
Xuan Zhang, Zhuoran Song, Peng Zhou, Xing Li, Xueyuan Liu, Xiaolong Lin, Zhezhi He, Li Jiang, Naifeng Jing, Xiaoyao Liang:
Early: An Importance-Aware Early Firing and Exit for SNN Acceleration. ICCD 2024: 624-627
Yilong Zhao, Mingyu Gao, Fangxin Liu, Yiwei Hu, Zongwu Wang, Han Lin, Jin Li, He Xian, Hanlin Dong, Tao Yang, Naifeng Jing, Xiaoyao Liang, Li Jiang:
UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. ISCA 2024: 644-659
Cheng Gu, Gang Li, Xiaolong Lin, Jiayao Ling, Xiaoyao Liang:
GNeRF: Accelerating Neural Radiance Fields Inference via Adaptive Sample Gating. ISCAS 2024: 1-5
Zhuoran Song, Houshu He, Fangxin Liu, Yifan Hao, Xinkai Song, Li Jiang, Xiaoyao Liang:
SRender: Boosting Neural Radiance Field Efficiency via Sensitivity-Aware Dynamic Precision Rendering. MICRO 2024: 525-537
2023
Jing Ke, Yizhou Lu, Yiqing Shen, Junchao Zhu, Yijin Zhou, Jinghan Huang, Jieteng Yao, Xiaoyao Liang, Yi Guo, Zhonghua Wei, Sheng Liu, Qin Huang, Fusong Jiang, Dinggang Shen:
ClusterSeg: A crowd cluster pinpointed nucleus segmentation framework with cross-modality datasets. Medical Image Anal. 85: 102758 (2023)
Fangxin Liu, Zongwu Wang, Yongbiao Chen, Zhezhi He, Tao Yang, Xiaoyao Liang, Li Jiang:
SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(1): 204-217 (2023)
Zhuoran Song, Heng Lu, Li Jiang, Naifeng Jing, Xiaoyao Liang:
Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration Framework. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(7): 2238-2251 (2023)
Yiqing Shen, Arcot Sowmya, Yulin Luo, Xiaoyao Liang, Dinggang Shen, Jing Ke:
A Federated Learning System for Histopathology Image Analysis With an Orchestral Stain-Normalization GAN. IEEE Trans. Medical Imaging 42(7): 1969-1981 (2023)
Zhuoran Song, Naifeng Jing, Xiaoyao Liang:
E2-VOR: An End-to-End En/Decoder Architecture for Efficient Video Object Recognition. ACM Trans. Design Autom. Electr. Syst. 28(1): 10:1-10:21 (2023)
Zhuoran Song, Wanzhen Liu, Tao Yang, Fangxin Liu, Naifeng Jing, Xiaoyao Liang:
A Point Cloud Video Recognition Acceleration Framework Based on Tempo-Spatial Information. IEEE Trans. Parallel Distributed Syst. 34(12): 3224-3237 (2023)
Zhuoran Song, Naifeng Jing, Xiaoyao Liang:
PRADA: Point Cloud Recognition Acceleration via Dynamic Approximation. ACM TUR-C 2023: 49-50
Xiaolong Lin, Gang Li, Zizhao Liu, Yadong Liu, Fan Zhang, Zhuoran Song, Naifeng Jing, Xiaoyao Liang:
AdaS: A Fast and Energy-Efficient CNN Accelerator Exploiting Bit-Sparsity. DAC 2023: 1-6
Zhuoran Song, Heng Lu, Gang Li, Li Jiang, Naifeng Jing, Xiaoyao Liang:
PRADA: Point Cloud Recognition Acceleration via Dynamic Approximation. DATE 2023: 1-6
Chunyu Qi, Zilong Li, Zhuoran Song, Xiaoyao Liang:
ViTframe: Vision Transformer Acceleration via Informative Frame Selection for Video Recognition. ICCD 2023: 383-390
Xuhang Wang, Zhuoran Song, Qiyue Huang, Xiaoyao Liang:
DEQ: Dynamic Element-wise Quantization for Efficient Attention Architecture. ICCD 2023: 623-630
Xuhang Wang, Zhuoran Song, Xiaoyao Liang:
RealArch: A Real-Time Scheduler for Mapping Multi-Tenant DNNs on Multi-Core Accelerators. ICCD 2023: 158-165
Xuan Zhang, Zhuoran Song, Xing Li, Zhezhi He, Li Jiang, Naifeng Jing, Xiaoyao Liang:
HyAcc: A Hybrid CAM-MAC RRAM-based Accelerator for Recommendation Model. ICCD 2023: 375-382
Zeyu Zhu, Fanrong Li, Zitao Mo, Qinghao Hu, Gang Li, Zejian Liu, Xiaoyao Liang, Jian Cheng:
A2Q: Aggregation-Aware Quantization for Graph Neural Networks. ICLR 2023
2022
Xiaofeng Hou, Chao Li, Jinghang Yang, Wenli Zheng, Xiaoyao Liang, Minyi Guo:
Integrated Power Anomaly Defense: Towards Oversubscription-Safe Data Centers. IEEE Trans. Cloud Comput. 10(3): 1875-1887 (2022)
Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang:
EBSP: evolving bit sparsity patterns for hardware-friendly inference of quantized deep neural networks. DAC 2022: 259-264
Zhuoran Song, Zhongkai Yu, Naifeng Jing, Xiaoyao Liang:
E2SR: an end-to-end video CODEC assisted system for super resolution acceleration. DAC 2022: 229-234
Yu Gong, Zhihan Xu, Zhezhi He, Weifeng Zhang, Xiaobing Tu, Xiaoyao Liang, Li Jiang:
N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores. FPGA 2022: 112-122
Xing Li, Rachata Ausavarungnirun, Xiao Liu, Xueyuan Liu, Xuan Zhang, Heng Lu, Zhuoran Song, Naifeng Jing, Xiaoyao Liang:
Gzippo: Highly-Compact Processing-in-Memory Graph Accelerator Alleviating Sparsity and Redundancy. ICCAD 2022: 115:1-115:9
Heng Lu, Zhuoran Song, Xing Li, Naifeng Jing, Xiaoyao Liang:
GCNTrain: A Unified and Efficient Accelerator for Graph Convolutional Neural Network Training. ICCD 2022: 730-737
Gang Li, Weixiang Xu, Zhuoran Song, Naifeng Jing, Jian Cheng, Xiaoyao Liang:
Ristretto: An Atomized Processing Architecture for Sparsity-Condensed Stream Flow in CNN. MICRO 2022: 1434-1450