FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing
NAACL, 2025
Paper
/
Cite
@misc{smith2025flexigptpruningextendinglarge,
title={FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing},
author={James Seale Smith and Chi-Heng Lin and Shikhar Tuli and Haris Jeelani and Shangqian Gao and Yilin Shen and Hongxia Jin and Yen-Chang Hsu},
year={2025},
eprint={2501.14713},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.14713},
}
Large Language Models
LLM Compression
PEFT
Layer pruning enables fast compression, while PEFT and weight-sharing effectively restore LLM performance.
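A minimal sketch of the recovery step, assuming PyTorch; `LowRankSharedLinear` and all hyperparameters are illustrative, not the released implementation: a pruned layer is stood in for by a kept layer's frozen weights plus a tiny trainable low-rank delta.
```python
# Sketch: replace a pruned layer with a weight-shared, low-rank-adapted copy.
import torch
import torch.nn as nn

class LowRankSharedLinear(nn.Module):
    """Reuses a retained layer's frozen weight; only a low-rank delta trains."""
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared                      # weights shared with a kept layer
        for p in self.shared.parameters():
            p.requires_grad = False
        d_out, d_in = shared.weight.shape
        self.A = nn.Parameter(torch.zeros(d_out, rank))
        self.B = nn.Parameter(torch.randn(rank, d_in) * 0.01)

    def forward(self, x):
        return self.shared(x) + x @ (self.A @ self.B).T

kept = nn.Linear(512, 512)                        # a layer that survives pruning
stand_in = LowRankSharedLinear(kept, rank=8)      # stands in for a pruned layer
y = stand_in(torch.randn(2, 512))
```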
|
MoDeGPT: Modular Decomposition for Large Language Model Compression
ICLR, 2025
(Oral Presentation, top 1.8%)
Paper
/
Cite
@article{lin2024modegpt,
title={MoDeGPT: Modular decomposition for large language model compression},
author={Lin, Chi-Heng and Gao, Shangqian and Smith, James Seale and Patel, Abhishek and Tuli, Shikhar and Shen, Yilin and Jin, Hongxia and Hsu, Yen-Chang},
journal={arXiv preprint arXiv:2408.09632},
year={2024}
}
Large Language Models
LLM Compression
PEFT
Low-rank decomposition of weight matrices achieves state-of-the-art compression once the transformer is first partitioned into modules.
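A generic illustration of the underlying primitive (truncated SVD on a single weight matrix; MoDeGPT's module-wise decompositions are more involved), with arbitrary shapes and rank:
```python
# Truncated SVD: replace one d_out x d_in matmul with two thin ones.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))          # original weight
U, s, Vt = np.linalg.svd(W, full_matrices=False)

r = 256                                        # kept rank (compression knob)
W1 = U[:, :r] * s[:r]                          # d_out x r
W2 = Vt[:r, :]                                 # r x d_in

x = rng.standard_normal(1024)
y_full = W @ x
y_low = W1 @ (W2 @ x)                          # ~2*r/d of the original FLOPs
print(np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))
```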
|
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
NeurIPS, 2024
Paper
/
Cite
@article{gao2024disp,
title={DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models},
author={Gao, Shangqian and Lin, Chi-Heng and Hua, Ting and Tang, Zheng and Shen, Yilin and Jin, Hongxia and Hsu, Yen-Chang},
journal={arXiv preprint arXiv:2410.11988},
year={2024}
}
Large Language Models
LLM Compression
PEFT
Dimensions of consecutive blocks can be pruned independently using simple index addition and selection; this flexibility improves compression quality.
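A toy sketch of the mechanism, with made-up index sets: each block gathers its own subset of residual-stream dimensions and scatter-adds its output back, so consecutive blocks need not agree on which dimensions survive.
```python
import numpy as np

d = 8
x = np.arange(d, dtype=float)                  # residual stream

def pruned_block(x, idx_in, idx_out, W):
    """Gather a block-specific subset of dims, transform, scatter-add back."""
    h = W @ x[idx_in]                          # index selection on the input
    y = x.copy()
    np.add.at(y, idx_out, h)                   # index addition on the residual
    return y

rng = np.random.default_rng(0)
idx1_in, idx1_out = [0, 2, 5], [1, 4, 6]       # block 1 keeps these dims
idx2_in, idx2_out = [1, 3, 6], [0, 2, 7]       # block 2 keeps different ones
x = pruned_block(x, idx1_in, idx1_out, rng.standard_normal((3, 3)))
x = pruned_block(x, idx2_in, idx2_out, rng.standard_normal((3, 3)))
print(x)   # consecutive blocks pruned different dimensions, no conflict
```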
|
SLiM: Speculative Decoding with Hypothesis Reduction
NAACL-Findings, 2024
Paper
/
Cite
@inproceedings{lin2024slim,
title={SLiM: Speculative Decoding with Hypothesis Reduction},
author={Lin, Chi-Heng and Tuli, Shikhar and Smith, James and Hsu, Yen-Chang and Shen, Yilin and Jin, Hongxia},
booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
pages={1005--1017},
year={2024}
}
Large Language Models
Multi-token Prediction
Multi-token predictions enhanced by bi-gram tables significantly reduce computational FLOPs during token verification.
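A toy illustration of hypothesis reduction (the table and vocabulary are invented): cheap bi-gram lookups filter the drafted multi-token hypotheses so only plausible ones reach full-model verification.
```python
# Keep only multi-token drafts whose adjacent token pairs are plausible
# under a cheap bi-gram table, then verify just the survivors.
bigram_ok = {("the", "cat"), ("cat", "sat"), ("the", "dog"), ("dog", "ran")}

def plausible(hypothesis):
    return all((a, b) in bigram_ok for a, b in zip(hypothesis, hypothesis[1:]))

drafts = [("the", "cat", "sat"), ("the", "sat", "cat"), ("the", "dog", "ran")]
survivors = [h for h in drafts if plausible(h)]
print(survivors)   # only survivors go through full-model verification
# [('the', 'cat', 'sat'), ('the', 'dog', 'ran')]
```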
|
DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
NAACL, 2024
Paper
/
Cite
@article{tuli2024dynamo,
title={DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling},
author={Tuli, Shikhar and Lin, Chi-Heng and Hsu, Yen-Chang and Jha, Niraj K and Shen, Yilin and Jin, Hongxia},
journal={arXiv preprint arXiv:2405.00888},
year={2024}
}
Large Language Models
Multi-token Prediction
Co-occurrence masking with adaptive thresholding enables multi-token prediction, achieving a 2.57x speedup with under 5% overhead.
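An illustrative sketch with an invented thresholding rule (not the paper's exact criterion): a second token is emitted in the same decoding step only when its co-occurrence-masked probability clears an adaptive threshold.
```python
import numpy as np

def multi_token_step(p2, cooc_mask, base_tau=0.3):
    """Emit a 2nd token in the same step only when confident.
    p2: distribution over the 2nd next token; cooc_mask: 0/1 vector zeroing
    tokens that rarely co-occur with the 1st sampled token."""
    q = p2 * cooc_mask
    if q.sum() == 0:
        return None                                  # fall back to 1 token
    q = q / q.sum()
    tau = base_tau * q.max() / max(p2.max(), 1e-9)   # adaptive threshold (toy)
    tok = int(q.argmax())
    return tok if q[tok] >= tau else None

p2 = np.array([0.05, 0.7, 0.15, 0.1])
mask = np.array([1, 1, 0, 1])
print(multi_token_step(p2, mask))   # 1: confident enough to emit two tokens
```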
|
The Good, the Bad and the Ugly Sides of Data Augmentation: An Implicit Spectral Regularization Perspective
JMLR, 2024
Paper
/
Cite
@article{lin2024good,
title={The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective},
author={Lin, Chi-Heng and Kaushik, Chiraag and Dyer, Eva L and Muthukumar, Vidya},
journal={Journal of Machine Learning Research},
volume={25},
number={91},
pages={1--85},
year={2024}
}
Data Augmentation
Machine Learning Theory
We reveal a close relationship between data augmentation and spectral regularization: unlike ridge regression, augmentation can both help and hurt generalization.
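The flavor of the result in one display, for a simplified special case (linear regression, additive mean-zero augmentation noise whose rows have covariance Σ_δ; notation here is illustrative):
```latex
% Training on randomly augmented inputs reduces to a generalized
% (anisotropic) ridge estimator:
\hat{\beta}_{\mathrm{aug}}
  = \arg\min_{\beta}\; \mathbb{E}_{\Delta}\,\bigl\lVert y - (X+\Delta)\beta \bigr\rVert_2^2
  = \bigl( X^{\top} X + n\,\Sigma_{\delta} \bigr)^{-1} X^{\top} y .
% Because \Sigma_\delta reshapes the spectrum rather than adding \lambda I,
% the induced regularization can help or hurt depending on its alignment
% with the data.
```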
|
Your Contrastive Learning Problem is Secretly a Distribution Alignment Problem
NeurIPS, 2024
Paper
/
Cite
@inproceedings{chenyour,
title={Your contrastive learning problem is secretly a distribution alignment problem},
author={Chen, Zihao and Lin, Chi-Heng and Liu, Ran and Xiao, Jingyun and Dyer, Eva L},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}
Self-supervised Learning
Optimal Transport
Reframing contrastive learning as a distribution alignment problem generalizes existing self-supervised methods through optimal transport theory.
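A minimal sketch of the reframing, with arbitrary hyperparameters: the pairing between two augmented batches is computed as an entropic optimal-transport plan (Sinkhorn) over the similarity matrix, in place of the usual row-wise softmax.
```python
import numpy as np

def sinkhorn_plan(sim, eps=0.1, iters=50):
    """Entropic OT plan aligning two batches given a similarity matrix."""
    K = np.exp(sim / eps)                 # Gibbs kernel
    u = np.ones(K.shape[0])
    for _ in range(iters):                # alternate marginal scaling
        v = 1.0 / (K.T @ u)
        u = 1.0 / (K @ v)
    return u[:, None] * K * v[None, :]    # rows/columns balanced at convergence

rng = np.random.default_rng(0)
za = rng.standard_normal((4, 16)); za /= np.linalg.norm(za, axis=1, keepdims=True)
zb = za + 0.1 * rng.standard_normal((4, 16))
zb /= np.linalg.norm(zb, axis=1, keepdims=True)
P = sinkhorn_plan(za @ zb.T)
print(P.round(2))   # mass concentrates on the true matching (the diagonal)
```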
|
Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
ICML, 2024
Paper
/
Cite
@article{kaushik2024balanced,
title={Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance},
author={Kaushik, Chiraag and Liu, Ran and Lin, Chi-Heng and Khera, Amrit and Jin, Matthew Y and Ma, Wenrui and Muthukumar, Vidya and Dyer, Eva L},
journal={arXiv preprint arXiv:2402.11742},
year={2024}
}
Data Augmentation
Machine Learning Theory
Intrinsic bias in class spectra is mitigated with augmentation strategies, improving generalization across classes.
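A quick diagnostic that exposes the phenomenon (an illustrative toy, not the paper's estimator): per-class covariance spectra can differ sharply even when the classes are perfectly balanced.
```python
import numpy as np

rng = np.random.default_rng(0)
# Balanced classes, but class 1's features have a heavier spectral decay.
f0 = rng.standard_normal((500, 32))
f1 = rng.standard_normal((500, 32)) * np.linspace(2.0, 0.1, 32)

for name, feats in [("class 0", f0), ("class 1", f1)]:
    cov = np.cov(feats, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
    print(name, "top-5 eigenvalues:", eig[:5].round(2))
# Equal sample counts, yet the class spectra differ sharply, which predicts
# disparate per-class error; targeted augmentation can rebalance the spectra.
```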
|
Half-Hop: A graph upsampling approach for slowing down message passing
ICML, 2023
Project
/
Paper
/
Cite
@inproceedings{azabou2023half,
title={Half-Hop: A graph upsampling approach for slowing down message passing},
author={Azabou, Mehdi and Ganesh, Venkataramana and Thakoor, Shantanu and Lin, Chi-Heng and Sathidevi, Lakshmi and Liu, Ran and Valko, Michal and Veli{\v{c}}kovi{\'c}, Petar and Dyer, Eva L},
booktitle={International Conference on Machine Learning},
pages={1341--1360},
year={2023},
organization={PMLR}
}
Data Augmentation
Graph Neural Networks
Self-supervised Learning
An upsampling method for graphs is proposed to generate augmentations and improve self-supervised learning.
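A minimal edge-rewiring sketch of the idea (the interpolation weight `alpha` is an illustrative knob): every directed edge gains a "slow node", so each message now takes two hops.
```python
import numpy as np

def half_hop(edges, feats, alpha=0.5):
    """Insert a slow node w on each directed edge (u -> v), rewired as
    u -> w -> v; w's feature interpolates the endpoints."""
    n = feats.shape[0]
    new_edges, slow_feats = [], []
    for k, (u, v) in enumerate(edges):
        w = n + k
        slow_feats.append(alpha * feats[u] + (1 - alpha) * feats[v])
        new_edges += [(u, w), (w, v)]
    return new_edges, np.vstack([feats] + slow_feats)

edges = [(0, 1), (1, 2)]
feats = np.eye(3)
e2, f2 = half_hop(edges, feats)
print(e2)   # [(0, 3), (3, 1), (1, 4), (4, 2)]: message passing is slowed
```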
|
Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out
ICML, 2022
Paper
/
Cite
@inproceedings{wang2022provable,
title={Provable acceleration of heavy ball beyond quadratics for a class of Polyak-Lojasiewicz functions when the non-convexity is averaged-out},
author={Wang, Jun-Kun and Lin, Chi-Heng and Wibisono, Andre and Hu, Bin},
booktitle={International conference on machine learning},
pages={22839--22864},
year={2022},
organization={PMLR}
}
Optimization
Machine Learning Theory
We prove that heavy-ball momentum accelerates training for a class of Polyak-Łojasiewicz (PL) optimization problems.
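For reference, the two objects involved (a schematic statement, not the paper's exact theorem or constants):
```latex
% Heavy-ball (Polyak) momentum update:
x_{k+1} \;=\; x_k \;-\; \eta\,\nabla f(x_k) \;+\; \beta\,(x_k - x_{k-1}),
% and the Polyak-Lojasiewicz (PL) condition with constant \mu > 0:
\tfrac{1}{2}\,\bigl\lVert \nabla f(x) \bigr\rVert^2 \;\ge\; \mu\,\bigl( f(x) - f^{*} \bigr)
\quad \text{for all } x .
```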
|
Making transport more robust and interpretable by moving data through a small number of anchor points
ICML, 2021
Project
/
Paper
/
Cite
@article{lin2021making,
title={Making transport more robust and interpretable by moving data through a small number of anchor points},
author={Lin, Chi-Heng and Azabou, Mehdi and Dyer, Eva L},
booktitle={International Conference on Machine Learning},
year={2021},
organization={PMLR}
}
Optimal Transport
Domain Adaptation
A low-rank transport formulation is proposed to move data through a small number of anchor points, improving robustness and interpretability.
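A toy sketch of transport factored through anchors (hard nearest-anchor assignments here stand in for the paper's learned couplings): the resulting plan has rank at most k, which is what buys robustness and interpretability.
```python
import numpy as np

def anchored_plan(X, Y, k=3, seed=0):
    """Couple X and Y through k anchor points (toy version: anchors sampled
    from the pooled data, hard nearest-anchor assignments)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])[rng.choice(len(X) + len(Y), k, replace=False)]
    ax = np.argmin(((X[:, None] - Z) ** 2).sum(-1), axis=1)   # X -> anchor
    ay = np.argmin(((Y[:, None] - Z) ** 2).sum(-1), axis=1)   # Y -> anchor
    P = np.zeros((len(X), len(Y)))
    for j in range(k):                      # compose through each anchor
        xs, ys = np.where(ax == j)[0], np.where(ay == j)[0]
        if len(xs) and len(ys):
            P[np.ix_(xs, ys)] = 1.0 / (len(X) * len(ys))
    return P

X = np.random.default_rng(1).standard_normal((6, 2))
P = anchored_plan(X, X + 0.05)
print(np.linalg.matrix_rank(P) <= 3)   # True: the plan factors through anchors
```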
|
A Modular Analysis of Provable Acceleration via Polyak’s Momentum: Training a Wide ReLU Network and a Deep Linear Network
ICML, 2021
Paper
/
Cite
@inproceedings{wang2021modular,
title={A modular analysis of provable acceleration via polyak’s momentum: Training a wide relu network and a deep linear network},
author={Wang, Jun-Kun and Lin, Chi-Heng and Abernethy, Jacob D},
booktitle={International Conference on Machine Learning},
pages={10816--10827},
year={2021},
organization={PMLR}
}
Optimization
Machine Learning Theory
We prove that Polyak's heavy-ball momentum accelerates training for one-layer ReLU networks and deep linear networks.
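A numerical illustration on the simplest instance, a quadratic (the paper's point is precisely to go beyond this case); parameters follow Polyak's classical tuning:
```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 x^T H x with condition number 1e4.
H = np.diag(np.array([1e-4, 1.0]))
L, mu = 1.0, 1e-4
x_gd = x_hb = x_prev = np.array([1.0, 1.0])

eta_gd = 1.0 / L
eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2            # Polyak's tuning
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

for _ in range(2000):
    x_gd = x_gd - eta_gd * (H @ x_gd)
    x_hb, x_prev = x_hb - eta_hb * (H @ x_hb) + beta * (x_hb - x_prev), x_hb

print(np.linalg.norm(x_gd), np.linalg.norm(x_hb))  # heavy ball is far closer to 0
```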
|
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity
NeurIPS, 2021
(Oral Presentation)
Project
/
Paper
/
Cite
@article{liu2021drop,
title={Drop, swap, and generate: A self-supervised approach for generating neural activity},
author={Liu, Ran and Azabou, Mehdi and Dabagia, Max and Lin, Chi-Heng and Gheshlaghi Azar, Mohammad and Hengen, Keith and Valko, Michal and Dyer, Eva},
journal={Advances in neural information processing systems},
volume={34},
pages={10587--10599},
year={2021}
}
Self-supervised Learning
Neuroscience
A generative-based self-supervised learning approach is proposed to extract interpretable representations of neural activity.
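The two augmentations that give the paper its name, in sketch form with illustrative parameters: drop random neurons and swap nearby time bins, then train a VAE-style generator on such views.
```python
import numpy as np

rng = np.random.default_rng(0)

def drop(x, p=0.2):
    """Zero out a random subset of neurons (channels)."""
    return x * (rng.random(x.shape[-1]) > p)

def swap(x, t, max_shift=2):
    """Replace time bin t with a nearby bin: nearby activity states are
    treated as exchangeable views of the same latent state."""
    s = int(np.clip(t + rng.integers(-max_shift, max_shift + 1), 0, x.shape[0] - 1))
    x = x.copy()
    x[t] = x[s]
    return x

spikes = rng.poisson(1.0, size=(100, 64)).astype(float)   # time x neurons
view = drop(swap(spikes, t=50))
# Pairs of such augmented views feed the generative (VAE-style) model.
```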
|
Bayesian Optimization for Modular Black-Box Systems with Switching Costs
UAI, 2021
Paper
/
Cite
@inproceedings{lin2021bayesian,
title={Bayesian optimization for modular black-box systems with switching costs},
author={Lin, Chi-Heng and Miano, Joseph D and Dyer, Eva L},
booktitle={Uncertainty in Artificial Intelligence},
pages={1024--1034},
year={2021},
organization={PMLR}
}
Bayesian Optimization
Neuroscience
A hyperparameter tuning strategy is proposed to optimize a modular neuroimaging system whose stages incur different switching costs.
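A sketch of the cost-aware selection rule (the acquisition function is stubbed; the actual method is GP-based modular BO): expected improvement is discounted by the switching cost a candidate would incur, so changing an expensive upstream module must be well justified.
```python
def switching_cost(prev_cfg, cfg, module_costs):
    """Pay for every module whose setting changes; changing an upstream
    module forces re-running everything downstream of it."""
    total, dirty = 0.0, False
    for m in sorted(module_costs):            # modules in pipeline order
        dirty = dirty or (prev_cfg[m] != cfg[m])
        if dirty:
            total += module_costs[m]
    return total

def pick_next(prev_cfg, candidates, acq, module_costs):
    """Cost-aware acquisition: improvement per unit switching cost."""
    return max(candidates,
               key=lambda c: acq(c) / (1.0 + switching_cost(prev_cfg, c, module_costs)))

costs = {0: 10.0, 1: 1.0}                     # module 0 (upstream) is expensive
prev = {0: "a", 1: "x"}
cands = [{0: "a", 1: "y"}, {0: "b", 1: "x"}]
print(pick_next(prev, cands, acq=lambda c: 1.0, module_costs=costs))
# -> {0: 'a', 1: 'y'}: equal improvement, but only the cheap module changes
```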
|
Mine your own view: A self-supervised approach for learning representations of neural activity
Mehdi Azabou,
Mohammad Gheshlaghi Azar,
Ran Liu,
Chi-Heng Lin,
Erik C. Johnson,
Kiran Bhaskaran-Nair,
Max Dabagia,
Bernardo Avila-Pires,
Lindsey Kitchell,
Keith B. Hengen,
William Gray-Roncal,
Michal Valko,
Eva L. Dyer
NeurIPS Workshop on Self-Supervised Learning, 2021
(Oral Presentation)
Project
/
Paper
/
Cite
@article{azabou2021mine,
title={Mine your own view: Self-supervised learning through across-sample prediction},
author={Azabou, Mehdi and Azar, Mohammad Gheshlaghi and Liu, Ran and Lin, Chi-Heng and Johnson, Erik C and Bhaskaran-Nair, Kiran and Dabagia, Max and Avila-Pires, Bernardo and Kitchell, Lindsey and Hengen, Keith B and others},
journal={arXiv preprint arXiv:2102.10106},
year={2021}
}
Self-supervised Learning
Neuroscience
Mining nearest neighbors as positive views lets us leverage the diversity within the latent space.
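A sketch of the core move, with illustrative names: each anchor's positive view is mined as its nearest neighbor across samples, rather than being another augmentation of itself.
```python
import numpy as np

def mine_views(emb):
    """For each embedding, find its nearest *other* sample to use as a
    positive view ('mine your own view' across samples)."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)            # exclude self
    return sim.argmax(axis=1)

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
nn_idx = mine_views(emb)
print(nn_idx)
# Training then pulls emb[i] toward emb[nn_idx[i]] (e.g., a cosine loss),
# exposing the learner to diversity that augmentations alone would miss.
```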
|
Escaping saddle points faster with stochastic momentum
ICLR, 2020
Paper
/
Cite
@inproceedings{wang2020escaping,
title={Escaping saddle points faster with stochastic momentum},
author={Wang, Jun-Kun and Lin, Chi-Heng and Abernethy, Jacob},
booktitle={International Conference on Learning Representations},
year={2020}
}
Optimization
Machine Learning Theory
We prove that stochastic momentum can escape saddle points faster than gradient descent.
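A toy simulation of the effect on a pure saddle (illustrative parameters, not the paper's analysis): with the same small gradient noise, momentum amplifies motion along the escape direction much faster than plain SGD.
```python
import numpy as np

def grad(p):                       # f(x, y) = 0.5 * (x**2 - y**2): a saddle at 0
    return np.array([p[0], -p[1]])

rng = np.random.default_rng(0)
eta, beta, steps = 0.01, 0.9, 300
p_sgd = p_mom = np.array([0.0, 1e-6])      # start almost exactly at the saddle
v = np.zeros(2)

for _ in range(steps):
    noise = 1e-6 * rng.standard_normal(2)
    p_sgd = p_sgd - eta * (grad(p_sgd) + noise)
    v = beta * v + grad(p_mom) + noise     # heavy-ball style momentum buffer
    p_mom = p_mom - eta * v

print(abs(p_sgd[1]), abs(p_mom[1]))        # momentum escapes along y much faster
```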
|