I am a Machine Learning Research Engineer at
Samsung AI Center - Mountain View
, where I specialize in on-device AI and natural language processing.
At Samsung, I have worked on various LLM applications, including
multi-token prediction,
model compression, and
hybrid state-space models, for which our work received the Samsung Best Paper Award.
I earned my PhD in Electrical and Computer Engineering at
Georgia Institute of Technology,
where I had the privilege of being advised by
Dr. Eva L. Dyer.
I am passionate about advancing artificial intelligence toward reasoning and concept generation that surpass human capabilities, addressing fundamental challenges with interdisciplinary machine learning techniques. My work aims to bridge the gap between practice and theory, spanning practical applications of large language models (LLMs) and the theoretical foundations of machine learning.
@article{lin2024modegpt,
title={MoDeGPT: Modular decomposition for large language model compression},
author={Lin, Chi-Heng and Gao, Shangqian and Smith, James Seale and Patel, Abhishek and Tuli, Shikhar and Shen, Yilin and Jin, Hongxia and Hsu, Yen-Chang},
journal={arXiv preprint arXiv:2408.09632},
year={2024}
}
Large Language Models
LLM Compression
Low-rank decomposition of weight matrices achieves state-of-the-art compression once the transformer is partitioned into modules.
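For intuition, here is a minimal low-rank factorization of a single weight matrix via truncated SVD; it is a toy sketch of the general idea with made-up shapes and rank, not MoDeGPT's module-level decomposition.

import torch

def low_rank_factorize(W: torch.Tensor, rank: int):
    """Replace a dense weight W (out_dim x in_dim) with two thin factors A @ B."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_dim, rank), singular values folded into the left factor
    B = Vh[:rank, :]             # (rank, in_dim)
    return A, B

# Example: compress a 1024 x 1024 projection to rank 128 (~4x fewer parameters)
W = torch.randn(1024, 1024)
A, B = low_rank_factorize(W, rank=128)
rel_error = torch.linalg.norm(W - A @ B) / torch.linalg.norm(W)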
@article{gao2024disp,
title={DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models},
author={Gao, Shangqian and Lin, Chi-Heng and Hua, Ting and Tang, Zheng and Shen, Yilin and Jin, Hongxia and Hsu, Yen-Chang},
journal={arXiv preprint arXiv:2410.11988},
year={2024}
}
Large Language Models
LLM Compression
Hidden dimensions can be pruned independently across consecutive blocks using simple index addition and selection; this flexibility improves compression quality.
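A rough sketch of the index-selection mechanism, with hypothetical shapes and random weights (the actual DISP-LLM method learns which indices each block keeps):

import torch

def pruned_block_forward(x, W_in, W_out, keep_idx):
    """x: (batch, d_model) residual stream; keep_idx: the hidden dimensions this block keeps.
    Different blocks may keep different, independently chosen subsets."""
    h = x.index_select(1, keep_idx)    # select this block's own dimensions
    h = torch.relu(h @ W_in) @ W_out   # computation at reduced width
    out = torch.zeros_like(x)
    out.index_add_(1, keep_idx, h)     # scatter the result back into the residual stream
    return x + out

# Hypothetical shapes: this block keeps 640 of 1024 dimensions
d_model, d_keep, d_hidden = 1024, 640, 2048
keep_idx = torch.randperm(d_model)[:d_keep]
W_in, W_out = torch.randn(d_keep, d_hidden) * 0.02, torch.randn(d_hidden, d_keep) * 0.02
y = pruned_block_forward(torch.randn(4, d_model), W_in, W_out, keep_idx)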
@inproceedings{lin2024slim,
title={SLiM: Speculative Decoding with Hypothesis Reduction},
author={Lin, Chi-Heng and Tuli, Shikhar and Smith, James and Hsu, Yen-Chang and Shen, Yilin and Jin, Hongxia},
booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
pages={1005--1017},
year={2024}
}
Large Language Models
Multi-token Prediction
Multi-token prediction enhanced by bi-gram tables significantly reduces the FLOPs spent on token verification.
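A hedged sketch of the hypothesis-reduction idea: draft hypotheses whose consecutive token pairs never appear in a precomputed bi-gram table are discarded before verification. The table and token ids below are illustrative, not SLiM's exact construction.

from itertools import product

def reduce_hypotheses(candidates_per_step, bigram_table, prev_token):
    """candidates_per_step: one list of candidate token ids per future position.
    bigram_table: set of (token, next_token) pairs deemed plausible.
    Keep only hypotheses whose every consecutive pair appears in the table."""
    kept = []
    for hyp in product(*candidates_per_step):
        pairs = zip((prev_token,) + hyp[:-1], hyp)
        if all(pair in bigram_table for pair in pairs):
            kept.append(hyp)
    return kept   # only these survivors are verified by the full model

# 2 of the 4 draft hypotheses survive, so verification FLOPs shrink accordingly
table = {(5, 11), (11, 7), (5, 12), (12, 9)}
print(reduce_hypotheses([[11, 12], [7, 9]], table, prev_token=5))   # [(11, 7), (12, 9)]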
@article{tuli2024dynamo,
title={DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling},
author={Tuli, Shikhar and Lin, Chi-Heng and Hsu, Yen-Chang and Jha, Niraj K and Shen, Yilin and Jin, Hongxia},
journal={arXiv preprint arXiv:2405.00888},
year={2024}
}
Large Language Models
Multi-token Prediction
Co-occurrence masking with adaptive thresholding enables multi-token prediction, achieving a 2.57x speedup with under 5% overhead.
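A loose sketch of one decoding step, with illustrative shapes and a made-up threshold rule (DynaMo's actual heads, masks, and back-off logic differ):

import torch

def multi_token_step(p1, p2, cooc, base_threshold=0.05):
    """p1, p2: (V,) next-token and second-token distributions from parallel heads;
    cooc: (V, V) co-occurrence statistics used as a mask. All shapes are illustrative."""
    t1 = int(torch.argmax(p1))
    threshold = base_threshold / float(p1[t1])   # adaptive: stricter when the first token is uncertain
    joint = p2 * cooc[t1]                        # mask second-token candidates by co-occurrence with t1
    if float(joint.max()) < threshold:
        return [t1]                              # back off to ordinary single-token decoding
    return [t1, int(torch.argmax(joint))]        # otherwise emit two tokens in one forward pass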
@article{lin2024good,
title={The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective},
author={Lin, Chi-Heng and Kaushik, Chiraag and Dyer, Eva L and Muthukumar, Vidya},
journal={Journal of Machine Learning Research},
volume={25},
number={91},
pages={1--85},
year={2024}
}
Data Augmentation
Machine Learning Theory
We reveal a close relationship between data augmentation and spectral regularization: unlike ridge regression, augmentation can both help and hurt generalization.
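As a much-simplified worked example of the flavor of the result, consider linear regression with zero-mean additive-noise augmentations \Delta; averaging the augmented loss gives the original loss plus a data-dependent spectral penalty:

\[
\mathbb{E}_{\Delta}\,\bigl\|y - (X+\Delta)\beta\bigr\|_2^2
\;=\; \|y - X\beta\|_2^2 \;+\; \beta^{\top}\Sigma_{\Delta}\,\beta,
\qquad
\Sigma_{\Delta} = \mathbb{E}\bigl[\Delta^{\top}\Delta\bigr],\quad \mathbb{E}[\Delta]=0.
\]

Whether this helps or hurts depends on how the eigenvalues of \Sigma_{\Delta} align with the data spectrum, whereas ridge regression shrinks all directions uniformly.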
@inproceedings{chenyour,
title={Your contrastive learning problem is secretly a distribution alignment problem},
author={Chen, Zihao and Lin, Chi-Heng and Liu, Ran and Xiao, Jingyun and Dyer, Eva L},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}
Self-supervised Learning
Optimal Transport
Reframing contrastive learning as a matching problem generalizes existing self-supervised methods through optimal transport theory.
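A generic sketch of the reframing using entropic optimal transport (uniform marginals and a cosine cost, chosen here for illustration rather than taken from the paper): compute a soft matching between two views with Sinkhorn iterations instead of fixing hard one-to-one positives.

import torch

def sinkhorn_coupling(z1, z2, eps=0.05, n_iters=50):
    """z1, z2: (n, d) L2-normalized embeddings of two augmented views of the same batch.
    Returns a soft matching (transport plan) between the views under uniform marginals."""
    n = z1.size(0)
    K = torch.exp(-(1.0 - z1 @ z2.T) / eps)   # Gibbs kernel of the cosine-distance cost
    r = c = torch.full((n,), 1.0 / n)
    b = torch.ones(n)
    for _ in range(n_iters):                  # Sinkhorn-Knopp scaling
        a = r / (K @ b)
        b = c / (K.T @ a)
    return a[:, None] * K * b[None, :]        # coupling that concentrates mass on matched pairs

Penalizing how far this coupling is from the identity matching roughly recovers an InfoNCE-style loss; changing the marginals or the cost yields other self-supervised variants.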
@article{kaushik2024balanced,
title={Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance},
author={Kaushik, Chiraag and Liu, Ran and Lin, Chi-Heng and Khera, Amrit and Jin, Matthew Y and Ma, Wenrui and Muthukumar, Vidya and Dyer, Eva L},
journal={arXiv preprint arXiv:2402.11742},
year={2024}
}
Data Augmentation
Machine Learning Theory
Intrinsic bias in class spectra is mitigated with augmentation strategies, improving generalization across classes.
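A small diagnostic sketch in the spirit of the finding (generic estimator, hypothetical shapes; not the paper's exact procedure): even with perfectly balanced sample counts, the class-conditional covariance spectra can differ sharply.

import torch

def per_class_spectra(features, labels, num_classes):
    """features: (n, d) penultimate-layer features. Returns each class-conditional
    covariance spectrum (largest eigenvalue first) for side-by-side comparison."""
    spectra = []
    for c in range(num_classes):
        Xc = features[labels == c]
        Xc = Xc - Xc.mean(dim=0, keepdim=True)
        cov = Xc.T @ Xc / max(Xc.size(0) - 1, 1)
        spectra.append(torch.linalg.eigvalsh(cov).flip(0))
    return torch.stack(spectra)   # (num_classes, d): rows expose class-level spectral disparities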
@inproceedings{azabou2023half,
title={Half-Hop: A graph upsampling approach for slowing down message passing},
author={Azabou, Mehdi and Ganesh, Venkataramana and Thakoor, Shantanu and Lin, Chi-Heng and Sathidevi, Lakshmi and Liu, Ran and Valko, Michal and Veli{\v{c}}kovi{\'c}, Petar and Dyer, Eva L},
booktitle={International Conference on Machine Learning},
pages={1341--1360},
year={2023},
organization={PMLR}
}
Data Augmentation
Graph Neural Networks
Self-supervised Learning
An upsampling method for graphs is proposed to generate augmentations and improve self-supervised learning.
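A hedged sketch of the upsampling step (simplified: symmetric interpolation and no special handling of edge direction, which the full Half-Hop method treats more carefully):

import torch

def half_hop_upsample(x, edge_index, alpha=0.5):
    """x: (n, d) node features; edge_index: (2, e) directed edges (source, target).
    Insert one new node per edge and reroute the edge through it, so a message that
    used to travel u -> v now takes two hops: u -> new -> v."""
    src, dst = edge_index
    n = x.size(0)
    new_ids = torch.arange(n, n + src.numel())
    new_x = alpha * x[src] + (1 - alpha) * x[dst]               # interpolated "slow" nodes
    x_up = torch.cat([x, new_x], dim=0)
    edges_up = torch.cat([torch.stack([src, new_ids]),          # u -> slow node
                          torch.stack([new_ids, dst])], dim=1)  # slow node -> v
    return x_up, edges_up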
@inproceedings{wang2022provable,
title={Provable acceleration of heavy ball beyond quadratics for a class of Polyak-{\L}ojasiewicz functions when the non-convexity is averaged-out},
author={Wang, Jun-Kun and Lin, Chi-Heng and Wibisono, Andre and Hu, Bin},
booktitle={International conference on machine learning},
pages={22839--22864},
year={2022},
organization={PMLR}
}
Optimization
Machine Learning Theory
We prove that heavy-ball momentum accelerates training for a class of Polyak-Łojasiewicz (PL) optimization problems.
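For reference, the two ingredients in simplified notation: the heavy-ball (Polyak momentum) update and the PL condition under which the acceleration is analyzed,

\[
x_{t+1} = x_t - \eta\,\nabla f(x_t) + \beta\,(x_t - x_{t-1}),
\qquad
\tfrac{1}{2}\,\|\nabla f(x)\|_2^2 \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)\ \ \text{for all } x,
\]

where \eta is the step size, \beta the momentum parameter, and \mu > 0 the PL constant.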
@article{lin2021making,
title={Making transport more robust and interpretable by moving data through a small number of anchor points},
author={Lin, Chi-Heng and Azabou, Mehdi and Dyer, Eva L},
journal={Proceedings of machine learning research},
volume={139},
pages={6631},
year={2021},
publisher={NIH Public Access}
}
Optimal Transport
Domain Adaptation
A low-rank transport formulation is proposed to move data through a small number of anchor points, improving robustness and interpretability.
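A rough sketch of the factored-coupling idea with fixed, already-computed plans (the paper treats the anchors and plans jointly; the uniform setup and clamping here are illustrative):

import torch

def anchored_coupling(P_src_to_anchor, P_anchor_to_tgt):
    """P_src_to_anchor: (n, k) transport plan from source points onto k anchors;
    P_anchor_to_tgt: (k, m) plan from anchors to target points, with matching anchor marginals.
    The composed plan has rank at most k, so all mass is routed through the anchors."""
    anchor_mass = P_src_to_anchor.sum(dim=0).clamp_min(1e-12)            # (k,) marginal on the anchors
    return P_src_to_anchor @ (P_anchor_to_tgt / anchor_mass[:, None])    # (n, m) composed coupling

With k much smaller than n and m, the composed plan is cheaper to store and easier to inspect: each anchor summarizes one "lane" of transport.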
@inproceedings{wang2021modular,
title={A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network},
author={Wang, Jun-Kun and Lin, Chi-Heng and Abernethy, Jacob D},
booktitle={International Conference on Machine Learning},
pages={10816--10827},
year={2021},
organization={PMLR}
}
Optimization
Machine Learning Theory
We prove that Polyak's (heavy-ball) momentum accelerates training for one-layer ReLU networks and deep linear networks.
@article{liu2021drop,
title={Drop, swap, and generate: A self-supervised approach for generating neural activity},
author={Liu, Ran and Azabou, Mehdi and Dabagia, Max and Lin, Chi-Heng and Gheshlaghi Azar, Mohammad and Hengen, Keith and Valko, Michal and Dyer, Eva},
journal={Advances in neural information processing systems},
volume={34},
pages={10587--10599},
year={2021}
}
Self-supervised Learning
Neuroscience
A generative self-supervised learning approach is proposed to extract interpretable representations of neural activity.
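A purely illustrative sketch of input-level "drop" and "swap" style augmentations on binned neural activity; the rates, window, and exact operations are hypothetical and not the paper's model, which couples augmented views with a generative (VAE-style) objective.

import torch

def drop_and_swap(activity, drop_rate=0.2, swap_window=5):
    """activity: (time, neurons) binned spike counts.
    Drop: zero out a random subset of neurons. Swap: exchange each time bin with a random
    neighbor inside a small window, perturbing fast structure while keeping slow dynamics."""
    T, N = activity.shape
    keep = (torch.rand(N) > drop_rate).float()
    dropped = activity * keep                              # drop whole neurons
    offsets = torch.randint(-swap_window, swap_window + 1, (T,))
    perm = (torch.arange(T) + offsets).clamp(0, T - 1)
    return dropped[perm]                                   # local temporal reshuffle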
@inproceedings{lin2021bayesian,
title={Bayesian optimization for modular black-box systems with switching costs},
author={Lin, Chi-Heng and Miano, Joseph D and Dyer, Eva L},
booktitle={Uncertainty in Artificial Intelligence},
pages={1024--1034},
year={2021},
organization={PMLR}
}
Bayesian Optimization
Neuroscience
A hyperparameter-tuning strategy is proposed to optimize a modular neuroimaging system whose components have different switching costs.
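A generic sketch of cost-aware candidate scoring (not the paper's exact modular strategy): discount a standard acquisition value by the cost of the modules whose settings would have to change.

def cost_aware_score(acquisition_value, candidate, current, switch_costs):
    """candidate / current: dicts mapping module name -> setting; switch_costs: module -> cost.
    Modules left at their current setting cost nothing, so cheap reconfigurations are preferred."""
    cost = sum(switch_costs[m] for m in candidate if candidate[m] != current.get(m))
    return acquisition_value / (1.0 + cost)

# Hypothetical two-module imaging pipeline: only the expensive module changes here
score = cost_aware_score(
    acquisition_value=0.42,
    candidate={"stain": "B", "resolution": "high"},
    current={"stain": "A", "resolution": "high"},
    switch_costs={"stain": 10.0, "resolution": 1.0},
)   # 0.42 / 11.0: the candidate is penalized by the stain-switching cost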
@article{wang2021escaping,
title={Escaping saddle points faster with stochastic momentum},
author={Wang, Jun-Kun and Lin, Chi-Heng and Abernethy, Jacob},
journal={arXiv preprint arXiv:2106.02985},
year={2021}
}
Optimization
Machine Learning Theory
We prove that stochastic momentum can escape saddle points faster than gradient descent.
Miscellanea
I am Catman (tier S superhero), Master Chef specializing in instant noodles and curry, and a sleeping agent who enjoys sleeping.
Uhhh... I enjoy cracking math puzzles for fun, too 😂
"For the things we have to learn before we can do them, we learn by doing them."