
Chi-Heng Lin

I am a Machine Learning Research Engineer at Samsung AI Center - Mountain View, where I specialize in on-device AI and natural language processing.

At Samsung, I have worked on a range of LLM applications, including multi-token prediction, model compression, and hybrid state-space models; this work received the Samsung Best Paper Award. I earned my PhD in Electrical and Computer Engineering at the Georgia Institute of Technology, where I had the privilege of being advised by Dr. Eva L. Dyer.

Email  /  CV  /  Scholar  /  Twitter  /  Github


Research

I am passionate about advancing artificial intelligence toward reasoning and concept generation that surpass human capabilities, addressing fundamental challenges through interdisciplinary machine learning techniques. My work aims to bridge the gap between practice and theory, spanning practical applications of large language models (LLMs) and the theoretical foundations of machine learning.

MoDeGPT
MoDeGPT: Modular Decomposition for Large Language Model Compression
arXiv, 2024
Paper / Cite
Large Language Models LLM Compression

Low-rank decomposition of weight matrices achieves state-of-the-art compression when transformer layers are partitioned into modules.

DISP-LLM
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
NeurIPS, 2024
Paper / Cite
Large Language Models LLM Compression

The feature dimensions of different blocks can be pruned independently using simple index addition and selection; this flexibility improves compression quality.

SLiM
SLiM: Speculative Decoding with Hypothesis Reduction
NAACL-Findings, 2024
Paper / Cite
Large Language Models Multi-token Prediction

Multi-token prediction enhanced with bigram tables significantly reduces the FLOPs required for token verification.

DynaMo
DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
NAACL, 2024
Paper / Cite
Large Language Models Multi-token Prediction

Co-occurrence masking with adaptive thresholding enables multi-token prediction, achieving a 2.57x speedup with under 5% overhead.

Good, Bad, Ugly
The Good, the Bad and the Ugly Sides of Data Augmentation: An Implicit Spectral Regularization Perspective
JMLR, 2024
Paper / Cite
Data Augmentation Machine Learning Theory

We reveal a close relationship between data augmentation and spectral regularization; unlike ridge regression, augmentation can both help and hurt generalization.

Contrastive Learning as Distribution Alignment
Your Contrastive Learning Problem is Secretly a Distribution Alignment Problem
NeurIPS, 2024
Paper / Cite
Self-supervised Learning Optimal Transport

Reframing contrastive learning as a distribution matching problem generalizes existing self-supervised methods through optimal transport theory.

Balanced Data, Imbalanced Spectra
Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
ICML, 2024
Paper / Cite
Data Augmentation Machine Learning Theory

Intrinsic bias in class spectra is mitigated with augmentation strategies, improving generalization across classes.

Half-Hop
Half-Hop: A graph upsampling approach for slowing down message passing
ICML, 2023
Project / Paper / Cite
Data Augmentation Graph Neural Networks Self-supervised Learning

An upsampling method for graphs is proposed to generate augmentations and improve self-supervised learning.

Provable Acceleration of Heavy Ball
Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out
ICML, 2022
Paper / Cite
Optimization Machine Learning Theory

We prove that heavy-ball momentum accelerates training for a class of Polyak-Łojasiewicz (PL) optimization problems.

Making Transport More Robust and Interpretable
Making transport more robust and interpretable by moving data through a small number of anchor points
Chi-Heng Lin, Mehdi Azabou, Eva L. Dyer
ICML, 2021
Project / Paper / Cite
Optimal Transport Domain Adaptation

A low-rank transport formulation is proposed to move data through a small number of anchor points, improving robustness and interpretability.

A Modular Analysis of Provable Acceleration via Polyak’s Momentum
A Modular Analysis of Provable Acceleration via Polyak’s Momentum: Training a Wide ReLU Network and a Deep Linear Network
Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy
ICML, 2021
Paper / Cite
Optimization Machine Learning Theory

We prove that heavy-ball (Polyak's) momentum accelerates training for wide one-layer ReLU networks and deep linear networks.

Drop, Swap, and Generate
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity
NeurIPS, 2021 (Oral Presentation)
Project / Paper / Cite
Self-supervised Learning Neuroscience

A generative self-supervised learning approach is proposed to extract interpretable representations of neural activity.

Bayesian Optimization for Modular Black-Box Systems
Bayesian Optimization for Modular Black-Box Systems with Switching Costs
Chi-Heng Lin, Joseph D. Miano, Eva L. Dyer
UAI, 2021
Paper / Cite
Bayesian Optimization Neuroscience

A hyperparameter tuning strategy is proposed to optimize a modular neuroimaging system whose components incur different switching costs.

Escaping Saddle Points with Stochastic Momentum
Escaping saddle points faster with stochastic momentum
Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy
ICLR, 2020
Paper / Cite
Optimization Machine Learning Theory

We prove that SGD with stochastic momentum escapes saddle points faster than vanilla stochastic gradient descent.

Miscellanea

I am Catman (tier S superhero), Master Chef specializing in instant noodles and curry, and a sleeping agent who enjoys sleeping. Uhhh... I enjoy cracking math puzzles for fun, too 😂


"For the things we have to learn before we can do them, we learn by doing them."

— Aristotle