About me
Hi there! I am a graduate student at the University of Texas at Austin. I am currently researching with Prof. Irina Rish (Mila Institute, University of Montreal) on efficient foundation model training and inference, focusing on low-bitwidth language modeling, model compression, and continual pretraining.
Prior to this, I was researching at Google, Mountain View, on Efficient Transformer Architecture for Beyond Turn-Based Interactivity with Shyam Upadhyay, Aditya Gupta, and Manaal Faruqui. Before that, I worked on interpretability in LLMs with Prof. Kyle Mahowald at UT Austin. Even further back, I researched Causal NLP at ETH Zurich, Multimodal NLP at the University of Oregon, AI applications at IBM India, and robustness and evaluation of NLP models at IIT Kharagpur.
As an undergraduate, I developed various open-source software packages in the Julia language within the Machine Learning & Natural Language Processing ecosystem.
** WEBSITE UNDER MAINTENANCE ** In the meantime, enjoy these fun facts:
- Scalable Foundation Model Training:
  - The maximum number of NVIDIA GPUs I’ve scaled to is 6,144, for the Hi-NOLIN model.
  - The same for AMD is 2,048 GCDs (1,024 GPUs) on the Frontier supercomputer.
  - The weirdest hardware configuration I’ve dealt with had 6 GPUs per node and IBM CPUs (neither x86 nor ARM, but the Power CPU architecture).
- Neural Network (NN) Training (small scale):
  - Trained at least one NN on at least one GPU from each NVIDIA microarchitecture generation from Kepler to Hopper.
  - Trained at least one NN on at least one AI/graphics accelerator from NVIDIA, AMD, Intel Gaudi, and Google TPU chips.
  - Trained at least one NN on at least one CPU from Intel, AMD, PowerPC, and Apple Silicon.
  - Wrote the first open implementation to run 4-bit LLaMA models on Mac CPUs.
- Preferred language for neural networks: Python >>> Julia > JS > C/C++ > English.
Contact
Email: username at gmail dot com, where username = ayushk4