Writing on ML, NLP, mechanistic interpretability.

All of my papers on ML, NLP, mechanistic interpretability, and more, collected in chronological order.

Efficient Training of Sparse Autoencoders for Large Language Models via Layer Clustering

Sparse Autoencoders (SAEs) have recently been employed as an unsupervised approach for understanding the inner workings of Large Language Models (LLMs). They reconstruct the model's activations as a sparse linear combination of interpretable features. However, training SAEs is computationally intensive, especially as models grow in size and complexity. To address this challenge, we propose a novel training strategy that reduces the number of trained SAEs from one per layer to one per group of contiguous layers. Our experiments on Pythia 160M show a 6x training speedup without compromising reconstruction quality or performance on downstream tasks. Layer clustering therefore offers an efficient way to train SAEs in modern LLMs.
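To make the idea concrete, here is a minimal PyTorch sketch: layers are grouped into contiguous clusters by a simple similarity heuristic, and a single SAE is trained on the pooled activations of each group. The grouping criterion, hyperparameters, and training loop here are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """A standard SAE: an overcomplete ReLU dictionary trained to
    reconstruct activations under an L1 sparsity penalty."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = F.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(feats), feats

def contiguous_layer_groups(layer_acts, n_groups):
    """Split layers into contiguous groups by cutting at the n_groups - 1
    boundaries where adjacent layers' mean activations are least similar
    (an assumed, simplified clustering criterion)."""
    means = [acts.mean(dim=0) for acts in layer_acts]
    sims = torch.tensor([
        F.cosine_similarity(means[i], means[i + 1], dim=0).item()
        for i in range(len(means) - 1)
    ])
    cuts = sorted(sims.argsort()[: n_groups - 1].tolist())
    bounds = [0] + [c + 1 for c in cuts] + [len(layer_acts)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(len(bounds) - 1)]

def train_group_sae(layer_acts, group, d_hidden=4096, l1_coeff=1e-3, steps=1000):
    """Train one SAE on activations pooled across a whole layer group,
    instead of one SAE per layer."""
    data = torch.cat([layer_acts[i] for i in group], dim=0)
    sae = SparseAutoencoder(data.shape[1], d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    for _ in range(steps):
        batch = data[torch.randint(0, data.shape[0], (256,))]
        recon, feats = sae(batch)
        loss = F.mse_loss(recon, batch) + l1_coeff * feats.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae
```

With, say, 12 layers and 4 groups, this trains 4 SAEs instead of 12, which is where the speedup comes from: each SAE sees more data per step, but far fewer optimizations run end to end.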

Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models

Sparse Autoencoders (SAEs) have gained popularity as a tool for enhancing the interpretability of Large Language Models (LLMs). However, training SAEs can be computationally intensive, especially as model complexity grows. In this study, we explore the potential of transfer learning to accelerate SAE training by capitalizing on the shared representations found across adjacent layers of LLMs. Our experimental results demonstrate that fine-tuning SAEs from checkpoints pre-trained on nearby layers not only maintains but often improves the quality of learned representations, while significantly accelerating convergence. These findings indicate that strategically reusing pre-trained SAEs is a promising approach, particularly in settings where computational resources are constrained.
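The core mechanism can be sketched in a few lines, reusing the SparseAutoencoder class from the sketch above: copy the weights of a neighboring layer's trained SAE and fine-tune them on the target layer's activations. Step counts and learning rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def finetune_from_neighbor(trained_sae, target_acts, l1_coeff=1e-3,
                           steps=200, lr=1e-4):
    """Warm-start an SAE for a new layer from a neighboring layer's
    trained SAE, then fine-tune on the target layer's activations.
    The warm start should need far fewer steps than training from
    random initialization."""
    sae = copy.deepcopy(trained_sae)  # inherit encoder/decoder weights
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        batch = target_acts[torch.randint(0, target_acts.shape[0], (256,))]
        recon, feats = sae(batch)
        loss = F.mse_loss(recon, batch) + l1_coeff * feats.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

# Illustrative layer sweep: train layer 0 from scratch, then seed each
# subsequent layer's SAE from its immediate predecessor.
# saes = [train_group_sae(layer_acts, [0])]
# for layer in range(1, len(layer_acts)):
#     saes.append(finetune_from_neighbor(saes[-1], layer_acts[layer]))
```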

Applications of Autoencoder Asset Pricing Models to a Highly Dimensional Cross-Section

We test the autoencoder asset pricing models of Gu, Kelly, and Xiu (GKX, 2019) on a dataset that is two orders of magnitude smaller than theirs but of higher dimensionality: 123 predictive characteristics versus the original 94. It is also more geographically diverse: it comes from Qi4M and includes EMEA-based securities, whereas the original draws on US-based data from the Center for Research in Security Prices (CRSP). We fit the model on both the original and the Qi4M dataset, and probe the robustness of its performance on this more challenging data by comparing the respective R² values. Finally, we assess the degree to which the larger set of predictive characteristics increases the model's tendency to overfit the training data.
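For readers unfamiliar with the GKX architecture, a minimal PyTorch sketch follows: a beta network maps lagged firm characteristics to factor loadings, a linear factor network extracts latent factors from characteristic-managed portfolio returns, and fitted returns are the product of the two. Layer sizes and the portfolio construction are simplified assumptions, and the total R² shown is one common way to score such models, assumed here as the comparison statistic.

```python
import torch
import torch.nn as nn

class ConditionalAutoencoder(nn.Module):
    """GKX-style conditional autoencoder: a beta network maps lagged firm
    characteristics to factor loadings, and a linear factor network
    extracts latent factors from characteristic-managed portfolio
    returns. Fitted returns are the product of the two."""
    def __init__(self, n_chars: int, n_factors: int, hidden: int = 32):
        super().__init__()
        self.beta_net = nn.Sequential(  # characteristics -> loadings
            nn.Linear(n_chars, hidden), nn.ReLU(),
            nn.Linear(hidden, n_factors),
        )
        self.factor_net = nn.Linear(n_chars, n_factors)  # portfolios -> factors

    def forward(self, chars, returns):
        # chars: (N, C) lagged characteristics; returns: (N,) realized returns
        betas = self.beta_net(chars)                     # (N, K) loadings
        portfolios = chars.T @ returns / chars.shape[0]  # (C,) managed portfolios
        factors = self.factor_net(portfolios)            # (K,) latent factors
        return betas @ factors                           # (N,) fitted returns

def total_r2(model, chars, returns):
    """Total R^2 = 1 - sum((r - r_hat)^2) / sum(r^2)."""
    with torch.no_grad():
        fitted = model(chars, returns)
    return (1 - ((returns - fitted) ** 2).sum() / (returns ** 2).sum()).item()
```

Going from 94 to 123 characteristics widens the input layer of both networks, which is precisely where the extra capacity, and hence the extra overfitting risk probed above, enters the model.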