Episode notes
Gemma Scope: helping the safety community shed light on the inner workings of language models.
Explainable AI: One of the most requested features for LLMs is the ability to understand how they make their internal decisions. Gemma Scope is a big step towards interpretability. From the tutorial: "This is a barebones tutorial on how to use Gemma Scope, Google DeepMind's suite of Sparse Autoencoders (SAEs) on every layer and sublayer of Gemma 2 2B and 9B. Sparse Autoencoders are an interpretability tool that act like a 'microscope' on language model activations. They let us zoom in on dense, compressed activations, and expand them to a larger but sparser and seemingly more interpretable form, which can be a very useful tool when doing interpretability research!"
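To make the "expand to a larger but sparser form" idea concrete, here is a minimal toy sketch of a sparse autoencoder forward pass in NumPy. It is not the Gemma Scope code: the weights are random rather than trained, the sizes are tiny, the thresholded (JumpReLU-style) activation is an assumption made for illustration, and every name (`sae_encode`, `sae_decode`, `W_enc`, `threshold`, and so on) is hypothetical.

```python
import numpy as np


def jumprelu(pre_acts, threshold):
    # Keep a pre-activation only where it exceeds its per-feature threshold;
    # everything else becomes zero, which is what makes the output sparse.
    return pre_acts * (pre_acts > threshold)


def sae_encode(x, W_enc, b_enc, threshold):
    # Project the dense activation into a much wider feature space.
    return jumprelu(x @ W_enc + b_enc, threshold)


def sae_decode(features, W_dec, b_dec):
    # Map the sparse features back to the original activation size.
    return features @ W_dec + b_dec


# Toy sizes (assumption): real model activations and SAEs are far larger.
d_model, d_sae = 8, 64
rng = np.random.default_rng(0)

# Random, untrained parameters, used only to show the shapes involved.
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
threshold = np.full(d_sae, 2.0)
W_dec = rng.normal(size=(d_sae, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=(d_model,))  # stand-in for one dense model activation
features = sae_encode(x, W_enc, b_enc, threshold)
reconstruction = sae_decode(features, W_dec, b_dec)

print("dense input size:", x.shape[0])
print("feature vector size:", features.shape[0])
print("active features:", int(np.count_nonzero(features)))
```

With trained weights, the reconstruction would approximate the input while only a small number of features stay active, and those active features are what researchers inspect. With the random weights used here, the reconstruction is meaningless; only the shapes and the sparsifying step carry over.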
Listen to it on our podcast and support us by subscribing at