Finding Circuits in Protein Language Models
Description: Mechanistic interpretability is a subfield of interpretability research, developed largely on natural language models, that seeks to reverse-engineer the internal computations behind the behavior of black-box models such as Transformers. This talk introduces key ideas from the field and discusses their applications to current protein language models such as ESM-2.