How Do Large Language Models Represent Information?

Pretrained LLMs have demonstrated impressive abilities, but it is hard to understand how they work or how well they will generalise to a new domain. Idiap researchers are developing a model of how information is represented inside LLMs. By identifying and removing unreliable information, this model can improve generalisation to new domains without the need for any additional training.

The transformer architecture that underpins LLMs uses an attention mechanism to access its internal representations. Idiap's NLU group has developed a novel extension of these attention-based representations, grounded in Bayesian probability and information theory. Reinterpreting pretrained LLMs in terms of this theory, the researchers show that removing small amounts of information yields equally good models within the training domain and improved models when applied to different domains. This indicates that the information-theoretic interpretation characterises well which information learned during training is reliable and which is unreliable out of domain. The method can therefore be used to improve performance in new domains without retraining the model on them.
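The core idea can be illustrated with a short sketch. The function below is a minimal, hypothetical example assuming a standard scaled dot-product attention layer: a small fraction of each attention distribution is reassigned to an uninformative prior whose value contribution is zero, so a controlled amount of information is removed at inference time without any retraining. The function name and the `prior_weight` parameter are illustrative and are not taken from the paper.

```python
import torch
import torch.nn.functional as F


def regularised_attention(query, key, value, prior_weight=0.05):
    """Scaled dot-product attention with a small amount of information removed.

    A fraction `prior_weight` of each attention distribution is reassigned to
    an uninformative prior component (here, a zero-valued pseudo-token), which
    discards some of the information encoded in the pretrained attention
    weights at inference time. This is an illustrative simplification, not the
    paper's exact formulation.
    """
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5
    attn = F.softmax(scores, dim=-1)        # standard attention weights
    attn = (1.0 - prior_weight) * attn      # shrink towards the prior
    context = attn @ value                  # the removed mass contributes a zero "prior" value
    return context
```

In a pretrained transformer, such a function would replace the attention computation in each layer at inference time only; the amount of information removed is controlled by the prior weight, with a weight of zero recovering the original model.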

Reference paper: Nonparametric Variational Regularisation of Pretrained Transformers