How Do Large Language Models Represent Information?
The transformer architecture that underpins LLMs uses an attention mechanism to access its internal representations. Idiap's NLU group has developed a novel extension of these attention-based representations, grounded in Bayesian probability and information theory. We reinterpret pretrained LLMs in terms of this theory and show that removing small amounts of information yields models that perform equally well within the training domain and perform better when applied to different domains. This indicates that the information-theoretic interpretation effectively characterises which information learned during training is reliable and which is unreliable out-of-domain. The method can therefore be used to improve performance in new domains without retraining the model on the new domain.
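To make the idea of "removing a small amount of information" from a pretrained attention layer more concrete, here is a minimal PyTorch sketch. It is not the paper's nonparametric variational formulation; the function attention_with_prior and its prior_weight knob are illustrative assumptions, showing one simple way to shrink attention outputs toward an uninformative prior at inference time, without any retraining.

```python
import torch
import torch.nn.functional as F

def attention_with_prior(q, k, v, prior_weight=0.1):
    """Scaled dot-product attention with a simple information-reduction knob.

    Illustrative simplification only, not the method of the referenced paper:
    the attention output is interpolated with an uninformative "prior" value
    (the mean of the values), which discards a small amount of the
    key-specific information the layer would otherwise pass on.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (..., n_queries, n_keys)
    attn = F.softmax(scores, dim=-1)              # standard attention weights
    out = attn @ v                                # usual attention output
    prior_value = v.mean(dim=-2, keepdim=True)    # uninformative prior value
    # Larger prior_weight removes more information from the representation.
    return (1 - prior_weight) * out + prior_weight * prior_value

# Usage: apply to a pretrained layer's queries/keys/values at inference time.
q = torch.randn(2, 5, 64)   # (batch, queries, dim)
k = torch.randn(2, 7, 64)   # (batch, keys, dim)
v = torch.randn(2, 7, 64)   # (batch, keys, dim)
out = attention_with_prior(q, k, v, prior_weight=0.2)
print(out.shape)            # torch.Size([2, 5, 64])
```

Because the knob is applied post hoc, the amount of information removed can be tuned per deployment domain without touching the pretrained weights, which is the practical appeal described above.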
Ref paper: Nonparametric Variational Regularisation of Pretrained Transformers