In recent years, attention-based models like Transformers have radically improved the performance of natural language understanding (NLU), demonstrating the appropriateness of attention-based representations for language. In (Henderson, 2020) we show that these representations share many characteristics with those found in traditional computational linguistics (e.g. graph structure), except that they do not automatically learn multiple levels of representation, nor the entities at each level (morphemes, phrases, discourse entities, etc.). Motivated by this challenge of entity induction, our recent work has developed a very non-traditional perspective, which characterises attention-based models like Transformers as doing nonparametric Bayesian inference. Given an input text, our Nonparametric Variational Information Bottleneck (NVIB) Transformer infers distributions over nonparametric mixture distributions (Henderson and Fehr, 2023). We have even shown that pretrained Transformers can be converted into equivalent NVIB Transformers and regularised post-training (Fehr and Henderson, 2023).

This reinterpretation of Transformers, combined with their unprecedented empirical success, leads us to the hypothesis that natural language understanding is nonparametric variational Bayesian inference over mixture distributions. This claim of the adequacy of NVIB raises two fundamental challenges which are not currently being addressed, each with an associated technological aim:

1. How can NVIB support inducing graph-structured representations at multiple levels of representation? Technological aim: making deep learning representations interpretable.
2. How can NVIB enable controlling the information in representations? Technological aim: making deep learning representations controllable.

For the first challenge, we will extend our previous structure-processing methods (Mohammadshahi and Henderson, 2020, 2021, 2023; Miculicich and Henderson, 2022), developed for set-of-vector representations, to mixture-of-component distributions. We will also focus on unsupervised learning methods, rather than the supervised methods of our previous work. To extend these models to multiple levels, we will embed all levels in one large mixture of non-homogeneous components, computed with iterative refinement. This extends our previous work on iterative graph refinement (Mohammadshahi and Henderson, 2021; Miculicich and Henderson, 2022) by adding the induction of the nodes of the graph and the induction of multiple levels of representation (a minimal sketch of the refinement loop is given below). Learning representations which are interpretable as linguistic structures will be a testbed for the general aim of deep learning of interpretable representations.

For the second challenge, we will leverage the information theory behind NVIB to model both inferring implicit information and removing private information. We will investigate the use of KL divergence as a measure of entailment in semantic inference. We will also apply the framework of Rényi differential privacy (Mironov, 2017) to provide privacy guarantees by adding noise which removes targeted information from Transformer embeddings. This method extends differential privacy to anything that can be embedded with a Transformer (especially text), with many important applications. Together, these methods address the general aim of controlling the information in deep learning representations; illustrative sketches of both aims follow.
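To make the first aim concrete, below is a minimal sketch of the iterative graph-refinement idea, where each pass re-predicts the graph conditioned on the previous hypothesis. It is an illustration only, not the published Graph-to-Graph architecture: the module names and sizes are hypothetical, and a graph is simplified to one labelled incoming arc per token.

```python
# Minimal sketch of iterative graph refinement, in the spirit of
# Graph-to-Graph Transformers (Mohammadshahi and Henderson, 2021).
# NOT the published architecture: modules, sizes, and the reduction
# of a graph to one labelled incoming arc per token are assumptions.
import torch
import torch.nn as nn

class GraphRefiner(nn.Module):
    def __init__(self, vocab=10000, dim=256, heads=8, layers=4, labels=50):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.arc = nn.Embedding(labels, dim)   # embeds the current graph hypothesis
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, dim)        # scores head-dependent pairs
        self.lab = nn.Linear(dim, labels)      # scores each token's arc label

    def forward(self, tokens, arcs):
        # Condition token representations on the current graph hypothesis
        # by adding an embedding of each token's incoming arc label.
        h = self.enc(self.tok(tokens) + self.arc(arcs))
        head_scores = torch.einsum('bid,bjd->bij', self.head(h), h)
        return head_scores, self.lab(h)

@torch.no_grad()
def refine(model, tokens, steps=3):
    # Start from a trivial graph (label 0 = "no arc yet") and repeatedly
    # re-predict the graph conditioned on the previous hypothesis.
    arcs = torch.zeros_like(tokens)
    for _ in range(steps):
        head_scores, label_scores = model(tokens, arcs)
        arcs = label_scores.argmax(-1)
    return head_scores.argmax(-1), arcs  # predicted heads and arc labels
```

The design point this illustrates is that the graph is both an input and an output of the same network, so each pass can correct the predictions of the previous one.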
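For the entailment part of the second aim, one source of intuition (a simplified illustration, not the project's final formulation) is that KL divergence is asymmetric, matching the directionality of entailment: KL(q_h || q_p) differs from KL(q_p || q_h). If a premise and a hypothesis were embedded as single d-dimensional Gaussian components q_p and q_h, the standard closed form would apply:

```latex
% KL divergence between d-dimensional Gaussians (standard closed form);
% which direction best measures entailment is a research question here.
\mathrm{KL}\left(q_h \,\middle\|\, q_p\right)
  = \frac{1}{2}\left(
      \operatorname{tr}\!\left(\Sigma_p^{-1}\Sigma_h\right)
      + (\mu_p-\mu_h)^{\top}\Sigma_p^{-1}(\mu_p-\mu_h)
      - d + \ln\frac{\det\Sigma_p}{\det\Sigma_h}
    \right)
```

Extending such a measure from single Gaussians to the mixture distributions inferred by NVIB is part of the proposed work.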
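For the privacy part, the following is a minimal sketch of the standard Gaussian mechanism with Rényi-DP accounting (Mironov, 2017), applied to a Transformer sentence embedding. The clipping bound, noise scale, and function names are illustrative assumptions; the proposed method of removing targeted (rather than isotropic) information goes beyond this baseline.

```python
# Gaussian mechanism on a sentence embedding, with Renyi-DP accounting
# (Mironov, 2017). Clip bound, noise scale, and names are illustrative.
import numpy as np

def privatize(emb, clip=1.0, sigma=4.0, rng=None):
    # Clip the embedding's L2 norm to bound its sensitivity, then add
    # isotropic Gaussian noise (the Gaussian mechanism).
    rng = rng or np.random.default_rng()
    emb = emb * min(1.0, clip / (np.linalg.norm(emb) + 1e-12))
    return emb + rng.normal(0.0, sigma, size=emb.shape)

def rdp_epsilon(alpha, sensitivity, sigma):
    # (alpha, eps)-Renyi DP of the Gaussian mechanism (Mironov, 2017):
    # eps(alpha) = alpha * sensitivity^2 / (2 * sigma^2).
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)

# Replacing one input text can move a clipped embedding by at most
# 2 * clip in L2 norm, so we account with sensitivity = 2 * clip.
eps_rdp = rdp_epsilon(alpha=8.0, sensitivity=2.0, sigma=4.0)
# Standard conversion from (alpha, eps)-RDP to (eps', delta)-DP:
eps_dp = eps_rdp + np.log(1.0 / 1e-5) / (8.0 - 1.0)
```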
Addressing these challenges will lead to fundamental advances in machine learning, including novel deep learning architectures and fundamental insights into Transformers and their pretraining. We will carry out both intrinsic and extrinsic evaluations of our induced representations, expecting to show improvements on core NLP tasks, including privacy-preserving sharing of textual data. Given the current level of interest in Transformers and variational Bayesian methods within the AI research and development community, we expect the proposed research to have a profound impact on the field.