My main research interest is in model interpretability 🔎: the intersection of mechanistic and traditional interpretability. Specifically, I am excited about understanding the internals of models and identifying steerable regions in their latent spaces. I wrote some of the first papers on steering vectors, for LSTM-based models at NeurIPS 2019 and Transformer-based ones at ACL 2022, an approach that has recently gained popularity. Understanding model internals is essential for building more responsible, controllable, trustworthy, and efficient NLP systems. If you’re actively working in these directions, I’d love to chat and perhaps collaborate; feel free to reach out!
Before CMU, I spent nearly two and a half years as a predoctoral researcher on the AllenNLP team at AI2, where I worked with Matt Peters. Before that, I spent two years in industry as a research scientist at startups, working on NLP, vision, speech, and multimodal applications. I have had the opportunity to work closely with some amazing collaborators at other institutions including Margaret Mitchell, Sasha Luccioni, Vladlen Koltun, and Doug Downey.
When tool-using agents perform actions in the world, models need to be both useful and safe. Well-calibrated model confidences can be used to weigh the risk versus reward of potential actions, but prior work shows that many models are poorly calibrated. Inspired by interpretability literature exploring the internals of models, we propose a novel class of model-internal confidence estimators (MICE) that better calibrate model confidences when calling tools. MICE first decodes from each intermediate layer of the language model using the logit lens and then computes similarity scores between each layer’s generation and the final output. These features are fed into a learned linear regressor to adjust the model’s confidence, improving calibration. We find that MICE reduces expected calibration error by a factor of 3–10 for Llama3 models on the simulated trial and error (STE) tool-calling dataset. Further experiments show that MICE is data-efficient, generalizes zero-shot to unseen APIs, and results in higher tool-calling utility in scenarios with varying risk levels. We will release the MICE code at [redacted].
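A rough sketch of the layer-wise features behind MICE, assuming PyTorch and a HuggingFace causal LM (shown with gpt2 to stay self-contained, though the paper evaluates Llama3 models); scoring each layer by next-token agreement instead of decoding a full per-layer generation, and the example prompt, are simplifications for illustration rather than the exact implementation.

```python
# Sketch only: logit-lens features for MICE-style confidence estimation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def layerwise_agreement(prompt: str) -> torch.Tensor:
    """Logit-lens decode each layer and score agreement with the final prediction."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    unembed = model.get_output_embeddings().weight     # [vocab, hidden]
    final_token = out.logits[0, -1].argmax()            # final-layer next-token prediction
    feats = []
    for h in out.hidden_states[1:]:                     # skip the embedding layer
        normed = model.transformer.ln_f(h[0, -1])       # logit lens: final layer norm ...
        probs = (normed @ unembed.T).softmax(-1)        # ... then project to the vocabulary
        feats.append(probs[final_token])                # mass the layer puts on the final prediction
    return torch.stack(feats)                           # one feature per layer

# In MICE, per-layer similarity features like these are fed to a learned
# regressor that outputs a calibrated confidence for the proposed tool call.
print(layerwise_agreement("Call get_weather(city='Pittsburgh')"))
```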
ACL Best Paper
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Evan Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, and 31 more authors
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
Language models (LMs) have become ubiquitous in both NLP research and commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
ACL Best Resource Paper
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Jha, and 24 more authors
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations. To facilitate scientific research on language model pretraining, we curate and release Dolma, a three-trillion-token English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. We extensively document Dolma, including its design principles, details about its construction, and a summary of its contents. We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. Finally, we open-source our data curation toolkit to enable reproduction of our work as well as support further research in large-scale data curation.
ACL
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani, Nivedita Suresh, and Matthew Peters
In Findings of the Association for Computational Linguistics: ACL 2022, May 2022
Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vectors directly from pretrained language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains. We show that vector arithmetic can be used for unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task. We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark (STS-B), outperforming pooled hidden states of models. Finally, we present an analysis of the intrinsic properties of the steering vectors. Taken together, our results suggest that frozen LMs can be effectively controlled through their latent steering space.
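A minimal sketch of the core mechanism, assuming a GPT-2-style HuggingFace model in PyTorch; the layer index, injection site, target sentence, and optimization hyperparameters are illustrative placeholders, not the paper's exact setup.

```python
# Sketch only: optimize a steering vector z for a frozen LM so that adding z
# to one layer's hidden states makes the model reproduce a target sentence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in model.parameters():          # the language model stays frozen
    p.requires_grad_(False)

layer_idx = 6                          # illustrative choice of injection layer
z = torch.zeros(model.config.hidden_size, requires_grad=True)

def add_steering_vector(module, inputs, output):
    # Transformer blocks return a tuple; hidden states are the first element.
    return (output[0] + z,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering_vector)

# Optimize z (and only z) so the steered model assigns high likelihood to the target.
target = tok("The food was absolutely wonderful.", return_tensors="pt")
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(200):
    loss = model(**target, labels=target["input_ids"]).loss
    opt.zero_grad()
    loss.backward()
    opt.step()

handle.remove()
```

Once extracted, vectors like z can be composed with simple arithmetic (e.g., adding a sentiment direction) or compared by distance, which is what the sentiment-transfer and STS-B experiments probe.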
BigScience Workshop
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
NeurIPS
Can Unconditional Language Models Recover Arbitrary Sentences?
Nishant Subramani, Samuel Bowman, and Kyunghyun Cho
Advances in Neural Information Processing Systems, May 2019
Neural network-based generative language models like ELMo and BERT can work effectively as general-purpose sentence encoders in text classification without further fine-tuning. Is it possible to adapt them in a similar way for use as general-purpose decoders? For this to be possible, it would need to be the case that for any target sentence of interest, there is some continuous representation that can be passed to the language model to cause it to reproduce that sentence. We set aside the difficult problem of designing an encoder that can produce such representations and, instead, ask directly whether such representations exist at all. To do this, we introduce a pair of effective, complementary methods for feeding representations into pretrained unconditional language models and a corresponding set of methods to map sentences into and out of this representation space, the reparametrized sentence space. We then investigate the conditions under which a language model can be made to generate a sentence through the identification of a point in such a space and find that it is possible to recover arbitrary sentences nearly perfectly with language models and representations of moderate size without modifying any model parameters.
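A minimal sketch of the recovery idea, again assuming a GPT-2-style HuggingFace model; biasing the input embeddings with a per-sentence vector z is just one illustrative injection site, and the greedy decoder below stands in for the paper's reparametrized sentence space rather than reproducing it.

```python
# Sketch only: decode greedily from a frozen LM while biasing every input
# embedding with a sentence vector z; a sentence counts as recovered if the
# decode matches it (near-)exactly after optimizing z for that sentence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def decode_from_vector(z: torch.Tensor, max_len: int = 30) -> str:
    ids = torch.tensor([[tok.bos_token_id]])
    embed = model.get_input_embeddings()
    for _ in range(max_len):
        inputs_embeds = embed(ids) + z                   # bias every position with z
        with torch.no_grad():
            logits = model(inputs_embeds=inputs_embeds).logits
        next_id = logits[0, -1].argmax().view(1, 1)      # greedy next token
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

# z would be optimized per sentence (model frozen, as in the steering-vector
# sketch above) and recovery measured by exact match or BLEU against the target.
```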
You can reach me at MyFirstName dot MyLastName 23 at gmail!