Nishant Subramani

Model Interpretability Researcher 🔎

I am a 3rd-year PhD student 🎓 at CMU in the Language Technologies Institute (LTI), advised by Mona Diab. I am primarily interested in demystifying language model internals (MechInterp) and operationalizing them (ActionableInterp) to build more responsible, controllable, trustworthy, and efficient NLP systems. I wrote the first papers on steering vectors (NeurIPS 2019, arXiv 2020, ACL 2022) and worked on open LLM initiatives such as OLMo and BLOOM, winning multiple best paper awards and a GeekWire Innovation of the Year award.

If you’re interested in or actively working on model interpretability, especially if you are from an underrepresented group, I’d love to chat and perhaps collaborate and/or mentor. Please reach out!

The past two summers, I interned with the Semantic Machines team at Microsoft Research and the Cloud AI Research team at Google, working on actionable interpretability. Before CMU, I spent nearly two and a half years as a predoctoral researcher on the AllenNLP team at AI2, where I worked with Matt Peters. Before that, I spent two years in industry as a research scientist at startups, working on NLP, vision, speech, and multimodal applications. I have had the opportunity to work closely with some amazing collaborators at other institutions, including Jason Eisner, Ben Van Durme, Margaret Mitchell, Sasha Luccioni, Vladlen Koltun, and Doug Downey.


news

Oct 2025 New paper 🔬 LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization, led by Jiarui Liu and Jivitesh Jain, is out!
Oct 2025 At COLM 🩙 in Montreal đŸ„Ż presenting BERTology in the Modern World, 🔬 LLM Microscope, and 🐭 MICE for CATs at the Interplay workshop!
Aug 2025 At the New England Mechanistic Interpretability (NEMI) workshop in Boston 🍔 to present 🐭 MICE for CATs!
Aug 2025 🩁 SimBA is accepted to EMNLP Findings. See you in Suzhou in November 🇹🇳!
Jul 2025 At ACL in Vienna to present Personal Information Parroting in LMs at the L2M2 workshop! Reach out if you want to chat!
Jul 2025 BERTology in the Modern World and 🔬 LLM Microscope are both accepted at the Interplay workshop at COLM 2025 🍁!
Jul 2025 At ICML in Vancouver 🍁 to present SimBA, our work on making LLM evaluation more efficient, at the Reliable and Responsible Foundation Models workshop, and Personal Information Parroting in LMs at the memorization workshop. Reach out if you want to chat about these topics, interpretability 🔎, or really anything NLP!
Jun 2025 New preprint: đŸ•”ïž Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models, led by my undergrad mentee Michael Li, is out!
May 2025 Started as a student researcher (intern) on the Cloud AI Research team at Google Cloud, working with Hamid Palangi on actionable interpretability 🔎 to supercharge tool-using agents đŸ€–!
Apr 2025 At NAACL in Albuquerque 🌔 to present 🐭 MICE for CATs! Reach out if you want to chat about interpretability things 🔎!

selected publications

  1. COLM (Interplay)
    Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models
    Michael Li and Nishant Subramani
    Workshop on the Interplay of Model Behavior and Model Internals, 2025
  2. COLM (Interplay)
    LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Use
    Jiarui Liu*, Jivitesh Jain*, Mona T. Diab, and Nishant Subramani
    Workshop on the Interplay of Model Behavior and Model Internals, 2025
  3. EMNLP
    SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
    Nishant Subramani*, Alfredo Gomez*, and Mona T. Diab
    In Findings of EMNLP, 2025
  4. ICML (MEMFM)
    Personal Information Parroting in Language Models
    Nishant Subramani, Kshitish Ghate, and Mona T. Diab
    Workshop on The Impact of Memorization on Trustworthy Foundation Models, 2025
  5. NAACL
    MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
    Nishant Subramani, Jason Eisner, Justin Svegliato, Benjamin Van Durme, Yu Su, and Sam Thomson
    In Proceedings of NAACL, 2025
  6. ACL
    Best Paper
    OLMo: Accelerating the Science of Language Models
    Dirk Groeneveld, Iz Beltagy, Evan Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, and 31 more authors
    In Proceedings of ACL, 2024
  7. ACL
    Best Resource Paper
    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
    Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Jha, and 24 more authors
    In Proceedings of ACL, 2024
  8. ACL
    Extracting Latent Steering Vectors from Pretrained Language Models
    Nishant Subramani, Nivedita Suresh, and Matthew Peters
    In Findings of ACL, 2022
  9. BigScience Workshop
    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
    Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman CastagnĂ©, Alexandra Sasha Luccioni, François Yvon, Matthias GallĂ©, Jonathan Tow, Alexander M. Rush, and 379 more authors
    arXiv, 2022
  10. NeurIPS
    Can Unconditional Language Models Recover Arbitrary Sentences?
    Nishant Subramani, Samuel Bowman, and Kyunghyun Cho
    In Advances in NeurIPS, 2019