When tool-using agents perform actions in the world, models need to be both useful and safe. Well-calibrated model confidences can be used to weigh the risk versus reward of potential actions, but prior work shows that many models are poorly calibrated. Inspired by interpretability literature exploring the internals of models, we propose a novel class of model-internal confidence estimators (MICE) that better calibrate model confidences when calling tools. MICE first decodes from each intermediate layer of the language model using logit lens and then computes similarity scores between each layer’s generation and the final output. These features are fed into a learned linear regressor to adjust the model’s confidence, improving calibration. We find that MICE reduces expected calibration error by a factor of 3–10 for Llama3 models on the simulated trial and error (STE) tool-calling dataset. Further experiments show that MICE is data-efficient, generalizes zero-shot to unseen APIs, and results in higher tool-calling utility in scenarios with varying risk levels. We will release the MICE code at [redacted].
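Below is a minimal sketch of the MICE pipeline as the abstract describes it, assuming a Hugging Face Llama model. The token-level F1 similarity and scikit-learn's LogisticRegression are illustrative stand-ins for the paper's exact similarity scores and learned linear regressor, and `gen_start` is a hypothetical marker for where the generated tool call begins.

```python
# Sketch: logit-lens decode each intermediate layer, score each layer's greedy
# tokens against the final output, and feed those per-layer similarities to a
# learned linear calibrator. Illustrative, not the paper's exact implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def layer_similarity_features(full_ids: torch.Tensor, gen_start: int) -> list[float]:
    """full_ids: prompt + generated tool call (1, seq_len); gen_start: index
    where the generation begins. Returns one similarity score per layer."""
    with torch.no_grad():
        hidden = model(full_ids, output_hidden_states=True).hidden_states
    final = set(full_ids[0, gen_start:].tolist())
    feats = []
    for h in hidden[1:-1]:  # intermediate layers only
        # Logit lens: apply the final norm and unembedding to this layer's states
        logits = model.lm_head(model.model.norm(h[:, gen_start - 1:-1]))
        layer_toks = set(logits.argmax(-1)[0].tolist())
        overlap = len(final & layer_toks)
        p = overlap / max(len(layer_toks), 1)
        r = overlap / max(len(final), 1)
        feats.append(2 * p * r / (p + r) if p + r else 0.0)  # token-set F1
    return feats

# Fit the calibrator on held-out tool calls labeled correct/incorrect, then use
# predict_proba(...) as the adjusted confidence at inference time:
#   X = [layer_similarity_features(ids, s) for ids, s in train_calls]
#   calibrator = LogisticRegression().fit(X, correctness_labels)
```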
ACL Best Paper
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Evan Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, and 31 more authors
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
ACL Best Resource Paper
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Jha, and 24 more authors
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations. To facilitate scientific research on language model pretraining, we curate and release Dolma, a three-trillion-token English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. We extensively document Dolma, including its design principles, details about its construction, and a summary of its contents. We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. Finally, we open-source our data curation toolkit to enable reproduction of our work as well as support further research in large-scale data curation.
TrustNLP @ NAACL
Evaluating Personal Information Parroting in Language Models
Large language models are trained on increasing quantities of unstructured text, the largest sources of which are scraped from the Web. These Web scrapes are mainly composed of heterogeneous collections of text from multiple domains with minimal documentation. While some work has been done to identify and remove toxic, biased, or sexual language, the topic of personal information (PI) in textual data used for training Natural Language Processing (NLP) models is relatively under-explored. In this work, we draw from definitions of PI across multiple countries to define the first PI taxonomy of its kind, categorized by type and risk level. We then conduct a case study on the Colossal Clean Crawled Corpus (C4) and the Pile, to detect some of the highest-risk personal information, such as email addresses and credit card numbers, and examine the differences between automatic and regular expression-based approaches for their detection. We identify shortcomings in modern approaches for PI detection, and propose a reframing of the problem that is informed by global perspectives and the goals of personal information detection.
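As a concrete illustration of the regular-expression baseline the case study compares against, here is a minimal sketch that flags email addresses and credit-card-like numbers, using a Luhn checksum to cut false positives on arbitrary digit strings. The patterns are illustrative choices, not the paper's exact ones.

```python
# Sketch: regex-based detection of two high-risk PI types (emails, card numbers).
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: doubles every second digit from the right."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = sum(digits[0::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
    return total % 10 == 0

def detect_pi(text: str) -> dict[str, list[str]]:
    return {
        "emails": EMAIL.findall(text),
        "cards": [m for m in CARD.findall(text) if luhn_valid(m)],
    }
```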
2022
GEM @ EMNLP
Don’t Say What You Don’t Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, and Doug Downey
In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), Dec 2022
Abstractive summarization systems today produce fluent and relevant output, but often “hallucinate” statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source. Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations. Given the model states and outputs at a given step, PINOCCHIO detects likely model hallucinations based on various measures of attribution to the source text. PINOCCHIO backtracks to find more consistent output, and can opt to produce no summary at all when no consistent generation can be found. In experiments, we find that PINOCCHIO improves the consistency of generation by an average of 67% on two abstractive summarization datasets, without hurting recall.
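A simplified sketch of the decode-time idea behind PINOCCHIO follows: a search that rejects continuations failing a consistency check against the source, backtracks when stuck, and may return no summary at all. The check here (novel alphabetic tokens must occur in the source) is a stand-in for the paper's attribution-based hallucination measures, and `step_fn` is an assumed interface to the model's ranked next-token candidates.

```python
# Sketch: constrained decoding with backtracking; not the paper's exact method.
def constrained_decode(step_fn, source_tokens, max_len=64, budget=256):
    """step_fn(prefix) -> candidate next tokens, best first."""
    source = set(source_tokens)
    prefix, choices, steps = [], [], 0
    while len(prefix) < max_len and steps < budget:
        steps += 1
        # Keep only candidates that pass the (simplified) consistency check.
        cands = [t for t in step_fn(prefix)
                 if t == "<eos>" or not t.isalpha() or t in source]
        if cands:
            choices.append(cands)
            prefix.append(cands[0])
            if prefix[-1] == "<eos>":
                return prefix[:-1]
        else:
            # Dead end: backtrack to the most recent unexplored alternative.
            while choices and len(choices[-1]) == 1:
                choices.pop()
                prefix.pop()
            if not choices:
                return None  # opt out: no consistent summary found
            choices[-1].pop(0)
            prefix[-1] = choices[-1][0]
    return prefix or None
```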
EMNLP
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, and 65 more authors
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Dec 2022
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
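The "single line of code" in the title refers to this modular loading infrastructure. A sketch of what that looks like through the Hugging Face datasets hub, where GEM datasets live under the GEM namespace, is below; the exact dataset id and config name are illustrative assumptions.

```python
# Sketch: loading one GEMv2 dataset/config in a single line (ids illustrative).
from datasets import load_dataset

web_nlg = load_dataset("GEM/web_nlg", "en")
print(web_nlg["validation"][0])
```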
ACL
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani, Nivedita Suresh, and Matthew Peters
In Findings of the Association for Computational Linguistics: ACL 2022, May 2022
Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vectors directly from pretrained language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains. We show that vector arithmetic can be used for unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task. We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark (STS-B), outperforming pooled hidden states of models. Finally, we present an analysis of the intrinsic properties of the steering vectors. Taken together, our results suggest that frozen LMs can be effectively controlled through their latent steering space.
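Here is a minimal sketch of extracting a steering vector for one target sentence, assuming a frozen Hugging Face causal LM (GPT-2 for concreteness): a single learned vector is added to the hidden states at one layer via a forward hook and optimized so the model reproduces the target. The layer index, learning rate, and step count are illustrative, and the hook-based injection is one of several possible implementations.

```python
# Sketch: optimize a steering vector z so the frozen LM emits a target sentence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in model.parameters():
    p.requires_grad_(False)  # the LM stays frozen; only z is trained

def extract_steering_vector(target: str, layer: int = 6, steps: int = 200):
    ids = tok(target, return_tensors="pt").input_ids
    z = torch.zeros(model.config.hidden_size, requires_grad=True)
    # GPT-2-specific module path; add z to the block's output at every position.
    hook = model.transformer.h[layer].register_forward_hook(
        lambda mod, inp, out: (out[0] + z,) + out[1:]
    )
    opt = torch.optim.Adam([z], lr=0.01)
    for _ in range(steps):
        loss = model(ids, labels=ids).loss  # NLL of the target sentence
        opt.zero_grad()
        loss.backward()
        opt.step()
    hook.remove()
    return z.detach()
```

Once extracted, adding and subtracting such vectors (the vector arithmetic mentioned above) is plain tensor addition on z before injection.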
BigScience Workshop
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, and 379 more authors
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
FAccT
Data Governance in the Age of Large-Scale Data-Driven Language Technology
Yacine Jernite, Huu Nguyen, Stella Biderman, Anna Rogers, Maraim Masoud, Valentin Danchev, Samson Tan, Alexandra Sasha Luccioni, Nishant Subramani, Isaac Johnson, Gerard Dupont, Jesse Dodge, and 8 more authors
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, May 2022
The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and grounded by an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we present is a multi-party international governance structure focused on language data, and incorporating technical and organizational tools needed to support its work.
TACL
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, and 40 more authors
Transactions of the Association for Computational Linguistics, May 2022
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.
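In the spirit of the automatic analyses that supplement the human audit, here is a small sketch that samples lines from a corpus labeled with a language code and estimates how many actually match it. The off-the-shelf langdetect package is a stand-in for the paper's tooling.

```python
# Sketch: estimate the fraction of a labeled corpus that is in the right language.
import random
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def audit_corpus(lines: list[str], expected_code: str, sample_size: int = 100) -> float:
    """Return the fraction of sampled lines whose detected language matches."""
    sample = random.sample(lines, min(sample_size, len(lines)))
    hits = 0
    for line in sample:
        try:
            hits += detect(line) == expected_code
        except LangDetectException:  # line too short or no usable features
            pass
    return hits / len(sample)
```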
2021
GEM @ ACL
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, and 44 more authors
In Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), Aug 2021
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.
DataCentricAI @ NeurIPS
Natural Adversarial Objects
Felix Lau, Nishant Subramani, Sasha Harrison, Aerin Kim, Elliot Branson, and Rosanne Liu
Although state-of-the-art object detection methods have shown compelling performance, models often are not robust to adversarial attacks and out-of-distribution data. We introduce a new dataset, Natural Adversarial Objects (NAO), to evaluate the robustness of object detection models. NAO contains 7,934 images and 9,943 objects that are unmodified and representative of real-world scenarios, but cause state-of-the-art detection models to misclassify with high confidence. The mean average precision (mAP) of EfficientDet-D7 drops 74.5% when evaluated on NAO compared to the standard MSCOCO validation set. Moreover, by comparing a variety of object detection architectures, we find that better performance on MSCOCO validation set does not necessarily translate to better performance on NAO, suggesting that robustness cannot be simply achieved by training a more accurate model. We further investigate why examples in NAO are difficult to detect and classify. Experiments with shuffling image patches reveal that models are overly sensitive to local texture. Additionally, using integrated gradients and background replacement, we find that the detection model is reliant on pixel information within the bounding box, and insensitive to the background context when predicting class labels. NAO can be downloaded at https://drive.google.com/drive/folders/15P8sOWoJku6SSEiHLEts86ORfytGezi8.
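A minimal sketch of the patch-shuffling probe follows: split an image into a grid of patches, permute them, and compare detector confidence on the original versus the shuffled image. The grid size is an illustrative choice, and the detector itself is left abstract.

```python
# Sketch: destroy global shape while preserving local texture via patch shuffling.
import random
import numpy as np

def shuffle_patches(image: np.ndarray, grid: int = 4, seed: int = 0) -> np.ndarray:
    """Permute a grid x grid tiling of an (H, W, C) image."""
    h, w = image.shape[0] // grid, image.shape[1] // grid
    patches = [image[i*h:(i+1)*h, j*w:(j+1)*w]
               for i in range(grid) for j in range(grid)]
    random.Random(seed).shuffle(patches)
    rows = [np.concatenate(patches[r*grid:(r+1)*grid], axis=1) for r in range(grid)]
    return np.concatenate(rows, axis=0)

# If a detector still fires confidently on the shuffled image, where global
# shape is destroyed, that supports the local-texture-sensitivity hypothesis.
```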
2020
MLRSA @ NeurIPS
A Survey of Deep Learning Approaches for OCR and Document Understanding
Nishant Subramani, Alexandre Matton, Malcolm Greaves, and Adrian Lam
Documents are a core part of many businesses in many fields such as law, finance, and technology among others. Automatic understanding of documents such as invoices, contracts, and resumes is lucrative, opening up many new avenues of business. The fields of natural language processing and computer vision have seen tremendous progress through the development of deep learning such that these methods have started to become infused in contemporary document understanding systems. In this survey paper, we review different techniques for document understanding for documents written in English and consolidate methodologies present in literature to act as a jumping-off point for researchers exploring this area.
arXiv
Discovering Useful Sentence Representations from Large Pretrained Language Models
Despite the extensive success of pretrained language models as encoders for building NLP systems, they haven’t seen prominence as decoders for sequence generation tasks. We explore the question of whether these models can be adapted to be used as universal decoders. To be considered "universal," a decoder must have an implicit representation for any target sentence s, such that it can recover that sentence exactly when conditioned on its representation. For large transformer-based language models trained on vast amounts of English text, we investigate whether such representations can be easily discovered using standard optimization methods. We present and compare three representation injection techniques for transformer-based models and three accompanying methods which map sentences to and from this representation space. Experiments show not only that representations exist for sentences from a variety of genres, but, more importantly, that without needing complex optimization algorithms, our methods recover these sentences almost perfectly without fine-tuning the underlying language model at all.
AAAI
Learning Efficient Representations for Fake Speech Detection
Nishant Subramani, and Delip Rao
In AAAI Conference on Artificial Intelligence, Aug 2020
Synthetic speech or “fake speech” which matches personal vocal traits has become better and cheaper due to advances in deep learning-based speech synthesis and voice conversion approaches. This increased accessibility of synthetic speech systems and the growing misuse of them highlight the critical need to build countermeasures. Furthermore, new synthesis models evolve all the time and the efficacy of previously trained detection models on these unseen attack vectors is poor. In this paper, we focus on: 1) How can we build highly accurate, yet parameter and sample-efficient models for fake speech detection? 2) How can we rapidly adapt detection models to new sources of fake speech? We present four parameter-efficient convolutional architectures for fake speech detection with best detection F1 scores of around 97 points on a large dataset of fake and bonafide speech. We show how the fake speech detection task naturally lends itself to a novel multi-task problem further improving F1 scores for a mere 0.5% increase in model parameters. Our multi-task setting also helps in data-sparse situations, commonplace in adversarial settings. We investigate an alternative approach to the data-sparsity problem using transfer learning and show that it is possible to meet purely supervised detection performance for unseen attack vectors with as little as 6.25% of the training data. This is the first known application of transfer learning in adversarial settings for speech. Finally, we show how well our transfer learning approach adapts in an instance-efficient way to new attack vectors using the Real-Time Voice Cloning toolkit. We exceed the purely supervised detection performance (99.18 F1) with as little as 6.25% of the data.
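For flavor, here is a minimal sketch of a parameter-efficient convolutional detector of the kind described above, assuming log-mel spectrogram inputs of shape (batch, 1, n_mels, time). The exact architectures in the paper differ; this shows only the general shape of such a model.

```python
# Sketch: a small CNN over spectrograms for real-vs-fake speech classification.
import torch.nn as nn

class SmallFakeSpeechCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

A multi-task variant in the spirit of the paper would attach a second linear head (e.g., predicting the synthesis source) to the same 64-dimensional pooled features, which is where the quoted sub-1% parameter increase comes from.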
2019
NeurIPS
Can unconditional language models recover arbitrary sentences?
Nishant Subramani, Samuel Bowman, and Kyunghyun Cho
Advances in Neural Information Processing Systems, Aug 2019
Neural network-based generative language models like ELMo and BERT can work effectively as general purpose sentence encoders in text classification without further fine-tuning. Is it possible to adapt them in a similar way for use as general-purpose decoders? For this to be possible, it would need to be the case that for any target sentence of interest, there is some continuous representation that can be passed to the language model to cause it to reproduce that sentence. We set aside the difficult problem of designing an encoder that can produce such representations and, instead, ask directly whether such representations exist at all. To do this, we introduce a pair of effective, complementary methods for feeding representations into pretrained unconditional language models and a corresponding set of methods to map sentences into and out of this representation space, the reparametrized sentence space. We then investigate the conditions under which a language model can be made to generate a sentence through the identification of a point in such a space and find that it is possible to recover arbitrary sentences nearly perfectly with language models and representations of moderate size without modifying any model parameters.
2018
CausalML @ ICML
Pag2admg: An Algorithm for the Complete Causal Enumeration of a Markov Equivalence Class
Nishant Subramani
International Conference on Machine Learning (ICML) CausalML Workshop, Aug 2018