Gauri K.

My Publications

Here are some of my research publications and articles.

CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement

Authors: Gauri Kholkar et al.

Conference: ACL 2025, LLMSEC Workshop

Abstract: Prompt injection remains a major security risk for large language models. However, the efficacy of existing guardrail models in context-aware settings remains underexplored, as they often rely on static attack benchmarks and exhibit over-defense tendencies. We introduce CAPTURE, a novel context-aware benchmark that assesses both attack detection and over-defense tendencies with minimal in-domain examples. Our experiments reveal that current prompt injection guardrail models suffer from high false negatives in adversarial cases and excessive false positives in benign scenarios, highlighting critical limitations.

Towards Socio-Culturally Aware Evaluation of Large Language Models for Content Moderation

Authors: Shanu Kumar, Gauri Kholkar, Saish Mendke, Anubhav Sadana, Parag Agrawal, Sandipan Dandapat

Conference: COLING 2025

Abstract: With the growth of social media and large language models, content moderation has become crucial. Many existing datasets lack adequate representation of different groups, resulting in unreliable assessments. To tackle this, we propose a socio-culturally aware evaluation framework for LLM-driven content moderation and introduce a scalable method for creating diverse datasets using persona-based generation. Our analysis reveals that these datasets provide broader perspectives and pose greater challenges for LLMs than diversity-focused generation methods without personas. This challenge is especially pronounced in smaller LLMs, emphasizing the difficulties they encounter in moderating such diverse content.

Read on arXiv

LITMUS Predictor: An AI Assistant for Building Reliable, High-Performing and Fair Multilingual NLP Systems

Authors: Anirudh Srinivasan*, Gauri Kholkar*, Rahul Kejriwal*, Tanuja Ganu, Sandipan Dandapat, Sunayana Sitaram, Balakrishnan Santhanam, Somak Aditya, Kalika Bali, Monojit Choudhury (*Equal contribution)

Conference: AAAI 2022

Abstract: Pre-trained multilingual language models are gaining popularity due to their cross-lingual zero-shot transfer ability, but these models do not perform equally well in all languages. Evaluating a model's task-specific performance across a large number of languages is often a challenge due to a lack of labeled data, as is targeting improvements in low-performing languages through few-shot learning. We present a tool, LITMUS Predictor, that can make reliable performance projections for a fine-tuned task-specific model in a set of languages without test and training data, and help strategize data labeling efforts to optimize performance and fairness objectives.

Read the PDF