Chirag Agarwal's picture

Chirag Agarwal

AikyamLab

·

https://chirag-agarwall.github.io/

AI & ML interests

Explainability and Interpretability; AI Safety; AI Alignment

Recent Activity

upvoted a paper 28 days ago

Towards Understanding the Robustness of Sparse Autoencoders

submitted a paper 28 days ago

Towards Understanding the Robustness of Sparse Autoencoders

upvoted a paper 28 days ago

PageGuide: Browser extension to assist users in navigating a webpage and locating information

View all activity

Organizations

upvoted 2 papers 28 days ago

Towards Understanding the Robustness of Sparse Autoencoders

Paper • 2604.18756 • Published Apr 20 • 10

PageGuide: Browser extension to assist users in navigating a webpage and locating information

Paper • 2604.23772 • Published Apr 26 • 7

upvoted a paper 4 months ago

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

Paper • 2601.13262 • Published Jan 19 • 3

upvoted a paper 5 months ago

CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare

Paper • 2512.11437 • Published Dec 12, 2025 • 4

upvoted a paper 6 months ago

Polarity-Aware Probing for Quantifying Latent Alignment in Language Models

Paper • 2511.21737 • Published Nov 21, 2025 • 1

upvoted a collection 6 months ago

Polarity-Aware Probing Datasets

Datasets for PA-Probing described in "Polarity-Aware Probing for Quantifying Latent Alignment in Language Models" https://www.arxiv.org/pdf/2511.21737 • 2 items • Updated Dec 7, 2025 • 1

upvoted a paper over 2 years ago

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

Paper • 2402.04614 • Published Feb 7, 2024 • 3