Ayush Gupta

I am a Ph.D. student at the AIEM lab in the Department of Computer Science at Johns Hopkins University. I am advised by Prof. Rama Chellappa and work on problems in Computer Vision and Deep Learning. My research has two focus areas: general-purpose vision-language models, where I work on multimodal LLMs for tasks such as VQA, video grounding, and LLM interpretability; and fine-grained computer vision problems, where I work on person re-identification and gait recognition. I am supported by an IARPA grant, BRIAR. I have also worked on a DARPA grant, ANSR.

Previously, I obtained a B.E. in Computer Science from Birla Institute of Technology and Science (BITS), Pilani. At BITS Pilani, I worked under the guidance of Prof. Poonam Goyal on video captioning and collaborated with Dr. Yogesh S Rawat from UCF on gait recognition.

Email / LinkedIn / CV / Google Scholar / Twitter / Github

News

01/2026: Started an internship at Honda Research Institute, USA! I will be working on multimodal LLMs for video understanding tasks.
10/2025: Completed my PhD qualifiers; I am now a PhD candidate!
08/2025: Our paper Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs has been accepted to EMNLP 2025, as part of the main conference! This work was also done as part of my internship at SRI.
07/2025: Our paper TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision has been accepted to ICCV 2025 in Hawaii! This work was done as part of my internship at SRI.
07/2025: Our paper Mind the Gap: Bridging Occlusion in Gait Recognition via Residual Gap Correction has been accepted as an oral paper at IJCB 2025 in Osaka, Japan!
05/2025: Joining Amazon as an Applied Science Intern! I will be working on vision-language models with the Ring Devices team.
10/2024: Our paper MimicGait: A Model-Agnostic Approach for Occluded Gait Recognition using Correlational Knowledge Distillation has been accepted to WACV 2025!
09/2024: Our paper GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation has been accepted to WACV 2025!

Publications

	Every Token Counts: A Self-Similarity based Framework for Query-Based Counting in Video-LLMs Ayush Gupta, Yan Li, Srinivas Parthasarthy, Jim Thomas under submission We enhance the video counting abilities of multimodal LLMs through a self-similarity based approach. Part of my Amazon Internship. More details coming soon!
	TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision Ayush Gupta, Anirban Roy, Rama Chellappa, Nathaniel D. Bastian, Alvaro Velasquez, Susmit Jha ICCV 2025 We use multimodal LLMs for temporal grounding of question-answer pairs in unconstrained videos. Project Website / arXiv
	Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, Susmit Jha EMNLP 2025 (main conference) We propose an inference-time method to automatically detect out-of-distribution inputs and predict the output uncertainty in specialized LLMs. Project Website / arXiv
	Mind the Gap: Bridging Occlusion in Gait Recognition via Residual Gap Correction Ayush Gupta, Siyuan Huang, Rama Chellappa IJCB 2025 (oral) We employ Residual Correction for recovering complete features from occluded gait sequences. Project Website / arXiv
	You Can Run but not Hide: Improving Gait Recognition with Intrinsic Occlusion Type Awareness Ayush Gupta, Rama Chellappa WACV 2024 (oral) We use an auxiliary occlusion detector to solve the occlusion problem in long-range gait recognition. Project Website / arXiv
	MimicGait: A Model-Agnostic Approach for Occluded Gait Recognition using Correlational Knowledge Distillation Ayush Gupta, Rama Chellappa WACV 2025 We tackle the occlusion problem in gait recognition using correlational knowledge distillation. Project Website / arXiv
	GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation Yuxiang Guo, Anshul Shah, Jiang Liu, Ayush Gupta, Cheng Peng, Rama Chellappa WACV 2025 We develop a novel, efficient contour-based representation for gait recognition. arXiv
	Tackling Domain Shifts in Person Re-Identification: A Survey and Analysis Vuong Nguyen, Samiha Mirza, Abdollah Zakeri, Ayush Gupta, Rahma Aloui, Khadija Khaldi, Pranav Mantini, Shishir Shah, Fatima Merchant CVPR 2024 Continual Learning Workshop A comprehensive survey on domain shift in Person Re-ID. We evaluate existing methods under various settings and give directions for future research. Paper
	Transfer Learning for Frailty Classification in Older Adults Laura McDaniel, Ayush Gupta, Ime Essien, Ryan Roemmich, Peter Abadir, Rama Chellappa under submission Using computer vision techniques to diagnose frailty among older adults.
	EchoSAM: Predicting Ejection Fraction using Segmentation Guided Vision Transformers Basudha Pal, Ayush Gupta, Vishal Patel Predicting the Ejection Fraction from ultrasound images of the heart, utilizing the Segment Anything Model.
	GaitZero: Temporal Self-similarity for Unsupervised Gait Recognition Ayush Gupta, Alexander Matasa, Shruti Vyas, Yogesh S Rawat Developing a novel technique for unsupervised gait recognition using temporal self-similarity.
	Visually Guided Knowledge selection for Video Captioning Ayush Gupta, Ashrya Agrawal, Poonam Goyal, Navneet Goyal An approach for generating natural language captions of videos using external knowledge bases. Paper

Template Credits: Jon Barron