Gait recognition is an important biometric technique for identifying people at large distances. State-of-the-art gait recognition systems perform very well in controlled environments at close range. Recently, there has been increased interest in gait recognition in the wild, prompted by the collection of more challenging outdoor datasets containing variations in illumination, pitch angle and distance. An important problem in these environments is occlusion, where the subject is partially blocked from the camera's view. Despite its importance, this problem has received little attention. We therefore propose MimicGait, a model-agnostic approach for gait recognition in the presence of occlusions. We train the network with a multi-instance correlational distillation loss to capture both inter-sequence and intra-sequence correlations in the occluded gait patterns of a subject, using an auxiliary Visibility Estimation Network to guide the training of the proposed mimic network. We demonstrate the effectiveness of our approach on challenging real-world datasets like GREW, Gait3D and BRIAR.
Our approach learns correlations between the motion of different body parts, so that when some parts are missing, the observable motion can still be used to extract gait features from the occluded body. We train a 'mimic network' to replicate the behavior of a teacher model that was trained on unoccluded gait sequences and therefore produces ideal gait features. The mimic network handles occluded sequences by learning from the teacher's predictions through correlational knowledge distillation, which trains it to capture both inter-sequence and intra-sequence correlations in the occluded gait patterns. The auxiliary Visibility Estimation Network guides this training by providing visibility scores for the occluded regions, ensuring that the mimic network focuses on the informative parts of the input.
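The distillation objective described above can be sketched as follows. This is a minimal, illustrative NumPy version, not the paper's exact loss: we assume an intra-sequence term that matches each mimic feature to the corresponding teacher feature, and an inter-sequence term that matches the pairwise correlation structure across a batch of sequences; the function names and the `alpha`/`beta` weights are hypothetical.

```python
import numpy as np

def correlation_matrix(feats):
    """Pairwise cosine-similarity matrix over a batch of feature vectors (N, D)."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def correlational_distillation_loss(mimic_feats, teacher_feats, alpha=1.0, beta=1.0):
    """Illustrative correlational distillation loss (hypothetical formulation).

    intra: the mimic network directly imitates the teacher's per-sequence features.
    inter: the mimic network also reproduces how the teacher's features of
           different sequences relate to one another in feature space.
    """
    intra = np.mean((mimic_feats - teacher_feats) ** 2)
    inter = np.mean(
        (correlation_matrix(mimic_feats) - correlation_matrix(teacher_feats)) ** 2
    )
    return alpha * intra + beta * inter
```

When the mimic features exactly equal the teacher features, both terms vanish; any deviation in either individual features or their cross-sequence correlations increases the loss.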
Occlusions can be classified as consistent (static) or dynamic (changing). Consistent occlusions occur due to obstacles like sidewalks or bad camera angles, while dynamic occlusions happen when objects or people temporarily block the subject. We simulate both types by placing stationary or moving black patches on input frames. Consistent occlusions remove the top, bottom, or middle part of the frame. We focus on top and bottom occlusions in our main results and evaluate generalizability with middle and dynamic occlusions. During training and evaluation, occlusions are introduced randomly, covering 40%-60% of the frame. More details are in the supplementary material.
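The occlusion simulation described above is straightforward to sketch. Below is a minimal NumPy version under our own assumptions about the interface (the function name and argument layout are hypothetical): consistent occlusions zero out a fixed horizontal band (top, bottom, or middle) of every frame, while dynamic occlusions zero out a band whose position changes from frame to frame; the band height is set by a coverage ratio, e.g. the 40%-60% range used in the paper.

```python
import numpy as np

def occlude(frames, kind="bottom", ratio=0.5, rng=None):
    """Place black (zero) patches on silhouette frames of shape (T, H, W).

    kind: 'top' | 'bottom' | 'middle' for consistent occlusions,
          'dynamic' for a patch that moves between frames.
    ratio: fraction of the frame height covered by the patch.
    """
    rng = rng or np.random.default_rng()
    out = frames.copy()
    T, H, W = frames.shape
    h = int(ratio * H)  # patch height in pixels
    if kind == "top":
        out[:, :h, :] = 0
    elif kind == "bottom":
        out[:, H - h:, :] = 0
    elif kind == "middle":
        start = (H - h) // 2
        out[:, start:start + h, :] = 0
    elif kind == "dynamic":
        # Re-sample the patch position independently for each frame.
        for t in range(T):
            start = int(rng.integers(0, H - h + 1))
            out[t, start:start + h, :] = 0
    else:
        raise ValueError(f"unknown occlusion kind: {kind}")
    return out
```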
Our evaluation uses two metrics: rank-retrieval accuracy and our newly proposed Relative Performance (RP). Rank-retrieval accuracy is an absolute measure of performance and is affected by factors such as the strength of the backbone. Since MimicGait is model-agnostic, its evaluation should not depend on the backbone's strength. We therefore normalize rank-retrieval accuracy by the strength of the backbone, and call the result RP.
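One plausible instantiation of this normalization, assuming "backbone strength" is measured by the same backbone's rank-retrieval accuracy on unoccluded input (the paper's exact definition may differ), is:

```python
def relative_performance(occluded_acc, holistic_acc):
    """Hypothetical sketch of Relative Performance (RP): rank-retrieval accuracy
    under occlusion expressed as a percentage of the same backbone's accuracy
    on unoccluded sequences, so a stronger backbone does not inflate the score."""
    return 100.0 * occluded_acc / holistic_acc
```

Under this reading, a weak backbone at 30% occluded accuracy with 60% holistic accuracy and a strong backbone at 40% occluded with 80% holistic both score RP = 50, making the occlusion-handling method itself comparable across backbones.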
We also evaluate the generalizability and adaptability of our approach by testing on new occlusion types, such as middle occlusions and dynamic occlusions. We show that our approach both generalizes to these unseen occlusion types and can adapt to new occlusions.
We perform ablation studies to test the importance of the individual components: the mimic network and the two proxy tasks on which the Visibility Estimation Network is trained. Removing the Visibility Estimation Network leads to a significant drop in performance, highlighting its importance in guiding the mimic network. Additionally, training the Visibility Estimation Network on both proxy tasks yields better performance than training on a single task.
@inproceedings{gupta2025wacv,
title={MimicGait: A Model Agnostic approach for Occluded Gait Recognition using Correlational Knowledge Distillation},
author={Ayush Gupta and Rama Chellappa},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2025}
}