Multi-modal AI for Temporal Action Localization
Developed a multi-modal AI model for Temporal Action Localization (TAL) and Spatio-Temporal Action Localization (STAL) in grocery-shopping videos. The project was an entry in Amazon's ICCV 2025 Challenge on understanding complex human behavior in retail environments.
Achieved top-ranked performance on the Temporal Action Localization track
Achieved leading performance on the Spatio-Temporal Action Localization track
Prototyped and optimized the full pipeline under a tight challenge deadline
Leveraged AdaTAD, an adapter-based end-to-end temporal action detection model, for robust temporal boundary prediction
Integrated Segment Anything Model 2 (SAM2) for precise spatial segmentation and object-level action localization
Fused video, audio, and contextual signals through multi-modal fusion architectures for comprehensive scene understanding
Optimized joint spatio-temporal modeling by combining AdaTAD's temporal precision with SAM2's spatial accuracy
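AdaTAD's actual boundary decoder is learned, but the core idea of turning per-frame action scores into temporal segments can be sketched with a minimal threshold-and-merge routine. Everything here (`scores_to_segments`, the threshold, the toy scores) is illustrative, not the challenge code:

```python
import numpy as np

def scores_to_segments(scores, threshold=0.5, min_len=2):
    """Greedily decode per-frame action scores into [start, end) segments.

    Frames whose score meets `threshold` are grouped into contiguous runs;
    runs shorter than `min_len` frames are discarded as noise.
    """
    above = scores >= threshold
    segments, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                      # segment opens
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i))  # segment closes
            start = None
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))  # segment runs to clip end
    return segments

# Toy per-frame confidence for one action class (e.g. "pick up item").
scores = np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9, 0.9])
print(scores_to_segments(scores))  # → [(2, 5), (7, 10)]
```

In practice a learned model also refines these boundaries and scores each segment; the greedy pass above only shows the segment-decoding step.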
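The AdaTAD + SAM2 pairing for STAL can be illustrated by combining a temporal segment with per-frame binary masks (such as SAM2 would produce) into a spatio-temporal action tube. The helpers and toy masks below are hypothetical stand-ins; SAM2 inference itself is not shown:

```python
import numpy as np

def mask_to_box(mask):
    """Tight (x1, y1, x2, y2) bounding box around a binary mask, or None."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

def masks_to_tube(segment, masks):
    """Pair a temporal segment [start, end) with per-frame masks to form an
    action tube: a list of (frame_index, box) entries."""
    start, end = segment
    tube = []
    for t in range(start, end):
        box = mask_to_box(masks[t])
        if box is not None:
            tube.append((t, box))
    return tube

# Toy 4-frame clip of 6x6 masks; the subject occupies a moving 2x2 patch.
masks = np.zeros((4, 6, 6), dtype=bool)
for t in range(4):
    masks[t, t:t + 2, 1:3] = True
print(masks_to_tube((1, 3), masks))  # → [(1, (1, 1, 2, 2)), (2, (1, 2, 2, 3))]
```

The tube representation is what STAL benchmarks typically score: a temporal extent plus a per-frame spatial localization.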
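One simple form of the multi-modal fusion described above is weighted late fusion of per-class scores from the video and audio streams. The function, weights, and toy logits below are a hedged sketch, not the fusion architecture actually used:

```python
import numpy as np

def late_fuse(video_logits, audio_logits, w_video=0.7, w_audio=0.3):
    """Weighted late fusion of per-class probabilities from two modalities."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return w_video * softmax(video_logits) + w_audio * softmax(audio_logits)

# Toy 3-class example: video favors class 0, audio favors class 2.
video_logits = np.array([2.0, 0.5, 0.1])
audio_logits = np.array([0.2, 0.1, 1.5])
fused = late_fuse(video_logits, audio_logits)
print(int(fused.argmax()))  # class with the highest fused score
```

Late fusion keeps each modality's backbone independent, which made it attractive for rapid prototyping; richer fusion (cross-attention over intermediate features) trades that simplicity for tighter coupling.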