Zhaowei Cai
I am a Senior Applied Scientist with the Amazon AGI multimodal team, where I work on computer vision and machine learning. I received my Ph.D. and M.S. degrees from UC San Diego, advised by Nuno Vasconcelos.
I have been fortunate to work as a research intern at Facebook AI Research (FAIR), Microsoft Research Redmond (MSR), IBM T. J. Watson Research, and the Institute of Automation, Chinese Academy of Sciences (CASIA).
Email / Google Scholar
- [12/2024] Nova is being launched! Please check it out.
- [02/2023] PolyFormer on referring image segmentation was accepted by CVPR 2023!
- [01/2023] MaskVLM was accepted by ICLR 2023!
- [12/2022] The code of Semi-ViT has been released!
- [10/2022] I will be serving as an Area Chair for CVPR 2023 and ICCV 2023.
- [09/2022] Semi-ViT was accepted by NeurIPS 2022. The code is coming soon!
- [08/2022] Check out our recent works (semi-supervised ViT and masked vision and language modeling).
- [07/2022] Two papers (X-DETR and few-shot detection benchmark) were accepted by ECCV 2022.
- [06/2022] The code of Omni-DETR has been released! Check our code.
- [03/2022] Omni-DETR was accepted by CVPR 2022. The code is coming soon!
- [06/2021] The code of EMAN has been released! Check our code.
- [02/2021] The EMAN paper was accepted by CVPR 2021 as Oral.
I am interested in computer vision and machine learning, especially vision and language understanding, object detection, semi- and self-supervised learning, low-precision neural networks, etc.
Mixed-Query Transformer: A Unified Image Segmentation Architecture
Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto
arXiv, 2024
arxiv /
Open-World Dynamic Prompt and Continual Visual Representation Learning
Youngeun Kim*, Jun Fang*, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer (*equal contribution)
ECCV, 2024
arxiv /
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha (*equal contribution)
CVPR, 2023
arxiv /
code /
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto
ICLR, 2023
arxiv /
Semi-supervised Vision Transformers at Scale
Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
NeurIPS, 2022
arxiv /
code /
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika and Stefano Soatto
ECCV, 2022
arxiv /
code /
Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran and Onkar Dabeer
ECCV, 2022
arxiv /
code /
Omni-DETR: Omni-Supervised Object Detection with Transformers
Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele and Stefano Soatto
CVPR, 2022
arxiv /
code /
Contrastive Neighborhood Alignment
Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan and Stefano Soatto
arXiv, 2022
arxiv /
Advanced Methods for Robust Object Detection
Zhaowei Cai and Nuno Vasconcelos
book chapter of Advanced Methods and Deep Learning in Computer Vision, 2021
link /
Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning
Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu and Stefano Soatto
CVPR, 2021 (Oral)
arxiv /
code /
Rethinking Differentiable Search for Mixed-Precision Neural Networks
Zhaowei Cai and Nuno Vasconcelos
CVPR, 2020
arxiv /
code /
UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang and Siwei Lyu
CVIU, 2020
arxiv /
project /
Towards Universal Object Detection by Domain Attention
Xudong Wang, Zhaowei Cai, Dashan Gao and Nuno Vasconcelos
CVPR, 2019
arxiv /
project /
code /
Cascade R-CNN: High Quality Object Detection and Instance Segmentation
Zhaowei Cai and Nuno Vasconcelos
T-PAMI, 2019
arxiv /
project /
code /
Cascade R-CNN: Delving into High Quality Object Detection
Zhaowei Cai and Nuno Vasconcelos
CVPR, 2018 (Spotlight)
arxiv /
project /
code /
Deep Learning with Low Precision by Half-wave Gaussian Quantization
Zhaowei Cai, Xiaodong He, Jian Sun and Nuno Vasconcelos
CVPR, 2017 (Spotlight)
arxiv /
project /
code /
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
Zhaowei Cai, Quanfu Fan, Rogerio S. Feris and Nuno Vasconcelos
ECCV, 2016
arxiv /
project /
code /
Learning Complexity-Aware Cascades for Pedestrian Detection
Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos
T-PAMI, 2019
Learning Complexity-Aware Cascades for Deep Pedestrian Detection
Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos
ICCV, 2015 (Oral)
arxiv /
demo /
Robust Deformable and Occluded Object Tracking with Dynamic Graph
Zhaowei Cai, Longyin Wen, Zhen Lei, Nuno Vasconcelos and Stan Z. Li
T-IP, 2014
project /
Structured Visual Tracking with Dynamic Graph
Zhaowei Cai, Longyin Wen, Jianwei Yang, Zhen Lei and Stan Z. Li
ACCV, 2012
Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
T-IP, 2014
Online Spatio-Temporal Structural Context Learning for Visual Tracking
Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
ECCV, 2012
Person-Specific Face Tracking with Online Recognition
Zhaowei Cai, Longyin Wen, Dong Cao, Zhen Lei, Dong Yi and Stan Z. Li
FG, 2013
A New Projection Space for Separation of Specular and Diffuse Reflection Components in Color Images
Jianwei Yang, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
ACCV, 2012
- Area Chair: ICML25, CVPR25, ICLR25, NeurIPS24, EMNLP24, CVPR24, ICCV23, CVPR23
- Workshop co-organizer: Adversarial Robustness in the Real World at ECCV 2020.
- Journal Reviewer: T-PAMI, IJCV, T-IP, CVIU, T-MM, PR, T-CSVT, T-ITS, T-Cybernetics
- Conference Reviewer: ECCV24, NeurIPS23, NeurIPS22, ICML22, CVPR22, ICLR22, NeurIPS21, ICCV21, ICML21, CVPR21, ICLR21, NeurIPS20, ICML20, ECCV20, CVPR20, NeurIPS19, ICML19, ICCV19, CVPR19 (outstanding reviewer), NIPS18, ECCV18, ICML18, CVPR18, NIPS17, ICCV17, CVPR17
- Invited talk at UCSC AI seminar: Image/Object/Mask-level Vision and Language Understanding.
- Invited talk at UCSB NLP group: Pushing the Limits of Object Detection.
- Invited talk at UCSB: Low-precision Neural Networks.
- Guest lecture at UCSC: Exponential Moving Average Normalization for Self- and Semi-Supervised Learning.