Zhaowei Cai

I am a Senior Applied Scientist on the Amazon AGI multimodal team, where I work on computer vision and machine learning. I received my Ph.D. and M.S. degrees from UC San Diego, where I was advised by Nuno Vasconcelos.

I have been fortunate to work as a research intern at Facebook AI Research (FAIR), Microsoft Research Redmond (MSR), IBM T. J. Watson Research, and the Institute of Automation, Chinese Academy of Sciences (CASIA).

Email / Google Scholar / GitHub

Recent News

[06/2025] One paper accepted by ICCV25.
[02/2025] We are hiring full-time Applied Scientists. Let me know if you are interested.
[12/2024] Nova is being launched! Please check it out.
[02/2023] PolyFormer on referring image segmentation was accepted by CVPR 2023!
[01/2023] MaskVLM was accepted by ICLR 2023!
[12/2022] The code of Semi-ViT has been released!
[10/2022] I will be serving as an Area Chair for CVPR 2023 and ICCV 2023.
[09/2022] Semi-ViT was accepted by NeurIPS 2022. The code is coming soon!
[08/2022] Check out our recent works (semi-supervised ViT and masked vision and language modeling).
[07/2022] Two papers (X-DETR and few-shot detection benchmark) were accepted by ECCV 2022.
[06/2022] The code of Omni-DETR has been released! Check our code.
[03/2022] Omni-DETR was accepted by CVPR 2022. The code is coming soon!
[06/2021] The code of EMAN has been released! Check our code.
[02/2021] The EMAN paper was accepted by CVPR 2021 as Oral.

Research

I am interested in computer vision and machine learning, especially vision and language understanding, object detection, semi- and self-supervised learning, low-precision neural networks, etc.

	The Amazon Nova Family of Models: Technical Report and Model Card The Amazon Artificial General Intelligence team Tech Report, 2024 arxiv / bibtex
	Amazon Nova Premier: Technical report and model card The Amazon Artificial General Intelligence team Tech Report, 2025 bibtex
	Scaling up Image Segmentation across Data and Tasks Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto CVPR, 2025 bibtex
	Mixed-Query Transformer: A Unified Image Segmentation Architecture Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto arXiv, 2024 arxiv / bibtex
	Open-World Dynamic Prompt and Continual Visual Representation Learning Youngeun Kim, Jun Fang, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer (*equal contribution) ECCV, 2024 arxiv / bibtex
	PolyFormer: Referring Image Segmentation as Sequential Polygon Generation Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha (*equal contribution) CVPR, 2023 arxiv / code / bibtex
	Masked Vision and Language Modeling for Multi-modal Representation Learning Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto ICLR, 2023 arxiv / bibtex
	Semi-supervised Vision Transformers at Scale Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto NeurIPS, 2022 arxiv / code / bibtex
	X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika and Stefano Soatto ECCV, 2022 arxiv / code / bibtex
	Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran and Onkar Dabeer ECCV, 2022 arxiv / code / bibtex
	Omni-DETR: Omni-Supervised Object Detection with Transformers Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele and Stefano Soatto CVPR, 2022 arxiv / code / bibtex
	Contrastive Neighborhood Alignment Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan and Stefano Soatto arXiv, 2022 arxiv / bibtex
	Advanced Methods for Robust Object Detection Zhaowei Cai and Nuno Vasconcelos book chapter of Advanced Methods and Deep Learning in Computer Vision, 2021 link / bibtex
	Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu and Stefano Soatto CVPR, 2021 (Oral) arxiv / code / bibtex
	Rethinking Differentiable Search for Mixed-Precision Neural Networks Zhaowei Cai and Nuno Vasconcelos CVPR, 2020 arxiv / code / bibtex
	UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang and Siwei Lyu CVIU, 2020 arxiv / project / bibtex
	Towards Universal Object Detection by Domain Attention Xudong Wang, Zhaowei Cai, Dashan Gao and Nuno Vasconcelos CVPR, 2019 arxiv / project / code / bibtex
	Cascade R-CNN: High Quality Object Detection and Instance Segmentation Zhaowei Cai and Nuno Vasconcelos T-PAMI, 2019 arxiv / project / code / bibtex
	Cascade R-CNN: Delving into High Quality Object Detection Zhaowei Cai and Nuno Vasconcelos CVPR, 2018 (Spotlight) arxiv / project / code / bibtex
	Deep Learning with Low Precision by Half-wave Gaussian Quantization Zhaowei Cai, Xiaodong He, Jian Sun and Nuno Vasconcelos CVPR, 2017 (Spotlight) arxiv / project / code / bibtex
	A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection Zhaowei Cai, Quanfu Fan, Rogerio S. Feris and Nuno Vasconcelos ECCV, 2016 arxiv / project / code / bibtex
	Learning Complexity-Aware Cascades for Pedestrian Detection Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos T-PAMI, 2019
	Learning Complexity-Aware Cascades for Deep Pedestrian Detection Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos ICCV, 2015 (Oral) arxiv / demo / bibtex
	Robust Deformable and Occluded Object Tracking with Dynamic Graph Zhaowei Cai, Longyin Wen, Zhen Lei, Nuno Vasconcelos and Stan Z. Li T-IP, 2014 project / bibtex
	Structured Visual Tracking with Dynamic Graph Zhaowei Cai, Longyin Wen, Jianwei Yang, Zhen Lei and Stan Z. Li ACCV, 2012 bibtex
	Robust Online Learned Spatio-Temporal Context Model for Visual Tracking Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li T-IP, 2014 bibtex
	Online Spatio-Temporal Structural Context Learning for Visual Tracking Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li ECCV, 2012 bibtex
	Person-Specific Face Tracking with Online Recognition Zhaowei Cai, Longyin Wen, Dong Cao, Zhen Lei, Dong Yi and Stan Z. Li FG, 2013 bibtex
	A New Projection Space for Separation of Specular and Diffuse Reflection Components in Color Images Jianwei Yang, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li ACCV, 2012 bibtex

Service

Area Chair: NeurIPS25, ACL25, ICML25, CVPR25, ICLR25, NeurIPS24, EMNLP24, CVPR24, ICCV23, CVPR23
Workshop Co-organizer: Adversarial Robustness in the Real World at ECCV 2020
Journal Reviewer: T-PAMI, IJCV, T-IP, CVIU, T-MM, PR, T-CSVT, T-ITS, T-Cybernetics
Conference Reviewer: ICCV25, ECCV24, NeurIPS23, NeurIPS22, ICML22, CVPR22, ICLR22, NeurIPS21, ICCV21, ICML21, CVPR21, ICLR21, NeurIPS20, ICML20, ECCV20, CVPR20, NeurIPS19, ICML19, ICCV19, CVPR19 (outstanding reviewer), NIPS18, ECCV18, ICML18, CVPR18, NIPS17, ICCV17, CVPR17

Talks

Invited talk at UCSC AI seminar: Image/Object/Mask-level Vision and Language Understanding.
Invited talk at UCSB NLP group: Pushing the Limits of Object Detection.
Invited talk at UCSB: Low-precision Neural Networks.
Guest lecture at UCSC: Exponential Moving Average Normalization for Self- and Semi-Supervised Learning.

template credit to Jon Barron