Zhaowei Cai

I am a Senior Applied Scientist with the Amazon AGI multimodal team, where I work on computer vision and machine learning. I received my Ph.D. and M.S. degrees from UC San Diego, advised by Nuno Vasconcelos.

I have been fortunate to work as a research intern at Facebook AI Research (FAIR), Microsoft Research Redmond (MSR), IBM T. J. Watson Research, and the Institute of Automation, Chinese Academy of Sciences (CASIA).

Email  /  Google Scholar  /  GitHub


Recent News
  • [06/2025] One paper accepted by ICCV25.
  • [02/2025] We are hiring full-time Applied Scientists. Let me know if you are interested.
  • [12/2024] Nova is being launched! Please check it out.
  • [02/2023] PolyFormer on referring image segmentation was accepted by CVPR 2023!
  • [01/2023] MaskVLM was accepted by ICLR 2023!
  • [12/2022] The code of Semi-ViT has been released!
  • [10/2022] I will be serving as an Area Chair for CVPR 2023 and ICCV 2023.
  • [09/2022] Semi-ViT was accepted by NeurIPS 2022. The code is coming soon!
  • [08/2022] Check out our recent works (semi-supervised ViT and masked vision and language modeling).
  • [07/2022] Two papers (X-DETR and few-shot detection benchmark) were accepted by ECCV 2022.
  • [06/2022] The code of Omni-DETR has been released! Check our code.
  • [03/2022] Omni-DETR was accepted by CVPR 2022. The code is coming soon!
  • [06/2021] The code of EMAN has been released! Check our code.
  • [02/2021] The EMAN paper was accepted by CVPR 2021 as Oral.

Research

I am interested in computer vision and machine learning, especially vision and language understanding, object detection, semi- and self-supervised learning, low-precision neural networks, etc.

The Amazon Nova Family of Models: Technical Report and Model Card
The Amazon Artificial General Intelligence team
Tech Report, 2024
arxiv / bibtex

Amazon Nova Premier: Technical report and model card
The Amazon Artificial General Intelligence team
Tech Report, 2025
bibtex

Scaling up Image Segmentation across Data and Tasks
Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto
CVPR, 2025
bibtex

Mixed-Query Transformer: A Unified Image Segmentation Architecture
Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto
arXiv, 2024
arxiv / bibtex

Open-World Dynamic Prompt and Continual Visual Representation Learning
Youngeun Kim*, Jun Fang*, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer (*equal contribution)
ECCV, 2024
arxiv / bibtex

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha (*equal contribution)
CVPR, 2023
arxiv / code / bibtex

Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto
ICLR, 2023
arxiv / bibtex

Semi-supervised Vision Transformers at Scale
Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
NeurIPS, 2022
arxiv / code / bibtex

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika and Stefano Soatto
ECCV, 2022
arxiv / code / bibtex

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran and Onkar Dabeer
ECCV, 2022
arxiv / code / bibtex

Omni-DETR: Omni-Supervised Object Detection with Transformers
Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele and Stefano Soatto
CVPR, 2022
arxiv / code / bibtex

Contrastive Neighborhood Alignment
Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan and Stefano Soatto
arXiv, 2022
arxiv / bibtex

Advanced Methods for Robust Object Detection
Zhaowei Cai and Nuno Vasconcelos
book chapter of Advanced Methods and Deep Learning in Computer Vision, 2021
link / bibtex

Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning
Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu and Stefano Soatto
CVPR, 2021 (Oral)
arxiv / code / bibtex

Rethinking Differentiable Search for Mixed-Precision Neural Networks
Zhaowei Cai and Nuno Vasconcelos
CVPR, 2020
arxiv / code / bibtex

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang and Siwei Lyu
CVIU, 2020
arxiv / project / bibtex

Towards Universal Object Detection by Domain Attention
Xudong Wang, Zhaowei Cai, Dashan Gao and Nuno Vasconcelos
CVPR, 2019
arxiv / project / code / bibtex

Cascade R-CNN: High Quality Object Detection and Instance Segmentation
Zhaowei Cai and Nuno Vasconcelos
T-PAMI, 2019
arxiv / project / code / bibtex

Cascade R-CNN: Delving into High Quality Object Detection
Zhaowei Cai and Nuno Vasconcelos
CVPR, 2018 (Spotlight)
arxiv / project / code / bibtex

Deep Learning with Low Precision by Half-wave Gaussian Quantization
Zhaowei Cai, Xiaodong He, Jian Sun and Nuno Vasconcelos
CVPR, 2017 (Spotlight)
arxiv / project / code / bibtex

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
Zhaowei Cai, Quanfu Fan, Rogerio S. Feris and Nuno Vasconcelos
ECCV, 2016
arxiv / project / code / bibtex

Learning Complexity-Aware Cascades for Pedestrian Detection
Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos
T-PAMI, 2019

Learning Complexity-Aware Cascades for Deep Pedestrian Detection
Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos
ICCV, 2015 (Oral)
arxiv / demo / bibtex

Robust Deformable and Occluded Object Tracking with Dynamic Graph
Zhaowei Cai, Longyin Wen, Zhen Lei, Nuno Vasconcelos and Stan Z. Li
T-IP, 2014
project / bibtex

Structured Visual Tracking with Dynamic Graph
Zhaowei Cai, Longyin Wen, Jianwei Yang, Zhen Lei and Stan Z. Li
ACCV, 2012
bibtex

Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
T-IP, 2014
bibtex

Online Spatio-Temporal Structural Context Learning for Visual Tracking
Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
ECCV, 2012
bibtex

Person-Specific Face Tracking with Online Recognition
Zhaowei Cai, Longyin Wen, Dong Cao, Zhen Lei, Dong Yi and Stan Z. Li
FG, 2013
bibtex

A New Projection Space for Separation of Specular and Diffuse Reflection Components in Color Images
Jianwei Yang, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
ACCV, 2012
bibtex


Service
  • Area Chair: NeurIPS25, ACL25, ICML25, CVPR25, ICLR25, NeurIPS24, EMNLP24, CVPR24, ICCV23, CVPR23
  • Workshop co-organizer of Adversarial Robustness in the Real World in ECCV 2020.
  • Journal Reviewer: T-PAMI, IJCV, T-IP, CVIU, T-MM, PR, T-CSVT, T-ITS, T-Cybernetics
  • Conference Reviewer: ICCV25, ECCV24, NeurIPS23, NeurIPS22, ICML22, CVPR22, ICLR22, NeurIPS21, ICCV21, ICML21, CVPR21, ICLR21, NeurIPS20, ICML20, ECCV20, CVPR20, NeurIPS19, ICML19, ICCV19, CVPR19 (outstanding reviewer), NIPS18, ECCV18, ICML18, CVPR18, NIPS17, ICCV17, CVPR17

Talks
  • Invited talk at UCSC AI seminar: Image/Object/Mask-level Vision and Language Understanding.
  • Invited talk at UCSB NLP group: Pushing the Limits of Object Detection.
  • Invited talk at UCSB: Low-precision Neural Networks.
  • Guest lecture at UCSC: Exponential Moving Average Normalization for Self- and Semi-Supervised Learning.


template credit to Jon Barron