Zhaowei Cai

I am a Senior Applied Scientist on the Amazon AGI multimodal team, where I work on computer vision and machine learning. I received my Ph.D. and M.S. degrees from UC San Diego, advised by Nuno Vasconcelos.

I have been fortunate to work as a research intern at Facebook AI Research (FAIR), Microsoft Research Redmond (MSR), IBM T. J. Watson Research, and the Institute of Automation, Chinese Academy of Sciences (CASIA).

Email  /  Google Scholar  /  Github

Recent News
  • [02/2023] PolyFormer on referring image segmentation was accepted by CVPR 2023!
  • [01/2023] MaskVLM was accepted by ICLR 2023!
  • [12/2022] The code of Semi-ViT has been released!
  • [10/2022] I will be serving as an Area Chair for CVPR 2023 and ICCV 2023.
  • [09/2022] We are hiring research interns for 2023 on various computer vision topics at AWS AI Labs. If you are interested, please drop me an email at zhaoweic@amazon.com!
  • [09/2022] Semi-ViT was accepted by NeurIPS 2022. The code is coming soon!
  • [08/2022] Check out our recent work (semi-supervised ViT and masked vision and language modeling).
  • [07/2022] Two papers (X-DETR and few-shot detection benchmark) were accepted by ECCV 2022. The code is coming soon!
  • [06/2022] The code of Omni-DETR has been released! Check our code.
  • [03/2022] Omni-DETR was accepted by CVPR 2022. The code is coming soon!
  • [06/2021] The code of EMAN has been released! Check our code.
  • [02/2021] The EMAN paper was accepted by CVPR 2021 as Oral.

Research

I am interested in computer vision and machine learning, especially vision and language understanding, object detection, semi- and self-supervised learning, and low-precision neural networks.

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha (*equal contribution)
CVPR, 2023
arxiv / code / bibtex

Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto
ICLR, 2023
arxiv / bibtex

Semi-supervised Vision Transformers at Scale
Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
NeurIPS, 2022
arxiv / code / bibtex

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika and Stefano Soatto
ECCV, 2022
arxiv / code / bibtex

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran and Onkar Dabeer
ECCV, 2022
arxiv / code / bibtex

Omni-DETR: Omni-Supervised Object Detection with Transformers
Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele and Stefano Soatto
CVPR, 2022
arxiv / code / bibtex

Contrastive Neighborhood Alignment
Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan and Stefano Soatto
arXiv, 2022
arxiv / bibtex

Advanced Methods for Robust Object Detection
Zhaowei Cai and Nuno Vasconcelos
book chapter of Advanced Methods and Deep Learning in Computer Vision, 2021
link / bibtex

Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning
Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu and Stefano Soatto
CVPR, 2021 (Oral)
arxiv / code / bibtex

Rethinking Differentiable Search for Mixed-Precision Neural Networks
Zhaowei Cai and Nuno Vasconcelos
CVPR, 2020
arxiv / code / bibtex

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang and Siwei Lyu
CVIU, 2020
arxiv / project / bibtex

Towards Universal Object Detection by Domain Attention
Xudong Wang, Zhaowei Cai, Dashan Gao and Nuno Vasconcelos
CVPR, 2019
arxiv / project / code / bibtex

Cascade R-CNN: High Quality Object Detection and Instance Segmentation
Zhaowei Cai and Nuno Vasconcelos
T-PAMI, 2019
arxiv / project / code / bibtex

Cascade R-CNN: Delving into High Quality Object Detection
Zhaowei Cai and Nuno Vasconcelos
CVPR, 2018 (Spotlight)
arxiv / project / code / bibtex

Deep Learning with Low Precision by Half-wave Gaussian Quantization
Zhaowei Cai, Xiaodong He, Jian Sun and Nuno Vasconcelos
CVPR, 2017 (Spotlight)
arxiv / project / code / bibtex

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
Zhaowei Cai, Quanfu Fan, Rogerio S. Feris and Nuno Vasconcelos
ECCV, 2016
arxiv / project / code / bibtex

Learning Complexity-Aware Cascades for Pedestrian Detection
Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos
T-PAMI, 2019

Learning Complexity-Aware Cascades for Deep Pedestrian Detection
Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos
ICCV, 2015 (Oral)
arxiv / demo / bibtex

Robust Deformable and Occluded Object Tracking with Dynamic Graph
Zhaowei Cai, Longyin Wen, Zhen Lei, Nuno Vasconcelos and Stan Z. Li
T-IP, 2014
project / bibtex

Structured Visual Tracking with Dynamic Graph
Zhaowei Cai, Longyin Wen, Jianwei Yang, Zhen Lei and Stan Z. Li
ACCV, 2012
bibtex

Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
T-IP, 2014
bibtex

Online Spatio-Temporal Structural Context Learning for Visual Tracking
Longyin Wen, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
ECCV, 2012
bibtex

Person-Specific Face Tracking with Online Recognition
Zhaowei Cai, Longyin Wen, Dong Cao, Zhen Lei, Dong Yi and Stan Z. Li
FG, 2013
bibtex

A New Projection Space for Separation of Specular and Diffuse Reflection Components in Color Images
Jianwei Yang, Zhaowei Cai, Zhen Lei, Dong Yi and Stan Z. Li
ACCV, 2012
bibtex


Service
  • Area Chair: NeurIPS24, CVPR24, ICCV23, CVPR23
  • Workshop co-organizer: Adversarial Robustness in the Real World at ECCV 2020
  • Journal Reviewer: T-PAMI, IJCV, T-IP, CVIU, T-MM, PR, T-CSVT, T-ITS, T-Cybernetics
  • Conference Reviewer: NeurIPS22, ICML22, CVPR22, ICLR22, NeurIPS21, ICCV21, ICML21, CVPR21, ICLR21, NeurIPS20, ICML20, ECCV20, CVPR20, NeurIPS19, ICML19, ICCV19, CVPR19 (outstanding reviewer), NIPS18, ECCV18, ICML18, CVPR18, NIPS17, ICCV17, CVPR17

Talks
  • Invited talk at UCSC AI seminar: Image/Object/Mask-level Vision and Language Understanding.
  • Invited talk at UCSB NLP group: Pushing the Limits of Object Detection.
  • Invited talk at UCSB: Low-precision Neural Networks.
  • Guest lecture at UCSC: Exponential Moving Average Normalization for Self- and Semi-Supervised Learning.


template credit to Jon Barron