International Conference on Computer Vision
| Year | Title | Citations | Links |
|---|---|---|---|
| 2021 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | 29,391 | S2 · arXiv |
| 2023 | Segment Anything | 11,700 | S2 · arXiv |
| 2019 | Searching for MobileNetV3 | 8,639 | S2 · arXiv |
| 2021 | Emerging Properties in Self-Supervised Vision Transformers | 8,177 | S2 · arXiv |
| 2023 | Adding Conditional Control to Text-to-Image Diffusion Models | 6,045 | S2 · arXiv |
| 2019 | FCOS: Fully Convolutional One-Stage Object Detection | 5,846 | S2 · arXiv |
| 2019 | CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features | 5,626 | S2 · arXiv |
| 2021 | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | 4,666 | S2 · arXiv |
| 2023 | Scalable Diffusion Models with Transformers | 4,611 | S2 · arXiv |
| 2019 | CenterNet: Keypoint Triplets for Object Detection | 3,235 | S2 · arXiv |
| 2019 | KPConv: Flexible and Deformable Convolution for Point Clouds | 3,052 | S2 · arXiv |
| 2021 | ViViT: A Video Vision Transformer | 2,763 | S2 · arXiv |
| 2019 | FaceForensics++: Learning to Detect Manipulated Facial Images | 2,705 | S2 · arXiv |
| 2021 | Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields | 2,540 | S2 · arXiv |
| 2021 | Vision Transformers for Dense Prediction | 2,408 | S2 · arXiv |
| 2023 | Sigmoid Loss for Language Image Pre-Training | 2,398 | S2 · arXiv |
| 2021 | Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | 2,379 | S2 · arXiv |
| 2021 | CvT: Introducing Convolutions to Vision Transformers | 2,306 | S2 · arXiv |
| 2021 | An Empirical Study of Training Self-Supervised Vision Transformers | 2,234 | S2 · arXiv |
| 2019 | SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences | 2,219 | S2 |