Introduction | Zangwei

Computer vision

Image (or video) -> sensing device -> interpreting device -> interpretations
Automatic understanding of images and video
- Measurement: computing properties of the 3D world from visual data
- Perception and interpretation: algorithms and representations that allow a machine to recognize objects, scenes, and people
Applications
- Faces and digital cameras
- Video-based interfaces
- Safety and security
  - Navigation, driver safety
  - Pool monitoring
  - Pedestrian detection
  - Surveillance
- Vision for medical imaging and neuroimaging
Challenges: the gap between low-level signals and high-level meaning
- Ill-posed problem: a 2D image does not uniquely determine the 3D scene that produced it
- Large variation: illumination, object pose, clutter, occlusion, intra-class appearance, viewpoint
- Intra-class variation
- Context
- Complexity
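The "ill-posed" bullet can be made concrete with an ideal pinhole camera. This is a minimal sketch assuming a simplified projection model x = fX/Z, y = fY/Z (no lens distortion; the helper name `project` is hypothetical). It shows that depth is lost: every 3D point on the same viewing ray lands on the same image point, so a single image cannot tell them apart.

```python
from fractions import Fraction

def project(point, f=1):
    """Ideal pinhole projection: (X, Y, Z) in camera coordinates -> (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Fraction keeps the arithmetic exact, so equal projections compare equal
    return (Fraction(f * X, Z), Fraction(f * Y, Z))

# Scaling a point by any k > 0 moves it along the same viewing ray,
# so both points project to the same image coordinates.
near = (1, 2, 4)
far = tuple(3 * c for c in near)  # (3, 6, 12): three times farther away
print(project(near))
print(project(far))
print(project(near) == project(far))  # True
```

Recovering depth therefore needs extra information or assumptions (e.g., a second view, known object size, or learned priors), which is why measurement from images is hard in general.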
Progress charted by datasets
- Roberts (1963)
- COIL
- MIT-CMU Faces (2000)
- UIUC Cars
- INRIA Pedestrians
- MSRC 21 Objects (2005)
- Caltech-101
- Caltech-256
- PASCAL VOC (2010): established the standard evaluation protocol for computer vision
- Faces in the Wild
- 80M Tiny Images
- ImageNet: very large in scale; long-tailed class distribution
- Birds-200
Tasks
- Recognition: general categories
- Large-scale recognition
- Recognition in first-person (egocentric) view
- Object detection, instance segmentation
- Image captioning
- Image generation
CVPR'19 by the numbers
- Submissions: 5160
- Accepted: 1294
- Registered Attendees: 9227
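The submission and acceptance counts imply the acceptance rate. A minimal sketch of the arithmetic (the helper name `acceptance_rate` is just for illustration):

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Fraction of submitted papers that were accepted."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return accepted / submitted

# CVPR'19 counts from the list above: 1294 accepted out of 5160 submissions
rate = acceptance_rate(1294, 5160)
print(f"CVPR'19 acceptance rate: {rate:.1%}")  # 25.1%
```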
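Marr’s vision framework (three levels of analysis)
- Computational level: what problem the system solves, and why
- Algorithmic/representational level: how it is solved, i.e., which representations and algorithms are used
- Implementational/physical level: how the algorithm is physically realized (e.g., in neurons or hardware)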
Malik’s perspective (the "three Rs"): recognition, reorganization, reconstruction
Important note: in general, computer vision does not work, except under certain situations or conditions