PhD Defense by Viraj Prabhu

Title: Towards Reliable Computer Vision Systems

Date: Monday, November 20, 2023

Time: 4:00-6:00pm (ET)

Location: CODA C1115 (Druid Hills) & Zoom

Viraj Prabhu

PhD student in Computer Science

School of Interactive Computing

Georgia Institute of Technology

Committee

Dr. Judy Hoffman (advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology & Meta

Dr. James Hays, School of Interactive Computing, Georgia Institute of Technology

Dr. Zsolt Kira, School of Interactive Computing, Georgia Institute of Technology

Dr. Sanja Fidler, University of Toronto & NVIDIA

Abstract

The real world has infinite visual variation – across viewpoints, time, space, and curation. As deep visual models become ubiquitous in high-stakes applications, their ability to generalize across such variation becomes increasingly important. Such generalization will alleviate the need to label a large corpus for every new deployment, which may be infeasible due to data volume (e.g., autonomous driving) or labeling cost (e.g., medical diagnosis). Further, such generalization is necessary to overcome the natural spatiotemporal distribution shifts that a deployed model will invariably face (e.g., changing geographies and seasons). Finally, such generalization will unlock the possibility of knowledge transfer from inexpensive sources of data (e.g., transferring models trained in simulation to reality).

In this thesis, I will present opportunities to improve such generalization at different stages of the ML lifecycle. First, I will discuss proactive strategies for training robust models by leveraging simulation to augment the long tail of real training data. Next, I will present reactive strategies to recover from unforeseen distribution shifts via self-supervised domain adaptation. Finally, I will present a framework to stress-test the robustness of vision models by leveraging foundation models for text and image synthesis to generate challenging counterfactual test cases.
