OODles of ODDs: Surveying the landscape of Out-of-distribution work in recent ML literature

In this work, we chart out the landscape of ideas that travel under the Out-of-Distribution (OOD) and Out-of-Distribution Detection (ODD) labels in recent ML literature.
The SPICE for this is disseminated via two formats.

  1. Static SPICE (PDF with URLs to the papers)

  2. Interactive SPICE below. (Click on a link and the representative image appears)

Our wish list for the field

1. NSL (Neural Structured Learning): its ramifications for OOD still remain to be explored. Did you just harness NSL to get a better top-k accuracy on your precious little validation set? Check the newly minted model's OOD susceptibility (see the sketch below). You might be in for a surprise.
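
A minimal sketch of what "check the OOD susceptibility" could look like, using the common maximum-softmax-probability baseline and AUROC. The logit arrays (`logits_in`, `logits_out`, and the NSL/vanilla variants in the usage comment) are hypothetical placeholders, not part of any particular codebase.

```python
# Sketch: probe a trained model's OOD susceptibility with the
# maximum-softmax-probability (MSP) baseline.
import numpy as np
from sklearn.metrics import roc_auc_score

def msp_score(logits):
    """Max softmax probability per sample; higher = 'looks in-distribution'."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def ood_auroc(logits_in, logits_out):
    """AUROC for separating in-distribution (label 1) from OOD (label 0) samples."""
    scores = np.concatenate([msp_score(logits_in), msp_score(logits_out)])
    labels = np.concatenate([np.ones(len(logits_in)), np.zeros(len(logits_out))])
    return roc_auc_score(labels, scores)

# Usage (hypothetical): compare the NSL-regularized model against the vanilla one.
# print(ood_auroc(logits_in_nsl, logits_out_nsl))
# print(ood_auroc(logits_in_vanilla, logits_out_vanilla))
```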

2. Entangled domains: If you, as a machine learner, have ever worked on user authentication (not user surveillance, but anonymized user authentication), then the [ User | Not-user ] classification is, IMHO, the ultimate stress test (perhaps alongside new-species identification) for OOD algorithms. In this case, there is zero hand-waviness as to what is in-distribution and what is out-of-distribution: in-distribution captures all the vagaries captured during user enrollment, while out-of-distribution captures everything in the universe that is NOT generated by the user you are trying to authenticate. Try your fancy-schmancy, ahem, "ODD algorithms" from your Computer-Vision-biased SOTA, and an attitude correction and a reality check are in store for you! (A sketch of this stress test follows below.)
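
To make the stress test concrete, here is a small sketch of the evaluation, assuming you already have some novelty score emitted by a model fit only on the enrolled user's data. The names `score`, `user_test`, and `imposters` are hypothetical placeholders; the metric is the usual false-accept rate at a fixed false-reject rate.

```python
# Sketch of the [user | not-user] stress test. `score()` is whatever novelty
# score your model of choice emits (higher = "looks like the enrolled user").
# `user_test` and `imposters` are hypothetical arrays of held-out samples.
import numpy as np

def far_at_frr(user_scores, imposter_scores, target_frr=0.05):
    """Pick a threshold so at most `target_frr` of genuine attempts are rejected,
    then report the fraction of imposter attempts that still get accepted."""
    thresh = np.quantile(user_scores, target_frr)   # reject the lowest-scoring genuine attempts
    far = np.mean(imposter_scores >= thresh)        # imposters that sneak past the threshold
    return thresh, far

# Usage (hypothetical): everything not generated by the enrolled user is OOD.
# user_scores     = score(user_test)      # genuine, held-out sessions
# imposter_scores = score(imposters)      # literally anything else in the universe
# thresh, far = far_at_frr(user_scores, imposter_scores)
```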

3. X-MNISTs are far harder than you think: Most of y'all think of digit-glyph classification when you think of MNIST/QMNIST/Kannada-MNIST. However, these datasets also carry the volunteer-ID information associated with the digits (for example, see this tensor for Kannada-MNIST). Should you split these datasets by the volunteers authoring the digits and pose ODD as out-of-cohort-volunteer detection, your easy little MNIST dataset turns into an entangled mess in a jiffy. Give it a try (a sketch of such a split follows below)!
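
A sketch of the cohort split described above. The arrays `images`, `labels`, and `volunteer_id` are hypothetical stand-ins for however your copy of the dataset exposes the per-sample writer IDs, and the default cohort size is an arbitrary illustrative choice.

```python
# Sketch: repose an X-MNIST dataset as out-of-cohort-volunteer detection.
# `images`, `labels`, `volunteer_id` are hypothetical arrays; the last is the
# per-sample writer ID that these datasets ship alongside the digit glyphs.
import numpy as np

def cohort_split(images, labels, volunteer_id, n_in_cohort=40, seed=0):
    """Hold out a random subset of volunteers as the 'in-cohort' training pool;
    every digit written by anyone else becomes the out-of-cohort test set."""
    rng = np.random.default_rng(seed)
    vols = np.unique(volunteer_id)
    in_cohort = rng.choice(vols, size=n_in_cohort, replace=False)
    in_mask = np.isin(volunteer_id, in_cohort)
    return (images[in_mask], labels[in_mask]), (images[~in_mask], labels[~in_mask])

# The digit classifier is trained on the first tuple; the OOD detector is then
# asked to flag the second tuple -- same ten glyph classes, different hands.
```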

4. Fine-grained classification: While acknowledging the need for standardized comparisons, we remark that the OOD landscape is littered with strange matchups drawn from the Tiny-ImageNet, LSUN, CIFAR-x and SVHN datasets that seem rather ad hoc. This coincides with an uncanny side-stepping of the world of fine-grained classification datasets. Don't they constitute the Rosetta Stone for the field?

5. Anomaly detection / outlier detection / novelty detection: [ **Insert Distracted Boyfriend meme here** ]. In the pre-deep-learning Neanderthal era, loads of interesting and useful ideas were developed under these banners; they remain strong baselines and deserve a look before you reach for the next deep-learning detector (one such workhorse is sketched below).
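
As one example of such a pre-deep-learning workhorse, here is an Isolation Forest sketch via scikit-learn. The feature arrays are synthetic stand-ins (in practice they might be penultimate-layer embeddings or raw tabular features).

```python
# Sketch: a classical anomaly-detection baseline (Isolation Forest) applied to
# feature vectors. The arrays below are synthetic stand-ins for real features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
feats_train = rng.normal(size=(1000, 32))          # stand-in for in-distribution features
feats_test = rng.normal(loc=3.0, size=(200, 32))   # stand-in for shifted / anomalous data

iso = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
iso.fit(feats_train)                      # fit on in-distribution data only
scores = iso.score_samples(feats_test)    # higher = more "normal", lower = more anomalous
flags = iso.predict(feats_test)           # +1 inlier, -1 outlier
```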

6. Mind the domain bias: This should seem like a no-brainer, but I'd reckon it's worth reiterating that there exists a vast world of Machine Learning outside of Computer Vision. The intuitions developed on a bunch of idiosyncratic datasets in Computer Vision do not / cannot / will not hold true when you switch the application. As someone who grapples with human kinematics and motion-sensor tensors on a daily basis, I can vouch that when it comes to transfer learning, optimal architecture design, learning-rate regimes and ODD, most best practices developed in Computer Vision do not hold true!

7. Hypothesis testing literature: Seriously, people. There's an entire rich sub-domain of stats called hypothesis testing that the DL-bros have totally side-stepped. Take a look, especially at Hyppo (a minimal drift-test sketch follows below).
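
One way this literature plugs directly into the OOD question: treat "has my deployment data drifted?" as a two-sample test. The sketch below uses a plain Kolmogorov-Smirnov test on a 1-D summary score (the score arrays are synthetic placeholders); packages such as Hyppo generalize this to multivariate / kernel two-sample tests.

```python
# Sketch: distribution-shift detection as a classical two-sample hypothesis test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_ref = rng.normal(size=2000)              # hypothetical: scores on held-out training data
scores_new = rng.normal(loc=0.3, size=500)      # hypothetical: scores on incoming deployment data

stat, pvalue = ks_2samp(scores_ref, scores_new)
if pvalue < 0.01:
    print(f"Distribution shift detected (KS stat={stat:.3f}, p={pvalue:.4f})")
```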

8. Don’t ignore calibration: I've observed that model calibration ideas are often weirdly kept separate from the OOD question. Isn't efficient ODD at the heart of a well-calibrated model? (A quick calibration diagnostic is sketched below.)
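
A minimal sketch of the usual first calibration diagnostic, Expected Calibration Error (ECE). The `confidences` and `correct` arrays are hypothetical per-sample inputs you would compute from your own model.

```python
# Sketch: Expected Calibration Error (ECE). `confidences` is the per-sample
# max-softmax confidence and `correct` the per-sample 0/1 correctness; both
# are hypothetical arrays supplied by the caller.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Bin predictions by confidence and average |accuracy - confidence| per bin,
    weighted by the fraction of samples landing in each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# A model that is grossly over-confident on OOD inputs shows up here too: feed it
# OOD samples (all of which it gets "wrong") and watch the ECE balloon.
```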

9. Mind the loopholes in falling back on humans: Face-based surveillance systems deployed in the real world are, in fact, human-in-the-loop systems. Yet they falter. This should not come as a surprise: the human vision system is NOT bias-free! Mind the 3 Ps: Psychonomics, Prosopagnosia, Pareidolia.

10. In defense of the NN-isolation bias: Much akin to model calibration, the sub-domains of membership inference attacks, dataset distillation, lottery-ticket-hypothesis-style model compression and data-augmentation techniques strongly influence OOD susceptibility. In the case of

11. Emergent lure of Africa as an OOD proving ground: As much of the imagined Occident and the Far East falls under the sway of Big Data, it gets accommodated as ‘in-distribution’. This is quickly eliciting a predictivism-driven ‘observer effect’: algorithms trained in familiar “in-distribution” settings and excelling on in-distribution data are met with a ‘Meh!’. This sets the stage for the emergent lure of Africa as an OOD proving ground, where algorithms trained in the in-distribution occidental bastions endure their baptism by fire.