Mid-Level Vision vs. High-Level Vision Tasks.
We provide a comprehensive evaluation of prominent self-supervised learning methods (SSLs) across a wide range of mid-level vision tasks (a), complementing the standard evaluation in high-level vision tasks (b). Although SSL performance in mid-level vision tasks (e.g., depth estimation) is positively correlated with ImageNet linear probing (c, left), this correlation is much weaker than that observed among high-level vision tasks (e.g., Caltech-256 vs. ImageNet classification) (c, right), as indicated by the R2
statistics.