In this talk I will demonstrate that transferring visual representations pre-trained on large-scale data can dramatically improve sample efficiency and simplify hyperparameter tuning. In the first part of the talk I will discuss the challenges that arise in large-scale pre-training and how to address them. Then I will dive into strategies for adapting pre-trained models to a target task. Finally, I will present an extensive empirical evaluation of large-scale visual models and highlight many surprising findings. In particular, it turns out that huge models pre-trained on large-scale data not only achieve state-of-the-art performance on many standard vision benchmarks, but are also very strong few-shot learners and generalize well in out-of-distribution evaluation scenarios.
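
To make the adaptation step concrete, here is a minimal sketch of the kind of transfer pipeline the talk refers to: take a backbone pre-trained on large-scale data, swap its classification head for the target task, and fine-tune with a small learning rate. The framework (PyTorch/torchvision), the ResNet-50 backbone, and the 10-class target task are all illustrative assumptions, not the specific setup presented in the talk.

```python
# Illustrative transfer-learning sketch (not the speaker's exact setup).
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on large-scale data (here: ImageNet-1k weights).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Adapt to the target task: replace the classification head.
num_target_classes = 10  # hypothetical target task
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Fine-tune the whole network with a small learning rate; starting from
# pre-trained weights is typically far less sensitive to hyperparameter
# choice than training from scratch.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch from the target task."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the pre-trained features already encode general visual structure, even this plain recipe often needs only a handful of labeled examples per class, which is the sample-efficiency and few-shot behavior the talk highlights.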