The generalization performance of deep learning is impressive. In unsupervised learning, recent generative models such as diffusion models can generate very high-quality images. In supervised learning, deep networks trained to classify images learn features that transfer to a wide variety of tasks. This raises the question: to what extent is what is learned independent of the training samples, and even of the training data distribution? I will show that diffusion models transition from memorization to generalization as the number of training samples increases, and that networks trained on different image classification datasets learn common features even at deep layers. These results imply that a large part of what deep learning extracts from an image dataset is universal.