This page will be a master list of resources (video lectures, books, papers, etc.). Most of these should just be useful links; the actual content and my summaries will be in posts.

General Deep Learning

Papers/books

Video Lectures

Websites

  • Depth First Learning: self-contained curricula on various topics, e.g. optimization, the sigmoid, variational inference

Statistical Learning Theory

Video Lectures

Computer Vision

Papers

  • [Krizhevsky ‘12] ImageNet Classification with Deep Convolutional Neural Networks
  • [Szegedy ‘14] Going deeper with convolutions
  • [Szegedy ‘15] Rethinking the Inception Architecture for Computer Vision
  • [Vinyals ‘15] Show and Tell: A Neural Image Caption Generator
  • [He ‘15] Deep Residual Learning for Image Recognition (aka ResNet)
  • [Simonyan ‘15] Very Deep Convolutional Networks for Large-Scale Image Recognition
  • [Ren ‘16] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  • [Redmon ‘16] You Only Look Once: Unified, Real-Time Object Detection
  • [Reed ‘16] Generative Adversarial Text to Image Synthesis
  • [Oord ‘16] Pixel Recurrent Neural Networks
  • [Oord ‘16] Conditional Image Generation with PixelCNN Decoders
  • [He ‘16] Identity Mappings in Deep Residual Networks
  • [Xie ‘17] Aggregated Residual Transformations for Deep Neural Networks
  • [Qi ‘17] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
  • [Huang ‘17] Densely Connected Convolutional Networks
  • [He ‘18] Mask R-CNN
  • [Gkioxari ‘19] Mesh R-CNN
  • [Zhang ‘19] Self-Attention Generative Adversarial Networks
  • [Tan ‘20] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  • [Park ‘20] Contrastive Learning for Unpaired Image-to-Image Translation
  • [Dosovitskiy ‘20] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  • [Tolstikhin ‘21] MLP-Mixer: An all-MLP Architecture for Vision
  • [Bardes ‘21] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
  • [Chen ‘21] Pix2seq: A Language Modeling Framework for Object Detection
  • [Liu ‘22] A ConvNet for the 2020s
  • [He ‘21] Masked Autoencoders Are Scalable Vision Learners

Graph Neural Networks

Papers

Reinforcement Learning

Papers

Natural Language Processing

Papers

  • [Bengio ‘94] Learning Long-Term Dependencies with Gradient Descent is Difficult
  • [Graves ‘12] Sequence Transduction with Recurrent Neural Networks
  • [Graves ‘13] Speech Recognition with Deep Recurrent Neural Networks
  • [Mikolov ‘13] Efficient Estimation of Word Representations in Vector Space
  • [Mikolov ‘13] Distributed Representations of Words and Phrases and their Compositionality
  • [Sutskever ‘14] Sequence to Sequence Learning with Neural Networks
  • [Abdel-Hamid ‘14] Convolutional Neural Networks for Speech Recognition
  • [Bahdanau ‘15] Neural Machine Translation by Jointly Learning to Align and Translate
  • [Amodei ‘15] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
  • [Chorowski ‘15] Attention-Based Models for Speech Recognition
  • [Zhang ‘15] Character-level Convolutional Networks for Text Classification
  • [Luong ‘16] Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
  • [Chan ‘16] Listen, Attend and Spell
  • [Miller ‘16] Key-Value Memory Networks for Directly Reading Documents
  • [Vaswani ‘17] Attention Is All You Need
  • [Chiu ‘17] State-of-the-art Speech Recognition With Sequence-to-Sequence Models
  • [Wang ‘17] Tacotron: Towards End-to-End Speech Synthesis
  • [Huang ‘18] Music Transformer: Generating music with long-term structure
  • [Radford ‘18] Improving Language Understanding by Generative Pre-Training
  • [Peters ‘18] Deep contextualized word representations
  • [Radford ‘19] Language Models are Unsupervised Multitask Learners
  • [Devlin ‘19] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • [Lan ‘19] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  • [Liu ‘19] RoBERTa: A Robustly Optimized BERT Pretraining Approach
  • [Bińkowski ‘19] High Fidelity Speech Synthesis with Adversarial Networks
  • [Kong ‘19] A Mutual Information Maximization Perspective of Language Representation Learning
  • [Brown ‘20] Language Models are Few-Shot Learners
  • [Beltagy ‘20] Longformer: The Long-Document Transformer
  • [Kitaev ‘20] Reformer: The Efficient Transformer
  • [Gulati ‘20] Conformer: Convolution-augmented Transformer for Speech Recognition
  • [Xie ‘21] SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Websites

Miscellaneous

Papers

  • [Graves ‘14] Neural Turing Machines
  • [Kingma ‘14] Auto-Encoding Variational Bayes (aka the VAE paper)
  • [Srivastava ‘14] Dropout: A Simple Way to Prevent Neural Networks from Overfitting
  • [Kingma ‘14] Adam: A Method for Stochastic Optimization
  • [Ioffe ‘15] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  • [Han ‘15] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
  • [Salimans ‘16] Improved Techniques for Training GANs
  • [Ba ‘16] Layer Normalization
  • [Chollet ‘17] Xception: Deep Learning with Depthwise Separable Convolutions
  • [Wu ‘18] Group Normalization
  • [Hu ‘18] Squeeze-and-Excitation Networks
  • [McCandlish ‘18] An Empirical Model of Large-Batch Training
  • [Barratt ‘18] A Note on the Inception Score
  • [Gidaris ‘18] Unsupervised Representation Learning by Predicting Image Rotations
  • [Oord ‘18] Representation Learning with Contrastive Predictive Coding
  • [Recht ‘19] Do ImageNet Classifiers Generalize to ImageNet?
  • [Frankle ‘19] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
  • [Wu ‘21] NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
  • [Drori ‘21] A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More

Unsupervised Learning

Papers

Meta