I’ve decided that I’m going to try to gorup things into 10 day “units,” not because it will aid the flow of content in any way, but because it should aid the setting of goals. For each 10 days, I’ll try to cover at least 2-3 papers, 3 lectures, 1 section in the math for machine learning textbook along with supplementary resources if necessary, and some work in colab notebooks. The content from these different categories may or may not be related to each other, but most likely won’t be.

Unit 1: 2-01 to 2-10

Goals:

Lecture: Watch weeks 5-6 of [NYU]
Papers:
- [Graves ‘12 Transduction]
Math: [2. Linear Algebra ]
Coding: Canoodling with machine translation

2022-02-01

watched NYU 5L on EBMs. The concept seems pretty cool but extremely confusing
Started playing around with coding on the machine translation task. Going in blind for now.

2022-02-02

watched NYU 5.1 and NYU 5.2. Details on EBM still unclear but it seems will become more clear with next 2 lectures.
Started first pass through [Graves ‘12 Transduction]

2022-02-03

Second pass through [Graves ‘12 Transduction].
Notes:
- What: use RNN to transduce sequences, i.e. map each unit of an input sequence to an output sequence chunk.
- How: See https://lorenlugosch.github.io/images/transducer/transducer-model.png from Loren’s blog post
- Why: RNN had been mostly used for problems where alignment was already known, this paper shows RNN application to new domains of problems
- Blog post by Loren Lugosch has a nice summary
TODO: implement tranducer models
- Found an implementation for reference to double check work

2022-02-04

Watched NYU 6 and NYU 7L
- Realized I skipped 06L on latent variable EBMs, which explains why I felt lost a couple times during the lecture. Need to watch 6 tomorrow
Section 2.0-2.1 of the Math for Machine Learning book. Glad I watched the 3Blue1Brown series first, definitely gives more life into the subject. I’d imagine I’d be extremely bored reading the dry textbook chapter.
Set up some basic networks with a recurrent unit in the translation notebook, actual training fun to be done tomorrow

2022-02-05

Trained some toy models to translate < 20 word sentences from german to english. Notebook
First half of [NYU 6L] on latent EBMs
Unrelated to ML side note: finished watching K-Drama our beloved summer on netflix, easily slots into my top 3 dramas of all time

2022-02-05

Second half of [NYU 6L] on latent EBMs
- Barlow twins and SwAV seem very cool advances in pretraining - interested to see if similar methods are applicable in text (also if I have enough VRAM to train big enough networks)
Unrelated to ML side note: finished watching K-Drama our beloved summer on netflix, easily slots into my top 3 dramas of all time

2022-02-06

Textbook chapter 2 on linear algebra finished
Started implementing Tranducer, still a bit confused on how to train the model but hopefully will become clearer over time

2022-02-27

Yeah… the game Lost Ark came out on Feb 8th so that’s what happened in those missing days outside of work time mostly. My characters in that game are in a good place now, so I should have more time to re-apply myself in DL moving forward. During this time, I have almost finished watching the NYU deep learning lecture series on Youtube, and below are some interesting parts I remember and may follow up on.

Awni’s lecture on Speech
- Cool connection with energy based modeling here - casting alignment as a search over a latent space helps EBMs click more in my head
- Awni mentioned differentiable beam-search as a research effort - search for more on this later
- Alignment search corresponds to beam search later on down the road - follow up test to-do: train speech system with CTC loss for X epochs, then substitute out latent search for beam search and just backprop through beam-searched path
Ishan’s lecture on SSL in vision
- Need to revisit barlow twisn to make sure I understand the premise correctly
- are there any layer-wise losses that attempt to constrain representations at every level like barlow twins, but layer-wise?
- future research direction?: better ways to break symmetry in vision networks
  - teacher-student -> study group
Optimization lecture
- Different approximations to calculating positive Hessians in non-convex cases exist, follow up and find an article on this
- 75 page paper by Bouttou on SGD - might be worth a skim later