Self-Supervised Learning¶
Task Specific Models¶
Pretext Task¶
**Goal:** define an auxiliary task (e.g., rotation prediction) for pre-training on large amounts of unlabeled data.
Then remove the last layers and train a smaller network on the (scarce) labeled data of the target task.
Pre-train the feature function \(f_{\theta}\), then fine-tune only the readout layer (e.g., an MLP or linear classifier).
Common examples: rotation prediction, solving jigsaw puzzles, colorization (a rotation sketch follows below).
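A minimal sketch of the rotation pretext task and the subsequent readout fine-tuning, assuming a toy PyTorch `backbone` as \(f_{\theta}\); all names, shapes, and hyperparameters here are illustrative, not taken from the notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_batch(x):
    """4-way rotation pretext task: each image is rotated by
    0/90/180/270 degrees and labeled with its rotation index."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots, dim=0), labels

# f_theta: feature extractor, pre-trained on the pretext task (no labels needed).
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(32, 4)           # predicts the rotation class
opt = torch.optim.Adam(
    list(backbone.parameters()) + list(rotation_head.parameters()), lr=1e-3
)

x = torch.randn(8, 3, 32, 32)              # a batch of unlabeled images
xr, y = rotate_batch(x)
loss = F.cross_entropy(rotation_head(backbone(xr)), y)
opt.zero_grad()
loss.backward()
opt.step()

# Transfer: freeze f_theta and train only a new readout layer
# on the small labeled target dataset.
readout = nn.Linear(32, 10)                # e.g. 10 target classes
readout_opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
y_target = torch.randint(0, 10, (8,))      # toy target labels
with torch.no_grad():
    feats = backbone(x)                    # frozen pre-trained features
target_loss = F.cross_entropy(readout(feats), y_target)
readout_opt.zero_grad()
target_loss.backward()
readout_opt.step()
```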
**Shortcuts**: cues that are useful for solving the pretext task but not the target task.
The neural network often finds the simplest way to solve a task (e.g., relying on low-level cues such as edge continuity).
Contrastive Learning¶
**Goal:** the pre-training task and the transfer task should be “aligned”.
- The score function is often chosen as cosine similarity: \(s(\mathbf{f}_1, \mathbf{f}_2) = \frac{\mathbf{f}_1^{T}\mathbf{f}_2}{\|\mathbf{f}_1\|\,\|\mathbf{f}_2\|}\), similar to a Siamese network!
- Loss function: multi-class cross-entropy loss
This is also known as the InfoNCE loss. For the mutual information \(MI\), it gives a lower bound: \(MI[f(x), f(x^{+})] \geq \log(N) - \mathcal{L}\).
The more samples \(N\), the tighter the bound and the larger the guaranteed mutual information.
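A minimal sketch of the InfoNCE loss with cosine-similarity scores, assuming paired embeddings `f1`, `f2` from two augmented views of the same \(N\) images; the function name and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(f1, f2, temperature=0.1):
    """InfoNCE: for each anchor f1[i], f2[i] is the positive and the other
    N-1 samples in the batch act as negatives (multi-class cross entropy)."""
    z1 = F.normalize(f1, dim=1)                 # unit-norm features -> dot product
    z2 = F.normalize(f2, dim=1)                 # equals cosine similarity
    logits = z1 @ z2.t() / temperature          # (N, N) score matrix
    targets = torch.arange(f1.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: embeddings of two augmentations of the same N images.
N, d = 256, 128
f1, f2 = torch.randn(N, d), torch.randn(N, d)
loss = info_nce(f1, f2)                         # MI[f(x), f(x+)] >= log(N) - loss
```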
Momentum Contrast¶
Barlow Twins¶
**Goal:** reduce redundancy between neurons while increasing the similarity between the embeddings \(Z^A\) and \(Z^B\).
In other words, each neuron should be invariant to data augmentations but independent of the other neurons.
\(\mathcal C\) is just \(\mathrm{Cov}(Z^A, Z^B)\), the cross-correlation matrix of the normalized embeddings! \(\mathcal C_{i,j} = 0\) means the \(i\)-th and \(j\)-th neurons are uncorrelated.
No negative samples are needed anymore!
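A minimal sketch of the Barlow Twins objective, assuming embeddings `z_a`, `z_b` of two augmented views of the same batch; the trade-off weight `lambd` and all names are illustrative assumptions.

```python
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """C is the cross-correlation matrix of the batch-normalized embeddings:
    diagonal terms are pushed to 1 (invariance to augmentation),
    off-diagonal terms to 0 (redundancy reduction between neurons)."""
    N, D = z_a.shape
    # Normalize each neuron over the batch (zero mean, unit variance).
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.t() @ z_b / N                        # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

# Usage: only two views of the same batch are needed, no negatives.
z_a, z_b = torch.randn(512, 128), torch.randn(512, 128)
loss = barlow_twins_loss(z_a, z_b)
```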