Self-Supervised Learning¶
Task Specific Models¶
Pretext Task¶
**Goal:** define an auxiliary task (e.g., rotation prediction) for pre-training on large amounts of unlabeled data.
Then remove the last layers and train a smaller network on the (scarce) labeled data of the target task.
Pre-train the feature function \(f_{\theta}\), then fine-tune only the readout layer (e.g., an MLP or linear classifier).
Common examples: rotation prediction, solving jigsaw puzzles, colorization (a rotation sketch follows below).
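A minimal sketch of the rotation pretext task and the subsequent readout fine-tuning, assuming a toy PyTorch `backbone` as \(f_{\theta}\); all names, shapes, and hyperparameters here are illustrative, not taken from the notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_batch(x):
    """4-way rotation pretext task: each image is rotated by
    0/90/180/270 degrees and labeled with its rotation index."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots, dim=0), labels

# f_theta: feature extractor, pre-trained on the pretext task (no labels needed).
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(32, 4)           # predicts the rotation class
opt = torch.optim.Adam(
    list(backbone.parameters()) + list(rotation_head.parameters()), lr=1e-3
)

x = torch.randn(8, 3, 32, 32)              # a batch of unlabeled images
xr, y = rotate_batch(x)
loss = F.cross_entropy(rotation_head(backbone(xr)), y)
opt.zero_grad()
loss.backward()
opt.step()

# Transfer: freeze f_theta and train only a new readout layer
# on the small labeled target dataset.
readout = nn.Linear(32, 10)                # e.g. 10 target classes
readout_opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
y_target = torch.randint(0, 10, (8,))      # toy target labels
with torch.no_grad():
    feats = backbone(x)                    # frozen pre-trained features
target_loss = F.cross_entropy(readout(feats), y_target)
readout_opt.zero_grad()
target_loss.backward()
readout_opt.step()
```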
**Shortcuts**: cues that are useful for solving the pretext task but not the target task.
The neural network often finds the simplest way to solve a task (e.g., relying on low-level cues such as edge continuity).
Contrastive Learning¶
**Goal:** the pre-training task and the transfer task should be “aligned”.
- The score function is often chosen as cosine similarity: \(s(\mathbf{f}_1, \mathbf{f}_2) = \frac{\mathbf{f}_1^{T}\mathbf{f}_2}{\|\mathbf{f}_1\|\,\|\mathbf{f}_2\|}\), similar to a Siamese network!
- Loss function: multi-class cross-entropy loss
This is also known as the InfoNCE loss. For the mutual information \(MI\), it gives a lower bound: \(MI[f(x), f(x^{+})] \geq \log(N) - \mathcal{L}\).
The more samples \(N\), the tighter the bound and the larger the guaranteed mutual information.
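A minimal sketch of the InfoNCE loss with cosine-similarity scores, assuming paired embeddings `f1`, `f2` from two augmented views of the same \(N\) images; the function name and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(f1, f2, temperature=0.1):
    """InfoNCE: for each anchor f1[i], f2[i] is the positive and the other
    N-1 samples in the batch act as negatives (multi-class cross entropy)."""
    z1 = F.normalize(f1, dim=1)                 # unit-norm features -> dot product
    z2 = F.normalize(f2, dim=1)                 # equals cosine similarity
    logits = z1 @ z2.t() / temperature          # (N, N) score matrix
    targets = torch.arange(f1.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: embeddings of two augmentations of the same N images.
N, d = 256, 128
f1, f2 = torch.randn(N, d), torch.randn(N, d)
loss = info_nce(f1, f2)                         # MI[f(x), f(x+)] >= log(N) - loss
```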
Momentum Contrast¶
Barlow Twins¶
**Goal:** reduce redundancy between neurons while increasing the similarity between the embeddings \(Z^A\) and \(Z^B\).
In other words, each neuron should be invariant to data augmentations but independent of the other neurons.
\(\mathcal C\) is just \(\mathrm{Cov}(Z^A, Z^B)\), the cross-correlation matrix of the normalized embeddings! \(\mathcal C_{i,j} = 0\) means the \(i\)-th and \(j\)-th neurons are uncorrelated.
No negative samples are needed anymore!
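A minimal sketch of the Barlow Twins objective, assuming embeddings `z_a`, `z_b` of two augmented views of the same batch; the trade-off weight `lambd` and all names are illustrative assumptions.

```python
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """C is the cross-correlation matrix of the batch-normalized embeddings:
    diagonal terms are pushed to 1 (invariance to augmentation),
    off-diagonal terms to 0 (redundancy reduction between neurons)."""
    N, D = z_a.shape
    # Normalize each neuron over the batch (zero mean, unit variance).
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.t() @ z_b / N                        # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

# Usage: only two views of the same batch are needed, no negatives.
z_a, z_b = torch.randn(512, 128), torch.randn(512, 128)
loss = barlow_twins_loss(z_a, z_b)
```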