
Self-Supervised Learning


Task Specific Models

*(lecture figures: task-specific models)*

Pretext Task

**Goal:** Define an auxiliary task (e.g., rotation) for pre-training with lots of unlabeled data.

Then, remove the last layers and train a smaller network on the (much smaller) labeled data of the target task.

Pre-train the feature function \(f_{\theta}\), then fine-tune the readout layer (e.g., an MLP or a linear classifier).

e.g.:

*(figure: example pretext tasks, e.g., predicting the rotation applied to an image)*

  • Shortcuts

    Features that are useful for solving the pretext task but not the target task.

    The neural network often finds the simplest way to complete the pretext task (e.g., exploiting low-level cues such as edge continuity), so the learned features may not transfer.
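Concretely, the pre-train-then-fine-tune recipe above can be sketched as follows. This is a minimal PyTorch sketch; the toy architecture, the 4-way rotation head, and the 10-class target task are illustrative assumptions, not the lecture's exact setup.

```python
# Minimal sketch: pre-train f_theta on a rotation pretext task (unlabeled data),
# then freeze it and fine-tune a small readout head on the target task.
import torch
import torch.nn as nn

# Shared feature extractor f_theta (toy architecture, illustrative assumption)
f_theta = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# --- Pretext phase: predict which of 4 rotations was applied ---
rotation_head = nn.Linear(32, 4)
opt = torch.optim.Adam(list(f_theta.parameters()) + list(rotation_head.parameters()), lr=1e-3)

x = torch.randn(8, 3, 32, 32)                      # a batch of unlabeled images
k = torch.randint(0, 4, (8,))                      # random rotation index per image
x_rot = torch.stack([torch.rot90(img, int(r), dims=(1, 2)) for img, r in zip(x, k)])

loss = nn.functional.cross_entropy(rotation_head(f_theta(x_rot)), k)
loss.backward()
opt.step()

# --- Transfer phase: drop the rotation head, train a readout on few labeled samples ---
for p in f_theta.parameters():
    p.requires_grad = False                        # keep the pre-trained features fixed
readout = nn.Linear(32, 10)                        # e.g. 10 target classes (assumption)
opt_t = torch.optim.Adam(readout.parameters(), lr=1e-3)

x_small, y_small = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss_t = nn.functional.cross_entropy(readout(f_theta(x_small)), y_small)
loss_t.backward()
opt_t.step()
```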

Contrastive Learning

**Goal:** make the pre-training task and the transfer task “aligned”.

*(figure: contrastive learning — pull representations of positive pairs together, push negative pairs apart)*

  • Score functions are often chosen as the cosine similarity \(s(\mathbf{f}_1, \mathbf{f}_2) = \frac{\mathbf{f}_1^\top \mathbf{f}_2}{\|\mathbf{f}_1\|\,\|\mathbf{f}_2\|}\) — just like in the Siamese network!
  • Loss function: multi-class cross entropy loss

    \(\mathcal L = -\,\mathbb{E}\left[\log \frac{\exp\left(s(f(x), f(x^+)) / \tau\right)}{\exp\left(s(f(x), f(x^+)) / \tau\right) + \sum_{j=1}^{N-1} \exp\left(s(f(x), f(x_j^-)) / \tau\right)}\right]\)

    where \(x^+\) is the positive (augmented) sample, \(x_1^-, \dots, x_{N-1}^-\) are negatives, and \(\tau\) is a temperature.

    This is also known as the InfoNCE loss (a code sketch follows this list). For the mutual information between \(f(x)\) and \(f(x^+)\), it gives the lower bound \(\mathrm{MI}[f(x), f(x^+)] \geq \log(N) - \mathcal L\).

    The more samples \(N\), the tighter the bound and the larger the guaranteed mutual information.
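A minimal PyTorch sketch of the InfoNCE loss above, using cosine-similarity scores obtained by normalizing the features. The temperature \(\tau = 0.1\) and the batch layout are illustrative assumptions.

```python
# InfoNCE: cosine-similarity scores between each anchor and N candidates
# (1 positive + N-1 negatives), then multi-class cross-entropy with the
# positive as the "correct class".
import torch
import torch.nn.functional as F

def info_nce(f_anchor, f_pos, f_neg, tau=0.1):
    """f_anchor, f_pos: (B, D); f_neg: (B, N-1, D)."""
    f_anchor = F.normalize(f_anchor, dim=-1)               # unit norm -> dot product = cosine sim
    f_pos = F.normalize(f_pos, dim=-1)
    f_neg = F.normalize(f_neg, dim=-1)

    s_pos = (f_anchor * f_pos).sum(dim=-1, keepdim=True)   # (B, 1)
    s_neg = torch.einsum("bd,bnd->bn", f_anchor, f_neg)    # (B, N-1)
    logits = torch.cat([s_pos, s_neg], dim=1) / tau        # (B, N)

    labels = torch.zeros(logits.size(0), dtype=torch.long) # positive sits at index 0
    return F.cross_entropy(logits, labels)

# Usage: features of an image, its augmentation (positive), and other images (negatives)
B, N, D = 16, 128, 64
loss = info_nce(torch.randn(B, D), torch.randn(B, D), torch.randn(B, N - 1, D))
```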

Momentum Contrast

*(figure: Momentum Contrast (MoCo) — a query encoder trained by backpropagation, a momentum-updated key encoder, and a queue of keys used as negatives)*
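In MoCo, the keys are produced by a momentum-updated copy of the encoder and stored in a queue, so negatives do not have to come from the current batch. A minimal sketch of these two ingredients; the encoder architecture, momentum \(m = 0.999\), and queue size are illustrative assumptions.

```python
# MoCo's momentum update of the key encoder and its queue of negative keys.
import torch
import torch.nn as nn

encoder_q = nn.Linear(32, 16)                      # query encoder (trained by backprop)
encoder_k = nn.Linear(32, 16)                      # key encoder (updated by momentum only)
encoder_k.load_state_dict(encoder_q.state_dict())
for p in encoder_k.parameters():
    p.requires_grad = False                        # keys carry no gradient

queue = torch.randn(4096, 16)                      # queue of past keys used as negatives
m = 0.999

@torch.no_grad()
def momentum_update():
    # theta_k <- m * theta_k + (1 - m) * theta_q
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1 - m)

@torch.no_grad()
def enqueue(keys):
    # drop the oldest keys, append the newest batch of keys
    global queue
    queue = torch.cat([queue[keys.size(0):], keys], dim=0)
```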

Barlow Twins

*(figure: Barlow Twins — two augmented views of the same batch are passed through the same network, giving embeddings \(Z^A\) and \(Z^B\))*

**Goal:** reduce redundancy between neurons, as well as increase similarity between \(Z^A\) and \(Z^B\).

In other words, each neuron should be invariant to data augmentations but independent of the other neurons.

\(\mathcal L_{BT} = \sum_i \left(1 - \mathcal C_{ii}\right)^2 + \lambda \sum_i \sum_{j \neq i} \mathcal C_{ij}^2\) — the first term enforces invariance (\(\mathcal C_{ii} \to 1\)), the second reduces redundancy (\(\mathcal C_{ij} \to 0\)).

\(\mathcal C\) is just the cross-correlation matrix of \(Z^A\) and \(Z^B\) (the covariance of the batch-normalized embeddings)! \(\mathcal C_{ij} = 0\) means no correlation between neuron \(i\) and neuron \(j\).

No negative samples are needed anymore!
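A minimal PyTorch sketch of the Barlow Twins objective above; the trade-off weight \(\lambda\) is an illustrative assumption.

```python
# Barlow Twins: standardize each neuron over the batch, cross-correlate the two
# views, push the diagonal of C to 1 (invariance) and the off-diagonal to 0
# (redundancy reduction).
import torch

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """z_a, z_b: (B, D) embeddings of two augmentations of the same batch."""
    B, D = z_a.shape
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = (z_a.T @ z_b) / B                                   # (D, D) cross-correlation matrix

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()          # invariance: C_ii -> 1
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()  # redundancy: C_ij -> 0
    return on_diag + lam * off_diag

loss = barlow_twins_loss(torch.randn(256, 128), torch.randn(256, 128))
```

Note that the loss only compares the two embeddings of the same batch, which is why no negative samples are required.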