1. Introduction

In this tutorial, we’re going to discuss the Learning Rate Warm-up, a heuristic that starts training with a small value of the hyper-parameter called the Learning Rate (LR) and gradually raises it to its target value during the first training steps.

2. Context

3. Dive Into the Learning Rate Warm-up Heuristic

4. Different Types of Learning Rate Warm-up Heuristics and Alternatives

There exist several types of LR warm-ups. Let’s go through some of them below:

[Figure: LR schedule]

We denote by \nu the target LR (the value used once the warm-up ends) and by w the number of warm-up steps (epochs or iterations):

  • Constant Warm-up: a constant LR value \nu_0 < \nu is used to warm up the network. Then, the training continues directly with LR \nu. One drawback is the abrupt jump of the LR from \nu_0 to \nu at the end of the warm-up
  • Gradual or linear Warm-up: the warm-up starts at LR \nu_0 and linearly increases to \nu in w steps. In other words, the LR at step j is \nu_j = \nu_0 + j\delta, with \delta = (\nu - \nu_0) / (w - 1). The gradual change smooths the warm-up by connecting \nu_0 and \nu through several intermediate LR values (see the sketch after this list)
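
As an illustration, here is a minimal plain-Python sketch of the two schedules above; the helper names (constant_warmup, linear_warmup) and the example values are our own and not tied to any specific library:

```python
def constant_warmup(step, warmup_steps, nu_0, nu):
    """Constant warm-up: keep the LR at nu_0 for the first warmup_steps, then switch to nu."""
    return nu_0 if step < warmup_steps else nu


def linear_warmup(step, warmup_steps, nu_0, nu):
    """Linear warm-up: nu_j = nu_0 + j * delta, with delta = (nu - nu_0) / (warmup_steps - 1)."""
    if warmup_steps <= 1 or step >= warmup_steps:
        return nu
    delta = (nu - nu_0) / (warmup_steps - 1)
    return nu_0 + step * delta


# Example: 5 warm-up steps ramping from nu_0 = 1e-5 to the target LR nu = 1e-3
for j in range(7):
    print(j, constant_warmup(j, 5, 1e-5, 1e-3), linear_warmup(j, 5, 1e-5, 1e-3))
```

After the warm-up window, both helpers simply return the target LR \nu, from which a decaying schedule usually takes over.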

Here again, there is no principled method for choosing the most efficient LR warm-up type other than trying them one by one with different LR values.

There are also alternatives to this heuristic, such as an optimizer called RAdam. This relatively recent optimizer provides better control of the gradient variance, which is necessary when the model trains at a high LR. RAdam detects variance instabilities and smoothly adjusts the effective LR to avoid divergence at the earliest training steps.
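
As a rough sketch, assuming a PyTorch environment where torch.optim.RAdam is available (PyTorch 1.10+), switching to RAdam is a one-line change; the model and the LR value below are placeholders:

```python
import torch

# Placeholder model; any nn.Module works here.
model = torch.nn.Linear(10, 2)

# RAdam rectifies the variance of the adaptive term during the earliest steps,
# which reduces (or removes) the need for an explicit LR warm-up phase.
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)
```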

5. Conclusion

In this article, we discussed the Learning Rate Warm-up as a simple yet effective technique for stabilizing the early phase of training. In most common training setups, it is combined with a high base LR that decays over time, which is a standard and reliable recipe for obtaining a good model, including large models such as Transformers.