In Paddle, if a trainable parameter has a regularizer defined in its ParamAttr, the weight_decay set on the optimizer is ignored for that parameter:
```python
my_conv2d = Conv2D(
    in_channels=10,
    out_channels=10,
    kernel_size=1,
    stride=1,
    padding=0,
    weight_attr=ParamAttr(regularizer=L2Decay(coeff=0.01)),  # <----- this one takes priority
    bias_attr=False)
```
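The precedence rule can be sketched as a tiny helper (`pick_regularizer` is a hypothetical name for illustration, not Paddle's actual internal function):

```python
# A minimal sketch (not Paddle source code) of the rule described above:
# the regularizer attached to a parameter via ParamAttr wins, and the
# optimizer-level weight_decay is only used as a fallback.
def pick_regularizer(param_regularizer, optimizer_weight_decay):
    if param_regularizer is not None:
        return param_regularizer  # ParamAttr regularizer takes priority
    return optimizer_weight_decay

print(pick_regularizer("L2Decay(0.01)", "L2Decay(0.1)"))  # prints L2Decay(0.01)
```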
In fact, weight decay and regularization are one and the same thing. Below is a small experiment with the L2Decay API:
```python
import paddle
from paddle.regularizer import L2Decay

paddle.seed(1107)
linear = paddle.nn.Linear(3, 4, bias_attr=False)
old_linear_weight = linear.weight.detach().clone()
# print(old_linear_weight.mean())
inp = paddle.rand(shape=[2, 3], dtype="float32")
out = linear(inp)
coeff = 0.1
loss = paddle.mean(out)
# loss += 0.5 * coeff * (linear.weight ** 2).sum()
momentum = paddle.optimizer.Momentum(
    learning_rate=0.1,
    parameters=linear.parameters(),
    weight_decay=L2Decay(coeff),
)
loss.backward()
momentum.step()
delta_weight = linear.weight - old_linear_weight
# print(linear.weight.mean())
# print(-delta_weight / linear.weight.grad)  # learning rate
print(delta_weight)
momentum.clear_grad()
```
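The update that experiment should produce can be reproduced without Paddle. The NumPy sketch below assumes Paddle's Momentum starts with a zero velocity buffer and that L2Decay folds the decay term `coeff * w` into the gradient before the momentum update; the weight and gradient values are arbitrary stand-ins, not the ones from the experiment above:

```python
import numpy as np

lr, coeff, mu = 0.1, 0.1, 0.9
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 4)).astype("float32")     # stand-in for linear.weight
grad = rng.standard_normal((3, 4)).astype("float32")  # stand-in for linear.weight.grad

# With L2Decay(coeff), the decay term coeff * w is added to the gradient
# before the momentum update; the velocity buffer starts at zero.
velocity = mu * np.zeros_like(w) + (grad + coeff * w)
delta_weight = -lr * velocity

# The same first step via an explicit penalty: adding 0.5 * coeff * (w ** 2).sum()
# to the loss contributes the gradient coeff * w, giving an identical update.
delta_penalty = -lr * (grad + coeff * w)

assert np.allclose(delta_weight, delta_penalty)
```

This is why uncommenting the `loss += 0.5 * coeff * (linear.weight ** 2).sum()` line and removing `weight_decay` from the optimizer should give the same `delta_weight`.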
Here `coeff` is the weight-decay coefficient, also known as the regularization coefficient.
I am a bit behind the times: I had assumed "weight decay" meant multiplying the weights by a factor like 0.9999 at every iteration to make them shrink. In practice, everyone just adds the L1/L2 norm of the parameters to the loss and optimizes it as part of the loss function. Five years ago I called this trick for keeping parameters from growing too large "regularization"; now it goes by "weight decay", heh.
loss += 0.5 * coeff * reduce_sum(w ** 2)
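As a sanity check on that identity, the gradient of the penalty term 0.5 * coeff * sum(w²) is coeff * w, which is exactly what L2Decay adds to the raw gradient. A quick finite-difference check in NumPy, with arbitrary values:

```python
import numpy as np

coeff = 0.1
rng = np.random.default_rng(1)
w = rng.standard_normal((3, 4))

def penalty(w):
    # the extra loss term: 0.5 * coeff * sum(w ** 2)
    return 0.5 * coeff * (w ** 2).sum()

# central finite differences for the gradient of the penalty
eps = 1e-5
num_grad = np.zeros_like(w)
for idx in np.ndindex(w.shape):
    wp, wm = w.copy(), w.copy()
    wp[idx] += eps
    wm[idx] -= eps
    num_grad[idx] = (penalty(wp) - penalty(wm)) / (2 * eps)

# the analytic gradient is coeff * w, exactly the L2Decay term
assert np.allclose(num_grad, coeff * w, atol=1e-6)
```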