Folks, check whether you are getting this warning:
[W reducer.cpp:283] Warning: Grad strides do not match bucket view strides.
This may indicate grad was not created according to the gradient layout contract,
or that the param's strides changed since DDP was constructed.
This is not an error, but may impair performance.
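As the message says, this is not an error, but it may hurt performance. If you are not pressed for time you can just ignore it, but in my case the estimated remaining time kept growing the longer I trained, haha.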
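I ran into the same problem as this guy:
This other guy below, from the Institute of Automation, offered a fix: he thinks the transpose and permute operations cause the stride mismatch, so he calls contiguous to make the data contiguous in memory.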
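To see what that suggestion means in practice, here is a minimal sketch (my own illustration with a made-up tensor shape, not code from that reply) showing that permute only returns a view with rearranged strides, while contiguous copies the data into a standard row-major layout. Whether this alone silences the warning depends on where the non-contiguous tensor shows up in your model, so treat it as a starting point.

```python
import torch

# Hypothetical tensor, shape (batch=2, seq=3, feat=4); freshly created, so contiguous.
x = torch.randn(2, 3, 4)           # strides: (12, 4, 1)

# permute (like transpose) does not move data, it only reorders the strides.
y = x.permute(0, 2, 1)             # shape (2, 4, 3), strides: (12, 1, 4)

# contiguous() copies the values into a new row-major buffer.
z = y.contiguous()                 # shape (2, 4, 3), strides: (12, 3, 1)

print(y.is_contiguous(), y.stride())
print(z.is_contiguous(), z.stride())
```

In a model, this usually means adding `.contiguous()` right after the `transpose` or `permute` call inside `forward`. The copy costs some memory and time, so it is worth adding only where the layout actually matters.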
Here is what I did: