Description
I am reading chronicles_prequel, and the last table in the section "Trying with ZeRO_STAGE=0/1" shows higher TFLOPs with ZeRO_STAGE=1 than with ZeRO_STAGE=0.
ZeRO stage 1 should reduce memory usage, but why does it also improve throughput when all other parameters are the same?
| Nodes | Size | ZS | DP | TP | PP | MBS | GBS | Mem | Sec/it | TFLOPs | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 48 | 181B | 1 | 4 | 8 | 12 | 2 | 2048 | 37GB | 120.29 | 134.02 | 02-21 |
| 48 | 181B | 0 | 4 | 8 | 12 | 2 | 2048 | 72GB | 137.34 | 113.02 | 02-21 |
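
To quantify the gap, here is a quick Python sketch that only compares the numbers from the two rows above. The dictionary layout and the variable names are my own and do not come from the chronicles.

```python
# Values copied from the two table rows, keyed by ZeRO stage (ZS).
# The field names are illustrative, not taken from the chronicles.
rows = {
    1: {"mem_gb": 37, "sec_per_it": 120.29, "tflops": 134.02},
    0: {"mem_gb": 72, "sec_per_it": 137.34, "tflops": 113.02},
}


def ratio(metric, numerator_stage, denominator_stage):
    """Return rows[numerator_stage][metric] / rows[denominator_stage][metric]."""
    return rows[numerator_stage][metric] / rows[denominator_stage][metric]


mem_ratio = ratio("mem_gb", 0, 1)        # memory of ZS=0 relative to ZS=1
time_ratio = ratio("sec_per_it", 0, 1)   # iteration time of ZS=0 relative to ZS=1
tflops_ratio = ratio("tflops", 1, 0)     # TFLOPs of ZS=1 relative to ZS=0

print(f"Mem    ZS0/ZS1: {mem_ratio:.2f}x")
print(f"Sec/it ZS0/ZS1: {time_ratio:.2f}x")
print(f"TFLOPs ZS1/ZS0: {tflops_ratio:.2f}x")
```

So with ZeRO stage 1 the memory is roughly half, and ZeRO stage 0 takes about 14% longer per iteration. The second part is what I do not understand.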