upgrade value residual to learnt mixing per token / head #2218
| Job | Run time |
|---|---|
| | 7m 58s |
| | 7m 58s |
| | 7m 58s |
| | 8m 39s |
| | 8m 39s |
| | 8m 39s |
| | 8m 6s |
| | 8m 6s |
| | 8m 6s |
| | 10m 43s |
| | 10m 43s |
| | 10m 43s |
| | 19m 50s |
| | 19m 50s |
| | 19m 50s |
| | 16m 11s |
| | 16m 11s |
| | 16m 11s |
| | 13m 9s |
| | 13m 9s |
| | 13m 9s |
| | 8m 1s |
| | 5m 51s |
| | 8m 1s |
| | 8m 15s |
| | 8m 15s |
| | 8m 15s |
| | 8m 16s |
| | 8m 16s |
| | 8m 16s |
| Total | 5h 25m 14s |
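The last row is the sum of the 30 per-job run times listed above it. The repeated values are separate jobs, not duplicates, because the total only works out if every row is counted. Below is a minimal Python sketch that re-derives the total. It assumes durations use only the `h`/`m`/`s` units shown here, and `RUN_TIMES`, `to_seconds`, and `to_duration` are hypothetical names used for illustration.

```python
import re

# Per-job run times, in table order (job names were not preserved).
RUN_TIMES = [
    "7m 58s", "7m 58s", "7m 58s",
    "8m 39s", "8m 39s", "8m 39s",
    "8m 6s", "8m 6s", "8m 6s",
    "10m 43s", "10m 43s", "10m 43s",
    "19m 50s", "19m 50s", "19m 50s",
    "16m 11s", "16m 11s", "16m 11s",
    "13m 9s", "13m 9s", "13m 9s",
    "8m 1s", "5m 51s", "8m 1s",
    "8m 15s", "8m 15s", "8m 15s",
    "8m 16s", "8m 16s", "8m 16s",
]

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '5h 25m 14s' to a number of seconds."""
    return sum(int(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)([hms])", duration))


def to_duration(seconds: int) -> str:
    """Format a number of seconds back into the 'Xh Ym Zs' style."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


total = sum(to_seconds(t) for t in RUN_TIMES)
print(to_duration(total))  # 5h 25m 14s
```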