When I run SFT on an MoE model with Megatron as the backend, can I use data packing? If so, could you tell me how to set the config?