generator = load_csm_1b_local(model_path="/mnt/d/csm-chekpoints/", device="cuda")
text = "Мени азыр угуп жатасыздарбы?"  # Kyrgyz: "Can you hear me now?"
generate_streaming_audio(
    generator=generator,
    text=text,
    speaker=0,
    context=[],  # No context needed for basic generation
    output_file="streaming_audio.wav",
)
Are these steps correct to use a finetuned model?
config.json and model.safetensors
loadandmergecheckpoint.py file and got another model.safetensors file
For inference I used the load_csm_1b_local function, as in the call above.
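One quick sanity check before inference is to confirm that the directory passed as `model_path` actually holds both files named in the steps. The sketch below is only that: `missing_checkpoint_files` is a hypothetical helper, not part of the CSM code, and it assumes (unconfirmed) that `load_csm_1b_local` expects the merged `model.safetensors` to sit next to `config.json` in that directory.

```python
from pathlib import Path

# File names taken from the steps in the question; the helper is hypothetical.
REQUIRED_FILES = ("config.json", "model.safetensors")


def missing_checkpoint_files(model_path):
    """Return the required file names that are absent from model_path."""
    root = Path(model_path)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Same directory as in the load_csm_1b_local call.
    missing = missing_checkpoint_files("/mnt/d/csm-chekpoints/")
    if missing:
        print("Missing from checkpoint directory:", ", ".join(missing))
    else:
        print("Both files are present.")
```

This only checks that the files exist. It says nothing about whether the merged weights are the ones the loader reads.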