It looks like the current implementation grows the KV buffer indefinitely, which seems to go against streaming principles: you'd want a continuous pipe that doesn't blow up VRAM. When streaming, wouldn't we want to keep VRAM usage bounded by using a sliding window of context?
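To illustrate what I mean, here is a minimal sketch of a bounded cache. It is not based on this project's code: `SlidingWindowKVCache`, `window`, and `append` are hypothetical names, and plain strings stand in for the real key/value tensors. The only point is that memory stays constant once the window is full.

```python
from collections import deque


class SlidingWindowKVCache:
    """Hypothetical bounded KV cache: keeps only the last `window` entries."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen discards the oldest entry automatically
        # when a new one is appended to a full buffer.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        # One key/value pair per generated step (placeholders here).
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


cache = SlidingWindowKVCache(window=4)
for step in range(10):
    cache.append(f"k{step}", f"v{step}")

# After 10 steps only the 4 most recent entries remain.
print(len(cache), list(cache.keys))
```

Whether dropping the oldest entries is acceptable for output quality presumably depends on the model, so this only shows the memory side of the question.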