# simple_LLM

Some simple or fake implementations of LLM infra functions.

TODO:

- training:
  - parallel_implementation
- inference (vLLM features):
  - clean_vllm
  - chunked paged attention
- CUDA algorithm:
  - flash attention