  • Boston, MA

JonSnow1807/README.md

Chinmay Shrivastava

Software Engineer × AI/ML Developer × Performance Architect

LinkedIn | Email | Hugging Face



About Me

I'm a software engineer who transforms complex challenges into elegant solutions that scale. From optimizing CUDA kernels for 1.46x speedups to building real-time platforms with sub-500ms latency, I thrive at the intersection of technical excellence and business impact.

My approach is simple: measure twice, optimize once, ship constantly. Whether it's achieving 94% accuracy in production ML systems or rendering 1M+ points at 858 FPS, I believe in pushing the boundaries of what's possible while keeping the user experience at the center.

Currently seeking opportunities to tackle meaningful challenges at companies building the future.


🏆 Impact-Driven Projects

🤖 Intelligent Knowledge Assistant

94% accuracy | 120ms latency | Production RAG

Built a production RAG system with fine-tuned Llama-3.1-8B that matches GPT-4 quality at a fraction of the cost. Implemented custom attention caching that reduced latency by 73%, enabling real-time responses.
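To illustrate the general idea behind attention (KV) caching, here is a minimal single-head sketch in plain Python. It is not the project's implementation: the `KVCache` class and `attend` helper are hypothetical names, and the custom optimizations mentioned above are not reproduced. It only shows why caching helps: each decode step appends one key/value pair instead of recomputing them for every earlier token.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all stored positions."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Keeps keys/values of past tokens so each step only adds one new pair."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append the new token's key/value, then attend over the whole history.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)
```

A decode loop would call `cache.step(q, k, v)` once per generated token, so the per-step cost grows with the history length rather than recomputing projections for the full sequence.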

Technical Deep Dive
  • Architecture: Hierarchical vector indexing with FAISS
  • Innovation: Custom KV-cache optimization for transformers
  • Stack: PyTorch, LangChain, FastAPI, PostgreSQL
  • Deployment: Kubernetes with horizontal autoscaling
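The hierarchical indexing idea can be sketched as a two-level, coarse-to-fine search: pick the nearest coarse centroids first, then scan only their buckets. The sketch below is a standard-library stand-in, not FAISS and not the project's code. `TwoLevelIndex`, `nprobe`, and the hand-picked centroids are assumptions made for illustration (a real index would learn centroids from the data).

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TwoLevelIndex:
    """Coarse centroids on top, per-centroid buckets of vectors underneath."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def add(self, doc_id, vec):
        # Each vector is stored in the bucket of its closest centroid.
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((doc_id, vec))

    def search(self, query, k=3, nprobe=1):
        # Scan only the nprobe closest buckets instead of every vector.
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:k]]
```

Raising `nprobe` trades speed for recall, since a true neighbor can sit just across a bucket boundary.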

🎬 Real-time Collaboration Platform

<500ms sync | WebSocket protocol | 85% bandwidth optimized

Created a video watch party platform that keeps playback synchronized across distributed clients. Engineered a binary WebSocket protocol with delta compression, achieving sub-500ms sync latency.

Technical Deep Dive
  • Protocol: Custom binary format over WebSocket
  • Scaling: Redis pub/sub for horizontal distribution
  • Stack: React, NestJS, Socket.IO, Redis
  • Security: JWT with room-based permissions
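As a rough illustration of a binary message with delta compression, the sketch below packs only the playback fields that changed, prefixed by a one-byte bitmask. The wire layout, the field names (`position_ms`, `playing`, `rate_pct`), and both functions are hypothetical. The platform's actual format is not documented here.

```python
import struct

# Hypothetical playback state: (field name, struct format code).
FIELDS = (("position_ms", "I"), ("playing", "B"), ("rate_pct", "H"))


def encode_delta(prev, curr):
    """Pack only the fields that differ from prev, after a 1-byte bitmask."""
    mask = 0
    payload = b""
    for bit, (name, fmt) in enumerate(FIELDS):
        if prev.get(name) != curr[name]:
            mask |= 1 << bit
            payload += struct.pack("!" + fmt, curr[name])
    return struct.pack("!B", mask) + payload


def decode_delta(prev, data):
    """Apply a delta message on top of the previously known state."""
    state = dict(prev)
    mask = data[0]
    offset = 1
    for bit, (name, fmt) in enumerate(FIELDS):
        if mask & (1 << bit):
            (state[name],) = struct.unpack_from("!" + fmt, data, offset)
            offset += struct.calcsize("!" + fmt)
    return state
```

In this layout a position-only update takes 5 bytes and a full state takes 8, because unchanged fields are never sent. Passing an empty dict as `prev` produces a full snapshot for clients that just joined.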

⚡ GPU Performance Engineering

1.46x speedup | 95.3% bandwidth utilization | Kernel fusion

Developed fused CUDA kernels for transformer models that reach 95.3% of theoretical memory bandwidth. Fusing LayerNorm with its activation into a single kernel yields a 1.46x speedup, enabling faster inference for large language models.

Technical Deep Dive
  • Technique: Kernel fusion for LayerNorm + Activation
  • Memory: Coalesced access patterns, shared memory
  • Stack: CUDA C++, PyTorch extensions, nvprof
  • Impact: 1.46x speedup on the fused LayerNorm operator for LLM inference
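What a fused LayerNorm + activation kernel computes can be written as a CPU reference in plain Python, the kind of function one might use to check a CUDA kernel's output. This is a sketch under assumptions: GELU is assumed as the activation (the text above does not name it), and `fused`, `unfused`, and `eps` are illustrative names. It shows the numerical contract only, not the memory-access optimizations.

```python
import math


def gelu(x):
    """Exact GELU using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layernorm(row, gamma, beta, eps=1e-5):
    """Normalize one row to zero mean and unit variance, then scale and shift."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


def unfused(row, gamma, beta):
    """Two steps: the normalized row is materialized before the activation."""
    normed = layernorm(row, gamma, beta)  # intermediate buffer
    return [gelu(x) for x in normed]


def fused(row, gamma, beta, eps=1e-5):
    """One step: the activation is applied as each element is normalized."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gelu((x - mean) * inv_std * g + b)
            for x, g, b in zip(row, gamma, beta)]
```

On a GPU the unfused version writes the intermediate row to global memory and reads it back. The fused version skips that round trip, which is the usual source of the gain for a memory-bound operator.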

🎮 High-Performance 3D Visualization

858 FPS | 1M+ points | 7.2x faster

Built a 3D point cloud viewer that outperforms industry standards by 7.2x. Implemented custom spatial indexing and SIMD optimizations to achieve real-time rendering of massive datasets.

Technical Deep Dive
  • Algorithm: Custom octree with frustum culling
  • Rendering: Instanced drawing with GPU batching
  • Stack: C++17, OpenGL 4.5, GLM, ImGui
  • Optimization: SIMD intrinsics for transforms
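The octree-plus-frustum-culling approach can be sketched in a few lines of Python (the viewer itself is C++17, so this is purely illustrative). `Node`, `box_outside_plane`, and `visible_points` are hypothetical names, and planes are assumed to be `(nx, ny, nz, d)` tuples where a point is inside when `n·p + d >= 0`.

```python
class Node:
    """Octree node: splits its box into octants once it holds too many points."""

    def __init__(self, lo, hi, points, max_points=4, depth=0, max_depth=8):
        self.lo, self.hi = lo, hi
        self.points = points
        self.children = []
        if len(points) > max_points and depth < max_depth:
            mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
            groups = {}
            for p in points:
                octant = tuple(p[i] >= mid[i] for i in range(3))
                groups.setdefault(octant, []).append(p)
            for octant, pts in groups.items():
                c_lo = tuple(mid[i] if octant[i] else lo[i] for i in range(3))
                c_hi = tuple(hi[i] if octant[i] else mid[i] for i in range(3))
                self.children.append(
                    Node(c_lo, c_hi, pts, max_points, depth + 1, max_depth))
            self.points = []  # points now live in the children


def box_outside_plane(lo, hi, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Test the corner farthest along the normal; if it is behind, all are.
    px = hi[0] if nx >= 0 else lo[0]
    py = hi[1] if ny >= 0 else lo[1]
    pz = hi[2] if nz >= 0 else lo[2]
    return nx * px + ny * py + nz * pz + d < 0


def visible_points(node, planes):
    """Collect points from every node whose box is not fully culled."""
    if any(box_outside_plane(node.lo, node.hi, pl) for pl in planes):
        return []  # the whole subtree is skipped with a single test
    if not node.children:
        return list(node.points)
    out = []
    for child in node.children:
        out.extend(visible_points(child, planes))
    return out
```

Culling here is conservative at node granularity: a leaf that straddles a plane returns all of its points, which is usually acceptable because the GPU clips the remainder.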


🛠 Technical Expertise

Python
TypeScript
C++
React
PyTorch
Docker
Kubernetes
Systems
📚 View Complete Tech Stack
Core Languages:
  Expert: [Python, TypeScript, C++, JavaScript]
  Proficient: [CUDA, SQL, Bash]

AI/ML Stack:
  Frameworks: [PyTorch, Transformers, LangChain, scikit-learn]
  Techniques: [Fine-tuning, RAG, Embeddings, Vector Search]
  Production: [ONNX, TensorRT, Model Quantization, Batching]
  
Backend Engineering:
  Python: [FastAPI, Django, Flask, Celery]
  Node.js: [NestJS, Express, Socket.IO, Bull]
  APIs: [REST, GraphQL, gRPC, WebSockets]
  
Frontend Development:
  Core: [React, Next.js, Redux, TypeScript]
  UI: [Tailwind CSS, Material-UI, Framer Motion]
  Advanced: [Three.js, D3.js, WebRTC, Canvas API]
  
Data & Infrastructure:
  Databases: [PostgreSQL, MongoDB, Redis, Elasticsearch]
  Vector DBs: [Pinecone, FAISS, Chroma, Qdrant]
  Message Queues: [RabbitMQ, Kafka, Redis Pub/Sub]
  
DevOps & Cloud:
  Containers: [Docker, Docker Compose, Buildkit]
  Orchestration: [Kubernetes, Helm, ArgoCD]
  CI/CD: [GitHub Actions, GitLab CI, Jenkins]
  Cloud: [AWS (EC2, S3, Lambda), GCP, Vercel]
  
Performance & Systems:
  GPU: [CUDA, cuDNN, Thrust, OptiX]
  CPU: [SIMD, OpenMP, Threading, Profiling]
  Graphics: [OpenGL, Vulkan, Shaders]

💡 Engineering Philosophy



User First
Every optimization should improve the user experience


Data Driven
Measure twice, optimize once, validate always


Ship Fast
Perfect tomorrow loses to good today


Think Scale
Build for 10x growth from day one

📈 What I Bring to Your Team

| Capability | Evidence |
| --- | --- |
| 🏗️ Full Product Ownership | Shipped end-to-end solutions from concept to production |
| ⚡ Performance Excellence | 1.46x-7.2x improvements across different domains |
| 📊 Production Experience | Deployed scalable systems with real-world usage |
| 🎯 Technical Precision | 94% ML accuracy, 95.3% GPU efficiency achieved |
| 🚀 Rapid Execution | From idea to MVP in days, not months |

🎯 Looking For My Next Adventure

I'm excited about joining teams that are:

  • Building products that matter - Real problems, real impact, real users
  • Pushing technical boundaries - Where "impossible" is just another challenge
  • Moving fast with purpose - Velocity with vision, not just for speed's sake
  • Creating the future - Not just following trends, but setting them

Open to Opportunities In:


📬 Let's Connect


I'm always excited to discuss challenging problems and explore how I can contribute to your team's success.

Whether you're building the next breakthrough in AI, scaling systems to billions, or creating products that change lives - let's talk.


Status: Actively seeking new opportunities | Availability: Immediate | Location: Flexible/Remote

Pinned

  1. llm-knowledge-assistant (Public)

    Production-ready RAG system with fine-tuned Llama-3.1-8B for expert-level domain Q&A

    Python

  2. Fused-LayerNorm-CUDA-Operator (Public)

    High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, w…

    Python

  3. Mustard-Watch-Party (Public)

    Real-time video synchronization platform for YouTube watch parties. Built with React, NestJS, Socket.IO WebSockets, PostgreSQL & Prisma ORM. Features <500ms sync latency, multi-user rooms, JWT auth…

    TypeScript

  4. pytorch-autotune (Public)

    🚀 2-4x faster PyTorch training with one line of code. Beats torch.compile by 79%. Zero config, automatic hardware optimization for T4/V100/A100/H100 GPUs.

    Python

  5. student-scheduler (Public)

    Google OR-Tools constraint solver scheduling 500+ students. Flask/PostgreSQL/Redis backend, Docker/K8s deployment, CI/CD. Zero conflicts, 60-second optimization.

    Python