kuncoro.io
Blog
2026
  • SGLang Diffusion: Why Serving Diffusion Models Is Nothing Like LLM Serving
    How SGLang extends from LLM serving to diffusion model serving for image and video generation, and why the compute patterns, memory management, and parallelism strategies are fundamentally different.
  • HolmesGPT Wiki: AI-Powered Troubleshooting for Cloud Native
    A comprehensive guide to HolmesGPT, the CNCF Sandbox AI agent that investigates Kubernetes and cloud native incidents using an agentic loop with live observability data.
  • Prefill-Decode Disaggregation in SGLang: Why Splitting LLM Inference Phases Changes Everything
    How SGLang's PD disaggregation separates the compute-heavy prefill phase from memory-bound decode, delivering 2-3x throughput gains for production LLM serving.
  • Kubernetes Gang Scheduling for LLM Inference: From Volcano to OME
    How gang scheduling prevents deadlocks in multi-GPU LLM serving, and why OME with Kueue makes it declarative.
  • Why Monorepos Are a Huge Enabler for Agentic Coding
    Five reasons why monorepos unlock more effective AI-assisted coding and debugging in large systems, plus the trade-offs to consider.
  • SGLang Prometheus Metrics: A Guide for Production Monitoring
  • Goals for 2026
2025
  • Understanding Mini-SGLang: RadixAttention and Overlap Scheduling
2019
  • Kubernetes Cluster Creation on Baremetal Host Using Cluster API
  • Intro to Open Policy Agent