Blog | kuncoro.io

2026

SGLang Diffusion: Why Serving Diffusion Models Is Nothing Like LLM Serving

How SGLang extends from LLM to diffusion model serving for image and video generation, and why the compute patterns, memory management, and parallelism strategies are fundamentally different.

HolmesGPT Wiki: AI-Powered Troubleshooting for Cloud Native

A comprehensive guide to HolmesGPT, the CNCF Sandbox AI agent that investigates Kubernetes and cloud native incidents using an agentic loop with live observability data.

Prefill-Decode Disaggregation in SGLang: Why Splitting LLM Inference Phases Changes Everything

How SGLang's PD disaggregation separates the compute-heavy prefill phase from the memory-bound decode phase, delivering 2-3x throughput gains for production LLM serving.

Kubernetes Gang Scheduling for LLM Inference: From Volcano to OME

How gang scheduling prevents deadlocks in multi-GPU LLM serving, and why OME with Kueue makes it declarative.

Why Monorepos Are a Huge Enabler for Agentic Coding

Five reasons why monorepos unlock more effective AI-assisted coding and debugging in large systems, plus the trade-offs to consider.

SGLang Prometheus Metrics: A Guide for Production Monitoring

Goals for 2026

2025

Understanding Mini-SGLang: RadixAttention and Overlap Scheduling

2019