The LeepCast Blog

Stories worth your attention

No hype, no filler — reporting and analysis on AI, software, hardware, and security, written for the people who build things.

Reasoning Models: When to Pay a Model to Think
AI & ML

A new class of models thinks before it answers — burning extra time and tokens to reason through hard problems. Sometimes that’s transformative. Often it’s an expensive way to answer a question a fast model already nails.

Dhileep Kumar · Jun 13, 2026 · 7 min read
Getting Real Work Out of AI Coding Agents
AI & ML

Most people use Claude Code and Cursor like fancy autocomplete and wonder why the magic runs out. The teams getting real leverage treat them as agents you delegate whole tasks to — and that’s a different skill.

Dhileep Kumar · Jun 13, 2026 · 7 min read
Prompt Engineering That Actually Works
AI & ML

People keep declaring prompt engineering dead. Meanwhile it’s still the cheapest, fastest lever you have on output quality — if you know the handful of techniques that actually move the needle.

Dhileep Kumar · Jun 13, 2026 · 7 min read
Choosing a Vector Database in 2026
AI & ML

Every RAG app and AI feature with memory needs somewhere to store embeddings and search them fast. There are a dozen vector databases now — but the real choice is simpler than the marketing makes it look.

Dhileep Kumar · Jun 13, 2026 · 7 min read
Streaming LLM Responses: Real-Time UX for AI Apps
AI & ML

Waiting ten seconds for a full AI response feels broken. Streaming the answer token by token, the moment it starts generating, is the difference between an app that feels slow and one that feels alive. Here’s how.

Dhileep Kumar · Jun 12, 2026 · 7 min read
Semantic Caching: Cut Your LLM Bill Without Hurting Quality
AI & ML

Teams are quietly killing AI features — not because they don’t work, but because the token bill doesn’t justify them. Semantic caching is the fix: serve a cached answer when someone asks the same thing in different words.

Dhileep Kumar · Jun 12, 2026 · 7 min read
Multi-Agent Orchestration: Coordinating Specialist LLM Agents
AI & ML

One agent doing everything turns into a confused generalist with a bloated prompt. The fix that’s everywhere in 2026 is orchestration — a coordinator routing work to focused specialist agents. Here’s how to build one.

Dhileep Kumar · Jun 12, 2026 · 7 min read
GraphRAG: When Vector Search Isn’t Enough
AI & ML

Vector RAG finds paragraphs that look similar to your question. But some questions are about how things connect — and for those you need a graph. Here’s when plain retrieval breaks, and how GraphRAG fixes it.

Dhileep Kumar · Jun 12, 2026 · 7 min read
Guardrails for LLM Apps: Stopping Prompt Injection and Bad Output
AI & ML

An LLM app takes untrusted text in and sends model-generated text out — two open doors. Guardrails are the checks on both sides that keep prompt injection, leaked data, and bad output from reaching anyone.

Dhileep Kumar · Jun 11, 2026 · 7 min read
LLM Observability: Tracing What Your AI Does in Production
AI & ML

You shipped the LLM feature, the demo worked, and now it’s a black box serving real users. Observability is how you see what your AI is actually doing — before a silent quality drop becomes a support queue.

Dhileep Kumar · Jun 11, 2026 · 7 min read
Giving Your AI Agent Memory
AI & ML

An agent that forgets everything the moment a session ends isn’t an assistant — it’s a very smart goldfish. Memory is the discipline that turns a stateless model into something that actually knows you.

Dhileep Kumar · Jun 11, 2026 · 7 min read
Fine-Tuning a Small Model with LoRA
AI & ML

Prompting and RAG cover most needs, but sometimes you need the model itself to change. LoRA made fine-tuning cheap enough to do on a single GPU — here’s when it’s worth it and how it actually works.

Dhileep Kumar · Jun 11, 2026 · 7 min read
Structured Outputs: Getting Reliable JSON and Tool Calls from LLMs
AI & ML

Ask a model for JSON and it’ll often hand you prose, a markdown fence, and a trailing apology. For anything automated, “often valid” is a bug. Here’s how to make an LLM return data your code can actually trust.

Dhileep Kumar · Jun 10, 2026 · 7 min read
Context Engineering: Managing What Your LLM Actually Sees
AI & ML

Prompt engineering was about choosing your words. Context engineering is about everything else in the window — what you put in, what you leave out, and in what order. It’s the discipline that separates an LLM demo from an LLM product.

Dhileep Kumar · Jun 10, 2026 · 8 min read
Building a RAG Pipeline That Actually Works
AI & ML

Bolting a vector database onto an LLM gives you a demo. Getting it to answer real questions over real documents is an engineering problem — chunking, retrieval, reranking, and knowing when not to retrieve at all. Here’s the pipeline that survives production.

Dhileep Kumar · Jun 10, 2026 · 8 min read
How to Evaluate LLM Apps: Evals That Catch Failures Before Production
AI & ML

You can’t assertEquals a language model. That’s why teams ship LLM features blind and find the regressions in production. Evals are the missing discipline — here’s how to build ones that actually catch failures.

Dhileep Kumar · Jun 10, 2026 · 7 min read
Running LLMs Locally: Ollama vs vLLM in 2026
AI & ML

Open models are good enough now that running one on your own hardware is a real choice, not a hobby. The decision usually comes down to two tools — Ollama for ease, vLLM for throughput. Here’s how to pick and run.

Dhileep Kumar · Jun 10, 2026 · 7 min read
Model Context Protocol Explained: Build Your First MCP Server
AI & ML

Every AI tool used to reinvent its own integrations. The Model Context Protocol turned that M×N mess into a standard — and in eighteen months it became the USB-C of AI apps. Here’s what it is and how to ship a server.

Dhileep Kumar · Jun 10, 2026 · 7 min read
Raspberry Pi AI Projects for Beginners
Hardware

Most people meet AI through a text box, which hides the most interesting part: making a model do something in the physical world. A Raspberry Pi is the cheapest, friendliest way to cross that line — and its limits are the lesson.

Dhileep Kumar · Jun 9, 2026 · 7 min read
Deploying LLM Apps on GKE, Step by Step
Software

There’s a wide, quiet gap between an LLM app that works on your laptop and one that survives real users on Kubernetes. GKE closes a lot of it — but only if you know which parts it solves and which it leaves to you.

Dhileep Kumar · Jun 9, 2026 · 8 min read
AI API Gateway Architecture, Explained
Software

Every team that ships more than one LLM feature ends up building the same box in front of the model — usually by accident. Here’s what an AI gateway actually does, a reference design, and the mistakes that bite a year later.

Dhileep Kumar · Jun 9, 2026 · 7 min read
Building AI Agents Using Java + Spring Boot
AI & ML

Everyone reaches for Python to build agents. But an agent is mostly plumbing — a supervised loop around a model — and Spring Boot has been quietly excellent at plumbing for fifteen years. Here’s how to build one where your data already lives.

Dhileep Kumar · Jun 9, 2026 · 7 min read
AI Interview Prep Tools That Actually Work
AI & ML

Most AI job tools overpromise. Interview prep is the rare corner where they quietly overdeliver — if you use them to rehearse, and ignore the ones that promise to take the interview for you.

Dhileep Kumar · Jun 8, 2026 · 6 min read
Can AI Build a Resume Better Than Humans?
AI & ML

Hand the same career history to a person and a model and you get two very different resumes. The honest answer to which is better isn’t one or the other — it’s what happens when you stop treating it as a contest.

Dhileep Kumar · Jun 8, 2026 · 6 min read
The Best AI Tools for Job Hunting in 2026
AI & ML

Every week there’s a new AI tool promising to land you a job. Most are thin wrappers around a chatbot. Here’s how to tell the few that matter from the noise — organized by the job you’re actually trying to get done.

Dhileep Kumar · Jun 8, 2026 · 7 min read
How I Use AI to Apply to 100 Jobs Automatically
AI & ML

I built an AI pipeline that scans, scores, and tailors applications across hundreds of postings. The twist: the goal was never to apply to more jobs — it was to apply to far fewer, and mean it.

Dhileep Kumar · Jun 8, 2026 · 8 min read
The agents are here: what autonomous coding means for engineers
AI & ML

A new wave of AI agents can plan, write, and ship code end to end. We dig into what actually works today — and what's still hype.

Dhileep Kumar · Jun 6, 2026 · 7 min read
Why everyone is rewriting their tooling in Rust
Software

From bundlers to linters, the JS ecosystem is going native. Here's the performance story behind the migration.

Dhileep Kumar · Jun 5, 2026 · 6 min read
The new silicon: a closer look at on-device AI chips
Hardware

NPUs are landing in everything from phones to laptops. What they can do, and why it matters for privacy.

Dhileep Kumar · Jun 4, 2026 · 6 min read
Supply chain attacks are evolving — here's how teams respond
Security

A practical playbook for hardening your dependency graph without grinding shipping to a halt.

Dhileep Kumar · Jun 3, 2026 · 7 min read