Posts tagged "python"
4 articles
The Hidden Cost of Context Windows: Managing Tokens in Production
128k tokens sounds like infinite space until you're paying $0.40 per conversation and users are hitting limits mid-session. Here's how I actually manage context in long-running AI applications.
Async Python Patterns for AI Backends (That I Learned the Hard Way)
FastAPI and async Python are the obvious choice for AI backends — until you hit subtle concurrency bugs, blocked event loops, and streaming responses that silently drop chunks. Here's how I actually structure these systems.
Prompts Are Code: How I Manage Them Like a Senior Engineer
A prompt buried in a string literal is a bug waiting to happen. Here's how I version, test, and deploy prompts with the same rigour I'd apply to any production code.
RAG Is Not Magic: Honest Lessons from Production Retrieval Systems
Every RAG demo looks impressive. Production RAG is a different story. Here's what actually breaks, why naive chunking destroys quality, and how I structure retrieval pipelines that hold up under real load.