Afzal Zubair
#python

Posts tagged "python"

4 articles

The Hidden Cost of Context Windows: Managing Tokens in Production
AI, LLMs, Python

128k tokens sounds like infinite space until you're paying $0.40 per conversation and users are hitting limits mid-session. Here's how I actually manage context in long-running AI applications.

March 18, 2026 · 5 min read

Async Python Patterns for AI Backends (That I Learned the Hard Way)
Python, FastAPI, AI

FastAPI and async Python are the obvious choice for AI backends — until you hit subtle concurrency bugs, blocked event loops, and streaming responses that silently drop chunks. Here's how I actually structure these systems.

February 24, 2026 · 5 min read

Prompts Are Code: How I Manage Them Like a Senior Engineer
AI, LLMs, Prompt Engineering

A prompt buried in a string literal is a bug waiting to happen. Here's how I version, test, and deploy prompts with the same rigour I'd apply to any production code.

February 10, 2026 · 5 min read

RAG is Not Magic: Honest Lessons from Production Retrieval Systems
AI, LLMs, RAG

Every RAG demo looks impressive. Production RAG is a different story. Here's what actually breaks, why naive chunking destroys quality, and how I structure retrieval pipelines that hold up under real load.

January 28, 2026 · 6 min read

AI & full-stack engineering. Thoughts on LLMs, voice AI, and modern web development.

© 2026 Afzal Zubair. Built with Next.js & Tailwind CSS.