Afzal Zubair
#performance

Posts tagged "performance"

2 articles

Why Your LLM App Feels Slow (And It's Not the Model)
AI, React, Performance

An LLM generating 50 tokens/second isn't slow, but if your UI makes the user stare at a spinner for the first 2 seconds, it feels slow. Most LLM latency is a UX problem, not an infrastructure problem.

April 5, 2026 · 6 min read
The Hidden Cost of Context Windows: Managing Tokens in Production
AI, LLMs, Python

128k tokens sounds like infinite space until you're paying $0.40 per conversation and users are hitting limits mid-session. Here's how I actually manage context in long-running AI applications.

March 18, 2026 · 5 min read
AI & full-stack engineering. Thoughts on LLMs, voice AI, and modern web development.

© 2026 Afzal Zubair. Built with Next.js & Tailwind CSS.