#performance
Posts tagged "performance"
2 articles
AI · React · Performance
Why Your LLM App Feels Slow (And It's Not the Model)
An LLM generating 50 tokens/second isn't slow — but if your UI makes the user stare at a spinner for the first 2 seconds, it feels slow. Most LLM latency is a UX problem, not an infrastructure problem.
April 5, 2026 · 6 min read
AI · LLMs · Python
The Hidden Cost of Context Windows: Managing Tokens in Production
A 128k-token window sounds like infinite space until you're paying $0.40 per conversation and users are hitting limits mid-session. Here's how I actually manage context in long-running AI applications.
March 18, 2026 · 5 min read