#performance
Posts tagged "performance"
2 articles
AI · React · Performance
Why Your LLM App Feels Slow (And It's Not the Model)
An LLM generating 50 tokens/second isn't slow — but if your UI makes the user stare at a spinner for the first 2 seconds, it feels slow. Most LLM latency is a UX problem, not an infrastructure problem.
April 5, 2026 · 6 min read
AI · LLMs · Python
The Hidden Cost of Context Windows: Managing Tokens in Production
A 128k-token window sounds like infinite space until you're paying $0.40 per conversation and users are hitting limits mid-session. Here's how I actually manage context in long-running AI applications.
March 18, 2026 · 5 min read