The POC was brilliant. Then reality hit: 3-second response times, $30K monthly API bills, and hallucinations under load. What worked for 10 test cases broke at 10,000 real ones.
Solution: Small models built for production - millisecond responses, predictable costs.