AI Product Development

After this lesson, you will be able to take an AI idea to a deployed feature, covering scope, cost, latency, UX, and the honest question of when AI is the wrong tool.

Most AI product launches fail not on capability but on cost, latency, or UX framing. This lesson covers the questions every senior AI engineer asks before writing the first prompt.

Prerequisites: Building AI Evaluations

The four questions before you write a prompt

1. What's the user-visible job? Describe it in one sentence without the word 'AI'.
2. What does failure cost? A wrong word in a draft is recoverable; a wrong word in a medication dose is not.
3. Is this a hidden classifier or a generative feature? Many 'AI features' are really classifiers (route, tag, score) and would work better with a simpler approach.
4. Where does the user catch errors? If the model is wrong, can the user see it? Can they fix it cheaply?
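
Question 3 is cheap to test: before reaching for a model, try a non-AI baseline and see how far it gets. The sketch below assumes a hypothetical support-ticket routing task; the queue names and keywords are illustrative, not taken from any real system.

```python
# Hypothetical routing rules: queue name -> keywords that suggest it.
ROUTES = {
    "billing": ("invoice", "refund", "charge", "payment"),
    "outage": ("down", "error", "timeout", "crash"),
    "account": ("password", "login", "email", "2fa"),
}


def route_ticket(text: str, default: str = "general") -> str:
    """Return the queue whose keywords appear most often in the ticket."""
    words = [word.strip(".,!?") for word in text.lower().split()]
    scores = {
        queue: sum(word in keywords for word in words)
        for queue, keywords in ROUTES.items()
    }
    best = max(scores, key=scores.get)
    # No keyword matched at all: send the ticket to the default queue.
    return best if scores[best] > 0 else default


print(route_ticket("I was charged twice, please refund the payment."))
print(route_ticket("Hello, I have a question."))
```

If a baseline like this already handles most traffic, a model may only be needed for the tickets that land in the default queue.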

Cost estimation: how to do the napkin math

Anthropic's Claude Sonnet 4.x list pricing is roughly $3 per million input tokens and $15 per million output tokens (verify at anthropic.com/pricing; these change). Use these figures to ballpark.

python

# Per-request cost example
input_tokens  = 2000   # system prompt + user content + context
output_tokens = 500    # response budget

rate_in  = 3 / 1_000_000   # $/token
rate_out = 15 / 1_000_000  # $/token

cost_per_request = input_tokens * rate_in + output_tokens * rate_out
print(f"${cost_per_request:.4f} per request")
# $0.0135 per request

# At scale
daily_requests = 10_000
monthly_cost = cost_per_request * daily_requests * 30
print(f"${monthly_cost:,.2f}/month")
# $4,050.00/month, which surprises companies that 'just want to try AI'

# Cost levers in order of impact:
# 1. Use a smaller model (Haiku is roughly 3x to 12x cheaper than Sonnet, depending on the Haiku generation)
# 2. Use prompt caching for static system prompts (Anthropic: ~90% cheaper on cached input)
# 3. Cap max_tokens hard
# 4. Batch off-realtime requests (Anthropic Message Batches: ~50% off)
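
To see how the levers compare for a given request shape, the napkin math can be wrapped in a small function. This is a rough sketch under stated assumptions: the Sonnet rates are the ones quoted above, the smaller-model rates are placeholders to be replaced with current list prices, cache reads are billed at about 10% of the input rate, and the extra cost of writing to the cache is ignored.

```python
# Assumed list prices in $ per million tokens; verify before relying on them.
SONNET = {"in": 3.00, "out": 15.00}
SMALLER_MODEL = {"in": 1.00, "out": 5.00}  # placeholder rates (assumption)
CACHE_READ_DISCOUNT = 0.90  # cached input billed at ~10% of the base rate
BATCH_DISCOUNT = 0.50       # batch processing: ~50% off


def monthly_cost(input_tokens, output_tokens, daily_requests,
                 rates=SONNET, cached_input_tokens=0, batch=False, days=30):
    """Estimate monthly spend in dollars for one request shape."""
    fresh_input = input_tokens - cached_input_tokens
    per_request = (
        fresh_input * rates["in"]
        + cached_input_tokens * rates["in"] * (1 - CACHE_READ_DISCOUNT)
        + output_tokens * rates["out"]
    ) / 1_000_000
    if batch:
        per_request *= 1 - BATCH_DISCOUNT
    return per_request * daily_requests * days


baseline = monthly_cost(2000, 500, 10_000)
smaller = monthly_cost(2000, 500, 10_000, rates=SMALLER_MODEL)
cached = monthly_cost(2000, 500, 10_000, cached_input_tokens=1500)
capped = monthly_cost(2000, 300, 10_000)  # max_tokens cut from 500 to 300
batched = monthly_cost(2000, 500, 10_000, batch=True)

for name, value in [("baseline", baseline), ("smaller model", smaller),
                    ("cached prompt", cached), ("capped output", capped),
                    ("batched", batched)]:
    print(f"{name:>14}: ${value:,.0f}/month")
```

With these assumptions the baseline of $4,050 drops to about $1,350 on the smaller model, $2,835 with 1,500 of the 2,000 input tokens cached, $3,150 with the output cap, and $2,025 when batched. The ranking shifts with the request shape, so rerun it with your own token counts.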

Latency budgets

Chat UI: 200ms first token feels instant, 1s feels normal, 3s feels slow, 5s+ feels broken. Always stream tokens so the user sees output starting immediately.
Background job: latency is irrelevant; throughput and cost are what matter. Use batch APIs and queues.
On-keystroke (autocomplete): needs sub-200ms total. Use a small fast model, aggressive caching, and pre-warming.
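
For a chat UI, the number to measure is time to first token, not total response time. The sketch below times any iterator of text chunks, so it can wrap whatever streaming client you use; `fake_stream` is a hypothetical stand-in that needs no network, and the bucket boundaries in `chat_feel` are one reading of the thresholds above.

```python
import time
from typing import Iterable, Iterator


def fake_stream(chunks: Iterable[str], first_delay: float,
                gap: float) -> Iterator[str]:
    """Stand-in for a streaming model response (hypothetical, no network)."""
    for i, chunk in enumerate(chunks):
        time.sleep(first_delay if i == 0 else gap)
        yield chunk


def time_to_first_token(stream: Iterator[str]) -> tuple[float, str]:
    """Consume a stream; return (seconds until first chunk, full text)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    return (first if first is not None else float("inf")), "".join(parts)


def chat_feel(ttft_seconds: float) -> str:
    """Map first-token latency onto the chat UI thresholds from this lesson."""
    if ttft_seconds <= 0.2:
        return "instant"
    if ttft_seconds <= 1.0:
        return "normal"
    if ttft_seconds < 5.0:
        return "slow"
    return "broken"


ttft, text = time_to_first_token(fake_stream(["Hel", "lo"], 0.05, 0.01))
print(f"first token after {ttft * 1000:.0f} ms ({chat_feel(ttft)}): {text}")
```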

The 30-day AI feature launch checklist

Walk this list before shipping any AI feature to real users.

1. Eval set with at least 20 cases is in place (see ai-evals)
2. Cost-per-request projected at expected scale; finance has signed off
3. Latency target is documented and tested at p50 and p99
4. User knows they're talking to AI; the failure mode is visible (citations, regenerate button, edit button)
5. There's a non-AI fallback (or the feature degrades to plain text) when the API is down
6. Rate limiting per user/team is in place; runaway prompts can't bankrupt you
7. PII handling is documented; user data is not sent to a model that trains on it without consent
8. Logging captures prompts + responses with PII redaction; you can audit any complaint
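
Item 6 can start as something very small. Below is a minimal in-memory sketch of a per-user daily token budget; the class name, the limit, and the injected clock are all illustrative assumptions, and a real deployment would likely keep the counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict

SECONDS_PER_DAY = 86_400


class TokenBudget:
    """Per-user daily token budget (in-memory sketch, hypothetical limits)."""

    def __init__(self, daily_limit: int, clock=time.time):
        self.daily_limit = daily_limit
        self.clock = clock                # injectable so tests can move time
        self.used = defaultdict(int)      # (user_id, day index) -> tokens spent

    def allow(self, user_id: str, requested_tokens: int) -> bool:
        """Reserve tokens for a request, or refuse if it would exceed the cap."""
        key = (user_id, int(self.clock() // SECONDS_PER_DAY))
        if self.used[key] + requested_tokens > self.daily_limit:
            return False
        self.used[key] += requested_tokens
        return True


budget = TokenBudget(daily_limit=1000)
print(budget.allow("alice", 600))  # first request fits the budget
print(budget.allow("alice", 600))  # second would exceed 1000, so refused
print(budget.allow("bob", 600))    # budgets are tracked per user
```

Checking the budget before calling the model, using input tokens plus `max_tokens` as the requested amount, keeps a single runaway client from driving the bill.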

Common mistakes only experienced AI PMs avoid

Treating AI features as identical to non-AI features in design reviews. They have different failure modes, a different cost shape, and different UX requirements.

Promising determinism. LLMs are non-deterministic by default; users WILL get different answers on identical inputs. Set expectations accordingly.

Skipping the kill switch. Have a config flag that disables the AI path and falls back to a non-AI alternative; you'll need it.

Forgetting model deprecation. Frontier models are often retired roughly a year after release; check your provider's deprecation schedule. Build a model abstraction now, not when you're scrambling.
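
The kill switch and the model abstraction can live in the same thin wrapper. This is a minimal sketch, not a prescribed design: the `AI_FEATURE_ENABLED` environment variable, the model alias, and the injected `call_model` function (which would wrap your provider's SDK) are all hypothetical names.

```python
import os
from typing import Callable

# Hypothetical config: one place to change the model when it is deprecated.
MODEL_ALIASES = {"summarize": "example-model-2025-01"}


def ai_enabled() -> bool:
    """Kill switch: set AI_FEATURE_ENABLED=0 to disable the AI path."""
    return os.environ.get("AI_FEATURE_ENABLED", "1") == "1"


def truncate_fallback(text: str, limit: int = 80) -> str:
    """Non-AI fallback: a plain truncated preview."""
    return text if len(text) <= limit else text[:limit].rstrip() + "..."


def summarize(text: str, call_model: Callable[[str, str], str]) -> str:
    """Use the model when enabled and healthy; otherwise fall back."""
    if not ai_enabled():
        return truncate_fallback(text)
    try:
        return call_model(MODEL_ALIASES["summarize"], text)
    except Exception:
        # Any provider failure degrades to the non-AI path.
        return truncate_fallback(text)
```

Because callers only ever see `summarize`, swapping the model means editing one dictionary entry, and turning the feature off means flipping one flag.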

Quick Check

Which cost lever has the biggest impact on a high-traffic AI feature?

Pick the highest-leverage option.

Switching from Sonnet to Haiku for tasks Haiku can handle
Reducing max_tokens by 10%
Switching the framework from LangChain to a direct SDK call
Switching from English to a shorter language
