BiTree

© 2026 BiTree. All rights reserved.
50 min · Intermediate

AI Product Development

After this lesson, you will be able to: Take an AI idea to a deployed feature: scope, cost, latency, UX, and the honest question of when AI is the wrong tool.

Most AI product launches fail not on capability but on cost, latency, or UX framing. This lesson covers the questions every senior AI engineer asks before writing the first prompt.

Prerequisites: Building AI Evaluations

The four questions before you write a prompt

  1. What's the user-visible job? Describe it in one sentence without the word 'AI'.
  2. What does failure cost? A wrong word in a draft is recoverable; a wrong word in a medication dose is not.
  3. Is this a hidden classifier or a generative feature? Many 'AI features' are really classifiers (route, tag, score) and would work better with a simpler approach.
  4. Where does the user catch errors? If the model is wrong, can the user see it? Can they fix it cheaply?

💡 When AI is the wrong tool

  • Deterministic logic: 'parse this CSV' is a job for a CSV parser, not an LLM.
  • High-frequency or latency-sensitive work: real-time keystroke handling, low-latency game logic.
  • High-stakes single-shot decisions: medical diagnosis, legal verdicts, anything where 'mostly right' is unacceptable.
  • Sensitive data with strict residency rules: open-source models run on-prem may fit; frontier APIs often don't.

Pick a non-AI baseline FIRST. If the baseline ships in a day and the AI version ships in a month for the same outcome, ship the baseline.
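To make the "non-AI baseline first" advice concrete, here is a minimal sketch of a keyword router for a hypothetical support-ticket feature. The queue names, keywords, and the `route_ticket` function are illustrative assumptions, not part of the lesson; the point is only that a 'hidden classifier' can often start as a few lines of deterministic code that you later compare an AI version against.

```python
# Hypothetical non-AI baseline for a "route this ticket" feature.
# Queue names and keywords are made up for illustration.
ROUTES = {
    "billing": ("invoice", "refund", "charge", "payment"),
    "account": ("password", "login", "2fa", "locked"),
}


def route_ticket(text: str, default: str = "general") -> str:
    """Return the first queue whose keywords appear in the text."""
    lowered = text.lower()
    for queue, keywords in ROUTES.items():
        if any(word in lowered for word in keywords):
            return queue
    return default


print(route_ticket("I was charged twice, please refund me"))  # billing
print(route_ticket("My dog ate my homework"))  # general
```

If a baseline like this already handles most of your eval cases, the AI version has to beat it by enough to justify its cost and latency.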

Cost estimation: how to do the napkin math

Anthropic Sonnet 4.x pricing is roughly $3/M input, $15/M output tokens (verify at anthropic.com/pricing — these change). Use these to ballpark.

python
# Per-request cost example
input_tokens = 2000  # system prompt + user content + context
output_tokens = 500  # response budget
rate_in = 3 / 1_000_000  # $ per input token
rate_out = 15 / 1_000_000  # $ per output token
cost_per_request = input_tokens * rate_in + output_tokens * rate_out
print(f"${cost_per_request:.4f} per request")
# $0.0135 per request

# At scale
daily_requests = 10_000
monthly_cost = cost_per_request * daily_requests * 30
print(f"${monthly_cost:,.2f}/month")
# $4,050.00/month, which surprises companies that 'just want to try AI'

# Cost levers, roughly in order of impact:
# 1. Use a smaller model (Haiku-class models are several times cheaper than Sonnet;
#    the exact ratio depends on the model versions, so check current pricing)
# 2. Use prompt caching for static system prompts (Anthropic: ~90% cheaper on cached input)
# 3. Cap max_tokens hard
# 4. Batch off-realtime requests (Anthropic Message Batches: ~50% off)
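The cost levers listed above can be folded into the same napkin math. The sketch below is a rough estimator, not a billing calculator: `monthly_cost` and its parameters are hypothetical names, the 90% and 50% discounts are the approximate figures quoted in this lesson, and it deliberately ignores details such as any surcharge for writing to the cache or whether discounts stack exactly this way.

```python
def monthly_cost(
    daily_requests: int,
    input_tokens: int,
    output_tokens: int,
    rate_in: float = 3 / 1_000_000,    # $ per input token (lesson's ballpark)
    rate_out: float = 15 / 1_000_000,  # $ per output token (lesson's ballpark)
    cached_fraction: float = 0.0,      # share of input tokens served from cache
    cache_discount: float = 0.90,      # assumed ~90% off cached input
    batch_discount: float = 0.0,       # e.g. 0.5 for ~50% off batched requests
    days: int = 30,
) -> float:
    """Ballpark monthly spend under simplified caching and batching assumptions."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = fresh * rate_in + cached * rate_in * (1 - cache_discount)
    per_request = (input_cost + output_tokens * rate_out) * (1 - batch_discount)
    return per_request * daily_requests * days


baseline = monthly_cost(10_000, 2000, 500)
with_cache = monthly_cost(10_000, 2000, 500, cached_fraction=0.75)
capped_output = monthly_cost(10_000, 2000, 250)
batched = monthly_cost(10_000, 2000, 500, batch_discount=0.5)

print(f"baseline:          ${baseline:,.0f}")       # $4,050
print(f"75% input cached:  ${with_cache:,.0f}")     # $2,835
print(f"output capped 250: ${capped_output:,.0f}")  # $2,925
print(f"batched:           ${batched:,.0f}")        # $2,025
```

Running a few scenarios like this before a design review makes it easier to see which lever matters for your particular mix of input and output tokens.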

Latency budgets

  • Chat UI: a first token within 200ms feels instant, 1s feels normal, 3s feels slow, and 5s+ feels broken. Always stream tokens so the user sees output start immediately.
  • Background job: latency is irrelevant; throughput and cost are what matter. Use batch APIs and queues.
  • On-keystroke (autocomplete): needs sub-200ms total. Use a small, fast model, aggressive caching, and pre-warming.
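One way to check a chat latency budget is to time the first token separately from the full response. The sketch below uses a fake in-process stream instead of a real API call; `fake_stream`, `measure_stream`, and `rate_first_token` are hypothetical helpers, and the thresholds are simply the chat UI numbers from this section.

```python
import time
from typing import Iterable, Iterator


def fake_stream(tokens: list[str], first_delay: float = 0.05,
                token_delay: float = 0.01) -> Iterator[str]:
    """Stand-in for a streaming model response (no network involved)."""
    time.sleep(first_delay)
    for i, token in enumerate(tokens):
        if i:
            time.sleep(token_delay)
        yield token


def measure_stream(stream: Iterable[str]) -> tuple[float, float, str]:
    """Return (time to first token, total time, full text) for a token stream."""
    start = time.perf_counter()
    first = None
    parts = []
    for token in stream:
        if first is None:
            first = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    return (first if first is not None else total), total, "".join(parts)


def rate_first_token(seconds: float) -> str:
    """Map time-to-first-token onto the chat UI thresholds from this lesson."""
    if seconds <= 0.2:
        return "instant"
    if seconds <= 1.0:
        return "normal"
    if seconds < 5.0:
        return "slow"
    return "broken"


ttft, total, text = measure_stream(fake_stream(["Hello", ", ", "world"]))
print(f"first token: {ttft * 1000:.0f} ms ({rate_first_token(ttft)}), "
      f"total: {total * 1000:.0f} ms, text: {text!r}")
```

In a real feature you would wrap your provider's streaming iterator with `measure_stream` and record both numbers, since streaming improves the first while leaving the second unchanged.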

The 30-day AI feature launch checklist

Walk this list before shipping any AI feature to real users.

  1. Eval set with at least 20 cases is in place (see ai-evals)
  2. Cost-per-request projected at expected scale; finance has signed off
  3. Latency target is documented and tested at p50 and p99
  4. User knows they're talking to AI; the failure mode is visible (citations, regenerate button, edit button)
  5. There's a non-AI fallback (or the feature degrades to plain text) when the API is down
  6. Rate limiting per user/team is in place; runaway prompts can't bankrupt you
  7. PII handling is documented; user data is not sent to a model that trains on it without consent
  8. Logging captures prompts + responses with PII redaction; you can audit any complaint
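Two of the checklist items, per-user rate limiting and PII redaction in logs, are small enough to sketch. Everything below is an illustrative assumption: the `TokenBudget` class, the `redact` helper, and the two regular expressions are hypothetical, the patterns only catch simple email and US-style phone formats, and resetting the budget each day is left out.

```python
import re
from collections import defaultdict

# Deliberately simple patterns; real PII redaction needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b")


def redact(text: str) -> str:
    """Replace simple email and phone patterns before a prompt is logged."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


class TokenBudget:
    """Per-user token cap so one runaway client cannot consume the whole bill."""

    def __init__(self, limit_per_user: int) -> None:
        self.limit = limit_per_user
        self.used: dict[str, int] = defaultdict(int)

    def try_spend(self, user_id: str, tokens: int) -> bool:
        """Record the spend and return True, or return False if it would exceed the cap."""
        if self.used[user_id] + tokens > self.limit:
            return False
        self.used[user_id] += tokens
        return True


budget = TokenBudget(limit_per_user=5000)
print(budget.try_spend("user-1", 4000))  # True
print(budget.try_spend("user-1", 2000))  # False: would exceed the cap
print(budget.try_spend("user-2", 2000))  # True: budgets are per user
print(redact("Call 555-867-5309 or mail ana@example.com"))
# Call [PHONE] or mail [EMAIL]
```

Checking the budget before the model call, and redacting before the log write, keeps both controls on the request path rather than in a cleanup job.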

Common mistakes only experienced AI PMs avoid

  • Treating AI features as identical to non-AI features in design reviews. They have different failure modes, a different cost shape, and different UX requirements.
  • Promising determinism. LLMs are non-deterministic by default; users WILL get different answers on identical inputs. Set expectations accordingly.
  • Skipping the kill switch. Have a config flag that disables the AI path and falls back to a non-AI alternative; you'll need it.
  • Forgetting model deprecation. Frontier models are retired on roughly 12-month cycles. Build a model abstraction now, not when you're scrambling.
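The kill switch and the model abstraction can share one small wrapper. This is a sketch under stated assumptions: the `AI_SUMMARY_ENABLED` environment variable, the `MODEL_ALIASES` table, the `example-model-v1` name, and the `call_model` callable are all hypothetical placeholders for whatever config system and provider client you actually use.

```python
import os
from typing import Callable

# Hypothetical alias table: feature code asks for "summarizer", never a raw
# model ID, so a deprecated model is swapped in one place.
MODEL_ALIASES = {"summarizer": "example-model-v1"}


def ai_enabled() -> bool:
    """Kill switch read from a (hypothetical) environment flag."""
    return os.environ.get("AI_SUMMARY_ENABLED", "true").lower() == "true"


def fallback_summary(text: str, max_chars: int = 40) -> str:
    """Non-AI fallback: truncate the text instead of summarizing it."""
    return text if len(text) <= max_chars else text[:max_chars].rstrip() + "..."


def summarize(text: str, call_model: Callable[[str, str], str]) -> str:
    """Use the AI path when enabled and healthy; otherwise degrade gracefully."""
    if not ai_enabled():
        return fallback_summary(text)
    try:
        return call_model(MODEL_ALIASES["summarizer"], text)
    except Exception:
        # Provider outage or retired model: fall back rather than fail the request.
        return fallback_summary(text)


def fake_model(model_id: str, text: str) -> str:
    """Stand-in for a real provider call."""
    return f"[{model_id}] summary of {len(text)} chars"


print(summarize("A long support thread about a delayed refund.", fake_model))
os.environ["AI_SUMMARY_ENABLED"] = "false"
print(summarize("A long support thread about a delayed refund.", fake_model))
```

Because the feature only knows the alias and the fallback, both flipping the flag and replacing a deprecated model become configuration changes rather than code changes.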

Quick Check

Which cost lever has the biggest impact on a high-traffic AI feature?

Pick the highest-leverage option.
