Cryptography Fundamentals for AI

After this lesson, you will be able to: Understand the cryptographic primitives AI practitioners increasingly need: hashing and encryption for protecting data and models, and the privacy-preserving techniques (federated learning, differential privacy, homomorphic encryption) shaping responsible AI.

AI systems handle sensitive data and valuable models, so practitioners increasingly need cryptography literacy. This is a shorter, applied counterpart to the Cybersecurity Cryptography subtrack: the primitives you need to protect data and models, plus the privacy-preserving ML techniques that come up in interviews and responsible-AI work.

Prerequisites: What Is AI? Demystifying the Field

Why AI practitioners need cryptography

Training data often contains personal or proprietary information; trained models are expensive assets worth stealing; and inference requests can leak what users ask. Protecting data at rest and in transit, securing API keys, and reasoning about privacy are now part of building AI responsibly. You do not need to be a cryptographer, but you need to know which primitive protects what.

The primitives you will use

Hashing (SHA-256) for integrity: verify that a dataset or model file has not been tampered with by comparing its digest against one obtained from a trusted source, and deduplicate or pseudonymize identifiers (carefully, since hashing alone is not anonymization for low-entropy values). Symmetric encryption (AES-256-GCM) for data at rest and in transit. Asymmetric cryptography and TLS for securing the API calls your app makes to model providers. Secure secret storage for the API keys that, if leaked, let anyone run up your bill or access your data. These mirror the Cybersecurity crypto subtrack, applied to ML pipelines.
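
As a minimal sketch of the integrity check described above, the following uses only Python's standard library. The function names and the idea of a "published" digest are illustrative assumptions, not part of any particular pipeline; the check is only as trustworthy as the channel the expected digest came from.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large model files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> bool:
    """Return True if the file's digest matches the expected one."""
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

A typical use would be to call verify_artifact on a downloaded dataset or model file before loading it, and to refuse to proceed if it returns False.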

Privacy-preserving machine learning

Three techniques come up repeatedly. Federated learning trains a shared model across many devices without the raw data ever leaving each device (only model updates are shared). Differential privacy adds calibrated noise so that the output of a model or query changes very little whether or not any single individual was in the data, which limits what can be inferred about that individual; a tunable privacy budget (usually written epsilon) controls how much noise is added. Homomorphic encryption allows computation directly on encrypted data, so a server can run inference without ever seeing the plaintext input (powerful but still computationally expensive). Knowing what each provides, and its cost, is increasingly expected in AI roles touching sensitive data.
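
Two of these techniques are simple enough to sketch in a few lines of standard-library Python; homomorphic encryption needs a dedicated library and is left out here. The first function below is a FedAvg-style aggregation step, assuming each client sends a plain list of weights and its local example count. The second is the Laplace mechanism for a counting query, assuming sensitivity 1 (one person changes the count by at most 1). Both function names are hypothetical, and real systems add much more (secure aggregation, clipping, budget accounting).

```python
import random


def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    # The server only ever sees these weight vectors, never the raw examples.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def noisy_count(true_count, epsilon, rng=random):
    """Laplace mechanism for a counting query with sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # smaller epsilon -> more noise -> stronger privacy
    # The difference of two exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

For example, federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]) weights the second client three times as heavily as the first, and noisy_count(100, 0.5) returns 100 plus noise whose typical magnitude is about 2.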

Practical guidance

Encrypt training data and model artifacts at rest, and use TLS for every provider API call (it is on by default in standard HTTP clients; do not disable certificate validation). Store API keys in a secrets manager, never in notebooks or committed code. Treat hashing as integrity, not anonymization, for sensitive low-entropy fields. Reach for differential privacy when publishing models or statistics derived from personal data. And as always, use vetted libraries; never implement crypto yourself.
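
One common way to keep keys out of notebooks and commits is to have the secrets manager or deployment platform inject the key as an environment variable and read it at runtime. The sketch below assumes that setup; the variable name MODEL_PROVIDER_API_KEY and both function names are made up for illustration and do not correspond to any specific provider.

```python
import os


def load_api_key(var_name: str = "MODEL_PROVIDER_API_KEY") -> str:
    """Read an API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable; do not hard-code the key."
        )
    return key


def redact(secret: str, visible: int = 4) -> str:
    """Return a log-safe form of a secret showing only its last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Failing at startup when the variable is missing is a deliberate choice: it is easier to notice than a silent fallback to a placeholder key, and redact keeps the full value out of logs if you need to print which key is in use.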

Quick Check

Which technique lets multiple devices contribute to training a shared model without their raw data ever leaving the device?

Pick one.

Federated learning (only model updates are shared, not raw data)
Homomorphic encryption
SHA-256 hashing
ECB-mode encryption

Common mistakes only experienced practitioners catch

Treating a hash of an email or phone number as anonymization (low-entropy values are easily recovered by hashing every plausible candidate and comparing). Committing API keys in notebooks shared on GitHub. Disabling TLS verification to 'make the API work.' Assuming differential privacy is free of accuracy cost (it trades privacy for utility via the noise budget). Reinventing crypto. Forgetting that the model itself can memorize and leak training data, a risk that differential privacy in particular is designed to limit.
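
The first mistake is easy to demonstrate. In the sketch below, a made-up 4-digit extension stands in for a phone number so the search finishes instantly; a real phone number has a larger but still enumerable space. The keyed variant (HMAC-SHA256 with a secret key) is shown as one possible mitigation, and it is still pseudonymization rather than anonymization: anyone holding the key can repeat the guessing attack. All names and values here are illustrative.

```python
import hashlib
import hmac


def sha256_hex(value: str) -> str:
    """Plain, unkeyed SHA-256 of a string, as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def reverse_by_guessing(target_hash, candidates):
    """Return the candidate whose SHA-256 equals target_hash, or None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


def keyed_pseudonym(value: str, secret_key: bytes) -> str:
    """HMAC-SHA256 pseudonym: cannot be recomputed without the secret key."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# An attacker who sees only the hash can try every possible value.
leaked = sha256_hex("4271")
all_extensions = (f"{n:04d}" for n in range(10_000))
recovered = reverse_by_guessing(leaked, all_extensions)  # finds "4271"
```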
