After this lesson, you will be able to defend forms and content against bad bots and scrapers using Cloudflare Bot Fight Mode, Turnstile, robots.txt, honeypot fields, and layered user-agent checks, and explain how scrapers work.
Bots account for a large share of internet traffic; by some industry estimates, roughly half. Some are good (search crawlers); many are bad (scrapers, credential stuffers, vulnerability scanners). This lesson covers Cloudflare Bot Fight Mode, Turnstile as a privacy-friendly CAPTCHA, what robots.txt does and does not enforce, honeypot fields, user-agent checks, and how understanding scrapers helps you defend against them.
Good bots identify themselves and respect rules: Googlebot, Bingbot, uptime monitors. Bad bots do not: content scrapers that copy your data, credential stuffers replaying leaked logins, and vulnerability scanners probing for known holes. The goal is not to block all automation; it is to let the good ones through and make the bad ones expensive.
Bot Fight Mode (free tier) uses behavioral signals and machine learning to score requests and challenge likely bots. Bot Fight Mode itself is a single on/off setting; per-path control comes from WAF custom rules, where you choose the action. A managed challenge presents a lightweight, often invisible, check; a JS challenge requires the client to run JavaScript (most simple scrapers cannot); block drops the request entirely. Start with managed challenge on sensitive paths so you do not break legitimate users, and reserve block for confirmed-bad patterns.
Turnstile replaces reCAPTCHA without sending users to label crosswalks or tracking them across the web. Render the widget, then verify the token server-side.
    // Client: render the widget (gives you a token on success)
    // <div class="cf-turnstile" data-sitekey={SITE_KEY}></div>

    // Server: verify the token before trusting the form
    async function verifyTurnstile(token: string, ip: string) {
      const res = await fetch(
        "https://challenges.cloudflare.com/turnstile/v0/siteverify",
        {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify({
            secret: process.env.TURNSTILE_SECRET_KEY,
            response: token,
            remoteip: ip,
          }),
        },
      );
      const data = await res.json();
      return data.success === true; // reject the request if false
    }
robots.txt tells well-behaved crawlers which paths to avoid. It is a polite request, not enforcement: malicious bots ignore it, and worse, it advertises the paths you want hidden. Use it to keep search engines out of admin or duplicate pages, never to protect anything sensitive. Sensitive paths need real authentication, not a Disallow line.
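To make the "polite request" point concrete, here is a minimal sketch of generating a robots.txt body. The helper name `buildRobotsTxt` and the example paths (`/search`, `/print/`) are assumptions for illustration, not part of the lesson. Note what is absent: nothing sensitive is listed, because the file is public.

```typescript
// Hypothetical helper: builds robots.txt text from paths that should stay
// out of search indexes. It only asks crawlers to stay away; it enforces nothing.
function buildRobotsTxt(disallow: string[], sitemapUrl?: string): string {
  const lines = ["User-agent: *"];
  if (disallow.length === 0) {
    // An empty Disallow value means "nothing is disallowed".
    lines.push("Disallow:");
  } else {
    for (const path of disallow) {
      lines.push(`Disallow: ${path}`);
    }
  }
  if (sitemapUrl) {
    lines.push("", `Sitemap: ${sitemapUrl}`);
  }
  return lines.join("\n") + "\n";
}

// Illustrative paths only: duplicate or low-value pages, never secrets.
// Anything sensitive gets real authentication instead of a Disallow line.
const robotsTxt = buildRobotsTxt(["/search", "/print/"]);
```

Serving this string at `/robots.txt` with a `text/plain` content type is enough for well-behaved crawlers; malicious ones will read it and ignore it.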
A honeypot is a form field hidden from humans with CSS; real users never fill it, naive bots do. User-agent checks are a weak signal (trivially spoofed) but still worth using as one layer.
    // Honeypot: hidden field, reject the submission if it is filled
    // <input name="website" tabindex="-1" autocomplete="off"
    //        style="position:absolute;left:-9999px" aria-hidden="true" />

    export async function POST(req: Request) {
      const form = await req.formData();
      if (form.get("website")) {
        // A human never fills a hidden field. This is a bot.
        return new Response("OK", { status: 200 }); // fail silently
      }

      // user-agent is spoofable, so treat it as ONE weak signal among many
      const ua = req.headers.get("user-agent") ?? "";
      if (!ua || /curl|python-requests|scrapy/i.test(ua)) {
        // raise suspicion / add friction, do not rely on this alone
      }

      // Process the legitimate submission here. The handler must always
      // return a Response, including on the normal path.
      return new Response("OK", { status: 200 });
    }
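One way to make "raise suspicion / add friction" concrete is to map the signals to an action instead of a yes/no block. The sketch below is a hypothetical policy, not a prescribed one: the names `FormSignals`, `Action`, and `decideAction` are assumptions, and "challenge" here is taken to mean "require a Turnstile check before accepting the form".

```typescript
// Hypothetical policy sketch: names and thresholds are illustrative assumptions.
type Action = "allow" | "challenge" | "reject";

interface FormSignals {
  honeypotFilled: boolean; // the hidden "website" field had a value
  userAgent: string;       // raw User-Agent header, "" when missing
}

// Same pattern as the handler above: common scripted HTTP clients.
const SCRIPTED_CLIENT = /curl|python-requests|scrapy/i;

function decideAction(signals: FormSignals): Action {
  // Strong signal: humans cannot see the honeypot, so a value means a bot.
  if (signals.honeypotFilled) return "reject";

  // Weak signal: the header is trivially spoofed, so it only adds friction
  // (for example, a Turnstile challenge) and never blocks on its own.
  if (signals.userAgent === "" || SCRIPTED_CLIENT.test(signals.userAgent)) {
    return "challenge";
  }
  return "allow";
}
```

Keeping the user-agent branch at "challenge" rather than "reject" reflects its weight: a legitimate user behind an unusual client gets a check, not a ban.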
Modern scrapers use headless browsers (Puppeteer, Playwright) that run real JavaScript, so JS challenges alone do not stop them. They look for stable HTML structure and predictable API endpoints. You can make scraping harder without harming users by requiring auth for bulk data, paginating and rate-limiting API responses, not exposing a clean JSON API that returns everything, and using Turnstile on the endpoints that matter. You will never make scraping impossible for a determined attacker; you make it expensive enough that they move on.
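The "paginate and rate-limit" advice can be sketched with a small in-memory token bucket plus a page-size cap. Everything named here (`TokenBucketLimiter`, `clampPageSize`, `MAX_PAGE_SIZE`, the limits themselves) is an assumption for illustration; a production setup would more likely use a shared store or an edge rate-limiting rule.

```typescript
// Minimal in-memory token bucket (illustrative sketch, single process only).
interface Bucket {
  tokens: number;
  updatedAt: number; // milliseconds
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock, which makes the limiter testable
  }

  // Returns true if the caller identified by `key` may make a request now.
  allow(key: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: t };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}

// Cap page size so one request can never return "everything".
const MAX_PAGE_SIZE = 50; // assumed limit
function clampPageSize(requested: number): number {
  if (!Number.isFinite(requested) || requested < 1) return 1;
  return Math.min(Math.floor(requested), MAX_PAGE_SIZE);
}
```

Keying the limiter by account or API key rather than by IP alone avoids throttling an entire corporate NAT or mobile carrier. This sketch never evicts idle buckets, so a long-running process would need a cleanup step.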
Common mistakes to avoid:
- Treating user-agent as proof of identity instead of a weak hint.
- Putting sensitive paths in robots.txt and thereby advertising them.
- Giving a honeypot field a name like 'honeypot' that bots learn to skip (name it something plausible like 'website' or 'phone2').
- Blocking by IP so aggressively that you ban a whole corporate NAT or a mobile carrier.
- Forgetting that headless-browser scrapers pass JS challenges, so layering (auth + rate limit + Turnstile) matters more than any single trick.