Fetch the web.
Get clean Markdown.
Turn server-rendered docs into clean Markdown your agents can actually use. No browser, no API key, no data leaving your machine.
pip install docpull
Best for static docs, API references, and server-rendered sites. JS-rendered SPAs are detected and skipped; pass --strict-js-required to make that an error so your agent can route elsewhere.
How it works
Three steps from URL to usable Markdown.
Point
Give docpull a docs URL, public or gated.
Fetch
It discovers pages, respects robots.txt, and converts server HTML.
Use
Use the Markdown in search, RAG, offline archives, or skills.
Features
The boring pieces that make documentation ingestion dependable.
Markdown Agents Can Use
Every page includes clean Markdown plus frontmatter for title, source URL, headings, and description. Drop it into RAG, search, or a skill directory.
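As a rough illustration of consuming that output, the sketch below splits a page into its frontmatter and its Markdown body. It assumes the frontmatter is delimited by `---` lines and holds simple `key: value` pairs; the helper name `split_frontmatter` is hypothetical and not part of docpull. Nested or list-valued fields would need a real YAML parser.

```python
def split_frontmatter(text: str):
    """Return (metadata dict, Markdown body) for a page with '---' frontmatter.

    Assumption: frontmatter holds flat `key: value` lines only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all

    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            # Split on the first colon only, so URLs in values survive.
            meta[key.strip()] = value.strip().strip('"')

    return {}, text  # no closing delimiter: treat the input as plain Markdown
```

The metadata can then travel with each chunk you index, so a retrieval hit can point back to its source URL.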
No Duplicate Slop
Pages are SHA-256 hashed while they stream in, so duplicates are caught before they hit disk instead of cleaned up later.
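The general idea of hashing content as it streams can be sketched as follows. This is an illustration of the technique, not docpull's implementation; the function name `dedupe_pages` and the `(url, chunks)` input shape are assumptions made for the example.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def dedupe_pages(
    pages: Iterable[Tuple[str, Iterable[bytes]]],
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) only for bodies whose SHA-256 digest is new."""
    seen = set()
    for url, chunks in pages:
        hasher = hashlib.sha256()
        buffered = []
        for chunk in chunks:
            hasher.update(chunk)  # hash incrementally as each chunk arrives
            buffered.append(chunk)
        digest = hasher.hexdigest()
        if digest in seen:
            continue  # duplicate content: dropped before any write happens
        seen.add(digest)
        yield url, b"".join(buffered)
```

Because the digest depends only on the bytes, two URLs serving the same body produce the same hash regardless of how the body was chunked in transit.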
Safe for Agent-Chosen URLs
HTTPS-only, robots.txt compliant, SSRF-protected, and DNS-pinned at connect time. Use --require-pinned-dns when proxy settings weaken that guarantee.
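To make the SSRF guard concrete, here is a minimal sketch of the kind of check involved: accept a URL only if it uses HTTPS and every address its host resolved to is public. The function names are hypothetical and this is not docpull's code. The resolved addresses are passed in explicitly; in a real client they would come from a DNS lookup, and pinning means connecting to those same checked addresses instead of resolving again.

```python
import ipaddress
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """True if the address is not in a private or otherwise special range."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def check_url(url: str, resolved_ips: list[str]) -> bool:
    """Allow only HTTPS URLs whose host resolved solely to public addresses."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    # An empty resolution result is rejected, and one bad address fails all.
    return bool(resolved_ips) and all(is_public_ip(ip) for ip in resolved_ips)
```

Checking every resolved address matters because a hostname can return a mix of public and internal records.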
Cheap to Re-run
Cached pages use If-None-Match and If-Modified-Since. Re-runs fetch what changed, and saved frontier state lets interrupted crawls resume.
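A conditional request of this kind can be sketched with the standard library. The helper name `conditional_request` and the idea of storing an ETag and Last-Modified value per cached page are assumptions for illustration, not a description of docpull's cache format.

```python
from typing import Optional
from urllib.request import Request


def conditional_request(
    url: str,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
) -> Request:
    """Build a GET request carrying the validators saved from a prior fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag  # validator from the ETag header
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # from Last-Modified
    return Request(url, headers=headers)
```

If the server answers 304 Not Modified, the cached copy is still current and no body is transferred. Note that `urllib.request.urlopen` surfaces a 304 as an `HTTPError`, so a caller using it would catch that error and check its `code`.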
Crawl the Parts That Matter
Include and exclude path globs during discovery, so your model gets the relevant docs instead of every route the site exposes.
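Path filtering with globs can be sketched as below. The matching rules here (exclude wins over include, an empty include list allows everything, and `*` also matches `/` because `fnmatch` is used) are assumptions for the example; docpull's own glob semantics may differ, and `path_allowed` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def path_allowed(url, include=(), exclude=()):
    """Decide whether a discovered URL's path passes the glob filters."""
    path = urlsplit(url).path or "/"
    # Exclusions are checked first so they override any include match.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # With no include patterns, everything not excluded is allowed.
    return not include or any(fnmatchcase(path, pattern) for pattern in include)
```

Applying the filter during discovery, before fetching, means excluded routes cost no requests at all.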
Profiles
Choose the output shape before you crawl.
RAG
Clean Markdown with metadata and deduping for retrieval.
docpull URL --profile rag
Mirror
A fuller local archive with cache, resume, and stable paths.
docpull URL --profile mirror
Quick
A 50-page sample when you need to inspect output first.
docpull URL --profile quick
LLM
Token-aware NDJSON chunks that skip JS-only pages unless strict mode is enabled.
docpull URL --profile llm --stream | jq .
Examples
See the command, then see the artifact it leaves behind.
docpull https://docs.stripe.com
./docs/authentication.md:
---
title: "Authentication"
source: https://docs.stripe.com/authentication
---
# Authentication
The Stripe API uses API keys to authenticate requests.
You can view and manage your API keys in the Stripe
Dashboard.
Test mode secret keys have the prefix sk_test_ and live
mode secret keys have the prefix sk_live_...
Install
Install once, then crawl from your terminal, scripts, or agent workflow. Requires Python 3.10 or newer.
pip install docpull
Why docpull?
Answers to questions people ask before installing.