AIMLPM/markcrawl
by Various
Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.
MCP
Added 1 June 2026
Overview
A fast Python web crawler that extracts clean Markdown from websites. Designed for RAG and AI ingestion pipelines, it outputs content ready for use by LLMs and vector stores.
Best for
Developers who need a lightweight Markdown crawler for small to medium-scale RAG ingestion
Use cases
- Building RAG pipelines with web content
- Ingesting documentation into vector stores
- Extracting clean Markdown for LLM training data
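A minimal sketch of the step that typically follows crawling in these use cases: splitting crawled Markdown into heading-scoped chunks before embedding. It does not use markcrawl's own API, which this entry does not describe; `chunk_markdown` and `max_chars` are hypothetical names chosen for illustration. The sketch also treats any line starting with `#` as a heading, so `#` comments inside fenced code would be misread.

```python
import re
from typing import Dict, List

# ATX-style Markdown heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[Dict[str, str]]:
    """Split Markdown into heading-scoped chunks of at most max_chars each.

    Hypothetical helper for illustration; not part of markcrawl.
    """
    chunks: List[Dict[str, str]] = []
    title = ""
    lines: List[str] = []

    def flush() -> None:
        # Emit the text collected under the current heading, then reset the buffer.
        body = "\n".join(lines).strip()
        lines.clear()
        # Cut long sections into fixed-size windows so no chunk exceeds max_chars.
        for start in range(0, len(body), max_chars):
            chunks.append({"title": title, "text": body[start:start + max_chars]})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its own title
            title = match.group(1).strip()
        else:
            lines.append(line)
    flush()  # close the final section
    return chunks
```

Each returned dictionary carries the section title alongside its text, so the title can be stored as metadata next to the embedding in a vector store.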
Notes
2 stars on GitHub. Last updated 2026-05-17. Licensed MIT.
Pros
- Fast crawling for Markdown extraction
- Produces clean Markdown output suitable for LLM ingestion
- Python-based, easy to integrate into data pipelines
Cons
- Very low community adoption (2 stars), which suggests an early-stage project with limited real-world testing
- No single vendor backing for support or maintenance
- May lack advanced features like rate limiting or dynamic content handling
Indexed from awesome-mcp-servers-punkpeye and enriched against its public facts.