# Crawl with Crawl4AI, Ollama and BlitzBrowser

In this tutorial, we will see how to use Crawl4AI, Ollama and BlitzBrowser to crawl and extract data from unstructured websites.
This stack is a good fit when you need to pull structured data out of unstructured pages: Crawl4AI orchestrates your crawler, Ollama serves the LLM of your choice to structure the raw content, and BlitzBrowser operates your headless browsers.
## Requirements for this tutorial

- Basic Python knowledge.
## Prepare your environment
### Install Crawl4AI

Crawl4AI has to be installed in your environment. You can find installation instructions on the Crawl4AI GitHub repository.
### Install Ollama

If you have access to a remote Ollama instance, you can skip the local installation. Otherwise, install Ollama locally.

In this tutorial, we will use the Gemma 3 model. To run it locally, pull the model with `ollama pull gemma3:4b` (the tag referenced in the code example below) and then start the server with `ollama serve`.
## Run your web scraper
Now that your environment is ready, we will use our website https://blitzbrowser.com/ to find the pricing plans. The following code example contains everything to run Crawl4AI, Ollama and BlitzBrowser out of the box.

The only configuration you need is an access key to connect to BlitzBrowser headless browsers. You can find how to get an access key for free. Once you have one, set the environment variable `BLITZBROWSER_ACCESS_KEY` to your key.
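Before starting the crawler, you can sanity-check that the variable is visible to your Python process. This is a minimal standard-library sketch; it only reads the environment variable named in this tutorial:

```python
import os

# BLITZBROWSER_ACCESS_KEY is the variable name used throughout this tutorial;
# its value is whatever key you were issued.
key = os.environ.get("BLITZBROWSER_ACCESS_KEY")
print("access key configured:", key is not None)
```

If this prints `False`, the CDP connection in the example below will be rejected, so export the key in your shell first.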
### Scrape the pricing plans of BlitzBrowser

```python
import asyncio
import os
from typing import List

from crawl4ai import *
from pydantic import BaseModel


# Classes used as the JSON schema to structure the output of Gemma 3
class Pricing(BaseModel):
    name: str
    price: str


class PricingPlans(BaseModel):
    pricing_plans: List[Pricing]


# Browser config to use BlitzBrowser headless browsers over the Chrome DevTools Protocol
browser_config = BrowserConfig(
    headless=True,
    verbose=True,
    browser_mode="cdp",
    cdp_url=f"wss://cdp.blitzbrowser.com?accessKey={os.environ.get('BLITZBROWSER_ACCESS_KEY')}",
)

# LLM strategy to structure the extracted data
extraction_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(provider="ollama/gemma3:4b", base_url="http://localhost:11434"),
    extraction_type="schema",
    schema=PricingPlans.model_json_schema(),
    instruction="Extract all pricing plans as a JSON array of objects with their 'name' and 'price'.",
    chunk_token_threshold=1200,
    overlap_rate=0.1,
    apply_chunking=True,
    input_format="markdown",
    verbose=True,
)

# Config for the crawler
crawl_config = CrawlerRunConfig(
    extraction_strategy=extraction_strategy,
    cache_mode=CacheMode.BYPASS,
)


async def main():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://blitzbrowser.com/", config=crawl_config)
        if result.success:
            print("Extracted content:", result.extracted_content)
        else:
            print("Error:", result.error_message)


if __name__ == "__main__":
    asyncio.run(main())
```
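On success, `result.extracted_content` is a JSON string, so you can load it and iterate over the plans. Here is a minimal sketch assuming the output follows the `PricingPlans` schema above; the sample values are hypothetical, not real BlitzBrowser pricing:

```python
import json

# Hypothetical sample of what extracted_content might contain
extracted_content = '[{"pricing_plans": [{"name": "Free", "price": "$0"}]}]'

# The extraction strategy returns a list of extracted blocks; each block
# holds the fields defined by the schema.
for block in json.loads(extracted_content):
    for plan in block.get("pricing_plans", []):
        print(f"{plan['name']}: {plan['price']}")
```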
## Conclusion

At this point, you should be ready to scrape any website you want with Crawl4AI, Ollama and BlitzBrowser.
## Contribute to the Docs
Found an issue or have an idea for an improvement? Our documentation is open source. Feel free to contribute directly on GitHub.