FAEA: High-Fidelity Autonomous Extraction Agent (v1.0)
Overview
FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (Cloudflare, Akamai, etc.) using a "Headless-Plus" architecture. It combines full-browser fidelity (Camoufox/Playwright) for authentication with high-speed clients (curl_cffi) for data extraction.
Status: Released v1.0
Docs: Architecture Definition v0.2
Features
- Bifurcated Execution: Browser for Auth, Curl for Extraction.
- TLS Fingerprint Alignment: Browser and Extractor both mimic Chrome/124.
- Evasion Layer:
  - GhostCursor: Human-like mouse movements (Bezier curves, Fitts's Law).
  - EntropyScheduler: Jittered request timing (Gaussian + Phase Drift).
  - Mobile Proxy Rotation: Sticky session management.
- Production Ready:
  - Docker Swarm/Compose scaling.
  - Redis-backed persistent task queues.
  - Prometheus/Grafana monitoring.
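
The GhostCursor and EntropyScheduler bullets name two standard techniques. The sketch below illustrates them in isolation and is not FAEA's implementation: the function names (`jittered_delay`, `fitts_time`, `cursor_path`), the Fitts's Law constants `a` and `b`, the control-point spread, and the reading of "Phase Drift" as a slow sinusoidal shift of the mean delay are all assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def jittered_delay(base: float, sigma: float, t: float,
                   drift_amp: float = 0.0, drift_period: float = 60.0,
                   rng: Optional[random.Random] = None) -> float:
    """Gaussian jitter around `base` seconds. The sinusoidal drift term is an
    assumed interpretation of "Phase Drift", not FAEA's actual formula."""
    rng = rng or random.Random()
    drift = drift_amp * math.sin(2 * math.pi * t / drift_period)
    # Clamp at zero so a large negative sample never yields a negative sleep.
    return max(0.0, rng.gauss(base + drift, sigma))


def fitts_time(distance: float, width: float, a: float = 0.1, b: float = 0.15) -> float:
    """Fitts's Law (Shannon form): MT = a + b * log2(D / W + 1). a, b are assumed."""
    return a + b * math.log2(distance / width + 1)


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, steps: int) -> List[Point]:
    """Sample a cubic Bezier curve at steps + 1 evenly spaced values of t."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
        points.append((x, y))
    return points


def cursor_path(start: Point, end: Point, target_width: float,
                hz: int = 60, rng: Optional[random.Random] = None) -> Tuple[List[Point], float]:
    """Curved path from start to end, sampled at `hz` over a Fitts's Law duration."""
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    distance = math.hypot(dx, dy)
    spread = 0.3 * distance  # assumed: how far the control points may wander
    c1 = (start[0] + dx / 3 + rng.uniform(-spread, spread),
          start[1] + dy / 3 + rng.uniform(-spread, spread))
    c2 = (start[0] + 2 * dx / 3 + rng.uniform(-spread, spread),
          start[1] + 2 * dy / 3 + rng.uniform(-spread, spread))
    duration = fitts_time(distance, target_width)
    steps = max(2, int(duration * hz))
    return cubic_bezier(start, c1, c2, end, steps), duration
```

Passing a seeded `random.Random` makes both functions reproducible in unit tests while leaving production calls random.
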
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `REDIS_URL` | Connection string for Redis | `redis://redis:6379` |
| `BROWSERFORGE_SEED` | Seed for consistent canvas fingerprinting | (Optional) |
| `PROXY_API_KEY` | API key for the mobile proxy provider | (Required for production) |

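
A minimal sketch of loading these variables at startup, assuming a plain environment lookup. `Settings` and `load_settings` are hypothetical names, not part of FAEA's code; the sketch applies the defaults from the table and fails fast when `PROXY_API_KEY` is missing in production mode.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    redis_url: str
    browserforge_seed: Optional[str]
    proxy_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ, production: bool = False) -> Settings:
    """Read FAEA-style settings from `env` (hypothetical helper)."""
    # Default from the table; the host `redis` presumably resolves to the Compose service.
    redis_url = env.get("REDIS_URL", "redis://redis:6379")
    # Treat an empty string the same as "unset" for the optional values.
    seed = env.get("BROWSERFORGE_SEED") or None
    proxy_key = env.get("PROXY_API_KEY") or None
    if production and proxy_key is None:
        raise RuntimeError("PROXY_API_KEY is required in production")
    return Settings(redis_url, seed, proxy_key)
```

Taking `env` as a parameter keeps the loader testable with a plain dict instead of mutating the process environment.
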
Resource Requirements
- Camoufox: Requires `shm_size: 2gb` to prevent the browser from crashing on complex pages.
- Memory: Ensure the host has at least 4GB of RAM for a basic 5-browser cluster.
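
Docker gives containers only a 64 MB `/dev/shm` by default, which is why the `shm_size` setting matters. A minimal startup check, assuming a Linux container where `/dev/shm` is a tmpfs mount, could refuse to launch browsers when the segment is too small. `MIN_SHM_BYTES`, `shm_is_sufficient`, and `check_shm` are hypothetical names.

```python
import shutil

# Conservative floor for `shm_size: 2gb`: it holds whether "2gb" is read as
# 2 * 10**9 or 2 * 2**30 bytes. Docker's own default is only 64 MB.
MIN_SHM_BYTES = 2 * 10**9


def shm_is_sufficient(total_bytes: int, minimum: int = MIN_SHM_BYTES) -> bool:
    """True when the shared-memory mount is at least `minimum` bytes."""
    return total_bytes >= minimum


def check_shm(path: str = "/dev/shm") -> bool:
    """Inspect the mount at `path`; a missing or unreadable mount counts as insufficient."""
    try:
        total = shutil.disk_usage(path).total
    except OSError:
        return False
    return shm_is_sufficient(total)
```

A browser worker could call `check_shm()` before starting Camoufox and exit with a clear message instead of crashing later on a heavy page.
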
Production Usage
1. Scaling the Cluster
Start the stack with recommended production replicas:
```bash
docker-compose up -d --scale camoufox-pool=5 --scale curl-pool=20
```
2. Monitoring
Access the observability stack:
- Grafana: http://localhost:3000 (default credentials: admin/admin).
  - Dashboards: "FAEA Overview", "Extraction Health".
- Prometheus: http://localhost:9090.
  - Metrics:
    - `auth_attempts_total`: Success/failure counters.
    - `session_duration_seconds`: Histogram of session validity.
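
Both metrics can be queried through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds two example PromQL expressions and fetches them with the standard library. It assumes `auth_attempts_total` carries a label named `result` with the value `success` (the README only says it counts successes and failures) and that `session_duration_seconds` is exported as a standard Prometheus histogram with `_bucket` series; all helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"


def auth_success_ratio(window: str = "5m", label: str = "result", value: str = "success") -> str:
    """Share of auth attempts that succeeded over `window` (label name is an assumption)."""
    return (f'sum(rate(auth_attempts_total{{{label}="{value}"}}[{window}]))'
            f" / sum(rate(auth_attempts_total[{window}]))")


def session_duration_p95(window: str = "5m") -> str:
    """95th percentile of session validity, computed from the histogram buckets."""
    return ("histogram_quantile(0.95, "
            f"sum(rate(session_duration_seconds_bucket[{window}])) by (le))")


def query_url(promql: str, base: str = PROMETHEUS_URL) -> str:
    """Instant-query URL; urlencode escapes braces, quotes and spaces."""
    return f"{base}/api/v1/query?" + urlencode({"query": promql})


def instant_query(promql: str, base: str = PROMETHEUS_URL, timeout: float = 5.0) -> list:
    """Run an instant query and return the `data.result` vector."""
    with urllib.request.urlopen(query_url(promql, base), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body.get('error')}")
    return body["data"]["result"]
```

The same expressions can be pasted into a Grafana panel; `instant_query` is only useful for scripted health checks.
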
3. Task Dispatch
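Push tasks onto the `task_queue` list in Redis. Each task is a JSON object with a `type` (e.g. `auth` or `extract`), a `url`, and a `session_id`.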
Python Example:
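```python
import json

import redis

# Connect to Redis on localhost.
r = redis.from_url("redis://localhost:6379")

payload = {
    "type": "auth",
    "url": "https://example.com/login",
    "session_id": "session_001",
}

# RPUSH appends to the tail of the list; consumers popping from the head
# (BLPOP) therefore receive tasks in FIFO order.
r.rpush("task_queue", json.dumps(payload))
print("Task dispatched!")
```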
CLI Example:
Use `redis-cli`, pushing to the tail of the list with `RPUSH` as in the Python example:
```bash
redis-cli RPUSH task_queue '{"type": "extract", "url": "https://example.com/data", "session_id": "session_001"}'
```
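
On the worker side, tasks are consumed with Redis `BLPOP`. A minimal consumer sketch, assuming the JSON schema shown in the examples (`type`, `url`, `session_id`) and a redis-py client, whose `blpop(keys, timeout)` returns a `(key, value)` pair or `None` on timeout. `parse_task` and `next_task` are hypothetical helpers, not FAEA's orchestrator API, and the queue name is left to the caller.

```python
import json
from typing import Any, Dict, Optional, Union

REQUIRED_FIELDS = {"type", "url", "session_id"}
KNOWN_TYPES = {"auth", "extract"}  # the two task types used in the examples above


def parse_task(raw: Union[bytes, str]) -> Dict[str, Any]:
    """Decode and validate one queued task payload."""
    task = json.loads(raw)  # json.loads accepts both str and UTF-8 bytes
    if not isinstance(task, dict):
        raise ValueError("task payload must be a JSON object")
    missing = REQUIRED_FIELDS.difference(task)
    if missing:
        raise ValueError(f"task is missing fields: {sorted(missing)}")
    if task["type"] not in KNOWN_TYPES:
        raise ValueError(f"unknown task type: {task['type']!r}")
    return task


def next_task(client: Any, queue: str, timeout: int = 5) -> Optional[Dict[str, Any]]:
    """Block up to `timeout` seconds for the next task on `queue`.

    `client` is expected to behave like redis.Redis: blpop returns
    (key, value), or None when the timeout expires.
    """
    item = client.blpop([queue], timeout=timeout)
    if item is None:
        return None
    _key, raw = item
    return parse_task(raw)
```

Pushing with `LPUSH` instead would place new tasks at the head of the list, so `BLPOP` would serve the newest task first rather than the oldest.
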
Architecture
- `src/browser/`: Camoufox (Firefox/Chrome) manager for auth.
- `src/extractor/`: Curl client for high-speed extraction.
- `src/core/`: Shared logic (Session, Scheduler, Recovery, Monitoring).
- `src/orchestrator/`: Worker loops and task management.
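
The split between `src/browser/` and `src/extractor/` implies a hand-off: cookies earned during browser authentication have to be replayed by the curl client. A minimal sketch of that step, assuming cookies arrive in the shape returned by Playwright's `BrowserContext.cookies()` (dicts with `name`, `value`, `domain`, `path`, `expires`, `secure`, where `expires == -1` marks a session cookie). `cookie_header` is a hypothetical helper, and its domain and path matching is a simplification of RFC 6265.

```python
import time
from typing import Any, Dict, Iterable, Optional
from urllib.parse import urlsplit


def cookie_header(cookies: Iterable[Dict[str, Any]], url: str,
                  now: Optional[float] = None) -> str:
    """Build a Cookie header value for `url` from browser-exported cookies.

    Simplifications versus RFC 6265: host-only and domain cookies are both
    matched as domain cookies, and path matching is a plain prefix test.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    now = time.time() if now is None else now
    pairs = []
    for c in cookies:
        domain = c["domain"].lower().lstrip(".")
        if host != domain and not host.endswith("." + domain):
            continue  # cookie belongs to another site
        if not path.startswith(c.get("path", "/")):
            continue  # cookie is scoped to a different path
        if c.get("secure") and parts.scheme != "https":
            continue  # never send Secure cookies over plain HTTP
        expires = c.get("expires", -1)
        if expires != -1 and expires <= now:
            continue  # expired persistent cookie
        pairs.append(f"{c['name']}={c['value']}")
    return "; ".join(pairs)
```

The resulting string can be sent as the `Cookie` request header by whichever HTTP client the extractor uses.
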
Testing
Run unit tests:
```bash
./venv/bin/pytest tests/unit/
```