# FAEA: High-Fidelity Autonomous Extraction Agent

## Overview

FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (Cloudflare, Akamai, etc.) using a "Headless-Plus" architecture. It combines full-browser fidelity (Camoufox/Playwright) for authentication with high-speed clients (curl_cffi) for data extraction.

## Features

- **Bifurcated Execution**: Browser for auth, curl for extraction.
- **TLS Fingerprint Alignment**: Browser and extractor both mimic `Chrome/124`.
- **Evasion**:
  - **GhostCursor**: Human-like mouse movements (Bezier curves, Fitts's Law).
  - **EntropyScheduler**: Jittered request timing (Gaussian + Phase Drift).
  - **Mobile Proxy Rotation**: Sticky session management.
- **Production Ready**:
  - Docker Swarm/Compose scaling.
  - Redis-backed persistent task queues.
  - Prometheus/Grafana monitoring.

## Getting Started

### Prerequisites

- Docker & Docker Compose
- Redis (optional, included in the Compose file)

### Quick Start (Dev)

```bash
docker-compose up --build
```

## Production Usage

### 1. Scaling the Cluster

The infrastructure is designed to scale horizontally.

```bash
# Scale to 5 Browsers and 20 Extractors
docker-compose up -d --scale camoufox-pool=5 --scale curl-pool=20
```

### 2. Monitoring

Access the dashboards:

- **Grafana**: `http://localhost:3000` (default credentials: admin/admin)
- **Prometheus**: `http://localhost:9090`
- **Metrics**: Authentication Success Rate, Session Duration, Extraction Throughput.

### 3. Task Dispatch Configuration

Tasks are dispatched via the Redis `task_queue` list. Payload format:

```json
{
  "type": "auth",
  "url": "https://example.com/login",
  "session_id": "sess_123"
}
```

## Architecture

- `src/browser/`: Camoufox (Firefox/Chrome) manager for auth.
- `src/extractor/`: Curl client for high-speed extraction.
- `src/core/`: Shared logic (Session, Scheduler, Recovery).
- `src/orchestrator/`: Worker loops and task management.

## Testing

Run unit tests:

```bash
./venv/bin/pytest tests/unit/
```
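
## Example: Dispatching a Task

The payload format in the Task Dispatch section can be produced and queued with a few lines of Python. The following is a minimal sketch, not the project's own dispatcher: the names `ListQueue`, `build_task`, and `dispatch` are hypothetical, and the choice of `RPUSH` assumes the workers pop from the left end of the `task_queue` list, which this README does not specify.

```python
import json
from typing import Any, Protocol


class ListQueue(Protocol):
    """Minimal interface this sketch needs; redis-py's Redis client provides rpush."""

    def rpush(self, name: str, *values: str) -> Any: ...


QUEUE_NAME = "task_queue"  # list name taken from the Task Dispatch section
REQUIRED_FIELDS = ("type", "url", "session_id")


def build_task(task_type: str, url: str, session_id: str) -> str:
    """Serialize a task in the documented payload format."""
    payload = {"type": task_type, "url": url, "session_id": session_id}
    for field in REQUIRED_FIELDS:
        if not payload[field]:
            raise ValueError(f"missing value for {field!r}")
    return json.dumps(payload)


def dispatch(client: ListQueue, task_type: str, url: str, session_id: str) -> str:
    """Append one task to the queue and return the JSON that was sent."""
    message = build_task(task_type, url, session_id)
    # Assumption: workers consume from the left end of the list (e.g. BLPOP),
    # so RPUSH gives first-in, first-out ordering. Use LPUSH if they pop from the right.
    client.rpush(QUEUE_NAME, message)
    return message
```

With the `redis` package installed, a call might look like `dispatch(redis.Redis(host="localhost", port=6379), "auth", "https://example.com/login", "sess_123")`; the host and port here are assumptions and should match the Redis service in your Compose setup. Passing the client in as an argument keeps the function easy to unit test with a stub.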