FAEA/README.md
Luciabrightcode dab56e743b feat(phase4): Implement Deployment & Optimization Layer
- Scale docker-compose.yml (5 browser, 20 extractor replicas)
- Add Prometheus and Grafana monitoring services
- Implement persistent Redis TaskWorker in src/orchestrator/worker.py
- Implement MetricsCollector in src/core/monitoring.py
- Implement SessionRecoveryManager in src/core/recovery.py
- Update README.md with production usage guide
- Update root documentation (implementation_plan.md, walkthrough.md)
2025-12-23 13:09:27 +08:00

2 KiB

FAEA: High-Fidelity Autonomous Extraction Agent

Overview

FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (Cloudflare, Akamai, etc.) using a "Headless-Plus" architecture. It combines full-browser fidelity (Camoufox/Playwright) for authentication with high-speed clients (curl_cffi) for data extraction.

Features

  • Bifurcated Execution: Browser for Auth, Curl for Extraction.
  • TLS Fingerprint Alignment: Browser and Extractor both mimic Chrome/124.
  • Evasion:
    • GhostCursor: Human-like mouse movements (Bezier curves, Fitts's Law).
    • EntropyScheduler: Jittered request timing (Gaussian + Phase Drift).
    • Mobile Proxy Rotation: Sticky session management.
  • Production Ready:
    • Docker Swarm/Compose scaling.
    • Redis-backed persistent task queues.
    • Prometheus/Grafana monitoring.

Getting Started

Prerequisites

  • Docker & Docker Compose
  • Redis (optional, included in compose)

Quick Start (Dev)

docker-compose up --build

Production Usage

1. Scaling the Cluster

The infrastructure is designed to scale horizontally.

# Scale to 5 Browsers and 20 Extractors
docker-compose up -d --scale camoufox-pool=5 --scale curl-pool=20

2. Monitoring

Access the dashboards:

  • Grafana: http://localhost:3000 (Default creds: admin/admin)
  • Prometheus: http://localhost:9090
  • Metrics: Authentication Success Rate, Session Duration, Extraction Throughput.

3. Task Dispatch configuration

Tasks are dispatched via Redis task_queue list. Payload format:

{
  "type": "auth",
  "url": "https://example.com/login",
  "session_id": "sess_123"
}

Architecture

  • src/browser/: Camoufox (Firefox/Chrome) manager for auth.
  • src/extractor/: Curl Client for high-speed extraction.
  • src/core/: Shared logic (Session, Scheduler, Recovery).
  • src/orchestrator/: Worker loops and task management.

Testing

Run unit tests:

./venv/bin/pytest tests/unit/