FAEA/README.md
Luciabrightcode dab56e743b feat(phase4): Implement Deployment & Optimization Layer
- Scale docker-compose.yml (5 browser, 20 extractor replicas)
- Add Prometheus and Grafana monitoring services
- Implement persistent Redis TaskWorker in src/orchestrator/worker.py
- Implement MetricsCollector in src/core/monitoring.py
- Implement SessionRecoveryManager in src/core/recovery.py
- Update README.md with production usage guide
- Update root documentation (implementation_plan.md, walkthrough.md)
2025-12-23 13:09:27 +08:00

# FAEA: High-Fidelity Autonomous Extraction Agent
## Overview
FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (Cloudflare, Akamai, etc.) using a "Headless-Plus" architecture. It combines full-browser fidelity (Camoufox/Playwright) for authentication with high-speed clients (curl_cffi) for data extraction.
## Features
- **Bifurcated Execution**: Browser for Auth, Curl for Extraction.
- **TLS Fingerprint Alignment**: Browser and Extractor both mimic `Chrome/124`.
- **Evasion**:
  - **GhostCursor**: Human-like mouse movements (Bezier curves, Fitts's Law).
  - **EntropyScheduler**: Jittered request timing (Gaussian + Phase Drift).
  - **Mobile Proxy Rotation**: Sticky session management.
- **Production Ready**:
  - Docker Swarm/Compose scaling.
  - Redis-backed persistent task queues.
  - Prometheus/Grafana monitoring.
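
To make "Gaussian + Phase Drift" concrete, here is a minimal sketch of one way such jittered timing could be computed. It is not the project's `EntropyScheduler`: the function name `jittered_delay` and every parameter (`base`, `sigma`, `drift_amplitude`, `drift_period`, `minimum`) are assumptions chosen for illustration.

```python
import math
import random
from typing import Optional


def jittered_delay(step: int, base: float = 2.0, sigma: float = 0.4,
                   drift_amplitude: float = 0.5, drift_period: int = 50,
                   minimum: float = 0.1,
                   rng: Optional[random.Random] = None) -> float:
    """Return the delay in seconds to wait before request number `step`.

    Hypothetical helper: all names and defaults are illustrative.
    """
    source = rng or random
    # Slow sinusoidal drift of the mean, so the average interval is not constant.
    drift = drift_amplitude * math.sin(2 * math.pi * step / drift_period)
    # Gaussian noise around the drifting mean.
    delay = source.gauss(base + drift, sigma)
    # Clamp so a rare negative or tiny sample never yields an invalid sleep.
    return max(minimum, delay)
```

A caller would typically pass the result to `time.sleep()` between requests; passing a seeded `random.Random` makes the sequence reproducible in tests.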
## Getting Started
### Prerequisites
- Docker & Docker Compose
- Redis (optional, included in compose)
### Quick Start (Dev)
```bash
docker-compose up --build
```
## Production Usage
### 1. Scaling the Cluster
The infrastructure is designed to scale horizontally.
```bash
# Scale to 5 Browsers and 20 Extractors
docker-compose up -d --scale camoufox-pool=5 --scale curl-pool=20
```
### 2. Monitoring
Access the dashboards:
- **Grafana**: `http://localhost:3000` (default credentials: `admin`/`admin`)
- **Prometheus**: `http://localhost:9090`
- **Tracked metrics**: Authentication Success Rate, Session Duration, Extraction Throughput.
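
As an illustration of how a metric such as Authentication Success Rate might be derived and exposed to Prometheus, the sketch below keeps two counters and renders a gauge in the Prometheus text exposition format. It uses only the standard library and is not the project's `MetricsCollector`; the class `AuthMetrics` and the metric name `faea_auth_success_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuthMetrics:
    """Hypothetical in-memory counters for authentication attempts."""
    successes: int = 0
    failures: int = 0

    def record(self, ok: bool) -> None:
        # Count one authentication attempt as a success or a failure.
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def success_rate(self) -> float:
        total = self.successes + self.failures
        # Report 0.0 before any attempt to avoid dividing by zero.
        return self.successes / total if total else 0.0

    def render(self) -> str:
        """Render the rate as a gauge in the Prometheus text format."""
        name = "faea_auth_success_rate"  # hypothetical metric name
        return (
            f"# HELP {name} Fraction of successful authentications.\n"
            f"# TYPE {name} gauge\n"
            f"{name} {self.success_rate()}\n"
        )
```

In practice a client library would usually handle registration and the HTTP endpoint; this sketch only shows the arithmetic and the shape of the scraped output.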
### 3. Task Dispatch Configuration
Tasks are dispatched via the Redis `task_queue` list.
Payload format:
```json
{
  "type": "auth",
  "url": "https://example.com/login",
  "session_id": "sess_123"
}
```
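
A task in this format could be enqueued from Python roughly as follows. This is a sketch under assumptions: the helper names `build_task` and `enqueue_task` are hypothetical, and whether the worker expects new tasks at the head or the tail of the list is not stated here, so the push direction should be matched to the worker's pop.

```python
import json


def build_task(task_type: str, url: str, session_id: str) -> str:
    """Serialize a task using the three fields of the payload format."""
    return json.dumps({"type": task_type, "url": url, "session_id": session_id})


def enqueue_task(client, payload: str, queue: str = "task_queue") -> None:
    """Push a serialized task onto the Redis list (hypothetical helper).

    `client` is any object with an `rpush(name, value)` method, such as
    a redis-py `Redis` instance.
    """
    # RPUSH appends to the tail of the list. The push direction is an
    # assumption; it must match how the worker pops tasks.
    client.rpush(queue, payload)


# Usage with redis-py (assumes a Redis server reachable on localhost:6379):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   enqueue_task(client, build_task("auth", "https://example.com/login", "sess_123"))
```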
## Architecture
- `src/browser/`: Camoufox (Firefox/Chrome) manager for auth.
- `src/extractor/`: Curl Client for high-speed extraction.
- `src/core/`: Shared logic (Session, Scheduler, Recovery).
- `src/orchestrator/`: Worker loops and task management.
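
The split between the browser and extractor components can be pictured as a worker that pops a task and routes it by its `type` field. The sketch below is illustrative only and is not the code in `src/orchestrator/worker.py`: the handler functions are placeholders, the `"extract"` task type is an assumption (only `"auth"` appears in the payload example), and the use of `BLPOP` is a guess at the pop direction.

```python
import json


def handle_auth(task: dict) -> str:
    # Placeholder for the browser-based login handled in src/browser/.
    return f"auth:{task['session_id']}"


def handle_extract(task: dict) -> str:
    # Placeholder for the curl-based fetch handled in src/extractor/.
    return f"extract:{task['url']}"


# Hypothetical routing table; "extract" is an assumed task type.
HANDLERS = {"auth": handle_auth, "extract": handle_extract}


def process_one(client, queue: str = "task_queue", timeout: int = 5):
    """Pop one task and dispatch it; return None if the queue stayed empty."""
    item = client.blpop(queue, timeout=timeout)
    if item is None:
        return None
    _key, raw = item  # BLPOP returns a (queue_name, value) pair
    task = json.loads(raw)
    handler = HANDLERS.get(task.get("type"))
    if handler is None:
        raise ValueError(f"unknown task type: {task.get('type')!r}")
    return handler(task)
```

A long-running worker would call `process_one` in a loop and add error handling and retries around the handler call.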
## Testing
Run unit tests:
```bash
./venv/bin/pytest tests/unit/
```