docs: Release v1.0 Final Documentation

- README.md: Add Production Usage, Configuration, and Monitoring guide
- docs/ADD_v0.2.md: Architecture Definition v0.2 (Released v1.0)
Luciabrightcode 2025-12-23 13:20:07 +08:00
parent e15dcb2cd7
commit 4ad9c7f99b
2 changed files with 1389 additions and 25 deletions

README.md

@@ -1,12 +1,15 @@
-# FAEA: High-Fidelity Autonomous Extraction Agent
+# FAEA: High-Fidelity Autonomous Extraction Agent (v1.0)
 ## Overview
 FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (Cloudflare, Akamai, etc.) using a "Headless-Plus" architecture. It combines full-browser fidelity (Camoufox/Playwright) for authentication with high-speed clients (curl_cffi) for data extraction.
+**Status**: Released v1.0
+**Docs**: [Architecture Definition v0.2](docs/ADD_v0.2.md)
 ## Features
 - **Bifurcated Execution**: Browser for Auth, Curl for Extraction.
 - **TLS Fingerprint Alignment**: Browser and Extractor both mimic `Chrome/124`.
-- **Evasion**:
+- **Evasion Layer**:
 - **GhostCursor**: Human-like mouse movements (Bezier curves, Fitts's Law).
 - **EntropyScheduler**: Jittered request timing (Gaussian + Phase Drift).
 - **Mobile Proxy Rotation**: Sticky session management.
@@ -15,48 +18,67 @@ FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (C
 - Redis-backed persistent task queues.
 - Prometheus/Grafana monitoring.
-## Getting Started
-### Prerequisites
-- Docker & Docker Compose
-- Redis (optional, included in compose)
-### Quick Start (Dev)
-```bash
-docker-compose up --build
-```
+## Configuration
+### Environment Variables
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `REDIS_URL` | Connection string for Redis | `redis://redis:6379` |
+| `BROWSERFORGE_SEED` | Seed for consistent canvas fingerprinting | (Optional) |
+| `PROXY_API_KEY` | API Key for mobile proxy provider | (Required for production) |
+### Resource Requirements
+- **Camoufox**: Requires `shm_size: 2gb` to prevent Chrome crashing on complex pages.
+- **Memory**: Ensure host has at least 4GB RAM for a basic 5-browser cluster.
 ## Production Usage
 ### 1. Scaling the Cluster
+The infrastructure is designed to scale horizontally. Start the stack with recommended production replicas:
 ```bash
+# Scale to 5 Browsers and 20 Extractors
 docker-compose up -d --scale camoufox-pool=5 --scale curl-pool=20
 ```
 ### 2. Monitoring
-Access the dashboards:
-- **Grafana**: `http://localhost:3000` (Default creds: admin/admin)
-- **Prometheus**: `http://localhost:9090`
-- **Metrics**: Authentication Success Rate, Session Duration, Extraction Throughput.
+Access the observability stack:
+- **Grafana**: `http://localhost:3000` (Default: `admin` / `admin`).
+- Dashboards: "FAEA Overview", "Extraction Health".
+- **Prometheus**: `http://localhost:9090`.
+- **Metrics**:
+- `auth_attempts_total`: Success/Failure counters.
+- `session_duration_seconds`: Histogram of session validity.
-### 3. Task Dispatch configuration
-Tasks are dispatched via Redis `task_queue` list.
-Payload format:
-```json
-{
+### 3. Task Dispatch
+Push tasks to the `task_queue` in Redis.
+**Python Example:**
+```python
+import redis
+import json
+r = redis.from_url("redis://localhost:6379")
+payload = {
 "type": "auth",
 "url": "https://example.com/login",
-"session_id": "sess_123"
+"session_id": "session_001"
 }
+r.rpush("task_queue", json.dumps(payload))
+print("Task dispatched!")
+```
+**Curl Example:**
+Use `redis-cli`:
+```bash
+redis-cli LPUSH task_queue '{"type": "extract", "url": "https://example.com/data", "session_id": "session_001"}'
 ```
 ## Architecture
 - `src/browser/`: Camoufox (Firefox/Chrome) manager for auth.
 - `src/extractor/`: Curl Client for high-speed extraction.
-- `src/core/`: Shared logic (Session, Scheduler, Recovery).
+- `src/core/`: Shared logic (Session, Scheduler, Recovery, Monitoring).
 - `src/orchestrator/`: Worker loops and task management.
 ## Testing
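
The Task Dispatch section in this README diff pushes JSON payloads onto the Redis `task_queue` list. As a rough illustration, the sketch below builds and validates such a payload before it is enqueued. It assumes the three fields shown in the README (`type`, `url`, `session_id`) make up the whole schema and that `auth` and `extract` are the only task types; both assumptions come from the two README examples alone. `build_task` and `KNOWN_TYPES` are hypothetical names and are not part of FAEA.

```python
import json
from urllib.parse import urlparse

# Task types seen in the README examples (assumption: the real worker may accept others).
KNOWN_TYPES = {"auth", "extract"}


def build_task(task_type: str, url: str, session_id: str) -> str:
    """Return a JSON payload shaped like the README's task examples.

    Hypothetical helper: it checks the three fields locally so that
    malformed tasks are rejected before they reach the queue.
    """
    if task_type not in KNOWN_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not session_id:
        raise ValueError("session_id must be non-empty")
    return json.dumps({"type": task_type, "url": url, "session_id": session_id})


if __name__ == "__main__":
    # Prints the JSON string that would be pushed to `task_queue`.
    print(build_task("auth", "https://example.com/login", "session_001"))
```

The returned string can be passed to `r.rpush("task_queue", ...)` as in the README's Python example. The README's two examples enqueue at opposite ends of the list (`rpush` in Python, `LPUSH` through `redis-cli`). The diff does not show which end the worker pops from, so the relative processing order of tasks pushed by the two methods cannot be determined from it.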

docs/ADD_v0.2.md (new file, 1342 additions)

File diff suppressed because it is too large.
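
The Environment Variables table added to README.md in this commit lists three settings: `REDIS_URL`, `BROWSERFORGE_SEED`, and `PROXY_API_KEY`. One possible way to read them with the documented default and the "required for production" rule is sketched below. How FAEA itself loads its configuration is not shown in the diff, so `Settings`, `load_settings`, and the `production` flag are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    redis_url: str
    browserforge_seed: Optional[str]
    proxy_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ, production: bool = False) -> Settings:
    """Read the three variables named in the README's Environment Variables table."""
    settings = Settings(
        # Default value taken from the README table.
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        # Optional per the README; None means no fixed seed was supplied.
        browserforge_seed=env.get("BROWSERFORGE_SEED") or None,
        # Only enforced when production=True (assumed reading of "Required for production").
        proxy_api_key=env.get("PROXY_API_KEY") or None,
    )
    if production and settings.proxy_api_key is None:
        raise RuntimeError("PROXY_API_KEY is required for production")
    return settings


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment, for example `load_settings({"PROXY_API_KEY": "k"}, production=True)`.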