FAEA: High-Fidelity Autonomous Extraction Agent (v1.0)

Overview

FAEA is a hybrid extraction system designed to defeat advanced bot mitigation (Cloudflare, Akamai, etc.) using a "Headless-Plus" architecture. It combines full-browser fidelity (Camoufox/Playwright) for authentication with high-speed clients (curl_cffi) for data extraction.

Status: Released v1.0
Docs: Architecture Definition v0.2

Features

  • Bifurcated Execution: Browser for Auth, Curl for Extraction.
  • TLS Fingerprint Alignment: Browser and Extractor both mimic Chrome/124.
  • Evasion Layer:
    • GhostCursor: Human-like mouse movements (Bezier curves, Fitts's Law).
    • EntropyScheduler: Jittered request timing (Gaussian + Phase Drift).
    • Mobile Proxy Rotation: Sticky session management.
  • Production Ready:
    • Docker Swarm/Compose scaling.
    • Redis-backed persistent task queues.
    • Prometheus/Grafana monitoring.
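The EntropyScheduler's timing model can be sketched as follows. This is an illustrative, stdlib-only sketch; the `jittered_delay` name and its parameters are hypothetical, not FAEA's actual API:

```python
import math
import random

def jittered_delay(base: float, step: int, sigma: float = 0.3,
                   drift_period: float = 50.0, drift_amp: float = 0.5) -> float:
    """Inter-request delay: Gaussian noise around `base` plus a slow
    sinusoidal phase drift, so request timing never settles into a
    fixed, detectable rhythm. Clamped so the delay stays positive."""
    noise = random.gauss(0.0, sigma * base)                          # Gaussian jitter
    drift = drift_amp * math.sin(2 * math.pi * step / drift_period)  # phase drift
    return max(0.1, base + noise + drift)
```

The slow drift term shifts the mean delay over time, which defeats detectors that look for a stable inter-request interval even after per-request jitter.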

Configuration

Environment Variables

Variable            Description                                  Default
REDIS_URL           Connection string for Redis                  redis://redis:6379
BROWSERFORGE_SEED   Seed for consistent canvas fingerprinting    (Optional)
PROXY_API_KEY       API key for the mobile proxy provider        (Required for production)
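Resolving these settings in Python with the documented defaults might look like this (the `load_config` helper is illustrative, not part of FAEA's code):

```python
import os

def load_config(env=None) -> dict:
    """Resolve FAEA settings from the environment, applying the
    documented defaults where a variable is unset."""
    env = os.environ if env is None else env
    return {
        "redis_url": env.get("REDIS_URL", "redis://redis:6379"),
        "browserforge_seed": env.get("BROWSERFORGE_SEED"),  # optional; None if unset
        "proxy_api_key": env.get("PROXY_API_KEY"),          # required for production
    }
```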

Resource Requirements

  • Camoufox: Requires shm_size: 2gb to prevent the browser from crashing on complex pages.
  • Memory: Ensure the host has at least 4 GB of RAM for a basic 5-browser cluster.

Production Usage

1. Scaling the Cluster

Start the stack with recommended production replicas:

docker-compose up -d --scale camoufox-pool=5 --scale curl-pool=20

2. Monitoring

Access the observability stack:

  • Grafana: http://localhost:3000 (Default: admin / admin).
    • Dashboards: "FAEA Overview", "Extraction Health".
  • Prometheus: http://localhost:9090.
  • Metrics:
    • auth_attempts_total: Success/Failure counters.
    • session_duration_seconds: Histogram of session validity.
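As a sketch, metrics like these could be registered with the standard prometheus_client library. The label names and histogram buckets below are assumptions, not FAEA's actual definitions; note that the client library appends the _total suffix to Counter names automatically:

```python
from prometheus_client import Counter, Histogram

# Exposed as auth_attempts_total; prometheus_client adds the _total suffix.
AUTH_ATTEMPTS = Counter(
    "auth_attempts", "Authentication attempts by outcome", ["result"]
)

# Histogram of how long authenticated sessions stay valid, in seconds.
SESSION_DURATION = Histogram(
    "session_duration_seconds", "Session validity duration",
    buckets=(60, 300, 900, 1800, 3600),
)

AUTH_ATTEMPTS.labels(result="success").inc()
SESSION_DURATION.observe(420.0)
```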

3. Task Dispatch

Push tasks to the task_queue in Redis.

Python Example:

import redis
import json

r = redis.from_url("redis://localhost:6379")

payload = {
    "type": "auth",
    "url": "https://example.com/login",
    "session_id": "session_001"
}

r.rpush("task_queue", json.dumps(payload))
print("Task dispatched!")

CLI Example: Use redis-cli (RPUSH matches the Python example above, so tasks are consumed in FIFO order):

redis-cli RPUSH task_queue '{"type": "extract", "url": "https://example.com/data", "session_id": "session_001"}'
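On the consuming side, a worker loop might look like the following sketch. `decode_task` and `worker_loop` are illustrative names, not FAEA's actual API; BLPOP pops from the head of the list, pairing with the producer pushes above:

```python
import json

def decode_task(raw: bytes) -> dict:
    """Validate and decode one queue entry pushed by a producer."""
    task = json.loads(raw)
    if task.get("type") not in ("auth", "extract"):
        raise ValueError(f"unknown task type: {task.get('type')!r}")
    return task

def worker_loop(r, queue: str = "task_queue") -> None:
    """Block on the queue; `r` is any connected redis.Redis client."""
    while True:
        item = r.blpop(queue, timeout=5)  # returns (key, value) or None on timeout
        if item is None:
            continue  # queue was idle; loop and wait again
        _, raw = item
        task = decode_task(raw)
        print(f"processing {task['type']} task for {task.get('session_id')}")
```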

Architecture

  • src/browser/: Camoufox (Firefox/Chrome) manager for auth.
  • src/extractor/: Curl Client for high-speed extraction.
  • src/core/: Shared logic (Session, Scheduler, Recovery, Monitoring).
  • src/orchestrator/: Worker loops and task management.

Testing

Run unit tests:

./venv/bin/pytest tests/unit/