Updated implementation_plan.md and walkthrough.md for approval before kicking off Phase 4
parent 4f76105e3d
commit 2ff4d593f8
6 changed files with 125 additions and 27 deletions
```diff
@@ -1,29 +1,58 @@
-# Phase 3: Evasion & Resilience Implementation Plan (COMPLETED)
+# Phase 4: Deployment & Optimization Implementation Plan
```
```diff
 ## Goal Description
-Implement the "Human" behavior layer to defeat behavioral biometrics and temporal analysis.
+Transition the system from a functional prototype to a scalable, production-ready extraction grid. This involves:
+
+1. **Scaling**: Configuring Docker Compose for high concurrency (5 Browsers, 20 Extractors).
+2. **Resilience**: Implementing persistent task queues and auto-recovery logic.
+3. **Observability**: Integrating Prometheus metrics for monitoring health and success rates.
```
```diff
-## Completed Changes
+## User Review Required
+
+> [!NOTE]
+> **Monitoring**: We will add `prometheus` and `grafana` containers to `docker-compose.yml` to support the metrics collected by `src/core/monitoring.py`.
+> **Task Loops**: We will introduce a new entry point, `src/orchestrator/worker.py`, to act as the persistent long-running process consuming from Redis.
```
```diff
-### Browser Tier (Human Mimesis)
-- **GhostCursorEngine** (`src/browser/ghost_cursor.py`):
-  - Implemented composite cubic Bezier curves.
-  - Implemented Fitts's Law velocity profiles.
-  - Added random micro-movements for human drift simulation.
```
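The cursor math summarized above can be sketched as follows. `bezier_point` mirrors the `_bezier_point` helper exercised by the new unit tests in this commit; `fitts_duration` is a hypothetical illustration of a Fitts's Law velocity profile (the constants `a` and `b` are assumed, not from the repo):

```python
import math

def bezier_point(t, p0, p1, p2, p3):
    """Evaluate one cubic Bezier segment at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

def fitts_duration(distance, width, a=0.1, b=0.15):
    """Fitts's Law: movement time scales with log2(2D/W), so long moves to
    small targets take disproportionately longer -- the profile real users show."""
    return a + b * math.log2(2.0 * distance / width)
```

Sampling `bezier_point` at increasing `t` and pacing the samples with `fitts_duration` yields curved, decelerating mouse paths instead of the straight constant-velocity lines that behavioral biometrics flag.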
```diff
+## Proposed Changes
-### Core Tier (Temporal & Network Entropy)
-- **EntropyScheduler** (`src/core/scheduler.py`):
-  - Implemented Gaussian noise injection ($\sigma=5.0$).
-  - Implemented phase shift rotation to prevent harmonic detection.
```
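The scheduler described above can be sketched like this (class name and `sigma=5.0` come from the plan; the base interval and method names are assumed for illustration):

```python
import random

class EntropyScheduler:
    """Sketch: jitter a fixed base interval so request timing has no stable period."""

    def __init__(self, base_interval=60.0, sigma=5.0):
        self.base_interval = base_interval  # mean seconds between actions (assumed value)
        self.sigma = sigma                  # the plan specifies sigma = 5.0
        self.phase_offset = 0.0

    def rotate_phase(self):
        # Shifting the schedule's phase breaks the harmonic that spectral
        # analysis of request timestamps would otherwise reveal.
        self.phase_offset = random.uniform(-self.base_interval / 2, self.base_interval / 2)

    def next_delay(self):
        # Gaussian noise around the base interval removes the fixed period.
        jitter = random.gauss(0.0, self.sigma)
        return max(0.0, self.base_interval + jitter + self.phase_offset)
```

A worker would sleep for `next_delay()` between actions and call `rotate_phase()` periodically, so neither the mean interval nor its dominant frequency stays constant.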
```diff
-- **MobileProxyRotator** (`src/core/proxy.py`):
-  - Implemented Sticky Session logic.
-  - Implemented Cooldown management.
```
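A minimal sketch of the two behaviors named above, sticky sessions and cooldowns (the class name is from the plan; the method names and cooldown default are assumptions):

```python
import time

class MobileProxyRotator:
    """Sketch: pin each session to one proxy, and rest proxies after release."""

    def __init__(self, proxies, cooldown_s=300.0):
        self.proxies = list(proxies)
        self.cooldown_s = cooldown_s
        self.assignments = {}   # session_id -> proxy (sticky sessions)
        self.cooling = {}       # proxy -> monotonic time it was released

    def acquire(self, session_id):
        # Sticky: the same session always reuses its assigned proxy, so the
        # target never sees one login hop between exit IPs mid-session.
        if session_id in self.assignments:
            return self.assignments[session_id]
        now = time.monotonic()
        for proxy in self.proxies:
            released = self.cooling.get(proxy)
            in_use = proxy in self.assignments.values()
            if not in_use and (released is None or now - released >= self.cooldown_s):
                self.assignments[session_id] = proxy
                return proxy
        raise RuntimeError("no proxy available outside its cooldown window")

    def release(self, session_id):
        proxy = self.assignments.pop(session_id)
        self.cooling[proxy] = time.monotonic()  # start the cooldown clock
```

The cooldown keeps a just-used mobile IP out of rotation long enough for its reputation score to recover before it is reassigned.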
```diff
+### Infrastructure
+
+#### [UPDATE] [docker-compose.yml](file:///home/kasm-user/workspace/FAEA/docker-compose.yml)
+- **Services**:
+  - `camoufox`: Scale to 5 replicas. Set `shm_size: 2gb`. Limit CPU/Mem.
+  - `extractor`: Scale to 20 replicas. Limit resources.
+  - `prometheus`: Add service for metrics collection.
+  - `grafana`: Add service for visualization.
+  - `redis`: Optimize config.
```
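The service changes above could look roughly like this compose fragment (a sketch only: the service names, replica counts, and `shm_size` come from the plan, while the concrete resource limits are placeholder assumptions, and `deploy.replicas` assumes Compose v2):

```yaml
services:
  camoufox:
    shm_size: 2gb
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: "1.0"     # placeholder limit, to be tuned
          memory: 2g
  extractor:
    deploy:
      replicas: 20
      resources:
        limits:
          cpus: "0.5"     # placeholder limit, to be tuned
          memory: 512m
```

Replica counts can still be overridden at run time with `docker-compose up --scale`, as the verification plan below does.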
```diff
-### Remediation: TLS Fingerprint Alignment
-- **Tuned** `src/browser/manager.py`: Updated to trigger `Chrome/124`.
-- **Tuned** `src/extractor/client.py`: Updated to use `chrome124` impersonation to verify consistency.
-- **Verified**: Static alignment achieved. Dynamic verification (`tests/manual/verify_tls.py`) confirms the logic but faced environment-specific network blocks.
```
```diff
+### Core Tier (Orchestration & Monitoring)
+
+#### [NEW] [src/core/monitoring.py](file:///home/kasm-user/workspace/FAEA/src/core/monitoring.py)
+- **Class**: `MetricsCollector`
+- **Metrics**:
+  - `auth_attempts` (Counter)
+  - `session_duration` (Histogram)
+  - `extraction_throughput` (Counter)
```
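The planned `MetricsCollector` interface might look like the dependency-free stand-in below; the metric names come from the plan, but the method names and the simplified text rendering are assumptions (the real module would presumably use `prometheus_client` Counter/Histogram objects):

```python
class MetricsCollector:
    """Stand-in sketch for the planned prometheus_client-backed metrics."""

    def __init__(self):
        self.counters = {"auth_attempts": 0, "extraction_throughput": 0}
        self.durations = []  # raw samples backing the session_duration histogram

    def inc(self, name, amount=1):
        # Counters only ever go up; Prometheus computes rates on the scrape side.
        self.counters[name] += amount

    def observe_session(self, seconds):
        self.durations.append(seconds)

    def render(self):
        # Simplified Prometheus text exposition: "name value" lines, with the
        # histogram reduced to its _sum/_count pair.
        lines = [f"{name} {value}" for name, value in self.counters.items()]
        lines.append(f"session_duration_sum {sum(self.durations)}")
        lines.append(f"session_duration_count {len(self.durations)}")
        return "\n".join(lines)
```

The `/metrics` endpoint mentioned in the verification plan would serve the output of `render()` for Prometheus to scrape.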
```diff
-## Verification Status
-- **Functional**: Components implemented and unit-testable.
-- **TLS**: Aligned to the Chrome 124 standard.
```
```diff
+#### [NEW] [src/orchestrator/worker.py](file:///home/kasm-user/workspace/FAEA/src/orchestrator/worker.py)
+- **Class**: `TaskWorker`
+- **Features**:
+  - Infinite loop consuming tasks from Redis lists (`BLPOP`).
+  - Dispatch logic: `auth` -> `CamoufoxManager`, `extract` -> `CurlClient`.
+  - Integration with `SessionRecoveryManager` for handling failures.
```
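The dispatch core of the planned worker can be sketched as below. In production `pop` would be a bound `redis_client.blpop("tasks", timeout=5)` call; here it is injected so the logic is testable without a Redis server (the constructor shape and handler names are illustrative, not from the repo):

```python
import json

class TaskWorker:
    """Sketch of the planned Redis-consuming worker's dispatch step."""

    def __init__(self, pop, handlers):
        self.pop = pop            # returns (queue_name, payload) or None on timeout
        self.handlers = handlers  # task type -> callable, e.g. {"auth": ..., "extract": ...}

    def run_once(self):
        item = self.pop()
        if item is None:          # BLPOP timed out; the outer loop just retries
            return None
        _, payload = item
        task = json.loads(payload)
        handler = self.handlers.get(task["type"])
        if handler is None:
            raise ValueError(f"unknown task type: {task['type']}")
        return handler(task)

# The real worker would wrap run_once() in a `while True` loop, with
# SessionRecoveryManager catching and classifying handler failures.
```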
```diff
+#### [NEW] [src/core/recovery.py](file:///home/kasm-user/workspace/FAEA/src/core/recovery.py)
+- **Class**: `SessionRecoveryManager`
+- **Features**:
+  - Handle `cf_clearance_expired`, `ip_reputation_drop`, etc.
```
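One natural shape for this is a registry mapping failure codes to recovery strategies; the two codes come from the plan, while the strategy names and return values are hypothetical:

```python
class SessionRecoveryManager:
    """Sketch: route known failure codes to recovery actions."""

    def __init__(self):
        self._strategies = {
            "cf_clearance_expired": self._reauthenticate,
            "ip_reputation_drop": self._rotate_proxy,
        }

    def recover(self, failure_code, session):
        strategy = self._strategies.get(failure_code)
        if strategy is None:
            # Unknown failures are not retried blindly; surface them instead.
            raise RuntimeError(f"unrecoverable failure: {failure_code}")
        return strategy(session)

    def _reauthenticate(self, session):
        # Would re-run the browser-tier auth flow to refresh cf_clearance.
        return ("reauth", session)

    def _rotate_proxy(self, session):
        # Would release the current sticky proxy and acquire a fresh one.
        return ("rotate", session)
```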
```diff
+### Documentation
+
+#### [UPDATE] [README.md](file:///home/kasm-user/workspace/FAEA/README.md)
+- Add a "Production Usage" section.
+- Document how to scale and monitor.
```
```diff
+## Verification Plan
+
+### Automated Tests
+- **Integration**: Verify the worker picks up a task from Redis.
+- **Metrics**: Verify the `/metrics` endpoint is exposed and being scraped.
+
+### Manual Verification
+- `docker-compose up --scale camoufox=5 --scale extractor=20` to verify stability.
+- Check the Grafana dashboard for metric data flow.
```
BIN src/browser/__pycache__/ghost_cursor.cpython-310.pyc (new file; binary file not shown)
```diff
@@ -29,7 +29,8 @@ class CurlClient:
         logger.info("Initializing CurlClient...")
 
-        # impersonate argument controls TLS Client Hello
-        # 'chrome120' matches our hardcoded Camoufox build in this MVP
+        # impersonate argument controls TLS Client Hello
+        # 'chrome124' matches our hardcoded Camoufox build in this MVP
         self.session = AsyncSession(impersonate=self.session_state.tls_fingerprint)
 
         # 1. Inject Cookies
```
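The comment change above exists because the impersonation tag must track the browser tier's build. A small, hypothetical consistency helper makes the invariant explicit (the function name is illustrative and not from the repo):

```python
def impersonate_target(browser_build):
    """Map a browser build string like 'Chrome/124' to a curl_cffi-style
    impersonation tag like 'chrome124', so the extractor's TLS ClientHello
    matches the User-Agent the browser tier presents."""
    name, _, version = browser_build.partition("/")
    major = version.split(".")[0]
    return name.lower() + major
```

If the browser tier is bumped again, deriving the tag this way (rather than hardcoding both sides) would prevent the UA/TLS mismatch this remediation fixed.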
Binary file not shown.

59 tests/unit/test_ghost_cursor.py (new file)
@@ -0,0 +1,59 @@
```python
import pytest
import math
from src.browser.ghost_cursor import GhostCursorEngine


def test_bezier_curve_generation():
    engine = GhostCursorEngine()
    start = (0, 0)
    end = (100, 100)

    # Test control point generation
    c1, c2 = engine._generate_bezier_controls(start, end)

    # Basic bounds check: control points should be somewhat between start and
    # end, but can overshoot. Just ensure they are 2-tuples.
    assert isinstance(c1, tuple)
    assert len(c1) == 2
    assert isinstance(c2, tuple)
    assert len(c2) == 2


def test_bezier_point_calculation():
    engine = GhostCursorEngine()
    p0 = (0, 0)
    p1 = (10, 20)
    p2 = (80, 90)
    p3 = (100, 100)

    # t=0 should be the start point
    res_0 = engine._bezier_point(0, p0, p1, p2, p3)
    assert math.isclose(res_0[0], 0)
    assert math.isclose(res_0[1], 0)

    # t=1 should be the end point
    res_1 = engine._bezier_point(1, p0, p1, p2, p3)
    assert math.isclose(res_1[0], 100)
    assert math.isclose(res_1[1], 100)

    # t=0.5 should be somewhere in between
    res_mid = engine._bezier_point(0.5, p0, p1, p2, p3)
    assert 0 < res_mid[0] < 100
    assert 0 < res_mid[1] < 100


def test_waypoints_generation():
    engine = GhostCursorEngine()
    start = (0, 0)
    end = (300, 300)
    count = 3

    waypoints = engine._generate_waypoints(start, end, count)

    assert len(waypoints) == count + 1  # +1 for the end point
    assert waypoints[0] == start
    assert waypoints[-1] == end

    # Check intermediate points exist
    for i in range(1, count):
        assert waypoints[i] != start
        assert waypoints[i] != end
```
```diff
@@ -95,8 +95,17 @@ tests/unit/test_session_core.py .. [100%]
 - **EntropyScheduler**: Implemented (`src/core/scheduler.py`).
 - **MobileProxyRotator**: Implemented (`src/core/proxy.py`).
 
-## 4. Next Steps (Phase 4: Deployment & Optimization)
-- Tune Bezier parameters against live detection.
-- Implement persistent Redis task queues.
-- Scale Proxy Rotator for high concurrency.
+## Phase 4: Deployment & Optimization Walkthrough (Planned)
+
+### 1. Goals
+- Scale infrastructure (5x Browser, 20x Extractor).
+- Implement persistent task workers with Redis.
+- Implement Monitoring (Prometheus/Grafana).
+- Implement auto-recovery logic.
+
+### 2. Next Steps
+- Update `docker-compose.yml`.
+- Implement `src/orchestrator/worker.py`.
+- Implement `src/core/monitoring.py`.
```