FAEA/implementation_plan.md

Phase 2: Core Components (Headless-Plus) Implementation Plan

Goal Description

Implement the core logic for the "Headless-Plus" architecture:

  1. Browser Tier: CamoufoxManager to handle browser instantiation, profile injection, and state extraction.
  2. Extractor Tier: CurlClient (built on curl_cffi) to consume the shared session state and execute high-speed requests with matching fingerprints.

User Review Required

Important

Mocking Strategy: A live Cloudflare-protected target may not be readily available for automated testing, so I will implement a Mock Target using a local http.server or FastAPI app that logs request headers (and TLS details, where the server can observe them) to verify fingerprints.
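
A minimal sketch of what the Mock Target could look like, using only the standard library. The names HeaderLoggingHandler, CAPTURED, start_mock_server, and get_with_user_agent are hypothetical and not part of the plan. Because this version serves plain HTTP, it can only record headers; capturing TLS details would need a TLS-terminating server and is not shown.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory log: one dict of lower-cased headers per request.
CAPTURED = []


class HeaderLoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Lower-case the header names so lookups do not depend on client casing.
        headers = {name.lower(): value for name, value in self.headers.items()}
        CAPTURED.append(headers)
        # Echo the headers back so the caller can inspect them as well.
        body = json.dumps(headers).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default stderr access log during tests.
        pass


def start_mock_server():
    # Port 0 asks the OS for any free port; read it from server.server_address.
    server = HTTPServer(("127.0.0.1", 0), HeaderLoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_with_user_agent(url, user_agent):
    # Stand-in for a real client: send one GET and return the echoed headers.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

In a test, the server would be started once, both the browser and the extractor pointed at http://127.0.0.1:<port>/, and the entries in CAPTURED compared afterwards.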

Proposed Changes

Browser Tier

[NEW] src/browser/manager.py

  • Class: CamoufoxManager
  • Responsibilities:
    • Launch Camoufox (via Playwright) with specific user_agent and viewport.
    • initialize(): Set up browser context.
    • extract_session_state(): Gather cookies, storage, and fingerprint info into SessionState.
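
One possible shape for SessionState, sketched under assumptions. Only the tls_fingerprint attribute is named in this plan; the other field names, the string type of tls_fingerprint, and the from_storage_state helper are hypothetical. The input dict follows the format returned by Playwright's BrowserContext.storage_state(), which has a "cookies" list and an "origins" list holding localStorage entries.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SessionState:
    # Field names other than tls_fingerprint are assumptions for illustration.
    user_agent: str
    tls_fingerprint: str
    cookies: list[dict[str, Any]] = field(default_factory=list)
    # Maps origin -> {key: value} for that origin's localStorage.
    local_storage: dict[str, dict[str, str]] = field(default_factory=dict)

    @classmethod
    def from_storage_state(cls, storage_state, user_agent, tls_fingerprint):
        """Build a SessionState from a Playwright-style storage_state dict."""
        local_storage = {
            origin["origin"]: {
                item["name"]: item["value"]
                for item in origin.get("localStorage", [])
            }
            for origin in storage_state.get("origins", [])
        }
        return cls(
            user_agent=user_agent,
            tls_fingerprint=tls_fingerprint,
            cookies=list(storage_state.get("cookies", [])),
            local_storage=local_storage,
        )
```

Inside extract_session_state(), the manager would presumably pass in the result of the context's storage_state() call together with the user agent it launched with.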

Extractor Tier

[NEW] src/extractor/client.py

  • Class: CurlClient
  • Responsibilities:
    • Initialize with SessionState.
    • Configure curl_cffi session to match SessionState.tls_fingerprint.
    • fetch(url): Execute requests using the shared state.
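
A rough sketch of how the session configuration could be derived from the shared state. It assumes the state is available as a plain dict with user_agent, tls_fingerprint, and cookies keys, and that tls_fingerprint holds a curl_cffi impersonation target name; both are assumptions, as are the helper names build_session_kwargs and make_session. The curl_cffi import is deferred so the pure helper can be unit-tested without the library installed.

```python
from typing import Any


def build_session_kwargs(state: dict[str, Any]) -> dict[str, Any]:
    """Translate a session-state dict into keyword arguments for a session."""
    # Flatten browser cookies to name -> value. This drops domain and path
    # scoping, which is acceptable for a single mock target but not in general.
    cookies = {c["name"]: c["value"] for c in state.get("cookies", [])}
    return {
        # Assumption: tls_fingerprint stores an impersonation target name.
        "impersonate": state["tls_fingerprint"],
        "headers": {"User-Agent": state["user_agent"]},
        "cookies": cookies,
    }


def make_session(state: dict[str, Any]):
    """Create a curl_cffi session configured from the shared state."""
    # Imported lazily so build_session_kwargs stays usable without curl_cffi.
    from curl_cffi import requests

    return requests.Session(**build_session_kwargs(state))
```

CurlClient.fetch(url) could then delegate to the session's get method; whether a Firefox-family impersonation target matches Camoufox closely enough is something the handover test would need to confirm.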

Testing Infrastructure

[NEW] tests/e2e/test_handover.py

  • A full flow test:
    1. Instantiate CamoufoxManager -> Navigate to Mock Server -> Extract State.
    2. Save State to Redis.
    3. Instantiate CurlClient with State -> Request Mock Server.
    4. Verify: Mock Server sees matching User-Agent and (if possible) consistent TLS signatures.
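
Step 2 of the flow could be sketched as below. The key prefix, the TTL default, and the function names are hypothetical. The client argument is expected to behave like redis-py's redis.Redis, of which only set(name, value, ex=...) and get(name) are used, so a small fake can stand in for it during unit tests.

```python
import json


def state_key(session_id):
    # Hypothetical key layout; the plan does not specify one.
    return f"faea:session:{session_id}"


def save_state(client, session_id, state, ttl_seconds=900):
    """Store the state dict as JSON with an expiry, so stale sessions age out."""
    client.set(state_key(session_id), json.dumps(state), ex=ttl_seconds)


def load_state(client, session_id):
    """Return the stored state dict, or None if it is missing or expired."""
    raw = client.get(state_key(session_id))
    # redis-py returns bytes by default; json.loads accepts bytes directly.
    return None if raw is None else json.loads(raw)
```

Serializing to JSON keeps the handover format independent of Python, at the cost of requiring every field in the state to be JSON-serializable.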

Verification Plan

Automated Tests

  1. Mock Server Test:
    • Start a local server that captures headers.
    • Run the E2E script.
    • Assert that the Browser and Client requests carry the same User-Agent and, where the server can observe it, a consistent TLS signature.
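
The assertion step could be expressed as a small comparison helper like the one below. The function name and the default set of compared headers are assumptions: the plan only names User-Agent, and cookie is included here on the guess that the shared session cookies should also match.

```python
def header_mismatches(browser_headers, client_headers, keys=("user-agent", "cookie")):
    """Return {header: (browser_value, client_value)} for headers that differ."""
    # Header names are case-insensitive, so normalise before comparing.
    browser = {name.lower(): value for name, value in browser_headers.items()}
    client = {name.lower(): value for name, value in client_headers.items()}
    return {
        key: (browser.get(key), client.get(key))
        for key in keys
        if browser.get(key) != client.get(key)
    }
```

A test would then assert that the returned dict is empty, and on failure the dict itself shows which headers diverged and how.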

Manual Verification

  • Run docker-compose up and execute a manual script inside the orchestrator container to trigger the flow.