Phase 2: Core Components (Headless-Plus) Implementation Plan
Goal Description
Implement the core logic for the "Headless-Plus" architecture:
- Browser Tier: CamoufoxManager to handle browser instantiation, profile injection, and state extraction.
- Extractor Tier: CurlCffiClient to consume the shared state and execute high-speed requests with matching fingerprints.
User Review Required
Important
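Mocking Strategy: Since we might not have a live "Cloudflare-protected" target easily accessible for automated testing, I will implement a Mock Target using a local http.server or FastAPI that logs headers/TLS info to verify fingerprints. A plain HTTP server only sees request headers, so checking TLS signatures requires capturing the TLS handshake (ClientHello) separately.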
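A minimal sketch of the Mock Target using the standard-library http.server option (FastAPI would work equally well). It records each request's path and lower-cased headers in a module-level list and echoes them back as JSON. The names CAPTURED, HeaderEchoHandler, and start_mock_server are hypothetical. Being plain HTTP, it captures headers only, not TLS fingerprints.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Every request seen by the mock is appended here so a test can compare
# what the browser sent against what the curl_cffi client sent.
CAPTURED: list[dict] = []


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Records request headers and echoes them back as JSON."""

    def do_GET(self) -> None:
        # Lower-case header names so later comparisons are case-insensitive.
        record = {
            "path": self.path,
            "headers": {k.lower(): v for k, v in self.headers.items()},
        }
        CAPTURED.append(record)
        body = json.dumps(record).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence the default stderr access log; CAPTURED is the log we care about.
        pass


def start_mock_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the mock in a daemon thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), HeaderEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to port 0 gives each test run a free port, read back from server.server_address[1]; pointing both the browser and the client at that URL puts their headers side by side in CAPTURED.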
Proposed Changes
Browser Tier
[NEW] src/browser/manager.py
- Class: CamoufoxManager
- Responsibilities:
  - Launch Camoufox (via Playwright) with a specific user_agent and viewport.
  - initialize(): Set up the browser context.
  - extract_session_state(): Gather cookies, storage, and fingerprint info into SessionState.
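One way extract_session_state() could be written, assuming the manager holds Playwright sync-API BrowserContext and Page objects. It relies on two Playwright calls, context.storage_state() (cookies plus per-origin localStorage) and page.evaluate(). Every SessionState field except tls_fingerprint is an assumption, as is passing the fingerprint in explicitly rather than detecting it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SessionState:
    """Hypothetical shape of the state handed from the Browser Tier to the Extractor Tier.

    Only tls_fingerprint is named in the plan; the other fields are assumptions.
    """

    user_agent: str
    tls_fingerprint: str
    cookies: list[dict[str, Any]] = field(default_factory=list)
    # origin -> {localStorage key: value}
    local_storage: dict[str, dict[str, str]] = field(default_factory=dict)


def extract_session_state(context: Any, page: Any, tls_fingerprint: str) -> SessionState:
    """Build a SessionState from a live Playwright BrowserContext and Page.

    tls_fingerprint is supplied by the caller because the page cannot report
    its own TLS ClientHello.
    """
    # storage_state() returns {"cookies": [...], "origins": [{"origin", "localStorage"}]}.
    storage = context.storage_state()
    # Read the UA the browser actually sends, so the client can replay it verbatim.
    user_agent = page.evaluate("navigator.userAgent")
    local_storage = {
        origin["origin"]: {item["name"]: item["value"] for item in origin.get("localStorage", [])}
        for origin in storage.get("origins", [])
    }
    return SessionState(
        user_agent=user_agent,
        tls_fingerprint=tls_fingerprint,
        cookies=storage.get("cookies", []),
        local_storage=local_storage,
    )
```

In CamoufoxManager, context and page would be the Playwright objects Camoufox hands back after initialize(); this sketch deliberately does not assume which Camoufox launch options carry user_agent and viewport.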
Extractor Tier
[NEW] src/extractor/client.py
- Class: CurlClient
- Responsibilities:
  - Initialize with SessionState.
  - Configure the curl_cffi session to match SessionState.tls_fingerprint.
  - fetch(url): Execute requests using the shared state.
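A sketch of CurlClient on top of curl_cffi's requests-style Session, which accepts an impersonate target. It assumes SessionState.tls_fingerprint already holds a valid curl_cffi impersonate name (Camoufox is Firefox-based, so that would need to be a Firefox profile your installed curl_cffi version supports) and redefines a minimal SessionState so the block stands alone. session_config is a hypothetical helper split out so the mapping can be tested without curl_cffi or network access.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SessionState:
    # Minimal stand-in for the shared state; field names other than
    # tls_fingerprint are assumptions.
    user_agent: str
    tls_fingerprint: str
    cookies: list[dict[str, Any]] = field(default_factory=list)


def session_config(state: SessionState) -> dict[str, Any]:
    """Translate SessionState into the pieces a curl_cffi Session needs."""
    return {
        # Assumes tls_fingerprint is already a curl_cffi impersonate target name.
        "impersonate": state.tls_fingerprint,
        # Send the browser's exact User-Agent so headers agree with the TLS profile.
        "headers": {"User-Agent": state.user_agent},
        # (name, value, domain, path) tuples taken from Playwright-style cookie dicts.
        "cookies": [
            (c["name"], c["value"], c.get("domain", ""), c.get("path", "/"))
            for c in state.cookies
        ],
    }


class CurlClient:
    def __init__(self, state: SessionState) -> None:
        # Imported lazily so session_config can be used without curl_cffi installed.
        from curl_cffi import requests

        cfg = session_config(state)
        self.session = requests.Session(impersonate=cfg["impersonate"])
        self.session.headers.update(cfg["headers"])
        for name, value, domain, path in cfg["cookies"]:
            self.session.cookies.set(name, value, domain=domain, path=path)

    def fetch(self, url: str) -> Any:
        # Reuses the configured session, so cookies and fingerprint persist across calls.
        return self.session.get(url)
```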
Testing Infrastructure
[NEW] tests/e2e/test_handover.py
- A full flow test:
  - Instantiate CamoufoxManager -> Navigate to Mock Server -> Extract State.
  - Save State to Redis.
  - Instantiate CurlClient with State -> Request Mock Server.
  - Verify: Mock Server sees a matching User-Agent and (if possible) consistent TLS signatures.
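The "Save State to Redis" step can be as simple as JSON under a namespaced key with an expiry. This sketch works against any object exposing redis-py style set(name, value, ex=...) and get(name); the session_state: key prefix and the 900-second TTL are hypothetical choices, and the state is passed as a plain dict (dataclasses.asdict would produce one from a dataclass).

```python
import json
from typing import Any, Optional

# Hypothetical key prefix and expiry; tune the TTL to how long clearance cookies stay valid.
KEY_PREFIX = "session_state:"
DEFAULT_TTL_SECONDS = 900


def save_state(store: Any, session_id: str, state: dict[str, Any],
               ttl_seconds: int = DEFAULT_TTL_SECONDS) -> str:
    """Serialize state to JSON and store it with an expiry; returns the key used."""
    key = f"{KEY_PREFIX}{session_id}"
    store.set(key, json.dumps(state), ex=ttl_seconds)
    return key


def load_state(store: Any, key: str) -> Optional[dict[str, Any]]:
    """Fetch and decode state; None if the key is missing or has expired."""
    raw = store.get(key)
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # redis-py returns bytes unless the client was created with decode_responses=True.
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

With redis-py, store would be a redis.Redis(...) client; the Browser Tier calls save_state and the Extractor Tier calls load_state with the returned key.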
Verification Plan
Automated Tests
- Mock Server Test:
- Start a local server that captures headers.
- Run the E2E script.
- Assert that the Browser and Client requests carry identical (or sufficiently similar) headers, at minimum the same User-Agent.
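For the "identical (or sufficiently similar)" assertion, a small helper can report which headers differ between the browser's request and the client's request as recorded by the mock. The checked header list and the name header_mismatches are assumptions; names are compared case-insensitively because HTTP header names are case-insensitive.

```python
from typing import Iterable, Mapping

# Headers that should agree between the browser and the curl_cffi client.
# This particular list is an assumption; extend it as the mock reveals more.
DEFAULT_KEYS = ("user-agent", "accept-language", "accept-encoding")


def header_mismatches(
    browser: Mapping[str, str],
    client: Mapping[str, str],
    keys: Iterable[str] = DEFAULT_KEYS,
) -> dict[str, tuple]:
    """Return {header: (browser_value, client_value)} for every checked header that differs."""
    b = {k.lower(): v for k, v in browser.items()}
    c = {k.lower(): v for k, v in client.items()}
    # A header missing on both sides compares as None == None and is not reported.
    return {k: (b.get(k), c.get(k)) for k in keys if b.get(k) != c.get(k)}
```

An empty result means the two requests agree on every checked header; a non-empty dict doubles as a readable assertion message.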
Manual Verification
- Run docker-compose up and execute a manual script inside the orchestrator container to trigger the flow.