Open Source · Core Developer · 2025

WhatsApp Automation & Third-Party Sync Engine

An enterprise-grade, headless browser automation engine designed to programmatically interface with WhatsApp Web infrastructure without relying on official API wrappers. Features an advanced anti-detection stealth framework, decoupled filtering pipelines, and real-time syncing to Notion workspaces.

Stealth: Anti-bot evasion framework
Real-time: Notion workspace synchronization
Modular: Pipeline architecture pattern

Tech Stack

Python · Playwright · Selenium · Notion API · YAML · unittest

Stakeholders

Enterprise Operations Team

Utilize real-time WhatsApp logging and message synchronization to Notion for business tracking

Automation Developers

Maintain browser wrapper mechanics, update xpath and class selectors as WhatsApp Web layout evolves

Zafran (Lead Architect)

Designed the bot evasion abstraction layer, central pipeline supervisor, and the synchronization engine

The Problem

Enterprise operations need programmatic access to WhatsApp messages in order to synchronize state and log telemetry to platforms like Notion. However, modern bot mitigation systems use behavioral fingerprinting (checking the navigator.webdriver flag, WebGL capabilities, and mouse movement patterns) that can block standard headless automation engines almost immediately. Real-time sync also carries a high risk of duplicate records or data gaps whenever API rate limits or network blips interrupt a request.

The Solution

Developed a Python-based headless browser automation engine powered by Playwright and Selenium, featuring a modular pipeline architecture with isolated boundaries. The engine includes a stealth injection layer that dynamically overrides client-side JavaScript automation properties and simulates human-like timing variations to evade detection. To handle network blips, the pipeline uses an isolated filter layer and a localized exclusion system (excluded_log.txt) for robust state reconciliation without duplication.
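The stealth injection idea can be sketched as follows. This is a minimal illustration rather than the contents of stealth.py: it assumes the Playwright path, where a browser context exposes add_init_script, and the names STEALTH_INIT_SCRIPT, apply_stealth, and human_delay are hypothetical. Real fingerprinting checks far more than this one flag.

```python
import random
import time

# JavaScript that Playwright runs in every new page before the page's own
# scripts. It hides only the navigator.webdriver flag (illustrative subset).
STEALTH_INIT_SCRIPT = """
Object.defineProperty(Navigator.prototype, 'webdriver', {
  get: () => undefined,
  configurable: true,
});
"""


def apply_stealth(context):
    """Register the override on a Playwright BrowserContext.

    Any object exposing add_init_script(script=...) works, which keeps the
    function testable without launching a browser.
    """
    context.add_init_script(script=STEALTH_INIT_SCRIPT)


def human_delay(base=0.8, jitter=0.6, rng=random):
    """Return a pause length in seconds: `base` plus up to `jitter` extra."""
    return base + rng.uniform(0.0, jitter)


def pause(base=0.8, jitter=0.6):
    """Sleep for a randomized, human-like interval between actions."""
    time.sleep(human_delay(base, jitter))
```

With Playwright installed, apply_stealth would be called on the context returned by browser.new_context() before any page is opened, so the override is in place when WhatsApp Web's scripts start running.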

Architecture

The system implements a Modular Pipeline Architecture across four decoupled components: an Infrastructure Layer (for low-level browser orchestration and stealth injection), a Processing Pipeline (for page state management and DOM event extraction), a Logical Filter Layer (for payload screening and local state tracking), and an External Sync Layer (for shipping messages to the Notion API). This ensures that scraping mechanics, filter rules, and third-party integrations can be updated independently without affecting one another.
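One way to picture the decoupling is a supervisor that only knows three callables: an extractor, a filter, and a sync function. The sketch below is an assumption about the shape of such a pipeline, not the project's engine.py; Message, run_pipeline, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    """Hypothetical payload passed between layers."""
    message_id: str
    chat: str
    text: str


def run_pipeline(
    extract: Callable[[], Iterable[Message]],
    accept: Callable[[Message], bool],
    sync: Callable[[Message], None],
) -> List[Message]:
    """Pull messages from the extractor, screen them, and ship the survivors.

    Each stage is injected, so a stage can be replaced (new selectors, new
    filter rules, a different sync target) without touching the other two.
    """
    shipped = []
    for message in extract():
        if accept(message):
            sync(message)
            shipped.append(message)
    return shipped
```

In this arrangement the browser layer would sit behind extract, the rules in filters.py behind accept, and the Notion client behind sync.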

  1. Infrastructure & Evasion Layer (browser.py, stealth.py)

    Configures headless/headful Playwright/Selenium instances. Dynamically overrides client-side JS flags (such as navigator.webdriver) and mimics human-like mouse movement delays to bypass advanced bot protection systems.

  2. Processing Supervisor (engine.py, main.py)

    Serves as the execution coordinator. Monitors page states, tracks real-time chat DOM mutations, extracts message payloads, and pipes them securely to the logic pipeline.

  3. Logical Filter Layer (filters.py, excluded_log.txt)

    An isolated step in the data pipeline. Screens extracted payloads against configurable business rules, filters out unwanted logs, and maintains a local exclusion state so that messages already handled are not synced a second time.

  4. External Sync Integration (notion_sync.py)

    A third-party state synchronizer. Standardizes message payloads and dispatches them asynchronously to the target Notion database via the Notion REST API with rate-limit handling.
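The rate-limit handling mentioned for the sync layer could look roughly like the sketch below. The Notion API signals rate limiting with HTTP 429 and a Retry-After header given in seconds; everything else here is an assumption for illustration, including the send_with_retry name and the convention that send returns a (status, headers) pair. The actual HTTP call is injected so the retry logic stays independent of any client library.

```python
import time
from typing import Callable, Mapping, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, Mapping[str, str]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns something other than HTTP 429.

    On a 429, wait for the number of seconds in the Retry-After header when
    present, otherwise back off exponentially from `base_delay`. Returns the
    final status code, or raises RuntimeError once attempts run out.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Plain dict lookup; real HTTP clients usually expose
            # case-insensitive header mappings.
            retry_after = headers.get("Retry-After")
            delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
            sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter is a small design choice that lets unit tests record the waits instead of actually sleeping.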

Dev Setup

Prerequisites

  • Python 3.9+
  • Playwright / Selenium
  • Notion API Integration Token
bash — setup
$ git clone https://github.com/ZafranSY/automation-wasap.git && cd automation-wasap
$ python -m venv venv && source venv/bin/activate
# Windows: venv\Scripts\activate
$ pip install -r requirements.txt
$ cp config/settings.yaml.example config/settings.yaml
# Or edit the existing config/settings.yaml with database_id and api_key
$ python main.py
# Starts the automation orchestration engine
$ python -m unittest src/test_filters.py
# Runs filter matrix unit tests
$ python -m unittest src/test_browser.py
# Runs browser context validation tests

Challenges

  1. Anti-Automation Telemetry & Bot Mitigation Bypasses

    Modern web applications deploy fingerprinting algorithms that analyze browser properties (such as the navigator.webdriver flag, WebGL capabilities, and the timing of mouse movements) to detect and block headless Chromium instances, often immediately. Mitigation: Developed an independent stealth abstraction layer (src/stealth.py) that modifies runtime browser configuration and injects custom evaluation scripts before page code loads, masking automation variables and simulating randomized timing patterns.

  2. Processing Synchronization Failures & Network Blips

    Synchronizing data between real-time browser streams and a cloud-based API introduces a high risk of message duplication or data gaps whenever network calls drop, rate limits hit, or browser sessions reset. Mitigation: Built a defensive logging system backed by strict message filter logic (src/filters.py, data/excluded_log.txt). The platform maintains a localized exclusion record to track previously handled states, falling back gracefully to local logs in the event of API sync failures.
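The exclusion-record idea from the second challenge can be sketched like this. The on-disk format of excluded_log.txt is not described here, so the sketch assumes one message ID per line; ExclusionLog and sync_once are hypothetical names, and the real filters.py may track state differently.

```python
from pathlib import Path


class ExclusionLog:
    """Append-only record of handled message IDs (assumed: one ID per line)."""

    def __init__(self, path):
        self.path = Path(path)
        self._seen = set()
        if self.path.exists():
            lines = self.path.read_text(encoding="utf-8").splitlines()
            self._seen = {line.strip() for line in lines if line.strip()}

    def is_new(self, message_id):
        return message_id not in self._seen

    def mark(self, message_id):
        """Persist the ID so it survives a browser or process restart."""
        if message_id in self._seen:
            return
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(message_id + "\n")
        self._seen.add(message_id)


def sync_once(message_id, payload, log, sync):
    """Sync a payload at most once; return True if it was sent this call.

    The ID is marked only after `sync` returns, so a failed API call leaves
    the message unmarked and it is retried on the next pass.
    """
    if not log.is_new(message_id):
        return False
    sync(payload)
    log.mark(message_id)
    return True
```

Marking after a successful sync trades a possible duplicate (if the process dies between the two steps) for never dropping a message; marking before the sync would make the opposite trade.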

What I Learned

  • Bypassing modern bot detection requires overriding browser variables *before* any client-side script executes; late injection is useless.

  • Decoupling data extraction (scraping) from data persistence (syncing) makes the codebase resilient to API changes and rate limit adjustments.

  • Local state fallbacks are essential for headless automation platforms that lack built-in queueing mechanisms.

  • Rigorous unit testing of page/DOM extraction logic is vital since target web platforms change their UI structures frequently.
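As an illustration of the last point, extraction logic can be tested against a saved HTML fixture so that a layout change shows up as a failing test rather than as silent data loss. The sketch uses only the standard library; the msg-text class name, the fixture markup, and the MessageTextParser helper are invented for the example and do not reflect WhatsApp Web's real DOM or the project's test files.

```python
import unittest
from html.parser import HTMLParser


class MessageTextParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class.

    Simplified: assumes matching elements contain no void tags such as <br>.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.messages = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.messages.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_messages(html, css_class="msg-text"):
    parser = MessageTextParser(css_class)
    parser.feed(html)
    return parser.messages


FIXTURE = (
    '<div class="row"><span class="msg-text">Order <b>#12</b> shipped</span></div>'
    '<div class="row"><span class="msg-text">Thanks!</span></div>'
)


class ExtractionTests(unittest.TestCase):
    def test_extracts_message_text(self):
        self.assertEqual(extract_messages(FIXTURE), ["Order #12 shipped", "Thanks!"])

    def test_renamed_class_yields_nothing(self):
        # Simulates a layout change: the old selector no longer matches.
        changed = FIXTURE.replace("msg-text", "msg-body")
        self.assertEqual(extract_messages(changed), [])
```

Saved as a module, a test case like this runs with python -m unittest in the same way as the commands in Dev Setup.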