
Client

The client is a single-file Python script at client/blindproof.py. It uses PEP 723 inline metadata for its runtime dependencies, so a fresh machine with only uv installed can run it with no setup:

uv run client/blindproof.py init ~/.blindproof
uv run client/blindproof.py watch ~/Documents/MyNovel
uv run client/blindproof.py enrol
uv run client/blindproof.py sync
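
For readers new to PEP 723, the inline metadata is a specially formatted comment block at the top of the script, which uv reads before it runs the file. The sketch below shows the general shape of such a header and how a tool can locate it, using the regular expression from the PEP 723 reference implementation. The Python version bound and the two dependencies are assumptions for illustration (both libraries are mentioned on this page); this is not a copy of the real header in client/blindproof.py.

```python
import re
from typing import Optional

# Hypothetical script text: the version bound and dependency list are assumed.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "watchdog",
#     "python-docx",
# ]
# ///

print("hello")
'''

# Regular expression taken from the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` block, or None if there is none."""
    for match in BLOCK_RE.finditer(source):
        if match.group("type") == "script":
            # Strip the leading "# " (or bare "#") from every line of the block.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:]
                for line in match.group("content").splitlines(keepends=True)
            )
    return None
```

The returned string is plain TOML, so a runner can parse it to find the dependencies it must install before executing the script.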

Per-save data flow

sequenceDiagram
    participant Editor
    participant Watcher as SnapshotHandler
    participant Capture as capture_path()
    participant Crypto as AES-GCM + HMAC
    participant Disk as Local store

    Editor->>Watcher: file saved (.md, .txt, .docx)
    Watcher->>Watcher: suffix filter + per-path HMAC dedup
    Watcher->>Capture: read + extract plaintext
    Capture->>Crypto: AES-256-GCM encrypt<br/>HMAC-SHA256 commit
    Crypto->>Disk: ciphertext/<uuid>.bin
    Crypto->>Disk: Snapshot row (SQLite)

  1. watchdog fires on_modified / on_created / on_moved events. The last is important because many editors save via write-temp-then-rename.
  2. SnapshotHandler filters by suffix (.md, .txt, .docx) and deduplicates by per-path content equality — saving the same bytes twice only records one snapshot.
  3. capture_path(path, store, ciphertext_dir, keys, now) reads the file, extracts plaintext, and hands off to capture(). It also runs the same per-path content dedup itself — so a duplicate save records one snapshot even when capture_path is called directly, bypassing the watcher (as the Mac helper's sidecar does).
  4. capture() derives a nonce, encrypts the plaintext with AES-256-GCM, derives a per-leaf HMAC key (HKDF(mac_key, info=b"blindproof/leaf/v2/" || ciphertext_ref)), computes the commitment as HMAC-SHA256(per_leaf_key, plaintext), and writes the ciphertext to <store>/ciphertext/<uuid>.bin.
  5. A Snapshot row lands in SQLite, tagged commitment_scheme = "v2-per-leaf". Two saves of identical content produce different commitments because their ciphertext_refs differ — by design, this is what lets the bundle hand a publisher one reveal-key without exposing the rest of the timeline. A separate content_dedup_hmac (HMAC under mac_key, never uploaded) is stored alongside so an unchanged save can still be recognised. Both the watcher and capture_path itself compare it against the previous snapshot for that path and skip recording when it matches.
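
Steps 4 and 5 can be sketched with the standard library alone. This is a minimal illustration, not the client's code: the HKDF salt (empty), the 32-byte output length, the UTF-8 encoding of ciphertext_ref, and the helper names hkdf_sha256, leaf_key and commit are all assumptions; only the info prefix and the HMAC-SHA256 construction come from the text above. AES-256-GCM encryption is left out because it needs a third-party library.

```python
import hashlib
import hmac

LEAF_INFO_PREFIX = b"blindproof/leaf/v2/"  # info prefix quoted in step 4


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF with SHA-256 and an empty salt (salt choice is assumed)."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def leaf_key(mac_key: bytes, ciphertext_ref: str) -> bytes:
    """Per-leaf HMAC key bound to one ciphertext_ref."""
    return hkdf_sha256(mac_key, LEAF_INFO_PREFIX + ciphertext_ref.encode("utf-8"))


def commit(key: bytes, plaintext: bytes) -> bytes:
    """HMAC-SHA256 commitment over the plaintext."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()


mac_key = bytes(32)  # placeholder key, for illustration only
text = b"Chapter one.\n"
ref_a = "00000000-0000-4000-8000-000000000001"  # made-up ciphertext_refs
ref_b = "00000000-0000-4000-8000-000000000002"

# Same content, different refs: the per-leaf commitments differ ...
commitment_a = commit(leaf_key(mac_key, ref_a), text)
commitment_b = commit(leaf_key(mac_key, ref_b), text)

# ... while the local-only dedup HMAC under mac_key is identical.
dedup_a = commit(mac_key, text)
dedup_b = commit(mac_key, text)

# The reveal-key for one leaf is the hex of its per-leaf key.
reveal_a = leaf_key(mac_key, ref_a).hex()
```

Handing over reveal_a lets a verifier recompute commitment_a from the plaintext, but says nothing about the key behind commitment_b, which is the property step 5 describes.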

Plaintext lives only in RAM and only for the duration of the capture call. It is never written to disk outside the user's own editor.

Public API

All in client/blindproof.py:

Text and crypto

| Symbol | Purpose |
| --- | --- |
| extract(suffix, raw) | Text extraction, dispatched through the _EXTRACTORS registry. .md / .txt: UTF-8, BOM and CRLF handled. .docx (via python-docx): body paragraphs joined with \n, no trailing newline, blank paragraphs kept as blank lines, headings included as their text; tables, footnotes, comments and headers/footers excluded. Raises ValueError on an unsupported suffix. |
| extractor_version(suffix) | The frozen extractor version for a suffix (text-v1, docx-v1). Recorded on every snapshot so a future extractor change can never silently alter how a historical commitment was derived. New formats add a registry entry with their own version. |
| word_count(text), char_count(text) | Metadata. |
| derive_master_key(passphrase, salt) | argon2id: server-stored salt, passphrase never leaves client. |
| derive_subkeys(master_key) → Keys | HKDF-SHA256 → enc_key + mac_key. |
| derive_leaf_key(mac_key, ciphertext_ref) | Per-leaf HMAC key for the v2 commitment scheme. The matching reveal-key the publisher uses to recompute a leaf is the hex of this value. |
| hmac_commit(key, plaintext) | HMAC-SHA256 commitment. Production captures pass the per-leaf key; the legacy v1 scheme passes mac_key directly. |
| encrypt(enc_key, plaintext) → (nonce, ciphertext) | AES-256-GCM, 12-byte random nonce. |
| decrypt(enc_key, nonce, ciphertext) | Inverse; raises on auth-tag failure. |
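
The registry dispatch behind extract and extractor_version might look roughly like the sketch below. It covers only the plain-text formats, so it runs without python-docx. The registry layout (a tuple of version and function), the helper name _extract_text, and the exact BOM/CRLF handling (strip a leading BOM, normalise CRLF to LF) are assumptions; the table only says that both are "handled".

```python
from typing import Callable, Dict, Tuple


def _extract_text(raw: bytes) -> str:
    # "utf-8-sig" drops a leading BOM if present; CRLF becomes LF (assumed handling).
    return raw.decode("utf-8-sig").replace("\r\n", "\n")


# suffix -> (frozen extractor version, extractor function); layout is hypothetical.
_EXTRACTORS: Dict[str, Tuple[str, Callable[[bytes], str]]] = {
    ".md": ("text-v1", _extract_text),
    ".txt": ("text-v1", _extract_text),
    # A ".docx" entry tagged "docx-v1" would be registered here in the same way.
}


def _lookup(suffix: str) -> Tuple[str, Callable[[bytes], str]]:
    try:
        return _EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported suffix: {suffix!r}") from None


def extract(suffix: str, raw: bytes) -> str:
    """Extract plaintext from raw file bytes for a supported suffix."""
    _, extractor = _lookup(suffix)
    return extractor(raw)


def extractor_version(suffix: str) -> str:
    """Return the frozen version tag recorded alongside each snapshot."""
    version, _ = _lookup(suffix)
    return version


def word_count(text: str) -> int:
    return len(text.split())  # whitespace-separated tokens (assumed definition)


def char_count(text: str) -> int:
    return len(text)
```

Keeping the version tag next to the function in one registry entry makes it hard to change an extractor without also deciding whether its version must change.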

Capture and store

| Symbol | Purpose |
| --- | --- |
| Snapshot | Immutable record: captured_at, path, file_type, plaintext_hmac, nonce, ciphertext_ref, ciphertext_size, word_count, char_count, commitment_scheme, content_dedup_hmac, extractor_version, synced_at. The dedup HMAC is local-only and never crosses the network; extractor_version is uploaded as non-sensitive provenance. |
| capture(path, raw, keys, now) → CapturedData | Pure encryption + metadata path, no I/O. |
| capture_path(path, store, ciphertext_dir, keys, now) | Full snapshot from disk: read, capture, persist. Skips writing when the content matches the previous snapshot for that path (returning that existing snapshot), so direct callers like the Mac sidecar inherit the dedup without the watcher. |
| restore_snapshot(snapshot, ciphertext_dir, keys) | Decrypt back to plaintext bytes. |
| SnapshotStore | SQLite persistence: record / list / count / get / latest_for_path / list_unsynced / mark_synced. Thread-safe. ISO-8601 timestamps. Additive on-open migrations (no migration framework). |
| SnapshotHandler | watchdog handler with suffix and directory filters, per-path HMAC dedup, on_moved routed through capture. |
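
To make "additive on-open migrations" and the thread-safety note concrete, here is a much-reduced sketch of a store in the same spirit. The class name MiniSnapshotStore, the table layout, and the method latest_dedup_for_path are hypothetical; the real SnapshotStore has more columns and methods than are shown.

```python
import sqlite3
import threading
from datetime import datetime, timezone
from typing import List, Optional


class MiniSnapshotStore:
    """Hypothetical, reduced stand-in for SnapshotStore."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._lock = threading.Lock()
        # One shared connection; the lock serialises access across threads.
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            "id INTEGER PRIMARY KEY, captured_at TEXT NOT NULL, "
            "path TEXT NOT NULL, content_dedup_hmac TEXT NOT NULL)"
        )
        self._migrate()

    def _migrate(self) -> None:
        # Additive migration: add a column only if an older database lacks it.
        columns = {row[1] for row in self._db.execute("PRAGMA table_info(snapshots)")}
        if "synced_at" not in columns:
            self._db.execute("ALTER TABLE snapshots ADD COLUMN synced_at TEXT")
        self._db.commit()

    def record(self, path: str, dedup_hmac: str, now: datetime) -> int:
        with self._lock:
            cur = self._db.execute(
                "INSERT INTO snapshots (captured_at, path, content_dedup_hmac) "
                "VALUES (?, ?, ?)",
                (now.isoformat(), path, dedup_hmac),  # ISO-8601 timestamp
            )
            self._db.commit()
            return cur.lastrowid

    def latest_dedup_for_path(self, path: str) -> Optional[str]:
        with self._lock:
            row = self._db.execute(
                "SELECT content_dedup_hmac FROM snapshots "
                "WHERE path = ? ORDER BY id DESC LIMIT 1",
                (path,),
            ).fetchone()
        return row[0] if row else None

    def list_unsynced(self) -> List[int]:
        with self._lock:
            rows = self._db.execute(
                "SELECT id FROM snapshots WHERE synced_at IS NULL ORDER BY id"
            ).fetchall()
        return [row[0] for row in rows]

    def mark_synced(self, snapshot_id: int, now: datetime) -> None:
        with self._lock:
            self._db.execute(
                "UPDATE snapshots SET synced_at = ? WHERE id = ?",
                (now.isoformat(), snapshot_id),
            )
            self._db.commit()
```

A caller that wants the dedup behaviour described for capture_path would compare the new dedup HMAC with latest_dedup_for_path(path) and skip record when the two match.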

Backend sync

| Symbol | Purpose |
| --- | --- |
| BackendClient | urllib-based HTTP client with injectable transport for tests. Methods: enrol, login, upload_snapshot, request_proof_bundle (accepts an optional reveals map, computed by compute_reveals, that the backend embeds in bundle.json). |
| compute_reveals(store, keys) → dict[str, str] | Walk the local store and derive {ciphertext_ref: reveal_key_hex} for every v2-per-leaf snapshot. Posted only at bundle-generation time, never during sync. |
| sync_snapshots(store, ciphertext_dir, keys, client, now) | For each unsynced snapshot: encrypt the path with enc_key (fresh nonce per upload), read ciphertext from disk, POST to /api/snapshots, stamp synced_at. |
| load_backend_config, save_backend_config | backend.json under the store dir: token + argon2 salt + server URL. |
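
The "injectable transport" idea can be illustrated as follows: the client takes a callable that performs the HTTP request, so tests can substitute a fake and never touch the network. Everything beyond the /api/snapshots path is an assumption here, including the class name MiniBackendClient, the transport signature, the JSON payload, and the Authorization header scheme.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional, Tuple

# A transport takes (method, url, headers, body) and returns (status, body).
Transport = Callable[[str, str, Dict[str, str], bytes], Tuple[int, bytes]]


def urllib_transport(
    method: str, url: str, headers: Dict[str, str], body: bytes
) -> Tuple[int, bytes]:
    """Default transport built on urllib.request (performs a real request)."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()


class MiniBackendClient:
    """Hypothetical, reduced stand-in for BackendClient."""

    def __init__(
        self, base_url: str, token: str, transport: Optional[Transport] = None
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._token = token
        self._transport = transport or urllib_transport

    def upload_snapshot(self, payload: dict) -> None:
        status, _ = self._transport(
            "POST",
            self._base_url + "/api/snapshots",
            {
                "Authorization": "Token " + self._token,  # assumed header scheme
                "Content-Type": "application/json",
            },
            json.dumps(payload).encode("utf-8"),
        )
        if status >= 300:
            raise RuntimeError(f"upload failed with HTTP {status}")
```

In a test, passing a function that appends its arguments to a list and returns a canned status is enough to assert exactly what would have been sent.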

Passphrase caching

| Symbol | Purpose |
| --- | --- |
| PassphraseCache | Protocol. |
| InMemoryPassphraseCache | Tests. |
| KeyringPassphraseCache | Real use; delegates to the OS keyring. |
| BLINDPROOF_PASSPHRASE_INSECURE env var | Smoke-test backdoor; never use in production. |
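
A structural Protocol plus an in-memory implementation could be sketched like this. The method names (get, set, clear), the store_id argument, the helper resolve_passphrase, and the rule that the environment variable takes precedence over the cache are all assumptions for illustration; the table does not spell them out.

```python
import os
from typing import Dict, Mapping, Optional, Protocol


class PassphraseCache(Protocol):
    """Anything with these three methods satisfies the protocol."""

    def get(self, store_id: str) -> Optional[str]: ...
    def set(self, store_id: str, passphrase: str) -> None: ...
    def clear(self, store_id: str) -> None: ...


class InMemoryPassphraseCache:
    """Dictionary-backed cache, suitable for tests only."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def get(self, store_id: str) -> Optional[str]:
        return self._items.get(store_id)

    def set(self, store_id: str, passphrase: str) -> None:
        self._items[store_id] = passphrase

    def clear(self, store_id: str) -> None:
        self._items.pop(store_id, None)


def resolve_passphrase(
    cache: PassphraseCache,
    store_id: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Check the smoke-test override first (assumed order), then the cache."""
    override = env.get("BLINDPROOF_PASSPHRASE_INSECURE")
    if override:
        return override
    return cache.get(store_id)
```

A keyring-backed implementation would expose the same three methods and delegate each one to the OS keyring, so callers never need to know which cache they were given.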

CLI

parse_args() + main() dispatch to these subcommands:

  • init <store_dir> — create a store and derive the master key.
  • watch <path> — start the file watcher.
  • restore <snapshot_id> <out_path> — decrypt a captured snapshot.
  • enrol — register with the backend; writes backend.json.
  • sync — push unsynced snapshots.

Store location is configurable via BLINDPROOF_STORE_DIR; default ~/.blindproof.
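
As a rough illustration of the dispatch surface, the subcommands above map naturally onto argparse sub-parsers. This is a sketch under assumptions, not the script's actual parse_args(): the attribute names, the use of Path for arguments, treating snapshot_id as a string, and the helper default_store_dir are all hypothetical.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def default_store_dir(env: Mapping[str, str] = os.environ) -> Path:
    """BLINDPROOF_STORE_DIR if set, otherwise ~/.blindproof."""
    return Path(env.get("BLINDPROOF_STORE_DIR", "~/.blindproof")).expanduser()


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="blindproof.py")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="create a store and derive the master key")
    init.add_argument("store_dir", type=Path)

    watch = sub.add_parser("watch", help="start the file watcher")
    watch.add_argument("path", type=Path)

    restore = sub.add_parser("restore", help="decrypt a captured snapshot")
    restore.add_argument("snapshot_id")  # kept as a string; real type not stated
    restore.add_argument("out_path", type=Path)

    sub.add_parser("enrol", help="register with the backend")
    sub.add_parser("sync", help="push unsynced snapshots")

    return parser.parse_args(argv)
```

A main() built on this would branch on args.command and call default_store_dir() for the subcommands that take no explicit store argument.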

What's deliberately not in the client

  • Auto-sync. watch captures; sync uploads. Decoupling them keeps capture resilient when the network is down.
  • Embedded dashboard. The GUI opens https://blindproof.co.uk/dashboard in the system browser instead of bundling a webview.
  • Faithful .docx restore (model B). .docx capture uses approach A: we extract the body text and then commit, encrypt, and store that text — the original .docx bytes are not kept. So restoring a Word snapshot returns text, not a re-openable .docx. Storing the original bytes too (a second encrypted blob — "model B") is on the roadmap; it is purely additive and leaves the commitment untouched, but because approach A discards the original bytes at capture time it would only help captures made after it ships, never historical ones.
  • .docx manuscript-match in verify.py. The verifier re-extracts a supplied manuscript to recompute the HMAC, but currently supports only .md / .txt manuscripts. Matching against a .docx manuscript would mean teaching verify.py the same docx-v1 extraction (and adding python-docx to its deliberately minimal dependency list), so it is deferred.
  • More formats. Scrivener (RTF) and Google Docs are next (see blindproof_spec.md §9). Legacy .doc and Apple Pages are possible later but extract far less deterministically than .docx; model B matters more for them, since keeping the original bytes lets a future, better converter re-extract for restore without disturbing the already-anchored commitment. Each format drops into the _EXTRACTORS registry with its own extractor_version.

These .docx follow-ups (model B, verifier matching, more formats) are tracked in issue #21. Mac-helper .docx capture has since shipped (the helper's captureSuffixes includes .docx and the sidecar interpreter carries python-docx).

  • Key rotation. V1. Today, rotating a passphrase means creating a new store.

See also