# PhysioJEPA

Self-supervised ECG-PPG representation learning via a Joint Embedding Predictive Architecture.
## Key finding: mask ratio is the hidden lever

We discovered that unimodal ECG-JEPA (Weimann & Conrad, 2024) has a predictor-shortcut vulnerability at the standard 50% mask ratio: the predictor learns local-interpolation shortcuts that degrade downstream performance as training progresses.

Raising the mask ratio from 50% to 75% eliminates the shortcut and recovers full downstream performance, matching cross-modal JEPA:
| Model | Mask ratio | AF AUROC (epoch 25) |
|---|---|---|
| Unimodal ECG-JEPA | 0.50 | 0.703 |
| Unimodal ECG-JEPA | 0.75 | 0.848 |
| Cross-modal ECG-PPG JEPA | -- | 0.847 |
| PhysioJEPA (cross-modal + Δt) | -- | 0.835 |
The mechanism: at 50% masking with contiguous blocks, the predictor has 25 visible context patches and 25 target patches. It discovers a short-range interpolation shortcut early in training (L_self dips at step ~1500). As the encoder refines and patches become harder to interpolate linearly, the shortcut fails (L_self spikes at step ~4675). The encoder then locks into a self-consistent but downstream-uninformative optimum.
At 75% masking (12 visible, 37 target), no interpolation path exists. The predictor learns long-range structure from the start.
Cross-modal prediction works by the same mechanism: 0% of PPG is visible as context, so no interpolation shortcut can form.
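For concreteness, the contiguous-block masking regime can be sketched as follows. This is a hypothetical illustration of I-JEPA-style 1-D multi-block sampling, not the actual `masking.py` implementation (function name, block count, and truncation policy are assumptions):

```python
import random

def sample_block_mask(num_patches, mask_ratio, num_blocks=4, seed=None):
    """Sample contiguous target blocks covering ~mask_ratio of the sequence.

    Returns (context_idx, target_idx). Sketch only: the repo's masking.py
    may differ in block sizing and overlap handling.
    """
    rng = random.Random(seed)
    num_target = round(num_patches * mask_ratio)
    block_len = max(1, num_target // num_blocks)
    target = set()
    while len(target) < num_target:
        # Sample a contiguous block; blocks may overlap, so keep sampling
        # until enough distinct patches are masked.
        start = rng.randrange(0, num_patches - block_len + 1)
        target.update(range(start, start + block_len))
    target = sorted(target)[:num_target]
    context = [i for i in range(num_patches) if i not in set(target)]
    return context, target
```

At `mask_ratio=0.75` on a 50-patch sequence this leaves only ~12 context patches, so no short-range interpolation path survives; at 0.50 the 25 visible patches interleave closely with the 25 targets.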
## Confirmed by 5 ablation arms

- Slow tau (ema_end=0.999, warmup=60%): spike persists -> tau is NOT the cause
- Smaller predictor (depth 4->1): spike persists -> capacity is NOT the cause
- Sinusoidal queries (no learned embeddings): spike WORSENS
- Mask ratio 0.75: spike ELIMINATED, AUROC recovers to 0.848
- Full data (10x): spike delayed but present -> architectural, not data-scale
## Architecture

- ECG encoder: ViT-S (12 layers, d=256, 8 heads) on single-lead II @ 250 Hz
- PPG encoder: ViT-T (6 layers, d=256) on Pleth @ 125 Hz
- Predictor: 4-layer cross-attention transformer
- EMA target encoder (tau 0.996 -> 0.9999 cosine over 30% of training)
- Loss: L1 latent prediction (cross-modal) + 0.3 * L1 ECG self-prediction
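The EMA momentum ramp above (tau 0.996 -> 0.9999, cosine over 30% of training) can be sketched in a few lines. The function name and exact ramp shape are illustrative, not lifted from `ema.py`:

```python
import math

def ema_tau(step, total_steps, tau_start=0.996, tau_end=0.9999, warmup_frac=0.3):
    """Cosine ramp of the EMA momentum from tau_start to tau_end over the
    first warmup_frac of training, then held at tau_end.

    Target-encoder weights would then be updated as
    p_target = tau * p_target + (1 - tau) * p_online.
    """
    warmup = max(1, int(total_steps * warmup_frac))
    if step >= warmup:
        return tau_end
    progress = step / warmup
    # cos term goes 1 -> -1 as progress goes 0 -> 1, mapping tau_start -> tau_end
    return tau_end - (tau_end - tau_start) * 0.5 * (1 + math.cos(math.pi * progress))
```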
## Dataset

- **Training:** lucky9-cyou/mimic-iv-aligned-ppg-ecg (MIMIC-IV ICU waveforms, ~814 hours, ~381 patients, sample-accurate ECG-PPG alignment)
- **Evaluation:** PTB-XL (PhysioNet, 21.8k 12-lead ECGs, lead II resampled to 250 Hz)
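As a sketch of the evaluation preprocessing, resampling PTB-XL lead II to 250 Hz might look like this. It assumes the 500 Hz PTB-XL release, lead II at column index 1, and `scipy`; the repo's actual pipeline may differ:

```python
import numpy as np
from scipy.signal import resample_poly

def lead_ii_at_250hz(record, fs=500, lead_idx=1):
    """Extract lead II from a (samples x leads) ECG array and resample to 250 Hz.

    Hypothetical preprocessing sketch; uses polyphase anti-aliased
    rational resampling (here effectively 2x decimation from 500 Hz).
    """
    lead = np.asarray(record)[:, lead_idx].astype(np.float64)
    return resample_poly(lead, up=250, down=fs)
```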
## Usage

```bash
# Install
git clone https://huggingface.co/guychuk/PhysioJEPA
cd PhysioJEPA
uv sync

# Smoke test (CPU, random data)
PYTHONPATH=src uv run python scripts/smoke_test.py

# Train (requires GPU + MIMIC data)
PYTHONPATH=src uv run python scripts/train.py --config configs/base.yaml --model A --mask_ratio 0.75
```
## Repository structure

```
src/physiojepa/
  models.py      # 4 model variants (A=unimodal, B=cross-modal, C=InfoNCE, F=PhysioJEPA)
  vit.py         # ViT-1D encoder + cross-attention predictor
  data.py        # MIMIC dataset with sliding windows
  data_fast.py   # mmap-backed fast dataset for full-scale runs
  trainer.py     # shared training loop with WandB + collapse monitoring
  ema.py         # EMA with cosine tau schedule
  masking.py     # I-JEPA multi-block 1D masking
  probe.py       # linear probe evaluators
configs/
  base.yaml      # shared hyperparameters
docs/
  RESEARCH_LOG.md          # complete research narrative
  e2_e3_results.md         # K-gate results + ablation findings
  EXPERIMENT_TRACKING.md   # experiment matrix + post-hoc results
  RESEARCH_DEVELOPMENT.md  # full research development document
```
## Citation

```bibtex
@misc{physiojepa2026,
  title={PhysioJEPA: Mask Ratio as the Hidden Lever in Cardiac JEPA},
  author={Oz Labs},
  year={2026},
  url={https://huggingface.co/guychuk/PhysioJEPA}
}
```
## License
MIT