# PhysioJEPA

Self-supervised ECG-PPG representation learning via a Joint Embedding Predictive Architecture.
## Key finding: mask ratio is the hidden lever

We discovered that unimodal ECG-JEPA (Weimann & Conrad, 2024) has a predictor-shortcut vulnerability at the standard 50% mask ratio: the predictor learns local-interpolation shortcuts that degrade downstream performance as training progresses.

Raising the mask ratio from 50% to 75% eliminates the shortcut and recovers full downstream performance, matching cross-modal JEPA:
| Model | Mask ratio | AF AUROC (epoch 25) |
|---|---|---|
| Unimodal ECG-JEPA | 0.50 | 0.703 |
| Unimodal ECG-JEPA | 0.75 | 0.848 |
| Cross-modal ECG-PPG JEPA | -- | 0.847 |
| PhysioJEPA (cross-modal + Δt) | -- | 0.835 |
The mechanism: at 50% masking with contiguous blocks, the predictor has 25 visible context patches and 25 target patches. It discovers a short-range interpolation shortcut early in training (L_self dips at step ~1500). As the encoder refines and patches become harder to interpolate linearly, the shortcut fails (L_self spikes at step ~4675). The encoder then locks into a self-consistent but downstream-uninformative optimum.
At 75% masking (12 visible, 37 target), no interpolation path exists. The predictor learns long-range structure from the start.
Cross-modal prediction works by the same mechanism: 0% of PPG is visible as context, so no interpolation shortcut can form.
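For concreteness, the contiguous-block masking regime can be sketched as follows. This is a hypothetical illustration of I-JEPA-style 1-D multi-block sampling, not the actual `masking.py` implementation (function name, block count, and truncation policy are assumptions):

```python
import random

def sample_block_mask(num_patches, mask_ratio, num_blocks=4, seed=None):
    """Sample contiguous target blocks covering ~mask_ratio of the sequence.

    Returns (context_idx, target_idx). Sketch only: the repo's masking.py
    may differ in block sizing and overlap handling.
    """
    rng = random.Random(seed)
    num_target = round(num_patches * mask_ratio)
    block_len = max(1, num_target // num_blocks)
    target = set()
    while len(target) < num_target:
        # Sample a contiguous block; blocks may overlap, so keep sampling
        # until enough distinct patches are masked.
        start = rng.randrange(0, num_patches - block_len + 1)
        target.update(range(start, start + block_len))
    target = sorted(target)[:num_target]
    context = [i for i in range(num_patches) if i not in set(target)]
    return context, target
```

At `mask_ratio=0.75` on a 50-patch sequence this leaves only ~12 context patches, so no short-range interpolation path survives; at 0.50 the 25 visible patches interleave closely with the 25 targets.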
## Confirmed by 5 ablation arms

- Slow tau (ema_end=0.999, warmup=60%): spike persists -> tau is NOT the cause
- Smaller predictor (depth 4->1): spike persists -> capacity is NOT the cause
- Sinusoidal queries (no learned embeddings): spike WORSENS
- Mask ratio 0.75: spike ELIMINATED, AUROC recovers to 0.848
- Full data (10x): spike delayed but present -> architectural, not data-scale
## Architecture

- ECG encoder: ViT-S (12 layers, d=256, 8 heads) on single-lead II @ 250 Hz
- PPG encoder: ViT-T (6 layers, d=256) on Pleth @ 125 Hz
- Predictor: 4-layer cross-attention transformer
- EMA target encoder (tau 0.996 -> 0.9999 cosine over 30% of training)
- Loss: L1 latent prediction (cross-modal) + 0.3 * L1 ECG self-prediction
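The EMA momentum ramp above (tau 0.996 -> 0.9999, cosine over 30% of training) can be sketched in a few lines. The function name and exact ramp shape are illustrative, not lifted from `ema.py`:

```python
import math

def ema_tau(step, total_steps, tau_start=0.996, tau_end=0.9999, warmup_frac=0.3):
    """Cosine ramp of the EMA momentum from tau_start to tau_end over the
    first warmup_frac of training, then held at tau_end.

    Target-encoder weights would then be updated as
    p_target = tau * p_target + (1 - tau) * p_online.
    """
    warmup = max(1, int(total_steps * warmup_frac))
    if step >= warmup:
        return tau_end
    progress = step / warmup
    # cos term goes 1 -> -1 as progress goes 0 -> 1, mapping tau_start -> tau_end
    return tau_end - (tau_end - tau_start) * 0.5 * (1 + math.cos(math.pi * progress))
```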
## Dataset

- **Training:** lucky9-cyou/mimic-iv-aligned-ppg-ecg (MIMIC-IV ICU waveforms, ~814 hours, ~381 patients, sample-accurate ECG-PPG alignment)
- **Evaluation:** PTB-XL (PhysioNet, 21.8k 12-lead ECGs, lead II resampled to 250 Hz)
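As a sketch of the evaluation preprocessing, resampling PTB-XL lead II to 250 Hz might look like this. It assumes the 500 Hz PTB-XL release, lead II at column index 1, and `scipy`; the repo's actual pipeline may differ:

```python
import numpy as np
from scipy.signal import resample_poly

def lead_ii_at_250hz(record, fs=500, lead_idx=1):
    """Extract lead II from a (samples x leads) ECG array and resample to 250 Hz.

    Hypothetical preprocessing sketch; uses polyphase anti-aliased
    rational resampling (here effectively 2x decimation from 500 Hz).
    """
    lead = np.asarray(record)[:, lead_idx].astype(np.float64)
    return resample_poly(lead, up=250, down=fs)
```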
## Usage

```bash
# Install
git clone https://huggingface.co/guychuk/PhysioJEPA
cd PhysioJEPA
uv sync

# Smoke test (CPU, random data)
PYTHONPATH=src uv run python scripts/smoke_test.py

# Train (requires GPU + MIMIC data)
PYTHONPATH=src uv run python scripts/train.py --config configs/base.yaml --model A --mask_ratio 0.75
```
## Repository structure

```
src/physiojepa/
  models.py      # 4 model variants (A=unimodal, B=cross-modal, C=InfoNCE, F=PhysioJEPA)
  vit.py         # ViT-1D encoder + cross-attention predictor
  data.py        # MIMIC dataset with sliding windows
  data_fast.py   # mmap-backed fast dataset for full-scale runs
  trainer.py     # shared training loop with WandB + collapse monitoring
  ema.py         # EMA with cosine tau schedule
  masking.py     # I-JEPA multi-block 1D masking
  probe.py       # linear probe evaluators
configs/
  base.yaml      # shared hyperparameters
docs/
  RESEARCH_LOG.md          # complete research narrative
  e2_e3_results.md         # K-gate results + ablation findings
  EXPERIMENT_TRACKING.md   # experiment matrix + post-hoc results
  RESEARCH_DEVELOPMENT.md  # full research development document
```
## Citation

```bibtex
@misc{physiojepa2026,
  title={PhysioJEPA: Mask Ratio as the Hidden Lever in Cardiac JEPA},
  author={Oz Labs},
  year={2026},
  url={https://huggingface.co/guychuk/PhysioJEPA}
}
```
## License
MIT