just published a short article about something that bit me hard while porting PI05's subtask prediction to PyTorch: left vs right alignment in transformer padding.
turns out JAX (what Physical Intelligence used) and Hugging Face use opposite padding conventions, and if you don't catch it, your model silently produces nonsense instead of crashing. no NaN, no error, just garbled subtasks.
i walk through the full tensor pipeline (images → embeddings → pad masks → attention masks → position IDs) and show exactly where the mismatch corrupts everything. also included the implementation file with the fix.
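to make the failure mode concrete, here is a minimal sketch in plain Python. it is not the code from the article: the helpers `pad`, `position_ids`, and `last_real_index` are hypothetical names made up for illustration, and the token values are arbitrary. it shows how the same three tokens end up with different position IDs and a different "last token" index depending on the padding side, unless both are derived from the pad mask.

```python
from itertools import accumulate


def pad(tokens, length, side, pad_id=0):
    """Pad a token list to `length` on the given side; return (ids, mask)."""
    n_pad = length - len(tokens)
    if n_pad < 0:
        raise ValueError("sequence longer than target length")
    if side == "right":
        return tokens + [pad_id] * n_pad, [1] * len(tokens) + [0] * n_pad
    if side == "left":
        return [pad_id] * n_pad + tokens, [0] * n_pad + [1] * len(tokens)
    raise ValueError(f"unknown side: {side}")


def position_ids(mask):
    """Count positions over real tokens only (cumsum(mask) - 1); pad slots get 0."""
    return [(c - 1) * m for c, m in zip(accumulate(mask), mask)]


def last_real_index(mask):
    """Index of the last non-pad token, wherever the padding sits."""
    return len(mask) - 1 - mask[::-1].index(1)


tokens = [11, 12, 13]  # arbitrary illustrative token ids
right_ids, right_mask = pad(tokens, 5, "right")
left_ids, left_mask = pad(tokens, 5, "left")

# Mask-derived positions: real tokens get 0, 1, 2 under both layouts.
right_pos = position_ids(right_mask)
left_pos = position_ids(left_mask)

# Naive positions ignore the mask: under left padding the real tokens
# would silently be assigned 2, 3, 4 instead of 0, 1, 2.
naive_pos = list(range(5))

# Reading the output at index -1 only works for left padding.
right_last = last_real_index(right_mask)
left_last = last_real_index(left_mask)
```

nothing in this sketch raises an error if you mix the conventions up, which is the point: shapes match in both layouts, so the mistake only shows up in the outputs.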
if you've ever ported a model between frameworks or messed with custom attention patterns, i think you will enjoy it