README.md CHANGED
@@ -1,110 +1,3 @@
- ---
- tags:
- - computer_vision
- - animal_pose_and_shape_estimation
- - DeepLabCut
- pipeline_tag: image-to-3d
- ---
- # MODEL CARD:
-
- ## Model Details
-
- • PRIMA model(s) developed by the [M.W. Mathis Lab](http://www.mackenziemathislab.org/) in 2026, trained to predict quadruped shape and pose from images.
- Please see **paper link** for details.
-
- • There are two main models:
- - `s1ckpt.ckpt` is the stage-1 model, trained on the Animal3D, CtrlAni3D, and Quadruped2D datasets.
- - `s3ckpt.ckpt` is the stage-3 model, trained on the Animal3D, CtrlAni3D, and Quadruped3D datasets.
-
- ```python
- from pathlib import Path
- from huggingface_hub import hf_hub_download
-
- repo_id = "MLAdaptiveIntelligence/PRIMA"
-
- model_dir = Path("./prima_model")
- model_dir.mkdir(parents=True, exist_ok=True)
-
- # download stage-1 checkpoint
- s1_path = hf_hub_download(
-     repo_id=repo_id,
-     filename="s1ckpt.ckpt",
-     local_dir=model_dir,
- )
-
- # download stage-3 checkpoint
- s3_path = hf_hub_download(
-     repo_id=repo_id,
-     filename="s3ckpt.ckpt",
-     local_dir=model_dir,
- )
- ```
42
- ## Intended Use
- • Intended for shape and pose estimation of quadruped images taken from a single view.
-
- • Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience and ecology.
-
- • Not suitable as a zero-shot model for applications that require high shape and pose precision, but it can be further optimized with 2D keypoint annotations, either manual or from SuperAnimal, to improve accuracy. It is also not suitable for videos that look dramatically different from those we show in the paper.
-
- ## Metrics
- • PA-MPJPE (Procrustes-aligned mean per-joint position error), computed over 3D joints.
-
- • PA-MPVPE (Procrustes-aligned mean per-vertex position error), computed over the SMAL mesh vertices.
-
- • PCK (Percentage of Correct Keypoints) measures the proportion of predicted keypoints within a specified threshold of the ground-truth keypoints.
-
- • AUC (Area Under the Curve), computed by integrating the PCK values as the threshold varies from 0 to 1.
-
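The PCK and AUC definitions above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the paper's evaluation code; keypoints are assumed to be normalized so that thresholds range over (0, 1], and AUC is approximated as the mean PCK over evenly spaced thresholds:

```python
import numpy as np

def pck(pred, gt, threshold):
    """Fraction of predicted keypoints within `threshold` of ground truth.

    pred, gt: (N, K, 2) arrays of 2D keypoints, assumed normalized to [0, 1].
    """
    dists = np.linalg.norm(pred - gt, axis=-1)  # (N, K) per-keypoint errors
    return float((dists < threshold).mean())

def auc(pred, gt, num_thresholds=100):
    """Approximate the integral of PCK over thresholds in (0, 1]."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds + 1)[1:]
    pcks = [pck(pred, gt, t) for t in thresholds]
    return float(np.mean(pcks))  # Riemann-sum approximation; range is 1

# toy example: two images, three keypoints each, every prediction
# offset by (0.05, 0.05), i.e. a Euclidean error of ~0.071
gt = np.zeros((2, 3, 2))
pred = gt + 0.05
print(pck(pred, gt, 0.1))  # 1.0 -- all keypoints within 0.1
```
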
- ## Evaluation Data
- • In the paper we benchmark on Animal3D, CtrlAni3D, Quadruped2D, and AnimalKingdom.
-
- ## Training Data
- The models were trained jointly on the following datasets:
- - **Animal3D**: see full details at (1).
- - **CtrlAni3D**: see full details at (2).
- - **Quadruped2D**: see full details at (3).
- - **Quadruped3D**: see full details at **paper link**.
-
- ## Ethical Considerations
- • No experimental data were collected for this model; all datasets used are cited.
-
- ## License
- Modified MIT.
-
- Copyright 2026 by Mackenzie Mathis, Xiaohang Yu, and contributors.
-
- Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
- and non-transferable license for academic, non-commercial purposes only (hereafter "LICENSE")
- to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all copies or substantial
- portions of the Software.
-
- This software may not be used to harm any animal deliberately.
-
- LICENSEE acknowledges that the MODEL is a research tool.
- THE MODEL IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
- BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
- WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL
- OR THE USE OR OTHER DEALINGS IN THE MODEL.
-
- If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis
- (mackenzie@post.harvard.edu) and/or the TTO office at EPFL (tto@epfl.ch) for a commercial use license.
-
- Please cite **paper link** if you use this model in your work.
-
- ## References
- 1. Xu, J., Zhang, Y., Peng, J., Ma, W., Jesslen, A., Ji, P., Hu, Q., Zhang, J., Liu, Q., Wang, J., et al.: Animal3D: a comprehensive dataset of 3D animal pose and shape. In: ICCV, pp. 9099–9109 (2023)
- 2. Lyu, J., Zhu, T., Gu, Y., Lin, L., Cheng, P., Liu, Y., Tang, X., An, L.: AniMer: animal pose and shape estimation using a family-aware transformer. In: CVPR, pp. 17486–17496 (2025)
- 3. Ye, S., Filippova, A., Lauer, J., Schneider, S., Vidal, M., Qiu, T., Mathis, A., Mathis, M.W.: SuperAnimal pretrained pose estimation models for behavioral analysis. Nature Communications 15(1), 5165 (2024)
 
+ ---
+ license: apache-2.0
+ ---
amr_vitbb.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:dcf670ec02263ef338a8f01ef3330c2ca120f25b8f0b5d59128e1886652d8eb7
- size 2523794490
config_s1_HYDRA.yaml DELETED
@@ -1,147 +0,0 @@
- task_name: train
- tags:
-   - dev
- train: true
- test: false
- ckpt_path: true
- seed: null
- trainer:
-   _target_: pytorch_lightning.Trainer
-   default_root_dir: ${paths.output_dir}
-   accelerator: gpu
-   devices: 1
-   deterministic: false
-   num_sanity_val_steps: 0
-   log_every_n_steps: ${GENERAL.LOG_STEPS}
-   val_check_interval: ${GENERAL.VAL_STEPS}
-   check_val_every_n_epoch: ${GENERAL.VAL_EPOCHS}
-   precision: 16-mixed
-   max_steps: ${GENERAL.TOTAL_STEPS}
-   limit_val_batches: 80
- paths:
-   root_dir: ${oc.env:PROJECT_ROOT}
-   data_dir: ${paths.root_dir}/data/
-   log_dir: logs/
-   output_dir: ${hydra:runtime.output_dir}
-   work_dir: ${hydra:runtime.cwd}
- extras:
-   ignore_warnings: false
-   enforce_tags: true
-   print_config: true
- exp_name: newbioposeStage02
- SMAL:
-   DATA_DIR: data/smal
-   MODEL_PATH: data/smal/my_smpl_00781_4_all.pkl
-   SHAPE_PRIOR_PATH: data/smal/my_smpl_data_00781_4_all.pkl
-   POSE_PRIOR_PATH: data/smal/walking_toy_symmetric_pose_prior_with_cov_35parts.pkl
-   NUM_JOINTS: 34
- EXTRA:
-   FOCAL_LENGTH: 1000
-   NUM_LOG_IMAGES: 4
-   NUM_LOG_SAMPLES_PER_IMAGE: 4
-   PELVIS_IND: 0
- DATASETS:
-   CONFIG:
-     SCALE_FACTOR: 0.3
-     ROT_FACTOR: 30
-     TRANS_FACTOR: 0.02
-     COLOR_SCALE: 0.2
-     ROT_AUG_RATE: 0.6
-     TRANS_AUG_RATE: 0.5
-     DO_FLIP: false
-     FLIP_AUG_RATE: 0.0
-     EXTREME_CROP_AUG_RATE: 0.0
-     EXTREME_CROP_AUG_LEVEL: 1
-   ANIMAL3D:
-     ROOT_IMAGE: ./datasets/animal3d/
-     JSON_FILE:
-       TRAIN: ./datasets/animal3d/train.json
-       TEST: ./datasets/animal3d/test.json
-     WEIGHT: 1.0
-   CONTROL_ANIMAL3D:
-     ROOT_IMAGE: ./datasets/control_animal3dlatest/
-     JSON_FILE:
-       TRAIN: ./datasets/control_animal3dlatest/train.json
-       TEST: ./datasets/control_animal3dlatest/test.json
-     WEIGHT: 0.5
-   QUADRUPED2D:
-     ROOT_IMAGE: ./datasets/quadruped2d/
-     JSON_FILE:
-       TRAIN: ./datasets/quadruped2d/train.json
-       TEST: ./datasets/quadruped2d/test.json
-     WEIGHT: 0.15
- GENERAL:
-   TOTAL_STEPS: 450000
-   LOG_STEPS: 533
-   VAL_STEPS: 533
-   VAL_EPOCHS: 1
-   CHECKPOINT_EPOCHS: 1
-   CHECKPOINT_SAVE_TOP_K: 2
-   NUM_WORKERS: 2
-   PREFETCH_FACTOR: 2
- LOSS_WEIGHTS:
-   KEYPOINTS_3D: 0.05
-   KEYPOINTS_2D: 0.01
-   INTERMEDIATE_KP2D: 0.001
-   INTERMEDIATE_KP3D: 0.001
-   GLOBAL_ORIENT: 0.005
-   POSE: 0.001
-   BETAS: 0.0005
-   TRANSL: 0.0005
-   ADVERSARIAL: 0.0
-   SUPCON: 0.0005
- TRAIN:
-   LR: 3.75e-06
-   WEIGHT_DECAY: 0.0001
-   BATCH_SIZE: 48
-   LOSS_REDUCTION: mean
-   NUM_TRAIN_SAMPLES: 2
-   NUM_TEST_SAMPLES: 64
-   POSE_2D_NOISE_RATIO: 0.01
-   SMPL_PARAM_NOISE_RATIO: 0.005
- MODEL:
-   IMAGE_SIZE: 256
-   IMAGE_MEAN:
-     - 0.485
-     - 0.456
-     - 0.406
-   IMAGE_STD:
-     - 0.229
-     - 0.224
-     - 0.225
-   BACKBONE:
-     TYPE: vith
-     PRETRAINED_WEIGHTS: ./data/amr_vitbb.pth
-     FREEZE: false
-   USE_BIOCLIP_EMBEDDING: true
-   BIOCLIP_EMBEDDING:
-     EMBED_DIM: 1280
-     TYPE: bioclip1
-   USE_KEYPOINT_EMBEDDING: false
-   KEYPOINT_EMBEDDING:
-     NUM_KEYPOINTS: 26
-     KEYPOINT_DIM: 2
-     EMBED_DIM: 1280
-     HIDDEN_DIM: 512
-     TYPE: token
-   SMAL_HEAD:
-     TYPE: new_bio_pose_transformer_decoder
-     IN_CHANNELS: 1280
-     IEF_ITERS: 1
-     DECODER_DIM: 1280
-     NUM_DECODER_LAYERS: 6
-     NUM_HEADS: 8
-     MLP_RATIO: 4.0
-     USE_KEYPOINT_2D_TOKENS: true
-     USE_KEYPOINT_3D_TOKENS: true
-     KEYPOINT_TOKEN_UPDATE: true
-     KP2D_INJECT_IMAGE_FEAT: true
-     TRANSFORMER_DECODER:
-       depth: 6
-       heads: 8
-       mlp_dim: 1024
-       dim_head: 64
-       dropout: 0.0
-       emb_dropout: 0.0
-       norm: layer
-       context_dim: 1280
config_s3_HYDRA.yaml DELETED
@@ -1,147 +0,0 @@
- task_name: train
- tags:
-   - dev
- train: true
- test: false
- ckpt_path: true
- seed: null
- trainer:
-   _target_: pytorch_lightning.Trainer
-   default_root_dir: ${paths.output_dir}
-   accelerator: gpu
-   devices: 1
-   deterministic: false
-   num_sanity_val_steps: 0
-   log_every_n_steps: ${GENERAL.LOG_STEPS}
-   val_check_interval: ${GENERAL.VAL_STEPS}
-   check_val_every_n_epoch: ${GENERAL.VAL_EPOCHS}
-   precision: 16-mixed
-   max_steps: ${GENERAL.TOTAL_STEPS}
-   limit_val_batches: 80
- paths:
-   root_dir: ${oc.env:PROJECT_ROOT}
-   data_dir: ${paths.root_dir}/data/
-   log_dir: logs/
-   output_dir: ${hydra:runtime.output_dir}
-   work_dir: ${hydra:runtime.cwd}
- extras:
-   ignore_warnings: false
-   enforce_tags: true
-   print_config: true
- exp_name: bestquad3dStage02
- SMAL:
-   DATA_DIR: data/smal
-   MODEL_PATH: data/smal/my_smpl_00781_4_all.pkl
-   SHAPE_PRIOR_PATH: data/smal/my_smpl_data_00781_4_all.pkl
-   POSE_PRIOR_PATH: data/smal/walking_toy_symmetric_pose_prior_with_cov_35parts.pkl
-   NUM_JOINTS: 34
- EXTRA:
-   FOCAL_LENGTH: 1000
-   NUM_LOG_IMAGES: 4
-   NUM_LOG_SAMPLES_PER_IMAGE: 4
-   PELVIS_IND: 0
- DATASETS:
-   CONFIG:
-     SCALE_FACTOR: 0.3
-     ROT_FACTOR: 30
-     TRANS_FACTOR: 0.02
-     COLOR_SCALE: 0.2
-     ROT_AUG_RATE: 0.6
-     TRANS_AUG_RATE: 0.5
-     DO_FLIP: false
-     FLIP_AUG_RATE: 0.0
-     EXTREME_CROP_AUG_RATE: 0.0
-     EXTREME_CROP_AUG_LEVEL: 1
-   ANIMAL3D:
-     ROOT_IMAGE: ./datasets/animal3d/
-     JSON_FILE:
-       TRAIN: ./datasets/animal3d/train.json
-       TEST: ./datasets/animal3d/test.json
-     WEIGHT: 1.0
-   CONTROL_ANIMAL3D:
-     ROOT_IMAGE: ./datasets/control_animal3dlatest/
-     JSON_FILE:
-       TRAIN: ./datasets/control_animal3dlatest/train.json
-       TEST: ./datasets/control_animal3dlatest/test.json
-     WEIGHT: 0.5
-   QUADRUPED2D:
-     ROOT_IMAGE: ./datasets/quadruped2d/
-     JSON_FILE:
-       TRAIN: ./datasets/quadruped2d/train3d_60filtered.json
-       TEST: ./datasets/quadruped2d/test.json
-     WEIGHT: 0.5
- GENERAL:
-   TOTAL_STEPS: 450000
-   LOG_STEPS: 451
-   VAL_STEPS: 451
-   VAL_EPOCHS: 1
-   CHECKPOINT_EPOCHS: 1
-   CHECKPOINT_SAVE_TOP_K: 2
-   NUM_WORKERS: 2
-   PREFETCH_FACTOR: 2
- LOSS_WEIGHTS:
-   KEYPOINTS_3D: 0.05
-   KEYPOINTS_2D: 0.01
-   INTERMEDIATE_KP2D: 0.01
-   INTERMEDIATE_KP3D: 0.01
-   GLOBAL_ORIENT: 0.005
-   POSE: 0.001
-   BETAS: 0.0005
-   TRANSL: 0.0005
-   ADVERSARIAL: 0.0
-   SUPCON: 0.0005
- TRAIN:
-   LR: 3.75e-06
-   WEIGHT_DECAY: 0.0001
-   BATCH_SIZE: 48
-   LOSS_REDUCTION: mean
-   NUM_TRAIN_SAMPLES: 2
-   NUM_TEST_SAMPLES: 64
-   POSE_2D_NOISE_RATIO: 0.01
-   SMPL_PARAM_NOISE_RATIO: 0.005
- MODEL:
-   IMAGE_SIZE: 256
-   IMAGE_MEAN:
-     - 0.485
-     - 0.456
-     - 0.406
-   IMAGE_STD:
-     - 0.229
-     - 0.224
-     - 0.225
-   BACKBONE:
-     TYPE: vith
-     PRETRAINED_WEIGHTS: ./data/amr_vitbb.pth
-     FREEZE: false
-   USE_BIOCLIP_EMBEDDING: true
-   BIOCLIP_EMBEDDING:
-     EMBED_DIM: 1280
-     TYPE: bioclip1
-   USE_KEYPOINT_EMBEDDING: false
-   KEYPOINT_EMBEDDING:
-     NUM_KEYPOINTS: 26
-     KEYPOINT_DIM: 2
-     EMBED_DIM: 1280
-     HIDDEN_DIM: 512
-     TYPE: token
-   SMAL_HEAD:
-     TYPE: new_bio_pose_transformer_decoder
-     IN_CHANNELS: 1280
-     IEF_ITERS: 1
-     DECODER_DIM: 1280
-     NUM_DECODER_LAYERS: 6
-     NUM_HEADS: 8
-     MLP_RATIO: 4.0
-     USE_KEYPOINT_2D_TOKENS: true
-     USE_KEYPOINT_3D_TOKENS: true
-     KEYPOINT_TOKEN_UPDATE: true
-     KP2D_INJECT_IMAGE_FEAT: true
-     TRANSFORMER_DECODER:
-       depth: 6
-       heads: 8
-       mlp_dim: 1024
-       dim_head: 64
-       dropout: 0.0
-       emb_dropout: 0.0
-       norm: layer
-       context_dim: 1280
my_smpl_00781_4_all.pkl DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:22831db0e0e564dc95128e098da19995c2dda39b1aa18acc1335a6e62e0e3a59
- size 33686326
my_smpl_data_00781_4_all.pkl DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b21364a1eb9cc60c9ff5bb07182eab3d715a200da48904e0e0465fbb8b57e153
- size 246211
s3ckpt.ckpt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:194c24e8179ce8ef2fe52e2e7c39f67d3a246c94277cc4aa9dc883d578087239
- size 10222809027
walking_toy_symmetric_pose_prior_with_cov_35parts.pkl DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c91a65abff0a67f40f888b3e7c05c350e9d1c128a07ee6c1b01ed4449cf8379f
- size 541909