modularStarEncoder/ModularStarEncoder

andreagurioli1995 committed on May 20, 2025

Commit 1fd6a72 · verified · 1 Parent(s): 8d946e5

Update README.md

Files changed (1)

README.md +3 -3

@@ -20,7 +20,7 @@ To enhance efficiency, we replaced the causal self-attention layers with bidirec
 Finally, our implementation integrates FlashAttention V2 for faster inference.
-- **Paper:** [One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings](https://arxiv.org/abs/2503.03008)
+- **Paper:** [MoSE: Hierarchical Self-Distillation Enhances Early Layer Embeddings](https://arxiv.org/abs/2503.03008)
 - **Languages:** 600+ Programming languages
@@ -86,8 +86,8 @@ The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can
 # Citation
 ```
-@article{gurioli2025modeltrainallhierarchical,
-      title={One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings},
+@article{gurioli2025mosehierarchicalselfdistillationenhances,
+      title={MoSE: Hierarchical Self-Distillation Enhances Early Layer Embeddings},
       author={Andrea Gurioli and Federico Pennino and João Monteiro and Maurizio Gabbrielli},
       year={2025},
       eprint={2503.03008},