SentenceTransformer based on abdeljalilELmajjodi/model

This is a sentence-transformers model finetuned from abdeljalilELmajjodi/model on the all-nli dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: abdeljalilELmajjodi/model
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text
  • Training Dataset:
    • all-nli

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'embedding_dimension': 1024, 'pooling_mode': 'mean', 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'A woman is walking across the street eating a banana, while a man is following with his briefcase.',
    'An actress and her favorite assistant talk a walk in the city.',
    'A man in a restaurant is waiting for his meal to arrive.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9891, 0.9925],
#         [0.9891, 1.0000, 0.9870],
#         [0.9925, 0.9870, 1.0000]])

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.3132
spearman_cosine 0.2196

Training Details

Training Dataset

all-nli

  • Dataset: all-nli
  • Size: 80 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 80 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 10 tokens
    • mean: 25.59 tokens
    • max: 52 tokens
    • min: 5 tokens
    • mean: 12.04 tokens
    • max: 29 tokens
    • min: 0.0
    • mean: 0.47
    • max: 1.0
  • Samples:
    sentence1 sentence2 score
    Woman in white in foreground and a man slightly behind walking with a sign for John's Pizza and Gyro in the background. The woman and man are outdoors. 1.0
    An older man sits with his orange juice at a small table in a coffee shop while employees in bright colored shirts smile in the background. An older man drinks his juice as he waits for his daughter to get off work. 0.5
    Children smiling and waving at camera There are children present 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Evaluation Dataset

all-nli

  • Dataset: all-nli
  • Size: 20 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 20 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 14 tokens
    • mean: 26.3 tokens
    • max: 52 tokens
    • min: 7 tokens
    • mean: 11.75 tokens
    • max: 29 tokens
    • min: 0.0
    • mean: 0.65
    • max: 1.0
  • Samples:
    sentence1 sentence2 score
    High fashion ladies wait outside a tram beside a crowd of people in the city. The women do not care what clothes they wear. 0.0
    The school is having a special event in order to show the american culture on how other cultures are dealt with in parties. A school is hosting an event. 1.0
    An older man is drinking orange juice at a restaurant. A man is drinking juice. 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 1
  • warmup_steps: 0.05
  • bf16: True
  • fp16_full_eval: True
  • load_best_model_at_end: True
  • push_to_hub: True
  • gradient_checkpointing: True

All Hyperparameters

Click to expand
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0.05
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: True
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss pair-score-evaluator-dev_spearman_cosine
0.1 1 3.0315 - -
0.5 5 3.0018 - -
1.0 10 3.7124 2.6509 0.2737
0.1 1 2.8043 - -
0.5 5 2.9111 - -
1.0 10 3.1121 2.6923 0.2196
  • The bold row denotes the saved checkpoint.

Training Time

  • Training: 5.6 minutes

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@article{10531646,
    author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
    year={2024},
    doi={10.1109/TASLP.2024.3402087}
}
Downloads last month
30
Safetensors
Model size
0.6B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for aicha-zeroual/result_model

Finetuned
(27)
this model

Paper for aicha-zeroual/result_model

Evaluation results