---
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- pytorch
- jax
- code_x_glue_cc_defect_detection
- code
- roberta
- security
- vulnerability-detection
- codebert
- apache-2.0
license: apache-2.0
---

# CodeBERT fine-tuned for Java Vulnerability Detection

A CodeBERT model fine-tuned to detect security vulnerabilities in Java code.

## Model Description

This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of Java code as secure or insecure.

## Intended Uses

- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)

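A binary classifier of this kind emits one raw score (logit) per class, so turning a prediction into one of the two labels above takes a softmax and an argmax. The following is a minimal sketch of that step using only the standard library. The helper name `interpret_logits`, the `LABELS` dictionary, and the example logits are illustrative assumptions and are not part of this model's API.

```python
import math

# Label names as listed under Intended Uses; this dict is illustrative only.
LABELS = {0: "Safe (LABEL_0)", 1: "Vulnerable (LABEL_1)"}


def interpret_logits(logits):
    """Convert raw class logits into (label, probability of that label)."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability; ties resolve to the lower index.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration, not real model output.
label, confidence = interpret_logits([-1.2, 2.3])
print(f"{label}: {confidence:.2f}")  # prints: Vulnerable (LABEL_1): 0.97
```

Reporting the probability alongside the label makes it possible to apply a stricter threshold than 0.5 if false positives are costly.
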
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mangsense/codebert_java")
model = AutoModelForSequenceClassification.from_pretrained("mangsense/codebert_java")
model.eval()  # inference mode: disables dropout

# Replace with the Java source you want to classify
code = "your code here"
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)

# No labels are needed for inference; skip gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

# 0 = Safe (LABEL_0), 1 = Vulnerable (LABEL_1)
print(logits.argmax(dim=-1).item())
```

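Because the tokenizer call above uses `truncation=True`, any tokens past the model's length limit are never seen by the classifier. One possible workaround, which this card does not describe or evaluate, is to score overlapping windows of the token sequence and flag the file if any window is predicted vulnerable. The sketch below covers only the windowing step. The function `split_into_windows`, the 510-token window (512 minus two special tokens, an assumption about the RoBERTa-style limit), and the stride are all illustrative.

```python
def split_into_windows(token_ids, window=510, stride=255):
    """Split a token-id list into overlapping windows of at most `window` ids."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        raise ValueError("stride must not exceed window, or tokens would be skipped")
    # Short inputs fit in a single window.
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


ids = list(range(1200))  # stand-in for real token ids
chunks = split_into_windows(ids)
print([len(c) for c in chunks])
```

Each window would still need the tokenizer's special tokens added before being passed to the model, and how to combine per-window predictions is a design choice left open here.
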
## Training Data

Trained on the CodeXGLUE Defect Detection dataset (`code_x_glue_cc_defect_detection`).

## Limitations

- Focused on Java code only
- May not detect all types of vulnerabilities