DeepSeek-Coder 6.7B - SecureCode Edition


Security-optimized code model - built for vulnerability detection

🤗 Model Card | 📊 Dataset | 💻 perfecXion.ai


🎯 What is This?

This is DeepSeek-Coder 6.7B Instruct fine-tuned on the SecureCode v2.0 dataset - a code model specifically designed for security analysis and vulnerability detection.

DeepSeek-Coder was trained on 2 trillion tokens with a unique focus on code understanding and generation. Combined with SecureCode training, this model excels at:

✅ Identifying subtle security flaws in complex codebases
✅ Generating hardened implementations optimized for security
✅ Explaining vulnerability chains with step-by-step attack demonstrations
✅ Providing remediation guidance with defense-in-depth patterns

The Result: A security-first code model that balances performance with specialized vulnerability detection capabilities.

Why DeepSeek-Coder? This model offers:

  • 🔍 Excellent code comprehension - Trained specifically for understanding code structure
  • 🛡️ Security-aware architecture - Pre-training included security-focused code
  • ⚡ Efficient inference - Compact 6.7B size with strong performance
  • 🎯 Balanced trade-off - Better than 3B models, more efficient than 13B+
  • 💰 Cost-effective - Optimal performance-per-parameter ratio

🚨 The Problem This Solves

AI coding assistants produce vulnerable code in 45% of security-relevant scenarios (Veracode 2025). DeepSeek-Coder SecureCode Edition addresses this by combining deep code understanding with security expertise.

Real-world impact:

  • Equifax breach (SQL injection): $425 million
  • Capital One (SSRF): 100 million records exposed
  • SolarWinds (auth bypass): 18,000 orgs compromised

This model was specifically fine-tuned to prevent these vulnerability classes.


💡 Key Features

🛡️ Security-Optimized Base Model

DeepSeek-Coder outperforms many larger models on code tasks:

  • HumanEval: 78.6% pass@1 (beats CodeLlama 13B)
  • MBPP: 70.2% pass@1
  • Strong performance on security-relevant code patterns

Now enhanced with 1,209 security-focused examples covering OWASP Top 10:2025.

🔐 Comprehensive Vulnerability Coverage

Trained on real-world security incidents:

  • 224 examples of Broken Access Control
  • 199 examples of Authentication Failures
  • 125 examples of Injection attacks
  • 115 examples of Cryptographic Failures
  • Full OWASP Top 10:2025 coverage

🌍 Multi-Language Security Expertise

Fine-tuned on security examples across:

  • Python (Django, Flask, FastAPI)
  • JavaScript/TypeScript (Express, NestJS)
  • Java (Spring Boot)
  • Go (Gin framework)
  • PHP (Laravel, Symfony)
  • C# (ASP.NET Core)
  • Ruby (Rails)
  • Rust (Actix, Rocket)

📋 Complete Security Context

Every response includes:

  1. Vulnerable code demonstrating the flaw
  2. Secure implementation with best practices
  3. Attack demonstration with exploit payloads
  4. Operational guidance for production hardening
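
For illustration, here is the kind of vulnerable/secure pairing this structure teaches. This is a hypothetical Python sketch, not an example drawn from the dataset itself:

```python
import sqlite3

def get_user_vulnerable(conn: sqlite3.Connection, username: str):
    # VULNERABLE: string interpolation allows SQL injection,
    # e.g. username = "' OR '1'='1"
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def get_user_secure(conn: sqlite3.Connection, username: str):
    # SECURE: parameterized query; the driver binds the value safely
    query = "SELECT * FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```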

📊 Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | deepseek-ai/deepseek-coder-6.7b-instruct |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Training Dataset | SecureCode v2.0 |
| Dataset Size | 841 training examples |
| Training Epochs | 3 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| Learning Rate | 2e-4 |
| Quantization | 4-bit (bitsandbytes) |
| Trainable Parameters | ~35M (0.52% of total) |
| Total Parameters | 6.7B |
| Context Window | 16K tokens |
| GPU Used | NVIDIA A100 40GB |
| Training Time | ~85 minutes (estimated) |

Training Methodology

LoRA fine-tuning preserves DeepSeek-Coder's code expertise while adding security knowledge:

  • Trains only 0.52% of parameters
  • Maintains base model quality
  • Adds OWASP-focused security understanding
  • Efficient deployment with minimal overhead
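
A minimal sketch of what this setup looks like with peft, assuming the hyperparameters from the table above; the target modules and dropout are assumptions, as they are not listed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit base model, per the training details above
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                 # LoRA rank, from the table
    lora_alpha=32,        # LoRA alpha, from the table
    lora_dropout=0.05,    # assumption: dropout is not listed in the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 0.5% trainable
```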

🚀 Usage

Quick Start

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = "deepseek-ai/deepseek-coder-6.7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load SecureCode adapter
model = PeftModel.from_pretrained(model, "scthornton/deepseek-coder-6.7b-securecode")

# Analyze code for vulnerabilities
prompt = """### User:
Identify all security vulnerabilities in this authentication middleware:

```javascript
const authenticate = async (req, res, next) => {
    const token = req.headers.authorization;
    const decoded = jwt.verify(token, process.env.JWT_SECRET);
    req.user = await User.findById(decoded.userId);
    next();
};
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
````


Production Deployment (4-bit Quantization)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization - runs on 12GB GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16"
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(model, "scthornton/deepseek-coder-6.7b-securecode")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
```

🎯 Use Cases

1. Vulnerability Scanning in CI/CD

Integrate into development pipelines for automated security checks:

Scan this Pull Request for OWASP Top 10 vulnerabilities
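
A hypothetical sketch of such a hook, assuming the model and tokenizer from the Usage section are already loaded; the diff-gathering step and prompt format are illustrative, not a shipped integration:

```python
import subprocess

def scan_diff(model, tokenizer, base_ref: str = "origin/main") -> str:
    # Collect the PR's changes relative to the base branch
    diff = subprocess.run(
        ["git", "diff", base_ref, "--", "*.py", "*.js"],
        capture_output=True, text=True, check=True,
    ).stdout
    prompt = (
        "### User:\n"
        "Scan this Pull Request for OWASP Top 10 vulnerabilities:\n\n"
        f"{diff}\n\n"
        "### Assistant:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Deterministic decoding keeps CI results reproducible
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```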

2. Security-Focused Code Generation

Generate implementations with security as priority:

Write a secure user registration endpoint with input validation, rate limiting, and SQL injection prevention

3. Legacy Code Remediation

Identify and fix vulnerabilities in existing code:

Refactor this legacy authentication system to fix all security issues

4. Security Training & Education

Use for developer security training:

Explain common authentication bypass techniques and how to prevent them

5. Threat Modeling

Analyze architectural security:

Identify potential attack vectors in this microservices architecture

⚠️ Limitations

What This Model Does Well

✅ Security vulnerability identification
✅ Code understanding and analysis
✅ Generating secure implementations
✅ Explaining attack vectors

What This Model Doesn't Do

❌ Not a replacement for static analysis tools
❌ Cannot discover novel 0-day vulnerabilities
❌ Not legal/compliance advice
❌ Not a replacement for security experts


📈 Performance Benchmarks

Hardware Requirements

Minimum:

  • 14GB RAM
  • 10GB GPU VRAM (with 4-bit quantization)

Recommended:

  • 24GB RAM
  • 12GB+ GPU (RTX 3060 Ti, RTX 4070)

Inference Speed (on RTX 3060 12GB):

  • ~35 tokens/second (4-bit quantization)
  • ~50 tokens/second (bfloat16)
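
To reproduce a rough throughput number on your own hardware, a minimal sketch (assuming the model and tokenizer from the Usage section are loaded; results vary with GPU, quantization, and generation settings):

```python
import time

prompt = "### User:\nExplain SQL injection.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, excluding the prompt
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```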

Code Generation (Base Model Scores)

| Benchmark | Score |
|-----------|-------|
| HumanEval | 78.6% |
| MBPP | 70.2% |
| MultiPL-E | 68.9% |

🔬 Dataset Information

Trained on SecureCode v2.0:

  • 1,209 examples with real CVE grounding
  • 11 vulnerability categories (OWASP Top 10:2025)
  • 11 programming languages
  • 100% expert validation

📄 License

Model: Apache 2.0 | Dataset: CC BY-NC-SA 4.0


📚 Citation

```bibtex
@misc{thornton2025securecode-deepseek,
  title={DeepSeek-Coder 6.7B - SecureCode Edition},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://huggingface.co/scthornton/deepseek-coder-6.7b-securecode}
}
```

🔗 Related Models

View Collection


Built with ❤️ for secure software development

perfecXion.ai | Contact
