Qwen2-VL-7B Traffic Detection LoRA
Fine-tuned LoRA adapter for traffic and urban scene understanding.
Model Details
- Base Model: Qwen/Qwen2-VL-7B-Instruct
- Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- LoRA Alpha: 64
- Quantization: 4-bit (NF4)
- Training Strategy: Sculptor Method (Inverse Masking)
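To make the rank and alpha values above concrete, here is a minimal sketch of the standard LoRA arithmetic: the update is scaled by alpha / rank, and each adapted linear layer gains rank × (d_in + d_out) trainable parameters. The 3584 × 3584 layer shape is an illustrative assumption; this card does not state which modules the adapter targets.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 32   # from this card
ALPHA = 64  # from this card

# Assumption: an illustrative square projection; the real target modules are not listed here.
D_IN = D_OUT = 3584

print("scaling factor:", lora_scaling(ALPHA, RANK))
print("added params for one layer:", lora_param_count(D_IN, D_OUT, RANK))
```

A scaling factor above 1 means the adapter's contribution is amplified relative to the raw product of the two low-rank matrices.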
Training Hyperparameters
- Learning Rate: 2e-4
- Batch Size: 2 per device
- Gradient Accumulation: 8 steps
- Effective Batch Size: 16
- Epochs: 20
- Optimizer: PagedAdamW 8-bit
- Scheduler: Cosine with warmup (3%)
- LoRA Dropout: 0.05
- Weight Decay: 0.01
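The batch-size and scheduler settings above can be checked with a short sketch. It assumes a single training device (consistent with 2 × 8 = 16) and a linear warmup followed by cosine decay to zero; `total_steps=1000` is a hypothetical value, since the dataset size is not given, and the rounding of the warmup step count is also an assumption.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def cosine_with_warmup(step: int, total_steps: int,
                       peak_lr: float = 2e-4, warmup_ratio: float = 0.03) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = round(total_steps * warmup_ratio)  # assumption: simple rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(per_device=2, grad_accum=8))

total_steps = 1000  # hypothetical; depends on dataset size and epochs
for step in (0, 15, 30, 500, 1000):
    print(step, cosine_with_warmup(step, total_steps))
```

With these settings the learning rate peaks at 2e-4 once warmup ends and falls smoothly to zero by the final step.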
Usage
Installation
pip install transformers peft torch qwen-vl-utils bitsandbytes accelerate
Inference (4-bit)
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import PeftModel
from qwen_vl_utils import process_vision_info
# Load base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "muk0644/Urban-Traffic-Qwen2-VL2")
# Load processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
# Inference
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image.jpg"},
            {"type": "text", "text": "Count the visible Objects"},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
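To run the same prompt over several images, the message structure used above can be wrapped in a small helper. This is a sketch only: `build_messages` and the second file name are hypothetical and not part of any library.

```python
from typing import Any


def build_messages(image_path: str, prompt: str) -> list[dict[str, Any]]:
    """Return a single-turn chat message list with one image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Hypothetical file names, for illustration only
image_paths = ["image.jpg", "image2.jpg"]
batches = [build_messages(path, "Count the visible Objects") for path in image_paths]

for messages in batches:
    print(messages[0]["content"][0]["image"])
```

Each entry in `batches` can then be passed through the same `apply_chat_template`, `process_vision_info`, and `generate` steps as the single-image example.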
Dataset
Traffic and urban scene images for:
- Traffic scene description
- Vehicle detection
- Road condition analysis
- Urban environment understanding
Framework Versions
- PEFT: 0.18.0
- Transformers: 4.x
- PyTorch: 2.x
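To compare a local environment against the versions listed above, a standard-library sketch such as the following may help. The distribution names `peft`, `transformers`, and `torch` are the usual PyPI names; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["peft", "transformers", "torch"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```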