Qwen2-VL-7B Traffic Detection LoRA

Fine-tuned LoRA adapter for traffic and urban scene understanding.

Model Details

Base Model: Qwen/Qwen2-VL-7B-Instruct
Method: LoRA (Low-Rank Adaptation)
LoRA Rank: 32
LoRA Alpha: 64
Quantization: 4-bit (NF4)
Training Strategy: Sculptor Method (Inverse Masking)
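With rank 32 and alpha 64, standard LoRA scales the low-rank update by alpha / r = 2. The sketch below works through that arithmetic and counts the trainable parameters LoRA adds to a single linear layer. It assumes the plain LoRA formulation W' = W + (alpha / r) · B A, not a variant such as rsLoRA. The 3584 × 3584 layer size is a hypothetical example, because the card does not list which modules were adapted.

```python
# Sketch of standard LoRA arithmetic for the configuration above.
# Assumption: plain LoRA, W' = W + (alpha / r) * B @ A,
# with A of shape (r, d_in) and B of shape (d_out, r).


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    # A contributes rank * d_in weights; B contributes d_out * rank weights.
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    rank, alpha = 32, 64
    print(lora_scaling(alpha, rank))  # 2.0
    # Hypothetical square projection; not taken from the adapter config.
    d = 3584
    added = lora_params_per_layer(d, d, rank)
    full = d * d
    print(added, full, f"{added / full:.2%}")
```

The ratio shrinks as the layer grows. This is why a rank-32 adapter stays small relative to the frozen 4-bit base weights.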

Training Hyperparameters

Learning Rate: 2e-4
Batch Size: 2 per device
Gradient Accumulation: 8 steps
Effective Batch Size: 16 (2 per device × 8 accumulation steps, single device)
Epochs: 20
Optimizer: PagedAdamW 8-bit
Scheduler: Cosine with warmup over the first 3% of training steps
LoRA Dropout: 0.05
Weight Decay: 0.01
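The sketch below reproduces the batch arithmetic and a warmup-then-cosine learning-rate curve with the listed values (peak 2e-4, 3% warmup, 20 epochs). The curve has the same shape as Hugging Face's get_cosine_schedule_with_warmup. The dataset size is a hypothetical placeholder, since the card does not state it. The step count is an approximation of one optimizer step per effective batch.

```python
import math

# Hypothetical dataset size; the model card does not report it.
NUM_EXAMPLES = 1000

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # implied by 2 x 8 = 16
EPOCHS = 20
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    return per_device * accum * devices


def total_optimizer_steps(num_examples: int, batch: int, epochs: int) -> int:
    # Approximation: one optimizer step per effective batch, partial batch kept.
    return math.ceil(num_examples / batch) * epochs


def cosine_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
    total = total_optimizer_steps(NUM_EXAMPLES, batch, EPOCHS)
    warmup = math.ceil(total * WARMUP_RATIO)
    print(batch, total, warmup)  # 16 1260 38
    for step in (0, warmup, total // 2, total):
        print(step, f"{cosine_lr(step, total, warmup, PEAK_LR):.2e}")
```

With 20 epochs, a 3% warmup covers a little over half of the first epoch. Most of the run is spent on the cosine decay.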

Usage

Installation

pip install transformers peft torch qwen-vl-utils bitsandbytes accelerate
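The 4-bit inference example loads the model through BitsAndBytesConfig, which depends on the bitsandbytes package. It also passes device_map="auto", which depends on accelerate. Below is a minimal standard-library sketch that reports which of these distributions are installed before you try to load the model. The package list is this card's requirements plus those two, and the script installs nothing itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions used by the inference example; bitsandbytes and accelerate
# are needed for 4-bit loading and device_map="auto" respectively.
REQUIRED = ["transformers", "peft", "torch", "qwen-vl-utils", "bitsandbytes", "accelerate"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = installed_versions(REQUIRED)
    for name, ver in report.items():
        print(f"{name:15} {ver or 'MISSING'}")
    missing = [name for name, ver in report.items() if ver is None]
    if missing:
        print("pip install " + " ".join(missing))
```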

Inference (4-bit)

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import PeftModel
from qwen_vl_utils import process_vision_info

# Load base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "muk0644/Urban-Traffic-Qwen2-VL2")

# Load processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Inference
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image.jpg"},
            {"type": "text", "text": "Count the visible objects."}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens from each output row so only the new text is decoded
generated_ids = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, output_ids)]
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
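generate returns each prompt followed by its newly generated tokens. For that reason, the example slices every output row by the length of its input row before decoding. This works for padded batches too, because each output row begins with the full padded prompt. A pure-Python sketch of the step, with made-up token IDs standing in for inputs.input_ids and output_ids:

```python
def trim_prompt(input_rows, output_rows):
    """Drop the echoed prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_rows, output_rows)]


# Made-up token IDs; real IDs come from the processor and model.generate.
prompt = [[101, 7, 8, 9]]
generated = [[101, 7, 8, 9, 42, 43, 2]]
print(trim_prompt(prompt, generated))  # [[42, 43, 2]]
```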

Dataset

Traffic and urban scene images for:

Traffic scene description
Vehicle detection
Road condition analysis
Urban environment understanding
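The message structure from the inference example can be reused for each of the tasks above by changing only the text prompt. The sketch below builds that structure for one image. The prompt wording is hypothetical: the card does not publish the training prompts, and different phrasings may give different results.

```python
# Hypothetical prompts, one per task listed above; the actual training
# prompts are not documented in this card.
TASK_PROMPTS = {
    "description": "Describe this traffic scene.",
    "vehicles": "List the vehicles visible in this image.",
    "road": "Describe the road conditions in this image.",
    "urban": "Describe the urban environment shown in this image.",
}


def build_messages(image_path: str, prompt: str) -> list:
    """Return a single-turn chat in the same structure as the inference example."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_messages("image.jpg", TASK_PROMPTS["road"])
print(messages[0]["content"][1]["text"])
```

The returned list can go straight into processor.apply_chat_template and process_vision_info, as in the inference example.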

Framework Versions

PEFT: 0.18.0
Transformers: 4.x (Qwen2-VL support requires 4.45.0 or later)
PyTorch: 2.x

