Qwen2-VL-7B Traffic Detection LoRA

A LoRA adapter fine-tuned on Qwen/Qwen2-VL-7B-Instruct for traffic and urban scene understanding.

Model Details

  • Base Model: Qwen/Qwen2-VL-7B-Instruct
  • Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Quantization: 4-bit (NF4)
  • Training Strategy: Sculptor Method (Inverse Masking)
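
As a rough illustration of what the rank and alpha values above imply: in the standard LoRA formulation, the low-rank update is scaled by alpha / rank, and each adapted linear layer gains rank × (d_in + d_out) trainable parameters. The sketch below works this out in plain Python. The layer dimensions are hypothetical placeholders, not values read from this adapter's configuration, and this card does not state which modules were adapted.

```python
# LoRA values from the Model Details list
lora_rank = 32
lora_alpha = 64

# Standard LoRA scaling: the update B @ A is multiplied by alpha / rank
scaling = lora_alpha / lora_rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer size, for illustration only
example_d_in, example_d_out = 4096, 4096
example_params = lora_param_count(example_d_in, example_d_out, lora_rank)

print(f"scaling = {scaling}")
print(f"added params for one {example_d_in}x{example_d_out} layer = {example_params:,}")
```

With rank 32 and alpha 64 the scaling factor is 2, so the learned update is applied at twice its raw magnitude.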

Training Hyperparameters

  • Learning Rate: 2e-4
  • Batch Size: 2 per device
  • Gradient Accumulation: 8 steps
  • Effective Batch Size: 16
  • Epochs: 20
  • Optimizer: PagedAdamW 8-bit
  • Scheduler: Cosine with warmup (3%)
  • LoRA Dropout: 0.05
  • Weight Decay: 0.01
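
The effective batch size follows from the per-device batch size and gradient accumulation: 2 × 8 = 16, which suggests training on a single device. The sketch below reproduces that arithmetic and shows one common form of a cosine schedule with linear warmup. The dataset size is a hypothetical number chosen for illustration, and the exact schedule shape (linear warmup, decay to zero) is an assumption, since the card only says "Cosine with warmup (3%)".

```python
import math

# Values from the Training Hyperparameters list
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: 2 * 8 * 1 matches the stated effective batch size of 16

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices


def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-4,
               warmup_ratio: float = 0.03) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, math.ceil(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical dataset size, for illustration only
num_examples = 1600
epochs = 20
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer steps: {total_steps}")
for step in (0, 30, 60, 1000, total_steps):
    print(f"step {step:>4}: lr = {lr_at_step(step, total_steps):.2e}")
```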

Usage

Installation

pip install transformers peft torch qwen-vl-utils bitsandbytes accelerate

Inference (4-bit)

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import PeftModel
from qwen_vl_utils import process_vision_info

# Configure 4-bit NF4 quantization, then load the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "muk0644/Urban-Traffic-Qwen2-VL2")

# Load processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Inference
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image.jpg"},
            {"type": "text", "text": "Count the visible objects."}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, output_ids)]
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
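
The slicing step near the end of the inference code can be easy to misread. For decoder-only models, `generate` returns each sequence as the prompt tokens followed by the new tokens, so the prompt has to be cut off before decoding. The toy example below shows the same pattern on plain Python lists. The token IDs are made up and the variable names are illustrative only.

```python
# Toy token IDs standing in for real tensors (values are made up)
prompt_batch = [[101, 7, 8, 9]]                 # what the model received
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]   # prompt followed by new tokens

# Same slicing pattern as the inference code: drop len(prompt) leading tokens
new_tokens_only = [
    full[len(prompt):] for prompt, full in zip(prompt_batch, generated_batch)
]

print(new_tokens_only)
```

Without this step, the decoded response would begin with the chat-template text of the prompt.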

Dataset

The adapter was trained on traffic and urban scene images covering the following tasks:

  • Traffic scene description
  • Vehicle detection
  • Road condition analysis
  • Urban environment understanding

Framework Versions

  • PEFT: 0.18.0
  • Transformers: 4.x
  • PyTorch: 2.x