Green Patent Detection: Advanced Agentic Workflow with QLoRA
Project Summary
This final assignment synthesizes Assignments 2 and 3 into a data-labelling pipeline. A generative LLM is fine-tuned via QLoRA to understand patent language, then integrated as the "Judge" of a Multi-Agent System (MAS) that debates and labels complex patent claims. Finally, a targeted Human-in-the-Loop (HITL) review step produces a gold dataset for a final round of PatentSBERTa fine-tuning.
Pipeline Architecture
patents_50k_green.parquet
[Part A & B] Baseline PatentSBERTa + Uncertainty Sampling
- Top 100 high-risk claims (u ≈ 1.0)
[Part C - Step 1] QLoRA Fine-tuning on Colab (Qwen3-8B, 4-bit, 3 epochs)
- qlora_green_patent_adapter (LoRA weights)
- Qwen3-8B.Q4_K_M.gguf (served via LM Studio)
[Part C - Step 2] Multi-Agent System (CrewAI)
- Agent 1 - Advocate (Qwen3-4B, argues for green)
- Agent 2 - Skeptic (Qwen3-4B, argues against green)
- Agent 3 - Judge (QLoRA Qwen3-8B, final verdict)
[Part D] Exception-Based HITL (only deadlocks / low-confidence)
- 26 claims escalated (deadlock or low confidence), 3 human overrides
- hitl_green_100_final.csv (gold labels)
[Part D] Final PatentSBERTa Fine-tuning on gold dataset
- patentsberta_finetuned_final/
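The "high-risk" selection in Parts A & B is standard uncertainty sampling. A minimal sketch, with synthetic probabilities standing in for the baseline PatentSBERTa classifier's outputs and the least-confidence measure assumed as the uncertainty score `u`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for classifier probabilities P(green) over 50k claims.
p = rng.uniform(size=50_000)

# Least-confidence uncertainty: u == 1.0 when p == 0.5 (maximally unsure),
# u == 0.0 when the classifier is certain (p near 0 or 1).
u = 1.0 - np.abs(2.0 * p - 1.0)

# The 100 claims the model is least sure about go on to the MAS.
top100 = np.argsort(u)[::-1][:100]
```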
Part C - Step 1: QLoRA Domain Adaptation
The generative LLM fine-tuning was performed on Google Colab (T4, 15 GB VRAM) using Unsloth's QLoRA implementation.
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3-8B-bnb-4bit |
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training examples | 2,000 (train_silver, Alpaca format) |
| Epochs | 3 (375 total steps) |
| Batch size | 4 Γ 4 gradient accumulation = effective 16 |
| Learning rate | 2e-4 (AdamW 8-bit, linear schedule) |
| Max sequence length | 2,048 tokens |
| Training loss | 0.8899 |
| Training time | ~105 minutes on T4 |
| VRAM usage | ~5 GB (4-bit quantization) |
The fine-tuned adapter was exported to GGUF Q4_K_M format (4.682 GB) and served locally via LM Studio for use in the MAS.
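The schedule in the table is internally consistent; a quick sketch of the arithmetic (values copied from the table above):

```python
# Hyperparameters as reported in the QLoRA training table.
train_examples = 2000
per_device_batch = 4
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum        # 4 x 4 = 16
steps_per_epoch = train_examples // effective_batch    # 2000 / 16 = 125
total_steps = steps_per_epoch * epochs                 # 125 * 3 = 375
```

This reproduces the "375 total steps" figure exactly.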
Part C - Step 2: Multi-Agent System
Three agents collaborate to label each of the 100 high-risk patent claims using CrewAI as the orchestration framework. The QLoRA fine-tuned model serves as the Judge's brain.
| Agent | Model | Temperature | Role |
|---|---|---|---|
| Advocate | Qwen3-4B (LM Studio) | 0.1 | Argues FOR Y02 green classification |
| Skeptic | Qwen3-4B (LM Studio) | 0.1 | Argues AGAINST (identifies greenwashing) |
| Judge | QLoRA Qwen3-8B (LM Studio) | 0.1 | Weighs debate and produces final JSON label |
Each claim produces a structured JSON output: classification (0/1), confidence (Low/Medium/High), and rationale.
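The Judge's structured output can be validated before being accepted downstream. A minimal sketch, assuming the field names and value ranges described above (the raw string is a hypothetical example, not an actual transcript):

```python
import json

# Hypothetical raw Judge output following the schema described above.
raw = '{"classification": 1, "confidence": "Medium", "rationale": "Claim cites CO2 capture."}'

verdict = json.loads(raw)

# Guard against malformed agent output before it enters the pipeline.
assert verdict["classification"] in (0, 1)
assert verdict["confidence"] in ("Low", "Medium", "High")

# Low-confidence verdicts are candidates for human escalation in Part D.
needs_human = verdict["confidence"] == "Low"
```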
Part D: Targeted HITL & Final Fine-tuning
Exception-based HITL was applied: a human only intervenes when the agents reach a deadlock or the Judge reports low confidence.
| Metric | Value |
|---|---|
| Total claims reviewed by MAS | 100 |
| Auto-accepted (high confidence) | 74 |
| Escalated to human review | 26 |
| Human overrides | 3 |
| Human agreement rate with Judge | 88.5% |
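The metrics in the table follow from the escalation counts; a quick check of the bookkeeping:

```python
# Counts as reported in the HITL table.
total_claims = 100
escalated = 26
overrides = 3

auto_accepted = total_claims - escalated          # 100 - 26 = 74
agreement = (escalated - overrides) / escalated   # 23 / 26, i.e. ~88.5%
effort_saved = auto_accepted / total_claims       # 74% of claims need no review
```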
The gold-labelled dataset (hitl_green_100_final.csv) was used to fine-tune PatentSBERTa for 3 epochs using CosineSimilarityLoss. Training was attempted on an AMD Radeon RX 9070 XT via DirectML but fell back to CPU, completing in ~31 minutes.
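One common way to apply CosineSimilarityLoss to binary labels is to pair each claim with a fixed anchor text and regress the embedding cosine toward the label. This is a hedged sketch of that pair construction, not the notebook's actual code; the example rows, the anchor text, and the model name in the comments are illustrative:

```python
# Hypothetical rows standing in for hitl_green_100_final.csv: (claim_text, gold_label).
gold = [
    ("A claim about solar inverters", 1),
    ("A claim about drill bits", 0),
]
anchor = "climate change mitigation technology"

# CosineSimilarityLoss regresses cosine(emb_a, emb_b) toward a float score:
# green claims -> 1.0 (similar to the anchor), non-green -> 0.0.
train_pairs = [(claim, anchor, float(label)) for claim, label in gold]

# Assumed sentence-transformers usage (not executed here):
# from sentence_transformers import SentenceTransformer, InputExample, losses
# model = SentenceTransformer("AI-Growth-Lab/PatentSBERTa")
# examples = [InputExample(texts=[a, b], label=s) for a, b, s in train_pairs]
# then wrap `examples` in a DataLoader and call
# model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=3)
```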
Performance Results
| Model Version | Training Data Source | F1 Score (Test Set) |
|---|---|---|
| 1. Baseline | Frozen Embeddings (No Fine-tuning) | 0.7494 |
| 2. Assignment 2 Model | Silver + Gold (Simple Generic LLM) | 0.7465 |
| 3. Assignment 3 Model | Silver + Gold (Advanced Techniques / MAS) | 0.7467 |
| 4. Final Model | Silver + Gold (QLoRA-Powered MAS + Targeted HITL) | 0.7530 |
The Final Model achieves the highest F1 score across all iterations, demonstrating that QLoRA domain adaptation combined with structured agent debate and targeted human review produces measurable improvements.
Key Findings
QLoRA advantages:
- Adapts a generative LLM to patent language with only 0.53% of parameters trained
- Enables a domain-aware Judge that understands Y02 classification logic
- 4-bit quantization fits 8B model on a free 15 GB T4 GPU
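The "0.53% of parameters" figure is easy to sanity-check: LoRA adds r*(d_in + d_out) trainable weights per adapted matrix. A sketch with illustrative layer dimensions (the hidden/intermediate sizes and layer count below are assumptions, not the verified Qwen3-8B config):

```python
r = 16
hidden = 4096          # illustrative hidden size
layers = 36            # illustrative layer count
# (d_in, d_out) for each target module in one transformer layer (illustrative shapes):
modules = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, 1024),
    "v_proj": (hidden, 1024),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, 12288),
    "up_proj": (hidden, 12288),
    "down_proj": (12288, hidden),
}

# Each LoRA pair contributes r * (d_in + d_out) trainable parameters.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in modules.values())
trainable = per_layer * layers
print(f"{trainable / 8e9:.2%} trainable of ~8B")  # on the order of the 0.53% reported
```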
MAS + HITL advantages:
- Debate structure surfaces disagreements that single-model approaches miss
- Exception-based HITL reduces human effort by 74% (26 vs 100 reviews)
- Gold labels are higher-quality than silver LLM labels alone
Limitations:
- DirectML (AMD GPU) is not fully supported by sentence-transformers training, so fine-tuning fell back to CPU
- torchao 0.16.0 conflicts with transformers lazy loader in certain environments
Repository Contents
| File | Description |
|---|---|
| Final_Assignment.ipynb | Main notebook (Parts A-D) |
| patentsberta_finetuned_final/ | Final fine-tuned PatentSBERTa model |
| hitl_green_100_final.csv | Gold dataset: 100 claims with HITL labels and debate rationales |
| final_classifier.joblib | Serialised final Logistic Regression classifier |
| qlora_outputs.zip | QLoRA adapter weights (qlora_green_patent_adapter/) |
| Part C Step 1.ipynb | Colab notebook for QLoRA fine-tuning |
| Debate transcripts | All debate transcripts for the MAS |