Green Patent Detection: Advanced Agentic Workflow with QLoRA
Project Summary
This final assignment synthesizes Assignments 2 and 3 into a data-labelling pipeline. A generative LLM is fine-tuned via QLoRA to understand patent language, then integrated as the "Judge" of a Multi-Agent System (MAS) that debates and labels complex patent claims. Finally, a targeted Human-in-the-Loop (HITL) review step produces a gold dataset for a final round of PatentSBERTa fine-tuning.
Pipeline Architecture
patents_50k_green.parquet
[Part A & B] Baseline PatentSBERTa + Uncertainty Sampling
- Top 100 high-risk claims (u ≈ 1.0)
[Part C - Step 1] QLoRA Fine-tuning on Colab (Qwen3-8B, 4-bit, 3 epochs)
- qlora_green_patent_adapter (LoRA weights)
- Qwen3-8B.Q4_K_M.gguf (served via LM Studio)
[Part C - Step 2] Multi-Agent System (CrewAI)
- Agent 1 - Advocate (Qwen3-4B, argues for green)
- Agent 2 - Skeptic (Qwen3-4B, argues against green)
- Agent 3 - Judge (QLoRA Qwen3-8B, final verdict)
[Part D] Exception-Based HITL (only deadlocks / low-confidence)
- 26 claims escalated (deadlock or low confidence), 3 human overrides
- hitl_green_100_final.csv (gold labels)
[Part D] Final PatentSBERTa Fine-tuning on gold dataset
- patentsberta_finetuned_final/
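The "high-risk" selection in Parts A & B is standard uncertainty sampling. A minimal sketch, with synthetic probabilities standing in for the baseline PatentSBERTa classifier's outputs and the least-confidence measure assumed as the uncertainty score `u`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for classifier probabilities P(green) over 50k claims.
p = rng.uniform(size=50_000)

# Least-confidence uncertainty: u == 1.0 when p == 0.5 (maximally unsure),
# u == 0.0 when the classifier is certain (p near 0 or 1).
u = 1.0 - np.abs(2.0 * p - 1.0)

# The 100 claims the model is least sure about go on to the MAS.
top100 = np.argsort(u)[::-1][:100]
```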
Part C - Step 1: QLoRA Domain Adaptation
The generative LLM fine-tuning was performed on Google Colab (T4, 15 GB VRAM) using Unsloth's QLoRA implementation.
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3-8B-bnb-4bit |
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training examples | 2,000 (train_silver, Alpaca format) |
| Epochs | 3 (375 total steps) |
| Batch size | 4 Γ 4 gradient accumulation = effective 16 |
| Learning rate | 2e-4 (AdamW 8-bit, linear schedule) |
| Max sequence length | 2,048 tokens |
| Training loss | 0.8899 |
| Training time | ~105 minutes on T4 |
| VRAM usage | ~5 GB (4-bit quantization) |
The fine-tuned adapter was exported to GGUF Q4_K_M format (4.682 GB) and served locally via LM Studio for use in the MAS.
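The schedule in the table is internally consistent; a quick sketch of the arithmetic (values copied from the table above):

```python
# Hyperparameters as reported in the QLoRA training table.
train_examples = 2000
per_device_batch = 4
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum        # 4 x 4 = 16
steps_per_epoch = train_examples // effective_batch    # 2000 / 16 = 125
total_steps = steps_per_epoch * epochs                 # 125 * 3 = 375
```

This reproduces the "375 total steps" figure exactly.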
Part C - Step 2: Multi-Agent System
Three agents collaborate to label each of the 100 high-risk patent claims using CrewAI as the orchestration framework. The QLoRA fine-tuned model serves as the Judge's brain.
| Agent | Model | Temperature | Role |
|---|---|---|---|
| Advocate | Qwen3-4B (LM Studio) | 0.1 | Argues FOR Y02 green classification |
| Skeptic | Qwen3-4B (LM Studio) | 0.1 | Argues AGAINST (identifies greenwashing) |
| Judge | QLoRA Qwen3-8B (LM Studio) | 0.1 | Weighs debate and produces final JSON label |
Each claim produces a structured JSON output: classification (0/1), confidence (Low/Medium/High), and rationale.
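The Judge's structured output can be validated before being accepted downstream. A minimal sketch, assuming the field names and value ranges described above (the raw string is a hypothetical example, not an actual transcript):

```python
import json

# Hypothetical raw Judge output following the schema described above.
raw = '{"classification": 1, "confidence": "Medium", "rationale": "Claim cites CO2 capture."}'

verdict = json.loads(raw)

# Guard against malformed agent output before it enters the pipeline.
assert verdict["classification"] in (0, 1)
assert verdict["confidence"] in ("Low", "Medium", "High")

# Low-confidence verdicts are candidates for human escalation in Part D.
needs_human = verdict["confidence"] == "Low"
```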
Part D: Targeted HITL & Final Fine-tuning
Exception-based HITL was applied: a human only intervenes when the agents reach a deadlock or the Judge reports low confidence.
| Metric | Value |
|---|---|
| Total claims reviewed by MAS | 100 |
| Auto-accepted (high confidence) | 74 |
| Escalated to human review | 26 |
| Human overrides | 3 |
| Human agreement rate with Judge | 88.5% |
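The metrics in the table follow from the escalation counts; a quick check of the bookkeeping:

```python
# Counts as reported in the HITL table.
total_claims = 100
escalated = 26
overrides = 3

auto_accepted = total_claims - escalated          # 100 - 26 = 74
agreement = (escalated - overrides) / escalated   # 23 / 26, i.e. ~88.5%
effort_saved = auto_accepted / total_claims       # 74% of claims need no review
```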
The gold-labelled dataset (hitl_green_100_final.csv) was used to fine-tune PatentSBERTa for 3 epochs using CosineSimilarityLoss. Training was attempted on an AMD Radeon RX 9070 XT via DirectML but fell back to CPU, completing in ~31 minutes.
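One common way to apply CosineSimilarityLoss to binary labels is to pair each claim with a fixed anchor text and regress the embedding cosine toward the label. This is a hedged sketch of that pair construction, not the notebook's actual code; the example rows, the anchor text, and the model name in the comments are illustrative:

```python
# Hypothetical rows standing in for hitl_green_100_final.csv: (claim_text, gold_label).
gold = [
    ("A claim about solar inverters", 1),
    ("A claim about drill bits", 0),
]
anchor = "climate change mitigation technology"

# CosineSimilarityLoss regresses cosine(emb_a, emb_b) toward a float score:
# green claims -> 1.0 (similar to the anchor), non-green -> 0.0.
train_pairs = [(claim, anchor, float(label)) for claim, label in gold]

# Assumed sentence-transformers usage (not executed here):
# from sentence_transformers import SentenceTransformer, InputExample, losses
# model = SentenceTransformer("AI-Growth-Lab/PatentSBERTa")
# examples = [InputExample(texts=[a, b], label=s) for a, b, s in train_pairs]
# then wrap `examples` in a DataLoader and call
# model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=3)
```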
Performance Results
| Model Version | Training Data Source | F1 Score (Test Set) |
|---|---|---|
| 1. Baseline | Frozen Embeddings (No Fine-tuning) | 0.7494 |
| 2. Assignment 2 Model | Silver + Gold (Simple Generic LLM) | 0.7465 |
| 3. Assignment 3 Model | Silver + Gold (Advanced Techniques / MAS) | 0.7467 |
| 4. Final Model | Silver + Gold (QLoRA-Powered MAS + Targeted HITL) | 0.7530 |
The Final Model achieves the highest F1 score across all iterations, demonstrating that QLoRA domain adaptation combined with structured agent debate and targeted human review produces measurable improvements.
Key Findings
QLoRA advantages:
- Adapts a generative LLM to patent language with only 0.53% of parameters trained
- Enables a domain-aware Judge that understands Y02 classification logic
- 4-bit quantization fits 8B model on a free 15 GB T4 GPU
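The "0.53% of parameters" figure is easy to sanity-check: LoRA adds r*(d_in + d_out) trainable weights per adapted matrix. A sketch with illustrative layer dimensions (the hidden/intermediate sizes and layer count below are assumptions, not the verified Qwen3-8B config):

```python
r = 16
hidden = 4096          # illustrative hidden size
layers = 36            # illustrative layer count
# (d_in, d_out) for each target module in one transformer layer (illustrative shapes):
modules = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, 1024),
    "v_proj": (hidden, 1024),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, 12288),
    "up_proj": (hidden, 12288),
    "down_proj": (12288, hidden),
}

# Each LoRA pair contributes r * (d_in + d_out) trainable parameters.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in modules.values())
trainable = per_layer * layers
print(f"{trainable / 8e9:.2%} trainable of ~8B")  # on the order of the 0.53% reported
```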
MAS + HITL advantages:
- Debate structure surfaces disagreements that single-model approaches miss
- Exception-based HITL reduces human effort by 74% (26 vs 100 reviews)
- Gold labels are higher-quality than silver LLM labels alone
Limitations:
- DirectML (AMD GPU) is not fully supported by sentence-transformers training, so fine-tuning fell back to CPU
- torchao 0.16.0 conflicts with transformers lazy loader in certain environments
Repository Contents
| File | Description |
|---|---|
| Final_Assignment.ipynb | Main notebook (Parts A-D) |
| patentsberta_finetuned_final/ | Final fine-tuned PatentSBERTa model |
| hitl_green_100_final.csv | Gold dataset: 100 claims with HITL labels and debate rationales |
| final_classifier.joblib | Serialised final Logistic Regression classifier |
| qlora_outputs.zip | QLoRA adapter weights (qlora_green_patent_adapter/) |
| Part C Step 1.ipynb | Colab notebook for QLoRA fine-tuning |
| Debate transcripts | All debate transcripts for the MAS |