---
license: mit
datasets:
- saiteja33/DAMASHA
language:
- en
base_model:
- FacebookAI/roberta-base
- answerdotai/ModernBERT-base
pipeline_tag: token-classification
---

# DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:

> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**

The model segments **mixed human–AI text** at the *token level*: for each token, it predicts whether that token was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.

- **Base encoders:**
  - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
  - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.
- **Task:** Token classification (binary authorship: human vs AI).
- **Language:** English
- **License (this model):** MIT
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.
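
Because the checkpoint implements a custom dual-encoder architecture, the standard `AutoModelForTokenClassification` loading path below is an assumption; if it does not load, use the code from the project GitHub instead. A minimal inference sketch (the repo id is a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

repo_id = "<this-model-repo-id>"  # placeholder: substitute this model's Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForTokenClassification.from_pretrained(repo_id)

text = "I wrote this opening sentence myself, and a model drafted the rest."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# One label per token; the human/AI mapping is in model.config.id2label.
pred = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, p in zip(tokens, pred):
    print(token, model.config.id2label[p])
```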

If you use this model, **please also cite the DAMASHA paper and dataset** (see the Citation section).

---

## 1. Model Highlights

- **Fine-grained mixed-authorship detection**
  Predicts authorship **per token**, allowing reconstruction of human vs AI **spans** in long documents.

- **Adversarially robust**
  Trained and evaluated on **syntactically attacked texts** (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).

- **Human-interpretable Info-Mask**
  The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.

- **Strong reported performance (from the paper)**
  On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - **Token-level:** Accuracy / Precision / Recall / F1 ≈ **0.98**
  - **Span-level (strict):** SBDA ≈ **0.45**, SegPre ≈ **0.41**
  - **Span-level (relaxed, IoU ≥ 0.5):** ≈ **0.82**

> ⚠️ The exact numbers for *this* specific checkpoint may differ depending on the training run and configuration. The values above are from the paper’s best configuration (RMC\*).
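
For intuition about the relaxed span metric, here is an illustrative sketch of span matching at IoU ≥ 0.5; the exact SBDA and SegPre computations are defined in the paper, so this shows only the IoU building block:

```python
def iou(a, b):
    """Intersection-over-union of two half-open token spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def relaxed_matches(pred_spans, gold_spans, thr=0.5):
    """Count predicted spans that match some gold span at IoU >= thr."""
    return sum(any(iou(p, g) >= thr for g in gold_spans) for p in pred_spans)

print(relaxed_matches([(0, 10), (20, 30)], [(0, 9), (40, 50)]))  # -> 1
```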

---

## 2. Intended Use

### What this model is for

- **Research on human–AI co-authorship**
  - Studying where LLMs “take over” in mixed texts.
  - Analysing the robustness of detectors under adversarial perturbations.

- **Tooling / applications (with human oversight)**
  - Assisting editors, educators, or moderators by **highlighting suspicious spans** rather than making final decisions.
  - Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

### What this model is *not* for

- An automated “cheating detector” or plagiarism adjudicator.
- High-stakes decisions affecting people’s livelihood, grades, or reputation **without human review**.
- Non-English or heavily code-mixed text (the training data is English-centric).

Use this model as a **signal**, not a judge.

---

## 3. Data: DAMASHA-MAS

The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:

- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)
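
Loading it with the `datasets` library is straightforward; the split name below is an assumption, so check the dataset card for the exact layout:

```python
from datasets import load_dataset

ds = load_dataset("saiteja33/DAMASHA")
print(ds)                             # inspect the available splits and columns
example = ds["train"][0]              # "train" split name is an assumption
print(example["hybrid_text"][:200])   # mixed text with authorship tags (see 3.2)
```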

### 3.1 What’s in MAS?

MAS consists of **mixed human–AI texts with explicit span tags**:

- Human text comes from several corpora for **domain diversity**, including:
  - Reddit (M4-Reddit)
  - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
  - News summaries (XSUM)
  - Wikipedia (M4-Wiki, MAGE-SQuAD)
  - ArXiv abstracts (MAGE-SciGen)
  - QA texts (MAGE-ELI5)

- AI text is generated by multiple modern LLMs:
  - **DeepSeek-V3-671B** (open-source)
  - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)

### 3.2 Span tagging

Authorship is marked using **explicit tags** around AI spans:

- `<AI_Start>` … `</AI_End>` denote AI-generated segments within otherwise human text.
- The dataset stores the text in a `hybrid_text` column, plus metadata such as `has_pair`; adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.
- Tags are sentence-level in the annotation, but the model is trained to output **token-level** predictions for finer segmentation.

> During training, these tags are converted into **token labels** (2 labels total; see `config.id2label` in the model files).
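
A hedged sketch of that conversion, assuming the tags appear verbatim in `hybrid_text` (the training scripts in the project GitHub are authoritative; `tags_to_token_labels` is an illustrative helper, not an API from the repo):

```python
import re
from transformers import AutoTokenizer

TAG_RE = re.compile(r"<AI_Start>(.*?)</AI_End>", re.DOTALL)

def tags_to_token_labels(tagged_text, tokenizer):
    """Strip authorship tags and label each token 0 (human) or 1 (AI)."""
    spans, parts, cursor = [], [], 0
    for m in TAG_RE.finditer(tagged_text):
        parts.append(tagged_text[cursor:m.start()])
        start = sum(len(p) for p in parts)       # char offset where AI text begins
        parts.append(m.group(1))
        spans.append((start, start + len(m.group(1))))
        cursor = m.end()
    parts.append(tagged_text[cursor:])
    text = "".join(parts)

    enc = tokenizer(text, return_offsets_mapping=True, truncation=True)
    # A token is "AI" if its character span overlaps any AI span;
    # special tokens have span (0, 0) and stay "human" (0).
    labels = [
        int(s < e and any(a < e and s < b for a, b in spans))
        for s, e in enc["offset_mapping"]
    ]
    return text, labels

tok = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
text, labels = tags_to_token_labels(
    "Human intro. <AI_Start>This clause was generated.</AI_End> Human outro.", tok
)
print(labels)
```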

### 3.3 Adversarial attacks

MAS includes multiple **syntactic attacks** applied to the mixed text:

- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower case swapping
- All-mixed combinations of the above

These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.
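
For intuition, here is a toy homoglyph perturbation in the spirit of the Unicode-substitution attack; the benchmark’s actual attack implementations live in the project GitHub:

```python
import random

# Cyrillic look-alikes for a few Latin letters (an illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440"}

def unicode_attack(text, rate=0.2, seed=0):
    """Randomly swap characters for visually identical Unicode homoglyphs."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

print(unicode_attack("the operation appears normal"))
```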

---

## 4. Model Architecture & Training

### 4.1 Architecture (conceptual)

The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper (a minimal code sketch follows the list below):

1. **Dual encoders**
   - RoBERTa-base and ModernBERT-base encode the same input sequence.
2. **Feature fusion**
   - Hidden states from both encoders are fused into a shared representation.
3. **Stylometric Info-Mask**
   - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.
   - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
4. **Sequence model + CRF**
   - A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.
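
The sketch below illustrates steps 2–4 in PyTorch, assuming precomputed encoder hidden states and per-token style features; all layer sizes are assumptions, and the CRF is replaced by a plain linear head for brevity (the real implementation is in the project GitHub):

```python
import torch
import torch.nn as nn

class InfoMaskFusion(nn.Module):
    def __init__(self, d_roberta=768, d_modernbert=768, d_fused=768,
                 n_style_feats=5, n_heads=4, n_labels=2):
        super().__init__()
        # 2. Feature fusion: project concatenated encoder states to a shared space.
        self.fuse = nn.Linear(d_roberta + d_modernbert, d_fused)
        # 3. Info-Mask: project style features, attend, and emit one gate per token.
        self.style_proj = nn.Linear(n_style_feats, d_fused)
        self.style_attn = nn.MultiheadAttention(d_fused, n_heads, batch_first=True)
        self.mask_head = nn.Linear(d_fused, 1)
        # 4. Sequence model + tag head (the paper uses a CRF instead of a linear head).
        self.bigru = nn.GRU(d_fused, d_fused // 2, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(d_fused, n_labels)

    def forward(self, h_roberta, h_modernbert, style_feats):
        # h_*: (batch, seq, dim); style_feats: (batch, seq, n_style_feats)
        fused = self.fuse(torch.cat([h_roberta, h_modernbert], dim=-1))
        style = self.style_proj(style_feats)
        attended, _ = self.style_attn(style, style, style)
        gate = torch.sigmoid(self.mask_head(attended))  # scalar mask per token
        seq_out, _ = self.bigru(fused * gate)           # Info-Mask gating, then BiGRU
        return self.classifier(seq_out)                 # per-token logits

# Smoke test with random tensors standing in for encoder outputs.
logits = InfoMaskFusion()(torch.randn(2, 16, 768), torch.randn(2, 16, 768),
                          torch.randn(2, 16, 5))
print(logits.shape)  # torch.Size([2, 16, 2])
```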

### 4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS:

- **Number of labels:** 2
- **Max sequence length:** 512
- **Batch size:** 64
- **Epochs:** 5
- **Optimizer:** AdamW (with a cosine-annealing LR schedule)
- **Weight decay:** 0.01
- **Gradient clipping:** 1.0
- **Dropout:** dynamic 0.1–0.3 (initial 0.1)
- **Warmup ratio:** 0.1
- **Early stopping patience:** 2
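
A hedged sketch of the optimizer and schedule implied by this list; the learning rate and `steps_per_epoch` are assumptions, since they are not stated above:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(768, 2)            # stand-in for the real model
steps_per_epoch = 100                      # assumption; depends on dataset size
num_training_steps = 5 * steps_per_epoch   # 5 epochs

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # warmup ratio 0.1
    num_training_steps=num_training_steps,
)

# Inside the training loop, clip gradients at 1.0 before each optimizer step:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```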

**Hardware & compute** (as reported):

- AWS EC2 g6e.xlarge with an NVIDIA L40S (48 GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for the experiments

> The exact training script used for this checkpoint is available in the project GitHub:
> <https://github.com/saitejalekkala33/DAMASHA>

---