---
license: mit
datasets:
- saiteja33/DAMASHA
language:
- en
base_model:
- FacebookAI/roberta-base
- answerdotai/ModernBERT-base
pipeline_tag: token-classification
---

# DAMASHA-MAS: Mixed-Authorship Adversarial Segmentation (Token Classification)

This repository contains a **token-classification model** trained on the **DAMASHA-MAS** benchmark, introduced in:

> **DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution**

The model segments **mixed human–AI text** at the *token level*: for each token, it predicts whether that token was written by a *human* or an *LLM*, even under **syntactic adversarial attacks**.

- **Base encoders:**
  - [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base)
  - [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base)
- **Architecture (high level):** RoBERTa + ModernBERT feature fusion → BiGRU + CRF with the **Info-Mask** gating mechanism from the paper.
- **Task:** Token classification (binary authorship: human vs AI).
- **Language:** English
- **License (this model):** MIT
- **Training data license:** CC-BY-4.0 via the DAMASHA dataset.
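
Because the checkpoint implements a custom dual-encoder architecture, the standard `AutoModelForTokenClassification` loading path below is an assumption; if it does not load, use the code from the project GitHub instead. A minimal inference sketch (the repo id is a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

repo_id = "<this-model-repo-id>"  # placeholder: substitute this model's Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForTokenClassification.from_pretrained(repo_id)

text = "I wrote this opening sentence myself, and a model drafted the rest."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# One label per token; the human/AI mapping is in model.config.id2label.
pred = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, p in zip(tokens, pred):
    print(token, model.config.id2label[p])
```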

If you use this model, **please also cite the DAMASHA paper and dataset** (see the Citation section).

---

## 1. Model Highlights

- **Fine-grained mixed-authorship detection**
  Predicts authorship **per token**, allowing reconstruction of human vs AI **spans** in long documents.

- **Adversarially robust**
  Trained and evaluated on **syntactically attacked texts** (misspelling, Unicode substitutions, invisible characters, punctuation swaps, case perturbations, and “all-mixed” attacks).

- **Human-interpretable Info-Mask**
  The architecture incorporates **stylometric features** (perplexity, POS density, punctuation density, lexical diversity, readability) via an **Info-Mask** module that gates token representations in an interpretable way.

- **Strong reported performance (from the paper)**
  On DAMASHA-MAS, the **RMC\*** model (RoBERTa + ModernBERT + CRF + Info-Mask) achieves:
  - **Token-level:** Accuracy / Precision / Recall / F1 ≈ **0.98**
  - **Span-level (strict):** SBDA ≈ **0.45**, SegPre ≈ **0.41**
  - **Span-level (relaxed, IoU ≥ 0.5):** ≈ **0.82**

> ⚠️ The exact numbers for *this* specific checkpoint may differ depending on the training run and configuration. The values above are from the paper’s best configuration (RMC\*).
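
For intuition about the relaxed span metric, here is an illustrative sketch of span matching at IoU ≥ 0.5; the exact SBDA and SegPre computations are defined in the paper, so this shows only the IoU building block:

```python
def iou(a, b):
    """Intersection-over-union of two half-open token spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def relaxed_matches(pred_spans, gold_spans, thr=0.5):
    """Count predicted spans that match some gold span at IoU >= thr."""
    return sum(any(iou(p, g) >= thr for g in gold_spans) for p in pred_spans)

print(relaxed_matches([(0, 10), (20, 30)], [(0, 9), (40, 50)]))  # -> 1
```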

---

## 2. Intended Use

### What this model is for

- **Research on human–AI co-authorship**
  - Studying where LLMs “take over” in mixed texts.
  - Analysing the robustness of detectors under adversarial perturbations.

- **Tooling / applications (with human oversight)**
  - Assisting editors, educators, or moderators by **highlighting suspicious spans** rather than making final decisions.
  - Exploring **interpretability overlays** (e.g., heatmaps over tokens) when combined with Info-Mask outputs.

### What this model is *not* for

- An automated “cheating detector” or plagiarism adjudicator.
- High-stakes decisions affecting people’s livelihood, grades, or reputation **without human review**.
- Non-English or heavily code-mixed text (the training data is English-centric).

Use this model as a **signal**, not a judge.

---

## 3. Data: DAMASHA-MAS

The model is trained on the **MAS** benchmark released with the DAMASHA paper and hosted as the Hugging Face dataset:

- **Dataset:** [`saiteja33/DAMASHA`](https://huggingface.co/datasets/saiteja33/DAMASHA)
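
Loading it with the `datasets` library is straightforward; the split name below is an assumption, so check the dataset card for the exact layout:

```python
from datasets import load_dataset

ds = load_dataset("saiteja33/DAMASHA")
print(ds)                             # inspect the available splits and columns
example = ds["train"][0]              # "train" split name is an assumption
print(example["hybrid_text"][:200])   # mixed text with authorship tags (see 3.2)
```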

### 3.1 What’s in MAS?

MAS consists of **mixed human–AI texts with explicit span tags**:

- Human text comes from several corpora for **domain diversity**, including:
  - Reddit (M4-Reddit)
  - Yelp & /r/ChangeMyView (MAGE-YELP, MAGE-CMV)
  - News summaries (XSUM)
  - Wikipedia (M4-Wiki, MAGE-SQuAD)
  - ArXiv abstracts (MAGE-SciGen)
  - QA texts (MAGE-ELI5)

- AI text is generated by multiple modern LLMs:
  - **DeepSeek-V3-671B** (open-source)
  - **GPT-4o, GPT-4.1, GPT-4.1-mini** (closed-source)

### 3.2 Span tagging

Authorship is marked using **explicit tags** around AI spans:

- `<AI_Start>` … `</AI_End>` denote AI-generated segments within otherwise human text.
- The dataset stores the text in a `hybrid_text` column, plus metadata such as `has_pair`; adversarial variants include `attack_name`, `tag_count`, and `attacked_text`.
- Tags are sentence-level in the annotation, but the model is trained to output **token-level** predictions for finer segmentation.

> During training, these tags are converted into **token labels** (2 labels total; see `config.id2label` in the model files).
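
A hedged sketch of that conversion, assuming the tags appear verbatim in `hybrid_text` (the training scripts in the project GitHub are authoritative; `tags_to_token_labels` is an illustrative helper, not an API from the repo):

```python
import re
from transformers import AutoTokenizer

TAG_RE = re.compile(r"<AI_Start>(.*?)</AI_End>", re.DOTALL)

def tags_to_token_labels(tagged_text, tokenizer):
    """Strip authorship tags and label each token 0 (human) or 1 (AI)."""
    spans, parts, cursor = [], [], 0
    for m in TAG_RE.finditer(tagged_text):
        parts.append(tagged_text[cursor:m.start()])
        start = sum(len(p) for p in parts)       # char offset where AI text begins
        parts.append(m.group(1))
        spans.append((start, start + len(m.group(1))))
        cursor = m.end()
    parts.append(tagged_text[cursor:])
    text = "".join(parts)

    enc = tokenizer(text, return_offsets_mapping=True, truncation=True)
    # A token is "AI" if its character span overlaps any AI span;
    # special tokens have span (0, 0) and stay "human" (0).
    labels = [
        int(s < e and any(a < e and s < b for a, b in spans))
        for s, e in enc["offset_mapping"]
    ]
    return text, labels

tok = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
text, labels = tags_to_token_labels(
    "Human intro. <AI_Start>This clause was generated.</AI_End> Human outro.", tok
)
print(labels)
```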

### 3.3 Adversarial attacks

MAS includes multiple **syntactic attacks** applied to the mixed text:

- Misspelling
- Unicode character substitution
- Invisible characters
- Punctuation substitution
- Upper/lower case swapping
- All-mixed combinations of the above

These perturbations make tokenization brittle and test the robustness of detectors in realistic settings.
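
For intuition, here is a toy homoglyph perturbation in the spirit of the Unicode-substitution attack; the benchmark’s actual attack implementations live in the project GitHub:

```python
import random

# Cyrillic look-alikes for a few Latin letters (an illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440"}

def unicode_attack(text, rate=0.2, seed=0):
    """Randomly swap characters for visually identical Unicode homoglyphs."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

print(unicode_attack("the operation appears normal"))
```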

---

## 4. Model Architecture & Training

### 4.1 Architecture (conceptual)

The model follows the **Info-Mask RMC\*** architecture described in the DAMASHA paper (a minimal code sketch follows the list below):

1. **Dual encoders**
   - RoBERTa-base and ModernBERT-base encode the same input sequence.
2. **Feature fusion**
   - Hidden states from both encoders are fused into a shared representation.
3. **Stylometric Info-Mask**
   - Hand-crafted style features (perplexity, POS density, punctuation density, lexical diversity, readability) are projected, passed through multi-head attention, and turned into a **scalar mask per token**.
   - This mask gates the fused encoder states, down-weighting style-irrelevant tokens and emphasizing style-diagnostic ones.
4. **Sequence model + CRF**
   - A BiGRU layer captures sequential dependencies, followed by a **CRF** layer for structured token labeling with a sequence-level loss.
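
The sketch below illustrates steps 2–4 in PyTorch, assuming precomputed encoder hidden states and per-token style features; all layer sizes are assumptions, and the CRF is replaced by a plain linear head for brevity (the real implementation is in the project GitHub):

```python
import torch
import torch.nn as nn

class InfoMaskFusion(nn.Module):
    def __init__(self, d_roberta=768, d_modernbert=768, d_fused=768,
                 n_style_feats=5, n_heads=4, n_labels=2):
        super().__init__()
        # 2. Feature fusion: project concatenated encoder states to a shared space.
        self.fuse = nn.Linear(d_roberta + d_modernbert, d_fused)
        # 3. Info-Mask: project style features, attend, and emit one gate per token.
        self.style_proj = nn.Linear(n_style_feats, d_fused)
        self.style_attn = nn.MultiheadAttention(d_fused, n_heads, batch_first=True)
        self.mask_head = nn.Linear(d_fused, 1)
        # 4. Sequence model + tag head (the paper uses a CRF instead of a linear head).
        self.bigru = nn.GRU(d_fused, d_fused // 2, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(d_fused, n_labels)

    def forward(self, h_roberta, h_modernbert, style_feats):
        # h_*: (batch, seq, dim); style_feats: (batch, seq, n_style_feats)
        fused = self.fuse(torch.cat([h_roberta, h_modernbert], dim=-1))
        style = self.style_proj(style_feats)
        attended, _ = self.style_attn(style, style, style)
        gate = torch.sigmoid(self.mask_head(attended))  # scalar mask per token
        seq_out, _ = self.bigru(fused * gate)           # Info-Mask gating, then BiGRU
        return self.classifier(seq_out)                 # per-token logits

# Smoke test with random tensors standing in for encoder outputs.
logits = InfoMaskFusion()(torch.randn(2, 16, 768), torch.randn(2, 16, 768),
                          torch.randn(2, 16, 5))
print(logits.shape)  # torch.Size([2, 16, 2])
```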

### 4.2 Training setup (from the paper)

Key hyperparameters used for the Info-Mask models on MAS:

- **Number of labels:** 2
- **Max sequence length:** 512
- **Batch size:** 64
- **Epochs:** 5
- **Optimizer:** AdamW (with a cosine-annealing LR schedule)
- **Weight decay:** 0.01
- **Gradient clipping:** 1.0
- **Dropout:** dynamic 0.1–0.3 (initial 0.1)
- **Warmup ratio:** 0.1
- **Early stopping patience:** 2
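
A hedged sketch of the optimizer and schedule implied by this list; the learning rate and `steps_per_epoch` are assumptions, since they are not stated above:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(768, 2)            # stand-in for the real model
steps_per_epoch = 100                      # assumption; depends on dataset size
num_training_steps = 5 * steps_per_epoch   # 5 epochs

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # warmup ratio 0.1
    num_training_steps=num_training_steps,
)

# Inside the training loop, clip gradients at 1.0 before each optimizer step:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```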

**Hardware & compute** (as reported):

- AWS EC2 g6e.xlarge with an NVIDIA L40S (48 GB) GPU, Ubuntu 24.04
- ≈ 400 GPU hours for the experiments

> The exact training script used for this checkpoint is available in the project GitHub:
> <https://github.com/saitejalekkala33/DAMASHA>

---