Complaint Classification FastText Model

This is a FastText model trained to classify student complaints into departmental categories such as 'Mess/Food', 'Cleanliness', 'Infrastructure', 'Technical Issues', 'Academics', and 'Ragging'. It is designed to route student grievances to the department responsible for resolving them.

Model Description

This model is a supervised text classifier built using the FastText library. It takes a raw complaint text as input and outputs a predicted category along with a confidence score. The model was trained on a dataset of student complaints, which underwent specific text preprocessing steps including cleaning, slang normalization, and keyword boosting to improve classification accuracy.

Intended Use

This model is intended for use in educational institutions to automatically categorize student complaints, thereby improving the efficiency of complaint resolution systems. It can be integrated into ticketing systems, chatbots, or other platforms where initial complaint routing is required.

Training Data

The model was trained on a custom dataset of student complaints, ML_Project_Complaint_Dataset_Duration_Imp.csv. The dataset includes two primary columns: complaint_text and category.

Preprocessing steps applied to the training data:

  1. Text Cleaning: Lowercasing, removal of extra whitespace.
  2. Slang Normalization: Replacement of common informal words/slang with their standard equivalents (e.g., 'bakwass' to 'bakwas', 'plz' to 'please').
  3. Keyword Boosting: Addition of relevant keywords to complaints containing specific terms (e.g., adding 'mess food quality eating' to complaints mentioning 'food', 'mess', etc.) to enhance category recognition.

The dataset was balanced using random sampling to ensure each category had an equal number of samples before the train-test split.
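
The balancing code and the training file format are not shown in this card. The following is a minimal sketch under two assumptions: that "random sampling" means downsampling every category to the size of the smallest one, and that training used the standard FastText supervised format of one `__label__<category> <text>` example per line. The helper names `balance_by_downsampling` and `to_fasttext_line`, the seed, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict

def balance_by_downsampling(rows, seed=42):
    """Downsample every category to the size of the smallest one.

    rows: list of (complaint_text, category) tuples.
    """
    by_category = defaultdict(list)
    for text, category in rows:
        by_category[category].append((text, category))

    target = min(len(items) for items in by_category.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for items in by_category.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced

def to_fasttext_line(text, category):
    """Format one example as '__label__<category> <text>'.

    FastText splits on whitespace, so spaces in the category name are
    replaced with underscores (an assumption about the label scheme).
    """
    label = "__label__" + category.replace(" ", "_")
    return f"{label} {text}"

# Toy rows for illustration only; not taken from the real dataset.
rows = [
    ("mess food is cold", "Mess/Food"),
    ("roti is stale", "Mess/Food"),
    ("rice is undercooked", "Mess/Food"),
    ("wifi keeps dropping", "Technical Issues"),
    ("server is down", "Technical Issues"),
]
balanced = balance_by_downsampling(rows)
lines = [to_fasttext_line(text, category) for text, category in balanced]
```

In a real pipeline the complaint text would go through the cleaning, slang normalization, and keyword boosting steps before being written out, and the train-test split would follow the balancing step as described above.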

Evaluation

The model was evaluated on a held-out test set. Key metrics are:

  • Precision: 0.948
  • Recall: 0.948
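
The identical precision and recall values are consistent with FastText's built-in evaluation at k=1 on single-label data, although the card does not say how the metrics were computed. Under that assumption, precision@1 is the number of correct predictions divided by the number of predictions made, and recall@1 is the number of correct predictions divided by the number of true labels. With exactly one label and one prediction per complaint, both reduce to plain accuracy. The sketch below illustrates this with made-up labels; the function name is hypothetical.

```python
def precision_recall_at_1(true_labels, predicted_labels):
    """Return (precision@1, recall@1) for single-label examples."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    precision = correct / len(predicted_labels)  # one prediction per example
    recall = correct / len(true_labels)          # one true label per example
    return precision, recall

# Illustrative labels only; not results from the actual test set.
true_labels = ["Cleanliness", "Academics", "Ragging", "Academics"]
predicted = ["Cleanliness", "Academics", "Academics", "Academics"]

precision, recall = precision_recall_at_1(true_labels, predicted)
print(precision, recall)  # 0.75 0.75
```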

How to Use

To use this model for prediction, you need to first download it and then apply the same preprocessing steps that were used during training. Below is a Python example:

import fasttext
import re
from huggingface_hub import hf_hub_download

# --- Preprocessing Functions (MUST be the same as training) ---
def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'\s+', ' ', text).strip()
    return text

def normalize_slang(text):
    replacements = {
        "bakwass": "bakwas",
        "boht": "bahut",
        "nhi": "nahi",
        "nai": "nahi",
        "yaarrr": "yaar",
        "yawwrrr": "yaar",
        "pls": "please",
        "plz": "please"
    }
    for k, v in replacements.items():
        text = text.replace(k, v)
    return text

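def boost_keywords(text):
    # MESS / FOOD
    # Note: "Jevan" is capitalized, so it never matches text that clean_text
    # has already lowercased. It is kept as listed so the preprocessing stays
    # identical to what this card specifies.
    if any(word in text for word in ["khana", "khaana", "Jevan", "food", "mess", "roti", "sabzi", "rice", "milk"]):
        text += " mess food quality eating"
    # CLEANLINESS
    if any(word in text for word in ["ganda", "flush", "toilet", "washroom", "dirty", "dust", "garbage", "smell"]):
        text += " cleanliness hygiene sanitation"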
    # TECH
    if any(word in text for word in ["wifi", "internet", "net", "network", "server", "pcs", "system"]):
        text += " technical network issue"
    # INFRA
    if any(word in text for word in ["ac", "fan", "light", "door", "bench", "lock", "window"]):
        text += " infrastructure maintenance"
    # ACADEMICS
    if any(word in text for word in ["teacher", "lecture", "class", "test", "exam", "assignment"]):
        text += " academics study"
    # RAGGING
    if any(word in text for word in ["ragging", "bully", "harass", "senior"]):
        text += " ragging harassment"
    return text

# --- Download and Load Model ---
# Change repo_id if the model is hosted under a different repository
model_path = hf_hub_download(repo_id="Sheshank2609/Complaint_Classifier", filename="complaint_classifier.ftz")
model = fasttext.load_model(model_path)

# --- Prediction Function ---
def predict_complaint_hf(text):
    processed_text = clean_text(text)
    processed_text = normalize_slang(processed_text)
    processed_text = boost_keywords(processed_text)

    predictions = model.predict(processed_text, k=1)
    label = predictions[0][0].replace("__label__", "")
    confidence = predictions[1][0] * 100

    return label, round(confidence, 2)

# --- Example Usage ---
complaint1 = "My room is very dirty, full of dust."
label1, conf1 = predict_complaint_hf(complaint1)
print(f"Complaint: \"{complaint1}\"\nDepartment: {label1}\nConfidence: {conf1}%\n")

complaint2 = "The mess food is terrible today, no taste."
label2, conf2 = predict_complaint_hf(complaint2)
print(f"Complaint: \"{complaint2}\"\nDepartment: {label2}\nConfidence: {conf2}%\n")

complaint3 = "The wifi is not working in the hostel common room."
label3, conf3 = predict_complaint_hf(complaint3)
print(f"Complaint: \"{complaint3}\"\nDepartment: {label3}\nConfidence: {conf3}%\n")
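
Both normalize_slang and boost_keywords above rely on Python substring operations (`str.replace` and `in`), so short keys can fire inside longer words. The sketch below is a hypothetical diagnostic, not part of the model's pipeline: it reports which keys occur in a complaint only as part of a longer word. Because the model was trained with the substring behavior, switching inference to whole-word matching would change its inputs, so the check is meant for inspecting surprising predictions rather than for replacing the preprocessing.

```python
import re

def substring_only_hits(text, keys):
    """Return keys that occur in `text` only as part of a longer word."""
    text = text.lower()
    hits = []
    for key in keys:
        if key not in text:
            continue
        # \b marks a word boundary, so this matches the key as a whole word only.
        if not re.search(rf"\b{re.escape(key)}\b", text):
            hits.append(key)
    return hits

# Same keyword list as the INFRA branch of boost_keywords.
infra_keys = ["ac", "fan", "light", "door", "bench", "lock", "window"]

print(substring_only_hits("the teacher cancelled the lecture", infra_keys))
# ['ac']  ("ac" matches inside "teacher")
print(substring_only_hits("the ac in room 12 is broken", infra_keys))
# []
```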

Limitations and Bias

  • Training Data Dependence: The model's performance is highly dependent on the quality and diversity of the training data. It might not perform well on complaints that significantly differ in style, language, or content from the training set.
  • Slang Coverage: While slang normalization is applied, new slang or region-specific informalities not present in the replacement dictionary may affect performance.
  • Category Specificity: The defined categories might not cover all possible complaint types. Misclassifications can occur for complaints that are ambiguous or fall between defined categories.
  • Language: The model is trained primarily on English text, with normalization and keyword matching for a small set of romanized Hindi terms. Performance on other languages is likely to be poor.
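
One way to soften the impact of ambiguous or out-of-scope complaints is to send low-confidence predictions to a human reviewer instead of directly to a department. The following is a minimal sketch, assuming a prediction function that returns `(label, confidence_percent)` in the same shape as predict_complaint_hf above. The 60% threshold, the `MANUAL_REVIEW` queue name, and the stub predictor are hypothetical, and the threshold would need tuning on validation data.

```python
MANUAL_REVIEW = "Manual Review"  # hypothetical fallback queue

def route_complaint(text, predict_fn, min_confidence=60.0):
    """Return the predicted department, or MANUAL_REVIEW if confidence is low."""
    label, confidence = predict_fn(text)
    if confidence < min_confidence:
        return MANUAL_REVIEW, confidence
    return label, confidence

# Stub predictor standing in for predict_complaint_hf, so the sketch runs
# without downloading the model. Its outputs are invented for illustration.
def fake_predict(text):
    return ("Cleanliness", 91.5) if "dirty" in text else ("Academics", 34.2)

print(route_complaint("my room is dirty", fake_predict))
# ('Cleanliness', 91.5)
print(route_complaint("something vague", fake_predict))
# ('Manual Review', 34.2)
```

In practice, `fake_predict` would be replaced by predict_complaint_hf, and complaints in the fallback queue could later be labeled and added to the training data.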

Contact

For questions or feedback, please open an issue on the Hugging Face model repository or contact [Your Name/Email/Link].
