Complaint Classification FastText Model
This is a FastText model trained to classify student complaints into departmental categories such as 'Mess/Food', 'Cleanliness', 'Infrastructure', 'Technical Issues', 'Academics', and 'Ragging'. It is designed to route student grievances to the correct department for resolution.
Model Description
This model is a supervised text classifier built with the FastText library. It takes raw complaint text as input and outputs a predicted category along with a confidence score. The training complaints were preprocessed with text cleaning, slang normalization, and keyword boosting to improve classification accuracy.
Intended Use
This model is intended for use in educational institutions to automatically categorize student complaints, thereby improving the efficiency of complaint resolution systems. It can be integrated into ticketing systems, chatbots, or other platforms where initial complaint routing is required.
Training Data
The model was trained on a custom dataset of student complaints, ML_Project_Complaint_Dataset_Duration_Imp.csv. The dataset includes two primary columns: complaint_text and category.
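FastText's supervised mode reads one example per line, with the label prefixed by `__label__`. As a rough sketch, the two columns could be converted as shown below. The sample rows, the helper names, and the rule of replacing spaces in category names with underscores are assumptions for illustration, not taken from the original training script.

```python
import csv
import io

# Hypothetical sample rows standing in for the real CSV file.
SAMPLE_CSV = """complaint_text,category
The wifi is down again,Technical Issues
Food in the mess was cold,Mess/Food
"""

def to_fasttext_line(text, category):
    """Format one row as '__label__<category> <text>' on a single line."""
    # fastText splits on whitespace, so a label cannot contain spaces.
    # Replacing them with underscores is an assumed convention.
    label = "__label__" + "_".join(category.split())
    # Collapse newlines and repeated spaces so the example stays on one line.
    return f"{label} {' '.join(text.split())}"

def csv_to_fasttext_lines(csv_file):
    """Read complaint_text/category rows and return fastText training lines."""
    reader = csv.DictReader(csv_file)
    return [to_fasttext_line(row["complaint_text"], row["category"]) for row in reader]

lines = csv_to_fasttext_lines(io.StringIO(SAMPLE_CSV))
for line in lines:
    print(line)
```

In a real pipeline the preprocessing steps would be applied to `complaint_text` before the lines are written out.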
Preprocessing steps applied to the training data:
- Text Cleaning: Lowercasing and collapsing runs of whitespace into single spaces.
- Slang Normalization: Replacement of common informal words/slang with their standard equivalents (e.g., 'bakwass' to 'bakwas', 'plz' to 'please').
- Keyword Boosting: Addition of relevant keywords to complaints containing specific terms (e.g., adding 'mess food quality eating' to complaints mentioning 'food', 'mess', etc.) to enhance category recognition.
The dataset was balanced using random sampling to ensure each category had an equal number of samples before the train-test split.
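The card does not say whether balancing was done by downsampling or upsampling, or which split ratio was used. The sketch below shows one possible reading: downsample every category to the size of the smallest one, then split. The function names, the seed, and the 80/20 ratio are all assumptions.

```python
import random
from collections import Counter, defaultdict

def balance_by_downsampling(rows, seed=42):
    """Return shuffled rows with the same number of examples per category."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for text, category in rows:
        by_category[category].append((text, category))
    # Every category is cut down to the size of the smallest one.
    target = min(len(items) for items in by_category.values())
    balanced = []
    for items in by_category.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced

def split_train_test(rows, test_fraction=0.2):
    """Split already-shuffled rows into train and test parts."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

# Hypothetical rows: three Mess/Food complaints, two Cleanliness complaints.
rows = [
    ("food is cold", "Mess/Food"),
    ("roti is hard", "Mess/Food"),
    ("no milk today", "Mess/Food"),
    ("toilet is dirty", "Cleanliness"),
    ("garbage not collected", "Cleanliness"),
]
balanced = balance_by_downsampling(rows)
train_rows, test_rows = split_train_test(balanced)
print(Counter(category for _, category in balanced))
```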
Evaluation
The model was evaluated on a held-out test set. Key metrics are:
- Precision: 0.948
- Recall: 0.948
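The identical precision and recall values are consistent with fastText's precision-at-1 and recall-at-1, which `model.test` reports alongside the number of examples. When every example has exactly one true label and the model returns one prediction, both reduce to plain accuracy. Whether the figures above were produced this way is an assumption; the sketch only illustrates why the two numbers coincide in that setting.

```python
def precision_recall_at_1(true_labels, predicted_labels):
    """P@1 and R@1 for single-label data with one prediction per example."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    n_predictions = len(predicted_labels)  # one prediction per example
    n_true = len(true_labels)              # one gold label per example
    return correct / n_predictions, correct / n_true

# Hypothetical labels: three of four predictions are right.
true_labels = ["Mess/Food", "Cleanliness", "Academics", "Ragging"]
predicted_labels = ["Mess/Food", "Cleanliness", "Infrastructure", "Ragging"]
precision, recall = precision_recall_at_1(true_labels, predicted_labels)
print(precision, recall)
```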
How to Use
To run predictions, download the model and apply the same preprocessing steps that were used during training. The example below requires the fasttext and huggingface_hub Python packages:
import fasttext
import re
from huggingface_hub import hf_hub_download
# --- Preprocessing Functions (MUST be the same as training) ---
def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'\s+', ' ', text).strip()
    return text
def normalize_slang(text):
    replacements = {
        "bakwass": "bakwas",
        "boht": "bahut",
        "nhi": "nahi",
        "nai": "nahi",
        "yaarrr": "yaar",
        "yawwrrr": "yaar",
        "pls": "please",
        "plz": "please"
    }
    # str.replace works on substrings, so keys are also replaced inside longer words.
    for k, v in replacements.items():
        text = text.replace(k, v)
    return text
def boost_keywords(text):
    # MESS / FOOD
    # Note: "Jevan" is capitalized, so it never matches text already lowercased
    # by clean_text. It is left unchanged to stay identical to training.
    if any(word in text for word in ["khana", "khaana", "Jevan", "food", "mess", "roti", "sabzi", "rice", "milk"]):
        text += " mess food quality eating"
    # CLEANLINESS
    if any(word in text for word in ["ganda", "flush", "toilet", "washroom", "dirty", "dust", "garbage", "smell"]):
        text += " cleanliness hygiene sanitation"
    # TECH
    if any(word in text for word in ["wifi", "internet", "net", "network", "server", "pcs", "system"]):
        text += " technical network issue"
    # INFRA
    if any(word in text for word in ["ac", "fan", "light", "door", "bench", "lock", "window"]):
        text += " infrastructure maintenance"
    # ACADEMICS
    if any(word in text for word in ["teacher", "lecture", "class", "test", "exam", "assignment"]):
        text += " academics study"
    # RAGGING
    if any(word in text for word in ["ragging", "bully", "harass", "senior"]):
        text += " ragging harassment"
    return text
# --- Download and Load Model ---
# Replace 'Sheshank2609/Complaint_Classifier' with your actual repo_id
model_path = hf_hub_download(repo_id="Sheshank2609/Complaint_Classifier", filename="complaint_classifier.ftz")
model = fasttext.load_model(model_path)
# --- Prediction Function ---
def predict_complaint_hf(text):
    # Preprocess: clean, normalize slang, then boost keywords.
    processed_text = clean_text(text)
    processed_text = normalize_slang(processed_text)
    processed_text = boost_keywords(processed_text)
    # predict returns (labels, probabilities); take the top-1 entry of each.
    predictions = model.predict(processed_text, k=1)
    label = predictions[0][0].replace("__label__", "")
    confidence = float(predictions[1][0]) * 100
    return label, round(confidence, 2)
# --- Example Usage ---
complaint1 = "My room is very dirty, full of dust."
label1, conf1 = predict_complaint_hf(complaint1)
print(f"Complaint: \"{complaint1}\"\nDepartment: {label1}\nConfidence: {conf1}%\n")
complaint2 = "The mess food is terrible today, no taste."
label2, conf2 = predict_complaint_hf(complaint2)
print(f"Complaint: \"{complaint2}\"\nDepartment: {label2}\nConfidence: {conf2}%\n")
complaint3 = "The wifi is not working in the hostel common room."
label3, conf3 = predict_complaint_hf(complaint3)
print(f"Complaint: \"{complaint3}\"\nDepartment: {label3}\nConfidence: {conf3}%\n")
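One thing to be aware of when reading the output: the keyword checks in boost_keywords use Python's `in` on the whole string, so they match substrings as well as whole words. For instance, "ac" is found inside "teacher". The sketch below contrasts the two behaviours. The `whole_word_hit` helper is hypothetical and is not part of the trained pipeline. Switching to it at inference time would make preprocessing differ from training, so it is only useful for diagnosing odd predictions or for a future retraining run.

```python
import re

def substring_hit(text, keywords):
    """Same check as boost_keywords: true if any keyword occurs anywhere."""
    return any(word in text for word in keywords)

def whole_word_hit(text, keywords):
    """Hypothetical alternative: true only if a keyword appears as a whole word."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in keywords) + r")\b"
    return re.search(pattern, text) is not None

INFRA_KEYWORDS = ["ac", "fan", "light", "door", "bench", "lock", "window"]
text = "the teacher did not finish the lecture"

print(substring_hit(text, INFRA_KEYWORDS))   # True: "ac" occurs inside "teacher"
print(whole_word_hit(text, INFRA_KEYWORDS))  # False: no keyword stands alone
```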
Limitations and Bias
- Training Data Dependence: The model's performance is highly dependent on the quality and diversity of the training data. It might not perform well on complaints that significantly differ in style, language, or content from the training set.
- Slang Coverage: While slang normalization is applied, new slang or region-specific informalities not present in the replacement dictionary may affect performance.
- Category Specificity: The defined categories might not cover all possible complaint types. Misclassifications can occur for complaints that are ambiguous or fall between defined categories.
- Language: The model is trained primarily on English text, with slang normalization covering a small set of romanized Hindi words. Performance on other languages is likely to be poor.
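For ambiguous complaints, one possible mitigation is to send low-confidence predictions to a human instead of routing them automatically. The sketch below assumes a prediction function with the same return shape as predict_complaint_hf (label, confidence in percent). The `route_complaint` name, the 60.0 threshold, the "manual_review" queue, and the stub predictor are all hypothetical.

```python
def route_complaint(text, predict_fn, min_confidence=60.0):
    """Return (destination, confidence), falling back to manual review."""
    label, confidence = predict_fn(text)
    if confidence < min_confidence:
        # Not confident enough to route automatically.
        return "manual_review", confidence
    return label, confidence

def fake_predict(text):
    """Stub standing in for predict_complaint_hf so the sketch runs offline."""
    if "dirty" in text.lower():
        return "Cleanliness", 91.5
    return "Academics", 41.0

print(route_complaint("My room is dirty", fake_predict))
print(route_complaint("Something feels off on campus", fake_predict))
```

A suitable threshold would need to be chosen by inspecting confidence scores on held-out complaints.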
Contact
For questions or feedback, please open an issue on the Hugging Face model repository.