An end-to-end multi-label NLP pipeline that detects and classifies toxic online comments into 6 simultaneous toxicity categories using TF-IDF vectorization, 7 benchmarked ML classifiers, LIME explainability, and an AI-powered Streamlit app backed by LangChain + Groq.
Live Demo · Dataset · Pipeline · Models · Results · Setup
- Problem Statement
- Why 6 Binary Classifiers?
- Project Highlights
- Toxicity Categories
- Project Structure
- End-to-End Pipeline
- Dataset
- EDA & Label Analysis
- Text Preprocessing Pipeline
- Handling Class Imbalance
- TF-IDF Vectorization
- Model Training & Benchmarking
- Model Explainability: LIME
- Streamlit App + LangChain
- Results & Evaluation
- Technologies Used
- Getting Started
- Deployment
- Key Challenges & Solutions
- Future Work
- Author
Online platforms generate hundreds of millions of comments every day. Manual moderation at this scale is impossible, so automated systems are essential for maintaining healthy communities. This project builds a production-ready toxic comment detection pipeline that:
- Identifies 6 distinct toxicity types simultaneously (a comment can be toxic AND obscene AND insulting at the same time)
- Handles extreme class imbalance (the `threat` label has fewer than 500 positive examples in roughly 159,000 rows)
- Provides human-readable AI explanations for every flagged comment via LangChain
- Is deployed as an interactive Streamlit web application for live inference
This is the most important design decision in the project, and the first question every interviewer will ask.
A standard multiclass classifier assumes mutual exclusivity: each sample belongs to exactly one class. But toxic comments routinely belong to multiple categories simultaneously:
"You are a disgusting [slur] and I will find you."
→ toxic: ✅ · obscene: ✅ · threat: ✅ · insult: ✅
Solution: Train one independent binary classifier per label. Each predicts "Is this comment [label]? Yes/No" independently. This:
- ✅ Correctly handles co-occurrence of multiple toxicity types
- ✅ Allows per-label threshold tuning (critical for minority classes like `threat`)
- ✅ Mirrors real-world content moderation systems at Google, Reddit, and Meta
- ✅ Enables per-label model selection (best classifier may differ per category)
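To make the framing concrete, the sketch below contrasts a single argmax decision with six independent threshold decisions. The probabilities are made-up illustrative values, not model output, and `decide_multiclass` / `decide_multilabel` are hypothetical helper names.

```python
LABELS = ["toxic", "severe_toxic", "threat", "obscene", "insult", "identity_hate"]

# Illustrative per-label probabilities for one comment (made-up values).
probs = {"toxic": 0.93, "severe_toxic": 0.12, "threat": 0.71,
         "obscene": 0.88, "insult": 0.80, "identity_hate": 0.05}

def decide_multiclass(p):
    """Mutually exclusive framing: only the single highest-scoring label wins."""
    return [max(p, key=p.get)]

def decide_multilabel(p, threshold=0.5):
    """Independent binary framing: every label is judged on its own."""
    return [label for label in LABELS if p[label] >= threshold]

print(decide_multiclass(probs))
print(decide_multilabel(probs))
```

Under the multiclass framing the threat, obscenity, and insult signals are discarded; the per-label framing keeps all four.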
| Feature | Details |
|---|---|
| Task | Multi-label binary text classification (6 labels) |
| Dataset | Jigsaw Toxic Comment Classification Challenge, 159,571 training rows |
| Vectorization | TF-IDF with bigrams, sublinear TF, 50k features per label |
| Models | 7 classifiers benchmarked × 6 labels = 42 trained models |
| Imbalance | `class_weight='balanced'` + threshold tuning per label |
| Explainability | LIME: highlights words that drove each prediction |
| AI Layer | LangChain + Groq LLM generates natural language explanations |
| Deployment | Streamlit Community Cloud, zero-cost live demo |
| Testing | Unit tests for preprocessing pipeline (pytest) |
The classifier detects 6 independently predictable toxicity types:
| Label | Description | Approx. Positives |
|---|---|---|
| `toxic` | General toxic language: abusive, disrespectful | ~15,294 |
| `severe_toxic` | Extreme aggression, violent language | ~1,595 |
| `obscene` | Sexually explicit or profane content | ~8,449 |
| `threat` | Explicit or implied threats of violence | ~478 |
| `insult` | Direct personal attacks or demeaning language | ~7,877 |
| `identity_hate` | Hate speech targeting identity (race, religion, gender) | ~1,405 |

⚠️ Note: A single comment can be labelled positive for multiple categories. The `threat` class (~0.3% positive rate) requires special handling to avoid being ignored by the model.
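The positive rates quoted in this README follow directly from the counts in the table. A minimal sketch of that arithmetic, using the 159,571 training rows and the per-label counts stated above (`positive_rate` is a hypothetical helper):

```python
TRAIN_ROWS = 159_571
positives = {"toxic": 15_294, "severe_toxic": 1_595, "obscene": 8_449,
             "threat": 478, "insult": 7_877, "identity_hate": 1_405}

def positive_rate(count, total=TRAIN_ROWS):
    """Fraction of training rows that carry a positive label."""
    return count / total

rates = {label: positive_rate(n) for label, n in positives.items()}

# Print labels from rarest to most common.
for label, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{label:<14} {rate:.2%}")
```

The rarest label, `threat`, comes out at roughly 0.30% and `identity_hate` at roughly 0.88%.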
Toxic-Comment-Classification/
│
├── models/
│   ├── toxic_model.pkl
│   ├── severe_toxic_model.pkl
│   ├── threat_model.pkl
│   ├── obscene_model.pkl
│   ├── insult_model.pkl
│   └── identity_hate_model.pkl
│
├── vectorizers/
│   ├── toxic_vectorizer.pkl
│   ├── severe_toxic_vectorizer.pkl
│   ├── threat_vectorizer.pkl
│   ├── obscene_vectorizer.pkl
│   ├── insult_vectorizer.pkl
│   └── identity_hate_vectorizer.pkl
│
├── Balanced Data/                      ← SMOTE-resampled arrays (.npy)
├── Distribution-of-Classification/     ← Label distribution charts
│
├── tests/
│   └── test_preprocessing.py           ← Unit tests for clean() function
│
├── Analysis.ipynb                      ← EDA, label analysis, WordClouds
├── Feature-Engg-Model-Building.ipynb   ← TF-IDF, training, evaluation
│
├── app.py                              ← Streamlit app (3 tabs)
├── requirements.txt
└── README.md
Raw Text (train.csv · 159,571 rows)
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 1 · EDA & LABEL ANALYSIS             │
│  Class distribution · Label co-occurrence   │
│  WordClouds per category · Length analysis  │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 2 · TEXT PREPROCESSING               │
│  Strip \n · Remove alphanumerics            │
│  Lowercase + Punctuation removal            │
│  Non-ASCII removal · Stopword filtering     │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 3 · CLASS IMBALANCE HANDLING         │
│  class_weight='balanced' per classifier     │
│  Threshold tuning: 0.5 → 0.3 for minority   │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 4 · TF-IDF VECTORIZATION             │
│  1 vectorizer per label · 50k features      │
│  bigrams (1,2) · sublinear_tf=True          │
│  Fit on train · Transform val & test        │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 5 · MODEL TRAINING (7 × 6 = 42)      │
│  LR · SVM · RF · CNB · BNB · DT · KNN       │
│  Stratified 80/20 split per label           │
│  F1 score stored in results DataFrame       │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 6 · EVALUATION & EXPLAINABILITY      │
│  F1 heatmap · Confusion matrices            │
│  ROC curves · LIME word-level explanations  │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  PHASE 7 · SERIALIZATION                    │
│  6 best-model .pkl files                    │
│  6 fitted TF-IDF vectorizer .pkl files      │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
       ┌───────────────────────────────┐
       │  STREAMLIT APP (3 tabs)       │
       │  Predict · Explain · Compare  │
       │  LangChain AI narration       │
       └───────────────────────────────┘
| Property | Details |
|---|---|
| Source | Jigsaw Toxic Comment Classification Challenge (Kaggle) |
| Train rows | 159,571 comments |
| Test rows | 153,164 comments |
| Input column | comment_text (raw Wikipedia talk page comments) |
| Label columns | 6 binary columns (0 / 1 per toxicity type) |
| Multi-label rate | ~10% of toxic comments carry 2+ positive labels |
| Clean rate | ~90% of all comments have zero positive labels |
import pandas as pd

df = pd.read_csv('train.csv', na_values=[' ?'])
# Assign the result back; chained fillna(..., inplace=True) is deprecated in recent pandas
df['comment_text'] = df['comment_text'].fillna(' ')
LABELS = ['toxic', 'severe_toxic', 'threat', 'obscene', 'insult', 'identity_hate']
print(df[LABELS].sum())
# toxic            15294
# severe_toxic      1595
# threat             478
# obscene           8449
# insult            7877
# identity_hate     1405

A Pearson correlation heatmap on the 6 binary label columns reveals that toxic, obscene, and insult are strongly correlated (tend to appear together), while threat and identity_hate are more independent signals.

import seaborn as sns
sns.heatmap(df[LABELS].corr(), annot=True, cmap='coolwarm', fmt='.2f')

Key Insight: This correlation structure confirms that multi-label classification is the correct framing. These categories are not mutually exclusive and frequently co-occur.
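For two 0/1 columns, the Pearson correlation that `df.corr()` reports is the phi coefficient. The sketch below computes it by hand on toy label columns; the data is made up purely to show the mechanics and `pearson` is a hypothetical helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Toy label columns (made-up): 'obscene' mostly fires together with 'toxic',
# while 'threat' fires largely independently of it.
toxic   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
obscene = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
threat  = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]

print(round(pearson(toxic, obscene), 2))
print(round(pearson(toxic, threat), 2))
```

A high value means the two labels tend to be positive on the same comments; a value near zero means knowing one label says little about the other.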
Individual WordClouds for each toxicity label expose the most discriminating vocabulary per category. Key findings:
- Toxic / Obscene / Insult share a significant common vocabulary
- Threat has distinctive action-oriented terms not present in other labels
- Identity Hate shows distinct slur and group-targeting language
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

for label in LABELS:
    # Concatenate every comment that is positive for this label
    text = ' '.join(df[df[label] == 1]['comment_text'])
    wc = WordCloud(max_words=100, stopwords=STOPWORDS, background_color='white')
    wc.generate(text)
    plt.imshow(wc); plt.title(label); plt.axis('off'); plt.show()

Toxic comments tend to be shorter and more concentrated in their aggression compared to neutral comments. This was validated by plotting character length distributions grouped by each label.
Every comment passes through a 5-stage cleaning pipeline before vectorization:
import re
from nltk.corpus import stopwords

# Stage 1: Replace newline characters with a space so adjacent words do not fuse
remove_n = lambda x: re.sub(r"\n", " ", x)
# Stage 2: Remove alphanumeric tokens (words containing digits)
remove_alpha_num = lambda x: re.sub(r"\w*\d\w*", '', x)
# Stage 3: Lowercase + remove punctuation and underscores
remove_pun = lambda x: re.sub(r"([^\w\s]|_)", '', x.lower())
# Stage 4: Remove non-ASCII characters (emojis, special Unicode)
remove_non_ascii = lambda x: re.sub(r'[^\x00-\x7f]', r' ', x)
# Stage 5: Remove stopwords (audited: toxic signal words preserved)
STOPS = set(stopwords.words('english')) - {'kill', 'hate', 'die', 'not', 'no'}

def clean(text: str) -> str:
    text = str(text)
    for fn in [remove_n, remove_alpha_num, remove_pun, remove_non_ascii]:
        text = fn(text)
    text = ' '.join([w for w in text.split() if w not in STOPS])
    return text

df['clean_text'] = df['comment_text'].map(clean)

⚠️ Critical Design Note: Stopword lists can strip words that carry toxicity signal. NLTK's English list includes the negations "not" and "no", which are removed from the stopword set before filtering. "kill", "hate", and "die" are not in the current NLTK English list; subtracting them is a safeguard in case a broader stopword list is swapped in.
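The repository lists `tests/test_preprocessing.py` but does not show its contents. The sketch below is one hypothetical shape such tests could take. To stay runnable without NLTK data it redefines `clean()` with a tiny stand-in stopword set, and it assumes newlines are replaced by spaces; both are assumptions of this sketch, not the project's actual test file.

```python
import re

# Small stand-in for the audited stopword set so the sketch runs without NLTK data.
STOPS = {"the", "a", "an", "is", "are", "you", "i", "to", "and", "will"} - {"not", "no"}

def clean(text: str) -> str:
    text = str(text)
    text = re.sub(r"\n", " ", text)                  # newlines -> spaces
    text = re.sub(r"\w*\d\w*", "", text)             # drop tokens containing digits
    text = re.sub(r"([^\w\s]|_)", "", text.lower())  # lowercase, strip punctuation
    text = re.sub(r"[^\x00-\x7f]", " ", text)        # drop non-ASCII characters
    return " ".join(w for w in text.split() if w not in STOPS)

def test_newlines_do_not_fuse_words():
    assert clean("go\naway") == "go away"

def test_digit_tokens_removed():
    assert clean("user123 is rude") == "rude"

def test_negation_preserved():
    assert clean("You are NOT welcome!") == "not welcome"

def test_non_string_input_is_coerced():
    assert clean(None) == "none"
```

Run with `pytest`, each function exercises one stage of the pipeline, which makes a regression in a single regex easy to localize.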
| Stage | Vocabulary Size |
|---|---|
| Raw text | ~210,000 unique tokens |
| After cleaning | ~78,000 unique tokens |
| After TF-IDF (max_features=50k) | 50,000 features |
63% vocabulary reduction, improving training speed and model generalization.
The dataset exhibits extreme label imbalance, particularly for threat (0.30% positive rate) and identity_hate (0.88%):
`class_weight='balanced'` is applied to Logistic Regression and LinearSVC. The balanced mode automatically computes class weights inversely proportional to class frequency:

weight_minority = n_samples / (n_classes * n_minority_samples)

This forces the optimizer to treat each minority positive as proportionally more important during training.
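Plugging the `threat` counts from this README into that formula shows how strong the reweighting is. This is a plain-Python sketch of the arithmetic; `balanced_class_weights` is a hypothetical helper that mirrors the formula above rather than calling scikit-learn.

```python
def balanced_class_weights(class_counts):
    """weight_c = n_samples / (n_classes * n_c), as in the 'balanced' heuristic."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * n) for c, n in class_counts.items()}

# threat label: 478 positives out of 159,571 training rows
threat_counts = {0: 159_571 - 478, 1: 478}
weights = balanced_class_weights(threat_counts)

print({c: round(w, 3) for c, w in weights.items()})
print(round(weights[1] / weights[0], 1))  # relative weight of one positive vs one negative
```

Each `threat` positive ends up weighted a few hundred times more heavily than a negative, which is why the minority class is no longer ignored during fitting.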
The default 0.5 probability threshold is suboptimal for minority labels. For threat and identity_hate, the threshold was lowered to 0.3, improving recall at an acceptable precision tradeoff:

from sklearn.metrics import f1_score, precision_score, recall_score

proba = model.predict_proba(X_val)[:, 1]
for thresh in [0.3, 0.4, 0.5]:
    pred = (proba >= thresh).astype(int)
    print(f"thresh={thresh}: F1={f1_score(y_val, pred):.3f} "
          f"Recall={recall_score(y_val, pred):.3f} "
          f"Precision={precision_score(y_val, pred):.3f}")

Complement Naive Bayes is designed for imbalanced text classification. It estimates each class's parameters from the complement of that class (all samples not in it), which tends to make it more robust when positive examples are scarce.
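The threshold sweep shown earlier can be reduced to a small selection routine. The sketch below uses plain Python and a toy validation set with made-up probabilities; `precision_recall_f1` and `best_threshold` are hypothetical helpers, and the grid matches the three thresholds tried in this README.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_threshold(y_true, proba, grid=(0.3, 0.4, 0.5)):
    """Return the grid threshold with the highest F1 (ties go to the higher threshold)."""
    scored = [(precision_recall_f1(y_true, [int(p >= t) for p in proba])[2], t)
              for t in grid]
    return max(scored)[1]

# Toy validation set (made-up): 3 positives, 7 negatives.
y_val = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
proba = [0.62, 0.41, 0.33, 0.35, 0.20, 0.10, 0.05, 0.45, 0.02, 0.01]

for t in (0.3, 0.4, 0.5):
    p, r, f = precision_recall_f1(y_val, [int(x >= t) for x in proba])
    print(f"thresh={t}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("selected:", best_threshold(y_val, proba))
```

On this toy data, lowering the threshold trades some precision for recall, and the lowest threshold gives the best F1. On real validation data the best value should be chosen per label.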
Each of the 6 labels has its own fitted TF-IDF vectorizer, saved alongside that label's model. Because TF-IDF fitting does not use the labels, vectorizers fitted on the same text with the same settings are equivalent; keeping one per label keeps each label's artifacts self-contained and leaves room for per-label settings.

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizers = {}
X_matrices = {}
for label in LABELS:
    tfidf = TfidfVectorizer(
        max_features = 50_000,   # keep the 50k most frequent terms across the corpus
        ngram_range  = (1, 2),   # unigrams + bigrams
        min_df       = 2,        # ignore terms in fewer than 2 docs
        sublinear_tf = True      # replace tf with 1 + log(tf), compressing frequency skew
    )
    vectorizers[label] = tfidf
    # Note: fitting on the full frame lets validation text influence the IDF weights;
    # fit on the training split only to match "Fit on train" in Phase 4.
    X_matrices[label] = tfidf.fit_transform(df['clean_text'])

Without log scaling, a slur appearing 100 times in one comment would have a raw term frequency 100× that of one appearing once. 1 + log(tf) compresses this to about 5.6× (before normalization), preventing a single repeat-term comment from dominating the model.
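scikit-learn documents `sublinear_tf=True` as replacing the raw term frequency `tf` with `1 + log(tf)` (natural log). A short sketch of that arithmetic, with `sublinear_tf` here being a hypothetical helper rather than a library function:

```python
import math

def sublinear_tf(tf):
    """Sublinear scaling of a raw term count: 1 + ln(tf) for tf > 0, else 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0

# Compare raw counts with their scaled values.
for tf in (1, 10, 100):
    print(tf, round(sublinear_tf(tf), 2))
```

A term repeated 100 times therefore counts only a few times more than a term used once, before the IDF factor and L2 normalization are applied.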
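Bigrams capture toxic phrases that unigrams miss entirely:

| Unigrams (miss intent) | Bigrams (capture intent) |
|---|---|
| "kill", "you" | "kill you" |
| "your", "kind" | "your kind" |
| "go", "die" | "go die" |

Note that "you" and "your" are in NLTK's English stopword list, so bigrams containing them only reach the vectorizer if those words are kept out of the stop set.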
import pickle, os

os.makedirs('vectorizers', exist_ok=True)
for label, tfidf in vectorizers.items():
    with open(f'vectorizers/{label}_vectorizer.pkl', 'wb') as f:
        pickle.dump(tfidf, f)

Rule: At inference time, always transform using the same fitted vectorizer and never refit on new data. Refitting would produce different feature indices and make predictions meaningless.
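Why refitting breaks predictions can be shown with a toy stand-in for a vectorizer. The sketch below is not scikit-learn code: `fit_vocabulary` and `transform` are hypothetical helpers that map terms to column indices the way a fitted vocabulary does.

```python
def fit_vocabulary(docs):
    """Toy stand-in for fitting: map each term to a column index (alphabetical)."""
    terms = sorted({w for d in docs for w in d.split()})
    return {t: i for i, t in enumerate(terms)}

def transform(doc, vocab):
    """Count terms into the columns defined by an already-fitted vocabulary."""
    vec = [0] * len(vocab)
    for w in doc.split():
        if w in vocab:  # terms unseen at fit time are ignored
            vec[vocab[w]] += 1
    return vec

train_vocab = fit_vocabulary(["you idiot", "nice edit thanks"])
refit_vocab = fit_vocabulary(["idiot troll"])  # wrong: refitting on new data

new_comment = "idiot troll"
print(transform(new_comment, train_vocab))  # columns the trained model expects
print(transform(new_comment, refit_vocab))  # different width and column meaning
```

The model's learned weights are tied to the training-time columns, so a refitted vocabulary feeds it features in the wrong positions (or the wrong number of them).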
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import ComplementNB, BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

MODELS = {
    'LR' : LogisticRegression(class_weight='balanced', max_iter=1000, C=1.0),
    'SVM': LinearSVC(class_weight='balanced', max_iter=2000),
    'RF' : RandomForestClassifier(n_estimators=100, class_weight='balanced', n_jobs=-1),
    'CNB': ComplementNB(),
    'BNB': BernoulliNB(),
    'DT' : DecisionTreeClassifier(class_weight='balanced'),
    'KNN': KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
}

results = {}
trained_models = {}
for label in LABELS:
    X = X_matrices[label]
    y = df[label]
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    results[label] = {}
    trained_models[label] = {}
    for name, base_clf in MODELS.items():
        # Clone so every label keeps its own fitted estimator; storing the shared
        # instance would leave all labels pointing at the last label's fit.
        clf = clone(base_clf)
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_val)
        results[label][name] = f1_score(y_val, pred)
        trained_models[label][name] = clf

result_df = pd.DataFrame(results).T  # rows=labels, columns=models
plt.figure(figsize=(10, 5))
sns.heatmap(result_df, annot=True, fmt=".3f", cmap="YlGn",
            linewidths=0.5, cbar_kws={'label': 'F1 Score'})
plt.title("F1 Score Benchmark: Model × Toxicity Label")
plt.tight_layout(); plt.show()

| Label | Best Classifier | Rationale |
|---|---|---|
| `toxic` | Logistic Regression | Best F1 on largest, most balanced label |
| `severe_toxic` | ComplementNB | Designed for imbalanced text; strong on minority |
| `threat` | ComplementNB | Extreme imbalance (0.3% positive); CNB excels |
| `obscene` | Logistic Regression | High positive rate; LR performs consistently |
| `insult` | Logistic Regression | High positive rate; linear boundary effective |
| `identity_hate` | ComplementNB | Minority class; CNB + threshold 0.3 best recall |

General finding: Logistic Regression with `class_weight='balanced'` is the most reliable overall performer. ComplementNB wins specifically for severely imbalanced minority labels. KNN was excluded from production because it is extremely slow (O(n²)) on sparse TF-IDF matrices at 159k rows.
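The per-label selection can also be derived programmatically from a results dictionary instead of being hard-coded. In the sketch below the F1 values are placeholders invented for illustration (they are not this project's results), and `pick_best` and `EXCLUDED` are hypothetical names.

```python
# Placeholder F1 scores, made up for illustration only.
results = {
    "toxic":  {"LR": 0.80, "CNB": 0.70, "KNN": 0.40},
    "threat": {"LR": 0.50, "CNB": 0.60, "KNN": 0.10},
}

# Models ruled out for production regardless of score (e.g. too slow).
EXCLUDED = {"KNN"}

def pick_best(results, excluded=EXCLUDED):
    """Return the highest-F1 eligible model name for each label."""
    best = {}
    for label, scores in results.items():
        eligible = {m: s for m, s in scores.items() if m not in excluded}
        best[label] = max(eligible, key=eligible.get)
    return best

print(pick_best(results))
```

Driving the choice from the benchmark table keeps the serialized models in sync with the latest training run.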
os.makedirs('models', exist_ok=True)
best_models = {
    'toxic':         trained_models['toxic']['LR'],
    'severe_toxic':  trained_models['severe_toxic']['CNB'],
    'threat':        trained_models['threat']['CNB'],
    'obscene':       trained_models['obscene']['LR'],
    'insult':        trained_models['insult']['LR'],
    'identity_hate': trained_models['identity_hate']['CNB'],
}
for label, model in best_models.items():
    with open(f'models/{label}_model.pkl', 'wb') as f:
        pickle.dump(model, f)

LIME (Local Interpretable Model-agnostic Explanations) plays a role for text similar to Grad-CAM for images: it estimates which words in a comment drove the prediction toward or away from each toxicity label.
from lime.lime_text import LimeTextExplainer

def predict_fn_factory(label):
    """Returns a prediction function for a given label."""
    model = best_models[label]
    vec = vectorizers[label]
    def predict_fn(texts):
        # Apply the training-time clean() so perturbed texts match the vectorizer's input
        X = vec.transform([clean(t) for t in texts])
        return model.predict_proba(X)
    return predict_fn

explainer = LimeTextExplainer(class_names=['clean', 'toxic'])
sample = "You are an idiot and I will make sure everyone knows it."
exp = explainer.explain_instance(
    sample,
    predict_fn_factory('insult'),
    num_features=10,
    num_samples=500
)
exp.show_in_notebook()

What LIME reveals:
- Words highlighted in red pushed the prediction toward the toxic label
- Words in green pushed toward clean
- Confidence bars show the magnitude of each word's contribution
This word-level transparency is essential for any content moderation system used in production environments.
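The intuition behind word-level attribution can be illustrated with a much simpler technique than LIME: remove one word at a time and measure how the score changes. The sketch below is a leave-one-out stand-in, not LIME's algorithm (LIME fits a local surrogate model on many random perturbations). `toy_score` and its `LEXICON` are made-up stand-ins for `model.predict_proba`.

```python
# Toy scorer standing in for a trained model's positive-class probability.
LEXICON = {"idiot": 0.6, "stupid": 0.5}

def toy_score(text):
    return min(1.0, sum(LEXICON.get(w, 0.0) for w in text.lower().split()))

def leave_one_out(text, score_fn):
    """Contribution of each word = score(full text) - score(text without that word)."""
    words = text.split()
    base = score_fn(text)
    contributions = {}
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        contributions[w] = base - score_fn(reduced)
    return contributions

contrib = leave_one_out("you are an idiot", toy_score)
for word, delta in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{word:<6} {delta:+.2f}")
```

Words with a large positive contribution are the ones a reviewer would expect to see highlighted for the flagged label.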
The deployed app provides three functional tabs:
import streamlit as st
import pickle

LABELS = ['toxic','severe_toxic','threat','obscene','insult','identity_hate']
THRESHOLDS = {
    'toxic': 0.4, 'severe_toxic': 0.35, 'threat': 0.3,
    'obscene': 0.4, 'insult': 0.4, 'identity_hate': 0.3
}

@st.cache_resource
def load_artifacts():
    models, vecs = {}, {}
    for l in LABELS:
        # Context managers close each pickle file after loading
        with open(f'models/{l}_model.pkl', 'rb') as f:
            models[l] = pickle.load(f)
        with open(f'vectorizers/{l}_vectorizer.pkl', 'rb') as f:
            vecs[l] = pickle.load(f)
    return models, vecs

models, vecs = load_artifacts()
comment = st.text_area("Enter a comment to analyse", height=120)
if st.button("Classify") and comment:
    predictions, probabilities = {}, {}
    # clean() is the training-time preprocessing function packaged with app.py
    cleaned = clean(comment)
    for label in LABELS:
        X = vecs[label].transform([cleaned])
        prob = models[label].predict_proba(X)[0][1]
        probabilities[label] = prob
        predictions[label] = prob >= THRESHOLDS[label]
    # Display probability bars with colour coding
    for label, prob in probabilities.items():
        icon = "🔴" if predictions[label] else "🟢"
        st.progress(float(prob), text=f"{icon} {label.replace('_',' ').title()}: {prob:.1%}")

When toxicity is detected, LangChain invokes a Groq-hosted LLM to generate a natural language explanation:
from langchain_groq import ChatGroq
from langchain_core.messages import HumanMessage

@st.cache_resource
def get_llm():
    return ChatGroq(model="llama3-8b-8192",
                    api_key=st.secrets["GROQ_API_KEY"])

# Runs after the predictions dictionary has been filled for the submitted comment
llm = get_llm()
flagged = [l for l in LABELS if predictions[l]]
if flagged:
    prompt = HumanMessage(content=f"""
A content moderation system flagged the following comment as: {', '.join(flagged)}.
Comment: "{comment}"
In 2 clear sentences, explain why this comment was flagged.
Be factual, professional, and avoid reproducing the harmful content verbatim.
""")
    response = llm.invoke([prompt])
    st.info(f"🤖 **AI Explanation:** {response.content}")

The Compare tab displays the F1 heatmap (7 models × 6 labels) as an interactive Plotly chart alongside the per-label best model selection rationale.
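The per-label decision in the Predict tab is easy to isolate into a pure function, which makes it testable without Streamlit. The sketch below reuses the threshold values listed in the app code; `flag_labels` is a hypothetical helper and the probabilities are made-up inputs.

```python
THRESHOLDS = {
    "toxic": 0.4, "severe_toxic": 0.35, "threat": 0.3,
    "obscene": 0.4, "insult": 0.4, "identity_hate": 0.3,
}

def flag_labels(probabilities, thresholds=THRESHOLDS):
    """Return the labels whose probability meets that label's own threshold."""
    return [label for label, p in probabilities.items() if p >= thresholds[label]]

# Made-up probabilities for a single comment.
probabilities = {"toxic": 0.45, "severe_toxic": 0.10, "threat": 0.31,
                 "obscene": 0.39, "insult": 0.20, "identity_hate": 0.05}

print(flag_labels(probabilities))
# Same probabilities under a uniform 0.5 cut-off:
print(flag_labels(probabilities, {label: 0.5 for label in THRESHOLDS}))
```

With a uniform 0.5 cut-off this comment would pass unflagged; the per-label thresholds surface both the `toxic` and `threat` signals.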
| Metric | Why Used |
|---|---|
| F1 Score (Binary) | Primary metric: harmonic mean of precision & recall for each label |
| Precision | Of comments flagged, what fraction are actually toxic? |
| Recall | Of all toxic comments, what fraction were correctly caught? |
| ROC-AUC | Threshold-free discriminative ability for ranking |
| Confusion Matrix | Per-label error breakdown, especially False Negatives |

In content moderation, False Negatives are more costly than False Positives: a missed threat is worse than a false alarm. This informed threshold tuning decisions.
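Why accuracy is not used as the primary metric can be seen with a degenerate baseline. The sketch below scores a "classifier" that labels everything clean on a toy set with a 90% clean rate, matching the approximate clean rate stated for this dataset; the data itself is synthetic.

```python
# Synthetic validation set: 10 toxic comments, 90 clean ones.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100  # baseline that never flags anything

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

recall = tp / (tp + fn) if tp + fn else 0.0
precision = tp / (tp + fp) if tp + fp else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The baseline looks strong on accuracy while catching no toxic comments at all, which is exactly the failure that recall and F1 expose.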
Label: toxic
              precision    recall  f1-score   support
       clean       0.97      0.97      0.97     28642
       toxic       0.82      0.80      0.81      3059

Label: threat
              precision    recall  f1-score   support
       clean       1.00      0.99      0.99     31814
      threat       0.68      0.71      0.69        96   ← threshold=0.30
| Label | Best Model | Val F1 | Notes |
|---|---|---|---|
| `toxic` | Logistic Regression | TBD | Highest volume label |
| `severe_toxic` | ComplementNB | TBD | LR competitive |
| `threat` | ComplementNB | TBD | Threshold 0.30 critical |
| `obscene` | Logistic Regression | TBD | Most learnable label |
| `insult` | Logistic Regression | TBD | Second highest volume |
| `identity_hate` | ComplementNB | TBD | Threshold 0.30 applied |

Note: Fill in actual F1 values from your notebook output after training.
# NLP & ML
nltk >= 3.8                 # Stopwords, tokenization
scikit-learn >= 1.3         # TF-IDF, classifiers, metrics
imbalanced-learn >= 0.11    # RandomOverSampler
# Visualization
matplotlib >= 3.7           # Plots, training curves
seaborn >= 0.12             # Heatmaps, distribution plots
wordcloud >= 1.9            # Category WordClouds
# Explainability
lime >= 0.2                 # Local model explanations
# Deployment
streamlit >= 1.28           # Interactive web app
langchain >= 0.2            # LLM orchestration
langchain-groq >= 0.1       # Free Groq LLM API
Pillow >= 10.0              # Image handling

| Concept | Implementation |
|---|---|
| Multi-label binary classification | 6 independent binary classifiers |
| TF-IDF with n-grams | Bigram vectorization, 50k features per label |
| Text preprocessing pipeline | 5-stage regex cleaning + stopword audit |
| Class imbalance handling | `class_weight='balanced'` + threshold tuning |
| Model benchmarking | 7 classifiers evaluated per label via F1 |
| LIME explainability | Word-level contribution scores per prediction |
| LangChain LLM integration | Groq-hosted Llama 3 for natural language explanations |
| Unit testing | pytest on preprocessing edge cases |
| Streamlit deployment | Multi-tab app with live inference |
- Python 3.10+
- A free Groq API key (for AI explanation feature)
git clone https://github.com/Yousuf-177/Toxic-Comment-Classification.git
cd Toxic-Comment-Classification

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

pip install kaggle
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
unzip jigsaw-toxic-comment-classification-challenge.zip

Or download manually from Kaggle.

import nltk
nltk.download('stopwords')

# Phase 1-2: EDA, preprocessing, label analysis
jupyter notebook Analysis.ipynb
# Phase 3-7: TF-IDF, model training, evaluation, serialization
jupyter notebook Feature-Engg-Model-Building.ipynb

pytest tests/test_preprocessing.py -v

# Set Groq API key locally (create the .streamlit directory first)
mkdir -p .streamlit
echo "GROQ_API_KEY = 'your_key_here'" > .streamlit/secrets.toml
streamlit run app.py

Open http://localhost:8501 → Enter any comment → Classify + Explain.
- Push your repository to GitHub (ensure `.pkl` models are committed or loaded from a URL)
- Visit share.streamlit.io → New app
- Select your repo and set `app.py` as the entry point
- Under Advanced settings → Secrets, add:
  GROQ_API_KEY = "your_groq_api_key_here"
- Click Deploy → live URL available in ~2 minutes

⚠️ Security: Never hardcode API keys in `app.py`. Always use `st.secrets` for Streamlit Cloud or `.env` files locally. Add `.env` and `secrets.toml` to `.gitignore`.
| Challenge | Root Cause | Solution |
|---|---|---|
| Extreme class imbalance (threat: 0.3% positive) | Real-world toxicity distribution | `class_weight='balanced'` + ComplementNB + threshold tuning to 0.3 |
| Multi-label co-occurrence | Comments belong to 2+ categories simultaneously | 6 independent binary classifiers instead of 1 multiclass |
| Stopword over-removal | Stopword lists can drop signal words (NLTK's English list includes the negations not and no) | Manually audited stopword set: not and no preserved; kill, hate, die protected as a safeguard |
| KNN slowness on sparse matrices | O(n²) distance computation on 159k TF-IDF rows | Excluded KNN from production; LinearSVC used as fast alternative |
| Misleading accuracy metric | 90% clean rate makes any model look "good" | F1, AUC, and Recall as primary metrics |
| Training-serving skew | Different preprocessing at inference vs training | Preprocessing packaged with app.py; same clean() function used in both |
| API key exposure | Groq key required for LangChain layer | st.secrets on Streamlit Cloud; .env locally; .gitignore enforced |
- Fine-tune DistilBERT / RoBERTa: transformer-based models expected to yield +8 to 10 F1 points, especially on minority labels with complex context
- Active learning loop: surface high-uncertainty predictions for human review, creating a continuous improvement pipeline
- Multilingual support: extend to non-English toxic content using XLM-RoBERTa
- Real-time streaming: Kafka + Flink pipeline for processing live comment feeds at platform scale
- Ensemble per label: stack LR + CNB predictions with a meta-learner for improved minority class recall
- SHAP integration: global feature importance analysis across the full test set (vs LIME's local explanations)
- FastAPI backend: replace Streamlit with a REST API for integration into production moderation systems
- Real-time content moderation for social media platforms and forums
- Automated pre-screening before human moderation review queues
- Compliance monitoring for enterprise communication tools (Slack, Teams)
- Research into online harmful speech patterns and toxicity dynamics
- Training data generation for fine-tuning domain-specific LLMs
This project is licensed under the MIT License: free to use, modify, and distribute with attribution.
Yousuf
GitHub Profile · Project Repository
If this project helped you, give it a ⭐ on GitHub!
Built to make online communities safer, one comment at a time.