Sentiment Analysis
Transformer-based sentiment classification and trend tracking
Sentiment Analysis classifies article sentiment using transformer models (DistilBERT) and tracks sentiment trends over time by topic.
Overview
The sentiment analyzer:
- Classifies article sentiment: positive, neutral, or negative
- Computes sentiment scores (-1.0 to +1.0)
- Aggregates daily sentiment by topic
- Detects sentiment shifts using moving averages
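As a rough illustration of the last item, a shift check can compare the mean of the most recent 7 daily averages with the mean of the 7 days before that. This is a sketch under stated assumptions, not the project's implementation: the helper names `moving_average_change` and `is_shift` are made up for the example, the input is assumed to be a plain list ordered newest-first, and 0.3 is the default threshold listed under Configuration.

```python
from statistics import mean


def moving_average_change(daily_avgs, window=7):
    """Return (recent MA - previous MA) for a newest-first list of daily averages.

    Hypothetical helper for illustration. Returns None when fewer than
    2 * window days are available.
    """
    if len(daily_avgs) < 2 * window:
        return None
    recent = mean(daily_avgs[:window])
    previous = mean(daily_avgs[window:2 * window])
    return recent - previous


def is_shift(daily_avgs, threshold=0.3, window=7):
    """True when the absolute change between the two windows exceeds threshold."""
    change = moving_average_change(daily_avgs, window)
    return change is not None and abs(change) > threshold


# Newest-first history: a negative latest week after a positive earlier week.
history = [-0.18] * 7 + [0.25] * 7
print(moving_average_change(history))
print(is_shift(history))
```

Returning the signed change separately from the boolean keeps the direction of the shift available for reporting.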
Architecture
Sentiment Classification
Model
Uses Hugging Face's distilbert-base-uncased-finetuned-sst-2-english:
- Model Size: ~268MB download (about 67M parameters stored as 32-bit floats)
- Accuracy: ~92% on SST-2 benchmark
- Inference Time: ~50ms per article (CPU)
- Context Window: 512 tokens (truncates longer articles)
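For orientation, here is a minimal sketch of running this model through the `transformers` `pipeline` API with truncation to 512 tokens. The helper names `load_classifier` and `classify_text` are invented for this example and are not part of `ai_web_feeds`. The first call to `load_classifier` downloads the model, so it needs network access.

```python
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"


def load_classifier(model_name=MODEL_NAME):
    """Build a Hugging Face sentiment pipeline (downloads the model on first use)."""
    # Imported lazily so the module loads without transformers installed.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_name)


def classify_text(classifier, text):
    """Run one text through the classifier, truncating to the 512-token limit.

    `classifier` is any callable following the pipeline's calling convention:
    it returns a list such as [{"label": "POSITIVE", "score": 0.99}].
    """
    result = classifier(text, truncation=True, max_length=512)[0]
    return result["label"], float(result["score"])


# Usage (needs network access on first run):
# clf = load_classifier()
# label, confidence = classify_text(clf, "The new benchmark results are impressive.")
```

Passing the classifier in as an argument means the model is loaded once and reused across articles, and it lets a stub stand in for the model in tests.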
Sentiment Score Mapping
# Model output → Sentiment score
"POSITIVE" (confidence 0.85) → +0.85
"NEGATIVE" (confidence 0.92) → -0.92
"NEUTRAL" → 0.0
Classification Thresholds
if sentiment_score > 0.3:
    classification = "positive"
elif sentiment_score < -0.3:
    classification = "negative"
else:
    classification = "neutral"
Usage
CLI Commands
Analyze Sentiment
aiwebfeeds nlp sentiment
Options:
- --batch-size: Number of articles (default: 100)
- --force: Reprocess all articles
# Process 50 articles
aiwebfeeds nlp sentiment --batch-size 50
View Sentiment Trends
# 30-day sentiment trend for "AI Safety"
aiwebfeeds nlp sentiment-trend "AI Safety" --days 30
Output:
AI Safety - Sentiment Trend (30 days)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Date         Avg Sentiment   Articles   Positive   Neutral   Negative
2023-10-01   +0.45           24         18         4         2
2023-10-02   +0.32           19         12         5         2
2023-10-03   -0.15           28         8          12        8          ⚠️ Shift
Detect Sentiment Shifts
# Show topics with sentiment shifts (>0.3 change in 7-day MA)
aiwebfeeds nlp sentiment-shifts
Output:
Recent Sentiment Shifts
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Topic           Previous   Current   Change   Status
AI Safety       +0.25      -0.18     -0.43    🔴 Major shift
AI Regulation   -0.10      +0.35     +0.45    🟢 Improving
Compare Topics
aiwebfeeds nlp sentiment-compare "AI Safety" "AI Capabilities"
Shows side-by-side sentiment trends for two topics.
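One way such a comparison could be produced from the `topic_sentiment_daily` table (see Database Schema) is a self-join on date. The sketch below uses an in-memory SQLite database with invented rows and a trimmed copy of the documented schema. The query and the `compare_topics` helper are illustrative assumptions, not the CLI's actual implementation.

```python
import sqlite3


def compare_topics(conn, topic_a, topic_b):
    """Return (date, avg_a, avg_b) rows for dates where both topics have data."""
    query = """
        SELECT a.date, a.avg_sentiment, b.avg_sentiment
        FROM topic_sentiment_daily AS a
        JOIN topic_sentiment_daily AS b ON a.date = b.date
        WHERE a.topic = ? AND b.topic = ?
        ORDER BY a.date
    """
    return conn.execute(query, (topic_a, topic_b)).fetchall()


conn = sqlite3.connect(":memory:")
# Trimmed copy of the documented schema (per-class count columns omitted).
conn.execute(
    """CREATE TABLE topic_sentiment_daily (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           topic TEXT NOT NULL,
           date DATE NOT NULL,
           avg_sentiment REAL NOT NULL,
           article_count INTEGER NOT NULL,
           UNIQUE(topic, date)
       )"""
)
# Sample rows invented for the example.
conn.executemany(
    "INSERT INTO topic_sentiment_daily (topic, date, avg_sentiment, article_count) "
    "VALUES (?, ?, ?, ?)",
    [
        ("AI Safety", "2023-10-01", 0.45, 24),
        ("AI Safety", "2023-10-02", 0.32, 19),
        ("AI Capabilities", "2023-10-01", 0.10, 12),
        ("AI Capabilities", "2023-10-02", 0.50, 30),
    ],
)
rows = compare_topics(conn, "AI Safety", "AI Capabilities")
for row in rows:
    print(row)
```

An inner join drops dates where only one topic has articles; a left join would keep them with a missing value on one side.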
Python API
from ai_web_feeds.nlp import SentimentAnalyzer
from ai_web_feeds.config import Settings
analyzer = SentimentAnalyzer(Settings())
article = {
    "id": 1,
    "title": "RLHF Concerns",
    "content": "Critics have raised serious concerns about RLHF..."
}
sentiment = analyzer.analyze_sentiment(article)
# Returns: {
#     "sentiment_score": -0.65,
#     "classification": "negative",
#     "confidence": 0.89,
#     "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
# }
Batch Processing
Sentiment analysis runs hourly:
from ai_web_feeds.nlp.scheduler import NLPScheduler
# `scheduler` is the application's scheduler instance (not defined in this snippet)
nlp_scheduler = NLPScheduler(scheduler)
nlp_scheduler.register_jobs()
# Registers:
# - Sentiment analysis (every hour)
# - Sentiment aggregation (15 min after analysis)
Database Schema
article_sentiment Table
CREATE TABLE article_sentiment (
    article_id INTEGER PRIMARY KEY,
    sentiment_score REAL NOT NULL CHECK(sentiment_score BETWEEN -1.0 AND 1.0),
    classification TEXT NOT NULL CHECK(classification IN ('positive', 'neutral', 'negative')),
    model_name TEXT NOT NULL,
    confidence REAL NOT NULL CHECK(confidence BETWEEN 0 AND 1),
    computed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (article_id) REFERENCES feed_entries(id)
);
topic_sentiment_daily Table
Aggregated daily sentiment by topic:
CREATE TABLE topic_sentiment_daily (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    date DATE NOT NULL,
    avg_sentiment REAL NOT NULL,
    article_count INTEGER NOT NULL,
    positive_count INTEGER DEFAULT 0,
    neutral_count INTEGER DEFAULT 0,
    negative_count INTEGER DEFAULT 0,
    UNIQUE(topic, date)
);
Sentiment Aggregation
Daily Aggregation
Runs 15 minutes after sentiment analysis:
from collections import defaultdict
# Group sentiment scores by (topic, date); each bucket starts with empty counters
aggregates = defaultdict(
    lambda: {"scores": [], "positive": 0, "neutral": 0, "negative": 0}
)
for article in recent_articles:
    for topic in article.topics:
        key = (topic, article.date)
        aggregates[key]["scores"].append(article.sentiment_score)
        aggregates[key][article.classification] += 1
# Compute the daily average and store one row per (topic, date)
for (topic, date), data in aggregates.items():
    avg_sentiment = sum(data["scores"]) / len(data["scores"])
    storage.upsert_topic_sentiment_daily(
        topic=topic,
        date=date,
        avg_sentiment=avg_sentiment,
        article_count=len(data["scores"]),
        positive_count=data["positive"],
        neutral_count=data["neutral"],
        negative_count=data["negative"],
    )
Shift Detection
7-day moving average:
from statistics import mean
def detect_shift(topic: str, threshold: float = 0.3) -> bool:
    """Detect a sentiment shift using 7-day moving averages."""
    # The slicing below assumes the trend is ordered newest-first
    trend = storage.get_topic_sentiment_trend(topic, days=14)
    if len(trend) < 14:
        return False  # not enough history to compare two full weeks
    # Compare the 7-day MA of the latest week with the week before it
    ma_recent = mean(day.avg_sentiment for day in trend[:7])
    ma_previous = mean(day.avg_sentiment for day in trend[7:14])
    shift = abs(ma_recent - ma_previous)
    return shift > threshold
Configuration
class Phase5Settings(BaseSettings):
    sentiment_batch_size: int = 100
    sentiment_cron: str = "0 * * * *"  # Every hour
    sentiment_model: str = "distilbert-base-uncased-finetuned-sst-2-english"
    sentiment_shift_threshold: float = 0.3
Environment Variables:
PHASE5_SENTIMENT_BATCH_SIZE=100
PHASE5_SENTIMENT_SHIFT_THRESHOLD=0.3
PHASE5_SENTIMENT_MODEL=distilbert-base-uncased-finetuned-sst-2-english
Performance
- Throughput: ~100 articles/hour (CPU)
- Memory: ~500MB (model loaded)
- Storage: ~50 bytes per sentiment record
Use Cases
Monitor Topic Sentiment
Track sentiment for specific topics:
# Daily check for "AI Safety" sentiment
aiwebfeeds nlp sentiment-trend "AI Safety" --days 7
Detect Controversies
Identify topics with negative sentiment spikes:
# Topics with sentiment < -0.5 in last 7 days
aiwebfeeds nlp sentiment-shifts --threshold -0.5
Compare Competing Approaches
# Compare sentiment for competing techniques
aiwebfeeds nlp sentiment-compare "RLHF" "Constitutional AI"
Model Details
DistilBERT Architecture
- Base Model: BERT distilled to 66M parameters (40% smaller)
- Training: Fine-tuned on SST-2 (Stanford Sentiment Treebank)
- Input: Max 512 tokens (articles truncated to ~2000 chars)
- Output: Binary classification (positive/negative) with confidence
Limitations
- Context Window: Only first 512 tokens considered
- Binary Classification: The model is trained for binary sentiment (positive/negative); neutral is inferred from the score thresholds
- Domain Shift: SST-2 is movie reviews; AI articles may differ
- No Fine-tuning: Pre-trained model used as-is (no domain adaptation)
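To make the neutral-inference point concrete, the sketch below applies the score mapping and ±0.3 thresholds from the Sentiment Classification section to a binary model output. The function names are hypothetical. One caveat: a two-class model's top-label confidence is at least 0.5, so a plain signed-confidence score would not fall inside the ±0.3 band; how neutral scores arise in practice depends on details not covered on this page.

```python
def to_sentiment_score(label, confidence):
    """Map a model label and confidence to a signed score in [-1.0, 1.0]."""
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    return 0.0  # any other label (e.g. "NEUTRAL") is treated as neutral


def classify_score(score, threshold=0.3):
    """Bucket a signed score using symmetric thresholds."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for label, confidence in [("POSITIVE", 0.85), ("NEGATIVE", 0.92), ("NEUTRAL", 1.0)]:
    score = to_sentiment_score(label, confidence)
    print(label, score, classify_score(score))
```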
Troubleshooting
Low Confidence Scores
Symptom: All sentiment predictions have low confidence (<0.6).
Cause: Articles are too long, so the model only sees the truncated beginning.
Solution: Increase the truncation window (within the model's 512-token limit) or apply extractive summarization before analysis.
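As one possible reading of "extractive summarization", the sketch below keeps the first sentence of each paragraph until a character budget is reached, so later parts of a long article still contribute to the text the model sees. It is a crude lead-sentence heuristic chosen for brevity, not the project's method. The `lead_sentence_summary` name and the 2000-character default (borrowed from the truncation figure under Model Details) are assumptions.

```python
import re


def lead_sentence_summary(text, max_chars=2000):
    """Keep the first sentence of each paragraph until max_chars is reached."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    picked = []
    total = 0
    for paragraph in paragraphs:
        # Split once on sentence-ending punctuation followed by whitespace.
        first = re.split(r"(?<=[.!?])\s+", paragraph, maxsplit=1)[0]
        if total + len(first) + 1 > max_chars:
            break
        picked.append(first)
        total += len(first) + 1  # +1 for the joining space
    return " ".join(picked)


article = (
    "Critics raised concerns about the method. They cited several failures.\n\n"
    "Supporters disagree. They point to recent gains.\n\n"
    "The debate continues! More studies are planned."
)
print(lead_sentence_summary(article))
```

The summary would then be passed to the analyzer in place of the full article text.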
Model Download Fails
Symptom: OSError: Can't find model
Solution:
# Models auto-download to ~/.cache/huggingface/hub
# Ensure internet connection and disk space (~268MB)
# Manual download:
python -c "from transformers import pipeline; pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')"
Sentiment Shifts Not Detected
Symptom: No shifts reported despite obvious sentiment changes.
Cause: Threshold too high.
Solution:
# Lower threshold to 0.2
export PHASE5_SENTIMENT_SHIFT_THRESHOLD=0.2
Future Enhancements
- Domain-Specific Fine-tuning: Train on AI article sentiment labels
- Aspect-Based Sentiment: Sentiment for specific entities/topics within articles
- Multilingual Support: Add models for non-English content
- Real-Time Alerts: Webhook notifications for sentiment shifts
See Also
- Quality Scoring - Article quality assessment
- Entity Extraction - Named entity recognition
- Topic Modeling - Discover subtopics