Sentiment Analysis

Sentiment Analysis classifies article sentiment using transformer models (DistilBERT) and tracks sentiment trends over time by topic.

Overview

The sentiment analyzer:

Classifies article sentiment: positive, neutral, or negative
Computes sentiment scores (-1.0 to +1.0)
Aggregates daily sentiment by topic
Detects sentiment shifts using moving averages

Architecture

Sentiment Classification

Model

Uses Hugging Face's distilbert-base-uncased-finetuned-sst-2-english:

Model Size: ~268MB download (about 66M parameters)
Accuracy: ~92% on SST-2 benchmark
Inference Time: ~50ms per article (CPU)
Context Window: 512 tokens (truncates longer articles)
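
As a rough illustration of how the model above can be called, here is a minimal sketch using the Hugging Face `pipeline` API. The helper names (`prepare_text`, `load_classifier`, `predict`), the title-plus-content joining, and the 2000-character budget are assumptions made for this example; they are not taken from the project's code.

```python
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MAX_CHARS = 2000  # assumption: mirrors the ~2000-character truncation noted under Model Details


def prepare_text(title: str, content: str, max_chars: int = MAX_CHARS) -> str:
    """Join title and body (an assumption), then cut to a character budget."""
    text = f"{title}. {content}".strip()
    return text[:max_chars]


def load_classifier():
    """Load the pipeline lazily; requires the `transformers` package and a backend such as torch."""
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME)


def predict(classifier, text: str) -> dict:
    """Return a dict with a 'label' ('POSITIVE' or 'NEGATIVE') and a 'score' (confidence)."""
    # truncation=True caps the tokenized input at the model's 512-token limit.
    return classifier(text, truncation=True)[0]
```

Typical use would be `predict(load_classifier(), prepare_text(title, content))`. The first call downloads the model, so loading the classifier once and reusing it across a batch is the cheaper pattern.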

Sentiment Score Mapping

# Model output → Sentiment score
"POSITIVE" (confidence 0.85) → +0.85
"NEGATIVE" (confidence 0.92) → -0.92
# The model emits no "NEUTRAL" label; neutral is inferred from the score thresholds

Classification Thresholds

if sentiment_score > 0.3:
    classification = "positive"
elif sentiment_score < -0.3:
    classification = "negative"
else:
    classification = "neutral"
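
The mapping and thresholds above can be combined into two small functions. This is a sketch of the rules as documented, not the project's implementation; `signed_score` and `classify` are hypothetical names. One caveat: the top-label confidence of a two-class softmax is always at least 0.5, so under a literal signed-confidence mapping the neutral band would only be reached if the score is derived some other way (for example as P(positive) minus P(negative)). The source does not say which is used.

```python
def signed_score(label: str, confidence: float) -> float:
    """Map a model label and its confidence to a score in [-1.0, +1.0]."""
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    raise ValueError(f"unexpected label: {label!r}")


def classify(score: float, threshold: float = 0.3) -> str:
    """Apply the documented thresholds: above +0.3 positive, below -0.3 negative."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(classify(signed_score("POSITIVE", 0.85)))  # positive
print(classify(signed_score("NEGATIVE", 0.92)))  # negative
print(classify(0.1))                             # neutral
```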

Usage

CLI Commands

Analyze Sentiment

aiwebfeeds nlp sentiment

Options:

--batch-size: Number of articles (default: 100)
--force: Reprocess all articles

# Process 50 articles
aiwebfeeds nlp sentiment --batch-size 50

View Sentiment Trends

# 30-day sentiment trend for "AI Safety"
aiwebfeeds nlp sentiment-trend "AI Safety" --days 30

Output:

AI Safety - Sentiment Trend (30 days)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Date       Avg Sentiment  Articles  Positive  Neutral  Negative
2023-10-01    +0.45         24        18        4         2
2023-10-02    +0.32         19        12        5         2
2023-10-03    -0.15         28         8       12         8  ⚠️  Shift

Detect Sentiment Shifts

# Show topics with sentiment shifts (>0.3 change in 7-day MA)
aiwebfeeds nlp sentiment-shifts

Output:

Recent Sentiment Shifts
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Topic          Previous  Current  Change  Status
AI Safety        +0.25    -0.18    -0.43   🔴 Major shift
AI Regulation    -0.10    +0.35    +0.45   🟢 Improving

Compare Topics

aiwebfeeds nlp sentiment-compare "AI Safety" "AI Capabilities"

Shows side-by-side sentiment trends for two topics.

Python API

from ai_web_feeds.nlp import SentimentAnalyzer
from ai_web_feeds.config import Settings

analyzer = SentimentAnalyzer(Settings())

article = {
    "id": 1,
    "title": "RLHF Concerns",
    "content": "Critics have raised serious concerns about RLHF..."
}

sentiment = analyzer.analyze_sentiment(article)
# Returns: {
#     "sentiment_score": -0.89,
#     "classification": "negative",
#     "confidence": 0.89,
#     "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
# }

Batch Processing

Sentiment analysis runs hourly:

from ai_web_feeds.nlp.scheduler import NLPScheduler

# `scheduler` is the application's existing job scheduler instance, created elsewhere
nlp_scheduler = NLPScheduler(scheduler)
nlp_scheduler.register_jobs()
# Registers:
# - Sentiment analysis (every hour)
# - Sentiment aggregation (15 min after analysis)

Database Schema

article_sentiment Table

CREATE TABLE article_sentiment (
    article_id INTEGER PRIMARY KEY,
    sentiment_score REAL NOT NULL CHECK(sentiment_score BETWEEN -1.0 AND 1.0),
    classification TEXT NOT NULL CHECK(classification IN ('positive', 'neutral', 'negative')),
    model_name TEXT NOT NULL,
    confidence REAL NOT NULL CHECK(confidence BETWEEN 0 AND 1),
    computed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (article_id) REFERENCES feed_entries(id)
);

topic_sentiment_daily Table

Aggregated daily sentiment by topic:

CREATE TABLE topic_sentiment_daily (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    date DATE NOT NULL,
    avg_sentiment REAL NOT NULL,
    article_count INTEGER NOT NULL,
    positive_count INTEGER DEFAULT 0,
    neutral_count INTEGER DEFAULT 0,
    negative_count INTEGER DEFAULT 0,
    UNIQUE(topic, date)
);
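
The `UNIQUE(topic, date)` constraint is what lets the aggregation job rerun safely: a second write for the same topic and day can update the existing row instead of adding a duplicate. The sketch below shows one way to do that with Python's `sqlite3` module and SQLite's `ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or newer). It assumes the database is SQLite, and it is not the project's actual `upsert_topic_sentiment_daily` implementation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE topic_sentiment_daily (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    date DATE NOT NULL,
    avg_sentiment REAL NOT NULL,
    article_count INTEGER NOT NULL,
    positive_count INTEGER DEFAULT 0,
    neutral_count INTEGER DEFAULT 0,
    negative_count INTEGER DEFAULT 0,
    UNIQUE(topic, date)
);
"""

# Insert a new row, or overwrite the aggregates if (topic, date) already exists.
UPSERT = """
INSERT INTO topic_sentiment_daily
    (topic, date, avg_sentiment, article_count,
     positive_count, neutral_count, negative_count)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(topic, date) DO UPDATE SET
    avg_sentiment  = excluded.avg_sentiment,
    article_count  = excluded.article_count,
    positive_count = excluded.positive_count,
    neutral_count  = excluded.neutral_count,
    negative_count = excluded.negative_count;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two runs for the same day: the second replaces the first.
conn.execute(UPSERT, ("AI Safety", "2023-10-01", 0.40, 20, 15, 3, 2))
conn.execute(UPSERT, ("AI Safety", "2023-10-01", 0.45, 24, 18, 4, 2))

rows = conn.execute(
    "SELECT topic, date, avg_sentiment, article_count FROM topic_sentiment_daily"
).fetchall()
print(rows)  # [('AI Safety', '2023-10-01', 0.45, 24)]
```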

Sentiment Aggregation

Daily Aggregation

Runs 15 minutes after sentiment analysis:

from collections import defaultdict

# Group sentiment scores by (topic, date). Each bucket starts with an empty
# score list and zeroed per-class counters, so new keys need no setup.
aggregates = defaultdict(
    lambda: {"scores": [], "positive": 0, "neutral": 0, "negative": 0}
)
for article in recent_articles:
    for topic in article.topics:
        key = (topic, article.date)
        aggregates[key]["scores"].append(article.sentiment_score)
        aggregates[key][article.classification] += 1

# Compute the daily average and upsert one row per (topic, date)
for (topic, date), data in aggregates.items():
    avg_sentiment = sum(data["scores"]) / len(data["scores"])
    storage.upsert_topic_sentiment_daily(
        topic=topic,
        date=date,
        avg_sentiment=avg_sentiment,
        article_count=len(data["scores"]),
        positive_count=data["positive"],
        neutral_count=data["neutral"],
        negative_count=data["negative"]
    )

Shift Detection

Shift detection compares the 7-day moving average of the most recent week with that of the week before it:

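from statistics import mean

def detect_shift(topic: str, threshold: float = 0.3) -> bool:
    """Detect a sentiment shift by comparing two consecutive 7-day moving averages."""
    # Assumes the trend is ordered newest first, so trend[:7] is the latest week
    trend = storage.get_topic_sentiment_trend(topic, days=14)

    # Average the most recent 7 days and the 7 days before them
    ma_recent = mean([day.avg_sentiment for day in trend[:7]])
    ma_previous = mean([day.avg_sentiment for day in trend[7:14]])

    # A shift in either direction counts, so compare the absolute difference
    shift = abs(ma_recent - ma_previous)
    return shift > threshold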
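
To make the comparison concrete, here is a self-contained sketch that works on a plain list of daily averages and also reports the signed change, as the `sentiment-shifts` output does. The function names are hypothetical, and the series here is ordered oldest to newest, which is the opposite of the slicing in the snippet above. The sample values reproduce the "AI Safety" row from the earlier example output.

```python
from statistics import mean


def moving_average_shift(daily_avgs, window=7):
    """Return (previous_ma, recent_ma, change) for a series ordered oldest to newest.

    Returns None when fewer than 2 * window days are available, since the
    two windows could not both be filled.
    """
    if len(daily_avgs) < 2 * window:
        return None
    previous = mean(daily_avgs[-2 * window:-window])
    recent = mean(daily_avgs[-window:])
    return previous, recent, recent - previous


def is_shift(change, threshold=0.3):
    """A shift is any change whose magnitude exceeds the threshold."""
    return abs(change) > threshold


series = [0.25] * 7 + [-0.18] * 7  # one week at +0.25, then one week at -0.18
previous, recent, change = moving_average_shift(series)
print(f"{previous:+.2f} -> {recent:+.2f} (change {change:+.2f})")
# +0.25 -> -0.18 (change -0.43)
print(is_shift(change))  # True
```

Keeping the sign of the change separate from the threshold test makes it possible to tell a drop from an improvement while still flagging both.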

Configuration

class Phase5Settings(BaseSettings):
    sentiment_batch_size: int = 100
    sentiment_cron: str = "0 * * * *"  # Every hour
    sentiment_model: str = "distilbert-base-uncased-finetuned-sst-2-english"
    sentiment_shift_threshold: float = 0.3

Environment Variables:

PHASE5_SENTIMENT_BATCH_SIZE=100
PHASE5_SENTIMENT_SHIFT_THRESHOLD=0.3
PHASE5_SENTIMENT_MODEL=distilbert-base-uncased-finetuned-sst-2-english
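
The environment variables correspond to the settings fields by upper-casing the field name and adding a `PHASE5_` prefix. The stdlib-only sketch below illustrates that mapping and the type coercion involved. It is not how `Phase5Settings` is implemented (that class derives from `BaseSettings`); `SentimentSettings` and `load_settings` are hypothetical names, and the prefix rule is inferred from the variable names listed above.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class SentimentSettings:
    sentiment_batch_size: int = 100
    sentiment_cron: str = "0 * * * *"  # every hour
    sentiment_model: str = "distilbert-base-uncased-finetuned-sst-2-english"
    sentiment_shift_threshold: float = 0.3


def load_settings(environ=None, prefix="PHASE5_"):
    """Build settings from defaults, overridden by PREFIX + FIELD_NAME variables."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(SentimentSettings):
        raw = environ.get(prefix + field.name.upper())
        if raw is not None:
            # Coerce the string to the type of the field's default value.
            overrides[field.name] = type(field.default)(raw)
    return SentimentSettings(**overrides)


settings = load_settings({"PHASE5_SENTIMENT_SHIFT_THRESHOLD": "0.2"})
print(settings.sentiment_shift_threshold)  # 0.2
print(settings.sentiment_batch_size)       # 100
```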

Performance

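Throughput: ~100 articles/hour with the default settings (one batch of 100 per hourly run; inference itself takes ~50ms per article on CPU)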
Memory: ~500MB (model loaded)
Storage: ~50 bytes per sentiment record

Use Cases

Monitor Topic Sentiment

Track sentiment for specific topics:

# Daily check for "AI Safety" sentiment
aiwebfeeds nlp sentiment-trend "AI Safety" --days 7

Detect Controversies

Identify topics with negative sentiment spikes:

# Topics whose 7-day moving average moved by more than 0.5; negative Change values indicate a drop
aiwebfeeds nlp sentiment-shifts --threshold 0.5

Compare Competing Approaches

# Compare sentiment for competing techniques
aiwebfeeds nlp sentiment-compare "RLHF" "Constitutional AI"

Model Details

DistilBERT Architecture

Base Model: BERT distilled to 66M parameters (40% smaller)
Training: Fine-tuned on SST-2 (Stanford Sentiment Treebank)
Input: Max 512 tokens (articles truncated to ~2000 chars)
Output: Binary classification (positive/negative) with confidence

Limitations

Context Window: Only first 512 tokens considered
Binary Classification: Model trained for binary sentiment (positive/negative), neutral inferred
Domain Shift: SST-2 is movie reviews; AI articles may differ
No Fine-tuning: Pre-trained model used as-is (no domain adaptation)

Troubleshooting

Low Confidence Scores

Symptom: All sentiment predictions have low confidence (<0.6).

Cause: Articles are too long, so the model sees only the truncated beginning.

Solution: Raise the character truncation limit (the model itself still caps input at 512 tokens) or apply extractive summarization before analysis.

Model Download Fails

Symptom: OSError: Can't find model

Solution:

# Models auto-download to ~/.cache/huggingface/hub
# Ensure an internet connection and enough disk space (~268MB)

# Manual download:
python -c "from transformers import pipeline; pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')"

Sentiment Shifts Not Detected

Symptom: No shifts reported despite obvious sentiment changes.

Cause: Threshold too high.

Solution:

# Lower threshold to 0.2
export PHASE5_SENTIMENT_SHIFT_THRESHOLD=0.2

Future Enhancements

Domain-Specific Fine-tuning: Train on AI article sentiment labels
Aspect-Based Sentiment: Sentiment for specific entities/topics within articles
Multilingual Support: Add models for non-English content
Real-Time Alerts: Webhook notifications for sentiment shifts
