Source types: blog, newsletter, podcast, journal, preprint, organization, aggregator, video, docs, forum, dataset, code-repo
Content mediums: text, audio, video, code, data
Topic classification with relevance weights
Language and localization support
Quality scoring and curation status
Contributor attribution

Advanced Fetching

Comprehensive Metadata Extraction

Extracts 100+ fields from feeds:

Basic info: title, subtitle, description, link, language, copyright, generator
Author/publisher: name, email, managing editor, webmaster
Visual assets: images, logos, icons
Technical: TTL, skip hours/days, cloud config, PubSubHubbub (now standardized as WebSub)
Extensions: iTunes podcast metadata, Dublin Core, Media RSS, GeoRSS
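
As a rough illustration of channel-level extraction, the sketch below pulls a few of the fields listed above out of an RSS 2.0 document using only the standard library. The function name `extract_channel_metadata`, the output keys, and the sample feed are assumptions made for this example; they are not the project's actual extractor.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the iTunes podcast and Dublin Core extensions.
NS = {
    "itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative sample feed (not real data).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <description>An illustrative feed</description>
    <language>en</language>
    <ttl>60</ttl>
    <itunes:author>Example Author</itunes:author>
    <dc:publisher>Example Press</dc:publisher>
  </channel>
</rss>"""

# Output key -> element path inside <channel> (hypothetical selection).
FIELD_PATHS = {
    "title": "title",
    "link": "link",
    "description": "description",
    "language": "language",
    "copyright": "copyright",
    "itunes_author": "itunes:author",
    "dc_publisher": "dc:publisher",
}


def extract_channel_metadata(xml_text: str) -> dict:
    """Collect a few channel-level fields; absent fields become None."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    meta = {
        key: channel.findtext(path, default=None, namespaces=NS)
        for key, path in FIELD_PATHS.items()
    }
    # TTL is a number of minutes; keep it only if it parses cleanly.
    ttl = channel.findtext("ttl")
    meta["ttl"] = int(ttl) if ttl and ttl.strip().isdigit() else None
    return meta


metadata = extract_channel_metadata(SAMPLE_RSS)
```

Fields missing from the feed come back as `None`, which keeps a later completeness check simple.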

Quality Assessment

Three scores, each on a 0-1 scale:

Completeness Score: Measures metadata completeness
Richness Score: Evaluates content depth and quality
Structure Score: Assesses feed validity and structure
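
One plausible way to compute a completeness score is the fraction of expected metadata fields that are present and non-empty. The sketch below assumes this definition; the field list `EXPECTED_FIELDS` and the function `completeness_score` are hypothetical, and the project's actual formula may weight fields differently.

```python
# Hypothetical set of fields a "complete" feed is expected to provide.
EXPECTED_FIELDS = ("title", "description", "link", "language", "author", "image")


def completeness_score(metadata: dict) -> float:
    """Return the share of expected fields that are present and non-empty (0-1)."""
    present = sum(1 for field in EXPECTED_FIELDS if metadata.get(field))
    return present / len(EXPECTED_FIELDS)


partial = {"title": "Example Feed", "link": "https://example.com/", "language": "en"}
score = completeness_score(partial)  # 3 of 6 expected fields are present
```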

Content Analysis

Item statistics (total, with content, with authors, with media)
Average content lengths
Publishing frequency detection
Update pattern analysis
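
Publishing frequency can be estimated from the gaps between item timestamps. The following is a minimal sketch that classifies a feed by its median gap; the function name `classify_frequency` and the thresholds (1.5, 10, and 45 days) are assumptions chosen for illustration, not values taken from the project.

```python
from datetime import datetime, timedelta
from statistics import median


def classify_frequency(published: list) -> str:
    """Label a feed by the median gap, in days, between consecutive items."""
    if len(published) < 2:
        return "unknown"
    ordered = sorted(published)
    gaps_in_days = [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(ordered, ordered[1:])
    ]
    typical_gap = median(gaps_in_days)
    # Thresholds are illustrative assumptions.
    if typical_gap <= 1.5:
        return "daily"
    if typical_gap <= 10:
        return "weekly"
    if typical_gap <= 45:
        return "monthly"
    return "irregular"


start = datetime(2024, 1, 1)
weekly_items = [start + timedelta(days=7 * i) for i in range(5)]
daily_items = [start + timedelta(days=i) for i in range(5)]
```

Using the median rather than the mean keeps a single long pause from skewing the label.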

Reliability Features

Conditional requests using ETag and Last-Modified headers
Automatic retry with exponential backoff
Configurable timeouts
Comprehensive error logging
Success rate tracking
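
The conditional-request and retry behavior described above can be sketched with the standard library alone. The helper names (`conditional_headers`, `backoff_delays`, `fetch_with_retry`) and the default values are assumptions for this example; the project's fetcher may use a different HTTP client and different defaults.

```python
import time
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None) -> dict:
    """Build validators so the server can answer 304 Not Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> list:
    """Delay before each retry: base, base*factor, ... capped at `cap` seconds."""
    return [min(base * factor ** attempt, cap) for attempt in range(retries)]


def fetch_with_retry(url, etag=None, last_modified=None, retries=3, timeout=10.0):
    """Return (status, body); status 304 means the cached copy is still current."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status, response.read()
        except urllib.error.HTTPError as exc:
            # urllib reports 304 as an HTTPError; treat it as "not modified".
            if exc.code == 304:
                return 304, b""
            # Client errors are not retried; server errors are, until exhausted.
            if exc.code < 500 or attempt == retries:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, timeout).
            if attempt == retries:
                raise
        time.sleep(delays[attempt])
```

A caller would store the `ETag` and `Last-Modified` response headers from a successful fetch and pass them back on the next run.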

Analytics & Reporting

Overview Statistics

Total feeds, items, and topics
Feed status distribution (verified, active, inactive, archived)
Recent activity tracking (24h, 7d, 30d)

Distribution Analysis

Source type distribution
Content medium distribution
Topic distribution across feeds
Language distribution
Geographic distribution (via GeoRSS)

Performance Metrics

Fetch success/failure rates
Average fetch duration
Error type distribution
HTTP status code analysis
Bandwidth usage
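
To make these metrics concrete, here is a small sketch that aggregates success rate, average duration, and error and status-code distributions from a list of fetch records. The `FetchRecord` dataclass is a hypothetical stand-in for a row of fetch history; its field names are assumptions, not the project's schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchRecord:
    """Hypothetical stand-in for one row of fetch history."""
    feed_id: str
    success: bool
    duration_ms: float
    status_code: Optional[int] = None
    error_type: Optional[str] = None


def summarize(records: list) -> dict:
    """Aggregate fetch outcomes into the headline performance metrics."""
    total = len(records)
    if total == 0:
        return {"total": 0, "success_rate": None, "avg_duration_ms": None,
                "errors": Counter(), "status_codes": Counter()}
    return {
        "total": total,
        "success_rate": sum(r.success for r in records) / total,
        "avg_duration_ms": sum(r.duration_ms for r in records) / total,
        "errors": Counter(r.error_type for r in records if r.error_type),
        "status_codes": Counter(
            r.status_code for r in records if r.status_code is not None
        ),
    }


sample = [
    FetchRecord("a", True, 100.0, 200),
    FetchRecord("b", True, 200.0, 200),
    FetchRecord("c", True, 300.0, 304),
    FetchRecord("d", False, 400.0, 503, "http_error"),
]
summary = summarize(sample)
```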

Content Intelligence

Content coverage analysis
Author attribution tracking
Category and tag analysis
Publishing trends by time/day
Content freshness metrics

Feed Health Monitoring

Per-feed health scores (0-1)
Health status (Excellent, Good, Fair, Poor, Critical)
Success rate tracking
Content quality metrics
Publishing frequency analysis
Historical trend analysis

Contributor Analytics

Top contributors by feed count
Verification rates
Quality benchmarking
Contribution timeline

Reporting

JSON reports: Full analytics export
OPML export: For feed readers
CSV export: Via Python API
Custom queries: Database access

Platform-Specific Integration

Supported Platforms

Social/Community:

Reddit: Subreddits and user feeds with sorting (hot, top, new)
Hacker News: Multiple feed types (frontpage, newest, best, ask, show, jobs)
Dev.to: User and organization feeds

Publishing:

Medium: Publications, users, and tags
Substack: Newsletter feeds
GitHub: Releases, commits, tags, activity

Media:

YouTube: Channels and playlists
Podcasts: iTunes podcast metadata support
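
For several of these platforms, feed URLs follow publicly known patterns. The sketch below builds a few of them; the function names are hypothetical, the patterns reflect how these sites have exposed feeds at the time of writing and may change, and the project's own generators may cover more cases or work differently.

```python
from urllib.parse import quote, urlencode


def reddit_feed(subreddit: str, sort: str = "hot") -> str:
    """Subreddit feed for one of the sort orders named above."""
    if sort not in {"hot", "top", "new"}:
        raise ValueError(f"unsupported sort: {sort}")
    return f"https://www.reddit.com/r/{quote(subreddit)}/{sort}/.rss"


def youtube_feed(channel_id: str = "", playlist_id: str = "") -> str:
    """Channel or playlist feed; exactly one identifier must be given."""
    if bool(channel_id) == bool(playlist_id):
        raise ValueError("pass exactly one of channel_id or playlist_id")
    key, value = (
        ("channel_id", channel_id) if channel_id else ("playlist_id", playlist_id)
    )
    return "https://www.youtube.com/feeds/videos.xml?" + urlencode({key: value})


def github_feed(owner: str, repo: str, kind: str = "releases") -> str:
    """Atom feed for a repository's releases, commits, or tags."""
    if kind not in {"releases", "commits", "tags"}:
        raise ValueError(f"unsupported kind: {kind}")
    return f"https://github.com/{quote(owner)}/{quote(repo)}/{kind}.atom"


def substack_feed(newsletter: str) -> str:
    """Feed for a newsletter hosted on a substack.com subdomain."""
    return f"https://{newsletter}.substack.com/feed"


def medium_feed(user: str = "", tag: str = "") -> str:
    """Feed for a Medium user (without the leading @) or a tag."""
    if user:
        return f"https://medium.com/feed/@{quote(user)}"
    if tag:
        return f"https://medium.com/feed/tag/{quote(tag)}"
    raise ValueError("pass a user or a tag")
```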

Auto-Discovery

Automatic feed URL generation for known platforms
HTML-based feed discovery for generic sites
Common feed URL pattern detection
Platform-specific configuration support
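
HTML-based discovery usually means scanning a page for `<link rel="alternate">` elements that advertise a feed. Here is a minimal standard-library sketch of that idea; the class `FeedLinkParser` and function `discover_feeds` are names invented for this example and are not the project's API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that commonly identify a feed in a <link> element.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/feed+json",
}


class FeedLinkParser(HTMLParser):
    """Collect absolute feed URLs advertised via <link rel="alternate">."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html: str, base_url: str) -> list:
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


PAGE = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="alternate" type="application/rss+xml" href="/feed.xml">
  <link rel="alternate" type="application/atom+xml" href="https://example.com/atom" />
</head><body></body></html>
"""
found = discover_feeds(PAGE, "https://example.com/blog/")
```

When a page advertises nothing, a fallback could probe common paths such as `/feed` or `/rss.xml`, in line with the pattern detection mentioned above.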

Data Storage

Database Schema

SQLModel-based ORM for type safety
Support for SQLite and PostgreSQL
Efficient relationship management
JSON columns for flexible metadata storage

Models

FeedSource: Main feed registry with metadata
FeedItem: Individual feed entries
FeedFetchLog: Detailed fetch history and metrics
Topic: Topic taxonomy and relationships
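
The sketch below illustrates the storage idea (a feed registry, a fetch log, and a JSON column for flexible metadata) using the standard-library `sqlite3` module rather than SQLModel, so that it runs without extra dependencies. Table and column names are assumptions for illustration and do not reproduce the project's actual models.

```python
import json
import sqlite3

# Hypothetical, simplified schema: one registry table and one fetch log.
SCHEMA = """
CREATE TABLE feed_source (
    id    TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    url   TEXT NOT NULL,
    extra TEXT NOT NULL DEFAULT '{}'   -- JSON-encoded flexible metadata
);
CREATE TABLE feed_fetch_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    feed_id     TEXT NOT NULL REFERENCES feed_source(id),
    success     INTEGER NOT NULL,
    duration_ms REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Fields without a dedicated column go into the JSON blob.
conn.execute(
    "INSERT INTO feed_source (id, title, url, extra) VALUES (?, ?, ?, ?)",
    ("example", "Example Feed", "https://example.com/feed.xml",
     json.dumps({"source_type": "blog", "language": "en"})),
)
conn.executemany(
    "INSERT INTO feed_fetch_log (feed_id, success, duration_ms) VALUES (?, ?, ?)",
    [("example", 1, 120.0), ("example", 1, 95.0),
     ("example", 0, 3000.0), ("example", 1, 110.0)],
)

# Join the registry with its fetch history to get a per-feed success rate.
title, extra_json, success_rate = conn.execute(
    """
    SELECT s.title, s.extra, AVG(l.success)
    FROM feed_source AS s
    JOIN feed_fetch_log AS l ON l.feed_id = s.id
    GROUP BY s.id
    """
).fetchone()
extra = json.loads(extra_json)
```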

Export & Interoperability

OPML Export

Standard OPML format
Categorized OPML by source type
Filtered OPML generation
Compatible with feed readers that import standard OPML
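
A categorized OPML file is a set of nested `outline` elements: one folder per category, with one child per feed. The sketch below groups feeds by source type using the standard library; the function `build_opml` and the input dictionary keys are assumptions for this example, not the project's exporter.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative input; keys are hypothetical.
FEEDS = [
    {"title": "Blog A", "url": "https://a.example/feed", "source_type": "blog"},
    {"title": "Cast B", "url": "https://b.example/rss", "source_type": "podcast"},
    {"title": "Blog C", "url": "https://c.example/atom", "source_type": "blog"},
]


def build_opml(feeds: list, title: str = "Feeds") -> str:
    """Return an OPML 2.0 document with one folder outline per source type."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    grouped = defaultdict(list)
    for feed in feeds:
        grouped[feed["source_type"]].append(feed)

    for source_type in sorted(grouped):
        folder = ET.SubElement(body, "outline", text=source_type, title=source_type)
        for feed in grouped[source_type]:
            ET.SubElement(
                folder, "outline",
                type="rss", text=feed["title"], title=feed["title"],
                xmlUrl=feed["url"],
            )
    return ET.tostring(opml, encoding="unicode")


opml_text = build_opml(FEEDS, title="AI Web Feeds")
```

Filtered OPML follows from the same function by passing only the feeds that match a given criterion.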

Data Formats

YAML: Human-editable feed configuration
JSON: API consumption and export
JSON Schema: Validation and documentation
SQL: Direct database queries

CLI Tools

Feed Management

ai-web-feeds enrich all        # Enrich feeds with metadata
ai-web-feeds validate          # Validate feed configuration
ai-web-feeds export            # Export to various formats

Data Fetching

ai-web-feeds fetch one <id>    # Fetch single feed
ai-web-feeds fetch all         # Fetch all feeds

Analytics

ai-web-feeds analytics overview        # Dashboard view
ai-web-feeds analytics distributions   # Distribution analysis
ai-web-feeds analytics quality         # Quality metrics
ai-web-feeds analytics performance     # Fetch performance
ai-web-feeds analytics content         # Content statistics
ai-web-feeds analytics trends          # Publishing trends
ai-web-feeds analytics health <id>     # Feed health report
ai-web-feeds analytics report          # Full JSON report

OPML Management

ai-web-feeds opml generate     # Generate OPML files
ai-web-feeds opml categorize   # Generate categorized OPML

Quality & Curation

Curation Workflow

Verification status tracking
Quality score calculation (automated)
Curation notes and metadata
Contributor attribution
Curation history

Quality Dimensions

Completeness (0-1): Metadata completeness
Richness (0-1): Content depth and quality
Structure (0-1): Feed validity and structure

Health Status

Excellent (0.8-1.0): Optimal performance
Good (0.6-0.8): Healthy with minor issues
Fair (0.4-0.6): Some problems present
Poor (0.2-0.4): Needs attention
Critical (0.0-0.2): Failing/broken
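
The bands above share their boundary values (0.8 appears in both Excellent and Good, for example), so any mapping has to pick a side. The sketch below assumes the lower bound of each band is inclusive; that choice and the function name `health_status` are assumptions, not documented behavior.

```python
# (lower bound, label) pairs, checked from best to worst.
# Assumption: a score exactly on a boundary belongs to the higher band.
HEALTH_BANDS = (
    (0.8, "Excellent"),
    (0.6, "Good"),
    (0.4, "Fair"),
    (0.2, "Poor"),
    (0.0, "Critical"),
)


def health_status(score: float) -> str:
    """Map a 0-1 health score to its status label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, label in HEALTH_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: 0.0 is the lowest bound")
```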

Extensibility

Plugin Architecture

Custom platform generators
Configurable discovery rules
Extension metadata support
Flexible JSON storage for unknown fields

API Design

Clean Python API for programmatic use
Rich CLI for interactive use
Database session management
Async/await support for concurrent operations
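
To show what async/await support can look like for concurrent fetching, here is a sketch that fetches several feeds at once while limiting how many run at the same time. The network call is simulated with `asyncio.sleep`, and `fetch_feed`, `fetch_all`, and `max_concurrent` are hypothetical names rather than the project's API.

```python
import asyncio


async def fetch_feed(feed_id: str, delay: float = 0.01):
    """Simulated fetch: a real implementation would perform an HTTP request."""
    await asyncio.sleep(delay)
    return feed_id, "ok"


async def fetch_all(feed_ids: list, max_concurrent: int = 5) -> list:
    """Fetch all feeds concurrently, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(feed_id: str):
        async with semaphore:
            return await fetch_feed(feed_id)

    # gather returns results in the same order as the input feed_ids.
    return await asyncio.gather(*(bounded(feed_id) for feed_id in feed_ids))


results = asyncio.run(fetch_all(["feed-a", "feed-b", "feed-c"], max_concurrent=2))
```

The semaphore keeps a large registry from opening hundreds of connections at once, which matters when many feeds share a host.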

Use Cases

Content Aggregation: Build comprehensive AI/ML content aggregators
Research: Track and analyze AI/ML publication patterns
Monitoring: Monitor feed health and reliability
Discovery: Find new AI/ML content sources
Analysis: Analyze publishing trends and patterns
Curation: Build high-quality curated feed lists
Integration: Feed data into other systems via exports
Alerting: Get notified when feeds break or new content is published

Architecture

ai-web-feeds/
├── packages/ai_web_feeds/     # Core library
│   ├── models.py              # Data models
│   ├── storage.py             # Database management
│   ├── utils.py               # Feed discovery & enrichment
│   ├── fetcher.py             # Advanced feed fetching
│   └── analytics.py           # Analytics engine
├── apps/cli/                  # CLI application
│   └── commands/              # CLI commands
│       ├── fetch.py           # Fetch commands
│       ├── analytics.py       # Analytics commands
│       ├── enrich.py          # Enrichment commands
│       ├── export.py          # Export commands
│       ├── opml.py            # OPML commands
│       └── validate.py        # Validation commands
└── data/                      # Data files
    ├── feeds.yaml             # Feed registry
    ├── topics.yaml            # Topic taxonomy
    └── aiwebfeeds.db          # SQLite database