Twitter/X and arXiv Integration

Generate RSS feeds from Twitter/X and arXiv for AI research tracking

Overview

AI Web Feeds provides native integrations for Twitter/X and arXiv, enabling you to track AI researchers, discussions, and papers through RSS feeds.

The Twitter/X integration generates RSS feeds through Nitter, a privacy-focused alternative frontend for Twitter.

Twitter/X Integration

Supported Feed Types

Get tweets from a specific user.

- id: "karpathy-twitter"
  site: "https://twitter.com/karpathy"
  title: "Andrej Karpathy on Twitter"
  topics: ["ai", "ml", "research"]
  source_type: "twitter"
  mediums: ["text"]
  platform_config:
    platform: "twitter"
    twitter:
      username: "karpathy"
      nitter_instance: "nitter.net"  # Optional, defaults to nitter.net

Generated Feed URL: https://nitter.net/karpathy/rss

Get tweets from a Twitter list.

- id: "ai-researchers-list"
  site: "https://twitter.com/i/lists/1234567890"
  title: "AI Researchers List"
  topics: ["ai", "research"]
  source_type: "twitter"
  platform_config:
    platform: "twitter"
    twitter:
      list_id: "1234567890"

Generated Feed URL: https://nitter.net/i/lists/1234567890/rss

Get tweets matching a search query.

- id: "twitter-llm-search"
  site: "https://twitter.com/search"
  title: "Twitter Search - LLM discussions"
  topics: ["llm", "community"]
  source_type: "twitter"
  platform_config:
    platform: "twitter"
    twitter:
      search_query: "LLM OR large language model"

Generated Feed URL: https://nitter.net/search/rss?q=LLM+OR+large+language+model
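The three feed types above map a `platform_config.twitter` entry to a Nitter RSS URL. The following is a minimal sketch of that mapping, not the project's actual implementation: the function name `nitter_feed_url` and the precedence order (username, then list, then search) are assumptions made for illustration.

```python
from urllib.parse import quote_plus

DEFAULT_NITTER_INSTANCE = "nitter.net"


def nitter_feed_url(twitter_config: dict) -> str:
    """Build a Nitter RSS URL from a platform_config.twitter mapping.

    Hypothetical helper: the precedence between fields is an assumption.
    """
    instance = twitter_config.get("nitter_instance", DEFAULT_NITTER_INSTANCE)
    base = f"https://{instance}"

    if "username" in twitter_config:
        # User timeline; a stray leading "@" is tolerated and stripped.
        username = twitter_config["username"].lstrip("@")
        return f"{base}/{username}/rss"
    if "list_id" in twitter_config:
        return f"{base}/i/lists/{twitter_config['list_id']}/rss"
    if "search_query" in twitter_config:
        # quote_plus encodes spaces as "+", matching the documented search URL.
        return f"{base}/search/rss?q={quote_plus(twitter_config['search_query'])}"
    raise ValueError("expected one of: username, list_id, search_query")
```

Called with the three example configurations, this sketch reproduces the generated feed URLs shown above.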

Configuration Schema

The platform_config.twitter object supports:

| Field | Type | Description |
|-------|------|-------------|
| username | string | Twitter username (without @) |
| list_id | string | Twitter list ID |
| search_query | string | Twitter search query |
| nitter_instance | string | Nitter instance URL (default: nitter.net) |

Alternative Nitter Instances

For reliability, you can use different Nitter instances:

  • nitter.net (default)
  • nitter.privacy.com.de
  • nitter.1d4.us
  • nitter.kavin.rocks

Note: Nitter instances may have rate limits or availability issues. Consider using multiple instances for redundancy.
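One way to act on the redundancy advice is to probe instances in order and use the first one that responds. The sketch below is an illustration only and is not a feature of AI Web Feeds: `first_available_instance` and the `is_available` probe are hypothetical names, and the probe is injected so that the selection logic stays independent of any particular HTTP client.

```python
from typing import Callable, Iterable, Optional

# Instances listed in this document; their availability is not guaranteed.
NITTER_INSTANCES = [
    "nitter.net",
    "nitter.privacy.com.de",
    "nitter.1d4.us",
    "nitter.kavin.rocks",
]


def first_available_instance(
    instances: Iterable[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first instance for which the probe succeeds, else None."""
    for instance in instances:
        try:
            if is_available(instance):
                return instance
        except Exception:
            # A probe that raises is treated the same as an unavailable instance.
            continue
    return None
```

In practice the probe might issue a short-timeout HTTP request against the instance, and the chosen host would then be written to `nitter_instance` in the feed configuration.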

arXiv Integration

Supported Feed Types

RSS feeds for specific arXiv categories.

- id: "arxiv-cs-lg"
  site: "https://arxiv.org/list/cs.LG/recent"
  title: "arXiv - Computer Science - Machine Learning"
  topics: ["research", "papers", "ml"]
  source_type: "arxiv"
  mediums: ["text"]
  platform_config:
    platform: "arxiv"
    arxiv:
      category: "cs.LG"

Generated Feed URL: http://export.arxiv.org/rss/cs.LG

Papers by specific authors.

- id: "arxiv-bengio"
  site: "https://arxiv.org"
  title: "arXiv - Yoshua Bengio papers"
  topics: ["research", "papers", "ml"]
  source_type: "arxiv"
  platform_config:
    platform: "arxiv"
    arxiv:
      author: "Yoshua Bengio"
      max_results: 50

Generated Feed URL: http://export.arxiv.org/api/query?search_query=au:Yoshua+Bengio&max_results=50&sortBy=submittedDate&sortOrder=descending

Papers matching an advanced search query.

- id: "arxiv-transformer-search"
  site: "https://arxiv.org"
  title: "arXiv - Transformer papers"
  topics: ["research", "nlp"]
  source_type: "arxiv"
  platform_config:
    platform: "arxiv"
    arxiv:
      search_query: "all:transformer AND all:attention"
      max_results: 100

Generated Feed URL: http://export.arxiv.org/api/query?search_query=all:transformer+AND+all:attention&max_results=100&sortBy=submittedDate&sortOrder=descending
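The arXiv feed types above follow two URL patterns: a category RSS path, and an API query sorted by submission date. The sketch below shows how a `platform_config.arxiv` entry could be turned into those URLs. The function name `arxiv_feed_url` and the field precedence are assumptions for illustration, not the project's confirmed implementation.

```python
from urllib.parse import urlencode

ARXIV_EXPORT = "http://export.arxiv.org"
DEFAULT_MAX_RESULTS = 50  # default documented in the configuration schema


def arxiv_feed_url(arxiv_config: dict) -> str:
    """Build an arXiv RSS or API URL from a platform_config.arxiv mapping.

    Hypothetical helper: the precedence between fields is an assumption.
    """
    if "category" in arxiv_config:
        return f"{ARXIV_EXPORT}/rss/{arxiv_config['category']}"

    if "author" in arxiv_config:
        # Author feeds are expressed as an "au:" search.
        search_query = f"au:{arxiv_config['author']}"
    elif "search_query" in arxiv_config:
        search_query = arxiv_config["search_query"]
    else:
        raise ValueError("expected one of: category, author, search_query")

    params = {
        "search_query": search_query,
        "max_results": arxiv_config.get("max_results", DEFAULT_MAX_RESULTS),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # safe=":" keeps field prefixes such as "au:" unescaped; spaces become "+".
    return f"{ARXIV_EXPORT}/api/query?{urlencode(params, safe=':')}"
```

With the three example configurations, this sketch yields the same generated feed URLs as listed above.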

Configuration Schema

The platform_config.arxiv object supports:

| Field | Type | Description |
|-------|------|-------------|
| category | string | arXiv category (e.g., cs.LG, stat.ML) |
| author | string | Author name for author-specific feeds |
| search_query | string | Advanced search query |
| max_results | integer | Maximum number of results (default: 50) |

Commonly used categories:

  • cs.LG - Machine Learning
  • cs.AI - Artificial Intelligence
  • cs.CL - Computation and Language (NLP)
  • cs.CV - Computer Vision and Pattern Recognition
  • cs.NE - Neural and Evolutionary Computing
  • stat.ML - Machine Learning (Statistics)
  • cs.RO - Robotics
  • cs.IR - Information Retrieval

arXiv Search Syntax

When using search_query, you can use arXiv's field prefixes and boolean operators:

  • au:author_name - Author search
  • ti:title_words - Title search
  • abs:abstract_words - Abstract search
  • all:keywords - Search all fields
  • cat:category - Category search (e.g., cat:cs.LG)
  • Use AND, OR, ANDNOT for boolean queries

Example: all:transformer AND cat:cs.LG
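Queries like the example above can be assembled programmatically from (field, value) pairs. The helper below is a small sketch with a hypothetical name, `build_search_query`. It joins all terms with a single operator and does not handle quoting of multi-word phrases or nested grouping.

```python
from typing import Iterable, Tuple

VALID_FIELDS = {"au", "ti", "abs", "all", "cat"}
VALID_OPERATORS = {"AND", "OR", "ANDNOT"}


def build_search_query(terms: Iterable[Tuple[str, str]], operator: str = "AND") -> str:
    """Join field-prefixed terms with one boolean operator.

    Hypothetical helper for illustration; values are used as given.
    """
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")

    parts = []
    for field, value in terms:
        if field not in VALID_FIELDS:
            raise ValueError(f"unsupported field prefix: {field}")
        parts.append(f"{field}:{value}")

    if not parts:
        raise ValueError("at least one term is required")
    return f" {operator} ".join(parts)
```

For example, `build_search_query([("all", "transformer"), ("cat", "cs.LG")])` produces the string shown in the example above, which can then be used as the `search_query` value.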

Implementation Details

Platform Detection

The system automatically detects Twitter/X and arXiv URLs:

Twitter/X domains:

  • twitter.com, www.twitter.com
  • x.com, www.x.com

arXiv domains:

  • arxiv.org, www.arxiv.org
  • export.arxiv.org
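Detection by domain can be sketched as a hostname lookup against the two lists above. The function name `detect_platform` and the exact-match rule are assumptions; the project's own detection logic may differ.

```python
from typing import Optional
from urllib.parse import urlparse

# Domains as listed in this section.
PLATFORM_DOMAINS = {
    "twitter": {"twitter.com", "www.twitter.com", "x.com", "www.x.com"},
    "arxiv": {"arxiv.org", "www.arxiv.org", "export.arxiv.org"},
}


def detect_platform(url: str) -> Optional[str]:
    """Return "twitter", "arxiv", or None based on the URL's hostname."""
    # urlparse lowercases the hostname; it is None for URLs without a netloc.
    host = urlparse(url).hostname or ""
    for platform, domains in PLATFORM_DOMAINS.items():
        if host in domains:
            return platform
    return None
```

Matching the full hostname rather than a substring avoids false positives on lookalike domains that merely contain "twitter.com" or "arxiv.org".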

Feed URL Generation

Platform-specific generators:

  1. generate_twitter_feed_url(url, platform_config) - Generates Nitter RSS URLs
  2. generate_arxiv_feed_url(url, platform_config) - Generates arXiv RSS/API URLs

These are automatically called during feed discovery.
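Both generators take the site URL as well as `platform_config`. One plausible use of the URL, and this is an assumption rather than documented behavior, is to derive a username or category when the configuration does not name one. The helper names below are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Path prefixes used for lists and search, which are not profile pages.
RESERVED_TWITTER_PATHS = {"i", "search"}


def username_from_site(site: str) -> Optional[str]:
    """Extract a username from a profile URL such as https://twitter.com/karpathy."""
    segments = [s for s in urlparse(site).path.split("/") if s]
    # Only a single, non-reserved path segment is treated as a profile.
    if len(segments) == 1 and segments[0] not in RESERVED_TWITTER_PATHS:
        return segments[0].lstrip("@")
    return None


def category_from_site(site: str) -> Optional[str]:
    """Extract a category from a listing URL such as https://arxiv.org/list/cs.LG/recent."""
    match = re.match(r"^/list/([A-Za-z.\-]+)(?:/|$)", urlparse(site).path)
    return match.group(1) if match else None
```

Returning `None` for unrecognized URLs leaves the decision about a fallback, or an error, to the caller.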

Testing

Run the integration tests:

# All Twitter/arXiv tests
aiwebfeeds test file test_utils.py -k "twitter or arxiv"

# Specific test class
aiwebfeeds test file test_utils.py -k "TestTwitterIntegration"
aiwebfeeds test file test_utils.py -k "TestArxivIntegration"

Usage Examples

Adding a Twitter Feed

Add to data/feeds.yaml:

- id: "your-twitter-feed"
  site: "https://twitter.com/username"
  title: "Feed Title"
  topics: ["ai"]
  source_type: "twitter"
  platform_config:
    platform: "twitter"

Adding an arXiv Feed

Add to data/feeds.yaml:

- id: "your-arxiv-feed"
  site: "https://arxiv.org/list/cs.LG/recent"
  title: "Feed Title"
  topics: ["research", "ml"]
  source_type: "arxiv"
  platform_config:
    platform: "arxiv"

Limitations

Twitter/X

  • Relies on Nitter instances, which may have rate limits or availability issues
  • Nitter instances may be blocked or shut down
  • Consider using multiple Nitter instances for redundancy

arXiv

  • RSS feeds update once per day (overnight)
  • API queries limited to 100 results maximum
  • API has rate limiting (3 seconds between requests recommended)
  • Author searches may return false positives for common names
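The recommended 3-second gap between API requests can be enforced with a small throttle. The sketch below is an illustration, not part of AI Web Feeds: the class name `RequestThrottle` is hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RequestThrottle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float = 3.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each arXiv API request keeps consecutive requests at least `min_interval` seconds apart.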

Best Practices

  1. Twitter/X: Monitor your chosen Nitter instance for availability
  2. arXiv: Use specific categories rather than broad searches for better signal
  3. arXiv: Set an appropriate max_results to avoid overwhelming feeds
  4. Both: Use topic_weights to indicate relevance when a feed covers multiple topics

Future Enhancements

Potential improvements:

  • Automatic Nitter instance failover
  • arXiv paper metadata enrichment
  • Twitter thread reconstruction
  • arXiv citation tracking
  • Integration with arXiv vanity for better author disambiguation