Intelligent feed search with autocomplete, faceted filtering, and semantic similarity

Search & Discovery

Status: ✅ Fully Implemented
Phase: Phase 1 (MVP)
Completion: 100%

The Search & Discovery feature enables users to find feeds through full-text search, autocomplete suggestions, faceted filtering, and semantic similarity.

Features

Unified Search Interface

Single search bar at /search with real-time autocomplete (<200ms response time).

Full-Text Search

Search Across: Feed titles, descriptions, recent article titles (if cached)
Ranking: TF-IDF scoring with boost factors:
- Verified feeds: +20%
- Active feeds: +10%
- Popular feeds: +5%
Highlighting: Search terms bolded in result snippets
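
The boost factors above could be applied along the lines of the following sketch. It assumes the boosts compound multiplicatively on top of a base TF-IDF score and that each feed record carries `verified`, `active`, and `popular` flags. The source does not specify either detail, so the function and field names here are hypothetical.

```javascript
// Boost multipliers derived from the percentages listed above.
const BOOSTS = { verified: 1.2, active: 1.1, popular: 1.05 };

// Apply every matching boost to a base relevance score.
// Assumption: boosts multiply together; an additive scheme would also fit the text.
function boostedScore(baseScore, feed) {
  let score = baseScore;
  for (const [flag, factor] of Object.entries(BOOSTS)) {
    if (feed[flag]) score *= factor;
  }
  return score;
}

// Sort results by boosted score, highest first, without mutating the input.
function rankResults(results) {
  return [...results].sort(
    (a, b) => boostedScore(b.score, b.feed) - boostedScore(a.score, a.feed)
  );
}
```

Under this scheme a verified feed with a slightly lower text score can outrank an unverified one, which appears to be the intent of the boost table.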

Autocomplete Suggestions

Suggestions are returned within 200ms and include:

Top 5 matching feeds
Top 3 matching topics
Top 3 recent searches (user-specific, localStorage)
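
A minimal sketch of how the three suggestion groups might be assembled is shown below. It assumes case-insensitive prefix matching (the autocomplete endpoint takes a `prefix` parameter) and that the feed and topic lists arrive already ordered by relevance. `buildSuggestions` and its argument shape are hypothetical.

```javascript
// Per-group caps taken from the list above.
const LIMITS = { feeds: 5, topics: 3, recent: 3 };

// Case-insensitive prefix match on a label.
function matchesPrefix(label, prefix) {
  return label.toLowerCase().startsWith(prefix.toLowerCase());
}

// Build grouped suggestions. `feeds` and `topics` are assumed to be sorted
// by relevance already; `recent` is the user's own search history.
function buildSuggestions(prefix, { feeds, topics, recent }) {
  const pick = (items, limit) =>
    items.filter((item) => matchesPrefix(item, prefix)).slice(0, limit);
  return {
    feeds: pick(feeds, LIMITS.feeds),
    topics: pick(topics, LIMITS.topics),
    recent: pick(recent, LIMITS.recent),
  };
}
```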

Faceted Filtering

Filter results by multiple criteria (AND logic):

Source Type: blog, podcast, newsletter, video, social, other
Topics: Multi-select from topic taxonomy
Verified Status: Toggle verified-only filter
Activity Status: Active/inactive toggle

Result Count Badges: "Blogs (45)", "Verified (23)" displayed next to each filter option.
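
The AND logic and the count badges can be illustrated with the sketch below. It reuses the `source_type` and `topics` filter keys from the saved-search API payload; the `verified_only` and `active` keys, the feed record shape, and the rule that any selected topic is enough within the multi-select topics facet are assumptions.

```javascript
// A feed passes only if it satisfies every active facet (AND logic).
// Assumption: within the topics facet, matching any selected topic is enough.
function matchesFilters(feed, filters) {
  const { source_type = [], topics = [], verified_only = false, active = null } = filters;
  if (source_type.length > 0 && !source_type.includes(feed.source_type)) return false;
  if (topics.length > 0 && !topics.some((t) => feed.topics.includes(t))) return false;
  if (verified_only && !feed.verified) return false;
  if (active !== null && feed.active !== active) return false;
  return true;
}

// Count results per source type, for badges such as "Blogs (45)".
function sourceTypeCounts(feeds) {
  const counts = {};
  for (const feed of feeds) {
    counts[feed.source_type] = (counts[feed.source_type] || 0) + 1;
  }
  return counts;
}
```

An empty filter object matches every feed, so the same function serves both the unfiltered and filtered views.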

Semantic Search

Toggle "Include similar results" to enable vector similarity search:

Embeddings: Sentence-BERT (384-dim all-MiniLM-L6-v2 model)
Similarity Threshold: ≥0.7 cosine similarity
Configurable Modes:
- Local (default): Sentence-Transformers, zero setup
- Hugging Face API (optional): Requires AIWF_EMBEDDING__HF_API_TOKEN
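
To make the threshold concrete, the sketch below computes cosine similarity between a query embedding and each feed embedding and keeps matches at or above 0.7. It assumes embeddings are plain numeric arrays already produced by the model; `similarFeeds` and the `embedding` field are hypothetical names, and a production system would more likely delegate this to a vector index.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep feeds whose similarity to the query meets the threshold, best first.
function similarFeeds(queryEmbedding, feeds, threshold = 0.7) {
  return feeds
    .map((feed) => ({ feed, similarity: cosineSimilarity(queryEmbedding, feed.embedding) }))
    .filter((result) => result.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity);
}
```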

Saved Searches

Save: Store query + filters with custom name
Replay: One-click load from sidebar
Persistence: Browser localStorage with Export/Import JSON for cross-device transfer
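
The save, export, and import flow might look like the sketch below. The storage key, the function names, and the saved-search record shape (`name`, `query`, `filters`, mirroring the API payload) are assumptions. The functions accept any object with `getItem` and `setItem`, so `window.localStorage` can be passed in a browser, while the in-memory stand-in lets the sketch run elsewhere.

```javascript
const STORAGE_KEY = "savedSearches"; // hypothetical key name

// Minimal in-memory stand-in for the Web Storage API (getItem/setItem).
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

function loadSavedSearches(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : [];
}

// Save under a custom name, replacing any existing entry with the same name.
function saveSearch(storage, name, query, filters = {}) {
  const others = loadSavedSearches(storage).filter((s) => s.name !== name);
  storage.setItem(STORAGE_KEY, JSON.stringify([...others, { name, query, filters }]));
}

// Export as a JSON string that can be moved to another device.
function exportSavedSearches(storage) {
  return JSON.stringify(loadSavedSearches(storage), null, 2);
}

// Import a previously exported JSON string, skipping malformed entries.
// Returns the number of saved searches that were imported.
function importSavedSearches(storage, json) {
  const parsed = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("Expected an array of saved searches");
  const valid = parsed.filter(
    (s) => s && typeof s.name === "string" && typeof s.query === "string"
  );
  storage.setItem(STORAGE_KEY, JSON.stringify(valid));
  return valid.length;
}
```

In this sketch an import replaces the existing list; merging by name would be an equally reasonable choice.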

Search History

The last 10 searches are stored per user, in localStorage for anonymous users or in the database for logged-in users.
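
A capped history can be maintained with a small pure function such as the one sketched below. Moving a repeated query to the front instead of storing it twice is an assumption, as is the `addToHistory` name.

```javascript
const HISTORY_LIMIT = 10;

// Return a new history with `query` at the front, capped at HISTORY_LIMIT.
// Assumption: repeating a query moves it to the front rather than duplicating it.
function addToHistory(history, query) {
  const trimmed = query.trim();
  if (trimmed === "") return history; // ignore empty searches
  const withoutDuplicate = history.filter((q) => q !== trimmed);
  return [trimmed, ...withoutDuplicate].slice(0, HISTORY_LIMIT);
}
```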

Configuration

# Autocomplete suggestions limit (5 feeds + 3 topics)
AIWF_SEARCH__AUTOCOMPLETE_LIMIT=8

# Full-text search results per page
AIWF_SEARCH__FULL_TEXT_LIMIT=20

# Semantic similarity threshold (0.0-1.0)
AIWF_SEARCH__SEMANTIC_SIMILARITY_THRESHOLD=0.7

# Embedding provider: "local" or "huggingface"
AIWF_EMBEDDING__PROVIDER=local

# Hugging Face API token (optional, for HF provider)
AIWF_EMBEDDING__HF_API_TOKEN=

# Hugging Face model name
AIWF_EMBEDDING__HF_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Local model name
AIWF_EMBEDDING__LOCAL_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Embedding cache size (LRU)
AIWF_EMBEDDING__EMBEDDING_CACHE_SIZE=1000

Usage

Web Interface

Navigate to /search:

Type query in search bar
Select autocomplete suggestion or press Enter
Apply faceted filters (left sidebar)
Toggle "Include similar results" for semantic search
Click "Save Search" to store query for later

Keyboard Shortcuts:

Cmd/Ctrl+K: Focus search bar
Arrow keys: Navigate autocomplete suggestions
Enter: Execute search
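
One way to wire these shortcuts is to map each `keydown` event to an action name and let the UI react to it, as in this sketch. The action names are hypothetical, and the function only reads the standard `key`, `metaKey`, and `ctrlKey` properties of a keyboard event.

```javascript
// Map a keydown event to a search UI action name, or null if unhandled.
function shortcutAction(event) {
  const key = event.key;
  if ((event.metaKey || event.ctrlKey) && key.toLowerCase() === "k") return "focus-search";
  if (key === "ArrowDown") return "next-suggestion";
  if (key === "ArrowUp") return "previous-suggestion";
  if (key === "Enter") return "execute-search";
  return null;
}
```

In a browser this could be attached with `document.addEventListener("keydown", (e) => { const action = shortcutAction(e); /* dispatch action */ })`.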

CLI

# Full-text search
uv run aiwebfeeds search "transformer attention" --limit 20

# Semantic search
uv run aiwebfeeds search "machine learning" --semantic --threshold 0.7

# Filter by source type and topic
uv run aiwebfeeds search "pytorch" --source-type blog --topic deeplearning

# Save search
uv run aiwebfeeds search save --name "ML Research" --query "deep learning" --topics "llm,training"

API

// Full-text search
const response = await fetch("/api/search?q=transformer&limit=20");
const results = await response.json();

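// Semantic search (URLSearchParams encodes the space in the query)
const semanticParams = new URLSearchParams({ q: "neural networks", semantic: "true", threshold: "0.7" });
const semanticResponse = await fetch(`/api/search?${semanticParams}`);
const semanticResults = await semanticResponse.json();

// Autocomplete
const autocompleteResponse = await fetch("/api/search/autocomplete?prefix=mach");
const suggestions = await autocompleteResponse.json();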

// Save search
await fetch("/api/search/saved", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    name: "AI Research",
    query: "artificial intelligence",
    filters: { source_type: ["blog"], topics: ["llm", "agents"] },
  }),
});

Performance

Autocomplete: <200ms response time (95th percentile, NFR-002)
Full-Text Search: <500ms for 10,000+ feeds (NFR-003)
Semantic Search: <3s total latency (2s vector search + 1s rendering, NFR-004)
FTS5 Scalability: Supports 50,000+ feeds with sub-second queries

Zero Results Handling

When no results are found, the page displays:

Spelling suggestions
"Browse by topic" link
"Suggest a feed" link → GitHub issue template
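
The source does not say how spelling suggestions are generated. One common approach, sketched here purely as an assumption, is to compare the query against a vocabulary of known terms using Levenshtein edit distance and offer the closest matches. `suggestSpelling`, the vocabulary argument, and the distance cutoff of 2 are all hypothetical.

```javascript
// Levenshtein edit distance using a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Return vocabulary terms within maxDistance edits of the query, closest first.
function suggestSpelling(query, vocabulary, maxDistance = 2) {
  const q = query.toLowerCase();
  return vocabulary
    .map((term) => ({ term, distance: editDistance(q, term.toLowerCase()) }))
    .filter((c) => c.distance > 0 && c.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map((c) => c.term);
}
```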

Success Criteria

✅ Search results appear within 500ms for 95% of queries
✅ 70% of searches yield >0 results (zero-result rate <30%)
✅ Average click-through rate on search results ≥40%
✅ 50% of users who search use faceted filters
✅ Saved searches used by 20% of active users within first month
✅ Semantic search increases relevance by 25% (A/B test CTR)

Related Documentation

Analytics Dashboard - View search analytics and popular queries
Recommendations - AI-powered feed suggestions
Data Model - SearchQuery and SavedSearch schemas
