
Daily Trades — Pipeline Architecture

Overview

Daily pipeline: ingest headlines, filter for trends, score with context awareness, write articles with cross-article dedup. Dashboard for review, manual "mark used" before distribution.

Tech Stack

Daily Flow


CRON (weekday morning)
│
├─ 1. Ingest RSS feeds              (no API call)
├─ 2. Filter + rank new headlines   (1 Haiku call)
├─ 3. Apply decay to scored pool    (no API call)
├─ 4. Score all filtered headlines  (1-4 Opus calls, batches of 5)
├─ 5. Select top 3 with dedup       (no API call — Python logic)
├─ 6. Write articles sequentially   (3 Opus calls, chained context)
└─ Ready for review on dashboard
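The daily flow can be sketched as a simple orchestrator that runs the steps in order, each step reading the results of the ones before it. The function and step names here are illustrative stand-ins, not the actual `scripts/run.py` internals:

```python
def run_daily_pipeline(steps):
    """Run pipeline steps in order, collecting per-step results.

    `steps` is an ordered list of (name, callable) pairs; each callable
    receives the dict of results accumulated so far.
    """
    results = {}
    for name, step in steps:
        results[name] = step(results)
    return results

# Hypothetical stubs standing in for the real implementations.
steps = [
    ("ingest", lambda r: ["headline-a", "headline-b"]),   # no API call
    ("filter", lambda r: r["ingest"][:20]),               # 1 Haiku call
    ("decay",  lambda r: None),                           # no API call
    ("score",  lambda r: {h: 60 for h in r["filter"]}),   # 1-4 Opus calls
    ("select", lambda r: sorted(r["score"], key=r["score"].get, reverse=True)[:3]),
    ("write",  lambda r: [f"article:{h}" for h in r["select"]]),  # 3 Opus calls
]
out = run_daily_pipeline(steps)
```

Because each step only consumes earlier results, individual steps can also be re-run in isolation, which is what the `--ingest`/`--filter`/... flags expose.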

Step 1: Ingest

Fetch all 16 active RSS feeds. Dedup by URL and by content hash (SHA256 of the normalized title). New headlines get status='new'. Headlines older than 7 days that are still new or filtered are marked status='expired'.
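The content-hash side of the dedup can be sketched like this. The exact normalization rules are an assumption (lowercase, strip punctuation, collapse whitespace is one plausible choice; the real rules live in the ingest code):

```python
import hashlib
import re

def content_hash(title: str) -> str:
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    normalized = re.sub(r"[^\w\s]", "", title.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two headlines differing only in case, punctuation, and spacing hash
# identically, so the second is dropped as a duplicate.
a = content_hash("Fed Holds Rates Steady!")
b = content_hash("fed holds rates   steady")
```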

Step 2: Filter + Rank

1 Haiku 4.5 call. Prompt: prompts/filter.md

Input:

Output: Top 20 ranked by trend potential → status='filtered'. Rest → status='rejected'.

Key test: "Could you draw a multi-year trendline for this story?"

Step 3: Decay

Apply time decay to all status='scored' headlines:


adjusted_score = composite_score - (days_since_scored × 3)

Headlines with an adjusted score below 40 → status='expired'.
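The decay rule maps directly to a small helper, shown here with the constants from the Constants section:

```python
DECAY_PER_DAY = 3   # points lost per day
EXPIRE_SCORE = 40   # strictly below this → expired

def apply_decay(composite_score: float, days_since_scored: int):
    """Return (adjusted_score, new_status) for a scored headline."""
    adjusted = composite_score - days_since_scored * DECAY_PER_DAY
    status = "expired" if adjusted < EXPIRE_SCORE else "scored"
    return adjusted, status

apply_decay(70, 5)   # → (55, 'scored')
apply_decay(70, 11)  # → (37, 'expired')
```

At 3 points per day, a headline scored at 70 survives for 10 days before dropping below the expiry line.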

Step 4: Score

1-4 Opus 4.6 calls (batches of 5). Prompt: prompts/score.md

Scores all filtered headlines on four criteria:

Context injected into prompt:

Each headline gets: four scores, composite, tickers, trend_summary, and a trend_cluster label.
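The 1-4 call count falls out of chunking the filtered headlines into batches of 5. A minimal sketch of the chunking (the real scoring call is omitted):

```python
SCORE_BATCH_SIZE = 5

def batches(items, size=SCORE_BATCH_SIZE):
    """Yield successive fixed-size chunks; the last may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# A full filter output of 20 headlines → 4 batches → 4 scoring calls;
# fewer survivors mean fewer calls, hence "1-4".
calls = list(batches(list(range(20))))
```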

Ticker rules baked into prompt:

Step 5: Candidate Selection

Pure Python, no API call.

Walks scored headlines by adjusted score descending. Skips any where:

Picks top 3 that pass all constraints.
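A minimal sketch of the selection walk. The constraint set here is an assumption for illustration (no repeated trend_cluster, no ticker overlap among picks); the real skip rules are defined in the selector:

```python
def select_top(headlines, n=3):
    """Walk headlines by adjusted score descending; greedily keep the
    first n that introduce no duplicate cluster and no ticker overlap.
    Constraint set is illustrative, not the actual rule list."""
    picked, seen_clusters, seen_tickers = [], set(), set()
    for h in sorted(headlines, key=lambda h: h["adjusted_score"], reverse=True):
        if h["trend_cluster"] in seen_clusters:
            continue  # cluster already covered by an earlier pick
        if seen_tickers & set(h["tickers"]):
            continue  # shares a ticker with an earlier pick
        picked.append(h)
        seen_clusters.add(h["trend_cluster"])
        seen_tickers.update(h["tickers"])
        if len(picked) == n:
            break
    return picked

pool = [
    {"id": 1, "adjusted_score": 80, "trend_cluster": "ai-capex", "tickers": ["NVDA"]},
    {"id": 2, "adjusted_score": 75, "trend_cluster": "ai-capex", "tickers": ["AMD"]},
    {"id": 3, "adjusted_score": 70, "trend_cluster": "rates",    "tickers": ["NVDA"]},
    {"id": 4, "adjusted_score": 65, "trend_cluster": "energy",   "tickers": ["XOM"]},
    {"id": 5, "adjusted_score": 60, "trend_cluster": "housing",  "tickers": ["HD"]},
]
picks = select_top(pool)  # ids 1, 4, 5 — 2 and 3 are skipped
```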

Step 6: Write Articles

3 Opus 4.6 calls, sequential. Prompt: prompts/write.md

Sequential chaining with context accumulation:

Each article is stored with article_md and linked to an issues record. Status stays scored; it only becomes used when manually marked via the dashboard.
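The chained-context pattern can be sketched as follows, with a stub in place of the real Opus call: each finished article is fed back into the prompt context for the next one, which is what keeps the three drafts from overlapping.

```python
def write_articles(selected, call_model):
    """Write articles sequentially, accumulating finished articles into
    the context for each subsequent call. `call_model` is a stand-in
    for the real Opus call built from prompts/write.md."""
    articles, context = [], []
    for headline in selected:
        article_md = call_model(headline=headline, prior_articles=context)
        articles.append(article_md)
        context.append(article_md)  # later calls see all earlier drafts
    return articles

# Stub model: records how much prior context each call received.
stub = lambda headline, prior_articles: f"{headline} (saw {len(prior_articles)} prior)"
out = write_articles(["h1", "h2", "h3"], stub)
```

The trade-off is that the calls cannot run in parallel: call N must finish before call N+1 starts, since its output is part of the next prompt.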

Headline Lifecycle


new → filtered → scored → used (manually, via dashboard or automation)
new → rejected (killed by filter)
scored → expired (decayed below 40)
new/filtered → expired (older than 7 days)
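The transitions above can be encoded as an allowed-transition map, useful as a guard when updating status (a sketch, not an actual schema constraint in the database):

```python
# Allowed status transitions, from the lifecycle diagram.
ALLOWED = {
    "new":      {"filtered", "rejected", "expired"},
    "filtered": {"scored", "expired"},
    "scored":   {"used", "expired"},
    # 'used', 'rejected', and 'expired' are terminal.
}

def can_transition(old: str, new: str) -> bool:
    return new in ALLOWED.get(old, set())
```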

Database Tables

headlines — All RSS headlines with scores, articles, status, trend_cluster, issue_id

issues — Each writing run: date, article IDs, tickers, clusters, status (draft/published)

sources — RSS feed registry

Constants


POOL_TARGET = 10        # Display target (scoring no longer stops early)
POOL_THRESHOLD = 55     # Minimum adjusted score to count in pool
DECAY_PER_DAY = 3       # Points lost per day
EXPIRE_SCORE = 40       # Below this → expired
SCORE_BATCH_SIZE = 5    # Headlines per scoring API call

Commands


python3 scripts/run.py              # Full pipeline
python3 scripts/run.py --ingest     # Fetch RSS feeds
python3 scripts/run.py --filter     # Filter + rank
python3 scripts/run.py --decay      # Apply decay
python3 scripts/run.py --score      # Score filtered headlines
python3 scripts/run.py --write      # Write top 3 articles
python3 scripts/run.py --top        # Show top scored headlines
python3 scripts/run.py --pool       # Pool health
python3 scripts/run.py --sources    # List feeds
python3 scripts/run.py --serve      # Dashboard on :8080