
Daily Trades — Pipeline Architecture

Overview

Daily pipeline: ingest headlines, filter for trends, score with context awareness, write articles with cross-article dedup. Dashboard for review, manual "mark used" before distribution.

Tech Stack

Daily Flow


CRON (weekday morning)
│
├─ 1. Ingest RSS feeds              (no API call)
├─ 2. Filter + rank new headlines   (1 Haiku call)
├─ 3. Apply decay to scored pool    (no API call)
├─ 4. Score all filtered headlines  (1-4 Opus calls, batches of 5)
├─ 5. Select top 3 with dedup       (no API call — Python logic)
├─ 6. Write articles sequentially   (3 Opus calls, chained context)
└─ Ready for review on dashboard
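The daily flow can be sketched as a simple orchestrator that runs the steps in order, each step reading the results of the ones before it. The function and step names here are illustrative stand-ins, not the actual `scripts/run.py` internals:

```python
def run_daily_pipeline(steps):
    """Run pipeline steps in order, collecting per-step results.

    `steps` is an ordered list of (name, callable) pairs; each callable
    receives the dict of results accumulated so far.
    """
    results = {}
    for name, step in steps:
        results[name] = step(results)
    return results

# Hypothetical stubs standing in for the real implementations.
steps = [
    ("ingest", lambda r: ["headline-a", "headline-b"]),   # no API call
    ("filter", lambda r: r["ingest"][:20]),               # 1 Haiku call
    ("decay",  lambda r: None),                           # no API call
    ("score",  lambda r: {h: 60 for h in r["filter"]}),   # 1-4 Opus calls
    ("select", lambda r: sorted(r["score"], key=r["score"].get, reverse=True)[:3]),
    ("write",  lambda r: [f"article:{h}" for h in r["select"]]),  # 3 Opus calls
]
out = run_daily_pipeline(steps)
```

Because each step only consumes earlier results, individual steps can also be re-run in isolation, which is what the `--ingest`/`--filter`/... flags expose.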

Step 1: Ingest

Fetch all 16 active RSS feeds. Dedup by URL and by content hash (SHA256 of the normalized title). New headlines get status='new'. Headlines older than 7 days that are still new or filtered are marked status='expired'.
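The content-hash side of the dedup can be sketched like this. The exact normalization rules are an assumption (lowercase, strip punctuation, collapse whitespace is one plausible choice; the real rules live in the ingest code):

```python
import hashlib
import re

def content_hash(title: str) -> str:
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    normalized = re.sub(r"[^\w\s]", "", title.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two headlines differing only in case, punctuation, and spacing hash
# identically, so the second is dropped as a duplicate.
a = content_hash("Fed Holds Rates Steady!")
b = content_hash("fed holds rates   steady")
```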

Step 2: Filter + Rank

1 Haiku 4.5 call. Prompt: prompts/filter.md

Input:

Output: Top 20 ranked by trend potential → status='filtered'. Rest → status='rejected'.

Key test: "Could you draw a multi-year trendline for this story?"

Step 3: Decay

Apply time decay to all status='scored' headlines:


adjusted_score = composite_score - (days_since_scored × 3)

Headlines with an adjusted score below 40 → status='expired'.
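The decay rule maps directly to a small helper, shown here with the constants from the Constants section:

```python
DECAY_PER_DAY = 3   # points lost per day
EXPIRE_SCORE = 40   # strictly below this → expired

def apply_decay(composite_score: float, days_since_scored: int):
    """Return (adjusted_score, new_status) for a scored headline."""
    adjusted = composite_score - days_since_scored * DECAY_PER_DAY
    status = "expired" if adjusted < EXPIRE_SCORE else "scored"
    return adjusted, status

apply_decay(70, 5)   # → (55, 'scored')
apply_decay(70, 11)  # → (37, 'expired')
```

At 3 points per day, a headline scored at 70 survives for 10 days before dropping below the expiry line.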

Step 4: Score

1-4 Opus 4.6 calls (batches of 5). Prompt: prompts/score.md

Scores all filtered headlines on four criteria:

Context injected into prompt:

Each headline gets: four scores, composite, tickers, trend_summary, and a trend_cluster label.
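The 1-4 call count falls out of chunking the filtered headlines into batches of 5. A minimal sketch of the chunking (the real scoring call is omitted):

```python
SCORE_BATCH_SIZE = 5

def batches(items, size=SCORE_BATCH_SIZE):
    """Yield successive fixed-size chunks; the last may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# A full filter output of 20 headlines → 4 batches → 4 scoring calls;
# fewer survivors mean fewer calls, hence "1-4".
calls = list(batches(list(range(20))))
```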

Ticker rules baked into prompt:

Step 5: Candidate Selection

Pure Python, no API call.

Walks scored headlines by adjusted score descending. Skips any where:

Picks top 3 that pass all constraints.
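A minimal sketch of the selection walk. The constraint set here is an assumption for illustration (no repeated trend_cluster, no ticker overlap among picks); the real skip rules are defined in the selector:

```python
def select_top(headlines, n=3):
    """Walk headlines by adjusted score descending; greedily keep the
    first n that introduce no duplicate cluster and no ticker overlap.
    Constraint set is illustrative, not the actual rule list."""
    picked, seen_clusters, seen_tickers = [], set(), set()
    for h in sorted(headlines, key=lambda h: h["adjusted_score"], reverse=True):
        if h["trend_cluster"] in seen_clusters:
            continue  # cluster already covered by an earlier pick
        if seen_tickers & set(h["tickers"]):
            continue  # shares a ticker with an earlier pick
        picked.append(h)
        seen_clusters.add(h["trend_cluster"])
        seen_tickers.update(h["tickers"])
        if len(picked) == n:
            break
    return picked

pool = [
    {"id": 1, "adjusted_score": 80, "trend_cluster": "ai-capex", "tickers": ["NVDA"]},
    {"id": 2, "adjusted_score": 75, "trend_cluster": "ai-capex", "tickers": ["AMD"]},
    {"id": 3, "adjusted_score": 70, "trend_cluster": "rates",    "tickers": ["NVDA"]},
    {"id": 4, "adjusted_score": 65, "trend_cluster": "energy",   "tickers": ["XOM"]},
    {"id": 5, "adjusted_score": 60, "trend_cluster": "housing",  "tickers": ["HD"]},
]
picks = select_top(pool)  # ids 1, 4, 5 — 2 and 3 are skipped
```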

Step 6: Write Articles

3 Opus 4.6 calls, sequential. Prompt: prompts/write.md

Sequential chaining with context accumulation:

Each article is stored with article_md and linked to an issues record. Status stays scored; it only becomes used when manually marked via the dashboard.
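The chained-context pattern can be sketched as follows, with a stub in place of the real Opus call: each finished article is fed back into the prompt context for the next one, which is what keeps the three drafts from overlapping.

```python
def write_articles(selected, call_model):
    """Write articles sequentially, accumulating finished articles into
    the context for each subsequent call. `call_model` is a stand-in
    for the real Opus call built from prompts/write.md."""
    articles, context = [], []
    for headline in selected:
        article_md = call_model(headline=headline, prior_articles=context)
        articles.append(article_md)
        context.append(article_md)  # later calls see all earlier drafts
    return articles

# Stub model: records how much prior context each call received.
stub = lambda headline, prior_articles: f"{headline} (saw {len(prior_articles)} prior)"
out = write_articles(["h1", "h2", "h3"], stub)
```

The trade-off is that the calls cannot run in parallel: call N must finish before call N+1 starts, since its output is part of the next prompt.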

Headline Lifecycle


new → filtered → scored → used (manually, via dashboard or automation)
new → rejected (killed by filter)
scored → expired (decayed below 40)
new/filtered → expired (older than 7 days)
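The transitions above can be encoded as an allowed-transition map, useful as a guard when updating status (a sketch, not an actual schema constraint in the database):

```python
# Allowed status transitions, from the lifecycle diagram.
ALLOWED = {
    "new":      {"filtered", "rejected", "expired"},
    "filtered": {"scored", "expired"},
    "scored":   {"used", "expired"},
    # 'used', 'rejected', and 'expired' are terminal.
}

def can_transition(old: str, new: str) -> bool:
    return new in ALLOWED.get(old, set())
```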

Database Tables

headlines — All RSS headlines with scores, articles, status, trend_cluster, issue_id

issues — Each writing run: date, article IDs, tickers, clusters, status (draft/published)

sources — RSS feed registry

Constants


POOL_TARGET = 10        # Display target (scoring no longer stops early)
POOL_THRESHOLD = 55     # Minimum adjusted score to count in pool
DECAY_PER_DAY = 3       # Points lost per day
EXPIRE_SCORE = 40       # Below this → expired
SCORE_BATCH_SIZE = 5    # Headlines per scoring API call

Commands


python3 scripts/run.py              # Full pipeline
python3 scripts/run.py --ingest     # Fetch RSS feeds
python3 scripts/run.py --filter     # Filter + rank
python3 scripts/run.py --decay      # Apply decay
python3 scripts/run.py --score      # Score filtered headlines
python3 scripts/run.py --write      # Write top 3 articles
python3 scripts/run.py --top        # Show top scored headlines
python3 scripts/run.py --pool       # Pool health
python3 scripts/run.py --sources    # List feeds
python3 scripts/run.py --serve      # Dashboard on :8080