Initial release of Papernews

Daily AI-curated newspaper PDF delivered via Telegram. Fetches RSS feeds,
summarizes top stories with Claude, compiles a LaTeX PDF in newspaper layout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Dave Campbell 2026-06-08 12:41:55 +00:00
commit e31282cbb5
8 changed files with 581 additions and 0 deletions

29
.env.example Normal file
View file

@ -0,0 +1,29 @@
# Papernews — environment configuration
# Copy this file to .env and fill in your values.
# Telegram bot token — create a bot via @BotFather on Telegram
TELEGRAM_BOT_TOKEN=your_bot_token_here
# Your Telegram chat ID — send a message to @userinfobot to find it
TELEGRAM_CHAT_ID=your_chat_id_here
# Anthropic API key — https://console.anthropic.com
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# Location shown in the newspaper masthead
LOCATION=Your City
# Delivery schedule (24-hour time, default: 7:00am)
SCHEDULE_HOUR=7
SCHEDULE_MINUTE=0
# Timezone for the schedule — any IANA timezone name
# Examples: America/Chicago, America/New_York, America/Los_Angeles, Europe/London
TIMEZONE=America/Chicago
# How many days to keep PDFs before auto-pruning (default: 5)
RETENTION_DAYS=5
# Set to true to run the pipeline immediately on container start (for testing).
# Reset to false after testing so it doesn't fire on every restart.
RUN_NOW=false

10
.gitignore vendored Normal file
View file

@ -0,0 +1,10 @@
# Credentials — never commit these
.env
# Generated PDFs
output/
# Python
__pycache__/
*.pyc
*.pyo

31
Dockerfile Normal file
View file

@ -0,0 +1,31 @@
# Papernews — Dockerfile
# Python 3.12 slim + a minimal LaTeX install for newspaper PDF generation.
# texlive-latex-extra adds titlesec, microtype, and other layout packages.
# Image is large (~700MB) due to texlive — this is expected and a one-time cost.
#
# Rebuild after code changes: docker compose up -d --build
FROM python:3.12-slim
# Install LaTeX. --no-install-recommends keeps the layer as lean as possible.
RUN apt-get update && apt-get install -y --no-install-recommends \
texlive-latex-base \
texlive-latex-recommended \
texlive-fonts-recommended \
texlive-latex-extra \
lmodern \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install Python deps first (layer-cached until requirements.txt changes)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY papernews.py .
# /output is mounted from the host — PDFs are written and pruned here
VOLUME ["/output"]
CMD ["python", "papernews.py"]

21
LICENSE Normal file
View file

@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Rock Campbell
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

108
README.md Normal file
View file

@ -0,0 +1,108 @@
# Papernews
A self-hosted daily newspaper delivered to your Telegram. Fetches RSS feeds from your chosen news sources, summarizes the top stories with Claude AI, and compiles everything into a clean newspaper-style PDF — once a day, no feeds to scroll, no algorithm pulling you back.
Inspired by a Reddit post on r/RemarkableTablet.
![Sample Papernews PDF showing newspaper-style layout with sections](screenshot.png)
## How it works
1. RSS feeds are fetched from configured news sources
2. Claude picks and summarizes the 46 most newsworthy stories per section
3. A LaTeX PDF is compiled in newspaper layout (two-column, masthead header)
4. The PDF is delivered to your Telegram chat
5. PDFs older than 5 days are automatically pruned
## Requirements
- Docker + Docker Compose
- A Telegram bot token (create one via [@BotFather](https://t.me/botfather))
- An [Anthropic API key](https://console.anthropic.com)
## Setup
**1. Clone the repo**
```bash
git clone https://github.com/yourusername/papernews.git
cd papernews
```
**2. Configure**
```bash
cp .env.example .env
```
Edit `.env` and fill in:
- `TELEGRAM_BOT_TOKEN` — from @BotFather
- `TELEGRAM_CHAT_ID` — your Telegram user ID (send `/start` to [@userinfobot](https://t.me/userinfobot) to find it)
- `ANTHROPIC_API_KEY` — from [console.anthropic.com](https://console.anthropic.com)
- `LOCATION` — shown in the masthead (e.g. `Kansas City, Missouri`)
**3. Build and run**
```bash
docker compose up -d --build
```
The container will start and wait for the scheduled time. To test immediately:
```bash
docker compose run --rm -e RUN_NOW=true papernews
```
## Configuration
All configuration is via environment variables in `.env`:
| Variable | Default | Description |
|---|---|---|
| `TELEGRAM_BOT_TOKEN` | required | Bot token from @BotFather |
| `TELEGRAM_CHAT_ID` | required | Your Telegram chat ID |
| `ANTHROPIC_API_KEY` | required | Anthropic API key |
| `LOCATION` | `Your City` | City/region shown in the masthead |
| `SCHEDULE_HOUR` | `7` | Hour to deliver (24-hour) |
| `SCHEDULE_MINUTE` | `0` | Minute to deliver |
| `TIMEZONE` | `America/Chicago` | Any IANA timezone name |
| `RETENTION_DAYS` | `5` | Days to keep PDFs before pruning |
| `RUN_NOW` | `false` | Set `true` to run immediately on start |
## Customizing news sources
Edit the `SECTIONS` list in `papernews.py`. Each section has a title and a list of RSS feed URLs:
```python
SECTIONS = [
{
"title": "Local --- My City",
"feeds": [
("Local Paper", "https://localpaper.com/feed"),
("Local TV", "https://localtv.com/rss"),
],
},
{
"title": "National",
"feeds": [
("AP News", "https://feeds.apnews.com/rss/apf-topnews"),
("NPR", "https://feeds.npr.org/1001/rss.xml"),
],
},
# add as many sections as you like
]
```
The `---` in section titles renders as an em dash in the PDF.
## Checking logs
```bash
docker logs papernews
```
## Notes
- The Docker image is ~700MB due to the LaTeX install. This is a one-time cost.
- PDFs are saved to `./output/` and pruned automatically.
- The bot token is only used to send outbound documents — the container does not poll for incoming messages.
## License
MIT — see [LICENSE](LICENSE).

22
docker-compose.yml Normal file
View file

@ -0,0 +1,22 @@
# Papernews — Docker Compose
# Daily AI-curated newspaper PDF: fetches RSS, summarizes with Claude,
# compiles LaTeX, delivers to Telegram, prunes PDFs older than 5 days.
#
# Schedule: 7:00am CT daily.
# To test immediately: set RUN_NOW=true in .env, then docker compose up
# Rebuild after code changes: docker compose up -d --build
services:
papernews:
build: .
container_name: papernews
restart: unless-stopped
env_file: .env
volumes:
# PDFs are written here and auto-pruned after 5 days
- ./output:/output
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

356
papernews.py Normal file
View file

@ -0,0 +1,356 @@
#!/usr/bin/env python3
"""
papernews.py Daily AI-curated newspaper PDF
Fetches RSS feeds from configurable news sources, asks Claude to pick and
summarize the top stories, renders a newspaper-style PDF via LaTeX, delivers
it to Telegram, and prunes PDFs older than a configurable number of days.
Schedule: configurable via SCHEDULE_HOUR/SCHEDULE_MINUTE (default: 7:00am).
Timezone: configurable via TIMEZONE (default: America/Chicago).
Set RUN_NOW=true in the environment to fire once immediately on startup (for testing).
"""
import json
import logging
import os
import re
import shutil
import subprocess
import tempfile
from datetime import datetime, timedelta
from pathlib import Path
from zoneinfo import ZoneInfo
import anthropic
import feedparser
import requests
from apscheduler.schedulers.blocking import BlockingScheduler
# ── Configuration ──────────────────────────────────────────────────────────
BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]
ANTHROPIC_KEY = os.environ["ANTHROPIC_API_KEY"]
OUTPUT_DIR = Path("/output")
LOCATION = os.environ.get("LOCATION", "Your City")
TZ = ZoneInfo(os.environ.get("TIMEZONE", "America/Chicago"))
RETENTION_DAYS = int(os.environ.get("RETENTION_DAYS", "5"))
SCHEDULE_HOUR = int(os.environ.get("SCHEDULE_HOUR", "7"))
SCHEDULE_MIN = int(os.environ.get("SCHEDULE_MINUTE", "0"))
RUN_NOW = os.environ.get("RUN_NOW", "").lower() == "true"
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
log = logging.getLogger(__name__)
# Convenience alias — used in run() and the scheduler
CT = TZ
# ── News sections and RSS feeds ────────────────────────────────────────────
# Each section has a display title and a list of (source_name, rss_url) pairs.
SECTIONS = [
{
"title": "Local --- Kansas City",
"feeds": [
("KCUR", "https://www.kcur.org/feed"),
("Fox4KC", "https://www.fox4kc.com/feed/"),
("KSHB", "https://www.kshb.com/news.rss"),
],
},
{
"title": "National",
"feeds": [
("AP News", "https://feeds.apnews.com/rss/apf-topnews"),
("NPR", "https://feeds.npr.org/1001/rss.xml"),
("PBS", "https://www.pbs.org/newshour/feeds/rss/headlines"),
],
},
{
"title": "International",
"feeds": [
("BBC World", "http://feeds.bbci.co.uk/news/world/rss.xml"),
("The Guardian", "https://www.theguardian.com/world/rss"),
("Reuters", "https://feeds.reuters.com/reuters/topNews"),
],
},
{
"title": "Technology",
"feeds": [
("Hacker News", "https://news.ycombinator.com/rss"),
("Ars Technica", "http://feeds.arstechnica.com/arstechnica/index"),
],
},
]
# How many headlines to collect from each feed before handing off to Claude
MAX_ARTICLES_PER_FEED = 6
# ── LaTeX helpers ──────────────────────────────────────────────────────────
# Maps each special LaTeX character to its escaped equivalent.
# translate() processes each character independently — no double-substitution risk.
_LATEX_ESCAPES = str.maketrans({
"\\": r"\textbackslash{}",
"&": r"\&",
"%": r"\%",
"$": r"\$",
"#": r"\#",
"_": r"\_",
"{": r"\{",
"}": r"\}",
"~": r"\textasciitilde{}",
"^": r"\textasciicircum{}",
})
def esc(text: str) -> str:
"""Escape special LaTeX characters in user-supplied text."""
return str(text).translate(_LATEX_ESCAPES)
# Raw string template — avoids double-escaping LaTeX braces.
# Uses simple str.replace() substitution for %(date)s and %(body)s markers.
LATEX_TEMPLATE = r"""\documentclass[10pt]{article}
\usepackage[margin=0.65in, top=0.4in, columnsep=0.3in]{geometry}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage{multicol}
\usepackage{parskip}
\usepackage{microtype}
\usepackage{titlesec}
%% Section header: centered, bold, with a rule below
\titleformat{\section}[block]{\large\bfseries\filcenter}{}{0pt}{}[\titlerule]
\titlespacing*{\section}{0pt}{14pt}{6pt}
\setlength{\parindent}{0pt}
\setlength{\parskip}{4pt}
\pagestyle{empty}
\begin{document}
%% Masthead full width above the two-column body
\begin{center}
{\fontsize{42}{50}\selectfont\textbf{Papernews}}\\[2pt]
\rule{\linewidth}{2pt}\\[2pt]
{\footnotesize PAPERLOCATION\hfill\textit{PAPERDATE}}\\
\rule{\linewidth}{0.4pt}
\end{center}
\vspace{6pt}
%% Two-column newspaper content
\begin{multicols}{2}
PAPERBODY
\end{multicols}
\end{document}
"""
def build_section_latex(title: str, stories: list) -> str:
"""Render one newspaper section from its AI-summarized stories."""
lines = [rf"\section*{{{esc(title)}}}"]
for story in stories:
headline = esc(story.get("headline", ""))
body = esc(story.get("body", ""))
lines.append(rf"\textbf{{{headline}}}")
lines.append("")
lines.append(body)
lines.append("")
return "\n".join(lines)
# ── RSS fetching ───────────────────────────────────────────────────────────
# Fetch via requests (with explicit timeout) then parse the content string,
# so feedparser never makes its own unguarded network calls that can hang forever.
_FEED_HEADERS = {"User-Agent": "Papernews/1.0 (rocklab daily digest)"}
def fetch_articles(feeds: list) -> list:
"""Pull the top headlines from each feed in a section."""
articles = []
for source, url in feeds:
try:
resp = requests.get(url, timeout=12, headers=_FEED_HEADERS)
resp.raise_for_status()
feed = feedparser.parse(resp.text)
for entry in feed.entries[:MAX_ARTICLES_PER_FEED]:
title = entry.get("title", "").strip()
summary = entry.get("summary", entry.get("description", "")).strip()
# Strip HTML tags — Claude handles messy plain text fine
summary = re.sub(r"<[^>]+>", " ", summary).strip()
if title:
articles.append({
"source": source,
"title": title,
"summary": summary[:500],
})
except Exception as exc:
log.warning("Feed fetch failed — %s (%s): %s", source, url, exc)
return articles
# ── Claude summarization ───────────────────────────────────────────────────
def summarize_section(client: anthropic.Anthropic, section_title: str, articles: list) -> list:
"""Ask Claude to select and summarize the top stories for one section."""
if not articles:
log.warning("No articles fetched for section: %s", section_title)
return []
article_text = "\n\n".join(
f"[{a['source']}] {a['title']}\n{a['summary']}" for a in articles
)
prompt = (
f"You are the {section_title} editor for a daily newspaper called Papernews.\n\n"
"From the articles below, select the 46 most newsworthy stories and write a concise digest.\n\n"
"Return ONLY a JSON array with no other text:\n"
'[\n {"headline": "Short headline", "body": "13 sentences of newspaper prose."},\n ...\n]\n\n'
"Rules: headlines under 10 words, body text factual and tight, no source names or URLs.\n\n"
f"Articles:\n{article_text}"
)
try:
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": prompt}],
)
raw = msg.content[0].text.strip()
# Strip markdown code fences if Claude wraps the JSON
if raw.startswith("```"):
raw = raw.split("```")[1]
if raw.startswith("json"):
raw = raw[4:]
return json.loads(raw.strip())
except Exception as exc:
log.error("Claude summarization failed for '%s': %s", section_title, exc)
return []
# ── PDF compilation ────────────────────────────────────────────────────────
def compile_pdf(latex: str, output_path: Path) -> bool:
"""Compile a LaTeX string to PDF and copy it to output_path."""
with tempfile.TemporaryDirectory() as tmp:
tex_path = Path(tmp) / "papernews.tex"
tex_path.write_text(latex, encoding="utf-8")
result = subprocess.run(
["pdflatex", "-interaction=nonstopmode", "-output-directory", tmp, str(tex_path)],
capture_output=True,
text=True,
)
pdf_src = Path(tmp) / "papernews.pdf"
if result.returncode != 0 or not pdf_src.exists():
log.error("pdflatex failed. Last 3000 chars of output:\n%s", result.stdout[-3000:])
return False
shutil.copy2(pdf_src, output_path)
return True
# ── Telegram delivery ──────────────────────────────────────────────────────
def send_pdf(path: Path, caption: str) -> bool:
"""Upload the PDF to the configured Telegram chat."""
url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendDocument"
try:
with open(path, "rb") as f:
resp = requests.post(
url,
data={"chat_id": CHAT_ID, "caption": caption},
files={"document": f},
timeout=60,
)
if resp.ok:
log.info("PDF delivered to Telegram.")
return True
log.error("Telegram sendDocument failed: %s%s", resp.status_code, resp.text)
return False
except Exception as exc:
log.error("Telegram send error: %s", exc)
return False
def send_message(text: str) -> None:
"""Send a plain-text message to Telegram (used for error alerts)."""
url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
try:
requests.post(url, json={"chat_id": CHAT_ID, "text": text}, timeout=15)
except Exception as exc:
log.error("Telegram message error: %s", exc)
# ── Cleanup ────────────────────────────────────────────────────────────────
def prune_old_pdfs() -> None:
"""Delete papernews PDFs from OUTPUT_DIR older than RETENTION_DAYS."""
cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)
for pdf in OUTPUT_DIR.glob("papernews-*.pdf"):
if datetime.fromtimestamp(pdf.stat().st_mtime) < cutoff:
pdf.unlink()
log.info("Pruned: %s", pdf.name)
# ── Main pipeline ──────────────────────────────────────────────────────────
def run() -> None:
"""Full pipeline: fetch → summarize → render → deliver → prune."""
now = datetime.now(CT)
date_str = now.strftime("%A, %B %-d, %Y")
pdf_path = OUTPUT_DIR / f"papernews-{now.strftime('%Y-%m-%d')}.pdf"
log.info("Starting Papernews run for %s", date_str)
client = anthropic.Anthropic(api_key=ANTHROPIC_KEY)
body_parts = []
for section in SECTIONS:
log.info("Processing section: %s", section["title"])
articles = fetch_articles(section["feeds"])
log.info(" Fetched %d articles", len(articles))
stories = summarize_section(client, section["title"], articles)
log.info(" Summarized to %d stories", len(stories))
if stories:
body_parts.append(build_section_latex(section["title"], stories))
if not body_parts:
msg = "Papernews: no content generated today — check container logs."
log.error(msg)
send_message(msg)
return
# Substitute location, date, and body into the template using plain str.replace()
# so no special characters in the content can interfere with formatting.
latex = (
LATEX_TEMPLATE
.replace("PAPERLOCATION", LOCATION)
.replace("PAPERDATE", date_str)
.replace("PAPERBODY", "\n\n".join(body_parts))
)
if not compile_pdf(latex, pdf_path):
send_message("Papernews: LaTeX compile failed — check container logs.")
return
if not send_pdf(pdf_path, f"Papernews — {date_str}"):
log.error("PDF compiled but Telegram delivery failed: %s", pdf_path)
return
prune_old_pdfs()
log.info("Papernews complete: %s", pdf_path.name)
# ── Entry point ────────────────────────────────────────────────────────────
if __name__ == "__main__":
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
if RUN_NOW:
log.info("RUN_NOW=true — running immediately.")
run()
scheduler = BlockingScheduler(timezone=CT)
scheduler.add_job(run, "cron", hour=SCHEDULE_HOUR, minute=SCHEDULE_MIN, id="papernews_daily")
log.info("Papernews scheduled for %02d:%02d daily.", SCHEDULE_HOUR, SCHEDULE_MIN)
scheduler.start()

4
requirements.txt Normal file
View file

@ -0,0 +1,4 @@
feedparser>=6.0
anthropic>=0.50.0
apscheduler>=3.10,<4.0
requests>=2.28.0