Initial release of Papernews
Daily AI-curated newspaper PDF delivered via Telegram. Fetches RSS feeds, summarizes top stories with Claude, compiles a LaTeX PDF in newspaper layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
commit
e31282cbb5
8 changed files with 581 additions and 0 deletions
29
.env.example
Normal file
29
.env.example
Normal file
|
|
@ -0,0 +1,29 @@
|
||||||
|
# Papernews — environment configuration
|
||||||
|
# Copy this file to .env and fill in your values.
|
||||||
|
|
||||||
|
# Telegram bot token — create a bot via @BotFather on Telegram
|
||||||
|
TELEGRAM_BOT_TOKEN=your_bot_token_here
|
||||||
|
|
||||||
|
# Your Telegram chat ID — send a message to @userinfobot to find it
|
||||||
|
TELEGRAM_CHAT_ID=your_chat_id_here
|
||||||
|
|
||||||
|
# Anthropic API key — https://console.anthropic.com
|
||||||
|
ANTHROPIC_API_KEY=your_anthropic_api_key_here
|
||||||
|
|
||||||
|
# Location shown in the newspaper masthead
|
||||||
|
LOCATION=Your City
|
||||||
|
|
||||||
|
# Delivery schedule (24-hour time, default: 7:00am)
|
||||||
|
SCHEDULE_HOUR=7
|
||||||
|
SCHEDULE_MINUTE=0
|
||||||
|
|
||||||
|
# Timezone for the schedule — any IANA timezone name
|
||||||
|
# Examples: America/Chicago, America/New_York, America/Los_Angeles, Europe/London
|
||||||
|
TIMEZONE=America/Chicago
|
||||||
|
|
||||||
|
# How many days to keep PDFs before auto-pruning (default: 5)
|
||||||
|
RETENTION_DAYS=5
|
||||||
|
|
||||||
|
# Set to true to run the pipeline immediately on container start (for testing).
|
||||||
|
# Reset to false after testing so it doesn't fire on every restart.
|
||||||
|
RUN_NOW=false
|
||||||
10
.gitignore
vendored
Normal file
10
.gitignore
vendored
Normal file
|
|
@ -0,0 +1,10 @@
|
||||||
|
# Credentials — never commit these
|
||||||
|
.env
|
||||||
|
|
||||||
|
# Generated PDFs
|
||||||
|
output/
|
||||||
|
|
||||||
|
# Python
|
||||||
|
__pycache__/
|
||||||
|
*.pyc
|
||||||
|
*.pyo
|
||||||
31
Dockerfile
Normal file
31
Dockerfile
Normal file
|
|
@ -0,0 +1,31 @@
|
||||||
|
# Papernews — Dockerfile
|
||||||
|
# Python 3.12 slim + a minimal LaTeX install for newspaper PDF generation.
|
||||||
|
# texlive-latex-extra adds titlesec, microtype, and other layout packages.
|
||||||
|
# Image is large (~700MB) due to texlive — this is expected and a one-time cost.
|
||||||
|
#
|
||||||
|
# Rebuild after code changes: docker compose up -d --build
|
||||||
|
|
||||||
|
FROM python:3.12-slim
|
||||||
|
|
||||||
|
# Install LaTeX. --no-install-recommends keeps the layer as lean as possible.
|
||||||
|
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||||
|
texlive-latex-base \
|
||||||
|
texlive-latex-recommended \
|
||||||
|
texlive-fonts-recommended \
|
||||||
|
texlive-latex-extra \
|
||||||
|
lmodern \
|
||||||
|
&& apt-get clean \
|
||||||
|
&& rm -rf /var/lib/apt/lists/*
|
||||||
|
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# Install Python deps first (layer-cached until requirements.txt changes)
|
||||||
|
COPY requirements.txt .
|
||||||
|
RUN pip install --no-cache-dir -r requirements.txt
|
||||||
|
|
||||||
|
COPY papernews.py .
|
||||||
|
|
||||||
|
# /output is mounted from the host — PDFs are written and pruned here
|
||||||
|
VOLUME ["/output"]
|
||||||
|
|
||||||
|
CMD ["python", "papernews.py"]
|
||||||
21
LICENSE
Normal file
21
LICENSE
Normal file
|
|
@ -0,0 +1,21 @@
|
||||||
|
MIT License
|
||||||
|
|
||||||
|
Copyright (c) 2026 Rock Campbell
|
||||||
|
|
||||||
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
in the Software without restriction, including without limitation the rights
|
||||||
|
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
copies of the Software, and to permit persons to whom the Software is
|
||||||
|
furnished to do so, subject to the following conditions:
|
||||||
|
|
||||||
|
The above copyright notice and this permission notice shall be included in all
|
||||||
|
copies or substantial portions of the Software.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||||
|
SOFTWARE.
|
||||||
108
README.md
Normal file
108
README.md
Normal file
|
|
@ -0,0 +1,108 @@
|
||||||
|
# Papernews
|
||||||
|
|
||||||
|
A self-hosted daily newspaper delivered to your Telegram. Fetches RSS feeds from your chosen news sources, summarizes the top stories with Claude AI, and compiles everything into a clean newspaper-style PDF — once a day, no feeds to scroll, no algorithm pulling you back.
|
||||||
|
|
||||||
|
Inspired by a Reddit post on r/RemarkableTablet.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
1. RSS feeds are fetched from configured news sources
|
||||||
|
2. Claude picks and summarizes the 4–6 most newsworthy stories per section
|
||||||
|
3. A LaTeX PDF is compiled in newspaper layout (two-column, masthead header)
|
||||||
|
4. The PDF is delivered to your Telegram chat
|
||||||
|
5. PDFs older than 5 days are automatically pruned
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
- Docker + Docker Compose
|
||||||
|
- A Telegram bot token (create one via [@BotFather](https://t.me/botfather))
|
||||||
|
- An [Anthropic API key](https://console.anthropic.com)
|
||||||
|
|
||||||
|
## Setup
|
||||||
|
|
||||||
|
**1. Clone the repo**
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/yourusername/papernews.git
|
||||||
|
cd papernews
|
||||||
|
```
|
||||||
|
|
||||||
|
**2. Configure**
|
||||||
|
```bash
|
||||||
|
cp .env.example .env
|
||||||
|
```
|
||||||
|
|
||||||
|
Edit `.env` and fill in:
|
||||||
|
- `TELEGRAM_BOT_TOKEN` — from @BotFather
|
||||||
|
- `TELEGRAM_CHAT_ID` — your Telegram user ID (send `/start` to [@userinfobot](https://t.me/userinfobot) to find it)
|
||||||
|
- `ANTHROPIC_API_KEY` — from [console.anthropic.com](https://console.anthropic.com)
|
||||||
|
- `LOCATION` — shown in the masthead (e.g. `Kansas City, Missouri`)
|
||||||
|
|
||||||
|
**3. Build and run**
|
||||||
|
```bash
|
||||||
|
docker compose up -d --build
|
||||||
|
```
|
||||||
|
|
||||||
|
The container will start and wait for the scheduled time. To test immediately:
|
||||||
|
```bash
|
||||||
|
docker compose run --rm -e RUN_NOW=true papernews
|
||||||
|
```
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
All configuration is via environment variables in `.env`:
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `TELEGRAM_BOT_TOKEN` | required | Bot token from @BotFather |
|
||||||
|
| `TELEGRAM_CHAT_ID` | required | Your Telegram chat ID |
|
||||||
|
| `ANTHROPIC_API_KEY` | required | Anthropic API key |
|
||||||
|
| `LOCATION` | `Your City` | City/region shown in the masthead |
|
||||||
|
| `SCHEDULE_HOUR` | `7` | Hour to deliver (24-hour) |
|
||||||
|
| `SCHEDULE_MINUTE` | `0` | Minute to deliver |
|
||||||
|
| `TIMEZONE` | `America/Chicago` | Any IANA timezone name |
|
||||||
|
| `RETENTION_DAYS` | `5` | Days to keep PDFs before pruning |
|
||||||
|
| `RUN_NOW` | `false` | Set `true` to run immediately on start |
|
||||||
|
|
||||||
|
## Customizing news sources
|
||||||
|
|
||||||
|
Edit the `SECTIONS` list in `papernews.py`. Each section has a title and a list of RSS feed URLs:
|
||||||
|
|
||||||
|
```python
|
||||||
|
SECTIONS = [
|
||||||
|
{
|
||||||
|
"title": "Local --- My City",
|
||||||
|
"feeds": [
|
||||||
|
("Local Paper", "https://localpaper.com/feed"),
|
||||||
|
("Local TV", "https://localtv.com/rss"),
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"title": "National",
|
||||||
|
"feeds": [
|
||||||
|
("AP News", "https://feeds.apnews.com/rss/apf-topnews"),
|
||||||
|
("NPR", "https://feeds.npr.org/1001/rss.xml"),
|
||||||
|
],
|
||||||
|
},
|
||||||
|
# add as many sections as you like
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
The `---` in section titles renders as an em dash in the PDF.
|
||||||
|
|
||||||
|
## Checking logs
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker logs papernews
|
||||||
|
```
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- The Docker image is ~700MB due to the LaTeX install. This is a one-time cost.
|
||||||
|
- PDFs are saved to `./output/` and pruned automatically.
|
||||||
|
- The bot token is only used to send outbound documents — the container does not poll for incoming messages.
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
MIT — see [LICENSE](LICENSE).
|
||||||
22
docker-compose.yml
Normal file
22
docker-compose.yml
Normal file
|
|
@ -0,0 +1,22 @@
|
||||||
|
# Papernews — Docker Compose
|
||||||
|
# Daily AI-curated newspaper PDF: fetches RSS, summarizes with Claude,
|
||||||
|
# compiles LaTeX, delivers to Telegram, prunes PDFs older than 5 days.
|
||||||
|
#
|
||||||
|
# Schedule: 7:00am CT daily.
|
||||||
|
# To test immediately: set RUN_NOW=true in .env, then docker compose up
|
||||||
|
# Rebuild after code changes: docker compose up -d --build
|
||||||
|
|
||||||
|
services:
|
||||||
|
papernews:
|
||||||
|
build: .
|
||||||
|
container_name: papernews
|
||||||
|
restart: unless-stopped
|
||||||
|
env_file: .env
|
||||||
|
volumes:
|
||||||
|
# PDFs are written here and auto-pruned after 5 days
|
||||||
|
- ./output:/output
|
||||||
|
logging:
|
||||||
|
driver: "json-file"
|
||||||
|
options:
|
||||||
|
max-size: "10m"
|
||||||
|
max-file: "3"
|
||||||
356
papernews.py
Normal file
356
papernews.py
Normal file
|
|
@ -0,0 +1,356 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
papernews.py — Daily AI-curated newspaper PDF
|
||||||
|
|
||||||
|
Fetches RSS feeds from configurable news sources, asks Claude to pick and
|
||||||
|
summarize the top stories, renders a newspaper-style PDF via LaTeX, delivers
|
||||||
|
it to Telegram, and prunes PDFs older than a configurable number of days.
|
||||||
|
|
||||||
|
Schedule: configurable via SCHEDULE_HOUR/SCHEDULE_MINUTE (default: 7:00am).
|
||||||
|
Timezone: configurable via TIMEZONE (default: America/Chicago).
|
||||||
|
Set RUN_NOW=true in the environment to fire once immediately on startup (for testing).
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import shutil
|
||||||
|
import subprocess
|
||||||
|
import tempfile
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from pathlib import Path
|
||||||
|
from zoneinfo import ZoneInfo
|
||||||
|
|
||||||
|
import anthropic
|
||||||
|
import feedparser
|
||||||
|
import requests
|
||||||
|
from apscheduler.schedulers.blocking import BlockingScheduler
|
||||||
|
|
||||||
|
# ── Configuration ──────────────────────────────────────────────────────────
|
||||||
|
BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
|
||||||
|
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]
|
||||||
|
ANTHROPIC_KEY = os.environ["ANTHROPIC_API_KEY"]
|
||||||
|
OUTPUT_DIR = Path("/output")
|
||||||
|
LOCATION = os.environ.get("LOCATION", "Your City")
|
||||||
|
TZ = ZoneInfo(os.environ.get("TIMEZONE", "America/Chicago"))
|
||||||
|
RETENTION_DAYS = int(os.environ.get("RETENTION_DAYS", "5"))
|
||||||
|
SCHEDULE_HOUR = int(os.environ.get("SCHEDULE_HOUR", "7"))
|
||||||
|
SCHEDULE_MIN = int(os.environ.get("SCHEDULE_MINUTE", "0"))
|
||||||
|
RUN_NOW = os.environ.get("RUN_NOW", "").lower() == "true"
|
||||||
|
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
|
||||||
|
)
|
||||||
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Convenience alias — used in run() and the scheduler
|
||||||
|
CT = TZ
|
||||||
|
|
||||||
|
|
||||||
|
# ── News sections and RSS feeds ────────────────────────────────────────────
|
||||||
|
# Each section has a display title and a list of (source_name, rss_url) pairs.
|
||||||
|
SECTIONS = [
|
||||||
|
{
|
||||||
|
"title": "Local --- Kansas City",
|
||||||
|
"feeds": [
|
||||||
|
("KCUR", "https://www.kcur.org/feed"),
|
||||||
|
("Fox4KC", "https://www.fox4kc.com/feed/"),
|
||||||
|
("KSHB", "https://www.kshb.com/news.rss"),
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"title": "National",
|
||||||
|
"feeds": [
|
||||||
|
("AP News", "https://feeds.apnews.com/rss/apf-topnews"),
|
||||||
|
("NPR", "https://feeds.npr.org/1001/rss.xml"),
|
||||||
|
("PBS", "https://www.pbs.org/newshour/feeds/rss/headlines"),
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"title": "International",
|
||||||
|
"feeds": [
|
||||||
|
("BBC World", "http://feeds.bbci.co.uk/news/world/rss.xml"),
|
||||||
|
("The Guardian", "https://www.theguardian.com/world/rss"),
|
||||||
|
("Reuters", "https://feeds.reuters.com/reuters/topNews"),
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"title": "Technology",
|
||||||
|
"feeds": [
|
||||||
|
("Hacker News", "https://news.ycombinator.com/rss"),
|
||||||
|
("Ars Technica", "http://feeds.arstechnica.com/arstechnica/index"),
|
||||||
|
],
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
# How many headlines to collect from each feed before handing off to Claude
|
||||||
|
MAX_ARTICLES_PER_FEED = 6
|
||||||
|
|
||||||
|
|
||||||
|
# ── LaTeX helpers ──────────────────────────────────────────────────────────
|
||||||
|
# Maps each special LaTeX character to its escaped equivalent.
|
||||||
|
# translate() processes each character independently — no double-substitution risk.
|
||||||
|
_LATEX_ESCAPES = str.maketrans({
|
||||||
|
"\\": r"\textbackslash{}",
|
||||||
|
"&": r"\&",
|
||||||
|
"%": r"\%",
|
||||||
|
"$": r"\$",
|
||||||
|
"#": r"\#",
|
||||||
|
"_": r"\_",
|
||||||
|
"{": r"\{",
|
||||||
|
"}": r"\}",
|
||||||
|
"~": r"\textasciitilde{}",
|
||||||
|
"^": r"\textasciicircum{}",
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
|
def esc(text: str) -> str:
|
||||||
|
"""Escape special LaTeX characters in user-supplied text."""
|
||||||
|
return str(text).translate(_LATEX_ESCAPES)
|
||||||
|
|
||||||
|
|
||||||
|
# Raw string template — avoids double-escaping LaTeX braces.
|
||||||
|
# Uses simple str.replace() substitution for %(date)s and %(body)s markers.
|
||||||
|
LATEX_TEMPLATE = r"""\documentclass[10pt]{article}
|
||||||
|
\usepackage[margin=0.65in, top=0.4in, columnsep=0.3in]{geometry}
|
||||||
|
\usepackage[T1]{fontenc}
|
||||||
|
\usepackage[utf8]{inputenc}
|
||||||
|
\usepackage{lmodern}
|
||||||
|
\usepackage{multicol}
|
||||||
|
\usepackage{parskip}
|
||||||
|
\usepackage{microtype}
|
||||||
|
\usepackage{titlesec}
|
||||||
|
|
||||||
|
%% Section header: centered, bold, with a rule below
|
||||||
|
\titleformat{\section}[block]{\large\bfseries\filcenter}{}{0pt}{}[\titlerule]
|
||||||
|
\titlespacing*{\section}{0pt}{14pt}{6pt}
|
||||||
|
|
||||||
|
\setlength{\parindent}{0pt}
|
||||||
|
\setlength{\parskip}{4pt}
|
||||||
|
\pagestyle{empty}
|
||||||
|
|
||||||
|
\begin{document}
|
||||||
|
|
||||||
|
%% Masthead — full width above the two-column body
|
||||||
|
\begin{center}
|
||||||
|
{\fontsize{42}{50}\selectfont\textbf{Papernews}}\\[2pt]
|
||||||
|
\rule{\linewidth}{2pt}\\[2pt]
|
||||||
|
{\footnotesize PAPERLOCATION\hfill\textit{PAPERDATE}}\\
|
||||||
|
\rule{\linewidth}{0.4pt}
|
||||||
|
\end{center}
|
||||||
|
\vspace{6pt}
|
||||||
|
|
||||||
|
%% Two-column newspaper content
|
||||||
|
\begin{multicols}{2}
|
||||||
|
|
||||||
|
PAPERBODY
|
||||||
|
|
||||||
|
\end{multicols}
|
||||||
|
\end{document}
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def build_section_latex(title: str, stories: list) -> str:
|
||||||
|
"""Render one newspaper section from its AI-summarized stories."""
|
||||||
|
lines = [rf"\section*{{{esc(title)}}}"]
|
||||||
|
for story in stories:
|
||||||
|
headline = esc(story.get("headline", ""))
|
||||||
|
body = esc(story.get("body", ""))
|
||||||
|
lines.append(rf"\textbf{{{headline}}}")
|
||||||
|
lines.append("")
|
||||||
|
lines.append(body)
|
||||||
|
lines.append("")
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
# ── RSS fetching ───────────────────────────────────────────────────────────
|
||||||
|
# Fetch via requests (with explicit timeout) then parse the content string,
|
||||||
|
# so feedparser never makes its own unguarded network calls that can hang forever.
|
||||||
|
_FEED_HEADERS = {"User-Agent": "Papernews/1.0 (rocklab daily digest)"}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_articles(feeds: list) -> list:
|
||||||
|
"""Pull the top headlines from each feed in a section."""
|
||||||
|
articles = []
|
||||||
|
for source, url in feeds:
|
||||||
|
try:
|
||||||
|
resp = requests.get(url, timeout=12, headers=_FEED_HEADERS)
|
||||||
|
resp.raise_for_status()
|
||||||
|
feed = feedparser.parse(resp.text)
|
||||||
|
for entry in feed.entries[:MAX_ARTICLES_PER_FEED]:
|
||||||
|
title = entry.get("title", "").strip()
|
||||||
|
summary = entry.get("summary", entry.get("description", "")).strip()
|
||||||
|
# Strip HTML tags — Claude handles messy plain text fine
|
||||||
|
summary = re.sub(r"<[^>]+>", " ", summary).strip()
|
||||||
|
if title:
|
||||||
|
articles.append({
|
||||||
|
"source": source,
|
||||||
|
"title": title,
|
||||||
|
"summary": summary[:500],
|
||||||
|
})
|
||||||
|
except Exception as exc:
|
||||||
|
log.warning("Feed fetch failed — %s (%s): %s", source, url, exc)
|
||||||
|
return articles
|
||||||
|
|
||||||
|
|
||||||
|
# ── Claude summarization ───────────────────────────────────────────────────
|
||||||
|
def summarize_section(client: anthropic.Anthropic, section_title: str, articles: list) -> list:
|
||||||
|
"""Ask Claude to select and summarize the top stories for one section."""
|
||||||
|
if not articles:
|
||||||
|
log.warning("No articles fetched for section: %s", section_title)
|
||||||
|
return []
|
||||||
|
|
||||||
|
article_text = "\n\n".join(
|
||||||
|
f"[{a['source']}] {a['title']}\n{a['summary']}" for a in articles
|
||||||
|
)
|
||||||
|
|
||||||
|
prompt = (
|
||||||
|
f"You are the {section_title} editor for a daily newspaper called Papernews.\n\n"
|
||||||
|
"From the articles below, select the 4–6 most newsworthy stories and write a concise digest.\n\n"
|
||||||
|
"Return ONLY a JSON array with no other text:\n"
|
||||||
|
'[\n {"headline": "Short headline", "body": "1–3 sentences of newspaper prose."},\n ...\n]\n\n'
|
||||||
|
"Rules: headlines under 10 words, body text factual and tight, no source names or URLs.\n\n"
|
||||||
|
f"Articles:\n{article_text}"
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
msg = client.messages.create(
|
||||||
|
model="claude-sonnet-4-6",
|
||||||
|
max_tokens=2048,
|
||||||
|
messages=[{"role": "user", "content": prompt}],
|
||||||
|
)
|
||||||
|
raw = msg.content[0].text.strip()
|
||||||
|
# Strip markdown code fences if Claude wraps the JSON
|
||||||
|
if raw.startswith("```"):
|
||||||
|
raw = raw.split("```")[1]
|
||||||
|
if raw.startswith("json"):
|
||||||
|
raw = raw[4:]
|
||||||
|
return json.loads(raw.strip())
|
||||||
|
except Exception as exc:
|
||||||
|
log.error("Claude summarization failed for '%s': %s", section_title, exc)
|
||||||
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
# ── PDF compilation ────────────────────────────────────────────────────────
|
||||||
|
def compile_pdf(latex: str, output_path: Path) -> bool:
|
||||||
|
"""Compile a LaTeX string to PDF and copy it to output_path."""
|
||||||
|
with tempfile.TemporaryDirectory() as tmp:
|
||||||
|
tex_path = Path(tmp) / "papernews.tex"
|
||||||
|
tex_path.write_text(latex, encoding="utf-8")
|
||||||
|
|
||||||
|
result = subprocess.run(
|
||||||
|
["pdflatex", "-interaction=nonstopmode", "-output-directory", tmp, str(tex_path)],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
pdf_src = Path(tmp) / "papernews.pdf"
|
||||||
|
if result.returncode != 0 or not pdf_src.exists():
|
||||||
|
log.error("pdflatex failed. Last 3000 chars of output:\n%s", result.stdout[-3000:])
|
||||||
|
return False
|
||||||
|
|
||||||
|
shutil.copy2(pdf_src, output_path)
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
# ── Telegram delivery ──────────────────────────────────────────────────────
|
||||||
|
def send_pdf(path: Path, caption: str) -> bool:
|
||||||
|
"""Upload the PDF to the configured Telegram chat."""
|
||||||
|
url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendDocument"
|
||||||
|
try:
|
||||||
|
with open(path, "rb") as f:
|
||||||
|
resp = requests.post(
|
||||||
|
url,
|
||||||
|
data={"chat_id": CHAT_ID, "caption": caption},
|
||||||
|
files={"document": f},
|
||||||
|
timeout=60,
|
||||||
|
)
|
||||||
|
if resp.ok:
|
||||||
|
log.info("PDF delivered to Telegram.")
|
||||||
|
return True
|
||||||
|
log.error("Telegram sendDocument failed: %s — %s", resp.status_code, resp.text)
|
||||||
|
return False
|
||||||
|
except Exception as exc:
|
||||||
|
log.error("Telegram send error: %s", exc)
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def send_message(text: str) -> None:
|
||||||
|
"""Send a plain-text message to Telegram (used for error alerts)."""
|
||||||
|
url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
|
||||||
|
try:
|
||||||
|
requests.post(url, json={"chat_id": CHAT_ID, "text": text}, timeout=15)
|
||||||
|
except Exception as exc:
|
||||||
|
log.error("Telegram message error: %s", exc)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Cleanup ────────────────────────────────────────────────────────────────
|
||||||
|
def prune_old_pdfs() -> None:
|
||||||
|
"""Delete papernews PDFs from OUTPUT_DIR older than RETENTION_DAYS."""
|
||||||
|
cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)
|
||||||
|
for pdf in OUTPUT_DIR.glob("papernews-*.pdf"):
|
||||||
|
if datetime.fromtimestamp(pdf.stat().st_mtime) < cutoff:
|
||||||
|
pdf.unlink()
|
||||||
|
log.info("Pruned: %s", pdf.name)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Main pipeline ──────────────────────────────────────────────────────────
|
||||||
|
def run() -> None:
|
||||||
|
"""Full pipeline: fetch → summarize → render → deliver → prune."""
|
||||||
|
now = datetime.now(CT)
|
||||||
|
date_str = now.strftime("%A, %B %-d, %Y")
|
||||||
|
pdf_path = OUTPUT_DIR / f"papernews-{now.strftime('%Y-%m-%d')}.pdf"
|
||||||
|
|
||||||
|
log.info("Starting Papernews run for %s", date_str)
|
||||||
|
client = anthropic.Anthropic(api_key=ANTHROPIC_KEY)
|
||||||
|
|
||||||
|
body_parts = []
|
||||||
|
for section in SECTIONS:
|
||||||
|
log.info("Processing section: %s", section["title"])
|
||||||
|
articles = fetch_articles(section["feeds"])
|
||||||
|
log.info(" Fetched %d articles", len(articles))
|
||||||
|
stories = summarize_section(client, section["title"], articles)
|
||||||
|
log.info(" Summarized to %d stories", len(stories))
|
||||||
|
if stories:
|
||||||
|
body_parts.append(build_section_latex(section["title"], stories))
|
||||||
|
|
||||||
|
if not body_parts:
|
||||||
|
msg = "Papernews: no content generated today — check container logs."
|
||||||
|
log.error(msg)
|
||||||
|
send_message(msg)
|
||||||
|
return
|
||||||
|
|
||||||
|
# Substitute location, date, and body into the template using plain str.replace()
|
||||||
|
# so no special characters in the content can interfere with formatting.
|
||||||
|
latex = (
|
||||||
|
LATEX_TEMPLATE
|
||||||
|
.replace("PAPERLOCATION", LOCATION)
|
||||||
|
.replace("PAPERDATE", date_str)
|
||||||
|
.replace("PAPERBODY", "\n\n".join(body_parts))
|
||||||
|
)
|
||||||
|
|
||||||
|
if not compile_pdf(latex, pdf_path):
|
||||||
|
send_message("Papernews: LaTeX compile failed — check container logs.")
|
||||||
|
return
|
||||||
|
|
||||||
|
if not send_pdf(pdf_path, f"Papernews — {date_str}"):
|
||||||
|
log.error("PDF compiled but Telegram delivery failed: %s", pdf_path)
|
||||||
|
return
|
||||||
|
|
||||||
|
prune_old_pdfs()
|
||||||
|
log.info("Papernews complete: %s", pdf_path.name)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Entry point ────────────────────────────────────────────────────────────
|
||||||
|
if __name__ == "__main__":
|
||||||
|
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
if RUN_NOW:
|
||||||
|
log.info("RUN_NOW=true — running immediately.")
|
||||||
|
run()
|
||||||
|
|
||||||
|
scheduler = BlockingScheduler(timezone=CT)
|
||||||
|
scheduler.add_job(run, "cron", hour=SCHEDULE_HOUR, minute=SCHEDULE_MIN, id="papernews_daily")
|
||||||
|
log.info("Papernews scheduled for %02d:%02d daily.", SCHEDULE_HOUR, SCHEDULE_MIN)
|
||||||
|
scheduler.start()
|
||||||
4
requirements.txt
Normal file
4
requirements.txt
Normal file
|
|
@ -0,0 +1,4 @@
|
||||||
|
feedparser>=6.0
|
||||||
|
anthropic>=0.50.0
|
||||||
|
apscheduler>=3.10,<4.0
|
||||||
|
requests>=2.28.0
|
||||||
Loading…
Add table
Reference in a new issue