# Slack Daily Summarizer: Automated Message Extraction & AI-Powered Insights

## About
This project automates the collection and summarization of Slack conversations across an entire workspace, converting raw message noise into actionable summaries and task lists.
## The Problem

In async-heavy teams, Slack conversations sprawl across dozens of channels, private groups, and DMs. Staying informed means scrolling through countless conversations, piecing together context from threads, and tracking action items by hand. This is tedious, error-prone, and scales poorly.
## The Solution
A Python-based pipeline that:
**Extracts all messages** from a single calendar day across every conversation type the authenticated user can access: public channels, private channels, DMs, and group DMs. It uses Slack's `conversations.list` API to enumerate all accessible conversations, `conversations.history` to fetch top-level messages, and `conversations.replies` to capture full thread context, handling pagination, rate limiting, and timezone conversion (America/Los_Angeles) automatically. Problematic conversations are skipped gracefully rather than breaking the entire job.
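A minimal sketch of this stage using the `slack_sdk` client; the token variable name, example date, and `paged` helper are illustrative assumptions, not the project's actual code:

```python
import os
import time
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

client = WebClient(token=os.environ["SLACK_TOKEN"])  # env var name is an assumption

# Convert one calendar day in America/Los_Angeles to Unix-epoch bounds.
day_start = datetime(2024, 1, 15, tzinfo=ZoneInfo("America/Los_Angeles"))  # example date
oldest = day_start.timestamp()
latest = (day_start + timedelta(days=1)).timestamp()

def paged(method, result_key, **kwargs):
    """Walk every page of a cursor-paginated Slack Web API call,
    sleeping through 429 rate-limit responses instead of failing."""
    cursor = None
    while True:
        try:
            resp = method(cursor=cursor, limit=200, **kwargs)
        except SlackApiError as e:
            if e.response.status_code == 429:  # rate limited: honor Retry-After
                time.sleep(int(e.response.headers.get("Retry-After", 30)))
                continue
            raise
        yield from resp[result_key]
        cursor = resp.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            return

# Enumerate every conversation type the token can see, then pull the day's messages.
for convo in paged(client.conversations_list, "channels",
                   types="public_channel,private_channel,im,mpim"):
    for msg in paged(client.conversations_history, "messages",
                     channel=convo["id"], oldest=str(oldest), latest=str(latest)):
        pass  # collect msg into rows; thread replies are fetched separately
```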
**Aggregates by conversation** into one row per `conversation_name`, concatenating all messages chronologically with sender names prefixed. This transforms 230+ individual message rows into 28 readable conversation transcripts, reducing noise while preserving context.
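A sketch of the aggregation step with Pandas; the column names and sample rows below are assumptions standing in for the real extraction output:

```python
import pandas as pd

# Tiny stand-in for the per-message frame produced by the extraction step.
slack_messages_df = pd.DataFrame({
    "conversation_name": ["#eng", "#eng", "dm-alice"],
    "sender_name": ["bob", "carol", "alice"],
    "ts": [1705350000.0, 1705353600.0, 1705351200.0],
    "text": ["Deploy is done.", "Nice, closing the ticket.", "Can you review my PR?"],
})

conversation_df = (
    slack_messages_df
    .sort_values("ts")                                    # chronological order
    .assign(line=lambda d: d["sender_name"] + ": " + d["text"])
    .groupby("conversation_name", as_index=False)
    .agg(all_messages=("line", "\n".join))                # one row per conversation
)
```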
**Extracts summaries and action items** using GPT-4o with structured output. Instead of asking the model for free-form text, the pipeline defines a strict Pydantic schema (`summary: str`, `action_items: list[str]`) and lets the model fill it in. This eliminates parsing headaches and ensures clean, valid data every time.
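A sketch of this step using the OpenAI SDK's structured-output helper; the prompt wording and the `digest` function name are illustrative, not the project's exact code:

```python
from openai import OpenAI
from pydantic import BaseModel

class ConversationDigest(BaseModel):
    summary: str
    action_items: list[str]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def digest(conversation_name: str, all_messages: str) -> ConversationDigest:
    # parse() constrains generation to the Pydantic schema, so the result
    # is validated structured data rather than free-form text.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Summarize this Slack conversation and list its action items."},
            {"role": "user",
             "content": f"Conversation: {conversation_name}\n\n{all_messages}"},
        ],
        response_format=ConversationDigest,
    )
    return completion.choices[0].message.parsed
```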
## Key Technical Decisions
**Complete thread coverage:** The Slack API separates top-level messages (from `conversations.history`) from thread replies (from `conversations.replies`). Most naive implementations miss replies entirely. This pipeline explicitly fetches both, ensuring no context is lost.
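A sketch of the reply-fetching half, reusing the hypothetical `paged` helper from the extraction sketch above (the function and parameter names here are illustrative):

```python
def thread_replies(client, channel_id, top_level_messages):
    """Yield every reply under each threaded parent message."""
    for msg in top_level_messages:
        # Thread parents carry thread_ts and a nonzero reply_count.
        if msg.get("thread_ts") and msg.get("reply_count", 0) > 0:
            for reply in paged(client.conversations_replies, "messages",
                               channel=channel_id, ts=msg["thread_ts"]):
                if reply["ts"] != msg["thread_ts"]:  # parent was already captured
                    yield reply
```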
**Structured output over free-form:** LLMs produce inconsistent results when asked for free-form text. Using Pydantic to define the output schema guarantees parseable, predictable results, with no string manipulation needed downstream.

**Row-level aggregation before LLM processing:** Feeding 230 individual messages to GPT-4o is expensive (token usage) and noisy (the model struggles with raw message volume). Aggregating to 28 conversation-level rows first improves quality and reduces cost.
**Graceful error handling:** One bad API call doesn't tank the whole pipeline. Each conversation is processed independently with error logging, so problems in a single channel don't block the rest.
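A sketch of that per-conversation isolation; `safe_fetch_all` and `fetch_one` are hypothetical names, not the project's actual API:

```python
import logging
from slack_sdk.errors import SlackApiError

logger = logging.getLogger("slack_summarizer")

def safe_fetch_all(client, conversations, fetch_one):
    """Run fetch_one(client, convo) for every conversation independently;
    failures are logged and skipped so one bad channel never blocks the rest."""
    rows = []
    for convo in conversations:
        try:
            rows.extend(fetch_one(client, convo))
        except SlackApiError as exc:
            logger.warning("Skipping %s: %s", convo.get("name", convo["id"]), exc)
    return rows
```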
## Output
A clean Pandas DataFrame (`slack_messages_df`) containing every message and thread reply from the day, plus aggregated per-conversation summaries and structured action-item lists. Results include metadata (sender names, conversation types, timestamps), making downstream analysis straightforward.
## Use Cases
- Daily team recaps without manual scrolling
- Compliance auditing and message archival
- Sentiment analysis across conversations
- Automated action item tracking
- Knowledge base population from team discussions
## Tech Stack
Python, Pandas, Slack Web API, OpenAI API (GPT-4o), Pydantic, Zerve (serverless compute platform). No infrastructure management needed: code runs in the cloud with configurable compute.
The pipeline demonstrates how to combine APIs (Slack), data transformation (Pandas), and modern LLMs (structured output) to turn unstructured communication into structured insights.



