tulsi-builder/slack-research

Slack Research Pipeline

Research pipeline to scrape and analyze Block Slack channels for product insights on the new Universal Agent Environment.


Quick Start (4 steps)

1. Get a Slack token         (see instructions below)
2. Save it in .env           (one command)
3. pip install requirements  (one command)
4. python scripts/scrape_slack.py  (go get coffee)

Step 1: Get a Slack Token

1a. Go to the Slack App Dashboard

Open this URL in your browser:

👉 https://api.slack.com/apps

Sign in with your Block Slack account if prompted.

1b. Create a New App

  1. Click the green "Create New App" button (top right)
  2. Select "From scratch"
  3. Fill in:
    • App Name: slack-research-scraper (or whatever you want)
    • Workspace: Pick your Block workspace from the dropdown
  4. Click "Create App"

You'll land on the app's settings page.

1c. Add Permissions (Scopes)

  1. In the left sidebar, click "OAuth & Permissions"
  2. Scroll down to the section called "User Token Scopes"
    • ⚠️ Make sure it says User Token Scopes, NOT Bot Token Scopes
  3. Click "Add an OAuth Scope" and add these one by one:
channels:history      → Read messages in public channels
channels:read         → List channels and find their IDs
groups:history        → Read messages in private channels (some targets may be private)
users:read            → Read user profiles (job title, department)

1d. Install the App to Your Workspace

  1. Scroll back up on the same OAuth & Permissions page
  2. Click the "Install to Workspace" button
  3. Slack shows a permissions screen — click "Allow"
  4. You'll now see a User OAuth Token starting with xoxp-
  5. Copy this token — you need it for the next step

💡 If you ever need to see the token again, come back to this same OAuth & Permissions page. You can also click "Reinstall" if you change scopes.
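Slack reports the scopes actually granted to a token in the `X-OAuth-Scopes` response header of any Web API call. A small helper to diff that header against the four scopes above can catch a missing scope before a long scrape (hypothetical helper, not part of the repo's scripts):

```python
# The four User Token Scopes this pipeline needs (from Step 1c).
REQUIRED_SCOPES = {"channels:history", "channels:read", "groups:history", "users:read"}

def missing_scopes(granted_header):
    """Return required scopes absent from an X-OAuth-Scopes header value,
    which Slack formats as a comma-separated string."""
    granted = {s.strip() for s in granted_header.split(",") if s.strip()}
    return REQUIRED_SCOPES - granted
```

If this returns a non-empty set, add the missing scopes under OAuth & Permissions and click Reinstall.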

1e. Join the Target Channels

The token uses YOUR channel access. Make sure you've joined all of these channels in Slack (just open each one and click "Join"):

  • #g2-community
  • #goose-help
  • #goose-inspiration
  • #goose-dev
  • #builderbot-community
  • #builderbot-team
  • #staged
  • #nexus

Step 2: Save the Token

Open your terminal and run this (paste your real token where it says):

echo 'SLACK_TOKEN=xoxp-YOUR-TOKEN-HERE' > /Users/tulsi/slack-research/.env

For example, if your token is xoxp-123-456-789-abc, you'd run:

echo 'SLACK_TOKEN=xoxp-123-456-789-abc' > /Users/tulsi/slack-research/.env

Verify it saved:

cat /Users/tulsi/slack-research/.env

You should see your token printed back. The .env file is gitignored so it won't be committed anywhere.

🔒 Never paste tokens directly into chat, code, or commits. Always use .env files or environment variables.
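If you ever need to read the token from `.env` yourself, a minimal loader looks like this (a sketch with no external dependency; the repo's scripts may well use a library such as python-dotenv instead):

```python
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.
    Ignores blank lines and # comments; no quoting or escaping rules."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

Usage: `token = load_env().get("SLACK_TOKEN")`.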


Step 3: Install Dependencies

cd /Users/tulsi/slack-research
pip install -r scripts/requirements.txt

Step 4: Run the Scraper

cd /Users/tulsi/slack-research
python scripts/scrape_slack.py

This will:

  • Authenticate with Slack
  • Find all 8 target channels
  • Pull messages + full threads (8 months for goose/g2 channels, 6 months for others)
  • Fetch user profiles (title, department) and anonymize names
  • Save everything as JSON in raw/

Estimated time: depends on message volume; expect roughly 10-60 minutes due to rate limiting (the script handles retries automatically).
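The retry behavior mentioned above can be sketched generically. Slack answers rate-limited calls with HTTP 429 and a Retry-After header giving the number of seconds to wait; a wrapper along these lines honors that advice (a hypothetical helper, not the script's actual code):

```python
import time

class RateLimited(Exception):
    """Raised when Slack answers HTTP 429 / error "ratelimited"."""
    def __init__(self, retry_after=None):
        super().__init__("ratelimited")
        self.retry_after = retry_after

def call_with_retry(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimited, wait the advised delay and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise
            # Honor Slack's Retry-After if present, else back off exponentially.
            sleep(exc.retry_after or base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the helper testable without actually waiting.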


Step 5: Analyze

Stats (no LLM needed)

python scripts/analyze.py --stats-only

Generates processed/stats-report.md with:

  • Message counts per channel
  • Who's talking (by role/department)
  • Keyword-based category signals
  • Most reacted messages (high-signal pain points)
  • Longest threads (deep discussions)
  • Activity over time
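For instance, the "most reacted messages" ranking can be computed straight from the raw Slack message JSON, where a message may carry a `reactions` list with per-emoji `count` fields (a sketch assuming that raw format, not analyze.py's actual code):

```python
def top_reacted(messages, n=5):
    """Rank raw Slack messages by total reaction count. Each message may
    have a `reactions` list of items like {"name": "fire", "count": 3}."""
    def total(msg):
        return sum(r.get("count", 0) for r in msg.get("reactions", []))
    return sorted(messages, key=total, reverse=True)[:n]
```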

Full Analysis (with LLM classification)

python scripts/analyze.py

This generates everything above PLUS prepared LLM prompt batches in processed/llm-batches/. Each batch is a prompt you can send to any LLM (Goose, Claude, GPT, etc.) for deeper classification.

Synthesize LLM Results

After processing the LLM batches, save results as batch-NNN-result.json in the same directory, then:

python scripts/synthesize.py

This produces the final processed/research-report.md with:

  • Categorized findings mapped to product vision
  • Top pain points, feature requests, workarounds
  • Executive summary
  • Vision alignment analysis

Output Structure

slack-research/
├── .env                          ← Your token (gitignored)
├── .gitignore
├── README.md                     ← You are here
├── scripts/
│   ├── requirements.txt
│   ├── scrape_slack.py           ← Step 4: Scrape channels
│   ├── analyze.py                ← Step 5: Stats + LLM prep
│   └── synthesize.py             ← Step 5: Combine LLM results
├── raw/                          ← Raw scraped data (gitignored)
│   ├── g2-community.json
│   ├── goose-help.json
│   ├── goose-inspiration.json
│   ├── goose-dev.json
│   ├── builderbot-community.json
│   ├── builderbot-team.json
│   ├── staged.json
│   ├── nexus.json
│   └── user_profiles.json
└── processed/
    ├── stats-report.md           ← Statistical analysis
    ├── llm-batches/              ← Prompts for LLM classification
    │   ├── batch-000-prompt.md
    │   ├── batch-000-data.json
    │   ├── batch-000-result.json ← You fill these in
    │   └── ...
    └── research-report.md        ← Final synthesized report

Troubleshooting

| Problem | Solution |
| --- | --- |
| ❌ SLACK_TOKEN not found | Check that your .env file exists and contains the token |
| ❌ Auth failed: invalid_auth | Token is wrong or expired; regenerate at api.slack.com/apps |
| ❌ Bot is not in #channel | Join that channel in Slack with your personal account |
| missing_scope | Go back to OAuth & Permissions, add the missing scope, and click Reinstall |
| ratelimited | The script handles this automatically; just wait |
| Script seems slow | Normal: rate limiting means ~1 API call/sec, so large channels take time |

Target Channels & Time Ranges

| Channel | Time Range | Why |
| --- | --- | --- |
| #g2-community | 8 months | Core G2 user feedback |
| #goose-help | 8 months | Support requests = pain points |
| #goose-inspiration | 8 months | What people want to build |
| #goose-dev | 8 months | Developer perspective on limitations |
| #builderbot-community | 6 months | Adjacent tool community |
| #builderbot-team | 6 months | Internal team discussions |
| #staged | 6 months | Staging/deployment workflows |
| #nexus | 6 months | Cross-tool integration discussions |
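Slack's conversations.history endpoint accepts an `oldest` Unix-timestamp parameter, so the 8-month and 6-month windows above presumably translate into per-channel cutoffs along these lines (a sketch approximating a month as 30 days; the scraper's actual computation is not shown here):

```python
import time

# Lookback window per channel, mirroring the table above.
CHANNEL_MONTHS = {
    "g2-community": 8, "goose-help": 8, "goose-inspiration": 8, "goose-dev": 8,
    "builderbot-community": 6, "builderbot-team": 6, "staged": 6, "nexus": 6,
}

def oldest_ts(months, now=None):
    """Unix timestamp `months` (approximated as 30-day blocks) before now,
    suitable for the `oldest` parameter of conversations.history."""
    now = time.time() if now is None else now
    return now - months * 30 * 24 * 3600
```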

Research Taxonomy

Messages are classified into these categories:

| Category | What We're Looking For | Maps to Product Area |
| --- | --- | --- |
| pain_point | Frustrations, bugs, things broken | Core Reliability & UX |
| feature_request | "I wish it could..." | Feature Roadmap |
| workaround | "What I do instead is..." | Unmet Needs (gold mine) |
| use_case | "I used it to..." | Product Positioning |
| non_engineer_attempt | Non-eng people trying tools | "For Everyone" Validation |
| multiplayer_collab | Team/sharing needs | Multiplayer Features |
| customization | UI/behavior modification | Self-Editable UI / Apps |
| scheduling_automation | Cron/event-driven needs | Scheduled Agents |
| context_memory | Forgetting context, sessions | Projects / Workspaces |
| disconnection | Tools not integrated | Unified Environment |
| praise | Things people love | Preserve & Build On |
| onboarding_confusion | Getting started issues | Onboarding & Docs |
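The keyword-based category signals from Step 5 could look roughly like this (the keyword lists below are hypothetical illustrations; the real taxonomy keywords live in analyze.py):

```python
# Hypothetical keyword lists for a few taxonomy categories.
KEYWORDS = {
    "pain_point": ["broken", "bug", "frustrating", "doesn't work"],
    "feature_request": ["i wish", "would be great if", "feature request"],
    "workaround": ["what i do instead", "workaround", "hack around"],
}

def classify(text):
    """Return every category whose keywords appear in the message text."""
    lowered = text.lower()
    return [cat for cat, words in KEYWORDS.items()
            if any(w in lowered for w in words)]
```

Keyword matching only yields coarse signals, which is why the pipeline follows up with LLM classification for the full taxonomy.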
