Research pipeline to scrape and analyze Block Slack channels for product insights on the new Universal Agent Environment.
1. Get a Slack token (see instructions below)
2. Save it in .env (one command)
3. pip install requirements (one command)
4. python scripts/scrape_slack.py (go get coffee)
Open this URL in your browser:
Sign in with your Block Slack account if prompted.
- Click the green "Create New App" button (top right)
- Select "From scratch"
- Fill in:
  - App Name: `slack-research-scraper` (or whatever you want)
  - Workspace: Pick your Block workspace from the dropdown
- Click "Create App"
You'll land on the app's settings page.
- In the left sidebar, click "OAuth & Permissions"
- Scroll down to the section called "User Token Scopes"
⚠️ Make sure it says User Token Scopes, NOT Bot Token Scopes
- Click "Add an OAuth Scope" and add these one by one:
  - `channels:history` → Read messages in public channels
  - `channels:read` → List channels and find their IDs
  - `groups:history` → Read messages in private channels (some targets may be private)
  - `users:read` → Read user profiles (job title, department)
- Scroll back up on the same OAuth & Permissions page
- Click the "Install to Workspace" button
- Slack shows a permissions screen — click "Allow"
- You'll now see a User OAuth Token starting with `xoxp-`
- Copy this token; you need it for the next step
💡 If you ever need to see the token again, come back to this same OAuth & Permissions page. You can also click "Reinstall" if you change scopes.
The token uses YOUR channel access. Make sure you've joined all of these channels in Slack (just open each one and click "Join"):
- #g2-community
- #goose-help
- #goose-inspiration
- #goose-dev
- #builderbot-community
- #builderbot-team
- #staged
- #nexus
Open your terminal and run this (paste your real token where it says):

    echo 'SLACK_TOKEN=xoxp-YOUR-TOKEN-HERE' > /Users/tulsi/slack-research/.env

For example if your token is `xoxp-123-456-789-abc`, you'd run:

    echo 'SLACK_TOKEN=xoxp-123-456-789-abc' > /Users/tulsi/slack-research/.env

Verify it saved:

    cat /Users/tulsi/slack-research/.env

You should see your token printed back. The `.env` file is gitignored so it won't be committed anywhere.
🔒 Never paste tokens directly into chat, code, or commits. Always use .env files or environment variables.
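For readers curious how a script might pick the token up, here is a minimal sketch using only the standard library. The helper names `load_env_file` and `get_slack_token` are hypothetical, and `scripts/scrape_slack.py` may well read the file differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines; skip blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_slack_token(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("SLACK_TOKEN")
    if not token and Path(env_path).exists():
        token = load_env_file(env_path).get("SLACK_TOKEN")
    if not token:
        raise RuntimeError("SLACK_TOKEN not found")
    # This guide sets up a user token, which starts with xoxp-.
    if not token.startswith("xoxp-"):
        raise RuntimeError("Expected a user token starting with xoxp-")
    return token
```

Checking the `xoxp-` prefix early catches the common mistake of copying a bot token instead of the user token.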
Install the dependencies:

    cd /Users/tulsi/slack-research
    pip install -r scripts/requirements.txt

Run the scraper:

    cd /Users/tulsi/slack-research
    python scripts/scrape_slack.py

This will:
- Authenticate with Slack
- Find all 8 target channels
- Pull messages + full threads (8 months for goose/g2 channels, 6 months for others)
- Fetch user profiles (title, department) and anonymize names
- Save everything as JSON in `raw/`
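The anonymization step in the list above could look roughly like the following. This is a sketch under assumptions, not the script's actual logic: the flat profile layout (`id`, `title`, `department`), the salt, and both function names are hypothetical.

```python
import hashlib


def pseudonym(user_id, salt):
    """Map a Slack user ID to a stable label that is hard to reverse without the salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return f"user_{digest[:10]}"


def anonymize_profile(profile, salt):
    """Keep only role fields; replace the identity with a pseudonym."""
    return {
        "id": pseudonym(profile["id"], salt),
        "title": profile.get("title", ""),
        "department": profile.get("department", ""),
    }
```

Because the same ID always maps to the same label, threads can still be followed per person without storing anyone's name.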
Estimated time: Depends on message volume. Could be 10-60 minutes due to rate limiting (the script handles this automatically with retries).
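As an illustration of what "handles this automatically with retries" can mean, here is a small retry loop. Slack signals rate limiting with an HTTP 429 response and a `Retry-After` header; the `RateLimited` exception and `call_with_retry` helper below are hypothetical stand-ins, not names taken from the script.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the server's Retry-After hint (seconds)."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(request, max_attempts=5, sleep=time.sleep):
    """Call request(); on RateLimited, wait as instructed and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimited as err:
            # Give up after the last attempt instead of waiting again.
            if attempt == max_attempts:
                raise
            sleep(err.retry_after)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.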
    python scripts/analyze.py --stats-only

Generates `processed/stats-report.md` with:
- Message counts per channel
- Who's talking (by role/department)
- Keyword-based category signals
- Most reacted messages (high-signal pain points)
- Longest threads (deep discussions)
- Activity over time
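Two of the statistics above are simple enough to sketch. The example assumes each message is a dict shaped like a Slack API message, where `reactions` is a list of entries with a `count` field; how `analyze.py` actually stores and loads messages is not shown here, and the function names are hypothetical.

```python
def reaction_total(message):
    """Sum the counts across all emoji on one message."""
    return sum(r.get("count", 0) for r in message.get("reactions", []))


def top_reacted(messages, n=10):
    """Return the n messages with the most reactions, highest first."""
    return sorted(messages, key=reaction_total, reverse=True)[:n]


def messages_per_channel(messages_by_channel):
    """Count messages for each channel name."""
    return {name: len(msgs) for name, msgs in messages_by_channel.items()}
```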
    python scripts/analyze.py

This generates everything above PLUS prepared LLM prompt batches in `processed/llm-batches/`. Each batch is a prompt you can send to any LLM (Goose, Claude, GPT, etc.) for deeper classification.
After processing the LLM batches, save results as `batch-NNN-result.json` in the same directory, then:

    python scripts/synthesize.py

This produces the final `processed/research-report.md` with:
- Categorized findings mapped to product vision
- Top pain points, feature requests, workarounds
- Executive summary
- Vision alignment analysis
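Before running the synthesis step, it can help to check which batches still lack a result file. The sketch below relies only on the `batch-NNN-prompt.md` and `batch-NNN-result.json` naming used in this README; `missing_results` is a hypothetical helper, not part of the scripts.

```python
import re
from pathlib import Path

PROMPT_RE = re.compile(r"^batch-(\d+)-prompt\.md$")


def missing_results(batch_dir):
    """List batch numbers that have a prompt file but no result file."""
    batch_dir = Path(batch_dir)
    missing = []
    for path in sorted(batch_dir.iterdir()):
        match = PROMPT_RE.match(path.name)
        if match and not (batch_dir / f"batch-{match.group(1)}-result.json").exists():
            missing.append(match.group(1))
    return missing
```

For example, `missing_results("processed/llm-batches")` returns the batch numbers as zero-padded strings, in sorted order.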
    slack-research/
    ├── .env                          ← Your token (gitignored)
    ├── .gitignore
    ├── README.md                     ← You are here
    ├── scripts/
    │   ├── requirements.txt
    │   ├── scrape_slack.py           ← Step 4: Scrape channels
    │   ├── analyze.py                ← Step 5: Stats + LLM prep
    │   └── synthesize.py             ← Step 5: Combine LLM results
    ├── raw/                          ← Raw scraped data (gitignored)
    │   ├── g2-community.json
    │   ├── goose-help.json
    │   ├── goose-inspiration.json
    │   ├── goose-dev.json
    │   ├── builderbot-community.json
    │   ├── builderbot-team.json
    │   ├── staged.json
    │   ├── nexus.json
    │   └── user_profiles.json
    └── processed/
        ├── stats-report.md           ← Statistical analysis
        ├── llm-batches/              ← Prompts for LLM classification
        │   ├── batch-000-prompt.md
        │   ├── batch-000-data.json
        │   ├── batch-000-result.json ← You fill these in
        │   └── ...
        └── research-report.md        ← Final synthesized report
| Problem | Solution |
|---|---|
| ❌ `SLACK_TOKEN not found` | Check your `.env` file exists and has the token |
| ❌ `Auth failed: invalid_auth` | Token is wrong or expired; regenerate at api.slack.com/apps |
| ❌ `Bot is not in #channel` | Join that channel in Slack with your personal account |
| `missing_scope` | Go back to OAuth & Permissions, add the missing scope, click Reinstall |
| `ratelimited` | Script handles this automatically; just wait |
| Script seems slow | Normal! Rate limiting means ~1 API call/sec. Large channels take time. |
| Channel | Time Range | Why |
|---|---|---|
| #g2-community | 8 months | Core G2 user feedback |
| #goose-help | 8 months | Support requests = pain points |
| #goose-inspiration | 8 months | What people want to build |
| #goose-dev | 8 months | Developer perspective on limitations |
| #builderbot-community | 6 months | Adjacent tool community |
| #builderbot-team | 6 months | Internal team discussions |
| #staged | 6 months | Staging/deployment workflows |
| #nexus | 6 months | Cross-tool integration discussions |
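The time ranges in this table translate into a start timestamp per channel, which is the kind of value Slack's `conversations.history` method accepts as `oldest`. The sketch below approximates a month as 30 days, which is an assumption; the scraper may compute its cutoff differently, and `oldest_timestamp` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Lookback in months, copied from the channel table.
LOOKBACK_MONTHS = {
    "g2-community": 8,
    "goose-help": 8,
    "goose-inspiration": 8,
    "goose-dev": 8,
    "builderbot-community": 6,
    "builderbot-team": 6,
    "staged": 6,
    "nexus": 6,
}


def oldest_timestamp(channel, now=None, days_per_month=30):
    """Return the Unix timestamp where scraping of `channel` would start."""
    now = now or datetime.now(timezone.utc)
    window = timedelta(days=LOOKBACK_MONTHS[channel] * days_per_month)
    return (now - window).timestamp()
```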
Messages are classified into these categories:
| Category | What We're Looking For | Maps to Product Area |
|---|---|---|
| pain_point | Frustrations, bugs, things broken | Core Reliability & UX |
| feature_request | "I wish it could..." | Feature Roadmap |
| workaround | "What I do instead is..." | Unmet Needs (gold mine) |
| use_case | "I used it to..." | Product Positioning |
| non_engineer_attempt | Non-eng people trying tools | "For Everyone" Validation |
| multiplayer_collab | Team/sharing needs | Multiplayer Features |
| customization | UI/behavior modification | Self-Editable UI / Apps |
| scheduling_automation | Cron/event-driven needs | Scheduled Agents |
| context_memory | Forgetting context, sessions | Projects / Workspaces |
| disconnection | Tools not integrated | Unified Environment |
| praise | Things people love | Preserve & Build On |
| onboarding_confusion | Getting started issues | Onboarding & Docs |
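To make the idea of keyword-based category signals concrete, here is a toy matcher for a few of the categories in this table. The keyword lists are invented for illustration and are not the ones `analyze.py` uses; the LLM batches exist precisely because substring matching like this is only a rough first pass.

```python
# Hypothetical keyword lists; the real ones live in the analysis script.
KEYWORDS = {
    "feature_request": ["i wish", "would be nice", "feature request"],
    "workaround": ["workaround", "what i do instead"],
    "scheduling_automation": ["cron", "schedule"],
    "onboarding_confusion": ["getting started", "how do i install"],
}


def keyword_signals(text):
    """Return the categories whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, phrases in KEYWORDS.items()
        if any(phrase in lowered for phrase in phrases)
    )
```

A message can carry several signals at once, which is why the function returns a list rather than a single label.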