tulsi-builder/slack-research

Slack Research Pipeline

Research pipeline to scrape and analyze Block Slack channels for product insights on the new Universal Agent Environment.


Quick Start (4 steps)

1. Get a Slack token         (see instructions below)
2. Save it in .env           (one command)
3. pip install requirements  (one command)
4. python scripts/scrape_slack.py  (go get coffee)

Step 1: Get a Slack Token

1a. Go to the Slack App Dashboard

Open this URL in your browser:

👉 https://api.slack.com/apps

Sign in with your Block Slack account if prompted.

1b. Create a New App

  1. Click the green "Create New App" button (top right)
  2. Select "From scratch"
  3. Fill in:
    • App Name: slack-research-scraper (or whatever you want)
    • Workspace: Pick your Block workspace from the dropdown
  4. Click "Create App"

You'll land on the app's settings page.

1c. Add Permissions (Scopes)

  1. In the left sidebar, click "OAuth & Permissions"
  2. Scroll down to the section called "User Token Scopes"
    • ⚠️ Make sure it says User Token Scopes, NOT Bot Token Scopes
  3. Click "Add an OAuth Scope" and add these one by one:
channels:history      → Read messages in public channels
channels:read         → List channels and find their IDs
groups:history        → Read messages in private channels (some targets may be private)
users:read            → Read user profiles (job title, department)

1d. Install the App to Your Workspace

  1. Scroll back up on the same OAuth & Permissions page
  2. Click the "Install to Workspace" button
  3. Slack shows a permissions screen — click "Allow"
  4. You'll now see a User OAuth Token starting with xoxp-
  5. Copy this token — you need it for the next step

💡 If you ever need to see the token again, come back to this same OAuth & Permissions page. You can also click "Reinstall" if you change scopes.
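Slack reports the scopes actually granted to a token in the `X-OAuth-Scopes` response header of any Web API call. A small helper to diff that header against the four scopes above can catch a missing scope before a long scrape (hypothetical helper, not part of the repo's scripts):

```python
# The four User Token Scopes this pipeline needs (from Step 1c).
REQUIRED_SCOPES = {"channels:history", "channels:read", "groups:history", "users:read"}

def missing_scopes(granted_header):
    """Return required scopes absent from an X-OAuth-Scopes header value,
    which Slack formats as a comma-separated string."""
    granted = {s.strip() for s in granted_header.split(",") if s.strip()}
    return REQUIRED_SCOPES - granted
```

If this returns a non-empty set, add the missing scopes under OAuth & Permissions and click Reinstall.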

1e. Join the Target Channels

The token uses YOUR channel access. Make sure you've joined all of these channels in Slack (just open each one and click "Join"):

  • #g2-community
  • #goose-help
  • #goose-inspiration
  • #goose-dev
  • #builderbot-community
  • #builderbot-team
  • #staged
  • #nexus

Step 2: Save the Token

Open your terminal and run this (paste your real token where it says):

echo 'SLACK_TOKEN=xoxp-YOUR-TOKEN-HERE' > /Users/tulsi/slack-research/.env

For example, if your token is xoxp-123-456-789-abc, you'd run:

echo 'SLACK_TOKEN=xoxp-123-456-789-abc' > /Users/tulsi/slack-research/.env

Verify it saved:

cat /Users/tulsi/slack-research/.env

You should see your token printed back. The .env file is gitignored so it won't be committed anywhere.

🔒 Never paste tokens directly into chat, code, or commits. Always use .env files or environment variables.
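If you ever need to read the token from `.env` yourself, a minimal loader looks like this (a sketch with no external dependency; the repo's scripts may well use a library such as python-dotenv instead):

```python
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.
    Ignores blank lines and # comments; no quoting or escaping rules."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

Usage: `token = load_env().get("SLACK_TOKEN")`.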


Step 3: Install Dependencies

cd /Users/tulsi/slack-research
pip install -r scripts/requirements.txt

Step 4: Run the Scraper

cd /Users/tulsi/slack-research
python scripts/scrape_slack.py

This will:

  • Authenticate with Slack
  • Find all 8 target channels
  • Pull messages + full threads (8 months for goose/g2 channels, 6 months for others)
  • Fetch user profiles (title, department) and anonymize names
  • Save everything as JSON in raw/

Estimated time: depends on message volume; expect roughly 10-60 minutes due to rate limiting (the script handles retries automatically).
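The retry behavior mentioned above can be sketched generically. Slack answers rate-limited calls with HTTP 429 and a Retry-After header giving the number of seconds to wait; a wrapper along these lines honors that advice (a hypothetical helper, not the script's actual code):

```python
import time

class RateLimited(Exception):
    """Raised when Slack answers HTTP 429 / error "ratelimited"."""
    def __init__(self, retry_after=None):
        super().__init__("ratelimited")
        self.retry_after = retry_after

def call_with_retry(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimited, wait the advised delay and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise
            # Honor Slack's Retry-After if present, else back off exponentially.
            sleep(exc.retry_after or base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the helper testable without actually waiting.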


Step 5: Analyze

Stats (no LLM needed)

python scripts/analyze.py --stats-only

Generates processed/stats-report.md with:

  • Message counts per channel
  • Who's talking (by role/department)
  • Keyword-based category signals
  • Most reacted messages (high-signal pain points)
  • Longest threads (deep discussions)
  • Activity over time
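For instance, the "most reacted messages" ranking can be computed straight from the raw Slack message JSON, where a message may carry a `reactions` list with per-emoji `count` fields (a sketch assuming that raw format, not analyze.py's actual code):

```python
def top_reacted(messages, n=5):
    """Rank raw Slack messages by total reaction count. Each message may
    have a `reactions` list of items like {"name": "fire", "count": 3}."""
    def total(msg):
        return sum(r.get("count", 0) for r in msg.get("reactions", []))
    return sorted(messages, key=total, reverse=True)[:n]
```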

Full Analysis (with LLM classification)

python scripts/analyze.py

This generates everything above PLUS prepared LLM prompt batches in processed/llm-batches/. Each batch is a prompt you can send to any LLM (Goose, Claude, GPT, etc.) for deeper classification.

Synthesize LLM Results

After processing the LLM batches, save results as batch-NNN-result.json in the same directory, then:

python scripts/synthesize.py

This produces the final processed/research-report.md with:

  • Categorized findings mapped to product vision
  • Top pain points, feature requests, workarounds
  • Executive summary
  • Vision alignment analysis

Output Structure

slack-research/
├── .env                          ← Your token (gitignored)
├── .gitignore
├── README.md                     ← You are here
├── scripts/
│   ├── requirements.txt
│   ├── scrape_slack.py           ← Step 4: Scrape channels
│   ├── analyze.py                ← Step 5: Stats + LLM prep
│   └── synthesize.py             ← Step 5: Combine LLM results
├── raw/                          ← Raw scraped data (gitignored)
│   ├── g2-community.json
│   ├── goose-help.json
│   ├── goose-inspiration.json
│   ├── goose-dev.json
│   ├── builderbot-community.json
│   ├── builderbot-team.json
│   ├── staged.json
│   ├── nexus.json
│   └── user_profiles.json
└── processed/
    ├── stats-report.md           ← Statistical analysis
    ├── llm-batches/              ← Prompts for LLM classification
    │   ├── batch-000-prompt.md
    │   ├── batch-000-data.json
    │   ├── batch-000-result.json ← You fill these in
    │   └── ...
    └── research-report.md        ← Final synthesized report

Troubleshooting

| Problem | Solution |
| --- | --- |
| ❌ SLACK_TOKEN not found | Check that your .env file exists and contains the token |
| ❌ Auth failed: invalid_auth | Token is wrong or expired; regenerate at api.slack.com/apps |
| ❌ Bot is not in #channel | Join that channel in Slack with your personal account |
| missing_scope | Go back to OAuth & Permissions, add the missing scope, and click Reinstall |
| ratelimited | The script handles this automatically; just wait |
| Script seems slow | Normal: rate limiting means ~1 API call/sec, so large channels take time |

Target Channels & Time Ranges

| Channel | Time Range | Why |
| --- | --- | --- |
| #g2-community | 8 months | Core G2 user feedback |
| #goose-help | 8 months | Support requests = pain points |
| #goose-inspiration | 8 months | What people want to build |
| #goose-dev | 8 months | Developer perspective on limitations |
| #builderbot-community | 6 months | Adjacent tool community |
| #builderbot-team | 6 months | Internal team discussions |
| #staged | 6 months | Staging/deployment workflows |
| #nexus | 6 months | Cross-tool integration discussions |
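Slack's conversations.history endpoint accepts an `oldest` Unix-timestamp parameter, so the 8-month and 6-month windows above presumably translate into per-channel cutoffs along these lines (a sketch approximating a month as 30 days; the scraper's actual computation is not shown here):

```python
import time

# Lookback window per channel, mirroring the table above.
CHANNEL_MONTHS = {
    "g2-community": 8, "goose-help": 8, "goose-inspiration": 8, "goose-dev": 8,
    "builderbot-community": 6, "builderbot-team": 6, "staged": 6, "nexus": 6,
}

def oldest_ts(months, now=None):
    """Unix timestamp `months` (approximated as 30-day blocks) before now,
    suitable for the `oldest` parameter of conversations.history."""
    now = time.time() if now is None else now
    return now - months * 30 * 24 * 3600
```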

Research Taxonomy

Messages are classified into these categories:

| Category | What We're Looking For | Maps to Product Area |
| --- | --- | --- |
| pain_point | Frustrations, bugs, things broken | Core Reliability & UX |
| feature_request | "I wish it could..." | Feature Roadmap |
| workaround | "What I do instead is..." | Unmet Needs (gold mine) |
| use_case | "I used it to..." | Product Positioning |
| non_engineer_attempt | Non-eng people trying tools | "For Everyone" Validation |
| multiplayer_collab | Team/sharing needs | Multiplayer Features |
| customization | UI/behavior modification | Self-Editable UI / Apps |
| scheduling_automation | Cron/event-driven needs | Scheduled Agents |
| context_memory | Forgetting context, sessions | Projects / Workspaces |
| disconnection | Tools not integrated | Unified Environment |
| praise | Things people love | Preserve & Build On |
| onboarding_confusion | Getting started issues | Onboarding & Docs |
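The keyword-based category signals from Step 5 could look roughly like this (the keyword lists below are hypothetical illustrations; the real taxonomy keywords live in analyze.py):

```python
# Hypothetical keyword lists for a few taxonomy categories.
KEYWORDS = {
    "pain_point": ["broken", "bug", "frustrating", "doesn't work"],
    "feature_request": ["i wish", "would be great if", "feature request"],
    "workaround": ["what i do instead", "workaround", "hack around"],
}

def classify(text):
    """Return every category whose keywords appear in the message text."""
    lowered = text.lower()
    return [cat for cat, words in KEYWORDS.items()
            if any(w in lowered for w in words)]
```

Keyword matching only yields coarse signals, which is why the pipeline follows up with LLM classification for the full taxonomy.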
