A Telegram bot that grants access to one user at a time, queues additional users, and fetches Instagram profile snapshots through Selenium + ChromeDriver.
## Features

- Single-user access with a managed waiting queue.
- Guided conversation to collect Instagram usernames and a refresh interval.
- Selenium-powered scraping helpers with pluggable Chrome profile support.
- Optional APScheduler job for unattended crawls that persist JSON snapshots.
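The single-user lock behind the waiting queue can be sketched like this (a minimal illustration with hypothetical class and method names, not the bot's actual implementation):

```python
from collections import deque

class AccessQueue:
    """Grants the bot to one user at a time; everyone else waits in FIFO order."""

    def __init__(self):
        self.active = None      # user id currently holding the lock
        self.waiting = deque()  # user ids waiting for access

    def request(self, user_id):
        """Return 0 if the user holds the lock, else their 1-based queue position."""
        if self.active is None or self.active == user_id:
            self.active = user_id
            return 0
        if user_id not in self.waiting:
            self.waiting.append(user_id)
        return list(self.waiting).index(user_id) + 1

    def release(self, user_id):
        """Free the slot and promote the next waiting user, if any."""
        if self.active == user_id:
            self.active = self.waiting.popleft() if self.waiting else None
        return self.active
```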
## Requirements

- Python 3.10 or newer.
- Google Chrome and a matching ChromeDriver on your `PATH`.
- Dependencies listed in `requirements.txt` (`pip install -r requirements.txt`).
## Quick start

```bat
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
set TELEGRAM_BOT_TOKEN=your-telegram-token

rem Optional Selenium tweaks
set INSTA_CHROME_PROFILE_DIR=C:\path\to\chrome-profile
set INSTA_CHROME_PROFILE_NAME=Profile 1
set INSTA_CHROME_HEADLESS=1

python -m bot.main
```

Open Telegram, talk to your bot, and send `/start`.
## Configuration

- **Telegram token** – set `TELEGRAM_BOT_TOKEN` in your environment. The bot raises a clear error if it is missing.
- **Chrome profile** – customise Selenium via:
  - `INSTA_CHROME_PROFILE_DIR` – folder that contains the Chrome profile to reuse.
  - `INSTA_CHROME_PROFILE_NAME` – optional profile directory name inside the user data folder.
  - `INSTA_CHROME_HEADLESS` – set to `1`/`true`/`on` to launch Chrome in headless mode.
- Without these variables the bot creates and uses a local `chrome_profile/` folder (ignored by git).
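One plausible way these variables translate into Chrome launch flags is sketched below; the resulting flags would be passed to a Selenium `ChromeOptions` object via `add_argument`. This is an assumption about the wiring, not the bot's exact code:

```python
def chrome_flags_from_env(env: dict) -> list:
    """Map the INSTA_CHROME_* variables onto Chrome command-line flags."""
    flags = []
    # Fall back to the local chrome_profile/ folder when no directory is set.
    flags.append(f"--user-data-dir={env.get('INSTA_CHROME_PROFILE_DIR', 'chrome_profile')}")
    if env.get("INSTA_CHROME_PROFILE_NAME"):
        flags.append(f"--profile-directory={env['INSTA_CHROME_PROFILE_NAME']}")
    # Any of 1/true/on (case-insensitive) enables headless mode.
    if env.get("INSTA_CHROME_HEADLESS", "").lower() in ("1", "true", "on"):
        flags.append("--headless=new")
    return flags
```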
## Commands

- `/start` – request access and see your position in the queue.
- `/start_tracking` – supply usernames and an hourly interval to fetch analytics.
- `/change` – update the usernames being tracked while you hold the lock.
- `/cancel_tracking` – release your slot for the next user.
- `/end` – clear your session and leave the queue entirely.
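Under the hood, command handling boils down to a dispatch table mapping each command to a handler. A framework-free sketch (the handler name and reply text are hypothetical, not the bot's real responses):

```python
def handle_start(user_id, state):
    """Hypothetical /start handler: enqueue the user and report their position."""
    state.setdefault("queue", []).append(user_id)
    return f"You are #{len(state['queue'])} in the queue."

COMMANDS = {"/start": handle_start}

def dispatch(text, user_id, state):
    """Route an incoming message to the matching command handler."""
    handler = COMMANDS.get(text.split()[0])
    return handler(user_id, state) if handler else "Unknown command."
```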
Responses include follower counts, total posts, and placeholders for post metrics (likes, comments, hashtags). Replace the placeholder logic in `utils/instagram_crawler.py` to collect real engagement numbers.
## Scheduled crawls

`bot/scheduler.py` exposes `create_scheduler()`, which schedules `scheduled_crawl()` every six hours by default. Execute the module directly to run a background scheduler, or import it and integrate it into your own service.
## Data

Sample output lives at `bot/data/sample_output.example.json`. Real crawls write JSON to `bot/data/`; the folder is git-ignored so production data stays local.
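Persisting one snapshot to `bot/data/` might look like the sketch below; the timestamped filename pattern is an assumption, not the crawler's actual naming scheme:

```python
import json
import time
from pathlib import Path

def save_snapshot(username: str, snapshot: dict, data_dir: str = "bot/data") -> Path:
    """Write a profile snapshot to a timestamped JSON file under data_dir."""
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = out_dir / f"{username}-{stamp}.json"
    path.write_text(json.dumps(snapshot, indent=2, ensure_ascii=False))
    return path
```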
## Housekeeping

- `.gitignore` excludes Chrome/Selenium artefacts, caches, and compiled files.
- Runtime directories generated by Chrome (profiles, caches, Crashpad, etc.) have been removed from the repository root.
## Next steps

- Replace placeholder scraping with real metrics (likes, comments, hashtags, captions).
- Add resilience against Instagram throttling (retries, exponential backoff, proxy support).
- Extend analytics in `utils/data_processing.py` and add automated tests.
- Secure long-running deployments (process supervision, persistent storage, structured logging).
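For the throttling item above, retry with exponential backoff and jitter can be sketched as (a generic pattern, not code from this repository):

```python
import random
import time

def fetch_with_backoff(fetch, max_retries: int = 4, base_delay: float = 1.0):
    """Call fetch(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            time.sleep(base_delay * 2 ** attempt + random.random())
```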