Summary:
- Downloads news articles by searching on http://www.news.google.com with keywords of interest and specific date ranges.
- Generates CSV files of the news text.
- Uses pre-trained NLP models to perform sentiment analysis of the news text.
- A scraper script that searches Google News for a given date range using keywords of interest.
- Sample data generated by the script looks like this (for the keywords 'bitcoin cryptocurrency').
- Takes the CSV file generated by google_news_scraper.py as input.
- Performs sentiment analysis on each cell.
- Uses flair (https://pypi.org/project/flair/), textblob (https://pypi.org/project/textblob/), and VADER (https://www.nltk.org/_modules/nltk/sentiment/vader.html) to compute sentiment scores.
- Averages the scores of each row (separately for each metric) to get an overall sentiment score. The goal is a single sentiment score per metric for all the news published on a given date.
- Sample data generated at this stage looks like this.
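
The per-row averaging step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes each CSV row holds a date in the first column followed by one news text per cell, with no header row. The names `score_rows` and `toy_polarity` are hypothetical. `toy_polarity` is a stand-in word-list scorer used only so the sketch runs with the standard library; in practice each scorer would wrap flair, textblob, or VADER.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-in scorer: a tiny word-list polarity in [-1, 1].
# A real scorer would call flair, textblob, or VADER instead.
POSITIVE = {"surge", "gain", "rally"}
NEGATIVE = {"crash", "loss", "ban"}


def toy_polarity(text):
    """Return the mean of +1/-1 hits for known words, or 0.0 if none match."""
    hits = []
    for word in text.lower().split():
        if word in POSITIVE:
            hits.append(1)
        elif word in NEGATIVE:
            hits.append(-1)
    return mean(hits) if hits else 0.0


def score_rows(csv_file, scorers):
    """Map each date to {metric name: mean score over that row's cells}.

    Assumed layout (not confirmed by the source): column 0 is the date,
    every remaining non-empty cell is one news text.
    """
    results = {}
    for row in csv.reader(csv_file):
        if not row:
            continue
        date = row[0]
        cells = [cell for cell in row[1:] if cell.strip()]
        if not cells:
            continue
        results[date] = {
            name: mean(scorer(cell) for cell in cells)
            for name, scorer in scorers.items()
        }
    return results


# In-memory sample standing in for the scraper's CSV output.
sample = io.StringIO(
    "2019-01-01,Bitcoin prices surge,Exchange reports loss\n"
    "2019-01-02,Bitcoin rally continues\n"
)
daily = score_rows(sample, {"toy": toy_polarity})
print(daily)
```

Passing the scorers as a dictionary keeps one averaged column per metric, so adding a further entry such as a function returning `TextBlob(text).sentiment.polarity` would not change the averaging logic.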
This framework is used in https://github.com/pratikpv/predicting_bitcoin_market
The scraper code is based on the code from https://towardsdatascience.com/web-scraping-news-articles-in-python-9dd605799558.