Environmental, Social and Corporate Governance (ESG) refers to the three central factors used to measure the sustainability and societal impact of an investment in a company or business. These criteria are used to help assess a company's future financial performance (return and risk).
This analysis extracts text from an ESG report in PDF format downloaded from the internet, applies NLP preprocessing to the text, summarizes the key ESG initiatives with word clouds and TF-IDF scores, and discovers topics by building a Latent Dirichlet Allocation (LDA) model.
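The word-frequency and TF-IDF steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's own code: it assumes the PDF text has already been extracted into one string per page, treats each page as a TF-IDF "document" (IDF needs more than one document, and only one report is used here), and uses a tiny hypothetical `STOPWORDS` set and made-up `pages` text. The TF-IDF variant shown is `(count / doc length) * log(N / df)`; libraries such as scikit-learn's `TfidfVectorizer` apply smoothed IDF and L2 normalization by default, so absolute scores will differ.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "and", "of", "to", "in", "a", "our", "we", "for", "on", "is", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens of 3+ letters, drop stopwords."""
    tokens = re.findall(r"[a-z]{3,}", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(docs: list[list[str]]) -> Counter:
    """Corpus-wide word counts: the numbers a word cloud scales font size by."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF per document: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            term: (c / length) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return scores


# Made-up page texts standing in for text extracted from the PDF report.
pages = [
    "Our climate strategy targets renewable energy and lower emissions.",
    "We finance renewable energy projects and sustainable infrastructure.",
    "Board diversity and governance oversight of climate risk.",
]
docs = [tokenize(p) for p in pages]
print(term_frequencies(docs).most_common(3))
for i, page_scores in enumerate(tfidf(docs)):
    # Highest-scoring term on each page: frequent locally, rare across pages.
    print(i, max(page_scores, key=page_scores.get))
```

A word cloud would then size each word by its count from `term_frequencies`. Under this IDF variant, a term that appears on every page scores zero, which is why TF-IDF tends to surface page-specific initiatives rather than words repeated throughout the report.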
To keep this exercise as simple as possible, only one ESG report is used: Citibank's 2019 ESG report.
An additional notebook is provided for analyzing sustainability reports by Cabot Corp.
Given that ESG is a broad topic, different companies focus on different aspects of it depending on their business operations and culture. Ingesting more ESG reports from companies across all sectors and industries could capture a wider range of relevant ESG topics; this will be attempted in a separate analysis.
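The LDA topic-discovery step described above can be illustrated with a small, dependency-free sketch. Note that Gensim's `LdaModel` (used in the linked Gensim tutorial) fits LDA with online variational Bayes; this sketch swaps in collapsed Gibbs sampling, a different standard inference method for the same model. The function name `lda_gibbs`, the hyperparameters `alpha=0.1` and `beta=0.01`, and the toy token lists are all hypothetical choices for illustration, not values from the notebooks.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] estimates P(topic k | doc d) and topic_word[k] maps each
    vocabulary word to an estimate of P(word | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]               # doc d tokens in topic k
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # word w count in topic k
    n_k = [0] * n_topics                                # total tokens in topic k
    z = []                                              # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional P(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    doc_topic = []
    for d, doc in enumerate(docs):
        total = len(doc) + n_topics * alpha
        doc_topic.append([(n_dk[d][k] + alpha) / total for k in range(n_topics)])
    topic_word = []
    for k in range(n_topics):
        total = n_k[k] + V * beta
        topic_word.append({w: (n_kw[k][w] + beta) / total for w in vocab})
    return doc_topic, topic_word


# Toy, made-up token lists: two environmental "reports", two governance ones.
docs = [
    ["emissions", "climate", "renewable", "energy", "climate", "emissions"],
    ["renewable", "energy", "climate", "carbon", "emissions", "carbon"],
    ["board", "governance", "diversity", "oversight", "board", "ethics"],
    ["governance", "ethics", "board", "oversight", "diversity", "governance"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_iter=300, seed=42)
for k, dist in enumerate(topic_word):
    print(f"topic {k}:", sorted(dist, key=dist.get, reverse=True)[:3])
```

With more reports from different sectors, each document's `doc_topic` row shows how much of it falls under each discovered topic, which is one way to compare which ESG aspects different companies emphasize.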
- https://github.com/jingjieyeo/esg-nlp/blob/master/notebook/esg-report-analysis.ipynb
- https://github.com/jingjieyeo/esg-nlp/blob/master/notebook/cabot-sustainability-report-analysis.ipynb
- A data-driven approach to Environmental, Social and Governance
  - Higher ESG ratings are generally positively correlated with valuation and profitability while negatively correlated with volatility.
- Topic Modeling with Gensim (Python)
- Citibank's 2019 ESG report
- Databricks - ESG Reports
- Databricks - Data Driven ESG Score
- Databricks - ESG Market Risk
- Topic Modeling and Latent Dirichlet Allocation (LDA) in Python
- Evaluate Topic Models: Latent Dirichlet Allocation (LDA)
- Topic modeling visualization – How to present the results of LDA models?
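Evaluating topic models, as in the linked LDA evaluation article, often relies on topic coherence. The sketch below computes UMass coherence for one topic: for top words ordered from most to least probable, it averages `log((D(w_i, w_j) + 1) / D(w_j))` over word pairs, where `D` counts documents containing the word(s). Gensim exposes this metric through `CoherenceModel(..., coherence="u_mass")`; its exact segmentation and aggregation may differ from this minimal version. The function name `umass_coherence` and the toy documents and topics are hypothetical, and every top word is assumed to occur in at least one document (otherwise `D(w_j)` would be zero).

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, averaged over ordered word pairs.

    top_words: the topic's top words, ordered from most to least probable.
    docs: list of token lists used to count document (co-)occurrence.
    Assumes every word in top_words appears in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            # +1 smoothing avoids log(0) when a pair never co-occurs.
            scores.append(math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j)))
    return sum(scores) / len(scores)


docs = [
    ["climate", "emissions", "renewable"],
    ["climate", "emissions", "carbon"],
    ["board", "governance", "ethics"],
    ["board", "governance", "diversity"],
]
coherent = ["climate", "emissions", "renewable"]
mixed = ["climate", "board", "ethics"]
print(round(umass_coherence(coherent, docs), 3), round(umass_coherence(mixed, docs), 3))
# prints 0.135 -0.462
```

Higher (less negative) scores mean a topic's top words tend to appear in the same documents, so comparing average coherence across candidate topic counts is one common way to choose the number of LDA topics.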