This repository contains the code and data for the paper "Automated Identification of Competing Narratives in Political Discourse on Social Media", published at the Text2Story Workshop 2025 @ ECIR.
The project is organized as follows:
- `dataset/` folder should contain the data to be used for the analysis
- `app.py` and `app_pages/` contain the code for the web application built with Streamlit
- the numbered scripts are used to preprocess the data
The dataset is expected to be in JSONL format. Each line should contain one post. The data should contain the following fields:
- `date`: the date of the post
- `text`: the text of the post
- `user_id`: unique identifier for the author of the post
- `translation`: the translation of the post (optional)
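As a sketch, a gzipped JSONL dataset in this format can be written and validated as follows. The field names match the list above; the file path and all values are illustrative:

```python
import gzip
import json

# Illustrative post using the fields described above; the values are made up.
sample_post = {
    "date": "2025-04-10",               # date of the post
    "text": "Ejemplo de publicación.",  # original text of the post
    "user_id": "user_123",              # unique author identifier
    "translation": "Example post.",     # optional translation
}

# Write one post per line to a gzipped JSONL file.
with gzip.open("dataset.jsonl.gz", "wt", encoding="utf-8") as f:
    f.write(json.dumps(sample_post, ensure_ascii=False) + "\n")

# Read it back and check that each post has the required fields.
with gzip.open("dataset.jsonl.gz", "rt", encoding="utf-8") as f:
    posts = [json.loads(line) for line in f]

assert all({"date", "text", "user_id"} <= p.keys() for p in posts)
```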
- Python
- Pipenv
- Clone the repo
- Add your dataset to the `dataset/` folder. For example `dataset/twitter-covid/dataset.jsonl.gz`
  - The data should be in JSONL format
  - Each line should contain one post.
- Add your configuration in `.env`. See `.env.sample` for a template.
- Install the dependencies: `pipenv install`
The configuration is done in the .env file. The following variables are available:
- `DATASET`: Path inside the `dataset/` folder to the dataset.
  - Example: `twitter-covid/dataset.jsonl.gz`
- `TEXT_ATTR`: Name of the field in the dataset that contains the text of the post.
  - Example: `text`
- `TEXT_TRANSLATION_ATTR`: Name of the field in the dataset that contains the translation of the post. (optional)
  - Example: `translation`
- `USER_ATTR`: Name of the field in the dataset that contains the unique identifier of the author.
  - Example: `user_id`
- `EMBEDDING_MODEL`: Name of the sentence embedding model to use.
  - Example: `paraphrase-multilingual-MiniLM-L12-v2`
- `EMBED_TRANSLATION`: Whether to use the translation for the embeddings. (optional) `0` or `1`
- `OPENAI_URL`: URL for an OpenAI compatible API. (optional)
- `OPENAI_API_KEY`: API key for the OpenAI API. (optional)
- `OPENAI_MODEL`: Name of the LLM to use. (optional)
  - Example: `phi4:14b`
LLMs can optionally be used to summarize events and stories. Otherwise, we fall back to keyword extraction.
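For illustration, a complete `.env` could look like the following. The values are the examples listed above; the `OPENAI_*` lines are only needed for LLM summaries, and the URL shown is an assumption (a local Ollama-style server — any OpenAI-compatible endpoint works):

```
DATASET=twitter-covid/dataset.jsonl.gz
TEXT_ATTR=text
TEXT_TRANSLATION_ATTR=translation
USER_ATTR=user_id
EMBEDDING_MODEL=paraphrase-multilingual-MiniLM-L12-v2
EMBED_TRANSLATION=1

# Optional: only needed for LLM-based summaries
OPENAI_URL=http://localhost:11434/v1
OPENAI_API_KEY=...
OPENAI_MODEL=phi4:14b
```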
- Activate the virtual environment: `pipenv shell`
- Run the numbered scripts in order to preprocess the data
- Run the Streamlit app: `streamlit run app.py`
If you use this code or data, please cite the following paper:
```bibtex
@inproceedings{wildemann2025automated,
  title     = {Automated Identification of Competing Narratives in Political Discourse on Social Media},
  author    = {Sergej Wildemann and Erick Elejalde},
  editor    = {Ricardo Campos and
               Al{\'{\i}}pio M{\'{a}}rio Jorge and
               Adam Jatowt and
               Sumit Bhatia and
               Marina Litvak},
  booktitle = {Proceedings of Text2Story - Eighth Workshop on Narrative Extraction
               From Texts held in conjunction with the 47th European Conference on
               Information Retrieval {(ECIR} 2025), Lucca, Italy, April 10, 2025},
  year      = {2025},
  series    = {{CEUR} Workshop Proceedings},
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
