Unemployment detection on Twitter

Data and models for ACL 2022 paper "Multilingual Detection of Personal Employment Status on Twitter"

Data

The labeled tweets can be found in the data folder. For each country, the relevant labels can be found in the <COUNTRY_CODE>.csv file. In total, the files contain respectively 8376 English, 11002 Spanish and 7156 Portuguese labeled tweets from users based in the United States (US), Mexico (MX) and Brazil (BR). The tweets are labeled for the 5 classes of interest in the paper, namely whether the tweet indicates that its author "was hired in the past month" (is_hired_1mo), "lost her job in the past month" (lost_job_1mo), "is looking for a job" (job_search), "is unemployed" (is_unemployed) or whether the tweet is a job offer (job_offer). 1 indicate positive labels while 0 indicate negative labels.

Due to Twitter data sharing policies, we are not able to release the raw tweets and can only release the tweet IDs (tweet_id) in combination with the labels. The raw tweets can then be retrieved by using the Twitter API.

Models

Our model have been open-sourced on 🤗 and can be found in this collection. Please find in the table below the model names in the hub for each language and class.

Class	English	Spanish	Portuguese
Is Unemployed	`worldbank/bert-twitter-en-is-unemployed`	`worldbank/bert-twitter-es-is-unemployed`	`worldbank/bert-twitter-pt-is-unemployed`
Lost Job	`worldbank/bert-twitter-en-lost-job`	`worldbank/bert-twitter-es-lost-job`	`worldbank/bert-twitter-pt-lost-job`
Job Search	`worldbank/bert-twitter-en-job-search`	`worldbank/bert-twitter-es-job-search`	`worldbank/bert-twitter-pt-job-search`
Is Hired	`worldbank/bert-twitter-en-is-hired`	`worldbank/bert-twitter-es-is-hired`	`worldbank/bert-twitter-pt-is-hired`
Job Offer	`worldbank/bert-twitter-en-job-offer`	`worldbank/bert-twitter-es-job-offer`	`worldbank/bert-twitter-pt-job-offer`

To use a specific model, say the English model for the class Is Unemployed, do the following:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("worldbank/bert-twitter-en-is-unemployed")
model = AutoModel.from_pretrained("worldbank/bert-twitter-en-is-unemployed")

Citation

If you find our work useful, please cite:

@inproceedings{tonneau-etal-2022-multilingual,
    title = "Multilingual Detection of Personal Employment Status on {T}witter",
    author = "Tonneau, Manuel  and
      Adjodah, Dhaval  and
      Palotti, Joao  and
      Grinberg, Nir  and
      Fraiberger, Samuel",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.453",
    doi = "10.18653/v1/2022.acl-long.453",
    pages = "6564--6587",
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Unemployment detection on Twitter

Data

Models

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Unemployment detection on Twitter

Data

Models

Citation