CTI-ANN: A Self-Training-Based Annotation Framework with Tailored Augmentation for Cyber Threat Intelligence Posts

This repository contains the datasets curated for the paper "CTI-ANN: A Self-Training-Based Annotation Framework with Tailored Augmentation for Cyber Threat Intelligence Posts."
To comply with X's privacy policies, the datasets contain LLM-generated summaries of X posts rather than the raw post text. The summaries are approximate: they are meant to preserve the general meaning of each post for academic and research purposes without redistributing the original content. The original posts can be retrieved from the X API using the provided post IDs.
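
Retrieving the original posts by ID could be sketched as follows. This is a minimal illustration, not part of the repository: it assumes the X API v2 post lookup endpoint (`GET /2/tweets`), its limit of 100 IDs per request, and bearer-token authentication. The names `batch_ids` and `build_lookup_request` are hypothetical helpers, and the requests are only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: X API v2 post lookup endpoint and its per-request ID limit.
LOOKUP_URL = "https://api.x.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def batch_ids(post_ids, size=MAX_IDS_PER_REQUEST):
    """Split post IDs into lists of at most `size` string IDs."""
    ids = [str(post_id) for post_id in post_ids]
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_lookup_request(batch, bearer_token):
    """Build (but do not send) a lookup request for one batch of IDs."""
    query = urlencode({"ids": ",".join(batch)})
    return Request(
        f"{LOOKUP_URL}?{query}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
```

Each request can then be sent with `urllib.request.urlopen`, subject to the access tier and rate limits of your X API credentials.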

Included Files

X-Annotated-Dataset.csv: Manually cross-annotated X-Annotated Dataset
X-CTI-ANN-Dataset.csv: Automatically annotated X-CTI-ANN Dataset generated via self-training
X-CTI-Entire-Dataset.csv: X-CTI-Entire Dataset used to generate the X-CTI-ANN Dataset (shared for reproducibility purposes)

Description

Each dataset contains post IDs from the social media platform X and their corresponding labels. Labels are:

CTI-Positive: Post is related to cyber threat intelligence (CTI)
CTI-Negative: Post is not related to CTI

Please note that the raw post text is not redistributed, in compliance with platform policies. The files provide post IDs and labels, together with the LLM-generated summaries described above.
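
As a quick sanity check, the label distribution of a dataset file can be computed with a few lines of Python. This is a minimal sketch: the column name `label` is an assumption and should be checked against the header of the actual CSV file, and `count_labels` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_labels(csv_file, label_column="label"):
    """Count occurrences of each label in an open CSV file.

    `label_column` is an assumed column name; adjust it to match
    the header row of the dataset file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)
```

To use it, open a file such as X-Annotated-Dataset.csv with `open(path, newline="", encoding="utf-8")` and pass the file object to `count_labels`. The result maps CTI-Positive and CTI-Negative to their counts.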

Licensing

The source code in this repository is licensed under the MIT License. See the LICENSE file for details.

All data artifacts (including post IDs and paraphrased summaries) are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. See the datasets/DATA_LICENSE.md file for details.

