reschultzed/cnn-headlines

cnn-headlines

CNN Program Headlines, 1992–2022

This repository contains a collection of program titles and headlines for news programs of the Cable News Network (CNN), complete between January 1, 1992 and the most recent update on May 10, 2022. For the full transcribed text of the programs whose titles are indexed here, see CNN's website (for programs aired after April 4, 2001) or LexisNexis (for programs before that date).

This dataset consists of two large files. The first, cnn_full_combined.tsv, contains the titles of 478,177 programs and their respective airdates. Many of these program titles contain several distinct news headlines separated by semicolons, indicating that several different stories were covered on that program. This practice became significantly more common in March 2004, so the number of separate rows in this file drops sharply from that point on. The second, cnn_full_separated.tsv, contains 1,068,713 headlines: each program title is split into one row per headline it contains, which makes the data more comparable across the March 2004 boundary.
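The relationship between the two files can be illustrated with a minimal Python sketch that splits semicolon-separated program titles into one row per headline. This is not necessarily how cnn_full_separated.tsv was produced. The column order (airdate, then title), the date format, the sample rows, and the function names `split_title` and `separate_rows` are all assumptions made for illustration, so check the actual files before adapting it.

```python
import csv
import io


def split_title(title):
    """Split a program title into headlines on semicolons, dropping empty parts."""
    return [part.strip() for part in title.split(";") if part.strip()]


def separate_rows(rows):
    """Yield one (airdate, headline) pair per headline.

    `rows` is an iterable of (airdate, title) pairs. This column order is an
    assumption, not something documented in the README.
    """
    for airdate, title in rows:
        for headline in split_title(title):
            yield airdate, headline


# Hypothetical sample data standing in for a few rows of the combined file.
sample = (
    "2004-03-15\tStory A; Story B; Story C\n"
    "1995-06-01\tSingle Story\n"
)

# QUOTE_NONE keeps any quotation marks inside titles as literal characters.
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
separated = list(separate_rows(reader))

for airdate, headline in separated:
    print(airdate, headline, sep="\t")
```

To run the same logic on the real file, replace `io.StringIO(sample)` with an open file handle for cnn_full_combined.tsv, after confirming its column layout and whether it has a header row.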

The dataset is a useful resource for studying how CNN has covered events over the course of three decades, and can be used as a rough proxy for general public awareness of current events in the United States.
