title | subtitle | author | date | output | ||||
---|---|---|---|---|---|---|---|---|
Walkthrough: Tracking the spread of the debunked Plandemic video |
How the DFRLab used CrowdTangle and CooRNet to track the spread of a viral conspiracy video on Facebook |
Zarine Kharazian, DFRLab |
June 28, 2020 |
|
Despite efforts by major platforms to limit its spread, copies of the widely debunked conspiracy video “Plandemic” continued to multiply and spread largely through niche online conspiracy communities in early May 2020.
The DFRLab used the CrowdTangle API and an R package called CooRNet, developed by Fabio Giglietto, Nicola Righetti, and Luca Rossi, to track the spread of the viral conspiracy through hundreds of Facebook groups.
This document walks through the data analysis portion of the research, providing reproducible code for the key visualizations.
The first step was to get a dataset from CrowdTangle of posts promoting the Plandemic conspiracy that were shared to public Facebook groups and contained URLs. The goal was to capture posts that linked either to a copy of the video hosted off Facebook, such as on YouTube or dedicated domains, or to other content that furthered the conspiracy (blog posts, op-eds, etc.).
We created a saved search in CrowdTangle for posts containing the Plandemic video. Then, we used the CrowdTangle Historical Data feature to get all of the posts from the saved search that contained links and were posted between May 3 and May 10, 2020.
CooRNet is an R package that detects "coordinated link sharing behavior," which it defines as when public Facebook entities, such as pages and groups, repeatedly share the same links within an unusually short period of time from each other.
What constitutes an “unusually short period” of time is defined by the “coordination interval,” which CooRNet calculates algorithmically. The rationale of using this measure as a proxy for coordination is that it would be unlikely that different Facebook entities would share the same links as one another within that unusually short period of time on a repeated basis.
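To make the idea concrete, here is a minimal base R sketch of flagging pairs of entities that shared the same URL within a given interval, and then keeping only the pairs that did so repeatedly. It is an illustration under stated assumptions, not CooRNet's implementation: the data frame, its column names, and the group names are invented, and the 14-second interval is simply assumed here rather than estimated from the data.

```r
# Toy data: hypothetical groups and share times, not drawn from the Plandemic dataset
shares <- data.frame(
  url = c("u1", "u1", "u1", "u2", "u2", "u2"),
  entity = c("Group A", "Group B", "Group C", "Group A", "Group B", "Group D"),
  time = as.POSIXct(c(
    "2020-05-05 12:00:00", "2020-05-05 12:00:09", "2020-05-05 13:30:00",
    "2020-05-06 08:00:00", "2020-05-06 08:00:05", "2020-05-06 08:10:00"
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Assumed interval in seconds (CooRNet estimates this from the data instead)
coordination_interval <- 14

# Pair every share with every other share of the same URL
pairs <- merge(shares, shares, by = "url", suffixes = c(".x", ".y"))

# Keep each unordered pair of distinct entities once
pairs <- pairs[pairs$entity.x < pairs$entity.y, ]

# Time gap between the two shares, in seconds
pairs$gap_secs <- abs(as.numeric(difftime(pairs$time.y, pairs$time.x, units = "secs")))

# A co-share is "rapid" when the gap falls within the interval
rapid <- pairs[pairs$gap_secs <= coordination_interval, ]

# Count rapid co-shares per pair and keep pairs that did it at least twice
rapid$pair <- paste(rapid$entity.x, rapid$entity.y, sep = " + ")
repeat_counts <- table(rapid$pair)
repeated_pairs <- names(repeat_counts[repeat_counts >= 2])
print(repeated_pairs)
```

In this toy data, only Group A and Group B share both URLs within 14 seconds of each other, so they are the only pair flagged. The "at least twice" cutoff is an arbitrary choice for the example.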
In this analysis, we were not as interested in capturing coordination as we were in mapping the rapid spread of the Plandemic conspiracy through Facebook groups. The below analysis, therefore, is not necessarily evidence of coordination on the part of the disparate Facebook assets; rather, it suggests a pattern of rapid link-sharing related to Plandemic throughout hundreds of different conspiracy communities, demonstrating the conspiracy's crossover appeal and the shared dynamics among these communities.
We started by following the tutorial available on the CooRNet site to extract a list of entities engaged in rapid link sharing. This series of steps, especially the call to get_ctshares, may take a while, as it queries the CrowdTangle API. NOTE: To make the process considerably faster, you can request a rate limit increase from CrowdTangle using this form. You can also set the sleep_time parameter in the get_ctshares function to 1 to reduce the sleep time between calls, as I have done below.
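# From the tutorial: load CooRNet and read the URLs from the CrowdTangle historical data export
library(CooRNet)
urls <- get_urls_from_ct_histdata(ct_histdata_csv="/Users/zkharazian/Downloads/2020-05-09-21-40-09-EDT-Historical-Report-plandemic-2020-05-03--2020-05-10.csv")
# Query the CrowdTangle API for shares of those URLs, pausing 1 second between calls
ctshares <- get_ctshares(urls, "url", "date", sleep_time = 1, clean_urls = TRUE)
# Detect entities that repeatedly shared the same URLs within the coordination interval
output <- get_coord_shares(ctshares, parallel = TRUE, clean_urls = TRUE, keep_ourl_only = TRUE)
# Extract the result objects, including highly_connected_coordinated_entities
get_outputs(output, ct_shares_marked.df = TRUE, highly_connected_g = TRUE, highly_connected_coordinated_entities = TRUE)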
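To see why the sleep time matters for a long list of URLs, here is a small sketch of a throttled request loop. It is only an illustration: fetch_shares_for is a hypothetical stand-in that does not contact CrowdTangle, and throttled_fetch is not how get_ctshares is implemented. The point is that the pause is paid once per URL, so the total wait grows roughly as the number of URLs times the sleep time.

```r
# Hypothetical stand-in for one API request; it does not contact CrowdTangle
fetch_shares_for <- function(url) {
  data.frame(url = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# Call the stand-in once per URL, pausing between consecutive calls
throttled_fetch <- function(urls, sleep_time = 1) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] <- fetch_shares_for(urls[i])
    # No need to pause after the final request
    if (i < length(urls)) Sys.sleep(sleep_time)
  }
  do.call(rbind, results)
}

example_urls <- c("https://example.com/a", "https://example.com/bb")

# Time the loop; with 2 URLs there is one pause of sleep_time seconds
elapsed <- system.time(
  out <- throttled_fetch(example_urls, sleep_time = 0.1)
)[["elapsed"]]
```

With thousands of URLs, the difference between a pause of a few seconds and a pause of one second per call adds up to hours, which is why a higher rate limit helps.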
We now have a dataframe, highly_connected_coordinated_entities, of the entities that repeatedly shared the same URLs within the coordination interval.
And now we'll display the top 50 entities sorted by coord.shares in an inline table:
# Load DT package for displaying inline tables
library(DT)
# Display inline table of top 50 Facebook groups identified by CooRNet, sorted by coord.shares
datatable(head(highly_connected_coordinated_entities_names, 50), options = list(order = list(list(3, 'desc'))))
[Interactive table with columns: account.name, avg.account.subscriberCount, coord.shares, degree, strength]
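One caveat about the table call above: head() keeps the first 50 rows in whatever order the data frame already has, and the order option only sorts the rows that are displayed. If the data frame is not already sorted by coord.shares (we have not verified that here), sorting before truncating is a safer way to get the true top entities. A minimal sketch with made-up data:

```r
# Toy stand-in for the entity table; names and numbers are made up
entities <- data.frame(
  account.name = c("Group A", "Group B", "Group C", "Group D"),
  coord.shares = c(12, 40, 7, 25),
  stringsAsFactors = FALSE
)

top_n <- 2

# Sort by coord.shares in descending order first, then keep the top rows
ranked <- entities[order(entities$coord.shares, decreasing = TRUE), ]
top_entities <- head(ranked, top_n)

print(top_entities$account.name)
```

The sorted and truncated data frame can then be passed to datatable() in the same way as before.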
But what is the threshold that defines a rapid link share for these highly connected entities? To determine that, we ran the estimate_coord_interval function:
# Estimate the coordination interval from the shares data
cord_int <- estimate_coord_interval(ctshares, q=0.1, p=0.5)
cord_int
This returned a coordination interval of 14 seconds. A link share between two groups that occurred within 14 seconds is defined as unusually rapid, relative to the rest of the dataset.
The Top Shared URLs
We also wanted to plot the top URLs in the dataset that were being rapidly shared among the groups. We first got the list of URLs using the get_top_coord_urls function:
# Get top URLs
top_urls_all <- get_top_coord_urls(output, order_by = "shares", component = FALSE, top = 6)
# Drop unwanted columns (select() comes from dplyr)
top_urls_all <- dplyr::select(top_urls_all, expanded, shares, engagement)
# Display as inline table
datatable(top_urls_all) %>%
formatStyle(names(top_urls_all), lineHeight='1%')
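For the plot itself, one option is a horizontal bar chart of shares per URL. The sketch below uses base R graphics and a toy stand-in for top_urls_all; the URLs and counts are invented, and this is not necessarily the chart used in the original research. It assumes only the expanded and shares columns selected above.

```r
# Toy stand-in for top_urls_all; URLs and counts are invented for illustration
top_urls_toy <- data.frame(
  expanded = c("https://example.com/video",
               "https://example.com/blog",
               "https://example.com/op-ed"),
  shares = c(120, 45, 80),
  stringsAsFactors = FALSE
)

# Sort ascending so the most-shared URL ends up at the top of a horizontal chart
top_urls_toy <- top_urls_toy[order(top_urls_toy$shares), ]

# Truncate long URLs so the axis labels stay readable
url_labels <- substr(top_urls_toy$expanded, 1, 40)

# Widen the left margin to fit the labels, then draw the bars
old_par <- par(mar = c(4, 14, 2, 1))
bar_positions <- barplot(
  top_urls_toy$shares,
  names.arg = url_labels,
  horiz = TRUE,
  las = 1,
  xlab = "Shares",
  main = "Top rapidly shared URLs (toy data)"
)
par(old_par)
```

Swapping top_urls_toy for top_urls_all should work as long as the column names match.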