From 5004bb5372bdb8f194c7fb52f2d7a3a7577c49d3 Mon Sep 17 00:00:00 2001
From: Elizabeth Alpert
Date: Wed, 8 Dec 2021 13:51:56 +1000
Subject: [PATCH] add overview of inputs and outputs to readme

---
 README.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/README.md b/README.md
index c2d8226..a981758 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,38 @@ and we can't guarantee no breaking changes either of library interface or databa
 notably, the database schema will have a significant change to allow multiple
 JSON files to be loaded into the same database file.
 
+## Contents
+
+- [Collecting Twitter Data](#collecting-twitter-data)
+- [Input and Output](#input-and-output)
+- [Installation](#installation)
+- [Usage](#usage)
+- [About tidy_tweet](#about-tidy_tweet)
+
+## Collecting Twitter Data
+
+If you do not have a preferred Twitter collection tool already, we recommend [Twarc](https://github.com/DocNow/twarc/).
+tidy_tweet is designed to work directly with Twarc output. Other collection methods may work with tidy_tweet as long
+as they output the API results from Twitter with minimal alteration (see [Input and Output](#input-and-output));
+however, at this time we do not have the resources to support Twitter data output by tools other than Twarc.
+
+## Input and Output
+
+### Input: Twitter results pages
+
+tidy_tweet takes as input a series of JSON objects (or Python dicts), each of which is one page of Twitter API v2
+search or timeline results. Typically, this will be a JSON file such as those output by `twarc2 search`.
+
+JSON files with multiple pages of results are expected to be newline-delimited: each line is a distinct results page
+object, with no commas between top-level objects (that is, the file is not a single JSON array).
+
+### Output: SQLite database of tweets and metadata
+
+After processing your Twitter results pages with tidy_tweet (see [Usage](#usage)), you will have an
+[SQLite](https://sqlite.org/index.html) database file at the location you specified.
+
+The database schema will be published here once the initial schema is finalised.
+
 ## Installation
 
 tidy_tweet is a Python package and can be installed with pip.
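The newline-delimited input format described in this change can be illustrated with a minimal Python sketch that uses only the standard library. It yields one results page (a dict) per non-blank line and rejects lines that are not JSON objects. The function name `iter_result_pages` is hypothetical and is not part of tidy_tweet's API; it only shows what "one results page per line, no commas between top-level objects" means in practice.

```python
import json
from typing import Iterator


def iter_result_pages(path: str) -> Iterator[dict]:
    """Yield each results page from a newline-delimited JSON file.

    Hypothetical helper (not part of tidy_tweet): each non-blank line is
    parsed as one Twitter API v2 results page, e.g. from `twarc2 search`.
    """
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines, such as a trailing empty line
            page = json.loads(line)
            # A results page is a JSON object; a bare array or scalar on a
            # line means the file is not in the expected format.
            if not isinstance(page, dict):
                raise ValueError(
                    f"line {line_number}: expected a JSON object, "
                    f"got {type(page).__name__}"
                )
            yield page
```

For example, `for page in iter_result_pages("tweets.jsonl"): print(len(page.get("data", [])))` would print the number of tweets on each page. The file name is hypothetical. `.get` is used because a page with no matching tweets may have no `data` key. A file saved as a single pretty-printed JSON array fails on its first line with `json.JSONDecodeError`, which is a subclass of `ValueError`. This is the kind of input the README asks you to avoid.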