add overview of inputs and outputs to readme
betsybookwyrm committed Dec 8, 2021
1 parent 15dcf9a commit 5004bb5
Showing 1 changed file with 32 additions and 0 deletions.
32 changes: 32 additions & 0 deletions README.md
@@ -12,6 +12,38 @@ and we can't guarantee no breaking changes either of library interface or databa
notably, the database schema will have a significant change to allow multiple JSON files to be loaded into the same
database file.

## Contents

- [Collecting Twitter Data](#collecting-twitter-data)
- [Input and Output](#input-and-output)
- [Installation](#installation)
- [Usage](#usage)
- [About tidy_tweet](#about-tidy_tweet)

## Collecting Twitter Data

If you do not already have a preferred Twitter collection tool, we recommend [Twarc](https://github.com/DocNow/twarc/).
tidy_tweet is designed to work directly with Twarc output. Other collection methods may also work, as long as they
output the Twitter API results with minimal alteration (see [Input and Output](#input-and-output)); however, at this
time we do not have the resources to support output from tools other than Twarc.
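If you collect data some other way, the key requirement is that each page of API results is saved unchanged, one page
per line, in the layout described under [Input and Output](#input-and-output). The following is a minimal sketch of
that step, not a supported tidy_tweet feature: `write_pages_ndjson` is a hypothetical helper, and `pages` is assumed to
be an iterable of the Twitter API v2 response bodies, already parsed into dicts.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping, Union


def write_pages_ndjson(pages: Iterable[Mapping[str, Any]], path: Union[str, Path]) -> int:
    """Write each results page as a single line of JSON and return the page count.

    Hypothetical helper for illustration only; it is not part of tidy_tweet.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for page in pages:
            # json.dumps without indent puts the whole page on one line; newlines
            # inside tweet text are escaped as \n, so they cannot split the record.
            f.write(json.dumps(page))
            f.write("\n")  # one page per line, no commas between pages
            count += 1
    return count
```

Writing the pages exactly as the API returned them, rather than flattening or renaming fields, is what "minimal
alteration" means here.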

## Input and Output

### Input: Twitter results pages

tidy_tweet takes as input a series of JSON/dict objects, each of which is one page of Twitter API v2 search or
timeline results. Typically, these will come from a JSON file such as those output by `twarc2 search`.

JSON files containing multiple pages of results are expected to be newline-delimited: each line is a separate results
page object, with no commas between top-level objects and no enclosing JSON array.
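Reading such a file amounts to parsing each non-blank line as its own JSON object. The sketch below assumes only the
layout described above and uses the Python standard library; `iter_results_pages` and `count_tweets` are hypothetical
names, not part of tidy_tweet's interface. It also assumes tweets sit under the `"data"` key, as in Twitter API v2
results pages.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def iter_results_pages(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one results page per non-blank line of a newline-delimited JSON file.

    Hypothetical helper for illustration; tidy_tweet does its own loading.
    """
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, such as a trailing newline
            # A multi-line JSON array, or commas between objects, makes
            # json.loads raise a ValueError (JSONDecodeError) on this line.
            page = json.loads(line)
            # A one-line JSON array parses as a list, so reject it explicitly.
            if not isinstance(page, dict):
                raise ValueError(
                    f"line {line_number}: expected a JSON object, "
                    f"got {type(page).__name__}"
                )
            yield page


def count_tweets(path: Union[str, Path]) -> int:
    """Count tweets across all pages, assuming each page keeps them under "data"."""
    return sum(len(page.get("data", [])) for page in iter_results_pages(path))
```

Checking each line independently means a malformed file fails with a line number, rather than being partially loaded.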

### Output: SQLite database of tweets and metadata

After processing your Twitter results pages with tidy_tweet (see [Usage](#usage)), you will have an
[SQLite](https://sqlite.org/index.html) database file at the location you specified.

The database schema will be published here once the initial schema is finalised.
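Until the schema is published, you can still look inside the output file with any SQLite client. As one example, the
sketch below uses Python's built-in `sqlite3` module to list each table and its column names. `describe_database` is a
hypothetical helper, and it assumes nothing about which tables tidy_tweet creates. The file is opened read-only, so a
mistyped path raises an error instead of silently creating a new empty database.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List, Union


def describe_database(db_path: Union[str, Path]) -> Dict[str, List[str]]:
    """Return {table name: [column names]} for each user table in an SQLite file."""
    # mode=ro opens the file read-only and fails if it does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
            if not row[0].startswith("sqlite_")  # skip SQLite's internal tables
        ]
        schema: Dict[str, List[str]] = {}
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'  # quote as an identifier
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            schema[table] = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
        return schema
    finally:
        conn.close()
```

For example, `describe_database("my_tweets.db")` (a hypothetical filename) would return a dict mapping each table name
in that file to its column names.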

## Installation

tidy_tweet is a Python package and can be installed with pip.