The CourtListener Development Database
For FLP staff and the occasional volunteer, we now host a full-ish copy of the FLP database in AWS. This can be really useful when:
- The issue you're debugging can't be reproduced without a bunch of data
- You need to test performance against a stupidly large collection of data
- You want to have approximate counts of live data
The way this works is that we use replication to copy the majority of the CL dataset to a special DB in AWS, then we sever the connection, making it a snapshot of the database.
The database was last synced on 2023-10-24T21:40:00Z
Access to the database requires three things:
- You must be FLP staff or be approved by an FLP director.
- Your IP address has to be placed on the allowlist.
- You need the host, username, and password.
FLP staff has access to the AWS security group that controls permission to this database. They can go to this link and then add an IP address:
To add an IP address you must add it to the "Inbound" rules. In the Actions drop down, select "Edit Inbound Rules", and you'll see a page like this:
Click the button towards the bottom to add a new rule, set the type to Postgres, and in the Source column, select "My IP". Use /32 as your netmask if it's not on there already. Name the rule like the others: "Mike at Work" or "Mike at Bali" or whatever.
Save.
Test that this worked by doing:
psql -h dev.courtlistener.com
You don't need the password to test that. If that connects and asks for a password, good news, you did it. If that doesn't connect, read on about proxies.
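If you'd rather get a yes/no answer without the password prompt, a rough sketch of the same reachability check uses pg_isready, which ships with the PostgreSQL client tools. The port 5432 (the Postgres default) and the helper name explain_status are assumptions for illustration, not something the FLP setup prescribes.

```shell
# Translate pg_isready's documented exit statuses into a hint.
# explain_status is a hypothetical helper name, not part of any tool.
explain_status() {
  case "$1" in
    0) echo "reachable: server is accepting connections" ;;
    1) echo "reachable: server is rejecting connections (for example, still starting up)" ;;
    2) echo "unreachable: no response; check the allowlist entry and any proxy" ;;
    3) echo "no attempt made: check the host and port arguments" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Usage (needs network access; port 5432 is an assumed default):
#   pg_isready -h dev.courtlistener.com -p 5432 -t 5
#   explain_status "$?"
```

A status of 2 is the one you'd expect when your IP isn't on the allowlist, since the security group simply drops the connection.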
It appears that a lot of browsers use proxies. Safari seems to do this by default. This means that the "My IP" trick above won't work and you won't be able to connect to the DB. Even asking Google your IP won't work. Your browser has a different IP than your postgres client and the rest of your laptop.
Get around this by getting your IP on the command line:
curl ipquail.com
Take that value, give it a netmask of /32, and put that into the inbound rules for the security group. Save and test as above.
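The two steps above (fetch the IP, append the netmask) can be sketched as a small helper. The function name to_cidr is made up for illustration, and the sketch assumes the service answers with a bare IPv4 address; it rejects anything else rather than guessing.

```shell
# Turn a bare IPv4 address into the /32 form used in the inbound rule.
# to_cidr is a hypothetical helper; variables are global because POSIX sh has no `local`.
to_cidr() {
  ip=$1
  # Reject empty input, non-digit/non-dot characters, and empty fields.
  case "$ip" in
    "" | *[!0-9.]* | .* | *. | *..*)
      echo "not an IPv4 address: $ip" >&2
      return 1
      ;;
  esac
  # Split on dots into the positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  if [ "$#" -ne 4 ]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  # Each of the four fields must be at most three digits and no larger than 255.
  for octet in "$@"; do
    if [ "${#octet}" -gt 3 ] || [ "$octet" -gt 255 ]; then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  printf '%s/32\n' "$ip"
}

# Usage (needs network access):
#   to_cidr "$(curl -s ipquail.com)"
```

Paste the printed value into the Source column of the new rule instead of choosing "My IP".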
This database does not have user data, but accessing it does get you into the FLP infrastructure. We want to keep this really tight and take this seriously.
To exchange passwords, there are two and only two approved apps: WhatsApp and Signal.
- As a sender, set the conversation to have expiring messages of one week or less, then send the information.
- As a recipient, copy the information to a password manager. Bitwarden and 1Password are preferred.
Once you have your IP allowed and know the password, the rest is easy:
psql -h dev.courtlistener.com --dbname courtlistener --username django
That is:
- Host: dev.courtlistener.com
- Database name: courtlistener
- User: django
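If you connect often, one option is to put those three values into the standard libpq environment variables so that a bare psql picks them up. This is only a convenience sketch; it deliberately leaves the password out, so psql still prompts for it and you can paste it from your password manager.

```shell
# Standard libpq environment variables, read by psql and other Postgres clients.
export PGHOST=dev.courtlistener.com
export PGDATABASE=courtlistener
export PGUSER=django

# With these set, running `psql` with no arguments targets the dev database
# and prompts for the password.
```

Setting these in your current shell session, rather than in a shared or committed file, keeps the connection details local to you.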
We still need to figure out how to refresh this database with newer data. Probably we'll create a new database in the RDS instance (CREATE DATABASE XYZ) and nuke the old one.
There are a few rules for using this database:
- Don't go building things with it outside of what you said you'd do. If you're a dev, this is for developing on behalf of FLP. If you're a volunteer, this is for completing some discrete project.
- Try not to mess up the data too much. You're not the only one using it, so don't delete a table or something, unless you are prepared to confess that you did so, and spend time fixing it. Sure, tweak here and there, but try to keep it largely intact if you can.
- You can create users and user-stuff, like alerts and favorites. All the regular tables are in place, they're just empty.
Severing the connection to the upstream database allows the data in the dev DB to be different than the data in the prod DB. If we didn't do this and somebody created an item in the dev DB (because they're testing something, say), that could create a conflict when the same item tried to sync from upstream.
Making things more annoying, it's not like this would break immediately. It'd only break hours, days, or weeks later when the data in prod changed for some reason. Somehow that always seems to happen on the weekend, so we do a one-time sync, and we cut the connection, making it a snapshot, not a replica.
Two things aren't in the dev database:
- As stated above, we stop syncing as of whenever it is launched. Anything newer than that, or changed after that point, is missing.
- No user data (though the tables are in place).