The goal of the analysis is to predict, for a given bill, the likelihood that it will be passed into law. Coupled with the topic-area classification, these predictions can help the ACLU (and other advocacy organizations) determine where to target their advocacy resources.
We stratified the prediction task in terms of time and geography. In terms of time, we are developing models for short-term and long-term predictions. In terms of geography, we are developing models that consider all state legislatures as a whole, as well as models targeted at each individual state legislature.
For short-term predictions, the model estimates the likelihood that the bill will be passed into law in the immediate future (e.g., within the next week or month), while for long-term predictions, it estimates the likelihood that the bill will be passed into law within the given legislative session. Both scores can help advocacy organizations prioritize their resources.
State legislatures vary significantly across the country in their size, legislative process, session length, bill passage rates, and data availability. This variability suggests that models specialized to a specific state could potentially yield better results. As an initial step, we are training models that use data from all states. In parallel, we are selecting several states for which to develop more specialized state-wise models.
Cohort Definition: Each week, the model scores all bills that are active (i.e., that have not been passed, failed in a vote, or vetoed) and that have had some activity in the last two months.
For the country-wide model, bills from all states are included in the cohort; for the state-wise models, only bills from the respective state are included.
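The cohort rule can be sketched in a few lines. This is a minimal illustration, not the project's actual cohort query: the `Bill` record, its field names, the status values, and the choice of 60 days for "two months" are all assumptions made for the example, and each record is assumed to describe the bill as it stood on the prediction date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional

# Hypothetical terminal statuses: a bill in any of these is no longer active.
TERMINAL_STATUSES = {"passed", "failed", "vetoed"}

# "Two months" approximated as 60 days (an assumption of this sketch).
ACTIVITY_WINDOW = timedelta(days=60)


@dataclass
class Bill:
    # Hypothetical record; fields are assumed to reflect the bill as of the prediction date.
    bill_id: str
    state: str
    status: str
    last_event_date: date


def weekly_cohort(bills: Iterable[Bill], as_of: date,
                  state: Optional[str] = None) -> List[str]:
    """Return IDs of bills eligible for scoring on `as_of`.

    state=None builds the country-wide cohort; a state code such as "CA"
    builds the cohort for that state's model.
    """
    cohort = []
    for bill in bills:
        if state is not None and bill.state != state:
            continue  # state-wise model: skip bills from other states
        if bill.status in TERMINAL_STATUSES:
            continue  # passed, failed in a vote, or vetoed: not active
        # Keep only bills with at least one event inside the activity window.
        if as_of - ACTIVITY_WINDOW <= bill.last_event_date <= as_of:
            cohort.append(bill.bill_id)
    return cohort
```

Passing the same bill list with and without a `state` argument yields the country-wide and state-wise cohorts from one definition, which keeps the two model families consistent.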
Label Definition: The label indicates whether the bill is passed into law within a given timespan after the prediction date.
In the short-term models, the timespan is set to one month, while in the long-term models it is set to one year.
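As a sketch of the label definition, assuming each bill has an optional enactment date and approximating "one month" and "one year" as 30 and 365 days (both assumptions), the label is 1 only when enactment falls inside the window that starts right after the prediction date:

```python
from datetime import date, timedelta
from typing import Optional

# Label timespans; 30 and 365 days are approximations assumed for this sketch.
SHORT_TERM = timedelta(days=30)
LONG_TERM = timedelta(days=365)


def passage_label(as_of: date, enacted_on: Optional[date], timespan: timedelta) -> int:
    """Return 1 if the bill became law in (as_of, as_of + timespan], else 0."""
    if enacted_on is None:
        return 0  # never enacted
    return int(as_of < enacted_on <= as_of + timespan)


# A bill enacted about three months after the prediction date is a negative
# for the short-term model but a positive for the long-term model.
print(passage_label(date(2021, 3, 1), date(2021, 6, 1), SHORT_TERM))  # 0
print(passage_label(date(2021, 3, 1), date(2021, 6, 1), LONG_TERM))   # 1
```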
The models are trained with features that capture both the context around the bill and its content, using the following information:
- Bill information such as its type, the chamber in which it was introduced, and its age
- Information about the events that the bill has gone through
- Information about the party makeup of the legislature
- Information about bill sponsors
- Information about the bill content
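A few of the context features listed above can be illustrated with a small function. This is a hypothetical sketch, not the project's feature definitions: the inputs, the feature names, and the use of the largest party as the "majority party" are assumptions, and bill content features are not shown. The key point it demonstrates is that only events on or before the prediction date are used, so no future information leaks into the features.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def bill_features(as_of: date, introduced_on: date, event_dates: List[date],
                  sponsor_parties: List[str], chamber_seats: Dict[str, int]) -> Dict[str, float]:
    """Compute illustrative context features for one bill as of `as_of`."""
    # Ignore events after the prediction date to avoid leaking future information.
    past_events = [d for d in event_dates if d <= as_of]
    # Party holding the most seats in the chamber (ties resolved by dict order).
    majority_party = max(chamber_seats, key=chamber_seats.get)
    n_sponsors = len(sponsor_parties)
    party_counts = Counter(sponsor_parties)
    return {
        "age_days": (as_of - introduced_on).days,
        "n_events": len(past_events),
        "days_since_last_event": (as_of - max(past_events)).days if past_events else float("nan"),
        "majority_party_seat_share": chamber_seats[majority_party] / sum(chamber_seats.values()),
        "sponsor_majority_share": party_counts[majority_party] / n_sponsors if n_sponsors else 0.0,
    }
```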
Model training and evaluation were conducted using Triage, a general-purpose modeling framework for risk modeling in public policy contexts. A configuration file is used to set up the components of the experiment. The configuration files used for experimentation are here. Details about setting up Triage configuration files are here.
Triage is used to:
- Create the temporal data splits for temporal validation
- Generate the train-test matrices using the provided cohort, label and feature definitions
- Train and evaluate the ML model grid
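To make the idea of temporal validation concrete, the sketch below generates expanding-window train/test splits. It is a simplified illustration of the concept, not Triage's implementation (Triage builds its splits from the temporal settings in the experiment configuration); the function name and parameters are hypothetical. Training as-of dates stop at `test_date - label_timespan`, so every training label is fully observed by the time the model is tested.

```python
from datetime import date, timedelta
from typing import List, Tuple


def temporal_splits(first_as_of: date, last_test_date: date, label_timespan: timedelta,
                    update_frequency: timedelta,
                    as_of_frequency: timedelta = timedelta(days=7)
                    ) -> List[Tuple[List[date], date]]:
    """Return (training as-of dates, test as-of date) pairs with an expanding window."""
    splits = []
    # The first test date is the earliest one with at least one fully labeled training date.
    test_date = first_as_of + label_timespan
    while test_date <= last_test_date:
        train_dates = []
        d = first_as_of
        # Only use as-of dates whose label window has closed by the test date.
        while d + label_timespan <= test_date:
            train_dates.append(d)
            d += as_of_frequency
        splits.append((train_dates, test_date))
        test_date += update_frequency  # retrain/test cadence
    return splits
```

Using the one-year label timespan instead of one month pushes the first test date a full year past the start of the data, which is one reason the long-term and short-term models end up with different amounts of usable training history.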
This setup is used to build both the long-term and short-term passage models and to select the models that will be used to predict forward.
We predict the short-term and long-term passage likelihood of active legislative bills every week using the selected models. The bill_passage_inference.py script is used to run the predict-forward pipeline.
Each week, the models are retrained using the most recent historical data. The retrained models are then used to predict the short-term and long-term passage likelihoods for active bills as of the prediction date.
This pipeline leverages components of Triage to perform the following:
- Generate the cohort, features, and labels for the training matrix
- Train the models
- Predict forward
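The weekly steps above can be outlined as a small driver function. This is a hypothetical skeleton, not the contents of bill_passage_inference.py: the matrix-building and training steps are passed in as callables (stand-ins for the Triage components), and the row format with a `bill_id` key is an assumption. It shows one design point: for each label timespan, training uses as-of dates no later than `prediction_date - timespan`, the most recent history whose labels are already known.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

Row = dict          # hypothetical: one bill's features (plus label, in training rows)
Matrix = List[Row]
Model = Callable[[Row], float]


def predict_forward(prediction_date: date,
                    timespans: Dict[str, timedelta],
                    build_training_matrix: Callable[[date, timedelta], Matrix],
                    build_scoring_matrix: Callable[[date], Matrix],
                    train: Callable[[Matrix], Model]) -> Dict[str, Dict[str, float]]:
    """Retrain one model per label timespan and score the current cohort."""
    # Cohort and features for active bills as of the prediction date.
    scoring = build_scoring_matrix(prediction_date)
    scores: Dict[str, Dict[str, float]] = {}
    for name, timespan in timespans.items():
        # Latest as-of date whose label window has fully elapsed.
        cutoff = prediction_date - timespan
        model = train(build_training_matrix(cutoff, timespan))
        scores[name] = {row["bill_id"]: model(row) for row in scoring}
    return scores
```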
Once the predicted scores are written to the database, they are made available to the public through a user interface and a downloadable spreadsheet.
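As an illustration of the publishing step, the sketch below stores a week's scores and exports them as CSV text that could back a downloadable spreadsheet. It uses SQLite from the Python standard library only so the example is self-contained; the project's actual database, the `bill_scores` table, and its columns are assumptions made for this example.

```python
import csv
import io
import sqlite3
from datetime import date
from typing import Dict


def publish_scores(conn: sqlite3.Connection, prediction_date: date,
                   scores: Dict[str, Dict[str, float]]) -> str:
    """Upsert scores keyed by (bill, date, timespan) and return them as CSV text."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bill_scores (
               bill_id TEXT, prediction_date TEXT, timespan TEXT, score REAL,
               PRIMARY KEY (bill_id, prediction_date, timespan))"""
    )
    rows = [(bill_id, prediction_date.isoformat(), timespan, score)
            for timespan, by_bill in scores.items()
            for bill_id, score in by_bill.items()]
    # Re-running the same week replaces rows instead of duplicating them.
    conn.executemany("INSERT OR REPLACE INTO bill_scores VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    # Export this week's scores, highest first within each timespan.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["bill_id", "timespan", "score"])
    writer.writerows(conn.execute(
        "SELECT bill_id, timespan, score FROM bill_scores "
        "WHERE prediction_date = ? ORDER BY timespan, score DESC",
        (prediction_date.isoformat(),)))
    return buf.getvalue()
```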