Last summer, I participated in the NIJ’s Recidivism Forecasting Challenge. To my surprise, the predictions I submitted won prizes in several categories.
Last spring, the National Institute of Justice (NIJ) opened a forecasting challenge that aimed to increase public safety and improve the fair administration of justice across the United States. The Recidivism Forecasting Challenge used data from the State of Georgia about persons released from prison to parole supervision from January 1, 2013, through December 31, 2015. Using a total of 48 predictor variables, contestants were asked to submit forecasts (percent likelihoods) of whether individuals in the dataset recidivated within one, two, or three years after release.
I learned about the challenge via an email from the Social Systems Data Science Network in the College of Education. At the time, I was looking for potential datasets for a machine learning class I was assigned to teach during Fall 2021. I started playing with the dataset, gathered additional information from different data sources, and fitted a few basic models (XGBoost and penalized regression). AUC was very low even for the best models, and I wouldn’t use these models for any high-stakes decisions; however, I decided to give it a shot and submitted my predictions without much expectation. Then, surprise! Among many categories, my entries yielded the 3rd best performance for predicting recidivism in Year 1 in the male parolees, female parolees, and average accuracy categories. They also yielded the 5th best performance in Year 2 for female parolees. All entries were submitted in the Large Team Category under the team name CrescentStar. I was expecting more competition because the prizes were very large, but only about 20 teams participated in the Large Team Category.
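For readers unfamiliar with AUC, the following is a minimal sketch in plain Python of what the metric measures: the probability that a randomly chosen person who recidivated receives a higher predicted score than a randomly chosen person who did not. The function name `auc` and the labels and scores below are made up for illustration; they are not taken from the challenge data or from the models described above.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = recidivated, 0 = did not recidivate
labels = [1, 0, 1, 0, 0, 1]
scores = [0.80, 0.30, 0.45, 0.50, 0.20, 0.65]

example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value only modestly above 0.5 means the model separates the two groups poorly.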
All the details about the challenge can be found on the NIJ’s website. Official results were released at this link. Finally, all my code, datasets, entries, etc., were released in this GitHub repo. The repo also includes a final report summarizing the steps I implemented and the models I built.
If you are into similar challenges, I discovered https://www.challenge.gov/, which provides a list of active challenges opened by different government agencies. There are some interesting datasets and problems. From time to time, challenges come out that may interest those in education or the social sciences.