Using Machine Intelligence to Understand the Student Loan Problem

Posted by Jon Millis

06.05.2016 12:30 PM

In March, the US Department of Education released its latest College Scorecard to “provide insights into the performance of schools eligible to receive federal financial aid, and offer a look at the outcomes of students at those schools.” Fortunately for us data-driven strategists (read: nerds) at Nutonian, the government also released the raw data it used to arrive at its summary results and findings.

While Washington’s number-crunchers did a nice job increasing transparency about each college’s strengths and lifetime earnings ROI, there was one angle that was noticeably absent given the election cycle: a deep-dive into loan repayment rates. With so many students and families adamant that the current loan structure is broken and leads to a blatant poverty trap, why haven’t more analysts dug into this question? How flawed are current loan costs, if at all, and what leads to students being unable to pay them off?


We put machine intelligence to the test to automatically sift through the College Scorecard data set to highlight the most important relationships and predictive variables that influence loan repayment. It’s important to note that we only used the quantitative inputs available to us, and consequently, variables like motivational drive, professional network, etc. will not show up in our models, despite potentially playing a significant role in a student’s ability to repay his/her loans.

Our focus will be on the Scorecard variable “Repayment Rate 7 Years from Graduation”, or the percentage of students able to make any contribution to their loans 7 years after graduating college. The first step in using Eureqa is simply formulating a question. In this case, we’ll ask: “What causes low repayment rates?”

Using Eureqa, we built a model to predict the likelihood a school will have a repayment rate below 80% after 7 years. More interestingly, we were able to quickly identify a few of the drivers (“features”) of repayment.
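To make the modeling target concrete, here is a minimal Python sketch of how a repayment rate could be turned into the "below 80%" label described above. The function name, the `threshold` parameter, and the sample rates are illustrative assumptions; they are not Scorecard values, and this is not how Eureqa works internally.

```python
def flag_low_repayment(rates, threshold=0.80):
    """Return True for each school whose repayment rate is below the threshold."""
    return [rate < threshold for rate in rates]


# Hypothetical 7-year repayment rates for four made-up schools.
sample_rates = [0.92, 0.78, 0.80, 0.55]

labels = flag_low_repayment(sample_rates)

# Share of the made-up schools that fall below the threshold.
share_flagged = sum(labels) / len(labels)
```

A school sitting exactly at 80% is not flagged here, since the target is a rate strictly below the cutoff.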

After running Eureqa for five minutes, we found that repayment rate is:

  • Positively correlated with parent/guardian income – The higher the family’s income, the more likely the student is to repay his or her loans.
  • Negatively correlated with a school’s percentage of students on loans – The higher the proportion of a school’s students that are on loan programs, the less likely a student is to repay his or her loans.
  • Negatively correlated with a school’s percentage of non-white students – The higher a school’s proportion of non-white students, the less likely a student is to repay his or her loans.
  • Negatively correlated with a school’s acceptance rate – The higher a school’s acceptance rate, the less likely a student is to repay his or her loans.
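For readers who want to check the direction of relationships like these themselves, a plain Pearson correlation is a reasonable first pass. The sketch below is not Eureqa's method (Eureqa searches for model equations rather than computing pairwise correlations), and the numbers are invented for illustration, not taken from the Scorecard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for four hypothetical schools (NOT Scorecard data).
family_income = [30_000, 45_000, 60_000, 90_000]
repayment_rate = [0.55, 0.62, 0.80, 0.91]

# A value near +1 means the two rise together; near -1 means one falls as the other rises.
r = pearson(family_income, repayment_rate)
```

A coefficient like this only captures the direction and strength of a straight-line relationship; it says nothing about why the two variables move together.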

Figure 1, below, shows the likelihood that students will default on their loans (y-axis) plotted against the student’s family income (x-axis). Default rate is remarkably high until family income hits about $60,000, and then it plummets. Let’s think about that for a second. If a family is making less than $50,000 per year, it’s more likely than not that their child will default on a loan payment and incur even more expenses as a penalty. For a lower- or middle-class family hoping to send its child to school to climb the economic ladder, the system, to put it mildly, is not doing them any favors.


So what steps could the government take in addition to reexamining the pricing structure of its loan rates? Economists agree that successful completion of a college degree correlates with better outcomes not just for an individual, but for society as a whole. College degrees generally spell higher incomes and intellectual capital, both of which college graduates use to enrich other people around them. One way the US government tries to “nudge” more people in the direction of a college degree is by issuing Pell Grants: financial assistance packages for students that don’t need to be repaid. Most Pell Grants sit between $3,700 and $5,700 per year.

Unfortunately, there’s a positive correlation between the percentage of a school’s students receiving Pell Grants and students’ likelihood of default. Schools with a higher ratio of Pell Grant recipients tend to experience higher rates of default on their loans, even though Pell Grants are intended as a direct subsidy to chip away at student expenses. This suggests that Pell Grants may not do enough to help students escape their debt.

Here’s another interesting finding: what role does faculty quality play in student success? There’s a linear relationship between faculty salary and graduation rates: the higher the average monthly salary of a school’s professors, the higher the percentage of students that graduate within six years. This could indicate that the highest-paying schools draw the best professors, who pass on a stronger work ethic and more knowledge to their students. Or, of course, these variables could simply be correlated without one causing the other, and students who were likely to graduate from the very beginning gravitate toward the schools with the pricier professors.
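A linear relationship like this one can be summarized with an ordinary least-squares line. The following is a minimal sketch under stated assumptions: the `fit_line` helper and the salary and graduation figures are hypothetical, chosen only to show the mechanics, and are not results from the Scorecard or from Eureqa.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values for four hypothetical schools (NOT Scorecard data).
monthly_salary = [5_000, 6_500, 8_000, 9_500]     # average faculty salary, USD per month
graduation_rate = [0.45, 0.55, 0.70, 0.78]        # share graduating within six years

slope, intercept = fit_line(monthly_salary, graduation_rate)

# Predicted six-year graduation rate for a hypothetical school paying $7,000 per month.
predicted = slope * 7_000 + intercept
```

A positive slope describes the association only; as noted above, it cannot tell us whether better-paid faculty cause higher graduation rates or whether stronger students simply choose those schools.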

The results of the College Scorecard won’t rattle the earth with their insights, but they do bring to light potential problems inherent in the US college system. Ideally, we’d like to see the Department of Education collect more data about quantifiable loan rates, students and their characteristics so we can go deeper into the causes of loan default, and rely less on one-dimensional data like family income and ethnicity. From there, we may be able to determine an “optimal” loan rate that considers the trade-off between the student and societal value achieved from an affordable education and the government’s ability to keep its lending sustainable.


Topics: College Scorecard, Machine Intelligence
