Using Machine Intelligence to Understand the Student Loan Problem

Posted by Jon Millis

06.05.2016 12:30 PM

In March, the US Department of Education released its latest College Scorecard to “provide insights into the performance of schools eligible to receive federal financial aid, and offer a look at the outcomes of students at those schools.” Fortunately for us data-driven strategists (read: nerds) at Nutonian, the government also released the raw data it used to arrive at its summary results and findings.

While Washington’s number-crunchers did a nice job increasing transparency about each college’s strengths and lifetime earnings ROI, there was one angle that was noticeably absent given the election cycle: a deep-dive into loan repayment rates. With so many students and families adamant that the current loan structure is broken and leads to a blatant poverty trap, why haven’t more analysts dug into this question? How flawed are current loan costs, if at all, and what leads to students being unable to pay them off?


We put machine intelligence to the test to automatically sift through the College Scorecard data set to highlight the most important relationships and predictive variables that influence loan repayment. It’s important to note that we only used the quantitative inputs available to us, and consequently, variables like motivational drive, professional network, etc. will not show up in our models, despite potentially playing a significant role in a student’s ability to repay his/her loans.

Our focus will be on the Scorecard variable “Repayment Rate 7 Years from Graduation”, or the percentage of students able to make any contribution to their loans 7 years after graduating college. The first step in using Eureqa is simply formulating a question. In this case, we’ll ask: “What causes low repayment rates?”

Using Eureqa, we built a model to predict the likelihood a school will have a repayment rate below 80% after 7 years. More interestingly, we were able to quickly identify a few of the drivers (“features”) of repayment.

After running Eureqa for five minutes, we found that repayment rate is:

  • Positively correlated with parent/guardian income – The higher the family’s income, the more likely the student is to repay his or her loans.
  • Negatively correlated with a school’s percentage of students on loans – The higher the proportion of a school’s students that are on loan programs, the less likely a student is to repay his or her loans.
  • Negatively correlated with a school’s percentage of non-white students – The higher a school’s proportion of non-white students, the less likely a student is to repay his or her loans.
  • Negatively correlated with a school’s acceptance rate – The higher a school’s acceptance rate, the less likely a student is to repay his or her loans.
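
For readers who want to sanity-check the direction of relationships like the ones above on their own copy of the data, here is a minimal sketch using a plain Pearson correlation. This is not Eureqa's method, just a quick first look. The column names (`family_income`, `pct_on_loans`, `repayment_rate`) are hypothetical, and the four toy rows are made-up placeholders, not College Scorecard values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_with_target(rows, target, features):
    """Map each feature name to its correlation with the target column."""
    ys = [row[target] for row in rows]
    return {f: pearson([row[f] for row in rows], ys) for f in features}


# Made-up placeholder rows for illustration only (NOT real Scorecard data).
toy_schools = [
    {"family_income": 30_000, "pct_on_loans": 0.80, "repayment_rate": 0.55},
    {"family_income": 55_000, "pct_on_loans": 0.65, "repayment_rate": 0.70},
    {"family_income": 80_000, "pct_on_loans": 0.50, "repayment_rate": 0.85},
    {"family_income": 110_000, "pct_on_loans": 0.35, "repayment_rate": 0.93},
]

signs = correlation_with_target(
    toy_schools, "repayment_rate", ["family_income", "pct_on_loans"]
)
for name, r in signs.items():
    # The sign of r gives the direction of the relationship.
    print(f"{name}: r = {r:+.2f}")
```

A positive coefficient means the feature and repayment rate rise together, and a negative one means they move in opposite directions. Neither says anything about cause.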

Figure 1, below, shows the likelihood that students will default on their loans (y-axis) plotted against that student’s family income (x-axis). Default rate is remarkably high until family income hits about $60,000, and then it plummets. Let’s think about that for a second. If a family is making less than $50,000 per year, it’s more likely than not that their child will default on a loan payment and incur even more expenses as a penalty. For a lower or middle-class family hoping to send its child to school to climb the economic ladder, the system, to put it mildly, is not doing them any favors.


So what steps could the government take in addition to reexamining the pricing structure of their loan rates? Economists agree that successful completion of a college degree trends with better outcomes not just for an individual, but for society as a whole. College degrees generally spell higher incomes and intellectual capital, both of which college graduates use to enrich other people around them. One way the US government tries to “nudge” more people in the direction of a college degree is by issuing Pell Grants, financial assistance packages that students don’t need to repay. Most Pell Grants sit between $3,700 and $5,700 per year.

Unfortunately, there’s a positive correlation between the percentage of a school’s students receiving Pell Grants and students’ likelihood of default. Schools with a higher ratio of Pell Grant recipients tend to experience higher rates of default on their loans, even though Pell Grants are intended as a direct subsidy to chip away at student expenses. This suggests that Pell Grants may not do enough to help students escape their debt.

Here is another interesting finding. What role does faculty quality play in student success? There’s a linear relationship between faculty salary and graduation rates: The higher the average monthly salary of a school’s professors, the higher the percentage of students that graduate within six years. This could indicate that the highest-paying schools draw the best professors, who pass on a higher-quality work ethic and knowledge to their students. Or, of course, these variables could simply be correlated, and students who are more likely to graduate college from the very beginning follow the schools with the pricier professors.
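
A linear relationship like the one between faculty salary and graduation rate can be summarized with an ordinary least-squares line. The sketch below shows the arithmetic under stated assumptions: the function name `fit_line` is ours, and the salary and graduation numbers are made-up placeholders, not values from the Scorecard.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x is identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up placeholder values for illustration only (NOT real Scorecard data).
toy_monthly_salary = [5000, 6000, 7000, 8000]   # dollars per month
toy_grad_rate = [0.40, 0.50, 0.60, 0.70]        # share graduating in six years

slope, intercept = fit_line(toy_monthly_salary, toy_grad_rate)
print(f"graduation rate changes by {slope * 1000:.2f} per extra $1,000 of monthly salary")
```

The slope describes how the two variables move together. It cannot tell us whether better-paid professors cause higher graduation rates or whether stronger students simply pick those schools.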

The results of the College Scorecard won’t rattle the earth with their insights, but they do bring to light potential problems inherent in the US college system. Ideally, we’d like to see the Department of Education collect more data about quantifiable loan rates, students and their characteristics so we can go deeper into the causes of loan default, and rely less on one-dimensional data like family income and ethnicity. From there, we may be able to determine an “optimal” loan rate that considers the trade-off between student/societal value achieved from an affordable education, and the government’s ability to keep its loaning sustainable.


Topics: College Scorecard, Machine Intelligence

The Perils of Black Box Machine Learning: Baseball, Movies, and Predicting Deaths in Game of Thrones

Posted by Jon Millis

22.04.2016 10:17 AM

Making predictions is fun. I was a huge baseball fan growing up. There was nothing quite like chatting with my dad and my friends, crunching basic statistics and watching games, reading scouting reports, and finally, expressing my opinion on what would happen (the Braves would win the World Series) and why things were happening (Manny Ramirez was on a hot streak because he was facing inexperienced left-handed pitchers). I was always right…unless I was wrong.*

One of the reasons business people, scientists and hobbyists like predictive modeling is that in many cases, it allows us to sharpen our predictions. There’s a reason why Moneyball became a best-selling book, as it was one of the first widely publicized examples of applying analytics to gain a competitive advantage, in this case by identifying the most important player statistics that translate to winning baseball games. Predictive modeling was the engine that drove the Oakland A’s from a small budget cellar-dweller to a perennial championship contender. By being able to understand the components of a valuable baseball player – not merely predict their statistics – the A’s held on to a valuable advantage for years.


High five, Zito! You won 23 games for the price of a relief pitcher!

The A’s were ahead of their time, focusing on forecasting wins and diagnosing the “why”. With this dual-pronged approach, they could make significant tweaks to change future outcomes. But many times, predictive modeling is different, and takes the form of “black box” predictions. “Leonardo DiCaprio will win an Oscar,” or, “Your farm will yield 30 trucks-worth of corn this season.” That’s nice to hear if you’re confident your system will be right 100% of the time. Sometimes you don’t need to know the “why”; you just need an answer. But more often, if you want to be sure you’ll be getting an accurate prediction, you need to understand not only what will happen, but why it will happen. How did you come to that conclusion? What are the different inputs or variables at play?

If, for example, a machine learning algorithm predicts that Leonardo DiCaprio will win an Oscar – but one of the deciding variables is that survival movie stars always win the award if they wear a black bow tie to the awards ceremony – we would want to know this, so we could tweak our model and remove the false correlation, as it likely has nothing to do with the outcome. We might consequently be left with a model that now only includes box office sales, number of star actors, type of movie, and the number of stars given by Roger Ebert. This model is one we can be more confident in because as movie buffs, we’re mini “domain experts” that can confirm the model makes sense. And to boot, we can have full insight into why Leonardo will win the Oscar, so we can place a more confident bet in Vegas. (You know…if that’s your thing.) Operating in the “black box” confines of most machine learning approaches would render the previous iterations impossible.


I don’t always win the Oscars…but when I do, I wear a black bow tie.

That’s why my head continues to spin at the mistakenly godlike, magic bullet appeal of black box modeling. It’s fantastic for certain applications: specifically, answering the question, “What will happen?” when you don’t care how the answer is derived, so long as it’s right. One example of this could be a high-frequency trading application that executes a trade based on the fact that its algorithm can predict with 95% accuracy when a stock will appreciate.

But for most things, the value of a prediction is understanding the “what will happen” and the “why”. I almost shook my computer with frustration this morning when I read that a team of researchers at the Technical University of Munich had used artificial intelligence (more specifically, machine learning) to predict that Tommen Baratheon would be the first character killed off in the upcoming season of Game of Thrones – but didn’t give any indication of how or why that will happen. It’s because the algorithm said so. Are you kidding me, guys?! That’s like saying, “Jon, you will eat a horrendous dinner tonight that will verrrrry likely leave you violently ill for days, buuuuuut unfortunately come back later to see how that happened or where you ate for dinner, because we just don’t feel like telling you.” What good is a prediction without context and understanding? Will I get sick from the spinach in my fridge, from bad meatloaf at a restaurant, or from a coworker who decided to come over and sneeze on my food as I’m finishing up this blog post far too late on a Thursday night?? (Stay away from my food, Michael. STAY AWAY!!!) Without that context, I can’t make any change to improve my outcome of being home sick as a dog for an entire week.

There’s a reason that people see artificial intelligence and machine learning as fairy dust. A lot of the time, it works, but it’s hard to use, requires technical expertise, and it frequently operates in a total black box. I like to understand my predictions. That’s why when I was a 10-year-old kid, I decided I’d work on bringing machine intelligence – the automated creation and interpretation of models – to the world and join Nutonian. Well…that may not be entirely true. More likely, I was trying to predict how well I’d have to hit a curveball to make it to the MLB.


*Sayings like this always remind me of one of my favorite off-the-field players of all time, Yogi Berra, a Hall of Famer known as much for his wit and turns of phrase as his talent:


Topics: Baseball, Game of Thrones, Machine Intelligence, Machine learning

Machine Intelligence Strips Off Our Data Science Blinders

Posted by Guest Author

07.10.2015 10:00 AM

by Dan Woods

In our increasingly digital lives, we have been trained to trust the way that technology works. That is, right up until it doesn’t.

Consider a GPS. A lot of powerful technology goes into computing an optimal route. Few people understand why their GPS system chooses the routes that it does, but we’ve come to simply accept its recommended navigation directions because they tend to be good enough. Even when the predicted route doesn’t work – say, it prompts you to turn the wrong way on a one-way street or you run into construction and need to make a detour – that’s OK, because we have corrective mechanisms in place to override its instructions.

However, accepting blinders on data-driven solutions can be dangerous. The costlier a mistake, the more false positives and false negatives matter. Have you ever started internet sleuthing and found a symptom checker that declared that your runny nose and painful headache meant you had cancer? Instead of being gently let down by your exasperated doctor the next morning, imagine if the hospital immediately enrolled you in chemotherapy treatment based solely on this output. While this is an extreme example, outsourcing too much responsibility to machines could lead to mistakes just as costly.

A fundamentally new approach to data science is needed to accomplish this partnership – one that allows each side to equally communicate ideas and strategies to each other, rather than one side dictating the constraints of the connection. This approach is machine intelligence, with the driving philosophy that the partnership between man and machines is greater than the sum of its parts.

Nutonian’s machine intelligence system, Eureqa, doesn’t put blinders on users. In fact, the system purposefully shows its work, surfaces user-friendly ways to reach advanced results, and encourages rapid iteration to incorporate the user’s domain expertise into the results. Regardless of technical expertise, users all across the organization can use Eureqa to discover new business strategies, while retaining the ability to audit and correct sub-optimal paths before committing to them.

The abundance of data in the business world needs more than a one-sided discussion. Use machine intelligence to open up a new horizon of possibilities in the golden age of analytics.



Dan Woods is CTO and founder of CITO Research. He has written more than 20 books about the strategic intersection of business and technology. Dan writes about data science, cloud computing, mobility, and IT management in articles, books, and blogs, as well as in his popular column on

Topics: Eureqa, Golden Age of Analytics, Machine Intelligence

Intelligent Partners: Man and the Machine

Posted by Guest Author

30.09.2015 10:30 AM

by Dan Woods

When it comes to the creative processes inherent in predictive modeling, it is time for a new paradigm, one in which the user and machine learning work in tandem to achieve better results than could be achieved working separately. Nutonian’s vision for this is machine intelligence.

What’s important to understand is how collaboration between people and machine intelligence powers statistical creativity. However, this new paradigm first requires unlearning some of the patterns established by early forms of artificial intelligence.

Consider a game of chess. A chess master has a great memory and can evaluate a lot of positions, but that’s child’s play compared to Deep Blue. This chess-playing computer is known for being the first piece of artificial intelligence to win both a chess game and a chess match against a reigning world champion. Deep Blue can evaluate every possible move it can take at each turn, considering 200 million positions every second.

While Deep Blue’s power to play great chess is an awesome achievement, we need to put it into context. Deep Blue’s wins were the culmination of 12 years of development towards an extremely specialized task, and its potential moves were reliant on a static list of previous games. The computer can’t invent new moves that weren’t already in its database, and it would have to start back from square one if the rules of chess ever changed.

Imagine instead that the chess master and Deep Blue were on the same side of the table, working together. What if the two could communicate? Collaboratively creating and vetting potential strategies – one using his hard-earned expertise to handle new information and uncommon situations and the other using its vast database to discover optimal strategies and provide a sounding board? Wouldn’t this combination be more powerful?

The machine intelligence paradigm puts man and machine learning on the same team as equal partners. While Nutonian’s Eureqa automatically generates potential solutions through a powerful evolutionary search process, it communicates how it arrived at its results and flexibly accommodates outside guidance. This transparency allows anyone to incorporate their expertise into the system and seed the next round of discovery.

Nutonian believes that the best results happen when the user and the machines work together as partners in the process of invention. This productive partnership between man and machine heralds the golden age of analytics.




Topics: Eureqa, Golden Age of Analytics, Machine Intelligence

Are Machines Partners or Foes?

Posted by Guest Author

22.09.2015 10:30 AM

by Dan Woods

The exploitation of data in the business world demands a new data-driven approach to innovation. Human-driven data analysis needs to make way for new machine-driven methods capable of handling access to the new abundance of data. However, much hysteria has been recently directed at the dangers of big data and over-reliance on Artificial Intelligence (AI). Is this fear warranted, or is it just much ado about nothing?

Some science and technology experts have called AI “our greatest threat,” one which may spell “the end of the human race.” In its specific use with business data, many more have decried the perils of using “big data” to predict the future. The traditional data science approach is blind to unquantifiable factors and can be fooled by misleading correlations, but many businesses deal with sensitive subjects that require informed judgment and imprecise factors. On a more personal level, if even a skilled career like data science can be automated by a machine, what is there left for the rest of us?

What is still important to remember is that machines have their own strengths and weaknesses, just as humans do, and both sides have important roles to play in supporting each other. Machine algorithms have the capacity to churn through endless amounts of data but are subject to the biases of how they were programmed and limited by the inputs they are given. Humans can synthesize decades of experience into bursts of creativity but struggle to visualize data once it goes beyond three dimensions.

Here is where machine intelligence comes in. Machine intelligence allows its users, regardless of technical expertise, to harness and guide today’s virtually unlimited compute power while encoding nuance and domain expertise from the user into the results. While automation allows machine intelligence to create new predictive models, the results are specifically designed to be transparent, interpretable and interactive. The end user can investigate how the system arrived at its conclusions and easily kick out false correlations, recognize mismatches with business realities, audit the robustness of potential models to assuage stakeholder concerns, and export the results into any number of other systems for further analysis.

Instead of perpetuating the machine vs. man rhetoric, Nutonian’s introduction of machine intelligence establishes a new machine-as-partner paradigm. Allowing the strengths of each side to bolster each other’s weaknesses empowers businesses to scale their data science initiatives across all levels and functional areas, exponentially increasing their analytical capacity to answer high-value questions. Augmentation, not replacement, is the key to the golden age of analytics.




Topics: Eureqa, Golden Age of Analytics, Machine Intelligence
