PUBLISHED October 2, 2016 IN Research

Cynthia Rudin: Training Computers to Find Patterns That Humans Miss

Part of the Building the Faculty, 2016 Series

Computer scientist Cynthia Rudin is looking at the next step in artificial intelligence studies. Photo: ©2016 Kevin Seifert Photography

Robin A. Smith @dukeresearch

A few years back, the police in Cambridge, Massachusetts had a problem: Burglaries were on the rise.

To narrow down a list of suspects without fingerprints or DNA, crime analysts had to spend hours each day sifting through crime reports and comparing them to past crimes by hand.

Add to this the fact that home break-ins are notoriously difficult to solve. They often take place when no one is watching. Fewer than one in seven burglaries get solved nationwide.

Associate professor Cynthia Rudin, who came to Duke this year from MIT, thought she and her team might be able to help figure out a solution.

Rudin and her colleagues developed an algorithm -- basically a set of instructions -- that allows computers to scour massive amounts of crime data automatically and uncover connections between break-ins that humans may have missed.

The algorithm, Series Finder, trawls through crime data looking for patterns of similar behavior. Some criminals operate during the day, others work at night, some burglars target apartments, others houses. If several break-ins happen in the same neighborhood, around the same time of day, and almost all of them involve unlocked back doors and stolen wallets, then the crimes could be linked to the same person.
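To make the idea of "patterns of similar behavior" concrete, here is a minimal sketch in Python. It is not the Series Finder method itself, only a toy illustration: the attribute names, the reports, and the 0.75 threshold are all invented for the example. It starts from one break-in and adds any other report that, on average, matches the crimes already in the series on enough attributes.

```python
from typing import Dict, List

Crime = Dict[str, str]

# Hypothetical attributes, chosen to mirror the examples in the article.
ATTRIBUTES = ("neighborhood", "time_of_day", "entry", "stolen")


def similarity(a: Crime, b: Crime) -> float:
    """Fraction of attributes on which two crime reports agree."""
    matches = sum(1 for key in ATTRIBUTES if a[key] == b[key])
    return matches / len(ATTRIBUTES)


def grow_series(seed: Crime, reports: List[Crime], threshold: float = 0.75) -> List[Crime]:
    """Add each report whose average similarity to the series so far meets the threshold."""
    series = [seed]
    for report in reports:
        if report is seed:
            continue
        avg = sum(similarity(report, member) for member in series) / len(series)
        if avg >= threshold:
            series.append(report)
    return series


# Made-up reports for illustration only.
reports = [
    {"id": "A", "neighborhood": "Cambridgeport", "time_of_day": "day", "entry": "back door", "stolen": "wallet"},
    {"id": "B", "neighborhood": "Cambridgeport", "time_of_day": "day", "entry": "back door", "stolen": "laptop"},
    {"id": "C", "neighborhood": "Cambridgeport", "time_of_day": "day", "entry": "back door", "stolen": "wallet"},
    {"id": "D", "neighborhood": "Elsewhere", "time_of_day": "night", "entry": "window", "stolen": "jewelry"},
]

linked = grow_series(reports[0], reports)
print([r["id"] for r in linked])
```

In this toy data, reports A, B and C share a neighborhood, time of day and point of entry, so they end up grouped, while D does not. A real system would learn which attributes matter most instead of weighting them all equally.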

Rudin’s student Tong Wang trained a computer program on data from more than 7,000 break-ins that happened in Cambridge between 1997 and 2011, some of which were known to have been committed by the same person and some of which were not.

Once the program learns a pattern, it is able to adapt, making predictions that improve over time as it is exposed to new data.

In a blind test, the algorithm was able to find a series of related crimes that had taken human crime analysts six months to find.

It also found multiple patterns the crime analysts originally failed to spot. Ten break-ins that happened over a six-month period in the Cambridgeport neighborhood were previously thought to be unrelated, but were later linked to the same suspect using Series Finder.

“When we presented our findings, the senior analyst in the Cambridge Crime Analysis Unit got up and shook my graduate student’s hand,” Rudin said. “It was like finding a needle in a haystack.”

Rudin and her students Jiaming Zeng and Berk Ustun have also developed models to better predict whether someone is likely to stay out of prison once they are released.

About two-thirds of people released from prison in the U.S. are arrested again within three years of their release.

Using data from a national study that tracked more than 38,000 just-released inmates for three years, Rudin’s team produced recidivism risk scores that were just as accurate as alternative methods but easier to understand. The hope is that judges and other officials will be better able to use such scoring systems to help decide whom to release, or how to allocate social services to help parolees avoid becoming repeat offenders.
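One reason a scoring system can be easier to understand is that it can be as simple as adding up a few whole-number points. The sketch below shows that general shape in Python. The factors, point values and cutoff are hypothetical, invented purely for illustration. They are not taken from the team's models or from the national study.

```python
from typing import Dict, List, Tuple

# Hypothetical scorecard: (plain-language description, feature name, points).
# All values here are invented for illustration.
SCORECARD: List[Tuple[str, str, int]] = [
    ("Five or more prior arrests", "prior_arrests_ge_5", 2),
    ("Age 24 or younger at release", "age_le_24", 1),
    ("Age 40 or older at release", "age_ge_40", -1),
]
THRESHOLD = 2  # hypothetical cutoff: totals at or above this are flagged


def risk_score(person: Dict[str, bool]) -> int:
    """Sum the points for every scorecard line that applies to this person."""
    return sum(points for _, feature, points in SCORECARD if person.get(feature, False))


def is_flagged(person: Dict[str, bool]) -> bool:
    """True when the total score reaches the hypothetical cutoff."""
    return risk_score(person) >= THRESHOLD


example = {"prior_arrests_ge_5": True, "age_ge_40": True}
print(risk_score(example), is_flagged(example))
```

Because every line of the scorecard is visible and the arithmetic can be done by hand, a person reading the result can see exactly why a score came out the way it did.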

“Computers do not have the same biases that humans do,” Rudin said.

Rudin is an expert in using big data and a branch of artificial intelligence called machine learning to help people make better decisions.

“Our goal is to design machine learning models that are understandable, so people are more likely to use them and trust them,” Rudin said. “They can’t trust models they don’t understand.”

Rudin doesn’t just use her research to analyze crime data. She and her collaborators have developed machine learning algorithms that accurately predict strokes, dementia, even power outages.

In 2015, her team used methods similar to the ones they developed to predict recidivism to help doctors better identify patients at risk of being readmitted to the hospital within a month after being discharged.

They have also developed new methods to help doctors and patients predict what a person might suffer from in the future based on symptoms and conditions they experienced in the past, along with data from other patients with similar medical histories.

Similar to the way Netflix and Amazon recommend titles and products you might like -- “people who bought this also bought this” -- their tool lets computers make personalized health predictions, such as “patients like you who have had high cholesterol and taken certain medications in the past may be more likely to suffer a heart attack in the future.”
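The "patients like you" idea can be sketched as a simple count: among past patients who had the same earlier conditions, what share later developed a given problem? The Python below is a toy illustration under that assumption, not the team's method, and the records and condition names are made up.

```python
from typing import FrozenSet, List, Tuple

# Each record pairs the conditions a patient had earlier with those that
# appeared later. The records below are invented for illustration.
History = Tuple[FrozenSet[str], FrozenSet[str]]


def later_risk(records: List[History], seen: FrozenSet[str], outcome: str) -> float:
    """Share of past patients with all of `seen` who later had `outcome`."""
    similar = [later for earlier, later in records if seen <= earlier]
    if not similar:
        raise ValueError("no past patients match these conditions")
    return sum(outcome in later for later in similar) / len(similar)


records: List[History] = [
    (frozenset({"high cholesterol", "hypertension"}), frozenset({"heart attack"})),
    (frozenset({"high cholesterol"}), frozenset()),
    (frozenset({"high cholesterol", "hypertension"}), frozenset()),
    (frozenset({"asthma"}), frozenset()),
]

print(later_risk(records, frozenset({"high cholesterol"}), "heart attack"))
```

A count like this gets unreliable when few past patients match, which is one reason real tools draw on large databases and more careful statistical models.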

“No doctor can keep a whole database of medical records in their head and make accurate predictions from it,” Rudin said.

With support from a six-year, $480,000 CAREER award from the National Science Foundation, she has been developing machine learning algorithms for ranking.

Her models have helped power companies rank manholes in New York City by how likely they are to catch fire or explode.

Manholes explode when the insulation for the electrical cables inside breaks down and a spark from the wiring starts a fire. When they blow, several city blocks can lose power at once. Predicting such events in advance could make it possible to send a repair crew to prevent them.
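Ranking, in this sense, means putting items in order so the riskiest come first. A minimal Python sketch follows, assuming each manhole has a couple of numeric features and a fixed weight per feature. The feature names, weights and manhole IDs are hypothetical. In a real ranking model the weights would be learned from records of past events.

```python
from typing import Dict, List

# Hypothetical feature weights, invented for illustration.
WEIGHTS: Dict[str, float] = {"past_events": 1.0, "cable_age_decades": 0.5}


def risk(manhole: Dict[str, float]) -> float:
    """Weighted sum of a manhole's features."""
    return sum(weight * manhole[name] for name, weight in WEIGHTS.items())


def rank_for_inspection(manholes: Dict[str, Dict[str, float]], top_k: int) -> List[str]:
    """Return the IDs of the top_k manholes, highest risk first."""
    ordered = sorted(manholes, key=lambda mid: risk(manholes[mid]), reverse=True)
    return ordered[:top_k]


# Made-up manholes for illustration only.
manholes = {
    "MH-1": {"past_events": 0, "cable_age_decades": 2},
    "MH-2": {"past_events": 3, "cable_age_decades": 6},
    "MH-3": {"past_events": 1, "cable_age_decades": 4},
}

print(rank_for_inspection(manholes, top_k=2))
```

With a ranked list in hand, a utility with a limited number of repair crews can start at the top and work down.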

Rudin grew up in Buffalo, New York. She earned a bachelor’s degree from the University at Buffalo, where she studied physics, math and music theory.

She received a PhD in applied and computational mathematics from Princeton University in 2004, then held positions at New York University and Columbia University before landing at MIT in 2009.

In 2016 Rudin joined the computer science and electrical and computer engineering departments at Duke. She has secondary appointments in statistics and mathematics, and directs the Prediction Analysis Lab.

Rudin and her family live in Chapel Hill.