Peter Hoff: Reckoning with Big Data

Peter Hoff creates statistical tools to interpret data across disciplines. Photo by John Joyner/Duke University Photography

Duke statistician Peter Hoff finds great beauty in his chosen field: the precision, the order, the binary, true-or-false outcomes. But for him, statistics is also a gateway to a much larger world.

“To me, statistics is ultimately trying to address the problem of how we learn about the world around us,” Hoff said. “If you are a statistician and you develop an interesting statistical method, it could be applicable to biology, to social sciences, to physical sciences. You get exposure and you get to learn about all these different disciplines.”

Hoff, who joined the Duke Department of Statistical Science in July after 16 years on the faculty at the University of Washington, specializes in building statistical tools to analyze network or “relational” data. These types of data, which document the complex, changing sets of interactions between different individuals within a group, are currently popping up in all areas of research, from the social sciences to genomics.

Hoff’s tools are designed to extract patterns and meaning from these wide-ranging subjects, which range from friendships within social networks and relationships between countries on the international stage to interactions between different sets of proteins within a cell.

“I’m interested in trying to understand patterns in these networks, and also what factors lead to the formation of ties between people, between countries, between objects in general,” he said.

Born in Michigan and raised in Indiana, Hoff always enjoyed math and science. But he first fell in love with statistics as a discipline while an undergraduate at Indiana University.

“Statistics ended up being the perfect thing for me because it was a way to do math and computation, which I enjoy aesthetically, but also it’s an avenue through which I can learn about lots of different types of science,” Hoff said.

After earning a doctorate in statistics at the University of Wisconsin, he joined the faculty at the University of Washington-Seattle in 2000. While there, he authored an introductory text on Bayesian statistics and started building tools for making sense of twenty-first-century data.

“The types of data that people are gathering now are different than the types of data that people gathered ten, twenty years ago,” Hoff said. “And so any time you have a new data structure, a new type of data, you need to develop new statistical methodologies for it.”

One of the beauties of creating these statistical tools, he says, is that sometimes the same approach can be applied to a great variety of subjects. He recently created a tool to sift out the correlations between gene expression levels of leukemia patients that is now being applied to changes in a fruit fly’s metabolism throughout its life cycle.

Hoff was brought to Duke as part of the Provost’s Quantitative Initiative, a $10 million investment in hiring world-class faculty specializing in statistics, mathematics, computer science and engineering. The initiative has a particular emphasis on attracting interdisciplinary researchers who are likely to have a broad impact across multiple disciplines at Duke, including the physical sciences, social sciences, engineering and medicine.

“Peter Hoff is an outstanding representation of what the quantitative initiative hopes to achieve,” said Lawrence Carin, Vice Provost for Research at Duke. “He’s one of the foremost statisticians in the world, and his research touches many other disciplines beyond statistics, particularly the social sciences and health.”

Hoff isn’t only interested in data as an abstract entity. In his teaching at Duke he also wants to connect statistics students with the nitty-gritty of data gathering – either by pairing them with scientists, or by giving them the resources for building their own simple devices to collect and analyze data.

“As a statistician, I often get the data after it has been gathered by other scientists,” he said. “I would like to try to develop some projects where students in statistics as well as students in other departments such as computer science, engineering or biology are working with the tools that actually gather the data. Having them involved not just with the data analysis but also seeing what it’s like to gather data, and seeing all the challenges there, would be a great educational activity.”

This interest in hands-on learning, or “tinkering,” as he calls it, arose from basic woodworking projects with his nine-year-old son, a pastime that he originally thought had no link to the computer-bound world of statistics.

“We started by just building rudimentary boxes,” he said. “But gradually we started to make devices with temperature sensors or pressure sensors, things to gather data. And of course as soon as that happened I had to analyze the data.”