Accessing Important Census Data, Confidentially

Triangle Census Research Data Center makes data available to researchers while protecting privacy

Deborah Rho is using census data to follow immigrants' income trends. The Triangle Census Center allows her to do the research while protecting the privacy of the individual data used in the study.

To most people, the U.S. Census Bureau is simply the government office that, every 10 years, counts the number of people who live in the United States. But to many researchers at Duke University, the Census Bureau is a rich source of demographic and economic data, data that make it possible to answer all sorts of important and interesting questions.

A lot of the data, however, are confidential, and rightfully so: would you want any member of the public to have the ability to look up your salary or age? So how are researchers at Duke able to access and use them?

Enter the Triangle Census Research Data Center.

One of the new occupants of what will open in the fall of 2013 as SSRI West, the Triangle Census Research Data Center (RDC) is one of 15 Census Data Centers across the country that allow qualified researchers access to otherwise confidential data. What makes the data so special -- and so sensitive -- is that they are linked to individual people and businesses. And having access to data on such a "micro" or individual level makes it possible for researchers at Duke to answer questions that would be impossible to answer with the kind of aggregate data (state unemployment rates, for instance) that the government normally releases to the public at large.  

One of those researchers is Deborah Rho, a Ph.D. student in economics. Rho is collaborating with Seth Sanders, a professor in the economics department, on a project that studies the earnings of recent immigrants to the United States. To what extent do the earnings of recent immigrants resemble those of native-born workers? Do the earnings of recent immigrants rise as fast as those of the native-born? What distinguishes their study from previous ones is that Rho and Sanders are able to examine not just earnings patterns, but earnings patterns in relation to the kind of firm for which an immigrant works. And such an inquiry is made possible only by the access to confidential data that the RDC provides.     

"We would not be able to link earnings with workplace characteristics without the work that the Census is doing and without the access they grant me through the RDC," Rho says. Through the RDCs, Rho stresses, the Census Bureau provides an infrastructure for doing research in a way that does not compromise the privacy of the data.

Another researcher at Duke who is using the RDC is Daniel Xu, a professor of economics. Xu is working with a team of scholars to study the effects of government subsidies intended to increase the number of dentists working in rural, underserved areas of the United States. The problem is more complicated than it may at first seem. In most of those areas, there are already one or two dentists, and they are often making good money, since there is so little competition for their services. What happens when a new dentist, with the aid of a government subsidy, moves into the market? The profits of the existing dentists fall as they lose business to the new dentist; as a result, one of the existing dentists may move his or her practice elsewhere. The net result? The same number of dentists as before.

Confidential data provided through the RDC allow Xu to closely track the movements of dentists into and out of a particular market. Without having to reveal information on individual dentists, Xu is developing a cost-benefit analysis that will determine how much money the government needs to spend in subsidies to increase the number of dentists in a given underserved area.

"The data are confidential, but the Center should not be a secret," says the director of the RDC at Duke, Gale Boyd. He says that people come knocking on the RDC's door after they have squeezed all they can from the aggregate data the Census makes public. The questions that people really want to answer require data on the level of the individual person and the individual firm. Those are the kinds of data the RDC provides.

Boyd's own research has focused on energy efficiency. He is currently writing a paper with Mark Curtis, a former student in the master's program in economics at Duke who is now in the Ph.D. program in economics at Georgia State University in Atlanta, a city that is also home to an RDC. The two are looking at whether firms' management practices influence their energy use. Such an empirical analysis could not be done without the confidential, microlevel business data that the RDC provides.

To use the RDC, researchers must be granted Special Sworn Status. Applicants must undergo a medium-level FBI background check and submit a work and residence history, fingerprints, and a sworn affidavit that the applicant will protect the data under penalty of law.

As Boyd notes, the federal statistical system is broad and rich, but fragmented. He foresees a day when RDCs function as gateways to other federal data. Already the RDC environment, which promotes access to confidential data, is rubbing off on other departments. According to Boyd, the National Center for Health Statistics now provides broader access to confidential data, and the Bureau of Labor Statistics now uses the RDC model to give researchers access to confidential data.

In the past, it has been mostly economists who have used the RDC. But Boyd hopes that the move to SSRI West will draw a broader range of social scientists to his Center. In fact, that is already happening, he says. "Because RDCs have historically dealt mostly with business data, economists have been more likely to know about them and use them. But in the last 5-10 years, lots of rich demographic data have been added to RDCs, so the availability and usefulness of RDCs to other social scientists are growing."

Ever since his days as a Ph.D. student in the 1980s, when he began working on energy use in industries, Boyd has recognized the value of microlevel data; and ever since then, he's worked to make those kinds of data available to more and more researchers. There are often institutional, technical, and legal barriers to making confidential data available, but one by one they are being overcome. "There was a time when you could not get access at all to confidential, microlevel data. Confidential, microlevel business data were opened to the research community -- but you had to go to the Census headquarters in Washington to access them. Now you can go to an RDC."