Crowdsourcing Research: A Team Effort

A software developer discusses crowd-sourced transcription

Software engineer Ben Brumfield speaks at Duke University Wednesday, Nov. 20, 2013. Photo by Les Todd, Duke Photography

Thanks to Ben Brumfield, we may soon be Googling our way through Byzantine records and medieval manuscripts.

"Search engines, because of how they organize information, allow us to make serendipitous connections between information we might not otherwise make," Brumfield told a Duke audience Wednesday. "[Crowd-sourced] transcription can help people make those connections with handwritten materials."

Brumfield, a software engineer based in Austin, Texas, has expedited the digital transcription of handwritten documents -- ranging from naval weather logs to travel diaries -- with his online crowdsourcing platform, FromThePage.

At Duke, Brumfield discussed the future implications of crowd-sourced transcription for the academic community. The event was sponsored by the Duke Collaboratory for Classics Computing.

The need for crowd-sourced transcription arose from search engines' inability to index handwritten materials: a scanned page is stored as an image made of pixels, not as words a search engine can read. Thanks to thousands of volunteers worldwide, millions of pages of previously unsearchable documents are now a few keystrokes away, he said. Like the volunteers, those who use these documents on sites like FromThePage are a diverse group, from journalists analyzing campaign contributions to environmental scientists tracking changes in global weather patterns.
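
To illustrate why a typed transcript can be searched when a page scan cannot, here is a minimal sketch in Python of the kind of word index a search engine might build. It is not FromThePage's code or any search engine's actual method; the function names `build_index` and `search`, the page ids and the sample transcripts are all hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids whose transcript contains it."""
    index = defaultdict(set)
    for page_id, transcript in pages.items():
        for word in re.findall(r"[a-z0-9']+", transcript.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted ids of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical transcripts, as volunteers might type them from page scans
pages = {
    "log-p1": "Strong gale from the north west, barometer falling",
    "diary-p7": "Arrived in Austin after a long journey north",
}
index = build_index(pages)
print(search(index, "north"))              # ['diary-p7', 'log-p1']
print(search(index, "barometer falling"))  # ['log-p1']
```

A scan of the same pages offers no words to feed into `build_index`, which is the gap that volunteer transcription fills.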

"Crowd sourcing is not about free labor," Brumfield said. "People do this work for their own reasons, like wanting to explore their family histories. In addition, making your [collections] readable by search engines allows you to engage new readers and add value to existing research."

Joshua Sosin, professor of classical studies and director of the Duke Collaboratory for Classics Computing, is at the forefront of Duke's efforts to transcribe its own handwritten collections. Sosin said that although many students and professors visit the library's collections and partially transcribe the sources that are pertinent to their research, nearly all of these transcripts disappear once the researchers leave the library.

"Scholars or students come to the Rubenstein, check out these precious materials, they transcribe and develop all sorts of interesting ideas about them," Sosin said. "Then they take their notebooks out of the library and we lose all the extra value-added materials developed by these students. If we can host a platform for students and scholars to share their notes and ideas on our collections, the library's base of knowledge will grow with every term paper or book that our scholars produce."

As part of his visit, Brumfield helped Duke Libraries set up a trial run of FromThePage’s transcription software on a local network. Sosin and Brumfield are hopeful that researchers will eventually be able to enter their own transcriptions into a central database. Other researchers will then be able to search through these transcripts. Brumfield said such easily accessible transcripts would add immensely to the university’s accumulated knowledge.

"Even if something in the library's collection was transcribed by a genealogist, it's going to stay transcribed and is accessible by any other member of the community," Brumfield said. "Crowd sourcing is truly a cross-disciplinary effort to make more information available to everyone."
