
Contemporary scholars are painstakingly creating a comprehensive new annotated internet archive to enable poet Walt Whitman's works to reach 21st-century readers in a way not possible before, according to Matt Cohen, assistant professor of English at Duke University and an editor of the Walt Whitman Archive.
"The archive frees Whitman's works in their original form to travel the open internet the way he traveled the open road," said Cohen. "Unlike the scattered physical archives protecting the original papers, the Whitman archive welcomes scholars, students and the public around the clock, seven days a week."
The new archive will "show students who've grown up reading on computer screens what poetry has to offer," said Cohen.
Cohen said the archive also fulfills yearnings Whitman expressed in his poetry. "Whitman wanted 'an audience interminable,' hundreds of millions, across generations," Cohen said. "He celebrated advances in information technology - the steam printing press and the telegraph. Whitman would have loved the internet and published everything on it."
Creating the archive is a huge and arduous task involving far more than the usual process of posting Whitman's texts on the internet, said Cohen. Scholars are building the Whitman archive by obtaining high-quality images of manuscripts and printed documents, transcribing text with great accuracy and including in digital files additional information needed by scholars, he said.
"Producing manuscript images and transcribed text for scholarly use can take hundreds of thousands of dollars and thousands of man-hours," Cohen said. "Work on the archive has been under way for more than a decade."
Cohen said archive scholars painstakingly choose and embed informational "tags" in text files to aid searches and capture important information about the work and its origins. "Scholars need far more than the final printed words of a work," Cohen said.
For example, an editor could insert the following tag to denote that words were deleted by striking them through and that the editor is 80 percent certain Whitman himself made the deletion: <del type="overstrike" cert="80%">editor</del>, said Cohen.
The archive's tags conform to XML, the standard "eXtensible Markup Language" that enables embedding of data in documents to identify and characterize that data, said Cohen.
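To illustrate how software might read such a tag, here is a minimal Python sketch using the standard library's XML parser. Only the <del> element itself comes from Cohen's example; the surrounding <line> wrapper, the sample words around it and the function names are assumptions made for illustration, not the archive's actual encoding or tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription line: only the <del> element is taken from the
# article; the <line> wrapper and the surrounding words are invented.
SAMPLE = '<line>I am the <del type="overstrike" cert="80%">editor</del> poet</line>'


def find_deletions(xml_text):
    """Return one dict per <del> element: deleted text, type and certainty."""
    root = ET.fromstring(xml_text)
    deletions = []
    for node in root.iter("del"):
        cert = node.get("cert", "")
        deletions.append({
            "text": node.text or "",
            "type": node.get("type"),
            # "80%" becomes 0.8; a missing attribute becomes None
            "certainty": int(cert.rstrip("%")) / 100 if cert else None,
        })
    return deletions


def final_reading(xml_text):
    """Return the text as it reads once top-level deletions are left out."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag != "del":
            parts.append("".join(child.itertext()))
        # text that follows the child element is always kept
        parts.append(child.tail or "")
    # collapse the extra whitespace left where a deletion was removed
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    print(find_deletions(SAMPLE))
    print(final_reading(SAMPLE))
```

Because the deletion is recorded as data rather than simply dropped, the same file can serve a reader who wants the final wording and a scholar who wants to see what was struck out and how confident the editor was.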
According to Cohen, the authoritative archive contrasts with some Whitman texts on the internet that contain errors, such as changes made by publishers long ago without Whitman's approval.
Cohen said the archive is an unparalleled resource for studying the development of Whitman's masterpiece, "Leaves of Grass," with excellent images and transcriptions of all seven editions from the 95-page first edition in 1855 to the 438-page "deathbed edition" of 1892.
"Tracing the evolution of Whitman's poetry isn't easy," Cohen said. "Whitman cut up one edition of 'Leaves of Grass' to prepare the next, moved and pasted fragments, scribbled changes, sewed in pages from separate published works and incorporated old manuscripts. Glue spots on a document in Duke's Trent Collection match spots on a document kept elsewhere, showing they were once together."
The archive "raises the bar for literary study of any kind, of any author," said Cohen. "A thoroughness that simply couldn't be asked before is now a basic condition of Whitman research." Cohen said students will decry the lack of comparable resources for other authors.
Cohen leads a team of Duke graduate and undergraduate students producing the archive's version of Horace Traubel's "With Walt Whitman in Camden," a 5,000-page biographical journal, he said.
"Traubel was Whitman's nurse, secretary, literary representative and companion late in life," Cohen said. "Traubel transcribed conversations and incorporated photographs, correspondence and marginal notations."
Work on the Traubel journals is about half complete, with the first of nine volumes public and two more appearing soon, said Cohen.
Cohen said he already has the transcription of all nine volumes on his own computer, and the ability to search the entire work has yielded new insights. Cohen presented some of his findings April 1, 2005, at "Leaves of Grass: The 150th Anniversary Conference," held at the University of Nebraska - Lincoln.
"Traubel is seen as a trustworthy intermediary for Whitman because they shared a passion for bookmaking," Cohen said. "Pulling out all their discussions of bookmaking weakens that rationale. They passionately disagreed about bookmaking. Traubel wanted to make books as hand-crafted art. Whitman insisted on books the average reader could afford."
Cohen said he is profoundly grateful to his students for making such insights possible by transcribing and tagging Traubel's works. "I've done that work myself," Cohen said. "As Whitman says, 'I am the man, I suffered, I was there.'"
Cultural anthropology major Leigh Spoon, a senior from Washington, D.C., said she began working on the Traubel project in Cohen's Information Science and Information Studies course on editing digital texts in the humanities. The class edited one volume of "With Walt Whitman in Camden" for publication in the Whitman archive.
"Professor Cohen forced students to consider all the implications of moving literature to the computer, including what is lost," said Spoon.
Spoon said "tagging" text for the Whitman archive is a high-pressure job because "you're always aware labeling a passage with the wrong 'tag' could make a scholar miss something important about a great American figure." Such "tags" are critical to the use of the archive by literary scholars.
Teaching assistant Allison Dushane, a graduate student in English, said choosing appropriate tags can be surprisingly complex.
"For example, you need tags to identify a quote and the person quoted, but should a statement in Whitman's handwriting be labeled the same as something Traubel attributes to Whitman?" said Dushane. "Should a search for Whitman quotes find both?"
Spoon said the interdisciplinary course "was much better than just a class in English or a class in computer science because you heard different perspectives."
The variety of perspectives not only enlivened discussions but also improved the quality of the markup, said Dushane. "It was people from computer science, English, math and physics, faculty, undergraduate and graduate students," Dushane said. "Everyone helped improve the coding."
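Dushane's question about whether a search for Whitman quotes should find both kinds of statement can be made concrete with a small Python sketch. Everything in it is hypothetical: the <q> element, the "who", "source" and "reporter" attributes, and the placeholder sentences are assumptions chosen for illustration, not the tags the archive actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names, attribute names and the placeholder
# sentences are illustrative only, not the archive's real encoding.
SAMPLE = """<entry>
  <q who="Whitman" source="manuscript">First placeholder statement.</q>
  <q who="Whitman" source="reported" reporter="Traubel">Second placeholder statement.</q>
  <q who="Traubel" source="reported" reporter="Traubel">Third placeholder statement.</q>
</entry>"""


def find_quotes(xml_text, who, source=None):
    """Return (source, text) pairs for quotes by `who`.

    With source=None every quote attributed to the speaker is returned;
    passing a source narrows the search to that kind of evidence.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for q in root.iter("q"):
        if q.get("who") != who:
            continue
        if source is not None and q.get("source") != source:
            continue
        hits.append((q.get("source"), "".join(q.itertext()).strip()))
    return hits


if __name__ == "__main__":
    # Broad search: handwritten and reported statements together
    print(find_quotes(SAMPLE, "Whitman"))
    # Narrow search: only statements in Whitman's own hand
    print(find_quotes(SAMPLE, "Whitman", source="manuscript"))
```

Under these assumptions, recording the speaker and the kind of evidence as separate attributes lets a search return both kinds of quote or only one, so the editorial choice about labeling does not have to settle the question for every reader.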
Skills from the course helped Spoon with her thesis, and the emphasis on the consequences of editorial decisions helped prepare her for a career as a food writer, she said.
First-year English graduate student Patrick Jagoda from Chicago, Ill., said the Traubel project convinced him that "learning how digital text operates should be part of literary education in the twenty-first century."
The Whitman archive is affiliated with the Institute for Advanced Technology in the Humanities at the University of Virginia, with support from Duke, the University of North Carolina - Chapel Hill, the University of Nebraska - Lincoln, the University of Iowa and other institutions, Cohen said.
Cohen began working on the archive in 1995 while he was a graduate student in American studies at the College of William & Mary, he said. Fellow graduate student Charles Green had the idea for the archive in 1994, and Whitman scholars Kenneth Price, now at the University of Nebraska - Lincoln, and Ed Folsom of the University of Iowa embraced the idea. Price and Folsom remain the archive's co-directors.