Tracking Atrocities Using Big Data

Data for Ukraine Project Shows Incidents As They Happen

#DataForUkraine plots war incidents on charts and maps.

In a war zone, response times matter. Minutes can save lives and hours can boost relief efforts.

A Duke social scientist and his team have helped create a new tool for tracking major incidents in Ukraine hours before media can report them. The goal is to get the information into the hands of the organizations working on behalf of the Ukrainian people so help can be delivered faster.

“If you have a five-hour lead on a bombing in a city that is near a border and likely to lead to a surge in refugees, you can get people ready for that wave sooner rather than later,” said Erik Wibbels, a professor of political science at the Trinity College of Arts & Sciences at Duke.

The #DataForUkraine project uses hourly data from Twitter to report on incidents in Ukraine. Relying on several hundred accounts to identify Twitter communities of interest, it classifies millions of individual tweets into four event categories: civilian resistance, human rights abuses, internally displaced people, and humanitarian support and needs.

The information is presented on graphs showing the incidence of each event type over time and plotted on a map to make the locations easier to chart.

“If you are working for the International Red Cross, and you see a spike on a map that is five hours before the news, this can tell you where the humanitarian supplies need to go,” Wibbels said.
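The idea of a spike in hourly incident counts can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function names, the (timestamp, category, place) event format and the "three times the recent average" threshold are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events):
    """Count events per (hour, category, place).

    `events` is an iterable of (datetime, category, place) tuples,
    a hypothetical format chosen for this sketch.
    """
    counts = Counter()
    for ts, category, place in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, category, place)] += 1
    return counts


def is_spike(series, factor=3.0):
    """Return True if the latest hourly count stands out from earlier hours.

    `series` lists hourly counts, oldest first. The threshold (latest count
    above `factor` times the earlier average) is an assumed rule of thumb.
    """
    if len(series) < 2:
        return False  # nothing to compare against
    *history, latest = series
    baseline = sum(history) / len(history)
    # Floor the baseline at 1 so a single tweet after quiet hours is not flagged.
    return latest > factor * max(baseline, 1.0)
```

Grouping by place as well as by hour is what would let a count be drawn both on a timeline and on a map.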

The project came about when Wibbels was asked if the Machine Learning for Peace project at Duke, which tracks changes in political regimes around the world, could be applied to events in Ukraine.

It couldn’t, but Wibbels saw how the tools could be adapted for that use. Wibbels contacted Ernesto Calvo at the University of Maryland, who studies social media and knew how to harness the tens of millions of daily tweets on Ukraine. Graeme Robertson, a Ukraine expert at the University of North Carolina at Chapel Hill, and Olga Onuch at the University of Manchester provided the criteria needed to distill those tweets into usable data.

“The data are drawn from a collection of more than 400 Twitter accounts and their followers, including politicians, civil society activists, journalists and media at the national and local levels,” Robertson said.

Tweets in Ukrainian, Russian and English are searched for more than 600 keywords, reflecting the real-world language used in all three languages to describe the incidents the team is looking for.
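A keyword search of this kind might look roughly like the Python sketch below. The four category names come from the project, but the handful of keywords shown are invented for illustration; the team's actual list of more than 600 terms is not reproduced here, and plain substring matching is a stand-in for whatever classifier the team uses.

```python
# Hypothetical keywords in English, Ukrainian and Russian (illustration only).
KEYWORDS = {
    "civilian_resistance": ["protest", "протест"],
    "human_rights_abuses": ["shelling", "обстріл", "обстрел"],
    "internally_displaced_people": ["evacuation", "евакуація", "эвакуация"],
    "humanitarian_support_and_needs": [
        "humanitarian aid",
        "гуманітарна допомога",
        "гуманитарная помощь",
    ],
}


def classify(text):
    """Return the sorted list of categories whose keywords appear in the text."""
    lowered = text.casefold()  # case-insensitive match, works for Cyrillic too
    return sorted(
        category
        for category, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    )
```

A single tweet can land in more than one category, which is why the function returns a list rather than one label.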

A random sample of the automatically collected tweets is verified manually to monitor and improve the accuracy of event classification. The data and graphics are updated every three hours.
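One simple way to run such a spot check is to draw a random sample, have people label it, and measure how often the automatic label agrees. The Python sketch below assumes tweets are identified by an ID and carry a single label each; the function names and that data layout are hypothetical, not the team's.

```python
import random


def draw_sample(tweet_ids, k, seed=0):
    """Pick up to k tweet IDs at random for manual review (seeded for repeatability)."""
    ids = list(tweet_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


def agreement_rate(model_labels, human_labels):
    """Share of manually checked tweets where the automatic label matches the human one.

    Both arguments map tweet ID -> label. Returns None if nothing was checked.
    """
    checked = [tid for tid in human_labels if tid in model_labels]
    if not checked:
        return None
    hits = sum(model_labels[tid] == human_labels[tid] for tid in checked)
    return hits / len(checked)
```

Tracking that agreement rate from one update to the next would show whether classification is getting better or worse.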

“In our initial assessment, we found our big data approach helped identify major events of human rights abuses roughly 3 to 7 hours before media were able to report on them,” said Zung-Ru Lin, one of the Duke researchers who poured hundreds of hours into the project after the invasion began.

When Russian occupying forces shot at peaceful protesters in Kherson on March 21, the model identified the incident as the very first reports were posted on social media.

“This was a fairly immediate capturing of a major incident of human rights abuse in real-time, as it unfolded,” Wibbels said.

To help Ukraine, the key is getting the data tools into the hands of organizations that can use them to direct their efforts.

The researchers have shared the project with advisors to the Ukrainian president and the Ukrainian Ministry of Digital Transformation, the United States Agency for International Development (USAID), members of parliament and the foreign aid office in the U.K., human rights organizations in Geneva, and others.

Beyond its uses in directing aid, the data could also be valuable documentation in forensic analysis after the war, as the International Criminal Court and others examine the sequence of events in Ukraine for evidence of war crimes, Wibbels said.

The value of collecting this real-time data, both now and for these future uses, is why Wibbels’ team of data scientists poured hundreds of hours into getting the interface up and running.

“They’ve built this,” he said, “in real time.”