The Future of Innovation: Linked Open Data Helps Bring Order to the Chaos of COVID-19
- Linked Open Data provides a universal standard for data sharing between multiple sources, allowing researchers of SARS-CoV-2 and policymakers to quickly analyze data and get a complete picture of the problem
- Governments and organizations get real-time data about the spread of COVID-19, identifying disproportionately affected groups and providing resources to help save lives
The COVID-19 pandemic has had a profound impact on our health, infrastructure, and economy, requiring us to quickly adapt to changing world conditions. It highlighted our interconnectedness and showed that no nation can tackle the issue on its own, regardless of wealth or size.
Extended lockdowns have influenced how we interact with each other, leading to an increase in remote work, online education, and telemedicine. Other technological innovations, such as Linked Open Data, can help us better prepare for similar global emergencies in the future.
Data is the foundation of our technology. Access to information and knowledge is a public good, and delivering that information quickly brings great advantages.
As part of the fight against COVID-19, we must improve information sharing within our organizations to get a better understanding of SARS-CoV-2, measure the effectiveness of our response, and make informed evidence-based decisions.
In this article, we examine the benefits of Linked Data for responding to COVID-19, and the necessity of open data for sharing information and knowledge between scientists, research organizations, and public health agencies.
WHAT IS LINKED DATA?
Currently, most of the data on the World Wide Web, including COVID-19 data, is not readily accessible.
It is stored and siloed in web pages and tables, and it takes many hours of manual work to make sense of and analyze. Because the data is not connected to its context, machines struggle to find information, understand it, and present it to us in a meaningful way.
That’s where Linked Data comes in. It is a way of publishing structured data on the web so that it can be combined with other Linked Data, queried readily, and consumed easily. This provides a universal language for computers and focuses on the relationships between datasets rather than solely on the data itself.
The most common way of storing and representing Linked Data is with a graph database. A graph database stores objects and the relationships between them.
When we have these graphs that are built with contextualized data, interoperability between network sources ensures data integrity and accuracy by allowing computers to verify links between related information. It also allows people to share, integrate, and analyze information across different sources in an efficient and relevant way.
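The graph model described above can be sketched in a few lines of Python. This is only an illustration, not how any particular graph database works internally: the `ex:` identifiers, the `GRAPH` contents, and the `match` helper are all hypothetical. Each fact is stored as a (subject, predicate, object) triple, and a query is a pattern in which `None` acts as a wildcard.

```python
from typing import Iterable, Iterator, Optional

# A fact is a (subject, predicate, object) triple.
Triple = tuple[str, str, str]

# Hypothetical facts, invented purely for illustration.
GRAPH: set[Triple] = {
    ("ex:SARS-CoV-2", "ex:causes", "ex:COVID-19"),
    ("ex:paper-001", "ex:mentions", "ex:SARS-CoV-2"),
    ("ex:paper-002", "ex:mentions", "ex:COVID-19"),
    ("ex:paper-003", "ex:mentions", "ex:influenza"),
}


def match(
    graph: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None matches anything."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# One hop: papers that mention the virus directly.
direct = sorted(t[0] for t in match(GRAPH, p="ex:mentions", o="ex:SARS-CoV-2"))

# Two hops: papers that mention any disease the virus causes.
diseases = {t[2] for t in match(GRAPH, s="ex:SARS-CoV-2", p="ex:causes")}
via_disease = sorted(t[0] for t in match(GRAPH, p="ex:mentions") if t[2] in diseases)

print(direct)
print(via_disease)
```

The second query is the interesting one: it follows a relationship (virus causes disease) to reach papers that never name the virus itself, which is the kind of connection that is hard to make when data sits in unrelated tables.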
As the saying goes, “Speed is of the essence.” Linked Data is much faster than pulling all the relevant data by hand from multiple sources, cleaning it, and making it understandable. As a result, it opens up more opportunities for functionality.
THE SEMANTIC WEB
Tim Berners-Lee founded the World Wide Web Consortium in 1994 to help realize his vision of a Semantic Web, where all types of data could be linked together and accessed from anywhere on the Internet.
Berners-Lee wanted scientists to be able to retrieve information using search engines like Google, but also seamlessly connect it with data from other sources such as pharmaceutical databases, weather reports, news items, and more.
The development of Linked Data started in the mid-2000s, with the goal of making data available in a format that computers can easily understand so that people can explore it and make new discoveries at speeds never before possible.
While Linked Data and the Semantic Web have been around for a while, it’s only recently that their potential has become widely apparent. The coronavirus pandemic brought this into focus. Next, we discuss how Linked Data helped scientists researching COVID-19.
PROMOTING SCIENTIFIC COLLABORATION
In just seven months, there have been more than 50,000 scientific papers published about COVID-19 and SARS-CoV-2, making it one of the fastest-growing areas of research.
This was great news for the scientific community. However, it would be impossible for any researcher to examine all of the papers, let alone extract the relevant information. Many of the underlying datasets aren’t interoperable, meaning they can’t be easily combined or used together.
One of the greatest advantages of Linked Data is its interoperability, the ability to link and explore independent sources. Connecting scientific and medical databases together enables researchers to share data faster, cross-reference results, and build upon data sets from multiple sources.
Collaborative efforts help us build a complete picture of the disease (such as its effects on different age groups or those with pre-existing medical conditions) to form baselines for treatment, establish patient management protocols, and expedite vaccine clinical trials.
Covid-on-the-Web, a Linked Open Data project led by the French Institute of Medical Research and French National Cancer Institute, simplifies the exploration of COVID-19 scientific literature. The project created an enriched Linked Data version of the COVID-19 Open Research Dataset (CORD-19), which includes over 500,000 scholarly articles, along with a Linked Data Visualizer, which assists with querying and visual analysis of the dataset.
By making the Linked Dataset and code openly available, contributors are able to advance the current state of knowledge about COVID-19.
NATIONAL RESPONSE COORDINATION
As the number of cases rises, so does the need for a coordinated response. Governments and health organizations need to quickly share and correlate data, as multiple datasets are needed to get a complete picture of the problem at hand.
Public health departments maintain their own databases about who has been infected and how severe their condition is, while hospitals collect data about patients who come in for testing and are admitted for treatment. With Linked Data technology, we can ensure that both datasets match up and determine who is at risk and where they are located.
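To make the idea of two datasets "matching up" concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the records are invented, and the shared `person_id` key and the `link_records` helper are hypothetical. Real linkage systems usually cannot rely on a clean shared identifier and need additional matching and privacy safeguards.

```python
from collections import Counter

# Invented records from two independently maintained sources.
public_health = [
    {"person_id": "p1", "region": "North", "severity": "severe"},
    {"person_id": "p2", "region": "South", "severity": "mild"},
    {"person_id": "p3", "region": "North", "severity": "moderate"},
]
hospital = [
    {"person_id": "p1", "admitted": True},
    {"person_id": "p3", "admitted": False},
    {"person_id": "p4", "admitted": True},
]


def link_records(left, right, key):
    """Join records that share `key`; also report left-side keys with no match."""
    right_by_key = {record[key]: record for record in right}
    linked, unmatched = [], []
    for record in left:
        other = right_by_key.get(record[key])
        if other is None:
            unmatched.append(record[key])
        else:
            # Combine the fields from both sources into one linked record.
            linked.append({**record, **other})
    return linked, unmatched


linked, unmatched = link_records(public_health, hospital, "person_id")

# Where are the admitted patients located?
admitted_by_region = Counter(r["region"] for r in linked if r["admitted"])

print(linked)
print(unmatched)
print(admitted_by_region)
```

The `unmatched` list is as useful as the linked records: it shows which cases are known to the health department but have no hospital record, which is the kind of discrepancy a coordinated response needs to see.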
This data helps governments and hospitals allocate finite resources and prioritize access to medical equipment during shortages. Similarly, we can pinpoint vulnerable and disproportionately affected populations so that they can receive life-saving medical treatment as soon as possible.
DEVELOPING EFFICIENT SYSTEMS
In response to the socioeconomic damage caused by the MERS outbreak in 2015, South Korea underlined the need for a systematization of epidemiological investigation to prevent the spread of infectious diseases.
The aim is to quickly identify whether an infectious disease has occurred and to trace the source of infection and the transmission process so that local governments can lead the response.
After the large-scale spread of COVID-19, the Korea Centers for Disease Control and Prevention (KCDC) rapidly computerized the process. Previous investigations had relied on time-consuming methods such as sending documents, contacting people by telephone, and recording by hand.
The KCDC built a Smart Management System to track the spread of COVID-19, which included Linked Datasets of personal information, geo-location, and travel history data. Research and development on this data resulted in a proposed diagnosis and tracking model with more than 80% accuracy.
Once in place, the system provided real-time data on the movements of confirmed cases. This data was displayed on a map so that quarantine authorities could respond quickly to outbreak areas. The new system shortened the process from 25 hours to 10 minutes.
HEALTHCARE AND UNDERSERVED POPULATIONS
Doctors and nurses in Australia treated the country’s first COVID-19 patients while data scientists worked behind the scenes to assist frontline workers. The Population Health Research Network (PHRN), supported by the National Collaborative Research Infrastructure Strategy, is an Australian nationwide collaboration with a mission to lead and enable the linking of data for research using privacy-preserving methods across states.
In early 2020, PHRN first used Linked Data to support COVID-19 surveillance, linking Border Force data on returned travelers to government health records to track the spread of the virus across the country.
Datasets that were previously linked every three months were updated daily, including the number of COVID-19 positive notifications, hospital admissions, and deaths.
Data scientists then linked this data to the state’s patient flow portal to identify where cases were located in the health system. This made it possible to better track people who tested positive, understand longer-term outcomes for patients, and identify stressors to help hospitals anticipate patients who might require ICU care.
They also collected data linked to vulnerable groups so that agencies like the Red Cross could reach them with wellness checks and pop-up testing clinics. Recent additions include data from the Australian Immunisation Register, providing information on outcomes of vaccinated and unvaccinated people.
ACCURATE REAL-TIME DATA
In the past, government agencies relied heavily on manual processes that were time-consuming and prone to error.
With Linked Data, governments can connect data from multiple sources without changing or duplicating it, which means no more inefficient manual processes or changes in how data is captured and stored.
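The point about connecting sources without changing or duplicating them can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the two sources, their `ex:` identifiers, and the `federated` and `describe` helpers are hypothetical, and real systems would query remote endpoints rather than in-memory sets.

```python
from itertools import chain
from typing import Iterable, Iterator

Triple = tuple[str, str, str]

# Two independently maintained sources (invented data). frozenset makes it
# explicit that the query below cannot modify them.
health_department: frozenset[Triple] = frozenset({
    ("ex:case-17", "ex:reportedIn", "ex:region-north"),
    ("ex:case-18", "ex:reportedIn", "ex:region-south"),
})
hospital_system: frozenset[Triple] = frozenset({
    ("ex:case-17", "ex:admittedTo", "ex:ward-icu"),
})


def federated(*sources: Iterable[Triple]) -> Iterator[Triple]:
    """Lazily walk each source in turn; no merged copy is ever built."""
    return chain.from_iterable(sources)


def describe(subject: str, *sources: Iterable[Triple]) -> dict[str, str]:
    """Collect everything the sources say about one subject."""
    return {p: o for s, p, o in federated(*sources) if s == subject}


profile = describe("ex:case-17", health_department, hospital_system)
print(profile)
```

Each source keeps its own data and its own way of capturing it; only the shared identifier (`ex:case-17` here) has to agree for the combined view to work.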
Researchers at the University of Edinburgh, in collaboration with Public Health Scotland, established the Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II). EAVE II successfully tracked 5.4 million people, or 98% of Scotland’s population, in real time using national Linked Data and is a key resource in Scotland’s COVID-19 response.
Utilizing their Linked Database, the researchers were the first to publish effectiveness figures for a single dose of the AstraZeneca and Pfizer vaccines, gaining global attention.
Collaboration thrives in open environments where restrictions and boundaries are minimized, allowing for greater benefit. Linking open data provides secure transparency while enabling integration, so the data can be reused across different organizations and community boundaries.
In the US, the Centers for Disease Control and Prevention (CDC)’s National Center for Health Statistics (NCHS), the primary agency in charge of the nation’s health statistics, is expanding the utility of current and future data links.
Most Linked Datasets are restricted and accessible only through the NCHS and Federal Statistical Research Data Centers, which creates barriers and limits to their use. The NCHS project seeks to create publicly available synthetic Linked Data files, expanding data accessibility while protecting privacy.
To simplify access to linkable multi-jurisdictional data, the Canadian government’s Health Data Research Network Canada and Canadian Partnership for Tomorrow’s Health have partnered to enable linkage between their datasets.
The Networked Data Lab, a collaborative network of analysts across the UK, provides national and local leaders with insights to advise and shape health and social care systems in response to COVID-19. They hope to showcase the value of Linked Data by making the code for their analytical tools openly available on GitHub.
LINKED OPEN DATA AND BEYOND
In the wake of COVID-19, we have learned that we greatly underestimated how vulnerable our societies are to infectious diseases. As a result, we have a unique opportunity to rebuild better.
Collaboration across scientific disciplines, governments, and organizations is essential for a rapid and efficient response, and Linked Open Data provides a way to achieve it.
At a time when data is rapidly changing, information derived from a variety of sources can help manage outbreaks and coordinate responses to better serve those affected.
Accurate real-time data allows rapid decisions to be made for effective policies, such as testing and tracing protocols or vaccine deployment prioritization.
Linked Open Data has already proven its usefulness in fighting COVID-19. We can expect it to become even more valuable in the future as technology continues to evolve and expand.
We at Semantu strive to make Web3 development accessible for all. We believe with the right foundations, you can build out functionality that will pay off over time. Our platform lets you build your application in a click-and-go environment, so you can concentrate on what matters most — taking care of your users.
Our open-source initiative, LINCD (Linked Interoperable Code and Data), is aimed at making structured data easier to use and publish on the web. The goal is to support and enable developers to create digital assets that are interoperable on both a coding and a data level.
Visit the LINCD website to learn more.