Riddle, Matthew C.; Blonde, Lawrence; Gerstein, Hertzel C.; Gregg, Edward W.; Holman, Rury R.; Lachin, John M.; Nichols, Gregory A.; Turchin, Alexander; Cefalu, William T.
Abstract
Technological progress in the past half century has greatly increased our ability to collect, store, and transmit vast quantities of information, giving rise to the term big data. This term refers to very large data sets that can be analyzed to identify patterns, trends, and associations. In medicine, including diabetes care and research, big data come from three main sources: electronic medical records (EMRs), surveys and registries, and randomized controlled trials (RCTs). These systems have evolved in different ways, each with strengths and limitations. EMRs continuously accumulate information about patients and make it readily accessible but are limited by missing data or data that are not quality assured. Because EMRs vary in structure and management, comparisons of data between health systems may be difficult. Registries and surveys provide data that are consistently collected and representative of broad populations but are limited in scope and may be updated only intermittently. RCT databases excel in the specificity, completeness, and accuracy of their data, but rarely include a fully representative sample of the general population. Also, they are costly to build and seldom maintained after a trial's end. To consider these issues, and the challenges and opportunities they present, the editors of Diabetes Care convened a group of experts in management of diabetes-related data on 21 June 2018, in conjunction with the American Diabetes Association's 78th Scientific Sessions in Orlando, FL. This article summarizes the discussion and conclusions of that forum, offering a vision of benefits that might be realized from prospectively designed and unified data-management systems to support the collective needs of clinical, surveillance, and research activities related to diabetes.