Introduction to Data Science
Despite the recent hype and discussion surrounding data science, there is much confusion about what it actually is. Some embrace the term and the field: Davenport and Patil (2012), for example, labelled the role of data scientist “the sexiest job of the 21st century”. Others are more critical of the term: Bloor (2013), for example, claimed that there is “nothing at all new” about data science and that it is “utterly misleading”. Regardless of this debate, whenever data science is referred to in this report, it is used in a sense close to the definition given by Stanton (2012): “data science refers to an emerging area of work concerned with the collection, preparation, analysis, visualisation, management, and preservation of large collections of information.” (p. ii). Broadly, the literature treats data science as a systematic process (hence the word ‘science’) that combines statistics and analytics to gain knowledge about the world (Dhar, 2013). Whatever the terminology, there is undoubtedly considerable excitement about data science and the opportunities it presents.
Human Health
A key appeal of data science is that it supports more effective decision-making, because it produces empirical evidence that is generally more objective and reliable than subjective human opinion. A variety of industries that have embraced data analytics, including banking, retail and sport, are widely reported to have benefited, particularly through more efficient services. This report focuses on the benefits and opportunities that data science presents in the field of human health: how data can improve the quality and delivery of healthcare services; how it can help in developing and testing new medicines and treatments; and how it can improve human health and nutrition more generally.
According to the World Health Organisation (2016, p. v), the biggest global challenges to human health include reducing maternal and child mortality; reducing diseases such as HIV/AIDS, tuberculosis, malaria and hepatitis; and improving nutrition. Western countries face different challenges. In Britain, for example, the major priorities include an ageing population, rising medical and technology costs, and unhealthy lifestyle choices such as poor diet and substance abuse (Scantlebury & Moody, 2015; Snowden, 2015; My Health London, 2016).
Opportunities
The application of data science to human health is genuinely exciting and could prove revolutionary. Of all the fields in which data science could be used, healthcare is arguably the most important, because effective collection and analysis of data can save lives. The consensus is that data science has the potential to improve the quality, speed and efficiency of healthcare (Institute for Health Technology Transformation, 2013, p. 7; Harvard Business Review, 2014). According to a McKinsey & Company (2013) report, “researchers can mine data to see what treatments are most effective for particular conditions, identify patterns related to drug side effects or hospital readmissions, and gain other important information that can help patients and reduce costs.” (p. 1).
Perhaps one of the most significant benefits for physicians and patients is that big data allows for more personalised healthcare, tailored to individual needs. In healthcare, what works for one patient may not work well for another, so a more individualised approach is preferable. A patient-centred model is one in which “patients actively participate in their own care and receive services focused on individual needs and preferences, informed by advice and oversight from their healthcare providers.” (Chawla & Davis, 2013, p. 661). By taking into account genetic, environmental, historical and lifestyle factors, physicians can use data to assess a patient’s health efficiently and to predict the diseases and illnesses that the patient is at risk of developing. This view is shared by Benker et al. (2016), who state that:
“A dataset with millions of treatment outcomes joined with tens of thousands of genomic and tumour sequences would be a game-changer. It would allow researchers to study relationships between specific genetic variations and responsiveness to different treatments, moving us closer to truly individualized medicine.” (p. 8)
In addition, large research databases, such as those holding human genome sequence data, are likely to provide useful information to scientists and medical professionals, helping them to diagnose, understand and treat conditions such as cancer more effectively and benefiting whole populations (O’Driscoll, Daugelaite, & Sleator, 2013, p. 777).
Members of the public can also track their diet and fitness, and make better-informed decisions about their health, by using smartphone applications such as MyFitnessPal and other new technologies such as Fitbit (Ipjian & Johnston, 2016). The data these tools collect can be visualised, giving users empirical information about their health in a cheap, simple and convenient way. It is often said that ‘prevention is better than cure’; these and similar digital technologies could make prevention far more achievable. Individuals can access information about their health without needing to visit a doctor or other professional, potentially saving large amounts of money in healthcare.
McKinsey & Company (2013) summarises the five main potential benefits of big data as improvements in: ‘living’, ensuring that patients take an active role in treatment and prevention; ‘care’, ensuring that patients receive timely and appropriate treatment; ‘provider’, allowing patients to be treated by high-performing professionals; ‘value’, ensuring cost-effective treatment; and ‘innovation’, advancing new technologies, therapies and approaches to medicine (pp. 6-7).
Challenges
Although data science presents many potential opportunities, the data revolution in healthcare is still in its early days and lags behind other industries. Several challenges need to be overcome before its benefits can be fully realised. One of the main challenges is what appears to be a fairly strong resistance to moving away from traditional ways of operating. Many physicians are accustomed to making treatment decisions on the basis of their own professional judgement rather than on data (McKinsey & Company, 2013, p. 2). In addition, many hospitals and treatment centres remain culturally dependent on paper records and are reluctant to embrace digitalisation and big data fully (Institute for Health Technology Transformation, 2013, p. 13). To take full advantage of the opportunities presented by data science, patients, physicians and other stakeholders need to shift their mentality away from traditional practices towards new, data-driven, analytical practices in healthcare (McKinsey & Company, 2013, p. 10).
A second major challenge is whether the industry is ready to deal with the sheer volume and complexity of the data. Many of the IT systems used in healthcare are outdated, under-funded and in need of modernisation. Healthcare data is often unstructured, fragmented and stored in incompatible formats, which makes data mining and analysis very time-consuming and difficult (Institute for Health Technology Transformation, 2013, p. 13).
Furthermore, there is a shortage of people with the knowledge, technical skills and expertise to handle the data effectively. As a result, there is growing demand for more data scientists, with calls for more educational institutions to offer data-related courses and for private companies to provide on-the-job training (McKinsey & Company, 2013, p. 13). Being a data scientist is not an easy job, however. An effective data scientist must be proficient in many areas, including cleaning, analysing and visualising data, and must ensure that the data are of high quality, usable and trustworthy. Data scientists also need to be competent in statistics and computer programming, to respect the scientific method, and to be able to spot trends and communicate their findings effectively (Schutt & O’Neil, 2013, pp. 10-12).
One of the largest concerns, particularly for the general public, is the privacy of personal information. Healthcare data is inherently sensitive, and it is reasonable to assume that most people would prefer their information to remain private. People are entitled to ask what healthcare providers are doing with the data they collect, so a degree of transparency is required to ensure that the data are not misused. As McKinsey & Company (2013) notes:
“In other data-driven revolutions, some players have taken advantage of data transparency by pursuing objectives that create value only for themselves. In healthcare, some stakeholders may try to take advantage of big data more quickly and aggressively than their competitors, without regard to clinically proven outcomes.” (p. 9)
On the other hand, effective use of big data requires a certain degree of openness and data sharing; overly restrictive privacy controls would severely limit the potential benefits of data science. A balance must therefore be struck that sensibly protects patient privacy without limiting the potential of data-driven healthcare.
In summary, the main challenges facing the application of data science to healthcare include letting go of traditional methods, improving technology, developing skilled data scientists and dealing with ethical concerns regarding privacy.
References
Benker, K., Harris, T., Malone, K., Mancini, A., Topczewska, O., & Wagner, D. (2016). Big Data Analytics and the Cancer Moonshot. Civis Analytics.
Bloor, R. (2013). A Data Science Rant. Inside Analysis. Retrieved from http://insideanalysis.com/2013/08/a-data-science-rant/
Chawla, N. & Davis, D. (2013). Bringing Big Data to Personalized Healthcare: A Patient-Centered Framework. Journal of General Internal Medicine, 28(S3), 660-665. http://dx.doi.org/10.1007/s11606-013-2455-8
Davenport, T. & Patil, D. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review. Retrieved from https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Dhar, V. (2013). Data Science and Prediction. Communications of the ACM, 56(12), 64-73.
Harvard Business Review. (2014). How Big Data Impacts Healthcare. Retrieved from https://hbr.org/resources/pdfs/comm/sap/18826_HBR_SAP_Healthcare_Aug_2014.pdf
Institute for Health Technology Transformation. (2013). Transforming Health Care Through Big Data: Strategies for leveraging big data in the health care industry. Retrieved from http://c4fd63cb482ce6861463-bc6183f1c18e748a49b87a25911a0555.r93.cf2.rackcdn.com/iHT2_BigData_2013.pdf
Ipjian, M. & Johnston, C. (2016). Smartphone technology facilitates dietary change in healthy adults. Nutrition, 1-5. http://dx.doi.org/10.1016/j.nut.2016.08.003
McKinsey & Company. (2013). The ‘big data’ revolution in healthcare: Accelerating value and innovation. Center for US Health System Reform.
My Health London. (2016). Today’s NHS – our current challenges. Retrieved 4th November, 2016 from https://www.myhealth.london.nhs.uk/help/nhs-today
O’Driscoll, A., Daugelaite, J., & Sleator, R. (2013). ‘Big data’, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774-781. http://dx.doi.org/10.1016/j.jbi.2013.07.001
Scantlebury, R. & Moody, A. (2015). Health Survey for England, 2014: Chapter 9, Adult Obesity and Overweight. The Health and Social Care Information Centre. Retrieved from http://content.digital.nhs.uk/catalogue/PUB19295/HSE2014-ch9-adult-obe.pdf
Schutt, R. & O’Neil, C. (2013). Doing Data Science (1st ed.). O’Reilly Media.
Snowden, C. (2015). Death and Taxes. Institute of Economic Affairs. Retrieved from https://iea.org.uk/wp-content/uploads/2016/07/Death%20and%20Taxes%20December%202015.pdf
Stanton, J. (2012). An Introduction to Data Science. Syracuse University. Retrieved from https://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf
World Health Organisation. (2016). World Health Statistics 2016: Monitoring health for the SDGs. Retrieved from http://www.who.int/gho/publications/world_health_statistics/2016/en/