Data Science – Data Danfit

Introduction to Data Science

Despite the recent hype and discussion regarding data science, there is much confusion about what data science actually is. Some embrace the term and the field: Davenport and Patil (2012), for example, labelled it “the sexiest job of the 21st century”. Others are more critical: Bloor (2013) claimed that there is “nothing at all new” about data science and that it is “utterly misleading”. Regardless of this debate, whenever data science is referred to in this report, it means something close to the definition given by Stanton (2012): “data science refers to an emerging area of work concerned with the collection, preparation, analysis, visualisation, management, and preservation of large collections of information” (p.ii). The literature broadly agrees that data science is a systematic process, hence the word ‘science’, that combines statistics and analytics to gain knowledge about the world (Dhar, 2013). Whatever the merits of the term, there is considerable excitement about the opportunities data science presents.

Human Health

The beauty of data science is that it allows for more effective decision-making, as it produces empirical evidence that is more objective and reliable than subjective human opinion. Strong evidence suggests that a variety of industries which have embraced data analytics, including banking, retail and sports, have seen benefits and improvements in the efficiency of their services. This paper focuses on the benefits and opportunities presented by data science in the field of human health. This includes observing how data can improve the quality and delivery of healthcare services; how it can help in developing and testing new medicines and treatments; and how it can improve human health and nutrition in general.

According to the World Health Organisation (2016, p. V), the biggest global challenges to human health include reducing maternal and child mortality, reducing diseases such as HIV/AIDS, tuberculosis, malaria and hepatitis, and improving nutrition. For Western countries, the challenges are different. In Britain, for example, the major challenges include an ageing population, rising medical and technology costs, and poor lifestyle choices such as unhealthy diet and substance abuse (Scantlebury & Moody, 2015; Snowden, 2015; My Health London, 2016).

Opportunities

The application of data science to the field of human health is genuinely exciting and could be truly revolutionary. Of all the fields in which data science could be applied, healthcare is arguably the most important, because effective collection and analysis of data can save lives. The consensus is that data science has the potential to improve the quality, speed and efficiency of healthcare (Institute for Health Technology Transformation, 2013, p.7; Harvard Business Review, 2014). According to a McKinsey & Company (2013) report: “researchers can mine data to see what treatments are most effective for particular conditions, identify patterns related to drug side effects or hospital readmissions, and gain other important information that can help patients and reduce costs.” (p.1).

Perhaps one of the most significant benefits for physicians and patients is that big data allows for more personalised healthcare tailored to specific individual needs. In healthcare, what works for one patient may not work well for another, so a more individualised approach is preferable. A patient-centred model is one in which “patients actively participate in their own care and receive services focused on individual needs and preferences, informed by advice and oversight from their healthcare providers.” (Chawla & Davis, 2013, p. 661). Taking into account genetic, environmental, historical and lifestyle factors, physicians can use data to assess the health of a patient efficiently and to predict the diseases and illnesses the patient is at risk of developing. This view is shared by Benker et al. (2016), who state that:

“A dataset with millions of treatment outcomes joined with tens of thousands of genomic and tumour sequences would be a game-changer. It would allow researchers to study relationships between specific genetic variations and responsiveness to different treatments, moving us closer to truly individualized medicine.” (p.8)

In addition, large research databases, such as those holding human genome sequencing data, are likely to provide useful and meaningful information to scientists and medical professionals, helping them to better diagnose, understand and treat conditions such as cancer, to the benefit of whole populations (O’Driscoll, Daugelaite, & Sleator, 2013, p.777).

Members of the public can also keep track of their diet and fitness, and make better informed decisions about their health, by using smartphone applications such as MyFitnessPal and wearable technologies such as Fitbit (Ipjian & Johnston, 2016). The information these tools collect can be visualised, providing empirical data about health in a cheap, simple and convenient manner. It is often stated that ‘prevention is better than cure’; these and similar digital technologies can make prevention a practical possibility. Individuals can access information about their health without needing to visit a doctor or other professional, potentially saving large amounts of money in healthcare.

McKinsey & Company (2013) succinctly summarises the five main potential benefits of big data as: improving ‘living’ – ensuring patients take an active role in treatment and prevention; ‘care’ – ensuring patients receive timely and appropriate treatment; ‘provider’ – allowing patients to be treated by high-performing professionals; ‘value’ – ensuring cost-effective treatment; and ‘innovation’ – advancing new technologies, therapies and approaches to medicine (pp. 6-7).

Challenges

Although data science presents a number of opportunities, the data revolution in healthcare is still in its early days and is lagging behind other industries. A number of challenges need to be overcome before the benefits can be attained. One of the main challenges facing the industry is a fairly strong resistance to change from traditional ways of operating. Many physicians are accustomed to relying on their own subjective professional judgement when making treatment decisions rather than making decisions driven by data (McKinsey & Company, 2013, p. 2). In addition, many hospitals and treatment centres are culturally dependent upon paper and are reluctant to embrace the transition to digitalisation and big data fully (Institute for Health Technology Transformation, 2013, p. 13). To take full advantage of the opportunities presented by data science, patients, physicians and stakeholders need to shift their mentality away from traditional practices and towards new, data-driven, analytical practices in healthcare (McKinsey & Company, 2013, p.10).

Another major challenge within the industry is its readiness to deal with the sheer volume and complexity of the data. Many of the IT systems and technologies used in healthcare are outdated, under-funded and in need of modernising. Healthcare data is often unstructured, fragmented and produced in incompatible formats, making data mining and data analysis difficult and time-consuming (Institute for Health Technology Transformation, 2013, p. 13). Furthermore, there is a shortage of individuals with the knowledge, technical skills and expertise to handle the data effectively. As a result, there is a growing demand for data scientists, with calls for more educational institutions to offer data-related courses and for private companies to offer on-the-job training (McKinsey & Company, 2013, p.13). Being a data scientist is no easy job, however. An effective data scientist must be proficient in many areas, including cleaning, analysing and visualising data, and must ensure that data are of high quality, usable and trustworthy. Data scientists are required to be competent in statistics and computer programming, with a respect for the scientific method and an ability to spot trends and communicate their findings effectively (Schutt & O’Neil, 2013, pp.10-12).

One of the largest concerns, particularly for the general public, is the privacy of information. Healthcare data is inherently sensitive, and it is reasonable to assume that most people would prefer their information to remain private. People are entitled to ask what healthcare providers are doing with the data they collect, so a level of transparency is required to ensure data is not being misused. As McKinsey & Company (2013) notes:

“In other data-driven revolutions, some players have taken advantage of data transparency by pursuing objectives that create value only for themselves. In healthcare, some stakeholders may try to take advantage of big data more quickly and aggressively than their competitors, without regard to clinically proven outcomes.” (p.9)

On the other hand, effective use of big data necessitates a certain degree of openness and sharing of data; if privacy protections are overbearing, they will severely limit the potential benefits of data science. Thus, a balance must be struck which sensibly protects patient privacy and at the same time does not limit the potential of data-driven healthcare.

In summary, the main challenges facing the application of data science to healthcare include letting go of traditional methods, improving technology, developing skilled data scientists and dealing with ethical concerns regarding privacy.

References

Benker, K., Harris, T., Malone, K., Mancini, A., Topczewska, O., & Wagner, D. (2016). Big Data Analytics and the Cancer Moonshot. Civis Analytics.

Bloor, R. (2013). A Data Science Rant. Inside Analysis. Retrieved from http://insideanalysis.com/2013/08/a-data-science-rant/

Chawla, N. & Davis, D. (2013). Bringing Big Data to Personalized Healthcare: A Patient-Centered Framework. Journal of General Internal Medicine, 28(S3), 660-665. http://dx.doi.org/10.1007/s11606-013-2455-8

Davenport, T. & Patil, D. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review. Retrieved from https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

Dhar, V. (2013). Data Science and Prediction. Communications of The ACM, 56(12), 64-73.

Harvard Business Review. (2014). How Big Data Impacts Healthcare. Retrieved from https://hbr.org/resources/pdfs/comm/sap/18826_HBR_SAP_Healthcare_Aug_2014.pdf

Institute for Health Technology Transformation. (2013). Transforming Health Care Through Big Data: Strategies for leveraging big data in the health care industry. Retrieved from http://c4fd63cb482ce6861463-bc6183f1c18e748a49b87a25911a0555.r93.cf2.rackcdn.com/iHT2_BigData_2013.pdf

Ipjian, M. & Johnston, C. (2016). Smartphone technology facilitates dietary change in healthy adults. Nutrition, 1-5. http://dx.doi.org/10.1016/j.nut.2016.08.003

McKinsey & Company. (2013). The ‘big data’ revolution in healthcare: Accelerating value and innovation. Center for US Health System Reform.

My Health London. (2016). Today’s NHS – our current challenges. Retrieved 4th November, 2016 from https://www.myhealth.london.nhs.uk/help/nhs-today

O’Driscoll, A., Daugelaite, J., & Sleator, R. (2013). ‘Big data’, Hadoop and cloud computing in genomics. Journal Of Biomedical Informatics, 46(5), 774-781. http://dx.doi.org/10.1016/j.jbi.2013.07.001

Scantlebury, R. & Moody, A. (2015). Health Survey for England, 2014: Chapter 9, Adult Obesity and Overweight. The Health and Social Care Information Centre. Retrieved from http://content.digital.nhs.uk/catalogue/PUB19295/HSE2014-ch9-adult-obe.pdf

Schutt, R. & O’Neil, C. (2013). Doing Data Science (1st ed.). O’Reilly Media.

Snowden, C. (2015). Death and Taxes. Institute of Economic Affairs. Retrieved from https://iea.org.uk/wp-content/uploads/2016/07/Death%20and%20Taxes%20December%202015.pdf

Stanton, J. (2012). An Introduction to Data Science. Syracuse University. Retrieved from https://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

World Health Organisation. (2016). World Health Statistics 2016: Monitoring health for the SDGs. Retrieved from http://www.who.int/gho/publications/world_health_statistics/2016/en/

There is a great deal of hype about data science at the moment. Much of the talk concerns the ways in which data can contribute to society as a whole. Yet there is a large amount of ambiguity about what data science actually is. A generally accepted definition of data science is lacking, although recent contributors to the field have attempted to define the topic. For example, Stanton (2012) defines data science as: “an emerging area of work concerned with the collection, preparation, analysis, visualisation, management, and preservation of large collections of information.” (p.ii). The role of a data scientist is to collect, clean and interpret data using a range of statistical and software engineering tools; the data scientist must then effectively visualise and communicate their findings (O’Neil & Schutt, 2013, p.16). This short paper discusses what data science is and how data, statistics and analytics have benefited professional sports.

The purpose of data is essentially to tell us how the real world is. As Kitchin (2014) states, “data are clearly a base material for how we make sense of the world” (p.12). Many authors within the field explain the process of extracting knowledge from data by depicting a pyramid. At the base of the pyramid is the real world, from which data is abstracted. This data provides information which, when interpreted, allows us to gain knowledge about how the world is. The highest levels of the pyramid represent understanding and wisdom about the truths of the world (Kitchin, 2014, p.9; Stanton, 2012, p.9). In other words, data provides us with empirical evidence about how the world is, which in turn allows us to base our knowledge and decisions on objective facts rather than subjective opinion.

It is quite clear that an evidence-based, objective approach can benefit a vast number of different industries and trades. A UK Government (2013) report outlined the significance of data and the opportunity it presents in contributing to businesses, services and society as a whole in the UK:

“The wider social and economic benefits are manifold: such as better health outcomes for NHS patients as a result of analysis of clinical trials data; consumers receiving personalised marketing of goods as businesses analyse spending habits; and greater transparency and accountability of government to citizens with the release of more open data.” (p.10)

A large amount of evidence suggests that data has improved the efficiency of the decision-making process in a wide range of industries.

Sport, in particular, is one of the clearest instances of how data analytics has had a positive impact. Perhaps the most famous sporting example is described in Moneyball, which details how the financially constrained Oakland Athletics baseball team became a strong team, able to compete with the richest baseball teams in the league. They did this by adopting an analytic approach to recruiting amateur players (Lewis, 2003). The book argues that scouting methods at the time were outdated and flawed. Recruitment of players was down to subjective human opinion, which is often irrational. Scouts and coaches had a propensity to generalise wildly from their own experience and to be excessively swayed by a player’s most recent performance. “The human mind played tricks on itself when it relied exclusively on what it saw, and every trick it played was a financial opportunity for someone who saw through the illusion to the reality.” (p.18). It was only after adopting a scientific approach to recruiting players, such as rigorously analysing percentage statistics, that the Oakland A’s began to spend money effectively, win games and become successful.
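To make the idea of “analysing percentage statistics” concrete, the sketch below computes on-base percentage (OBP), a standard baseball statistic, and compares it with batting average for two players. The text above does not name the statistics the team used, so the choice of OBP is an assumption made for illustration, and the player names and season figures are invented.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats; ignores walks entirely."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / denominator


# Hypothetical season lines, invented purely for illustration.
players = {
    "Player A": dict(hits=150, walks=30, hit_by_pitch=2, at_bats=550, sacrifice_flies=5),
    "Player B": dict(hits=130, walks=90, hit_by_pitch=5, at_bats=500, sacrifice_flies=5),
}

# Rank the players by each statistic, best first.
by_average = sorted(
    players,
    key=lambda name: batting_average(players[name]["hits"], players[name]["at_bats"]),
    reverse=True,
)
by_obp = sorted(
    players,
    key=lambda name: on_base_percentage(**players[name]),
    reverse=True,
)

for name, line in players.items():
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

In this made-up data, Player A has the higher batting average, while Player B, who draws far more walks, has the higher on-base percentage. The two statistics rank the same players differently, which is the kind of disagreement between a traditional impression and a measured rate that the paragraph describes.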

Similar success stories can be found in many other sports. As sports teams strive for improvement, it is no surprise that sport science and data analytics are booming. Teams with high levels of resources adopt sophisticated methods of collecting various types of data, and employ large numbers of sports scientists who analyse in detail player performance and the effects of different training methods and strategies. Evidence suggests that there has been a clear positive impact on performance and, in particular, on fitness. For example, rugby and American football teams have notably seen reductions in the number of injuries to players due to “wearable sensors that monitor the intensity of activity and impact of collisions” (Marr, 2015). However, some sports are easier to extract data from than others. For instance, as Brooks, Kerr and Guttag (2016) point out, sports such as baseball, American football and tennis are much easier to break up into individual events than sports such as soccer – which, despite being the most popular sport in the world, is yet to reach the level of sophisticated performance analytics seen in other sports (pp. 49-50). Nevertheless, many data scientists and statisticians are developing new models of performance analysis to improve the game even further.

In conclusion, data science involves collecting, preparing, analysing and presenting data in order to gain knowledge and understanding of the world. The majority of evidence suggests that data science is making a positive contribution to society, as it allows for more effective decision-making in many different fields. The main benefit is that it gives empirical evidence which is more objective and reliable than subjective human opinion. This paper has focused on the positive impact of data science on sports, in particular the famous Moneyball example, in which the Oakland A’s baseball team used data analytic techniques to improve player recruitment and become a strong and successful team. Evidence suggests that the performance, fitness and health of athletes have improved as a result of using data.

Reference List

Brooks, J., Kerr, M., & Guttag, J. (2016). Developing a Data-Driven Player Ranking in Soccer Using Predictive Model Weights. KDD ’16, 49-55. http://dx.doi.org/10.1145/2939672.2939695

Kitchin, R. (2014). The Data Revolution. SAGE Publications Ltd.

Lewis, M. (2003). Moneyball. New York: W.W. Norton.

Marr, B. (2015). Big Data: The Winning Formula In Sports. Forbes. Retrieved from http://www.forbes.com/sites/bernardmarr/2015/03/25/big-data-the-winning-formula-in-sports/#45d3db6e26dc

O’Neil, C. & Schutt, R. (2013). Doing Data Science: Straight Talk from the Frontline. O’Reilly Media.

Stanton, J. (2012). An Introduction to Data Science. Syracuse University. Retrieved from https://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

UK Government. (2013). Seizing the data opportunity: A strategy for UK data capability. Retrieved from https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/254136/bis-13-1250-strategy-for-uk-data-capability-v4.pdf
