Data Science and Sports

There is a great deal of hype about data science at the moment. Much of the discussion concerns the ways in which data can contribute to society as a whole. Yet there is considerable ambiguity about what data science actually is. A generally accepted definition is lacking, although recent contributors to the field have attempted to provide one. For example, Stanton (2012) defines data science as “an emerging area of work concerned with the collection, preparation, analysis, visualisation, management, and preservation of large collections of information” (p.ii). The role of a data scientist is to collect, clean and interpret data using a range of statistical and software engineering tools, and then to visualise and communicate the findings effectively (O’Neill and Schutt, 2013, p.16). This short paper discusses what data science is and how data, statistics and analytics have benefited professional sports.

The purpose of data is essentially to tell us how the real world is. As Kitchen (2014) states, “data are clearly a base material for how we make sense of the world” (p.12). Many authors within the field explain the process of extracting knowledge from data using the image of a pyramid. At the base of the pyramid is the real world, from which data is abstracted. This data provides information which, when interpreted, allows us to gain knowledge about the world. The highest levels of the pyramid represent understanding and wisdom (Kitchen, 2014, p.9; Stanton, 2012, p.9). In other words, data provides us with empirical evidence about the world, which allows us to move towards basing our knowledge and decisions on objective facts rather than subjective opinion.

An evidence-based, objective approach can benefit a vast number of industries and trades. A UK Government (2013) report outlined the significance of data and the opportunity it presents to businesses, services and society as a whole in the UK:

“The wider social and economic benefits are manifold: such as better health outcomes for NHS patients as a result of analysis of clinical trials data; consumers receiving personalised marketing of goods as businesses analyse spending habits; and greater transparency and accountability of government to citizens with the release of more open data.” (p.10)

A large amount of evidence suggests that data has improved the efficiency of decision making in a wide range of industries.

Sport, in particular, is one of the clearest instances of how data analytics has had a positive impact. Perhaps the most famous sporting example is described in Moneyball, which details how the financially constrained Oakland Athletics baseball team became strong enough to compete with the richest teams in the league. They did this by adopting an analytic approach to recruiting amateur players (Lewis, 2003). The book argues that scouting methods at the time were outdated and flawed. Recruitment of players came down to subjective human opinion, which is often irrational. Scouts and coaches had a propensity to generalise wildly from their own experience and to be excessively swayed by a player’s most recent performance. “The human mind played tricks on itself when it relied exclusively on what it saw, and every trick it played was a financial opportunity for someone who saw through the illusion to the reality.” (Lewis, 2003, p.18). It was only after adopting a scientific approach to recruiting players, such as rigorously analysing percentage statistics, that the Oakland A’s began to spend money effectively, win games and become successful.
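To make the idea of a "percentage statistic" concrete, the sketch below computes on-base percentage, a measure commonly associated with the Moneyball approach, alongside the traditional batting average. The paragraph above does not name a specific statistic, so the choice of on-base percentage is an assumption. The two players and their season totals are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season totals for one batter (all values here are hypothetical)."""
    hits: int
    walks: int
    hit_by_pitch: int
    at_bats: int
    sacrifice_flies: int


def batting_average(line: BattingLine) -> float:
    """Hits divided by at-bats; ignores walks entirely."""
    return line.hits / line.at_bats if line.at_bats else 0.0


def on_base_percentage(line: BattingLine) -> float:
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    reached = line.hits + line.walks + line.hit_by_pitch
    chances = (line.at_bats + line.walks
               + line.hit_by_pitch + line.sacrifice_flies)
    return reached / chances if chances else 0.0


# Two invented players: one draws many walks, one rarely does.
patient = BattingLine(hits=120, walks=90, hit_by_pitch=5,
                      at_bats=480, sacrifice_flies=5)
free_swinger = BattingLine(hits=150, walks=20, hit_by_pitch=2,
                           at_bats=540, sacrifice_flies=4)

for name, line in [("patient", patient), ("free_swinger", free_swinger)]:
    print(f"{name}: AVG={batting_average(line):.3f} "
          f"OBP={on_base_percentage(line):.3f}")
```

In this invented comparison the patient hitter has the lower batting average but the higher on-base percentage, which is the kind of gap between appearance and measured contribution that the book describes.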

Similar success stories can be found in many other sports. As sports teams strive for improvement, it is no surprise that sport science and data analytics are booming. Teams with high levels of resources adopt sophisticated methods of collecting various types of data, and employ large numbers of sports scientists to analyse player performance and the effects of different training methods and strategies in detail. Evidence suggests that there has been a clear positive impact on performance and, in particular, on fitness. For example, rugby and American football teams have notably seen reductions in the number of injuries to players due to “wearable sensors that monitor the intensity of activity and impact of collisions” (Marr, 2015). However, some sports are easier to extract data from than others. For instance, as Brooks, Kerr and Guttag (2016) point out, sports such as baseball, American football and tennis are much easier to break up into individual events than sports such as soccer. Despite being the most popular sport in the world, soccer has yet to achieve the level of sophisticated performance analytics seen in other sports (pp.49-50). Nevertheless, many data scientists and statisticians are developing new models of performance analysis to improve the game even further.

In conclusion, data science involves collecting, preparing, analysing and presenting data about the world in order to gain knowledge and understanding of it. The majority of evidence suggests that data science is making a positive contribution to society, as it allows for more effective decision making in many different fields. Its main benefit is that it provides empirical evidence, which is more objective and reliable than subjective human opinion. This paper has focused on the positive impact of data science on sports, looking in particular at the famous Moneyball example, in which the Oakland A’s baseball team used data analytic techniques to improve player recruitment and become a strong and successful team. Evidence suggests that the performance, fitness and health of athletes have improved through the use of data.

Reference List

Brooks, J., Kerr, M., & Guttag, J. (2016). Developing a Data-Driven Player Ranking in Soccer Using Predictive Model Weights. KDD ’16, 49-55. http://dx.doi.org/10.1145/2939672.2939695

Kitchen, R. (2014). The Data Revolution. SAGE Publications Ltd.

Lewis, M. (2003). Moneyball. New York: W.W. Norton.

Marr, B. (2015). Big Data: The Winning Formula In Sports. Forbes. Retrieved from http://www.forbes.com/sites/bernardmarr/2015/03/25/big-data-the-winning-formula-in-sports/#45d3db6e26dc

O’Neill, C. & Schutt, R. (2013). Doing Data Science: Straight Talk from the Frontline. O’Reilly Media.

Stanton, J. (2012). An Introduction to Data Science. Syracuse University. Retrieved from https://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

UK Government. (2013). Seizing the data opportunity: A strategy for UK data capability. Retrieved from https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/254136/bis-13-1250-strategy-for-uk-data-capability-v4.pdf