Data Science

What is Data Science?

Data Science refers to the multi-disciplinary field that extracts knowledge and insights from data using processes, algorithms, and systems based in the scientific method. Data science is a melting pot of statistics, machine learning, and data analysis, using each to their strengths to understand and conceptualize real-world phenomena. Referenced as early as 1960, the term "data science" didn't really come into its own until 1990's. As technology became more powerful, and the access to information became more widespread, the applications of data science proliferated. Turing award winner Jim Gray refers to data science as the "fourth paradigm" of science, along with empirical, theoretical, and computational. Data science offers a way of finding patterns, making predictions, and performing analysis to better understand a world that is more connected than it ever has been before.

Image from Flatiron School

Data Science vs. Statistics

It is often easy to confuse data science with statistics, and while there is much overlap, there are some defining functions that separate statistics from data science. Statistics present generalizations of phenomena within data sets, and while useful for a brief understanding, data science combines the principles of other disciplines in order to create an in-depth understanding of the data. It is important to note, that data science incorporates statistics, but is not limited to their use.

Applications of Data Science

Since the exponential increase in computational power, and the proliferation of a more connected world, data science has become more relevant than ever. Data science can be applied in almost any discipline that involves taking in large data sets and extracting insight. From warehouse delivery logistics, to the cutting edge of artificial intelligence, data science is an invaluable tool for the creation and function of much of our modern world.

Shipping and Transportation

Shipping companies all across the globe use data science to better optimize shipping routes, delivery times, and even mode of transport. By processing large amounts of tracking data, shipping companies are able to improve their efficiency as well as cut costs. For example, UPS analyzed the delivery routes of all of their trucks and compared it to fuel usage. They found that, on average, a driver use more gas navigating a left turn, rather than a right turn. This information was used to encourage their drivers to make more right turns than left. Since then, the company has saved over 10 million gallons of fuel each year.


Data from hospitals and healthcare facilities can be analyzed in order to provide better treatment to patients, as well as better understand the nature of disease and illness. Data science provides doctors with the tools to offer quicker turnaround times for lab results, leading to increased patient satisfaction, and in some cases, life-saving preventative treatment.


Banks use data science to comb through their financial data in order to detect fraud. In order to minimize the bank's losses, customers are now profiled based on their spending and saving data. Using data science, banks are able to better understand their customer's past expenditures, creating probabilities for potential risk and default. Similarly, by using data science tools to understand a customer's spending profile, a bank's customers are now alerted more promptly to potential compromises to their account.

Data Science and Machine Learning

Machine Learning is built upon, and very much a part of, the principles of data science. Machine learning algorithms use, or rather learn,  large data sets as training. They do so using methods such as regressions, transformations, and/or supervised clustering. By training the algorithm, machine learning can better fine tune its parameters and more accurately make predictions about upcoming uncertainties.