Data science is about using data to create as much impact as possible for a company. That impact can take several forms: insights, data products, or product recommendations. Achieving it may require tools such as complex models, data visualizations, or code. But essentially, a data scientist has to solve real company problems with data, using whatever tool is available. The term ‘data science’ refers to the process of using data to find solutions to a problem or to predict its outcomes.
Let’s consider an example to understand data science better. Suppose you look for shoes on Amazon, but you do not buy them then and there. The next day, you’re watching videos on YouTube and suddenly you see an ad for the same item. Then you switch to Facebook, and there you see the same ad again. This happens because advertising platforms track your search and browsing history and recommend ads based on it. This is one of the coolest applications of data science.
Before data science, the term data mining was popularized by a 1996 article called “From Data Mining to Knowledge Discovery in Databases,” in which it referred to the “overall process of discovering useful information from data.”
In 2001, William S. Cleveland wanted to bring data mining to another level. He did that by combining computer science with data mining. Basically, he made statistics a lot more technical, which he believed would expand the possibilities of data mining and produce a powerful force for innovation. Having put computing power to work for statistics, he called this combination data science.
This was around the time Web 2.0 emerged, when websites were no longer just digital pamphlets but a medium for a shared experience among millions and millions of users. These were websites like MySpace (2003), Facebook (2004), and YouTube (2005). We could now interact with these websites: we could contribute, post, comment, like, upload, and share, leaving our footprint in the digital landscape we call the Internet and helping to create and shape the ecosystem we know and love today. That’s a lot of data, so much that it became too much to handle with traditional technologies. So it was called ‘Big Data.’ That opened a world of possibilities for finding insights using data. It also created a need for parallel computing technologies like MapReduce, Hadoop, and Spark.
The rise of ‘big data’ in 2010 sparked the rise of data science, which supported businesses’ need to draw insights from their massive unstructured data sets. Journals of data science then described the field as almost everything that has something to do with data: collecting, analyzing, and modeling. Yet the most important part is its applications, which come in all sorts.
The general public thinks of data scientists as researchers focused on machine learning and artificial intelligence (AI), but the industry is hiring data scientists as analysts. So there is a misalignment: most of these data scientists are capable of working on more technical problems, but big companies like Google, Facebook, and Netflix have so much low-hanging fruit for improving their products that they don’t need advanced machine learning or statistical knowledge to find that impact in their analysis.
Being a good data scientist isn’t about how advanced one’s models are, but about how much impact one can have with one’s work. Data scientists are not data crunchers, but problem solvers and strategists. Companies will give data scientists their hardest and most ambiguous problems and expect, in return, to be guided in the right direction.