Data Science As We Know It


There’s a well-known presentation slide that goes something like this: “Data Science is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Sounds right, but what actually is Data Science, in that case?

A buzzword!

A buzzword that is up to us to define. The definition changes according to your job function. For someone in marketing, a Data Scientist would seem to be a person with a strong inclination towards statistics, databases, and programming. For us in the IT industry, programming and database skills are not a mystery. Hence a Data Scientist is usually someone with strong analytical capabilities who understands the business value of data and can use Machine Learning to answer business questions.

I think that Dave Holtz has a good point that the required skillset differs depending on who you are talking to.

What I believe is specifically important:

  • ability to access different data sources
  • strong background in statistics and linear algebra
  • proficiency in either Python or R from an analytical, statistical, or Machine Learning perspective
  • ability to draw conclusions from data and analysis
  • data visualization

When do we experience Data Science?

Some say Data Science and Machine Learning are becoming commodities. For sure they already exist as services. In my opinion, commoditization would lead to a disconnect between a business’ “real” operations and the analytics that are supposed to be associated with them. The result would be poor business decisions. At the moment, I see this type of disconnect growing in the AdTech industry, where there is less transparency than ever. I like to think of Data Science as a value add resulting in better business decisions, more precise models, workforce time savings, and subtle insights otherwise hidden in data.

I like how Experfy has divided their marketplace:

At the same time, Data Science is certainly used in this industry and many others. Every time you see an ad, open your Facebook account, search on Google, receive a parcel, or look at your CT scan, you are experiencing some form of Data Science. Take reCAPTCHA, for example: this was a way for Google to teach their neural network to recognize objects in a picture.

One thing to note is that while the overall direction of Data Science is unclear, there is a strong shift towards the commoditization of Data Science models. Google, Facebook, Amazon, and Microsoft are opening up their huge neural nets for image recognition, speech analysis, and translation.

Big Data and Data Science

This will be short: I believe that one cannot live without the other.

Simply storing data without using Data Science to analyze it is pointless.

The datasets we analyze will NEVER get smaller. They are constantly growing. Big Data presents a great opportunity, with both horizontal and vertical scalability, as a HUUUUGE enabler of Data Science analysis that would otherwise not be possible in a reasonable time or with reasonable resources. Technologies like MapReduce, Spark, Tensorspark, Flink, and columnar databases make Data Science fun.
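To make the horizontal-scalability point a bit more concrete, here is a minimal sketch of the map/reduce idea in plain Python. It is a toy word count, not how MapReduce or Spark are actually implemented: the function names (`map_phase`, `reduce_phase`, `word_count`) and the sample sentences are made up for illustration, and everything runs in a single process. The point is only that each chunk is counted independently, so on a real cluster the chunks could live on different machines.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Count words within one chunk of lines (the 'map' step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_phase(left, right):
    """Merge two partial counts into one (the 'reduce' step)."""
    return left + right


def word_count(lines, n_chunks=2):
    """Split lines into chunks, count each chunk, then merge the partial results."""
    # Hypothetical partitioning: every n-th line goes to the same chunk.
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    # Here the chunks are processed one after another; a cluster framework
    # would hand each chunk to a separate worker instead.
    partials = map(map_phase, chunks)
    return reduce(reduce_phase, partials, Counter())


# Illustrative input only, not real data.
sample_lines = [
    "big data needs data science",
    "data science needs big data",
]
totals = word_count(sample_lines)
```

Because the merge step only adds partial counts together, the result does not depend on how many chunks the data is split into, which is what lets this style of analysis scale out by adding machines.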

Data Science at Lingaro

The Data Science team is split between P&G and New Business. There are now 20 of us in total.


Our favorite Python and R packages:



Don’t miss this video about Data Science in action for a good cause.

For more data science news, check out the data4poland Facebook page.

Find out more about our Data Science services and solutions here.
