What does a data scientist do?

A brief description of the main data careers

Ricardo Pinto
4 min read · Sep 18, 2022

Ever since I became a data scientist, every time someone asks me what I do, the follow-up question is always: OK, but what is that?

So what better than to explain, in non-technical terms and in my own words, what a data scientist does?

And if we also get to know the fellow data professionals who surround the role, we will get a better grasp of what a data scientist does.

So here is a brief description of the main data careers, starting, obviously, with the:

Data Scientist

When I am asked what data science is, I always like to answer with the following quote:

“(Data science) It is the capacity of making sense of your universe and do something useful with it.”

Cassie Kozyrkov*

Because that is exactly the role of a data scientist.

Understanding a problem and translating it into a solution using the data available.

In other words, our role is to understand the business needs and find the path that best balances gain and effort, using the data required and the proper tool for the job (including the famous machine learning).

This means that sometimes we need to convince stakeholders that a simple analysis, or a solution based on heuristics, is enough to solve the problem.

This is probably the most generalist career of all. Not only because a data scientist needs to understand the whole data production chain to think of a solution, but also because during the prototyping phase a data scientist has to wear the hats of the remaining roles, such as the:

Data Engineer

Which might be the most misunderstood role of them all.

Throughout my career as a civil engineer (which is what my degree is in), I have realized that the general population does not understand the role of an engineer very well. Maybe that is because an engineer, whether civil, mechanical or even software, has as a main responsibility to plan, oversee, calculate and design or develop everything that people usually do not see.

For example, a civil engineer worries about whether the structure of a house is sound, a mechanical engineer about the fatigue of production line parts, and a software engineer about the algorithm.

A data engineer follows the same rule: he is responsible for planning and building the data stream and for guaranteeing that it runs correctly.

Broadly speaking, he handles all the ingestion, maintenance and storage of data so that it can be delivered with the proper quality to those who need it.

That is why he is seen as the hero without a cape. Not only is it an “invisible” job, which often doesn’t translate into direct profit ($$$), but guaranteeing that data gathering does not fail while maintaining a data quality standard is often a herculean task. And that quality of delivery makes him the best friend of the:

Data Analyst

The role whose name best describes the actual job: analyzing data.

In practice, this comes down to two main responsibilities: translating data into business metrics (aka KPIs) or answers to pertinent questions, and actively looking for insights.

Contrary to popular belief, he is not a dashboard machine, although dashboards are a very useful tool in his daily job. Machine learning is also a tool this data professional can use to enrich the analysis.

He is the person who has to be closest to the business knowledge in order to make his work more effective.

These three roles are considered the main data roles. However, this is a constantly changing market, and companies themselves are still defining what they need in order to deliver data products. That is why two other roles are becoming quite common in the data market. The first one is the hyped “brother” of the data analyst, the:

Analytics Engineer

Although its scope is still a grey area in the industry, in simplified terms an analytics engineer is a hybrid between the data analyst and the data engineer.

He is expected to do everything a data analyst does, but also to go further and automate and maintain his own data streams, which is his data engineering side.

And to wrap up, another hybrid, this one already more common, the:

Machine Learning Engineer

This professional’s main responsibility is to productize the data products prototyped by data scientists, or to make them more robust, faster, more scalable and easier to readjust.

That is why he becomes a hybrid between a data scientist and a software engineer: he needs to master all the statistical and machine learning techniques used by data scientists and wrap them in good coding practices.

However, particularly in small organizations, this scope ends up being absorbed by the data scientist.

And so we have reached the end of this brief summary of the main roles in the data industry. I hope that the next time you meet one of these professionals you will be able to say: oh, so this is what you do!

If you want to reach out and talk about this or any other subject, add me on LinkedIn.

*I confess that I do not remember in which podcast Cassie Kozyrkov says it. If you know which one, let me know so I can make this reference more precise.

Ricardo Pinto

Data Scientist with a civil engineering background. Water polo player. Loves ML/AI, data, decision science, gaming, manga.