In the last few years, it has become very common to see posts and blogs on social media and other platforms explaining the different roles of professionals in the data field.
Many people try to explain the difference between a data analyst, a data scientist, a data engineer, and so on.
What most (if not all) of those blogs have in common is:
- The definition of each role is largely determined by the competencies of the individual, the job, or the type of projects or problems to be solved
- All of them always include a Venn diagram!
In this blog, let’s take a different approach, data-centric rather than competency-centric, and of course we will do it without using a Venn diagram.
I am not sure about you as a reader, but personally I find Venn diagrams not very friendly and not always intuitive.
In one of my previous blogs, I explained the Digital Transformation journey based on the data maturity model. From that model, it is easy to see that the specialization of each role depends heavily on the maturity of the ‘state of data’ within the organization and of the professional.
If we look at the Data Science Maturity Model, we can identify the relationship between the state of the data in each phase and the value added to the organization.
Each maturity state requires specific skill sets and professionals. However, there are two main streams when we speak about the type of skill sets, and therefore professionals: the analytics stream and the engineering stream.
The analytics stream requires individuals with the ability to analyze data, extract insights, and generate knowledge based on data.
The engineering stream, on the other hand, requires individuals with the infrastructure knowledge to store, manage, and administer data in multiple sources, as well as to keep the data flowing correctly among the different pipelines and applications.
We can now represent both streams graphically, based on the Data Science Maturity Model:
Now, let’s go through the role definitions.
Data Analytics stream
This is the professional who produces basic reporting and analysis in any organization. This role uses desktop applications to perform its tasks and describes the past.
A professional capable of aggregating multiple historical reports, identifying patterns among the different data points, and explaining ‘why things happen’ with data. This role uses more sophisticated desktop and cloud tools.
Machine Learning engineer
An individual who develops predictive analytics once patterns in historical data have been identified. This role uses programming languages and, normally, cloud applications.
Artificial Intelligence developer
An expert machine learning engineer capable of programming complex algorithms that let machines make decisions and anticipate the most likely event, based on the historical data analyzed. This role works with programming languages and requires large-scale cloud computing solutions.
All of these roles are part of the ‘Data Science’ knowledge domain.
Data Engineer stream
An individual who mainly works with relational databases: maintaining proper storage of structured and semi-structured data, architecting efficient data warehouse solutions, and administering access for the different analysis roles. Usually known as a Database Administrator or Database Manager.
Big Data Engineer
An engineer capable of working with unstructured data and architecting storage solutions for large amounts of data (petabytes or more). Knowledgeable about document databases and other NoSQL databases. This role works with highly sophisticated cloud storage solutions.
One of the most frequent questions I get once all these roles are well understood is: can a data scientist become a data engineer, or a data engineer become a data scientist?
In my experience, and based on the many professionals I have met, it is possible only during the early stages of the maturity model. In the second half of the chart (Predictive and Prescriptive analytics) it becomes a lot more complex, and at the very end of the chart (Prescriptive analytics) I have never seen anybody capable of handling both AI applications and Big Data storage and architecture.
More DAC Content by Pablo Moreno