Data engineering is about collecting, cleansing, and transforming data into structured, queryable forms. Terms such as data mart and data lake can be confusing, but they are really just different kinds of containers for data.
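The collect-cleanse-transform cycle can be sketched with nothing more than the Python standard library. The field names and cleaning rules below are hypothetical, just to make the three steps concrete:

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, stray whitespace, missing values.
raw = """region,amount
 north ,100
SOUTH,
north,250
"""

# Collect: read the raw records.
rows = list(csv.DictReader(io.StringIO(raw)))

# Cleanse: normalise casing/whitespace, drop records missing an amount.
clean = [
    {"region": r["region"].strip().lower(), "amount": int(r["amount"])}
    for r in rows
    if r["amount"].strip()
]

# Transform: aggregate into a structured, queryable form.
totals = {}
for r in clean:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]

print(totals)  # {'north': 350}
```

Real pipelines do the same three things at scale, with frameworks handling the scheduling and fault tolerance.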
Over the past years, data-processing architectures have evolved from the lambda architecture to the kappa architecture. They serve different purposes, but both are powerful and effective. Under the hood, they are supported by technology frameworks such as:
ETL tools: SSIS, Kettle (Pentaho Data Integration), etc.
Analytics and processing engines: SSAS, Spark, etc.
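As a rough illustration of the lambda architecture's central idea (not any specific framework's API), a precomputed batch view is merged with a speed-layer view at query time. The event data here is made up:

```python
from collections import Counter

# Hypothetical page-view counts already computed by the batch layer
# (e.g. a nightly job over the full history).
batch_view = Counter({"page_a": 1000, "page_b": 400})

# Events that arrived after the last batch run, handled by the speed layer.
recent_events = ["page_a", "page_a", "page_c"]
realtime_view = Counter(recent_events)

def query(page):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view[page] + realtime_view[page]

print(query("page_a"))  # 1002
print(query("page_c"))  # 1
```

The kappa architecture drops the separate batch layer and reprocesses everything through a single streaming path instead.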
Typically, data engineering takes up 80% of the time; the remaining 20% is spent on analysing and visualising the data. Nonetheless, that final step is where 80% of the value is generated. Data is useless until it is interpreted.
The most popular tools for self-service BI include Power BI, Tableau, and Qlik Sense.
From 2018 onwards, there has been a clear trend of moving from pre-built reports to interactive dashboards, where users explore the data through a powerful set of configurable visual widgets. This is referred to as "self-service BI". The approach allows users with deep domain knowledge to leverage technology at a minimal learning cost.
The concept of self-service BI is very attractive. The only problem is that the value extracted is limited by the analytical capabilities of the users. Unfortunately, the mindsets of users across different business functions can differ wildly, and some of them are simply not comfortable working with data.
This is where data science comes into play. In the eyes of many, data science is like black magic: a statistical model or machine-learning algorithm is applied to terabytes of data, and a scientist draws conclusions from mysterious metrics that only he or she can interpret. In recent years this, too, has been democratised by technologies such as AutoML.
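A toy version of what AutoML automates — fitting several candidate models and keeping the one with the lowest validation error — can be sketched in plain Python. The models and data here are deliberately trivial, just to show the selection loop:

```python
# Toy training/validation data: y is roughly 2*x.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
valid = [(5, 10.1), (6, 11.8)]

def fit_mean(data):
    """Baseline candidate: always predict the mean of y."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_proportional(data):
    """Candidate: least-squares fit of y = k*x (no intercept)."""
    k = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: k * x

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# The "AutoML" loop: fit every candidate, keep the best by validation error.
candidates = {"mean": fit_mean, "proportional": fit_proportional}
best_name, best_model = min(
    ((name, fit(train)) for name, fit in candidates.items()),
    key=lambda item: mse(item[1], valid),
)
print(best_name)  # proportional
```

Real AutoML systems search far larger spaces of models and hyperparameters, but the shape of the loop is the same.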