What Is a Data Stack?

After some years of learning and using R for psychology and cognitive neuroscience research, I made up my mind and joined Convo as their Data Scientist in March 2018. I report to the Director of Engineering, which places me squarely on the same team as the software, internal systems, and app engineers.

All I can say is that the journey has been fun, though hard at times. I found it difficult in the beginning because I was still learning the ropes of a software engineer's job and environment. I had never worked as a software developer or written production code before. But after joining the team, I was able to realize what I really wanted. The good thing is that Convo helped me get into data engineering, where setting up a data warehouse became my first major project.

That was tough again, since it was the first time I had heard of a data warehouse. So I asked the CTO for guidance. The reply was that it is the place where all of the data is stored, enabling the business to identify opportunities and helping teams make more data-driven decisions.

That may be true, but I was still wondering how this differs from a regular MySQL database. Since I have a Ph.D., and with it some skill in research, I set out to define what a data stack is and to gather the information connected to it. What I found was a brave new world: a whole ecosystem of data engineering, hosted in places such as community Slacks and SaaS vendors.

The good news is that this new ecosystem takes a handy, practical approach. It makes a difficult job much easier in two ways: (1) gathering data from different places and (2) transforming raw data into usable data. The bad news is that there is a lot to sort out, such as ETL, and typing these keywords into Google can easily lead you off track. Many of the results point to vendors selling costly legacy data management systems that appear to be built on outdated assumptions about data. After more research and some time in Slack chats, I made up my mind to build a data stack. Do you have an idea of what that is?

Defining a Data Stack 

A data stack is the set of tools that makes data edible. It can be compared to a kitchen for data: think of baking, where different ingredients are turned into an edible cake.

Can you picture how a cake is made from different ingredients in the kitchen? Most cake ingredients aren't very edible on their own, since they are mostly butter and flour. Here, however, we are going to focus on the tools used in baking, such as the kitchen timer, oven, mixing bowls, spatulas and spoons, and even an instructor. When all of these are put to use, the result is a lovely cake to munch on.

Just as with baking a cake, raw bits of data aren't considered edible. After a trip through a data stack, however, these bits are turned into useful fact and dimension tables, with clear field names and types that the company's different departments can digest. Now, let's move on to the tools that make up a data stack and what each one does.

Loading:

These tools move data from one place to another. Vendors include Stitch, Fivetran, and Alooma.

Warehousing:

These tools are where the data is stored, typically in the cloud. Vendors include Snowflake, Redshift, and BigQuery.

Transforming:

These tools transform inedible data into edible data. Vendors include XPlenty, dbt, and ETLeap.

Analysis and Business Intelligence:

These tools serve the data from the stack to the teams that use it. Vendors include Periscope, Mode, Metabase, Looker, Cluvio, and Chartio.

Any tech company that cares about its data needs a data stack that performs the four functions above. The data stack I built at Convo covers all four, and it let me put together the stack I had been dreaming of. Now all of it is starting to function well.
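To make the four functions concrete, here is a minimal sketch in Python. It is not the stack I built at Convo, and it uses none of the vendors listed above. It assumes an in-memory SQLite database as a stand-in for the warehouse, and the sample records, table names, and function names are all hypothetical, made up for illustration.

```python
import sqlite3

# Hypothetical raw records, standing in for what a loading tool
# would pull from a source system. Every value arrives as text.
RAW_EVENTS = [
    {"customer": " Alice ", "tier": "pro", "amount": "19.99", "ts": "2018-03-01"},
    {"customer": "bob", "tier": "free", "amount": "0", "ts": "2018-03-01"},
    {"customer": "alice", "tier": "pro", "amount": "19.99", "ts": "2018-04-01"},
]


def load(conn, records):
    """Loading: copy the raw records into the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE raw_events (customer TEXT, tier TEXT, amount TEXT, ts TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (:customer, :tier, :amount, :ts)", records
    )


def transform(conn):
    """Transforming: build a dimension table and a fact table
    with clear field names and types."""
    conn.execute(
        """
        CREATE TABLE dim_customers AS
        SELECT DISTINCT LOWER(TRIM(customer)) AS customer_name,
                        tier AS tier_name
        FROM raw_events
        """
    )
    conn.execute(
        """
        CREATE TABLE fact_payments AS
        SELECT LOWER(TRIM(customer)) AS customer_name,
               CAST(amount AS REAL) AS amount_usd,
               ts AS paid_on
        FROM raw_events
        """
    )


def analyze(conn):
    """Analysis: the kind of query a BI tool might run on the clean tables."""
    rows = conn.execute(
        """
        SELECT d.tier_name, ROUND(SUM(f.amount_usd), 2) AS revenue_usd
        FROM fact_payments AS f
        JOIN dim_customers AS d ON d.customer_name = f.customer_name
        GROUP BY d.tier_name
        ORDER BY d.tier_name
        """
    ).fetchall()
    return dict(rows)


def run_pipeline(records):
    # Warehousing: an in-memory SQLite database plays the warehouse here.
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, records)
        transform(conn)
        return analyze(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline(RAW_EVENTS))
```

The raw table keeps messy names and text-typed amounts, while the transformed tables have trimmed names and numeric amounts, so the final query can total revenue per tier without any cleanup of its own. In a real stack, each of these steps would be handled by one of the dedicated tools above rather than by a single script.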

My education in this realm owes a lot to Fishtown Analytics, the creator of Sinter and dbt, and to the dbt Slack community. If you're a newbie in data engineering or you're a data scientist, this is the place I most recommend you go. You can also learn from Stephen Levin, or watch the video Future-Proofing your Analytics Stack from Mode, which compares modular and all-in-one data stacks. Both are useful for answering questions about how to design a data stack.

Note: Going back to my earlier question about why not use a MySQL database as the data warehouse, the answer is that the amount of data imported from the sources can be massive, and transforming it can then consume a lot of processing time. Flexible cloud-based data warehouses handle this kind of work better. They are also widely supported as destinations by data loading tools, data reporting tools, and others.

With this information, you can now start your own data stack journey. It's better to start learning early than late. Remember, the longer you work with data stacks, the more experience you gain. So don't let the chance slip away, and make the most of your data stack!
