Introduction to Data Modeling

Data Modeling: diagramming the data objects/points in an information system. Any type of data modeling is always linked to our business needs and requirements, and to how easily we want the information to be shown to the end users. Data modeling is not about how the data is going to be processed, but about how the data is best represented. It means thinking about how we are going to store the data, segregate it, and build the logic for the data to be fetched ...

August 22, 2022 · 11 min · Satvik Jadhav

Order of Execution in a SQL Query

SQL Query Order of Execution

Each SQL query begins by finding the data that we need in a database; this data is then filtered down into something that can be processed and understood as quickly as possible. Because each part of the query is executed sequentially, it’s important to understand the order of execution so that we know which results are accessible where. Let’s consider the following query:

    SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
    FROM mytable
    JOIN another_table
      ON mytable.column = another_table.column
    WHERE constraint_expression
    GROUP BY column
    HAVING constraint_expression
    ORDER BY column ASC/DESC
    LIMIT count OFFSET count;

Query order of execution

1. FROM and JOINs ...
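As a rough illustration of why the clause order matters, here is a minimal sketch that runs a query of the same shape against SQLite through Python's standard `sqlite3` module. The `orders` table, its `customer` and `amount` columns, and the sample rows are hypothetical, invented only for this example. The numbered comments follow the commonly described logical order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT/OFFSET); a real database engine is free to optimize the physical execution differently.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the clause order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 50), ('alice', 5),
        ('bob', 30), ('bob', 40),
        ('carol', 8);
""")

query = """
    SELECT customer, SUM(amount) AS total   -- 5. compute the output columns
    FROM orders                             -- 1. pick the source rows
    WHERE amount >= 10                      -- 2. drop rows before grouping
    GROUP BY customer                       -- 3. form one group per customer
    HAVING SUM(amount) > 45                 -- 4. drop whole groups
    ORDER BY total DESC                     -- 6. sort the surviving groups
    LIMIT 1 OFFSET 1                        -- 7. skip one row, keep one
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 50)]
```

Because WHERE runs before the grouping, alice's order of 5 never reaches `SUM`, so her total is 50 rather than 55. HAVING then sees only the aggregated groups, and LIMIT/OFFSET is applied last, after sorting.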

May 29, 2022 · 3 min · Satvik Jadhav

Data Lake

What is a Data Lake

- A data lake is a central repository that holds big data from many different data sources
- The data can be structured, semi-structured, or unstructured
- The goal is to ingest data as quickly as possible and make it available as soon as possible
- Used extensively for machine learning and analytical solutions
- Has to be secure and scalable
- Hardware should be inexpensive so you can store as much data as possible
- The idea is to store as much data as possible so it can be made available to others, who can make use of it later
- R&D on data products
- Cannot always define the structure of the data

Data Lake vs Data Warehouse

| Data Lake | Data Warehouse |
| --- | --- |
| Data is unstructured | Data is structured |
| Data Scientists or Data Analysts | Business Analysts |
| Stores data on the scale of petabytes | Used for batch processing, Business Intelligence, and Reporting |
| Stream processing, machine learning, and real-time analysis | Data size is generally small |
| Data is undefined, no relation between data | Data Warehouses contain historic and relational data |

Gotchas of Data Lake

- Starts with good intentions, but soon turns into a Data Swamp: very hard to be useful ...

May 7, 2022 · 2 min · Satvik Jadhav