By Ishan Deshpande

What is Microsoft Fabric?

Updated: Jan 10



When we build a data solution, many people are involved, working across various technologies. Typically we have data engineers who work with ETL tools, data analysts who work on reporting tools, data scientists who build ML models, and administrators who ensure security and compliance. Most of the technologies used here are discrete: they must be managed separately, need to be integrated with other services, and come with different subscriptions and billing. This takes a lot of effort to manage and increases cost.

To solve these issues, Microsoft has come up with “Fabric”, an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. You just need a Fabric capacity, and that’s all!

Let’s take a deep dive into everything we have in Fabric.




Microsoft has divided Fabric into sections which they call Experiences, each based on the role of the person who will be using it. Let’s see what they are.

 

1. Power BI

It’s a world-leading business analytics tool. There are 100+ connectors we can use to get data into Power BI and create datasets and reports. Using Power Query, we can transform the data as per our reporting requirements. Along with reports, we can also create dashboards and apps that share report content with a large audience seamlessly. In short, it’s the go-to solution for anything and everything to do with reporting.

 

2. Data Factory

This experience includes Dataflows and Data pipelines.

 

a)  Dataflows Gen2 – Dataflows provide a low-code interface for extracting, transforming and loading data. There are 100+ built-in connectors and transformations in a Dataflow. It can be run repeatedly using a manual or scheduled refresh, or as part of a data pipeline orchestration. It is built on the Power Query engine, which allows all users to ingest and transform data in their data estate.

b) Data Pipelines

A pipeline is a combination of activities in a specific order which creates a flow. You can think of an activity as a task, for example: copy data, call a Dataflow, invoke another pipeline, send an email, etc. Pipelines also include control-flow activities like If Condition and ForEach, as sketched below.
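To make the idea of activities and control flow concrete, here is a minimal plain-Python analogy of what a pipeline orchestrates. This is not a Fabric API; the copy_data and send_email helpers and the table list are hypothetical, purely for illustration.

```python
# A plain-Python analogy of a Fabric data pipeline: each function stands in
# for an activity, and the loop / if mirror the ForEach and If Condition
# control-flow activities.

def copy_data(source: str, destination: str) -> int:
    """Stand-in for a Copy data activity; returns the number of rows copied."""
    print(f"Copying {source} -> {destination}")
    return 42  # placeholder row count

def send_email(message: str) -> None:
    """Stand-in for a notification activity (e.g. Office 365 Outlook)."""
    print(f"Notification: {message}")

# Hypothetical list of tables to process (ForEach-style loop).
tables = ["customers", "orders", "products"]

for table in tables:
    rows = copy_data(source=f"sql://{table}", destination=f"Lakehouse/{table}")
    # If Condition-style branch: alert only when nothing was copied.
    if rows == 0:
        send_email(f"Pipeline warning: no rows copied for {table}")
```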

 

3. Data Engineering  

It includes all the tools you need for your ETL process; some important components are -


a)  Lakehouse – A Lakehouse is a combination of a data warehouse and a data lake; here we can store and manage structured and unstructured data in the same place. By default, all tables are created as Delta tables. It also has a SQL endpoint through which we can view and analyse the data using SQL queries.

We also have a feature called shortcuts, with which we can add internal and external sources like Azure Data Lake and Amazon S3 to the Lakehouse. A shortcut just creates a link to that data, so the data remains at its original source but still works lightning fast.

b)  Notebook - Notebooks are an interactive computing environment that allows users to execute code in various programming languages, including Python, R, Scala and SQL. You can use notebooks for ingesting, preparing and analysing data, and for loading it into different destinations (see the sketch after this list).

c)  Spark job definition - Spark job definitions are a set of instructions that define how to execute a job on a Spark cluster. They include information such as the input and output data sources, the transformations, and the configuration settings for the Spark application. A Spark job definition allows you to submit a batch or streaming job to a Spark cluster and apply transformation logic to the data hosted in your Lakehouse, along with many other things.
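To give a flavour of what a notebook cell attached to a Lakehouse might look like, here is a minimal PySpark sketch. The file path Files/raw/orders.csv and the table name orders are hypothetical, and the relative Files/ path assumes the notebook has a default Lakehouse attached.

```python
# Minimal PySpark sketch for a Fabric notebook with a default Lakehouse attached.
# Assumes a hypothetical CSV file at Files/raw/orders.csv in that Lakehouse.

# Read raw data from the Lakehouse Files area.
df = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("Files/raw/orders.csv")
)

# Basic preparation: drop incomplete rows.
clean_df = df.dropna()

# Save as a managed Delta table in the Lakehouse Tables area.
clean_df.write.format("delta").mode("overwrite").saveAsTable("orders")

# The same table can then be queried with SQL (or through the SQL endpoint).
spark.sql("SELECT COUNT(*) AS order_count FROM orders").show()
```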

 

 

4. Data Warehouse

In simple terms, it’s a high-performance database system which can work with huge amounts of data. It’s similar to a Synapse data warehouse, and we can use SQL commands for both read and write operations, unlike the SQL endpoint of a Lakehouse where we can only read data. In a Fabric warehouse we can create tables, schemas, views, functions and stored procedures. We can also create measures and a data model, which is useful when we build reports on top of the warehouse. To get data into the warehouse we can use T-SQL, Data pipelines and Dataflows Gen2.
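Because the warehouse exposes a SQL connection string, it can be queried like any SQL Server-compatible endpoint. The sketch below uses pyodbc as an illustration; the server and database names and the dbo.sales table are hypothetical placeholders, and it assumes you copy the real connection string from the warehouse settings and have the ODBC driver installed.

```python
import pyodbc

# Hypothetical connection details: copy the real SQL connection string
# from the warehouse settings in the Fabric portal.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-warehouse-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    cursor = conn.cursor()
    # Both reads and writes are allowed, unlike the Lakehouse SQL endpoint.
    cursor.execute("SELECT TOP 10 * FROM dbo.sales ORDER BY order_date DESC")
    for row in cursor.fetchall():
        print(row)
```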

 

5. Real-Time Analytics

It includes the tools required for extracting, transforming, loading and visualizing streaming data; some important components are -

 

a)  KQL Database – It’s like a SQL database, but for streaming data. It has its own query language called KQL (Kusto Query Language); a small Python sketch of querying one follows this list.

b)  KQL Query Set – It’s a development studio for saving, managing, exporting and sharing KQL queries.

c)  Event Stream – It’s a hub for streaming data, where we can get data from multiple sources, transform it and save it into a destination like a KQL database, a Lakehouse, etc.
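As a small illustration of the query side, a KQL database can be queried from Python with the azure-kusto-data package. The cluster URI, database name and events table below are hypothetical placeholders; the sketch assumes you copy the real query URI from the KQL database details page and are signed in with the Azure CLI.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical query URI and database name: copy the real ones from
# the KQL database details in the Fabric portal.
cluster = "https://your-cluster.kusto.fabric.microsoft.com"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# A simple KQL query against a hypothetical 'events' table:
# count events per device over the last hour.
query = """
events
| where ingestion_time() > ago(1h)
| summarize event_count = count() by device_id
| top 10 by event_count desc
"""

response = client.execute("StreamingDB", query)
for row in response.primary_results[0]:
    print(row["device_id"], row["event_count"])
```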

 

6. Data Science

This experience includes the tools for building ML models. Important components of this experience are –


a) ML model – This is the place where we define our algorithm, train the model, and use it to predict outcomes and detect anomalies in the data.

b) Experiment – It’s used to create, run, and track the development of multiple models to validate our hypothesis. We can compare different runs of a model using visualizations, which helps us identify which one is better (see the MLflow sketch after this list).

c) Notebook – It’s a coding environment which we can use to build ML models. It has some cool features like Data Wrangler, which is used for data cleaning. It offers a set of operations we might need as part of pre-processing, and with just a few clicks it generates the code for you.
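Fabric experiments are backed by MLflow, so a training cell in a notebook might look roughly like the sketch below. The experiment name, the churn.csv file and its columns are hypothetical placeholders, and the /lakehouse/default/ path assumes the notebook has a default Lakehouse attached.

```python
import mlflow
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical training data stored in the default Lakehouse Files area.
df = pd.read_csv("/lakehouse/default/Files/ml/churn.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["churned"]), df["churned"], test_size=0.2, random_state=42
)

# Logging a run with MLflow attaches it to the experiment item
# (the experiment name here is a placeholder).
mlflow.set_experiment("churn-prediction")

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")  # saved as a model artifact
```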

 

7. Data Activator

It is a no-code experience in Microsoft Fabric for automatically taking actions when patterns are detected or conditions are met in changing data. It monitors data in Power BI reports and Eventstream items and takes appropriate action, such as alerting users or kicking off Power Automate workflows, when defined conditions or thresholds are met.

 

Two other important components of Fabric are OneLake and Purview.


OneLake is a single, unified, logical data lake for your whole organization. Like OneDrive, OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data. It works on the principle of “one copy of data” to provide a single source of truth.
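Because OneLake exposes the same APIs as Azure Data Lake Storage Gen2, existing tools can point at it directly. The sketch below uses the azure-storage-file-datalake and azure-identity packages as an illustration; the workspace ("MyWorkspace") and lakehouse ("Sales.Lakehouse") names are hypothetical placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake is reached through its ADLS Gen2-compatible endpoint;
# the workspace and lakehouse names below are placeholders.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# In OneLake, the file system corresponds to the Fabric workspace.
file_system = service.get_file_system_client("MyWorkspace")

# List files under the Lakehouse Files area.
for path in file_system.get_paths(path="Sales.Lakehouse/Files"):
    print(path.name)
```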


Microsoft Purview provides a unified data governance solution to help manage and govern your on-premises, multi-cloud, and software as a service (SaaS) data.


Conclusion

To summarise, Fabric is a one-stop solution for all your data projects, built with state-of-the-art technologies. Experiences make it easy for developers and consumers to find the services they need. We can ingest batch or streaming data, transform it, store it, and build reports and machine learning models on top of it. All data is stored in OneLake, so accessing it from any service is very easy.


We will talk in detail about all of this in upcoming blogs. That’s all for this one, see you in the next!
