Microsoft Fabric is the Future

The future of data analytics, providing a full technology stack

These days many companies work with data from a plethora of sources and locations. They combine on-premises storage with data stored in the cloud, connect to services such as Azure SQL and AWS, and handle different types of data: structured, unstructured (images, video) and semi-structured (CSV and JSON files). There are data governance issues to consider as we navigate all of this. Ensuring that data is secure, and knowing instantly who has access to what data, is now a necessity. Breaking down data silos is a crucial step for any organization aiming to maximize the value of its data.

 

Over the past few years there’s been a massive evolution in the way we approach data storage, starting from the data warehouse. Data warehouse methodologies for storing structured data are no longer enough, or they may be sufficient for now but don’t allow the flexibility and scalability that we need. We’ve seen the inception of the data lake, the data lakehouse and the data mesh.

 

Ultimately, whatever approach we take in an organization, the main goal is to add business value.

Every company has its own unique way of working with its different data sources, meaning that the data architecture needs to be flexible enough to support its needs. This means that we may need to access different tools to support our organization.   

 

For one set of data, we might need a data warehouse, whereas at other times we might need to consider storing data in a data lake. Or we might need a big data solution like Apache Spark to process large volumes of data quickly. Having tools that allow us to analyze our data, from visualization to data science, is now the norm. The challenge is that these tools are often limited in the way they interact with each other, forcing us to spend a lot of time connecting and stitching them together. This is where Microsoft Fabric comes in. It’s a platform that provides these different experiences under one roof and, more importantly, ensures that they can interact with each other seamlessly.

 

Data Analytics technology stack 

 

For simplicity, the technology that powers data analytics in a company can be organized into three layers. They are stacked on top of each other, and each layer relies on the one below to function properly. Real-world data infrastructures may be more complex and contain more layers.

Let’s take a bird’s-eye view of the fundamental features that you need to know about for each layer of the stack:

 

  • The bottom layer is the physical infrastructure. Companies can build and maintain their own physical infrastructure or rely on cloud providers like Azure, from whom they rent only the resources they require.
  • The middle layer is the data platform. Technology at this level implements the data infrastructure (data architecture). Data from different locations is virtually unified into a simpler consolidated view, such as a data warehouse or lakehouse.
  • The top layer is made up of applications. At this level, data analytics is delivered through user-facing apps such as Power BI. These applications leverage both the consolidated data and the horsepower provided by the underlying platform to serve users such as business analysts and data analysts, who can explore data and identify insights. Other applications enable expert users such as data scientists to take it to the next level by building predictions.

 

                                                                

 

 

Microsoft Fabric – A complete data analytics toolbox 

 

Microsoft Fabric is a cloud-based platform where you can store, prepare and analyze your data all in one spot.

It is a complete platform for all analytics workloads: Microsoft has integrated services for data movement, transformation, storage and visualization. It provides different experiences that include:

  • Data engineering, where you can use Apache Spark and write code in PySpark and Spark SQL.
  • Data warehouse, which leverages T-SQL.
  • Real-time analytics, which involves storing data in a KQL database and querying it with Kusto Query Language (KQL).
  • Data Factory, which automates data ingestion and transformation as a sequence of activities. It allows you to combine low-code Power Query dataflows (Dataflow Gen2) with the scale and power of Data Factory pipelines.
  • Power BI, which allows you to create the usual suite of reports by connecting to datasets created via the first three experiences: data engineering, data warehouse and real-time analytics.
  • Data science, which enables data scientists to build machine learning models seamlessly within the Fabric environment. It integrates with Azure Machine Learning to provide built-in experiment tracking and model development. It also allows data scientists to enrich organizational data with predictions, and allows business intelligence users to integrate these predictions into their reports. Data scientists can reuse existing data, so they don’t have to copy data from somewhere else and rewire things.

 

 

You can use as many or as few of these as is relevant to your organization, ultimately bringing all your data analytics activities together in a single platform.

 

 

Microsoft Fabric empowers every user in an organization 

 

If you’re coming from a Power BI background, the Fabric UI offers a familiar experience. You will already be familiar with the concepts of workspaces and capacities, and with how to create items inside the service itself. Fabric is the most significant expansion of the Power BI and analytics platform to date.

 

For those who are coming from an Excel background and are familiar with creating PivotTables, building macros and using Power Query and Power Pivot, Microsoft Fabric’s Dataflow Gen2 provides a low-code Power Query experience. Power Query allows you to connect to over 200 data sources, from files and on-premises sources to cloud sources and platforms like Snowflake, and bring the data together. The real advantage comes from combining data from different places, which gives you context.
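
The idea of combining sources to gain context can be illustrated outside Power Query. The following is a minimal Python sketch, not Dataflow Gen2 code: it joins a CSV source to a JSON source on a shared key and then summarizes the result. The sample data and the function names `combine_sources` and `total_by_region` are hypothetical and exist only for this illustration.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two separate sources:
# a CSV export of orders and a JSON feed of customers.
ORDERS_CSV = """order_id,customer_id,amount
1001,C1,250.00
1002,C2,99.50
1003,C1,40.25
"""

CUSTOMERS_JSON = """[
  {"customer_id": "C1", "name": "Contoso", "region": "EMEA"},
  {"customer_id": "C2", "name": "Fabrikam", "region": "APAC"}
]"""


def combine_sources(orders_csv: str, customers_json: str) -> list[dict]:
    """Join order rows (CSV) to customer records (JSON) on customer_id."""
    # Index customers by key so each order can be enriched with one lookup.
    customers = {c["customer_id"]: c for c in json.loads(customers_json)}
    combined = []
    for row in csv.DictReader(io.StringIO(orders_csv)):
        customer = customers.get(row["customer_id"], {})
        combined.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "customer": customer.get("name", "unknown"),
            "region": customer.get("region", "unknown"),
        })
    return combined


def total_by_region(rows: list[dict]) -> dict[str, float]:
    """Sum order amounts per region, a question neither source answers alone."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    rows = combine_sources(ORDERS_CSV, CUSTOMERS_JSON)
    print(total_by_region(rows))
```

The orders file alone has no regions and the customer feed alone has no amounts; only the combined view can answer "how much did each region order?", which is the kind of context the paragraph above refers to.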

    

For those who are coming from Data Factory or Azure Synapse Analytics, this will be a new experience, because Fabric is a SaaS service. You don’t have to provision these services; they’re made available to you. Before Fabric, you had to wire up all these different services yourself, and it was complicated to get going. You may have been an expert in Azure Synapse Analytics, data warehousing or databases, but you didn’t necessarily know how to use Data Factory or set up data pipelines.

 

 

Or you may be coming from data science, where it was really hard to get your data: What do I have access to? Where is the data located? Who do I need to talk to in order to get my data?

Microsoft Fabric’s OneLake – One Storage

 

 

The data lake, or lakehouse, is the central foundation on which all of this is built, and in Fabric it is known as OneLake. Data in OneLake is accessed through data items, which include a Data Warehouse, a Lakehouse, a KQL Database and a Power BI dataset (shown as a default dataset in Microsoft Fabric). OneLake is built into the Fabric service, whose purpose is to provide a single integrated environment for data professionals of all types, whether you work mainly with Power BI or are primarily a data engineer or a data scientist.

Think of it as OneDrive for data: a one-stop shop for storing all your files in folders. We don’t have to move the data. With shortcuts, we can leverage existing data wherever it already lives, whether in Azure Data Lake Storage, in Amazon’s cloud or with another cloud provider. Creating a shortcut does not move the data; it stays where it was, and you can still create amazing reports in Power BI or run data science operations on top of it.
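
To make the "one storage" idea concrete, here is a small Python sketch of how a single addressing scheme can point at any file held in OneLake. It assumes the ABFS-style URI pattern `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>`; check this against the current OneLake documentation before relying on it. The helper name `onelake_path` and the workspace, item and file names are hypothetical.

```python
# Assumed OneLake host name; verify against current Microsoft documentation.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_path(workspace: str, item: str, item_type: str, relative_path: str = "") -> str:
    """Build an ABFS-style URI for a file or folder inside a Fabric item.

    This is a hypothetical helper for illustration, not part of any Fabric SDK.
    """
    # Strip a leading slash so the URI never contains a doubled separator.
    cleaned = relative_path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{cleaned}"


if __name__ == "__main__":
    # Hypothetical workspace "Sales" containing a Lakehouse named "Orders".
    print(onelake_path("Sales", "Orders", "Lakehouse", "Files/raw/2024.csv"))
```

Because every item lives under the same host, a Spark notebook, a pipeline or a report can refer to the same file by the same address instead of each tool keeping its own copy.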

The platform uses Apache Spark and SQL compute engines for data processing, and it scales automatically as usage dictates. It supports data warehouse capabilities, ACID transactions and SQL access via SQL endpoints.

 

To summarize, Microsoft Fabric is an innovative and powerful platform that aims to revolutionize the way we build and manage our data, enabling us to store it and then prepare it quickly and efficiently for analysis, so that we can help our organizations succeed. It’s designed to tackle the complexities and challenges that arise when dealing with large-scale applications and disparate datasets.

Fabric unites data estates, strengthens data governance and nurtures a collaborative ecosystem. By dismantling data silos, it makes data readily accessible and streamlines the extraction of comprehensive insights.
