Azure Data Fundamentals Certification (DP-900) - Full Course to PASS the Exam

freeCodeCamp.org
Prepare for the Azure Data Fundamentals Certification and pass! ✏️ Course developed by Andrew Brown...
Video Transcript:
hey this is andrew brown your cloud instructor at exam pro and i'm bringing you another complete study course and this time it's the azure data fundamentals certification made available to you on freecodecamp so this course is designed to help you pass the exam and achieve a microsoft-issued certification and the way we're going to do that is by going through lots of lecture content getting some hands-on experience with follow-alongs and for the day of the exam i have these really great cheat sheets that are going to help you pass for sure so the great thing is
at the end of it you'll be able to get that certification and show on your resume or linkedin that you have the azure knowledge so you can go get that data job or that promotion you've been looking for just to introduce myself i was previously the cto of multiple edtech companies with 15 years of experience five of those specializing in the cloud i'm an aws community hero and i've published many many free cloud certification courses and i love star trek and coconut water i just want to take a moment here to tell you that this video cloud
certification course is made possible by viewers like you and i appreciate your support and thank you if you want to help support more free cloud courses just like this one the best way is to buy the extra study materials at exampro.co/dp-900 to get study notes flash cards quizlets downloadable cheat sheets and practice exams where you can ask questions and get some support from our cloud engineers and just so you know if you want to keep up to date with upcoming courses you can follow me on twitter at andrewbrown and
what you can do is once you do that you can tell me you passed the exam you can tell me what you'd like to see next because a lot of times the next course i make is based on the feedback i get from you so let's jump into the course now hey this is andrew brown from exam pro and we are at the start of our journey asking the most important question first which is what is the dp-900 so the azure data fundamentals certification is for those seeking a data-related role such
as data analyst data engineer or data scientist and the certification will demonstrate a person can define and understand core data concepts hadoop workloads apache spark workloads mssql databases nosql databases data lakes data warehouses elt and big data analytics and more the certification is generally referred to by its course code the dp-900 and it's the natural path to the azure data engineer or the azure data analyst certifications this is an easy course to pass and great for those new to cloud or data-related technology so let's look at our big old road map here and what i'm
going to do is pull out my laser pointer so you can see where i am and we'll get some lines in here because i want to show you some of the paths you can take with the certification so we're over here with the dp-900 and this is at the fundamentals level right and so a lot of times what people will do is take the az-900 first if they're 100 percent new to cloud so i see a lot of people start there and if you've never used azure before
it's a great starting point and then they'll move on to the dp-900 or sometimes they'll move on to the ai-900 and then onto the dp-900 or they'll take the dp-900 and then the ai fundamentals but the reason you want to take this data fundamentals certification is primarily because you very likely want to take the data engineer certification or the data analyst certification which is at the associate track so the data engineer is basically the dp-900 but actually knowing how to implement everything and then the data analyst is really
focused on power bi so just knowing how to use power bi to its maximum extent okay now a lot of people that are going for the data scientist or ai engineer track will take the dp-900 because you do need foundational knowledge about data to understand these roles so if you are going for data scientist you probably want to add the dp-900 to your track but if you've taken the az-900 and the az-104 you might want to skip over this it's just up to you but yeah
there's a lot of ways that you can go about this okay and we'll just move on here so how long to study to pass the dp-900 if you have one year of experience with azure you're looking at five hours of study if you've passed the az-900 you're looking at 10 hours if you're new to cloud you're looking at 15 hours of study for new people i'm saying 30 minutes a day for 14 days the course content itself is not very long but the thing is you have to factor in that you have to
do practice exams and you also have to put that knowledge into practice by using the console so even though the course content's not long there's additional time you have to put in where would you take this exam well you can do it at an in-person test center or online from the convenience of your own home so there are two options you've got psi and pearson vue and they both do in-person and online and just so you know if you've ever heard the term proctor
a lot of times we talk about online exams being proctored exams because there's a person or supervisor that's watching you take the exam online okay so what does it take to pass this exam well you've got to watch the lecture videos you've got to do hands-ons and follow-alongs and it helps to do paid practice exams that simulate the real exam this is a pretty easy certification so you could probably get away with just those okay so what i did was i just googled dp-900 exam guide and so i made it to the microsoft site if
you scroll on down you're looking for that skills measured download that pdf open it up and it will be over here now it's very common to see this red text here azure loves to update their exams even once a month they'll do this to me and people ask me are your exams out of date i'm like no they're just making minor changes if they do make major changes what they'll do is actually release a new version they'll call it the dp-901 that's when you should be concerned about changes but yeah they do it
frequently so what you'll do is if you see red text you've got to scroll down to the real section because this is the old one and this is the new one where they make minor changes okay and what we'll do is work our way down here and take a look so describe core data concepts so batch and streaming relational data we have data analytics concepts so visualizations bi types of charts like bar charts they're talking specifically about power bi because there's a lot of different kinds of visualizations describe analytics techniques
so descriptive diagnostic predictive prescriptive cognitive most places don't describe cognitive so they add that additional one there elt and etl so elt is more common for cloud so that's the one you really want to understand describe concepts of data processing on to how to work with relational data so describe what a relational workload is describe the structures within a database if you've ever worked with any kind of database you already know them tables indexes views columns et cetera we have describe relational data services so they do paas iaas and saas they're specifically talking about
in that comparison you'll see those three like in the az-900 though i think they might have removed them recently but they're specifically talking about the azure sql family so down below here you have azure sql and underneath it has a bunch of variants like sql database sql managed instance and sql on virtual machines and these sit in paas and iaas and you have to know that okay describe azure synapse analytics describe azure database for postgresql mariadb and mysql those are the open source sql databases so then we have relational data
so provisioning and deployment of relational data deployment with the portal azure resource manager templates powershell cli you know i don't ever see these on the exam but they have this in here okay identify security components now they say firewall and they're actually talking about when you have a database because there's azure firewall but then there's a server firewall built into azure sql and that's what they're talking about there authentication like how to connect because there's a few different ways you can do that connectivity from on-prem to azure
vnets etc identify query tools so azure data studio sql server management studio the sqlcmd utility and things like that describe query techniques for sql so compare ddl with dml there are actually a lot more types of data languages for sql so we do them all just because it's the proper way to do it but they only care about these two query relational data in sql database azure database for postgresql azure database for mysql we'll go on down to this section here non-relational data workloads so describe non-relational data describe
non-relational and nosql data recommend the correct data store determine when to use non-relational data and they're all talking about cosmos db because non-relational data is mostly cosmos db and cosmos db has a bunch of different nosql engines so that's going to help you understand graph and document and key value stores things like that describe non-relational data so we have table blob files cosmos db identify basic management tasks for non-relational data so provisioning and deployment of non-relational data services describe methods of deployment in the azure portal et cetera et cetera again i don't see these a lot on
the exam so i don't know why this is in here identify data security components so it's the same thing as before just for non-relational data basic connectivity issues vnets etc identify management tools for non-relational data describe analytical workloads so transactional data the difference between transactional and analytical and we're talking about oltp and olap okay the difference between batch and real time warehouses data warehouse solutions describe modern data warehouses so here we're talking about databricks hadoop systems synapse which is kind of like a data lakehouse and then actually azure data lake and its
storage medium down below we have data ingestion so loading data azure data factory hdinsight databricks etc and then down below we have a whole section on power bi what i'm surprised about is they don't have much on azure stream analytics and event hubs because a lot of these have to consume from there so that's something i think should be on the exam they don't have it in here but yeah it's not a super hard exam it's mostly just describe describe identify see so you're all going to be in
great shape and i hope that helps you out and we're on to the actual course now hey this is andrew brown from exam pro and we are taking a quick look at all the core data-related azure services that we are likely to encounter through this course so let's get to it starting with azure storage accounts and this is an umbrella service for various storage types such as tables files and blobs we have azure blob storage and this is a data store that stores things as objects instead of files and the advantage
here is that you get distributed storage these objects can span multiple machines for unstructured data you have azure tables which is a key value nosql data store more like a database but it's under azure storage accounts and it's intended for simpler projects you have azure files and this is a managed file share for nfs or smb so if you need a file share or file system that you need to mount to multiple virtual machines or workstations this is what you would use you have azure storage explorer this is a standalone application you download
to your windows linux or mac machine that easily allows you to explore the various services above then you have azure synapse analytics this is a data warehouse and unified analytics platform the service used to be called azure sql data warehouse but they added analytics on top of it kind of making it into a lakehouse service and so that's what it is now we have cosmos db this is a fully managed nosql database service that can host various nosql engines such as azure tables documents key value and graph when you use cosmos db it's going to have
a core mode and that pretty much is its documents engine so a lot of times when we talk about cosmos db we just think of it as a documents database but it can actually have a variety underneath you have azure data lake store generation two we won't talk about gen 1 because it's just not really in use anymore but this is a centralized data repository for big data blob storage designed for vast amounts of data it actually is just azure blob storage with an additional layer of management you have azure data lake analytics this is
big data as a service you can write u-sql to return data from your azure data lake then you have azure data box which isn't really covered on the exam but i'm including it here because i think it's a great addition so you can import and export terabytes of data via a hard drive you mail into the azure data center onto our next page here we have sql server on azure virtual machines this is when you need a sql server where you're migrating an existing sql server from your on-premise data center onto azure but you can't afford to make any
changes so you're literally taking the vm and lifting and then shifting it onto azure but you get to have access to the virtual machine underneath so you can control the os access layer and also if you already have an existing license it's a great solution for that as well if you are doing a lift and shift but you don't need to manage the virtual machine and you want azure to do all the work for you you have sql managed instance then you have azure sql database which is the fully managed mssql database then you have azure
database for mariadb postgresql and mysql you have azure cache for redis now this is an in-memory data store for returning data extremely fast but it's also extremely volatile and this isn't covered on the exam but i like to include it because i think it's just one of the data services that's important you have microsoft office 365 sharepoint not really covered on the exam but you will hear it mentioned throughout the course content and i think that if you haven't had exposure to it you should know what it is it is a shared file system
for organizations the company owns all the files and applies fine-grained role-based access controls you have azure databricks this is a third-party provider partnered with azure specializing in apache spark to provide very fast etl jobs as well as ml and streaming you have microsoft power bi this is a business intelligence tool used to create dashboards and interactive reports to empower business decisions we have hdinsight this is a fully managed hadoop system that can run many open source big data engines for doing data transformations for streaming etl and elt we have azure data studio this is an
ide that looks very much like visual studio code but designed around data-related tasks across platforms similar to ssms but for broader data workloads you have azure data factory a managed etl and elt pipeline builder where you can easily build transformation pipelines via a web interface within azure and then you have sql server integration services ssis it's a standalone windows app to prepare data for sql workloads via transformation pipelines there's probably a bunch of other little services or tools that we don't have in this list but don't worry we'll cover them throughout the course just remember these ones that
we went over here today hey this is andrew brown from exam pro and we're taking a look at the types of cloud computing for azure data-related services starting at the top of our pyramid is software as a service and it's a product that is run and managed by a service provider so you do not worry about how the service is maintained it just works and remains available and this is specifically designed for customers so it's not particularly azure services but it'll be microsoft-based services like power bi or the office 365 suite
that are going to be software as a service going down to platform as a service this focuses on the deployment and management of your apps so you do not worry about provisioning and configuring or understanding the hardware or os layer and this is specifically for developers so we would put hdinsight azure sql cosmos db managed sql there all right and at the bottom we have infrastructure as a service these are the basic building blocks for cloud it and it provides access to networking features computers and data storage space you do not worry about the it staff data centers and hardware
and underneath here this would be for admins but we'd have azure disks virtual machines sql vms and honestly when you look at aws and azure these kinds of categories are defined a little bit differently so like managed sql i would probably put that in infrastructure as a service but azure says that it goes into the mid tier i really want to pull up a particular document here that i think is important because this is all about the azure sql family and they're specifically categorizing these in particular so you
go down below here and when we're looking i have them on here but when you look here i have sql vm down below so that would be considered infrastructure as a service you have managed sql which they put in the middle but they categorize it as platform as a service and then you have it's covered up here we have azure sql database which is platform as a service i'm showing you this because they might ask you this question on the exam and so i just wanted to point that out to you there but there
you go okay so let's take a quick look at three azure data roles that azure wants you to know about specifically related to data services when i say roles here i don't mean azure permissions i actually mean jobs that people would do within azure so the first one is database administrator this is somebody that would configure and maintain a database such as azure data services or sql servers and they would be responsible for database management security like granting users access backups and monitoring performance
and common tools that they would use would be azure data studio sql server management studio the azure portal and the cli the next role would be data engineer and that would be to design and implement data tasks related to the transfer and storage of big data responsibilities here would be data pipelines processing data ingestion storage and preparing data for analytics processing and common tools that they would use would be azure synapse studio sql and the azure cli and the last role here is data analyst and this is someone who analyzes business data to reveal important information
so you have providing insights into data visual reporting modeling data for analysis and combining data for visualization and analysis common tools here are power bi desktop the power bi portal the power bi service and power bi report builder so i just want you to know that there are definitely a lot more roles than just these three here but this kind of helps you narrow down what this entire dp-900 is focused on which are these three roles here but what we'll do is jump into these common tools and talk about them a little bit more in detail
next here okay okay so we're taking a look here at database administrator common tools the first being azure data studio this allows you to connect to azure sql data warehouse postgresql sql servers and big data clusters on premise when i say azure sql data warehouse it must be azure synapse analytics now i probably just wrote that incorrectly there but it has various libraries and extensions along with automation tools a graphical interface for managing on-premise and cloud-based data services it runs on windows mac and linux and is a possible replacement for ssms but still lacks some of those features if you
launch the service it looks a lot like visual studio code because it probably is but it's specifically for data-related tasks so if you're used to visual studio code you're going to be at home with the service you have sql server management studio ssms and it's automation tooling for running sql commands or common database operations it has a graphical interface for managing on-premise and cloud-based data services but it only runs on windows so if you're on mac or linux you're going to be using azure data studio and if you're on windows you might just
have both of these installed because there are just some things you can do in ssms that are a lot easier than in azure data studio it's more mature than azure data studio so the features are going to be a lot richer in particular for sql you have the azure portal and cli so here you can manage sql database configuration so you can create delete or resize the number of cores you can manage and provision other azure data services and automate the creating updating or modifying of resources via
azure resource manager templates which is infrastructure as code so those are the three major ones that a database administrator is going to be working with now let's take a look at data engineering common tools so at the top we have azure synapse studio so you know azure synapse analytics when you click into it there you launch a studio and this allows you to manage things like your data factories your warehouses sql pools spark pools things like that you're going to have to really really know sql there's t-sql u-sql synapse sql there's all sorts
of sqls within azure so it's definitely something you want to learn for the azure cli you'll have to be able to use it to then execute sql commands because once you connect to a sql server via the cli you're just going to be writing sql from there i added these they weren't in the common tools list prior but i just added them now because i thought they were useful so hdinsight which you would have for streaming data via apache kafka or apache spark or applying etl jobs via hive pig and apache spark azure data
bricks also because here you could be creating an apache spark cluster and using that to do etl or streaming jobs to your data warehouses and your data lakes and of course you'd be working with blob storage and data lakes as well so they should be on the list here but there you go all right taking a look here at data analyst common tools we have power bi desktop this is a standalone application for data visualization you can do data modeling here connect to many data sources and create interactive reports then you have power
bi portal also known as the power bi service and really this is just intended for creating interactive dashboards you can definitely do other things here but this is what i know it for then you have power bi report builder this is another standalone application and this allows you to create paginated reports which are just printable reports definitely there are more tools than just these three for data analysts but this is what azure wants you to know so there you go hey this is andrew brown from exam pro and
we are looking at the data overview so the idea here is that we're going to be covering a lot of fundamental not necessarily azure-specific data-related knowledge that you need to know to really understand how to use azure data-related services so i'll just give you a quick overview here and then we'll dive deeper into all these things so the first thing is data that's units of information you have data documents these are types of abstract groupings of data you have data sets these are unstructured logical groupings of data when you
structure your data now it's called structured data and then you have data types these are single units of data that are intended to be used in a particular way then you have a bunch of loose concepts so you have batch and streaming so how do we move our data around we have relational and non-relational databases or data so how do we access query and search our data you have data modeling how do we prepare and design our data schema versus schemaless how do we structure our data for search data integrity and data corruption how
do we trust our data normalization and denormalization how do we trade quality versus speed and i'm sure we cover a lot more than just this list i just didn't feel like being exhaustive here and putting every little thing here there are so many things that have the word data in them but let's jump into it so the first question we should be asking ourselves is what is data so data is units of information that could be in the form of numbers text machine code images videos audio or even in a physical form like handwriting maybe if
you're in the future it's crystals i don't know and so just some images here or examples of graphics so here we have an image here is a bunch of binary code or machine code we have a book we have audio so we have an audio spectrum here and then you have mathematical formulas so i'm sure at this point you know what data is but just in case you need to broaden your thoughts of what data is it really could be everything including physical stuff so what
is a data document a data document defines the collective form in which data exists so common types of data documents would be data sets which is a logical grouping of data databases which is structured data that can be quickly accessed and searched data stores this is unstructured or semi-structured data for housing data data warehouses structured or semi-structured data for creating reports and analytics notebooks data that is arranged in pages and designed for easy consumption so just to give you some examples if we're looking at data sets we might talk about the mnist data set for
azure sql that would be a database for a data store such as a data lake we have azure data lake for a data warehouse we have azure synapse analytics and for notebooks we might talk about jupyter notebooks but this could also include an actual handwritten notebook so there you go all right so what is a data set well a data set is a logical grouping of units of data that are generally closely related or share the same data structure and so i say data structure there but i just want you to know
that just because something has a data structure doesn't mean it's structured data it can be semi-structured like json objects or xml files but generally data sets are unstructured or structured there are publicly available data sets that are used for learning statistics data analytics and machine learning the most popular one being the mnist database i can't tell you how many times i've seen this data set but it's images of handwritten digits used to test classification clustering and image processing algorithms commonly used when learning how to build computer vision ml models to translate handwritten
text into digital text and here's an example of that data set there another very popular data set is the common objects in context the coco data set i believe microsoft had a hand in this one and it's a data set which contains many common images in a json file the coco format that identifies objects or segments within an image so there again it has json files that means it's semi-structured not necessarily structured data here is an example of the data set you can see that there are images and they have borders drawn around things
that they're trying to identify in the images such as objects and segmentation recognition in context superpixel stuff segmentation no idea what that means 329 000 images and it has a lot of labels and a bunch of other things in this data set another interesting one would be the imdb review data set which has 25 000 highly polar movie reviews meaning people really liked the movie or they really disliked it i pulled out an example here of pluto nash which at the time was highly liked and disliked there was a huge split on
this movie mostly i think people disliked it but apparently it's kind of split this is great for customer sentiment analysis again you'll probably be using ml for this but the idea is to say did people like the movie did they hate it were they sad about it that's kind of like customer sentiment right how did they feel some other data sets we have the free music archive this is a data set of music tracks so you have a hundred thousand tracks across 163 genres you have
librispeech i guess that's library speech a data set of a thousand hours of english speech you'd think they'd use an english word to describe that but no there are many more data sets online some are paid some you have to extract via an api some you have to scrape the data yourself but just taking a big look here at the full list we've got crunchbase glassdoor fbi google trends data hub world health organization it's across the board right so there are a lot of places you can get data
online a lot of times you are just creating your own data sets but it's good to know that there's a lot of stuff out there so what is a data type it's a single unit of data that tells the compiler or interpreter so a computer program how data is intended to be used and the variety of data types will greatly vary based on the computer program a better way of thinking of it instead of saying computer program is to just say programming language because that's where you're going to be encountering data types so let's take a
look at the most common data types the first being numeric data types these are data types involving mathematical numbers the most common being integer which is a whole number that could be negative or positive a lot of programming languages will have a variety of these that have different ranges like int32 int64 what have you then you have floats sometimes also known as decimals this is a number that has a decimal so 1.5 or 0.0 and it can be a negative number as well here is an example of an int and a float in python we'll take
a look at text data types this is a data type that contains readable or non-readable characters so there are characters a character is a single letter so it could be a to z it could be a digit it could be a blank space punctuation or special characters then you have a string and a string is a sequence of characters that can be words sentences or paragraphs they don't necessarily have to be words but that's usually what you're using them for so here's an example of a character and a string then you have composite
data types and these contain cells of data that can be accessed via an index or a key so for example we have an array that is a group of elements that contain the same data type and can be accessed via the index and position you have a hash commonly known as a dictionary and if you're using python you probably know it as a dictionary i know it as a hash because i like ruby so that's how i know it and it's a group of elements where a key can be used to retrieve a value and
the thing is that composites overlap with data structures so when we talk about data structures you might say hey is an array or a hash a data structure and yes it is but it's also a data type it just depends on the programming language and the constraints around those okay so i know those get a little bit confusing i just wanted to point that out but here's an example again this is python so we have an array and down below we have a python dictionary and if you use json or
javascript a json object basically is a hash we've got one more here so we have binary data types these are represented by a series of bits or bytes which are either zero or one so off and on so here's an example in python of how to set up a byte for boolean values we have true or false some languages represent booleans as a zero or one so a lot of times when you're using mysql true and false i think are zero and one in there sometimes it's a
t or an f when you're using postgresql it's going to be a t or an f and sometimes it's just true or false so in python it's capital t true and actually i think they have a lowercase true which is a different kind of object it's a bit confusing but i'm just saying there are some variants there then you have enumeration data types sometimes known as enums and these are a group of constants unchangeable variables so for example diamonds spades hearts and clubs they're all related because they're the card suits so
the idea here is the data type could also be a data structure again it varies by language just like composite types but here on the right-hand side we have a shake and it's wrapped in a class and below this again this is python that's how you do it in python and so the shake could be vanilla chocolate cookies and mint a lot of times the enums will map to an integer value or something but that's not always the case but yeah that's that type there and there you go those are the common data types
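to tie all of those together here's a minimal python sketch of my own it's not from the course slides but it runs as-is and shows each of the common data types we just covered

```python
# common data types in python (editor's illustration, not course code)
from enum import Enum

count = 42            # numeric: integer (whole number, positive or negative)
price = 1.5           # numeric: float (a number with a decimal)
letter = "a"          # text: a single character (python has no separate char type)
sentence = "hello world"      # text: string, a sequence of characters
flags = [True, False, True]   # composite: array/list, accessed by index position
person = {"name": "andrew"}   # composite: hash/dictionary, accessed by key
raw = bytes([0, 1, 255])      # binary: a series of bytes (each 0-255)
passed = True                 # boolean: capital-t True in python

class Suit(Enum):     # enumeration: a group of related, unchangeable constants
    DIAMONDS = 1      # enums often map to an integer value underneath
    SPADES = 2
    HEARTS = 3
    CLUBS = 4

print(type(count), type(price), type(sentence))
print(flags[0], person["name"], Suit.HEARTS.value)
```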
okay so let's take a look at schema versus schemaless so what is a schema a schema in terms of a database is a formal language which describes the structure of data a blueprint of a database and a schema can define many different data structures that serve different purposes of a database so different data structures in a relational database could be things like tables fields relationships views indexes packages procedures functions xml schemas queues triggers types sequences materialized views synonyms can't say that word database links and
directories it's going to highly vary based on the database that you're using and i'm just going to show you an example of a schema so here is actually part of the schema for the exampro app and so this is a ruby on rails schema that defines the structure for a relational database and it's written in a dsl in ruby but the thing is this is going to highly vary based on again what you're using but just notice that you can see things like creating a table creating indexes creating columns for the database things like
that and adding extensions schemaless is kind of still a schema but the idea here is the primary cell of a database can accept many types so just going back here for a moment notice here that we have very particular data types like integer string and stuff like that the idea with schemaless is that that data type is a lot more flexible and it allows you to forego the upfront data modeling that you normally would have to do which is a lot
of work and so that's one of the advantages of schemaless common schemaless databases would be key value document column and then the subcategory of wide column and graph and not a lot of information here but we will describe them in more detail when we talk about nosql databases okay
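to make the schema versus schemaless idea concrete here's a rough sketch of my own using sqlite for the schema side and plain python dictionaries standing in for a schemaless document store like cosmos db the table and field names are just illustrative

```python
# schema vs schemaless (editor's sketch, names are hypothetical)
import sqlite3

conn = sqlite3.connect(":memory:")

# schema: structure and data types are declared up front as a blueprint
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        age   INTEGER
    )
""")
conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("andrew", 35))

# schemaless: each document can carry whatever fields it likes, so the
# upfront data modeling is skipped entirely
documents = [
    {"name": "andrew", "age": 35},
    {"name": "bob", "hobbies": ["star trek", "coconut water"]},  # different shape
]
for doc in documents:
    print(doc.get("name"), doc.get("hobbies", "no hobbies recorded"))
```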
all right let's talk about query and querying because those are terms that you're going to need to know because you're going to be doing quite a bit of it if you're going to have a career as anything in data right like a data analyst so what is a query a query is a request for data results also known as reads or to perform operations such as inserting updating or deleting also known as writes and a query can perform maintenance operations on the data and it's not always restricted to just working with the data that resides within the database and you'll see this where there are commands to do analysis on your database and other things like that but here's an example of a query so what is a data result well a data result is the
result of the data returned from a query so here you generally will see tabular data that's usually what people want back but you can get back json or xml it really just depends on the database what is querying so this is the act of performing a query so the idea is that you write your query above and it's being sent as a request could be through an sdk the cli the shell an api lots of ways for it to get there and then the idea is that the query is going to return
the data results so what is a query language that is a scripting language or programming language designed as the format to submit requests or actions to the database notable query languages are sql graphql kusto xpath gremlin and there's a lot of them but those are the ones that stand out to me right now and just to note up here this is sql okay so i just didn't want to make a big old line this way here but this is sql here okay
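here's a tiny runnable version of that loop my own sketch using sqlite where a sql query a read is submitted and tabular data results come back the movie data is made up

```python
# query -> data results, end to end (editor's sketch, sample data invented)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO movies (title, rating) VALUES (?, ?)",
    [("pluto nash", 4.5), ("star trek", 9.0)],
)

# the query (a read) is written in sql, the query language
for row in conn.execute("SELECT title, rating FROM movies WHERE rating > ?", (5,)):
    print(row)   # data results come back as tabular rows, e.g. ('star trek', 9.0)
```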
all right so let's compare batch and stream processing so batch processing is when you send batches a collection of data to be processed and batches are generally scheduled so you might say every day at 1 pm but you can also just queue up a batch whenever you feel like it batches are not real-time meaning that all the data is sent and then you wait until the batch comes back to see the results batch processing is ideal for very large processing workloads and batch processing is generally more cost efficient than stream processing and so here we just have a representation so here we have
our data we've broken it into batches or collections we pass it to something like an etl engine and it will transform the data and then we'll insert it into our database data warehouse data store data lakehouse wherever you want to put it then we have stream processing so this is when you process data as soon as it arrives you'll have producers which will send data to a stream and consumers which will pull data from a stream a lot of times the stream will look like a pipeline and data can be held in that stream for a
period of time so you have better reusability of data if you need it for multiple consumers stream processing is good for real-time analytics real-time processing like streaming videos anything that has to do with real time if you need it right away it's for that purpose it is much more expensive than batch processing and here's a visual representation where we have bits of our data they go into our stream pipeline they can be held there for a while and sometimes minor operations will be performed on them but consumers will pull the data and do what they want with that data
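before we map these onto azure services here's a toy python sketch of my own contrasting the two styles real batch and stream systems obviously run at far larger scale and the producer here just fakes records arriving over time

```python
# batch vs stream processing in miniature (editor's illustration)
import time

data = list(range(10))

# batch: collect everything first, then process the whole collection at once
def process_batch(batch):
    return [x * 2 for x in batch]

print("batch result:", process_batch(data))

# stream: a producer emits records over time, a consumer processes each
# record as soon as it arrives instead of waiting for the full set
def producer():
    for x in data:
        yield x              # records trickle in one at a time
        time.sleep(0.01)     # simulate arrival delay

for record in producer():    # the consumer pulls from the stream
    print("streamed:", record * 2)
```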
if we want to contextualize these things in terms of services on azure the idea is you'd have your data sources and you'd ingest them into something like azure stream analytics or i didn't really make this graphic very good but the idea is that you go into stream analytics or maybe you go into hdinsight or maybe you go into azure synapse analytics or one of these intermediate steps and then eventually you go to power bi to make your visualization reports for stream processing you could use event hubs so event hubs
is a single-topic streaming service and you could ingest that into azure stream analytics it's funny because this is the stream analytics icon and this is actually the hadoop icon up here they've got it mixed up but anyway you go into stream analytics and you can insert that into cosmos db and then maybe pull cosmos db into power bi but yeah that is the difference between the two okay all right let's talk about relational data and this has to do with tables and relationships between tables so let's talk about what a table
is it's a logical grouping of rows and columns so an excel spreadsheet is actually tabular data tabular data just means data that makes use of table data structures okay then you have views which look a lot like tables except they are a result set so when you do a query on a table you're returning back data and you're storing that query's data in memory and basically it's a temporary or virtual table then you have materialized views and it's the same thing as a view except the difference here is instead of being stored in memory it's stored
on disk but again it's the result of a query so it's a virtual table then you have indexes and this is a copy of your data sorted by one or multiple columns for faster reads at the cost of storage so think of it kind of like a virtual table but it doesn't include all the columns it's just to help you understand what order to retrieve data in you have constraints these are rules applied to writes that can ensure data integrity so if you have a database and you want to
make sure that there are no duplicate records you put a constraint for no duplicates things like that you have triggers this is a function that is triggered on a specific database event this is really useful let's say after you insert a row in your database you want to have a uuid you'd have a function that would generate the uuid then you have primary keys so one or multiple columns that uniquely identify a row in the table the most common primary key is id a foreign key is a column which holds
the value of a primary key from another table to establish a relationship very commonly it's just the name of the other table with underscore id so a relationship is when two tables have a reference to one another to join the data together i know this text is really boring but we're going to cover so much about relational database tables that i didn't feel we needed a visual here there's a small sketch of these structures just below though and then let's keep moving forward with relational data so for relational data we have tables and then the relationships between the tables
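here's that small sketch my own example in sqlite syntax showing a primary key a constraint a view and a trigger together the course's trigger example generated a uuid mine just writes an audit row but the mechanism is the same

```python
# table structures: primary key, constraint, view, trigger (editor's sketch)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,     -- primary key uniquely identifies a row
        email TEXT UNIQUE NOT NULL     -- constraint: rejects duplicate emails
    )
""")
conn.execute("CREATE TABLE audit (note TEXT)")

# a view is a stored query whose result set behaves like a virtual table
conn.execute("CREATE VIEW customer_emails AS SELECT email FROM customers")

# a trigger runs a statement on a specific database event (here: after insert)
conn.execute("""
    CREATE TRIGGER after_customer_insert AFTER INSERT ON customers
    BEGIN
        INSERT INTO audit VALUES ('customer ' || NEW.id || ' created');
    END
""")

conn.execute("INSERT INTO customers (email) VALUES ('a@example.com')")
print(conn.execute("SELECT * FROM customer_emails").fetchall())
print(conn.execute("SELECT * FROM audit").fetchall())
```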
let's talk about those relationships so relational databases establish relationships to other tables via foreign keys referencing another table's primary key so looking at the example here there's a little icon that shows you it's the primary key so this is the primary key and over here another table is referencing it as a foreign key so that's the foreign key and that's the primary key okay and there are four types of relationships between relational database tables the first is one to one so imagine a monkey has a banana or here we have a
table called country and it has a capital that's one to one then you have one to many so a store has many customers or you could say a book has many pages notice that this denotes the many here then you have many to many so a project has many tasks and tasks can belong to many projects or here a book can have many authors and an author can have many books so that's many to many and then last is a variant on the many to many and it's via a join or junction table
i just call them join tables so a student has many classes through enrollments and a class has many students through enrollments so here it's the same thing a book can have many authors and an author can have many books but it's all through a library so you could say a book has many authors through a library and an author has many books through a library okay so there you go
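to ground those relationship types here's a sketch of my own in sqlite syntax the country capital and books authors examples mirror the slides but the exact table and column names are just my choices

```python
# relationship types via foreign keys (editor's sketch)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one to one: a country has one capital (UNIQUE keeps it one to one)
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE capital (id INTEGER PRIMARY KEY, name TEXT,
                          country_id INTEGER UNIQUE REFERENCES country(id));

    -- one to many: a book has many pages (book_id is the foreign key,
    -- named after the other table with underscore id)
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE page (id INTEGER PRIMARY KEY, number INTEGER,
                       book_id INTEGER REFERENCES book(id));

    -- many to many through a join/junction table: a book has many
    -- authors and an author has many books
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE authorship (book_id INTEGER REFERENCES book(id),
                             author_id INTEGER REFERENCES author(id));
""")
print("relationship tables created")
```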
okay so we know that relational databases store tabular data but the thing is that data can also be stored either in a row-oriented way or a column-oriented way and let's just talk about the differences there and why we would do that so in the first case we have row store where the data is organized into rows this is great for traditional relational databases which are row stores good for general purpose databases suited for online transaction processing oltp we are going to come back to that term later on great when needing all possible columns in a row which is important during a query not the best at analytics or massive amounts of data all right looking at column store the data is
organized into columns it's faster at aggregating values for analytics so the idea is imagine that you want to count how many cities there are for millions of records if it's organized by column so querying based on a column where the data is stored together as columns is a lot faster generally these are nosql stores or sql-like databases it's a bit confusing because you would think tabular data is just relational databases but when you want to do column store they're basically nosql stores so the terms are a bit fuzzy there it's great for vast amounts of data and when
we're talking about massive amounts we're talking millions and millions of records terabytes worth of data okay suited for online analytical processing olap great when you only need a few columns so you don't need to get data from all the columns and there you go
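here's a toy python illustration of my own of why column stores aggregate faster each column's values live together so a sum only has to touch one column real column stores like synapse do this on disk across many machines

```python
# row-oriented vs column-oriented layout (editor's illustration)
rows = [   # row store: all columns of a record live together
    {"id": 1, "city": "toronto", "sales": 100},
    {"id": 2, "city": "boston",  "sales": 250},
]

columns = {  # column store: each column's values live together
    "id":    [1, 2],
    "city":  ["toronto", "boston"],
    "sales": [100, 250],
}

# aggregating one column only scans that column's data in a column store,
# while a row store has to walk every whole row
print(sum(columns["sales"]))             # column-store style aggregation
print(sum(r["sales"] for r in rows))     # row-store style scans full rows
```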
let's talk about database indexes a database index is a data structure that improves the speed of reads from a database table by storing the same or partially redundant data organized in a more efficient logical order and the logical order is commonly determined by one or more columns such as sort keys as they're often called a common data structure for an index is a balanced tree which is what b-tree is short for not to be confused with a binary tree which is something else so you might see b-tree and be like okay that's how it's doing that so here we just have kind of a visual imagine you have a table for a phone book and you want to quickly find people based on the phone number because maybe you're trying to find them based on the starting number being 344 or something so the
idea is you make an index and you say i want this to be indexed by the phone number and so what it's going to do is change the order there so it might just pull the id or the number and reorder it and so now what you'll do is you'll use that index and the database will use it as a reference so it's not storing all the data but it will use it as a reference to the original table to quickly return your data and so here's a very easy way to create an index in
postgres so we'd say create index and then we give it a unique name and we'd make it an index on our addresses table but just on the phone number column so there you go
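to reconstruct that spoken example here's a runnable sketch the course showed postgres and the statement is essentially the same there but i'm using sqlite so it runs anywhere the index and column names are my guesses at what was on screen

```python
# create an index on addresses(phone_number) (editor's reconstruction)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT)"
)

# create an index with a unique name, on the addresses table,
# sorted by just the phone_number column
conn.execute("CREATE INDEX idx_addresses_phone_number ON addresses (phone_number)")

# reads that filter on phone_number can now use the index as a reference
# back to the original table instead of scanning every row
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM addresses WHERE phone_number = '344-555-0100'"
).fetchall())
```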
let's take a look here at data integrity versus data corruption so data integrity is the maintenance and assurance of data accuracy and consistency over its entire life cycle and it's often used as a proxy term for data quality now you might think it's data validation but that's just a prerequisite of data integrity because again data integrity is all about the entire life cycle making sure overall that it's going to stay consistent so validation is just one part of it the goal of data integrity is to ensure data is recorded exactly as intended data integrity is the opposite of data corruption so data corruption is the act or state of data not being in the intended state resulting in data loss or misinformation and data corruption occurs when unintended changes result from reading and writing and so in the case when you're doing reads and writes maybe you have a hardware failure somebody inputs the wrong data or
someone intentionally is being malicious to corrupt your data or there are unforeseen side effects from operations via computer code so you wrote code and you didn't know that it was doing something that it wasn't supposed to be doing so how do we ensure data integrity well we have well-defined and documented data modeling so with data modeling if you know exactly how your data is supposed to be and it doesn't match the model then you'll know logical constraints on your database items so we talked about that when we talked about all the types of relational
data so constraints will keep that data integrity in place redundant copies and versions of your data to compare and restore so you have to be able to not just validate your data but be able to bring it back to the state that it's supposed to be in human analysis of the data so that's where a data analyst will just check periodically hash functions to determine if changes have been tampered with you see this quite often when you're downloading open source software or software off a site like softpedia where you can have an md5
hash to say did the thing i downloaded match the thing that was expected i'll sketch that hash check just below and the principle of least privilege so limiting access to specific actions for specific user roles will mitigate unexpected problems with your data so all that stuff makes up data integrity okay
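here's that hash check sketched in python my own example using sha-256 rather than md5 but the integrity idea is identical hash what you received and compare it to the published hash

```python
# checksum verification for data integrity (editor's sketch)
import hashlib

data = b"the bytes of the file you downloaded"
expected = hashlib.sha256(data).hexdigest()   # normally published by the vendor

def verify(blob: bytes, expected_hash: str) -> bool:
    # if even one byte was changed or corrupted, the hashes won't match
    return hashlib.sha256(blob).hexdigest() == expected_hash

print(verify(data, expected))                 # True: integrity intact
print(verify(data + b"tampered", expected))   # False: corruption detected
```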
okay it's time to compare normalized versus denormalized data so normalized is a schema designed to store non-redundant and consistent data whereas denormalized is a schema that combines data so that accessing or querying it is very very fast so when we see tables and relationships like a relational table where everything is very discretely organized this is normalized data and then on the right-hand side where you could take all those tables and make them one table this would be extremely efficient so on the left-hand side for normalized data integrity is maintained there's little to no redundant data many tables and it's optimized for storage of data on the right-hand side data integrity is not necessarily maintained or there aren't good controls in place you have to do extra work to make sure it is in good shape redundant data is common fewer
tables and excessive data storage is less optimal now when you're using relational databases you can use both normalized and denormalized schemas and when you are using nosql it's a little bit harder but there are cases where you can kind of model things like tables but generally data is denormalized in nosql so there's a bit more challenge with data integrity but the upside is you get a lot more performance right so it's just way way faster at scale
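here's a small sqlite sketch of my own contrasting the two schemas the normalized side joins at query time while the denormalized side duplicates the city on every row to trade storage and integrity for speed

```python
# normalized vs denormalized schemas (editor's sketch, data invented)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- normalized: no redundant data, integrity kept via the foreign key
    CREATE TABLE stores (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER,
                         store_id INTEGER REFERENCES stores(id));
    INSERT INTO stores VALUES (1, 'toronto');
    INSERT INTO orders VALUES (1, 100, 1), (2, 250, 1);

    -- denormalized: one wide table, city repeated on every row
    CREATE TABLE orders_wide (id INTEGER PRIMARY KEY, amount INTEGER, city TEXT);
    INSERT INTO orders_wide VALUES (1, 100, 'toronto'), (2, 250, 'toronto');
""")

# normalized needs a join; denormalized reads straight off one table
print(conn.execute("""SELECT s.city, o.amount FROM orders o
                      JOIN stores s ON s.id = o.store_id""").fetchall())
print(conn.execute("SELECT city, amount FROM orders_wide").fetchall())
```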
a pivot table is a table of statistics that summarizes the data of more extensive tables from a database spreadsheet or business intelligence tool and pivot tables are a technique in data processing they arrange or rearrange so pivot statistics in order to draw attention to useful information and this leads to finding figures and facts quickly making them integral to data analysis in microsoft excel it's very easy to create pivot tables think of a pivot table as an interactive report where you can quickly aggregate or group your data based on various factors so maybe you're grouping it by year month week or day some average min or max and so over
here i have an example of a pivot table in excel i got this from exceljet which actually has really good information about pivot tables so if you think you want to learn more about this i would go check out that resource but what you can see here is that we have a table and notice that it has these little filters at the top right and so the idea is you can drop that down and say sort by date and other stuff and what you can do is create another pivot table like here
where we said okay let's sum the sales based on blue and green so this is a pivot table and then we created another pivot table okay and so it becomes a very useful tool in excel all right and just one more thing pivot table used to be a trademarked word owned by microsoft so a lot of times pivot tables were specifically just in excel or was it their microsoft access database but now pivot table is a general term that everybody uses just for this kind of operation
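to emulate that excel example in code here's a plain python sketch of my own summing sales grouped by color the numbers are made up but it's the same aggregation a pivot table performs

```python
# a pivot-table-style aggregation: sum of sales per color (editor's sketch)
from collections import defaultdict

sales = [
    {"date": "2024-01-01", "color": "blue",  "amount": 10},
    {"date": "2024-01-02", "color": "green", "amount": 25},
    {"date": "2024-01-03", "color": "blue",  "amount": 5},
]

pivot = defaultdict(int)
for row in sales:
    pivot[row["color"]] += row["amount"]   # group by color, aggregate with sum

print(dict(pivot))   # {'blue': 15, 'green': 25}
```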
let's talk about data consistency and this is when data is being kept in two different places and whether the data exactly matches or does not match so when you have to duplicate data in many places and you need to keep it up to date to be exactly matching based on how the data is transmitted and the service levels of your cloud service provider they'll use these two terms and you'll hear strongly consistent and eventually consistent so strongly consistent means every time you request data so you query data you
can expect consistent data to be returned within x time so they might say within 10 milliseconds 100 milliseconds one second so the thing is we will never return to you old data but you will have to wait at least x amount of time for the query to return whatever that defined time is we talk about eventual consistency when you request data you may get back inconsistent data within x time period so two seconds we are giving you whatever data is currently in the database you may get new data or old
data but if you wait a little bit longer it will generally be up to date why would we have these two methods it just depends on your use case maybe you can tolerate some data being inconsistent and it's more important to get whatever data is available now and sometimes you need an absolute guarantee that the data is one to one okay so those are the two different ones so synchronous and asynchronous can refer to mechanisms of data transmission and data replication let's break these two down so synchronous is continuous streams i'm just going
to mark that there so a continuous stream of data that is synchronized by a timer clock so you get a guarantee of when the data will be synced you can only access data once the transfer is complete you get guaranteed consistency of the data returned at the time of access and slower access times so here is the data and if you're thinking about strongly consistent that is what this is this is going to be when things are strongly consistent then we have asynchronous so a continuous stream of data separated by start and stop bits no
guarantee of time you can access data anytime but it may return older versions or an empty placeholder faster access times no guarantee of consistency here it is so you see it's moving in bits right and so the idea is that we can access it any time in between here and get maybe up-to-date data or not up-to-date data to solidify this let's put in some scenarios so a company has a primary database but they need to have a backup database in case their primary database fails the company cannot lose any data so everything must be in
sync the database is not going to be accessed while it is standing by to act as a replacement so the reason why this works is that if you have a backup database you have to make sure all your data is one-to-one then on the other side here a company has a primary database but they want a read replica a copy of the database so their data analytics person can create computationally intensive reports that do not impact the primary database it does not matter if the data is exactly one-to-one at the time of access
because in this first scenario it's like any time the database goes down you want it up to date to the second and on this side it's like they might run reports once a day so there's always new data coming in and it'd be too burdensome to make sure that the data is always up to the second so there you go hey this is andrew brown from exam pro and we are looking at non-relational data and this is where we store data in a non-tabular form that will be optimized for different kinds of data structures
so what kind of data structures well we're looking at key value stores so each value has a key designed to scale for only simple lookups then you have a document store so the primary entity is a json-like data structure called a document you have column store sometimes this falls under relational databases but it is a non-relational data type or database so it has a table-like structure but data is stored around columns instead of rows then you have the graph database where data is represented with nodes and structures where relationships really really do matter and sometimes non-relational
databases can be both a key value and document store like azure cosmos db or amazon dynamodb and the reason for that is that documents are actually a subset of key values which we'll talk about later when we get to that point hey it's andrew brown from exam pro and we're taking a look at data sources so a data source is where the data originates from so an analytics tool may be connected to various data sources to create a visualization report and a data source could be a data lake a data warehouse a data store
[Music] hey it's andrew brown from exam pro and we're taking a look at data sources so a data source is where the data originates from so an analytics tool may be connected to various data sources to create a visualization report and a data source could be a data lake a data warehouse a data store a database data requested on demand via an api endpoint from a web app and flat files such as an excel spreadsheet and so the example here is that we have a data source and somehow there has to be a connector between them and it's going to be consumed by either a warehouse an etl engine a data lake or a bi tool those are common ones that need data sources so extracting data from data sources a data tool like business intelligence software would establish a connection to multiple data sources and the bi tool would extract
data which could be pulled at the time of the report pulled on a schedule or streamed the mechanism for extracting data will vary per data source i just want you to know that because you know when you're using these services it really does vary on how it pulls the data so i just want you to understand there are a few different ways okay [Music] so a data store is a repository for persistently storing and managing collections of unstructured or semi-structured data so here i have kind of
a visual where we have files going into some kind of store and a data store is a very very broad term so it's used interchangeably with databases though a database is technically a subset of a data store but generally a data store indicates working with unstructured or even semi-structured data so if somebody said a data store i'm thinking that it's either unstructured or semi-structured okay a data store can be specialized in storing flat files emails maybe a database as we said it was a subset or designed to be distributed across many many machines or it could
be a directory service okay so that's a data store [Music] so what is a database a database is a data store that stores semi-structured and structured data but a better way to put it i would say is that a database is a more complex data store because it requires using formal design and modeling techniques databases can be generally categorized as either relational databases so this is structured data that strongly represents tabular data so tables rows and columns they're generally either row oriented or columnar oriented and when we talk about non-relational databases we're looking at semi-structured data that may or may not distantly resemble tabular data and i know that i put this one over on the relational side sometimes it ends up here or there just understand that that one kind of floats in between the two it really depends on the technology underneath and so here is a pretty common way you get your sql you hit the database you get a table back right so databases have a rich set of functionality a specialized language to query so that is our sql here right we have specialized modeling strategies to optimize retrieval for different use cases more fine-tuned control over the transformations of the data into useful data structures or reports the thing here on the end normally a database infers someone is using a relational row-oriented data store so when someone says a database you're usually thinking like mysql mssql postgres things like that okay
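to make that concrete here's a minimal relational example using sqlite3 from the python standard library the table and column names are made up

```python
import sqlite3

# an in-memory relational database with one table of rows and columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('alice'), ('bob')")

# the specialized query language (sql) hands you a table back
for row in conn.execute("SELECT id, name FROM customers ORDER BY id"):
    print(row)  # (1, 'alice') then (2, 'bob')
```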
[Music] so what is a data warehouse it's a relational data store designed for analytic workloads which is generally a column oriented data store and again i'm going to make an emphasis here sometimes it's non-relational sometimes it's relational don't get too worried about that part okay companies will have terabytes and millions of rows of data and they need a fast way to be able to produce analytic reports that's how you know you need a data warehouse okay data warehouses generally perform aggregations so aggregations are groupings of data to find a total or average data warehouses are optimized around columns since they need to quickly aggregate column data and so here is an example where we have our warehouse and the idea is that we're taking data in from like an unstructured source through an etl and then here we have sql these are two different data sources we use sql to then get our results okay so data warehouses are generally designed to be hot hot meaning that the data will be returned very very fast even though they have vast amounts of data data warehouses are infrequently accessed meaning they aren't intended for real-time reporting but maybe once or twice a day or once a week to generate business user reports now can it report extremely fast of course but it's not like you're running it at a per millisecond rate all day and keeping it all up to date that's more for a stream right a data warehouse needs to consume data from a relational database on a regular basis or you know through an etl data gets transformed and put in there okay generally data warehouses are read only so you insert data and then you read it you're not using it for transactional data okay
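here's a minimal sketch of the kind of aggregation a warehouse is optimized for shown with pandas instead of a real warehouse the sales data is made up

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "amount": [120.0, 80.0, 200.0, 50.0],
})

# grouping column data to find totals and averages is exactly why
# warehouses store data around columns rather than rows
print(sales.groupby("region")["amount"].agg(["sum", "mean"]))
```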
[Music] what is a data mart a data mart is a subset of a data warehouse a data mart will generally store under 100 gigabytes and has a single business focus and so a data mart allows different teams or departments to have control over their own data set for specific use cases so here we have a data warehouse and we are running queries to then put the results in their own little data marts but the idea is that you know they are just smaller data sets that are more focused data marts are generally designed to be read only because you're always going to have to pull data from the main data warehouse data marts also increase the frequency at which data can be accessed because of the smaller data sets and you don't have to worry about a huge cost because the larger the data set you have to query over the more expensive it gets right the cost of a query is much much lower and so you might even see people accessing these a lot more frequently than they would a data warehouse so there you go [Music] so what is a data lake it is a centralized storage repository that holds vast amounts of raw data so big data in either semi-structured or unstructured format a data lake lets you store all your data without careful design or
having to answer questions on the future use of the data so basically it's hoarding for data scientists here is kind of a visualization where you have a bunch of different data sources dropping into the data lake maybe you want to perform etls and put data back into the data lake and then you know you can extract out for reports ml all sorts of things we'll definitely cover data lakes when we get to the data lake section but this is the general overview here a data lake is commonly accessed for data workloads such as visualizations for bi tools real-time analytics machine learning on-premise data data lakes are great for data scientists but it's very hard to use data lakes for bi reporting so it's not that you can't do it it's just that there are additional steps and so there might be a different solution that might be a bit easier such as a data warehouse if data lakes are not well maintained they can become data swamps which is basically like data corruption right so yeah there you go [Music] what is a data lakehouse well a data lakehouse combines the best elements of
a data lake and a data warehouse and this isn't something that azure has an offering for right off the bat right now but you can definitely believe that cloud service providers will have this in the future so i want you to know about this today even if it's not on your exam so data lakehouses compared to data warehouses can support video audio and text files support data science ml workloads have support for both streaming and etl work with many open source formats data will generally reside in a data lake or blob store so the thing with a data lake or data warehouse is they usually use proprietary formats so this is a lot more flexible data lakehouses compared to data lakes can perform bi tasks very well which is something data lakes cannot do are much easier to set up and maintain and have management features to avoid a data lake becoming a data swamp right because data warehouses are very well data modeled and data lakes aren't so data lakehouses are kind of in between and a data lakehouse is going to be more performant than a data lake so where would you find a solution right now probably with delta lake which is a databricks solution so delta lake i believe is open source and so if you wanted a managed version of it databricks has an offering for that and so they have this nice little graphic here where they show you that you know you combine the two and you get the best of both worlds so you know you'll see more data lakehouses in the future so there you go [Music] hey it's andrew brown from exam pro we are looking at
data structures so what is a data structure this is data that is organized in a specific storage format that enables easy access and modification and a data structure can store various data types so data can be abstractly described to have a degree of structure so we have unstructured a bunch of loose data that has no organization or possibly any relation semi-structured data that can be browsed or searched with limitations structured data that can be easily browsed or searched so when we look at unstructured data think of a bunch of loose files and this is just a screenshot from one of my folders here's a bunch of stuff or semi-structured data we have xml or structured data where it's like we're using a relational database so yeah there we go let's drill down into these three types of abstractions [Music] so again what is unstructured data well it's just a bunch of loose data think of it as a junk folder on your computer with a bunch of random files not optimized for search analysis or simply no relation between the various data so again there's a bunch of files in one of my folders there and so when we're talking about microsoft azure services that store unstructured data we have sharepoint so shared documents for an organization azure blob storage so an unstructured object data store azure files a mountable file system for storing unstructured files azure data lake for big data it's basically blob storage but for vast amounts of data and if i wanted to add a fifth one there you know like azure disks but that's more for virtual machines okay [Music] so let's take a look here at semi-structured data which basically has no schema and the data has
some form of relationship it's easy to browse data to find related data and you can search data but there are limitations or when you search you will pay a compute or operational cost a great way of thinking about this is think of a big box of legos and you have these lego pieces so it's not that there's no schema defined there's a schema defined in the sense of the lego piece so it can connect and make relationships with other things that are compatible but overall the entire data as a whole does not have a schema right you're not doing upfront data modeling so just understand that's why i put the asterisk there so there is a schema for the data structures just not for everything in total that's why it's called semi-structured for concrete semi-structured data structures we've got xml json avro and parquet i don't know if it's pronounced parquet but that's the way i say it and for azure and other services that store semi-structured data we have azure tables which is a key value store azure cosmos db where its primary model is document of course it stores other types but when we're talking about semi-structured we're talking about a document store mongodb which is an open source document store and then we have apache cassandra which is a wide column store database and nosql that's just an open source one okay [Music] all right so we're still on semi-structured data structures i just want to give them a little bit more attention because you know you might need to know some of the guts of these semi-structures so what is semi-structured data semi-structured data is data that contains fields
the fields don't have to be the same in every entity you only define the fields that you need on a per entity basis so common semi-structured data structures are javascript object notation json a format used to store data in memory and read and write from files apache optimized row columnar format also known as orc which organizes data into columns rather than rows so a columnar data structure apache parquet this is another columnar data store a parquet file contains row groups we have apache avro which is a row based format each record contains a header that describes the structure of the data in the record you also have xml we're not going to go through all of these but i do want to drill down on some of these semi-structured data structures so you know how they internally work okay all right let's take a look at json so json stands for javascript object notation and it is a lightweight data interchange format it is easy for humans to read and write it is easy for machines to parse and generate and it is based on a subset of javascript so here is an
example of json and json is built on two structures the first is a collection of name value pairs in other languages this is realized as an object a record a struct a dictionary a hash table a keyed list or an associative array so if you've ever heard of those things before that's basically what it looks like the other part is an ordered list of values other languages might call them arrays vectors lists or sequences just to point them out there is the collection and there is the ordered list and json is a text format so it is completely language independent which is why it is used quite a bit these days
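here's the two structures in working python the record contents are made up

```python
import json

# a collection of name/value pairs (the object) containing an
# ordered list of values (the array)
text = '{"name": "andrew", "topics": ["azure", "data"], "certified": true}'
record = json.loads(text)    # parse the text into a dictionary
print(record["topics"][1])   # "data" -- indexing into the ordered list
print(json.dumps(record))    # and back out to language-independent text
```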
all right let's take a look at apache orc files which stands for optimized row columnar it's a storage format for the apache hadoop system so it is similar to rc files and parquet files and is the successor to rc files we're not going to cover rc files here but you know that's where these come from it was developed by facebook to support columnar reads predicate pushdown and lazy reads it is more storage efficient than rc files taking up 75 percent less space orc only supports hadoop hive and pig when we get to the hadoop section you'll understand what those are orc performs better with hive than parquet files orc files are organized into stripes of data so here is that example there the anatomy of an orc file so the file footer stores the auxiliary information the list of stripes in the file the number of rows per stripe and each column's data type plus column level aggregate information so count min max the stripe footer contains a directory of stream locations then we have the row data which is used for table scans and the index data includes min and max values for each column and the row positions within each column and the default size of a stripe is 250 megabytes large stripe sizes enable large efficient reads from hdfs which is the hadoop file system which we'll talk about when we get to the hadoop section but if you want you can just review this here and then take a look at this graphic and it'll make a hundred percent sense okay [Music] so let's take a look here at parquet files so apache parquet is a columnar store file
format available to any project in the hadoop ecosystem so hive hbase mapreduce pig spark presto there's a huge list of them and not just in the hadoop system but other services azure and even aws services work really well with parquet files so you know it's just becoming a very common format for columnar storage parquet is built to support very efficient compression and encoding schemes it uses the record shredding and assembly algorithm and that's why i don't go into detail here like i did with orc about this data structure because it just gets complicated so i just want you to know that parquet files are more generally used than orc files which have very particular use cases and you're going to come across parquet more when you're doing columnar storage file formats okay
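here's a minimal sketch of writing and reading a parquet file with the pyarrow library pip install pyarrow the table contents are made up

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"region": ["east", "west"], "amount": [200.0, 50.0]})
pq.write_table(table, "sales.parquet")  # columnar and compressed on disk

# reading back only the columns you need is where columnar formats shine
print(pq.read_table("sales.parquet", columns=["amount"]))
```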
[Music] let's take a look at avro so apache avro is a row-based format that provides rich data structures a compact fast binary data format a container file to store persistent data remote procedure calls rpcs and simple integration with dynamic languages avro provides functionality similar to systems such as thrift and protocol buffers here is the data structure so when would you be using avro over parquet well when you want to serialize your json into a more efficient format or you're doing general queries if you're doing analytics columnar based stuff aggregation and things like that you'd use parquet but if you're just trying to kind of simulate general relational database structures for semi-structured or nosql databases you're going to want to use avro [Music] all right let's talk about structured data so structured data has a schema and the data has relationships it's easy to browse to find related data it's easy to search data the most common structured data is tabular data represented with rows and columns right so examples here for azure would be azure database for postgresql for postgres azure sql for mssql and azure synapse analytics which is the data warehouse service okay hey this is andrew brown from exam pro and we are looking at what is data mining so this is the extraction of patterns and knowledge from large amounts of data not to be confused with the extraction of data itself
a lot of people think data mining means go on a website and start scraping that's not what it is the cross industry standard process for data mining called crisp-dm is defined in six phases there are a lot of ways to define data mining but i just chose this one because i found that it's the easiest to understand so we got our big wheel here let's break down the six phases so business understanding right so here we are at the start of our journey over here business understanding so what does the business need then we have
data understanding which is what data do we have and what data do we need and you can see that you can work back and forth between these before you move on to the next step we have data preparation so how do we organize the data for modeling then we have modeling so what modeling techniques should we apply over here evaluation so which data model best meets the business objectives and on the end here we have deployment so how do people access the data so there you go [Music] let's take a look at different types
of data mining methods and this will definitely not be an exhaustive list but it will give you a good idea what a data miner does so data mining methods or techniques are the way to find valid patterns and relationships in huge data sets so we have classification this is where you classify data into different classes you have clustering a division of information into groups of connected objects you have regression which is identifying and analyzing the relationships between variables in the presence of other factors we have sequential so evaluating sequential data to discover sequential patterns association rules discover a link between two or more items to find a hidden pattern in the data sets these common constraints or math formulas are used to determine significant and interesting links so we got support an indication of how frequently the item set appears in the data set confidence an indication of how often the rule has been found to be true lift an indication of importance compared to other items and conviction an indication of the strength of the rule relative to statistical independence
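to make those four metrics concrete here they are as the standard formulas where T is the set of transactions and X and Y are item sets these are the textbook definitions not anything azure-specific

```latex
\begin{align*}
\mathrm{supp}(X) &= \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|} \\
\mathrm{conf}(X \Rightarrow Y) &= \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)} \\
\mathrm{conv}(X \Rightarrow Y) &= \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}
\end{align*}
```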
for outlier detection we have the observation of data items in the data set which do not match an expected pattern or expected behavior we have prediction use a combination of data mining techniques such as trends clustering and classification to predict future data not super important for you to remember all this but i'm just trying to get you exposure to these terms and things so that as you see them more it'll make sense okay [Music] hey this is andrew brown from exam pro and we are looking at what is data wrangling so data wrangling is the process of transforming and mapping data from one raw data form into another format with the intent
of making it more appropriate and valuable for a variety of downstream purposes such as analytics it's also known as data munging so there are six core steps behind data wrangling the first is discovery so understand what your data is about and keep in mind domain specific details about your data as you move through the other steps structuring so you need to organize your content into a structure that will be easier to work with for our end results cleaning so remove outliers change null values remove duplicates remove special characters standardize formatting enriching appending or enhancing collected data with relevant context obtained from additional sources validating authenticate the reliability quality and safety of the data and publishing so place your data in a data store so it can be used downstream so there you go
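here's a minimal sketch of the cleaning step using pandas the data is made up just to show the operations named above

```python
import pandas as pd

raw = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", None, "b@x.com "],
    "score": [10, 10, 7, None],
})

clean = (raw
    .drop_duplicates()                  # remove duplicates
    .dropna(subset=["email"])           # drop rows missing a key field
    .assign(
        email=lambda d: d["email"].str.strip(),  # standardize formatting
        score=lambda d: d["score"].fillna(0),    # change null values
    ))
print(clean)
```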
[Music] hey this is andrew brown from exam pro and we're looking at what is data modeling but before we can answer that we should ask what is a data model so it's an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real world entities a data model could be a relational database that contains many tables so here's actually an example of some data modeling i did which is for the exam pro platform if you ever open up power bi they have a data modeling tab so it becomes very clear what it is but generally you know data models just look like a bunch of tables and relationships but it's going to vary based on what you're using a data model for so a data model could be conceptual so how data is represented at the organizational level abstractly without concretely describing how it works within the software so people orders projects relationships logical so how data is presented in software tables columns object oriented classes physical so how data is physically stored so partitions cpus and table spaces so this one would probably be the middle one here which is logical okay so you know this isn't exactly how all data modeling looks there's a variety of ways data modeling or a data model can appear so what is data modeling a process used to define and analyze data requirements needed to support the business processes within the scope of the corresponding information systems and organizations so
here we have our data modeling so you can see that it's actually broken up kind of into three sections which maps up really well see where it says physical conceptual things like that it matches up to our three categories here conceptual logical physical so just keep in mind that you know with data modeling you can move from a conceptual to a logical to a physical one all right and so there you go [Music] all right let's take a look at etl versus elt so etl and elt are used when
you want to move data from one location to another where the data stores or databases have a different data structure so you need to transform the data for the target system a common use case would be mssql to cosmos db so this one is relational this one's nosql they just don't have the same data structures you'd have to do some kind of transformation and so here we have our visuals for etl and elt so let's talk about etl first which stands for extract transform and load so it loads the data first into a staging server and then into a target system so even though it's not shown here we actually have an intermediate virtual machine or server that the data is loaded into temporarily to do the transformations and then when it's done it's going to output it into its target system it's used for on-premise relational and structured data so it's very common for on-prem like this could be a migration strategy so they could be taking a sql database and just moving it to a sql database on azure right and they might be the same type of database but there could be different versions
of databases so the feature sets are slightly different so they do some transformations it's good for a small amount of data to be fair etl can be used for larger workloads but you know when we're comparing to elt it's generally smaller it doesn't provide data lake support it's easy to implement and mostly supports relational databases okay when we talk about extract load transform it loads directly into the target system it's used for scalable cloud structured and unstructured data sources used for large amounts of data provides data lake support requires specialized skills to implement and maintain and support for unstructured data is readily available so you're going to see that elt is going to be the more common use case when we're dealing with cloud but it does require a little bit more knowledge whereas with etl if you know sql you're going to be in good shape okay so there you go
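here's a minimal etl sketch in python extract from a source transform in an intermediate step then load into a differently shaped target the csv-to-document reshaping stands in for something like mssql to cosmos db and all the names are made up

```python
import csv
import io
import json

source = io.StringIO("id,name\n1,alice\n2,bob\n")   # pretend source table

def extract(handle):
    return list(csv.DictReader(handle))             # rows out of the source

def transform(rows):
    # reshape tabular rows into documents for a nosql-style target
    return [{"pk": row["id"], "doc": {"name": row["name"]}} for row in rows]

def load(docs, target):
    target.extend(docs)                             # write to the target system

target_store = []
load(transform(extract(source)), target_store)
print(json.dumps(target_store, indent=2))

# in an elt you'd swap the order: load the raw rows into the target first
# and run the transform inside it, e.g. as sql in the warehouse
```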
[Music] hey it's andrew brown from exam pro and we are looking at what data analytics is so this is when you're concerned with examining transforming and arranging data so you can extract useful information a person that does data analytics is called a data analyst and they commonly use tools such as sql business intelligence tools and spreadsheets if we look at the data analytics workflow so you can understand the whole scope of their job they'll do data ingestion so getting data from multiple sources data cleaning and transformation so maybe they're using you know pandas and notebooks and things like that or sql commands dimension reduction so they have to reduce the amount of data data analysis which could be like statistics and things like that and visualizations where you use your bi tools or you are actually coding up dashboards and things like that
so yeah that is data analytics [Music] all right let's talk about kpis here so key performance indicators are a type of performance measurement that a company or organization will use to determine performance over time so here's an example of a kpi for product revenue and the goal was 3.12 million but the company actually generated 2.29 million so they're 26 percent under their goal kpis can evaluate the success of an organization or a specific organizational activity and there are two categories of measurement for kpis we have quantitative and qualitative and these get easily confused because they have very similar names but quantitative means the properties can be measured with a numerical result facts presented with a specific value so monthly revenue number of sign ups number of reports or defects so what we're looking at up here is quantitative okay and then we have qualitative so properties that are observed and can generally not be measured with a numerical result a numeric or textual value that represents personal feelings tastes and opinions so maybe customer sentiment would be an example there so that is kpis [Music] all right so we're taking
a look here at data analytics techniques and this is something you 100 percent need to know for the exam so pay close attention here okay so at the top of our list here we have descriptive analytics and the question we are answering here is what has happened so here we might have specialized metrics such as kpis or return on investment or things that we're more familiar with like generating sales and financial reports at this stage we have a lot of data right it's very accurate comprehensive live data and we can make very effective visualizations to understand our data because we have all that historical information but there's a lot more to it in terms of value and so we'll move down or out to see the other stuff that we can do here the next is diagnostic analytics so why did it happen it's supplemental to descriptive analytics we can drill down and investigate descriptive metrics to determine root cause find and isolate anomalies into their own data set and apply statistical techniques we have predictive analytics so what will happen so we use historical data to predict trends or recurrence we use either statistical or machine learning techniques applied this is where a data scientist might get involved we use neural networks decision trees regression classification neural networks just means deep learning machine learning okay prescriptive analytics how can we make it happen so this goes a step further than predictive and uses ml by ingesting hybrid data to predict future scenarios that are exploitable and then the last one is what if this happens and that's cognitive analytics so using analytics to draw patterns to create what if scenarios and what actions can be taken if the scenarios become a
reality so there you go that is data analytics [Music] all right let's take a look here at microsoft onedrive so microsoft onedrive is a storage and storage synchronization service for files which reside in the cloud similar to products like dropbox google drive box and microsoft sharepoint so onedrive is intended for personal storage for a single individual you pay for different sizes of storage so you have five gigabytes which is free 100 gb 1 tb and 6 tb you do not worry about the underlying hardware the durability resilience fault tolerance availability that's what we call serverless technology here is a nice screenshot of what it looks like to use onedrive both in the browser and on your phone files can be shared easily to other users via a shareable link or a specific email that has a onedrive account files are accessed via a web app or shared folders that hold a reference to files stored in the cloud so shared folders mean like you can literally use your file system and you'll have folders that are linked to it files can be synchronized so a copy residing on a local computer hard drive is copied to the cloud a file residing in the cloud can be copied to a local computer hard drive copying occurs automatically when files are changed differences in files could result in conflicts and a user must choose which file to keep or how to resolve the conflict might have more options files can be versioned so you can recover older versions of files older versions may be retained for 30 days and be automatically deleted so there you go that is onedrive let's take a look here at microsoft 365 sharepoint so 365 sharepoint is a web-based collaborative platform that integrates into microsoft office
intended for document management and shared storage and here is a screenshot of my sharepoint and so there's the az-104 so the thing is that it's basically onedrive but for companies with a bunch of layers on top of it and it is extremely useful to use it's super super useful so like if you work in a company you should really be using sharepoint for sharing files so just going down the feature list here so sharepoint sites because there are some concepts here besides just documents they have things like sites so data within sharepoint is organized around sites a site is a collaborative space for teams with the following components so document library pages web parts and more the sharepoint document library which we'll expand on there and that's what the screenshot is on the right-hand side so a document library is file storage and synchronization but designed for teams it is very similar to onedrive but files are owned by the company and not an individual you can apply robust permissions to access files within or outside your organization a site always has a default document library called documents and so there's a lot more going on in sharepoint but this is the most important feature of it and actually when you use onedrive when you install it on your computer there's like a synchronization service it's the same thing so sharepoint most likely is using the onedrive technology underneath i don't know 100 percent but it's most likely the case so there you go [Music] hey this is andrew brown from exam pro and we are looking at data core concepts this one's four sheets long so let's jump into it so the first here is data which is units of information data documents
types of abstract groupings of data data sets an unstructured logical grouping of data data structures which have some form of structure and there are variants right so we have unstructured a bunch of loose data that has no organization or possible relation we're talking about flat files here various files that can reside in a file system semi-structured so that's data that can be browsed or searched with limitations so csvs xml json parquet and so if we're talking about xml files the markup looks like html for json it's a text file that's composed of dictionaries and arrays rc files are storage formats designed for the mapreduce framework not something we covered in the lecture content but i just wanted to mention them there orc so a columnar data structure 75 percent more storage efficient than rc files limited compatibility works very well with hive we have avro so a row-wise data structure for hadoop systems you have parquet a columnar data structure that has more support across hadoop systems than orc then we were talking about structured data so data that can be easily browsed or searched so tabular data and so tabular data is data that is
arranged as tables think of spreadsheets data types how single units of data are intended to be used we're not going to go through the whole list they're not going to ask that on the exam but you should know your data types then the types of roles that azure cares for you to know we have database administrators who configure and maintain databases data engineers who design and implement tasks related to the transfer and storage of big data data analysts who analyze business data to reveal important information then we have our tiers of computing so we have software as a service a product that is run and managed by the service provider platform as a service focus on the deployment and management of your apps infrastructure as a service basic building blocks of cloud it provides access to networking computers and data storage space and remember when we're talking about sql it's going to be the sql vms on this layer and then here's going to be the managed sql and azure sql databases okay so we're on to the second page here so let's talk about data stores unstructured or semi-structured data for housing data a broad term that
can encompass anything that stores data databases structured data that can be accessed quickly and searched generally relational row-based tabular data for oltp data warehouses structured or semi-structured data for creating reports and analytics column-based tabular data for olap data marts a subset of a data warehouse for specific business data tasks data lakehouses combine the best of data warehouses and data lakes notebooks data that is arranged in pages designed for easy consumption batching when you send batches a collection of data to be processed not real time streaming when the data is processed as soon as it arrives so it's real time relational data data that uses structured tabular data and has relationships between tables and in terms of relationships for relational stuff we have one to one so think a monkey has a banana one to many a store has many customers many to many a project has many tasks and tasks can belong to many projects a join table a student has many classes through enrollments the enrollments would be the join table and a class has many students through enrollments then we're talking about row stores so row-wise data
organizing rows optimized for oltp then you have column store or columnar data organizing columns optimized for olap so analytics now we have indexes a data structure that improves the reads of databases this also shows up under non-relational databases but i just threw it here we have pivot tables a table of statistics that summarizes the data of a more extensive table from a database spreadsheet or bi tool now talking about non-relational data data that is semi-structured without a rigid associated schema so nosql databases so we got key value each value has a key designed to scale only simple lookups i like to describe it as simple dumb and not a lot of features we have document the primary entity is an xml or json-like data structure called a document columnar has a table like structure but the data is stored around columns instead of rows graph data is represented with nodes and structures where relationships matter okay we're on to the third page here so data modeling an abstract model that organizes elements of data and standardizes how they relate to one another and to real world entities schema a formal language to describe the structure
of data used by databases and data stores during the data modeling phase schema-less is generally used when upfront data modeling can be forgone because the schema is flexible normally used with nosql databases data integrity the maintenance and assurance of data accuracy and consistency over its entire lifecycle data corruption the act of data not being in the intended state which will result in data loss or misinformation normalization a schema designed to store non-redundant and consistent data denormalization a schema that combines data so that access to data is fast etl transforms data from one data store to another loads the data in an intermediate staging area does not work with data lakes elt transformations are done at the target data store works with data lakes more common in cloud services think azure synapse analytics where the data is loaded and transformed in the actual data warehouse a query when a user requests data from a data store by using a query language to return a data result data source where data originates from so analytics and
data warehouse tools may be connected to various data sources bi tools would have data sources as well data consistency when data is being kept in two different places whether the data exactly matches or does not match strongly consistent every time you request data you can expect consistent data to be returned within a given time eventually consistent when you request data you may get inconsistent data so like stale data synchronous synchronization a continuous stream of data that is synchronized by a timer or clock so a guarantee of time asynchronous synchronization a continuous stream of data separated by start and stop bits no guarantee of time and this is synchronization in terms of processing okay data mining the extraction of patterns and knowledge from large amounts of data not the extraction of data itself data wrangling the process of transforming and mapping data from one raw data form into another format and we're on the last page here so data analytics data analytics is examining transforming and arranging data so that you can extract and study useful information key performance indicators probably not talked about on the exam but i threw it in here because it's just
important to know a type of performance measurement that a company or organization uses to determine performance over time then in terms of the types of analytics that we can utilize we have descriptive analytics what happened so accurate comprehensive live data effective visualizations so dashboards reports kpis roi that's when you have all the information diagnostic analytics why did it happen drill down to investigate root cause sometimes they call that root cause analysis we didn't talk about that in the course but that's what it is focused on a subset of descriptive analytics so it's a subset of this one up here okay predictive analytics what will happen so use historical data with statistics and ml i probably should highlight that in red there for you to generate trends or predictions prescriptive analytics how can we make it happen use hybrid data with ml to predict future scenarios that are exploitable cognitive analytics what if this happens so use ml and nlp to determine what if scenarios and to create plans if they happen these are all really similar but the thing is it's the lens you put on like the reason why you're doing it okay then we talked about onedrive so a storage and storage synchronization service for a single user and then we have sharepoint a storage and storage synchronization service for an organization there's a little bit more to it than that but that is it for data core concepts [Music] let's take a look at azure synapse analytics and this is a data warehouse and a unified analytics platform we're going to talk more about the latter because like you know what a data warehouse is at this point it's just a columnar store and so here is a visual of
the azure synapse studio so here you can see there's a query going on so we're just querying data but there's a lot we can do on the unified analytics platform so we can perform etl and elt processes in a code-free visual environment so you don't have to write any code ingest data from more than 95 native connectors it's deeply integrated with apache spark uses tsql queries on both your data warehouse and spark engines that's what we're looking at there is the tsql and supports multiple languages so tsql python scala spark sql and .net and it's integrated with artificial intelligence ai and business intelligence tools bi so we could use azure machine learning studio or azure cognitive services or microsoft power bi just to get a better visual of the entire flow here on the left hand side you're ingesting data from sources all the data is going to be stored on data lake storage gen 2 here at the top we have the azure synapse analytics studio that's the interface you're going to be working with and then you're going to be able to output to the various services and notice here that we have sql and apache spark which are the different runtime engines this really looks to me like a data lakehouse which when i was talking about lakehouses i was like azure doesn't have an offering but now that i'm looking at the screenshot this definitely is a data lakehouse so yeah i guess azure synapse is a data lakehouse cool [Music] all right let's talk about synapse sql so synapse sql is a distributed version of tsql designed for
data warehouse workloads it extends tsql to address streaming and machine learning scenarios it uses built-in streaming capabilities to load data from cloud data sources into sql tables it integrates ai with sql by using ml models to score data using the tsql predict function and offers both serverless and dedicated resource models so for the serverless side this is great for unpredictable workloads so for unplanned or bursty workloads use the always-available serverless sql endpoint for predictable workloads we create dedicated sql pools to reserve processing power for data stored in sql tables so here we're talking about dedicated sql pools so let's talk about those very quickly i couldn't even be bothered to make a graphic for this because it's just too much work so i just pulled it right from the docs but a dedicated sql pool is a query service over the data in your data warehouse the unit of scale is an abstraction of compute power that is known as a data warehouse unit dwu once your dedicated sql pool is created you can import big data with simple polybase tsql queries and then use the power of this distributed query engine to perform high performance analytics then there are the serverless sql pools which look like this and a serverless sql pool is a query service over the data in your data lake scaling is done automatically to accommodate query resource requirements as the topology changes over time by adding or removing nodes or on failure it adapts to changes and makes sure your query has enough resources and finishes successfully they're not going to test you on these pool things but you know i just figured we'd provide a little more
context and some more language around here just to help solidify what the service is but there you go [Music] just a couple of things that i want to give extra emphasis on which is apache spark and data lakes with synapses so it's synapse but i just keep on saying synapses just get used to it azure synapse can deeply and seamlessly integrate with apache spark as you can see here in ml models with spark ml algorithms and azure ml integration for apache spark 2.4 with built-in support for linux foundation delta lake there's also apache spark 3 so i'm surprised they're not up to date yet at least when i wrote this simplified resource models that free you from having to worry about managing clusters fast spark startup and aggressive auto scaling built-in support for .net for spark allowing you to reuse c sharp expertise and existing .net code within a spark application talking about the data lake here azure synapse removes the traditional technology barriers to using sql and spark together you can seamlessly mix and match based on your needs and expertise tables defined on files in the data lake are seamlessly consumed by spark or hive sql and spark can directly explore and analyze parquet csv tsv and json files stored in the data lake fast scalable data loading between sql and spark databases so there you go [Music] so a data lake is a centralized data repository for unstructured and semi-structured data and a lake is intended to store vast amounts of data a data lake generally uses objects blobs or files as its storage medium and so the idea is you will collect data by putting various sources in there you'll do transformations so change or blend data into new semi-structures using etl
or elt and put it right back in the data lake or we could distribute it by allowing access to the data to various programs and apis or publish the data set to a meta catalog so analysts can quickly find useful data now if we want to have an azure data lake you know you're going to have to create data lake storage but the thing is that gen 1 is no longer really intended to be used the only people that should be using it are the people that are still using it but we'll focus on gen 2 so gen 2 is the data lake storage for azure blob storage which has been extended to support big data analytic workloads designed to handle petabytes of data and hundreds of gigabits of throughput in order to efficiently access the data data lake storage adds a hierarchical namespace to azure blob storage and this is what it looks like on the left-hand side so we have two ways of access through the blob endpoint wasbs and the dfs endpoint abfs these are different drivers one is for object one is for file but they'll both get you in there and they're both compatible with hdfs which is hadoop and then with the hierarchical namespace you get access control throttling timeout management performance and optimizations okay so there you go
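here's what those two endpoint styles look like for the same gen 2 account the account container and path names are placeholders

```python
# same file reached through the two different drivers
blob_endpoint = "wasbs://mycontainer@myaccount.blob.core.windows.net/raw/sales.csv"
dfs_endpoint = "abfss://mycontainer@myaccount.dfs.core.windows.net/raw/sales.csv"
```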
[Music] all right let's take a look here at polybase and this is a data virtualization feature of sql server and specifically we're saying mssql right microsoft sql polybase enables your sql server instance to query data with tsql directly from sql server oracle teradata mongodb hadoop clusters and cosmos db without separately installing client connection software and so here's a fancy example to show you how great this tool is and polybase allows you to join data from a sql server instance with external data prior to polybase to join data to external data sources you would either transfer half your data so that all the data was in one location or query both sources of data then write custom query logic to join and integrate the data at the client level so there you go [Music] let's talk about how elts happen in synapse analytics so you can perform elts in synapse sql within synapse analytics so the fastest and most scalable
way to load data is through polybase external tables and copy statements that's why we talked about polybase so with polybase and the copy statement you can access external data stored in azure blob storage or data lake store via the tsql language which makes sense because if you can do data lake store blob storage is basically the same thing so here is a graphic where you can see we are ingesting from sql through polybase into our data warehouse and then we can talk to our data lake and do a variety of other things so the basic steps for an elt are extract the source data into text files load the data into blob storage or azure data lake store prepare the data for loading load the data into staging tables with polybase or the copy command transform the data and insert the data into the production tables
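here's a hedged sketch of the load-into-staging step using the copy statement held in a python string the account container and table names are placeholders and the options shown are just one plausible combination

```python
# load raw files from blob storage into a staging table, then transform
copy_into_staging = """
COPY INTO dbo.StageSales
FROM 'https://myaccount.blob.core.windows.net/staging/sales/'
WITH (FILE_TYPE = 'CSV', FIRSTROW = 2)
"""
# the transform step would then be plain t-sql, e.g. an INSERT INTO the
# production table that SELECTs from dbo.StageSales
```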
[Music] so azure data lake analytics is an on-demand analytics job service that simplifies big data instead of deploying configuring and tuning hardware you write queries called u-sql to transform your data and extract valuable insights so the idea here is that exporting approximately 2.8 billion rows of tpc-ds store sales data 500 gigabytes into a csv took less than seven minutes and importing a full terabyte of source data with a connector took less than six hours so the idea is that it's pretty darn fast and just to show you where it is it's over here in the middle so the idea is that this tool just lets you run queries on your data lake okay let's talk about u-sql so u-sql is a structured query language included with data lake analytics to perform queries on your data lake you can see there's a bunch of stuff in here like extract and so on and so u-sql can query and combine data from a variety of data sources including azure data lake storage blob storage sql db sql data warehouse and sql server instances running on azure vms you can install the azure data lake tools for visual studio to perform u-sql jobs on your azure data lake i didn't see it in azure data studio but it might be there too [Music] hey this is andrew brown from exam pro we're taking a look here at the azure
synapse and data lake cheat sheet for the dp-900 let's jump into it a data lake is a centralized data repository for unstructured and semi-structured data a data lake is intended to store vast amounts of data data lakes generally use objects blobs or files as their storage medium for azure data lake storage generation 2 this is azure blob storage which has been extended to support big data analytics workloads it does this via its hierarchical namespace and what the hierarchical namespace gives you is acls throttling management performance optimizations you can access your data lake via the wasbs protocol blob or abfs which is a file system protocol azure synapse analytics is a data warehouse and unified analytics platform it has two underlying transformation engines so we have sql pools and spark pools synapse sql is tsql but designed to be distributed sql dedicated pools are reserved compute for processing serverless endpoints are on-demand with no guarantee of performance data is stored on azure data lake store generation 2 operations are performed within the azure synapse studio polybase enables your sql server instance to query data with tsql it's used to connect many relational database sources and you can probably use it with other services not just with azure synapse but there you go all right let's take a look at azure blob so blob storage is an object store that is optimized for storing massive amounts of unstructured data unstructured data is data that doesn't adhere to a particular data model or definition such as text or binary data azure blobs are composed of the following components so we have storage accounts which is a unique namespace in azure for your data you have containers which are similar to a folder in a file system and then the actual data being stored so
azure storage supports three types of blobs we've got block blobs so these store text and binary data made up of blocks of data that can be managed individually up to 4.75 terabytes we have append blobs these are optimized for append operations ideal for scenarios such as logging data from virtual machines and we have page blobs these store random access files up to eight terabytes in size they store virtual hard drive vhd files and serve as disks for azure virtual machines and there you go
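here's a minimal sketch of uploading a block blob with the azure sdk for python pip install azure-storage-blob the connection string container and blob names are placeholders

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="mycontainer", blob="notes/hello.txt")
blob.upload_blob(b"hello blob storage", overwrite=True)  # block blob by default
```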
all right let's take a quick look here at azure files so azure files is a fully managed file share in the cloud and a file share is a centralized server for storage that allows for multiple connections it's like having one big shared drive that everyone you know virtual machines can work on at the same time so here's an example or a diagram of it so to connect to the file share you use a network protocol such as server message block smb or network file system nfs when a connection is established the file share's file system will be accessible in a specific directory within your own directory tree this is known as mounting so some use cases here completely replace or supplement your on-premise file servers and nas drives a lift and shift of your on-prem storage to the cloud via a classic lift or hybrid lift lift and shift means when you move workloads without re-architecting so importing local vms to the cloud a classic lift would be where both the application and its data are moved to azure a hybrid lift is where the application data is moved to azure files and the application continues to run on premise we have simplified cloud deployment so shared application settings where multiple vms and developer workstations need to access the same config files a diagnostic share where all vms log to the file share and developers can mount and debug all logs in a centralized place we can dev test and debug so quickly share tools for developers needed for local environments and we can do containerization so you can have azure files persist volumes for stateful containers super useful when you're working with containers why use azure files instead of setting up your own file share well shared access so it's already set up to work with the standard networking protocols it's fully managed so it's kept up to date with security patches it's designed to scale it has scripting tools to automate the management and creation of file shares with the azure api and powershell and it has resilience so it's built to be durable and always working so there you go [Music] hey this is andrew brown from exam pro and we're looking at the azure storage accounts cheat sheet and this is a very short section so azure storage accounts an umbrella service for various forms of managed storage you have
azure tables blob storage and files there's of course queues and some other things in there but these are the three that we care about azure blob storage object storage that is distributed across many machines it supports three types so we got block blobs which store text and binary data blocks of data can be managed individually up to 4.75 terabytes append blobs optimized for append operations ideal for logging page blobs store random access files up to 8 terabytes in size azure files is a fully managed file share in the cloud to connect to the file share the network protocols used are either smb or nfs azure storage explorer a standalone cross-platform app to access various storage formats within azure storage accounts and there you go let's talk about business intelligence tools so bi is both a data analysis strategy and technology for business information the most popular bi tools are tableau microsoft power bi and amazon quicksight we're going to obviously be focusing on power bi because that's what azure would like us to focus on bi helps organizations make data-driven decisions by combining business analytics data mining data visualization data tools infrastructure and best practices and
there are the logos of the three so you know what they are and now we'll jump into power bi [Music] hey it's andrew brown from exam pro and we're taking a look at microsoft power bi which is a business intelligence tool for visualizing business data and here's a screenshot of the power bi desktop and power bi can get a little bit confusing because they have a lot of things under the power bi name but i'll break them down here so it's nice and clear so power bi desktop is a way to design and adjust reports power bi mobile is a way to view reports on the go on your phone power bi service sometimes called the power bi portal is to access and modify reports in the cloud and power bi embedded is a way to embed power bi components into your applications and usually you need to get data into power bi and so this is one of the most powerful reasons why people like using it it's because it integrates with so many data sources so in here this is the desktop one you go in and you can go under azure and there's like every azure service you'd ever want if you go to the database tab there's a lot of database integrations for postgres mysql everything it's crazy and so power bi can directly integrate with azure services as you saw here i couldn't be bothered to make a graphic here but you know here you can see you can get things from hdinsight sql databases account storage machine learning stream analytics event hubs things like that so just to compare the two because the most important services are the desktop and the service and they're very
easy to get mixed up so power bi desktop is a dell is is a downloadable free windows application and installed on a local windows computer if you're on a mac you cannot use it sorry or linux either report uh it it can it has reports or sorry so the role that somebody would be using would be you would be a report designer and you'd use uh the desktop application to publish power bi reports to the power bi service okay and power bi service is a cloud-based service where users view and interact with reports users in
power bi service can edit the reports and create visuals based on the existing data model and they can share and collaborate with co-workers so just looking at the overlapping services power bi desktop has many data sources transforming shaping and modeling measures calculated columns python themes rls creation then on the power bi server side you have some data sources that you can ingest dashboards that is the key thing for power bi services that you get dashboards you don't get that on the power bi desktop part apps and workspaces sharing data flow creation paginated reports rls management
gateway connections paginate reports is actually with the builder which you have to download so i'm not sure why it's in there both you get reports visualization security filters bookmarks q a and r visuals but just make a note here that you use the power bi desktop to create reports and then they're get they can be used in power bi service to create dashboards okay [Music] let's talk about data visualizations and chart types and specifically power bi ones so power bi has many kinds of visualizations we'll cover the most common ones but you can see over
here, like look at all these little squares, each represents a different kind of visualization you can make, and even then they're highly configurable okay. so let's go and look at bar and column charts: they show how a set of variables changes across different categories; we've all seen bar charts, and it supports stacked ones, bars side by side, or horizontal ones, you know what bar charts are. line charts: the overall shape of an entire series of values, so they're just lines. we have a matrix, which is a tabular structure that summarizes the data. you have key influencers: the major contributors to a selected result or value, and that one has a very cool looking visualization. you have tree maps: charts of colored rectangles, with size representing the relative value of each item. we have scatter charts, which represent relationships between two numerical values, so you have an x and a y, and it's basically a bunch of dots on a graph. you have the bubble chart, which is the same thing, but the dots are now bubbles, and the size of the bubble can represent a third dimension. you have dot plot charts, and these are a little bit confusing to look at, but they are basically bubble charts where the bubbles are organized along an x-axis, so you're basically putting them into categories; i'm always confused when i look at that one, but that's just me. and one more for us here is a filled map: you have a geographic map where different areas can be filled, so like here you have states in different colors that represent things, it could be gradients, there's a lot you can do with maps and geographical maps, and
that again is not all the data visualizations, but the most common ones you'll come across. all right let's take a look here at power bi embedded, and honestly this probably won't show up on the exam; it's just that when you use azure and you type in power bi it shows up in the console, and i was like, what is this thing, and i thought it was kind of interesting and i feel like it's relevant, so that's why i have it in here. so azure power bi embedded is a platform-as-a-service analytics embedding solution that allows you to quickly embed visuals, reports and dashboards into an application. for independent software vendors, it enables you to visualize application data rather than building the service yourself; for developers, you embed reports and dashboards into an application for their customers. to use azure power bi embedded you need a power bi pro user account, you need to create an app workspace, and you need to choose a capacity, so i guess billing works via capacity-based or hourly metered modes. there you go [Music] let's take a look at power bi interactive
reports. so basically, when you're using power bi desktop you can generate reports, and in the portal or service the reports are interactive; so if you're getting confused, like, there's power bi reports and interactive reports, basically by default everything's interactive with power bi okay. so here is an example of one, i just downloaded the example one that microsoft provides, and as you can see, in the middle of it there's like a little knob, so that kind of gives you an indication of interactivity
or there's like other buttons here so like if you go here you can actually click between map and tabular i believe you can move this range around just get different information so they're highly interactive uh then they're extremely stylized as you can see you can make them look really really good a report can contain many pages and you can assemble reports as easy as choosing a visualization and dragging it out so you take that just drag it out where you want to go and customize it from there now underneath what you can do is you
can see the underlying data, so it's just tabular data, and all the tables and fields that populate it, and i think with the pro version you can modify it. you cannot do this with dashboards; when you're using the power bi service you're not going to get access to this. when you're looking at data modeling, again, in power bi desktop you can see the relationships between models, modify them and do things with them; again, you cannot do this with dashboards, and that's a key thing you need to
understand between the interactive reports and the dashboards. let's take a quick look here at the power bi service and also dashboards, which are very important for this. so power bi service is a cloud-based service where users view and interact with reports and where they can create dashboards, and here is a screenshot of me logged into power bi. if you want to know how to get there, you go to app.powerbi.com, and even if you already have a microsoft account you have to fill in a form, and then it activates the service and you can go and explore some dashboards and reports right off the bat for free, so it's very easy to jump into. one of the concepts that's very important with the power bi service is dashboards, and before we talk about that, let's talk about what a tile is: a tile is a snapshot of data pinned to your dashboard, and here's an example of one. a tile can be created from a report, a data set, a different dashboard, a q&a box, excel, sql server reporting services (ssrs) and many many more. looking at a dashboard, it
is a single page often called a canvas that tells a story through a visualization so there it is the visualizations you can see on the dashboard are called tiles you can pin tiles to a dashboard from reports okay [Music] it is very very very important that we know the difference between reports and dashboards so this isn't going to be a fun slide but we'll have to go through it and work our way through so let's talk about the difference for capabilities they both have pages but dashboard has a single page and reports has multiple pages
for data sources, a dashboard can have one or more reports and one or more data sets per dashboard, and a report has a single data set per report. for filtering, dashboards can't filter or slice, while reports have many different ways to filter, highlight and slice. you can set alerts for dashboards, you cannot for reports. for features, you can set one dashboard as your featured dashboard; for reports there's no such thing as a featured report. can you see the underlying data set tables and fields? absolutely not for dashboards, absolutely yes for reports. for customization, dashboards get none, and for reports you got tons of customization. so there you go [Music] all right let's talk about paginated reports, which are reports designed to fit into page formats so they can be printed or shared; the data is displayed as tables which can span multiple pages. an rdl is an xml representation of an sql server reporting services (ssrs) report definition file; a report definition contains data retrieval and layout information for a report. paginated reports are just a visualization of .rdl files. so power bi report builder is used to design pixel perfect (remember that word, pixel perfect, they really use that a hundred times over) paginated reports; power bi report builder is a tool specifically designed for the creation of paginated reports. so if you want to figure out how to download this power bi thing, within your power bi service go to the top right corner, go to download (i don't know why that's animated but it worked out fine), and once you're in there you will download the file and you'll install this really old looking software. but i guess the thing is that
this is a huge pain point for companies i guess and they really make a huge emphasis on it so i guess we need to know what it is [Music] hey this is andrew brown from exam pro and we are looking at the power bi cheat sheet for the dp900 let's jump into it so the first is business intelligence or bi which is both a data analysis strategy and technology for business information helps organizations make data driven decisions now we're talking about power bi so power bi desktop a desktop app to design interactive reports from various
data sources, can be published to the power bi service. then you have the power bi service, also known as the power bi portal, a web app to view reports and create interactive shareable dashboards by pinning visualizations from various data sets and reports. you have power bi mobile, a mobile app to view reports on the go. power bi report builder, a windows app that builds pixel perfect printable reports, used to build paginated reports. power bi embedded: embed power bi visualizations into web apps. interactive reports: reports in power bi drag visualizations and load data from many data sources, both in desktop and service, meaning you can make reports in both power bi desktop and power bi service okay. paginated reports: pixel perfect printable report files, tabular data laid out in page format. dashboards: build shareable dashboards by pinning various power bi visualizations; a single page report, basically, designed for a screen, only for the power bi service. dashboard tiles, or just tiles: a tile represents a visualization that has been pinned to a dashboard (it could be a bunch of other things, but that's the key thing). a visualization is a chart or graph that's backed by a data set. and whoops, we went to the next part, but that's it for power bi [Music] all right let's take a look at structured query language, which stands for sql. it's designed to access and maintain data for a relational database management system, an rdbms. we use sql to insert, update, delete and view data from our database's tables, and sql can join many tables and include many functions to transform the final output of results. on the right-hand side, that is a real query that i use in my postgres database to grab exam sets: so if you're on the exam pro platform and you're doing a particular set of an exam, this query gets that relative information, and you can see, if you look down below, that it's pulling in tag information, and then it has sub-queries and stuff, so it's a very complex query. the sql syntax was standardized as iso 9075; that won't show up on your exam but it's good to know.
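to make that concrete, here's a minimal sketch of what a query like that looks like; the users and orders tables and their columns are just assumptions for illustration, not the actual exam pro schema:

```sql
-- select specific fields, join a second table, and transform the output
SELECT u.id,
       u.email,
       COUNT(o.id)  AS order_count,    -- an aggregate function
       SUM(o.total) AS lifetime_value
FROM users u
JOIN orders o ON o.user_id = u.id      -- join tables via keys
WHERE o.created_at >= '2021-01-01'
GROUP BY u.id, u.email
ORDER BY lifetime_value DESC;
```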
relational databases will mostly adhere to the standard while adding in their own database-specific features. sql is a highly transferable skill, and we see sql being used in non-relational databases to provide a popular and familiar querying tool, so it's something you definitely want to know how to do [Music] all right let's compare oltp to olap, online transactional processing versus online analytical processing. when we're talking about oltp we're generally using databases: a database is built to store current transactions and enable fast access to specific transactions for ongoing business processing, so think of any kind of sql server. and then on the right-hand side we have data warehouses: a data warehouse is built to store large quantities of historical data and enable fast complex queries across all the data. so when we visually look at oltp we have a bunch of small transactions that are evenly distributed, so they look pretty similar in the reads and writes, and then for the data warehouse we have very few transactions with large payloads; i think the arrows are supposed to be pointing this way, but that's okay, it's not a big deal. so when we're talking about databases you have a single data source, short transactions, small and simple queries with an emphasis on writes, many transactions, it's latency sensitive, and you have small payloads. on the olap side we have multiple data sources (so you're ingesting data), long transactions, long and complex queries with an emphasis on reads, fewer transactions, it's throughput sensitive, and large payloads. the use case on the database side would be general purpose; adding items to your shopping cart would be an example. a use case on the analytics side would be generating reports.
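a minimal sketch of that difference in sql, using hypothetical cart_items and sales_history tables:

```sql
-- oltp: a short, simple, write-heavy transaction touching a few rows
INSERT INTO cart_items (cart_id, product_id, quantity)
VALUES (42, 1001, 2);

-- olap: a long, read-heavy query that scans and aggregates historical data
SELECT product_id, SUM(quantity) AS units_sold
FROM sales_history
WHERE sale_date BETWEEN '2015-01-01' AND '2020-12-31'
GROUP BY product_id
ORDER BY units_sold DESC;
```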
so there you go [Music] all right let's take a look at some open source relational databases that we know we are going to definitely encounter on azure, starting with mysql, which was created by mysql ab, i believe that's a swedish company, and they were acquired by sun microsystems, and then sun was acquired by oracle. mysql was, or is, an open source project. so mysql is a pure relational database, an rdbms; it is a simple database, which makes it easy to set up, use and maintain; it has multiple storage engines, innodb and myisam (when they say there are multiple they just mean there's two, because i don't know of any other than those two); but it's the most popular relational database. the reason why, it's been around forever, it was one of the earliest sql databases that was open source, which is a very important thing to note, and it's just very easy to use. mariadb is a fork of mysql by the original creators of mysql at mysql ab. after oracle acquired mysql there was a concern that oracle may change the open source licensing or stop future mysql versions from being free to use; if you know oracle, they like to charge you for their stuff, and oracle has their own database, so there was a lot of fear around that. and the thing is, when mysql ab sold their database to sun microsystems, it's because they trusted sun, but they didn't know that sun was going through a lot of financial difficulties, and then literally a year later sun was acquired by oracle. it's just how it goes eh; they would have never sold to oracle originally, and that's why we have mariadb. then we have postgres, which evolved from the ingres project at the university of california. postgres is an object-relational database, so o-r-d-b-m-s. it just has a single storage engine, which i guess is the ingres engine right here, and it's the most advanced relational database: it can support full text search, table inheritance, triggers, rich data types and replication slots. you know, they say the most advanced, i mean it's more advanced than mysql. postgres is the database i love to use, it's such a great service. i don't understand how it's object-relational, i think it's just how they store the data underneath and that's what makes it so flexible, but i can tell you that postgres is a lot easier to use; like initially mysql's syntax is easier, but postgres in terms of data modeling is a lot easier, because you can just create columns and you don't have to worry about them, whereas with mysql you've got to fiddle around with the data types, it's very frustrating, or there's like
serious limitations on rows. so yeah, that's those there okay. and if you want to deploy these on azure it's really simple: you just type in the name, mariadb or mysql or postgres, and then you just do azure database for mariadb and you just launch a server okay. let's talk about read replicas for azure databases here. so a read replica is a copy of your database that is kept synced with your primary database, and this additional database is used to reduce read contention. if you've never heard the word contention before, it means heated disagreement. so the idea is that if you have a lot of reads and it's hurting the database, you can offload those reads to your secondary database that's dedicated specifically for read operations. read replicas can be applied to azure sql and the managed instances, so i guess you can't do it with the azure vms, like the virtual machines, the sql vms, that's what they're called. you can have multiple read replicas for a database; i can't remember what the range is, maybe it's between two to six, i don't think that matters for the exam. and just kind of a visual: you have your reads and writes that go to your primary, and then a lot of your reads go to your read replicas. and a very common use case for a read replica is so that you can use it for olap when you are at a very small size; that's not going to show up on the exam, but i just know from a practical standpoint that's something i've done multiple times over the years. so there you go [Music] all right let's take a look at citus on azure, and honestly i don't know if it's pronounced cetus or situs, sometimes i want to say citrus, but i could not find a pronunciation for it, so i'm going to call it citus. it is an open source postgres extension that transforms postgres into a distributed database, and so citus extends postgres to provide better support for database sharding, real-time queries, multi-tenancy (which is super useful), and time series workloads. and if you go to azure postgres and you see the hyperscale option, it really is just using citus. so this is a really really good service if you are using
postgres; it's one of the few reasons i would consider using azure for my database, because i use postgres as my primary one. this will absolutely not show up on the exam, but i think it's very useful to know what the service is [Music] all right let's take a look at the azure sql family. when we say sql here we're talking about microsoft's version of sql, and if you search sql you'll see a bunch of stuff here, even more than this, and it can get really confusing, but you absolutely need to know this for the exam; you need to know the difference between these three main services. so at the top you have sql server on azure virtual machines; you'll just see it called sql vm a lot. it's for when you need os level control and access, when you need to lift and shift your workloads to the cloud, or when you have an existing sql license and you want to save money via the azure hybrid benefit; this is when you're going to use that okay. if you've never heard the term lift and shift, the idea is that in your on-premise environment you're running
a virtual machine that is your database, and you can literally save a virtual image, import that into azure, and it runs exactly how it did on your on-premise, so you don't get a lot of the advantages of the cloud, but the idea is it's the easiest way to get onto the cloud right. the next option is sql managed instance: this is when you have an existing database you want to modernize, it has the broadest sql server engine compatibility, it's highly available, has disaster recovery and automated backups, and it's ideal for most migrations to the cloud. so the thing is, if you're going to do a lift and shift you can kind of go to sql managed instance, though you'd probably have to do a transformation, some kind of etl job or something, to go into here. but the idea is that there's a lot of different versions of mssql depending on how old it is and stuff like that, so if you aren't going to be bringing your license or need os level access, you really want to be using this one, because you get all the built-in scalability stuff right. the third one is azure sql database: this is a fully managed sql database designed to be fault tolerant, with built-in disaster recovery, it's highly available, designed to scale, and it's the best option, but again, if you have an older database maybe you'd have to do a transformation to get into it. and then underneath it has sql servers; i thought this was a fourth option, but really when you go to azure sql and launch it you actually have to create a server, because you can have multiple databases associated to a server, and there's things called, like, what's
it called, like elastic pool or something like that. so that's just the underlying server for the azure sql database; it's not a service in itself, it's just a component of that service okay, even though you can go in the ui and see a list there, which makes it really really confusing, but yeah, there you go. so azure elastic pools are a feature of azure sql, and they are a simple, cost-effective solution for managing and scaling multiple databases that have varying and unpredictable usage demands. so databases in an elastic pool are on a single server and share a set number of resources at a set price. elastic pools in azure sql enable sas developers to optimize the price performance of a group of databases within a prescribed budget while delivering performance elasticity for each database. why would somebody want to do this? because it doesn't seem like a good practice to put a bunch of databases on a single server; it's more like you'd rather want a database to be distributed across servers. well, if you're running a sas product, software as a service, you'll have multi-tenancy, meaning that each person has their own database (there are different levels of tenancy). but let's say you had a company, or like you had five large clients, and they use the same software and they're all varying sizes but not large enough to justify their own server: this would be the service for you, where you are constantly spinning up databases per customer. i don't think i would ever use this in practicality, and i am a multi-tenant sas, but it's nice that they provide that option. so there you go [Music] hey
this is andrew brown from exam pro and we are on to the relational database cheat sheet, so let's jump into it. structured query language (sql): designed to access and maintain data for a relational database management system. online transaction processing: the keyword there is transaction, so frequent and short queries for transactional information, so databases, any kind of generic workload or web app. online analytical processing: complex queries for large databases to produce reports and analytics, so think data warehouse. on to the open source relational databases: we got mysql, a pure relational database, easy to set up, the most popular open source relational database, definitely something i started off with. mariadb is a fork of mysql. postgres is an object-relational database, now my favorite relational database to use; it is more advanced and well liked among developers. read replicas: a duplicate of your database kept in sync with the main one to help reduce reads on your primary database. now talking about azure sql: it's an umbrella service for different offerings of mssql database hosting services. so we have sql vm, for lift and shift when you want os access and control, or you need to bring your own license for the azure hybrid benefit. managed sql, for lift and shift when you need the broadest amount of compatibility with sql versions, mssql in particular; you can use managed sql on-premise by using azure arc, which gives you many of the benefits of a fully managed database, but it's not as good as the azure sql database, which is a fully managed sql database. that one has a few options here: you can run a single database, run a server, which is a collection of databases, or run an elastic pool, so databases of different sizes residing on one server to save cost. then we have connection policies, so we have three modes. we got default, which chooses proxy or redirect initially depending on whether the client is within or outside the azure network. we have proxy: outside the azure network you proxy through a gateway, and it's important to remember the gateway listens on port 1433; this might show up on your exam, so remember this port 1433 when connecting via proxy mode through a gateway outside the azure network. and then last is redirect: you're redirected within the azure network, and that's the recommended way to do it. and i just want to point out for these three here, i didn't write it in here, but remember that the sql vm is infrastructure as a service, and the other two are platform as a service okay. all right let's talk about tsql, which stands for transact-sql, and it's a set of programming extensions from sybase and microsoft that added several features to the structured query language. if you've never heard of sybase, i think that's the original company that
actually made the original microsoft sql database, and then maybe microsoft bought them out or et cetera, but there's a long history there. so tsql expands on the sql standard to include procedural programming, local variables, various support functions for string processing, date processing and mathematics, and changes to the delete and update statements. for the microsoft sql servers there are five groups of sql commands, and honestly there's five groups for regular sql too. normally, even for this exam, they're only going to tell you about the definition and manipulation ones, but i'm going to tell you all of them, because you should know all of them, it really helps to know them. so the first is data definition language, ddl, used to define the database schema. we have data query language, dql, used for performing queries on the data. data manipulation language, dml, manipulation of the data in the database. data control language, dcl, rights, permissions and controls of the database. transaction control language, tcl, transactions within the database. and so now that we've covered what tsql is, let's dive in and actually look at all these groups of commands, starting off with the data definition language, which is sql syntax commands for creating and modifying the database or database objects, so tables, indexes, views, stored procedures, functions and triggers. the first is the create command, and so we can create a database or database object; here you would just say create whatever it is you want to create, table, database, etc., so here we're creating a table called users and we're providing those fields okay. then we have alter, which alters the structure of an existing database, so alter table, whatever the table is, and then we can add a column. if we want to drop the database, that deletes all the objects from the database; truncate would just delete the records within the database; comment is just a comment; and rename is if we want to rename a database object. now this is what's interesting here, where we have execute sp_rename: when we said that tsql extends sql, this is its own little special language, because in other sql variants it definitely does not look like that okay
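here's a minimal ddl sketch pulling those commands together; the users table is just a hypothetical example:

```sql
-- create a database object: a table called users
CREATE TABLE users (
  id INT PRIMARY KEY,
  first_name VARCHAR(50),
  email VARCHAR(255)
);

ALTER TABLE users ADD last_name VARCHAR(50);  -- alter: change the structure
TRUNCATE TABLE users;                         -- truncate: delete the records, keep the table
EXEC sp_rename 'users', 'members';            -- rename: the tsql-specific way
DROP TABLE members;                           -- drop: delete the object itself
```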
to do with anything with manipulating data so the first thing is we have an insert command to be able to uh insert data so here you can see we say the values we want and if you're wondering how does it know what values it's going to be the order in which the column appears in the in the actual table that's how it knows that andrew goes under first name and email goes in like the email goes into the email field then for update this is uh when we want to update existing materials so we'll say
update users then we'll have to say where so we want to match the id 8 update user 8 and then we'll set the values that we want to change then we can delete a user very simple delete from users where id equals 6 merge or upsert to insert or update records at the same time i don't see these in other languages so i think it might be a tsql specific thing it's kind of hard to show this because they're very large queries but if you need to do an insert and update at the exact same
time you use this then you have call this allows you to call proceed or java sub programs basically functions so let's say you need to calculate a function uh to calculate the distance between toronto and chennai uh you could uh do that you have lock table and this is for concurrency control to ensure two people are not writing to the to the program at the same time okay [Music] all right let's take a look here at the data query language dql and the first thing we have is the select and by the way every if
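a minimal dml sketch against a hypothetical users table with columns (id, first_name, email), including a small merge, the upsert mentioned above:

```sql
INSERT INTO users (id, first_name, email)
VALUES (7, 'andrew', 'andrew@example.com');  -- values map to columns in order

UPDATE users SET email = 'new@example.com'
WHERE id = 8;                                -- update user 8

DELETE FROM users WHERE id = 6;

-- merge (upsert): update the row if the id exists, insert it otherwise
MERGE INTO users AS target
USING (SELECT 8 AS id, 'merged@example.com' AS email) AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET email = source.email
WHEN NOT MATCHED THEN
  INSERT (id, email) VALUES (source.id, source.email);
```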
[Music] all right let's take a look here at the data query language, dql, and the first thing we have is select; and by the way, if it's query, that means it has everything to do with selecting data okay. so here what we're going to do is select these particular fields from users; if we want to get all fields we can just do an asterisk, but that's the idea, like i want these fields from this users table, and you could even add a where clause and other stuff in there. we have show, which describes what a table looks like; most other databases like mysql would just say show and then the table name, but here again we have this kind of weird exec sp_columns thing, which again is a tsql specific thing, and so show would describe what the table looks like, so what columns it contains okay. we have explain plan, which returns the query plan of a microsoft azure synapse analytics sql statement without running the statement; this thing is really complicated, i could not show you an example there, but you know, that's what it does. help is just like, i want to understand more information about database objects, so here you're asking for more information about the users table; again, this is the special tsql stuff.
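a minimal dql sketch of those commands in their tsql form:

```sql
SELECT first_name, email FROM users;  -- specific fields
SELECT * FROM users WHERE id = 8;     -- all fields, filtered

EXEC sp_columns 'users';              -- the tsql way to describe a table's columns
EXEC sp_help 'users';                 -- more information about a database object
```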
all right let's take a look here at data control, and this has to do with, well, control right. so we have grant: say i have a table called employees, and i'm only going to let the mssql user andrew be able to select, insert, update and delete on it. then you have revoke, and that's the opposite: let's just say we don't want bacon to be able to delete anything on the employees table. and that's pretty much it
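a minimal dcl sketch of those two statements, with andrew and bacon as hypothetical database users:

```sql
GRANT SELECT, INSERT, UPDATE, DELETE ON employees TO andrew;

REVOKE DELETE ON employees FROM bacon;  -- the opposite: take a permission away
```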
[Music] let's take a look here at transaction control language, tcl. so tcl commands are used to manage transactions in a database; a transaction is when you need multiple things to happen, and if they don't all happen then you roll back on them okay. so this is really important in finance, where you have multiple people that have to be part of a purchasing decision, and if all purchases don't follow through then you don't want to commit that transaction okay. so we have commit, to permanently save any transaction to the database; rollback restores the database to the last committed state; savepoint is used to temporarily save a transaction so that you can roll back to that point whenever necessary; set transaction specifies characteristics for the transaction.
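a minimal tcl sketch in tsql form, a hypothetical transfer between two account rows:

```sql
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;

SAVE TRANSACTION before_credit;   -- a savepoint we can roll back to

UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- ROLLBACK TRANSACTION before_credit;  -- would undo only back to the savepoint

COMMIT TRANSACTION;               -- permanently save the whole transaction
```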
[Music] all right, just a quick review of all the sql syntax groups we just looked at: we had ddl, which is for defining; dml, which is for manipulating; dql, which is for querying; dcl, which is for controlling; and tcl, which is for transacting. now i highlighted ddl and dml in red because these are what the exam will focus on, and sometimes they simplify and take things like select and just put it under manipulation okay. so these are the two main ones that you'd have to choose between, but i wanted to show you all five of them, because this is really what sql is based off of, and so this is just the right way of looking at it. this is not an exhaustive list of all the possible commands, but it was what i could cherry-pick out that i recognized and was familiar with okay [Music] hey this is andrew
brown from exam pro and we are on to the tsql cheat sheet. so transact-sql is a set of programming extensions from sybase and microsoft that adds several features to the structured query language; this is used for mssql databases okay. for mssql servers there are five groups of sql commands, and so we have data definition language, used to define the database schema; data query language, used for performing queries on data; data manipulation language, manipulation of data in the database; data control language, rights, permissions and other controls of the database; transaction control language, tcl, transactions within the database. so on the exam they're going to ask you about either of these two, definition or manipulation, they don't bother with all of them, but there are actually five okay, and so that's all you need to know here [Music] let's talk about connectivity architecture. so when a connection is made from a client to an azure sql database, the client will connect to a gateway that listens on port 1433, and you want to remember that port number, because it's an important port number that might show up on your exam okay. so over here on the right-hand side we have a virtual machine connecting to various different sql servers, and the idea here is that based on the connection policy the gateway will grant traffic and route access to the appropriate database. so we actually have three kinds of policies. we got proxy, so connections are proxied through a gateway, increased latency and reduced throughput, intended for workloads connecting from outside the azure network. redirect, which is mostly recommended, establishes a direct connection, reduced latency, improved throughput, intended for workloads connecting from inside the azure network. and default, which is just going to default: if you launch a vm outside the azure network it's going to use proxy, and if you launch it within the azure network it's going to use redirect. so this port 1433 is only important when you're doing proxy, because if you're internal you don't need to go through that port, there's no gateway to pass through, but yeah, that is just the different kinds of connection policies there [Music] all right so let's take a look here at mssql database authentication. so during the setup of your mssql database you must select an authentication mode. you got two options here: windows authentication mode,
which enables windows authentication and disables sql server authentication, and mixed mode, which enables both windows authentication and sql server authentication. so if you were to remote into your windows machine and, under your server properties, look under security and server authentication, there are those two options; interestingly they don't call it mixed mode in the ui, but that's what it is, it's called mixed mode. so let's talk about what windows authentication and sql server authentication are. windows authentication, which is the recommended way, is where specific windows users and group accounts are trusted to log into the sql server, and this is the most secure and easy way to modify and revoke privileges, because if they're windows users, that means you can then manage them from azure active directory right. then you have sql server authentication, so we have a username and password which is set and stored on the primary database; you cannot use the kerberos security protocol, so that's one disadvantage, and the login password must be passed over the network at the time of connection, so that's an additional attack vector, but this is an easier way to connect to the database from outside the domain or from a web-based interface. so it's just going to be based on the scenario that you're in, but if you can, just stick with windows authentication [Music] let's take a look here at network connectivity. so for your sql database you need to choose either a public or private endpoint. a public endpoint is reachable outside the azure network over the internet, and you would use firewall rules to protect your database. private endpoints are only reachable within the azure network, with connections originating from inside the network,
so you would use azure private links to keep your network traffic within the azure network. so here are just the two options you would see when you provision your database; you'll just see here that you choose either public or private, and you're either setting firewall rules or you're creating private endpoints okay [Music] well let's take a look here at azure defender for sql, which is a unified package for advanced sql security capabilities, and what it does is vulnerability assessment and advanced threat protection. so azure defender is available for azure sql, the managed instances and synapse analytics, and what it does is it discovers and classifies sensitive data, surfaces and mitigates potential database vulnerabilities, and detects anomalous activities; you can turn it on at any time and you just pay a monthly cost within the azure portal, so there you go. let's take a look at azure database firewall rules. so azure databases are protected by server firewalls; a server firewall is an internal firewall that resides on the database server, and all connections to the database are rejected by default. so once your database is provisioned you have an option where you click server firewall, and what you're going to do is configure it. so you'll say here, allow azure services, so i give 0.0.0.0, and notice also the connection policy, remember we talked about that before, you can set proxy or redirect, so there's that there. and then if you wanted to do it via tsql you could as well, for example only allowing the single address 0.0.0.4 by giving it as both the start and end of the range, because you can specify your range.
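a minimal sketch of setting those rules with tsql; sp_set_firewall_rule is run against the master database of the logical server, and the rule names here are just assumptions:

```sql
-- allow a single client address by using it as both ends of the range
EXECUTE sp_set_firewall_rule
  @name = N'AllowSingleClient',
  @start_ip_address = '0.0.0.4',
  @end_ip_address   = '0.0.0.4';

-- 0.0.0.0 to 0.0.0.0 is the special rule that lets azure services connect
EXECUTE sp_set_firewall_rule
  @name = N'AllowAllWindowsAzureIps',
  @start_ip_address = '0.0.0.0',
  @end_ip_address   = '0.0.0.0';
```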
there is also azure firewall, which is a totally different service, and then there's network security groups, which are like a logical firewall around your nic cards and your subnets, but the one we were talking about here is the server firewall for azure databases okay [Music] all right let's take a look here at always encrypted, which is a feature that encrypts columns in the azure sql database or sql server. so if you had a column like a credit card number and you wanted to always keep it encrypted, you'd use always encrypted. and so always encrypted uses two types of keys: we have column encryption keys, which are used to encrypt the data in an encrypted column, and column master keys, a key protecting the key that encrypts one or more column encryption keys, and you can apply always encrypted using tsql. and so, since there is a key that encrypts the key, that is called envelope encryption, and that is a great way of doing that; i imagine the master key gets stored in azure key vault, or whatever azure calls their encryption key service.
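a sketch of the tsql involved; note this key metadata is normally generated for you by client tooling like ssms, the vault path and the encrypted value below are placeholders, and deterministic encryption needs a bin2 collation:

```sql
-- the column master key lives outside the database, e.g. in azure key vault
CREATE COLUMN MASTER KEY MyCMK
WITH (
  KEY_STORE_PROVIDER_NAME = N'AZURE_KEY_VAULT',
  KEY_PATH = N'https://myvault.vault.azure.net/keys/mykey'  -- hypothetical key
);

-- the column encryption key, stored wrapped (envelope-encrypted) by the cmk
CREATE COLUMN ENCRYPTION KEY MyCEK
WITH VALUES (
  COLUMN_MASTER_KEY = MyCMK,
  ALGORITHM = 'RSA_OAEP',
  ENCRYPTED_VALUE = 0x0123  -- placeholder: the real value is generated by tooling
);

-- the credit card column is then encrypted with the column encryption key
CREATE TABLE payments (
  id INT PRIMARY KEY,
  card_number NVARCHAR(25) COLLATE Latin1_General_BIN2
    ENCRYPTED WITH (
      COLUMN_ENCRYPTION_KEY = MyCEK,
      ENCRYPTION_TYPE = DETERMINISTIC,
      ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
    )
);
```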
let's take a look here at role-based access control, specifically for databases. so role-based access control is when you apply roles to users to grant them fine-grained actions for specific azure services, and there's four in particular that we really do care about here for databases, and these will probably show up on your exam, so you definitely need to know them. we have sql db contributor: this role allows you to manage sql databases but not access them, and it can't manage their security-related policies or their parent sql servers. you have sql managed instance contributor: this is so you can manage sql managed instances and the required network configuration, but it can't give access to others. sql security manager: manage the security-related policies of sql servers and databases, but no access to the sql servers. and last is sql server contributor: manage sql servers and databases, but not have access to those sql servers okay. so rbacs, you definitely want to know these four okay [Music] let's take a look here at transparent data encryption, tde, which encrypts data at rest for microsoft databases. it can be applied to
sql server, azure sql databases and azure synapse analytics. tde does real-time i/o encryption and decryption of data and log files. encryption uses a database encryption key called a dek; the database boot record stores the key for availability during recovery. the dek is a symmetric key, so it's the same cryptographic key for both the encryption of plain text and the decryption of cipher text. just to give you a visual there to help you out: the idea is you have something that's plain text, you use the same key to encrypt it and then the same key to decrypt it, and that's just how that works. so the steps to apply tde to a database: create a database master key, create a certificate to support the tde, create the database encryption key, and enable tde on the database. so that's just how you do it there within the azure portal and there you go
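those four steps map to tsql like this; a sketch for sql server (on azure sql database, tde is service-managed and on by default, so you wouldn't normally run these yourself), with the names and password as placeholders:

```sql
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<a strong password>';  -- 1. master key
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';       -- 2. certificate

USE MyDatabase;
CREATE DATABASE ENCRYPTION KEY                                     -- 3. the dek
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert;

ALTER DATABASE MyDatabase SET ENCRYPTION ON;                       -- 4. enable tde
```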
[Music] before we start talking about what dynamic data masking is, let's define what data masking is: this is when a request for data is transformed to mask the sensitive data. so imagine you have a credit card number here and you do not want to expose that to particular users, maybe they're part of your support team, or even the end user themselves. and so the idea is that it's in the database, stored in its raw format, untransformed, but it'll pass through a masking service, and that will have different rules on it which will then apply a particular filter; so here we'll only show the last three characters. so dynamic data masking, which is a feature of particular azure sql servers, can be applied to azure sql, managed instances and synapse. and so the idea is you would just turn this feature on and then you create a masking policy: you could exclude particular users from masking, so let's say you have root users or admin users that need to see all the data, you can do that; you make masking rules, so what fields should be masked; and then you have masking functions, so how to apply the masking to the field. so here this would be a masking function that would say okay, only show the last three characters okay
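a minimal sketch of a masking rule and an exclusion in tsql, with the payments table and admin_user as hypothetical names:

```sql
-- mask the card number so only the last three characters show through
ALTER TABLE payments
ALTER COLUMN card_number ADD MASKED WITH (FUNCTION = 'partial(0,"XXXXXXXX",3)');

-- excluded (privileged) users see the raw, unmasked data
GRANT UNMASK TO admin_user;
```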
let's take a look at private links. so azure private link allows you to establish secure connections between azure resources, so traffic remains within the azure network. here's a big graphic of what that looks like, and the idea is that you have a private link endpoint, which is just a network interface that connects you privately and securely to a service powered by azure private link, and private endpoints use a private ip address from your vnet. many azure services by default work with private link, and third-party providers can be powered by private link as well. there's also the private link service, which allows you to connect your own workloads to private link; you may need an azure standard internal load balancer to associate with the link service. but the idea here is that if you have your sql server, that way you can connect it to your on-premise workloads, or if you have a sql server on-premise and you want to connect it to a vm, that's what you're going to use okay. hey this is andrew brown from exam pro, we are on to database security for azure, the dp900, so let's jump into it
so mssql database authentication: we have two modes when setting it up, when you're remoting into a windows machine. you have windows authentication mode, which enables windows authentication and disables sql server authentication, and we have mixed mode, which enables both of these things. what are these things? well, windows authentication is when you authenticate via windows users, and sql server authentication is via a username and password, and you can connect from anywhere; windows authentication is the recommended one because it's just more secure. for network connectivity we have public endpoints, so they're reachable outside the azure network over the internet, and you use server firewalls for protection; and you have private endpoints, so only reachable within the azure network, so use azure private links to keep traffic within the azure network. azure defender for sql: a unified package for advanced sql server security capabilities, for vulnerability assessment and advanced threat protection. server firewall rules: an internal firewall that resides on the database server; all connections to the database are rejected by default. always encrypted: a feature that encrypts columns in an azure sql database or sql server. role-based access controls for databases, so these are the roles you need to know: the sql db contributor manages the sql databases but cannot access them, and can't manage their security-related policies or their parent sql servers; sql managed instance contributor manages sql managed instances and the required network configuration, but can't give access to others; sql security manager manages the security-related policies of sql servers and databases, but has no access to the sql servers; and the last is sql server contributor, which manages sql servers and databases, but has no access to the sql servers okay. you have transparent data encryption, tde, which encrypts data at rest for microsoft databases; in many cases it's already
turned on for you. dynamic data masking: you can choose the database columns that will be masked, obscured for specific users. azure private links allows you to establish secure connections between azure resources so traffic remains within the azure network; i should have put one underneath, but this is generally also if you want to connect in a hybrid connection okay, so there you go [Music] let's take a look at what a key value store is. so a key value store is a data store that is really dumb but it's super super fast okay, and so they'll lack features that you would normally see in relational databases, like relationships, indexes, aggregation, transactions, all sorts of things, but there is a trade-off for that speed okay. and so here is kind of a representation of a key value store: you have a key, which is a unique key to identify the value, and i'm representing the value as a bunch of ones and zeros because i want you to understand that there aren't really columns, it's just key and value. so the idea is, imagine that those ones and zeros actually represent a dictionary, and that's usually what they are underneath, an associative array, hash or dictionary okay. and so even though, if this was a relational database, you could see these as kind of like columns, and if we did that, that's how a key value store can kind of mimic tabular data right; but the thing is that there is no consistency between the rows, hence it is schema-less, but that's kind of a way to get tabular
data from key values. but due to their simple design they can scale well beyond relational databases: with relational databases it becomes very hard to shard them and do a bunch of other stuff with them, but key value stores are super easy to scale; they just come with a lot of extra engineering around them because of these missing features. all right let's talk about document stores. so a document store is a nosql database that stores documents as its primary data structure. a document could be xml, but it's most commonly json or a json-like structure, and document stores are subclasses of key value stores. so the components of a document store compared to a relational database look like this: tables are now collections, rows are documents, columns are fields, indexes are still the same name, and when you do joins they're called embedding and linking. so if a key value store can kind of store this, why would you do it? well, there's just a lot more features around the documents themselves; you know how we saw the key value store had basically no functionality, well, the document store brings back a lot more of the functionality that you're used to in a relational database, and so it makes things a little bit easier to work with okay [Music] all right let's take a quick look here at mongodb, which is an open source document database which stores json-like documents, and the primary data structure for mongodb is called bson. bson is binary json, a binary-encoded form of json-like documents, so its data structure is very similar to json, but it's designed to be efficient in both storage space and scan speed compared to json. and bson has more data types than json: it has date times, byte arrays, regular expressions, md5 binary data, javascript code; json is just strings, numbers and arrays, it's very very simple. but because it has all these other data types and it's stored in this binary format (it's not plain text, it's actually binary data), that's the one reason why the storage space and the scan speed are so fast. now if you did use javascript to perform an operation like, say, insert data, this is what it
would look like, so you have kind of an idea that you're inserting items into a collection there okay. just to list out some features of mongodb: it supports searches against fields, range queries and regular expressions; it supports primary and secondary indexes; it's highly available, and its high availability can be obtained via replica sets, so a replica to offload reads or act as a standby in case of failover; mongodb scales horizontally using sharding; mongodb can run over multiple servers via load balancing; mongodb can be used as a file system, which is called gridfs, with load balancing and data replication features over multiple machines for storing files; mongodb provides three ways to perform aggregation (aggregation is just grouping data to return a query), so the aggregation pipeline, map-reduce, and single-purpose aggregation; mongodb supports fixed-size collections called capped collections; and mongodb claims to support multi-document acid transactions. so mongodb, when it first came out, didn't do all this stuff and people complained about it, like it being very hard to scale, but now it's a lot easier to use, so mongodb is something that is a more popular
option nowadays than it was a few years ago, so there you go [Music] all right let's take a look here at what a graph database is. so a graph database is a database composed of data structures that use vertices (nodes or dots), which form relationships to other vertices via edges (arcs or lines). so some use cases here: fraud detection, real-time recommendation engines, master data management, network and it operations, identity and access management (and there's a lot, they're saying it's really really good for iam, something i want to look into later), traceability in manufacturing, contact tracing, data lineage for gdpr, customer 360-degree analysis like for marketing, product recommendations, social media graphing, and feature engineering for ml. so let's just kind of break down the little components here. what you'd have is a node, and a node can contain data properties, and then it would have a relationship to another node through an edge, and that relationship can have a direction and also data properties on it. and so it's a lot more expressive in terms of relationships than a relational database, and also just in how it can point to stuff, so it's super useful for particular use cases [Music] let's take a look here at apache tinkerpop, which is a graph computing framework for both graph databases (oltp) and graph analytic systems (olap). so tinkerpop enables developers to use a vendor-agnostic distributed framework to traverse and query many different graph systems; they'll always say traverse because, with so many vertices, it's a tree right. so there's a lot of databases that this thing connects to, and here they all are, but the ones i want to indicate to you that are important are amazon neptune, cosmos db, hadoop via spark, neo4j (which is one of the most popular graph databases), orientdb and titan okay. so the thing is that this isn't a graph database, it is basically an adapter to other graph databases, and tinkerpop includes a graph traversal language called gremlin, which is the single language that can be used for all these graph systems. so let's talk about gremlin. gremlin is a graph traversal language for apache tinkerpop, and so it looks like this; and sometimes, even without tinkerpop, i think this is the case with cosmos db, a database will support this language by default, so you don't necessarily need to have tinkerpop to work with some databases, but it's great to have the framework if you need it. so gremlin is designed to write once and run anywhere (w-o-r-a): a gremlin traversal can be evaluated as a real-time query, so oltp, or a batch analytics query. so over here it's just kind of showing you: these are the oltp graph databases over here, and then on the right-hand side we have olaps okay. and gremlin hosted language embedding means you can use
your favorite programming language when you write gremlin okay, so there you go [Music] hey this is andrew brown from exam pro and we are looking at azure tables, which is a type of storage for a nosql key value data store within the azure storage accounts. azure tables store non-relational structured data with a schema-less design, and there are two ways to interact with azure tables: through the table storage api, or microsoft azure storage explorer, which i find is the easiest way to interact with it. so just kind of looking at storage explorer there, if you wanted to add an entry you'd have to provide a partition key, which is a unique identifier for the partition within a given table, and a row key, a unique identifier for an entity within a given partition. and so you have all your data types here: we see string, boolean, binary, datetime, double, guid, int32 and int64. if you wanted to query, you'd have to query along the partition and row key, and you could also do some additional filtering here. so just notice here that you have your partition key and row key, you put your values, like klingon and worf, and then the rest are just additional properties you added. a lot of the time, the way these key value stores work is that the key lookup will return all the results server-side, and then the additional filters will be applied client-side; i don't know if that's the case with azure tables, but that's generally how these things work, and so there you go [Music] hey it's andrew brown from exam pro and we're looking at cosmos db, which is a service for fully managed nosql databases that are designed to scale and be highly performant. so cosmos db supports different kinds of nosql database engines, which you interact with via an api: we have the core sql api, which is their document data store, the azure cosmos db api for mongodb, and the azure table and gremlin apis okay (and the gremlin one will probably be using tinkerpop). so for all of these nosql engines you specify capacity: you can do provisioned throughput, where you pay for a guarantee of capacity, or serverless, where you pay for what you use. so if you are just playing around with the service you can go ahead and choose that
serv serverless option and so a lot of times when people talk about cosmodb they're usually talking about coresql so if you say cosmodb it's usually document but understand that there's a bunch of stuff underneath it now if you want to start viewing data and making stuff and playing around with it you'd use the cosmo db explorer which is a web interface that you can find at cosmos.azure.com so after you made your cosm db cluster or container whatever they call it then you could go access your database so here we have the sql api and so
so here we have the sql api, that would be the document store, and you can just see here that we've created a new item here for that data okay and i just want to show you that if you drop down here you can choose container or database, so we can create a new container also, if you are in azure it looks like they just have it here under the data explorer tab so it's the same thing, it's the cosmos db explorer just inline okay so you don't have to go to that url, you can just click into your cosmos db account, it's called an account, and go to data explorer i just wanted to show you here that if you made a graph database you can do everything through this explorer for all the different types, the interface will change a bit so here we'd add a new vertex right, and it's just slightly different okay [Music] all right so the thing about azure tables is that you can use it within either cosmos db okay or you can use it within account storage and the thing is that it's a really good comparison to
look at these two things because this way we can really understand how powerful cosmos db is all right so what we'll do is compare the two so over here when you have azure tables in account storage it's fast but there's no upper bound on latency, while azure cosmos db gives you single digit millisecond latency for reads and writes for throughput, account storage is variable throughput limited to 20,000 operations per second, while with cosmos db you get guaranteed throughput backed by an sla and no upper limits for global distribution account storage is a single region, and with cosmos db you have 30 plus regions for indexing account storage only gives you the primary index on partition key and row key, with no secondary indexes, while cosmos db gives you automatic and complete indexing on all properties with no index management for queries, account storage query execution uses the index for the primary key and scans otherwise, while with cosmos db queries can take advantage of automatic indexing on properties for fast query times for consistency account storage gives you strong within the primary region and eventual with secondary regions, and with cosmos db the consistency levels are a lot more flexible, there are five of them for pricing account storage is consumption based, and cosmos db is consumption based or provisioned capacity for the slas account storage gives you 99.99 availability, and cosmos db is backed by an sla too, though under some conditions it does not apply okay so hopefully that shows you that cosmos db is very performant, is globally available, single digit millisecond, and i really feel like this is there to compete with aws's dynamodb because it sounds so similar to dynamodb but yeah there you go hey this is andrew brown from
exam pro and we are on to the azure tables and cosmos db cheat sheet for the dp900 i want to point out something that i'm sure you already noticed, but in the course i spelt cosmos db without the s like everywhere and i'm not going to go back and fix that, but i know i'm going to hear no end of it for like the next year okay so let's start at the top here azure tables is a key value data store it can be hosted on azure storage accounts, where it is designed for a single region and a single table, or it can be hosted on cosmos db, where it's designed for scale across multiple regions cosmos db is a fully managed nosql service that supports multiple nosql engines called apis, why they didn't call them engines i don't know the core (sql) api is the default one, it's a document database and you can use sql to query documents, and when people are talking about cosmos db that's what they're talking about, the document database, the default one okay the gremlin (graph) api is a graph database, you use gremlin to traverse the nodes and edges the mongodb api is a mongodb database, it is a document database the table api is just azure tables key value storage but within cosmos db apache tinkerpop is an open source framework to have an agnostic way to talk to many graph databases, they probably won't ask you about tinkerpop on the exam gremlin is a graph traversal language to traverse nodes and edges, you definitely need to know what gremlin is and be used to seeing it so you can identify what it looks like mongodb is an open source document database, and the way it works is it has its own document structure called bson, which is binary json, a storage and compute optimized version of json that introduces new data types cosmos db explorer is a web ui to view cosmos databases and there you go [Music] so what is apache hadoop well it's an open source framework for distributed processing of large data sets hadoop allows you to distribute large data sets across many servers and compute queries across many servers so hdfs and mapreduce were the first features that launched with hadoop way back in the day in version one and since then there are a lot
more services now but the idea behind distributed processing is that your servers do not need to be specialized hardware, you can run them on common hardware and that's actually how google did it back in the day, there's a story about google where they just kept adding all kinds of random machines to build up their search engine, it didn't matter what they were, and that approach eventually became the hadoop system so the apache hadoop framework has the following: hadoop common, collections of common utilities and libraries that support the other hadoop modules hadoop distributed file system (hdfs), a resilient and redundant file storage distributed on clusters of common hardware hadoop mapreduce, which lets you write apps that process multi-terabyte data sets in parallel on large clusters of common hardware hbase, a distributed scalable big data store yarn, which manages resources, nodes and containers and performs scheduling hive, used for generating reports using sql pig, a high level scripting language to write complex data transformations if they sound like they do the same thing they absolutely do, they're just slightly different hadoop can integrate with many other open source projects via the hadoop components
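to get a feel for the mapreduce programming model itself, here's a tiny self-contained python sketch of the classic word count; real hadoop would run the mapper and reducer as separate processes spread across the cluster (with hadoop streaming they'd read stdin and emit tab-separated pairs), so this only shows the shape of the model, not hadoop's actual apis

```python
# the classic mapreduce word count, sketched in plain python; hadoop would run
# map and reduce in parallel on many machines, with a shuffle/sort in between
from itertools import groupby

def mapper(lines):
    # map phase: emit a (word, 1) pair for every word
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # hadoop delivers mapper output grouped and sorted by key; sorted() stands
    # in for that shuffle/sort step here
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    lines = ["to boldly go", "to go where no one has gone"]
    print(dict(reducer(mapper(lines))))
```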
and we're going to see a lot of those open source projects here in a moment let's take a look here at apache kafka, which is an open source streaming platform used to create high performance data pipelines, streaming analytics, data integration and mission critical applications it's absolutely the number one streaming service, though it is open source so you have to find a way to host it kafka was originally developed by linkedin and open sourced in 2011. kafka was written in scala and java, so to use kafka you're often going to be writing java code here is just kind of a diagram of
how it works so you've got producers, consumers and topics so in kafka data is stored in partitions on a cluster which can span multiple machines, which makes it distributed computing producers publish messages in a key and value format using the kafka producer api, and consumers can listen for messages and consume them using the kafka consumer api messages are organized into topics: producers will push messages to topics and consumers will listen on topics
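here's what that producer, consumer and topic flow looks like in code, sketched with the kafka-python client rather than the java api the course mentions; the broker address, topic and messages are placeholders

```python
# a minimal sketch, assuming the kafka-python client (pip install kafka-python)
# and a broker at localhost:9092; the topic and messages are made up
from kafka import KafkaProducer, KafkaConsumer

# producers publish key/value messages to a topic
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("ship-telemetry", key=b"ncc-1701", value=b'{"warp": 5}')
producer.flush()

# consumers listen on the same topic; ordering is per partition, and the key
# controls which partition a message lands on
consumer = KafkaConsumer(
    "ship-telemetry",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.key, message.value)
```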
so there you go so azure hdinsight is a managed service to run popular open source analytics services so here's kind of a graphic of how it works, and hdinsight supports the following frameworks: apache hadoop, which is what it is, it's the entire system, apache spark, kafka, storm, hive, hbase and interactive query (llap), and you can also run r workloads hdinsight covers a broad range of scenarios such as etl, data warehousing, machine learning and internet of things, just because you can put so many things in here and so hdinsight is just a managed version of hadoop and it makes it really easy to do stuff so you can consume stuff and run these in clusters, i think these are each called a cluster for whatever you're running, and they can go out to somewhere now once you launch hdinsight you can use apache ambari, which is an open source hadoop management web portal, to provision, manage and monitor your hadoop clusters and when you create a cluster you're going to get one by default, it's just under here where it says cluster dashboard and the idea here is you'll see all the types like hdfs and mapreduce, all the stuff here, and it just makes it really easy to interact with hadoop okay [Music]
hey this is andrew brown from exam pro and welcome to the hadoop cheat sheet for the dp900 let's jump into it so apache hadoop is an open source framework for distributed processing of large data sets and underneath it has the hadoop distributed file system (hdfs), a resilient and redundant file storage that's distributed on clusters of common hardware you have mapreduce, which lets you write apps that process multi-terabyte data sets in parallel on large clusters of common hardware hbase, a distributed scalable big data store yarn, which manages resources, nodes and containers and performs scheduling hive, used for generating reports using a sql-like language pig, a high level scripting language to write complex data transformations apache spark is 100 times faster in memory and 10 times faster on disk than hadoop, supports etl, streaming and ml workflows, and you can run it on hadoop, i debated whether i should put it inline there or not but that's where i put it apache kafka, a streaming data pipeline and analytics service hdinsight, a managed service to run popular open source analytics services, it is a fully managed hadoop system so it's just hadoop but managed by azure so there you go, that's the cheat sheet [Music] apache spark is an open source unified analytics engine for big data and machine learning and spark lets you run workloads much much faster than hadoop, though you can run it in the hadoop system okay so 100 times faster in memory, 10 times faster on disk, which is why spark is described as lightning fast so when it says hadoop you know it's talking about hive and pig and the other things that usually come along with hadoop and so apache
spark is a collection of libraries that work well together to form an analytics ecosystem so we have spark core, this is the underlying engine and api, and the api supports the following programming languages: r, sql, python, scala and java you have spark sql, which introduces a data structure called a dataframe, not the same thing as a pandas dataframe but they're just called the same thing okay, which can be used with dsls to work with structured and semi-structured data you have spark streaming, which allows spark to ingest data from many streaming services, so hdfs, flume, kafka, twitter and kinesis you have graphx, a distributed graph processing framework you have machine learning with the mllib library, and this is a distributed machine learning framework with common machine learning and statistical algorithms the way you are going to interact with spark is through resilient distributed datasets (rdds), which give you a dsl to execute various parallel operations on the apache spark cluster so here are some common functions: map, filter, distinct, count, min, max, mean, parallelize, you get the idea and here's an example of the rdd api so just notice here that basically all of these are those functions right so that just makes it really easy to transform and work with data okay
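here's a small runnable sketch of that rdd api using pyspark in local mode, chaining several of the functions from the slide; nothing azure-specific here, it's just stock spark

```python
# a minimal local-mode sketch of the rdd api (pip install pyspark)
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# parallelize distributes a python collection across the cluster as an rdd
rdd = sc.parallelize([1, 2, 2, 3, 4, 5, 6])

# the common functions chain together as parallel transformations and actions
result = rdd.distinct().map(lambda x: x * 10).filter(lambda x: x > 20)
print(result.count(), result.min(), result.max(), result.mean())

spark.stop()
```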
[Music] so databricks is a software company specializing in providing fully managed apache spark clusters, and the company's founders were the creators of the apache spark, delta lake and mlflow open source projects databricks has two main offerings the first is the databricks platform, a cloud based spark platform with an easy to use web ui where you can launch fully managed spark clusters, launch notebooks to write code and interact with spark, create workspaces to collaborate with team members with role-based access controls, create jobs for etl or data analysis tasks that run immediately or on a schedule, and create ml workflows, and it is available on all the main cloud service providers: aws, azure and gcp they also have the databricks community edition, which is a free version of the databricks platform for educational use where you can create a free micro cluster that terminates after two hours when idle, with no jobs or rbac, so really just a subset of the one above so azure databricks is a partnership between microsoft and databricks to offer the databricks platform within the azure portal running
on azure compute services and it offers two environments the first is the azure databricks workspace, and basically this is just the databricks platform with integrations to azure data related services for building data pipelines so if you need to do batching you can use azure data factory, for streaming there's apache kafka, event hub and iot hub, and for storage there's azure blob storage and azure data lake storage the other side of it is azure databricks sql analytics, where you can run sql queries on your data lake, create multiple visualization types to explore your query results, and build and share your dashboards now if you want to launch a databricks workspace it's really easy all you've got to do is create a workspace and choose your plan, launch the workspace, use sso to connect to it and start using the databricks platform so if you go up here, and this is in the azure portal with azure databricks, we'll just choose which one we want and then we launch the workspace, it'll sso us in and then you're in there okay so basically just launching a workspace is not going to cost you any money, so if you want to play around with it you can do that, or if you're really antsy you can just go to the databricks website and try the community edition because there's no risk of spending any money when using that okay [Music] hey this is andrew brown from exam pro we're taking a look here at the apache spark and databricks cheat sheet so let's jump into it so apache spark is an open source unified analytics engine for big data and machine learning it's a hundred times faster in memory than hadoop and ten times faster on disk than hadoop, and it can perform etl (batch), streaming and ml workloads for the apache spark ecosystem
it's composed of spark core, which is the underlying engine and api spark sql, where you use sql and which also has a data structure called a dataframe to work with data spark streaming, which is a way to ingest data from many streaming services graphx, a distributed graph processing framework and mllib, a distributed machine learning framework there's also rdd, a domain specific language (dsl) to execute various parallel operations on apache spark clusters then we're talking about databricks here, it's a software company specializing in providing fully managed apache spark clusters we have azure databricks, a partnership between microsoft and databricks to offer the databricks platform within the azure portal running on azure compute services azure databricks offers two environments we have the databricks workspace, so this is the databricks platform with integrations to azure data related services for building data pipelines, and if you went to the databricks website and signed up it's the same portal okay and azure databricks sql analytics, where you run sql queries on your data lake, and i believe that underneath this is basically azure synapse analytics, so that's the engine that's in there okay [Music] take a look
here at sql server management studio, also known as ssms, which is an ide for managing any sql infrastructure if you take a good look at the screenshot you can see that it allows us to work with databases, write sql statements, and that's pretty much it so with it you can access, configure, manage, administer and develop all components of sql server, azure sql database and azure synapse analytics and it has a few components in it so we have object explorer, to view and manage all objects in one or more instances of sql server template explorer, to build and manage files of boilerplate text that you can use to speed up the development of queries and scripts and solution explorer, this one is deprecated but you might hear about it so i just mention it here, it's for building projects used to manage administration items such as scripts and queries so not a complicated service but yeah there you go [Music] all right let's take a look here at sql server data tools (ssdt), which transforms database development by introducing a ubiquitous declarative model that spans all phases of database development within visual studio so this is a visual studio tool, and azure basically has like a tool
for everything, like it'll be the same tool but repurposed across their different tooling products, and this one's for visual studio so i don't have a great internal screenshot but you can see that it has a few different components inside there: analysis, reporting and integration services so it uses transact-sql (t-sql) to build, debug, maintain and refactor databases it also provides a table designer for creating and editing tables in either database projects or connected database instances, lets you view and edit data in loaded files, and makes it easy to publish to azure sql database or sql server it has an object explorer that offers a view of your database similar to ssms, and it allows you to do light-duty database administration and design work: easily create, edit, rename and delete tables, stored procedures and functions, edit table data, compare schemas, and execute queries using contextual menus i don't think this is on the exam but because it's just one of these many services i figured we'd throw it in here just so that if you do see this initialism you know what it's for [Music] so azure data studio is a cross-platform database tool for data professionals using on-premises and cloud data
platforms on windows, macos and linux and here is a screenshot, and if you recognize it, it looks just like visual studio code right, if you open up the extensions that's what you would see so you can query, design and manage your databases and data warehouses azure data studio offers a modern experience with intellisense, an experience very similar to visual studio code: code snippets, source control integration, an integrated terminal, built-in charting, customizable dashboards, and jupyter notebooks connected to your data set and it has a marketplace of free extensions so some i noticed that seemed very useful were a sql database inspector, a great way of looking at your data, the kusto extension for azure data studio, the postgres extension, and many many many more so there you go [Music] so azure data factory is a managed service for etl, elt and data integration create data driven workflows for orchestrating data movement and transforming data at scale and if you see the image it makes it pretty clear what you can do with it, there's a bit of a pipeline there so create pipelines to schedule data-driven workflows, build complex etl processes that transform data visually with data flows, using
compute services such as hdinsight hadoop, databricks and sql database, and publish transformed data to data stores such as azure synapse analytics, so raw data can be organized into meaningful data stores and data lakes let's break down the components here so we have a better idea how this thing works so a pipeline is a logical grouping of activities that perform a unit of work an activity is a processing step in the pipeline datasets are data structures within the data stores linked services define connection information for data sources to connect to the data factory data flows are the transformation logic that determines how data moves through the pipeline integration runtimes are the compute infrastructure used by data factory control flows are the orchestration of pipeline activities, which includes chaining activities in a sequence and branching you should know what control flows are and data flows are
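pipelines, activities, datasets and linked services are easier to picture as the json a pipeline is actually defined by, so here's that structure sketched as a python dict; to be clear, this is not the management sdk, just the rough shape of a pipeline definition, and all the names are hypothetical

```python
# a rough sketch of the shape of an adf pipeline definition, written as a
# python dict so the components are easy to see; all names are hypothetical
pipeline = {
    "name": "CopySalesPipeline",   # the pipeline: a logical grouping of activities
    "properties": {
        "activities": [            # activities: processing steps in the pipeline
            {
                "name": "CopyRawToStaging",
                "type": "Copy",
                # datasets: data structures within the data stores; each dataset
                # points at its source through a linked service, which holds the
                # actual connection information
                "inputs": [{"referenceName": "RawSalesDataset",
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": "StagingSalesDataset",
                             "type": "DatasetReference"}],
            }
        ]
    },
}
```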
and there you go [Music] so microsoft sql server integration services (ssis) is a platform for building enterprise-level data integration and data transformation solutions ssis can be used to automate sql database servers, and it can be used as an integration runtime for azure data factory you can perform the following tasks: copying files, downloading files, loading data into data warehouses, cleansing and mining data, managing sql server objects and managing sql server data you can perform etl for a variety of sources: xml, flat files and relational data sources ssis has built-in tasks and transformations, graphical tools for building packages, and the integration services catalog database where you store, run and manage your packages it comes with a graphical interface for transformations so you don't have to write any code, and the designer is a graphical tool that you can use to create and maintain integration services packages just to show you what it looks like, here it is so it allows you to drag out transformations and design different kinds of flows, control flows and data flows notice it has control flows and data flows similar to azure data factory, but you can see this is the desktop tool, azure data factory is kind of like the web version of this but yeah there you go [Music] hey this is andrew brown from exam pro and we're taking a look here at the etl and sql tools cheat sheet so let's jump into it azure data factory is a managed service for etl, elt and data integration jobs
you can create data driven workflows for orchestrating data movement and transforming data at scale, and build etl and elt pipelines visually without writing any code via the web interface you have ssis, sql server integration services, a platform for building enterprise level integration and data flow solutions, a low-code tool for building etl pipelines very similar to data factory but it existed 15 years prior, mostly focused around sql, no surprise there, and it integrates with azure data factory so you can extend it to non-relational database workloads okay we have azure data studio, an ide very similar to visual studio code that's cross-platform and works with sql and non-relational database data, and has many many many many extensions you have sql server management studio (ssms), an ide for managing any sql infrastructure that only works on windows, more mature than data studio but very similar in terms of purpose so we have the modern one that's cross-platform and works with both relational and non-relational data, and then the older one that's more mature but is a windows app and does a lot of interesting stuff for sql you have sql server data tools (ssdt), this is a visual studio extension to visually design and work with sql databases within visual studio so there you go we're all done [Music] hey this is andrew brown from exam pro and i wanted to show you how to go ahead and install power bi so what i've done is i'm on my windows machine and i pulled up the microsoft store and all i did was go to the top right corner and type in power bi this is the easiest way to install power bi, you can download the application but i find this is just the best
way to do it and we'll hit get, or free, to get started here and it should now start downloading so if we go to the top right corner you'll notice here that it is now downloading okay so we just have to wait for that to finish and i'll see you back here in a moment all right so after waiting a short while here it looks like power bi desktop is finished downloading what we can do is click into power bi and we can go ahead and launch this application and so we'll let that go ahead and launch, and the first time you launch it it does take a little bit of time there, so i stopped and restarted the video, but here you can see we are now inside of power bi and if we wanted to get started there are a few ways we can go ahead and do that, they have some nice tutorials here in the bottom right corner, but what i want to do is just get something open so we can start looking at something so if you were to type in sample data sets for power bi, microsoft
has this nice page where you have a bunch of downloadable reports or things we can work with and so maybe we should give the first one a go so i'm going to go ahead and download this one here and if we go down below so we'll just take a look here so i think we just have to go ahead and click here and uh from here i just want to hit that download button so i'm just looking for it here it is okay great and so i'll go ahead and download that file notice that it's
it's the power bi x that is the file that we are looking for and we'll go ahead and open this file so after any short while there the file did open it does seem to take a while for things to open but i'll just hit the x there they have a new modeling mode we'll just say not yet and so down below you can see i have a bunch of tabs here and this allows us to go ahead and explore this data but you can see things are extremely interactive here in power bi desktop so
you can do all sorts of things here okay [Music] hey this is andrew brown from exam pro and in this follow along we're going to be looking all at azure sql so what i want you to do is go to the top here and type in sql and you'll notice you'll get options like the databases, managed instances and virtual machines, those are the three under the azure sql tier, but to make our lives a little bit easier we'll type in azure sql i find this really confusing and so i'm hoping that i can
show you the easiest way to find it but if you go to azure sql here and add it this will actually now give you the option to choose between the three so you can make an informed decision and we'll work our way from the right to the left so sql virtual machines is great when you want os level access to the virtual machine uh when you're doing a lift and shift that means you're moving your sql server from on-prem to the cloud because you want to take advantage of the cloud and also if you want
to bring your own license to take advantage of the azure hybrid benefit so if you drop down here you'll notice that you'll see byol (bring your own license) and if we expand here it'll tell you all the details as to why you'd want to use that i'm not going to spin one up because i'm going to show you the price okay so we go over to the pricing page and we scroll on down, and you're going to notice whether it's on or off with the hybrid benefit you're paying about a dollar to two per hour and so you're not going to learn that much by spinning one up, because we are going to spin up a sql server anyway, and at this stage it's not going to really matter, they're all very similar, and i just don't want you to spend money if you're a student there let's look at sql managed instances we'll expand that again this is for lift and shift, meaning you're moving from on-premise to the cloud, but here you don't have access to the os level and you're not bringing your own license
however the upside is it's a fully managed service, meaning that it can scale very well and it's going to have great backups and things like that and it comes in two flavors: we have single instance and single instance with azure arc now i didn't cover this in the course but azure arc allows you to extend the azure control plane to other cloud providers and also to your on-premise environment so the idea here is that you can use sql managed instances and have all the benefits shown here while launching the instance within your own infrastructure so if you really need to keep that server on-prem then you can do that with azure arc but here's the single instance, and i'll show you the pricing on that one and so we'll go to managed instances, we'll scroll on down here, and the lowest cost one here for pay-as-you-go is about a dollar an hour so it's still kind of expensive if you're a student again i don't know if
it has a free tier and i wouldn't imagine it would, because that's not really a free tier kind of product, but i just want you to know that that option's there we'll make our way now to sql databases and so we have a few options here: we have single database, elastic pool and database server okay so single database is a great fit for modern cloud-born applications that need a fully managed database with predictable performance it has hyperscale, meaning that it can scale up to 100 terabytes, it has serverless compute and it's easy to manage you have elastic pools, we covered this in the lecture content, this is where you might be like a multi-tenant saas and you have multiple databases, one per customer, and you want to save costs, so you actually have them all running on the same server and you have database server, and this is just to manage a group of single databases and elastic pools, it's a way of grouping stuff together and so today what i'm going to do is launch myself a single database okay and what
we'll do is we'll just say, for the resource group, dp900-azure-sql just so i can see what i'm doing we'll go down below here and i'm just going to call the database my-azure-sql we will drop down here and select a server, we'll have to create a new server so this will be my-azure-sql-server, and you might have to work to choose a unique name, i'm going to append one two three four five six, there we go okay for the login i'm going to do azureuser, and for the password i'm going to do testing123 with a capital t, entered twice it's really nice, sometimes they make you do four five six and then an exclamation mark, but we have the shorter password there we're going to say okay do we want to use elastic pool no, i don't think we need that today, but let's expand the general purpose here i want to make sure that we have the lowest cost possible so notice to the right here it's 381 dollars if you provision it you could also do serverless, i've actually never used the serverless feature here, but this would be really great if you're just trying to learn because it's going to be billed based on usage here i've definitely used serverless services, just not this one in particular for sql, but what i'm going to do is stick with provisioned i'm going to look for a basic plan and what we'll do is scroll this all the way to the left here, and this is just in case we forget to turn off our database, so now it's five dollars a month and per hour it's
fractions of a penny so we'll go here and hit apply and we'll scroll down, as we have some options: geo-redundant, zone-redundant and locally-redundant honestly we only need locally redundant so i'm going to go ahead and select that obviously if you're doing a production database you want geo-redundant because you're going to have better redundancy, in the sense that if a region goes down you'll still have your database okay the next step is networking and here we either have a public or private endpoint, this is really important for security purposes right, and we do cover this in the lecture content as well, and so i think it would be a great idea to have that there let's go read about firewall rules allow azure services and resources to access this server, yes, to communicate from all resources inside the azure boundary that may or may not be part of your subscription so allow azure services and resources to access this server, yes, and add current client ip address, yeah because that's me, so i'm going to say yes to both of those and it's going to set up our firewall
rules, which is really nice we'll go next to security here we have azure defender for sql, if you want that additional protection you turn that on, it costs 15 dollars a month and we'll go to the next tab here here, for the data source, we can start with a blank database, restore from a backup, or select a sample i'll select a sample because it's nice to have some data, i didn't know they had that there and so we'll go next to tags i didn't really cover tags in the course but they're actually really important, especially when we're talking about data, because you do want to catalog and categorize all your stuff so tagging is very simple: what we do is we have a name, so we could say workload, and a value, i'll just say learning right, and it allows us to filter and find resources later down the road we'll go ahead and hit review and create, we can review all of our stuff, see that the price is okay with us, and we'll go ahead and create and again we're not going to get billed five dollars by pressing that button because again it's billed per hour, so we will be billed some kind of cents i don't know if azure sql has a free tier, let's see here, free tier, i don't see it but i'm not too worried about that so again if you're really worried about any spend just watch and don't do it, but i do recommend, if you do have the pennies, to go launch this stuff yourself because you'll learn a lot more that way okay so i'll see you
back here when this is done provisioning all right so after a short little wait there our server is ready so i'm going to go ahead and click go to resource and what we can do on the left hand side is just do a little exploration so look at the top here where it says set server firewall remember from our security part where we had set some default rules, well here they are okay so here is my ip address that is allowed to connect to it notice we have that connection policy
where it's default proxy or redirect you know things like that so we'll go back uh one step here and i just want to show you down below that there are some features so notice that tde transparent data encryption is already turned on uh there's your azure defender for sql we have our dynamic data masking so on the left hand side we'll scroll on down and just take a look at some of the security so we go to tde and you can see it's already turned on for us which is great i'm not going to click
on the security center because it's a pain to exit out of, but let's look at dynamic data masking so here what it's doing is it's suggesting fields that should be masked and we go ahead and just click add and add, and then we can mask these fields so i just click that there and so now these fields are masked and it shows you a bit of the masking function as to what it would do to mask it so it's as easy as that data discovery and classification, this will be useful, it's in preview right
now it's not showing any information as of yet but the great thing about this is that it can discover information that you might care about that's considered sensitive right so you might say this is confidential, gdpr, an address here for information types let's see what we can do here so we'll say customer, and we'll say credit card, and we'll say highly confidential, and i'll say add classification i haven't used this service before, i had never seen it before, but it looks like that's a great way to
keep track of that kind of stuff so that's cool auditing, i don't think i ever do auditing, but you can turn on azure sql auditing and it will track logs and stuff like that so the thing is that if someone were to get into your database and try to make changes or corrupt your data, stuff like that, you'd want to see who was actually running those queries so definitely if you care about security you'd absolutely want to turn that on and then we'll just scroll back to the top here and go back to
our overview i want to show you that through connect we can connect with azure data studio or visual studio if you click this it's going to tell you you need to install it i don't have azure data studio yet so i'm going to go ahead and install it on this computer and we'll go down below, i'm on a windows machine today, and it's available for windows, mac and linux, so i will go ahead and use the user installer i guess and i will download that, it's 100 megabytes while that's going we'll make our way back
to our main page here and i also want you to notice there's power bi so we can go download this actually right now so if we were to click on, is it just get started, yeah, it's going to download this my-azure-sql.pbids file and we'll be able to open that file in power bi, and we did power bi in an earlier follow along so it'll be easy to open i want you to know, if you don't have time to do all this stuff, because we're going to have this database up and running for a little longer than you might expect, if you don't feel like you can do this all in one sitting you can delete this database and always go ahead and spin it up again when you're ready to continue but on the left hand side they have a query editor here and we're just going to go log in so i said my password was testing123 with a capital t, we'll hit okay and now that we're in i can see my tables on the left-hand
side here we've got views and stored procedures but let's say we wanted to execute a query so i'll just expand this i've never seen this database before so i'm just going to go with it but we'll type in select, oops, and it's going to be very aggressive about autocompletes, you've got to be really careful here, and i'm just going to say select asterisk, which means all fields, and we want to select it from saleslt dot customer and notice when it autocompletes it has square brackets around it, that is just a thing that t-sql does let's say you had a field called first name and you wanted to have it with a space, that's what the brackets are for, so they're just filling that in in case you have reserved words or spaces, but we don't actually have any spaces here so we just remove them like that saleslt is kind of like the container for all these tables, it's just a schema naming convention they have for their table names and so we'll go here, select all, and hit run and we should get
some data back and so let's just perform a very simple join so a join will happen when you actually have a foreign key you can join on so here you can see this is the primary key customer doesn't seem to have a whole lot of information so let's go into products and this should have yeah here you see it says product category that is a foreign key so we can use that so i'm gonna go here and type in sales lt and we'll do product okay and then we'll do where or sorry we'll do a
left join so we'll say left join, and i want to left join on this product category id so it would be left join with saleslt dot productcategory and we would say on, and so we would say saleslt dot product dot, what was it called, it is productcategoryid okay, so we say productcategoryid equals, and we'd say saleslt dot productcategory dot productcategoryid so we're saying we want to join where these match, assuming joins work like every other language, which they should and so i'm going to be very particular and just pull out some particular columns that i want so here i'm going to say i want the saleslt.product, oops, and i wonder if i can alias this, like as prod or p here, because i'm going to go crazy if i keep on typing that out, so we'll just try that, i think it should work, and if we do that it'll just be easier to read here, it's a way of aliasing it sometimes you don't even need to put the word as, you just put p like this, and for the product category table we could do that as well so i took out the as, so i'll just say pc and we'll see if it lets us do that okay and so here we have product id, and i want to get the product name, looks like it's working, and we want the category name i think so we'll drop this down, i assume it must have a category, it's taking its time to load here, but we'll just say pc.name as name and p dot id and
we'll see if that lets us run that it's saying it's currently not available, if the problem persists contact customer support we'll hit run again i'm not sure why our server's having problems, i didn't turn it off did i, let's go back here and take a look it's online, yeah it's in good shape here so we should be able to query it, maybe we just lost our connection so what i'm going to do is just click back here, i'll type in testing123 [Music] and i'll go ahead and close that there, we'll make our way back, go back
in the query editor there we go, i'm not sure what was giving us trouble it threw an error here, invalid column name id, oh yeah it's not called that, it's called productid there we go, we'll run that invalid column name productid, i must be naming these wrong here, capital d here, is it the same thing down below here normally when i name these i would have them all lowercase with underscores, but in the azure world they like to use this pascal case style quite a bit so we will try this, productid, so it doesn't, oh right, it wouldn't be, this would be productcategory, sorry, so this is productcategoryid and yeah that should match there we go so we just did a join between the two tables, so we have the category name and the product name, which is pretty good there
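to tie that together, here's roughly the final join we assembled, wrapped in a small pyodbc sketch in case you'd rather run it from code than the portal's query editor; the driver version, server name, database and credentials follow this follow along and would need to match your own setup

```python
# a rough sketch using pyodbc (pip install pyodbc) against the sample database
# created in this follow along; the server, database, user and password are the
# values we chose above, so treat them as placeholders
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=my-azure-sql-server123456.database.windows.net;"
    "DATABASE=my-azure-sql;UID=azureuser;PWD=Testing123"
)

# the join we built: every product with its category name, matched on the
# productcategoryid foreign key
cursor = conn.cursor()
cursor.execute("""
    SELECT p.Name AS product_name, pc.Name AS category_name
    FROM SalesLT.Product p
    LEFT JOIN SalesLT.ProductCategory pc
      ON p.ProductCategoryID = pc.ProductCategoryID
""")
for product_name, category_name in cursor.fetchall():
    print(product_name, "-", category_name)
```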
our azure data studio looks like it's done downloading, so i'm going to go ahead and install that, all i'm doing here is just double clicking the file to open it up and we'll just give that there a second oh, it was here the entire time we'll say i accept, next, next, next, we'll get a desktop icon since sometimes i can't find them later, register azure data studio as an editor for supported file types, no, i like other programs for that, and we'll go ahead and install it doesn't take too long and we will launch azure data studio and yeah, if you've ever used visual studio code it's going to look very familiar to you let's see, i don't like the light mode, i'd rather have it dark appearance, can we change that to dark, zen mode, oh no
oh no, what did i do, i don't know how to get out of that okay great, maybe don't do zen mode, i just wanted to make it dark mode, not sure how to change that, i guess it's not a big deal but what we'll do is we'll go over here to the top and we can go ahead and add a new connection, and so that's an mssql server what we'll need to do, and i want to keep this open because it's very hard to find the tables here, is go back and we'll get the server name so we'll copy that and then within azure data studio we'll paste in our server we have three options: windows authentication, sql login and azure active directory i'm going to use sql login the username is azureuser, the password is testing123 with a capital t. what i'll tell you to remember is that if you drop it down it'll see if the database is there, and there it is, my-azure-sql, that's what we called it i'm going to go ahead and connect and
if i expand it i should be able to see tables, good, and stuff like that, very good, a lot easier to work with than that preview editor there so what we can do, i'm just going to go over here and grab our query, copy this, and what we'll do up here is we'll make a new query so i think it's over here, new query, maybe it's over here as well, let me see, i guess we just do it from there, i'm just trying to figure out the easiest way to go up there and do a new query i'm going to go ahead and paste that in notice we can change our connection, i don't think a database is selected right now so we'll go ahead and connect to this database here we'll go ahead and run that and it's the same data okay there's this thing called notebooks, which is kind of cool, so i can go ahead and make a new notebook, i think it's using jupyter notebooks underneath but what we'll do, can we save that query, we sure can, i'm not going to save that query today because i do not care, but yeah it is jupyter notebooks great so what we'll do is go file, new notebook and here i can write some text so this will return back both products and product categories okay and then what i can do here is add another piece of code, we'll paste that in, oops, and i can even say like limit, so i could say limit to 20 or maybe 10. let me go ahead and run that and we'll just connect to the server here, it just needs to connect okay and we spelt limit wrong
so we'll just spell that again there, limit, i don't think i'm spelling it wrong but i'll just take it out because i don't feel like goofing around with that today and so here we have our records okay it should just be limit right, l-i-m, maybe it's not limit in t-sql, give me two seconds, tsql limit, oh, they don't have a limit keyword, i did not know that okay well that's kind of interesting they have things like where row number is equal to, i'm just looking at it over here right, so sql server uses top instead, so select top five, you know, top ten let me try that there we go, so i guess it's just not limit so we have that in there, which is kind of nice and so let's say we wanted to narrow that down to something in particular, if there's a way we could browse the data, oops, i meant that by accident, but what we could do here is just take a look i wanted to see, i'll do it over here, i want to see the product category id because then maybe we could
just do a where statement really quickly and so these are all mountain bikes when it's five so what we'll do is we'll make a where statement and we will say where pc dot productcategoryid is equal to five, and then we'll hit run okay and then up here we'll just say, you know, show us, oops, can i edit this, yup, show us the top 10 mountain bikes it's not really, this is not very useful information because it's not saying why it's the top ten, it's just listing them, there's no ordering here, so maybe that would also be a good idea here so let's just take a look at our data structure here, i mean we could do it in here as well if we want to explore that way, but i think it's a bit easier to do it here so let's look at the product table and see if there's any definitive information that makes it interesting, like list price so maybe that's something we could return here so we'd say p dot listprice as price and what we could do is order that, so we'd say order by p dot listprice descending there we go, when you don't do this stuff every day it's easy to forget so now it's listing it by descending prices that amount isn't very human readable, i don't know if it's like 3000 or whatever i wonder if we could use a function to fix that if we were using postgres it would just be like floor, so maybe i could floor the result here or round it, let's just see if i can do that sometimes it's
like round, yeah there it is, those are built-in functions in most sql languages here and it needs two arguments, what's the second one, oh, probably what to round it to so we'll probably put a zero here because it's going to be like, rounded to what, maybe it needs to be a one, in other languages you just put round on its own, so i don't know what that is so what we'll do is look that up, so we'll say tsql round okay, just doing that off screen here, and so it says numeric expression is an expression of the exact numeric or approximate numeric type, and length is the precision to which you want it rounded let me see if i can get a nice example so round a number, i just don't want it with zeros, so that will round it to two decimal points, and then we have negative two so we'll go up to the top here: length is the precision to which the numeric is to be rounded, length must be an expression of type tinyint, smallint or int, when length is a positive number the number is rounded to the number of decimal places specified by length, and when it's negative it's rounded on the left side of the decimal point oh okay, that's kind of interesting so what we want to do is probably round the [Music] left or right side so either give it two, there, cannot convert data type varchar, oh, it's the name, you know what, you're probably watching me like, andrew what the heck are you doing, it's supposed to be up here so we do round okay and i'll just say two and that should round it to two decimal points, which is fine, but i really don't want anything there so i'll say zero and you know
i probably would want that just as an integer, there's probably a way to cast it so maybe there's like a cast function or a to_int, i don't see that, so cast float to int [Music] tsql, yeah it's called cast, that's what i thought, and so it's as simple as that so what we can do, and we probably don't need to round it because the cast will probably truncate it itself, so we can say cast and say as int very good, that looks really nice let's say we wanted to concat something onto that again i don't know how to do concat in t-sql so we'll look it up, so we'll say tsql concat strings in postgres it's just a pipe, oh, it's a plus, wow, we get a plus, i love that, you don't usually get a plus so what i'll do here is i'll cast this as an int and then i'll cast this as, i don't know what it is in t-sql, as varchar, varchar yes, can i do that, first, it's still a number, great, and now that it is a string we can do maybe a dollar sign string like that we'll say run, will it let me do that, yes, very good and we'll also give it the usd there it goes, that's kind of nice so now we kind of have a human readable thing there okay so what i'll do is i'll just copy that over, put it in my notebook and we'll just have that there, i'll move that off screen and that's pretty much how you would query data with sql
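for reference, here's the query this exploration ended up at, the top 10 mountain bikes by descending price with the human readable price string; it's sketched so it could run on the pyodbc connection from the earlier sketch, and the category id of 5 is just the value we found by browsing the data

```python
# the query assembled above: top 10 products in category 5 (mountain bikes),
# ordered by list price, with the price cast to int and concatenated into a
# display string; reuses the hypothetical conn from the earlier pyodbc sketch
query = """
    SELECT TOP 10
        p.Name AS product_name,
        '$' + CAST(CAST(p.ListPrice AS INT) AS VARCHAR) + ' USD' AS price
    FROM SalesLT.Product p
    LEFT JOIN SalesLT.ProductCategory pc
      ON p.ProductCategoryID = pc.ProductCategoryID
    WHERE pc.ProductCategoryID = 5
    ORDER BY p.ListPrice DESC
"""
for name, price in conn.cursor().execute(query).fetchall():
    print(name, price)
```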
now that we have our ms sql database we can actually consume it with a lot of other services like power bi, or maybe some other services that we can think of, but yeah, how about we do that next, let's try to load this data into power bi [Music] all right so now let's take a look at using power bi with our ms sql server here so in the bottom left corner you saw that i downloaded that file earlier so i'm going to go ahead and open that up, if you don't have it just make your way over to power bi and download it again and all it's going to
do is establish a connection to the data source now we didn't really need to download this file to do it it's not that hard to establish our own connection uh but it does save us one little step here okay so it's gonna load here just give it a moment it's gonna pop this up if you don't see this what you can do i'm just gonna go ahead and close this to show you how to make a manual connection so i'm going to go over to server name copy that go back and if we go into
now you have this button here but we'll just go get data more so you have a more predictable way of loading data in and we'll go to database here we can go to azure and so we have azure sql database um which i think that's what we're using and so we'll go ahead and hit connect and so we'll put the server name in there and the server name was my azure sql we'll hit ok we'll give it a moment to connect and we'll get back to the same screen so here we can import the data
that we want we'll say product, product category, customer, right, customer address, address, but the more we bring in the longer it's going to take so you know what, i'm just going to narrow it down to just product, product category and customer you can also transform the data, which is really nice, i'm not going to get into that because that's a whole other thing, but we'll go ahead and hit load and what this is going to do is it's going to start to import this data and we'll give it a moment so it's going to load the
data in and then it's also going to detect relationships and do our data modeling so the data is now in if we go to here we'll be able to see these tables and explore that data in its raw format if we go over to this tab here we can see that it's detected a relationship the more tables the more relationships it will auto detect so we didn't want to make that too hard i want to go back over here to our report and i just want to show you how you can visualize some information so
one of the easiest ones to do off the bat is card so i'm looking for card it's this one nope slicer card okay and so i'm just dragging this one out just there we go i just clicked it whatever you have to do to get there and this is great if you have a single field so i'm going to go drop this down and i'm going to find product by list price so i'm going to drag that onto the field here notice it went into there it only takes one input for the fields and there's
my price if i want to change the look of it and go down here and there's all sorts of things we can change here like shadow you know other things that you can do for fun but you know there's a little bit of styling there and let's say i wanted to see that as a table so what i could do is just change that to table and now it's a table uh table display but this information is not very useful so what i want to do is drop this down and then drag out the name
of the product category remember that these have a relationship established here so it's going to know how to slice that information to make it useful and so now we have a breakdown of price and name based on category right and i mean, that's category, which was, i don't know how to rename that very well, yeah, for this visual here we'll say, you know, this would be category and for here we'll just rename that to make it a bit easier, we'll just say price there we go now
let's say we want that as a graph so we just change that over to a graph so we can kind of see the difference in prices, all sorts of things, we can just click around here to have some fun okay bar chart is going to be the most useful one i believe so yeah, i don't know if there's anything else that's really fun here, we have another breakdown, you can kind of see the cost there of all the stuff that they're selling okay i think the one that has the highest price is the bikes right, because the volume of sales is just the cost of each item that they're selling so that's how you'd use power bi desktop with your sql server so yeah there we go so another thing that we can do, since we have power bi and we connected it to our ms sql server no problem, is we probably should try to publish it to the power bi service so that we can see how to make dashboards so what i'm going to do is go up to the top here and type in
app.powerbi.com if this is the first time you've ever gone here it's going to ask you to make an account, even if you have a microsoft account, it's just another way of authorizing that there and so here you can see we can explore all sorts of data sets if we click into here, these are just dashboards right, and we'll go in here and we can see all this kind of information, which is really nice, and we can go here and just select some information, you can see it's very interactive, which is really nice and we can go ahead and create our own kind of things here if we were to publish a data set or whatever but you know what i want to do is connect some more data sources, download power bi desktop, we already have that installed, so what i want to do is just publish and get something into here so i'm going to reopen our file we had there earlier and establish a connection to our ms sql server, we'll make a little report and we'll publish it and see how
we can access it through the power bi service here okay so we'll give it a moment and all i want is product and product category once it decides to load here so we want product and product category and i will say load we'll give it a moment and now that it's loaded what we'll do is again make ourselves a visual so i will drag out a card like we did last time and we'll go down to product here i will try price, it's probably better if it's a table, it's a little bit more
useful i think and we'll go here to product category and drag out the name again we'll rename these we'll just say rename price rename category and what we'll do there's a publish bubble go ahead and hit our publish button so do you want to save changes yes uh and we'll just save this whatever we want to call we'll just say my power bi report okay and uh so we will just have to put our email in here okay oops it's thinking we'll give it a second there we go so it's asking me to log into
my account and we'll say my workspace, sounds great to me, and we'll give it a moment to publish that report great, that report has now been published we'll make our way back to power bi and so it should be under our workspaces right all right so what we'll do is make our way over to our workspace on the left-hand side, and you'll have to click on here, it's not very clear, and we have a data set and a report if we click into the report we have this error about missing credentials so what we'll do is make our way over to our data set and we'll just go to settings here and we might need to provide some credentials here again so let's edit the credentials and we'll just say azureuser, testing123 with a capital t, the privacy level setting is private, and we'll sign in and you know what, i wonder if it'll give us access because, remember how we had that firewall rule, it could be the firewall rules that are preventing it so by clicking on my report, yeah, it's still a problem so what i'll do is go over to my azure server and check the firewall rules here, allow azure services to access this resource, yes so it's denying the rules, so we'll have to figure that out okay so actually, let's go into the dataset, and updating the credentials did work, that message was just a bit of a false flag there, i mean it did need to reestablish the connection, but i thought maybe i had to go digging around in our
firewall settings but it since we have this turned on you know powerbi should be able to access it and it can so we have uh we're looking at a report right now within power bi but let's say we want to make a dashboard well how would you do that well you go ahead and you just go and hit the pin we'll say my dashboard we'll go ahead and pin that and uh i mean if we wanted to create a mobile layout we could do that okay so there it is for our mobile layout and we
can go back to our web layout. From there, now that we have our dashboard, we can share it with our team or post it in chat, whatever it is; this is the point where you hit the Power BI Pro wall and have to upgrade. But it's as simple as that, so there you go. [Music] So we saw Azure Data Studio as a means to connect to our database, but there's another tool, SQL Server Management Studio (SSMS), and I figured we should give that a go. I'll go ahead and download it; it's 635 megabytes, so you have to decide whether you want to download that. Once it's downloaded, we'll install it and give it a look. All right, so after a very long wait, SSMS is done downloading, and even in here it says that it installs Azure Data Studio alongside it, so they really, really want you to use Azure Data Studio, which we've already used. But it doesn't hurt to open up SSMS and give it a go if you can use it, so I'm going to go ahead and install it. Sounds great, it's going... there we go, I couldn't tell if I'd hit install or not. All right, after waiting a little bit of time it finally installed, so we'll close that and now actually open the program. I'm not sure where it installs to, so I'll see you back here in a moment. All right, so I'm back
and I found it: it's under C:\Program Files (x86)\Microsoft SQL Server Management Studio 18\Common7\IDE. Normally you should be able to type ssms into your Start menu and find it; for whatever reason, on my computer nothing ever shows up in the Start menu and I can't be bothered to fix it, but it's good to know where it actually installs to. Now that we know where it is, I double-clicked it and it's open, and here we can see a bunch of stuff. This will be a way for us to load data, so what we'll need is our server name, as per usual. I'll go back to Azure, grab the server name, come back and paste that in, choose SQL Server authentication, type in azureuser and Testing123 with a capital T, and connect. Honestly, I like this a lot better than Azure Data Studio; native apps just feel like they work better, you know? So we can go into our database here and look
at some tables. Give it a moment to load, and we can go into, say, Customer, or maybe Product. There should be a way to visualize it... like that: we can open Design mode, give it a second, and we can see what columns there are, which is really great. But what I want to do is create a new query, because I want to show you that you can query in here too; you can query it everywhere, right? From our Azure Data Studio, which I still have open from before, we can grab the code we wrote there, paste it in, and execute it, and we get the results. So this is a very, very powerful tool; I could spend all day showing you how to use it, but this is all you need to know. So yeah, there you go.
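If you don't have the earlier query handy, this is the kind of thing we ran; a minimal T-SQL sketch assuming the AdventureWorksLT-style sample schema (SalesLT.Product and SalesLT.ProductCategory) that these follow-alongs use:

```sql
-- list the priciest products with their category names
-- (assumes the SalesLT sample schema from this follow-along)
SELECT TOP 20
    p.Name      AS ProductName,
    pc.Name     AS Category,
    p.ListPrice
FROM SalesLT.Product AS p
JOIN SalesLT.ProductCategory AS pc
    ON p.ProductCategoryID = pc.ProductCategoryID
ORDER BY p.ListPrice DESC;
```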
So that is SSMS; I'll just go ahead and close that, and there we go. [Music] Hey, it's Andrew Brown from Exam Pro, and what we're going to look at here is storage accounts, specifically blob storage, tables, and maybe Azure Files. What I want you to do is go all the way to the top here; by the way, I'm just carrying this over from my SQL tutorial, something to do while I'm waiting for that install. We'll type in "storage account", so we'll go here, and oh yeah,
just hit cancel there, that's for my other tutorial. We'll hit Add and create a storage account. For the resource group I'll say dp900storage, okay, and then we name the storage account; I'll try "mystorageaccount"... it needs numbers on the end, because somebody really, really wants to have the same name as me. These names have to be uniquely identifiable across all accounts, which gets annoying, and they can only contain lowercase letters and numbers and must be between 3 and 24 characters, so mine is now too long; okay, there we go. Between Standard and Premium: if you choose Premium you get a few different options, like block blobs, file shares, or page blobs, so we're going to stick with Standard to save some money. We can change the redundancy; we'll stick with locally-redundant storage (LRS) because that's the most cost-effective. Go over to Advanced to see what else we want: here we have the option to turn on the hierarchical namespace, which is for Data Lake Storage Gen2, and we're not doing that at this point in time. So I'll go ahead and hit Review and Create, scroll down, and say Create; we'll just wait for it to create the storage account, and I'll see you back here in a moment. All right, so we waited a little bit of time and our storage account is ready. I'll go in, and on the left-hand side we can see we have some
containers. If we want to make a file share, it's really easy: just hit the File Share button and name it, say "myazurefileshare". We can pick which tier we want; Transaction Optimized seems okay to me. We have to set a quota, so we'll say, I don't know, one gigabyte, since we're not really going to do anything real here, and hit Create. Now we have our nice file share, and if we click into it we can upload files. I need to go grab some kind of file off the internet, so I'll search for Star Trek, anything appropriate, images... and yeah, here's a photo. Go ahead and Save Image As into our Downloads, and we'll just call it kirk and spock. That's downloaded, so we'll make our way back, hit Upload, go to our Downloads again, pick kirk and spock, say Open, and Upload, and there's the file. Not super complicated, so that's that. Now let's go back
to our storage account, because I want to show you Storage Explorer. You can go ahead and download Storage Explorer if you haven't: you go here and hit Download Now. I think I already have it installed, so all I'll do is click Open; it takes like two minutes to install otherwise. Yep, I do have it. This is just an easier way to work with files if you don't want to have to open up the portal. We'll give it a moment to open, and here we have our storage account; we can drop it down and see what's under there (I have a bunch of disks in here that I need to go delete). If we go to File Shares, we have our file share; you can see it's just an alternative interface where we can upload and do things, and if I wanted to download that file, I could. Okay, but let's go take a look at blob storage. Go over to Containers and we'll make ourselves a new container; we'll say "myblobstorage", and yeah, it can be private, that's fine. Hit Create, click into our blob storage, and we need to upload a file, so again we'll upload the same kirk and spock file, and there it is. If we go over to Storage Explorer under Blob Containers, you can see we have the data there too, which is really nice. So there's that.
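As an aside, you can do that same upload without the portal or Storage Explorer at all; a minimal Azure CLI sketch, with the account name left as a placeholder since yours will differ:

```bash
# upload the image into the container we just created
# (container and file names are the ones from this follow-along;
#  --auth-mode login uses your signed-in Azure identity)
az storage blob upload \
  --account-name <your-storage-account> \
  --container-name myblobstorage \
  --name "kirk and spock.jpg" \
  --file "./Downloads/kirk and spock.jpg" \
  --auth-mode login
```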
We should probably take a look at tables next. Go back to my storage account, left-hand side, and we'll make ourselves a new table; we'll say "myazuretable", okay. We click into it... actually, we can't, which is kind of a pain; you'd have to use the API. This is where Storage Explorer comes in handy: on the left-hand side we'll look for Tables, select myazuretable, and say Add. We have to set a partition key and a row key, so our partition key could be something like worf, and the row key could be lieutenant commander. Insert. If we wanted to edit this, we could add more properties, such as planet: kronos, whatever you want; you've got all sorts of types here. Pretty darn simple, as you can see, not super complicated. But maybe we should learn how to use Azure tables via the CLI, so I'll be back here in a second to show you how to do that. All right, so I have the commands we can run to do that with the CLI. What I want you to do is click up in the top right corner; that's the Azure Cloud Shell, and I want you to start in Bash. If it asks you to create a storage account, go ahead and do that, because the shell needs a volume. The one under here is from my developer environment, so just hit yes and wait for this to load; sometimes it takes a little bit of time, and if it's giving
you a lot of trouble, you can hit Restart; it doesn't usually give me that much trouble, but there we go. I'm so used to waiting around in Azure, it's crazy. So, we can type az, which is the CLI command for Azure; if we hit Enter it spits out some information telling us all the things we can do. What we want is az storage entity insert, with -t for the name of our table, which is myazuretable, and we'll also need the account name (we don't need the container name). The account name is up here, so I'll paste that in, and now we can insert: -e is the entity we're inserting, so we have the PartitionKey, which we'll say is beverly, the RowKey as commander, and then we can say planet=earth. If we hit Enter... give it a second... it doesn't know what the account name is. Let me see if I spelt that wrong; maybe it just doesn't like where it sits in the command, so I'll copy it out of there (I was trying to keep it a bit cleaner by putting it there) and put it on the end instead. Still doesn't like it. If it doesn't like that, we could try giving it the container name; it might just pick up the account name from the account we're in. I had this working like a second ago and it just decides not to work. Okay, that's fine; I already have a working version here on the side, so I'm just going to change some of the values. I know this one works 100 percent: az storage entity insert, with -t for the table, so maybe my problem was just that I didn't include -t. I'll paste that in; I don't think the container flag is used anymore, but we'll hit Enter... it doesn't like the container name, so we'll erase that, and there: it has inserted the data. So there we go; I'll just say, Azure is a bit painful sometimes, even when you have perfect instructions it still doesn't work properly.
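For reference, the shape of the command that finally worked looks like this; a sketch with the account name as a placeholder:

```bash
# insert an entity into the table; PartitionKey and RowKey are required,
# and any extra key=value pairs (like planet) become properties
az storage entity insert \
  -t myazuretable \
  --account-name <your-storage-account> \
  -e PartitionKey=beverly RowKey=commander planet=earth
```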
We'll go ahead and hit refresh, and there is the record in our table. So there you go. I might just leave these accounts open, because we might want to use them in another tutorial, but of course we'll clean them up at the end of the course. [Music] Hey, it's Andrew Brown from Exam Pro, and in this follow-along we're going to take a look at Cosmos DB. What I want you to do is go to the top here, type in cosmos db, and we'll add a new Cosmos DB account. Notice that we have some options: Core SQL, MongoDB, Cassandra, Gremlin, and Azure Table. I'm going to be pretty crazy here and make one of each, starting with a new Core SQL one. For the resource group I'll say dp900cosmosdb, and I'm going to call this account cosmosdbcoresql; actually, to be safe you can prefix it with dp900, since I don't know if this name conflicts with other ones. We have Provisioned Throughput and Serverless; I'm going to choose Serverless because I don't need provisioned throughput, and while there is a free tier here, I'll go with Serverless just so we don't have to worry about it. We've got a lot of options like global distribution and so on; I don't care about any of that stuff today, so I'll hit Review and Create. We're going to repeat this process again and again, so I'll go back to Cosmos DB, create a new account, and this time make a MongoDB one. We'll
drop this down and choose the resource group; I think I called it dp900cosmosdb. This account would be dp900cosmosdbmongodb; if the name is not available, you just have to keep changing it until you get one that is. I'm going to make this one Serverless as well, and we'll hit Review and Create; I just want to show you the variants of these. This is a little bit annoying, but we'll add another one: this one will be Azure Table, so drop it down, choose dp900cosmosdb, name it something like dp900cosmosdbazuretable, Serverless, Review and Create, and create that. And we'll go one more time for a graph database; oops, it doesn't want us to go back. We'll say Gremlin, choose dp900cosmosdb, name it dp900cosmosdbgremlin, Serverless, Review and Create. Then we'll go back here, and I'm just going to wait until all four of our Cosmos DB accounts are done, so I'll see you back then. All right, so after
waiting a little while, my Gremlin account actually didn't create, so I had to make that one twice, but here they are; you can see this one is still creating, but we can do some things while we're waiting. First, let's check out the Core SQL one. On the right-hand side you'll see there's this Quick Start, which is always a great way to get started; from here we could start inserting data into Azure. I'll go over to Node.js and "Create items container", since that's what I'm most familiar with. A lot of people go over to the Data Explorer instead, but I've never used the quick starts before, so I'm curious to see their quality. Let's go ahead and create that items container; I think it's just waiting to create, and if I go over to the Data Explorer tab it's probably still going through that process. It actually created us a to-do list container, which is nice, and it set up a partition key. Looks like it's done, so now we can download that sample application; it says npm install, npm start, which is kind of cool. I have it over here, so I just have to unzip it; I'm choosing WinRAR, that's what I've got installed, and we'll drag that on over here. Now, you may or may not be able to easily open this, depending on how your computer
is set up; if you can't, just follow along here. I'm opening up Visual Studio Code (it's off screen, you just can't see it), and I just want to close some things. I'll go File, Open Folder, and navigate all the way to my Downloads: Users, andrew, Downloads, sql, and say OK. Then we'll open a new terminal; the instructions said we should be able to do npm install and npm start, so we'll give that a go with npm install. I might as well take a look at the code while that runs, since I'm curious what's going on: it's using the Azure Cosmos DB client, it has a config file over here which contains our configuration information, and it looks like it has some initial items that it's going to insert, so there's some initial data, which looks kind of cool. It sets up a partition key, and we have an endpoint. What it does: create the database if it does not exist, read the database definition, create the container if it does not exist, read the container, scale the container, create a family item, query the container; so there are a lot of examples in there. If you're coding and you want to go through this, it's a great way to get started. It looks like it finished installing, so we'll do an npm start. I don't know if I'll be able to actually view this, because
it's running on the machine... oh, it just runs and exits. I thought it was going to output something like a localhost:3000 you could view, but it actually just ran the samples: create the to-do list, reading, created, and so on, so all of it's there. If you're familiar with that, you can take a look, but what we'll do now is make our way over to the Data Explorer. We can also go to cosmos.azure.com, which is the same thing as the Data Explorer, just with a little more room and a bit easier to work with. I'll click Sign In, pick Subscription One, and we're in the SQL account; drop it down, and now we actually have some data to look at. Check out Items here, and it should show us some... hmm. Going back, it says first name and so on; you can see that it created the to-do list, read the database, created the items, read them... oh, it completed with an error: "entity with the specified id does not exist in the system". So it was expecting something to already exist, and that's why it couldn't finish; maybe we'll just go ahead and create what it wants. We'll create a new item, and the partition key is literally called partitionKey, which is a very uncreative name, but when you create these records you always have to supply the id and the partition key. So I'll go here and type in the partition key, and I guess it can be whatever we want; I'm just looking
at the record in the code to see what they wrote. We scroll up here looking for create item... this is more like reading an item; they're creating with a partition key of country, USA. So I guess we just say usa here, and name: andrew, and see if it'll let us create that. We'll copy the bit with the id, so we'll just say 1, and Save. Okay, and it added some additional data: underscore ts is a timestamp. Now we can go and look at that data... oh, I guess we're already exploring it right there. So, say we wanted to query that data; we should create a few other records first, because one row isn't a lot of information. I'll just copy this and create a new item; I'm technically not in the USA, I'm in Canada, so we'll put in vaco here, copy that, and Save. We'll create a new item,
though these should really have different ids, but whatever; we'll say brazil, put in roger, and go ahead and Save. And now let's say we wanted to query that data. These icons are very cryptic... stored procedure, whoops... ah, New SQL Query, that's what I want. If we hit Run, it executes the query and returns us all the data; but say we wanted to select only a subset, so we add WHERE name equals andrew. Give that a go and see if it works... oops, nothing. What if we do c.name? There we go. So it's as simple as writing queries, not unlike any other kind of SQL; pretty easy.
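To recap what we just typed: the Core SQL API queries alias the container (conventionally c), and every field has to be qualified with that alias, which was the stumble above:

```sql
-- return everything in the container
SELECT * FROM c

-- filter on a property; fields must be qualified with the container alias,
-- so WHERE name = 'andrew' fails but c.name works
SELECT * FROM c WHERE c.name = 'andrew'
```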
So that was the Core SQL API; let's close that off and take a look at MongoDB next. All right, let's take a look at MongoDB: click into the account and make our way to its Quick Start. We have some options; there's Node.js, so if you write JavaScript you can use the connection URL, that's one way of doing it, but I think the MongoDB shell might be a fun way to do it. I don't think I have it installed, but it comes as part of the server installation for MongoDB, so I'll go download it. "If you would like to download the shell separately from the server"... yeah, that sounds better to me, I don't think I need everything; select the zip download. Okay, I guess we'll click on the Community Edition here; on Windows we can do the zip or the MSI. You know, I'll just install the full thing, it's not a big deal, maybe I'll use it for something else. It's only 200 megabytes, so we'll just wait for that to download, shouldn't take too long... there we go. I've got it in my Downloads, so we'll open it up, hit Next, I Agree, Next; I guess we could customize it if we only wanted the client, but I'm going to take everything. Next, that seems fine to me, Next, Install. If you're on a Mac or Linux this is just going to be a different process, but yeah, here we go. All right, so after waiting a very long time on that Compass step, it finally finished installing; I guess it was running a visual installer. It looks like it opened, and we installed Compass along the way, which makes things really easy. Sure, we'll say yes to everything; there's a little too much information, but I guess what we can
do is establish a connection. I was thinking we'd just use the shell, but this looks better, so we'll go grab our connection string from the Azure portal; hold on, I think what we really want is this one here, the connection string, and it's a bit easier to grab it from there. We'll go back to Compass, paste it into the connection screen, and hit Connect, fingers crossed it works first time... and there we are, we're in, cool. It's a completely new account, so we wouldn't have anything in it yet, like a database. So at the same time, let's make our way over to cosmos.azure.com (I guess I closed it), flip over to our MongoDB account, and since we haven't created anything yet, we'll make a new database: I'll say my mongodb for the database and my collection for the collection; you can think of a collection as like a table. For the shard key, whatever you want: my shared key. Again, I'm just showing you how to insert stuff, and we don't need this sharded, so we'll say unsharded, which makes it a bit easier on us. I just want to see if that actually reflects over in MongoDB Compass, because it probably should; we'll give it a moment to create the database, and there's a reload button here, so let's hit reload until the data loads. Yeah, looks like we have it now, and I guess we have to click on this one again to reconnect. Great, now we have a database, so we can go in, click on my collection, and add data; let's see if it lets us execute a command... oops. I'm just trying to see whether it allows us to do that in any way; again, I'm not that familiar with this tool, but I guess we can create a document through
here. So let's go to Documents, add a new document, say _id: 1, and I guess we should put our shard key in, so my shared key: blue, plus name: andrew, and save that document. Go back, refresh, and we can see our document; we can edit it in here as well, which is a lot nicer. We can also insert via Add Data, Document, though it's a little more cryptic; I think I prefer doing it the other way, but again, I'm not that familiar with this tool. What I really wanted to show you was the command shell, so... oh, there's a shell built right in at the bottom, that's nice. I thought we were going to have to paste the connection string in somewhere, so maybe we didn't have to install all of that after all, but hey, this looks pretty good to me. What I have pulled up here is the MongoDB shell API reference I had open a second ago, because I wanted to show you the API. So I found the MongoDB shell methods reference; if you go to mongodb.com and find the manual's reference methods page, you can see all the kinds of things we can do. Let's see if we can insert something; we only need to insert a single document, and here you can see that _id is the main field. What we'll
do is scroll down, copy the insert command from the docs, and see if we can get it to work. We'll go back over to the shell, paste it, hit Enter, and go back to our documents to see if it's there; there it is. We'll go check in our MongoDB Compass too and give it a refresh... well, it shows two records, which isn't quite what I want, so we'll try this again. I started typing and I already have an error; we want the shard key, so "my shared key" in double quotations, we'll say green, and put in name: baco, and Insert. That record is there; we'll give this a refresh... oops... there's the record as well. It didn't insert the other one I wanted, so back in the shell we'll try db.products... it's probably because if you don't include the shard key, it's just not going to insert it; I thought it would error
out or at least say something; that's what I was looking for. So I'll say _id: 5, my shared key: red, and name: roger, and give that a go; if you don't type it exactly right, it will not work, so we'll say insertOne... there we go. Back to our documents, give it a refresh... I'm having no luck doing inserts today, but you get the general idea. We're not here to teach you a full MongoDB tutorial; I just wanted to show you that you can use the shell to insert data, that this Compass tool exists, and what a document looks like. So that pretty much covers MongoDB.
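Condensed into something you could paste into the shell yourself, the inserts boil down to this; a minimal mongosh sketch, with the database and collection names assumed from this follow-along:

```javascript
// connect Compass (or mongosh) with the Cosmos DB connection string first,
// then target the database and collection from this follow-along
const mydb = db.getSiblingDB("my mongodb");

// the collection was created with a shard key of "my shared key",
// so every document must include that field or Cosmos DB rejects the write
mydb.getCollection("my collection").insertOne({
  _id: "5",
  "my shared key": "red",
  name: "roger",
});

// read it back to confirm the write
mydb.getCollection("my collection").find({ name: "roger" });
```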
Now let's go back and take a look at Azure Tables. All right, Cosmos DB Table API: click into the account and make our way over to the Data Explorer. We'll say New Table, and I'll just call it mynewtable; okay, and give it a moment to create this new table. As you can see, we didn't have to make a database or anything, it was just a lot more straightforward, but it does seem to take a long time to make a table; to be fair, it is serverless, so it could be provisioning something before it does anything. We'll go here, look at Entities, and add ourselves a new entity up at the top: we'll say worf, lieutenant commander, planet kronos; Add Entity, and there it is. Now go over to the Query Builder; if we want to filter, we can filter on the partition key, but I guess we need another record for this to make sense. So we'll add data, commander... okay, he's not from Earth, but we'll just put earth and add that entity; and we'll add another one while we're at it: crusher, commander, earth. Then we can filter it out: we'll say partition key equals crusher; yep, that's right, so run the query, and it gets that exact record. Pretty simple.
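Under the hood, that query builder is just composing an OData filter string; written by hand against the entities we just added, it would look like this:

```
PartitionKey eq 'crusher' and RowKey eq 'commander'
```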
Not too complicated, and so now we'll move on to Gremlin. All right, let's take a look at Cosmos DB Gremlin: click into the account, go to Quick Start, then the guided Gremlin tour, and we'll create ourselves a sales graph collection, so we get a little more information than we normally would have. Give it a moment... there we go, it's ready. It created the sample, which comes as a downloadable package containing a .NET console app and an ASP.NET web app; the console app needs to be executed first to upload the sample data, and the ASP.NET web app will let you visualize your data. So I guess we have to go do that: we'll download it, and there it is. I'm just going to unzip it, double-click it (it doesn't look like much), drag it out into my folder, and click into it. I want to know how to use this, so I guess we'll open
this up in VS Code and give the readme a read: the sample application shows how to interact with the Cosmos DB Gremlin API, it wants Visual Studio, and the sample has two projects; uploading the sample data is done via the console application project included in the quick start, so run the console app. All right, I'm going to double-click the solution, and luckily I have Visual Studio installed; if you don't, you don't have to follow this part, we just want to see a nice-looking Gremlin graph in our Cosmos DB. I'll sign in with my example account, and I guess it really wants me to choose some settings, but I'm fine; let's just start it. "The C# project is targeting a version of .NET which is not installed on this machine; to proceed, select an option below to change the target"... sure, change it, as long as it runs. I didn't feel like I was going to be doing .NET today, but we'll go ahead and open this up; all that really matters is that we can run it. Projects loaded, ready to use. It says over here to verify the settings if you downloaded it, and that running the console app uploads the data; I just want to run the application, F5. So I think it's time to hit F5 on my keyboard; I have a Mac keyboard, so I'm not sure that's going to work, but I'll go to the top here... oops, I might need to install extra components, so we'll go ahead
and install whatever components it wants; just click that over in the right-hand corner. It's probably because it needs a very particular version of .NET; yeah, it needs the .NET desktop development workload. Even though I'm not a .NET developer, I still know my way around IDEs pretty darn well, so we'll give that an install; it looks like it's 848 megabytes, so I'll see you back here in a moment. All right, after waiting a little while it's installed and restarted, and I think I can actually run the application now. We'll go to the top and run... it should be something like a project run; I guess we just hit the Start button, that works too (you .NET fans watching can make as much fun of me as you want). We just want this to run so that it loads the data; that's all this thing does. It's building the project, and it's complaining about virus and threat protection, not really worried about that. And it says it succeeded: verified the database exists, your collection, and so on; uploading the graph now, excellent. We're just going to wait for it to upload those nodes, and then we'll have a really, really good example; I'm not sure how many nodes there are, could be a lot... there we go, oh, now it's starting on the edges, great. That was 469 nodes; I wonder if there will be as many edges, or fewer; we'll see... there we go. Now it says the graph is uploaded and you can run the web application; okay, sounds great.
So what we'll do is close Visual Studio, since I don't need it open anymore, and open up the Data Explorer; actually, I kind of prefer cosmos.azure.com, so we'll type in cosmos db again, switch over to our Gremlin account, and go into the persons graph database. You should be able to see a visualization... well, I guess we can just hit Execute on the search query; hmm, maybe insert it in this one and execute it over here. There we go, now we have some records, so cool, we got that working. So this is that Gremlin language up here; I'm not that familiar with it, so you know what, I'm going to be right back after I learn it in two seconds. All right, I'm back and I learned a little: I went to the Apache TinkerPop website, which covers Gremlin in particular. g.V() is how we get all records (vertices) back, and if we pass in a
particular id, it should give us that particular vertex back. I also figured out that as we move around the visualization, if you click on a node it expands the next area; it's a bit slow, but it works. We have an id here, so let's give it a go: g.V('1'), with quotations around the id... and yeah, it grabbed that particular vertex. It looks like we can grab different values off it too, so I'm assuming these are properties on it; maybe we'll do .values() with a label... there we go, service plan, so it's not that hard. Go back, execute the query again, and then there are steps like outE(), which I assume is how you traverse to related things; I don't know if that's the definitive one, but we'll put it in and see what happens... and we got nothing, because the relationships go through edges, so maybe it's based on the edge label; let's try this... execute... yeah, I'm not sure exactly what it is, but my point is that we populated ourselves a graph, and you can tell that this is how you use the language. If you really want to learn more, you can go through the TinkerPop tutorial; for our purposes, we just wanted to show that we could populate a graph and play around with it a bit.
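To pin down the steps we were fumbling with, here are the basic traversals in one place; a small Gremlin sketch where the id and property name come from Azure's sample sales graph, so treat them as assumptions:

```gremlin
// all vertices in the graph
g.V()

// one vertex by id
g.V('1')

// property values on that vertex (e.g. its name)
g.V('1').values('name')

// traverse: outE() gives the outgoing edges, out() the neighbouring vertices
g.V('1').outE()
g.V('1').out()
```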
Okay, so that is Gremlin, and that's all of them, so we're all done here. Again, I'm going to keep these accounts around in case I want to pull them into other services, and at the end we will destroy all these resources. [Music] Hey, this is Andrew Brown from Exam Pro; we're going to learn how to use Data Factory to run a transformation job between our SQL database and our blob storage, since we still have that setup from previous tutorials. I'll make my way over by typing data factory at the top; you can see I have one there that's currently deleting, because I wasn't happy with it. We'll create ourselves a new one: I'll make a new resource group, dp900datafactory2, and then name the factory dp900datafactory; it doesn't like it, so I'm going to put a 2 on the end, and we'll go to Next. I do not want to use Git at all; that's the reason I restarted this, it was complaining about that. For networking it's going to have a public endpoint, and we have the option to encrypt our data, which we don't care about today. I'll hit Review and Create, create our data factory, and give it a moment; it doesn't take too long to set one up. There it is, so we'll go to the resource, and the way we're
going to actually access it is the Author & Monitor button (I don't know why they put it where it is); it takes a second to open, and here we are. We want to create ourselves a new pipeline, so to make it easier on ourselves, go to the left-hand side, hit the plus, create a pipeline, and we'll name it "sql to blob" (it's a bit glitchy there). It auto-saves, so next go to the left-hand side, to Linked Services, and click the plus. We need SQL; we have an Azure SQL database, so we'll pick that and say Continue. We'll use a connection string: select our subscription, our server name, our database name, authentication type SQL authentication, azureuser, Testing123 with a capital T. You can have Always Encrypted on there; I'll just go ahead and create it. You can hit Test Connection if you're concerned it isn't working; I think we can click into it and hit Test Connection, and there's the nice little green check, so that's all good. We want to put the output in our blob storage, so I'll add another linked service: type in blob, create that, hit Continue; I don't care about the name, but we'll use a connection string to select it as well, so go down here, Subscription One, my storage account, and Test Connection; it's all good, hit Create. Cool. Now we'll go back to our pipeline authoring:
we need a dataset. Can we get one from SQL? We sure can: click on that, hit Continue, select our linked service, then select the table we want; let's say we want to copy over Product, and hit OK. All right, so now we have our source dataset, and we'll move on to the next step: setting up the transformation. We'll drag out a Copy Data activity and choose our source, the SQL table dataset, and we're going to want to put that into a CSV. Oops, I did not create the sink dataset yet, so we'll create a new one: go to Blob Storage, hit Continue, and choose CSV format (you could do Excel, JSON, all fun things). We choose our linked service, which is the blob storage one, and this has to reference the name of our blob container; I already forgot what it's called, so we'll go up here, make our way over to Storage Accounts on the left-hand side, go into the storage account, and find that container name; there it is. Go all the way back, put it in, set the folder to transform and the file to products.csv. First row as header, sure, but we won't import a schema; you could have one in there saying "this is what the schema should look like", with the headings, so it knows what it needs to map, but this is going to be a one-to-one mapping, something really simple. This all seems fine to me; notice we don't have a
schema; we could import one, but there's nothing to import. Go back to our Copy Data activity, choose the delimited file as the sink, and take a look at Mappings; Import Schemas pulls the schema from the table, and here are the column names on the source side. As you can see, there are no column names on the CSV side, which is totally fine. Now let's hit Debug; it's queued up, and next we'll hit Validate: it says it's been validated with no issues so far, so we'll hit Publish, yep, go ahead and publish it. It has now actually run the pipeline, so what I want you to do is go over to your storage account, give it a refresh, and go into my blob storage: now we have a folder called transform with a CSV called products. We'll download that and open it up in Excel, and there you go: we just did a transformation. It's not the most beautiful transformation, but we got our SQL table out as a CSV.
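One thing worth knowing: even though we clicked this together, Data Factory stores the pipeline as JSON behind the scenes. A rough sketch of what a copy activity like ours looks like; the dataset names here are illustrative placeholders, not the exact ones the UI generated:

```json
{
  "name": "sql to blob",
  "activities": [
    {
      "name": "Copy products to CSV",
      "type": "Copy",
      "inputs":  [ { "referenceName": "AzureSqlTable1",  "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "DelimitedText1", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": { "type": "AzureSqlSource" },
        "sink":   { "type": "DelimitedTextSink" }
      }
    }
  ]
}
```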
It's a very powerful tool, and we didn't have to write any code whatsoever, which is really nice. So there you go. Hey, it's Andrew Brown from Exam Pro, and in this follow-along we're going to take a look at Databricks and Spark. What I want you to do is go to the top, type in azure databricks, hit Add, and we'll create an Azure Databricks service: hit Create, resource group dp900azuredatabricks, and for the workspace we'll also choose dp900azuredatabricks; very uncreative of me, but it works out fine. We have some options, Standard, Premium, and Trial; I'm going to stick with Standard, I do not want this to cost much whatsoever. Go ahead and create that environment, and I'll see you back here in a moment when it's deployed. All right, so our Databricks environment is ready; if you click Launch Workspace it does take a little bit of time to launch, so I'm not pressing it yet, but we'll have our environment here, and there's stuff you can do in it, like create clusters and
run jobs and notebooks. But just so you don't spend any money, I'm going to show you another way to run this service that totally won't cost you anything: go to Databricks and search for Community Edition. If you sign up down below (I already have a login), it's pretty much the exact same thing, but there's no chance of unexpected spend, so I prefer to do it this way; it even shuts the cluster off after a certain amount of time. We'll click on the Explore Quick Start Tutorial, and I think it's Shift+Enter if you want to run the commands; the little keyboard icon in the top right corner tells you what you can press, and "run command and move to next cell" sounds like what we want, so I'm going to hit Shift+Enter. It says in the sidebar to create a cluster and so on, but if we just keep going it should automatically launch one without prompting us to create clusters, so we'll just hit Attach and Run,
and that's going to start up a cluster. As it's running, we can take a look at the notebook: you can see that we're running SQL commands, and it looks like we're loading a CSV and then treating the CSV with SQL, which is really nice. That's the thing: you can have data that's not necessarily in a database, but you're still able to run SQL queries over CSV files and the like. Down below you can see it's using Python, with Spark, to load and format the file, and you can even visualize information in here: it automatically lets us plot the data and change some options, so we can make a histogram, a quantile plot, a map (useless here), area charts, bars. So it's a very, very useful tool. While we wait for that cluster to spin up, which takes a little time, you can monitor it on the left-hand side under Clusters; with the Community Edition you can leave it alone and totally not be worried about it.
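For a feel of what those Python cells are doing, here's a minimal PySpark sketch in the same spirit; the dataset path is the one the quickstart tutorial typically uses, but treat the exact path and options as assumptions:

```python
# in a Databricks notebook, `spark` (a SparkSession) is predefined

# load a CSV from storage into a DataFrame
df = (spark.read
      .option("header", "true")       # first row holds column names
      .option("inferSchema", "true")  # guess column types from the data
      .csv("/databricks-datasets/samples/population-vs-price/data_geo.csv"))

# register it as a temp view so we can query the file with plain SQL
df.createOrReplaceTempView("data_geo")
spark.sql("SELECT * FROM data_geo LIMIT 10").show()
```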
You don't have to worry about it spinning up and costing you money. So I'll go here and give it a refresh, and I don't see the cluster; I think it's just because I was terminating an old one, and it's waiting until that's done before it can start a new one, so it might not spin up. Oh, there we go, it's terminating, great. Is it going now? You can never tell these things on the free edition. If I just go back, we can look at the instructions in the notebook: in the sidebar, create a cluster, pick the Databricks runtime version from the drop-down, an LTS release (which kind of gives you an idea of how old their tutorial is, but that's what it wants, and we should match it exactly, Spark and Scala versions included). So is it going to create the cluster now? Nope, we'll cancel that and just follow it manually: I go to Clusters, Create Cluster, pick the LTS runtime it asked for, and we will just name our cluster
quickstart, right after the name it asked for. We'll go ahead and create it and, I guess, wait till it starts up; it could take a few minutes, so I'll see you back here in a moment when it's started. All right, we had to wait a little bit, but now it looks like our cluster is running, so go back to that notebook we had (we'll close the other one, since that's the one we don't want to use) and Shift+Enter through the cells. "Would you like to launch a new cluster?" No, I want to use the existing one; maybe I can drop down and choose the cluster... oh, here we go, cancel, hold on... quickstart, confirm, there we go. We'll see if it executes the command now. It's a bit slow, but again, this is the Community Edition, and also imagine you're dealing with massive amounts of data; it's not really surprising that it's not super fast, you have to think of it at scale. And it completed the job, so that was pretty good; we can run this line again for fun. So you get the idea; that's all I really wanted to show you here, just a bit of exposure to Databricks. You'd have to learn all about Apache Spark to really understand what's behind it. Again, you don't have to shut it off, it
doesn't matter, but we'll just terminate it and be good about that; also, we'll delete this at the end. For all these DP-900 follow-alongs we'll definitely be sure to delete everything, but for this one in particular I just want to get rid of it so I don't forget about it. So yeah, there you go, that is Databricks. Hey, this is Andrew Brown from Exam Pro; we're going to be looking at Azure Synapse Analytics today in this follow-along. Go to the top here, type in synapse, and make our way over to Synapse Analytics. We'll add ourselves a new workspace and create a new resource group; this is for the DP-900, so we'll say dp900synapseworkspace, and we need to give it a managed resource group too, so I guess the same kind of thing, dp900synapseworkspace; I'm just going to keep naming things the same for as long as I can. For our subscription we'll have to create a new storage account, because we do not have a Gen2 storage account; I wondered if I could just use my other one, but nope, so we'll create one: datalakestoragedp900 (it probably wants it all lowercase). We don't have a file system either, so dp900datalakefilesystem. Notice here it says "assign myself the Storage Blob Data Contributor role on the Data Lake Storage Gen2 account"; if you remember from our lecture content about important roles, this is one of them, and it's being auto-assigned to us. Go over to Security: it has a generated admin user, but I'm going to say azureuser and Testing123 with a capital T, just so it matches our other ones. Enable double encryption sounds good, but it's not something we're doing today. We'll go to Networking, which seems okay to me, then Next, Review and Create. Just notice up here it quotes the Azure Synapse workspace cost at 6.40; that's per terabyte processed, so it's not something we want to have lying around, but we're not going to be working with a lot of
data here, so it shouldn't be a big deal for us. We'll go ahead and create it, and I'll see you back when this is all done. All right, so after waiting a little bit of time, our Synapse Analytics environment is ready. What I want to show you first: go to the workspace, and scrolling down on the left-hand side you have your analytics pools. There are SQL pools, where you can run SQL operations, so you can go there and create those clusters; these get pretty darn expensive (it's not even showing the estimate there), but the point is that if you needed dedicated SQL, you could do that. Remember, there are two engines you can use: the SQL engine or the Apache Spark engine, and you create those pools here, or you can also create them from within the workspace. But let's go up to Overview: the workspace is really the Studio, so we're going to open that up, and once we're in here we can ingest data and do all sorts of things; it makes it very easy. Let's actually go back... you can't exit out of it once you open it, so I'll reopen it, because they do have some test data we can use, which should not cost us much. If we go all the way over to Learn and look at the sample data, we have some options: if you choose this one it'll create a pool, but what I'm going to do is "Query data with SQL", which will give
us some sample data. So we click on this (what you see here is just me trying to run it before), and what you'll see is that we can use Synapse SQL, and we're actually able to query files directly: here it's reading a Parquet file from a storage account hosted by Azure, one of its open datasets. You can see there are ways of querying it, and what we do is choose where to run it: we'll pick the Built-in serverless pool, publish if we want, and hit Run. It's going to pull that data, and it takes a little time to load it in before it runs, but it doesn't usually take too long... and there is the data.
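The sample boils down to serverless SQL reading Parquet straight out of storage with OPENROWSET. A sketch along these lines; the URL points at one of Azure's public open datasets, so treat the exact path as an assumption:

```sql
-- query Parquet files in a public open dataset using the Built-in serverless pool
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=2019/puMonth=*/*.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```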
That gives you an idea of how Azure Synapse works; if you'd created a Spark pool instead, you'd write your usual Apache Spark code here. Hopefully that gives you a good idea of how it all works. I don't like to keep this service running, as it can get very expensive very quickly, so me being paranoid, I'm going to make sure I terminate this one early, right now even. But yeah, that is Azure Synapse Analytics. [Music] All right, so we are at the end of our follow-alongs, and you could tell they all kind of tied in with one another, so now we do some cleanup. From the left-hand side, go to Resource Groups; anything named dp900 you can go ahead and delete. What I'm going to do is click on each one here; it's based on what you named them, so I'll type in things like cosmosdb again; this is the fastest way I find to clean up all my stuff, and I'll just keep going down the list. Mostly, the things we used aren't going to have ongoing costs; the one that definitely will is the SQL server, so that's something we need to make sure gets terminated. Some of these might give us trouble; again, I'm just running down the list, deleting as much as I can, and then I'll check whether there are any errors. I think we already deleted this one, but we'll give it another go, and that should be all of them. If you get a failure, you might have to investigate it: here it says "failed to delete resource, the client does not have permission"; I do have permission, so that's likely running up against a deny assignment. So the idea
is just to wait for this to finish; it could take a little bit of time, and sometimes something is deleted but it just takes a while to vanish from the list. Another thing you can do is go to your resources and make sure nothing is still running; looking here, what's going to cost me money is this SQL database that's running, so I want to make sure that's deleted, say yes. You could also delete it from here, though you should generally let the resource group deletion handle it, because sometimes it gets confused; I just want it all gone. I've got an IP here, a data factory, a storage account; all these things, I want them gone, just say yes. One thing that doesn't show up here is disks; I know I had a bunch of disks, so let's go look... no, there's nothing, okay, good. I thought I might have had a bunch of disks because Storage Explorer was probably just showing me old ones. It takes a while for all this to happen, but I'll come back and show you that I've deleted them all, and that's what you should do too; I'll see you back here for one more wrap-up video. All right, so after waiting quite a while, I just wanted to check that it was all cleaned up: I'll give it a nice refresh, and it looks like my resource groups are cleaned up. If I make my way over to my resources, it doesn't look like there's much remaining, and just check your notifications to make sure everything was deleted. So that's how you make sure everything is cleaned up in your Azure account, and that's it for our follow-alongs.