
Video Transcript:
Hey there, friends! My name is Jeff Fritz, and welcome to .NET Conf: Focus on AI. We've got a full day of content here for you, all about the great things you can do with artificial intelligence technologies and .NET. Whether it's programming with C# or building web applications, we've got all kinds of sessions today that are going to help you dial in some of those AI features you've been thinking about and wanting to get working in your applications. We've got sessions talking about RAG, sessions talking about large language models, and we're even going to talk about building your own copilot. We've got a great keynote to start things off with our friends Scott Hanselman and Maria Naggaga, and I want to make sure you tune in and check out all the content we have throughout the day. Of course, this event is being presented live on YouTube and Twitch, and the recording is being made available on YouTube. Now, we've got a ton of links available for you to check out.
We've got QR codes that will help you find this content, follow up, and take some home to share with your friends. Let me show you a couple of the links we have on our slides right here. For .NET Conf: Focus on AI, check out the collection at aka.ms/netfocusai-learn, or take a picture of the QR code there; we've got lots of QR codes throughout these slides. We also have a survey: let us know what you think, not just of this event, but of how we can do more with AI and give you more capabilities in your .NET tool set. That's aka.ms/netfocusai-evaluation, or take a picture of that QR code to get to the evaluation and give us a little bit of feedback. We also have a challenge for you, an AI challenge, plus additional live streams through the month of September: go to aka.ms/netfocusai-challenge. The credential challenge runs August 20th to September 20th.
You can take a picture of the QR code to get there. We're going to have all kinds of live streams scheduled throughout the month focusing on more of the AI capabilities we're discussing today, with a bunch of different hosts talking about the various technologies added to .NET to help make you and your application successful, along with a little challenge to scratch that itch, that part of your brain that makes it a little more interesting and fun to do more with AI in your .NET applications. We've got all of the links for this event in a collection, along with the Learn modules that teach more of the capabilities we're talking about today, at aka.ms/netfocusai-collection; once again, that QR code will take you to the same place, so you can get all of the links, everything we're talking about today, in one spot. Now, this is .NET Conf: Focus on AI, but we've got to call out that .NET Conf 2024, featuring the .NET 9 launch, is coming up in November.
We've got three full days for you coming up in November; you can learn more at dotnetconf.net. There's a call for content there: if you're a prospective speaker, there are more than 30 sessions available for community speakers to participate and present as part of this event. That happens during day three. This is a three-day event: on day one and day two, folks from Microsoft will be presenting and talking about the .NET 9 launch, the next version of .NET, but on that third day we go 24 hours around the clock, so everybody in every time zone will be able to find content at your time, presented live, covering cool technologies and techniques to learn about with .NET. So make sure you join us November 12th through the 14th at dotnetconf.net; there's a save-the-date button you can click, and there's even a link for prospective speakers to participate. And finally, of course, we wouldn't be able to present this event without help from our sponsors, so make sure you
check out these folks. They've been promoting the event, and they've even put together a swag bag that you can enter to potentially win prizes from all of these organizations: Growth Acceleration Partners, Mescius, Progress Telerik, ABP, Avalara, Iron Software, CODE Magazine, Octopus Deploy, Syncfusion, and Nosce. The swag bag is worth more than $5,000, and 15 folks are going to win. Fill out your evaluation at aka.ms/netfocusai-evaluation; there are gift cards and software licenses, and inside the evaluation there's a link you can click to enter the swag-bag raffle. All right, let's get started; it's time to get into our content and learn more about .NET and AI. We're going to hand it off to our friends Scott and Maria.

Hey friends, I'm Scott Hanselman. Hi, and I'm Maria Naggaga. And we are going to talk to you about AI today. We're going to talk about .NET, and how AI and .NET are a match made in heaven. We're going to do some really cool stuff. We're going to augment applications,
taking them from modern apps to intelligent apps, right, Maria? Yes, and most importantly, we're going to make sure that you feel like you can do it too. I love that. I want to feel powerful; I want to be able to make my applications even cooler. So here's what we're going to do: I'm going to take a moment and give you a little context about how we're thinking about AI, and when I'm done with that, we'll bring Maria back and get into some real demos. Can't wait. All right, let's go.

I'm going to bring up my screen over here, and I want to call out that in GitHub there are lots of ways to interact with Copilot. I'm using Visual Studio Code; I could be using Visual Studio for Windows; I could be using GitHub on the web. It's all that same Copilot on the back end. Now, we see lots and lots of demos where someone types something and generates a bunch of code, and that's a perfectly fine demo and a great, powerful feature in GitHub Copilot, but
one of the things I think isn't talked about enough is using it as a pair programmer, kind of as a rubber duck. Rubber-duck programming is where you put a rubber duck (this is my Borg Microsoft rubber duck) on your monitor, and if you have questions, you just talk to the duck: hey, I'm stuck on this programming problem. Now, the issue is that the duck doesn't talk back, but GitHub Copilot can. Unlike my duck, which is very quiet, it's an infinitely patient, friendly pair programmer that will brainstorm with me. So I can go in here and click 'start voice chat' down in the corner; I've got some mobility challenges right now in my hands, so accessibility is baked in as well. Via voice chat (I could also type), I'm going to ask: what do you think about this code? Are there any opportunities to make it more secure? And this is really great. I want to pause and back up for a second and call out 'used 1 reference':
it's actually indicating that it knows which part of the code we're talking about. It's not looking at the entire code base; it's referencing a section of it. I can select that code and say, hey Copilot, explain this, fix this, review it; perhaps you want to generate docs or generate tests. But I want to understand what my pair programmer thinks I could do to make improvements, so I ask: what do you think about this code? And it says, yeah, there are a couple of opportunities here: you can parameterize the URL, because I'm not building that URL in a thoughtful way; I'm not, in fact, catching all of my exceptions; and I could be checking things like SSL. Then it offers some revisions, actual thoughtful revisions, for how I could change that code. And if I wanted different information, I could be more specific: I could select a chunk and say 'explain this chunk of code.' I'm providing less context and it's going deeper; I'm only giving it lines 60 through 66. And then, this is cool, it can offer a follow-up question, which was something I was going to ask anyway: what does the await keyword do? I can just click on that, and GitHub Copilot gives me a follow-up. So I'll have big, long conversations brainstorming with GitHub Copilot, and I think of it as a friendly junior engineer that sits there and helps me better understand my code. And it's patient, infinitely patient; it'll talk to me as long as
possible. Now, my application is modern, but it is not intelligent, so I want to bring in Maria from the .NET AI team to share what the .NET team has been doing in this space, and maybe I can make my application, and other applications, a lot more intelligent.

Thank you, Scott. I'm going to share my screen for a bit. I wanted to start with what we on the .NET team are committed to when moving from one tech wave to another. .NET has been there through every single major shift of the past 20 years: we were there with you for mobile, we were there for you for cloud native, we were there for you for cross-platform, and now we are here for you in AI. Our goal is to make it as easy and as simple as possible for you to bring AI into your application. Now, we understand the AI ecosystem is incredibly vast; you've probably seen how vast it's become. Absolutely, there are a ton of players, a ton of code words and three-letter acronyms that I'm still learning to this day. And I'm hoping that during this keynote we can guide people through this journey and show how we are making AI more approachable for you. When we thought about our investments in AI, we wanted to make sure they resonated with where developers were going to learn and how they were building their applications, and that as we went into the AI ecosystem, we were showing up in the places where .NET developers were learning about AI. We wanted to ensure that you could deploy these applications easily, and, most importantly, monitor them in production to
make sure your customers, your developer experiences, and your engineers know what's happening with that application. On the learn side, we set out to build a bunch of samples. This was a great experience for us as a team, because it put us in our developers' shoes: we were learning as they were learning. Building those samples let us identify the core building blocks we needed to enable in .NET so that you could compose AI components, whether first party or third party. On the .NET side, we made investments in core building blocks such as tokenizers and tensors, so that we could build things like vector stores with other teams. We work very closely with the Semantic Kernel team, which sits right here in the OCTO at Microsoft; Semantic Kernel enables developers to easily attach to the AI ecosystem. Within that ecosystem, we wanted to work with places like OpenAI, and at Build we announced the C# OpenAI client. We also worked with vector stores such as Qdrant and Milvus to build a first-class C# experience, and we're currently working with Pinecone on a C# experience as well. We've also been enabling the deploy scenario, meeting developers in exciting new places like .NET Aspire, so you can easily integrate with Azure AI and easily deploy to Azure. And finally, because of Aspire and Azure Monitor, you can now monitor these applications in production. So before I go into the demo, I'd like to bring Scott back. Scott, do you know what RAG is? I hear the term RAG a lot. I know the
R is retrieval, but I'm still learning my TLAs, my three-letter acronyms. Okay, so RAG is a technique that stands for retrieval, augmentation, and generation. Before I show the chatbot, let's go through a tidy scenario. You know me, I like my comics, I like to draw, so I came up with a comic-book story. Scott, imagine you were a customer agent and you were presented with a ticket for a customer complaint. How would you go about figuring it out? It all comes down to context. I want to understand the problem and find them a solution, but I also want to understand: have they talked to us before? Do I know anything about them? I'd want to explore this problem space. And this is where something like RAG comes in. I'm going to explain RAG in three steps, starting with retrieval. If I were a customer support agent, as you mentioned, I'd like the context of what happened before. Working in an office today, you'd probably look over at your boss or somebody else and ask for advice, or you'd have to read through a pile of documentation. In this scenario, the user, who in this case is a customer support agent, can instead ask a bot a question. That question is passed on to a smart retriever, and this is the retrieval section: we take the question and look it up against domain knowledge, which here is the customer support tickets. Based on things like semantic relevance and cosine similarity, the AI retrieves the relevant documents. Now,
that's phase one. Are you with me so far? Yeah, and I think the part that really clicked for me just a second ago is that we're not pretending a large language model knows everything. It doesn't know everything about all knowledge in the world; it's a text-based generator. It knows a lot about some stuff, but it doesn't know my domain, my customers, or their issues. So it's that domain knowledge that clicked for me: you're augmenting the large language model with specific domain knowledge, and I assume that can come from anywhere; you, the developer, get to decide where that domain knowledge comes from. Yes, and in AI you'll hear about things like grounding; this allows us to ground the data in our specific scenario. Now, you jumped ahead, which shows how far along you are in this journey, Scott: you jumped straight to augmentation. Ah, okay, I didn't realize. Which is good; it means every other .NET developer is going to connect the dots too. Excellent. So the next step, augmentation: we have the data. Maybe it knows you've asked about
a water bottle, right? So it's looking for everything relevant to a water bottle. Now it pairs the water bottle results with your original question, and it uses the LLM to further analyze the data. That gives us additional relevancy checks, because some of the data that was pulled could have been about every single water bottle, not the H2O water bottle, so the LLM comes into play yet again to do that for you. Okay, so this is that grounding: I don't want it to talk about other things the customer may have bought, and I don't want to talk about the entire product catalog. We're slowly narrowing the problem space down, given context, given domain knowledge. Yeah, exactly, and that helps us reduce things like hallucinations. Right, I don't love that term, hallucinations, and I love that you said grounded, because these models can fabricate, they can generate, you can see them float away like a balloon; but grounding them in domain knowledge keeps them on task. Yes, you want truth versus hopeful truths. That's great; I love that analogy, thank you. So the last step: we have our question, we have our retrieved data, and we have to generate a response, and the response has to come back in a format that makes sense to the customer support agent. This is where natural language comes back into play, and then it's sent right back. I dig it; I'm picking up what you're putting down. Okay, so next I'm going to switch screens and show you RAG in code.
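Sketched in C#, the three steps she describes might look like this. This is illustrative, not the keynote's actual sample: SearchTicketsAsync is a hypothetical stand-in for a vector-store similarity search, and the chat service is Semantic Kernel's IChatCompletionService abstraction that comes up later in the day.

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical stand-in for a vector-store similarity search over the tickets
// (embeddings + cosine similarity); the real sample's ingestion/search differs.
Task<IReadOnlyList<string>> SearchTicketsAsync(string question, int top) =>
    throw new NotImplementedException("plug in your vector store here");

async Task<string> AnswerAsync(IChatCompletionService chat, string question)
{
    // 1) Retrieval: pull the most relevant domain documents for the question.
    IReadOnlyList<string> tickets = await SearchTicketsAsync(question, top: 3);

    // 2) Augmentation: pair the retrieved domain knowledge with the original
    //    question so the model is grounded in our data, not general knowledge.
    var history = new ChatHistory();
    history.AddSystemMessage("Answer using only the provided support tickets.");
    history.AddUserMessage(
        $"Tickets:\n{string.Join("\n---\n", tickets)}\n\nQuestion: {question}");

    // 3) Generation: a grounded, natural-language response.
    ChatMessageContent reply = await chat.GetChatMessageContentAsync(history);
    return reply.Content ?? string.Empty;
}
```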
Ooh, demos, live demos! Oh yeah, I haven't demoed in a while, so please show me some patience. Absolutely, I'm excited. Okay, so we're in VS Code, and I'm going to show you RAG in some simple code. We're not going to get into the details of ingesting, chunking, and storing the data; there are plenty of presentations throughout the day that will show you how to do that. But I'd like to show you the RAG flow itself. The first part is retrieval, where you're doing a search for results against a set of documents: we're going to look at the ticket, the product ID, and the value, as well as the query. The query in this case is the question, but it could also be something as simple as a summarization request. We then pair the query with the results in the augmentation section, where we start using the LLM to narrow down our context, and finally we generate the responses and get them back to the user. So let's run this application. Scott,
I'm going to show you this with AI and without AI, so you can see the value of adding AI to your application. You ready? Ready, because we're going to see a modern application become an intelligent application. I like it. I don't want to be too fancy; I want to keep it simple, so we're going to stick to the console first. The first thing I'm going to do is run this app. I already ingested the data beforehand, because the ingestion process can take a while and we don't want people to get hung up on that. So, with the data already ingested, I'm inspecting the tickets, and I'm just going to pick one of them: let's look at the filter cartridge. Now, if you look at this, it's a customer-agent interaction, which means this is a pretty good system, but it also means I have to read through all of it, and if you're dyslexic like me, that can take a while. To be clear, though, I want to make sure I understand: what we're looking at is an actual transcript of real humans talking to each other. Exactly, this is not AI; this is a transcript of a chat between a customer and an agent, with a question. And to your point, absorbing this is a lot for anyone. There's a lot here, dyslexic or not; my eyes are blurring as I read. It's like when you get put on hold and they say hang on, and they have to go read and catch up. That's going to take a long time. So how do we help people
through that process? Let's summarize the information so people have context, which will enable greater customer satisfaction. I'm going to go back to my code and do a little trick; as you can see, I'm a first-class typist, commenting and uncommenting live, and we're going to restart this application. Right here I've added AI to do the summary, so what we should see is a summarization of that conversation; we'll run it again. And you're doing this as a developer, so you're seeing the compilation and the startup process take a little bit of time, because you changed code, recompiled it, and are running it again. Exactly. And if you were doing this in a real app: have you ever used AI and seen the dot-dot-dots happening? Yeah. That's the kind of UI element we can give customers so they know something's happening in the background. Absolutely. So we hit inspect, we pick the exact same ticket, and right now we have a summary. To see that is nice; that's a delighter, a customer delighter. I'm thinking of our friends at The StoryGraph, who have a book-review site where a book can have a thousand reviews: give me a little thing at the top that says, hey, here's the summary. You've made it so I can read one sentence and choose whether to read the entire set of paragraphs; that makes scanning a lot easier. Here we have only a few
thousand entries, but you can imagine how effective this would be if people had millions, over years, over different data stores. Yeah, and from a user-interface perspective you might label it 'AI-generated summary,' which gets at that responsible-AI perspective: a customer service agent experiencing this in their application for the first time would say, oh, okay, I see that the part in yellow at the top is an AI-generated summary, and the bottom was real humans talking. That makes it really clear what's going on, which is super cool. Yeah, our whole thing is not to remove the personal touch that people get when interacting with a real human. That makes a lot of sense; I like it. All right, so how can we do an AI demo without chat? You know, I appreciate that, because chatbots are kind of the easy thing to do, and it's refreshing to see little delightful moments of AI taking my application plus-one, plus-plus, as opposed to the chatbot-all-the-things approach.
'No chatbot, only when needed' is my motto, but I'm going to show you a chatbot anyway. That's okay, because the important thing is to show that you can do any of it in .NET, whether it's a chatbot or a delightful improvement in an application. All right, so the last secret sauce is to enable chat, and there we have it: dotnet run one more time, we inspect, we enter, and now we have a summary plus the conversation. I want to begin a chat: okay, so this is about a filtration system, so I could ask a question, 'how many times should the...', and it gives me an answer I can go read and personalize on my own to help the customer through their issue. Oh, interesting. So the chatbot in this context, if I'm understanding correctly, is the agent getting a way to search, using RAG, through all of the different facts and documentation, so they can still make a recommendation to the user. You could build this chatbot for the back end, for the support person, or for the front end, for the customer;
it's up to you. Up to you. And in this case, you caught onto something: when we think about domain knowledge, it's not enough just to know what the prior issues were; you also need the manuals for the products you're referencing, because together they provide the customer solution. And the beauty of AI, especially when it comes to data ingestion, is that the format of the data shouldn't matter: whether it's text, or a PDF, or an image, or anything else, as long as it all comes together to form context, we should be able to provide a meaningful experience for our users. So you caught something there: we were looking at the agent interaction, but then you asked about something specific to the product, and how did you get them both to work together? That's what data ingestion does. I see, so that ingestion step is important. The data might be the product catalog, it might be JSON files, it might be PDFs, it might be Markdown, it might be all of the above, and that's the important step. Exactly, and it goes back to grounded versus ungrounded results. Very cool, I dig it. And will this code be available? This seems like something I'm going to want on my site. Oh, it will be available: go to the dotnet/ai-samples repo (we'll have links at the end of our keynote), and you can use it as soon as you watch this video. Okay, so maybe I'm putting you on the spot here.
I love a good console app, and I want to give you credit for a colorful one; you're using Spectre.Console, it's beautiful, it's multiple colors and stuff like that. But do you have anything that shows me RAG a little more toward a web application, like my web application? Oh Scott, let me show you something. Like on all good cooking shows, we've just done the technical challenge; now we're doing the showstopper. I love it, I love it. So, you're familiar with eShop, correct? Yes. eShop is one of our big samples. Sample applications have historically been kind of simplistic (the Northwind sample is just products), but eShop is a fully formed, fleshed-out, sophisticated sample that shows you how to do container orchestration and microservices, a really big, sophisticated application. It's quite an extensive sample. It is, but I have a new version of it; can you guess what it is? I suspect you've added AI and RAG to make it even more fabulous. I have, but I also wanted to focus on core scenarios, going back to the chatbot conversation.
Back in November, we showed eShop with a chatbot where you could ask what products were available. That's great, but when we did research, we found that customers, developers, everyone in the industry really wanted AI to help with the mundane, tedious tasks, such as summarization, translation, and sentiment analysis; you want to gauge how your consumers are feeling about the experience. So I am going to show you the beautiful version of the eShop application. I love it, the Maria-AI-ified version; let's see it. We're going to start from a customer scenario. Let's say I'm a customer and I'm opening a new support ticket. So I'll start: is your request about a specific product? Yes, it is. Now, how many times have you tried to fill in a customer support request and had to go back to your email, or dig through your Amazon order history, to figure out what the name of the product was? Yeah, I mean, they're not very sophisticated; oftentimes the support-ticket page is just a bunch of text boxes that are not clever text boxes, and I end up typing into them and presume it generates some email on the back end. It's not a smart box. No. So I'm going to show you something; I'm just going to start typing. So it autocompleted: you typed in 'bottle' and it found bottles. Cool. It found bottles, and it only found bottles that Adventure Works sells, and we did this with something called Smart Components.
Smart Components are AI-powered UI elements that we experimented with on the .NET team; Steve Sanderson and the team built this, asking how developers and consumers would actually benefit from AI inside their UI. So this is the experience: the customer didn't have to go back and forth, they didn't have to copy and paste; they're like, oh, I remember buying an AquaFlow 750-milliliter water bottle, and they can just do that. Right, and if I understand correctly, you could be even more vague. That's not just a search autocomplete box: you could say H2O, you could misspell bottle, you could say, you know, the liquid-holder thing, and it would figure out that it was one of those water bottles. It would, and that's the beauty of adding smartness, intelligence, to your application. So I'm going to write something pretty vague, 'my bottle, my bottle!', exclamation mark and all, and I can submit it. And there we go; as you can see, I'm a frequent complainer. Yeah, apparently this is a real problem for you; you need tech support for a
bottle. But yeah, let's go look at what the customer support agent would see. Gotcha. Now let me zoom in a little, because I want you to feel the magic of this. Okay, so we have Alice; Alice, as we know, is a frequent complainer, but what you'll notice is that it gives a sentiment analysis. Oh, see, I love a good emoji. Me too, right? It gives a sentiment analysis saying, hey, Alice has been here before, and this time she is not happy. It also gives a satisfaction rating and shows which tickets haven't been answered, so even someone using this app for the first time knows which ones to address first and which cases are the most important, simply at a glance. This is great. I love that. You know, I picked on you a little when you showed me the chatbot, but what I'm seeing are delightful little improvements that are making apps more pleasant, more helpful, more accessible, easier to explore, easier to absorb. It's the toil; we've talked about this,
like, oh, you're a tech support person, they've come with a complaint, here, read two pages of text; or you're a tech support person with a crushing list of 349 open tickets, find the ones that are the most mad. You've solved those problems with code, with .NET. And this is all .NET: this is Blazor, this is the full stack, there is no magic here, it is just .NET, and this code will be available as well for everyone to use. Steve Sanderson will go into more detail on how he built this; I think you can consider this a Steve Sanderson application. Yeah, the team is doing a great job; Steve is a great communicator, and we're seeing a lot of really cool sample code being created that's very actionable. I could imagine putting RAG on my podcast site, which is written in .NET, and having people ask questions about the last 400 hours of content: hey, I was looking for that episode where Maria was on the show talking about AI, she said something-something-something, can you find that? That would be a cool feature I could
add to my podcast. I would install the Hanselminutes AI podcast bot; we should build that. We should; let's get on that. Okay, so I am not willing to deal with Alice right now, so let's see if Sarah is a little bit happier. Back to this: what you'll notice up top (let me make this a bit bigger, because you haven't seen the magic yet) is, remember the summary? We see the summary up here as well. Nice. We also have the real conversation history between Sarah and the previous agent, so if I'm the new agent who's on shift right now, I have access to this information and a summary of what happened before me. That is cool; that is really cool. Pretty cool. So I can ask questions. Sarah wants a refund rather than a replacement, so why don't I ask the AI whether I can issue one for her. All right, I think we can zoom in a little on that again, go back to the
larger fonts. So the assistant has gone and searched the refund policy, probably through a bunch of manuals and knowledge bases, and it's providing back-end help for the agent, which lets that person get up to speed faster; it makes their job more fun and less stressful. Exactly. But here's what I think is also really important, talking about grounding: we want to be ethical, so we also show you where the information came from. You can see a little reference here; you've probably noticed the same thing when you use Copilot, for example. Yeah, that's a great point. If I'm asking it questions about specific things, I don't want to suddenly start talking to my customer support chatbot about, you know, Elden Ring or whatever; it would say, I'm not a video-game chatbot, I'm a customer-support chatbot. I want it grounded in the situation, which is this knowledge-base article and the things that are important to my job. I don't want to chat about random stuff; I need it kept on topic. Exactly, and what I love about this is, look, it highlights exactly where the answer is from, so you don't need to read the ten pages. Oh wow, okay, hang on, that's great. So it's referenced, just like we saw in GitHub Copilot when it said, here are the lines of code I'm looking at, this is the section that is my context. This is telling you not just the doc it found the answer in; it highlighted the exact line it's referencing. That is grounded.
I dig it; that is very grounded. So you could use this for a couple of things. You could, one, ask AI to respond, but as a person you probably want to add your own flavor. I've always advised people: yes, you can ask your assistant to make sure you're responding with the right information, but you can also just respond yourself. And that is what I have for you today, Scott. This is really cool; I love it. This gives us a real sense of what can be done and how we can take an application and spice it up a little, sprinkle a little AI into it. Very demure, very mindful, very thoughtful. I dig it. Thank you so much, Maria, for spending time with us today; this has been a pretty cool keynote, and I feel good about it. Me too. It's my first time doing a talk in a while, and I'm glad I got to do it with you. Absolutely smashed it; shoulders back, be proud. We've made it. We've got a great day filled with content for you. I feel really good about this keynote, and I know the other talks we're going to learn from today are going to be even better. We're going to bring in experts from all across the .NET organization to teach you how to make AI impactful, responsible, and thoughtful, and to make your applications not just modern but intelligent. Thank you for spending time with us today. Thank you.

All right, thanks so much, Scott and Maria; that was a great keynote. I love seeing all the cool things we can do with .NET and artificial intelligence technologies. I've got some ideas
of how I can update some of my websites to do a little bit more; this is interesting. But you know what, I want to make sure we hear from our next speaker, Stephen Toub, who's going to be talking about getting started incorporating artificial intelligence into our .NET applications. I know he's got some things that are going to help me with some of the web apps I've been building. And after that, our friend Steve Sanderson will be joining us, with Matthew Bolaños, to talk about Better Together: .NET Aspire with Semantic Kernel. I know Semantic Kernel is going to help make some of my applications better, being able to summarize things and put together descriptions for blog posts a little bit better, so I'm interested to hear what they have for us. Let me go back and call out some of our slides and the information that's also going on as part of .NET Conf: Focus on AI. We have a collection of Learn materials available on our slides over here; check it out at
aka.ms/netfocusai-learn. That QR code will help you get there, and you can learn more about all the resources we have available online, so you can take your education a little further and even share with colleagues who aren't watching the event right now. We have a survey available as well; check that out at aka.ms/netfocusai-evaluation. Let us know what you think of the event and of the artificial intelligence tools and technologies we're adding to .NET for you. And of course, there's a link in there you can click to join our sponsored swag-bag giveaway: there are gift cards, software licenses, and much more, and 15 folks are going to win. It's a pretty good-sized swag bag, with gift cards and software licenses and things, so go to the evaluation, fill it out, and click the link; you have to click the link to get over to the swag-bag entry and join the raffle. All right, that's about all the time I have on
this break. Let me get you all set for our friend Stephen Toub, who's going to be talking about getting started incorporating artificial intelligence into your existing .NET applications. Stephen, take it away.

Hi, my name is Stephen Toub. I'm a developer on the .NET team at Microsoft, and I'm absolutely thrilled to be chatting with all of you about getting started incorporating AI into your .NET applications. If you're anything like me, when faced with a new problem about something I don't really understand, I often have a sort of blank-page fear: I'm staring at a blank page and don't really know how to get started. So, for the next 20 to 30 minutes, we're going to start from a blank page, or an almost blank page (we have Console.WriteLine("Hello, AI")), and see how to start incorporating AI into your .NET applications and go from zero to 60 in no time at all. Now, we're just going to scratch the surface: we're not going to cover all the ins and outs or dive deep into any one area; there are plenty of other opportunities to explore more deeply. What we're going to do today is make sure that by the time you're done watching, you feel like you can get started immediately incorporating AI into your .NET applications. With that, let's get started. Here I have Visual Studio with a brand-new C# console application. I'm starting with a console application not because that's the only place you can incorporate AI; quite the opposite, you can incorporate AI into any .NET application. I'm just using a console application to
show that there's nothing special here; we're just starting from scratch. Now, there are a variety of ways I can use AI in a .NET application. I need some model or service that provides that capability, and one of the easiest to get started with is OpenAI or Azure OpenAI, so I'm going to start with the OpenAI client. To do that, I added a NuGet reference to the OpenAI client; just search NuGet for OpenAI. Now I can say 'using OpenAI', pretty easy, and 'OpenAIClient', so I have my client here: new OpenAIClient. I need to provide it an API key, which is the one thing I have to go to the service for: I go to the service and say, hey, I'm going to start using your service, give me a unique token you can use to identify me, and I need to put that API key here. If you look at the constructor for OpenAIClient, you can see it takes an ApiKeyCredential, so we're going to pass in a System.ClientModel ApiKeyCredential. Because I don't want to leak my key to any of you, I've put it into an environment variable. I'm just going to zoom out a little: ApiKeyCredential, we'll bring in that namespace as well, and then I'll say Environment.GetEnvironmentVariable; I've named my environment variable for the OpenAI API key, and that gives me the ability to talk to the service. Now, I specifically want to work
in terms of chat, so I'm going to say 'chat service', then client.GetChatClient, and I need to tell it which model I want. With OpenAI there are a variety of models to choose from; here I'll use GPT-4o mini, a relatively recent model that OpenAI has introduced. We can see this is an OpenAI chat client, and now I can start using it. So: var result equals chatService dot, and I'm going to call the CompleteChatAsync method to pass in some text and get back a response. I can say 'What is the color of the sky?', and then I'll print it out: Console.WriteLine(result.Value). So in four lines of code, plus a few imports, we're bringing some AI into our application. An interesting completion there; this should pop up once the build completes. There we go: we're sending off the request and getting back a response.
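Pieced together, those few lines look roughly like this. It's a sketch against the official OpenAI NuGet package (2.x); the environment-variable name AI_OPENAI_APIKEY is illustrative, not from the demo.

```csharp
using System.ClientModel;   // ApiKeyCredential
using OpenAI;               // OpenAIClient
using OpenAI.Chat;          // ChatClient, ChatCompletion

// Keep the key out of source by reading it from an environment variable
// (the name here is a placeholder; use whatever you stored yours under).
OpenAIClient client = new(new ApiKeyCredential(
    Environment.GetEnvironmentVariable("AI_OPENAI_APIKEY")!));

// Bind a chat client to a specific model.
ChatClient chatService = client.GetChatClient("gpt-4o-mini");

// One request/response round trip.
var result = await chatService.CompleteChatAsync("What is the color of the sky?");
Console.WriteLine(result.Value.Content[0].Text);
```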
The response was a detailed discussion of the color of the sky. Now, from the time the window opened to the time something was printed, there was a bit of a delay. Maybe I want to take advantage of the fact that these models generate one token at a time (you can think of a token as a word or part of a word). Rather than waiting for all those tokens to arrive and then printing them all out, I could stream them, in this case to my console; if this were a website, I could stream them back to the client, and if this were a graphical user interface, I could stream them to my UI. To do that, I'll make a small change: I'll say 'await foreach (var update in ...)' over the exact same call, but instead of CompleteChatAsync I'll use CompleteChatStreamingAsync. This gives me a series, a stream, of updates coming from the service, and each update can include multiple pieces of content, so I'll say 'foreach (var item in update.ContentUpdate) Console.Write(item)'. That's it. Now when I run this, we can see the tokens streaming in as they come from the server. No longer am I waiting for all the content to arrive before I see anything; I get it as it arrives, which creates a nice, interactive user experience. And again, it's very little code to achieve this.
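The streaming variant he describes is a one-method swap plus a nested loop; a sketch continuing from the snippet above, with the same assumptions:

```csharp
// Stream tokens as they arrive instead of waiting for the full completion.
// Each update from the service can carry several content parts.
await foreach (StreamingChatCompletionUpdate update in
    chatService.CompleteChatStreamingAsync("What is the color of the sky?"))
{
    foreach (ChatMessageContentPart item in update.ContentUpdate)
    {
        Console.Write(item.Text);
    }
}
```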
A few years ago, this would have been a mind-boggling amount of functionality. Now, here I'm talking directly to OpenAI, and that's great; there is a wealth of functionality I can get from this OpenAI client: text-to-audio, audio-to-text, chat, generating embeddings, generating images from text with DALL-E, and so on. But for a variety of reasons, you might want to talk to something other than OpenAI or Azure OpenAI. Maybe you want to talk to Google's Gemini or Vertex AI, maybe Mistral AI, maybe Hugging Face or AWS Bedrock, or maybe a local model from Windows, a local model hosted with Ollama, a local model hosted with ONNX Runtime, or a variety of other things. And I'd really like to avoid updating the core logic of the rest of my application; instead, I want to speak to an abstraction that lets me talk to any of these services without constantly rewriting code every time I
change my client. So I'm going to make a few small tweaks. Let me go back to what I had before, just to keep it simple: our simple, non-streaming request and response. Instead of being hardcoded to the OpenAI client, I'm going to change this chat service to be an IChatCompletionService. This is an interface that comes from a set of libraries called Semantic Kernel; you can see we brought in a Semantic Kernel namespace here. Semantic Kernel is a variety of things, a whole stack of functionality, but at the bottom of the stack is a layer of simple abstractions for talking to language models and other AI services. So instead of using the OpenAI client directly, I'll create an OpenAIChatCompletionService. As before, I can pass in a model, say GPT-4o mini, and the same API key as before. These also layer: if I wanted to pass that OpenAI client to this OpenAIChatCompletionService, I could. Now I just need to change to a different method name, because I'm using the abstraction rather than the OpenAI client specifically. So we've cleaned up the code a little; we're dealing with this interface directly, and I can replace its implementation with anything I want. I can run this for OpenAI, and when the build eventually completes, we get back the answer from OpenAI. But if
I wanted to, let me just replace this. I'm not changing any logic in the rest of my application; I'm just changing my startup logic to say, actually, I want a Google Gemini chat completion service. Google provides entirely different models, so the model name I want here is Gemini Pro, and I'll provide an environment variable too; I've called mine 'AI Gemini API key'. Without changing any of the remaining logic in my app, I run it again, and now instead of talking to OpenAI, I'm talking to Gemini. You can see these models are a little different: OpenAI gave me a very wordy response by default, and Gemini just says the color of the sky is blue. If I wanted to, I could substitute something else. Maybe I want to talk to Mistral AI, so I can use a Mistral AI chat completion service; Mistral calls one of their models mistral-small, so I'll use that, GetEnvironmentVariable, and do the exact same thing. I believe my environment variable is called 'Mistral AI'... oh no, not 'Mistral AI', just 'Mistral'. And now, eventually, we get back a much more detailed response from the Mistral model. So I'm getting different behaviors from the different models and services I'm talking to, but the logic of my application remains the same, because I'm working in terms of these abstractions.
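Put together, the provider swap looks roughly like this. This is a sketch using Semantic Kernel's connector packages; the Google and Mistral connectors shipped as separate preview packages at the time, so treat those class names as likely spellings rather than gospel, and the environment-variable names are illustrative.

```csharp
using Microsoft.SemanticKernel;                     // ChatMessageContent
using Microsoft.SemanticKernel.ChatCompletion;      // IChatCompletionService
using Microsoft.SemanticKernel.Connectors.OpenAI;   // OpenAIChatCompletionService

// Startup logic: only this construction changes per provider.
IChatCompletionService chatService = new OpenAIChatCompletionService(
    "gpt-4o-mini", Environment.GetEnvironmentVariable("AI_OPENAI_APIKEY")!);

// Swap in another connector without touching the rest of the app, e.g.:
//   new GoogleAIGeminiChatCompletionService("gemini-pro",
//       Environment.GetEnvironmentVariable("AI_GEMINI_APIKEY")!);
//   new MistralAIChatCompletionService("mistral-small",
//       Environment.GetEnvironmentVariable("AI_MISTRAL_APIKEY")!);

// Application logic: written once against the abstraction.
ChatMessageContent answer =
    await chatService.GetChatMessageContentAsync("What is the color of the sky?");
Console.WriteLine(answer);
```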
And if you want to, you can start with these abstractions, and if you need to do something specific to the underlying client, you can break glass and basically cast back to the original thing to get the full functionality. Great, so we've got very basic functionality here, but I was using a hardcoded question, so let me go back to OpenAI and write a simple little chat loop. Now, even though this is called chat, that's not its only purpose; it's a general-purpose way of interacting with the service. One purpose is to have a conversation, for example a chatbot, but you can also use it for more programmatic purposes: I could ask it to compute something and send me back the results in some structured form like JSON, then parse it and use the result as a computation in the rest of my application. So incorporating AI into your app doesn't just mean 'build a chatbot'; it can mean taking advantage of the power of AI to power an application that might not have any kind of chat feature in sight. That said, I am going to build a
little chatbot here. So, while (true): an infinite loop. Instead of printing the answer to a hardcoded question, I'll type a question at the console, and I'll print out a little prompt, 'Q:'. Now I can run this, and I get my little Q, and I can say hello. Nice to meet you. My name is Stephen. Nice to meet you, Stephen. Great: what's my name? I just told you, so you should know; but you don't. The reason is that these services are stateless. Every time I communicate with one of them, I'm talking to a model that was trained up to a certain date. It has no information beyond that, and no memory of what I previously told it; all it knows is what I just sent it as part of the prompt. So when I ask 'what's my name?', if I don't resend all our previous communication, it has no idea who I am. So let's fix that.
The fix is that we need to store our conversation history and resend it to the service each time we communicate. So I'll create a ChatHistory, call it 'history', and instead of passing Console.ReadLine directly to the model, I'll say history.AddUserMessage and add that ReadLine there. Then, instead of passing Console.ReadLine here, I'll pass in the whole chat history, and when I get a response back from my AI assistant, I'll not only print it but also add it into the history. Now, every time there's additional content, it goes back into the history, and every time I communicate with the service, I send the whole thing. We can have the exact same interaction we just had: hello, my name is Stephen; what's my name? And now it knows who I am, because we've provided all of that context back to it. Of course, if I were to ask 'how old am I?', it has no idea. It doesn't know who Stephen is.
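The history-carrying loop he describes amounts to something like this (a sketch; chatService is the IChatCompletionService from earlier):

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// The service is stateless, so the ChatHistory we resend each turn IS the state.
var history = new ChatHistory();

while (true)
{
    Console.Write("Q: ");
    history.AddUserMessage(Console.ReadLine()!);

    // Send the entire conversation so far, not just the newest question.
    ChatMessageContent response = await chatService.GetChatMessageContentAsync(history);
    Console.WriteLine(response);

    // Remember the assistant's reply for the next round trip.
    history.Add(response);
}
```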
It doesn't know who I am; I haven't provided the information it needs to answer that question. Now, I could proactively put additional information into the prompt: I could say AddUserMessage, 'Stephen is 29 years old', making up a number. Now I can have the same conversation: hello, my name is Stephen; how old am I? And because I just told it I'm 29 (which I'm not, but I told it I'm 29), it says, hey, you told me you were 29, and it can use that information. But I don't want to stuff every possible fact into the prompt; I want to allow the model, the service, to ask me to retrieve additional information. To do that, we're going to make a few changes. First, notice that I've got my application logic and my startup logic together here. In a real application, unless you're writing a simple tool, a simple console app, you're
probably not going to co-mingle these. You'll have startup logic somewhere, most likely injecting the implementations of the various abstractions you use into a dependency-injection container, maybe as part of your ASP.NET startup, and then the rest of your application queries that container for the various interfaces; that's one of the most common ways these APIs get used. So let's do that here, and we'll come back to the age thing in a moment. If I were in an ASP.NET application or a MAUI application, I'd already have a DI container established for me by the template or the environment; because this is just a console application, I'll set up a little one myself. So I'll say ServiceCollection, and from it I'll get the same kind of service provider you'd be handed in other environments. And now I
can add things into my service collection. One of the things I can add is an OpenAI chat completion service; it takes the same arguments I had down here, so I'll just copy and paste. Now, in my application, when I want a chat service, I can say services.GetRequiredService&lt;IChatCompletionService&gt;() and use that. So in my startup logic I can switch to whatever service I want, whether a local model or a remote service. Maybe I want to mock something up for testing; maybe I want failover behavior, so if the service I'm talking to has an outage, I switch to something else; maybe I want to do A/B testing; maybe I want to insert a wrapper with additional functionality, like a cache, so that rather than going to the service every time, it checks a cache before sending the question. I can do all of that by changing my configuration logic up front and having the right implementation injected into my application. That's what I'm doing here, and it behaves exactly the same: I say hello, and we get back responses.
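In code, the console-app version of that DI setup is roughly the following (a sketch; in ASP.NET or MAUI the template hands you the container instead):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Startup logic: register an implementation of the chat abstraction.
var services = new ServiceCollection();
services.AddOpenAIChatCompletion("gpt-4o-mini",
    Environment.GetEnvironmentVariable("AI_OPENAI_APIKEY")!);

IServiceProvider provider = services.BuildServiceProvider();

// Application logic: resolve the interface; swap providers, mocks, caches,
// or failover wrappers purely by changing the registrations above.
IChatCompletionService chatService =
    provider.GetRequiredService<IChatCompletionService>();
```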
Now, what we talked about wanting to do is add more functionality into this application. I'm going to do that using something called a kernel, one of the core types that comes with Semantic Kernel. All it is is a little object that stores services and, in particular, plugins: additional functionality that can be made available to the model. So I'll say AddKernel, to make sure I can get one from my DI container, and out here I'll get a Kernel from DI. Great. Now I want to add functionality to this kernel that lets the model learn, without my telling it in advance, how old I am. I'll come down here and write a little class; let's call it Demographics. I'll write a function, GetPersonAge, that takes a name, and we'll say 'name switch': if it's Stephen, 29; if it's Elsa, 21; if it's Anna, 19; and for anyone else, 40 seems reasonable. I'll also mark this as a function I want to import into the kernel, and now all I have to do up here is say kernel.ImportPluginFromType.
Using reflection, that basically inspects the type to find all the KernelFunction-marked methods, and it has imported this function. I can also add metadata: descriptions of the parameters and the function itself; I'll leave it as is for now. Now I'll do two more things. First, I'll create a settings object that I can pass along with my prompts to configure various things; you can see Copilot suggesting various settings I might want, but what I actually want to configure is to tell it that it's allowed to automatically invoke the functionality I'm providing. Then, down here, I pass in the settings and the kernel, so I'm passing in these options, including the ability to automatically invoke this stuff, and the kernel object that contains the plugins. Now when I run this again and say 'hello, my name is Stephen... how old am I?', because it had this ability to call back to me, it knows that I'm 29 years old. Or I can ask, 'how much older is Elsa than Anna?'. Again, it doesn't know how old Elsa is or how old Anna is, but it was able to ask me: it invoked the function with Elsa and learned 21, it invoked it with Anna and learned 19, and then it did the logic to determine there's a two-year difference between them.
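The plugin and the auto-invoke settings he walks through look roughly like this. A sketch continuing from the DI setup above; ToolCallBehavior.AutoInvokeKernelFunctions was the OpenAI-connector setting for this at the time (newer Semantic Kernel versions also offer FunctionChoiceBehavior):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Register a Kernel alongside the chat service: services.AddKernel();
Kernel kernel = provider.GetRequiredService<Kernel>();

// Reflection finds the [KernelFunction] methods on the type.
kernel.ImportPluginFromType<Demographics>();

// Allow the model to invoke imported functions automatically.
var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

// Pass settings and kernel along with the chat history from earlier.
ChatMessageContent response =
    await chatService.GetChatMessageContentAsync(history, settings, kernel);

// A plugin: plain .NET code the model is allowed to call.
public class Demographics
{
    [KernelFunction]
    public int GetPersonAge(string name) => name switch
    {
        "Stephen" => 29,
        "Elsa" => 21,
        "Anna" => 19,
        _ => 40,
    };
}
```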
Now, because we're taking advantage of this integration with DI, and it's fully integrated with the rest of the .NET ecosystem, we can bring in other functionality to understand what's actually happening under the covers. Let me add some logging, for example: AddLogging, console logging, at the most verbose level, LogLevel.Trace. Now when we run this again, we can see information being output just by adding that to the DI container. I can see that we imported GetPersonAge into the kernel and have one function available. If I say hello, we can see the chat message being sent up, and as part of this request/response we can see that I used 42 prompt tokens in my request and the answer that came back had 10 tokens, so the total number of tokens used was 52. That's important for two reasons. One, with pretty much all of these systems, you are limited in how many tokens you can use: every model has a context window, a token count allowed for the request and the response, so I might want to keep track of it for that purpose. Two, most of these services bill you by the number of tokens used, so it's important from a cost perspective as well.
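The logging hookup he added is one registration in the same container (a sketch; it assumes the Microsoft.Extensions.Logging.Console package is referenced):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

// Semantic Kernel picks the logger up from DI and, at Trace verbosity,
// reports imported functions, outgoing prompts, tool calls, and token counts.
services.AddLogging(builder => builder
    .AddConsole()
    .SetMinimumLevel(LogLevel.Trace));
```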
Now, if I ask a question like 'how much older is Elsa than Anna?', we see another request go out, and this time many more interesting things happen. We got our initial request and response, but the response sent back to me contained two tool requests: one to invoke GetPersonAge with the name Elsa, and one to invoke GetPersonAge with the name Anna. Because we told Semantic Kernel it was okay to invoke these, it proceeded to invoke GetPersonAge with Elsa, which succeeded and returned 21, taking this amount of time; then it invoked GetPersonAge again with Anna, which succeeded and returned 19, taking this amount of time. Then we turned the crank again: another request went back to the service with all of those results, along with all the previous chat history containing our original question, and we got back the result saying Elsa is two years older than Anna, because now it has all that context. In this particular case I wrote my own plugin, which can be arbitrary .NET code, but you can imagine reusable libraries of
this kind of functionality. In fact, Semantic Kernel includes, as one of its libraries, a set of plugins, one of which is really useful for these kinds of examples because it allows me to do full web searches. So I could come in here and say I want to import a plugin from a new object, and this new object is going to be a Microsoft.SemanticKernel.Plugins.Web — what do we call it — WebSearchEnginePlugin. This takes a particular search engine to talk to, so I'm going to use the BingConnector, and I also need to give it an API key, which for me is also stored in an environment variable, so we pass in the API key.
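That import looks roughly like the following; note these web plugin types ship in the Microsoft.SemanticKernel.Plugins.Web package and were marked experimental at the time, and "search" is just a plugin name I'm choosing here:

```csharp
// Sketch of importing the built-in web search plugin with a Bing backend.
using Microsoft.SemanticKernel.Plugins.Web;
using Microsoft.SemanticKernel.Plugins.Web.Bing;

#pragma warning disable SKEXP0050 // experimental plugin APIs
string bingKey = Environment.GetEnvironmentVariable("BING_API_KEY")!;
kernel.ImportPluginFromObject(
    new WebSearchEnginePlugin(new BingConnector(bingKey)), "search");
```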
Now let's try this again. I've added that plugin, and we can see it imported an additional search API. I'm going to ask: "who won best actress at the 2024 Academy Awards?" The model I'm using has a training cutoff before the 2024 Academy Awards, so it wouldn't be able to answer this question otherwise, but now it can search the web — oh, apparently I can't search the web: it's getting permission denied. It's possible my API key is out of date. We'll try one more time and see if it works; otherwise we'll move on. "Who won best actress at the 2024 Academy Awards?" No joy. It doesn't matter — we can still see what we wanted to see, which is that as part of my request, the tool call came back saying "I would like you to issue a search for 2024 Academy Awards best actress winner". So the model was generating this request and handing it off to the plugin; for some reason my API key isn't working out the way I want and Bing is saying go away. That's fine — you get the basic idea.
All right, we have a couple of minutes left. You can see that you can build up these really powerful AI experiences with very little code, and we've only scratched the surface. I'll show one more aspect of this. Let's say that, as part of these plugins, I wanted a little more control throughout my application over what got invoked. I've put a bunch of stuff in, but if the model asks for something, or asks to invoke something with arguments I don't want, I'd like more control over exactly how that happens. So I'm going to write a little class here — let's call it PermissionFilter — and I'm going to implement IFunctionInvocationFilter, which will be invoked any time a function is about to be invoked. Let me just print out to the console what function is being invoked and ask for permission: "okay to invoke" plus context.Function.Name, and I'll do a string.Join over the arguments just to print a nice little list. Then, if Console.ReadLine — I'm just asking the user to type yes or no — says yes, we'll do it; otherwise we'll throw an exception, let's say "error: user denied request". So I've written this filter, and now, any time the system asks for one of these things to be invoked, I want this filter to be involved. With dependency injection that's no problem: I just say AddSingleton, I want to add a function invocation filter, and we'll use PermissionFilter. Now I'll run this again — actually, let me turn off logging to make it easier to see what's going on, so we're not surrounded by noise — and I'll say "who won the 2024 Academy Award for best actress?" Now, rather than just invoking it, you can see it's actually asking me: this is the function that's going to be invoked, with these arguments — is this okay? And I could say yeah, sure, go ahead; or I could say no, and it basically gets denied. It tries one more time, and then the model gives up: it says, you know what, I wasn't able to do what I wanted to do, I'm done. So you can very easily plug in this functionality in one place and have it apply pervasively throughout the rest of your application.
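A sketch of what that filter might look like — the class shape follows the IFunctionInvocationFilter interface, but the message text is reconstructed from the narration, not the exact demo code:

```csharp
// Sketch: a filter that asks the user before every function invocation.
using Microsoft.SemanticKernel;

public sealed class PermissionFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        Console.WriteLine($"OK to invoke {context.Function.Name} with " +
                          $"[{string.Join(", ", context.Arguments)}]? (y/n)");
        if (Console.ReadLine() == "y")
        {
            await next(context); // proceed with the real invocation
        }
        else
        {
            throw new InvalidOperationException("Error: user denied request");
        }
    }
}

// Registered via DI so the kernel picks it up:
// builder.Services.AddSingleton<IFunctionInvocationFilter, PermissionFilter>();
```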
So, with very few lines of code, we've seen how to get started using something like the OpenAI or Azure OpenAI client directly; we've seen how to switch over to using interfaces and abstractions from Semantic Kernel to talk to OpenAI or any number of other services; we've seen how to employ function calling with models and use various plugins; we've seen how to use filters; and we've seen how to do streaming. And that was starting from a blank page in only 20 to 30 minutes — imagine what you could do with a day or more. So my hope is that all of you watching turn off YouTube, go open Visual Studio, add these NuGet packages, and just start playing, and see what amazing things you can build. And with that, thank you.

Okay, welcome everyone to this session. My name is Steve Sanderson, I'm from the .NET team, and today I'm joined by Matthew Bolanos from the Semantic Kernel team. We're going to be talking about some ways that .NET Aspire and Semantic Kernel can work together to give you a really good experience as a developer adding AI features to your application.
So let's start with: what does that even mean? What kind of AI features would you want to add to an application? Is AI even useful for a typical business app? I think the answer is yes, but let's go through an example, and I'm going to give you just about the most conventional and relatable example I can think of, which is e-commerce. Imagine there's an e-commerce site which, as well as selling products to people, also has a customer support facility, where customers can come along and submit inquiries about products they have bought or are thinking about buying. These inquiries get sent through to the staff working there, who have some kind of web UI for working with them — probably with a big grid in the middle; perhaps that's how their workflow works. So: a very typical business app. How can we use AI in a beneficial way here? Is it useful? Well, yes. For example, we've got a search box, so we might start by thinking: can we upgrade that to semantic search, so people can find stuff without having to know exactly how it's phrased, and without worrying about spelling mistakes? We've also got these little titles that help staff remember what each of the different tickets is about and navigate around quickly, so it would be nice to have an AI system automatically generate those for us. We've got these different ticket types, and we can use classification to set the right type automatically based on the text, which allows you to trigger different workflows if you want to. And then we've got these satisfaction rankings — it's very easy for a language model to give a sensible estimate of customer satisfaction, so we can do that, and it might help staff focus their attention where it's most needed. But that's not all. We could go a bit further: when someone opens one of these tickets and starts working on it, they'll have various fields they can fill out, and just like before we could use AI to automatically classify or enable semantic search. We've then got perhaps a big conversation between the staff and the customer, which could take a lot of time to keep rereading each time you open it; so, to save humans the trouble of reading all that, what about having AI produce some nice little summaries as well? And then, when it's time to actually answer these questions, it's perhaps relevant to have some sort of chat assistant that does Q&A and can find information from your business data — in this case, product manuals. Of course, we need it to be accurate and not hallucinate, so we could force it to provide citations for any claims it makes. And finally, when it comes to typing a response to the customer, we could have a type-ahead system that suggests phrases that fit in with what the user is typing and match all the information we've already got. So overall, I think it's clear there are lots of opportunities for making this sort of application more productive and more beneficial for your users with some AI features, and it doesn't even have to be very difficult. So let's have a look now at some ways you can do that at the code level.
Obviously we can't go through every line of it, because we haven't got time for that, but I'm going to show you some of the highlights of how Aspire and Semantic Kernel can work together to enable that sort of thing. Over here I've got Visual Studio with my solution in it, and this, as you can guess, is an Aspire application. Here's the Aspire dashboard, and you can see I've got lots of different services in there, like vector databases and other stuff. The ones we're going to focus on just now are the backend, where most of the app logic lives, and a couple of different web UIs: one for the customer, one for the staff. So let's start with the staff web UI. You can see it looks kind of like what we had in those PowerPoint slides just then, with all the stuff you'd expect — semantic search to find all the things related to food, and so on. Let's go into this ticket here: you can see it's got quite a long conversation. Very long — I definitely can't be bothered to read all that, but I don't have to, because the AI has produced a nice little summary of what it's all about.
So let's see in the code how that actually looks. How do we generate a summary like that — is it difficult or not? I'm going to go into the bit of the backend API where messages get posted. This C# method here runs, and you can see it uses normal Entity Framework code to save the new message to the database; but also, each time a message is posted, we generate an updated summary. How does that work? Well, the main thing is that we're using a service called IChatCompletionService. That's an interface provided by Semantic Kernel — and if you don't know it, Semantic Kernel is a set of .NET packages and libraries that give you abstractions over different AI services, plus implementations of them, so we can have various different language-model backends for this chat completion service and call any of them without having to change the rest of our code. In this case we get the chat completion service — I'll show you where that comes from in a minute — and then we build a prompt that we're going to pass into the language model. We tell it what product this is about and the brand, we emit all of the messages in the chat history to it, and then we tell it what to do: write three summaries for me — a 30-word summary that we'll use on the ticket details page, and a short summary of up to eight words that we'll use on the ticket list — and also produce a customer satisfaction score. It's quite easily able to do all that. We tell it to reply as a JSON object, which we can then parse, and we save all that stuff into our database. So it's not too difficult to do this kind of thing.
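The shape of that call looks something like the sketch below — not the actual eShopSupport source; `product`, `brand`, `conversationText`, and the `TicketSummary` record are stand-ins:

```csharp
// Sketch: prompt the model for three summaries as JSON, then parse and save.
using System.Text.Json;

string prompt = $$"""
    You are summarizing a support ticket about {{product}} (brand: {{brand}}).
    Conversation so far:
    {{conversationText}}

    Reply as JSON: { "longSummary": "(about 30 words)",
                     "shortSummary": "(up to 8 words)",
                     "customerSatisfaction": (1-10) }
    """;
var reply = await chatCompletion.GetChatMessageContentAsync(prompt);
var summary = JsonSerializer.Deserialize<TicketSummary>(reply.Content!,
    new JsonSerializerOptions { PropertyNameCaseInsensitive = true })!;
// ...then save the summary alongside the message with the usual EF Core code.

record TicketSummary(string LongSummary, string ShortSummary, int CustomerSatisfaction);
```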
And just to show that it works when we do it live, let's add a new support ticket to our system. I'll go in here and say, okay, I need some food for my cats — let's say protein energy bars; maybe those are good for cats, I don't know. Now, because I'm a customer, I'm not going to write a short, succinct, clear message; I'm going to write an absolutely massive message that's really hard to understand. It's got loads of information about my cats and their personalities, it's got a poem about cats, and all kinds of useless stuff like that, and somewhere hidden amongst all that is my actual question. So it's going to take ages for the support staff to even work out what I'm asking them. But actually it doesn't take that long, because you see we get this nice little summary that's automatically generated — "energy bars for cats" —
which specifically picks out the most pertinent question: are energy bars safe for feline consumption? Okay, so you can see how that helps. Now, in terms of where this IChatCompletionService comes from — what is it, is it OpenAI, is it something else? — one of the nice things about Aspire is that it can orchestrate this sort of thing for you. In this case I'm configuring it in my AppHost, so all the services in my application share the same language-model service, and I can easily switch it around. Here I've actually set it up with a connection string that makes it use OpenAI, but if I wanted to, I could simply swap these two lines of code and it would use Ollama, which is a way of running language models locally on your development workstation, so everything happens fully offline. I'm not actually doing that here, because it runs slower on my laptop, but you can if you want to — especially if you've got a more powerful GPU — and you don't have to make any other code changes. So that's Aspire and Semantic Kernel working together.
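A hypothetical sketch of what that AppHost swap could look like — the resource and project names here are invented, and AddOllama comes from a community Aspire integration rather than the box:

```csharp
// Sketch: the model provider is chosen once in the AppHost and flows to
// every service that references it.
var builder = DistributedApplication.CreateBuilder(args);

var chatCompletion = builder.AddConnectionString("chatcompletion"); // OpenAI today
// var chatCompletion = builder.AddOllama("chatcompletion");        // ...or local via Ollama

builder.AddProject<Projects.Backend>("backend")
       .WithReference(chatCompletion); // referencing services share the same model

builder.Build().Run();
```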
In fact, the integration between them goes a little deeper than that — for example, telemetry. If I go into this traces list and look at the activity that's been going on, one of the things we'll notice is that it's able to recognize when we've made calls to a chat completion service, because Semantic Kernel logs the telemetry, and it's got all sorts of useful information in there, like the number of tokens that were produced, the prompt, the response, things like that. If you're doing this in production, you can aggregate all these statistics and keep track of your costs and times and that sort of thing. Okay, so that's an example of how we can do a bit of summarization with these technologies. Now let's move on to something slightly more advanced: this Q&A chatbot. Let's see how it actually works, and then we'll look at a bit of code. So let's go into another support ticket.
The first problem I notice here is that this customer is trying to talk to me in German, and I don't even speak German, so that could be tricky. Fortunately, the summary gives me a good indication of what they're asking for: they want to know the PIN for Bluetooth pairing. Now, I don't know what the PIN is, which is why it's helpful to have this sort of AI assistant that's capable of searching through business data and finding the relevant things. So let's try: "what does the manual say about this?" You can see it does a search for "Bluetooth pairing PIN" and works out that the default PIN is 0000. Now, do we know if that's really true? Well, we make it give a citation, and if I click it, it takes us into the actual product manual and highlights the exact basis for each claim, so we know it's actually true. All right, so then I could say something like "write a reply to the customer saying that", and it writes a reply. Great — but of course in this case I need it in German, and so it writes a nice little reply to the customer in German. I can click this "use reply" button — get rid of that, that's a bug we'll fix later — hit send, and it sends the message back to the customer in German. Cool. How did that work? Let's go and have a look at a bit of the code. Here's my assistant API, and you can see it's not super complicated: the main thing it's doing is using another one of these IChatCompletionService instances.
Just like before, it builds up a prompt saying: I want you to answer customer support questions; here's all the context about the product and so on; here's the most recent message from the customer; give me an answer. So it will provide an answer. Then, in terms of being able to search product manuals, this is something Semantic Kernel makes fairly straightforward: it has an API where you can attach what it calls plugins, which look a bit like this — a class with methods decorated with these special KernelFunction attributes. In this case I'm saying there's a callback the language model can use if it needs more information, called SearchManual, which needs a search phrase and a product ID. Then we can put whatever logic we want in there to go off and get search results. In this particular case I'm using a vector database and some search functionality that's also provided by Semantic Kernel, but you could do anything else you want — you could use a traditional database with full-text search. It doesn't really matter, as long as you supply some results that the language model can use to give an answer.
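A sketch of such a plugin — `IManualSearch` here is an invented stand-in for the vector search used in the demo; any backend that returns passages would do:

```csharp
// Sketch: a [KernelFunction] the model can call when it needs more information.
using System.ComponentModel;
using System.Linq;
using Microsoft.SemanticKernel;

public class ManualSearchPlugin(IManualSearch search)
{
    [KernelFunction]
    [Description("Searches the product manual for relevant passages")]
    public async Task<string[]> SearchManual(
        [Description("A phrase to search for")] string searchPhrase,
        [Description("The ID of the product")] int productId)
    {
        var results = await search.QueryAsync(searchPhrase, productId, maxResults: 3);
        return results.Select(r => r.Text).ToArray(); // passages the model can cite
    }
}
```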
So that's pretty straightforward. One thing you might notice is the way this "use reply" button appears sometimes but not always, and that takes us on to the next matter, which is classification. Very often you want to determine what sort of class something fits into; in this case I want to classify messages as suitable for sending to the customer or not, so I know whether to show this button. There are a couple of ways you can do it. The first way I'll show you is simply by using a language model and IChatCompletionService again. What I'm doing here is: after we've produced the response, we send a further message to the language model saying "consider that answer you just gave and decide whether it's suitable to send to the customer or not, and give me back a JSON object with a boolean". Then, based on that, I decide whether or not to show the button.
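Roughly like this — the JSON shape and the `Suitability` record are invented stand-ins for whatever the demo actually deserializes:

```csharp
// Sketch: one extra round trip asking the model for a JSON boolean verdict.
using System.Text.Json;

chatHistory.AddUserMessage("""
    Consider the answer you just gave. Decide whether it is suitable to send
    directly to the customer. Reply as JSON: { "isSuitable": true/false }
    """);
var verdict = await chatCompletion.GetChatMessageContentAsync(chatHistory);
bool showButton = JsonSerializer.Deserialize<Suitability>(verdict.Content!,
    new JsonSerializerOptions { PropertyNameCaseInsensitive = true })!.IsSuitable;

record Suitability(bool IsSuitable);
```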
But there are other cases where you might not want to use a language model for classification, perhaps because you want to make it faster and cheaper still. Language models are brilliant and very powerful, but they also take a bit of time — it might take hundreds of milliseconds or even seconds to answer your questions — and if you've got a high-performance, high-speed workflow system, you might want to classify things faster and cheaper than that. So I'm going to show you a way of doing that, and slightly surprisingly, I'm not even going to use .NET; I'm actually going to use Python. You might wonder why I'm doing that at a .NET event, and the reason is not because I think you must use Python — you can certainly do this with .NET. The point is to show you that Aspire is very good for orchestration, not just with .NET, not just with Docker containers, but with other technologies as well: if you want to bring some Python or Java or JavaScript or whatever else into your Aspire application, you can do that. So I'm going to do a bit of classification with Python here, and it's very straightforward. In Visual Studio I've added this Python project, and you can see I'm constructing a FastAPI server — if you don't know Python, don't worry about it; this is a bit like minimal APIs for Python. I've got a couple of endpoints, including this classifier, which takes in some text and a set of candidate labels and determines which label is most relevant to the text. It's doing so using a local model called MiniLM2, which it gets from Hugging Face, a third-party system, and it's able to do this classification very fast, locally, on my development machine or on your server. To show you that really working clearly, I'm going to do it manually on the command line: if I give it the text "this is a Python API" and the candidate labels animals, programming, or music, you'll see it instantly classifies it as programming, whereas if I change the string to "this is a python", it classifies it as animals. All right. What I want to classify in this case is the incoming tickets: question, comment, complaint, or returns — which is the most relevant label? You can see that when I did this thing about cats, it was classified as a question, because it is a question; but if I go back as a customer and say "I need to return stuff" or something like that, then when we look at the list, you'll see it's now automatically been classified as returns. So we could trigger different workflows based on that.
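For the orchestration side, a hedged sketch of what wiring a Python service into the AppHost could look like — the AddPythonProject extension comes from the Aspire.Hosting.Python package, and the names and paths here are invented:

```csharp
// Sketch: registering the Python classifier alongside the .NET services.
var classifier = builder.AddPythonProject("classifier", "../PythonClassifier", "main.py")
                        .WithHttpEndpoint(env: "PORT");

builder.AddProject<Projects.Backend>("backend")
       .WithReference(classifier); // the backend discovers and calls it over HTTP
```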
And Aspire is fully capable of starting up things like Python or Java or whatever else you want to use within your system. All right, so that's a couple of ways to do classification. The last thing I'm going to have time to show you is evaluation, and in some ways this is the most important one, because it's what makes the difference between just hoping that things work and actually measuring that they work — being an actual professional. I know you're professionals: you want to know that your stuff works, and in fact how well it works. So let's look at how we can quantify the quality of this AI assistant. I want to reduce the whole thing to a single number that says how good it is right now. How can we do that? Well, the main way people do this sort of development-time evaluation is by having a set of evaluation data — a set of inputs and desired outputs — and then testing against the inputs to see how well the system matches the desired outputs. So how would we get sample data? In fact, all the data you've seen so far — the product data, the product manuals, the customer support tickets, all of it — has been generated with AI for the purposes of test data, using this data generator project; if you get the code later, you can see how that works. One of the things it produces is a set of evaluation questions — in this case a list of hundreds and hundreds, maybe thousands, of questions and answers. For example, for product 35: "what is included?", and the answer is this stuff. Now, we can take this as the ground truth, because this is the data we generate first: we build all the product data and manuals and such out of it, so for that reason we can consider it to be the truth. What we want to do now is ask our backend system all of these questions and test how well it answers them.
So what I'm going to do now is start up my application in the background — that's running now — and use a little console tool to run an evaluation loop. You could imagine we had a really nice UI for this, and maybe at some point in the future someone will create one, but right now we don't, so use your imagination, because all I've got is a console app. Let me start it running. It's using some Aspire libraries to discover and call into the backend service from this console application, and it's looping through all those eval questions, putting them into groups, and sending them in parallel to the backend — here's a question, here's another question, here's another — and collecting all the answers. Then it scores the quality of the answers, which it does by making another call to a language model, asking how accurate and how relevant each answer is relative to the truth; we get back a number, and we can calculate the average over time.
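A very rough sketch of the shape of that loop — `EvalQuestion`, `backend.AskAsync`, and `ScoreAnswerAsync` are invented stand-ins for what the real tool does:

```csharp
// Sketch: batch the eval questions, ask the backend, and have an LLM score
// each answer against the ground truth, tracking a running average.
using System.Linq;
using System.Text.Json;

var questions = JsonSerializer.Deserialize<List<EvalQuestion>>(
    File.ReadAllText("evalquestions.json"))!;

double total = 0; int scored = 0;
foreach (var batch in questions.Chunk(5)) // parallel groups of questions
{
    var scores = await Task.WhenAll(batch.Select(async q =>
    {
        string answer = await backend.AskAsync(q.ProductId, q.Question);
        // Another LLM call judges accuracy/relevance against the ground truth:
        return await ScoreAnswerAsync(q.Question, q.Answer, answer); // 0.0 - 1.0
    }));
    total += scores.Sum(); scored += scores.Length;
    Console.WriteLine($"Average score so far: {total / scored:0.00}");
}

record EvalQuestion(int ProductId, string Question, string Answer);
```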
So we're going through right now, and — I know the text is a bit small — at the moment it's averaging about 0.75. I happen to know from experience that in this setup it's going to average around 0.8 or so, but the exact number doesn't really matter. What matters is that you want to find ways of making the number go up by making changes. You could try changing the prompt, or the assistant logic, or the vector database, or the chunking, or anything else going on in this system, and then run the evaluator to see whether the quality goes up or down. To give you a realistic example: at an early stage when we were building this, we noticed it was hallucinating quite a lot — we would ask it questions and, instead of looking at the product manual, it would just make up an answer based on nothing, a pure hallucination — and at that time the score was about 0.61. We realized we needed it to look at the manuals more, so we added one extra sentence to the prompt: "if this is a question about the product, we really want you to look at the product manual". Just by adding that one sentence, the score went up to 0.77 — clearly a good improvement. But it also made the time go up from 2 seconds to 2.6 seconds, which also makes sense, because it spends more time looking at manuals and doing extra calls. The point of this is that you, the developer, can now make trade-offs in a principled manner.
You can make changes, see whether the scores go up or down, whether the times go up or down, or the costs, and decide whether those changes are good or bad. Okay, that's all I'm going to have time to show you — I'm going to hand over to Matthew in a second — but just to summarize from me: there are many different things you can do to add AI features to your app. Chat is one of them, but I don't think I'd recommend starting with it. If you're starting out, look at things like semantic search, summarization, translation, classification: all of these are much, much easier to build, and they can let you add some real value to your application over a fairly short period. So I'd really recommend starting with some of those, although chat may of course be useful depending on your scenario. I do think it's worth a bit of investment in having an evaluation system from the start, so you can check whether what you're doing is improving or getting worse over time. As a .NET developer, you've got some great opportunities with Aspire and Semantic Kernel and the way they work really well together — but also be open to other things. It's an emerging field right now, and all the recommendations we give are going to change over time as people come up with new patterns for evaluation, testing, or whatever else, so be flexible and consider using whatever is best in your scenario. If you want the code I've been using in this demo, it should hopefully be available, by the time you see this, at the URL I'm showing right there, so you can check it out and see how all of this works on the inside. So that's it from me — I'm going to hand you over to Matthew now, who's going to take you to the next level with your AI understanding. You're muted! Thank you, Steve — I love that demo, because it shows how you can actually drive real productivity in your existing applications with AI.
That's what it's all about, right? Not just building shiny applications, but how you actually make your own customers, your own employees, your own company more productive. Now, what I'm going to be talking about is what's coming next. Steve mentioned that all this stuff is still emerging, with all these new things coming out — so what should you expect in the AI space, and how are we, within the Semantic Kernel team, going to help you in your .NET language? What I'm here to share with all of you today is what we're working on within Semantic Kernel to wrap that chat completion service Steve was talking about in an additional abstraction, to make it easier to talk back and forth with it. If you've been plugged into what's happening in the AI space, you've probably heard this word "agent" pop up over and over and over again. Just a few days ago — maybe a week ago — we released our new agent framework for Semantic Kernel. I could talk on and on about what an agent is and how it works, but I always find it's easiest to dive into actual code to see what this new abstraction does.
So what I have right now is a Jupyter notebook with some very simple Semantic Kernel code — basically I'm running a console application — and let's get started by loading up all of our packages and importing these statements; then we'll dive into what an agent is in Semantic Kernel and why it's valuable for you. Once these packages are installed, we'll import all of our statements, and, as Steve showed, you can create what we call plugins inside Semantic Kernel that give the AI the ability to retrieve information, to interact with the real world, or — in this case, very simply — to get the current time. Now, in Steve's example he was using the chat completion service directly and making calls to it. What we found, though, is that most of those back-and-forth chats follow a similar pattern: you have a chat history, you build it up, you give it to the LLM, and it gives you a response back. What we believed, and what the industry has seen, is that if you wrap this in another abstraction, you make your code a little bit easier and you get a more predictable way of working with what we call agents. So here's what I'm doing — we'll go ahead and click run: I'm building a kernel, I'm adding in one of those chat completion services you saw earlier, and we're adding in that plugin so the AI knows how to tell the current time. But what's new in Semantic Kernel is that we can now create what we call a ChatCompletionAgent, where we can give it a name and give it instructions — so you don't have to combine all of that inside a single prompt; you can just tell it — and that instruction set will always be prepended to whatever that agent is working on. We start our chat, it asks for our input, we can ask simply "what is the current time?", and the AI assistant is able to give us the answer.
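A sketch of that agent, per the Microsoft.SemanticKernel.Agents preview available around the time of this talk — the API may have shifted since, and the names here are placeholders:

```csharp
// Sketch: a ChatCompletionAgent with a name and standing instructions.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.ChatCompletion;

ChatCompletionAgent agent = new()
{
    Name = "TimeAssistant",
    Instructions = "You answer questions about the current time.",
    Kernel = kernel, // a kernel that already has the time plugin registered
};

ChatHistory chat = new();
chat.AddUserMessage("What is the current time?");
await foreach (ChatMessageContent reply in agent.InvokeAsync(chat))
{
    Console.WriteLine(reply.Content);
}
```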
Okay — not a whole lot new over just using the chat completion interface by itself. But what we're about to see is that this also gives you the ability to use services that act as agents. Instead of having to build everything yourself with IChatCompletionService, you can talk to something like the Azure OpenAI Assistants API, which, if you're not familiar with it, is a service that basically manages the thread on your behalf, so that you as a developer don't have to figure out where you're going to store that chat history or how you're going to save it to something like Cosmos DB — that's all managed for you by the Assistants API. We'll go ahead and run this one as well. We're starting the same way, creating our kernel; this time we're creating an OpenAI assistant, and what's new is that instead of me creating a chat history object, I can ask the service to create the chat history — or what they call a thread — for me, so that in this back-and-forth conversation I don't have to worry about saving that information. I could ask the same question — "what is the current time?" — and it would work the same way, but I'm not going to do that quite yet. The other thing that's really valuable about the Assistants API is that it also comes with built-in tools, or built-in plugins. So I can comment out this line that gives it the ability to check the time, and uncomment this one: enable code interpreter. What this provides the AI is the ability to write Python code — basically do a bunch of operations on the side — to answer any user's question, and naturally Python has some tooling it can use to tell the current time. So when I ask "what is the current time?", what it does behind the scenes is, instead of using my plugin, it actually uses the code interpreter to write some simple Python code — we can see it right now — executes it, and tells me what the time is. Okay, we'll go ahead and say bye. Now, the last thing I want to show really demonstrates the value of this additional abstraction, the value of having agents: with a common contract for how agents work together, it's now possible to put them in something like a group chat, or a process, and have them use the same interface to talk back and forth.
And we're going to see just that. This was popularized by, and started with, a research project called AutoGen, and we've basically taken its best concepts — the group chat, the agents — into Semantic Kernel and made them enterprise-ready. What we're seeing here is that we're creating two agents instead of just one: an art director and a writer. You can see we have all these different instructions we're providing — what they can do, what they can't do, don't waste time — and we ultimately want to create a group chat that includes these two agents, the writer and the reviewer. But there needs to be some way of controlling the chat: who gets to speak next, and when is the chat over? In this particular scenario, the reviewer agent basically determines when the chat is over — when the writer has produced something of good enough quality — and once that's achieved, the chat is over and the user has the final answer. So I'm going to go ahead and invoke this, and what I'm asking these two agents to do is come up with some marketing copy for this concept: a way for multiple AI agents to collaborate on a single task — exactly what they're doing right now. We can see the first message: it starts off with the copywriter — hey, let's try "AI conductor, orchestrating brains" — and the art director is like, no, I don't like this; it's interesting, but a bit abstract, can we refine it? So it tries again, and finally the art director approves a final message, which in this case a user could take for their marketing. You can imagine, in the example Steve showed, that maybe the summaries aren't that great, or maybe the responses back to the customer aren't high enough quality. What we've found through research is that having multiple agents battle it out — challenging each other and bringing different perspectives or system instructions — can actually make that final result even better.
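A sketch of that setup, based on the preview Agents API — the ApprovalTerminationStrategy here is a custom class of the kind the Semantic Kernel samples define, and the instructions are paraphrased:

```csharp
// Sketch: two agents in a group chat; the reviewer's approval ends the chat.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.Agents.Chat;
using Microsoft.SemanticKernel.ChatCompletion;

ChatCompletionAgent writer = new()
    { Name = "CopyWriter", Instructions = "Write punchy marketing copy; refine it when critiqued.", Kernel = kernel };
ChatCompletionAgent reviewer = new()
    { Name = "ArtDirector", Instructions = "Critique the copy; say 'approve' only when it is good enough.", Kernel = kernel };

AgentGroupChat chat = new(writer, reviewer)
{
    ExecutionSettings = new()
    {
        // End when the reviewer approves, or after 10 turns at most.
        TerminationStrategy = new ApprovalTerminationStrategy { Agents = [reviewer], MaximumIterations = 10 }
    }
};

chat.AddChatMessage(new ChatMessageContent(AuthorRole.User,
    "Marketing copy for: a way for multiple AI agents to collaborate on a single task"));
await foreach (var message in chat.InvokeAsync())
    Console.WriteLine($"{message.AuthorName}: {message.Content}");

sealed class ApprovalTerminationStrategy : TerminationStrategy
{
    protected override Task<bool> ShouldAgentTerminateAsync(
        Agent agent, IReadOnlyList<ChatMessageContent> history, CancellationToken ct) =>
        Task.FromResult(history[^1].Content?.Contains("approve", StringComparison.OrdinalIgnoreCase) ?? false);
}
```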
But this isn't the end-all-be-all. You can imagine these things going back and forth, getting into loops; frankly, just talking is not always enough for these AIs to complete work. So the last thing I want to leave all of you with is what's coming even further with Semantic Kernel, and what else you should expect from the rest of the industry: taking what we've learned from multiple agents working together — these multi-agent patterns — and making them a little more predictable, more deterministic, so that you always get the result you expect from your multiple agents. You saw a hint of this in Steve's demo, when he had to do that classification logic to show the reply button: he basically wrote some code that says reply, and then check whether a reply button needs to be shown. It's very simple — do step one, then do step two — but you can imagine that as you build more complex business processes, these things become unwieldy. So different frameworks, whether it's AutoGen or CrewAI, have started building easier ways of abstracting agents, letting them create and follow business processes, and also including humans in the loop. So get excited for what's coming next for Semantic Kernel and .NET Aspire — it's an exciting time to be a developer, and a ton of new things are on the horizon. With that, I'll conclude my section on agents and pass it back to Steve so we can wrap things up.
Yeah, I think that's all from the two of us, so we hope this has been useful to you, and that you'll give some of it a try — and, quite importantly, let us know how you get on. We want to learn from you as well. So thank you very much, and enjoy the rest of this event.

Oh man, okay — Semantic Kernel. There's a lot we can do with that, isn't there? Thank you so much, Steve and Matthew; I'm going to need to spend some time with this one — that's some cool stuff. Listen, we've got a full day of events here for you as part of .NET Conf: Focus on AI. These are all recorded, they're on YouTube, and we're going to have a playlist available with all the recordings broken out for you at the end of the day. So if you miss something, or you aren't able to stick around through the entire day, have no fear: all the content will be available at the end of the day on a playlist. You can tune back in later, bring it up on the TV, bring it up on your phone, what have you — there's always great .NET Conf content available on YouTube. Let me make sure I get you linked to some of the other content we have for you. We have slides like this one talking about the event collection: all of the learn modules, all of the slides, links to the videos — they're all available in this collection at aka.ms/netfocus-ai-collection, or you can take a picture of that QR code, jump right in, and learn more about all the things we've built and are presenting as part of this event. Make sure you check that out. We also have an AI Challenge coming up, and a series of live streams for the next month — .NET Conf: Focus on AI doesn't stop today. Go over to aka.ms/netfocus-ai-credential-challenge-live, or once again use the QR code: for the next month, ending on September 20th, there's great content coming every few days, and in this AI challenge we've got some tasks for you — a little bit of homework you can explore and get feedback on from our stream hosts throughout the month. Make sure you check out the credential challenge. Now, I also want you to know we have a bunch of great sponsors who have been helping out. Check out some of these folks: they've done a great job getting the word out so other folks know about the .NET Conf: Focus on AI event, and they've put together a swag bag for you. Make sure you fill out that evaluation at aka.ms/netfocus-ai-evaluation: 15 winners will share about $5,000 of prizes in that swag bag, including software licenses and gift cards. You've got to fill out the event survey and then click the link inside the survey to enter the swag bag raffle — it's a second click in the survey that you must follow; you can't just complete the evaluation. All right, we've got two more sessions for you. First up is Daniel Roth, talking about building interactive AI-powered web applications using Blazor and .NET — that's right up my alley. And after that we've got our friend Bruno, who's going to join us to talk about navigating the world of AI with .NET, building not just with large language models in the cloud but also locally — and I like those smaller models working locally. But first I want to hear about Blazor and AI. Daniel, take it away!

Hi everyone, I'm Daniel Roth, the product manager for Blazor on the .NET team, and in this session we're going to build interactive AI-powered web apps with Blazor and .NET.
We're going to start by building a simple AI-powered Blazor web app together; we'll then look at how you can add additional AI capabilities and features using pre-built components, like the .NET Smart Components; and then we'll take a tour of some of the great AI features coming from the .NET ecosystem. Blazor is great for building AI-powered apps: Blazor's interactive server-side rendering makes it easy to connect rich, interactive user experiences with cloud-hosted AI models. You don't need to build separate AI endpoints to expose server AI capabilities to the client; you can just do it all from the server. You can quickly add AI capabilities to your Blazor web apps using existing AI libraries like Semantic Kernel, which handles the complexity of connecting to various AI models and allows easy integration of app-specific capabilities and data. Blazor's reusable component model also simplifies building AI-powered apps: instead of having to create each UI component from scratch, you can leverage a rich ecosystem of available components. Okay, let's get started and build our first AI-powered Blazor web app. I'm going to hop into Visual Studio and create a new project: a Blazor Web App, using all the default settings — just make sure you've got interactive server-side rendering enabled — and we'll go ahead and create that. All right, let's get our app running so we can see what we've got so far: a blank canvas that we want to add some UI to. There it is — it popped over on my other screen — so we've got our homepage here.
What I want to do is add a simple chat AI assistant user interface onto this page. So let's go back to Visual Studio and bring up that Home component. We're going to need a text area and a button, and then we'll display the response from the AI service. Let's get Copilot to help us build the UI: add a text area that is bound to a string message field, and a send button with an async Send event handler; also add a Bootstrap card that displays a p tag with a string response field. Okay, let's see if that works — let's get Copilot to do all the hard work for us and add some Razor content here. Let's see... yeah, we got a form: there's the text area with the button — that looks good — and there's our response. There's also a little bit of extra logic here for the Send event handler; we don't need all that code, so we'll get rid of it and have an empty Send handler. Let's accept that and hot-reload it into our Blazor web app. Yeah, that looks pretty good; we've got some nice headings in here. Let's add a little bit of margin — let's get this side by side with Visual Studio — a little margin on that button: we'll find the class for it and do a margin-top of two, I guess. There — okay, perfect. Great, and so now hopefully we can start connecting this to our AI service.
I'm going to use Azure OpenAI for this. I've already gone into the Azure portal, created an Azure OpenAI resource, and deployed an AI model into that resource — I'm using GPT-4o, the latest and greatest — so you'll need to do that yourself in order to follow these next few steps. To connect to that AI service, I'm going to use Semantic Kernel, a really great .NET library that helps me connect to really any AI model I want: it handles sending the prompts, getting the responses, and orchestrating any AI-related features. So let's add Semantic Kernel to this app — we want this Microsoft.SemanticKernel package — and we're also going to want to authenticate to Azure, since we're using Azure OpenAI, so I'm going to add the Azure.Identity library, which will handle all of my authentication concerns for me. Let's install Azure.Identity as well. Cool.
Okay, now let's go into Program.cs — give us a little more space — and here's where we wire up Semantic Kernel into our Blazor web app. We're going to do builder.Services.AddKernel() right here. The Kernel is the API you use in Semantic Kernel to send the prompts, get the responses, configure which AI service you want to connect to, and handle all the orchestration. We also need to say which AI model we actually want to connect to, so I'm going to add some code for Azure OpenAI — to save on some typing I'll just copy this snippet in; it's really just two lines of code. I'm pulling the parameters I need to connect to the Azure OpenAI service from config — I've got this "SmartComponents" config section set up; that's just what I happened to name it, you can name it whatever you'd like. Then we call AddAzureOpenAIChatCompletion and pass in the deployment name for your AI model and the endpoint, and then you need to give it a credential to connect to Azure OpenAI. This DefaultAzureCredential call comes from that Azure.Identity library; it's a really convenient API that can authenticate in a variety of ways, whether you're in development or in production. In development it will actually pick up my user account in Visual Studio — which I've already, in the portal, given permissions to be a contributor on my Azure OpenAI resource — so this will authenticate as me in Visual Studio and just work. And when you deploy to the cloud, you can set up an app identity and make sure that has permissions for your AI service too.
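Putting those pieces together, the Program.cs wiring looks roughly like this sketch — the "SmartComponents" section name just mirrors what the demo happened to call its config section:

```csharp
// Sketch: register the Kernel with DI and point the Azure OpenAI connector
// at your deployment, authenticating with DefaultAzureCredential.
using Azure.Identity;
using Microsoft.SemanticKernel;

var aiConfig = builder.Configuration.GetSection("SmartComponents");
builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        aiConfig["DeploymentName"]!,    // your model deployment name
        aiConfig["Endpoint"]!,          // your Azure OpenAI endpoint
        new DefaultAzureCredential());  // VS identity locally, app identity in the cloud
```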
So that's how you set up Semantic Kernel; now we just need to use it. Let's go to our Home component. First, let's make sure the component is interactive — we want to use interactive features so we can handle our button clicks — and then we'll inject the Kernel (that's the Kernel API we set up in DI; we'll just call the property Kernel). Then, down here in our Send method, all we've got to do is say response = await Kernel.InvokePromptAsync<string>(message) — we want the response back as just a string, so we use that generic parameter, and we pass in the message we got from the user. That should do it. Let's restart the app — we did a bunch of work in DI, so we need to rebuild the DI container — and then let's see if this works. Let's see... "hi"... wait, refresh — oh, I think that was an older version of the app I was running. All right, let's get the app up — there it is — and let's say "Hi, how are you?" and send that. Okay, I hit send; let's make sure the message is being sent — yep, I can see in the logging that it's invoking the model. That's good.
And there's the response: "Hello! I'm here, ready to assist you. How can I help you today?" Awesome — we just connected our Blazor web app to the AI service. Perfect. Now, you'll notice it took a little while, and in fact all the text came in as one big block — it didn't stream in. I want it to stream in as it's being generated; how can we do that? That's pretty easy: instead of calling InvokePromptAsync, we'll do something slightly different. We'll clear out the response initially, and then, instead of getting the response as one string, we'll get a bunch of chunks: we call Kernel.InvokePromptStreamingAsync this time, because we want streaming, and hand in our message. This call gives us an IAsyncEnumerable, and hopefully GitHub Copilot might help us with that — and it does, nice. So we need to await foreach over the chunks, concatenate each one onto the response, and then tell Blazor "hey, my state has changed, so please update the UI accordingly" — re-render the component. Let's hot-reload that into the app, and now if I type in here, let's say "tell me something a bit longer" — I want a longer response so we can see the streaming really happening — we'll send that to the model, and once the response starts being generated, we should see it start to chunk in. Yeah, there it goes — you can see it coming in a bit at a time.
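A sketch of that streaming Send handler — Kernel, message, and response are the component members from the demo:

```csharp
// Sketch: stream the prompt result chunk by chunk, re-rendering as it arrives.
async Task Send()
{
    response = "";
    await foreach (var chunk in Kernel.InvokePromptStreamingAsync<string>(message))
    {
        response += chunk;
        StateHasChanged(); // tell Blazor to re-render with the partial response
    }
}
```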
And it's come up with some interesting response about bioluminescence — great, so we've got streaming working. Awesome. Now let's try something else. Let's ask it: "what was the last question I asked you?" Does it remember things — my previous questions and its previous responses? And no, it doesn't look like it does; it only sees the current question. Right now it has no notion of chat history: this AI model is acting in a stateless way — you send it a request, it sends a response, and then it forgets everything else. So how can we give it some memory, a way to remember its previous responses and my previous questions? We do that by adding some chat history. Let's add a ChatHistory object to my component — we'll call it chat, and new it up. Now, before we send our message to the chat service, we want to add the user message to the chat history, so we're keeping track of all the user messages; and when we're done, we'll also add whatever the assistant sent back — the response gets added here. Then I want to send this whole chat history to the AI service to get the response. To do that, I can't actually use this Kernel API anymore; I need to drop down a level to a more powerful API: we're now going to inject the IChatCompletionService, which we'll call ChatService. It's a slightly more flexible API, but we still use it in pretty much the same way: we call ChatService.GetStreamingChatMessageContentsAsync — very similar — and pass in the whole chat, like that. That gets all the chunks and renders the response. Let's restart the app, since we injected a new service, and hopefully now it'll start remembering things like what we sent it previously.
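A sketch of that chat-history version — ChatService is the injected IChatCompletionService, and the constructor argument shown here is where a system prompt can go (which comes up again in a moment):

```csharp
// Sketch: keep a ChatHistory, append user and assistant messages, and send
// the whole conversation on each request so the model has memory.
ChatHistory chat = new("You are a helpful assistant.");

async Task Send()
{
    chat.AddUserMessage(message);
    response = "";
    await foreach (var chunk in ChatService.GetStreamingChatMessageContentsAsync(chat))
    {
        response += chunk.Content;
        StateHasChanged();
    }
    chat.AddAssistantMessage(response); // remember what the assistant said
}
```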
Okay, let's see — "how are you?" — let's ask a first question. There it goes: "I'm an AI, I don't have feelings, but thanks for asking." Now: "what was the last question I asked you?" Let's see if it remembers... "You asked how are you." Okay, we've got some chat history now, and it remembers things. Cool. Now, this AI assistant doesn't currently have any specific purpose; it's just a general interaction with the AI service. How do I give it a purpose — a persona? I want it to actually do something specific for my app. The way you do that is by sending an initial prompt to the AI model to let it know: this is what you're all about, this is your purpose in life. That's called the system prompt, and you can send it using this ChatHistory API: right here in the constructor, there's a system message you can pass in initially to the chat history. For example, if in here I said "I want you to talk like a pirate", and then I refresh and ask it "how are you?", now it's like: "Arr, I be doing fine, matey — how be the winds blowing for you today?" So now it has a persona — an initial context it can use to decide how it acts. Okay, for this app, I have a system prompt here that I've already set up that I'm going to use.
I want this chat assistant to be a helpful assistant that demonstrates the capabilities of Azure OpenAI in a Blazor app. And here we can also constrain what we don't want it to do: if you're asked to do something dangerous or hostile, you should suggest playing a nice game of chess instead. So you can constrain things as well. Let's hot-reload that in, refresh the page, and that should work. Let's see: "what kind of AI assistant are you?" — send that — and it says "I'm an AI assistant powered by Azure OpenAI, integrated into a Blazor app." Good, so now it's got a persona. And if I try something like — you know, you're stuck in traffic, you're feeling irritable — "fire the photon torpedoes!", let's see if it's going to do that... it says "it sounds like you're ready for an intergalactic adventure! Well, I can't actually fire photon torpedoes — how about we engage in a different kind of challenge? How would you like to play a nice game of chess?" So we've constrained it not to do things we don't want it to do. That's how you can use your system prompt to give your AI assistant some purpose in life.
Okay, now what I want this AI assistant to be able to do in my Blazor web app is help me with some styling. This Blazor web app is using the same design the Blazor templates have used, I think, since we first shipped Blazor back in 2019 — five years ago: it's got this sort of purple gradient on the left-hand side, and while it's beautiful, maybe it's not the be-all and end-all of web app design. Maybe we can get the AI assistant to help us out with that. So let's go back to Visual Studio, and I'm going to add another line to my system prompt to let it know this is what I want it to do; I'll add it right here: when requested, you can change the theme colors of the app using CSS colors. Now I need a way for the AI model to actually be able to change the theme colors of my app. How can I do that? I'm going to do it using a plugin — a Semantic Kernel plugin. Plugins are super cool: they're a way you can expose functions and methods from your application that get advertised to the AI model. It can then decide, if it wants to, to call them, sending a response saying "yeah, I'd like to call that function with this data".
Your app can then decide whether it wants to allow that, feed any data that comes back from the call back to the AI model, and so forth. So it's a way to expose local functionality and data from your app to the AI model. Okay, let's add a plugin to this Blazor web app: we'll add a class and call it ThemePlugin, and I'm going to copy in some pre-written code to save time. So we have a ThemePlugin class with a single method, SetThemeColors, which takes in two colors. I'm using the C# primary constructor feature to pull in the IJSRuntime service from Blazor so I can do JavaScript interop calls, and I'm making two calls down here to set two CSS variables in JavaScript to change the colors of the app. Now I need to actually set up those CSS variables: let me go into my app's CSS styling and add a little bit of CSS to create those two variables — a navy color and an indigo color; I'm going to keep them as the app currently is. And then, in the layout, we change the gradient to actually use those CSS variables: right here is the linear-gradient that makes that purple gradient on the side, and I'm changing it to use the two CSS variables.
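A sketch of what that plugin class might look like — it assumes a small JS helper, here called window.setThemeColors, that actually writes the two CSS variables; the demo's real interop calls may differ:

```csharp
// Sketch: a Semantic Kernel plugin that changes the app theme via JS interop.
using System.ComponentModel;
using Microsoft.JSInterop;
using Microsoft.SemanticKernel;

public class ThemePlugin(IJSRuntime js) // C# primary constructor; IJSRuntime is scoped
{
    [KernelFunction]
    [Description("Changes the app theme to the given two CSS colors")]
    public async Task SetThemeColors(
        [Description("CSS color for the top of the gradient")] string navyColor,
        [Description("CSS color for the bottom of the gradient")] string indigoColor)
    {
        // Sets the two CSS variables that the sidebar gradient reads.
        await js.InvokeVoidAsync("setThemeColors", navyColor, indigoColor);
    }
}
```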
So now we've got the theme set up, and we've got a plugin that can call into JavaScript to set those CSS variables and change the theme. How do we now register our ThemePlugin with our application so that the AI can call it? Let's go into Program.cs; we need to set it up with the kernel. One way you can add plugins is right here, off of AddKernel: there's a Plugins collection where you can AddFromType or AddFromObject, and that works really well if your plugin is a singleton — just a single instance. In my case, though, my plugin actually needs to be a scoped service, because it depends on that IJSRuntime service, which is itself also scoped. So I have to do something a little more complicated, but not too bad: I'm going to do AddScoped down here and call the KernelPluginFactory helper method to create my plugin from the ThemePlugin type. This returns a KernelPlugin instance based on the ThemePlugin class I'm feeding it, and that should then get picked up by Semantic Kernel so the AI model can call it.
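In code, that registration is roughly a one-liner — a sketch, assuming AddKernel picks up KernelPlugin services from the container as described:

```csharp
// Sketch: register the plugin as a scoped KernelPlugin so it can depend on
// the scoped IJSRuntime service.
builder.Services.AddScoped<KernelPlugin>(sp =>
    KernelPluginFactory.CreateFromType<ThemePlugin>(serviceProvider: sp));
```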
A little more elaborate DI going on there, but not too bad. There's one more thing: you actually have to tell Semantic Kernel that it's okay to automatically call these kernel functions. I'm going to go into my Home component and add one setting right here — this OpenAIPromptExecutionSettings object (we'll add a using statement for it) — which sets ToolCallBehavior to AutoInvokeKernelFunctions, so it'll be able to call the kernel function on our ThemePlugin. Then we just have to hand that in: we pass in the settings right here, and I think I also need to pass the kernel itself to my chat service call. That ought to do it. Let's restart the app — we messed around in DI, so we've got to rebuild the DI container — and once that's up and running, hopefully I should be able to ask it to restyle the app.
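That last change amounts to something like this sketch, flowing the settings and the Kernel into the same streaming call shown earlier:

```csharp
// Sketch: opt in to automatic function calling and pass the Kernel along so
// the model can reach the ThemePlugin.
var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

await foreach (var chunk in
    ChatService.GetStreamingChatMessageContentsAsync(chat, settings, Kernel))
{
    response += chunk.Content;
    StateHasChanged();
}
```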
Okay, let's see what it comes up with. We sent our prompt: "how about we try a combination of modern teal and stylish coral", and it changed it! It did it, it called my kernel function. Awesome. We can even ask it to tweak the colors a little, like "please improve the accessibility, given that the text is white", so we can have a little conversation to adjust the styling, and it should hopefully darken things a bit. "Shall we proceed with this new theme?" Oh, it's asking me to confirm. Yeah, go ahead. There it goes, it's updated the colors again. Awesome. I don't know, maybe that's what the new Blazor web app template should look like. All right, so we were able to connect our Blazor web app to an AI service, build a chat assistant that can stream in the responses and keep chat history, and then we were even able to connect it to some local app capabilities using Semantic Kernel and plugins. Pretty cool.
Now, this AI assistant UI is still pretty simple. If you want more elaborate features, like suggestions, access to more data, and even citing that data, you can do that too. For that I'd encourage you to look at a different sample we have, the eShopSupport sample; you can find it on GitHub, and I think Steve Sanderson showed it earlier in the conference today. I still have this app running, so let me pull it out over here and show you what it looks like. This is a pretty elaborate app that uses .NET Aspire to coordinate a whole bunch of AI-related processes and services, but here in the staff web UI application there's a UI a support agent can use to handle support tickets. Here's a grid where you can look at all the support tickets coming into the site, and if we pick one, you can see it shows the chat interface with the customer, with a bunch of nice AI features like this summary of the chat with the customer. Even better, it has a full Blazor chat UI over on the right-hand side that displays the full history; you can pick from suggestions when answering the customer's questions, and it can see the customer chat and use product manuals and documentation, like here, where it's even showing me how to respond to this customer based on content from the product manual.
sample be uh be sure to check that out so if you want to figure out how to build a much more uh fleshed out uh Blazer chat user interface cool okay so that's pretty neat that's how we can add AI features to our Blazer web apps ourselves like building them ourselves and there are plenty of ways to add AI capabilities to a Blazer web app that uh aren't just chat you can have features like text summarization or smart type suggestions or semantic search um Blazer's component model makes it really easy to package up chunks of
AI related uh web UI so that they can be reused the net smart components are a set of simple drop in components for Blazer as well as MVC and razor pages that make it easy to add AI features to uh to your apps for useful scenarios um now we've made the code for the net smart components uh freely available as reference implementations they're basically a bunch of samples uh so that they can uh help bootstrap an ecosystem of AI enabled net components um let me show you what the net smart components can do you can
find them at aka.ms/smartcomponents. All right, let's hop back into Visual Studio and go to the .NET Smart Components; this is the sample app from that Smart Components repo, and I'll go ahead and get it running. First, let's look at Smart Paste. Smart Paste is just this SmartPasteButton, and what it does is this: sometimes you have a form with form fields, but the data you're trying to enter into those fields is coming from free-form text, maybe an email or a text message, and you have to copy and paste the name and the age out into each of the separate fields. Wouldn't it be nice if you could just copy all the text, hit Smart Paste, and have the AI model figure out where the fields should go for you? That's what Smart Paste does: you can see it put the right name in the name field and the age in the age field. Here's another example, a bug-submission form. If I grab one of these email texts with a bug report and... oh, I don't have Smart Paste on this form. Can I add it really quick? Let's find the bug report form and just add a SmartPasteButton right there. Now my Smart Paste button has hot-reloaded in, so I quickly dropped in a Blazor component, and boom, now people can add bug reports using this form just by clicking Smart Paste: it figured out what the repro steps are, which project it should go in, and so forth. And it works with any form. That first form we were looking at was actually built not with generic Blazor components but with the Radzen component library; Radzen is a great open-source and free component library for Blazor, and Smart Paste works with their components as well, like the Radzen template form.
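To give a feel for that one-line addition, here's a sketch of a hypothetical bug-report form with the button dropped in; the form model and field names are invented, and the repo at aka.ms/smartcomponents covers the service registration the button also needs in Program.cs:

```razor
@* Hypothetical bug-report form; the one-line addition is the SmartPasteButton. *@
<EditForm Model="bug" OnValidSubmit="Save">
    <InputText @bind-Value="bug.Title" />
    <InputTextArea @bind-Value="bug.ReproSteps" />
    <InputText @bind-Value="bug.Project" />

    <SmartPasteButton DefaultIcon />   @* fills the fields above from clipboard text *@
    <button type="submit">Submit</button>
</EditForm>
```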
Okay, so that's Smart Paste. Smart TextArea is a text area that suggests completions. Imagine this is an employee query response form where you answer questions about HR policies, like "what's my employee vacation balance?" The person responding might think "oh, I don't remember how much it is", and it suggests "oh yeah, it's currently 10 days, you can find the policy at..." along with the URL: nice semantic smart completions that you can configure. If we look at the Smart TextArea sample, you can feed it a bunch of phrases that get passed to the AI model to help it provide those smart suggestions, including all the URLs you might need to include or any contextual data that might be useful.
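Configuring it looks something like this sketch; the UserRole/UserPhrases parameter names and the NEED_INFO placeholder convention come from the Smart Components samples, so double-check them against the repo, and the URL here is hypothetical:

```razor
<SmartTextArea @bind-Value="reply"
               UserRole="HR representative replying to employee questions"
               UserPhrases="@phrases" />

@code {
    string reply = "";

    // Phrases the AI can adapt when suggesting completions; NEED_INFO marks
    // a slot the human must fill in, and URLs can be suggested verbatim.
    string[] phrases =
    [
        "Your current vacation balance is NEED_INFO days.",
        "You can find the full policy at https://example.com/hr/vacation-policy.",
    ];
}
```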
So that's Smart TextArea. Lastly, Smart ComboBox is a semantic-search-enabled combo box. Maybe you have a big long list of items and you don't remember exactly the right item to pick for an expense category: you've got a plane ticket but you don't remember what the category is called. Semantic search matches what you typed semantically against the list of items, so you don't have to remember them word for word. Another example: you have an issue and you're trying to remember the right issue label; you know it's a mobile issue but you don't know which label to use, and it identifies the mobile-related issue labels for Android and iOS. So that's semantic search, and this one is actually using a local model, calculating local embeddings, and you can just drop it into your existing apps. So those are the .NET Smart Components: a way to have Blazor components that add AI features directly to your web apps without you having to do a whole lot of work. Now, the Smart Components are really just samples, ways to get you started building AI features, but there are also pre-built AI components available from the .NET ecosystem, so let's take
a look at some of the great .NET AI components coming from the big component vendors of the world: Telerik, DevExpress, and Syncfusion. Telerik already provides a pre-built AIPrompt component to streamline integrating AI services into your Blazor and MVC apps. It's fully customizable through templates and events, and it supports globalization, localization, and right-to-left rendering. Telerik also has several experimental smart components they're working on that you can try out, including smart search integration in their data grid and in their combo box controls, and they were kind enough to give me some demos to show you. Let's look at some of the Telerik components: we go back into Visual Studio and bring up the Telerik sample apps. First, let's take a look at their built-in AIPrompt component for Blazor; this has already shipped and you can use it today. This demo shows the Telerik PDF Viewer displaying a PDF document with a whole bunch of text in it, but what's neat is it has this AI assistant button we can just click: it's got a place to put in the prompt, it displays suggestions, and it displays the output. So we can ask it something like "how many components does Telerik have?", and it should be able to look in the PDF document and find an answer. "Telerik has over 1,250 modern, feature-rich components." Is that really true? Did it hallucinate that? No; down here the document does say it features 1,250 modern, rich components, so it's actually pulling from the doc. That's a pre-built AI assistant UI you can just use. Then, from their Blazor experiments, they have the Telerik data grid with semantic search integrated. Here it's displaying a bunch of food categories, and we can search with specific foods and it finds the appropriate food category: if I search for milk, we find dairy products; if I search for cake, we find things like desserts and candies and confections. Semantic search in the data grid, that's pretty neat. And they have their own version of a smart combo box, which is pretty cool: it has a dropdown showing all the options you can look through, and if you start searching, say for pasta, it shows all the pasta-related food items in that combo box. So smart search and a pre-built AIPrompt component from Telerik. Pretty nice.
All right, now DevExpress. They're working on a bunch of AI-powered enhancements for their upcoming release in December, so this is stuff that's in the works. They're adding integrated APIs to make it really easy to connect Azure AI services, and really any model, with DevExpress UI components. They're building AI-assisted text processing into their text editing components, and they're also working on a pre-built Blazor chat UI for creating your own copilots and chat assistants. What's really neat is that they're working on integration with the Azure AI Assistants API, a new stateful chat API with support for persistent chat threads, access to multiple file formats you can feed into it, and also powerful built-in tools, or plugins, that you can just use without having to build them yourself. They gave me an early preview of some of these new features, so let's take a look at DevExpress. If we run this... all right, cool. Here we have the DevExpress data grid showing a whole bunch of project information, a bunch of work items with their priorities and their state, but what's neat is that over here on the right-hand side we have the DevExpress chat UI component, just there ready for us to use. Let's ask it a question: "which specific tasks should the team focus on, and why?" We feed that into the model, and it has access to all the data in the data grid, so it should be looking at all this data and trying to figure out, based on priorities and status, which items are best for the team to work on. There we go: we got a high-priority task, "check the register", assigned to Mike Roller with status New, so that looks good; then a bunch of medium tasks we should probably do next, and then the low-priority tasks. That's awesome; that looks exactly right.
So that's the chat UI integrated with their data grid. Here's another example where the chat UI is integrated into their report viewer. This report viewer displays reports in a PDF-like format, but again we can pop out their AI chat assistant UI, and here we can ask questions about this market share data. Let's ask it how the market share changed in India: it says the market in India went up by 9.52% in March and then down by 21.62% in September. Is that actually correct? Yes, that looks right; those values are in fact coming from that report, so that's cool. That's the chat UI from DevExpress. They also have really great rich text editing components, like this word-processor component they ship, and now they're working on AI-integrated features there too: you can ask it to translate the document into German for you using AI, or ask it to summarize some text. Let's summarize this big long paragraph using AI, and there's a much, much shorter version so you can get the pithy read. Okay, so those are some new features coming from DevExpress: a pre-built chat UI component integrated with AI services, and AI integration in their rich text editing experience as well. Lastly, I want to show you some stuff from Syncfusion. Syncfusion is also building out AI features for .NET, with several new features coming in their upcoming Essential Studio 2024 Volume 3 release in September. This release will include
a Blazor AI AssistView component for integrating AI services. Their AI AssistView component can send and suggest prompts, execute commands using toolbar options, and display responses in an easy-to-use format. It provides toolbar options for copy, edit, like, and dislike, and of course you can add your own custom toolbar options too. So let's take a look at what they've got with their AI assistant component; let's hop back into Visual Studio and look at Syncfusion. These are again early bits from them, so this is all still in preview; let's go ahead and run it. Okay, so here is the Syncfusion AI AssistView component: a really nice chat UI with suggestions and those toolbar options for copying, like, and dislike. We can send a prompt into the model; it shows the whole chat history and renders the response, a nice pre-built UI so you don't have to do it all yourself. They're also experimenting with the .NET Smart Components: here's a Syncfusion version of Smart Paste that they built, integrated with the Syncfusion Blazor components, so we can copy this text and again fill out this bug report form, all built with Syncfusion components. And they're also working on their own Smart TextArea, which again does smart suggestions for us; you can see it even typing for me, guessing what the complete next sentence will be, using the AI model. Those will work with the Syncfusion theming system and so forth; they're experimental, and they're working on them as part of their upcoming release.
All right, awesome, so that's what I wanted to show you today. In summary, Blazor and .NET can make building your next AI-powered web app really easy: you can build AI features yourself using powerful .NET libraries like Semantic Kernel, or you can leverage a whole ecosystem of existing .NET AI components. Here is a summary of the samples and resources I mentioned in this talk, so you can go look at them and try them out yourself. Thank you very much for listening, and be sure to check out other ways .NET can help you build your next great AI-powered app at aka.ms/netfocusai/collection. Hello everyone! I'm super happy to be here talking about AI; this is a full day about AI. My name is Bruno, Bruno Capuano; I always forget to say that. I'm part of the Cloud Advocate team, I focus on .NET and AI, and today I want to talk about local models: how we can start using local models, how we can move to the cloud, and how we can evolve the things we do locally into the
cloud. This is going to be a fun one. So, very quickly, here's the overview of what we want to do. I'm going to spend a lot of time coding, you know me, but I want to set up the context first and talk a little about small language models, Phi-3, and more. Small language models are probably new to you, because we always talk about large language models, but the best way to introduce small language models is to think about these amazing big models like GPT-4, only at a much smaller scale: a scale small enough that you can run one on your own computer, your PC, your Mac, your laptop, even a Raspberry Pi or another very small device. There are so many advantages to using these models. Of course they are not as powerful as a GPT-4o model, but they are very, very good; think about it, the best Phi-3 model is probably about as good as GPT-3.5, so there are a couple of things we want to see here. The one I'm going to use for most of the demos today is Phi-3, a model we released at Microsoft; the Microsoft Research team built it, and there are different versions, different flavors. We have the mini with 3.8 billion parameters, we have one that also does vision, which is amazing, and then there are the mini, medium, and the
big one that we can use. There are also versions available in Azure AI Studio, on Hugging Face, locally with Ollama, or via an ONNX runtime, and we're going to see this. The whole idea today is to show how we can start here, locally, and later maybe move to the cloud; that's what we want to do. So let's switch to the code very quickly. First of all, I talked about Phi-3, and we have an amazing repo: if you search for it, this is the Phi-3 Cookbook. It's a great place with tons and tons of samples showing how you can use Phi-3, how you can fine-tune Phi-3, how you can use it locally and in the cloud, and more. If you go in here, there's a C#/.NET labs section that guides you step by step through downloading the model, because remember, these are local models, so you can do something as simple as a git clone of a model from Hugging Face and it will download it; by the way, that's the one I'm using today. Then there are step-by-step scenarios on how to use this with Semantic Kernel, locally, in Codespaces, and more. So this is one of the most important parts: go to the Hugging Face example and download the model. You can use the Phi-3 Cookbook as the entry point for everything we're going to see today and how you can start to use
this. Okay, one of the versions of Phi-3 that we have is an export to ONNX. If we go back and look at the available versions, we can see that for the mini 4k there are several variants, but ONNX is the one we want. And why ONNX? ONNX is a nice open standard, and we have libraries on Windows, in .NET, and on other platforms to access basically any model that supports it: if a model is exported to ONNX, we know how to interact with it. So here we have a C# project that uses the path to a model on disk; using the Microsoft.ML.OnnxRuntimeGenAI library we load the model, create a tokenizer to manage the text we exchange with the model, and then, in an infinite loop, we build the specific prompt string the model requires (this is important) so it understands the system prompt and the question and gives us an answer, generate the parameters, and send all of this to the model. This is probably 70 lines of code in the demo, and it works, and it works fine.
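A condensed sketch of that console loop, assuming a Phi-3 mini ONNX model on disk; the API names follow Microsoft.ML.OnnxRuntimeGenAI as it looked around the time of this talk and have shifted in later versions, so treat this as illustrative:

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Hypothetical local path to the ONNX export of Phi-3 mini.
var modelPath = @"C:\models\Phi-3-mini-4k-instruct-onnx";

using var model = new Model(modelPath);
using var tokenizer = new Tokenizer(model);
using var tokenizerStream = tokenizer.CreateStream();   // incremental token-to-text decoding

var systemPrompt = "You are a helpful assistant.";
while (true)
{
    Console.Write("Q: ");
    var userQ = Console.ReadLine();

    // Phi-3 expects its own chat template wrapped around the turns.
    var prompt = $"<|system|>{systemPrompt}<|end|><|user|>{userQ}<|end|><|assistant|>";
    var sequences = tokenizer.Encode(prompt);

    using var generatorParams = new GeneratorParams(model);
    generatorParams.SetSearchOption("max_length", 1024);
    generatorParams.SetInputSequences(sequences);

    // Generate and print each new token as it is produced.
    using var generator = new Generator(model, generatorParams);
    while (!generator.IsDone())
    {
        generator.ComputeLogits();
        generator.GenerateNextToken();
        Console.Write(tokenizerStream.Decode(generator.GetSequence(0)[^1]));
    }
    Console.WriteLine();
}
```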
If we run this with a dotnet run, the program loads the model into memory, and we have a loop where we can say "hey, my name is Bruno", and Phi-3 starts to answer: "hey, hello Bruno, how can I help you today?" "What is my name?" I can ask, but because we didn't implement any kind of chat history or memory here, the model doesn't know what I can do; the model doesn't know who I am. But what I really like is that if you're a fan of Semantic Kernel, and if you know me, you know I am, we have the chance to
do this with Semantic Kernel. So I created an example here using an extension of Semantic Kernel that allows us to use Semantic Kernel as the main connector but interact with the ONNX model; again, this is all part of the Phi-3 Cookbook. What we do here: we start the same way, with the location of the Phi-3 model on disk, then we create a kernel builder, add the ONNX runtime chat completion service, and build the kernel. After that we just create a chat, create a history, and in a loop we start asking questions to the model. It's literally the same as the previous example, but using Semantic Kernel. So we go back to the terminal; let me remember the name of this one, it's lab three, Semantic Kernel 2, and we do a dotnet run. We'll see a lot of logging while Semantic Kernel loads the model in the background, which takes some time depending on how big your machine is, and then at the bottom: "hi Phi-3", "hello, how can I help you today?", "my name is Bruno, can you help me?", and here we can see "of course, Bruno, I'm here to assist you, what do you need help with?" And I can say "what is my name?", and because by default the chat completion service can handle history: "your name is Bruno." So it's kind of nice how, with just a few extra lines using the Semantic Kernel syntax, we're at about 60 lines of code, and this is an amazing sample; and hey, this is all running locally, remember, that's important.
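A minimal sketch of that Semantic Kernel version, assuming the Microsoft.SemanticKernel.Connectors.Onnx package; the extension method name is the one that connector exposes, but verify it against your package version:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var modelPath = @"C:\models\Phi-3-mini-4k-instruct-onnx";  // hypothetical local path

var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion(modelId: "phi-3", modelPath: modelPath);
var kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory("You are a helpful assistant.");

while (true)
{
    Console.Write("Q: ");
    history.AddUserMessage(Console.ReadLine()!);

    var response = await chat.GetChatMessageContentAsync(history);
    Console.WriteLine(response.Content);

    // Keep the assistant's answer so the model remembers the conversation.
    history.AddAssistantMessage(response.Content!);
}
```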
And what can we do with this if we run locally? We can start building more complex scenarios. Think about keeping information in memory: I want to store information in a vector database in memory and use local models to query that memory and do those kinds of things. There are several pieces here, but we can run everything locally. The whole idea of this RAG scenario is that I have a question, "what is Bruno's favorite superhero?" (sorry about that, the best, most amazing superhero for me), and the model won't know this. I'm using Spectre.Console for the output to make it a little prettier, but once I have the response from the model, which is usually "I don't know Bruno, I can't answer this question, give me more context", what we're going to do is create a memory. This is going to be a volatile, in-memory store; we create a local embedding service to generate the embeddings, and then we ask the same question using a prompt like "answer the question using the memory context." With that we get the right response, and remember again, this is all going to run using the local model we have here.
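As a sketch of that local RAG flow, using Semantic Kernel's (experimental) memory abstractions; the localEmbeddings and chat services here are assumptions standing in for whatever local model services the sample wires up:

```csharp
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Memory;

// 'localEmbeddings' is assumed to be an ITextEmbeddingGenerationService
// backed by a local model; VolatileMemoryStore keeps the vectors in RAM.
var memory = new SemanticTextMemory(new VolatileMemoryStore(), localEmbeddings);

// Store the facts.
await memory.SaveInformationAsync("facts", id: "1", text: "Bruno's favorite superhero is Invincible.");
await memory.SaveInformationAsync("facts", id: "2", text: "Jella's favorite superhero is Batman.");

// Retrieve the most relevant facts for the question...
var question = "What is Bruno's favorite superhero?";
var context = "";
await foreach (var hit in memory.SearchAsync("facts", question, limit: 3))
    context += hit.Metadata.Text + "\n";

// ...then ask the local model to answer using only that context.
var history = new ChatHistory();
history.AddUserMessage($"Using only this memory context:\n{context}\nAnswer: {question}");
var answer = await chat.GetChatMessageContentAsync(history);
Console.WriteLine(answer.Content);
```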
So let's go to the terminal again, make it a little bigger, load the RAG one, clear, and dotnet run. I added some formatting, as you can see, so it's nice to watch. First question: "who likes Batman?" (I didn't change the question from the demo.) The plain Phi-3 response says "Batman is a super popular character, appears in a lot of places", yada yada, so we don't get a real answer. But if we take a look at the code, when we load the memory database we insert a set of facts: Jella's favorite superhero is Batman; the last superhero movie Jella watched was Guardians of the Galaxy (by the way, Jella is the friend who helped me build this a long time ago); Bruno's favorite superhero is Invincible; the last superhero movie I watched was Deadpool; and I didn't like Eternals. So when I asked "who likes Batman?" without memory, I got a very long generic response about Batman being a character, but when I ask the question again using the memory, it says "based on the memory content provided, Jella's favorite superhero is Batman, therefore Jella likes Batman." And if we go to the facts, yes,
we do have the fact that Jella's favorite superhero is Batman. We can change the questions and do more. Because this RAG is running locally, if I ask the question in Spanish instead of English, "¿cuál es el superhéroe favorito de Bruno?" ("what is Bruno's favorite superhero?"), the first response comes back in Spanish from the virtual assistant: "I don't have information about this individual." And when I ask the question again with the memory, it says Bruno's favorite superhero is Invincible, which is kind of nice. By the way, you can test this in other languages, French, English, Spanish; there are so many things you can do, and remember, this is all working locally: we have a local model giving us the chance to create these embeddings and do more. But when we started aiming for this scenario, one of the ideas was: okay,
how can I use this from, for example, a WinForms application that uses a local model and does more? So, in order to show this, I searched for a chat WinForms application. I went to GitHub and looked for samples of chats in WinForms, and I liked the third one here, the green one, ta-da. I forked the repo, cloned it, and started to work; I made a lot of modifications, and I got to the point where I built this. So let's take a look at how we can do this. What I built to show this is kind of a nice one: what we have here is a chat WinForms application, the desktop application you see here, then we have an API that manages the chat, the calls to the model, and more, and it's all wrapped with Aspire. Why Aspire? Because Aspire is cool, and it also gives us a lot of information so we can trace the calls and see how everything is working.
So initially this is the server, and let me add some code here; let's maximize this. I have some snippets, and I usually work in a mix of Visual Studio and Visual Studio Code. In the API's Program, where I have the controllers, the endpoints, and the Swagger stuff, I add configuration, create a chat history, and then set up a local model: it's going to load the model as we've seen before, answer the questions, and get added to the builder's services so I can use it. The way I use it is in the chat controller, which receives the question (a class with a question, a name, and a couple of other pieces of information), asks the question to the chat completion service, and returns the response. It's a very simple API. So let's run this using Aspire and take a look at how it works. Here is the window, the WinForms application, a very simple one, and I can say "hey, my name is Bruno." If we look at the server, we hit the breakpoint: it's going to ask the question to the chat completion service, which basically answers the question here. We say continue, and we get the answer. If we go to the traces, we can see how it's calling the back end, and the final call has the response; Polly is doing the retries for this. And remember the model's answer: "hey Bruno, I'm your friendly AI,
what's on your mind today?" Remember that we talked about how there are many different models: I have the chance to use Phi-3 mini, small, medium, the vision one, and you can download or clone these locally. But there are other options that are very good for using and testing these models. ONNX is amazing; it's basically direct access when you have the model files on disk. But then there are third-party applications like Ollama, which is great: you can download Ollama for Windows, Mac, or Linux, and then say "ollama pull" to download a model. There are Llama models, there are Phi-3 models, and more. If we look at the Phi-3 models, there are versions from the big one with 14 billion parameters down to the 3.8-billion-parameter mini, and you can download them and have them locally. There's also an option to run Ollama in Docker, in a container; where is my Docker... somewhere here... there it is. And when you have Ollama working in the Docker container, you can do amazing things. So let's start here: because this is already up and running, let me open a terminal and show you the models we have. If I run "ollama list", it shows that I have the big Phi-3 with 14 billion parameters, the default Phi-3, which is the small one, the mini, and then I
have Llama. So how do I run this, how do I use this in my code? Because we're using Semantic Kernel, instead of loading the model from disk the way we did before, what we're going to do is create a new chat completion service: an OpenAIChatCompletionService where we specify the name of the model, the URL, and then the API key, which we need to pass because it's not an optional parameter, even though Ollama doesn't really use it. With that we have a chat completion service that's going to help us: no changes to the rest of the code, just a new chat completion service. Now we can go back to our project, close all the windows, and run it, and when we run it we're going
So let's run this. Okay, we have the form, and if we go to the Ollama logs we can see them here. Back in the form, I ask a question: "hey, my name is Bruno, can you solve math problems?" In the background we start to see it loading the model; it's going to take a couple of seconds the first time because it's warming up the model. You can see, very small here, that the model took two seconds to load, and the model name is somewhere at the top, where it says phi3; that's the one we're loading. And if we go back to the form: "hey Bruno, absolutely, I thrive on numbers like a starfish does in the sea." Yeah, sure. So if I ask, I don't know, "two plus two", we see a new call to chat completions, the third one here, and it says "yes, 2 plus 2 is 4." Right now we're using a model hosted in Docker, in a container we got from Ollama. If we go to the traces and look at the calls, we can see the WinForms app calling the chat API, which internally calls the Docker container running on localhost:11434. And if I want to change my model from Phi-3 to, I can't
remember the name of the other one, let's go back to the console: I run "ollama list" and copy the Llama model name. Now, instead of running phi3, let's run llama3.1:8b, setting that name in the code as well; this is Llama 3.1. We run it, and again the first time takes a while because it needs to warm up and connect to the model, but if I go to the logs, yes, it's loading the model, and there's the name: this time it's Meta-Llama-3.1-8B-Instruct, and we have the answer. So this is a super easy way to change models, and Ollama is also a great way to test your models here and there. If we go to the traces, we again see the calls from the front end to the back end to the Docker container. There are so many things we can do here; remember, this is just one model we used for the demo, and if we want to do RAG, if we want to do more, we can literally switch and start doing amazing things. But how about going an extra level? The extra level is this: I already changed my model from loading a local file to calling a URL, in this case a local port on a Docker
container. But how about going to the cloud? What happens if I want to run a model, a Phi-3 model for example, that I host in Azure AI Studio? Literally with a simple change of a line we can do this, so let me show you. This is Azure AI Studio; there are a lot of models here: Mistral models, Llama models, Phi-3 models, GPT models, and more. In my deployments I have a couple of models: let me open this up a little, I have a Phi-3 mini model, and then I have a vision model here. So what I'm going to do is change the code to use this model: instead of localhost, let's put in that URL, copy the key as well (now we actually need the API key), and change the name of the model to this one. So now we're calling a Phi-3 model from Azure; I'm just changing the configuration. Just to recap: what we're doing now is calling a model hosted in the cloud, and if you want to see how to create a model here, just go to the deployments, click new model, select the model, choose the way you want to use it, and that's it. So let's run this and see if it works. We have the model, we have the chat, and I say "hey, my name is Bruno", and we get an answer back; I don't know, it always answers a couple of silly things.
But if you go to the traces and take a look, this is kind of nice: it starts on a WinForms app, calls the back end, calls the cloud, and we have the same model up and running. And what's really, really nice is that we also have a vision model. Remember at the beginning, when we talked about Phi-3, I showed you that there's a vision model. I can deploy a new model: search for "phi" and we find Phi-3 vision, a specific one that can analyze images and more. There's a lot of information here, and by the way, you can fine-tune the model, which is amazing; we don't have time to look at that today, but fine-tuning models is great. I already have the vision deployment here in the cloud, and I've already changed my application to support this, so let's make the app use the vision model and analyze a couple of images. I'm going to change the endpoint, get the key, and change the model name to Phi-3-vision-128k-instruct. The main difference is that in my chat controller, when I send my message, I need to check whether the question has an image: if it does, I create a collection with the text and the image and send that to the model. That's the only difference we have right now.
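A sketch of that controller branch, using Semantic Kernel's multimodal content types; the request shape (Question/Image properties) is a hypothetical stand-in for whatever the sample's DTO actually looks like:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// 'request' is a hypothetical DTO with a text Question and an optional Image.
if (request.Image is byte[] imageBytes)
{
    // Send text + image together as one multimodal user message.
    var items = new ChatMessageContentItemCollection
    {
        new TextContent(request.Question),
        new ImageContent(imageBytes, "image/jpeg")
    };
    history.AddUserMessage(items);
}
else
{
    history.AddUserMessage(request.Question);
}

var response = await chat.GetChatMessageContentAsync(history);
```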
So with this up and running, let's give it a try, and let's change the model name to Phi-3 vision; I like Phi-3 vision. Okay, let me select one image; I have a couple of images, including this one of my cat. "Please describe this image." This is the image we have here, my cat and a dog in a bag, and... hey, it doesn't understand the image. Oh my, that's not what we wanted. Let me recreate the model configuration just in case and try "can you read the text?" instead; I have a couple of images of licenses with text on them, so: "read the text in the image." Let's give it a try. You can see it returns something, but it isn't really reading the text; something happened with the model, it's probably being used by another demo somewhere else. But if we go to the traces, and this is what I wanted to show, we again have a call from the WinForms app to the back end to the vision service, and we can see that this is the vision service we're using and how it's working right now. By the way, we can even do this locally: here in the Semantic Kernel samples I have one that uses the vision model to analyze an image. It takes some time, but it also works: I start it, this is the image we want to analyze, my cat and my dog, and after literally one or two minutes (I'm not running on a GPU, we're working on CPU) we get a description of the image.
So, a quick recap of what we saw while that's working. We saw how to start working with local models, how to start working with a local model you have on your own disk; Phi-3 is one of the most popular examples, and Llama 3.1 is super popular too. Then we moved to using the model from a Docker container, literally by changing the chat completion definition in Semantic Kernel, two or three lines of code, and that's it. Then we took the same lines to the cloud: we deployed a model to the cloud, and once the model is in the cloud, we have all of those capabilities literally without changing the code. We can do RAG, we can do search, we can do a lot of things, and if we're using the Phi-3 models we can even fine-tune them for our scenario. So this is an amazing chance to use these kinds of things. The local demo is still working; as I said, it takes a couple of minutes, so I'm going to close it here. I invite everyone to please enjoy the rest of the conference; it's been a pleasure, and I promise a better video where it recognizes the cats. Thank
you very much, it's been a pleasure, see you next time, goodbye! Oh man, local models, that's what I'm talking about. That's going to help me with my development story and let me build things a little bit faster locally. I like that, I like that a lot. So let's talk about our next sessions coming up: we've got a session from Matt Soucoup and Roger Pincombe about OpenAI and Azure OpenAI, the .NET SDK, and how we can use it for both services, and then we've got Costa Pan and Xiaoyun Zhang talking about building agents: patterns and practices for automating business workflows. So two more great sessions coming up, but I want to make sure you know some of the other things going on as part of .NET Conf: Focus on AI. I have some slides for you; there we go. Check out all of the learn content that's available: you can take a picture of that QR code or head over to the AKA link, aka.ms/netfocusai/learn. We've also got some great information for you about an AI challenge over the next month: we've got live streams coming up, and we've got a little bit of homework for you if you'd like to try your hand at building with AI and .NET. You can follow along at aka.ms/netfocusai/credentialchallenge, live all month long and ending on September 20th; take a picture of that QR code to learn more about our upcoming AI challenge. And of course, we've got to call out .NET Conf 2024: if you've enjoyed this event so far, make sure you join us for our three-day event coming up. It's all
they're doing with net as part of netc 2024 all right it's time to get ready for our next session about AI open Ai and Azure open AI with our friends Matt and Roger take it away hey everybody Welcome to the open AI plus azure open AI session and this is a super cool session because we're not just going to be talking about open Ai and Azure open AI rather we have AET convergence story that's right there is AET SDK for open AI my name is Matt so up and I am joined by my good friend
Roger pincomb and Roger why don't you to introduce yourself real quick hey there I'm Roger Pome um I'm pretty invested in this because I actually built the first open Aon uh SDK for.net way back in 2020 um and I've been really excited to work with the team to help them launch this new official version yeah so let's jump right into this and I really want to hammer home that this is the official openai library for.net and so that comes with a ton a ton of benefits like it is official the official library and so what
we're doing here now is that the has feature parody with python and JavaScript and so on the library so day one we have support for everything so we have the latest and greatest open AI features and models including like GPT 40 and assistance support from day one and we have a nice unified experience across open Ai and Azure open AI which we'll be demoing and showing so we're not big fans of slides here Roger and I so we're going to jump right in to some demos and so Roger you're going to take it away from
Awesome. Yeah, so I want to get started by showing the official OpenAI documentation, just to show how nicely everything's integrated in there. We're here on the OpenAI docs, and on the left side you'll notice there's a Libraries menu item, and there's now a new .NET library section, so you can just use this example code to get started. How about we go into Visual Studio and paste it right in? I'm going to actually create a new project and show you the complete end to end of how to build this: just a new console app, and it can be any modern version of .NET, it's got pretty wide platform support. Then we'll manage NuGet packages and add the OpenAI package. Now, what's interesting is that if you just search for "OpenAI", you'll see the original package; this is the package I built over the past few years. The new Microsoft-supported official package is still in beta, a pre-release, so you have to click the "include prerelease" checkbox, and then you'll see it right here with the official OpenAI logo, so you know it's the right one. Just make sure you grab the latest pre-release version, go ahead and install it, and agree to the license terms. Now we've got it in our project, so let's add "using OpenAI" and paste in the code we copied from the documentation. I'll walk through exactly what it's doing, but it's basically creating a new client and calling a model of our choice. Now, there had previously been some questions about model support: the models are just strings, so you can put in any model that's supported
by OpenAI. If OpenAI launches a new model tomorrow, you can just put that new string right in; no need to wait for any updates to the SDK. The model name comes from the docs: on the left of the documentation there's a models section, and you can find the model names right there and select any of them just by passing it in as a string. Then there's this environment variable. As you know, OpenAI is a hosted API and it requires an API key, and for security purposes you don't want to put that right in your source code; it can leak if you post to GitHub, and it's generally not good security practice. So we're going to use an environment variable instead. Now, this might be fairly simple, but for those of you who haven't done it, especially on Windows, let me walk you through how it works. We're going to go get our API key: on the OpenAI platform, we go to the dashboard, go to API keys, and create a new secret key. We give it a name so we know what's what; you can assign it to projects and permissions, but by default we'll make it with full permissions. Now we've got this key, and we need to add it to our Windows environment: the easiest way is to open your Start menu, type "environment", and you'll see a link to edit environment variables for your account. We'll make a new one called OPENAI_API_KEY and paste the API key right in there. So now, if we go back to Visual Studio, that value gets substituted in from the environment variable, and we have a client attached to our account. Next, we're going to get a chat completion, very simple stuff: we're just going to give it a user chat message that asks it to echo back "this is a test", and we want the response. You'll notice we get back a ChatCompletion object.
That object includes various metadata, like how many tokens you used and a few other things. To get the result out, we'll just Console.WriteLine the chat completion: its ToString is a helper that basically takes the first text completion inside the ChatCompletion and gives it to you as a string. Let me zoom in to make it a little easier to read, and run it; we should see it say "this is a test." Building... awesome, and you see it echoed back "this is a test." Now, just to make sure we're actually talking to an AI here and not just echoing back text, let's have it say it in Spanish. Okay, try this one more time... awesome, it translated it to Spanish, so now we know it's actually giving us back an intelligent response. That's a pretty cool example of just how easy it is to get started.
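End to end, that first example looks roughly like this, modeled on the official library's README; the model name is just an example string:

```csharp
using OpenAI.Chat;

// The model is just a string: swap in any chat model OpenAI supports.
ChatClient client = new(
    model: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

ChatCompletion completion = client.CompleteChat("Say 'this is a test.'");

// ChatCompletion's ToString returns the first text part of the response.
Console.WriteLine(completion);
```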
But let's say we want to do something more complicated. You might notice on ChatGPT's website that when you're chatting with it, the results stream in: you're not waiting for an entire result all at once, you can see the result as it comes in. This is called a streaming response, so let's go ahead and implement that. I'm going to create a new method for chat streaming, and just like before we've got a ChatClient; this time I'm using the full GPT-4o rather than the mini, because we want it to take a little longer so we can see it streaming. One thing we did earlier was a very simple single-message chat, but you can actually send multiple messages. For example, I might have a system message I want to send, giving some context that "you're a poet and you love technology", and then a user message; you can even add assistant messages here, giving it examples of how it should reply. Basically, your messages variable is just a list of messages we're going to send in. Now that we've got our list of messages, let's get the results: all we're doing is calling CompleteChatStreamingAsync with the messages. There are actually four different options here; let me show you all of them. If we type client.CompleteChat, you'll notice there's the standard synchronous CompleteChat, which is what we used above, which blocks until there's a response; there's CompleteChatAsync, which still waits for the full response, doesn't stream it, but doesn't block, using the nice async/await pattern; there's CompleteChatStreaming, which streams the result back but blocks until it gets a response; and then the fully asynchronous version we're about to use, CompleteChatStreamingAsync, which is async/await in terms of calling the API and streams. Then we can use the fancy "await foreach" to enumerate the results: we go through each item in the result, which is of type StreamingChatCompletionUpdate. What happens is that every time the API returns additional data, it returns it as an update. This is usually additional text that you can just append to the end, hence this Console.Write here, although sometimes there are other things in there: there could be metadata, there could be a reason why it stopped, it could give you an image or ask you to run a tool, and we'll get into a few of those details later. For right now, we're just going to assume every update is a string we want to write to the console. This should be good enough to run: we're basically creating the client, giving it the list of messages, asking for a streaming completion, and then, for each streaming result, writing it to the console. So let's go up and call this chat streaming async function, and because we're calling an async function from a non-asynchronous context, we have to wait on it, and we should be good
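As a sketch, the streaming method might look like this, following the official library's patterns; note the usage check at the end, which we'll come back to in a moment, and that some property names from the beta (like these token counts) were renamed in later releases, so verify against your version:

```csharp
using OpenAI.Chat;

async Task ChatStreamingAsync()
{
    ChatClient client = new("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

    List<ChatMessage> messages =
    [
        new SystemChatMessage("You are a poet and you love technology."),
        new UserChatMessage("Write a poem about .NET."),
    ];

    // Fully asynchronous streaming: updates arrive as the model generates them.
    await foreach (StreamingChatCompletionUpdate update
        in client.CompleteChatStreamingAsync(messages))
    {
        foreach (ChatMessageContentPart part in update.ContentUpdate)
            Console.Write(part.Text);

        // Token usage is only populated on the final update of the stream.
        if (update.Usage is not null)
            Console.WriteLine($"\nTokens -- input: {update.Usage.InputTokens}, " +
                $"output: {update.Usage.OutputTokens}, total: {update.Usage.TotalTokens}");
    }
}
```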
to go to run that give it a second here awesome so you can see the results are streaming in just like they would on chat gpt's website it's writing the super long poem here um and we got to see it as it was being written now because we're writing a long poem we might be concerned about token usage remember with open AI you pay per token input and output um and especially for some of the larger models that could add up if you use a lot more tokens than you expect so let's see how we
would determine how many tokens we've been using here um one thing that's worth noting is you can get get token usage on any chat completion or any API call uh but for the streaming results they only show up on the final result so what we're going to do here is say if we have usage information that's been provided um in the result in this update which will only be on the final update then we can assume that we're done with with the updates uh that this is the final part so we're going to write a
line and then we're going to write out the uh the various tokens here uh token usage so again we'll have the input tokens the output tokens and the total tokens so let's go ahead and run this and we'll get another poem and then we'll get the token usage at the end okay so this one's an even longer poem it's got longer lines definitely curious how many tokens we're using okay so you can see our input tokens was 50 that corresponds to both the system message and the user message we gave it and the output tokens
is a bit over 500 12al tokens is over 500 as well so that's not too bad open a is not that expensive but it is worth uh tracking the stuff in case you're doing something um where you're using a whole lot of tokens it's good to be able to know how many you're using okay so moving on what just to kind of recap what we got here we've got chat streaming where we've sented a couple messages and we're streaming the single result but there's a whole lot more you can do with open a so as
As you may know, OpenAI released a new Assistants API earlier this year, and it's really exciting that the new official OpenAI SDK supports that as well as all the other functionality. So let's make a demo that actually creates an assistant, uploads some files to it, asks it to look through those files, and asks it to do RAG, retrieval-augmented generation, based on the content of those files, returning results to us. This is very relevant if you want it to understand reports you might have, or generate reports from raw data, or if you're doing things like analyzing and writing code and fixing bugs: all sorts of situations where you have various files you want it to look at, pull in the appropriate ones, and respond based on that. Let's get started: we're going to make a new function, "assistant with chunks", because we're going to be chunking things; I'll show you how in a second. Again we're creating the client here, then a file client, because we're going to be uploading a file to the service, and an assistant client. You might notice some red squigglies: that's because the assistants client is currently in beta, so just be aware that this API might still change over time and isn't fully finalized, but it does work. In order to move on, we have to disable the warning, which just acknowledges that we know it's in beta and might change,
And now we no longer have any errors there. The first step: I mentioned we want it to be able to load files to look through, so let's open a file from our file system and upload it. This just uses standard .NET file streams: File.OpenRead (this could be any file in our file system) creates a stream, we pass that file stream into the file client's upload function, and we give it a name. This is basically the server-side name; it doesn't have to be the same as the local name. And we tell it we want to use the file for assistants; you could also upload files for fine-tuning and other purposes, but today we're using it for assistants. Now, even though this is really simple and should be intuitive if you're familiar with using files in .NET, in order to keep this example very clear (so we're not relying on any extra files), I'm going to do it all inline, which is actually fairly useful for demoing and getting things off the ground. What we actually do is create BinaryData from a string, which effectively lets us have a literal JSON file right here in our source code. You probably wouldn't do this in practice, but it does make development a lot simpler. So now we've got this document as a stream, and we're going to upload it: the document is the stream we defined above, we give it a file name, and we again say it's going to be used for assistants. Let me fix a few formatting things that came up; there we go. So now we've basically uploaded this to OpenAI: it's there on the server side and will be indexed by their systems. In sketch form, the upload piece looks like this:
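Continuing from the client setup above; the file name and the JSON contents here are illustrative, and the returned file type was named OpenAIFileInfo in earlier betas:

```csharp
// An in-memory JSON "file": a literal document in source code, handy for demos.
BinaryData document = BinaryData.FromString("""
    [
      { "month": "January",  "product": "widget", "unitsSold": 120 },
      { "month": "February", "product": "widget", "unitsSold": 480 }
    ]
    """);

// Upload the stream for use with assistants; the server-side name does not
// have to match any local file name.
OpenAIFile salesFile = fileClient.UploadFile(
    document.ToStream(),
    "monthly_sales.json",
    FileUploadPurpose.Assistants);

Console.WriteLine($"Uploaded file id: {salesFile.Id}");
```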
But we have to explain to it how we want the file indexed, so we create a vector store, where the file is chunked into an index. To create that vector store, we make a VectorStoreClient and then create a new vector store with options that define the files involved. You'll notice that uploading the sales file returned a file info object with an ID; we pass a list of file IDs into the vector store options, basically telling it which files are relevant for this vector store. We could add additional files to the vector store in the future, but right now it holds just that single file. The next thing is the chunking strategy. By default, the vector store figures out how to chunk things automatically; it's pretty intelligent, and you shouldn't normally need to specify anything. But as you may know, for retrieval augmented generation it can be useful, if you have a good understanding of what your data looks like, to manually define the chunking strategy, depending on how much overlap there might be or how big the sections in your data are. So I'm going to update this code to use a manually created chunking strategy: we'll say we want only 100 tokens per chunk, with a 30-token overlap. That tells OpenAI how to chunk the file we gave it into multiple different vectors. And we pass that in through the vector store options: in addition to the file IDs, we give it the chunking strategy object we defined above. So: we've uploaded a file, and we've created a vector store to embed the file into.
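In sketch form, with the same 100-token chunks and 30-token overlap as the demo; the creation-option and factory names, and the exact create signature, have varied slightly across beta versions (newer ones return an operation you wait on):

```csharp
using OpenAI.VectorStores;

VectorStoreClient vectorStoreClient = client.GetVectorStoreClient();

VectorStoreCreationOptions vectorStoreOptions = new()
{
    // Which uploaded files belong to this vector store.
    FileIds = { salesFile.Id },

    // Manual (static) chunking instead of the automatic default.
    ChunkingStrategy = FileChunkingStrategy.CreateStaticStrategy(
        maxTokensPerChunk: 100,
        overlappingTokenCount: 30),
};

VectorStore vectorStore = vectorStoreClient.CreateVectorStore(vectorStoreOptions);
```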
Now let's continue making the assistant. Before we make the assistant itself, there are a few options we need to specify: that's the assistant creation options. Basically we give it a name; instructions, which are essentially the system messages; and then tools it can use. Assistants, just like in ChatGPT, can call tools; there are a few built-in ones, like the code interpreter for running calculations and file search for pulling things in from the vector store. So let's give it some tools. Here we care about the file search tool definition and the code interpreter tool definition: we'll allow it to search the files we've provided (or will be providing) and also to run code. Running code will be useful for actually calculating the things we want calculated from the sales data. Finally, we have tool resources: in addition to the tools, we need to provide some additional metadata to help them work best. The file search tool needs to know what files to search, so we create a new FileSearchToolResources with a list of vector store IDs, which basically tells it which vector stores it can pull from. We take the vector store from above, get its ID, and pass it in as a tool resource for the file search. So we should be good in terms of creating the assistant options: we've given it a name, instructions, tools, and additional details for those tools.
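A sketch of that assistant definition, building on the previous snippets; the name and instruction text are illustrative:

```csharp
AssistantCreationOptions assistantOptions = new()
{
    Name = "Sales Analyst",
    // Instructions play the role of the system message for this assistant.
    Instructions = "You analyze monthly sales data. Use puns where appropriate.",
    Tools =
    {
        new FileSearchToolDefinition(),      // search the provided files
        new CodeInterpreterToolDefinition(), // run code for calculations/graphs
    },
    ToolResources = new ToolResources
    {
        // Point the file-search tool at the vector store created above.
        FileSearch = new FileSearchToolResources
        {
            VectorStoreIds = { vectorStore.Id },
        },
    },
};
```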
Now let's make the assistant itself; all that preamble is done, and we can finally create it. We create it through the assistant client, telling it the model to use and passing in the options we defined above. As for how assistants work: think of an assistant as a GPT. In ChatGPT you have this concept of different GPTs; they might have different instructions, different tools they can call, different personalities. But every interaction with that GPT, where you send it a message, it sends you a response, and maybe you send a reply, is an ongoing thread. So within Assistants we have threads. We've created the assistant, but now we need to create a thread attached to it; that's what AssistantClient.CreateThread does. We initialize it with a message for this thread: in this case we're asking about a certain product that was included in the raw data above; for that product, how did it sell in February, and graph its trend. That is the message we're giving the thread to run. And then let's actually run it. You can think of this as the equivalent of the chat completion call we did above: we're telling OpenAI, given all of this stuff, given the thread we created, given the assistant we created, and given maybe some additional instructions (for example, we wanted it to use puns, because of course we wanted it to use puns), now give us a result. And just like before, we can either have it return the result and wait for it, or create a streaming result. Again, the streaming results are a lot better for usability, because they help the user understand what's going on; these assistants can sometimes take a little longer to run, so streaming the results is always a good idea. Now that we've got the streaming updates collection, let's enumerate through it, similar to how we did before. Roughly:
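A sketch of those three steps (create the assistant, start a thread, stream the run), continuing from the options above; the prompt text is illustrative, and the message list relies on the beta SDK's implicit conversion from string:

```csharp
Assistant assistant = assistantClient.CreateAssistant("gpt-4o", assistantOptions);

// One thread per conversation; the initial message is what a user would ask.
AssistantThread thread = assistantClient.CreateThread(new ThreadCreationOptions
{
    InitialMessages = { "How did widgets sell in February? Graph the sales trend." },
});

// Stream the run rather than waiting for the whole result.
var updates = assistantClient.CreateRunStreaming(
    thread.Id,
    assistant.Id,
    new RunCreationOptions { AdditionalInstructions = "Use puns where appropriate." });

foreach (StreamingUpdate update in updates)
{
    if (update.UpdateKind == StreamingUpdateReason.RunCreated)
        Console.WriteLine("--- Run started ---");
    else if (update is MessageContentUpdate content)
        Console.Write(content.Text);
}
```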
For each update: if the update metadata says a run just started, we print that out; and if it's a content update, we write out all the text. In theory this should be good enough to run now. There is one thing I want to do first, though: clean up all these things we've created, because the assistant, the thread, the files, and so on continue to live server side. Long term, that's probably what you want; you'll call these assistants multiple times, and the files will probably stay up there for a long period of time. But in our example, because we want to make sure this runs with a blank slate every time, I'm going to clean up all of the resources we created server side: delete the thread, delete the assistant, and delete the file. That should be good enough to run now. Let me build this to make sure I don't have any silly syntax errors, and it looks like we are all good. So let's call this AssistantWithChunks function and see what happens. You'll see 'run started': that means OpenAI has returned a response saying it has already started on the server. Now it's going through and indexing the file we gave it, chunking it, getting embeddings, and pulling out the appropriate parts in order to produce a result. Awesome.
Here we see (let me zoom in so you can see a little better) that it says we had a significant sales increase in February, and it's got all that cool information there. It even made, I guess you could call it, a pun there at the end. So it's followed all the instructions, intelligently pulled the data we asked it to pull, and given us a solid result. But wouldn't it be great if it could make graphs and other images for us? So let's go through and see how to do that. You might notice we've already asked it to graph over time; so it's actually already graphing for us, we're just not doing anything with it, because we're only pulling the text from the updates. Let's pull the images from the updates as well. First we check whether there's an image file associated with the content update. If there is, we want to get that file, because all it has done is give us a reference to it; what we need to do is actually retrieve the file from the server. We get the bytes of that file, store them in BinaryData, and then write that to our local file system. Now, you don't have to write this to your file system; you could do other interesting things with it, like send it to some other service, or run OCR on it, whatever you want to do. But right now we'll just save it to our local file system, then create a new process and launch that file, basically to pull it up and see what it looks like. Inside the update loop, that looks roughly like this:
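A sketch that extends the update loop from the earlier snippet; the local file name is illustrative:

```csharp
using System.Diagnostics;

if (update is MessageContentUpdate contentUpdate &&
    !string.IsNullOrEmpty(contentUpdate.ImageFileId))
{
    // The update only carries a reference; download the actual bytes.
    BinaryData imageBytes = fileClient.DownloadFile(contentUpdate.ImageFileId);

    string path = Path.Combine(Directory.GetCurrentDirectory(), "sales_trend.png");
    File.WriteAllBytes(path, imageBytes.ToArray());

    // UseShellExecute lets the OS open the file with the default image viewer.
    Process.Start(new ProcessStartInfo(path) { UseShellExecute = true });
}
```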
That should be enough to get it going. Let me build (all good) and run it. Same as before: we're giving it the same data, chunking it the same way, asking for the same output, but now we also receive an image. It says it has visualized the trend, so it's now receiving the image, and it should pop up with an image for us in addition to the text we see. The way it generates those images is by actually running (I believe) matplotlib in Python on the server side, generating the image, and sending it back to us. So that's pretty cool. You can see right here: this is the sales trend it generated for us automatically, based on the data we had uploaded and the command we gave it. Now, the data we uploaded is relatively simple, so it didn't really need to try that hard to find the right data; but you can imagine that if we gave it a ton of different documents, a ton of different data, being able to go through and find the right subset of it to plot is pretty powerful, and everything we've done here basically allows it to do that. I'm going to quickly run through all the different sets of functionality one last time, and then hopefully you can use this to build your own assistants from. Again: we create the new client here; we create a file client, because we'll be uploading files, and an assistant client, because we'll be dealing with assistants. Each of these parts of the OpenAI API functionality is wrapped in its own client; if you go to the docs, these more or less map to the different endpoints. Going back here: we've got the document we want to upload; we upload it as a monthly sales file; we then store it in a vector store with a manual chunking strategy; and we create an assistant. Again, this is roughly the equivalent of a GPT, a definition of how a GPT chat might work. We give the assistant a name, system instructions, the different tools it has access to, and additional information for those tools, in this case a reference to the vector store the file search is allowed to use. We then create the assistant itself, and create a thread in the assistant; for each user, in the future, we'd probably make a new thread but not a new assistant. We give that thread a message, which in practice would be what a user might ask rather than something hardcoded. Then we get the results as a streaming result, iterating through each update: if it's text, we output it to the console, and if it's an image, we save it to disk and show it. And at the very end we clean up everything we've done by deleting the thread, the assistant, and the file; although in practice you'd probably only delete the thread rather than the assistant, because you'd probably have multiple users creating new threads on top of that same assistant. I hope this has been a pretty exciting overview of
all the cool things you can do with this new official OpenAI SDK, and now I'll hand it back, where we'll go over how this all fits together with Azure. All right, thank you, Roger, that was great. A couple of things I want to hammer home: this is OpenAI for .NET, which is awesome, and you showed off the Assistants API. That's some of the stuff we've heard from the community before: 'I want access to the latest and greatest from OpenAI.' And now we have that, and it's mind-blowing. It's all .NET; .NET's everywhere. You even started off by showing .NET documentation on the OpenAI site. Amazing. So we are there. But everybody's wondering: Matt, you're here, you're going to talk Azure. And I'm going to tell you, you're right, I'm talking Azure, of course, but just a little bit. The Azure OpenAI library for .NET uses the OpenAI library for .NET underneath the covers. Up in the Azure cloud we have OpenAI models that you can totally consume and use, so we went ahead and built the Azure OpenAI library for .NET on top of the OpenAI library for .NET. What does that give you? A whole bunch of Azure goodness along with the OpenAI models you can use, and I'll show you one thing that's really, really cool once we get into the code. So how do you use it? The code changes are super, super minimal: you're really just swapping out the OpenAI client for the Azure OpenAI client; most of the code Roger just showed you stays exactly the same. It's super easy, so let's see it in a demo. All right, I have everything up here. Roger and I agreed: Roger, you take the dark theme; Matt, you take the light theme; that way everybody knows who's doing what. I'm pretty much starting with the exact same app Roger used, but I am going to swap in
Azure OpenAI. The very first thing we need in order to do that is a couple of NuGet packages, and it's real simple: Azure.AI.OpenAI and Azure.Identity. Azure.Identity is super cool, and I'll show you why we want it. One thing to note is that Azure.AI.OpenAI, along with the OpenAI package from OpenAI, is still marked beta; we're scheduled to release those to wider GA this Northern Hemisphere fall, Southern Hemisphere spring. So they're coming, but you can already start building apps with them, and as Roger showed you, they work pretty great. Okay, so here we are; the application looks very, very similar, with 'chat streaming with tokens' and 'assistant with chunks', but let's Azure-ify them. The first thing, other than adding the NuGet packages, is a using for Azure.AI.OpenAI, and then a using for Azure.Identity (let me try to spell that right; there we go). So I did some using statements to bring things in. Next up, for my chat streaming with tokens, all I really need to do is swap out my client. Before, we were just using the chat client and grabbing the OpenAI key that Roger showed us how to put into the environment variables. With Azure, what we want instead is a new AzureOpenAIClient: an Azure OpenAI client rather than just the OpenAI client. I'm passing in a model endpoint here for where it should go in Azure to find where I deployed my model; I have deployed models in different regions and so on, so I'm just saying, hey, Azure SDK, this is where you should look. And with this next part, the DefaultAzureCredential, I now no longer have to pass in an API key: because I'm logged into Visual Studio, and I've given myself access to the deployment of my model, I'm good to go. Essentially I'm using Entra ID (the old Active Directory) to handle permissions. Super cool, and that's part of Azure-ifying it. And then for my chat client: let's call the variable azureClient, and it's azureClient.GetChatClient, and we're going to use GPT-4. That's it; everything else stays the same. It's really just the ceremony of the initialization changing. In code:
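A sketch of the swap, using the Azure.AI.OpenAI and Azure.Identity packages; the endpoint URL and deployment name are placeholders for your own resource:

```csharp
using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Chat;

// DefaultAzureCredential picks up your Visual Studio / Azure CLI login,
// so no API key is needed; access is governed by Entra ID role assignments.
AzureOpenAIClient azureClient = new(
    new Uri("https://my-resource.openai.azure.com/"),
    new DefaultAzureCredential());

// "gpt-4" here is the *deployment* name in the Azure resource.
ChatClient chatClient = azureClient.GetChatClient("gpt-4");

// Everything downstream (CompleteChat, streaming, token usage) is unchanged.
```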
So let's see it run; we'll cross our fingers that things work. Chat streaming with tokens... let me make the font bigger... and it's just saying 'hello, how can I assist you today.' So we'll say hello and ask it for the state flower of Washington, run it again, and it should tell us that's the coast rhododendron (I should have been a botanist). And it does say it's the coast rhododendron. Cool, we are up and running. The next thing we want to do (again, to reiterate, we just changed two things on the initialization) is over in the assistant-with-chunking demo: I'll comment that out, comment this in, save, and go into here. Now, again, I'm going to comment out the part that would throw an error if I don't have the OpenAI API key in my environment variables. And see, here we are new-ing up an OpenAI client; of course what I want is the Azure OpenAI client here, so AzureOpenAIClient, and I'll call the variable openAIClient just so I don't have to change a bunch of other code. IntelliSense, or probably GitHub Copilot, helps me out: model endpoint, new URI, DefaultAzureCredential. Cool, on my way. Everything else is the same: file client, assistant client, building up that in-memory JSON string. One thing I do want to change, though, is the vector store client. What I have to do here is get it from the client rather than new it up: openAIClient.GetVectorStoreClient. So it's a little bit different; I just can't do 'new' here, I have to ask the client for it. And that's it; that's all I should need to do, and everything else is still exactly the same as before: calling the model when we create the assistant, telling it what it should do, graph its trend, and ideally it all works. It takes a little time to warm things up, because I'm using a different model here than what I was using before: I was using GPT-4 there, and here I'm using GPT-4o.
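The sub-client part of the swap, in sketch form; the accessor names have varied across beta releases:

```csharp
#pragma warning disable OPENAI001 // assistants/vector stores are still beta

using OpenAI.Assistants;
using OpenAI.Files;
using OpenAI.VectorStores;

// All the sub-clients come off the Azure client instead of being constructed
// directly; the vector store client in particular cannot be new'ed up.
OpenAIFileClient fileClient = azureClient.GetOpenAIFileClient();
AssistantClient assistantClient = azureClient.GetAssistantClient();
VectorStoreClient vectorStoreClient = azureClient.GetVectorStoreClient();
```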
But it should start to say 'run started' and then generate things for me. Run started; now it's going to analyze everything, essentially doing RAG over my data. First it said 'image saved': Roger displayed his automatically, but I wanted to save mine, probably to send it to my manager to show that I actually did something. And the pun was: 'it appears that February was Febru-very good for sales.' I never would have come up with that. So let's see where our image went; let's open the containing folder, go to bin, go to Debug, go to net8.0, and here's our file. There we go; it looks very similar to what Roger got. Cool. All right, that's it: we have now gone through and taken all of Roger's code for the OpenAI client for .NET and moved it over to Azure. And one thing I want to hammer home again is that we just used the DefaultAzureCredential, and what's cool about that is that I was logged into Visual Studio and I granted myself access to the deployment of that OpenAI model, so I was able to get access to it. Now, if I deployed something to Azure, all I would have to do is give that service the same access, and it would be able to access it too; we wouldn't have to change the code at all. We don't care about API keys anymore, which makes it a little bit more secure, and that's one of the things you get with the Azure-ness of it. Roger, how'd we do?
We covered it; we covered all the greatness of the official OpenAI SDK for .NET (there are so many letters) and we Azure-ified it. Yeah, and just to drive that point home again: it is really easy to go from the new OpenAI SDK to the Azure OpenAI SDK, because the Azure one is built on top of the OpenAI SDK. It's really just that authentication change, which, if you're already running stuff in Azure, you should be fairly familiar with: API keys for OpenAI, or Azure credentials from Azure. The rest is basically the same, so it's pretty exciting how this is all coming together. Yeah, super, super exciting, and it'll all be GA in a couple of months: fall 2024 in the Northern Hemisphere, spring 2024 in the Southern Hemisphere, so we are very close to having it all GA'd. All right, well, thank you very much, Roger; I had a blast learning all about the OpenAI SDK for .NET, and I hope you all had fun learning about it too. Yeah, and I'd love to point out there is a migration guide online; again, we'll share the link to that. This is the migration guide, basically explaining what's different between the old OpenAI community library I had published in the past and the new official library. There are some changes; the new official library maps a lot more closely to the actual API, but the common scenarios people might have used in the past, and how to update them, are covered pretty well in there. I showed you originally where there are links to the libraries, but for each of the sections that tells you how to do different things, you'll notice there should be .NET and C# examples in there very shortly as well. Great; everything is coming, and we will definitely put up a link to an overall collection that people can check out to get links for everything we just talked about and more. All right, thank you, Roger, I really appreciate it. Well, thank you all. Hi all, welcome to our session, Patterns and Practices for Automating Business Workflows. My name is Costa, working as a Global Black Belt at Microsoft. I'm a developer at heart, helping customers
with cloud-native and AI apps, but lately a lot of agentic apps. Over to you, Xiaoyun. Hi, I'm Xiaoyun; I'm a software engineer at Microsoft, currently working on agentic applications and machine learning in .NET. Okay, let's look at the agenda for today's session. We're going to show you a few patterns and primitives you can use to build AI applications, and more specifically agentic workflows. We'll start with a quick primer on agents and intelligent systems; then we'll go over a few scenarios we see as a good fit for agentic workflows; and lastly we'll describe a few patterns and practices and show you how you can add them to your existing apps. Before we dive into the specifics, let's start with that quick primer on agents and agentic systems, with a little recap of where we are today when it comes to adding intelligence to our apps. You've probably heard a thing or two about the new AI models, specifically the large language models. When they came out, our first applications were mostly chatbots: they used the LLM's ability to answer questions, very sophisticated ones, but still limited to its training data. Then we wanted to use the LLMs to ask questions about our own data, which gave birth to a pattern called RAG, or retrieval augmented generation. Up to this point we were using the LLMs to create applications that, you could say, are mostly about asking the model. The next idea and realization we had is that we can actually use the models by asking them to do stuff for us. With the addition of function calling to the models, that became largely possible, and we were able to build copilots that help us accomplish pretty sophisticated tasks. As the copilots grew larger and more complex, we figured out we could split them into smaller agents and smaller domains, and have multiple of them collaborate to achieve a task. That gave us the possibility of more intelligent coordination, resulting in more autonomous agents: agents that can plan, coordinate, and decide with minimal human intervention. But what do we mean when we say agents? There are a lot of definitions out there of what
constitutes an agent, and in our context we'll say that agents are programming primitives and patterns that have their own behavior, knowledge, and context, as well as different levels of autonomy. Let's dive a bit into what that means. Behavior maps to the prompts and the functions or tools the agent can use. Knowledge maps to the specific data the agent has access to, like an agent knowing about the Well-Architected Framework, or a call-center agent knowing the device manuals for a set of products. Context maps to the memory of the agent, or the state of the current task of the group of agents collaborating. Different levels of autonomy means we have agents with varying degrees of freedom. On one side we have explicit agents, where all of their decisions, plans, and orchestration are mostly done through code, explicitly; on the other end of the spectrum we have agents that are fully autonomous and can react to changes in their environment, so they can plan, take actions, and make decisions on their own, of course using the LLM. And truth be told, most agents fall somewhere in the middle, where some aspects are explicit and some are a bit more dynamic, with a bit more freedom. One thing we want to point out is that agents and agentic applications are not tied to a specific framework or library; they're more of a pattern, much like microservices. A specific case of agentic applications is the so-called multi-agent workflow, where we have multiple agents collaborating on accomplishing a task. The main benefit is that we can split the responsibilities into smaller chunks and use the right tools and data for the job. And while not a rule by any means, we've found that mapping agents to entities from the domain-driven design world is very useful in helping model those agentic and existing workflows. Now that we have a better understanding of what agents are, let's look at some high-level scenarios where agentic applications can be particularly applicable. First, a scenario that often comes to mind for us as developers is creating dev agents that help with our day-to-day tasks. Those agents can collaborate to write and test code, run
it in a sandbox, and produce documentation, all by integrating with our existing source control, CI/CD pipelines, and so on. We have full control over the data we want to expose to the agents, for example internal coding guidelines, styles, libraries, and so on. We also have the freedom to choose the AI models that will back the agents, as well as where they run, which is really important. A more conceptually interesting scenario is using agents to fulfill high-level asks, taking the cognitive burden off the end user. Let's look at the first example, 'I have a party': the user asks, 'I have a party for 20 people tonight, both adults and kids; fill my cart with drinks and snacks.' Using today's applications, the user needs to decide what, and exactly how much, they need by manually browsing and searching the products in a catalog on the e-commerce site and picking the ones they like. If we reimagine this flow to be a bit more intelligent and agentic, we can have the user still describe the high-level ask, what they want and what the requirements are, and a team of specialized agents (for example, a drinks agent and a snacks agent) can collaborate to combine the user's past preferences and the current task into a few suggestions that the user just selects from. And of course, the user can interact with the agents and modify the suggestions, but the cognitive load is significantly lower. The same pattern applies to the second example, where the user wants to decorate or furnish an empty living room on a tight budget: instead of the user doing the cognitive work of trying to fit different combinations of layouts and furniture, we can have the agents take the high-level ask and combine it with the user's preferences to offer a few suggestions. Another very useful category of scenarios is using agents to make existing content production teams more productive. We have a couple of examples here. The first one: imagine we have a team of agents that are experts in different parts of the law. They can help a legal team audit a case faster and with higher quality, by giving feedback based on their different specializations back to the team of legal experts, which can significantly decrease the time it takes to audit the case. Another example is a team of creative agents that can help a marketing team create a campaign. Once a member of the marketing team describes the shape and the asks of the campaign, a content writer agent can produce the copy for the campaign,
a graphic designer agent can produce the graphical content, and an auditor agent can flag and give feedback to the agents if the content violates any legal or brand guidelines. So now Xiaoyun will walk you through the sample app we're going to use to demonstrate these patterns. Here is a real-world agentic application: eShop Support. We are going to use this application as a demo to illustrate different agentic patterns in action. Let's start with an overview of the application. The system uses LLM-based agents to assist staff employees in managing customer tickets, such as product inquiries or refund requests. Now let's look at the user interface. On the left side of the screen you will see a customer window; this is where staff interact with customer tickets. In the top-left corner you will notice metadata about each ticket, such as the ticket name, number, and category, customer information, etc. Below that is a window showing both the summary and the conversation with the customer. Moving to the right side of the screen, we have another conversation window; this is where staff can chat with our agents. In this demo we have three types of agents that can respond to staff requests. First, we have the manual search agent: this is a RAG-based agent that can search product information from a provided document DB. Second, we have the customer support agent: this assistant helps write replies and respond to customers. And finally, there's the task planner agent: this is a ReAct-like single-step planner that can break complex tasks down into manageable single steps and assign the steps to other agents to execute. This agent works as an orchestrator, helping the other agents coordinate and collaborate on resolving a task. One of the main use cases for eShop Support is helping staff draft and resolve customer tickets. With the assistance of these agents, which work together to search product information, reply to questions, and so on, we can significantly reduce the workload for staff employees, allowing them to focus more on complex, high-value tasks. In the remainder of this session we will introduce several common agentic patterns and workflows, using this demo as an example.
This will give you a practical understanding of how these assistant agents can streamline customer support operations using different patterns. Back to you, Costa. Thank you. So the first pattern we're going to describe and show is the RAG-based agent. This type of agent is specialized in answering questions about a particular topic or data source, and the data doesn't have to be text; it can be multi-modal, so audio, video, images, and so on. Where this pattern is useful is where we need different prompts to answer questions that belong to different domains and require retrieval of different sets of data. As an example, imagine we have a team of call-center agents, each with specialized knowledge about a particular topic: one agent is able to retrieve the right data to answer questions related to products and user manuals, while another might be able to retrieve customer invoicing data and answer questions related to that. Let's see this pattern in action. Before diving into the demo, let's briefly go over the RAG workflow that the manual search agent follows. The workflow begins with the user submitting a task to the manual search agent. The manual search agent uses a language model to generate a query based on the given task; this query is then used to search a document DB, which houses the content of the manuals. The document DB returns the most relevant chunks, which are provided to both the manual search agent and the user as the final answer. In code, the flow looks roughly like the sketch below; then we'll jump into the demo.
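A minimal, framework-agnostic sketch of the manual search agent's RAG flow. IChatModel and IDocumentDb are hypothetical interfaces standing in for an LLM client and a document store; they are not part of any specific SDK:

```csharp
public interface IChatModel
{
    Task<string> CompleteAsync(string prompt);
}

public interface IDocumentDb
{
    Task<IReadOnlyList<string>> SearchAsync(string query, int topK);
}

public class ManualSearchAgent(IChatModel model, IDocumentDb manualsDb)
{
    public async Task<string> HandleAsync(string task)
    {
        // 1. Use the language model to turn the task into a search query.
        string query = await model.CompleteAsync(
            $"Generate a short search query for this request:\n{task}");

        // 2. Retrieve the most relevant chunks from the manuals document DB.
        IReadOnlyList<string> chunks = await manualsDb.SearchAsync(query, topK: 3);

        // 3. In this demo, the raw chunks are returned directly as the answer.
        return string.Join("\n---\n", chunks);
    }
}
```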
In this demo we'll be using ticket #4 as an example. In this ticket, the user asks a question about the LumX Lin system. To find the answer, we can instruct the manual search agent to search for the product information directly from the manual by clicking this shortcut, which sends a short request to the manual search agent. The agent will first generate a query based on the provided summary, and then search the manuals using this query. The search result, which consists of raw text chunks from the manual, is returned as the final answer. It's important to note that in the demo the raw text chunks are returned directly, without further processing by the manual search agent. Back to you, Costa. Thank you. So the second pattern we want to show and describe today is the author-critic pattern. The basic idea is that when the author generates content, we have another agent give feedback on that content, so that, if needed, the author agent can improve it. We usually do that in a loop, until the critic doesn't have any more feedback for the author, but only for a limited number of iterations. We can have variations of this pattern; we can add more agents to validate different aspects of the content and improve it even further, but the idea stays the same: we apply different perspectives and iterate on the content before sending it forward. If we go back to the marketing team example we mentioned before: a content writer agent is tasked with producing the copy for the campaign, and the auditor agent can check whether everything is in line from a legal and brand-guidelines perspective. If it has feedback, that goes back to the content writer, and the loop continues until either the critic is satisfied or we run out of iterations. A minimal sketch of the loop:
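This reuses the hypothetical IChatModel interface from the previous sketch; the APPROVED convention and the prompt texts are illustrative:

```csharp
public static class AuthorCritic
{
    public static async Task<string> RunAsync(
        IChatModel author, IChatModel critic, string task, int maxIterations = 3)
    {
        string draft = await author.CompleteAsync(task);

        for (int i = 0; i < maxIterations; i++)
        {
            // The critic reviews the current draft.
            string feedback = await critic.CompleteAsync(
                "Review this draft for tone, accuracy, and policy issues. " +
                $"Reply APPROVED if it is fine, otherwise list the problems:\n{draft}");

            if (feedback.StartsWith("APPROVED"))
                break; // the critic is satisfied

            // Feed the critique back to the author for a revision.
            draft = await author.CompleteAsync(
                "Revise the draft below to address this feedback.\n" +
                $"Feedback: {feedback}\nDraft: {draft}");
        }

        return draft;
    }
}
```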
So let's see this pattern in action. Here is how the author-critic pattern is applied to the eShop Support demo in the customer helper agent. In this case, the critic serves as a safeguard, ensuring that the responses generated by the customer helper agent are appropriate and suitable for the workplace. This additional review step is important to prevent the delivery of incorrect, inappropriate, or potentially offensive replies, which could harm customer relationships or the company's reputation. In the workflow, when the user creates a task to draft a reply, the customer helper generates a response, and this time, instead of sending this response directly back to the user as the answer, the drafted reply is first forwarded to the critic, which is another LLM-based assistant agent, for review. The critic will check the content for any issues, such as inappropriate language or missing information, and if any problems are detected, the customer helper is invoked again to revise the answer, until the response passes the critic's review and is sent back to the user as the final response. Now let's jump to the demo, where we'll still use ticket #4 as the example. After the manual search agent retrieves the product information, we can invoke the author-critic flow by asking the customer reply agent to draft a response based on what the manual search agent found. We type in 'write reply' and hit enter, and then wait for the workflow to finish. Now it's completed, and we can see that the customer reply agent first generated a draft, and then the critic agent was invoked automatically to review the answer. In this case, the draft response from the customer reply agent is appropriate and concise, so the critic agent approves the reply as the final answer, successfully completing the workflow. In the next step, the staff employee can click the 'Use as reply' button here to copy and paste the draft response into the customer window
and use it as the final reply to the customer. Back to you, Costa. Thank you. And the last pattern we're going to show today should be a familiar one. Just like we assemble event-driven workflows in our apps today, we can define agents that react to events, invoke actions as a result, and produce their own events. We can combine them into groups and use them to assemble complex and sophisticated workflows that integrate with existing systems and APIs. As an example, let's imagine we have a team of dev agents that can react to GitHub issues and implement a new application or change an existing one. We start with the user describing a high-level application in a GitHub issue. Then we have a dev-lead agent that breaks the implementation down into subtasks, and each of those subtasks gets its own team of agents to work on it. The team could be a developer agent tasked with producing the code and an architect agent giving feedback, so we have that critique loop back to the developer; and then a tester agent can write tests and execute them in a container sandbox. Once the human in the loop reviews and approves the code the team proposes, the generated code is pushed to a PR that can be worked on and merged into the repo. Now we're going to go back to Xiaoyun to show the pattern in our sample app. Thanks, Costa. In the previous two demos I introduced two distinct agentic patterns, the RAG-based agent and the author-critic. However, in some cases resolving a task might require combining multiple workflows into one.
To assemble these workflows into a cohesive process, we can use a ReAct-like task planner to break the task down into individual steps. These steps are created one at a time, and each step is created dynamically based on the previous context; after a step is generated, it is assigned to the appropriate agent to complete. It's very similar to how the ReAct pattern works, in that the task planner also operates in two stages for each iteration. In the first stage, it observes the current context and creates a single step, which includes a brief description of the current step's objective, and the step is assigned to one of a group of candidate agents. In the second stage, the task planner waits for the agent to complete the current step. Once the step is completed, the task planner gathers the result and determines whether the overall task has been completed. If additional steps are needed, the task planner generates the next step; if the task is completed, the task planner concludes the workflow and returns the final result to the user. A minimal sketch of that planner loop:
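Again reusing the hypothetical IChatModel from the earlier sketches; the Step record, the JSON-parsing stub, and the agent delegate map are all illustrative, not from any specific framework:

```csharp
public record Step(string Description, string AgentName, bool TaskComplete);

public static class TaskPlanner
{
    public static async Task<string> RunAsync(
        IChatModel planner,
        IReadOnlyDictionary<string, Func<string, Task<string>>> agents,
        string task)
    {
        var context = new List<string> { $"Task: {task}" };

        while (true)
        {
            // Stage 1: observe the context and plan exactly one next step.
            Step step = await PlanNextStepAsync(planner, context);
            if (step.TaskComplete)
                return context[^1]; // the last gathered result is the answer

            // Stage 2: hand the step to the chosen candidate agent, then fold
            // its result back into the context for the next iteration.
            string result = await agents[step.AgentName](step.Description);
            context.Add($"{step.AgentName} -> {result}");
        }
    }

    private static async Task<Step> PlanNextStepAsync(
        IChatModel planner, List<string> context)
    {
        // In practice you would prompt the planner with the running context and
        // parse a structured (e.g. JSON) reply; this stub shows only the shape.
        string reply = await planner.CompleteAsync(
            "Plan the next single step as JSON {Description, AgentName, TaskComplete}:\n"
            + string.Join("\n", context));
        return System.Text.Json.JsonSerializer.Deserialize<Step>(reply)!;
    }
}
```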
Using the task planner, we can seamlessly integrate the previously demonstrated RAG-based and author-critic patterns into a unified workflow, as shown on the slide. In this final assembled workflow, the task planner acts as an orchestrator, breaking tasks down into manageable steps and assigning them to the appropriate agents, in this case the manual search agent or the customer reply agent, allowing each agent to complete its step using its respective workflow. Now let's jump into the demo. This time we will still use ticket #4 as an example, and we're going to use the task planner as the orchestrator to complete both the RAG retrieval and the critiqued reply in one request. To do this, we send a 'search manual and draft reply' message in the agent window. The task planner is invoked first, creating the first step, which is to instruct the manual search agent to search for the product information; this follows the RAG pattern. After the manual search agent retrieves the relevant product information, it returns the result to the task planner and marks the first step as completed. The task planner then analyzes the output and determines the next step: in this case, the task planner creates the next step as 'draft a detailed response to answer the customer's question' and assigns the customer reply agent as the next agent to run the step. And here the author-critic pattern comes into play: after the customer reply agent drafts the response, the critic agent is invoked to review the reply, and since the response this time is again appropriate and concise, the critic approves it, marking the author-critic workflow as completed. The task planner then determines that the task has been successfully completed and marks the entire task as done. Back to you. Thank you. At the end, we would like to leave you with a few takeaways. The first one: as we've seen with the sample app, we don't really need to start from scratch; we can equip our existing apps with agents where it makes sense. The second point we would like
you to take away is that agents and agentic workflows map really well to existing business workflows, and they can increase the productivity of teams. The third one: even though the idea of more autonomy for agents seems really promising, we don't need to start there; we can start with explicit agents, and once we feel more comfortable, we can increase the level of autonomy. And lastly, we developers have a lot of experience building event-driven systems, and that experience translates into building event-driven agents and workflows as well. Now we're going to leave you with a couple of resources you can go to, to read more, explore, and dive a bit deeper into how to build agents; there are example scenarios and a couple of libraries you can use to get started with building agentic applications. And finally, we would like to thank you for your attention, and we hope you learned something interesting today. All right, agents! Yeah, that makes sense, and it shows how I can dial in a little bit more information into the models we're going to use for
artificial intelligence. Listen, if you want to learn more about agents, OpenAI, Blazor, Semantic Kernel, all the topics we've had throughout the day today, make sure you check out the collection at aka.ms netfocus AI collection; you can take a picture of that QR code. This session and all of the other content is recorded and is going to be available on YouTube; make sure you check out the playlist. Our folks are working behind the scenes putting together a YouTube playlist that will be available at the conclusion of the event, so you can go back, rewind, and learn more. But you know what, I think it's time for us to stretch: get up, take a few minutes, take a break, go for a walk, go get something to eat; nature might be calling. Make sure you take a break for yourself, and I'll see you back here in a few minutes as we get ready for our next sessions. Did you refresh your beverage? Do you have a nice cup of water there, ready to go? Because we've got tons more content for you here as part of .NET Conf: Focus on AI. Now, we wouldn't be here if it wasn't for some of our great sponsors who have been helping get the word out. Make sure you check out some of these folks who have been helping present this event; they do a phenomenal job for us, and we want to thank them by checking out their websites and letting them know we appreciate the work they've done in helping out with .NET Conf: Focus on AI. All right, coming up next we've got Davide joining us to talk about RAG with your data in .NET and Azure SQL.
Maybe you don't like relational data and you want to go with something just a little bit different; that's fine, we've also got James joining us from Cosmos DB to talk about building generative AI apps with Cosmos DB. Davide, the stage is yours. Hey, hi everyone, I'm Davide Mauri, a principal product manager for Azure SQL, and today, right now, I'm going to talk to you about how to do RAG, retrieval augmented generation, on your data using .NET, AI, and Azure SQL. So let's get into it right away. First of all, for those who don't know about Azure SQL yet (probably just a few, but just for those few) let's recap what Azure SQL is. Azure SQL is the enterprise AI database. It is a modern relational database with true multi-model support, meaning you can store JSON, graph data, even XML, plus geospatial and columnstore data, and query all these things together. It's scalable up to 100 terabytes, and you can have up to 80 vCores and 30 replicas, so it can really sustain any kind of work you can throw at it. And of course, we are working on adding vector support. Vector support is in early adopter preview; you can see the link on the slide where you can go and submit your request to enter the private preview, and the private preview is exactly what I'm going to use today to build the RAG solution. Not because you can't build it without it, but with the new vector support we're adding, you'll have a much easier way to build the complex solutions that are now common in our complex world. There's other cool stuff Azure SQL offers developers, not to mention, of course, enterprise-grade capabilities, so you don't have to worry about security; we take care of it. Everything is encrypted; you can encrypt columns, and you can even encrypt the data flowing between SQL Server or Azure SQL and your application in such a way that no one except the receiver can decrypt the data. And of course, as a developer, you probably want to connect with Azure services like Azure Functions, which we will be using today, or Static Web Apps, which we'll also be using today, or any other services: Stream Analytics, Fabric, Power BI. That is absolutely easy with Azure SQL. And as a developer, again, you probably want to use the common libraries everyone uses today, from EF Core to Semantic Kernel to LangChain or any other library you want to use, and Azure SQL is integrated with those libraries, so you can have a great experience no matter what you use. Now, the other thing I want to discuss before moving into the demo is the use cases for bringing AI into Azure SQL: why you'd want to
move the AI to your data instead of the other way around. Well, the first and most common use case (even today; I'm actually attending, and speaking at, another conference at the same time we're recording this session) is that customers come to me and say: 'I have this very common situation. I have my data in Azure SQL or SQL Server, and I would like to add vector search, but I would also like to be able to filter.' Let's say you are an insurance company and you have all your data already in SQL. You want to find all the contracts that are related to a specific person, or created by a specific employee, and within those contracts you want to chat with your data, asking questions in natural language, like, for example: 'find all the documents from the Acme customer, written by John Doe, that are related to the new security policies they need to apply.' You can do that in SQL, because you already have the author of the document, the tags you used to tag the document, and all the other information stored in your database, in some column or in some document in JSON format that you stored in SQL. So adding vector capabilities, and then AI support, into the database is the natural choice for us to build, and then for you to use, because it just makes your life easier: you add vector support and semantic search to the queries you already need to run in order to filter out the data that is specific to a customer or something else. That's the most common use case, what we call hybrid search, where you mix vector (or semantic) search with the full-text search you may already be using, and then with the regular filters. Then of course there is the use case of storing chatbot memories: for example, we have released an integration with Semantic Kernel that allows you to build a chatbot and store the memories in Azure SQL, so you can then analyze them and find, for example, the most common question or problem that arises from your customers chatting with your chatbot. And of course the other very common use case is RAG, retrieval augmented generation, which is exactly what I'm going to show you right now.
RAG is actually also the most complex of these patterns, so that's why I prepared a little bit of a high-level architecture and a high-level design. Retrieval augmented generation is a sequence of steps (two, actually) that you have to perform, so that you can take the data stored in your database and filter it using semantic search, that is, vector search. Let's say, for example, I have a database that contains all the sessions of a conference, and let's say I want to filter all those sessions based on some topic, or on the day I want to go and attend the conference. But then I also want to ask an AI model, out of the sessions that are available today, which sessions are about SQL and AI; and I want to ask in natural language, and I want to get the answer in natural language. So the first part is where I do the hybrid filtering and the vector search: out of the hundreds of thousands of sessions I have available in my database, I limit the scope to the ones related to the question I'm asking (say, the session must be today), and once I've filtered that, I also make sure I keep only the sessions that are somehow related to the topic I expressed interest in, like SQL and AI, or security, or anything else. Then I take those sessions and send them to the AI model, a GPT model, and ask it to evaluate the question in natural language and return the answer to me in natural language. So the first step is similarity search (that's why we need vector search), and the second step is sending the result of the first step to the AI model and explaining to the model, using prompt engineering, what it needs to do with the data I'm providing, so it can give me the answer I need. So that's what we need to build, and we're building it with the help of some Azure services. Compressed into code, the two steps look roughly like this:
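A compressed sketch of the two steps, assuming a hypothetical dbo.find_sessions stored procedure that performs the vector similarity search (using the preview VECTOR_DISTANCE function internally) and returns the matching session text; the connection string, endpoint, and deployment names are illustrative:

```csharp
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Data.SqlClient;
using OpenAI.Chat;

string connectionString =
    Environment.GetEnvironmentVariable("AZURE_SQL_CONNECTION_STRING")!;

// Step 1: similarity search in Azure SQL to narrow down the sessions.
var sessions = new List<string>();
using (var conn = new SqlConnection(connectionString))
{
    await conn.OpenAsync();
    using var cmd = new SqlCommand("dbo.find_sessions", conn)
    {
        CommandType = System.Data.CommandType.StoredProcedure,
    };
    cmd.Parameters.AddWithValue("@topic", "SQL and AI");
    using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
        sessions.Add(reader.GetString(0)); // title + abstract of each match
}

// Step 2: send the filtered sessions to the model and ask in natural language.
var azureClient = new AzureOpenAIClient(
    new Uri("https://my-resource.openai.azure.com/"),
    new DefaultAzureCredential());
ChatClient chat = azureClient.GetChatClient("gpt-4");

ChatCompletion answer = chat.CompleteChat(
    new SystemChatMessage("Answer using only the provided conference sessions."),
    new UserChatMessage(
        $"Sessions:\n{string.Join("\n", sessions)}\n\n" +
        "Is there any session about SQL and AI today?"));

Console.WriteLine(answer.Content[0].Text);
```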
For example, we're using Azure Functions because, the moment someone writes a new session or updates the abstract or title of a session in my SQL database, I need to be notified immediately that the session has been added or updated, so that I can calculate the embeddings, get the vectors that represent my session, and store them back into SQL. All the AI models, the whole AI world, work on vectors and embeddings, so we need to convert our text into those vectors and embeddings. And again, here I'm using an Azure Function that is automatically triggered by Azure SQL as soon as a new title has been added, or a title or abstract has been updated. Then, of course, we'll be using Azure OpenAI, because my function takes the text that changed in my database and sends it to OpenAI so it can be converted into a vector and then stored back in SQL. And then in SQL I have a stored procedure, in this case, that I want to use to search for all the sessions that are on a specific day or related to a specific topic. Once I have that stored procedure ready, I want to expose it so it can be used by my front end. And since I'm building a full-stack application, and specifically a Jamstack application, my front end is only able to communicate with my back end using JSON and REST APIs; unfortunately, my stored procedure is just in SQL. So the problem I have to solve right now is: how do I expose that stored procedure (it could also be a table or a view) as a REST endpoint, or maybe even a GraphQL endpoint? Typically I would have to create the service and code everything by myself, but luckily today we have something called Data API Builder, which allows my database to be exposed as a REST endpoint or a GraphQL endpoint in just a second, using a configuration file, and that's it. And the beauty of Data API Builder is not only that it's open source and available for free, both on-premises and in Azure; in Azure it's also integrated with Static
Web Apps. So if I'm creating a Static Web App, I don't even have to worry about deploying Data API Builder and configuring it to connect to my database; it just happens automatically as part of my Static Web Apps solution, and that's exactly what we're going to use. And then of course my front end, in this case, is just a React front end, and it communicates with my back end, my Static Web Apps back end, using REST calls and JSON. So that's the application we're going to build. This whole solution is something you can actually use and deploy in Azure right now, because it's fully available on GitHub. It supports azd, so you can just do 'azd up' and everything is deployed for you, so you can try it out yourself. You don't need to use Azure if you don't have a subscription, because everything also works locally, and that's exactly what I'll be doing today, so you can even try it without Azure. The only exception would be Azure SQL: the private preview we're running, where you can use vectors as types and vector functions in SQL, is only available in Azure SQL at the moment, so you do need Azure SQL. But luckily for you there is a free tier for Azure SQL, completely free, forever free, that you can use, so you can really try out the example without having to worry about costs or money. Now, before going to the demo, let me show you how it works; and you can actually try it yourself right away, because this website, which I built for another conference where I was invited to talk, I was allowed to keep running, so I can use it as a kind of reference architecture. Here, let's say I want to learn about a session on security. I already loaded my database with the sessions of that conference, so maybe I want to ask: 'is there any session on security?' And even if I misspell it, thanks to the fact that the text is converted into an embedding, a vector, and then semantic search is done in Azure SQL using vector search, and then everything is sent to the AI model, the model is still able to understand my query, my question in this case, even if I didn't spell everything right. And yes, there is a session on security: the session is titled 'How to build secure...' and so on; you can read it for yourself, and you can see we have all the data here. Now imagine doing that on your own data.
the example I'm going to show you right now is of course we don't have the full day here so I'm going to show you the important part of the architecture so you can take the architecture and change it to run on your own data so let's dive into the example starting from the GitHub repository that you can reach out from the little GitHub icon here so that's the repository and that you can clone and that's of course what I already done on um on my machine and this is the full structure of my uh of
my example um the let's start from well the database uh you have all the script here I already deployed the interesting part I want to show you is that tables are super simple we have the session table and we are going to store the embeddings and T the vectors directly into SQL for now as a binary object and then we have the speaker table and of course here also we have the embedding and the vector Associated to the speakers and then we have a very simple relationship table that connects the session to the speaker who
are delivering it. It's a many-to-many relationship, so it's an associative table. That's it: just three tables, nothing complex. The only additional piece is the ability to store embeddings right into SQL. Once I have embeddings in my tables (I'll show you how they get there), I want to find sessions, say a session about SQL and security. So I have a stored procedure that takes my text and calls OpenAI using a get_embedding stored procedure, which you also have available so you can see how it's done. It's very easy: it just calls OpenAI through sp_invoke_external_rest_endpoint, a stored procedure that lets you call an external REST endpoint such as OpenAI, and it converts the text into an embedding, which is what you need to do vector search. Then we take our vector, stored in a binary format, and calculate the distance between the vector that represents the topic I'm interested in and all the embeddings already stored in my database that represent the sessions of my conference. I take the sessions closest to the topic I'm interested in and return the result to the user. And this is all one stored procedure; that's really the only thing I need to do.
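To make that concrete, here is a minimal sketch of what calling such a procedure looks like from .NET with Microsoft.Data.SqlClient. The procedure, parameter, and column names here (web.find_sessions, @text, title, abstract) are hypothetical stand-ins, not necessarily the repository's exact ones.

```csharp
using Microsoft.Data.SqlClient;

var connectionString = Environment.GetEnvironmentVariable("AZURE_SQL_CONNECTION_STRING");

await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

await using var command = new SqlCommand("web.find_sessions", connection)
{
    CommandType = System.Data.CommandType.StoredProcedure
};
// Inside the procedure, this text is converted to an embedding (via the
// get_embedding / sp_invoke_external_rest_endpoint call described above)
// and compared against the stored vectors.
command.Parameters.AddWithValue("@text", "SQL and security");

await using var reader = await command.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
    Console.WriteLine($"{reader["title"]}: {reader["abstract"]}");
}
```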
And since I have Data API Builder, it can turn that stored procedure into a REST endpoint with no code, just configuration. But before getting to how to expose it as a REST endpoint, let's play with this data a little. Right now my database is completely empty, so the first thing I want to do is add some data and make sure it gets converted into embeddings automatically. I have some sample data here, and you'll notice I'm just adding my text; I'm not worrying about adding a vector, because a vector is
a very long sequence of numbers that I wouldn't be able to type in here; I need an AI model to generate it for me, to take this content and turn it into a vector. To do that I have a function that uses something called a SQL trigger: as soon as there is a change in the web sessions table, the function fires, and we get information on which rows were changed, inserted, or deleted. We can then take that list of rows, and in this case I'm writing a LINQ query, so that for each changed, updated, or inserted row I take the title and the abstract, create one payload, and ask for it to be processed. Processing here just means calling the embeddings method on my OpenAI client, so I'm using the OpenAI SDK to call OpenAI and get the embeddings of my text, and then I'm saving the embeddings I get back into my database.
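A condensed sketch of that trigger-based pattern, assuming the isolated-worker SQL trigger extension (Microsoft.Azure.Functions.Worker.Extensions.Sql) and the Azure.AI.OpenAI 1.x SDK; the table, record, and setting names here are illustrative rather than the sample's exact ones.

```csharp
using Azure.AI.OpenAI;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Extensions.Sql;

public record Session(int Id, string Title, string Abstract);

public class SessionProcessor(OpenAIClient openAI)
{
    [Function("SessionProcessor")]
    public async Task Run(
        // Fires whenever rows in the monitored table are inserted, updated, or deleted.
        [SqlTrigger("[web].[sessions]", "AZURE_SQL_CONNECTION_STRING")]
        IReadOnlyList<SqlChange<Session>> changes)
    {
        // Build one payload per inserted/updated row (the LINQ step from the talk).
        var payloads = changes
            .Where(c => c.Operation != SqlChangeOperation.Delete)
            .Select(c => $"{c.Item.Title}: {c.Item.Abstract}");

        foreach (var text in payloads)
        {
            // Ask Azure OpenAI for the embedding of the row's text; "embeddings"
            // is an assumed deployment name for your embedding model.
            var result = await openAI.GetEmbeddingsAsync(
                new EmbeddingsOptions("embeddings", new[] { text }));
            ReadOnlyMemory<float> vector = result.Value.Data[0].Embedding;
            // ...then save the vector back to the row (omitted here; in the sample
            // this is an UPDATE against the sessions table).
        }
    }
}
```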
Let me show you how the function works behind the scenes. I'm using Static Web Apps, so I have the front end and the function already, and I just need to start everything with swa start. This starts the function and starts Data API Builder for me, and as soon as the function has started, it monitors my database for changes. So I'll add some sample data, and as soon as I add the session, it will be sent to OpenAI and converted to an embedding. It takes a few seconds to start, and, here we go, we have the trigger ready. So what I can do right now, after making sure everything is ready (it should be), is go ahead and add the speaker, the session, and the connection between speaker and session. That's it; now you'll see the processing starting: the function detected there is something
to be processed, so it connects to OpenAI and gets the embedding. As soon as that's done, I can go back to my table, and even though I didn't add any embedding myself, when I query my session or speaker table, you see I now have an embedding: the vector, saved in a binary format. So I can do vector search using the VECTOR_DISTANCE function we just added to Azure SQL in private preview to find the sessions closest to some topic. Let me add another session; this one also goes through the whole process of getting the embeddings, and you'll see that again something new is detected, it connects to OpenAI, gets the embedding, and in just a few seconds my session is converted to a vector. Now that my sessions are converted to vectors, I can go to the local website that's been created for me, localhost:4280, and this is
the application I'm running locally, my cool conference RAG sample. If I click on About, it automatically recognizes that there are two sessions indexed. Why? Because if I open the Data API Builder running behind the scenes (again, it runs automatically because it's integrated with Static Web Apps), you see that my database is exposed, and I decided to expose the session table, for example, as GraphQL and REST endpoints. So it's very easy for my front-end application to query the table, or query my database, using regular REST and GraphQL calls: for example, I can run a GraphQL query, or I can execute the stored procedure using the find endpoint. This all comes for free because Static Web Apps has Data API Builder integration, which means I can just focus on my front end, my website; I don't need anything else. So let me go back, and now I can ask a question like: is there any session to learn how to use AI on my own data? What's happening here is
interesting: it's hallucinating a little bit this time, but basically it uses the data I have in my database, runs the query, and returns an answer based on that data. How does this happen? The Ask button calls this specific function, the chat handler, where I'm doing a couple of interesting things, and here you can see the RAG pattern at work. First of all I retrieve the similar sessions: in this case I execute the web find session procedure directly, which makes it easier in this demo to show what's happening, and I get the result of find session for the topic I'm searching for, like security. Then I send the result of the query to my AI model, concatenating the rows by taking the title, the abstract, and the speaker and separating them with a pipe. Why am I doing this? Because I'm doing some prompt engineering: I'm telling my AI model, look, you are a system assistant who helps users find the right session to watch at the conference, and you know which sessions are available because I'm providing them to you in this format. So I'm providing the instructions on how the model should behave, I'm providing the list of sessions to the AI model in that format, and then of course I'm also providing the query that the user typed. The model gets the instructions, the context, and the question from the user, and it can answer by telling us which sessions are about the topic we asked for.
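Put together, the prompt-assembly step looks roughly like this with the Azure.AI.OpenAI 1.x chat API; the prompt wording and the deployment name are illustrative, not the sample's exact strings.

```csharp
using Azure.AI.OpenAI;

async Task<string> AskAsync(OpenAIClient openAI, string userQuestion,
                            IEnumerable<string> foundSessions)
{
    // Each retrieved session is already flattened to "title|abstract|speaker".
    var context = string.Join("\n", foundSessions);

    var options = new ChatCompletionsOptions
    {
        DeploymentName = "gpt-4",
        Messages =
        {
            // 1) How the model should behave, plus the grounding context...
            new ChatRequestSystemMessage(
                "You are a system assistant who helps users find the right session " +
                "to watch from the conference. The available sessions are listed " +
                "below in the format title|abstract|speaker:\n" + context),
            // 2) ...and the question the user actually typed.
            new ChatRequestUserMessage(userQuestion)
        }
    };

    var response = await openAI.GetChatCompletionsAsync(options);
    return response.Value.Choices[0].Message.Content;
}
```

That is the whole RAG pattern in miniature: retrieve, inject as context, then ask.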
With all of that in place, let's go back to the demo. Another question could be: is there any session by David, for example? It scans the sessions, doing a vector search using this text, and, here we go, it gets the sessions back after scanning the database with similarity search; then the AI model actually understands the question and answers in natural language. That answer is all coming from the AI model. With these few pieces you can very easily build this as a Static Web Apps solution. Here is the client; I'm not going into it because it's just a regular React client, not super interesting. The most interesting part is that, thanks to the database being exposed as a REST endpoint,
as I showed before, calling the stored procedure from the front end is as easy as doing a simple POST using fetch in JavaScript. That's what makes Data API Builder so nice: it takes your database and exposes it as a REST endpoint, so I can easily use it from JavaScript or TypeScript without worrying about drivers or anything else. And how do I expose the stored procedure? That's the last thing I want to show you: it's this swa-db-connections folder. When you configure Static Web Apps to use Data API Builder, you need this configuration file, which is created for you, and in it you specify that you want your stored procedure (not this one, this one) to be exposed at the /find subpath. Anything sent there is routed to Data API Builder, which happens automatically in Static Web Apps: that's why, if I go back to the client, you see the call go to /data-api/rest/find. /data-api is how Static Web Apps knows it has to route the request to Data API Builder, /rest because we're using the REST endpoint, and /find because that's where my stored procedure has been published.
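For reference, the stored-procedure entry in that configuration file looks roughly like this (heavily abbreviated; the entity name and database object are illustrative, and the real file also contains data-source and runtime sections):

```json
{
  "entities": {
    "find-session": {
      "source": {
        "object": "web.find_sessions",
        "type": "stored-procedure"
      },
      "rest": { "path": "/find" },
      "permissions": [
        { "role": "anonymous", "actions": [ "execute" ] }
      ]
    }
  }
}
```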
So this is super easy: I can have my stored procedure published as an endpoint and then just focus on the front end. The front end, as I was saying, is in the client folder and is all React; the database is in the database folder; the function that monitors the table, takes the title and the abstract, and turns them into embeddings is in the session processor; the chat handler is where the RAG pattern actually happens: you first retrieve the similar sessions, explain how the model should behave with the system message, add the user prompt, send the data to the AI model, get the result, and send it back as a JSON payload that the front end can consume; and then the last part
is the swa-db-connections folder, where you configure the database to be exposed as REST and GraphQL endpoints. That's it; that's how you build a RAG solution with Static Web Apps and SQL. If you're interested in using vector search in Azure SQL right away, this is the website to go to in order to request to be part of the early adopter preview. As you saw, we're adding the ability to store vectors in a binary format, but most importantly you get the option to calculate the distance between vectors, which is the key to doing similarity search, using the new VECTOR_DISTANCE function: it calculates the distance between two vectors using cosine distance, Euclidean distance, or dot product, which are the most common ways to measure distance between vectors.
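To make the shape of that query concrete, here is roughly what the similarity search reads like, carried here as a T-SQL string the way the .NET samples would hold it. The table and column names are illustrative, and preview syntax is subject to change.

```csharp
// @qv would be the binary embedding of the topic the user asked about,
// produced by the get_embedding stored procedure shown earlier.
const string similarityQuery = """
    SELECT TOP (10)
        s.id,
        s.title,
        VECTOR_DISTANCE('cosine', s.embedding, @qv) AS distance
    FROM web.sessions AS s
    ORDER BY distance;   -- smallest cosine distance = most similar session
    """;
```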
With that said, my recommendation is to go to the website and the GitHub repository, download it, run it on your machine, and understand how the architecture is put together; everything is well explained there, with all the details you need. Then run it with your own data and see where you can go from there; you'll find a lot of interesting options and a lot of fun. Thanks a lot for listening; see you next time.

Hey everyone, today we're going to talk about building generative AI apps with your own data using Azure Cosmos DB. I'm James Codella, a product manager on Azure Cosmos DB working on all of our AI features, so let's get into it. We all know Azure Cosmos DB is one of the world's most scaled AI databases. We have a serverless model that lets you start really cost-effectively, but also a provisioned throughput model that lets you take advantage of things like our autoscale capability. Cosmos DB features automatic sharding and partitioning of your data, so you don't have to worry about your data scaling to very large sizes; we take care of that sharding for you automatically. It's also highly elastic, with instant autoscale: Cosmos DB scales out as your demand increases and scales back in as it decreases. It features low latency, with real-time data transactions under 10 milliseconds for point reads and writes; high availability, where you can maintain reliability of five nines, which is really world leading, world class; and it offers five levels of multi-tenancy built into the database, everything from logical separation of your tenants or users at the logical partition level all the way up to separate containers, databases, and resources. More recently we've built vector indexing and search capabilities into Azure Cosmos DB, which we'll talk
about more in depth, with a demo, today. This features DiskANN, a new state-of-the-art suite of algorithms developed by Microsoft Research and implemented for the first time in a Microsoft product that you can use yourself. Before we get into the demos, let's talk a little about common AI scenarios with Azure Cosmos DB for NoSQL. First is chat history: if you're building a chatbot, or using a large language model to build an intelligent agent, you might want to keep track of the chat history between your users and your AI models. You can do this for multiple reasons, which we'll get into in a bit, and we have thousands of customers doing this today with Azure Cosmos DB. Next is RAG, or retrieval augmented generation: this is all about taking the most relevant data from your database and using it to ground, or personalize, a large language model or a small language model, so it becomes contextually aware of your scenario and your data. Next, we have thousands of customers building multi-tenant apps in Cosmos DB; I talked about how multi-tenancy is built into Azure Cosmos DB, and a lot of customers take advantage of that today in AI-based scenarios. We also have customers building real-time recommendation systems, anomaly detection systems, and multi-agent AI systems; these are really exciting things our customers are developing and continuing to push to production. And of course we have many customers building all these sorts of AI applications on Cosmos DB; maybe most prominently, OpenAI uses Azure Cosmos DB as the chat history store for ChatGPT, so the conversations you've had with ChatGPT are actually stored in Azure Cosmos DB, because of its really low latency for point reads and queries, as well as its scalability. So there's lots of excitement around Azure Cosmos DB for AI apps, and a lot of use cases; we'll dig into the top three here: chat history, RAG, and multi-tenant apps. With that in mind, let's talk about the use cases for Azure
Cosmos DB in generative AI scenarios. First is the idea of a combined vector and operational database. I mentioned that we recently added vector indexing and search to Azure Cosmos DB, and this is really exciting: it gives you the opportunity to store your operational data and your vector data together, in one data store, in one data item. It removes the need to ETL your vector data from a database to a dedicated vector database: you store your vectors and your data together, which gives you highly consistent data and reduces the complexity and cost of the AI application infrastructure built on top of this vector and operational data store. On top of it we can do things like retrieval augmented generation, bringing the most relevant data from your database to ground your large language or small language models. This lets you personalize your models on your data; it's cheaper than fine-tuning and lets you quickly iterate on new scenarios, and you
can do this today in Azure Cosmos DB thanks to our new vector search capabilities, which we'll talk about in a little bit. Additionally, chat history: I mentioned that thousands of customers are doing this today in Cosmos DB, and they're doing it for multiple reasons. One is to maintain conversational context: you want your chatbot or your intelligent agent to remember what was talked about in a previous session, a previous day, or even last week. But you can also do some mining on the chat history data to identify where your large language model performs well and where it needs a little help; maybe you need some fine-tuning, some prompt engineering, or other improvements. Chat history is also used quite often for auditing: if you've ever called a customer support line, you may have heard "this call is recorded for quality purposes"; we do this for human-to-human interactions, so it only makes sense that we might want to do it for human-to-AI interactions as well. And
then there's the notion of something called semantic caching. The idea of semantic caching is: I'm already storing chat history in Cosmos DB, so maybe I can take advantage of that in my chat scenario. Imagine my chat history consists of a user question and then a response from my large language model. If I see a question multiple times, maybe I can leverage a historical response from my large language model that I'm storing in my database, and use that instead of making a call to my large language model API. So the idea is: I take the user question and vectorize it, creating a vector embedding of it; then, when a new user question comes in, I run a vector similarity search in my database over all my historical data, and if I've seen a very similar question before, I can take the historical response I saved in my database from the large language model and surface that to the user instead.
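A minimal sketch of that cache lookup, assuming the Cosmos DB NoSQL vector search preview and a cache container whose items store the prompt's embedding in a vectors property and the model's reply in a completion property (illustrative names, not necessarily the sample's schema):

```csharp
using Microsoft.Azure.Cosmos;

public static async Task<string?> GetCachedCompletionAsync(
    Container cache, float[] questionEmbedding)
{
    // ORDER BY VectorDistance(...) returns the most similar cached item first.
    var query = new QueryDefinition(
        "SELECT TOP 1 c.completion, VectorDistance(c.vectors, @embedding) AS similarity " +
        "FROM c ORDER BY VectorDistance(c.vectors, @embedding)")
        .WithParameter("@embedding", questionEmbedding);

    var feed = cache.GetItemQueryIterator<dynamic>(query);
    while (feed.HasMoreResults)
    {
        foreach (var hit in await feed.ReadNextAsync())
        {
            // Only treat it as a hit if the cached question is "similar enough".
            if ((double)hit.similarity > 0.99)
                return (string)hit.completion;
        }
    }
    return null; // cache miss: the caller invokes the large language model instead
}
```

The threshold is the knob here: set it too low and users get answers to questions they didn't quite ask; set it too high and the cache rarely hits.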
The idea is that this can save you cost, because calling a large language model like a GPT-4 model can be very expensive, and it also has high latency: it sometimes takes multiple seconds to get a response from a large language model, whereas executing a vector search in Cosmos DB can take tens or hundreds of milliseconds. So it's really, really fast, and a really nice feature. Those are some of the common use cases for generative AI using Azure Cosmos DB. I mentioned that we launched a couple of new vector indexing and search capabilities in Cosmos DB for NoSQL, primarily in three areas. First, native vector indexing and search, in public preview: by native I mean it's actually part of the core database engine itself; it's not a separate service that sits outside the database, not another thing you need to provision; it's just part of the Cosmos DB engine and database you get, and you can store your vectors alongside your original data. Next, we announced a DiskANN-based vector index: DiskANN is an advanced vector index that lets you do highly scalable, efficient, and cost-effective vector search at any scale, and it's in gated preview today. And we launched AI framework integrations with Semantic Kernel and LangChain, with more integrations to come. Let's talk really quickly about Cosmos DB for NoSQL with the DiskANN-based index. The idea behind DiskANN is actually relatively simple:
you start with your vector embeddings and insert them into your Cosmos DB database. On the back end, we take your vectors and put them through a compression method called quantization, which scales down the size of these vectors so they're more efficient to store in RAM. The full uncompressed vectors, along with the DiskANN graph data structure, are stored on the high-speed SSDs that make up the backbone of Azure Cosmos DB. During search, the interaction between the compressed vectors stored in RAM and the uncompressed vectors stored on SSD lets the DiskANN index scale to huge sizes very efficiently: we use very little compute and RAM, because RAM is quite a bit more expensive than SSD, and we leverage the scale-out architecture of Cosmos DB and its high-speed SSDs to store the full vectors and the graph there. And DiskANN, as I mentioned, is a state-of-the-art suite of vector indexing algorithms developed at Microsoft Research, and we're bringing it to you for the first time in a Microsoft product. You can take advantage of everything Cosmos DB is built on, the unlimited scale and low latency, and DiskANN itself is robust to data changes; it works with both our serverless model and our provisioned throughput model. So we're really excited about the potential for high-scale, high-efficiency, high-accuracy search with Azure Cosmos DB and DiskANN. All right, I've talked a lot already; let's get into some demos. What we're going to do today is walk through some code
for building an assistant powered by your data in Azure Cosmos DB. We're going to build a highly scalable, multi-tenant generative AI chat app using Azure Cosmos DB for NoSQL, and we'll implement a RAG pattern using vector search in Azure Cosmos DB for NoSQL on your data. We'll use Azure Cosmos DB for three things primarily: one, to manage chat history; two, to store and retrieve data for RAG with vector search; and three, for the semantic cache we talked about as well. There's a great GitHub repository, which I'll show you in a minute, with a great template you can use: we'll store things in Azure Cosmos DB, we'll leverage Azure App Service to deploy our application, and we'll also leverage the Azure OpenAI Service, both for creating our vector embeddings and for generating our completions, the responses from large language models. All right, cool, let me switch over here. This is the GitHub repository; I have some links at the end
of this presentation, so you can feel free to hold off, or just go and check it out right now. This is the demo we're going to walk through: we're building essentially a copilot-like service with Azure Cosmos DB, and we'll walk through everything we talked about, how to build a highly scalable, multi-tenant, multi-user generative AI application using Azure Cosmos DB for NoSQL. It works with the main components we covered, and it also has a really nice user experience, built using Blazor. We're going to deploy this Blazor app so you can access it through the web, and you'll be able to ask questions about an imaginary bike store, the Cosmic Works bike store. In this example it's answering general trivia questions, but we'll also be able to ask questions that are specific to data we're storing in Cosmos DB. So let's take a look. Actually, let me go back a minute: this is really easy to get started with. The repo walks you through the prerequisites, how to sign up for the vector search preview, how to configure your Azure OpenAI resource, and so on, and then there are instructions for getting started really easily: you can use azd to initialize and download the code from GitHub, log in with azd, then do azd up to deploy the resources, and they'll all be connected for you in Azure. So it's a really, really easy way
to get started with this demo quickly. Once you do that, it deploys a resource group for you; here's my sample resource group, and I can see all the resources I've deployed: my Azure Cosmos DB account, my Azure OpenAI resource, my App Service plan, and my managed identity, so I'm not passing keys around; this is really secure and a great example of how to build a secure application. And I'm using Azure App Service to actually orchestrate my application and run my Blazor app. Here's my web app, and I can go ahead and browse it, so let me click on this. Here I see the web interface; let me zoom in a little so it's more readable. There we go. I don't have any chats in here yet, so we'll create a
new chat session; let's go ahead and do that: create new chat session. Cool. Maybe we'll say something like: hello, my name is James, can you tell me about bikes that you have? I just submitted my prompt to Azure OpenAI, and the app orchestrated everything: it looked at the chat history (I have no chat history, so there was nothing to pull), and it sent my prompt to Azure OpenAI to generate a completion based on the parameters we set for our chatbot. It says: hello James, I don't sell bikes, but I can provide more information about different types of bikes, and it gives me some information about different types of bikes. Then maybe I say: I live in New York City, can you recommend some bikes that might be good for my commute? Let's see what it comes up with. Cool; it says: certainly, for commuting in New York City you want a bike that's durable, and it recommends some bikes. It was able to understand that I live in a city and want to commute because it creates vector embeddings from my prompt, my message to the large language model, then executes a vector search in Azure Cosmos DB to find the bikes, the products, most relevant to my question. Really cool, really powerful stuff. And I can see that I've created a multi-tenant app here, because I can create a new chat session, and that session is completely brand new: it has no idea what my name is. I can say: hello, what's my name? And it doesn't know my name; it doesn't have access to my personal information. The session I just started in this new chat is completely separate, completely isolated, from the bike recommendation session. I can continue: tell me more about electric bikes. And now it gives me a summary of electric bikes and why I might want to use them;
some of that information comes from the vector search results returned by Cosmos DB, and some of it comes from the knowledge in the large language model itself. Now watch this: I'll copy and paste a prompt I've already entered. Let me enter this again: I live in New York City, can you recommend some bikes that might be good for my commute? Cool. Did you see how fast that was? Near instant. What we can see here is: when I asked this question originally, we stored my question, the vector embedding that represents my question, and the response from the large language model in Azure Cosmos DB. So when I asked it again, we first ran a vector search on the chat history data in Azure Cosmos DB to see whether this question had been asked previously, and since we had already logged it in Cosmos DB, we were able to find it and surface the answer the chatbot already gave me earlier; we leveraged the historical information. And you can see in the token counter up here that returning this piece of information from the database cost zero tokens: we didn't have to call our large language model at all; all we did was execute a vector search to return the most relevant response from the large language model that we'd captured historically. So this is an example of a semantic cache: it saves me money, because I didn't have to call the large language model and executing a vector search is very cheap, and it saves latency, since the response from the assistant was near instant. Really powerful stuff. Go ahead and take a look at the GitHub repository and play around with it, but let's look at the code real quickly to see what's going on behind the scenes. Okay, I have my Visual Studio set up here, and here's
basically what the Solution Explorer looks like for all these services: a bunch of different folders, a bunch of different files. We'll go through a few of the main ones. In the root directory there's an azure.yaml file that contains the connection info for my resources: my Azure OpenAI endpoints (I'm using Semantic Kernel to orchestrate the calls to my embedding model and to my completion model, my GPT model), my Cosmos DB endpoint information, and some of my chat parameters. These all become environment variables I can set on my web app, which makes it really easy to customize and personalize these things once everything is deployed. The main program builds the application and all the services: it makes sure we build the resources and register my Azure
Cosmos DB service, and that we create the database and the containers for managing my chat history, my semantic cache, and all my product data; all of that orchestration, including creating the Semantic Kernel service, lives in the Program.cs file. Next we have an OpenAiService.cs file, which handles all the connections and interactions with my Azure OpenAI services. You can see I'm defining some strings for my completion model (my GPT model) and my embedding model, but I'm also defining things like my system prompts, the instructions I give the large language model so it knows how to behave and how to respond in my scenario. There's a general system prompt ("you are an AI assistant, please provide concise answers"), but you can see I also have a retail-assistant system prompt: you are an intelligent assistant for the Cosmic Works Bike Company, and this is the one
I just used; I want this retail assistant to be personalized to my specific retail shop, so I provide that information here as well. This OpenAiService contains all the endpoint information, and even things like the parameters for my large language model; go take a look, it's a great example of how you might configure these resources in your own application. And of course there's a helper function for getting the vector embeddings: if I ask a question and want to run a vector similarity search to find the products in my database most similar to my question, I first have to create a vector embedding of my question. I can do that using the Azure OpenAI service, and this code shows a nice helper function for doing exactly that.
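That helper boils down to a few lines with the Azure.AI.OpenAI 1.x SDK; the deployment name below is an assumption, whatever your embedding model deployment is called in Azure OpenAI:

```csharp
using Azure.AI.OpenAI;

public static async Task<float[]> GetEmbeddingAsync(OpenAIClient client, string text)
{
    var response = await client.GetEmbeddingsAsync(
        new EmbeddingsOptions("text-embedding-ada-002", new[] { text }));

    // One input string in, one embedding vector out.
    return response.Value.Data[0].Embedding.ToArray();
}
```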
Next we have a CosmosDbService.cs file, which orchestrates all of the operations we want to perform in Azure Cosmos DB: getting the database, the chat container, the cache container, and the product container with my product information, so basically the connections to all the containers, or collections, in Cosmos DB. It also orchestrates loading the products: the GitHub repository includes sample data containing product information and vector embeddings, and the LoadProductDataAsync function loads it into Cosmos DB for you automatically. There's other useful code too, like defining a partition key: we use a PartitionKeyBuilder class to build a partition key based on tenant ID, user ID, and session ID. If you recall, back in our chat application we have these different sessions, so I can maintain session isolation: what I'm talking about in one chat session doesn't influence another. But I can also switch between different users; I have some examples here,
all PMs on the Cosmos DB team: Mark, James (me), and our friends Sandeep and Saji. So you can experiment not only with different chat sessions but with different users too. Watch this: I just switched users, so now this is James; I was on the Mark user before, and now I see there are no chats available, because I didn't start a chat as James. If I go back to Mark, I can see the sessions created under the Mark user. This is really cool and really powerful: I'm using Azure Cosmos DB's partition key to get logical separation between not just my sessions but my users as well.
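That isolation comes from a hierarchical partition key; a sketch of the idea with the SDK's PartitionKeyBuilder (the property order and values here are illustrative):

```csharp
using Microsoft.Azure.Cosmos;

// Tenant, user, and session each become one level of the key, which is what
// gives you the per-user and per-session isolation shown in the demo.
PartitionKey BuildChatPartitionKey(string tenantId, string userId, string sessionId) =>
    new PartitionKeyBuilder()
        .Add(tenantId)
        .Add(userId)
        .Add(sessionId)
        .Build();

// e.g. reading one session's messages only touches that logical partition:
// var pk = BuildChatPartitionKey("contoso", "mark", "session-42");
```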
There are some examples in the repo of how to get started, and some other helper functions that fetch partition key information or session information; there's an example query for getting the most recent messages from a session; and if we scroll down far enough, here's an example of a vector search in Cosmos DB. It's just a query (actually a query with a subquery): I select my top results and project a few pieces of information about my data, say the category name, SKU, name, and description of the products, and then I execute the vector search using the VectorDistance system function, doing a vector similarity search between my stored vectors and my query vector. The vectors property is the vector representation of my products, and @vectors is a parameter holding my query vector: if I say, hey, tell me about bikes that are useful for me, I live in New York City, we vectorize that and pass it in as the @vectors parameter. So you can see a nice example of executing a vector search in Azure Cosmos DB for NoSQL, which is really easy and straightforward.
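Here's that query shape sketched with the .NET SDK; the container and property names are illustrative, not necessarily the sample's exact ones:

```csharp
using Microsoft.Azure.Cosmos;

public record ProductHit(string categoryName, string sku, string name,
                         string description, double similarityScore);

public static async Task<List<ProductHit>> FindSimilarProductsAsync(
    Container products, float[] questionEmbedding)
{
    // @vectors carries the embedding of the user's question; ORDER BY
    // VectorDistance(...) ranks the most similar products first.
    var query = new QueryDefinition(
        "SELECT TOP 10 c.categoryName, c.sku, c.name, c.description, " +
        "VectorDistance(c.vectors, @vectors) AS similarityScore " +
        "FROM c ORDER BY VectorDistance(c.vectors, @vectors)")
        .WithParameter("@vectors", questionEmbedding);

    var results = new List<ProductHit>();
    var feed = products.GetItemQueryIterator<ProductHit>(query);
    while (feed.HasMoreResults)
        results.AddRange(await feed.ReadNextAsync());
    return results;
}
```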
Next we have a ChatService.cs file, which orchestrates the interactions between the database, Semantic Kernel, and your Azure OpenAI service: making sure I get all the chat messages from a session, and that I feed the most recent chat session information to the large language model for contextual awareness. The GetCacheAsync method is the search against the semantic cache: if I go to its definition, I can see it runs a vector similarity search over all my cached conversations, that is, all the prompts users have sent to the large language model and all of the model's responses. It queries the semantic cache to see whether my question, or one very similar to it, has been asked before; if it hasn't, the cached response comes back empty, so I call the model, cache the new response, and return the chat message. This gives you a really nice, comprehensive walkthrough: maintaining session and user information, leveraging the semantic cache in Azure Cosmos DB, running a vector similarity search over my product data stored in Azure Cosmos DB, and feeding all of that to the large language model so it can generate a response that's personalized
to my data and to my user's question. And then there's a SemanticKernelService file that orchestrates everything we just saw using Semantic Kernel: generating the embeddings and generating the completions from Azure OpenAI. That's about it for the code; there are some other configuration files in here, and you're more than welcome to clone the repository and walk through it yourself. The last thing I'll show you is what this looks like in Azure Cosmos DB itself, so let me close this and start
from scratch. What I have here is my Cosmos DB resource, the one that got provisioned and loaded with data by my azd up command, using the copilot demo code we've provided for you. I can see it's instantiated as the cosmos copilot database, and I have multiple different collections. If I take a look at the product catalog and go to Items, I can see a bunch of different items for my bike store: an ID, a category name, a name, a description (this one is a seat, or saddle), and an associated price. And if I scroll down I can see the vectors: these are vector embeddings created using one of the Azure OpenAI embedding models, and they're just a list, an array, of floating point numbers. They can be stored as just another property in Azure Cosmos DB, and that's part of the value proposition we're talking about: I don't need to store my vectors in a dedicated vector database somewhere else, I don't need to maintain another service; I can store everything in Cosmos DB, with my vectors as just another property of my data items. Really powerful stuff. Going through more examples I can see other items as well; here's a bicycle, a
touring bike, and I can see its vectors too. I can also see the other collections, or containers, that were created for me. Here's my chat container, which holds all the chat history: I can see that I created a session using the Mark user, and, remember, at the beginning I entered "hello, my name is James, can you tell me about some bikes that you have"; here's that prompt, stored in Cosmos DB (all the code to do this is in the GitHub repository). I can see the completion, what the large language model, my Azure OpenAI GPT model, responded to me, plus some other statistics about usage. So I can see my chat history, and I can even see the semantic cache information stored in the cache container. For the ID associated with a chat history item, I can see the vector embeddings that describe it, the relevant information, and the vectors that represent it. Here's an example: this is the item from when I said "I live in New York City, can you recommend some bikes that are good for my commute", and this is what the large language model responded.
All of that is stored in Cosmos DB, so if I see this question again in my chat app, I can just do a vector search against this vector and the user prompt, find this information (because the question has already been asked), and reuse it. If we go back to the chat app, we can see "certainly, for commuting in New York City...", which is exactly what was stored here: we're just pulling that from Cosmos DB and leveraging what we've already seen. So we've walked through some great things today: a little about Azure Cosmos DB, some of our vector search scenarios and capabilities, and finally a GitHub repository with information and code to get you started with this example. Before we leave, let me offer a few resources and next-step links. First, learn more about Azure Cosmos DB for NoSQL vector search with the overview page in our documentation. Next, if you want to use our DiskANN-based, highly scalable, highly efficient vector index, you can sign up for the gated preview by following this link. Next is our announcement blog post, where you can get even more information about Azure Cosmos DB and DiskANN. Finally, check out our Cosmos AI samples repository, which links to all the Azure Cosmos DB AI samples our product team has created; they can get you started in .NET, and we even have some Python examples for those of you who
might be inclined to take a look at those. And if you want to take a look at today's demo, you can follow this final link, aka.ms/cosmosdb-nosql-copilot, and get started really easily. That's it for today; thanks for joining me to learn more about Azure Cosmos DB and our AI capabilities.

All right, I feel like I got my database filled in there, between SQL Server and Cosmos DB; really cool stuff, thank you so much Davide and James. Now I've got some more content coming up for you:
I've got two more sessions, but first I want to make sure you know about our event survey, so you can let us know what you think of today's event. Head on over to the aka.ms netfocus AI evaluation link; it's not just about the event but also about what you think of using AI with your .NET apps. Click through that link or take a picture of the QR code and come on board, so you can join us in talking about what we can do to improve our next event and what we can do with AI in .NET. Now, there's a whole lot going on, and I want to make sure you have opportunities to learn more, so we have an AI challenge and some live streams coming up over the next month, with a bunch of hosts lined up: aka.ms netfocus AI credential challenge, live right on through September 20th. There's content available, there are challenges and tasks for you to check out, and we've got live stream hosts throughout the month talking about all the great things you can do with artificial intelligence and .NET. Okay, and of course today's event
is being recorded and it's on YouTube. We've got some folks behind the scenes, our editing team, putting together a playlist that will be released at the end of the day, so you can go back, review, and share the things you've learned and enjoyed with your colleagues and friends, and you can all do so much more with .NET and artificial intelligence. Now, coming up, I've got two more sessions for you. The first is from Tim Spann, who's going to talk about integrating semantic search capabilities with .NET and Azure using Milvus; that's a vector database you can use to work with your artificial intelligence data. After that, we've got a session from somebody who's actually building and using some of these technologies, with an application deployed and running out there in public: Vin Kamat, joining us from H&R Block, talking about lessons learned building and deploying generative AI tech using .NET and Azure. All right Tim, the stage is
yours.

Hi, Timothy Spann here. Today I'm going to be talking about the Milvus vector database: integrating semantic search capabilities with .NET and Azure. I'm the principal developer advocate at Zilliz; we're the people behind the open source Milvus database. Today we're going to move pretty quickly through a few things related to vector databases and how you can use them with .NET and Azure, with a little background and some insights. First off, why would you use a vector database? There's obviously a ton of different places to store your data, a ton of different databases, so what's special about vector databases? Well, in today's world there's a ton of data, and more and more of it is unstructured data. Unstructured data is the new type of data you see a lot in AI: images, video, text, audio; data that's usually created by people rather than by a computer, and a little different from the traditional data sources. Vector databases were created to work with this type of data, so they work really well with it; they're not designed for traditional relational data or the other data you're used to, and that's why they came about. What made them possible, now and not, say, 20 years ago, is a number of very recent innovations. Some of them are the massive storage of documents, images, video, all the things we have on the internet, a ton of different
data sources; and then the thing that really drove it, that lets us build these vector databases, is the ability to create vectors. That requires deep learning models, a lot of them very recent, from the last couple of years, that let us create large vectors from data. A vector is really just a big array, massive arrays of numbers: the type of data you don't want to look at as a person, but AI models love it, and it also works out to be a really good storage format for all of these unstructured data types: images, video, text, what have you. Fortunately there are modern vector databases that run in Azure and in your other environments, and that's Milvus. Milvus is an open source project that's been around for a number of years. Some of its benefits: it's really easy to set up; the code is nice because I can write it once and run it wherever I need it; it's integrated with all the different tools, systems, platforms, other databases and data stores, all the projects you might want to use, things like LangChain and LlamaIndex; and it has all the features you need: dense embeddings, sparse embeddings, filtering, reranking, all the things you may be reading about for vector search. There are a lot of users of it, it's been around for a while, it has a large number of people working on the source code in the open, and it's part of the Linux Foundation for AI and Data, which means there's a community of people working on it, all together. Now, what might you use Milvus for? It's designed to store, index, and query vector data, for any kind of unstructured data, at enormous scale, and it has features like advanced filtering and hybrid search, so you can search on a vector plus something else: maybe an image plus a description of it, or other fields related to date or time, whatever more structured fields you add alongside your vector. Milvus
we're all working on this together now what might you want to use milis for now it's been designed to store index query all these uh Vector data you know for any kind of unstructured data at an enormous scale and it has features like Advanced filtering hybrid search so you could search on a vector plus something else maybe uh search on an image plus a description of it or some other fields you might have related to uh date or time whatever whatever other kind of more structured fields that you might add to your uh Vector there
uh is very durable it has backups all the features you expect to have in a Enterprise database replication High availability ability for data to be sharded among different uh servers being able to aggregate data full life cycle management you know the standard uh create replace update delete all that is supported support for multi- tendency so you could run in Azure in a cloud environment with a lot of users a lot of different people querying adding data able to do massive amounts of queries because if you're going to hook this up to say a system for
AI chat there could be a ton of users out there being able to insert and delete massive amounts of data getting the full precision and recall that you need and support for all the Advanced Hardware that we see out there things that uh accelerate it like different gpus like from Nvidia uh fpga you know custom Hardware that may be in your aure environment and being able to scale up to billions for as much as you need to store which is important as you start looking at the number of images video text Data you know whether
it's little chat or documents quickly those go into the millions and more so it'd be nice if we just had that M us would be great but we've as We've Come Through The Years A lot of people have enhanced the open source to add things that make it more robust flexible to fit almost all the use cases and all the different types of data is not there isn't one size fits all you know there's a lot of different indexes out there for uh Vector search and we support a lot of them right now it's over
15 and this is things that you'll you're looking up about Vector databases you'll see them things like binary spars uh dis based ones GPU based indexes ones that run in me uh in Ram ones that run on different type of uh analytics of the data lots of options there this lets you optimize how you want to search and balance things off like cost versus performance being able to tune how accurate the thing with unstructured data when you're searching it you could set a percentage of accuracy that you want for your matching and then how you're
going to search there's a lot of different ways to search Vector data and we support the majority of of them whether you're doing the top K number of returns or range you want sparse or dense are you looking you want to search across multiple vectors at once you want to group data together do you want to use the metadata and extra fields that are in uh the collection to filter them down you know you might have two of the exact image but they have different metadata so you want to be able to return just those
um being able to have like we mentioned before multi so multiple users multiple applications maybe multiple companies being able to go through the same collections at the same time but with things partition for security for efficiency uh just for keeping your data separate if your different groups and with all the different Hardware out there as we've seen in Azure you got a lot of choices when you're picking what you're putting out for your uh platform we support almost all the ones that will improve the speed and the uh Power of uh the vector search whether
it's quantization, cache-aware optimization, or GPUs, which we know are great for a lot of things. This speeds up your queries, speeds up inserts, reduces your costs, and makes things more scalable; all that flexibility is there, and again, it's all open source, so there's always the ability for more to be added by different groups. Now, being able to search across a lot of different unstructured data leads to a lot of cool use cases. The most common one, which pops up a lot when you're working with the different LLMs out there, is retrieval augmented generation: you get exactly the data you need to optimize your prompts and tell the model exactly what it should be looking at, for example a list of documents relevant to your question. What's nice is that Milvus is extremely fast, so I do my search, find whatever I need semantically, get those results back, and very quickly get them to the language model; that's a really important use case. Recommender systems are very common too; you don't need AI for those, just the ability to match things semantically against your different data sources. There's plain text search: you might have a huge number of documents stored and just want to find things with the right context and the right meaning; is Apple a fruit, is it software, what is it? That's part of what Milvus gives you. There's finding similar images; for me, I'm always searching for cats that look like my cats. You can do the same with video and with audio: say I want to find someone whose voice sounds like mine; I'm probably going to find a lot of people from New Jersey. I've also seen a lot of recent research in finding different types of molecules, where you can rotate a molecule into the orientation you want and use that to search; pretty helpful when you're researching new medications, powerful stuff there. You can find anomalies: if things are supposed to be similar and they're not, you can find what's wrong, across lots of different data points, whether it's logs or images; there are lots of ways to find faults in equipment or in different systems out there. And this one is pretty cool: searching multiple types of data at the same time. I can have an image, a video, audio, and text, search one way, and get results back. What's cool with vectors is that you can search however you want: use a keyword, a text string, send in an image, upload a video and say "find stuff that matches this video", speak and have it match that, or combine them multimodally: an image of something plus a little description, like "here's a cat, now give me cats like this but wearing a hat". There are obviously more enterprise-grade things you can do as well, but lots of fun
stuff. So Milvus is cool, and the great thing is you can use it in the environment you want, with the tools you already know. Fortunately, with the help of Microsoft engineers, we've built a powerful .NET SDK in C#, and it's about as easy to use as it could possibly be: you just do the standard dotnet add package and you're ready to go. There will be a new version of it in a few weeks, but this one works with all the modern Milvus versions out there; make sure you're using at least Milvus version 2, and we always recommend the latest, since there are always new features, patches, enhancements, and speed improvements. You can click through, get the library right away, and start using it in your applications. There's also a Semantic Kernel connector; I know a lot of people are using that framework for AI applications, and fortunately there's a Milvus connector there as well, with new versions coming soon. Try them both and see which one fits your .NET application better; they each have advantages. We have an example, and I'll show you a little demo once we get through the slides. The code is pretty simple: you connect your client to your host, which can be running locally, in a cluster, or through Zilliz, which has a marketplace offering in Azure that I'll show you. You have the ability to turn on SSL, and we also support a token you can pass in, so there are multiple levels of security. We check to make sure we're able to connect, and then you can easily do a search; the details depend on what types of data you want to get back and what data you're passing in, and it's all well documented online; we'll share the documentation with you.
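A rough sketch of that connect, check, and search flow, based on the open source Milvus.Client .NET SDK; treat the exact signatures as approximate (they vary between SDK versions), and the collection and field names here are made up.

```csharp
using Milvus.Client;

// Connect to a local Milvus; the client also supports TLS and token-based
// auth for cluster or Zilliz Cloud deployments (see the SDK docs).
var client = new MilvusClient("localhost", 19530);

// Check that we can actually reach the server before doing anything else.
MilvusHealthState health = await client.HealthAsync();
Console.WriteLine($"Healthy: {health.IsHealthy}");

// The query vector would come from your embedding model of choice.
ReadOnlyMemory<float> queryVector = new float[] { /* embedding values */ };

// Run a vector similarity search against an existing collection.
MilvusCollection collection = client.GetCollection("sessions");
var results = await collection.SearchAsync(
    vectorFieldName: "embedding",
    vectors: new[] { queryVector },
    SimilarityMetricType.L2,
    limit: 5);
```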
Once you've built your app, you should probably run Milvus close to where the app runs: if your .NET app is on Azure, you should probably be running Milvus there as well. Milvus has a couple of different ways to run on Azure, as you might expect. If you run your own environments and really like Kubernetes, Milvus has great support for it: you can use AKS with the standard tools you'd expect. Milvus is a fully enterprise-architected application, with multiple layers that can be scaled out so you can index, search, and query your data at scale. If you need to scale up individual portions to meet demand, say you have more data to store, more indexing, or more clients running queries, you can grow those layers independently, depending on your use cases, which can vary over time; at busy times, when a lot of images are coming in, you might have to increase indexing or storage, and all of that can be done very easily on Azure, on Kubernetes. So it makes sense to deploy Milvus on Azure, and if you're someone who runs an enterprise environment, you're probably running Kubernetes already, and AKS is probably the best way to do it. Milvus uses all the standard Kubernetes tooling: the Azure CLI, kubectl, and Helm. All the Helm charts and Kubernetes materials are available in
the open source so it's pretty easy to run this and it's fully documented and there's also documentation on the Azure site so pretty easy for you to run an environment this way this makes it uh really good for being able to spin up things quickly being able to expand this is important with milis as it is a distributed application with a full Enterprise architecture so it has things like proxies it has different uh indexing elements it is separated compute and storage so there's a lot of different things that you could expand based on what needs
more storage? more indexing? a lot of people querying? You can expand those sections, or bring them down to save costs, as needed, as different parts of the environment. Now, if you're not into doing that and you don't want to run it yourself, fortunately Milvus is on the Azure Marketplace: you just go there, pick Zilliz Cloud, click get it now, and in a couple of clicks you've got a full cloud environment, fully managed — you don't have to do much work at all. It's a great way to do this: very simple, very easy to subscribe through the Azure Marketplace, and you get a fully managed environment in a very cost-effective manner. We have a short video example of that — there's no audio, so I'll narrate while it runs — and then we'll do some demos later, but it's pretty straightforward. Once you've gone to the Marketplace, you pick how you want to deploy; we dedicated this on Azure, and by default it's Virginia, the most common region, and as you can see it's pretty inexpensive, depending on how many cloud units you use. Your environment's cluster gets created — pretty straightforward — you get your credentials, and then anyone can log in. What's nice is it's really simple and puts you right into the IDE to start working; we'll show this demo next so you can see a bit more. So again, there are different ways you can run this.
If you're a developer or someone learning, you could just run it on your laptop within Docker or local Kubernetes — pretty straightforward. There's also a small standalone version that doesn't require much environment at all, so it's very simple to start figuring out how to build your collections, what the fields are, and how to do searches — a great way to learn and try things out. Now, as infrastructure goes, there's a ton involved with large language models; it's more than just "I need a vector database." You need the different frameworks out there, which could be Semantic Kernel or LangChain or LlamaIndex; you need different embedding models — PyTorch has some, Hugging Face, Cohere, OpenAI — a ton of different options. And you can work with all the major LLMs; those are just a couple of examples, but there's connectivity to almost all of them, whether it's OpenAI or any of the others. It runs on Intel and NVIDIA hardware, and obviously installs on Azure without too much difficulty. Some of the main things to see here: open source and the community are very important for Milvus, and we've got a lot of people coming together to make sure it works well. It's well documented, so you can learn it pretty easily; it's not a heavy lift. There are lots of different use cases, and none of them require the same things — they're not all exactly the same; they're going to require different indexes and different types of searches. What's cool is you can run your Milvus cluster right on Azure, and if you don't want to do it yourself you can just use the Marketplace. It's a pretty good idea to keep your vector database close to where you're going to run your generative AI and LLMs,
which makes sense — you do it all in Azure. Scalability is going to be important for these apps, because once people use them and like them, usage is going to expand greatly, almost instantly. You'll have a couple of users, then more, and all of a sudden you're like: oh, I need to scale out to terabytes of data and hundreds of users. That's a good problem to have, but you need something that can handle it, and this is pretty awesome at that. What's nice is, if you don't want to run anything yet but you want to give it a try, click this URL — just go to the Milvus demos — and you can run a demo on top of Milvus. This one's pretty cool: you upload your own image, and as soon as it uploads it runs a search on it and gives you a bunch of results back. So you get to run your own demo without having to set up Milvus, and it's linked to the source code, so when you're ready to try it on your own you just download it and get it started. That's a really cool one; I like that one. The GitHub is there if you want to look at the source code, and if you need help we have a pretty cool Discord channel where we answer questions. We also have an AI bot in there that will try to answer your questions — it's not as good as the engineers and the users we have in there, but it gives you a couple of options if you're asking something that's very common or maybe a little different.
Thanks for coming to the .NET Conf focus on AI — but let's get into some demos. First off, I want to show you that source code. I'm just in VS Code here, and it's as simple as we showed you: we connect to our client. Here is where yours will be different — you'll probably have a much longer link for the host, a different port, SSL turned on, and probably a token for your security. I'm running in a local Docker environment on one of my machines off to the left here, which is a good way to do demos so the demo gods don't come after us and take away our cloud, or someone forgets to put their credit card in — it happens. We make our connection, which is very easy to swap out for different environments and different security methods. Then, for this one, we check whether the collection exists; if we have it, we drop it, because we want to rebuild it fresh for the demo. We load the data we have and create our collection — what's nice with .NET is we do that asynchronously — then we just iterate through it and print a couple of things out. Pretty simple; a sketch of that flow follows below. Now if we build and run it again, it's super fast — here it is already built, not much going on there, and we already ran it. We can see the results using a tool called Attu, an open source tool for looking at the data once it's been loaded.
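As promised, here's a sketch of that demo flow — drop the collection if it already exists, recreate it, and insert some rows. The collection name, field names, and tiny two-dimensional vectors are all illustrative, and exact SDK signatures may vary between versions:

```csharp
// Rebuild the collection from scratch so every demo run starts fresh.
const string name = "books";
if (await client.HasCollectionAsync(name))
{
    await client.GetCollection(name).DropAsync();
}

MilvusCollection books = await client.CreateCollectionAsync(
    name,
    new[]
    {
        FieldSchema.Create<long>("book_id", isPrimaryKey: true),
        FieldSchema.CreateVarchar("book_name", maxLength: 200),
        FieldSchema.CreateFloatVector("book_intro", dimension: 2),
    });

// Insert a couple of rows — note everything is async in the .NET SDK.
await books.InsertAsync(new[]
{
    FieldData.Create("book_id", new long[] { 1, 2 }),
    FieldData.Create("book_name", new[] { "Book A", "Book B" }),
    FieldData.CreateFloatVector("book_intro", new ReadOnlyMemory<float>[]
    {
        new[] { 0.1f, 0.2f },
        new[] { 0.3f, 0.4f },
    }),
});
```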
That was the collection you saw there — the books. We've got to load it first; make sure it's loaded, and once it is, you can look at the data, run a search, or browse it, whichever makes sense for you. What's nice is you can also filter on different fields: if you have a caption field, which is text, you can filter on part of it. You can also look under the covers to see how many partitions there are and look at segments, if that's of interest to you — probably not; you're more interested in your real data, but just to give you an idea. Now, if we're running this environment in Azure through the Marketplace, we're using Zilliz Cloud, and it's very easy to get to your different environments and see all the collections you've created. Here I have one for travel advisories, and we can see the different fields; this one is a more balanced one. We also have collections with image data, so we can do a quick preview of that and see all the fields. Obviously, looking at the vectors, it's just a bunch of floats — not too interesting to the eye, but really important for being able to search quickly. Now, if I needed to create a more advanced environment, I could create a new one on Azure: optimize it for capacity, make sure I have backups figured out, make sure payment is set up, and then create my cluster, and then I'll have one running on Azure at scale, like the small one I have here.
What's nice is it's very easy to use and do whatever you need to do here. If I need to add new data, it gives me APIs for that, plus this handy UI, so it's pretty straightforward to run these things — not a difficult or heavy lift. And if you like what you see with Milvus and want to learn more, we have an unstructured data meetup that runs in different locations around the world; we also stream it to YouTube and other places, so you can watch it at your leisure if you can't be there in person, but you can still ask questions. We make sure everyone who speaks provides their slides and source code, whether it's a notebook or code, which makes it very easy — you can just click and get right there. I run the one in New York City, we've got another great one in Berlin and another in San Francisco, all recorded and available for everyone later. We talk about unstructured data, vector databases, Milvus, and LLMs, and we get a ton of people from different startups and companies coming out. It's a great time — on the East Coast we usually run it at the Microsoft Reactor, and on the West Coast usually at GitHub. Definitely come out; it's a fun time, we get some pizza and go over some really cool technology. And finally, you have the option to either run it on your desktop, in Docker or standalone, or, when you're ready for an enterprise environment, run Zilliz Cloud — just grab it from the Azure Marketplace, or spin it up yourself with AKS. It's very easy to get started with vector databases. It's been a great time speaking with you; I'm Timothy Spann, Principal Developer Advocate at Zilliz, covering the Milvus open source database. Thanks for attending my talk. Hey everyone, my name is Venkat, I'm a principal architect working for H&R Block, and today we're going to talk about lessons learned from applying generative AI apps with .NET and Azure. Here is the agenda.
We're going to talk about the tech stack and architecture first, then show a little demo of the AI Tax Assist that we built and launched this past season, then cover benefits and shortcomings, RAG challenges, and evaluations. On AI agents, there's another demo I'll do — a simple agentic AI chat demo using Semantic Kernel and .NET — then model deployment options, some key insights and takeaways from our generative AI journey so far, and finally some questions we usually ask as part of a use case discovery process. I do want to start with a key message: this is a great opportunity for .NET developers, and here are the reasons. Generative AI models have made AI accessible to app dev engineers. In the past, it was very hard to have an app dev engineer think about and learn machine learning and all of that; now that it's just an HTTP call, it's become a lot easier to play around with, and things like the Semantic Kernel SDKs and the Azure OpenAI SDK make our jobs as developers and engineers more productive. We've all learned over the years to build, deploy, launch, and scale apps, and this is no different: building an AI-enabled app is no different from any other app. There are a few skills you have to pick up along the way, such as prompt engineering and orchestration, but at the end of the day you're still building an app — you still have to scale it and serve your customers. One thing unique about this AI wave is that it's faster, and it's accelerating; as we see every day, it's hard to keep up. But it's also more accessible. If you compare it with the mobile, cloud, DevOps, and platform engineering waves, this one is very different, and also very impactful for enterprises and businesses. I would still say, based on experience, that 80% of the work is still app and data engineering compared to what the AI model itself is doing. Of course it's going to bring huge gains to your users and customers, but think of it as one more tool in your stack.
So I'll quickly talk about what we did this past year. We delivered two sets of capabilities: one for our customers — the AI Tax Assist, which I'll demo in a bit — and on the other side some things for internal associates and employees. We were one of the first, as far as I'm aware, to build a ChatGPT-style app for associates, built on the Blazor .NET stack and Azure OpenAI. This was a huge opportunity for employees to understand how generative AI works, learn a little prompting, and use it in their day-to-day lives before Copilot became available to all of us. We also did a whole bunch of experiments with RAG and agentic capabilities, and we're still continuing to do that. We went with a simple approach: we took an experience that was lookup-based to an AI-infused, human-like experience where the assistant can answer your questions — a common pattern for any AI assistant or copilot. But I did want to demo it, so I'm going to start this video here. All right, let's see a quick demo of H&R Block Online's AI Tax Assist. We used to have this help where you could ask questions like "what is the EV credit limit" and get back a bunch of responses you'd have to dig through to figure out which one applies — in this case it might be this one. With AI, instead, you get human-like responses to your questions, augmented by H&R Block's expert knowledge base, which was developed specifically to answer these kinds of questions. You can see there are some ready-made prompts here to get started. I'll ask the first prompt: "what is the income limit for EV tax credit?" It describes the limits we have for 2023. I can also ask something like "my AGI is $60,000, does the EV credit apply to me?" — and now you can see that, based on your AGI of $60,000, you may be eligible for the EV tax credit, with the same additional information.
So this is a good example of how you can augment your existing help with a human-like, consistent AI experience. There are other capabilities too: you can go back through history and download the chat, return to a question you asked last week, or start a new chat. I do want to talk about the technology stack we used; it's no different from the standard stack you would use. We use .NET Web API, Azure Functions, SignalR, Semantic Kernel and the Azure OpenAI SDK, Cosmos DB, and Azure Storage. We also used a Blazor app on .NET to build a test harness for running evaluations on the AI responses, and the Presidio SDK — a Python library — to do PII anonymization. From the infrastructure side, nothing new: if you're on Azure you know these stacks, but we used Azure OpenAI Service, App Gateway, App Service Environment, Cosmos DB, and so on. The architecture looks pretty simple; there's not a whole lot going on. On the right side of the dotted line you've got the build-time or design-time ingestion process: multiple content data sources, which you ingest and pre-process, put in blob storage, and then have Azure AI Search create a vector index on that content, ready for when a runtime request comes in production. Over on the core application and UI side, as the user query comes in it goes through App Gateway, and then we have a web API with Semantic Kernel, on-demand in-memory records, query and retrieval, and persistence and telemetry — we use App Insights, Cosmos DB, and SignalR there. Like I said, we use Presidio for the PII anonymization, and this is where the HTTP calls happen to the OpenAI GPT models and the Ada models to embed the user query. So we first embed the user query, then we do the retrieval, then we do the generation. A fairly simple architecture, not a whole lot going on — but as we discover and evolve this pattern, we expect more and more components to be added to this blue box of the diagram.
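In code, that embed-retrieve-generate sequence is conceptually just three calls. Here's a minimal sketch of its shape — EmbedAsync, RetrieveAsync, and GenerateAsync are hypothetical helpers standing in for the Ada embedding call, the Azure AI Search query, and the GPT chat completion; this is not H&R Block's actual code:

```csharp
// 1) Embed the (already anonymized) user query with an embedding model.
float[] queryEmbedding = await EmbedAsync(userQuery);

// 2) Retrieve the top matching chunks from the vector index.
IReadOnlyList<string> passages = await RetrieveAsync(queryEmbedding, top: 5);

// 3) Generate a grounded answer from the retrieved context.
string prompt =
    "Answer the user's question using only the passages below.\n" +
    "Passages:\n" + string.Join("\n---\n", passages) + "\n" +
    "Question: " + userQuery;

string answer = await GenerateAsync(prompt);
```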
Now I'll talk about some benefits and shortcomings of a simple RAG implementation. It's very easy to get started: it's fast, and prototypes can be built in a matter of hours or days. The big benefit is that you can augment the LLM with your domain-specific knowledge base, which is huge, because otherwise you'd have to spend a whole bunch of time and money on fine-tuning or other kinds of AI engineering — here you just don't have to. If you have a knowledge base, you can turn it into embeddings, put them in Azure AI Search, and run searches against those embeddings; all of that is a huge benefit for getting something going quickly. However, there are a whole bunch of shortcomings that we're still working through and discovering; nothing is perfect by any means, and any RAG-based application will have them. Guardrails, accuracy, and high quality of response are very important things to focus on early. Your journey to v1 can be long and drawn out if you don't have the background or experience; it becomes hard to deliver something within your timeline while simultaneously learning, evolving, and improving, so you have to plan for it. You may also find on your first attempt that users are not really that happy with some of the quality of the responses, so you have to put in a lot of work on evaluation,
and you only increase complexity and push quality higher when you're ready for it, once you've gone through that process. I do want to share some RAG challenges and strategies; these are fairly common. The first one: the search retrieves content from your vector index, but it isn't a one-to-one match for the query. This can happen, and here are some solution strategies we came up with — though it depends on what kind of knowledge base you have, so it's not guaranteed that the solutions I'm sharing will apply to yours. It depends on a lot of things, like chunking strategy and the quality of your content and data; I'm just sharing what worked for us. The next one is critical: the search is not retrieving content we know to be relevant to the user query. We run the retrieval, and content that has to be there isn't coming back. So you ask a whole bunch of questions: do we have the content at all? Is it relevant — do we need to add it? Are there metadata filters or fields missing that should be added to the vector index? Maybe you should try larger chunk sizes. There are different strategies you have to try and evaluate, and each time you try one you have to run it through the evaluation pipeline — I'll talk about some of the evaluation metrics we used.
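As one concrete example of the metadata-filter idea, here's a hedged sketch of a filtered retrieval call using the Azure.Search.Documents client. The index name and filter fields are hypothetical; the point is simply that scoping retrieval with metadata can cut down irrelevant hits:

```csharp
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

var search = new SearchClient(
    new Uri("https://my-search.search.windows.net"),
    "tax-knowledge-base",
    new AzureKeyCredential(searchApiKey));

// Restrict candidates with metadata before ranking, so the top results
// can't come from the wrong tax year or product line.
SearchResults<SearchDocument> hits = await search.SearchAsync<SearchDocument>(
    userQuery,
    new SearchOptions
    {
        Filter = "taxYear eq 2023 and product eq 'online'",
        Size = 5,
    });
```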
Another very curious problem is a jumbled-up mishmash of responses. You send multiple passages — say the top five results from the retrieval — and you may find that the LLM sometimes returns a mishmash: something from the first passage, something from the third, so it feels like it's not giving you a high-quality response. This is where prompt engineering comes in: you have to prompt the LLM to use only the relevant passages; maybe it can use its foundational knowledge plus the context to determine that, say, the third passage is more relevant than the fifth, things like that. You may also want to add additional metadata to your vector index. There's a lot you can do, and this is just the highlights; there's a whole bunch of novel RAG strategies out there that may improve your quality. The last one is kind of obvious: any time there's a follow-up query, you don't want to run the retrieval every single time. There are strategies like doing intent detection first, before you trigger the final LLM call, or you can reformulate the query itself to produce better retrieval results — a sketch of that follows below.
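Here's a rough sketch of that follow-up handling, assuming two hypothetical LLM helpers, ClassifyIntentAsync and RewriteQueryAsync, plus the EmbedAsync/RetrieveAsync helpers from earlier — not the team's actual implementation:

```csharp
// Decide whether this turn even needs fresh retrieval.
string intent = await ClassifyIntentAsync(chatHistory, userTurn);

if (intent == "needs_retrieval")
{
    // Fold the conversation into a standalone query so retrieval has full
    // context, e.g. "does it apply to me?" becomes
    // "does the EV tax credit apply at an AGI of $60,000?".
    string standalone = await RewriteQueryAsync(chatHistory, userTurn);
    passages = await RetrieveAsync(await EmbedAsync(standalone), top: 5);
}
// Otherwise reuse the passages already retrieved earlier in the conversation.
```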
Talking about evaluations, there's something very important you have to consider, and that's the mix of using LLMs and other models versus humans to evaluate your AI responses — you have to figure out the balance between them. What worked for us: we obviously use automated evaluation pipelines that use LLMs to score against all these metrics, but we also do human evaluation using sampling techniques, and those sampling techniques can vary depending on which area of your knowledge corpus needs more effort — maybe it's product-specific questions, or, in our case, areas of domain-specific tax knowledge. You have to find a balance and figure out a way to do both; doing only one obviously cannot scale. Automating the LLM evaluation pipeline is a huge task you should start thinking about early, and there are two options. One is to use Azure AI Studio — there's Prompt Flow and the evaluation flows available there, great capabilities you can use, and as time passes more and more varieties of metrics are being added; recently, I believe, F1 score, recall, and precision were added, things that originally weren't available. Or you can build your own. In our case we started the journey very early, early last year, so we built our own tool set, and we're now evaluating Azure AI Studio to figure out how it can augment and complement some of these evaluation runs. Groundedness is a very common metric when you have a RAG solution: it checks how grounded your AI response is in the context you provide after retrieval. Coherence and relevance are self-explanatory. Helpfulness is another metric we discovered and played around with ourselves: it looks at your AI response through the user's lens — not how grounded the response is, but whether it's actually helping the user. That means you have to provide additional context to the LLM about who the user is, what they're expecting, what the requirements are from a product standpoint, plus a whole bunch of other instructions that say: I'm now going to evaluate your response to make sure it's helpful, or at least appears helpful, to the user. This was very new to me too, but it's something we're experimenting with and finding benefit from. Key-point evaluation is another one our data science and engineering team came up with: instead of an actual English-language expected answer, you have a bullet list of items you want to evaluate — you extract key points from the expected response and evaluate against those key points, which can improve the accuracy of your evaluation method. That's a new way to look at it, and again it's something we're constantly evolving and learning. I encourage you to try both Azure AI Studio and building your own, and figure out what works best.
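To make that concrete, here's a hedged sketch of one LLM-scored evaluation pass combining a groundedness check with a key-point checklist. The prompt wording, the 1-to-5 scale, and the EvalResult shape are all illustrative, reusing the hypothetical GenerateAsync helper from earlier:

```csharp
using System.Linq;
using System.Text.Json;

string evalPrompt =
    "You are grading an AI answer.\n" +
    $"Context:\n{retrievedContext}\n" +
    $"Answer:\n{aiAnswer}\n" +
    "1) Groundedness (1-5): is every claim supported by the context?\n" +
    "2) Which of these expected key points does the answer cover?\n" +
    string.Join("\n", expectedKeyPoints.Select(p => "- " + p)) + "\n" +
    "Reply as JSON: { \"groundedness\": n, \"coveredKeyPoints\": [ ... ] }";

string verdict = await GenerateAsync(evalPrompt);

// EvalResult is a hypothetical record matching the JSON shape above.
EvalResult score = JsonSerializer.Deserialize<EvalResult>(verdict)!;
```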
All right, here is a quick demo of agent group chat using Semantic Kernel, so let me switch to Visual Studio. What this app does is accept a receipt or a document — in this case a donation, which looks like this: I donated $1,200 to charity. I want to be able to upload it and have the agent group chat tell me whether I can discard this receipt or need to keep it, based on a tax impact analysis. Again, I'll warn you that this is not real; I created a sample just to explain the concepts. I have two agents: a Tax Impact Analyzer, whose prompt looks like this, and a Reviewer. The Tax Impact Analyzer analyzes the document first, using the latest GPT-4o model, and then the Reviewer looks at it and makes a determination — it also asks a follow-up question before it can provide a recommendation. We start off with loading up API keys and whatnot, and the kernel using GPT-4o: when you call AddAzureOpenAIChatCompletion, you have to provide the deployment name, endpoint, and API key. You set up the execution settings and load up the prompt — there are many ways to load a prompt, and this is not the right way, but for the purposes of the demo I took shortcuts. Here is the new ChatCompletionAgent capability. It's currently experimental, but I expect a version of it to be made available in the final version of Semantic Kernel, and it works great — I've tested it. Here's the definition of what the agent does, and then you provide the instructions you grabbed from the prompt. The same thing happens with the Tax Impact Reviewer. It's very important to give the agents names and descriptions here, even though you might have additional information in your system prompts. Then you go ahead and start a group chat. The key thing to note is the termination strategy: I created a RecommendationTerminationStrategy where I decide which agent gets to terminate the conversation and how many iterations or turns it can take. If you look at this termination strategy, it's simple — you may have more complex strategies — but all I'm doing is checking for the phrase "I recommend." Once I see that the Tax Impact Reviewer has reached the point where it says "I recommend," that means the job is done; that's what I'm looking for, that terminates the process, and I can then provide the recommendation to the user. A sketch of this setup follows below.
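For reference, here's roughly what that two-agent setup can look like with Semantic Kernel's agent APIs, which were experimental at the time of this talk — names and signatures may have shifted between preview versions, and the prompts are placeholders:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.Agents.Chat;

// Stop once the reviewer has produced a recommendation.
sealed class RecommendationTerminationStrategy : TerminationStrategy
{
    protected override Task<bool> ShouldAgentTerminateAsync(
        Agent agent, IReadOnlyList<ChatMessageContent> history, CancellationToken ct)
        => Task.FromResult(
            agent.Name == "TaxImpactReviewer" &&
            history[^1].Content?.Contains("I recommend") == true);
}

var analyzer = new ChatCompletionAgent
{
    Name = "TaxImpactAnalyzer",
    Instructions = analyzerPrompt,   // loaded from your prompt file
    Kernel = kernel,
};

var reviewer = new ChatCompletionAgent
{
    Name = "TaxImpactReviewer",
    Instructions = reviewerPrompt,
    Kernel = kernel,
};

var chat = new AgentGroupChat(analyzer, reviewer)
{
    ExecutionSettings = new()
    {
        TerminationStrategy = new RecommendationTerminationStrategy
        {
            MaximumIterations = 6,   // cap the number of turns
        },
    },
};
```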
Here is a simple while loop: I initiate a chat history and then upload the image — it's a console app, so I type in the path of the file. I add a message, and here is a user prompt; it's always important to add a user prompt, because even though it might seem like all I have to do is upload an image, it provides a little additional context to the LLM. Then I call InvokeAsync on the group chat, I get responses back, and deciding whether to show each response to the user is up to us: in this case I check and make slight modifications — if it's the Reviewer, make it bold; if not, treat it like a log, so we can still see it. So I'm going to run this keep-or-discard sample. My file is in data, donations, image — and here we go. The first step is the Tax Impact Analyzer providing its analysis: it determined that it's $1,200 from an organization called Share and Care, and all that looks good. Then the Tax Impact Reviewer took that and decided to ask this question. Notice how the Tax Impact Analyzer is also providing some commentary in the background, but we decided to treat that as a kind of log. In this case I answer: yes, that is a charitable organization, a 501(c)(3). The Tax Impact Reviewer comes back and says: I recommend keeping this document. This is a very simple example, but I expect that in a business scenario you can figure out how to chain multiple processes, interactions, and specializations of AI agents using Semantic Kernel and .NET, and build capabilities that have high business value.
All right, let me switch back to the presentation real quick. I do want to talk about model deployment options. When you deploy a model in Azure OpenAI, you have two options. You start with standard — that's how we started — and you take measurements: based on usage, you figure out how many calls are required, TPM, things like that. You have to take a SWAG when developing an application for the first time; use the capacity calculator tool in Azure OpenAI Studio, or use Excel if you want. You can test with the lowest PTUs possible — maybe 50 in a load environment — and gradually increase as you need. In production, benchmark the workload to optimize PTUs; this is really important, because once you launch you may discover your PTUs need to be increased or decreased, so it's something to keep a close watch on. And always use a fallback standard deployment for resiliency: if something happens to the production model, you always have another production model sitting in standard deployment mode — a sketch of that fallback pattern follows below.
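Here's a minimal sketch of that provisioned-plus-fallback idea, assuming a hypothetical CallChatAsync helper and made-up deployment names; RequestFailedException is the standard Azure SDK exception type:

```csharp
using Azure;

try
{
    // Provisioned (PTU) deployment: predictable throughput, fixed cost.
    return await CallChatAsync(deployment: "gpt-4o-ptu", messages);
}
catch (RequestFailedException ex) when (ex.Status is 429 or >= 500)
{
    // Standard deployment: pay-as-you-go, so the fallback only costs
    // something when it's actually used.
    return await CallChatAsync(deployment: "gpt-4o-standard", messages);
}
```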
That way you don't have fixed cost associated with it, and you only pay as you go. I have a bunch of lessons learned here; I'm not going to read the whole slide, but a couple of key things. Pick a leading model and stick to it. There's a lot of news out there — there's always some new model coming out — and it's very important to improve your solution as it relates to your business, rather than always worrying about which model is better. Things like continuous improvement of your evaluation pipelines are also very important. But the first one is also a mindset you have to adopt: this is not chatbot v2. If you're building AI assistants or copilots, you have to think of them as human-like assistants — if it were a human, would they respond the same way the AI assistant is responding? That's something to think about. A few more insights. One of the new ones I've highlighted is that quality of outcomes is more important than response time: if the quality of the responses isn't great, it doesn't matter whether it's built with AI or not. Even if it takes longer — say, instead of 10 milliseconds it takes two or three seconds, maybe five — if at the end of the day you give a better response to your customers, that's a better success measure than always insisting it has to stream or arrive in 10 milliseconds. So lean heavily on quality of outcomes. Another lesson: if you use multi-agent, action-based plugins like web search API calls, together with advanced graph-of-thought-style prompting, all of these put together make a very powerful combination for reliability, accuracy, and real-time information, especially with web search. In our case we don't use web search for our stuff — though we have done experiments with it, which is why I didn't include it.
The LLMs are becoming more steerable, and also more sensitive to your words, characters, and structure, so pay attention to your system prompts, and consider reinforcement — which is basically repeating yourself in the prompts — to ensure the LLM adheres to your instructions. Let's move on to use case discovery. This is really important as you start building out AI-enabled apps. There will come a time when everybody stops worrying about AI assistants and copilots and slowly starts to ask: what else can we do with it? And that's natural. We have to think about taking AI and augmenting our existing apps — not building a new copilot app, but taking an existing app and making it richer for our customers: more useful, time-saving, a joy to use. To do that, we have to ask some of these questions. Does it require human-like text generation? Maybe you can use it for code generation purposes, where the JSON schema or the C# code or whatever it is, is very domain-specific — maybe you can use it there with the structured output capability that was just released; it's becoming more and more plausible and feasible. You have to ask whether it has defined inputs and outputs, especially for text and images put together, like we saw in the demo. Can the definition of the input and output be documented with a prompt engineering mindset? What I mean is, when you decide on these inputs and outputs, you have to ask yourself:
if I were a BA, a business analyst, and I had to write a spec, how would I write it? Because you're handing that spec off to — in this case — the LLM. So think from a business standpoint and from a specification standpoint, which eventually turns into prompt engineering; how we provide instructions is very important. And when you do that, new use cases emerge. Just last week I was part of three different use cases that were sitting right in front of us; we didn't see them until we started talking through some of these questions and someone said, can we do this? — and there was a perfect use case. Also ask yourself: is it a stateless design? If you use GitHub Copilot, there's a button that can generate commit messages based on the git diff — that's a classic example: completely stateless, in the moment, and providing something useful. Another key question: can the input and output fit into the token limit? There's always a context window, so it's very important to ask. And then: what is the level of determinism? If it's very math-oriented, very deterministic, where you have no room for mistakes — a tax engine, for example — that may not be a good fit, at least right now. So these are some questions to ask to figure out what kinds of use cases apply and how you can take generative AI and apply it to your business scenarios. And I do want to highly recommend the talk by Steve Sanderson, which goes into detail and shows you a lot of demos that are more than just chatbots, and how we should discover and execute on this great opportunity we have as .NET and Azure developers. Thank you. All right, it's so cool to see how other folks are using .NET technologies. I always feel a little inspired when I see some of the cool things other folks are building, and how I can take a little bit of what they've learned and apply it to my applications.
I like that, I like that a lot. This event is being recorded; it's on YouTube — you might be watching us on YouTube right now — and there's going to be a playlist available at the end of the day today. Head on over to youtube.com if you're not there already, and you'll be able to find the playlist with all the great content when we publish it at the end of the day. Make sure you like and subscribe and all that over on YouTube so you know when there are more live streams and other things going on with the .NET channel; we always have great video content coming. But first, I want to make sure you know about some of the other events and things we've done. .NET Conf 2024 is coming up in November, November 12th to the 14th; it's a three-day event, and it features the .NET 9 launch. You're going to want to be a part of this. We've got some great speakers already lined up from Microsoft, and there's an opportunity for you to speak as well: if you're a prospective community speaker, head on over to dotnetconf.net, find the call for content link, click it, fill out some forms, and potentially join us as a speaker for .NET Conf. Day three has more than 30 sessions of community speakers talking about the cool things they've been doing with .NET. It runs for 24 hours, so you can join us in your own time zone, on your own time — you don't have to wake up at 3 in the morning to give a presentation to folks in Seattle. You can join us on your time, present, and have a great time as part of .NET Conf 2024, November 12th through the 14th. We also ran an event just a little bit ago, .NET Aspire Dev Day, where we talked about all the cool things you can do with our new tech stack, .NET Aspire, and that leads me into our next two sessions. In addition to that event, we've got a session coming up from Gaurav Seth talking about adding generative AI capabilities to the web applications you run on Azure App Service, and .NET Aspire comes in with our second session from Anthony, who's going to be talking about observing AI applications from dev to production using .NET Aspire. All right Gaurav, you're up first — tell us a little more about Azure App Service and generative AI. Hello everyone, my name is Gaurav Seth, and I'm a product manager on the Azure App Service team. Today I'm going to talk about how to add generative AI capabilities to your .NET web apps on Azure App Service.
Before we get started, here is a quick view into the agenda for the next 30-odd minutes. I'm going to talk about some of the key App Service benefits, AI-assisted migration and modernization, and how to integrate gen AI capabilities into a classic .NET Framework web app. I'll talk briefly about intelligent observability, and then how to integrate gen AI capabilities into a brand-new .NET web app using .NET 8. Let's start with some App Service benefits. There are multiple benefits to deploying your web apps on App Service, but today I'm focusing primarily on the ones that are specifically useful for apps that use generative AI capabilities. First: you focus on integrating gen AI capabilities into your web app, not on the underlying infrastructure. We also know that, with the speed of AI innovation, you may need to release newer versions of your app faster and more easily, and that's where native CI/CD integrations with popular services like GitHub, Azure DevOps, and others make it super easy and fast to release newer versions of your web apps. It's also important to understand that while you scale your AI backend or other backend services your web app talks to, you'll also need to scale the web app itself to handle sudden bursts in traffic. The platform lets you scale manually, automatically, or via rule-based scaling, where you can use a schedule or a metric of your choice to drive the scaling. Remember, App Service supports most of the popular languages and frameworks on both Windows and Linux operating systems, so you have all the support you need to deploy your web app to Azure App Service. And we can all agree security is critical for your web apps: you can build a security-first posture with features like virtual networking, managed identity-based authentication, and Microsoft Defender integration, and these features are available when you deploy your web application to Azure App Service.
Now again, if you want to use advanced concepts like vector caching for your gen AI web apps, or utilize AI-powered monitoring and auditing solutions from some of our Microsoft partners, we've recently introduced a sidecar feature for Azure App Service on Linux, which becomes very important. And remember, while this talk is about integrating gen AI capabilities into your web app, for Azure App Service as a platform we've also started utilizing the power of Azure Copilot to help you diagnose and troubleshoot your web app. Let's briefly talk about the demo scenarios. You have a .NET web app on-prem that you plan to migrate to App Service, and you also want to infuse generative AI capabilities into it. I'll show you how to use AI-assisted tooling in Visual Studio to analyze your source code, then use free tooling to migrate your web application to Azure App Service, and finally how to introduce generative AI capabilities into your existing .NET Framework web app without redesigning or redeveloping the entire application. Before I get into the first scenario, I want to reiterate the other scenario as well: in addition to adding generative AI capabilities to an existing ASP.NET Framework web application, I'm also going to cover adding generative AI capabilities when you're building a brand-new application, for example on .NET 8. Those are the two core scenarios this entire discussion focuses on. So let's get started with AI-assisted migration and modernization, and with introducing generative AI capabilities into a classic .NET web app.
With that, let me switch over to Visual Studio. This is a sample web application I have here, and I'm going to use the Visual Studio Marketplace extension called the Azure Migrate application and code assessment toolkit, which does a source code analysis — and if you have the GitHub Copilot Chat integration in Visual Studio, it will also suggest and recommend code changes that may be required. You download the extension from the Visual Studio Marketplace and install it, and once that's done you'll see this new option called Re-platform to Azure. You're presented with two choices: start a new report or open an existing one. I'll start by clicking new report. It lets you select the target you want to deploy your code to — in my case I'll choose Azure App Service on Windows — and click next. Because I have two projects, it gives me the option to select both or just one; for the sake of simplicity I'll select the first project and click next. In addition to the source code and settings, I can also make a selection about binary dependencies. I'll go ahead and hit analyze, and within a matter of seconds I'm given this nice-looking dashboard with a summary of severity categories and so on. Now let's look at the different issues being reported.
You can see the first issue being reported, with a severity of potential or optional; there's another severity called mandatory. For example, say your web application uses integrated authentication on-prem: when you move it over to Azure App Service you can still use authentication, but it would be based on Entra ID rather than the integrated authentication you used on-prem. Those kinds of issues are reported as mandatory. Let's look at a potential issue: this one reports that my web application is probably trying to access some external resources, and I can see which DLLs are doing that. Then there's this option called Ask Copilot. Once I click it — let me zoom into this window — it picks up this specific warning or error and tries to give you helpful context. Using that context, it describes the issue and even provides steps to resolve it: maybe you can update ports, or migrate internal services if the access is for an internal service, and this is how you would do that — an example code adjustment you can use if you want. What you just saw is how the power of AI can make it simple to fix source code so you have a higher degree of success once you migrate your web application over to Azure App Service. For a moment, let's switch back to the slides.
Before I move further into the actual demo, I want to quickly note that beyond the AI-assisted source code analysis you just saw, you also have a bunch of free tools you can use. Imagine you've done the source code analysis and fixed everything required; now the idea is to migrate your web application, and the App Service team provides free migration tooling, which can be the whole integrated Azure Migrate experience or simple PowerShell scripts. Depending on your use case — which could be a data center exit or a single IIS web server migration — you can make a choice of tools. Broadly, each of these tools runs a discovery. With the whole integrated Azure Migrate experience, we do a bulk discovery: based on your virtual appliance, a one-time configuration, we pick up all the servers running IIS and give you a rich inventory of all the ASP.NET web applications. Once discovery is done, you run assessments. Based on attributes like the region you plan to migrate to, any discounts or pricing tiers you wish to use, or whether you're migrating to an App Service Environment because you need an isolated deployment, assessments run a bunch of configuration checks and come back with a report of what's ready to move, what's ready to move with conditions, and what's not ready to move.
When I say ready to move with conditions, the idea is that your web application will migrate just fine, but once it's migrated you may need to make some configuration adjustments or changes. The best example I can give is a web application hosted on a non-default port on-prem, like 9000; once you move it to Azure App Service, traffic is only available over ports 80 and 443, so that's the kind of change I'm talking about. Once you're done with discovery and assessment, the whole Azure Migrate experience also lets you do a bulk migration of the selected web applications over to Azure App Service: we move the zipped content and configurations for your web application, and for every individual web application on an IIS server we create a brand-new Azure web app and deploy your code and config over. Similarly, if your requirement isn't an entire data center exit and you just want to move a bunch of web applications off a single IIS web server, you can use the PowerShell scripts. Technically, behind the scenes, the PowerShell scripts do exactly the same thing — we discover, we let you assess, and then we migrate the web application — but I'll give a quick shout-out to two more scenarios where the PowerShell scripts become super helpful. One: if you're trying to move a cloud service — if you still have a web role on a cloud service, you can use the PowerShell scripts to migrate it to Azure App Service. Two: a hybrid or offline/online scenario, where you run discovery and assessment on a server with no internet access, move the zip over to a virtual machine that does have internet access, and then finally create your web application. With that, let me quickly move over to the demo app topology I'm going to showcase next. This is a sample topology for the first demo app. As I already showed, we used the Visual Studio application and code assessment toolkit to analyze an existing ASP.NET web app on-prem,
then we made the required code changes where they were needed. As I mentioned, a number of tools are available, and I migrated this web application using the PowerShell scripts. Here's an interesting point: you can decide to introduce gen AI capabilities before you migrate your web application to Azure App Service, or you can redeploy your code once you've made the changes after migration — that's completely up to you. A quick call-out: for the demo we're using GPT-3.5 Turbo. The other important point is that the demo web app and all the required resources are deployed to a landing zone. A landing zone is a sort of templated deployment based on architectural best practices, and of course it adheres to all the goodness that comes with Azure around security, scaling, and isolation. Also note that all communication between the app and the other Azure services is secured using virtual network integration and managed identity authentication. With that, let me move back and show you. This is the web application I was talking about; in the interest of time I've already migrated it from an on-prem IIS web server over to Azure App Service. And as you can see, once you've migrated a web application, the platform provides a bunch of features you can use: scaling, looking at and changing your environment variables, setting up authentication, setting up Application Insights for observability, scaling up and scaling out — there's a lot you can do from the platform itself. Now, this is the demo web application I migrated: a hypothetical dev shop where we're selling a bunch of products. If I go into one of these products — let me open any of them — here's the example scenario: this was, and still is, an ASP.NET Framework web application that I
migrated, and what I've done to this existing web application is introduce a new capability: chat with an AI assistant. What's happening behind the scenes is an integration using the Azure OpenAI .NET client SDK, and with that integration my customers can use the power of Azure OpenAI to ask questions and get relevant answers. Just to quickly point out: in this specific demo I am not using concepts like RAG, and I'm not using database context for the search; I'm making the request directly against the Azure OpenAI endpoint. So I can ask any question — say, suggest a travel bag under $100, maybe for students — and that's the response I get back. Remember, what I'm trying to showcase is a very simple but super powerful scenario: you have an existing web application, you modernized it using the Visual Studio extension, you migrated it, and now you've infused gen AI capabilities into it. I have not redesigned or redeveloped the entire web application; I've simply introduced this additional capability. That's the scenario I was talking about, where you can introduce intelligent capabilities into an existing .NET or .NET Framework web application. Now, if I show you the code changes I made: that's my web application, and the page where I had the chat experience was the ChatBot.aspx page. I'm using a very simple
jQuery call to my chat controller, and that's the chat controller I was talking about. As you can see, I developed a new chat controller, and I'm using the new packages for the Azure OpenAI client SDK for .NET and the Azure.Identity SDK. I've defined all the configurations as parameters so I can set them as app settings in App Service. I create the OpenAI client, and the most important point I want to highlight is that the entire communication between my web application and the Azure OpenAI backend is authenticated using managed identity credentials — that's exactly what I'm doing here. Once I've established connectivity, I use the get-response-message method, which uses the chat completions API. I pass the model, and then I use a prompt template; for my prompt template I'm saying: you are an employee of a store that provides product recommendations based on the user's query — I want the recommendation to be brief and friendly, and just a single recommendation. Once I've done that, I pass the messages along with the user query, get the response back, and send it over to the page. So as you can see, it's just a few lines of code, but now my existing web application has generative AI capabilities.
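Here's a hedged sketch of what a controller method like that can look like with the Azure.AI.OpenAI client library (the 1.x beta API shapes are shown; later versions changed the surface) and Azure.Identity — the endpoint, deployment name, and prompt text are placeholders:

```csharp
using Azure;
using Azure.AI.OpenAI;
using Azure.Identity;

// Managed identity: no API keys stored in configuration.
var client = new OpenAIClient(
    new Uri(endpoint),                 // e.g. https://myresource.openai.azure.com
    new DefaultAzureCredential());

var options = new ChatCompletionsOptions
{
    DeploymentName = "gpt-35-turbo",
    Messages =
    {
        new ChatRequestSystemMessage(
            "You are an employee of a store that provides product " +
            "recommendations based on the user's query. Be brief and " +
            "friendly, and give a single recommendation."),
        new ChatRequestUserMessage(userQuery),
    },
};

Response<ChatCompletions> response = await client.GetChatCompletionsAsync(options);
return response.Value.Choices[0].Message.Content;
```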
With that, I'm going to move back to my slides and on to the next piece. I just spoke about existing web applications; now let's talk about creating a brand-new .NET 8 web application and how to introduce generative AI capabilities there. There are a number of ways you can do it: the simplest of approaches using REST APIs; the Azure OpenAI SDK; the recently announced OpenAI SDK for .NET, currently in beta, which works with GPT-4; or, if your requirements call for it, higher-level LLM frameworks like LangChain or Semantic Kernel. If you want to look at some samples, follow the link on your screen — those are some of the ways you can introduce generative AI capabilities into a new .NET 8 web application. Here's a quick summary of adding these capabilities to a .NET web application. I already spoke about adding generative AI with the SDK for an existing web application, and you can do the same for a new one. You can use Semantic Kernel — the reason for Semantic Kernel is scenarios where you want to orchestrate across multiple different models, which is the demo I'll show in a while. Or you can look at the sidecar pattern on Azure App Service for Linux, especially for options like vector caching to improve performance and even lower cost. Those are the different ways you can introduce generative AI capabilities into a new .NET 8 web application. Here's a quick view of the topology before I showcase the code and the web application: this is a .NET web app, and I'm introducing gen AI features using the .NET SDK for OpenAI. The web application also uses the sidecar feature for App Service on Linux, because I wanted vector caching to reduce round trips to the Azure OpenAI endpoint. We'll also make use of Semantic Kernel for the overall orchestration of the app,
DB and open AI Communications and also use text embedding model and GPT 3.5 turbo now again remember it's the same concept for deployment all the required resources get deployed to a landing zone for security and best practices perspective and also all Communications are again secured are secured using virtual Network and manage identity based authentication and before I switch to code and demo I also want to quickly talk about intelligent observability if you remember I did say that one if you have a requirement where you want to use a sidecar feature for bringing in uh some
of our uh partner offerings which could again can be AI assisted for logging and auditing so you could use the uh sidecar uh feature for that also I'm going to in the in my next demo I'm also going to talk about Azure co-pilot so yeah that's where the whole concept of intell intelligent observability comes into the picture so I'll showcase how you can use Azure co-pilot to simplify your whole diagnose and solve app uh performance or or or app errors with that again I'm going to switch back to visual studio and before that here is
the .NET 8 web application that I was talking about. Remember, this is exactly the same web application we just saw for Framework. Maybe I'll just go ahead and navigate to jewelry or accessories, and if you see, it's exactly the same concept: I can say "chat with AI". Now, the quick difference here is that every time I post a query, behind the scenes I'm also using the Azure OpenAI search capability with my Azure SQL database, so I'm basically going to use AI search on my database, get the context, and then use Azure OpenAI to summarize and give the response back to you. So if I say "suggest gold-plated chains", you'll see that the first time the response is going to be a bit slow; the reason is that this response wasn't in my cache. Next time, if I ask a similar question, or a question that is similar in context, I would basically get a much faster response from the cache that I'm using; you can see how fast we get the response back. Let me go back to the code now. This is basically the same controller that I created, but this time it's for .NET 8. Over here, what I've done is, one, remember I'm using session history here, and secondly, just a quick call-out before I move on: I am using Qdrant for my vector caching. I'm using the Qdrant database for
my vector caching, and this is where the GetResponse method comes into the picture, which is used for getting the query from the user, doing all the communication back and forth with the OpenAI endpoint, and then sending the response over. You can see here that I have a Qdrant endpoint, which is localhost. The interesting thing here is that, from my web application's perspective, I'm using the sidecar pattern, and once I do that for my web application, the whole Qdrant database is available as a localhost endpoint, and that's why, if you see, I'm actually using a localhost endpoint. Once I have defined the endpoint, I go ahead and create a cache using the Qdrant database, and then this is where I get into using Semantic Kernel. You can see that I create the deployment name, and remember, as I said, the entire communication between this web application and the OpenAI endpoint is secured using managed identity credentials. Now, for generating the text embedding to use with the OpenAI search on my SQL database, I'm using the text embedding model. Once I generate these embeddings, what I do is, similar to what I showed in my previous demo, I create a prompt template just to instruct what kind of personality I'm looking for in my responses, and I also add this context to my session key. Once I'm done with those things, I go ahead and serialize the history, and finally, you can see here, that's where I start the actual chat functionality. What I do here is use the chat format for Semantic Kernel; I pass it in, and, as you saw, we also maintain the chat history, so that's how we do it. Once I get the embedded form of the user query, I send that to be summarized using the Azure OpenAI endpoint (that's the model I'm using), and then I go ahead and stream the response back to my end user.
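Pulling that together, here is a hedged sketch of the whole flow: embed the query, check Qdrant (reachable on localhost thanks to the sidecar) for a semantically similar previous query, and only make the round trip to Azure OpenAI on a cache miss. The Qdrant.Client method names and the similarity threshold are approximations that may differ by version, and the two helpers at the bottom are hypothetical stand-ins for the chat call and the cache write:

```csharp
using Azure.AI.OpenAI;
using Qdrant.Client;

static async Task<string> GetResponseAsync(
    OpenAIClient openAi, QdrantClient qdrant, string userQuery)
{
    // 1. Embed the user query with the text-embedding deployment.
    var embeddings = await openAi.GetEmbeddingsAsync(
        new EmbeddingsOptions("<text-embedding-deployment>", new[] { userQuery }));
    ReadOnlyMemory<float> vector = embeddings.Value.Data[0].Embedding;

    // 2. Look for a semantically similar cached query in the sidecar Qdrant.
    var hits = await qdrant.SearchAsync("chat-cache", vector, limit: 1);
    if (hits.Count > 0 && hits[0].Score > 0.95f)              // illustrative threshold
        return hits[0].Payload["answer"].StringValue;          // cache hit: no round trip

    // 3. Cache miss: call the chat model, store the answer, return it.
    string answer = await CallChatModelAsync(openAi, userQuery);
    await StoreInCacheAsync(qdrant, vector, userQuery, answer);
    return answer;
}

// Hypothetical stand-ins for the demo's chat call and cache write:
static Task<string> CallChatModelAsync(OpenAIClient c, string q) => Task.FromResult("...");
static Task StoreInCacheAsync(QdrantClient q, ReadOnlyMemory<float> v, string query, string a)
    => Task.CompletedTask;
```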
So again, what you've seen is that I can either do a simple integration with an existing web application, or we also let you deploy web applications using the sidecar pattern if you're looking at more advanced concepts like vector caching. With that, I'm going to switch back, and let me also quickly talk about the whole Azure Copilot experience. Once you click on Azure Copilot, imagine I'm experiencing some kind of slowness in my web application and I want to use the Copilot capabilities to help me diagnose and troubleshoot that slowness. What I'm going to do here is say "web app slow". Once I type this, Copilot will automatically get the context of this specific web application, as you can see here, and then it's actually going to make suggestions. Behind the scenes, it's using the built-in intelligence to diagnose using some of the detectors that we have, and it's going to come back with a bunch of solutions that you can use. You can see here some of the potential root causes of why your web application could be slow, for example if your CPU consumption is beyond 70%, and then, once you know some of these root causes, the solutions, such as collecting a profiling trace. I can keep conversing with Copilot; I can say, okay, now tell me how do I collect the profiler trace, and I can do that. But the idea here is that we're using the power of Azure Copilot to help you diagnose and
troubleshoot your web application. So, what are the key takeaways from the session today? We talked about how to modernize and migrate an existing .NET web application over to Azure App Service and infuse generative AI capabilities, and also how to introduce generative AI capabilities into a net-new .NET 8 web application. So we saw both scenarios: where I have an existing web application, and where I'm adding generative AI capabilities to a new web application. If you want to get started with Azure App Service, here is a list of useful links you can use. We're on Twitter, and we also have a lot of blog articles you can go through. With that, I conclude this session; thank you for joining in, and have a great rest of your .NET Conf Focus on AI.

Hi everybody, I'm Anthony Shaw, and I'm going to be talking about observing AI applications from dev to production with .NET Aspire. In this talk I want to cover a few things. I said "observing applications", so what is observability? We're going to talk about
OpenTelemetry as the standard for our observability strategy, and considerations for observability in production. I'm going to show you two demos: one is a production application and one is a dev environment and a dev application; it's the same code base, but we've got different ways of monitoring the application. I'm going to talk specifically about observing AI applications, what makes them a bit different, and what ways you can use to observe them, and then I've also got a list of resources and additional links and things for you to check out. Before we get cracking into that: my name is Anthony Shaw, and I work in the cloud advocacy team at Microsoft. I'm actually focused on Python, so whilst I'm speaking at the .NET conference, my expertise is in Python, but I do a lot of work with .NET as well, and I've been using the .NET Aspire dashboard in particular for observability in a series of Python applications, and I just love using it. So I've basically applied that same approach from Python to the .NET stack, and I've used the same techniques in .NET and Python, so we're building applications in both environments, but I just love the Aspire dashboard and what it brings to telemetry. First things first: what is observability? You've probably heard terms like logging and monitoring before, and observability is a slightly convoluted way of saying that you're trying to understand an application better by looking at its outputs and instrumenting its inner workings. There are different ways of doing it. So why would you have observable applications in the first place? If you
just have logging, the reliability of logging depends a lot on what you're logging, and when things go wrong in an application it's difficult to know beforehand what you should have been logging or capturing. The idea with observability is that you instrument an application by trying to capture as much information as possible, so that when things do go wrong, in whatever form, you can wind back the clock and look at what was happening at that time. For example, say you got a report from a user that last Tuesday the application was running really slowly and they got an error, and then you asked them, okay, what was the error, and they say, I don't know, I didn't capture it. How would you wind back the clock to understand whether there was a crash or an unhandled exception, or even just that it was running slowly or differently, and what factors contributed to that behavior? This is the benefit of hindsight: each time you have an issue with an application you might add more logs, more information, but the idea with observability is that you use instrumentation to capture metrics, logs, and traces in your application, to see everything that's going on. Another point is that you try to simplify the complexity of the app. I'll talk about traces in this talk, but if you just log absolutely everything, you've got reams and reams of data and text, and when things do go wrong you've got to sift through it all. I've definitely been in scenarios in the past where we had a fault in an application, or a bug we were trying to track down, and you're just looking at thousands and thousands of log records trying to figure out which one contributed to the crash, or to the specific issue you're chasing. With observability, the idea is that you capture a session, or a trace, from when the user initiated it all the way to the back end, and I'll show you some ways of visualizing that as well. This is not just about having lots of data captured in a database; it's about using tools and UI to visualize what's happening in the application. We can use that to capture trends as well: it's not always as clean-cut as a user reporting a problem; you might actually see performance degradation over time. For example, you may not have noticed, but your application started running 20% slower when you made a new change or a new deployment. So
observability is a good way of capturing trends and data: because you're instrumenting the application in the back end and all the different components, and capturing that information, you can see historically how things have worked, and I'll show you what that looks like in the UI. So, OpenTelemetry is the standard I want to talk about today when we talk about observability. It's not the only option; there are other tools and other standards. OpenTelemetry actually replaced a number of other standards: there were things like OpenCensus, which was probably the main one; it was a combination of that and some other projects, and it's backed by the CNCF now as well. OpenTelemetry is a modern standard for observability of any application: C++, C#, Go, Java, JavaScript, Python, Rust, as well as things that are baked in. You can trace end to end from the front end to the back end, and this is super important: the app I'll show you has a JavaScript front end with some JavaScript components, then an ASP.NET back end, a serverless component, a lot of different services working together. But because you've got a standard that's built to be agnostic of the technology, you can trace things from start to finish. OpenTelemetry also has a massive ecosystem of libraries for instrumenting other tools and third-party packages across all those languages I spoke about. You can also plug and play where you want to put the data; I'll show you the Aspire dashboard and also Application Insights, but you can export to other platforms as well. Like I mentioned, it's governed by the CNCF, the Cloud Native Computing Foundation; it's an open-source platform and an open standard, and it also comes with an open protocol called OTLP, an open spec for communicating between where the data is collected and the software. So it's basically one big open spec that looks at your application, and the first thing you would do with OpenTelemetry is start putting instrumentation on the app. There are
different ways of doing that. If you're using frameworks in your application, like ASP.NET for example, you would use the ASP.NET instrumentation library; that's a package you download and configure in your application, and it automatically sets up instrumentation for things like incoming web requests, sets a trace on the users, and looks at the logs by default. So often with OpenTelemetry you can just download a package from NuGet, add it to your application, and the instrumentation comes out of the box. You can write your own custom instrumentation as well: if you want to capture attributes for the app or anything specific, you can use the OpenTelemetry SDK for C# to build that in. You then have basically three categories, traces, metrics, and logs, and I'll show you all three of these and how they work together. Your application runs, you put instrumentation on top of it, and that emits traces, which are detailed events capturing what's happening in the application, formed
in a tree structure. So imagine a user clicks a button on the website; that starts a trace, or a span, and that might then kick off lots of different things: it might send an API request, and that API request might run a database query, for example. The trace goes end to end, from the user-initiated action all the way down to the back-end queries or reading and writing files. That's traces. Metrics are more the sort of data you can collect about the system at runtime that isn't specific to a particular user: how fast things are running, or how many API queries you have left, for example. And then logs are what you would expect: an ILogger interface where you can use traditional logging, and that then gets captured and exported. The collector takes all of that data and puts it somewhere you can store it long-term and then search over it. So this is how OpenTelemetry works: you've got instrumentation capturing all this rich data about your
application; you've got a collector, which is a service that runs locally, storing and caching that data; and then you plug in an exporter, which exports the data to a monitoring system. On top of that, you would typically have a user interface where you can explore the data and understand what's been going on. So, the concepts I've mentioned: the first one is traces. A trace has a unit of work called a span, so when you're doing anything with OpenTelemetry you'll start to be familiar with the span. I see it as being very similar to a using statement: you've got a context, and everything runs within that context. A span is very similar: you initialize a span, and you can have spans within it. Spans can have attributes, so you can store any metadata you want inside the span, and it all gets captured in the system. They're typically created by instrumentation libraries, or you can manually make your own: if you want to create a span based on some custom code or logic in your app, you can do that, or, if you're using one of the instrumentation libraries, it creates the spans for you. Spans are nested, so, like I mentioned, if a user clicks a button and that makes an API call, the API call itself would be a span, but it belongs to the original call. And spans have kinds, so you can separate user-initiated requests or queries from things like automated system commands in the back end.
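In .NET, a manual span is just the built-in Activity API, which the OpenTelemetry SDK picks up once you register the source; a small sketch (the source and tag names are illustrative):

```csharp
using System.Diagnostics;

// The OpenTelemetry SDK listens to this source once you call
// AddSource("MyApp.Rag") in your tracing configuration.
ActivitySource source = new("MyApp.Rag");

using (Activity? span = source.StartActivity("retrieve-documents"))
{
    // Attributes: arbitrary metadata stored on the span.
    span?.SetTag("search.query", "gold plated chains");
    span?.SetTag("search.top_k", 3);

    // Nested spans form the tree structure described above.
    using Activity? child = source.StartActivity("vector-search");
}
```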
We've then got metrics. Metrics are measurements captured at runtime; they're created by meters, and they're typically defined, again, in instrumentation libraries, or you can define them manually via the SDK. A meter might be CPU usage, for example, or the amount of RAM available; those are meters you could poll or run periodically. Or they might be more specific: you might want to capture the number of users you've got in the system, and poll and capture that. For example, if you're looking at a trace, you might want to look at traces and metrics at the same time: if a user reported an issue with slowness, or there was a particular crash, you'd see the span, the attributes, and the exceptions and errors that happened when the user made that request, but you might also want to see what the system metrics were at that time. Meters can be aggregate or live as well, and I'll show you both of those types of metrics.
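Defining a meter manually in .NET looks like this; the meter and instrument names are illustrative, and token usage is chosen here because it comes up again later in the talk:

```csharp
using System.Diagnostics.Metrics;

// Registered with the metrics pipeline via AddMeter("MyApp.Ai").
Meter meter = new("MyApp.Ai");

// An aggregate metric: incremented as work happens.
Counter<long> tokensUsed = meter.CreateCounter<long>("llm.tokens.used");
tokensUsed.Add(187, new KeyValuePair<string, object?>("model", "gpt-4"));

// A polled metric: the callback runs when the pipeline collects.
meter.CreateObservableGauge("llm.requests.inflight", () => GetInflightCount());

static int GetInflightCount() => 0; // placeholder for real state
```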
You've also got logs, which are essentially an extension of the logging format: they're timestamped text records; you can put structured data in them, or you can make them unstructured. The log exporters can plug into existing log infrastructure, and for .NET that would basically be the ILogger interface; the app in our demo shows you what that looks like. So far I've talked about instrumentation libraries. Now, with AI applications in particular, there are typically a lot of services that you would be calling, or you would be using libraries to orchestrate those services, whether that's LLMs, vector stores, different types of search capabilities, or local models, for example: there are lots of different components and services you have to plug together to build an AI application. The first one I wanted to call out is probably the simplest, which is the Azure OpenAI SDK client for .NET. That has instrumentation for traces, so it captures your chat completions, for example; if you're using GPT Vision, it captures that as a span. So it does tracing out of the box; it does some metrics out of the box, but you really need to configure those yourself; and it has logging out of the box. Essentially, you can install that SDK, and if you've got OpenTelemetry configured, it will automatically capture the spans for your Azure OpenAI calls. If you're using any other component in Azure via the Azure SDK (in the app I'll show you, we're using Document Intelligence), whether it's a PaaS service or any of the management interfaces for Azure, traces and logs are captured automatically; that comes out of the box, because the Azure SDK is automatically configured for OpenTelemetry. You can extend it if you want to capture any additional data, but it just works out of the box. The other one I wanted to call out is Semantic Kernel, for .NET in particular, because they've gone really big on OpenTelemetry in its design. Semantic Kernel does traces for anything if you're using it as your orchestration tool for calling and chaining together different AI components. It also comes with metrics, which is super handy for AI in particular: when you're looking at things like token usage for LLMs, it measures those by default and exports them as OpenTelemetry metrics, and it supports logging too. So far I've talked about enabling telemetry in your application, but when it gets to exporters and actually exporting the data, the one I want to show you for
the production environment is Azure Monitor. Azure Monitor will basically show you an OpenTelemetry span in the UI, and you can navigate to see specific attributes; I'm going to show you a live demo of that in a second. The UI is built in; it supports OpenTelemetry out of the box, with traces, metrics, and logs all supported; and if you're using it with Log Analytics, you've got long-term storage of data. It's delivered as a service, and you basically just pay for the amount of storage you're using. It's ideal for production workloads, especially if you're producing large volumes of data, and you can use a single Azure Monitor instance to capture telemetry data from any of the components, libraries, and tools I've talked about so far. So I'm going to show you a demo from my other machine. It is a RAG application running in Azure Container Apps, written in ASP.NET, and in this application you upload documents, like corporate documents;
in this case we've got a fictional company, with things like market analysis, HR documents, policies, and so on. We've uploaded them, it's indexed those documents and put them into AI Search, so we have essentially a private GPT over those documents, and you can ask it questions and interact with it. So if you wanted to understand what's in our fictional health insurance fund, what things are covered and what things are not covered, you can interact with it like a RAG application. This is a production app that I've got deployed, and it's got a number of components: I mentioned the container app that's running this ASP.NET server; we've also got things like Azure Functions, which does some of the archiving; we've got the search service component; and we've got Azure OpenAI deployed there, which has our models (we're using GPT-4 to do the actual text analysis and generation). This demo is fully open source; there's a link I'll put at the end of the talk. The important thing is that I've plugged this in with OpenTelemetry and captured all the data about what's happening in the application. So I've got Application Insights running on this app, and when I'm interacting with it, I can look at performance and I can look at the particular traces I've got. Let me just get back to where I was. I can interact with my app, and then, in the UI, I can drill into the samples to see everything that happened when I clicked the button. For a RAG app, there's the retrieval, which is the search that goes into AI Search: I can see here my search client call, my call to AI Search in the back end with a specific query; I've got the two chat completions for the LLM; so I can see those combined as the specific trace, and anything else is captured in the back end, which I can see here in the trace. So that's my production app, and I can use this to understand what's happening end to end in my request. I can also look at particular types of calls: for example, this is my LLM call, and I can go back and look in the application to understand all the times I've called the LLM, what the typical response time is, and what's happening with that application. So that's production, but we talked about .NET Aspire, so I'm going to go back to the slides and talk a bit more about what happens in Aspire. This is the RAG application that I
demoed. The idea is that you have a user question; you look up your specific documents (that's the retrieval component); the results of that, using some kind of vector search, are augmented into a chat completion query; and then you use an LLM to generate a response back to the user. What makes this unique when it comes to observability is that you've got one user request, and as far as the user sees it, they've typed in a question and they get an answer, but in the back end you're basically stringing together all these different services and components, and you want to see that as a single span, like I showed you in that demo just now. As an example, there's a whole bunch of different technologies you could combine to make a RAG app. This particular one is using Blazor, Azure AI Search, GPT-4, and Semantic Kernel, but you could be using any number of combinations of technologies in the front end, the retrieval mechanism, and the LLM. You could also be using a small language model if you're running things within a private environment, plus what I've categorized as the "glue" libraries. As for specific metrics when it comes to observability, the things I think are really important to measure: on the front end, there are things you would typically want to observe anyway, like who the user is, so that if a particular person told you there was a problem, you can capture who that was and when it happened. When it comes to retrieval, you're really looking at performance as one major thing, especially with AI and LLMs: the retrieval time and the reliability of the responses are things you want to measure and capture, so I would suggest doing those as metrics, and capturing the API parameters as span attributes, which you get out of the box. If you're using an orchestration library, then capturing the chain, or the flow, and trying to emit that as a span is, again, really important. So as an example, when it comes to traces, metrics, and logs with AI applications: for traces, I'd say you want to capture retrieval calls, so the actual search queries, the API calls to other components, document and search lookups, and also authentication. In terms of metrics, tokens are a massive aspect of this, because with GPT your measurement for usage is basically the number of tokens you're consuming; you also have rate limits on the number of tokens per minute, and an amount allocated in the system, so you want to capture those as metrics. And then with logs, I think in particular the user query, but also safety violations, the response from the LLM, and anything else you might need for auditing. Those are the key things I think it's important to capture. Okay, so now I'm going to switch back to my demo environment, and I'm going to show
you a couple of other things. In my application, the first demo I did was using Application Insights, and the way I did that was, in my service builder (this is my ASP.NET application), I installed the Application Insights package, which is the Azure Monitor OpenTelemetry exporter, from NuGet, and if I've got an Application Insights connection string in my environment variables, then I configure Application Insights telemetry. That, out of the box, sets up traces and logging. The only thing missing in that configuration, I think, was metrics, and I mentioned earlier that Semantic Kernel emits a whole bunch of really helpful metrics, so I wanted to enable those and measure, in particular, things like token usage for OpenAI. That's my production configuration.
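In code, that production wiring is roughly the following, using the Azure.Monitor.OpenTelemetry.AspNetCore package; the exact option names may differ slightly across package versions:

```csharp
using Azure.Monitor.OpenTelemetry.AspNetCore;
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);

string? aiConnection = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"];
if (!string.IsNullOrEmpty(aiConnection))
{
    builder.Services.AddOpenTelemetry()
        .UseAzureMonitor(o => o.ConnectionString = aiConnection)
        // Semantic Kernel's meters aren't collected by default; opt in:
        .WithMetrics(m => m.AddMeter("Microsoft.SemanticKernel*"));
}
```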
So if I'm running a dev environment, you could still emit to Application Insights, and you get all the capabilities and so on, but if you've got multiple developers working on the same project at the same time, it's going to get a bit confusing as to whose environment is which. There's also not really a simple way of just resetting all the telemetry data, so if you're just mucking around or experimenting with stuff, you really want an environment that's more temporary, something you can just use to visualize. So what I've done in this application is say: okay, in .NET, if this is in development mode (so if you've run "dotnet run" locally, it would by default be development, or if you've set the environment to be the development environment), then I configure some extra stuff. I want to enable OpenTelemetry logging and capture all of my logs; I want to capture all of my metrics for Semantic Kernel and anything in Azure; and then I also want to enable all traces, so capture all information from Semantic Kernel, from the Azure SDK, and also from the Microsoft machine learning libraries. I've then got ASP.NET Core instrumentation, which you get with the OpenTelemetry instrumentation package for ASP.NET Core that you install, and I've also installed one for HTTP.
When I've got all of that running, I can run my app locally, and I've also got the .NET Aspire dashboard running. You can run it through Visual Studio; my preference is actually to run it directly in Docker, so if you've got Docker installed locally, whether that's on Windows with WSL, on macOS, or any other environment, you can do "docker run" with the Aspire dashboard image. There's the link there, and there are details on how to do that in the Aspire documentation, but that basically runs the Docker container, and then we've got this Aspire dashboard running locally on our machine. That is a monitoring environment designed for OpenTelemetry, where everything I've now got in my running app, all my traces, my logs, and my metrics, is there and up and running. Hopefully the app's okay; the app's up and running now, so I've got that RAG app I demoed earlier running on my local machine, and I can capture the same information I was looking at in the production application, but I've set up ASP.NET so that instead of sending it to Application Insights, it's sending it here. So if I come in here, I can see what's happening in my app, which calls are being made, what back-end calls are happening in my application, and I've then got my end-to-end traces as well. This one's running at the moment; that's running the OpenAI call, so let's have a look at that. The same information I saw earlier in Application Insights is available in this local Docker container running the .NET Aspire dashboard on my machine: I can see all my back-end calls, the OpenAI calls, the chat completions, all the attributes, the configurations, everything like that, available just locally on my machine. I can capture and play around with my application, I can try and crash it, because I'm a developer, and I can also flip to the specific metrics I have available. As I mentioned earlier, Semantic Kernel has these really helpful automated metrics for completions and token usage, so it will capture OpenAI tokens used and
tokens remaining, and give you that as a graph as well. So I'm just going to go back to my slides. Okay, to recap: OpenTelemetry is a plug-in solution; it's designed so that you pick and choose which instrumentation libraries you want to use and which ones apply to the specific technologies you've got in your back end and your front end, and you string those together. I think typically, when people talk about OpenTelemetry, they talk about monitoring in big production environments, but what I really like about the .NET Aspire dashboard is that you can use the same technology to instrument an application when you're doing development. And often when you're doing development, especially on AI applications, not everything is running on your local machine: the LLM itself is a service I'm consuming, the AI Search component is a service I'm consuming, but I still want to debug my local web app, and I can use something like this to capture that whole end-to-end environment and that whole end-to-end trace. There are some extra resources I wanted to highlight. This RAG application is a project on GitHub that you can look at; it comes with Application Insights out of the box, and using that link you'll also see how to use it with local telemetry via the Aspire dashboard. There's a great session on observability, .NET 8, and OpenTelemetry, and there's a blog post from the Semantic Kernel team about what they do with telemetry, which, again, is using and heavily invested in OpenTelemetry: how they capture spans, metrics,
and things like that in Semantic Kernel. Then, lastly, there's a collection available for everything in the sessions today. Thank you for coming to .NET Conf, and good luck with your observability adventures. Thank you very much.

Ah, that was great. I love seeing how the new .NET Aspire stack helps out with other technologies like artificial intelligence. Thanks so much, Anthony; it was great to see that session. Now, coming up, we've got some more great content, two more sessions to end the day, but before we get there I need to make sure you know a little bit about our sponsors. Check out the ten companies helping us out today; big thanks to them for getting the word out about .NET Conf Focus on AI. Make sure you check out some of their websites; they've done a fantastic job helping us out, and they've even put together a swag bag for you. There is almost $5,000 worth of gift cards, licenses, and a whole bunch more in 15 swag bags that you can enter to win if you go to the aka.ms evaluation link for .NET Focus on AI. Check that out: there's an evaluation form for you to fill out, to tell us what you think about the event and about using artificial intelligence with .NET, and click the link in there; you must click that link, as it takes you to another area where you can enter the swag bag raffle. Now, of course, there have been a ton of links for you throughout the day. If you want just one uber-link, the one place to go, we've got a full collection for you at the aka.ms collection link for .NET Focus on AI. You don't want to type that all in on your phone; put your thumbs to rest, just take a picture of that QR code, and it will take you right to the collection with all of the resources: the learn modules, links to videos, information about the survey, and information about our AI challenge that's going to run for the next month on YouTube, where you can tune in, watch live streams, and participate in a couple of challenges, some tasks to complete that'll help you learn a little bit more about .NET and AI. All right, next up I've got two more sessions for you. In the first one, we're going to learn about building Windows apps and infusing a little bit of AI in there, so we can do some stuff for Windows and some stuff with Copilot; Nicola is going to join us to teach us how we can take our .NET skills and build on Copilot and Windows. After that, we're going to learn how we can use .NET
with Microsoft Teams to build some copilot capabilities in there; that's going to be with our friends AA and John. So let's get started diving in and learning about Windows with Copilot and the .NET runtime. Nicola, take it away.

Hello, beautiful people! I hope you're enjoying .NET Conf so far. In this session we'll focus a lot on AI and how to add AI inside your Windows applications with the Windows Copilot Runtime. Now, if you're not familiar with the Windows Copilot Runtime, it's a runtime we introduced at Build this year to help developers infuse their Windows applications with AI and take full advantage of the scale of hardware available out there, whether it's any of the Intel devices, AMD devices, or the latest Copilot+ PCs, the devices that have NPUs on them. It allows developers to access any point of the stack, all the way from the experiences and easy-to-use APIs that use AI, down to the level closest to the silicon, using the tools and frameworks to bring their own models and run them on top of those devices. Now, I like to show you how to do this through demos, so I'm going to switch away from slides, and the rest of the session will mostly be demos. I'll start with an experience I think is fun: I want to show you what some apps on Windows are already doing with AI, starting with Paint, an app that I think people enjoy and love. If you're using a Copilot+ PC, like I am right now, you'll notice there are new features that use AI to generate
content as the user is using the application. In this case, Cocreator allows me to generate images based on what I've drawn. I've been feeling a bit stressed, so I'm going to draw a peaceful meadow, just to be more peaceful. I need to start drawing, so I'll start with just a little green area down here; let's try that, maybe fill it up with a little grass; let me try that one more time; there you go; maybe do a little blue on top, and then I'll do a bit of clouds. You'll notice that, as I'm doing this, Cocreator is generating content on the side, there on the right. So let's go ahead and add a bit of clouds here, a cloud here, maybe a bit of a cloud here, and you'll notice it's generating exactly what I wanted, and this looks kind of familiar. But what I really want to show you here is that, if you actually open Task Manager, one thing you'll notice on these new devices is an additional graph showing you the NPU, the neural processing unit, on the device, and the features being used inside Paint are fully leveraging the NPU to run the machine learning models on this device. So, for example, if I put this on the side here and do another generation, let's say I want to put a little line here again, maybe something like that, let's see what happens: if we look here, we can see a spike on the NPU right there. You can see that spike right there, and what you'll notice is that the CPU didn't do anything at the same time, and the GPU didn't have a spike either; just the NPU did, which means we're leveraging the NPU to offload a lot of that heavy AI processing and free up the GPU and the CPU to do what they do best and process the data the way they need to. This is a fun little example, but there are also other examples of things you're probably using every single day,
for example, Snipping Tool. That's a great example of a way we've added AI to the Windows experiences, through a feature that allows you to do text recognition: you can take a screenshot and copy text out of an image. This is a really easy way we've added AI throughout different aspects of Windows that just makes the experience better and makes you more productive, and at the same time we're also making these experiences available to developers to use within their own applications. There are other things we're doing here, such as Studio Effects, which adds different effects on top of the camera and offloads them to the NPU, so that you as developers don't have to build these kinds of things yourself and can use them directly inside your applications as you're building them. Now, to show you how to do this, I have an app for you. It's an app I built to demonstrate the usage of these different
APIs and how to actually add models to your applications. It's a very simple note-taking application, using a lot of the common patterns I think most developers are building with today, and we'll see how we can add AI throughout this application in a way that will definitely relate to your users, and to yourself, when you're adding these features to your own apps. First things first: this application is built on top of WinUI 3; it's a Windows App SDK application with .NET. The first thing I want to add is the ability for users to add images and audio files alongside their notes. You can see here we have videos, and we also have the ability to add images, and one of the things I can do is drag an image directly into my application. As I've added it into my app, you can see that same OCR experience I've added to this application: I can select the text, I can recognize the text, and it even works with handwritten text, which is incredible. It's a great model that runs on device, ships with Windows, and developers can easily use it as an API. The way we're adding this is by using the new text recognition API, which is going to be part of one of the future Windows App SDK releases. Using it is very simple. The first thing you do, and this is the pattern we use for all these APIs, is request for the API to be available, and what this does is actually request the model to be available on the device: if the model is not downloaded on the device, it prompts Windows to download that model and make it available to the application, and to all the applications on the device. Once that's done, you can create that model, load it into memory, and then use the API to pass in an image and get back the locations of the sentences and the words, as well as the orientation. You saw in the application that the text was at an angle; I was still able to recognize it, and I was still able to render all the UX I needed on top of it to select the text, because I was able to get that information from the API.
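As a sketch, that availability-then-create pattern looks roughly like this. The text recognition API was still heading into experimental Windows App SDK releases at the time, so the type and member names below are placeholders for the shape of the pattern rather than the final API surface:

```csharp
using Windows.Graphics.Imaging;

// Placeholder names; the real experimental API shape may differ.
async Task<string[]> RecognizeAsync(SoftwareBitmap image)
{
    // 1. Ask Windows for the on-device model; this triggers a one-time
    //    download, shared by all apps on the machine, if it isn't present.
    if (!TextRecognizer.IsAvailable())
        await TextRecognizer.MakeAvailableAsync();

    // 2. Load the model into memory.
    var recognizer = await TextRecognizer.CreateAsync();

    // 3. Run it: lines and words come back with bounding boxes and orientation.
    var result = await recognizer.RecognizeTextFromImageAsync(image);
    return result.Lines.Select(line => line.Text).ToArray();
}
```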
Additionally, beyond using the built-in APIs, we can also use models: there are a lot of open-source models out there to make our application great. One of those models is Whisper, a model by OpenAI that allows you to recognize audio, transcribe audio, and translate audio, among other things, and we use it with videos and with audio. For example, here I've attached the video recordings for my classes at school, and I was able to get the text for what the professor was saying, along with the actual timestamps of when it was said. We're doing this with the open-source version of the Whisper model, used alongside a library called ONNX Runtime. We can navigate to where we got this model from: we got it from Hugging Face, but we used a tool called Olive, a tool by the ONNX Runtime team that allows developers to convert models to target specific hardware, or to optimize models for a specific need, whether they need a model to be very performant or very accurate, and depending on which hardware it needs to run on. So we used one of the Olive examples to take that Whisper model from Hugging Face; in this case there's an example for Whisper Tiny, and there are different variants of Whisper, in different sizes, that run at different speeds and accuracies. We got this model, downloaded it, and used Olive to convert it to ONNX so we can use it alongside ONNX Runtime. We have it inside the application; you can see here we have a few versions of it: we have Tiny and we have Small, the difference being that the Tiny one is very small but not that accurate, versus the Small one, which is a bit bigger and might run a bit slower, but is more accurate than the Tiny one, and there are obviously bigger sizes going up from there; you can use them depending on what you need. So the first thing we do is use the ONNX Runtime API, the inference session, to pass in the model path, the .onnx file itself, to the inference session, and then, when we want to actually transcribe something, we need to get the audio data and pass it
through the model. One way to find out how to do this is to use an app called Netron, so let me show you how that works right now. If I open up one of these models (you can see they're just .onnx files; I'm using the web version of the app here, and I can just drag one in), it allows you to inspect the model and look at what's inside it. In this case, you can see that, just like any model, it's a graph of different operations that run on top of the weights, but we can also inspect the different inputs the model has: you can see here this is where the audio data comes in, then the different options you can provide to the model, and the output here is just a string, which is the recognized text. So we're doing the exact same thing inside our application: we're using that inference session to run the model. Before we do that, we set up our inputs; this is where we set up the audio tensor, which is just a set of floats representing our audio data. We have properties we've set up; we're also setting the logits processor, which allows us to get the audio timestamps (that's just documented for the model); we're setting up the language; and then we simply call Run on that inference session with those inputs to get the string back. Once we have that string, we can process it and do whatever we need to do inside of our application.
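The ONNX Runtime side of that, sketched with Microsoft.ML.OnnxRuntime: note that input and output names are per-model ("audio_stream" below is a placeholder; inspect the .onnx file in Netron, as described above, to find the real ones, and the string output shape is specific to the Olive-converted Whisper variant):

```csharp
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

using var session = new InferenceSession("whisper-small.onnx");

// Audio samples as a float tensor, shaped the way this model expects.
float[] audio = LoadAudioSamples();              // hypothetical helper
var tensor = new DenseTensor<float>(audio, new[] { 1, audio.Length });

var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("audio_stream", tensor), // placeholder input name
};

using var results = session.Run(inputs);
string transcript = results.First().AsTensor<string>().GetValue(0);

static float[] LoadAudioSamples() => Array.Empty<float>();   // placeholder
```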
So in this case, when I run the application, I have an audio file that my coworker has sent me; I'm just going to copy it from WhatsApp and paste it into the application, and it will process for a bit: it's transcribing, and we can see it transcribed it immediately and added it to my application, and I'm able to both play it ("hey, I'm gonna miss class today") and see the transcription, and it's all now part of my notes application, directly built in. So I now have my notes, I have videos, I have audio files, I have photos; what my application really misses, or needs, is a way for me to search all of this content, and AI can absolutely help us make this a great experience for our users, using a method called semantic search. Semantic search, if you're not familiar, is the ability to search for content not by the content itself but by its meaning, so that you can find a cat by searching for a lion, or find whatever it is just by what it means, not just by the content it represents. So in this case, I can search for "number of parameters in a function"; I know the professor was talking about this, and I can see that it took me to lecture two, where my professor was talking about functions taking as input zero or more inputs, which is the exact same thing I'm searching for, but phrased in a completely different way; the meaning is the same. I can click there, and it takes me directly to where that content is. We're doing this, again, with an open-source model that gives us a vector representation of content, and what that means is that I can take a piece of text and convert it to a point in a multi-dimensional space according to its meaning, so I can map multiple pieces of content and get the distance between them based on what they mean. So all the animals, like cats,
dogs, and so on, will live maybe in one part of the space, whereas, let's say, my computers and different types of technology might live in a different area of that space, so that when I'm trying to find something, I can get its vector representation and find where it's located in that space and what's closest to it. The way we're doing this is by using a model called all-MiniLM-L6-v2 (it's a great name, I'll give you that), but this is a model that's already readily available on Hugging Face; it's already in the ONNX format, so you can simply download it from the Hugging Face website and then use ONNX Runtime again inside your application to run it. The way we run the model is exactly the same as with Whisper, so I'm not going to show you that code again; what I want to show you is how we're actually doing the search, how we're doing the comparison. So here's that search function that we're running; I'm going to put a
quick little breakpoint right there and run that search one more time; let's walk through it. We just hit our breakpoint; there it is. The first thing we do: all the content we have in our application (the notes, the videos, the audio, the pictures), we've already run through the model and stored in a database by its vector representation, and that's where the stored vectors are. You can see here we have about 49 of those vectors, and if I open one of them, you can see its vector representation: it is essentially a big list of numbers, about 384 numbers to be precise, so it's 384 dimensions in that vector space. Now that we have those vectors for our application's content, we can get the vector for our search term, our search term being "number of parameters in a function", which is just another array of 384 floats, the same thing. Then, to get the distance, we can use this function called cosine similarity, which comes as part of .NET inside the tensors package, and which allows you to get the distance between two vectors in a multi-dimensional space; we calculate the distance between our search term and all of the vectors in our space. Now, there are many ways to do this: there are databases out there that handle all of this for you; we're doing it in a very simple way here to illustrate how it works.
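The comparison itself is small; here's a sketch using the TensorPrimitives.CosineSimilarity helper from the System.Numerics.Tensors package (the StoredVector type and method name are illustrative, not from the app):

```csharp
using System.Numerics.Tensors;

record StoredVector(string Source, float[] Embedding); // illustrative shape

// Higher cosine similarity = semantically closer in the 384-dimension space.
static (string Source, float Score) FindClosest(
    float[] queryEmbedding, IEnumerable<StoredVector> stored) =>
    stored
        .Select(v => (v.Source,
                      Score: TensorPrimitives.CosineSimilarity(queryEmbedding, v.Embedding)))
        .MaxBy(x => x.Score);
```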
Once we have the distance between our search vector and all of the vectors we've already stored in our database, we can get the closest matches, which in this case was lecture two, at a specific point in time, and we're able to point the user directly to where they need to go. So we can continue here, and it takes us directly to that same experience again, which is really great. Now, one way to make this experience even better is to use language models. I'm sure everybody's already familiar with language models like ChatGPT and the GPT models, or any of the other models that are very popular today, but what I want to show you is how you can run those language models locally on your device, so that they run completely offline. In fact, I'm on Wi-Fi right now, and I'm going to go into airplane mode; everything I've shown you so far has been offline, but I want to illustrate the point that all of this actually continues working offline (which also turned off my mouse, of course; I'm going to turn my Bluetooth back on so I can use my mouse, there it is). One of the ways we're using a language model in the application is, for example, for summarization: I can select all this text and say "summarize", which sends it down to a language model locally on the device to summarize for us. You can see the model picked up the task and is summarizing our notes to make them easier for us, and this is, again, running completely offline; we
even have a little disclaimer here that says everything is running offline. On top of that, we're also doing things like autocomplete inside our app. For example, let's type "functions take" and see what this completes with: "inputs, process them, and produce outputs". It actually did the autocomplete using that language model. But we're also doing things beyond just showing and generating text: we're also using these models to help us with reasoning on top of text, to help us understand what the text means. In this case, we have a feature to help the user quickly see what their to-dos are, and we're rendering them in a more familiar UX rather than just a chat window: we can use the language model to take this content, reason over it, and give it back to us in a way that's more structured, so we can either act on it or represent it in a more visual way to the user. The way we're doing this is with a library called ONNX Runtime GenAI, which is yet another level above ONNX Runtime, for running language models. In this case we're using Phi-3, a small language model by Microsoft, but you're able to use many other models with this library: you can use Llama 3, Mistral, Gemma, whatever model out there that's small enough to work; you can use it with this library. I can show you the dependency that
we have on this library: it's called Microsoft.ML.OnnxRuntimeGenAI, and that's the specific library we're using to run this. The model we're using here is one we downloaded directly from Hugging Face; let me just go back online so I can show you the web page. I can search for Phi-3, there it is on Hugging Face, and there are many different variants of this model: there's a variant that runs on the CPU and can run across devices, there's a variant that runs on CUDA for NVIDIA GPUs, and there's a version that runs on DirectML. If you're not familiar with DirectML, it's an abstraction API that allows you to run models, or machine learning tasks, and abstract away the hardware, the GPUs below it, so you can run them across the wide variety of GPUs that run on Windows, whether that's AMD, Intel, or NVIDIA. We've downloaded one of these models into our application and put it inside our folder; there it is, those are the Phi-3 files we got from Hugging Face. We pass them to the GenAI API, and then we simply use its API to generate tokens based on whatever the prompt is. Whenever we want to send something to the model, we use a prompt that's specific to Phi; every single model has a different prompt format in which it expects the data to be sent, and in this case Phi expects this specific format, where it wraps the system
prompt in those special tokens right there. Then, when we want something like summarization, we simply ask the model to summarize this text in three to five bullet points. We also have a function to fix spelling, where we say "fix up the spelling in this content", and one for autocomplete, where we say "you're an assistant that helps the user autocomplete; please complete this beginning of a sentence for the user". You can be as specific as you need to. For the to-dos specifically, we're being a bit more prescriptive: we say you need to summarize this into to-do items, but please use a JSON array format to respond, so that I can take the output as JSON and either convert it or use a regular expression to get the actual information out, and then act on it in my experience.
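Putting the pieces together, local generation with Microsoft.ML.OnnxRuntimeGenAI and Phi-3 looks roughly like this; it follows the early C# samples for that library, which was evolving quickly at the time, so check the current package for exact member names:

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

using var model = new Model(@"models\phi-3-mini-4k-instruct"); // folder downloaded from Hugging Face
using var tokenizer = new Tokenizer(model);

string noteText = "functions take inputs, process them and produce outputs";

// Phi-3's expected prompt format; every model family has its own.
string prompt = "<|system|>Summarize this text in 3 to 5 bullet points.<|end|>" +
                $"<|user|>{noteText}<|end|><|assistant|>";

using var sequences = tokenizer.Encode(prompt);
using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 1024);
generatorParams.SetInputSequences(sequences);

// Generate token by token until the model signals it is done.
using var generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
}
string summary = tokenizer.Decode(generator.GetSequence(0));
```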
Now, the cool thing here is that I can combine multiple of these models into functions that do much more than that. For example, you saw me do semantic search, but I can also use semantic search alongside a model to ground the model in the information inside my application, and allow the user to ask Phi about anything inside my application, and Phi can respond. So instead of the user just searching and looking for the information themselves, Phi can actually summarize what the user has found, through a pattern called RAG, or retrieval-augmented generation. To see this in action, I can show you here real quick: we have a little button where I can ask the model, let's say, "when are office hours for CS10?". Now, there's no reason why the model should know anything about CS10; this is my class, it's not something that's public, it's something in my notes, but it's still able to answer that question, and even provide me the source of that information. The way we're doing this is that, whenever the user asks a question, we first run a semantic search on that question to get the information about what the user is looking for, and then we feed the question to the model and say: hey, can you answer this question the user has asked; here are the contents of where you can find the answer; go ahead and summarize that answer for the user. So that's exactly what we're doing right here, in AskForContentAsync: we run that search first, and
to us as content where been provided the question and we're we're uh asking the model to answer that question so we're saying hey you're helpful assistant answering question about this content um your system message that's your system message here's the content which I'll show you what that looks like in a second and then here's the question the user passed in so let's go ahead and ask the same question one more time let's see when our office hours for cs10 you can see here we hit a breakpoint inside of our semantic search function to find the
content and then now we have the content within here which is essentially just a big piece of text of that content that we found and in here somewhere it says hey okay yes Tuesday tomorrow 10 to1 is office hours uh that are within within that video so we're asking the model with that content to answer the question and given that context it's able to it's able to do that uh and provide that answer uh for the user right there there it is and then go back and we can go back directly to that point inside
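The full body of AskForContentAsync isn't readable in the recording, so here is a rough sketch of the RAG flow just described. The embedAsync and generateAsync delegates and the chunks collection are hypothetical stand-ins for the app's embedding model, the Phi-3 generation loop sketched earlier, and the app's indexed text:

```csharp
// Hypothetical signature: the caller supplies the embedding model, the
// generator, and the pre-embedded content chunks.
async Task<string> AskForContentAsync(
    string question,
    IReadOnlyList<(string Text, float[] Vector)> chunks,
    Func<string, Task<float[]>> embedAsync,
    Func<string, Task<string>> generateAsync)
{
    // 1. Semantic search: embed the question, rank chunks by similarity.
    float[] q = await embedAsync(question);
    string context = string.Join("\n", chunks
        .OrderByDescending(c => CosineSimilarity(q, c.Vector))
        .Take(3)
        .Select(c => c.Text));

    // 2. Ground the model: hand it the retrieved content plus the question.
    string prompt =
        $"<|system|>\nYou are a helpful assistant answering questions " +
        $"about this content:\n{context}<|end|>\n" +
        $"<|user|>\n{question}<|end|>\n<|assistant|>";
    return await generateAsync(prompt);
}

static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];   // accumulate the dot product
        na += a[i] * a[i];    // and each vector's squared norm
        nb += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}
```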
Okay, so now we've seen how we can leverage the APIs available to us through the Windows Copilot Runtime, such as OCR, and we've also seen how we can bring open source models and use them alongside the frameworks and tools that come with the Windows Copilot Runtime, such as the ONNX Runtime, DirectML, Olive, and other tools as well. One of the best ways to learn about this: the project I showed you here is all open source. If you go to learn.microsoft.com/windows/ai you'll be able to learn all about the Windows Copilot Runtime, and all of the samples are directly there, so you'll find this sample alongside others that show you how to use it. Just to recap: there are already a ton of experiences built with the Windows Copilot Runtime, inside Windows 11 itself and in the applications that ship as part of Windows 11, but also in applications you might already be using that are out there today using open source models and APIs. One way to learn how to do this is simply to use those applications and study how they accomplish what they do. In terms of the APIs, they're available through a library called the Windows Copilot Library. This is a library that exposes a set of on-device models that ship as part of Windows, and it lets you leverage those models inside your applications directly through a very simple API, just like I showed you with OCR.
Two of those APIs are going to be available in the upcoming experimental releases of the Windows App SDK. One of them is Phi Silica, a language model specifically optimized for the NPU, which lets you leverage a language model without having to pull any model into your application; you use the model that's already on the device, or have it downloaded through the API. And the API is very simple to use, as you can see on the screen: you can make the model available and ask Windows to download it for you onto the device (if a different experience has already done this, the model will probably already be there), then you load the model into memory with CreateAsync, and then you send a prompt and get a response back in your application. With that model already built into the platform, you can build experiences similar to what I showed you in the Notes application.
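As a rough sketch of that flow, assuming the experimental Windows App SDK API shape shown on the slide (names may shift between experimental releases, and noteText is a placeholder):

```csharp
using Microsoft.Windows.AI.Generative; // experimental Windows App SDK namespace

string noteText = "..."; // placeholder for the user's note

// Ask Windows for the on-device Phi Silica model, downloading it only if
// no other experience has already made it available on this machine.
if (!LanguageModel.IsAvailable())
{
    await LanguageModel.MakeAvailableAsync();
}

// Load the model into memory, then prompt it; no model files ship in the app.
using LanguageModel languageModel = await LanguageModel.CreateAsync();
var result = await languageModel.GenerateResponseAsync(
    "Summarize this note in one sentence: " + noteText);
Console.WriteLine(result.Response);
```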
Following the release of those two, we'll also have additional APIs available: things like vector embeddings, so you can do semantic search like I showed you, plus RAG, text summarization, and many more APIs we're working on making available through that same simple, easy-to-use library. Beyond the library, when we get to the bottom layer, bring your own models, that's when we get to the actual APIs and frameworks. DirectML is a single hardware-abstraction API that allows you to optimize and deploy models across the whole scale of Windows devices, over one billion of them, whether they're running an AMD, Intel, or NVIDIA GPU, and we're also working to bring NPUs under that umbrella, so you'll be able to target the NPUs across the different silicon through DirectML too. DirectML also supports 4-bit quantization, so you can make your models as small as possible while still maintaining accuracy, and run them across multiple devices. And you can use it today: it's been available for many years, it fully supports GPUs, and you can download it as a NuGet package into your applications. The ONNX Runtime, and ONNX with its whole suite of tools, lets you use models cross-platform. ONNX is a cross-platform, open source model format for representing model graphs, and the ONNX Runtime runs across all the different platforms and lets you run those models on top of any execution provider, such as DirectML.
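For bring-your-own-model scenarios, targeting DirectML from .NET is essentially a one-line change on the ONNX Runtime session options. A minimal sketch, assuming the Microsoft.ML.OnnxRuntime.DirectML package and a placeholder model file:

```csharp
using Microsoft.ML.OnnxRuntime; // from the Microsoft.ML.OnnxRuntime.DirectML package

// Route execution through DirectML so the same code runs on AMD, Intel,
// or NVIDIA GPUs; unsupported operators fall back to the CPU provider.
var options = new SessionOptions();
options.AppendExecutionProvider_DML(0); // 0 = default GPU device

using var session = new InferenceSession("model.onnx", options);
// session.Run(...) now executes the ONNX graph on the selected hardware.
```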
The generate API I showed you is specifically for language models and generative AI; it's available for you to use today with many different models, and it supports different programming languages as well. Olive is the tool I showed you for optimizing models for different hardware, helping quantize them, or even fine-tuning models for whatever you need them to do. And finally, one of the tools you should take a look at is the AI Toolkit for Visual Studio Code. It's a great way to get started with language models and to experiment with them: it lets you download models and try them out directly within Visual Studio Code, and it also lets you fine-tune and optimize those models using Olive. It wraps Olive in a really nice user experience where you can provide your own data and fine-tune however you need, and then you can deploy those applications either to the cloud or to your device. Hopefully this session was educational.
There's plenty more for you to learn and try out, so I really encourage you to go check out aka.ms/wcr to learn more about the Windows Copilot Runtime and to stay tuned for everything else we have coming in the upcoming months. Thank you, and I hope you have a great rest of your day. Hello everyone, welcome to our session. My name is Ayça Baş, I'm a developer advocate at Microsoft, and John Miller is joining me today for this session. Hi John! Hi Ayça. John is a product manager on Teams Toolkit in the developer division. Today we're going to talk about how you can build custom engine copilots using the .NET Teams AI library and Teams Toolkit. We have a fully packed agenda with a live demo: we'll start with the copilot stack and dive deeper into the custom copilot area, then we'll introduce the Teams AI library and show how we can use Teams Toolkit and the library together. We'll actually build a custom engine copilot live today, using Teams Toolkit and the Teams AI library in Visual Studio.
Before we jump into the demo, here's a brief intro to all of this. John, why don't you take it away? Sure, thanks Ayça. To get us started, let's back up one step and talk about the pillars of building copilots, all the way from low code using Copilot Studio to coding environments like Visual Studio and Visual Studio Code. First, you have the choice to extend Microsoft's Copilot. I think of these options as bringing your app to Microsoft's AI, and connectors and plugins fit into that category. But you also have the option of building your own copilot, which is on the right side of this slide, and that's what we're going to focus on here; I think of this as bringing AI into your app. Today we're going to show you how to do that with Teams Toolkit and Visual Studio. Let's start by breaking down what a custom copilot is, in case you're unfamiliar. At its base, a copilot uses an orchestrator that sits on top of a foundational model; this is the AI brain of your copilot. Coming up, we're going to show you a few tools for developing these models and combining them with your data to suit your needs, like Azure OpenAI Studio, but it's important to note you can use any language model you want. Models and data are just one level of the stack, though. To start building a conversational interface, you'll need to add prompt instructions, actions for your business logic, event handlers, triggers, intent detection; there's a lot to add. So late last year we launched a new set of tools that modernized Teams bots, and we call this the Teams AI library. It was built on Microsoft's Bot Framework, and it's the foundational groundwork for a custom copilot. The AI library is generally available now in .NET and JavaScript, and it also supports Python. Finally, your models, data, and conversational interface have to run in Microsoft 365.
And we've totally modernized the chatbot look and feel to be just like the copilots you see in Teams, which means you get features like streaming, citations, feedback loops, AI-generated labels, and more; Ayça is going to demo some of that for you in a minute. Taking a look back: before we had the Teams AI library, chatbot support was always in Teams, but building bots that felt natural to interact with was a challenge. Most bots relied on a limited set of declared commands, like slash commands or action commands, that you had to predefine, and it felt a little primitive and restrictive. Users had to learn what those commands were and how to interact with the app's skills, and natural language processing was complex to build and very expensive. What it looks like now: you can use the Teams AI library to address those challenges and start building your own copilots for Microsoft 365. With the AI library, your copilots can use language models to facilitate more natural conversational interactions with your users, and that can help guide the conversation into your app's skills. This means you can focus on writing your business logic rather than on setting up all the language processing; you let Teams handle the complexities of the conversational piece while still getting the benefits of a bot, like before. Overall, the Teams AI library helps you create more intelligent, personalized experiences for your users. The AI library is a Microsoft 365 interface to a language model. It uses techniques like prompt engineering, so you can add copilot-like conversational experiences, with built-in safety features like moderation, so you can make sure your copilot responds in an appropriate manner. The library includes a planning engine that lets the model identify user intent, and that's how you map your business logic to the conversation. In the library we talk about actions: you can map those intents to actions that you implement, which are code that you write. This is similar to the tool calling or function calling you might have heard of with other language models.
There's also built-in support for all the OpenAI models, both in OpenAI and Azure OpenAI, but you can also add your own support for whatever model you want to use. There are four steps to get started with the AI library. Ayça is going to show you how to do them; let me walk you through what they are right now. You'll start by scaffolding out the necessary AI components for your copilot: that includes importing the library, which is available on NuGet and can be added to your project, and you can scaffold a new project with Teams Toolkit as well to get started. Next, you create the application object, which represents your copilot within Microsoft 365; that object is used to handle incoming messages, process user input, and handle AI-triggered actions. Then you start creating your prompt, to define the copilot's personality and behavior and to inform it of any actions you have in your code. And finally you implement the copilot actions: that's your business logic, basically your function calling, and actions are the primary way your code interacts with the AI components and the model, as sketched below.
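Here is a rough sketch of what such an action can look like in the .NET Teams AI library; the "LookupOrder" name, the state type, and the order-lookup logic are made up for illustration:

```csharp
using Microsoft.Bot.Builder;
using Microsoft.Teams.AI.AI.Action;
using Microsoft.Teams.AI.State;

public class MyBotActions
{
    // The planner maps a user intent like "where is order 123?" to this
    // action; the method body is ordinary business logic (hypothetical example).
    [Action("LookupOrder")]
    public async Task<string> LookupOrderAsync(
        [ActionTurnContext] ITurnContext turnContext,
        [ActionTurnState] TurnState turnState,
        [ActionParameters] Dictionary<string, object> parameters)
    {
        string? orderId = parameters["orderId"]?.ToString();
        await turnContext.SendActivityAsync($"Looking up order {orderId}...");
        // ...call your own API here...
        return $"Order {orderId} ships tomorrow."; // result is fed back to the model
    }
}

// Registered once at startup so the planner can invoke it:
// app.AI.ImportActions(new MyBotActions());
```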
Actions can call external APIs, process data, or generate responses; they can do whatever you want. To make it simpler to get started with all of this, and to basically streamline most of these steps, you can use Teams Toolkit for Visual Studio. It's an optional component you can install as part of the ASP.NET workload: if you already have the ASP.NET workload installed, you can just go in and add the Microsoft Teams development tools, which installs Teams Toolkit for you. It's a set of templates and tools to help you build custom copilots and apps for Teams across Microsoft 365; it automates a lot of the setup, debugging, and provisioning you need to run these experiences, and there's a bunch of project templates depending on what you want to build. So now I think we can hand it over to Ayça, and she can show you how to do all of this inside Visual Studio. Awesome, thank you John, I'm excited for this demo. This demo covers what John explained in the slides.
We'll actually do all of these steps hands-on, just by creating a new project. I'm in Visual Studio, and I'll create a Teams app; you can give the project any name, and today we're going to create a career expert called Career Genie, so we can call it "specialist." We have a bunch of templates available; as John mentioned, these templates help us scaffold projects from scratch, and some of them come with the Teams AI library from the first moment. Today we're going to use the AI Chat Bot template, but there are other templates available that work with the Teams AI library. One of them is the AI Assistant Bot: if you've worked with the OpenAI Assistants API before, that template would be a suitable starting point, since you can bring your existing OpenAI assistant and use it together with the Teams AI library. Today our base is the AI Chat Bot, so we'll start using the Teams AI library right away with this template. I'll create my project, and when it scaffolds we'll have a project from scratch. This is actually a regular AI chatbot, but it comes with the Teams AI library from the start, so the library is already installed in our project and we can start using the AI components and all of the steps John mentioned previously. Before I run the project, there's one thing I realized that I want to bring John in to ask about. I noticed there's a change in the project template: now there is the Teams app project, where we have the configuration files, and there's also the project itself, where we have the code.
John, can you quickly explain the reason you made this change to the template? Sure. The solution is now broken up into two projects. You start out with a Teams app project, and that encapsulates all of the files needed for Teams Toolkit: the automation files, which you'll see in a minute, with the YAML files and the environment files. It's really a way to keep things separate, so if you want to implement your Teams bot, or a website, or any of the other capabilities Teams supports across multiple projects, you can do that, and it's a little easier to work with. All the Teams Toolkit commands that help you provision and create things can be run directly on the Teams app project, so it just keeps things a little tidier in the solution. Okay, sounds good. So in this case, all of the Teams-app-related files, like the YAML files and environment variables, are under the Teams app project, and all of the source code for the project, including Program.cs, is in the other project.
To start this project, we actually need Azure OpenAI keys: to run it locally, we need to bring an Azure OpenAI key and endpoint in here. I already created those, so I'll quickly go to Azure and bring my endpoint and key over: first I'll get the key, and then you can quickly get the endpoint from the same page. Once we're done with the environment variables, it's only a few steps to run the app. First, we need to create a dev tunnel, and it's quite easy: you can call it anything, say "demo dev tunnel"; I'll keep the tunnel temporary, and for the access level you can choose organizational, public, or private. For this demo I'll just use public. Then the last step for us is "Prepare Teams App Dependencies." This step does a bunch of work behind the scenes: it registers the app and completes some of the required steps, so we end up with all of the setup ready in our environment variables. You can check that: you get the Teams app ID, the Microsoft 365 username, and the bot domain and endpoint over here, and Teams Toolkit does all of that for you behind the scenes. While the app dependencies are being prepared, I also want to quickly run through what's available in our project. First things first: to run this project, meaning the AI chatbot together with the Teams AI library, we actually need the Teams AI library installed.
If you want to check how the Teams AI library works and understand the code a bit better, you can go to the NuGet packages, and you'll see that the Microsoft.Teams.AI package is already installed in this template; I'll quickly go ahead and update it. I also want to go to Program.cs and walk you through the code so you'll understand better how the Teams AI library components work. The first thing, as John mentioned, is the AI components: the AI library brings us components to run models, planners, and a bunch of other things. The first one we see here is the Azure OpenAI model, which we use to call Azure OpenAI (there's an equivalent OpenAI model if you want to call OpenAI directly). After that we have the prompt manager: the prompt manager handles prompt creation using the files under the Prompts folder. As you can see, we have the Prompts folder here, and if you expand it there's a Chat folder containing config.json, which has all the settings for our prompt, and skprompt.txt, which is where we define the characteristics and behavior of our bot. If I change this text and want to give it more personality, this is where I add my system prompt. Going back to the code: we then have the ActionPlanner, which uses the LLM, the large language model, to generate plans and call Azure OpenAI; this is the step that defines everything for our planner. And finally we create the Application, which basically replaces the ActivityHandler class from a typical Bot Framework bot: when we run the app, we return that Application object and get all of the LLM functionality.
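Condensed, the Program.cs wiring just described looks roughly like this; treat it as a sketch rather than the template verbatim, with the key, endpoint, and deployment name as placeholders:

```csharp
using Microsoft.Bot.Builder;
using Microsoft.Teams.AI;
using Microsoft.Teams.AI.AI;
using Microsoft.Teams.AI.AI.Models;
using Microsoft.Teams.AI.AI.Planners;
using Microsoft.Teams.AI.AI.Prompts;
using Microsoft.Teams.AI.State;

// Model: wraps the Azure OpenAI deployment (an OpenAIModelOptions overload
// exists for calling OpenAI directly instead).
var model = new OpenAIModel(new AzureOpenAIModelOptions(
    "<azure-openai-api-key>",   // bound from appsettings / env in the template
    "gpt-35-turbo",             // your Azure OpenAI deployment name
    "<azure-openai-endpoint>"));

// Prompt manager: loads skprompt.txt + config.json from ./Prompts/chat.
var prompts = new PromptManager(new PromptManagerOptions
{
    PromptFolder = "./Prompts"
});

// Planner: asks the LLM to generate a plan using the "chat" prompt.
var planner = new ActionPlanner<TurnState>(
    new ActionPlannerOptions<TurnState>(model, prompts,
        (context, state, p) => Task.FromResult(prompts.GetPrompt("chat"))));

// Application: replaces the classic Bot Framework ActivityHandler.
var app = new ApplicationBuilder<TurnState>()
    .WithStorage(new MemoryStorage())
    .WithAIOptions(new AIOptions<TurnState>(planner))
    .Build();
```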
Okay, so now I'll run the app and we'll test the plain-vanilla version, and after that we'll add the characteristics for Career Genie, so we have the career-specialist persona, and we'll also bring in some data so we can search through our own data too. All right, our app is loading; I'll add it, and then I can start chatting with my app.
I don't expect a lot of information; this is just going to be a basic AI bot where I can ask generic questions, but the more I add to the prompt and the more data I add, the more specialized the bot becomes in the area of expertise I want. While we're waiting, John, do you want to add anything on the code side, where we add the AI components? One thing I'll mention, since this is almost finished, is that Teams Toolkit helps launch this for you. You clicked Start Debugging, and if you look in the properties section there's a launch URL: we use that special launch URL to open up Teams for you, plug in all the right values for your app, and make sure the right app is launched in your tenant. That's also where logging in to Teams Toolkit comes in: the step you took before, Prepare Teams App Dependencies, is where you choose an account, your Microsoft 365 account. Ayça chose her developer account, and that's how we know which Teams tenant to launch. So I think you're ready to start debugging. Yes, awesome. First things first, I'll just say hi, and we'll probably receive a generic hi back. Then I can ask some specific questions like "can you suggest me any .NET developers?" Assume I'm in human resources at a company, looking for .NET developers for a position to fill. But because Career Genie currently doesn't have any data set behind the scenes, it only tells me that I can go to LinkedIn, Upwork, or Freelancer to find people related to this role.
What I actually want to do is add a bunch of CVs behind the scenes of Career Genie, and also add a prompt so we can define its behavior and get more value out of it. So I'll quickly go to Azure, define everything there, show you how our data set works in the chat playground, and then we'll bring the same experience into Teams. First, I already defined my index in Azure AI Search: it's a predefined vector index data set available in the AI Search sample, and I'm going to choose the right index in a second. I have a lot of indexes, but "resumes" is the right one for this demo. Because my data set is stored as vector embeddings, I need to choose vector search for the search experience, and I'll also choose the text embeddings model so I can search over my embeddings data set. For the search type you can choose vector or hybrid, whichever is more convenient for you; for this demo I'll just go with vector plus API key, and then I'm connected to my data set in a second. Because it's predefined, I don't need to do anything else. Before I actually query my data, I want to quickly show you what it looks like: here are a bunch of CVs, more than 30 in this data set, and all of the CVs are in the native language of the person, so you'll see CVs in English, Chinese, Spanish, Turkish, all the diverse CVs you can find. Because we're using vector search (I'll switch on the contentVector field), the whole search experience goes through embeddings, so we can pull back all the related data and get a pretty good search experience. I'll go back to my playground and ask: "can you search for .NET developers?" All right, now we see a bunch of .NET developers recommended for us, with their CVs coming through, so we should be able to work with this data. What we want to do next is bring this data set into our Teams app.
I'll quickly switch back to my app, stop it, and update my prompt settings: config.json, and also skprompt.txt, so we give some behavior and personality to the bot and can chat with our data through Teams. All right, let's start with config.json. The difference between our app and the chat playground: if you open the JSON view in the playground, you'll see it has a data_sources section, and we're going to do exactly the same thing in our config. We'll add data sources so we can connect to our data source from the Teams app too. So I'm going to update the completion section; I already prepared this piece, but I'll quickly walk you through what it does. Some of the properties were already here, those five, but what I've added is include_input and include_history, both set to true, and the data_sources property, which is the key piece: it's where we define that we're using Azure Search, with the endpoint, the index name "resumes" (exactly the same as in the chat playground), the Azure AI Search key in the authentication section, the query type, and the deployment name for our Ada model, the text embeddings deployment. Once I save this, our data set is wired up and ready for the app.
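The resulting completion section looks roughly like the fragment below. The exact file isn't readable on screen, so the placeholder values and surrounding property names here follow the Azure OpenAI "on your data" schema the template builds on; treat it as a sketch rather than the literal file:

```json
"completion": {
  "max_tokens": 1000,
  "temperature": 0.9,
  "include_input": true,
  "include_history": true,
  "data_sources": [
    {
      "type": "azure_search",
      "parameters": {
        "endpoint": "<azure-ai-search-endpoint>",
        "index_name": "resumes",
        "authentication": { "type": "api_key", "key": "<azure-ai-search-key>" },
        "query_type": "vector",
        "embedding_dependency": {
          "type": "deployment_name",
          "deployment_name": "text-embedding-ada-002"
        }
      }
    }
  ]
}
```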
Finally, to give Career Genie a bit of behavior and character, I'll update the prompt as well; let me copy that piece in and we can have a quick look. Here we define that you are a career specialist named Career Genie, you are friendly and always greet users, you like using emojis, and you use the data coming from Azure AI Search as well. So let's rerun the app, see how the behavior differs, and check whether we now get actual CV recommendations from Career Genie.
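The full prompt text isn't readable in the recording, but based on that description, skprompt.txt ends up along these lines (paraphrased, not the demo's exact wording):

```
You are a career specialist named Career Genie.
You are friendly, always greet users warmly, and you like using emojis.
Use the resume data returned from Azure AI Search to answer questions,
and recommend candidates only from that data.
```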
Okay, I'll add the app, and now I should be able to see Career Genie using emojis and being friendlier, and when I ask about .NET developers, or about people who have five years of experience, I should get some CVs back instead of being pointed to LinkedIn or Upwork. I think I can close this and chat with Career Genie directly, since we already debugged. What I'll actually do is say hi again, just to see how Career Genie responds to me; I'm expecting some excitement. Yes! I have an emoji and an introduction as Career Genie, and now I can ask more: "can you search for .NET developers?" Okay, I think I'm going to be more specific and say: "can you suggest .NET developers who are experienced five-plus years?" When I'm more specific there's more data to work with, so we get more information: we already have three different CVs coming through, and we also have citations; thanks to the Teams AI library, that's actually built in. Here you can see a bunch of different profiles, people with five-plus years of experience who are .NET experts, coming through. That's what I wanted to show today; if you want to learn more, you can also check out our documentation. I'm handing back to John, who can walk you through a little more of what you can do with the Teams AI library. Awesome, thank you Ayça, really good job, awesome demo; love the Career Genie.
As I just mentioned, we have some more cool things coming in the AI library, in what we call the Powered by AI kit. There's a ton of new UX improvements for custom copilots in Teams: the streaming UX I mentioned earlier will be coming soon; there are AI labels; there's automatic support for citations when you're using Azure OpenAI Studio and the Azure "on your data" support, like Ayça demoed here; there are APIs in the library for feedback loops, the thumbs up and thumbs down on responses that you can add; and there's handoff support if you want to hand off between your custom copilot and Microsoft's Copilot, plus the ability to present in the meeting stage. Just to recap: we broke apart the components of the copilot tech stack, from the foundational models to the Teams AI library to all the UX features we have in Teams, and all of that put together is a best-in-class solution for building and deploying your own copilot in Microsoft 365. And I just showed you how you can get started quickly using Teams Toolkit and Visual Studio and build with .NET. If you want to try it for yourself, check out this link, aka.ms/try-teams-toolkit; it takes you to our page on building custom copilots for Microsoft 365, with all the information on what's available and all the tools you can use, including Teams Toolkit. And here are some more references for you. Most of Teams Toolkit is open source, so you can check out the repo, and there are lots of other resources there to learn more about what's available in the toolkit. The AI library is on GitHub as well and is open source; there are lots of cool samples in there, beyond the project templates, that you can learn from, and the Powered by AI UX kit is part of that, with a link to get there. Something we didn't cover today but that you may want to check out is the Adaptive Cards starter pack: as you start to build out your custom copilots and want to improve their responses with beautiful ways to display information, you can use Adaptive Cards in Teams, and there are some cool new starter packs out for that. And here is a link for you to give some feedback and get started with .NET and AI; go ahead and scan it, and it'll take you to the .NET for AI page, where there are tons of other resources, not just for Teams and what we showed, but lots of other cool stuff going on at .NET Conf. So thank you, and thanks for listening. Thank you, bye!
All right, thank you so much Nikola, Ayça, and John. Now I know how I can get copilots into my applications, whether it's on Windows or even getting something into Teams; that sounds interesting to me. All right, we've had a full day of content here, my gosh, I hope you've enjoyed it. All of these sessions are recorded, in case you're joining us late, and we're going to have them on YouTube for you in just a few minutes: we've been building a playlist behind the scenes, and our amazing team of editors and production staff are going to put it together and have it published and available for you in just a little bit. Big thanks to our friends Cam and Matt, who have been working diligently behind the scenes putting all of the video content together for you. Now, there are all kinds of great things about artificial intelligence in .NET that I want to make sure you have at your fingertips so you can do more, so check out some of the slides I have for you here. Of course, we've got Learn modules and content available for you: check them out at aka.ms/netfocus-ai-learn, or if you don't want to key that in, take a picture of the QR code, which takes you to the same place. I've also got a survey for you: make sure you fill it out and let us know what you think of this event and of your .NET experience working with AI, at aka.ms/netfocus-ai-evaluation; once again, the QR code takes you to the same place. And over the next month we've got a full set of live streams and challenges, some tasks, a little bit of homework, and puzzles for you to check out as part of our AI Challenge and live streams. You can join us and follow along at aka.ms/netfocus-ai-challenge-live; it runs right through September 20th, so make sure you join us at that website, or click the QR code to follow along and learn more about our AI Challenge. Now, this doesn't just stop today or at the end of that challenge: coming up in November is
.NET Conf 2024, featuring the launch of .NET 9. This is a big event for us: it's three days long, with more than 70 sessions and a bunch of content from our .NET teams that we're looking forward to hearing; some of the folks you've heard from today will be speaking at that event. And you can be a part of it, not just watching: you can take part as a community speaker. Our third day is wall-to-wall community speakers, folks from all over the world, and we're broadcasting for 24 hours straight. You can join us at dotnetconf.net; there's a call for content link there, so you can submit some sessions and let us know how you're using .NET to build something cool, maybe an open source library, a framework, or even a game. Check it out at dotnetconf.net and join us in November for this amazing three-day online event that you can watch with us. We wouldn't be here, we wouldn't have gotten this far, if it wasn't for some of our great sponsors. Big thanks to these companies: Growth Acceleration Partners, Iron Software, CODE Magazine, Mescius, Progress Telerik, Octopus Deploy, Syncfusion, ABP, Avalara, and Nosce, who have done a great job promoting and telling the world about the great things we have as part of our .NET Conf Focus on AI event. I want to make sure you know that beyond all this great content we've shared, our sponsors, and everything going on, we've got a complete collection of all the resources you need, one link that does it all: aka.ms/netfocus-ai-collection. Click that QR code and you can learn more about all the things we've presented: all the videos, the Learn modules, the AI Challenge. Check it out at that address or that QR code. All right, I think that's everything; I think we're just about out of here. Our survey and our evaluation for the swag bags are done; big thanks to everybody who participated and entered. We'll send out a list of the swag bag winners with our wrap-up blog post, and you'll be contacted via email. And I want to put out a big thanks to the folks who helped make this event go: we've got to thank our friends Jayme Singleton, who helped plan and coordinate this event; Jon Galloway, who helped coordinate the content and the things going on behind the scenes; and Mehul Harry, who helped us promote and get the word out about the event. Huge thanks to everybody on the team, all of our folks in the studio, Cam and Matt and our editors back there; thank you very much for helping out with .NET Conf Focus on AI. My name is Jeff Fritz. Make sure you stick around dotnetconf.net, our website, to learn more about our next event coming up in November, .NET Conf 2024. I'll see you then!