Generative AI Full Course 2025 | Gen AI Tutorial for Beginners | Gen AI Explained | Simplilearn

Simplilearn
🔥Purdue - Applied Generative AI Specialization - https://www.simplilearn.com/applied-ai-course?utm...
Video Transcript:
AI is no longer a thing of the future — it's happening right now, and it's changing the way we work, create, and innovate. So whether you're just starting out or already familiar with AI, this Gen AI full course by Simplilearn will help you master the most exciting advancements in artificial intelligence. We'll start with the basics — what is generative AI — and then dive into AI careers, the roadmap to get started, and the top AI technologies shaping the industry. You'll explore powerful AI models like DeepSeek, SearchGPT, and OpenAI Sora, while also understanding key concepts like deep learning, transformers, GANs, LSTMs, and reinforcement learning. Want to know how businesses are using AI? We'll cover LangChain, LLMs, and AI tools for job interviews, along with deep dives into Google Quantum AI and agentic AI. Plus, if you're preparing for an AI job, we've got you covered with deep learning interview questions and LLM benchmarking insights. By the end of this course you won't just know AI — you'll know how to use it, build with it, and stay ahead in the AI revolution.
But before we commence: if you're interested in mastering the future of technology, the Professional Certificate Course in Generative AI and Machine Learning is a perfect opportunity for you. Offered in collaboration with the E&ICT Academy, IIT Kanpur, this 11-month live and interactive program provides hands-on experience in cutting-edge areas like generative AI and machine learning, and tools such as ChatGPT, DALL·E 2, and Hugging Face. Hurry up and enroll now — you can find the course link in the description box and pinned comments.
Meet Emma, a graphic designer working on a new project. One day her colleague mentions a tool that helps create designs, images, and text using AI.
Intrigued, Emma wonders how AI can create something from scratch. Her curiosity grows, and she decides to dive deeper into this new technology called generative AI. Generative AI refers to a type of artificial intelligence designed to create new content such as text, images, music, and videos. Unlike traditional AI, which analyzes or categorizes data, generative AI produces original content based on patterns learned from vast datasets — essentially, it generates new, unique material. These models are trained on large amounts of data and use sophisticated algorithms to mimic human creativity. Tools like ChatGPT or DALL·E can create art, write essays, or simulate conversations by generating outputs based on user prompts. Generative AI has a wide range of applications. Content creation: tools like GPT-4 generate text, blog posts, stories, and essays from simple prompts. Art and design: AI models such as DALL·E generate unique images and designs based on text descriptions, transforming creativity and art. Music and audio: AI can compose music or replicate voices, offering new possibilities for musicians and audio engineers. Healthcare: generative AI simulates disease progression or creates synthetic medical data, helping doctors gain faster insights for research.
Let's take image generation as an example to explain how generative AI works. Data collection and learning: AI models like DALL·E are trained on large datasets of images paired with text descriptions. These datasets teach the model to recognize different objects, colors, and styles, and how to associate text with corresponding images. The more data the AI learns from, the better it can generate accurate and diverse images based on user prompts. Neural networks and transformers: when Emma inputs a prompt like "a cat wearing sunglasses," the transformer model processes the text, recognizing words like "cat" and "sunglasses" and linking them to images it learned from during training. Transformers help the AI decide how to combine these elements into a coherent image. Tokens and context: the text input, such as "a cat wearing sunglasses," is split into smaller parts called tokens. The AI processes each token and understands the relationships between them — for instance, it knows the sunglasses should be placed on the cat, creating a contextually accurate image.
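To make the token idea concrete, here's a tiny Python sketch that splits a prompt into tokens. It uses the Hugging Face GPT-2 tokenizer purely as a stand-in — the exact tokenizer behind DALL·E isn't shown in the video, so treat this as illustrative only.

```python
# Minimal sketch: splitting a prompt into tokens, as described above.
# The GPT-2 tokenizer is used as a stand-in; the actual tokenizer behind
# DALL-E is an assumption here, not something the video shows.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "a cat wearing sunglasses"
token_ids = tokenizer.encode(prompt)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)     # roughly: ['a', 'Ġcat', 'Ġwearing', 'Ġsunglasses']
print(token_ids)  # the integer IDs the model actually processes
```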
Feedback mechanism: generative AI models improve through feedback. After generating an image, users provide feedback on the accuracy or quality of the output. If Emma's generated image shows the sunglasses floating beside the cat, she can mark it as incorrect, and the model uses this feedback to improve future image generations. Reinforcement learning further enhances the AI's ability: the model is rewarded when it generates accurate images and corrected when it makes mistakes. For example, when Emma describes a sunset and the AI produces a vibrant sunset image, it receives positive reinforcement; over time this method refines the model's ability to generate better images. Data science and AI models: data scientists curate the training data and define the parameters that help the AI generate accurate images. The more varied the dataset, the more versatile the AI becomes at generating diverse types of content. Advanced models use billions of parameters — settings that guide the AI in processing data and generating outputs. Generating original content: once trained, the model can create original images. For example, Emma might describe a futuristic cityscape, and the AI would produce a unique image based on what it learned. The generated image isn't just a copy of past data but an entirely new creation, showcasing the AI's ability to combine learned patterns with creativity. Now let's have a quick, fun quiz on what we have learned so far.
What does generative AI primarily do? A: analyze data. B: generate new content. C: store data. Make sure to let us know your answer in the comment section below.
The term generative AI has emerged seemingly out of nowhere in recent months, with a notable surge in interest according to Google Trends, even within the past year. This spike in curiosity can be attributed to the introduction of generative models such as DALL·E 2, Bard, and ChatGPT. But what does generative AI entail? As part of our introductory series on generative AI, this video will provide a comprehensive overview of the subject, starting from the basics. The explanation will cater to all levels of familiarity, ensuring that viewers gain a better understanding of how this technology operates and its growing integration into our daily lives. Generative AI is, after all, a tool based on artificial intelligence — one that even a professional who wants to switch careers can pick up by learning from the experts. So, what is generative AI? Generative AI is a form of artificial intelligence that possesses the capability to generate a wide range of content, including text, visuals, audio, and synthetic data. The recent excitement surrounding generative AI stems from the user-friendly interfaces that allow users to effortlessly create high-quality text, graphics, and video within seconds. Now, moving forward, let's see how generative AI works. Generative AI begins with a prompt, which can take the form of text, an image, a video, a design, audio, musical notes, or any input that the AI system can process. Various algorithms then generate new content in response to the given prompt — this content can range from essays and problem solutions to realistic fakes created from images or audio of a person. In the early stages of generative AI, using the technology involved submitting data through
an API or another complex process: developers needed to acquaint themselves with specialized tools and write applications in programming languages like Python. Some of the recent, fully operational generative AIs are Google Bard, DALL·E, OpenAI's ChatGPT, Microsoft Bing, and many more. So now let's discuss ChatGPT, DALL·E, and Bard, which are the most popular generative AI interfaces. First is DALL·E, which was developed using OpenAI's GPT implementation in 2021 and exemplifies a multimodal AI application. It has been trained on a vast dataset of images and their corresponding textual descriptions. DALL·E is capable of establishing connections between various media forms such as vision, text, and audio — it specifically links the meaning of words to visual elements. OpenAI introduced an enhanced version called DALL·E 2 in 2022, which empowers users to generate imagery in multiple styles based on their prompts. Next is ChatGPT. In November 2022, ChatGPT, an AI-powered chatbot built on OpenAI's GPT-3.5 implementation, gained immense popularity worldwide. OpenAI enabled users to interact with and fine-tune the chatbot's text responses through a chat interface with interactive feedback. Unlike earlier versions of GPT, which were solely accessible via an API, ChatGPT brought a more interactive experience. On March 14, 2023, OpenAI released GPT-4. ChatGPT integrates the conversational history with a user, making for a genuine dialogue. Microsoft, impressed by the success of the new ChatGPT interface, announced a substantial investment in OpenAI and integrated a version of GPT into its Bing search engine. And the next one is Bard. Google was also an early frontrunner in advancing transformer AI techniques for language processing, protein analysis, and other content types; it made some of these models open source for researchers, but they were not made available through a public interface. In response to Microsoft's integration of GPT into Bing, Google hurriedly launched a public-facing chatbot named Google Bard. Bard's debut was marred by an error, when the language model incorrectly claimed that the James Webb Space Telescope was the first to discover a planet in a foreign solar system. As a consequence, Google's stock price suffered a significant decline. Meanwhile, Microsoft's ChatGPT- and GPT-powered systems also faced criticism for producing inaccurate results and displaying erratic behavior in their early iterations.
So, moving forward, let's look at the use cases of generative AI. Generative AI has broad applicability and can be employed across a wide range of use cases to generate diverse forms of content. Recent advancements like GPT have made this technology more accessible and customizable for various applications. Some notable use cases are as follows. Chatbot implementation: generative AI can be utilized to develop chatbots for customer service and technical support, enhancing interaction with users and providing efficient assistance. The second is language dubbing: in the realm of movies and educational content, generative AI can contribute to improving dubbing in different languages, ensuring accurate and high-quality translation. The third is content writing: generative AI can assist in writing email responses, dating profiles, resumes, and term papers, offering valuable support and generating customized content tailored to specific requirements. The fourth is art generation: leveraging generative AI, artists can create photorealistic artwork in various styles, enabling the exploration of new artistic expression and enhanced creativity. The fifth is product demonstration videos: generative AI can be used to enhance product demonstration videos, making them more engaging, visually appealing, and effective in showcasing product features and benefits.
product features and benefits so generative AI versatility allow it to employed in many other application making it a valuable tool for Content creation and enhancing user experience across diverse domains so after seeing use cases of generative AI let's see what are the benefits of generative AI so generative AI of offers extensive application across various business domains simplifying the interpration and comprehension of existing content while also enabling the automated creation of a new content developers are actively exploring ways to leverage generative AI in order to enhance the optimize existing workflows and even to reshave workflows entirely
to harness the potential of Technology fully implementing generative AI can bring numerous benefits including automated content creation generative AI can automate the manual process of writing content saving time and effort by generating text or other form of content the next one is efficient email response responding to emails can be made more efficient with generative AI reducing the effort required and improving response time and the third one is enhanced technical support generative AI can improve responses to specific technical queries providing accurate and helpful information to users or customers and the fourth one is realistic person generation
by leveraging generative AI it becomes possible to create realistic representation of people enabling applications like virtual characters or avatars and the fifth one is coherent information summarization generative AI can summarize complex information into a coherent narrative distilling key points and making it easier to understand and communicate complex concept the implementation of generative AI offers a range of potential benefits steamingly processed and enhancing content content Creation in various areas of business operation so after seeing advantages of generative AI let's move forward and see what are the limitations of generative AI early implementation of generative AI serve
Early implementations of generative AI serve as vivid examples highlighting the numerous limitations associated with this technology. Several challenges arise from the specific approaches employed to implement various use cases. For instance, while a summary of a complex topic may be more reader-friendly than an explanation incorporating multiple supporting sources, that ease of readability comes at the expense of transparently identifying the information's sources. When implementing or utilizing a generative AI application, it is important to consider the following limitations. First, lack of source identification: generative AI does not always provide clear identification of content sources, making it difficult to trace and verify the origin of the information. Second, assessment of bias: assessing the bias of the original sources used by generative AI can be challenging, as it may be difficult to determine the underlying perspective or agenda of the data utilized in the training process. Third, difficulty in identifying inaccurate information: generative AI can generate realistic content, making inaccuracies or falsehoods within the generated output harder to identify. Fourth, adaptability to new circumstances: understanding how to fine-tune generative AI for new circumstances or specific contexts can be complex, requiring careful consideration and expertise to achieve the desired result. Fifth, glossing over bias, prejudice, and hatred: generative AI results may amplify or perpetuate biases, prejudices, or hateful content present in the training data, requiring vigilant scrutiny to prevent such issues. Awareness of these limitations is crucial when implementing or utilizing generative AI, as it helps users and developers critically evaluate and mitigate the potential risks and challenges associated with the technology.
Now, the future of generative AI: advances in AI development platforms will contribute to the accelerated progress of research and development in this field, encompassing domains such as text, images, videos, 3D content, drugs, supply chains, logistics, and business processes. While the current standalone tools are impressive, generative AI's truly transformative impact will be realized when these capabilities are seamlessly integrated into the tools we already use every day.
Do you know artificial intelligence is transforming industries across the globe, creating a wealth of career opportunities for those ready to embrace the future? Take Elon Musk, for example — he is known for his work with Tesla and SpaceX, and he co-founded OpenAI, an organization dedicated to ensuring that AI benefits all of humanity. Musk's move into AI underscores the massive potential of this field, not just for tech enthusiasts but for anyone willing to innovate and adapt. Imagine this: in the tech city of Hyderabad, India, Arjun sits at his desk, eyes focused on his computer screen. Just two years ago he was a new computer science graduate working as a junior software developer at a small startup. His salary was modest and his career prospects seemed limited. But everything changed when he discovered the booming field of artificial intelligence.
Arjun spent his free time learning Python, exploring statistics, and experimenting with AI models. Fast-forward 18 months: his hard work paid off. He landed a job as an AI engineer at a major tech company in Bengaluru, tripling his salary from 6 lakh to 18 lakh per year. More importantly, Arjun found himself at the forefront of technology, working on projects that are shaping the future. Arjun's story is just one example of how AI transforms careers in India; across the country, professionals are seizing new opportunities in AI as companies invest heavily in this revolutionary field. But entering AI isn't easy — it requires dedication, continuous learning, and adaptability. In this guide we will explore AI career paths, the skills you need, and what it is like to work in this dynamic field. So let's talk about whether AI is a good career or not. You have probably heard a lot about artificial intelligence, or AI — it's everywhere, and it's shaking up industries all over the world. But here's the big question: is AI a good career choice? Yes, absolutely it is. Take Elon Musk, for example — we all know him as the guy behind Tesla and SpaceX, but did you know he also co-founded OpenAI? More and more big names are diving into AI, and that just shows how massive this field is becoming. And guess what: AI isn't just for tech geniuses — there's room for everyone. Let's talk numbers: AI jobs are growing like crazy, up to 32% in recent years, and the pay is pretty sweet, with roles offering over $100,000 a year. So whether you're into engineering, research, or even the ethical side of things, AI has something for you. Plus, the skills you pick up in AI can be used in all sorts of industries, making it a super flexible career choice.
Now, AI is a big field, and there are tons of different jobs you can go for, so let's break down some of the key roles. First up we have machine learning engineers. These folks are like the backbone of AI: they build models that can analyze huge amounts of data in real time. If you've got a background in data science or software engineering, this could be your thing; the average salary is around $131,000 in the US. Then there are data scientists, the detectives of the AI world. They dig into data to find patterns that help businesses make smart decisions. If you're good with programming and stats, this is a great option, and you can make about $105,000 a year. Next we've got business intelligence developers. They process and analyze data to spot trends that guide business strategies. If you enjoy working with data and have a background in computer science, this role might be for you, and the average salary here is competitive as well. Then we have research scientists — the ones pushing AI to new heights by asking innovative questions and exploring new possibilities. It's a bit more academic, often needing advanced degrees, but it's super rewarding, with salaries around $100,000. Next up are big data engineers and architects. These are the folks who make sure all the different parts of a business's technology talk to each other smoothly. They work with tools like Hadoop and Spark and need strong programming and data visualization skills — and get this, the average salary is one of the highest in AI, around $151,000 a year. Then we have AI software engineers, who build the software that powers AI applications.
They need to be really good at coding and have a solid understanding of both software engineering and AI. If you enjoy developing software and want to be part of the AI revolution, this could be your role; the average salary is around $108,000. Now, if you're more into designing systems, you might want to look at becoming a software architect. These folks design and maintain entire AI systems, making sure everything is scalable and efficient. With expertise in AI and cloud platforms, software architects can earn a hefty salary — about $150,000 a year. Let's not forget about data analysts. They have been around for a while, but their role has evolved big time with AI: now they prepare data for machine learning models and create super insightful reports. If you're skilled in SQL, Python, and data visualization tools like Tableau, this could be a great fit for you. The average salary is around $65,000, but it can go much higher in tech companies. Another exciting role is robotics engineer. These engineers design and maintain AI-powered robots, from factory robots to robots that help in healthcare. They usually need advanced degrees in engineering and strong skills in AI, machine learning, and IoT (Internet of Things). The average salary of a robotics engineer is around $87,000, and with experience it can go even higher. Last but not least, we have NLP engineers. NLP stands for natural language processing, and these engineers specialize in teaching machines to understand human language — think voice assistants like Siri or Alexa. To get into this role you'll need a background in computational linguistics and programming skills. The average salary of an NLP engineer is around $78,000, and it can go even higher as you gain more experience.
So you can see, the world of AI is full of exciting opportunities. Whether you're into coding, designing systems, working with data, or even building robots, there's a role for you in this fast-growing field. So what skills do you actually need to land an entry-level AI position? First off, you need a good understanding of AI and machine learning concepts. You'll need programming skills in languages like Python, Java, and R, and knowing your way around tools like TensorFlow and PyTorch will give you an edge. Do not forget about SQL, pandas, and big data technologies like Hadoop and Spark, which are super valuable, and experience with AWS and Google Cloud is often required.
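Here's a small, hypothetical sketch of how a few of those pieces — pandas, scikit-learn, a train/test split — come together in an everyday workflow. The CSV file and column names are made up purely for illustration.

```python
# A small example of the everyday workflow these skills cover:
# load data with pandas, clean and split it, then fit a quick scikit-learn model.
# The file name and columns are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv").dropna()      # basic cleaning
X = df[["age", "monthly_spend"]]                # feature columns
y = df["churned"]                               # target column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```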
So which industries are hiring AI professionals? AI professionals are in high demand across a wide range of industries; here are some of the top sectors that hire AI talent. Technology companies like Microsoft, Apple, Google, and Facebook are leading the charge in AI innovation. Consulting firms like PwC, KPMG, and Accenture are looking for AI experts to help businesses transform. Healthcare organizations are using AI to revolutionize patient treatment. Retail giants like Walmart and Amazon leverage AI to improve customer experiences. And media companies like Warner and Bloomberg are using AI to analyze and predict trends in the media industry. AI is not just the future — it's the present. With the right skills and determination, you can carve out a rewarding career in this exciting field. Whether you're drawn to technical challenges or strategic possibilities, there's a role in AI that's perfect for you. So start building your skills, stay curious, and get ready to be part of the AI revolution.
Imagine a world where creativity knows no bounds, where machines can conjure up art, music, and literature with the flick of a digital switch.
This isn't the stuff of science fiction — it's the reality of generative AI, a cutting-edge technology that's reshaping our digital landscape. Picture this: according to a recent report by Salesforce, generative AI tools are already in the hands of 27% of millennials, 28% of Gen X, and a staggering 29% of Gen Z. These aren't just numbers — they are a testament to the growing influence of generative AI in our daily lives, and as organizations race to harness its power, the demand for skilled generative AI experts is skyrocketing. But what exactly is generative AI? It's more than just lines of code — it's a gateway to infinite possibilities. With generative AI, machines can create anything from images to text to music, all by learning from existing datasets. It's the technology behind deepfakes, virtual influencers, and even the next big hit song. So why should you care about generative AI in 2024? Because it's not just the future, it's the present: the key to unlocking new realms of creativity, innovation, and opportunity. In this video we are going to show you how to become a master of generative AI in 2024 — so buckle up for the world of generative AI, because the future is here and it's more exciting than ever before. Welcome to the roadmap to becoming a generative AI expert in 2024. Learning generative AI in 2024 is crucial for several compelling reasons, as we'll outline in this diagram. The number one reason is technological advancement: generative AI represents a significant leap in the evolution of technology, particularly in its ability to generate complex outputs like video, audio, text, and images. This innovation is set to expand exponentially, marking a new age of technological innovation.
The next reason is wide-ranging applications: the surge in interest and development in generative AI is fueled by advancements in machine learning models, artificial intelligence, and platforms like ChatGPT and Bard. These tools have broad applications across various sectors, making knowledge in this area highly valuable. Moving to the next reason, solving complex problems: generative AI has the potential to simplify problem-solving processes significantly, and its capability to create realistic models can be applied to innovate and enhance solutions across industries. The next reason is impact on major fields: the integration of artificial intelligence into major fields is undeniable, and generative AI plays a substantial role in this transformation. It not only presents a threat to certain jobs but also opens up a plethora of new opportunities in the tech industry and beyond. And next is a dynamic and unexplored field: generative AI is filled with challenges and unexplored territory, offering an exciting frontier for those interested in shaping the future of technology. It calls for creativity, problem-solving skills, and a willingness to delve into the unknown. So learning generative AI in 2024 positions individuals at the forefront of technological innovation, equipping them with the skills and knowledge to contribute to significant advancements and explore new possibilities in the digital world. Now we'll move to the major skills required to learn generative AI in 2024. To effectively learn and excel in generative AI in 2024, individuals need a specific set of skills that are foundational to understanding and applying this technology. The number one skill is deep learning fundamentals.
A solid understanding of deep learning concepts is crucial — this includes familiarity with neural networks, backpropagation, and the various types of deep learning models and architectures. The next is machine learning concepts: proficiency in machine learning is necessary, encompassing a broad range of algorithms, their applications, and an understanding of how they can be used within generative AI frameworks. Then comes Python programming: Python remains the dominant programming language in AI and machine learning, and mastery of its syntax, data structures, libraries such as TensorFlow and PyTorch, and frameworks is essential.
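As a small illustration of those fundamentals, here is a minimal PyTorch sketch — a tiny feedforward network, one forward pass, and one backpropagation step. The sizes and data are placeholders, not anything from the video.

```python
# Minimal PyTorch sketch of the fundamentals named above:
# a small feedforward network, one forward pass, one backpropagation step.
import torch
import torch.nn as nn

model = nn.Sequential(          # input -> hidden -> output
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)           # a dummy batch of 8 examples
y = torch.randint(0, 2, (8,))   # dummy class labels

loss = loss_fn(model(x), y)     # forward pass
loss.backward()                 # backpropagation
optimizer.step()                # weight update
print("loss:", loss.item())
```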
The next skill is generative models knowledge: specific knowledge of generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs) is required, and understanding how these models function and are applied is key to innovating within the generative space.
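To make the GAN idea concrete, here is a bare-bones PyTorch skeleton showing only the two roles just mentioned — a generator mapping noise to samples and a discriminator judging real versus fake. The dimensions and data are placeholders, not a real training setup.

```python
# Bare-bones GAN skeleton: generator creates samples from noise,
# discriminator judges real vs. fake; one training step of each.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
discriminator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

real = torch.randn(32, 2) + 3.0          # stand-in "real" data
noise = torch.randn(32, 16)

# Discriminator step: label real data 1, generated data 0.
fake = generator(noise).detach()
d_loss = bce(discriminator(real), torch.ones(32, 1)) + \
         bce(discriminator(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 on fakes.
g_loss = bce(discriminator(generator(noise)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```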
The next skill is image and text processing: skills in processing and manipulating image and text data are necessary, as many generative AI applications involve creating or modifying such content. Next on the list is data processing and data augmentation: the ability to pre-process and augment data efficiently can significantly improve the performance of generative models, so skills in data cleaning, augmentation techniques, and feature engineering are vital. Then come ethical considerations: with the power of generative AI comes the responsibility to use it ethically, and understanding its ethical implications — including issues of bias, fairness, and privacy — is crucial. The next is communication: given the interdisciplinary nature of generative AI projects, effective communication skills are essential for collaborating with teams, explaining complex concepts in simple terms, and engaging with stakeholders. Developing these skills will prepare individuals for the dynamic and evolving field of generative AI, enabling them to contribute meaningfully to advancements in technology and address the challenges that come with it. So let's move to the roadmap for learning generative AI in 2024. The roadmap is as follows: the first step is understanding the basics of machine learning; then comes mastering a programming language, mainly Python; then learning data science and related technologies; then hands-on, real-time projects; then learning the mathematical and statistics fundamentals; then developer skills; and then the important thing — keep learning and exploring.
Starting with point number one, understanding the basics of machine learning. Begin by wrapping your head around the core machine learning algorithms — it's like getting to know the tools in your toolkit; each has its unique use. Make sure to understand the differences between supervised, unsupervised, and reinforcement learning. Think of them as different paths to solving a puzzle: some are straightforward, while others need you to figure out the rules as you go. Get comfortable with handling data — after all, data is the fuel for your machine learning engine — and learn how to clean, split, and pre-process it to get your models running smoothly. Finally, learn how to evaluate your models with metrics like accuracy and precision; it's like checking the health of a model to ensure it's fit for the real world.
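As a tiny illustration of that evaluation step, here is how accuracy and precision look with scikit-learn on a made-up set of true and predicted labels.

```python
# Quick sketch of model evaluation with scikit-learn metrics on toy labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # 0.75 (6 of 8 correct)
print("precision:", precision_score(y_true, y_pred))  # 0.75 (3 of 4 predicted 1s are right)
print("recall   :", recall_score(y_true, y_pred))     # 0.75 (3 of 4 actual 1s were found)
```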
The next step on your roadmap is to master Python programming. Focus on getting a strong grip on Python syntax and structure — Python is the language of choice in AI, so this is like learning the alphabet before writing stories. Dive into the libraries essential for AI, such as pandas for data manipulation and scikit-learn for machine learning; think of these libraries as your shortcuts to building powerful models. Practice writing efficient code — it's not just about getting the right answer, it's about getting there faster and cleaner — and engage with the Python community, a treasure trove of knowledge and a great way to stay updated on the latest trends and packages. Then comes the next step: explore data science and related technologies. Sharpen your skills in data visualization — visuals can reveal patterns and insights in data that numbers alone might not show. Master feature engineering to transform raw data into a format that machines can better understand and predict from, and get a handle on building machine learning pipelines — the assembly lines that carry your raw data all the way to a working model. Keep an eye on emerging technologies and frameworks in data science that complement generative AI; staying updated will give you an edge in your projects. Now, the next step: engage in hands-on, real-time projects. Choose projects that spark your interest and challenge you — this is where you apply what you have learned and see your knowledge come to life. Work with different generative AI models; each project is a chance to deepen your understanding and refine your skills. Don't just build — evaluate and iterate on your projects; every iteration is a step closer to mastery. Document and present your work clearly; sharing your journey not only helps others learn but also solidifies your own understanding. The next step on the roadmap is to solidify your math and statistics fundamentals. Dive deep into linear algebra and calculus — these are the building blocks for understanding how AI models learn and make predictions. Understand probability and statistics, which are crucial for modeling uncertainty and making informed predictions. And learn about optimization techniques — the strategies your models use to improve over time, like a person learning from their mistakes to get better.
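To make that "improve over time" idea concrete, here is a minimal NumPy sketch of gradient descent on a toy loss — not tied to any particular model, just the optimization step itself.

```python
# Tiny gradient-descent sketch: minimize f(w) = (w - 3)^2, whose minimum is at w = 3.
import numpy as np

def loss(w):
    return (w - 3.0) ** 2

def grad(w):            # derivative f'(w) = 2(w - 3)
    return 2.0 * (w - 3.0)

w = np.array(0.0)
lr = 0.1
for step in range(50):
    w = w - lr * grad(w)   # move against the gradient

print("w after training:", float(w))   # close to 3.0
print("final loss:", float(loss(w)))
```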
Now, the next step: develop essential developer skills. Get comfortable with AI development tools — they can make your work faster, more efficient, and more collaborative. Focus on debugging and testing: a model that works flawlessly in theory might face unexpected challenges in the real world. And embrace ethical AI development — it's important to ensure your AI solutions are fair, accountable, and transparent. Then comes keep learning: the field of AI is always evolving, and staying curious is the key. That covers developing essential developer skills; now the next step is to commit to continuous learning and exploration. Participate in AI communities — these are great spaces for learning from others' experiences and sharing your own. Make reading research papers, blogs, and books a habit; they are windows into the latest advancements and theories in AI. Attend workshops and conferences — these events can inspire you and expose you to new ideas and technologies. And seek mentorship or collaborate on projects; learning from others can accelerate your growth and open new paths. So that was the roadmap for learning generative AI in 2024.
The world is becoming increasingly competitive, requiring business owners and individuals to find new ways to stay ahead. Modern customers have higher expectations, demanding personalized experiences, meaningful relationships, and faster responses. Artificial intelligence is a game changer here: AI helps you promote goods and services, or simply makes your life easier, with minimal effort and maximum results, allowing everyone to make faster, better-informed decisions.
However, with so many AI tools available, it can be challenging to identify the best ones for your needs and productivity boost. So here are the top 10 AI tools in 2024 that can transform your business or boost your productivity. At number 10 we have Tome, a tool that can help you share your thoughts and ideas quickly and effectively. Unlike other methods, such as making a slide deck or building a web page, Tome lets you create engaging and detailed presentations in just minutes: you can enter any topic or idea, and the AI will help you put together a presentation that looks great and gets your message across. It's like getting the ideas out of your head and into the world, all without sacrificing quality — with Tome, you can be sure your presentation will be fast and effective. Ninth on the list is Zapier, a popular web automation tool that connects different apps, allowing users to automate repetitive tasks without coding knowledge. With Zapier you can combine the power of various AI tools to supercharge your productivity. Zapier supports more than 3,000 apps, including popular platforms like Gmail, Slack, and Google Sheets, and this versatility makes it a valuable tool for individuals, teams, and businesses looking to streamline their operations and improve productivity. With 7,000-plus integrations and services on offer, Zapier empowers businesses everywhere to create processes and systems that let computers do what they do best — and let humans do what they do best. After covering Zapier, eighth on the list is GravityWrite, an AI-powered writing tool that transforms content creation. It generates high-quality, SEO-optimized content in over 30 languages, catering to diverse needs like blog posts, social media updates, ad copy, and emails. The tool promises plagiarism-free content, safeguarding your brand's integrity, and its AI capabilities also include text-to-image generation, enhancing visual content for marketing purposes. It offers both free and paid plans, making it versatile for freelancers, small business owners, and marketing teams. At number seven we have Audiobox, an advanced AI tool developed by Meta and designed to transform audio production. It lets users create custom voices, sound effects, and audio stories from simple text prompts.
Using natural language processing, Audiobox generates high-quality audio clips that can be used for purposes such as text-to-speech, voice mimicking, and sound-effect creation. Additionally, Audiobox offers interactive storytelling demos, enabling users to generate dynamic narratives between different AI voices. This tool is particularly useful for content creators, marketers, and anyone needing quick, high-quality audio production without extensive manual effort. Next, at number six, we have AKOOL, an advanced AI-powered tool tailored for e-commerce and marketing professionals. It offers a comprehensive suite of features designed to streamline content creation and enhance personalization: with AKOOL, users can generate customized text, images, voices, and videos, making it an invaluable asset for creating engaging product videos and marketing materials. Key features include face swapping, realistic avatars, video transitions, and talking photos — tools that allow businesses to create dynamic, personalized content that can captivate audiences on social media and other platforms. AKOOL's user-friendly interface and intelligent design make it easy to produce high-quality content quickly and efficiently. At number five we have ElevenLabs, a leading AI tool for text-to-speech and voice cloning, known for its high-quality, natural-sounding speech generation. The platform includes features like Voice Lab for creating or cloning voices with customizable options such as gender, age, and accent. Here's a taste of the voices: "Hey there — did you know that AI voices can whisper, or do pretty much anything?" "Ladies and gentlemen, hold on to your hats, because this is one bizarre sight: we have reports of an enormous fluffy pink monster strutting its stuff through downtown." A fluffy monster downtown — weird. Let's switch the setting to something more calming. "Imagine diving into a fast-paced video game, your heartbeat syncing with the storyline — I've got to go, the aliens are closing in!" That wasn't calming at all; explore all those voices yourself on the ElevenLabs platform. Professional voice cloning supports multiple languages and needs around 30 minutes of voice samples for precise replication, and the extensive voice library offers a variety of profiles suitable for podcasts, video narration, and more. With pricing plans ranging from free to enterprise level, ElevenLabs caters to individual creators and large businesses alike, standing out for its user-friendly interface and superior voice output quality.
At number four we have GoEnhance AI, an advanced multimedia tool designed to revolutionize video and image editing. It leverages powerful AI algorithms to enhance and upscale images, transforming them into high-resolution masterpieces with extreme detail. The platform's standout feature, video-to-video, allows users to convert standard video into various animated styles such as pixel art and anime, giving a fresh and creative touch to otherwise ordinary footage. This AI tool is ideal for social media content creators, marketers, educators, and anyone looking to bring their creative vision to life — whether you need to create eye-catching marketing materials or professional-grade videos, GoEnhance AI provides the resources to do so efficiently. At number three we have Pictory, an AI-powered tool designed to streamline video creation by transforming various content types into engaging visual media. It excels at converting text-based content like articles and scripts into compelling videos, making it ideal for content marketers and educators, and users can also upload their own images and videos to craft personalized content. The platform features AI-generated voiceovers, which add a professional touch without the need for expensive voice talent, and Pictory offers a range of customizable templates, simplifying the video production process even for those with no design skills. Additionally, its unique text-based video editing capability lets users repurpose existing content easily, creating highlights or short clips from longer videos. At number two we have NVIDIA Broadcast, a powerful tool that can enhance your video conferencing experience. Whether you are using Zoom or Teams, it can address common challenges like background noise, poor lighting, or low-quality audio and video. With this software you can improve audio quality by removing unwanted noise such as keyboard clicks and other background sounds, and it also offers virtual background options and blurring effects without needing a green screen. You can seamlessly integrate it with other applications like OBS, Zoom, Discord, or Microsoft Teams — think of it as having a professional studio at home. Plus, it's free for NVIDIA RTX graphics card users; visit the website to learn more and start using it today. Finally, at number one we have Taplio, an AI-powered tool designed to enhance your LinkedIn presence and personal branding. It leverages artificial intelligence to create engaging content, schedule posts, and provide insights into your LinkedIn performance. Taplio's main features include AI-powered content inspiration, a library of viral posts, and a post composer for scheduling and managing LinkedIn content efficiently. Taplio also offers easy-to-understand LinkedIn analytics to help users make informed decisions based on their performance data, and a free Chrome extension provides a quick overview of performance metrics directly on linkedin.com, making it a convenient tool for daily users. There you have it — the top 10 AI tools set to transform your life in 2024. Whether you are a developer, a content creator, or someone looking to boost their productivity, these tools are worth keeping an eye on. The future is here, and it's powered by AI.
Something big is happening in the world of AI, and it's shaking up the industry. No, it's not another ChatGPT update or a new release from OpenAI — this time it's a brand new player on the scene: DeepSeek R1, an AI model that's already outperforming the best-known models in ways that have experts seriously talking. But why is this causing such a stir? Because DeepSeek R1 doesn't just answer questions like other AIs — it thinks, and I mean really thinks. It breaks down complex problems step by step, showing you exactly how it arrives at the solution. You don't just get an answer; you get the entire process from start to finish, and in real time. This is the kind of AI that doesn't just solve problems — it solves them the way we do. Now here's the twist: DeepSeek R1 isn't from one of the usual AI giants like OpenAI or Anthropic. It was developed by DeepSeek, an AI lab backed by High-Flyer Capital Management, a hedge fund that's quietly been making waves in the AI world. In fact, their new DeepSeek-R1-Lite-Preview is already making headlines for its stunning performance on tasks like mathematical reasoning, logical inference, and real-time decision-making — areas where even the best of OpenAI's models often fall short. Why is this model such a game changer? Because in 2024 the world is demanding AIs that can think logically and reason through problems, and DeepSeek's ability to show its work is what sets it apart. While other AIs give you answers, DeepSeek lets you watch its thought process unfold — think of it as the AI that doesn't just solve problems but teaches you how to think like it does.
And if you think that's all just hype, let me tell you — Forbes isn't the only one buzzing about it. Experts across the industry are pointing out that DeepSeek is outperforming models from heavyweights like OpenAI and Anthropic on tests that measure complex reasoning. It's already winning on some major AI benchmarks, and it's only going to get better. So why is DeepSeek R1 such a big deal right now, and what does it mean for the future of AI in 2024 and beyond? In this video we will explore DeepSeek R1, starting with its unique capabilities and how it stands out from other AI models. We'll also test its math, reasoning, and problem-solving skills, comparing it with models from OpenAI, and then examine real-world applications in fields like finance and tech. So let's dive in and see how it's shaping the future of AI. First, let us understand what exactly DeepSeek R1-Lite is. As you can see on the screen, DeepSeek-R1-Lite-Preview is an AI tool similar to ChatGPT, created by a Chinese company known as DeepSeek. The company announced this new model on X on November 20th and shared a few details. I'll show you the documentation page here — this is the documentation page of the DeepSeek API. DeepSeek-R1-Lite-Preview is meant to be really good at solving problems and reasoning in maths, coding, and logic, and it shows you how it thinks step by step, so you can understand how it comes up with its answers, which helps people build trust in it. You can try it out yourself on the website: simply go to Google and type chat.deepseek.com, and you'll be redirected to the DeepSeek page. I just log in with my Google account, and here you can see the interface of the DeepSeek R1-Lite model. We'll enable the Deep Think preview, which is the new feature here, and now let's start our tests to see whether this model is really as effective as the hype suggests.
or worthy hypon so first we'll be doing the strawberry test okay so let's see what exactly is the strawberry test so let's start with a very simple question which is how many times does the letter R occur in the word strawberry so this is our question over here and let us just check what would be the answer so I didn't expect such a long reasoning process what seems like a straightforward task I thought that after counting the letter R and identifying the position of the word it would have stopped there but what's interesting to me
But what's interesting to me is that it didn't stop there: it double-checked the counting a couple of times and even considered things like how people might pronounce or spell the word differently, which I think is a bit excessive — especially the pronunciation part — but it does show how careful and thorough the model is. It explained every step and the process it went through. Let's move on to our second question, which is a mathematical reasoning question: if a triangle has sides of 3, 4, and 5, what would be its area? Let's check. As you can see, the DeepSeek R1 model works through the answer and performs the checks I predicted, although in a different order, laying out all the steps and the formulas best suited to the question. Both the explanation and the output are particularly clear and easy to follow, which makes me think this would be a fantastic model to embed in a math student assistant, for example.
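For reference, here's a quick check of the 3-4-5 answer, done independently of the model's output: Heron's formula and the right-triangle shortcut (3-4-5 is a right triangle) both give an area of 6.

```python
# Verifying the 3-4-5 triangle area two ways.
import math

a, b, c = 3, 4, 5
s = (a + b + c) / 2                               # semi-perimeter = 6
area_heron = math.sqrt(s * (s - a) * (s - b) * (s - c))
area_right = 0.5 * a * b                          # legs 3 and 4

print(area_heron, area_right)   # 6.0 6.0
```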
For that particular use case, maybe the thought process could be shown first, and the student could interact with it, acknowledging whether or not they understood it. That was a pretty fast answer, so let's move on to our third question. Now let's ask the DeepSeek model a geometry question that's a bit tougher and see what answer it generates and how it builds up the logic and the process. As you can see, it pulls in all the relevant formulas and also considers other possibilities, so we'll wait a few seconds. Notice that at one point it messed up a denominator — and it even checks what mistakes it has generated, which is really good. It's just like watching somebody solve this mathematical problem by hand, but in the form of AI, which is really interesting. Now it has given us the final answer along with a step-by-step approach, taking well-known geometric formulas and applying them directly, which makes the reasoning very easy to follow. The output is very clear and easy to follow.
Across all these examples it's been impressive to see how consistently it double-checks its calculations using different methods. The thought process is always detailed, logical, and very easy to understand, so for somebody who wants to actually learn by working through these mathematical problems, this model gives a very detailed explanation, and I would highly suggest using it. Now let's move on to our next question, which is a coding test. The first task: implement a function in Python that finds the longest palindromic substring in a given string, with a time complexity better than O(n³). Let's see what solution and what time complexity it comes up with. As you can see, it has given us the solution code along with an explanation, briefly describing each function it uses, and I think the model did a great job solving the problem of finding the longest palindromic substring — the approach was smart, efficient, and clean.
Instead of the brute-force method of checking every possible substring, which would be slow, it used a clever expand-around-center technique, and this method handles both odd-length and even-length palindromes, which was really clear and effective.
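For readers who want to see the idea in code, here is a minimal version of the expand-around-center approach, written independently of the model's output: O(n²) time and O(1) extra space, which beats the O(n³) requirement.

```python
# Longest palindromic substring via expand-around-center, O(n^2) time.
def longest_palindromic_substring(s: str) -> str:
    if not s:
        return ""

    def expand(left, right):
        # Grow outwards while the characters match, then return the bounds.
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        return left + 1, right - 1

    best_l, best_r = 0, 0
    for i in range(len(s)):
        for l, r in (expand(i, i), expand(i, i + 1)):   # odd and even centers
            if r - l > best_r - best_l:
                best_l, best_r = l, r
    return s[best_l:best_r + 1]

print(longest_palindromic_substring("babad"))   # "bab" (or "aba")
print(longest_palindromic_substring("cbbd"))    # "bb"
```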
So that was the coding part. Now let's move on to a logical reasoning question, see what answer it generates, and also compare it with the OpenAI model. The question: a man has to cross a river with a wolf, a goat, and a cabbage. His boat can only carry himself and one other thing. If left alone, the wolf would eat the goat, and the goat would eat the cabbage. How can he get everything across safely? Let's check the answer — you can see it thinking its way to the solution. By the way, for Deep Think you can use 50 messages free per day; we have already used five, so we have 45 messages left for today. As you can see, it has given a clear, detailed explanation and solved the problem very well. It carefully went through the rules and checked different possibilities: it understands that some pairs, like the wolf and the goat or the goat and the cabbage, can't be left alone, and it reviews the constraints right at the start. From there it looks at what would happen if the man takes each item across the river first and works out whether that creates any problem. What I find really nice is how the model adjusts its plan when something doesn't work — for example, it tries taking the wolf first, realizes at the same time that this causes trouble, and then rethinks the step. This trial-and-error method feels very similar to how we as humans might solve the puzzle ourselves. In the end the model comes up with the right solution and explains it in a clear, step-by-step way.
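As a side note — and this is not the model's code — the crossing plan it describes can be verified with a small breadth-first search over the puzzle's states, which finds the classic seven-crossing solution.

```python
# Breadth-first search over wolf/goat/cabbage states to verify the puzzle's solution.
from collections import deque

ITEMS = ("wolf", "goat", "cabbage")

def unsafe(side):
    # The farmer is absent from this side: goat+wolf or goat+cabbage is fatal.
    return ("goat" in side) and ("wolf" in side or "cabbage" in side)

def solve():
    start = (frozenset(ITEMS), "left")     # (items still on the left bank, farmer's side)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer), path = queue.popleft()
        if not left and farmer == "right":
            return path
        here = left if farmer == "left" else frozenset(ITEMS) - left
        for cargo in list(here) + [None]:  # take one item, or cross alone
            new_left = left - {cargo} if farmer == "left" else left | ({cargo} - {None})
            new_farmer = "right" if farmer == "left" else "left"
            unattended = new_left if new_farmer == "right" else frozenset(ITEMS) - new_left
            if unsafe(unattended):
                continue
            state = (new_left, new_farmer)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [f"take {cargo or 'nothing'} to the {new_farmer}"]))
    return None

for step in solve():   # prints the seven crossings
    print(step)
```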
As you can see, I've given ChatGPT this question as well, and this is the answer it generated — but it wasn't as detailed as the DeepSeek model's. Here I would rate the DeepSeek model higher, because it gave me a very clear explanation with all the details at each step; you can clearly see the difference. For anyone who wants to understand a problem in depth, this DeepSeek model would be a great help. Now let's have a quick look at the DeepSeek-R1-Lite-Preview benchmarks. We have tested the model in real-world conditions, but how does it stack up against other AIs on benchmarks for math and competitive programming? On the screen you can see it compared on AIME, MATH, GPQA Diamond, Codeforces, LiveCodeBench, and ZebraLogic. According to the benchmarks, DeepSeek-R1-Lite-Preview blows the competition out of the water when it comes to math and logic problems: with a pass rate of 52.5% on the AIME 2024 challenge, it is well ahead of OpenAI's o1-preview and far surpasses models like GPT-4o on complex math tasks.
With this we have come to the last part of the section, where we'll compare the two head to head and see who really wins. You've already seen one comparison with the OpenAI model, so now let's quickly run a few more prompts and check which one performs better. We will compare DeepSeek versus OpenAI's o1 model with a few prompts across different categories. Here I have the DeepSeek model, and my first prompt for this test is a math problem: solve for x the quadratic equation 3x² + 5x − 2 = 0. Let's quickly see what answer it gives, and then we'll compare it with the OpenAI o1 model. As you can see, it gives us all the available solution methods: a first method, a second method, the quadratic formula — basically a full overview of the methods, so you can use whichever you find easier — and here is our final answer.
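For reference, the roots are easy to verify by hand or in a couple of lines of Python: the equation factors as (3x − 1)(x + 2), so x = 1/3 or x = −2.

```python
# Verifying the roots of 3x^2 + 5x - 2 = 0 with the quadratic formula.
import math

a, b, c = 3, 5, -2
disc = b**2 - 4*a*c                      # 25 + 24 = 49
x1 = (-b + math.sqrt(disc)) / (2*a)      # 1/3
x2 = (-b - math.sqrt(disc)) / (2*a)      # -2
print(x1, x2)                            # 0.333... -2.0
```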
Now let's give the same question to the OpenAI o1 model and check its answer. It's taking a bit of time, so let's wait a minute. As you can see, it used only one method, the quadratic formula, and here we have its answer. It provided a correct solution, but it lacked detail, whereas the DeepSeek model provided a much more thorough explanation — so I think DeepSeek wins this math question as well. For the second comparison I have a coding problem, and we'll check both models, starting with DeepSeek. The problem: write a Python function to check whether a given string is a palindrome or not. I'm waiting for the answer here... and as you can see, the DeepSeek-R1-Lite-Preview model has provided a correct solution along with a detailed breakdown of why each part of the code works the way it does.
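For comparison, a straightforward version of that palindrome check — written independently of either model's output — looks like this, ignoring case and non-alphanumeric characters.

```python
# Simple palindrome check: ignore case and non-alphanumeric characters.
def is_palindrome(text: str) -> bool:
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome("racecar"))                          # True
print(is_palindrome("A man, a plan, a canal: Panama"))   # True
print(is_palindrome("hello"))                            # False
```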
Let's check the o1 model as well. For this coding task, o1 gives us a working solution with clear code, but it doesn't explain the logic behind it, whereas the DeepSeek model provided the logic as well as the solution, with a detailed breakdown of each part. So for this one too, I think the winner is DeepSeek-R1-Lite-Preview, because it provides a better, clearer explanation. Now let's move on to the third test, logical reasoning. Here is my question: if a man has five apples and gives three away, how many apples does he have left? Checking the DeepSeek model first: it gives us the correct answer, and it explains every bit of the logic used to work through the question and why that particular logic applies — a pretty detailed, step-by-step breakdown. Let's check the OpenAI o1 model as well. Here is my question; looking at the answer, it gives a straightforward response — two apples — but it doesn't mention the logic behind it, same as with the coding task. Both models provided the accurate, correct answer, but here too the winner would be DeepSeek, because it comes with a lot more explanation and transparency. And that's the wrap-up for this part: if you're looking for an AI that excels in reasoning, problem solving, and transparency, DeepSeek is definitely a contender. While OpenAI and other models are fantastic, DeepSeek brings something new to the table — step-by-step logic and clear explanations, with an edge when it comes to math and reasoning challenges.
So, what is deep learning?
Deep learning is a subset of machine learning, which itself is a branch of artificial intelligence. Unlike traditional machine learning models, which require manual feature extraction, deep learning models automatically discover representations from raw data. This is made possible through neural networks — particularly deep neural networks, which consist of multiple layers of interconnected nodes. These neural networks are inspired by the structure and function of the human brain: each layer in the network transforms the input data into a more abstract and composite representation. For instance, in image recognition, the initial layers might detect simple features like edges and textures, while the deeper layers recognize more complex structures like shapes and objects. One of the key advantages of deep learning is its ability to handle large amounts of unstructured data such as images, audio, and text, making it extremely powerful for a wide range of applications. Stay tuned as we delve deeper into how these neural networks are trained, the types of deep learning models, and some exciting applications that are shaping our future. Types of deep learning: deep learning can be applied to supervised, unsupervised, and reinforcement machine learning, using various methods for each. The first is supervised machine learning. In supervised learning, the neural network learns to make predictions or classify data using labelled datasets: both input features and target variables are provided, and the network learns by minimizing the error between its predictions and the actual targets, a process called backpropagation. CNNs and RNNs are common deep learning algorithms used for tasks like image classification, sentiment analysis, and language translation. The second is unsupervised machine learning. In unsupervised machine learning, the neural network discovers patterns or clusters in unlabelled datasets without target variables, identifying hidden patterns or relationships within the data. Algorithms like autoencoders and generative models are used for tasks such as clustering, dimensionality reduction, and anomaly detection.
and generative models are used for tasks such as clustering dimensionality reduction and anomaly detection the third one reinforcement machine learning in this an agent learns to make decision in an environment to maximize a reward signal the agent takes action observes the records and learns policies to maximize cumulative rewards over time deep reinforement learning algorithms like deep Q networks and deep deterministic poly gradient are used for tasks such as Robotics and gameplay moving forward let's see what are the artificial neural networks artificial neural networks a Ann's inspired by the structure and the function of human neurons
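To make the supervised case concrete, here is a minimal sketch in PyTorch of one training step that minimizes the error between predictions and targets via backpropagation; the model, data shapes and hyperparameters are illustrative assumptions, not taken from the video.

```python
import torch
import torch.nn as nn

# Dummy labeled batch: 32 samples with 10 input features, 3 classes (illustrative shapes)
x = torch.randn(32, 10)
y = torch.randint(0, 3, (32,))

# A small fully connected classifier
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One supervised training step
optimizer.zero_grad()          # clear old gradients
logits = model(x)              # forward pass: predictions
loss = loss_fn(logits, y)      # error between predictions and targets
loss.backward()                # backpropagation: compute gradients
optimizer.step()               # update weights to reduce the error
print(loss.item())
```

Repeating this step over many batches is what "training" means for every deep learning model discussed in this section, regardless of architecture.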
Moving forward, let's see what artificial neural networks are. Artificial neural networks (ANNs), inspired by the structure and function of human neurons, consist of interconnected layers of artificial neurons, or units. The input layer receives data from external sources and passes it to one or more hidden layers; each neuron in these layers computes a weighted sum of its inputs and passes the result to the next layer. During training, the weights of these connections are adjusted to optimize the network's performance. A fully connected artificial neural network includes an input layer, one or more hidden layers and an output layer, where each neuron in a hidden layer receives input from the previous layer and sends its output to the next, and this process continues until the final output layer produces the network's response. Now let's look at the types of neural networks. Deep learning models can automatically learn features from data, making them ideal for tasks like image recognition, speech recognition and natural language processing. The most common architectures in deep learning are: first, feedforward neural networks (FNNs), the simplest type of neural network, where information flows in one direction from input to output; they are widely used for tasks such as image classification, speech recognition and natural language processing (NLP). Second, convolutional neural networks (CNNs), designed specifically for image and video recognition; CNNs automatically learn features from images, making them ideal for image classification, object detection and image segmentation. Third, recurrent neural networks (RNNs), which are specialized for processing sequential data such as time series and natural language; they maintain an internal state to capture information from previous inputs, making them suitable for tasks such as speech recognition, NLP and language translation (a short sketch of an FNN and a CNN follows below).
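As a rough illustration of the difference between these architectures, here is a sketch of a tiny feedforward network and a tiny convolutional network in PyTorch; the layer sizes are arbitrary assumptions chosen only to show the structure.

```python
import torch
import torch.nn as nn

# Feedforward network: information flows straight from input to output
fnn = nn.Sequential(
    nn.Flatten(),              # e.g. a 28x28 grayscale image -> 784 values
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 10),        # 10 output classes
)

# Convolutional network: learns spatial features (edges, textures, shapes)
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),           # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),           # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)

x = torch.randn(8, 1, 28, 28)      # a dummy batch of 8 images
print(fnn(x).shape, cnn(x).shape)  # both: torch.Size([8, 10])
```

An RNN would instead consume the input one time step at a time while carrying a hidden state, which is covered in more detail later in the course.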
Now let's look at some deep learning applications. The first one is autonomous vehicles: deep learning is changing the development of self-driving cars, where algorithms like CNNs process data from sensors and cameras to detect objects, recognize traffic signs and make driving decisions in real time, enhancing safety and efficiency on the road. The second one is healthcare diagnostics: deep learning models are being used to analyze medical images such as X-rays, MRIs and CT scans with high accuracy, helping in the early detection and diagnosis of diseases like cancer, improving treatment outcomes and saving lives. The third one is NLP: recent advances in NLP powered by deep learning models like Transformers and ChatGPT have led to more sophisticated and human-like text generation, translation and sentiment analysis, with applications that include virtual assistants, chatbots and automated customer service. The fourth one is deepfake technology: deep learning techniques are used to create highly realistic synthetic media known as deepfakes; while this technology has entertainment and creative applications, it also raises ethical concerns around misinformation and digital manipulation. The fifth one is predictive maintenance: in industries like manufacturing and aviation, deep learning models predict equipment failures before they occur by analyzing sensor data; this proactive approach reduces downtime, lowers maintenance costs and improves operational efficiency. Now let's look at some advantages and disadvantages of deep learning. The first disadvantage is high computational requirements: deep learning needs significant data and computational resources for training. The corresponding advantage is high accuracy: it achieves state-of-the-art performance in tasks like image recognition and natural language processing. The second disadvantage is that deep learning needs large labeled data sets: it often requires extensive labeled data for training, which can be costly and time consuming to put together. The second advantage is automated feature engineering: it automatically discovers and learns relevant features from data without manual intervention. The third disadvantage is overfitting: deep learning can overfit to the training data, leading to poor performance on new, unseen data. The third advantage is scalability: deep learning can handle large, complex data sets and learn from massive amounts of data. In conclusion, deep learning is a transformative leap in AI; by mimicking human neural networks it has changed healthcare, finance, autonomous vehicles and NLP. On July 25th, OpenAI introduced SearchGPT, a new search tool that is changing how we find information online. Unlike traditional search engines, which require you to type in specific keywords, SearchGPT lets you ask questions in natural, everyday language, just like having a conversation. This is a big shift from
how we were used to searching the web. Instead of thinking in keywords and hoping to find the right result, you can ask SearchGPT exactly what you want to know, and it will understand the context and give you direct answers. It is designed to make searching easier and more intuitive, without digging through links and pages. But with this new way of searching there are some important questions to consider: can SearchGPT compete with Google, the search giant we all know? What makes SearchGPT different from AI Overviews, another recent search tool? And how does it compare to ChatGPT, OpenAI's popular conversational AI? In this video we are going to explore these questions and more; we will look at what makes SearchGPT special, how it compares to other tools, and why it might change the way we search for information. Whether you are new to tech or just curious, this video will break it down in simple words, so stick around to learn more about SearchGPT. So what is SearchGPT? SearchGPT is a new search engine prototype developed by OpenAI, designed to enhance the way we search for information using AI. Unlike a typical chatbot like ChatGPT, SearchGPT isn't just about having a conversation; it's focused on improving the search experience, with some key features. The first one is direct answers: instead of simply showing you a list of links, SearchGPT delivers direct answers to your question. For example, if you ask what the best wireless noise-cancelling headphones in 2024 are, SearchGPT will summarize the top choices, highlighting their pros and cons based on expert reviews and user opinions. This approach is different from traditional search engines, which typically provide a list of links leading to various articles or videos. The second one is relevant sources: SearchGPT's responses come with clear citations and links to the original sources, ensuring transparency and accuracy, so you can easily verify the information and dig deeper into the topic if you want. The third one is conversational search: SearchGPT allows you to have a back-and-forth dialogue with the search engine; you can ask follow-up questions or refine your original query based on the responses you receive, making your search experience more interactive and personalized. Now let's jump into the next topic, SearchGPT versus Google. SearchGPT is being talked about as a major competitor to Google in the future, so let's break down how they differ in their approach to search. The first difference is conversational versus keyword-based search: SearchGPT uses a conversational interface, allowing users to ask questions in natural language and refine their queries through follow-up questions, which creates a more interactive search experience; Google, on the other hand, relies on keyword-based search, where users enter specific terms to find relevant web pages. The second is direct answers versus a list of links: one of SearchGPT's standout features is its ability to provide direct answers to a question, summarizing information from various sources and clearly citing them so you don't have to click through multiple links, while Google typically presents a list of links, leaving users to sift through the results to find the information they need. The third is AI-powered understanding versus keyword matching: SearchGPT uses AI to understand the intent behind your question, offering more relevant results even if your query isn't perfectly worded, whereas Google's primary method is keyword matching, which can sometimes lead to less accurate results, especially for complex queries. The fourth is dynamic context versus isolated searches: SearchGPT maintains context across multiple interactions, allowing for more personalized responses, whereas Google treats each search as a separate query without remembering previous interactions. And the last one is real-time information versus indexed web pages: SearchGPT aims to provide the latest information using real-time data from the web, whereas Google's vast index is comprehensive but may include outdated or less relevant information. Now let's jump into the next topic, SearchGPT versus AI Overviews. Both use AI, but they
approach search and information delivery differently. It's also worth noting that both tools are still being developed, so their features and capabilities may evolve and even overlap as they grow. Here are the differences. The first one is source attribution: SearchGPT provides clear, direct citations linked to the original sources, making it easy for users to verify the information, whereas AI Overviews includes links but the citations may not always be clear or directly associated with specific claims. The second one is transparency and control: SearchGPT promises greater transparency by offering publishers control over how their content is used, including the option to opt out of AI training, while AI Overviews offers less transparency regarding how content is selected and summarized. The next one is scope and depth: SearchGPT strives to deliver detailed, comprehensive answers pulling from a broad range of sources, including potentially multimedia content, whereas AI Overviews offers a concise summary of key points, often with links for further exploration, but with a more limited scope. Now let's jump into the next part, SearchGPT versus ChatGPT. SearchGPT and ChatGPT, both developed by OpenAI, share some core features but serve different purposes, so here are some differences. The first one is primary purpose: SearchGPT is designed for search, providing direct answers and sources from the web, whereas ChatGPT focuses on conversational AI, generating text responses. The second one is information sources: SearchGPT relies on real-time information from the web, whereas ChatGPT's knowledge is based on its training data, which might not be current. The third one is response format: SearchGPT prioritizes concise answers with citations and source links, whereas ChatGPT is more flexible, generating longer summaries, creative content, code and so on. The next one is use cases: SearchGPT is ideal for fact finding, research and tasks requiring up-to-date information, whereas ChatGPT is suitable for creative writing, brainstorming, drafting emails and other open-ended tasks. So now the question arises: when will SearchGPT be released? SearchGPT is currently in a limited prototype phase, meaning it's not yet widely available; OpenAI is testing it with a select group to gather feedback and improve the tool. If you are interested in trying SearchGPT you can join the waitlist on its web page, but you will need a ChatGPT account. A full public release by the end of 2024 is unlikely, as OpenAI hasn't set a timeline; it's more probable that SearchGPT features will gradually be added to ChatGPT in 2024 or 2025, with a potential standalone release later based on testing and feedback. With this we have come to the end of this section; if you have any questions or doubts, please feel free to ask in the comments below and our team of experts will help you as soon as possible. Did you know that with just a few lines of code you can transform an AI model into something far more powerful, something that
responds to questions, connects to live data, pulls insights from databases, and even interacts with other apps in real time? That's what LangChain allows you to do, and it's quickly becoming the go-to framework for AI developers. Think about this: you're about to create something amazing, an AI that can think, learn and grow in ways we once only dreamed of, and here's the best part: you don't need to be an AI expert to make that happen. LangChain is like a toolkit that connects the most advanced large language models, like OpenAI's GPT, to real-time data, allowing you to build AI applications that are smart, flexible and highly interactive. LangChain is more than just a way to make AI development easier; it's a framework that allows different language models to work together seamlessly. Whether you want to understand user questions with one LLM, create human-like responses with another, or pull data from an API or a database, LangChain makes it all possible. The framework takes care of the heavy lifting: connecting models, managing data flows, and even customizing how your AI interacts with external sources. Now the question is, why is LangChain so popular? It has become one of the fastest-growing open source projects because it solves a huge problem for developers: the challenge of integrating generative AI and LLMs with external data and complex workflows. As AI becomes more central to our lives in 2024, LangChain is helping developers create smarter, more powerful applications, whether for chatbots, content creation or advanced data analysis. In this tutorial I'll show you exactly how to get started with LangChain, from setting up your environment to building your first AI-powered app; I'll walk you through it. LangChain also makes it possible to use these models with our own custom data, opening up more possibilities for building specialized, intelligent applications. By the end of this video you will be ready to start building with LangChain, and trust me, once you see how easy it is you'll wonder why you didn't start using it sooner. Let's start with a simple question: why should we use LangChain? Imagine you're working with large language models like GPT-4 or Hugging Face models and you want to take their capabilities further, like integrating them with your own data sources or allowing them to take action based on the information they retrieve. This is where LangChain comes in. LangChain is an open source framework that allows you to build intelligent applications by connecting large language models with external data sources; it can turn static AI interactions into dynamic, data-aware workflows. One of the best parts is that you don't have to code everything manually from scratch: LangChain abstracts away much of the complexity of working with LLMs, allowing developers to focus on building functional applications instead of wrangling API calls and managing data pipelines. LangChain is set to play an even bigger role in AI development because it enables you to harness the true power of generative AI by connecting it with real-time data and external tools. Now that we have understood what LangChain is, let's see how to install it. We'll simply go to the LangChain website, open the docs, and read through the documentation; it explains what LangChain is and what the framework consists of, and there are also tutorials on how
we can install LangChain. For installation, you can simply click on the quick start, which shows how to set things up in a Jupyter notebook. To install LangChain we use pip: copy the command pip install langchain, open the command prompt or terminal on your computer, and paste it in. It will download all the packages required for LangChain; in my case it says 'requirement already satisfied' because I had already installed LangChain earlier. So that's how you install LangChain with this command; you can also install the LLM integrations, which we'll come back to later. Now let me show you what else you need to install. First we have LangChain; next is the Pinecone client, so we'll search for 'pinecone client', which takes us to its page. Pinecone is a vector store for storing and retrieving embeddings, which we will use in later steps, and on the Pinecone site you can also create secret API keys and read the documentation. We'll look at creating API keys with OpenAI shortly, but first let's install the Pinecone client: go back to the terminal and run pip install pinecone-client, and it will download and install all the required packages. Now it's installed. The third thing is the OpenAI client, since we'll use OpenAI models for the large language tasks. Search for OpenAI and it takes us to the OpenAI platform page; this is the platform where you can create and export an API key, and you can see the overview, quick start and concepts sections there. To create an API key, click 'Create new secret key'; you can give the key any name, say test123, set permissions to 'all', and create the secret key. You then need to save the key: copy it, because it will be required later when running the code, and click done. These are the keys I have created. ChatGPT and other LLM providers like OpenAI and Hugging Face can be used with LangChain to integrate with other APIs and build your own custom LLM applications or chatbots. For example, if I log into ChatGPT and ask who won the WTC in 2023, it shows the answer, but if I ask who won a more recent cricket World Cup match, it answers that as of its last knowledge update the men's World Cup has not taken place yet. This happens because this older version of ChatGPT has not been trained on the latest news. By using LangChain you can integrate with other APIs and build your own customized LLM applications or chatbots that work with your own custom data using various tools and APIs. Before we move on: I've already shown you how to create the secret API key and how to store it. First, we have understood how to install LangChain using the pip command. You also need Python 3.8 or later installed on your system; I already have it, and to check you can simply type python --version and press Enter, which shows the Python version installed on my computer.
The second step, which we have already discussed, is the OpenAI API key: sign up at OpenAI, go to the API keys section, and create a new secret key; these are the keys I have created, and you'll keep them for use later. Now we come to the third step, which is to create a project directory and set it up. We have Jupyter installed on our system, so go to the command prompt and type jupyter notebook; it will open the Jupyter Notebook installed on the system (if not, you can start it from here; it's loading right now, so we wait). Then click New and choose the Python 3 (ipykernel) kernel, since Python is installed on my system, and here you can type your commands. Before that, you have to create a Python file; we can also create this file in Visual Studio Code: go to Visual Studio Code, click File, New File, and save it with a .py extension. You first have to store the API key; for this we set OPENAI_API_KEY equal to your secret key, simply pasting in the key you copied and saving it, which keeps your API key in one place so it can be used whenever needed. Step four is to initialize the project and install the required libraries. You need some additional libraries like Streamlit to build a user interface, so let's add that to our project folder; you can either list everything in a requirements.txt file and install from it, or install directly. We have already installed OpenAI and LangChain, so we just need Streamlit: run pip install streamlit. As you can see, I have already installed Streamlit before; likewise you can install the openai package if it's not on your system, from the terminal or Windows PowerShell. So now we know which packages need to be installed. The next step is to build your first LangChain app: a simple app that takes an input query and generates a response using a GPT model. Create a Python file named main.py; as you can see, I've already opened main.py, and this is my code: import streamlit as st, the imports from LangChain, the constants I have created, and then the OpenAI LLM initialized with our API key. I'm using VS Code here, but you can also do this in your Jupyter notebook. To create the Streamlit app you give it a title, 'LangChain Demo with OpenAI', then a text input for the prompt, st.text_input('Enter a prompt') or whatever label you wish, and then display the response: if a prompt is entered, the response is llm.predict(prompt), using the LLM's predict method, and it is shown in the app. After creating and running this, the app initializes OpenAI using your API key, the user enters a prompt through the Streamlit interface, LangChain processes the input and sends it to the OpenAI GPT model, and the AI generates a response which is then displayed in the app. To see the app in action, go to the terminal and run the command streamlit run main.py; a new tab will open in your browser displaying the app, and you can type any question into the input box. A cleaned-up sketch of what main.py looks like is shown below.
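Here is a minimal sketch of the main.py described above, assuming an older LangChain release (around 0.0.x) where the OpenAI wrapper and its predict method are available; the key value and the temperature setting are placeholders, not taken from the video.

```python
# main.py - minimal LangChain + Streamlit demo (sketch)
import streamlit as st
from langchain.llms import OpenAI

OPENAI_API_KEY = "sk-..."  # placeholder: paste your own secret key or load it from a constants file

# Initialize the OpenAI LLM wrapper with the API key
llm = OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0.7)

st.title("LangChain Demo with OpenAI")
prompt = st.text_input("Enter a prompt")

if prompt:
    # Send the user's prompt to the GPT model and display the generated response
    response = llm.predict(prompt)
    st.write(response)
```

Run it with streamlit run main.py. Note that newer LangChain versions replace llm.predict with llm.invoke and move the OpenAI wrapper into the separate langchain-openai package, so adjust the imports to whichever version you have installed.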
Now we have understood all these steps. This was a quite basic tutorial on how to install LangChain and integrate it into an app; you can also customize and expand it. LangChain's flexibility allows you to integrate other APIs and external data sources, or even add memory to your AI application, so whether you're building a simple chatbot or a more complex AI system, the possibilities are endless. By following these steps you will have a fully functioning app running on your system in no time. Today we will take you through a hands-on lab demo of how we can use a GAN, a generative adversarial network, for image generation, and for more amazing videos like this, subscribe to our YouTube channel and press the bell icon to stay updated. In today's session we will discuss what a GAN is, then cover the types of models in a GAN, and in the end we will do a hands-on lab demo on a celebrity face image data set using a GAN. So what is a GAN? Generative adversarial networks were introduced in 2014 by Ian J. Goodfellow and co-authors. GANs perform unsupervised learning tasks in machine learning and can be used to generate new examples that plausibly could have been drawn from the original data set. In this illustration of a GAN, there is a database that has real 100-rupee notes; the generator neural network generates fake 100-rupee notes, and the discriminator network helps to identify the real and the fake notes, or the real and the fake images, as you can see. Moving ahead, let's see what a generator is. A generator is the GAN neural network that creates fake data to be trained on by the discriminator; it learns to generate plausible data, and the generated examples, or instances, become negative training examples for the discriminator. As you can see here, the random input generates a new fake image, and the main aim of the generator is to make the discriminator classify its output as real. The part of the GAN that trains the generator includes the noisy input vector, the generator network, which transforms that random input into a data instance, and the discriminator network, which classifies the generated data. After seeing what a generator is, let's see what a discriminator is. The discriminator is a neural network that identifies real data from the fake data created by the generator. The discriminator's training data comes from two sources: real data instances, such as real pictures of birds, humans or currency notes, which the discriminator uses as positive samples during training, and fake data instances created by the generator, which are used as negative examples during the training process. So the discriminator looks at the real images and the fake images generated by the generator and decides which are fake and which are real. Now let's move
on to the programming part and see how we can build a GAN using a celebrity face image data set. We'll start a new notebook and rename it to GAN. First we import some libraries: import os, and from PyTorch, the machine learning and deep learning library we use for neural networks, we import DataLoader from torch.utils.data. What is torch.utils.data? It provides Dataset, an abstract class representing a data set that you can subclass to build custom data sets, and DataLoader, which wraps a data set and serves it up in shuffled mini-batches during training. Next, import torchvision.transforms as T: transforms are common image transformations available in torchvision; they can be chained together using Compose, and most transform classes have functional equivalents that give fine-grained control over the transformations. We also import ImageFolder from torchvision.datasets; I hit an invalid-syntax error here, but it was just a typo in the import, and after fixing it the cell runs fine. Now we load the data set; we are using a celebrity face image data set (I will provide the data set link in the description box below, so you can download it directly from there), and this is my path to the data set folder. Next I set the configuration: image_size to 64, batch_size to 256, and stats to (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), the normalization means and standard deviations. Now we build the training data set: train_ds is an ImageFolder over the data directory with transform T.Compose of T.Resize(image_size), T.CenterCrop(image_size), T.ToTensor() and T.Normalize(*stats), and train_dl is a DataLoader over train_ds with the batch size, shuffle=True, num_workers=2 and pin_memory=True. When I first ran it I got a 'system cannot find the path specified' error, so I copied the correct path and ran it again, and now it's working fine. A cleaned-up sketch of this data-loading cell is shown below.
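Here is a cleaned-up sketch of the cell dictated above; the data directory path is a placeholder for wherever you downloaded the celebrity face images.

```python
import os
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
import torchvision.transforms as T

DATA_DIR = "./celeba_faces"   # placeholder path to the downloaded data set
image_size = 64
batch_size = 256
stats = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)  # normalization mean and std per channel

# Resize, crop and normalize every image to 64x64 with values in [-1, 1]
train_ds = ImageFolder(DATA_DIR, transform=T.Compose([
    T.Resize(image_size),
    T.CenterCrop(image_size),
    T.ToTensor(),
    T.Normalize(*stats),
]))

train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=2, pin_memory=True)
```

Normalizing to the range [-1, 1] matters later because the generator's final Tanh layer produces outputs in the same range.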
So let me import torch, and from torchvision.utils import make_grid, then import matplotlib and matplotlib.pyplot as plt, with %matplotlib inline. make_grid from torchvision.utils is used to arrange a batch of images into a grid of small boxes, and matplotlib, as you already know, is used for making charts: line charts, bar charts, pie charts and so on. Let me run this. Now I write a denorm function for the image tensors that returns img_tensors * stats[1][0] + stats[0][0], undoing the normalization so the images display correctly. Next we make two more functions, show_images and show_batch: def show_images(images, nmax=64) creates fig, ax = plt.subplots(figsize=(10, 10)), hides the axis ticks with ax.set_xticks([]) and ax.set_yticks([]), and calls ax.imshow on make_grid of the denormalized images.detach()[:nmax] with nrow=8, permuted to (1, 2, 0); and def show_batch(dl, nmax=64) loops over the data loader, calls show_images on the first batch, and breaks. Now let's look at a batch: I call show_batch(train_dl); it's loading, and after fixing a small spelling mistake in imshow you can see the faces, this one may be Robert Downey Jr., this is Robert Downey Jr. too, and there are other celebrities here. In the GAN we will generate new fake images, and the discriminator will judge which images are real and which are fake. Now let's check whether a GPU is available. I write get_default_device: if torch.cuda.is_available() it returns torch.device('cuda'), otherwise it returns torch.device('cpu'). Then def to_device(data, device) moves tensors to the chosen device: if isinstance(data, (list, tuple)) it returns [to_device(x, device) for x in data], otherwise it returns data.to(device, non_blocking=True). Next I write a DeviceDataLoader class that wraps a data loader to move data to a device: __init__ takes self, dl and device and stores self.dl = dl and self.device = device; __iter__ yields each batch after moving it to the device, for b in self.dl: yield to_device(b, self.device); and __len__ returns the number of batches, len(self.dl). After fixing an invalid-syntax typo I set device = get_default_device() and train_dl = DeviceDataLoader(train_dl, device). As we already know what a GAN, a generator and a discriminator are, let's take a quick GAN overview again: a generative adversarial network has two parts; the generator learns to generate plausible data, its generated instances become negative training examples for the discriminator, and the discriminator penalizes the generator for producing implausible results. Given the generated data and the real data, the discriminator decides which samples are fake and which are real.
The discriminator takes an image as input and tries to classify it as real or generated; in this sense it's like any other neural network. I will use a CNN here that outputs a single number for every image. So I hope you know again what the discriminator is, what the generator is, and what the real data is: it is this data set, we will generate fake data, and the discriminator will check whether the data is fake or real. Here I write import torch.nn as nn, and then discriminator = nn.Sequential(...) with a stack of layers: convolutional layers, LeakyReLU activations and a final flatten layer. The input is 3 x 64 x 64, and the feature maps go from 64 to 128 to 256 channels and so on through the layers. Then I move it to the device with discriminator = to_device(discriminator, device); I got an error saying 'discriminator is not defined', but after debugging it turned out the spelling was wrong, so sorry for that. Now for the generator network: I set latent_size = 128; we have set up the discriminator, and now we define the generator, which works back up from the latent vector through a mirrored stack of layers to a 3 x 64 x 64 image, and I likewise move it to the device with generator = to_device(generator, device). A representative sketch of both networks is shown below.
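The exact layer stack isn't fully audible in the recording, so here is a representative DCGAN-style discriminator and generator for 64x64 RGB images with a 128-dimensional latent vector; the channel counts follow the common 64-128-256-512 pattern and may differ slightly from the notebook.

```python
import torch.nn as nn

latent_size = 128

# Discriminator: 3x64x64 image -> single real/fake score
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.LeakyReLU(0.2, inplace=True),    # 64x32x32
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),  # 128x16x16
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True), # 256x8x8
    nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True), # 512x4x4
    nn.Conv2d(512, 1, 4, 1, 0, bias=False),   # 1x1x1 score
    nn.Flatten(), nn.Sigmoid(),
)

# Generator: latent vector (128x1x1) -> 3x64x64 image
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_size, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),  # 512x4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),          # 256x8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),          # 128x16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),            # 64x32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False), nn.Tanh(),                                      # 3x64x64 in [-1, 1]
)
```

The discriminator halves the spatial size at every convolution until a single score remains, while the generator mirrors it with transposed convolutions, which is exactly the down/up-sampling symmetry of a DCGAN.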
Checking the generator again: yes, it is defined here and it's working fine. Now I will write the discriminator training step. For that I define def train_discriminator(real_images, opt_d): first we clear the discriminator gradients with opt_d.zero_grad(); then we pass the real images through the discriminator, because it has to score both the real and the fake images and work out which is which. Next we generate fake images from the latent space: latent = torch.randn(batch_size, latent_size, 1, 1, device=device) and fake_images = generator(latent). We pass the fake images through the discriminator just as we did for the real images, and then we update the discriminator: loss = real_loss + fake_loss, then loss.backward(), opt_d.step(), and return loss.item(), real_score, fake_score. A bracket was missing at first, but after adding it the cell runs. So what we did here: we passed the real images to the discriminator, generated fake images, passed those through the discriminator as well, and at the end combined the real loss and the fake loss and updated the discriminator weights. That was the discriminator training; now we do the generator training. For that I define def train_generator(opt_g): first opt_g.zero_grad() to clear the generator gradients, just as we did for the discriminator. Then we generate fake images, because that is all the generator does, and from the discriminator's predictions on those fakes we are simply trying to fool the discriminator. We then update the generator: loss.backward(), opt_g.step(), and return loss.item(). Let's run it. Next, from torchvision.utils I import save_image, set sample_dir = 'generated', and call os.makedirs(sample_dir, exist_ok=True). Now we want to save sample outputs, so I create a save_samples function: what it does is generate fake images and save them to the sample directory. A cleaned-up sketch of the two training functions is shown below.
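Since the dictation is hard to follow, here is a sketch of what train_discriminator and train_generator typically look like in this kind of DCGAN notebook, using binary cross-entropy against real/fake labels and the discriminator, generator, device, batch_size and latent_size defined in the earlier cells; the exact loss bookkeeping in the video may differ slightly.

```python
import torch
import torch.nn.functional as F

def train_discriminator(real_images, opt_d):
    opt_d.zero_grad()                                   # clear discriminator gradients

    # Score real images against the label 1 (real)
    real_preds = discriminator(real_images)
    real_targets = torch.ones(real_images.size(0), 1, device=device)
    real_loss = F.binary_cross_entropy(real_preds, real_targets)
    real_score = real_preds.mean().item()

    # Generate fake images and score them against the label 0 (fake)
    latent = torch.randn(batch_size, latent_size, 1, 1, device=device)
    fake_images = generator(latent)
    fake_preds = discriminator(fake_images)
    fake_targets = torch.zeros(fake_images.size(0), 1, device=device)
    fake_loss = F.binary_cross_entropy(fake_preds, fake_targets)
    fake_score = fake_preds.mean().item()

    loss = real_loss + fake_loss                        # combine and update the discriminator
    loss.backward()
    opt_d.step()
    return loss.item(), real_score, fake_score

def train_generator(opt_g):
    opt_g.zero_grad()                                   # clear generator gradients

    # Generate fakes and try to make the discriminator label them as real (1)
    latent = torch.randn(batch_size, latent_size, 1, 1, device=device)
    fake_images = generator(latent)
    preds = discriminator(fake_images)
    targets = torch.ones(batch_size, 1, device=device)
    loss = F.binary_cross_entropy(preds, targets)

    loss.backward()                                     # update the generator to fool the discriminator
    opt_g.step()
    return loss.item()
```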
Moving forward, I will fix the latent vector: fixed_latent = torch.randn(64, latent_size, 1, 1, device=device), and then call save_samples(0, fixed_latent). At first it said save_samples was not defined, but once it is defined the call works, and you can see the generated images; these are fake images. Now I will write the full training loop. For that I import tqdm from tqdm.notebook and torch.nn.functional as F. We will run the full training loop for 400 epochs, so it will take a very long time; first I write the fit function and then I will get back to you. Here is what I did: I set up lists for the losses and the scores, created the optimizers (opt_d and opt_g), and inside the loop I train the discriminator and then the generator, record the losses and scores for the last batch, log them, and save the generated images for each epoch using the save_samples helper we already created. Now I set the learning rate, set epochs = 400 (which means it will take a huge amount of time), and run history = fit(epochs, lr) with %%time to measure it. At first 'fit is not defined' came up, and then an error about an item object, so I had to check, but after fixing those, as you can see, it started running. The loop will run through all 400 epochs, so it will take a very, very long time; at each epoch it reports the generator loss, the discriminator loss, the real score and the fake score, and at the same time it saves the generated images, so I will get back to you after that. As you can see, the GAN has now trained through all 400 epochs. Now let's unpack the generator and discriminator losses and the real and fake scores from the history, and save the models to the directory path: torch.save(generator.state_dict(), 'G.pth') and torch.save(discriminator.state_dict(), 'D.pth'); there was a small spelling mistake, but after fixing it this works. Then I write from IPython.display import Image and display the image the generator produced at the start, the file generated-images-0001.png inside the generated folder. Let's see: this is the first image generated by the generator. Since we have 400 epochs, let's also check the image at epoch 100, which is a bit clearer, then the one at epoch 300, which is clearer still, and finally the epoch 400 image, which I hope you can see is clear. These are the fake images generated by the generator to fool, or confuse, the discriminator. Now we will plot a graph of the loss against the epochs for the discriminator and the generator: as you can see, the blue curve is the discriminator and the other is the generator; the loss for the generator is higher and the loss for the discriminator is lower, which is good. And now let's look at the real and fake scores: these are the real image scores and these are the fake image scores. Welcome to this video tutorial by Simplilearn. In this video we will learn about an important and popular deep learning neural network called generative adversarial networks. Yann LeCun, one of the pioneers in the field of machine learning and deep learning, described it as the most interesting idea in the last ten years in machine learning. In this
video, you will learn what generative adversarial networks are, look briefly at the generator and the discriminator, then understand how GANs work and the different types of GANs, and finally look at some applications of GANs. So let's begin. What are generative adversarial networks? Generative adversarial networks, or GANs, introduced in 2014 by Ian J. Goodfellow and co-authors, became very popular in the field of machine learning. A GAN is an unsupervised learning technique in machine learning; it consists of two models that automatically discover and learn the patterns in input data. The two models, called the generator and the discriminator, compete with each other to analyze, capture and copy the variations within a data set, and GANs can be used to generate new examples that plausibly could have been drawn from the original data set. In the image below you can see that there is a database that has real 100-rupee notes; the generator, which is basically a neural network, generates fake 100-rupee notes, and the discriminator network identifies whether the notes are real or fake. Let us now understand briefly what a generator is. A generator in a GAN is a neural network that creates fake data to be trained on by the discriminator; it learns to generate plausible data, and the generated instances become negative training examples for the discriminator. It takes a fixed-length random vector carrying noise as input and generates a sample. The main aim of the generator is to make the discriminator classify its output as real. The portion of the GAN that trains the generator includes a noisy input vector, the generator network, which transforms the random input into a data instance, a discriminator network, which classifies the generated data, and a generator loss, which penalizes the generator for failing to fool the discriminator. The backpropagation method is used to adjust each weight in the right direction by calculating the weight's impact on the output; backpropagation is used to obtain gradients, and these gradients help change the generator weights. Now let us understand briefly what a discriminator is. A discriminator is a neural network model that identifies real data from the fake data generated by the generator. The discriminator's training data comes from two sources: real data instances, such as real pictures of birds, humans, currency notes and so on, which are used by the discriminator as positive samples during training, and fake data instances created by the generator, which are used as negative examples during the training process. While training the discriminator, it connects with two loss functions; during discriminator training the discriminator ignores the generator loss and just uses the discriminator loss. In the process of training, the discriminator classifies both real data and fake data from the generator, and the discriminator loss penalizes the discriminator for misclassifying a real data instance as fake or a fake data instance as real. Now moving ahead, let's understand how GANs work. A GAN consists
of two networks: a generator, represented as G(z), and a discriminator, represented as D(x). They play an adversarial game where the generator tries to fool the discriminator by generating data similar to the data in the training set, and the discriminator tries not to be fooled, by identifying fake data among the real data; they both work simultaneously to learn and model complex data like audio, video or image files. Now you are aware that a GAN consists of two networks, a generator G(z) and a discriminator D(x). The generator network takes a sample of noise and generates a fake sample of data; the generator is trained to increase the probability that the discriminator network makes mistakes. On the other hand, the discriminator network decides whether the data is generated or taken from the real sample, treating it as a binary classification problem with the help of a sigmoid function that gives an output in the range 0 to 1. Here is an example of a generative adversarial network trying to identify whether 100-rupee notes are real or fake: first a noise vector, the input vector, is fed to the generator network, and the generator creates fake 100-rupee notes; the real images of 100-rupee notes stored in a database are passed to the discriminator along with the fake notes; the discriminator then examines the notes and classifies them as real or fake; we train the model, calculate the loss function at the end of the discriminator network, and backpropagate the loss into both the discriminator and the generator. The mathematical objective used to train a GAN can be represented by the minimax equation written out below, with these parameters: G represents the generator, D represents the discriminator, p_data(x) is the probability distribution of the real data, p_z(z) is the distribution of the generator's input noise, x is a sample from p_data(x), z is a sample from p_z(z), D(x) is the discriminator network and G(z) is the generator network. The discriminator tries to maximize the objective function such that D(x) is close to 1 and D(G(z)) is close to 0; this simply means that the discriminator should identify all the images from the training set as real, that is 1, and all the generated images as fake, that is 0. The generator wants to minimize the objective function such that D(G(z)) is 1, meaning the generator tries to generate images that are classified as real, that is 1, by the discriminator network.
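Written out with the symbols defined above, the standard GAN value function from Goodfellow et al. (2014) is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

The discriminator maximizes V by pushing D(x) toward 1 and D(G(z)) toward 0, while the generator minimizes V by pushing D(G(z)) toward 1, exactly as described above.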
Next, let's see the steps for training a GAN. First, define the problem and collect the data; then choose the architecture of your GAN, depending on what your problem requires. Then train the discriminator on real data so that it predicts it as real, for n iterations; next, generate fake inputs with the generator; after that, train the discriminator on the fake data so it predicts the generated data as fake; finally, train the generator on the output of the discriminator: with the discriminator's predictions available, train the generator to fool the discriminator. Let us now look at the different types of GANs. First we have vanilla GANs: vanilla GANs use the minimax optimization formula we saw earlier, where the discriminator is a binary classifier using sigmoid cross-entropy loss during optimization; the generator and the discriminator are simple multi-layer perceptrons, and the algorithm optimizes the mathematical equation using stochastic gradient descent. Up next we have deep convolutional GANs, or DCGANs: DCGANs use convolutional neural networks instead of vanilla neural networks for both the discriminator and the generator; they are more stable and generate higher-quality images. The generator is a set of convolutional layers with fractionally strided convolutions, or transposed convolutions, so it upsamples the input at every convolutional layer, while the discriminator is a set of convolutional layers with strided convolutions, so it downsamples the input image at every convolutional layer. Moving ahead, the third type is conditional GANs, or CGANs: vanilla GANs can be extended into conditional models by using extra label information to generate better results. In a CGAN an additional parameter, y, is added to the generator for generating the corresponding data, and labels are fed as input to the discriminator to help distinguish the real data from the generated fake data. Finally we have super-resolution GANs: SRGANs use deep neural networks along with an adversarial network to produce higher-resolution images; a super-resolution GAN generates a photo-realistic high-resolution image when given a low-resolution image. Let's look at some of the important applications of GANs. With the help of DCGANs you can train on images of cartoon characters to generate faces of anime characters, and Pokémon characters as well. Next, GANs can be used on images of humans to generate realistic faces; the faces that you see on your screen have been generated using GANs and do not exist in reality. The third application is that GANs can be used to build realistic images from textual descriptions of objects like birds, humans and other animals: we input a sentence and generate multiple images fitting the description; here is an example of text-to-image translation using GANs for a bird with a black head, yellow body and a short beak. The final application is creating 3D objects: GANs can generate 3D models using 2D pictures of objects from multiple perspectives. GANs are very popular in the gaming industry, where they can help automate the task of creating 3D characters and backgrounds and give them a realistic feel. Welcome to our video about
Transformers in AI, and no, we don't mean the robot toys from the movies; we are diving into something even cooler in the world of computers and AI. Have you ever wondered how your phone knows what word you might type next, or how Google Translate works so well? That's where Transformers come in. They are like super-smart computer brains that can understand and create human-like text. Here's a fun example: I asked a Transformer to tell me a joke, and it said, 'Why did the computer go to art school? Because it wanted to improve its drawing speed.' Okay, that's a bit cheesy, but it shows how these computer programs can come up with new ideas on their own. Transformers are changing how we use technology every day: they help us with things like translating languages, summarizing long articles, writing emails and stories, and even playing games like chess. In this video we will explore how Transformers work, why they are so special, and what cool things they might do in the future. So let's talk about what exactly Transformers are. Transformers are an artificial intelligence model used to process and generate natural language. They can read and understand huge amounts of text and then use that knowledge to answer questions, translate languages, summarize information, and even create stories or write code. The magic behind Transformers is their ability to focus on different parts of the text with attention mechanisms, which means they can understand context better than older models, making their outputs more accurate and natural sounding. The basic structure of a Transformer includes two main parts, the encoder and the decoder: think of the encoder as a translator that understands and processes the input, and the decoder as the one that takes the processed information and turns it into the output. For example, if we are translating a sentence from English to French, the encoder reads the English sentence and converts it into a form the AI can understand, and the decoder then takes this form and generates the French sentence. A great example of a Transformer in action is ChatGPT: ChatGPT uses Transformers to understand and generate human-like text; when you ask a question, it processes your input and generates a response, which lets it have conversations, write essays and even tell jokes. For instance, if you ask ChatGPT what the weather is like today, it uses its Transformer model to understand your question and respond with something like 'a chance of rain in the afternoon'. This ability to understand and generate text makes Transformers incredibly powerful. So let's talk about how Transformers work. Transformers are especially good at sequence-to-sequence learning tasks, like translating a sentence from one language to another. Here's how they work. First, there's the attention mechanism: this allows the Transformer to focus on different parts of the input data. For example, if it's translating the sentence 'the cat sat on the mat', it can pay attention to each word's context to understand the meaning better, so it knows 'cat' is related to 'sat' and 'mat', helping it produce an accurate translation in another language (a minimal sketch of this attention computation is shown below). Transformers also use something called positional encoding: since they process all the words at once, they need a way to understand the order of the words, and positional encoding adds information about the position of each word to the input, helping the Transformer understand the sequence.
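To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in PyTorch, the core operation inside a Transformer layer; the tensor sizes are made-up toy values, and this omits the multi-head projections and masking used in real models.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, sequence_length, d_model)
    d_k = q.size(-1)
    # Similarity of every word with every other word
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq, seq)
    # Attention weights: how much each word attends to the others
    weights = F.softmax(scores, dim=-1)
    # Weighted mix of the value vectors
    return weights @ v, weights

# Toy example: a batch of 1 sentence with 6 tokens and 16-dimensional embeddings
x = torch.randn(1, 6, 16)
output, weights = scaled_dot_product_attention(x, x, x)  # self-attention
print(output.shape, weights.shape)  # torch.Size([1, 6, 16]) torch.Size([1, 6, 6])
```

Each row of the weights matrix says how strongly one token attends to every other token, which is how the model links 'cat' to 'sat' and 'mat' in the example above.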
can process the entire sentence at once this makes them much faster and more efficient let's compare Transformers with recurent neural networks but first let's understand what are rnns so RNs is a type of neural network designed to handle sequential data they process data one step at a time maintaining a memory of previous step this makes them good for task where order matters like speech recognition or time series prediction however RNs have a problem called The Vanishing gradient which means that they can forget information from earlier in the sequence imagine trying to understand the sentence Alice
went to the park and then to the store and RNN might struggle to remember Alice by the time it gets to the store but a Transformer can easily keep track of Alice throughout the sentence so why are Transformers better unlike R and is Transformers process the entire sentence at once keeping the context intact this solves the vanishing greent problem and makes Transformers faster and more accurate to task like language translation and text generation so let's talk about the applications of transformers at first we have language translation they are used by services like Google Translate to
convert text from one language to another for example translating hello how are you to Spanish as hola then we have document summarization they can take long articles and summarize them into shorter more concise versions for instance summarizing a 10 page report into a few key points making it easier to understand the main ideas without reading the whole document then we have content generation Transformers can write articles stories and even quote they can create new content based on what they have learned for example you could ask a Transformer to write a short story about a space
adventure and then it would come up with a unique narrative then we have game playing Transformers can learn and play complex games like chess making strategic decisions just like a human player they analyze the entire board and make moves considering all possible outcomes let's talk about image processing they are used in task like image classification and object detection helping computers understand visual data for example identifying objects in a photo like recognizing a cat tree or a car now let's understand understand the training process the training Transformers involves two main steps semi-supervised learning they can learn
from both label data where the answer is known and unlabeled data where the answer is not provided this makes them very versatile for example a Transformer could be trained on a mix of articles with and without summaries to learn how to summarize text effectively pre-training and F tuning Transformers are pre-trained on a large data set to learn General patterns then they are fine tune with specific task making them highly versatile for instance a Transformer might be pre-trained on a large collection of books to understand language and then find tune to generate marketing copy for a
The future potential of Transformers is huge: researchers are continuously improving them, making them even more powerful, and we can expect more advanced applications in areas like healthcare and finance, and more sophisticated AI systems that interact with humans in more natural ways; imagine having an AI that can provide personalized medical advice, or one that can help you write a novel. In conclusion, we can say that Transformers are a revolutionary architecture in AI; they offer speed, efficiency and versatility, changing how we interact with technology. The future looks bright for Transformers and we can't wait to see
what they'll do next. So now, what is an RNN? RNNs are a type of neural network designed to process sequential data; they can analyze data with a temporal dimension, such as time series, speech and text. RNNs do this by using a hidden state passed from one time step to the next; the hidden state is updated at each time step based on the input and the previous hidden state. RNNs are able to capture short-term dependencies in sequential data, but they struggle with capturing long-term dependencies, which is where LSTMs come in. So moving forward, let's discuss
the types of LSTM gates. LSTM models have three types of gates: the input gate, the forget gate and the output gate. Let's first discuss the input gate: it controls the flow of information into the memory cell, deciding what to store; the input gate determines which values from the input should be updated in the memory cell, using a sigmoid activation function to scale the values between 0 and 1 and then applying pointwise multiplication to decide what information to store. Next is the forget gate, which controls the flow of information out of the memory cell, deciding what to discard; the forget gate decides what information should be discarded from the memory cell, and it also uses a sigmoid activation function to scale the values between 0 and 1, followed by pointwise multiplication to determine what to forget. The last one is the output gate, which controls the flow of information out of the LSTM, deciding what to use for the output; the output gate determines the output of the LSTM unit, using a sigmoid activation function to scale the values from 0 to 1 and then applying pointwise multiplication to produce the output of the LSTM unit. A minimal sketch of one LSTM cell step is shown below.
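To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM cell step; the weight matrices and sizes are assumed for illustration, and the tanh candidate update is part of the standard LSTM formulation even though the narration above only mentions the sigmoid gates.

```python
# Minimal sketch of one LSTM time step (assumed toy sizes and random weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """W, U, b hold weights/biases for the input 'i', forget 'f',
    output 'o' gates and the candidate update 'g'."""
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])  # input gate: what to store
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])  # forget gate: what to discard
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])  # output gate: what to emit
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # candidate cell update
    c = f * c_prev + i * g                              # pointwise multiplications update the cell state
    h = o * np.tanh(c)                                  # new hidden state
    return h, c

# Tiny demo: 4-dimensional input, 3-dimensional hidden/cell state.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 4)) for k in "ifog"}
U = {k: rng.standard_normal((3, 3)) for k in "ifog"}
b = {k: np.zeros(3) for k in "ifog"}
h, c = lstm_step(rng.standard_normal(4), np.zeros(3), np.zeros(3), W, U, b)
print(h.shape, c.shape)   # (3,) (3,)
```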
So these gates, implemented using sigmoid functions, are trained using backpropagation; they open and close based on the input and the previous hidden state, allowing the LSTM to selectively retain or discard information, effectively capturing long-term dependencies. Now let's discuss applications of LSTMs. LSTM models are highly effective and used in various applications, including: video analysis, analyzing video frames to identify actions, objects and scenes; second, natural language tasks like language modeling, machine translation and text summarization; third, time series prediction, predicting future values in a time series; fourth, voice recognition, tasks such as speech-to-text transcription and command recognition; and last, sentiment analysis, classifying text sentiment as positive, negative or neutral. There are many more examples of LSTM applications. So now let's move forward and understand the LSTM model and how it works with an example. Let's consider the task of predicting the next word in a sentence, a common application of LSTM networks in natural language processing; I will break it down step by step using the analogy of remembering a story and deciding what comes next based on the context.
Imagine you are reading a story: as you read, you need to remember what has happened so far to predict what might happen next. Let's illustrate with the simple example sentence 'the cat sat on the ___', where you want to predict the next word, which could be 'mat' or 'roof' or something else. An LSTM network helps make this prediction by remembering important parts of the story and forgetting irrelevant details, so let's dive into the step-by-step process. Step-by-step explanation using an LSTM: the first step is reading the story (the input). As you read each word in the sentence you process it and store relevant information; for example, for 'the' you understand it's a determiner, for 'cat' you know it's a noun and the subject of the sentence, 'sat' indicates the action performed by the subject, and 'on' is a preposition indicating the relationship between the cat and the next noun; the sequence diagram shows the words being read and processed. Second comes the forget gate: as you move through the sentence you might decide that some details are no longer important; for instance, you might decide that knowing 'the' is less important now that you have 'cat' and 'sat', so the forget gate helps discard this less important information, and the sequence diagram on the screen shows how that information is discarded. The third is the input gate: as you read on, you need to decide how relevant the new information is, and the sequence diagram on the screen shows how new information is integrated with what came before. The fourth is the cell state, the memory part: this is like your memory of the story so far, carrying the information about the subject 'cat' and the action 'sat on', and it is updated with new information as you read each word, 'the cat sat on the...', retaining the important context; the sequence diagram shows how the memory is updated. The last one is the output gate: when you need to predict the next word, the output gate helps you decide based on the current memory (the cell state), so it uses the context 'the cat sat on the' to predict that the next word might be 'mat'. It could predict anything, 'the cat sat on the table' or 'on the sofa', but I'm saying 'mat' because 'cat' and 'mat' are often associated in the same context; the diagram shows the prediction of the next word based on the current memory. There are many applications where you can use an LSTM, such as predicting a time series or the next word in a sentence. By using the LSTM gates, the input gate, forget gate and output gate, and updating the cell state, the network can predict the next word in a sequence by maintaining relevant context and discarding unnecessary information; this step-by-step process allows the LSTM network to effectively handle sequences and make accurate predictions based on the context. A tiny sketch of such a next-word model is shown below.
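For a concrete picture of that next-word setup, here is a tiny Keras sketch; the vocabulary size, layer widths and the (commented-out) training call are all assumed for illustration and are not the video's code.

```python
# Sketch of a next-word prediction model built around an LSTM layer.
# Assumes TensorFlow/Keras is installed; sizes are made up.
import tensorflow as tf

vocab_size = 5000   # assumed vocabulary size
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=32),  # word id -> vector
    tf.keras.layers.LSTM(64),                                        # gated memory over the sequence
    tf.keras.layers.Dense(vocab_size, activation="softmax"),         # probability of each next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(padded_word_id_sequences, next_word_ids, epochs=10)  # training data not shown here
model.summary()
```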
LLMs: if you ever wondered how machine learning can now understand and generate human-like text, you are in the right place. From chatbots like ChatGPT to AI assistants that power search engines, LLMs are transforming how we interact with technology, and one of the most exciting advancements in this space is Google Gemini (alongside OpenAI's ChatGPT), a cutting-edge large language model designed to push the boundaries of what AI can achieve. In this video we will explore what LLMs are, how they work, and why models like Gemini are critical for the future of AI. Google Gemini is part of a new wave of AI models that are smarter, faster and more efficient; it is designed to understand context better, offer more accurate responses, and integrate deeply into services like Google Search and Google Assistant, providing more human-like interactions. We will break down the science behind LLMs, including their massive training data sets, Transformer architecture, and how models like Gemini use deep learning innovations to change industries. Plus, we will compare Google Gemini to other popular LLMs, such as OpenAI's GPT models, showing how each of these technologies is used to power chatbots, virtual assistants and other AI-driven applications. By the end of this video you will have a clear understanding of how large language models like Gemini work, their key features, and what they mean for the future of AI. Don't forget to like, subscribe and hit the bell icon to never miss any update from Simplilearn. So what are large language models? Large language models like ChatGPT-4 (Generative Pre-trained Transformer 4) and Google Gemini are sophisticated AI systems designed to comprehend and generate human-like text.
These models are built using deep learning techniques and are trained on vast datasets collected from the internet. They leverage self-attention mechanisms to analyze relationships between words or tokens, allowing them to capture context and produce coherent, relevant responses. LLMs have significant applications, including powering virtual assistants, chatbots, content creation, language translation, and supporting research and decision making; their ability to generate fluent and contextually appropriate text has advanced natural language processing and improved human-computer interaction. So now let's see what large language models are used for. Large language models are utilized in scenarios with limited or no domain-specific data available for training; these scenarios include both few-shot and zero-shot approaches, which rely on the model's strong inductive bias and its capability to derive meaningful representations from a small amount of data, or even no data at all. So now let's see how large language models are trained. Large language models typically undergo pre-training on a broad, all-encompassing dataset that shares statistical similarities with the dataset specific to the target task; the objective of pre-training is to enable the model to acquire high-level features that can later be applied during the fine-tuning phase for a specific task. The training process of an LLM involves several steps. The first is text pre-processing: the textual data is transformed into a numerical representation that the model can effectively process, and this conversion may involve techniques like tokenization, encoding and creating input sequences. The second is random parameter initialization: the model's parameters are initialized randomly before the training process begins. The third is inputting the numerical data: the numerical representation of the text is fed into the model for processing, and the model's architecture, typically based on Transformers, allows it to capture the contextual relationships between the words or tokens in the text. The fourth is loss function calculation: a loss function measures the discrepancy between the model's predictions and the actual next word or token in a sequence, and the model aims to minimize this loss during training. The fifth is parameter optimization: the model's parameters are adjusted through optimization techniques, which involve calculating gradients and updating the parameters accordingly, gradually improving the model's performance. The last is iterative training: the training process is repeated over multiple iterations, or epochs, until the model's output achieves a satisfactory level of accuracy on the given task or dataset. By following this training process, large language models learn to capture linguistic patterns, understand context and generate coherent responses, enabling them to excel at various language-related tasks. The next topic is how large language models work. Large language models leverage deep neural networks to generate output based on patterns learned from the training data. Typically a large language model adopts a Transformer architecture, which enables the model to identify relationships between words in a sentence irrespective of their position in the sequence; in contrast to RNNs, which rely on recurrence to capture token relationships, Transformer networks employ self-attention as their primary mechanism. Self-attention calculates attention scores that determine the importance of each token with respect to the other tokens in the text sequence, facilitating the modeling of intricate relationships within the data; a minimal sketch of these attention scores is shown below.
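Here is a minimal NumPy sketch of those scaled dot-product attention scores; real LLMs use learned query, key and value projections and many attention heads, which this toy version skips.

```python
# Minimal sketch of scaled dot-product self-attention over token embeddings.
import numpy as np

def self_attention(X):
    """X: (tokens, d) matrix of token embeddings; returns attended values."""
    d = X.shape[-1]
    Q, K, V = X, X, X                                   # identity projections, for the sketch only
    scores = Q @ K.T / np.sqrt(d)                       # importance of each token w.r.t. the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return weights @ V

X = np.random.rand(4, 8)                                # 4 tokens, 8-dimensional embeddings
print(self_attention(X).shape)                          # (4, 8)
```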
Next, let's see the applications of large language models; they have a wide range of applications across various domains, and here are some notable ones. The first is natural language processing (NLP): large language models are used to improve natural language understanding tasks such as sentiment analysis, named entity recognition, text classification and language modeling. The second is chatbots and virtual assistants: large language models power conversational agents, chatbots and virtual assistants, providing more interactive and human-like user interactions. The third is machine translation: large language models have been used for automatic language translation, enabling text translation between different languages with improved accuracy. The fourth is sentiment analysis: LLMs can analyze and classify the sentiment or emotion expressed in a piece of text, which is valuable for market research, brand monitoring and social media analysis. The fifth
one is content recommendation these models can be employed to provide personalized content recommendations enhancing user experience and engagement on platforms such as News website or streaming services so these application highlight the potential impact of large language models in various domains for improving language understanding Automation and interaction between humans and computers we've looked at a lot of examples of machine learning so let's see if we can give a little bit more of a concrete definition what is machine learning machine learning is a science of making computers learn and act like humans by feeding data and information
without being explicitly programmed we see here we have a nice little diagram where we have our ordinary system uh your computer nowadays you can even run a lot of this stuff on a cell phone because cell phones advance so much and then with artificial intelligence and machine learning it now takes the data and it learns from what happened before and then it predicts what's going to come next and then really the biggest part right now in machine learning is going on is it improves on that how do we find a new solution so we go
from descriptive, where it's learning about the data and understanding how it fits together, to predictive, forecasting what's going to happen, to prescriptive, coming up with a new solution. When we're working on machine learning there are a number of different diagrams people have posted for what steps to go through; a lot of it might be very domain specific, so if you're working on photo identification versus language versus medical or physics, some of these steps are switched around a little or new things are put in that are very specific to the domain. This is a very general diagram: first you want to define your objective, and it's very important to know what it is you're trying to predict; then you're going to be collecting the data, so once you've defined an objective you need to collect the data that matches (you spend a lot of time in data science collecting data); and the next step is preparing the data, you've got to make sure your data is clean going in, as the old saying goes, bad data in, bad answers out; and then once you've gone through and cleaned all this stuff coming in then
you're going to select the algorithm which algorithm are you going to use you're going to train that algorithm in this case I think we're going to be working with svm the support Vector machine then you have to test the model does this model work is this a valid model for what we're doing and then once you've tested it you want to run your prediction you want to run your prediction or your choice or whatever output it's going to come up with and then once everything is set and you've done lots of testing then you want
to go ahead and deploy the model and remember I said domain specific this is very general as far as the scope of doing something a lot of models you get halfway through and you realize that your data is missing something and you have to go collect new data because you've run a test in here someplace along the line you're saying hey I'm not really getting the answers I need so there's a lot of things that are domain specific that become part of this model this is a very general model but it's a very good model
to start with and we do have some basic divisions of what machine learning does that's important to know for instance do you want to predict a category well if you're categorizing thing that's classification for instance whether the stock price will increase or decrease so in other words I'm looking for a yes no answer is it going up or is it going down and in that case we'd actually say is it going up true if it's not going up it's false meaning it's going down this way it's a yes no 01 do you want to predict
a quantity that's regression so remember we just did classification now we're looking at regression these are the two major divisions in what data is doing for instance predicting the age of a person based on the height weight health and other factors So based on these different factors you might guess how old a person is and then there are a lot of domain specific things like do you want to detect an anomaly that's anomaly detection this is actually very popular right now for instance you want to detect money withdrawal anomalies you want to know when someone's
making a withdrawal that might not be their own account we've actually brought this up because this is really big right now if you're predicting the stock whether to buy stock or not you want to be able to know if what's going on in the stock market is an anomaly use a different prediction model because something else is going on you got to pull out new information in there or is this just the norm I'm going to get my normal return on my money invested so being able to detect anomalies is very big in data science
these days another question that comes up which is on what we call untrained data is do you want to discover structure in unexplored data and that's called clustering for instance finding groups of customers with similar Behavior given a large database of customer data containing their demographics and past buying records and in this case we might notice that anybody who's wearing certain set of shoes go shopping at certain stores or whatever it is they going to make certain purchases by having that information it helps us to Market or group people together so then we can now
explore that group and find out what it is we want to Market to them if you're in the marketing world and that might also work in just about any Arena you might want to group people together whether they're uh based on their different areas and Investments and financial background whether you're going give them a loan or not before you even start looking at whether they're valid customer for the bank you might want to look at all these different areas and group them together based on unknown data so you're not you don't know what the data
is going to tell you but you want to Cluster people together that come together let's take a quick DeTour for quiz time oh my favorite so we're going to have a couple questions here under quiz time and um we'll be posting the answers in the part two of this tutorial so let's go ahead and take a look at these quiz times questions and hopefully you'll get them all right and it'll get you thinking about how to process data and what's going on can you tell what's happening in the following cases of course you're sitting there
with your cup of coffee and your checklist and pen, trying to figure out what your next step in your data science analysis is. A: grouping documents into different categories based on the topic and content of each document; very big these days, you know, you have legal documents, maybe sports documents, maybe you're analyzing newspaper postings, but certainly having that automated is a huge thing in today's world. B: identifying handwritten digits in images correctly, so we want to know whether they're writing a lowercase a or a capital A, B or C, what they're writing out in their handwriting. C: behavior of a website indicating that the site is not working as designed. D: predicting the salary of an individual based on his or her years of experience, an HR hiring setup there. So stay tuned for part two, we'll go ahead and answer these questions when we get to part two of this tutorial, or you can simply write at the bottom and send a note to Simplilearn and they'll follow up with you on it. Back to our regular content: now these
last few bring us into the next topic which is another way of dividing our types of machine learning and that is with supervised unsupervised and reinforcement to learning supervised learning is a method used to enable machines to classify predict objects problems or situations based on labeled data fed to the machine and in here you see we have a jumble of data with circles triangles and squares and we label them we have what's a circle what's a triangle what's a square we have our model training and it trains it so we know the answer very important
when you're doing supervised learning you already know the answer to a lot of your information coming in so you have a huge group of data coming in and then you have a new data coming in so we've trained our model the model now knows the difference between a circle a square a triangle and now that we've trained it we can send in in this case a square and a circle goes in and it predicts that the top one's a square and the next one's a circle and you can see that this is uh being able
to predict whether someone's going to default on a loan because I was talking about Banks earlier supervised learning on stock market whether you're going to make money or not that's always important and if you are looking to make a fortune the stock Market keep in mind it is very difficult to get all the data correct on the stock market it is very U it fluctuates in ways you really hard to predict so it's quite a roller coaster ride if you're running machine learning on the stock market you start realizing you really have to dig for
new data so we have supervised learning and if you have supervised we should need unsupervised learning in unsupervised learning machine learning model finds the hidden pattern in an unlabeled data so in this case instead of telling it what the circle is and what a triangle is and what a square is it goes in there looks at them and says for whatever reason it groups them together maybe it'll group it by the number of corners and it notices that a number of them all have three corners a number of them all have four corners and a
number of them all have no corners and it's able to filter those through and group them together we talked about that earlier with looking at a group of people who are out shopping we want to group them together to find out what they have in common and of course once you understand what people have in common Maybe you have one of them who's a customer at your store or you have five of them are customer at your store and they have a lot in common with five others who are not customers at your store how
do you market to those five who aren't customers at your store yet? They fit the demographic of who's going to shop there, and you'd like them to shop at your store, not the one next door. Of course this is a simplified version; you can see very easily the difference between a triangle and a circle, which might not be so easy in marketing. Reinforcement learning: reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results, and we have here where
the in this case a baby it's actually great that they used an infant for this slide because the reinforcement learning is very much in its infant stages but it's also probably the biggest machine learning demand out there right now or in the future it's going to be coming up over the next few years is reinforcement learning and how to make that work for us and you can see here where we have our action in the action in this one it goes into the fire hopefully the baby didn't was just a little candle not a giant
fire pit like it looks like here when the baby comes out and the new state is the baby is sad and crying CU they got burned on the fire and then maybe they take another action the baby's called the agent because it's the one taking the actions and in this case they didn't go into the fire they went a different direction and now the baby's happy and laughing and playing reinforcement learning is very easy to understand because that's how as humans that's one of the ways we learn we learn whether it is you know you
burn yourself on the stove don't do that anymore don't touch the stove in the big picture being able to have a machine learning program or an AI be able to do this is huge because now we're starting to learn how to learn that's a big jump in the world of computer and machine learning and we're going to go back and just kind of go back over supervised versus unsupervised learning understanding this is huge because this is going to come up in any project you're working on we have in supervised learning we have labeled data we
have direct feedback so someone's already gone in there said yes that's a triangle no that's not a triangle and then you predict an outcome so you have a nice prediction this is this this new set of data is coming in and we know what it's going to be and then with unsupervised training it's not labeled so we really don't know what it is there's no feedback so we're not telling it whether it's right or wrong we're not telling it whether it's a triangle or a square we're not telling it to go left or right all
we do is find hidden structure in the data, grouping the data together to find out what connects to what, and then you can use these together: imagine you have an image and you're not sure what you're looking for, so you go in with the unstructured data, find all the things that are connected together, and then somebody looks at those groups and labels them; now you can take that labeled data and program something to predict what's in the picture. So you can see how they go back and forth, and you can start connecting all these different tools together to make a bigger picture; a tiny clustering sketch of that unlabeled-grouping idea is shown below.
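As a quick illustration of that "group unlabeled data" idea (the customer numbers are made up, not the tutorial's data), k-means in scikit-learn finds clusters without ever seeing a label:

```python
# Sketch: cluster customers (age, yearly spend) with no labels at all.
from sklearn.cluster import KMeans

customers = [[25, 300], [27, 350], [45, 1200], [48, 1100], [33, 600], [31, 550]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)   # a cluster id per customer, discovered from the data alone
```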
There are many interesting machine learning algorithms; let's have a look at a few of them. Hopefully this gives you a little flavor of what's out there, and these are some of the most important ones currently being used: we'll take a look at linear regression, decision trees and the support vector machine. Let's start with a closer look at linear regression. Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. Linear regression is a linear model,
for example, a model that assumes a linear relationship between the input variable x and the single output variable y; you'll remember this from your algebra classes as y = mx + c. Imagine we are predicting distance traveled (y) from speed (x); our linear regression model for this problem would be y = m*x + c, or distance = m*speed + c, where m is the coefficient and c is the y-intercept. We're going to look at two variations of this. First, time is constant: we have a bicyclist (he's got his safety gear on, thank goodness) whose speed is 10 m/s, and over a certain amount of time his distance is 36 km. We have a second bicyclist going twice the speed, 20 m/s, and you can guess that if he's going twice the speed and time is constant, he's going to go twice the distance; that's easy to compute, 36 * 2 gives 72 km. So if you asked how far somebody would go at three times that speed, 30 m/s, you could easily compute the distance in your head; we can do that without needing a computer, but we want to do this for more complicated data, so it's nice to compare the two. Let's look at what that looks like in a graph. In the linear regression model we plot distance against speed, and m is the +ve (positive) slope of the line; as speed increases, distance also increases, hence the variables have a positive relationship. So y = mx + c gives the distance traveled in a fixed interval of time, and we could very easily compute, either by following the line or just by knowing it's three times 10 m/s, that this third bicyclist has traveled about 108 km. One of the key definitions here is positive relationship: the slope of the line is positive, and as speed increases so does distance. Now let's take our second example, where distance is the constant. We have a rider at 10 m/s with a certain distance to go, and it takes them 100 seconds to travel that distance; our second bicyclist is still doing 20 m/s, and since he's going twice the speed we can guess he'll cover the distance in about half the time, 50 seconds; and you could probably guess the third one, 100 divided by 3 since he's going three times the speed, which gives about 33.33 seconds. If we put that into a linear regression model, or a graph, with the distance assumed to be constant, we see the relationship between speed and time: as speed goes up, the time needed to cover that same distance goes down, so now m is a -ve (negative) slope of the line; as speed increases, time decreases, hence the variables have a negative relationship. Again, there are our definitions, positive relationship and negative relationship, depending on the slope of the line. With a simple formula like this, and even a significant amount of data, let's now look at the mathematical implementation of linear regression.
Suppose we have this data set, where x = 1, 2, 3, 4, 5 and the corresponding y values are 3, 2, 2, 4, 3. When we plot these points on a graph you see a nice scattering, and you could probably eyeball a line through the middle of it, but we're going to calculate that exact line with linear regression. The first thing we do is compute the mean of x (remember, the mean is basically the average): we add 1 + 2 + 3 + 4 + 5 and divide by five, which gives 3. Then we do the same for y: add up all those numbers and divide by five, and we end up with a mean value of y of 2.8; so x-bar is the mean of x and y-bar is the mean of y. When we plot that, we can put the point x = 3, y = 2.8 on our graph, shown with dashed lines so you can pick it out, and it's important to note that the linear regression line should go through that point. Now let's find our regression equation, the best-fit line. Remember, we take y = mx + c, so we're looking for m and c. To find this equation for our data we need the slope m and the intercept c, where m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2); that's how we get the slope of the line, and we can easily compute it by creating some columns (computers are really good at iterating through data, so we can easily fill in a table). In our table, if we have an x value of 1 and the mean of x is 3, then 1 - 3 = -2, 2 - 3 = -1, and so on; we fill in the columns for x - x_mean and y - y_mean, and from those we compute (x - x_mean)^2 and (x - x_mean) * (y - y_mean). You can guess that the next step is to sum those columns: we get a total of 10 for (x - x_mean)^2 and a total of 2 for (x - x_mean) * (y - y_mean), and plugging those in we get 2/10, which equals 0.2, so now we know the slope of our line is 0.2. Next we calculate the value of c, where the line crosses the y-axis. If you remember, I mentioned earlier that the regression line has to pass through the means point we showed on the graph, x = 3 and y = 2.8, and since we know that point we can simply plug it into our formula y = 0.2x + c: 2.8 = 0.2 * 3 + c, and solving for c gives c = 2.2. Once we have all that, we can plot our regression line, y = 0.2x + 2.2, and from this equation we can compute new values. So let's predict the values of y using x = 1, 2, 3, 4, 5 (the original x values) and plot the points; now we're going to see what the model thinks y is, not what it actually is. Plugging those in, we get the predicted values y_p: for x = 1, y_p = 2.4; for x = 2, y_p = 2.6; and so on. So we have our predicted y values of what the model thinks y will be when we plug those numbers in; the whole calculation is sketched in code below.
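Here is the same hand calculation as a short Python sketch, using the example's x values and the y values as reconstructed above:

```python
# Least-squares fit of y = m*x + c, matching the hand calculation step by step.
x = [1, 2, 3, 4, 5]
y = [3, 2, 2, 4, 3]

x_mean = sum(x) / len(x)        # 3.0
y_mean = sum(y) / len(y)        # 2.8

# m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))   # 2.0
denominator = sum((xi - x_mean) ** 2 for xi in x)                        # 10.0
m = numerator / denominator     # 0.2
c = y_mean - m * x_mean         # 2.8 - 0.2 * 3 = 2.2

predictions = [m * xi + c for xi in x]   # [2.4, 2.6, 2.8, 3.0, 3.2]
print(f"y = {m:.1f}x + {c:.1f}", predictions)
```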
When we plot the predicted values along with the actual values we can see the difference, and this is one of the most important things with linear regression and any of these models: understanding the error. We can calculate the error on all of our different values; you can see we've plotted x, y and the predicted y, and drawn a little line so you can see what the error looks like between the different points. So our goal is to
reduce this error we want to minimize that error value on our linear regression model minimizing the distance there are lots of ways to minimize the distance between the line and the data points like sum of squared errors sum of absolute errors root mean square error Etc we keep moving this line through the data points to make sure the best fit line has the least Square distance between the data points and the regression line so to recap with a very simple linear regression model we first figure out the formula of our line through the middle and
then we slowly adjust the line to minimize the error. Keep in mind this is a very simple formula; even though the math is very much the same, it gets much more complex as we add in different dimensions. This is only two dimensions, y = mx + c, but you can take that out to x, y, z and all the other features in there, and plot a linear regression model on all of those, using the different formulas to minimize the error. Let's go ahead and take a look at decision trees, a
very different way to solve problems in the linear regression model decision tree is a tree-shaped algorithm used to determine a course of action each branch of a tree represents a possible decision occurrence or reaction we have data which tells us if it is a good day to play golf and if we were to open this data up in a general spreadsheet you can see we have the Outlook whether it's a rainy overcast Sunny temperature hot mild cool humidity windy and did I like to play golf that day yes or no so we're taking a census
and certainly I wouldn't want a computer telling me when I should go play golf or not but you can imagine if you got up in the night before you're trying to plan your day and it comes up and says tomorrow would be a good day for golf for you in the morning and not a good day in the afternoon or something like that this becomes very beneficial and we see this in a lot of applications coming out now where it gives you suggestions and lets you know what what would uh fit the match for you
for the next day or the next purchase or the next uh whatever you know next mail out in this case is tomorrow a good day for playing golf based on the weather coming in and so we come up and let's uh determine if you should play golf when the day is sunny and windy so we found out the forecast tomorrow is going to be sunny and windy and suppose we draw our tree like this we're going to have our humidity and then we have our normal which is if it's if you have a normal humidity
you're going to go play golf and if the humidity is really high then we look at the Outlook and if the Outlook is sunny overcast or rainy it's going to change what you choose to do so if you know that it's a very high humidity and it's sunny you're probably not going to play golf cuz you're going to be out there miserable fighting off the mosquitoes that are out joining you to play golf with you maybe if it's rainy you probably don't want to play in the rain but if it's slightly overcast and you get
just the right Shadow that's a good day to play golf and be outside out on the green now in this example you can probably make your own tree pretty easily because it's a very simple set of data going in but the question is how do you know what to split where do you split your data what if this is much more complicated data where it's not something that you would particularly understand like studying cancer they take about 36 measurements of the cancerous cells and then each one of those measurements represents how bulbous it is how
extended it is, how sharp the edges are, something that as a human we would have no real understanding of. So how do we decide how to split that data up, and is that the right decision tree? That's the question that comes up: is this the right decision tree? For that we should calculate entropy and information gain, two important vocabulary words. Entropy is a measure of randomness or impurity in the data set, and entropy should be low; we want the chaos to be as low as possible, we don't want to look at the data and be confused by mixed-up groups. Information gain is a measure of the decrease in entropy after the data set is split, also known as entropy reduction, and information gain should be high; we want the information we get out of the split to be as high as possible. Let's look at entropy from the mathematical side. In this case we denote entropy as I(p, n), where p is the count of days you play golf and n the count of days you don't. You don't really have to memorize these formulas, there are a few of them out there depending on what you're working with, but it's important to see where the formula comes from so you're not lost, unless you're building your own decision tree code in the back end. The formula is I(p, n) = -(p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n)). Let's break that down and see what it looks like when we compute it. The entropy of the target class over the whole data set is the whole entropy, so we have E(PlayGolf): if we go back to the data you can simply count how many yeses and nos there are in our complete set of golfing days, and we find five days we did play golf and nine days we did not. Adding those together, 9 + 5 is 14, so the values we plug into the formula are 5/14 and 9/14: 5/14 = 0.36 and 9/14 = 0.64, and when you do the whole equation, -0.36 * log2(0.36) - 0.64 * log2(0.64), we get 0.94. So we now have the full entropy value, 0.94, for the whole set of data we're
working with, and we want to make that entropy go down. Just like we calculated the entropy for the whole set, we can also calculate the entropy of playing golf given the Outlook, whether it's sunny, overcast or rainy. We look at E(PlayGolf, Outlook) = P(sunny) * I(3, 2) + P(overcast) * I(4, 0) + P(rainy) * I(2, 3), where I(3, 2) just comes from how many sunny days were yes and how many were no out of the five sunny days (don't forget we divide that five by the total of 14 later on). When you do the whole setup you get 5/14 * I(3, 2) + 4/14 * I(4, 0) + 5/14 * I(2, 3), so we can now compute the entropy of just the part that has to do with the forecast, and we get 0.693. Similarly we can calculate the entropy of the other predictors like temperature, humidity and wind. Then we look at Gain(Outlook), how much we gain from this split: Gain(Outlook) = E(PlayGolf) - E(PlayGolf, Outlook), so we take the original 0.94 for the whole set minus the 0.693 for the Outlook split and we end up with a gain of 0.247. This is our information gain; remember we defined entropy and information gain, and the higher the information gain, the lower the entropy, the better. The information gain of the other three attributes can be calculated in the same way: our gain for temperature equals 0.029, our gain for humidity equals 0.152, and our gain for a windy day equals 0.048. If you do a quick comparison you'll see that 0.247 is the greatest information gain, so that's the split we want; the arithmetic is sketched in code below.
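Here is that entropy and information-gain arithmetic as a short Python sketch, using the play/don't-play counts quoted in the walkthrough above:

```python
# Entropy I(p, n) and the information gain for the Outlook split.
import math

def entropy(p: int, n: int) -> float:
    """I(p, n) = -p/(p+n)*log2(p/(p+n)) - n/(p+n)*log2(n/(p+n))."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                        # treat 0 * log2(0) as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

whole = entropy(5, 9)                                   # ~0.94 for the full 14-day set
outlook = (5/14) * entropy(3, 2) + (4/14) * entropy(4, 0) + (5/14) * entropy(2, 3)
gain_outlook = whole - outlook                          # ~0.247
print(round(whole, 3), round(outlook, 3), round(gain_outlook, 3))
```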
Now let's build the decision tree. The Outlook, whether it's going to be sunny, overcast or rainy, is our first split because it gives us the most information gain, and we can continue down the nodes of the tree, choosing the attribute with the largest information gain as the root node and then continuing to split each sub-node on the attribute with the largest information gain we can compute. Although it's a bit of a tongue twister to say all that, you can see it's a very easy-to-view visual model: we have our Outlook and we split it in three directions; if the Outlook is overcast we're going to play, and then we can split the other branches further down if we want, so if the Outlook is sunny but it's also windy we're not going to play, and if it's not windy we'll play. So we can easily build a nice decision tree to guess what we would like to do tomorrow and give us a nice recommendation for the day. We wanted to know if it's a good day to play golf when it's sunny and windy; remember the original question, tomorrow's weather report is sunny and windy, and you can see by going down the tree, Outlook sunny, then windy, that we're not going to play golf tomorrow. So our little smartwatch pops up and says
I'm sorry tomorrow is not a good day for golf it's going to be sunny and windy and if you're a huge golf fan you might go uhoh it's not a good day to play golf we can go in and watch a golf game at home so we'll sit in front of the TV instead of being out playing golf in the wind now that we looked at our decision tree let's look at the third one of our algorithms we're investigating support Vector machine support Vector machine is a widely used classification algorithm the idea of support Vector
machine is simple the algorithm creates a separation line which divides the classes in the best possible manner for example dog or cat disease or no disease suppose we have a labeled sample data which tells height and weight of males and females a new data point arrives and we want to know whether it's going to be a male or a female so we start by drawing a line we draw decision lines but if we consider decision line one then we will classify the individual as a male and if we consider decision line two then it will
be a female so you can see this person kind of lies in the middle of the two groups so it's a little confusing trying to figure out which line they should be under we need to know which line divides the classes correctly but how the goal is to choose a hyperplane and that is one of the key words they use when we talk about support Vector machines choose a hyper plane with the greatest possible margin between the decision line and the nearest Point within the training set so you can see here we have our support
vector; we have the two nearest points to it and we draw a line between those two points, and the distance margin is the distance between the hyperplane and the nearest data point from either set, so we actually have a value, and it should be equally distant between the two points we're comparing it to. When we draw the hyperplanes we observe that line one has the maximum distance margin, so it will classify the new data point correctly, and our result on this one is that the new data point is male. One of the reasons we call it a hyperplane rather than a line is that a lot of times we're not looking at just weight and height; we might be looking at 36 different features or dimensions, and so when we cut it with a hyperplane it's more of a three-dimensional or multi-dimensional cut in the data; it cuts the data a certain way and each plane continues to cut it down until we get the best fit or match. Let's understand this with the help of an example problem statement.
I always start with a problem statement when you're going to put some code together we're going to do some coding now classifying muffin and cupcake recipes using support Vector machines so the cupcake versus the muffin let's have a look at our data set and we have the different recipes here we have a muffin recipe that has so much flour I'm not sure what measurement 55 is in but it has 55 maybe it's ounces but uh it has certain amount of flour certain amount of milk sugar butter egg baking powder vanilla and salt and So based
on these measurements we want to guess whether we're making a muffin or a cupcake and you can see in this one we don't have just two features we don't just have height and weight as we did before between the male and female in here we have a number of features in fact in this we're looking at eight different features to guess whether it's a muffin or a cupcake what's the difference between a muffin and a cupcake turns out muffins have more flour while cupcakes have more butter and sugar so basically the cupcakes a little bit
more of a dessert where the muffins a little bit more of a fancy bread but how do we do that in Python how do we code that to go through recipes and figure figure out what the recipe is and I really just want to say cupcakes versus muffins like some big professional wrestling thing before we start in our cupcakes versus muffins we are going to be working in Python there's many versions of python many different editors that is one of the strengths and weaknesses of python is it just has so much stuff attached to it
and it's one of the more popular data science programming packages you can use. In this case we're going to use Anaconda and Jupyter Notebook; the Anaconda Navigator has all kinds of fun tools, and once you're in it you can change environments (I have a number of environments on here; we'll be using a Python 3.6 environment, although it doesn't matter too much which version you use, I usually try to stay with 3.x because it's current, unless you have a project that's very specifically in 2.x, with 2.7 being what most people use on version two). Then, once we're in our Jupyter notebook editor, I can create a new file and jump in; in this case we're doing SVM, muffin versus cupcake. Let's start with our packages for data analysis; there are a few very standard packages we almost always use. We import numpy, that's numerical Python, usually denoted as np, which is very common, and then we import pandas as pd. NumPy deals with number arrays, and there are a lot of cool things you can do with a NumPy array, like multiplying all the values in the array at once; pandas makes a nice DataFrame, and the difference between a DataFrame and a NumPy array is that a DataFrame is more like your Excel spreadsheet, you have columns and indexes, so you have different ways of referencing it and easily viewing it, and there are additional features you can run on a DataFrame; pandas sits on top of NumPy, so you need them both. Finally, because we're working with the support vector machine, from sklearn we import svm. And as a data scientist you should always try to visualize your data; some data is obviously too complicated or doesn't make sense to the human eye, but when possible it's good to take a second look so you can actually see what you're doing. For that we use two packages: we import matplotlib.pyplot as plt, again very common, and we import seaborn as sns, setting the font scale on sns right after the import, separated by a semicolon on the same line. These are great because seaborn sits on top of matplotlib just like pandas sits on NumPy, so it adds a lot more features and control. We're obviously not going to get deep into matplotlib and seaborn, that would be its own tutorial; we're really just focusing on the SVM, the support vector machine from sklearn. And since we're in a Jupyter notebook, we have to add a special line for matplotlib, and that's the percent sign: %matplotlib inline. If you're doing this in a straight code project (a lot of times I use something like Notepad++ and run it from there) you don't need that line, because the plot will just
pop up as its own window on your computer depending on how your computer set up because we're running this in the Jupiter notebook as a browser setup this tells it to display all of our Graphics right below on the page so that's what that line is for remember the first time I ran this I didn't know that and I had to go look that up years ago it's quite a headache so M plot Library inline is just because we're running this on the web setup and we can go ahead and run this make sure all
our modules are in they're all imported which is great if you don't have them import you'll need to go ahead and pip use the PIP or however you do it there's a lot of other install packages out there although pip is the most common and you have to make sure these are all installed on your python setup the next step of course is we got to look at the data can't run a model for predicting data if you don't have actual data so to do that let me go ahe and open this up and take
a look. We have our cupcakes-versus-muffins data and it's a CSV file (CSV meaning comma-separated values), and it opens up in a nice spreadsheet; you can see up here we have the type, muffin, muffin, muffin, cupcake, cupcake, cupcake, and then it's broken up into flour, milk, sugar, butter, egg, baking powder, vanilla and salt. So we can go ahead and look at this data in our Python as well: let's create a variable, recipes, equal to our pandas read_csv call (remember, it's a comma-separated file), and the file name happened to be the cupcakes-versus-muffins CSV. Because this particular Python program is saved in the same folder, we can get by with just the file name, but remember, if you're storing it in a different location you also have to put down the full path. Then, because we're in pandas, you can just type recipes.head() in the Jupyter notebook, but if you're running the code in a different script you need to type out the whole print(recipes.head()), and pandas knows that's going to show the first five rows of data. If we flip back over to the spreadsheet where we opened up our CSV file, you can see it starts on line two; this one calls it zero and then 2, 3,
4 5 6 is going to match go and close that out because we don't need that anymore and it always starts at zero and these are it automatically indexes it since we didn't tell it to use an index in here so that's the index number for the leftand side and it automatically took the top row as uh labels so Panda's using it to read a CSV is just really slick and fast one of the reasons love our pandas not just because they're cute and cuddly teddy bears and let's go ahead and plot our data and
I'm not going to plot all of it, just sugar and flour; obviously it can get really complicated if we have tons of different features, so you break them up and maybe look at just two of them at a time to see how they connect. To plot them we're going to use Seaborn, so that's our sns, and the command for that is sns.lmplot; the two variables I'm going to plot are flour and sugar, with data=recipes and hue set to the type column (this is a lot of fun because it knows that pandas data is coming in, which is one of the powerful things about pandas mixed with Seaborn for graphing), then a palette, 'Set1' (there are a lot of different palettes you can look up for Seaborn), fit_reg=False so we're not actually trying to fit a regression line, and a scatter_kws setting; a lot of these settings you can look up in Seaborn and half of them you could probably leave off, somebody just played with this and found these were good settings for the plot. Let's go ahead and run that, and because it plots inline it puts the figure right on the page, and you can see that just based on sugar and flour alone there's a definite split; you can look at it and say, hey, if I drew a line right between the middle of the blue dots and the red dots we'd be able to do an SVM with a hyperplane right there in the middle. The plotting call looks roughly like the sketch below.
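Roughly the Seaborn call just described; the exact column capitalization, file name and scatter_kws values are assumed here, since they aren't fully legible in the transcript.

```python
# Sketch of the flour-vs-sugar scatter plot colored by recipe type.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

recipes = pd.read_csv("cupcakes_vs_muffins.csv")   # assumed file name
sns.set(font_scale=1.2)
sns.lmplot(x="Flour", y="Sugar", data=recipes, hue="Type",
           palette="Set1", fit_reg=False, scatter_kws={"s": 70})
plt.show()   # needed outside a notebook; %matplotlib inline handles it in Jupyter
```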
Then the next step is to format, or pre-process, our data, and we're going to break that up into two parts. First we need the type label: remember, we're going to decide whether it's a muffin or a cupcake, but a computer doesn't know 'muffin' or 'cupcake', it knows zero and one. So we're going to create a type label, and for this we'll use a NumPy array and np.where, which is where we can do some logic: we take our recipes DataFrame, and wherever the type equals muffin it's going to be zero, and if it doesn't equal muffin, which means cupcake, it's going to be one. So we create our type label; this is the answer, because when we're doing our training, remember, we have to have training data, and this is what we're going to train it with, whether it's zero or one, a muffin or not. Then we're going to create our recipe features, and if you remember from right up here, the first column is the type, so we really don't need that column because that's our muffin-or-cupcake answer. In pandas we can easily sort that out: we take recipes.columns (built into pandas) and its values, so it's just the column titles going across the top, and since we don't want the first one and indexing always starts at zero, we slice from one to the end, and then we wrap it in list(), and this converts it to a list of strings
and then we can go ahead and just take a look and see what we're looking at for the features make sure it looks right let me go ahead and run that and I forgot the S on recipes so we'll go ahead and add the s in there and then run that and we can see we have flour milk sugar butter egg baking powder vanilla and salt and that matches what we have up here where we printed out everything but the type so we have our features and we have our label Now the recipe features is
just the titles of the columns and we actually need the ingredients and at this point we have a couple options one we could run it over all the ingredients and when you're dealing with real data usually you do but for our example we want to limit it so you can easily see what's going on because if we did all the ingredients that's seven or eight dimensions going into the hyperplane and we only want to look at two so you can see what the SVM is doing and so we'll take our recipes and we'll do just flour and sugar again you can replace that with your recipe features and do all of them but we're going to do just flour and sugar and we're going to convert that to values we don't need to make a list out of it because these aren't string values these are actual numbers in there and we can go ahead and just print ingredients so you can see what that looks like and so we have just the numbers for flour and sugar just the two sets of values and just for fun let's go ahead and take this over here and take our recipe features and so if we decided to use all the recipe features you'll see that it makes a nice column of different data so it just strips out all the labels and everything we just have the values but because we want to be able to view this easily in a plot later on we'll go ahead and take that and just do flour and sugar and we'll run that and you'll see it's just the two columns so the next step is to go ahead and fit our model
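Putting that preprocessing into code, here's a rough sketch; the column names and the Muffin spelling are assumptions about the CSV:

```python
import numpy as np

# 0 for muffin, 1 for cupcake -- these become the training labels
type_label = np.where(recipes["Type"] == "Muffin", 0, 1)

# every column name except the first one (Type) is a feature name
recipe_features = recipes.columns.values[1:].tolist()

# limit the demo to two features so the result stays easy to plot
ingredients = recipes[["Flour", "Sugar"]].values
```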
we'll go ahead and just call it model and it's an SVM we're using a class called SVC in this case we're going to go ahead and set the kernel equals linear so it's using a specific setup on there and if we go to the reference on the scikit-learn website for the svm module you'll see that there's about eight of them here three of them are for regression three are for classification the SVC support vector classification is probably one of the most commonly used and then there's also one for detecting outliers and another one that has to do with something a little bit more specific on the model but SVC and SVR are the two most commonly used standing for support vector classifier and support vector regressor remember regression gives an actual value a float value or whatever you're trying to work on and SVC is a classifier so it's a yes no true false but for this we want to know 0 1 muffin cupcake we go ahead and create our model and once we have our model created we're going to do model.fit and this is very common especially in sklearn all
their models are followed with the fit command and what we put into the fit what we're training with is the ingredients which in this case we limited to just flour and sugar and the type label is it a muffin or a cupcake now in more complicated data science setups you'd want to split this into training data and test data we won't get into that today and they even do something called cross-validation where they split it into thirds and switch which part is used for training and which for testing there's all kinds of things that go into that when you get to the higher end not overly complicated just an extra step which we're not going to do today because this is a very simple set of data and let's go ahead and run this and now we have our model fit and I got an error here so let me fix that real quick it's capital SVC it turns out I did it lowercase support vector classifier there we go let's go ahead and run that and you'll see it comes up with all this information that it prints out automatically these are the defaults of the model you notice that we changed the kernel to linear and there's our kernel linear on the printout and there's other different settings you can mess with we're going to just leave those alone for right now for this we don't really need to mess with any of those so next we're going to dig a little bit into our newly trained model and we're going to do this so we can show you on a graph and let's go ahead and get the separating hyperplane
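To recap the last few steps, a sketch of creating and fitting the classifier, assuming the ingredients and type_label arrays from above:

```python
from sklearn import svm

# linear kernel, so in two dimensions the decision boundary is a straight line
model = svm.SVC(kernel="linear")
model.fit(ingredients, type_label)
```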
we're going to use a w for our variable on here we're going to do model.coef_[0] so what the heck is that again we're digging into the model so we've already got a trained model that can predict and this is the math behind it that we're looking at right now and so the w is going to represent two different coefficients and if you remember we had y = mx + c so these coefficients are connected to that except that in higher dimensions the boundary is a hyperplane rather than a line we don't want to spend too much time on this because you can get lost in the confusion of the math so if you're a math wiz this is great you can go through here and you'll see that we have a = -w[0] / w[1] remember there's two different values there and that's basically the slope that we're generating and then we're going to build an xx what is xx we're going to set it up to a numpy array there's our np.linspace so we're creating a line of values between 30 and 60 so it just creates a set of numbers for x and then if you remember correctly we have our formula y = the slope * x plus the intercept well to make this work we can do this as yy equals the slope times each value in that array that's the neat thing about numpy so when I do a * xx which is a whole numpy array of values it multiplies a across all of them and then we subtract the model intercept term that's the c from the formula y = mx
plus c and that's where all these numbers come from a little bit confusing because it's digging out of these different arrays and then what we want to do is we're going to take this and we're going to go ahead and plot it so plot the parallels to the separating hyperplane that pass through the support vectors and so we're going to set b equals model.support_vectors_[0] to pull one of our support vectors out there's our yy which we now know is a set of data and we're going to create yy_down equals a * xx + b[1] - a * b[0] and then b gets set to a new value model.support_vectors_[-1] the last support vector and yy_up equals a * xx + b[1] - a * b[0] again and we can go ahead and just run this to load these variables up if you want to understand a little bit more of what's going on you can see if we print yy and just run that you can see it's an array this is a line it's going to have in
this case between 30 and 60 so it's going to be a set of values in here and the same thing with yy_up and yy_down and we'll plot those in just a minute on a graph so you can see what those look like just go ahead and delete that out of here and run that so it loads up the variables nice clean slate I'm just going to copy this from before remember this our Seaborn lmplot of flour and sugar and I'll just go and run that real quick so you can see again what that looks like it's just the scatter graph on there and then one of the new things is because Seaborn sits on top of pyplot we can use pyplot for the line going through and that is simply plt.plot and that's our xx and yy our two corresponding sets of x and y values and then somebody played with this to figure out that a line width of two and the color black would look nice so let's go ahead and run this whole thing with the pyplot line on there and you can see
when we do this it's just doing flour and sugar on here with the corresponding line between the sugar and the flour and the muffin versus cupcake and then we generated the support vector parallels the yy_down and yy_up so let's take a look and see what that looks like so we'll do our plt.plot and again this is all against xx our x values but this time we have yy_down and let's do something a little fun with this we can put in a 'k--' that just tells it to make it a black dashed line and if we're going to do the down one we also want to do the up one so here's our yy_up and when we run that it adds both dashed lines and so here are our support vector lines and this is what you expect you expect these two lines to go through the nearest data point so the dashed lines go through the nearest muffin and the nearest cupcake when it's plotting it and then your svm goes right down the middle so it gives a nice split in our data and you can see how easy it is to tell based just on sugar and flour which one's a muffin or a cupcake
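Here's a hedged sketch of the whole boundary-drawing step just walked through, recovering the line from the fitted linear SVC and drawing the dashed parallels through the support vectors; the 30 to 60 range is just a guess at the span of the flour values:

```python
w = model.coef_[0]                       # the two learned coefficients
a = -w[0] / w[1]                         # slope of the separating line
xx = np.linspace(30, 60)                 # x values to draw the line over
yy = a * xx - model.intercept_[0] / w[1]

b = model.support_vectors_[0]            # support vector on one side
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]           # support vector on the other side
yy_up = a * xx + (b[1] - a * b[0])

sns.lmplot(x="Flour", y="Sugar", data=recipes, hue="Type",
           palette="Set1", fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color="black")  # decision boundary
plt.plot(xx, yy_down, "k--")                  # dashed parallel through one class
plt.plot(xx, yy_up, "k--")                    # dashed parallel through the other
plt.show()
```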
let's go ahead and create a function to predict muffin or cupcake I've got my recipes I pulled off the internet and I wanted to see the difference between a muffin or a cupcake and so we need a function to push that through and I create a function with def and let's call it muffin or cupcake and remember we're just doing flour and sugar today not doing all the ingredients and that actually is a pretty good split you really don't need all the ingredients when flour and sugar separate it this well and let's go ahead and do an if else statement so if model.predict of flour and sugar equals zero so we take our model and we run a predict that's very common in sklearn where you have a .predict you put the data in and it's going to return a value in this case if it equals zero then print you're looking at a muffin recipe else if it's not zero that means it's one then you're looking at a cupcake recipe that's pretty straightforward def for definition is how you define a function in Python and of course once you create a function you should run something in it and so let's run one and we're going to send it the values 50 and 20 a muffin or a cupcake I don't know what it is and let's run this and just see what it gives us and it says oh it's a muffin you're looking at a muffin recipe so it very easily predicts whether we're looking at a muffin or a cupcake recipe
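A small sketch of that prediction function, using the model fitted above:

```python
def muffin_or_cupcake(flour, sugar):
    # predict expects a 2-D array: one row per sample, one column per feature
    if model.predict([[flour, sugar]])[0] == 0:
        print("You're looking at a muffin recipe!")
    else:
        print("You're looking at a cupcake recipe!")

muffin_or_cupcake(50, 20)   # prints the muffin message for 50 parts flour, 20 sugar
```

To mark that test point on the chart you can add something like plt.plot(50, 20, 'yo', markersize=9) right after the scatter plot.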
let's plot this on the graph so we can see what that actually looks like and I'm just going to copy and paste from below where we're plotting all the points so this is nothing different than what we did before if I run it you'll see it has all the points and the lines on there and what we want to do is we want to add another point and we'll do plt.plot and if you remember correctly for our test we did 50 and 20 and then somebody went in here and decided we'll do 'yo' for a yellow marker it's kind of an orange-yellow color that comes up and a marker size of nine those are settings you can play with somebody else played with them to come up with the right setup so it looks good and you can see there it is graphed clearly a muffin in this case in cupcakes versus muffins the muffin has won and if you'd like to do your own muffin cupcake contender series you certainly can send a note down below and the team at simply learn will send you over the data they use for the muffin and cupcake and that's true of any of the data we didn't actually run a plot on it earlier we had men versus women you can also request that information to run it on your data setup so you can test that out so to go back over our setup we went ahead for our support vector machine code we did a predict on 40 parts flour 20 parts sugar I think it was different than the one we did whether it's a muffin or a cupcake hence we have built a classifier using an SVM
which is able to classify if a recipe is of a cupcake or a muffin which wraps up our cupcake versus muffin so what's in it for you next we're going to cover clustering what is clustering K means clustering which is one of the most commonly used clustering tools out there including a flowchart to understand K means clustering and how it functions and then we'll do an actual python live demo on clustering of cars based on brands then we're going to cover logistic regression what is logistic regression the logistic regression curve and sigmoid function and then we'll do another
python code demo to classify a tumor as malignant or benign based on features and let's start with clustering suppose we have a pile of books of different genres now we divide them into different groups like fiction horror education and as we can see from this young lady she definitely is into heavy horror you can just tell by those eyes in the maple Canadian leaf on her shirt but we have fiction horror and education and we want to go ahead and divide our books up well organizing objects into groups based on similarity is clustering and in
this case as we're looking at the books we're talking about clustering things with known categories but you can also use it to explore data so you might not know the categories you just know that you need to divide it up in some way to conquer the data and to organize it better but in this case we're going to be looking at clustering in specific categories and let's just take a deeper look at that we're going to use K means clustering K means clustering is probably the most commonly used clustering tool in the machine
learning library K means clustering is an example of unsupervised learning if you remember from our previous thing it is used when you have unlabeled data so we don't know the answer yet we have a bunch of data that we want to Cluster to different groups Define clusters in the data based on feature similarity so we've introduced a couple terms here we've already talked about unsupervised learning and unlabeled data so we don't know the answer yet we're just going to group stuff together and see if we can find an answer of how things connect we've also
introduced feature similarity features being different features of the data now with books we can easily see fiction and horror and history books but a lot of times with data some of that information isn't so easy to see right when we first look at it and so K means is one of those tools where we can start finding things that connect that match with each other suppose we have these data points and want to assign them into a cluster now when I look at these data points I would probably group them into two clusters just by
looking at them I'd say two of these groups of data kind of come together but in K means we pick K clusters and assign random centroids to the clusters where K here represents two different clusters then we compute the distance from the objects to the centroids now we form new clusters based on minimum distances and calculate the centroids so we figure out what the best position is for each centroid then we move the centroid and recalculate those distances and we repeat those previous two steps iteratively till the cluster centroids stop changing their positions and become static once the clusters become static then the K means clustering algorithm is said to be converged and there's another term we see throughout machine learning converged means whatever math we're using to figure out the answer has come to a solution or it's converged on an answer shall we look at the flowchart to make this a little easier to understand by putting it into nice easy steps so we start we choose K we'll look at the elbow method in just a moment we assign random centroids to clusters and sometimes you pick the centroids because you might look at the data in a graph and say oh these are probably the central points then we compute the distance from the objects to the centroids we take that and we form new clusters based on minimum distance and calculate their centroids then we compute the distance from objects to the new centroids and then we go back and repeat those last two steps recalculating the distances as we go
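If it helps to see those steps as code, here's a bare-bones NumPy sketch of the loop (it assumes every cluster keeps at least one point and skips extras like multiple restarts):

```python
import numpy as np

def kmeans(points, k, iterations=100):
    # start with k randomly chosen points as the centroids
    rng = np.random.default_rng(0)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # distance from every point to every centroid, then nearest-centroid labels
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # move each centroid to the mean of the points assigned to it
        new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return labels, centroids
```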
as we do that the points get pulled toward the new centroids and then we move the centroids around and we figure out which objects are closest to each centroid so the objects can switch from one centroid to the other as the centroids are moved around and we continue that until it is converged let's see an example of this suppose we have this data set of seven individuals and their score on two topics A and B so here's our subject in this case referring to the person taking the test and then we have subject A where
we see what they've scored on their first subject and we have subject B and we can see what they score on the second subject now let's take two farthest apart points as initial cluster centroids now remember we talked about selecting them randomly or we can also just put them in different points and pick the furthest one apart so they move together either one works okay depending on what kind of data you're working on and what you know about it so we took the two furthest points one and one and five and seven and now let's
take the two farthest apart points as initial cluster centroids each point is then assigned to the closest cluster with respect to the distance from the centroids so we take each one of these points in there we measure that distance and you can see that if we measured each of those distances and you use the Pythagorean theorem for a triangle in this case because you know the X and the Y and you can figure out the diagonal line from that or you just take a ruler and put it on your monitor that'd be kind of silly
but it would work if you're just eyeballing it you can see how they naturally come together in certain areas now we again calculate the centroids of each cluster so cluster one and then cluster two and we look at each individual dot in cluster one there's one two three in that cluster and the centroid then moves over it becomes 1.8 comma 2.3 so remember it was at one and one well the very center of the data we're looking at would put it at roughly two and two but precisely 1.8 and 2.3 and for the second one if we take the overall mean vector the average of all the points assigned to that centroid we come up with 4.1 and 5.4 so we've now moved the centroids we compare each individual's distance to its own cluster mean and to that of the opposite cluster and we can build a nice chart on here showing that as we move the centroids around we now have a different clustering of the groups and using Euclidean distance between the points and the means we get the same kind of numbers you see new figures coming up so we have each individual dot's distance to the mean centroid of its own cluster and its distance to the mean of the other cluster only individual three is nearer to the mean of the opposite cluster cluster 2 than its own cluster one and you can see here in the diagram where we've kind of circled that one in the middle so when we've moved the centroids of the clusters over one of the points shifted to the other cluster because it's closer to that group of individuals thus individual 3 is relocated to cluster 2 resulting in a new partition and we regenerate all those numbers of how close they are to the different clusters for the new clusters we will find the actual cluster centroids so now we move the centroids over and you can see that we've now formed two very distinct clusters on here on comparing each individual's distance to its own cluster mean and to that of the opposite cluster we find that the data points are stable hence we have our final clusters now if you remember I brought up a concept earlier on the K means algorithm
choosing the right value of K will help keep the number of iterations down and to find the appropriate number of clusters in a data set we use the elbow method the within-cluster sum of squares WSS is defined as the sum of the squared distance between each member of the cluster and its centroid and so what we've done here is we have the number of clusters along the bottom and as you run the same K means algorithm for different numbers of clusters and calculate that value you can actually find the optimal number of clusters using the elbow the graph is called the elbow method and on this we guessed at two just by looking at the data but as you can see from the slope you just look for right there where the elbow is in the slope and you have a clear answer that we want to start with k equals 2 a lot of times people end up computing K means with 2 3 4 5 clusters until they find the value which sits right at the elbow joint
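Written out, the quantity the elbow graph plots is

$$\mathrm{WSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $C_k$ is the k-th cluster and $\mu_k$ is its centroid, and you look for the value of K after which WSS stops dropping quickly.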
sometimes you can just look at the data and if you're really good with that specific domain remember domain I mentioned that last time you'll know where to pick those numbers or where to start guessing at what that K value is so let's take this and we're going to do a use case using K means clustering to cluster cars into brands using parameters such as horsepower cubic inches make year etc so we're going to use the data set cars data having information about three brands of cars Toyota Honda and Nissan we'll go back to my favorite tool the Anaconda Navigator with
the Jupyter notebook and let's go ahead and flip over to our Jupyter notebook and in our Jupyter notebook I'm going to go ahead and just paste the basic code that we usually start a lot of these off with we're not going to go too much into this code because we've already discussed numpy we've already discussed the matplotlib library and pandas numpy being the number array pandas being the panda data frame and matplotlib for the graphing and don't forget since you're using the Jupyter notebook you do need the matplotlib inline magic so that it plots everything on the screen if you're using a different Python editor then you probably don't need that because it'll open a popup window on your computer and we'll go ahead and run this just to load our libraries and our setup into here the next step is of course to look at our data which I've already opened up in a spreadsheet and you can see here we have the miles per gallon cylinders cubic inches horsepower weight in pounds how heavy it is the time it takes to get to 60 my car is probably
on this one at about 80 or 90 what year it is so you can actually see these are kind of older cars and then the brand Toyota Honda Nissan so the different cars range all the way from 1971 and if we scroll down to the 80s we have between the 70s and 80s a number of cars that they've put out and when we come back here we're going to do importing the data so we'll go ahead and do dataset equals and we'll use pandas to read this in from a CSV file remember you can always post in the comments and request the data files for these either in the comments here on the YouTube video or go to simplilearn.com and request the cars CSV I put it in the same folder as the code that I've stored so my python code is stored in the same folder and I don't have to put the full path if you store them in different folders you do have to change this and double check your file names and we'll go ahead and run this
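As a sketch, the setup and import look something like this; the file name cars.csv and its location are assumptions, so point the path at wherever you saved the file:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline   # Jupyter only: draw plots directly in the notebook

dataset = pd.read_csv("cars.csv")
dataset.head()
```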
we've chosen dataset as the name arbitrarily because it's the data set we're importing and we've now imported our cars CSV into the dataset as you know you have to prep the data so we're going to create the X data this is the one that we're going to try to figure out what's going on with and there is a number of ways to do this but we'll do it in a simple loop so you can actually see what's going on so we'll do for i in X.columns so we're going to go through each of the columns and a lot of times it's important I'll make lists of the columns and do this because I might remove certain columns or there might be columns that I want to be processed differently but for this we can go ahead and take X of i and we want to do fillna and that's a pandas command but the question is what are we going to fill the missing data with we definitely don't want to just put in a number that doesn't actually mean something and so one of the tricks you can do with this is we can take X of i and in addition to that we want to go ahead and turn this into an integer because a lot of these are integers so we'll go ahead and keep it integers and let me add the bracket here a lot of editors will do this they'll think that you're closing one bracket make sure you get that second bracket in there if it's a double bracket that's always something that happens regularly so once we have our integer of X of i this is going to fill in any missing data with the average and I was so busy closing one set of brackets I forgot that the mean also has brackets in there for pandas so we can see here we're going to fill in all the data with the average value for that column so if there's missing data it is filled with the average of the data it does have then once we've done that we'll go ahead and loop through it again and just check and see to make sure everything is filled in correctly and we'll print and then we take X is null
and this returns a count of how many lines are null and we'll just sum that up to see what that looks like and so when I run this and so with the X what we want to do is we want to remove the last column because that has the brands that's what we're trying to see if we can cluster these things into and figure out there are so many different ways to sort the X out for one we could take the X and we could go dataset our variable we're using and use iloc one of the features that's in pandas and we could take that and then take all the rows and all but the last column of the data set and at this time we could do values to just convert it to values so that's one way to do this and if I just put this down here and print X it's a capital X we chose and I run this you can see it's just the values we could also take off the values and then it's not converted to a plain value array what I like to do with this instead of doing iloc which works on integer positions is more commonly to come in here and we have our data set and we're going to do dataset.columns and remember that lists all the columns so if I come in here let me just mark that as red and I print dataset.columns you can see that I have my index here I have my MPG cylinders everything including the brand which we don't want so the way to get rid of the brand would be to do dataset columns of everything but the last one using minus one so now if I print this you'll see the brand disappears and so I can actually just take dataset columns minus one and put it right in here for the columns we're going to look at and let's unmark this and unmark this and now if I do an X.head I now have a new data frame and you can see right here we have all the different columns except for the brand
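In code, dropping that last brand column is one line:

```python
# keep every column except the last one (the brand), which is what we want to recover
X = dataset[dataset.columns[:-1]]
X.head()
```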
and it turns out when you start playing with this data set you're going to get an error later on that says cannot convert string to float and that's because for some reason the way they recorded some of these values they must have been recorded as strings so we have a neat feature in pandas to convert and it is simply convert_objects and for this we're going to do convert_numeric equals true and yes I did have to go look that up I don't have it memorized if I'm working with a lot of these things I remember them but depending on where I'm at and what I'm doing I usually have to look it up and we run that oops I must have missed something in here let me double check my spelling and when I double check my spelling you'll see I missed the first underscore in convert_objects and when I run this it now has everything converted into a numeric value because that's what we're going to be working with down here is numeric values
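One caution: convert_objects was removed in later pandas releases, so on a current install you'd do something like this instead (a hedged equivalent, not what's typed in the video):

```python
# turn string-typed columns into numbers; anything unparseable becomes NaN,
# which the fill step below then handles
X = X.apply(pd.to_numeric, errors="coerce")
```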
the next part is that we need to go through the data and eliminate null values most people when they're working with small data pools discover afterwards that they have a null value and they have to go back and do this so just be aware whenever we're formatting this data things are going to pop up and sometimes you go backwards to fix it and that's fine that's just part of exploring the data and understanding what you have and I should have done this earlier but let me go ahead and increase the size of my window
one notch there we go easier to see so we'll do for i in X.columns we'll page through all the columns and we want to take X of i we're going to change it we're going to alter it and so with this we want to go ahead and fill in X of i pandas has the fillna and that just fills in any missing data and we'll put my brackets up and there's a lot of different ways to fill this data if you have a really large data set some people just void out that data and then look at it later in a separate exploration of the data one of the tricks we can do is we can take our column and we can find the mean so when we take the columns we're going to fill in the non-existing values with the mean the problem is that returns a decimal float and some of these columns aren't decimals so we need to be a little careful doing this but for this example we're just going to fill it in with the integer version of the mean which keeps it on par with the other data that isn't a decimal point and then what we also want to do is we want to double check a lot of times you do this check first then you do the fill and then you do the check again just to make sure you did it right so we're going to go through and test for missing data and one of the ways you can do that is simply go in here and take our X of i column so it's going to go through the X of i column and call is null so it's going to return any place there's a null value it actually goes through all the rows of each column with is null and then we want to go ahead and sum that so we take that and we add the sum and these are all pandas commands is null is a pandas command and so is sum and if we go through that and we run it you'll see that all the columns have zero null values
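The fill-and-check loop as a sketch:

```python
# replace missing entries with the column average (kept as an integer),
# then confirm nothing is left null
for i in X.columns:
    X[i] = X[i].fillna(int(X[i].mean()))

print(X.isnull().sum())   # every column should report 0
```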
so we've now tested and double checked and our data is nice and clean we have no null values everything is now a number value we turned it into numeric and we've removed the last column in our data and at this point we're actually going to start using the elbow method to find the optimal number of clusters so we're now actually getting into the sklearn part the K means clustering on here I guess we'll go ahead and zoom it up one more notch so you can see what I'm typing in here and then from sklearn.cluster we're going to import KMeans I always forget to capitalize the K and the M when I do this so capital K capital M KMeans and we'll go and create an array wcss equals let's make it an empty list if you remember from the elbow method on our slide the within-cluster sum of squares WSS is defined as the sum of squared distances between each member of the cluster and its centroid so we're looking at that change in differences as far as the squared distance
and we're going to run this over a number of K values in fact let's do for i in range we'll do a range of 1 to 11 and the first thing we're going to do is we're going to create the actual object we'll do it all lower case and so we're going to create this object from the KMeans that we just imported and the variable that we want to put into this is n clusters we're going to set that equal to i that's the most important one because we're looking at how increasing the number of clusters changes our answer there are a lot of settings to the K means our guys in the back did a great job just kind of playing with some of them the most common one that you see in a lot of stuff is how you init your K means so we have k-means++ this is just a tool to let the model itself be smart about how it picks its initial centroids we only want to iterate no more than 300 times so we have a max iteration we put in there and we have the random state equals zero you really don't need to worry too much about these when you're first learning this as you start digging in deeper you start finding that these are shortcuts that will speed up the process as far as the setup but the big one that we're working with is the n clusters equals i so we're going to literally train our K means multiple times we're going to do this process for each value of i and if you're working with big data you know the first thing you do
is you run a small sample of the data so you can test all your stuff on it and you can already see the problem that if I'm going to iterate through a terabyte of data eleven times and the K means itself is iterating through the data multiple times that's a heck of a process so you've got to be a little careful with this a lot of times though you can find your elbow using the elbow method and find your optimal number on a sample of the data especially if you're working with larger data sources so we want to go ahead and take our K means and we're just going to fit it if you're looking at any of the sklearn models it's very common you fit your model and if you remember correctly the variable we're using is the capital X and once we fit this we go back to the list we made and we want to just append a value on the end and it's not the actual fit we're appending it's a value the model generates when it's fit the value you're looking for is the inertia so kmeans.inertia_ will pull that specific value out that we need and let's visualize this we'll do our plt.plot and what we're plotting here is first the x axis which is the range 1 to 11 so that will generate a nice little plot there and the wcss for our y axis it's always nice to give our plot a title and let's see we'll just give it the elbow method for the title and let's get some labels so let's go ahead and do plt.xlabel and we'll do number of clusters for that and plt.ylabel and for that we can do wcss since that's what we're plotting on there and finally we want to go ahead and display our graph which is simply plt.show and because we have it set to inline it'll appear in line hopefully I didn't make a typo on there and you can see we get a very nice graph you can see a very nice elbow joint there at two and again right around three and four and then after that there's not very much change
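The whole elbow loop, roughly as described:

```python
from sklearn.cluster import KMeans

wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init="k-means++", max_iter=300, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)   # within-cluster sum of squares for this k

plt.plot(range(1, 11), wcss)
plt.title("The Elbow Method")
plt.xlabel("Number of clusters")
plt.ylabel("WCSS")
plt.show()
```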
now as a data scientist if I was looking at this I would do either three or four and I'd actually try both of them to see what the output looked like and they've already tried this in the back so we're just going to use three as the setup on here and let's go ahead and see what that looks like when we actually use this to show the different kinds of cars and so let's go ahead and apply the K means to the cars data set and basically we're going to copy the code that
we looped through up above where kmeans equals KMeans with a number of clusters and we're just going to set the number of clusters to three since that's what we're going to look for you could do three and four on this and graph them just to see how they come up differently it would be kind of curious to look at that but for this we're just going to set it to three go ahead and create our own variable y kmeans for our answers and we're going to set that equal to whoops I put a double equals there to the K means but we're not going to do a fit we're going to do a fit predict that's the setup you want to use usually you see fit and then you see just the predict but we want to both fit and predict the K means on this and that's fit_predict and then our capital X is the data we're working with and before we plot this data we're going to do a little pandas trick we're going to take our X value and we're going to call as_matrix so we're converting this into a nice rows and columns kind of setup with columns equals none so it's just going to be a matrix of data in here and let's go ahead and run that you'll see a little warning pop up because things are always being updated so there's minor changes in the versions and in future versions instead of as_matrix it's more common to use values but as_matrix works just fine for right now though you'll want to update that to values later on
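That step in code, with the deprecated as_matrix swapped for .values:

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, init="k-means++", max_iter=300, random_state=0)
y_kmeans = kmeans.fit_predict(X)   # one cluster label (0, 1 or 2) per row

X = X.values   # plain NumPy matrix of the feature values
```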
but let's go ahead and dive in and plot this and see what that looks like and before we dive into plotting this data I always like to take a look and see what I am plotting so let's take a look at y kmeans and I'm just going to print that out down here and we see we have an array of answers we have 2 1 0 2 1 2 so it's clustering these different rows of data based on the three different groups it thinks they belong to
and then let's go ahead and print X and see what we have for X and we'll see that X is an array a matrix so we have our different values in the array and it's very hard to plot all the different values in the array so we're only going to be looking at the first two positions zero and one and if you were doing a full presentation in front of a board meeting you might actually do it a little differently and dig a little deeper into the different aspects because this is all the different columns we looked at but we'll look at just the first two columns for this to make it easy so let's go ahead and clear this data out of here and let's bring up our plot and we're going to do a scatter plot here so plt.scatter and this looks a little complicated so let's explain what's going on with this we're going to take the X values and we're only interested in where y kmeans equals zero the first cluster okay and then we're going to take value zero for the x axis
and then we're going to do the same thing here we're only interested in where y kmeans equals zero but we're going to take the second column so we're only looking at the first two columns in our answer or in the data and then the guys in the back played with this a little bit to make it pretty and they discovered that it looks good with a size equals 100 that's the size of the dots we're going to use red for this one and when they were looking at the data and what came out it was definitely the Toyota on this so we're just going to go ahead and label it Toyota again that's something you really have to explore in here as far as playing with those numbers and seeing what looks good we'll go ahead and hit enter in there and I'm just going to paste in the next two lines which is the next two cars and this is our Nissan and Honda and you'll see with our scatter plot we're now looking at where y kmeans equals 1 and we want the zero column and y kmeans equals 2 again we're looking at just the first two columns zero and one and each of these rows then corresponds to Nissan and Honda and I'll go ahead and hit enter on there and finally let's take a look and put the centroids on there again we're going to do a scatter plot and for the centroids you can just pull that from the K means model we created with cluster centers and we're going to just do all of them in the first number and all of them in the second number which is 0 and 1 because you always start with zero and one and then they were playing with the size and everything to make it look good we'll do a size of 300 we're going to make the color yellow and we'll label them so always good to have some good labels centroids and then we do want to do a title plt.title because you always want to make your graphs look pretty we'll call it clusters of car make and one of the features of the plot library is you can add a legend it'll automatically bring it in since we've already labeled the different aspects of the legend with Toyota Nissan and Honda and finally we want to go ahead and show it so we can actually see it and remember it's inline so if you're using a different editor that's not the Jupyter notebook you'll get a popup of this and you should have a nice set of clusters here so we can look at this and we have clusters of Honda in green Toyota in red and Nissan in purple and you can see where they put the centroids to separate them
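A sketch of that plotting cell; which brand name goes with which cluster number is something you only know after eyeballing the output, so treat the labels here as placeholders:

```python
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s=100, c="red", label="Toyota")
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s=100, c="purple", label="Nissan")
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s=100, c="green", label="Honda")
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c="yellow", label="Centroids")
plt.title("Clusters of car make")
plt.legend()
plt.show()
```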
now when we're looking at this we can also plot a lot of other different data on here because we only looked at the first two columns this is just columns one and two or 0 and 1 as you label them in computer scripting but you can see here we have nice clusters of car makes and we were able to pull out the data and you can see how just these two columns form very distinct clusters of data so if you were exploring new data you might take a look and say well what makes these different almost going in reverse you start looking at the data and pulling apart the columns to find out why is the first group set up the way it is maybe you're doing loans and you want to go well why is this group not defaulting on their loans and why is the last group defaulting on their loans and why is the middle group 50% defaulting on their bank loans and you start finding ways to manipulate the data and pull out the answers you want so now that you've seen how to use K means for clustering let's move on
to the next topic now let's look into logistic regression the logistic regression algorithm is the simplest classification algorithm used for binary or multi-class classification problems and we can see our little girl from Canada who's into horror books is back that's actually really scary when you think about it with those big eyes in the previous tutorial we learned about linear regression and dependent and independent variables so to brush up y = mx + c is a very basic algebraic function of y and x the dependent variable y is the target class variable we are going to predict the independent variables x1 all the way up to xn are the features or attributes we're going to use to predict the target class we know what a linear regression looks like but using the graph we cannot divide the outcome into categories it's really hard to categorize values like 1.5 3.6 9.8 for example a linear regression graph can tell us that with an increase in the number of hours studied the marks of a student will increase but it will not tell us whether the student will pass or not in such cases where we need the output as a categorical value we
will use logistic regression and for that we're going to use the sigmoid function so you can see here we have our marks 0 to 100 and the number of hours studied that's going to be what they're comparing it to in this example and we usually form a line that says y = mx + c and when we use the sigmoid function we have p = 1 / (1 + e^-y) which generates a sigmoid curve and so you can see right here when you take the ln which is the natural logarithm I always thought it should be nl not ln that's just the inverse of e your e to the minus y and when we do this we get ln(p / (1 - p)) = mx + c that's the sigmoid curve function we're looking for and we can zoom in on the function and you'll see that the function goes toward one or toward zero depending on what your x value is and if the probability is greater than 0.5 the value is automatically rounded off to one indicating that the student will pass so if they're doing a certain amount of studying they will probably pass then you have a threshold value at 0.5 which it automatically puts right in the middle usually and if your probability is less than 0.5 the value is rounded off to zero indicating the student will fail so if they're not studying very hard they're probably going to fail this of course is ignoring the outlier of that one student who's just a natural genius and doesn't need any studying to memorize everything that's not me unfortunately I have to study hard to learn new stuff
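A tiny sketch of the sigmoid and the 0.5 cutoff just described:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # z = m*x + c from the linear part
p = sigmoid(z)
print(p)                        # roughly [0.047 0.378 0.5 0.622 0.953]
print((p >= 0.5).astype(int))   # [0 0 1 1 1] -- fail / pass after rounding at 0.5
```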
our problem statement is to classify whether a tumor is malignant or benign and this is actually one of my favorite data sets to play with because it has so many features and when you look at them they're really hard to understand you can't just look at them and know the answer so it gives you a chance to dive into what data looks like when you aren't able to understand the specific domain of the data but I also want to remind you that in the domain of medicine if I told you that my probability was really good that it classified things at say 90% or 95% and I'm classifying whether you're going to have a malignant or a benign tumor I'm guessing that you're going to go get it tested anyway so you've got to remember the domain we're working with so why would you want to do that if you know you're just going to go get a biopsy because you know it's that serious this is like an all or nothing just referencing the domain it's important it might help the doctor know where to look just by understanding what kind of tumor it is so it might
help them or aid them with something they missed before so let's go ahead and dive into the code and I'll come back to the domain part of it in just a minute so for our use case we're going to do our normal imports here where we're importing numpy pandas Seaborn and the matplotlib library and we're going to do matplotlib inline since I'm going to switch over to Anaconda so let's go ahead and flip over there and get this started so I've opened up a new window in my Anaconda Jupyter notebook by the way you don't have to use Anaconda for the Jupyter notebook I just love the interface and all the tools Anaconda brings so we've got our import numpy as np for our numpy number array we have our pandas as pd we're going to bring in Seaborn to help us with our graphs as sns so many really nice tools in both Seaborn and matplotlib and we'll do our matplotlib.pyplot as plt and then of course we want to let it know to do it inline and let's go and just run that so it's all set up and we're just going to call our data data not creative today and set it equal to pd.read_csv since this happens to be in a CSV file and I renamed the file for this demo and let's just open up the data before we go any further and see what it looks like in a spreadsheet so when I pop it open in a local spreadsheet and this is just a CSV file comma separated values we have an ID so I
guess they use it as a reference for which test was done then the diagnosis M for malignant B for benign so there's two different options on there and that's what we're going to try to predict and test the M and B and then we have things like the radius mean or average the texture average perimeter mean area mean smoothness I don't know about you but unless you're a doctor in the field most of this stuff I mean you can guess what concave means just by the term concave but you really wouldn't know what that means in the measurements they're taking so they have all kinds of stuff like how smooth it is the symmetry and these are all float values we can just page through them real quick and you'll see there's I believe around 36 columns if I remember correctly in this one so there's a lot of different values they take and all these measurements they take when they go in there and look at the different growths the tumorous growths so back in our data and I put this in the same folder as the code so I saved this code in that folder obviously if you have it in a different location you want to put the full path in there and we'll just use pandas to show the first five lines of data with data.head and when we run that we can see that we have pretty much what we just looked at we have an ID we have a diagnosis and if we go all the way across you'll see all the different columns displayed nicely for our data and while we're exploring the data our Seaborn which we referenced as sns makes it very easy to go
in here and do a joint plot you'll notice the very similar to because it is sitting on top of the um plot Library so the joint plot does a lot of work for us and we're just going to look at the first two columns that we're interested in the radius mean and the texture mean we'll just look at those two columns and data equals data so that tells it which two columns we're plotting and that we're going to use the data that we pulled in let's just run that and it generates a really nice graph
on here and there's all kinds of cool things on this graph to look at I mean we have the texture mean and the radius mean obviously the axes you can also see and one of the the cool things on here is you can also see the histogram they show that for the radius mean where is the most common radius mean come up and where the most common texture is so we're looking at the tech the on each growth its average texture and on each radius its average uh radius on there gets a little confusing because
we're talking about each individual object's average and then we can also look over here and see the histogram showing us how common each measurement is and that's only two columns so let's dig a little deeper into Seaborn they also have a heat map and if you're not familiar with heat maps a heat map just means it's in color that's all that means heat map I guess the original ones were plotting heat density on something and so ever since then it's just called a heat map and we're going to take our data and get our correlation numbers to put into the heat map and that's simply data.corr that's a pandas expression remember we're working in a pandas data frame so that's one of the cool tools in pandas for our data and this will pull that information into a heat map and see what that looks like and you'll see that we're now looking at all the different features we have our ID we have our texture we have our area our compactness concave points and if you look down the middle of this chart diagonal
going from the upper left to bottom right it's all white that's because when you compare texture to texture they're identical so they're 100% or in this case perfect one in their correspondence and you'll see that when you look at say area or right below it it has almost a black on there when you compare it to texture so these have almost no corresponding data They Don't Really form a linear graph or something that you can look at and say how connected they are they're very scattered data this is really just a really nice graph to
get a quick look at your data doesn't so much change what you do but it changes verifying so when you get an answer or something like that or you start looking at some of these individual pieces you might go hey that doesn't match according to showing our heat map this should not correlate with each other and if it is you're going to have to start asking well why what's going on what else is coming in there but it does show some really cool information on here mean we can see from the ID there's no real
one feature that just lights up if you go across the top line there's no one feature that says hey if the area is a certain size then it's going to be benign or malignant instead there are several that sort of add up and that's a big hint in the data we're trying to use to ID whether it's malignant or benign it's a big hint to us as data scientists to go okay we can't solve this with any one feature it's going to be something that includes all the features or many of the different features combined to come up with the solution for it
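The two Seaborn views used here, sketched with the usual column names from this data set (radius_mean and texture_mean are assumptions, so check data.columns for your file):

```python
# scatter of two features plus their histograms
sns.jointplot(x="radius_mean", y="texture_mean", data=data)

# correlation heat map; numeric_only (newer pandas) keeps the text diagnosis column out
sns.heatmap(data.corr(numeric_only=True))
plt.show()
```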
and while we're exploring the data let's explore one more area and let's look at data.isnull we want to check for null values in our data if you remember from earlier in this tutorial we did it a little differently where we added stuff up and summed it up you can actually with pandas do it really quickly data.isnull and sum it and it's going to go across all the columns so when I run this you're going to see all the columns come up with no
null data so just to rehash these last few steps we've done a lot of exploration we have looked at the first two columns and seen how they plot with Seaborn with a joint plot which shows both the histogram and the data plotted on the x y coordinates and obviously you can do that in more detail with different columns and see how they plot together and then we did the Seaborn heat map the sns.heatmap of the data and you can see right here where it did a nice job showing us some bright spots where stuff correlates with each other and forms a very nice combination of points and you can also see areas that don't and then finally we went ahead and checked the data is the data null do we have any missing data in there a very important step because it'll crash later on if you forget to do this step it will remind you when you get that nice error code that says null values okay so not a big deal if you miss it but it's no fun having to go
back when you're in a huge process and you've missed this step and now you're 10 steps later and you've got to go remember where you were pulling the data in so we need to go ahead and pull out our X and our y so we just put that down here and we'll set the X equal to and there's a lot of different options here certainly we could do X equals all the columns except for the first two because if you remember the first two are the ID and the diagnosis so that certainly would be an option but what we're going to do is we're actually going to focus on the worst the worst radius the worst texture perimeter area smoothness compactness and so on one of the reasons to start dividing your data up when you're looking at this information is sometimes the same data will be coming in twice so if I have two nearly identical measurements coming into my model it might overweight them it might overpower the other measurements because it's basically taking that information in twice that's a little bit past the scope of this tutorial what I want you to take away from this though is that we are dividing the data up into pieces and our team in the back went ahead and said hey let's just look at the worst so I'm going to create an array and you'll see this array radius worst texture worst perimeter worst we've just taken the worst of the worst and I'm just going to put that in my X so this X is still a pandas data frame but it's just those columns and our y if you remember correctly is going to be oops hold on one second it's not X it's data there we go so X equals data and then it's a list of the different columns the worst of the worst and if we're going to take that then we have to have our answer our y for the stuff we know and if you remember correctly we're just going to be looking at the diagnosis that's all we care about is what it is diagnosed as is it benign or malignant and since it's a single column we can just do diagnosis oh I forgot to put the brackets there we go okay so it's just diagnosis on there and we can also real quickly do like X.head if you want to see what that looks like and y.head and run this and you'll see it only shows the last one I forgot about that if you don't do print you can see that the y.head is just M M M M M because the first ones are all malignant and if I run this the X.head is just the first five values of radius worst texture worst perimeter worst area worst and so on and I'll go ahead and take that out
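Sketching the X and y selection; the exact "worst" column names are assumptions based on the standard breast-cancer CSV:

```python
features = ["radius_worst", "texture_worst", "perimeter_worst",
            "area_worst", "smoothness_worst", "compactness_worst"]
X = data[features]
y = data["diagnosis"]   # M for malignant, B for benign
```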
so moving down to the next step we've built our two data sets our answer and then the features we want to look at in data science it's very important to test your model so we do that by splitting the data and from sklearn model selection we're going to import train test split so we're going to split it into two groups there are so many ways to do this I noticed one of the more modern ways is they actually split it into three groups and then you model each group and test it against the other groups so there's all kinds of options and there's reasons for that which are past the scope of this and for this particular example it isn't necessary for this we're just going to split it into two groups one to train our model and one to test it and in sklearn.model_selection we have train test split you could write your own quick code to do this where you just randomly divide the data into two groups but they do it for us nicely and we can actually do it in one statement with this where we're going to generate four variables capital X train capital X test so we have our training data we're going to use to fit the model and then we need something to test it and then we have our y train so we're going to train on the answers and then we have our y test so this is the stuff we use to see how good our model did and we'll go ahead and take the train test split that we just imported and we're going to pass in X and our y our two different pieces of data that are going in for our split and then the guys in the back wanted us to go ahead and use a test size equals 0.3 that's test_size plus a random state it's always nice to kind of switch a random state around but it's not that important what this means is that the test size is 30% so we're going to take 30% of the data and put that into our test variables our y test and our X test and we're going to put 70% into the X train and the y train which we'll use to train our model
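The split in one statement:

```python
from sklearn.model_selection import train_test_split

# 70% of the rows for training, 30% held back for testing;
# the random_state value here is arbitrary
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
```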
use 70% of the data to train our model and 30% to test it let's go ahead and run that and load those up so now we have all our stuff split up and all our data ready to go now we get to the actual Logistics part we're actually going to do our create our model so let's go ahead and bring that in from sklearn we're going to bring in our linear model and we're going to import logistic regression that's the actual model we're using and this we'll call it log model oops there we go model
Let's just set this equal to the LogisticRegression we just imported, so now we have a variable, log_model, set to that class for us to use. With most of the models in sklearn we just need to do a fit, and we use the X_train that we separated out along with y_train. Let's go ahead and run this; once we've run it we'll have a model that fits this data, that 70% of training data. Of course it prints out all the different parameters you can set; there are a lot of choices you can make, but for what we're doing we'll just leave all the defaults. We don't really need to mess with those in this particular example, and nothing in here stands out as super important until you start fine-tuning, but for what we're doing the basics will work just fine. Then we need to test out our model: is it working? So let's create a variable, y_predict, and set it equal to our log_model with a predict on it; again, a very standard format for the sklearn library is taking your model and calling predict on it, and we're going to test y_predict against y_test. We want to know what the model thinks the answer is going to be, that's what y_predict is, and for that we pass in the capital X_test, so we have our train set and our test set, and now we do our y_predict. Let's go ahead and run that, and if we print y_predict you'll see it comes up and prints a nice array of B's and M's, benign and malignant, for all the different test data we put in there. So it does pretty well; we're not sure exactly how good it is yet, but we can see that it actually works, it's functional, and it was very easy to create.
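Pulling those steps together, here is a condensed sketch of the split / fit / predict sequence just described, assuming the X and y from the previous step; the random_state and max_iter values are only illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 70% of the rows train the model, 30% are held back to test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

log_model = LogisticRegression(max_iter=1000)  # defaults are fine otherwise
log_model.fit(X_train, y_train)

y_predict = log_model.predict(X_test)          # 'B' / 'M' for each test row
print(y_predict[:10])
```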
You'll always discover in data science, as you explore it, that you spend a significant amount of time prepping your data and making sure the data coming in is good. There's a saying: good data in, good answers out; bad data in, bad answers out. But that's only half of it; selecting your models becomes the next part, as far as how good your models are, and then of course fine-tuning, depending on which model you're using. So we come in here and we want to know how good this came out. We have our y_predict here, log_model.predict(X_test), and for deciding how good our model is we're going to go to sklearn.metrics and import classification_report, which just reports how well our model is doing. Then we feed it the data: we print the classification_report, passing in y_test, our actual data, which is what we know to be true, and y_predict, what our model predicted for that data on the test side. Let's run that and see what it does. When we pull that up you'll see we have a precision for benign and malignant, B and M: a precision of 0.93 and 0.91, roughly an average of 0.92 between the two. There's all kinds of other information on here: your F1 score, your recall, your support.
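In code, that check is just one call, assuming the y_test and y_predict variables from above:

```python
from sklearn.metrics import classification_report

# Compare the true test labels against the model's predictions.
print(classification_report(y_test, y_predict))
```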
For this, I'll flip back to the slides they put together to describe it, and here we're going to look at the precision using the classification report. You'll see it's the same printout I had up above, though some of the numbers might be different because the split randomly picks which data we use. So this model is able to predict the type of tumor with about 91% accuracy; when we look back here at benign and malignant it actually comes up as 92, so we're looking at roughly 91 or 92% precision. And remember I reminded you about domain: we're talking about a medical domain with a very catastrophic outcome, so at 91 or 92% precision you're still going to go in there and have somebody do a biopsy. That's very different than if you're investing money and there's a 92% chance you'll earn 10% and an 8% chance you'll lose 8%; at those odds you're probably going to bet the money, because it's pretty likely you'll come out ahead, and if you do that enough times in the long run you definitely will make money. Also in this domain, I've actually seen these models used to identify different forms of cancer; that's one of the things they're starting to use them for, because it helps the doctor know what to investigate. So that wraps up this section. Finally, let's discuss the answers to the quiz from machine learning tutorial part one: can you tell what's happening in the following cases? A: grouping documents into different categories based on the topic and content of each document. This is an example of clustering, where K-means clustering can be used to group the documents by topic using a bag-of-words approach. So if you got that you're looking for clustering, and hopefully you had at least one or two examples like
K-means that are used for clustering different things, then give yourself two thumbs up. B: identifying handwritten digits in images correctly. This is an example of classification; the traditional approach to solving this would be to extract digit-dependent features, like the curvature of different digits, and then use a classifier like SVM to distinguish between images. Again, if you got the fact that it's a classification example, give yourself a thumbs up, and if you were able to say, hey, let's use SVM or another model for this, give yourself two thumbs up. C: behavior of a website indicating that the site is not working as designed. This is an example of anomaly detection; in this case the algorithm learns what is normal and what is not normal, usually by observing the logs of the website. Give yourself a thumbs up if you got that one. And just for a bonus, can you think of another example of anomaly detection? One of the ones I use for my own business is detecting anomalies in stock markets; stock markets are fickle and behave erratically, so finding those erratic areas and then finding ways to track down why they're erratic, whether something was released in social media or something else happened, shows how knowing where the anomaly is can help you figure out the answer in another area. D: predicting the salary of an individual based on his or her years of experience. This is an example of regression; the problem can be mathematically defined as a function between an independent variable, years of experience, and a dependent variable, the salary of the individual. If you guessed that this was a regression model, give yourself a thumbs up, and if you were
able to remember that it was in terms of independent and dependent variables, give yourself two thumbs up. Summary: so to wrap it up, we went over what K-means is, and we went through the flow of assigning random centroids to the clusters, computing the distances, figuring out the minimum-distance centroid for each point, recomputing the centroids, and going through that loop until it settles on the final centroids. And we looked into the elbow method to choose k, based on running our clustering across a range of values for k and finding the best spot.
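As a quick recap of that elbow idea, here is a minimal sketch that runs K-means across a range of k values and plots the inertia; it uses generated blob data as a stand-in for the cars data set from the demo.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data standing in for the cars data set used in the demo.
X_toy, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_toy)
    inertias.append(km.inertia_)   # within-cluster sum of squared distances

plt.plot(list(ks), inertias, marker='o')
plt.xlabel('k (number of clusters)')
plt.ylabel('inertia')
plt.title('Elbow method: pick k where the curve bends')
plt.show()
```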
We did a nice example of clustering cars with K-means; even though we only looked at the first two columns, to keep it simple and easy to graph, you can easily extrapolate that and look at all the different columns and see how they fit together. And we looked at what logistic regression is, we discussed the sigmoid function, and then we went into an example of classifying tumors with logistic regression. I hope you enjoyed part two of machine learning.
Why reinforcement learning? Training a machine learning model requires a lot of data, which might not always be available to us; further, the data provided might not be reliable. Learning from a small subset of actions will not help expand the vast realm of solutions that may work for a particular problem. You can see here we have the robot learning to walk, a very complicated setup when you're learning how to walk, and you'll start asking questions like: if I'm taking one step forward and to the left, what happens if I pick up a 50 lb object, how does that change how a robot would walk? These things are very difficult to program because there's no actual information until it's actually tried out. Learning from a small subset of actions will not help expand the vast realm of solutions that may work for a particular problem, and we'll see here it learned how to walk. This limitation is going to slow the growth that this technology is capable of; machines need to learn to perform actions by themselves and not just learn from humans. And you see the objective: climb a mountain. A really interesting point here is that, as human beings, we
can go into a very unknown environment and we can adjust for it and kind of explore and play with it most of the models the non-reinforcement models in computer um machine learning aren't able to do that very well uh there's a couple of them that can be used or integrated see how it goes is what we're talking about with reinforcement learning so what is reinforcement learning reinforcement learning is a subbranch of machine learning that trains a model to return an Optimum solution for a problem by taking a sequence of decisions by itself consider a robot
learning to go from one place to another the robot is given a scenario and must arrive at a solution by itself the robot can take different paths to reach the destination it will know the best path by the time taken on each path it might even come up with a unique solution all by itself and that's really important is we're looking for Unique Solutions uh we want the best solution but you can't find it unless you try it so we're looking at uh our different systems our different model we have supervised versus unsupervised versus reinforcement
learning and with the supervised learning that is probably the most controlled environment uh we have a lot of different supervised learning models whether it's linear regression neural networks um there's all kinds of things in between decision trees the data provided is labeled data with output values specified and this is important because when we talk about supervised learning you already know the answer for all this information you already know the picture has a motorcycle in it so you're supervised learning you already know that um the outcome for tomorrow for you know going back a week you're
looking at stock you can already have like the graph of what the next day looks like so you have an answer for it and you have labeled data which is used you have an external supervision and solves Problems by mapping labeled input to know output so very controlled unsupervised learning and unsupervised learning is really interesting because it's now taking part in many other models they start with an you can actually insert an unsupervised learning model um in almost either supervised or reinforcement learning as part of the system which is really cool uh data provided is
unlabeled data the outputs are not specified machine makes its own predictions used to solve association with clustering problems unlabeled data is used no supervision solves Problems by understanding patterns and discovering output uh so you can look at this and you can think um some of these things go with each other they belong together so so it's looking for what connects in different ways and there's a lot of different algorithms that look at this um when you start getting into those there's some really cool images that come up of what unsupervised learning is how it can
pick out say uh the area of a donut one model will see the area of the dut and the other one will divide it into three sections based on its location versus what's next to it so there's a lot of stuff that goes in with unsupervised learning and then we're looking at reinforcement learning probably the biggest industry in today's Market uh in machine learning or growing Market it's very in it's very infant stage uh as far as how it works and what's going to be capable of the machine learns from its environment using rewards and
errors used to solve reward-based problems no predefined data is used no supervision follows Trail and error problem solving approach uh so again we have a random at first you start with a random I try this it works and this is my reward doesn't does work very well maybe or maybe it doesn't even get you where you're trying to get it to do and you get your reward back and then it looks at that and says well let's try something else and it starts to play with these different things finding the best route so let's take
a look at important terms in today's reinforcement model and this has become pretty standardized over the last uh few years so these are really good to know we have the agent uh agent is the model that is being trained via reinforcement learning so this is your actual uh in that has however you're doing it whether you're using a neural network or a q table or whatever combination thereof this is the actual agent that you're using this is the model and you have your environment uh the training situation that the model must optimize to is called
its environment uh and you can see here I guess we have a robot who's trying to get a chest full of gems or whatever and that's the output and then you have your action this is all possible steps that can be taken by the model and it pick one action and you can see here it's picked three different uh routes to get to the chest of diamonds and gems we have a state the current position condition returned by the model and you could look at this uh if you're playing like a video game this is
the screen you're looking at. So when you go back here, the environment is the whole game board; if you're playing one of those mobile games you might have the whole game board going on, but then you have your current position: where are you on that game board, and what's around you? If you were talking about a robot, the environment might be moving around the yard, and the state is where it is in the yard, what it can see, what input it has in that location; that would be the current position or condition
returned by the model and then the reward uh to help the model move in the right direction it is rewarded points are given to it to appraise some kind of action so yay you did good or if uh didn't do as good trying to maximize the reward and have the best reward possible and then policy policy determines how an agent will behave at any time it acts as a mapping between action and present State this is part of the model what what is your action that you're you're going to take what's the policy you're using
to have an output from your agent one of the reasons they separate uh policy as its own entity is that you usually have a prediction um of a different options and then the policy well how am I going to pick the best based on those predictions I'm going to guess at different options and we'll actually weigh those options in and find the best option we think will work uh so it's a little tricky but the policy thing is actually pretty cool how it works let's go ahe and take a look at a reinforcement learning example
and just in looking at this we're going to take a look uh consider what a dog um that we want to train uh so the dog would be like the agent so you have your your puppy or whatever uh and then your environment is going to be the whole house or whatever it is where you're training them and then you have an action we want to teach the dog to fetch so action equals fetching uh and then we have a little biscuit so we can get the dog to perform various actions by offering incentives such
as a dog biscuit as a reward. The dog will follow a policy to maximize this reward, and hence will follow every command, and might even learn new actions, like begging, by itself. So we start off with fetching, and the dog goes, oh, I get a biscuit for that; it tries something else, like a handshake or begging, and it goes, oh, this is also reward-based. So it kind of explores things to find out what will bring it a biscuit, and that's very much how a reinforcement model goes: it looks for different rewards, asking, can I try different things and find a reward that works? The dog will also want to run around, play, and explore its environment; this quality of the model is called exploration, so there's a little randomness going on in exploration, and it explores new parts of the house. Climbing on the sofa doesn't get a reward; in fact, it usually gets kicked off the sofa. So let's talk a little bit about the Markov decision process. The Markov decision process is a reinforcement learning policy used to map
a current state to an action where the agent continuously interacts with the environment to produce new Solutions and receive rewards and you'll see here's all of our different uh uh vocabulary we just went over we have a reward our state or agent our environment interaction and so even though the environment kind of contains everything um that you you really when you're actually writing the program your environment's going to put out a reward in state that goes into the agent uh the agent then looks at this uh state or it looks at the reward usually um
first and it says okay I got rewarded for whatever I just did or I didn't get rewarded and then it looks at the state then it comes back and if you remember from policy the policy comes in uh and then we have a reward the policy is that part that's connected at the bottom and so it looks at that policy and it says hey what's a good action that will probably be similar to what I did or um uh sometimes they're completely random but what's a good action that's going to bring me a different reward
So taking the time to understand these different pieces as they go is pretty important in most of the models today, and a lot of them actually have templates based on this that you can pull in and start using; it's pretty straightforward once you start seeing how it works. You can see the environment says, hey, the agent did this, or if you're a character in a game, this happened, and it sends out a reward and a state. The agent looks at the reward, looks at the new state, and then takes a little guess and says, I'm going to try this action. That action goes back into the environment and affects it; the environment then changes depending on what the action was, and it produces a new state and a new reward that go back to the agent.
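That loop looks roughly like this in code; the tiny environment and the do-nothing agent below are made-up stand-ins just to show the shape of the interaction, not part of any particular library.

```python
import random

# Hypothetical toy environment: walk along positions 0..4 and get a reward
# of +1 when position 4 is reached.
class TinyEnv:
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                    # action: -1 (left) or +1 (right)
        self.pos = max(0, min(4, self.pos + action))
        done = self.pos == 4
        return self.pos, (1.0 if done else 0.0), done

class RandomAgent:
    def choose_action(self, state):
        return random.choice([-1, +1])         # placeholder policy
    def learn(self, state, action, reward, next_state):
        pass                                   # a real agent would update here

env, agent = TinyEnv(), RandomAgent()
state = env.reset()
for step in range(100):
    action = agent.choose_action(state)            # agent picks an action
    next_state, reward, done = env.step(action)    # environment returns state + reward
    agent.learn(state, action, reward, next_state) # agent uses the feedback
    state = env.reset() if done else next_state
```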
In the diagram shown, we need to find the shortest path between node A and node D. Each path has a reward associated with it, and the path with the maximum reward is the one we want to choose. The nodes A through D are the nodes to travel between: going from node A to node B is an action, the reward is the cost of each path, and the policy is each path taken. You can see that A can go to B, or A can go to C right off the bat, or it can go straight to D, and if you explored all three of these you would find that A going straight to D has a zero reward, while A going through C to D generates a different reward, or you could go A, C, B, D; there are a lot of options here. And so when we start
looking at this diagram you start to realize that even though uh today's reinforced learning models do really good at um finding an answer they end up trying almost all the different directions you see and so they take up a lot of work uh or a lot of processing time for reinforcement learning they're right now in their infant stage and they're really really good at solving simple problems and we'll take a look at one of those in just a minute in a tic tac toe game uh but you can see here uh once it's gone through
these and it's explored it's going to find the ACD is the best reward it gets a full 30 points for it so let's go ahead and take a look at a reinforcement learning demo uh in this demo we're going to use reinforcement learning to make a tick tac toe game you will be playing this game Against the Machine learning model and we'll go ahead we're doing it in Python so let's go ahead and go through um I always uh not always actually have a lot of python tools let's go through um Anaconda which will open
up a Jupiter notebook seems like a lot of steps but it's worth it to keep all my stuff separate and it's also has a nice display when you're in the Jupiter notebook for doing python so here's our Anaconda Navigator I open up the notebook which is going to take me to a web page and I've gone in here and created a new uh python folder in this case I've already done it and enabled it to change the name to tic-tac-toe uh and then for this example uh we're going to go ahead and import a couple
things we're going to um import numpy as NP we'll go ahead and import pickle numpy of course is our number array and pickle is just a nice way sometimes for storing uh different information uh different states that we're going to go through on here uh and so we're going to create a class called State we're going to start with that that and there's a lot of lines of code to this uh class that we're going to put in here don't let that scare you too much there's not as much here um it looks like there's
going to be a lot here, but really it's just a lot of setup going on in our State class. Up here we're going to initialize it: we have our board, a tic-tac-toe board, so we're only dealing with nine spots; we have player one and player two; we're going to create a board hash, which we'll look at in just a minute, to store some information; and the symbol of the starting player equals one. So there are a few things going on in the initialization. Then something simple: we're just going to get the hash of the board, the information from the board, which is its columns and rows. We want to know when a winner occurs, so if you get three in a row, that's what this whole section here is for. Let me scroll up a little bit; you can get a copy of this code if you send a note over to Simplilearn and we'll send you this particular file, and you can play with it yourself and see how it's put together. I don't want to spend a huge amount of time on this, because it's just some general Python coding, but you can see here we're just going through all the rows and adding them together; if a sum equals three, that's three in a row, and the same thing with the columns and the diagonals, so you have to check the diagonals too. That's what all this does here: it just goes through the different areas. Then it comes down here and we do our sum, and plus or minus three just says did somebody win, or is it a tie; you have to add up all the numbers anyway, just in case the board is all filled up. Next we also need to know the available positions, the ones no one has used before; this way, when you try something or the computer tries something, it's not going to make an illegal move. That's what the available-positions method is doing.
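The downloadable file has its own version of that check, but the idea of summing rows, columns, and diagonals can be sketched on its own like this (board values: 1 for player one, -1 for player two, 0 for empty):

```python
import numpy as np

def winner(board):
    """Return 1 or -1 for a winner, 0 for a tie on a full board, None otherwise."""
    sums = list(board.sum(axis=0)) + list(board.sum(axis=1))
    sums += [np.trace(board), np.trace(np.fliplr(board))]   # both diagonals
    if 3 in sums:
        return 1
    if -3 in sums:
        return -1
    if not (board == 0).any():   # board is full and nobody has three in a row
        return 0
    return None

board = np.array([[1, 1, 1],
                  [-1, -1, 0],
                  [0, 0, 0]])
print(winner(board))    # 1
```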
Then we want to update our state, so you have your position going in; we're just sending in the position that you just chose, and you'll see there's a little user interface we put in there where you pick the row and column. Again, this is a lot of code, so it's really the kind of thing you'd want to get a copy of, go through, play with a little bit, and just read through; it's a great way to understand how this works. And here is the give-reward method.
We're going to give a reward: result = self.winner(). This is one of the hearts of what's going on here: we have a result from self.winner, so if there's a winner then we have a result, and if the result equals one, here's our feedback; if it doesn't equal one, then it gets a zero. So in this particular case a player only gets the full reward if it wins, and that's important to know, because different reinforcement learning systems handle rewarding very differently depending on what you're trying to do. This is a very simple example with a 3x3 board; imagine if you're playing a video game, where you still have only so many actions but your environment is huge and there's a lot going on in the environment. Suddenly a reward system like this is going to have to change a little bit; it will need different rewards and a different setup, and there are all kinds of advanced ways to do that, such as adding weights so the rewards add up differently depending on where the reward comes in. It might be that you only get the reward at the end of the game, and I'm spending a little bit of time on this because it's an important thing to note: there are different ways to add up those rewards. If you take a certain path, the first reward might be weighed a little less than the last reward, because the last reward is actually winning the game, or scoring, or whatever it is, so this reward system gets really complicated in some of the more advanced setups. In this case, though, you can see right here that they give a 0.1 and a 0.5 reward just for picking a value that's actually valid instead of an invalid one. So rewards, again, are key; how you feed the rewards back in is huge.
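In the spirit of what was just described, a give-reward step can be sketched like this; the StubPlayer class and method names are stand-ins for the demo's real Player class, and the 0.1 / 0.5 values here are used as tie rewards, which is one common variant and may differ slightly from how the file applies them.

```python
class StubPlayer:
    """Stand-in for the demo's player class; it just records the last reward."""
    def __init__(self, name):
        self.name = name
        self.last_reward = None
    def feed_reward(self, reward):
        self.last_reward = reward

def give_reward(result, p1, p2):
    # result: 1 -> player one won, -1 -> player two won, anything else -> tie
    if result == 1:
        p1.feed_reward(1); p2.feed_reward(0)
    elif result == -1:
        p1.feed_reward(0); p2.feed_reward(1)
    else:                             # tie: partial credit for each side
        p1.feed_reward(0.1); p2.feed_reward(0.5)

a, b = StubPlayer('p1'), StubPlayer('p2')
give_reward(1, a, b)
print(a.last_reward, b.last_reward)   # 1 0
```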
Then we have a board reset; that's pretty straightforward, it just resets the board back to the beginning, because while it's learning by trial and error it's going to try out all these different things, so you have to keep resetting it. And then of course there's the play method: we want to go ahead and play, with rounds=100; it depends on what you want to do here, and you can obviously set that to a higher level. This is just going to go through, and you'll see in here that we have player one and player two; this is the computer playing itself. One of the more powerful ways to learn to play
a game or even learn something that isn't a game is to have two of these models that are basically trying to beat each other and so they they keep finding explore new things this one works for this one so this one tries new things it beats this we've seen this in um chess I think was a big one where they had the two players in chess with reinforcement learning uh was one of the ways they train one of the top um computer chess playing algorithms uh so this is just what this is it's going to
choose an action it's going to try something and the more it tries stuff um the more we're going to record the hash we actually have a board hash where they self get the hash setup on here where it stores all the information and then once you get to a win one of them wins it gets the reward uh then we go back and reset and try again and then kind of the fun part we actually get down here is uh we're going to play with a human so we'll get a chance to come in here
and see what that looks like when you put your own information in and then it just comes in here does the same thing it did above it gives it a reward for its things um or sees if it wins or ties um looks at available position all that kind of fun stuff and then finally we want to show the board uh so it's going to print the board out each time really um as an integration is not that exciting what's exciting uh in here is one looking at this reward system whoops Play One More up
the reward system is really the heart of this how do you reward the different uh setup and the other one is when it's playing it's got to take an action and so what it chooses for an action is also the heart of reinforcement learning how do we choose that action and those are really key to right now where reinforcement learning is um in today's uh technology is uh figuring this out how do we reward it and how do we guess the next best action so we have our uh environment and you can see the environment
is we're going to be or the state uh which is kind of like what's going on we're going to return the state depending on what happens and we want to go ahead and create our agent uh in this CL our player so each one is let me go and grab that and so we look at a class player um this is where a lot of the magic is really going on is what how is this player figuring out how to maneuver around the board and then the board of course returns a state uh that it
can look at, and a reward. So we want to take a look at this: we have a name and self.states; this is the Player class, and when we say Player we're not talking about a human player, we're talking about the computer players. This is kind of interesting: remember I told you that, depending on what you're doing, there's going to be a decay gamma and an exploration rate. These are what I'm talking about when I ask how we train it. As you try different moves on the way to the end, the first move is important, but it's not as important as the last one, so you could say the last one has the heaviest weight. Say the first move gives you a reward of five, the second gives you a two, and the third gives you a ten because that's the final ending and you got it; the ten is going to count more than the first step. And here we're going to get the board information coming in and then choose an action; this was the second part I said was so important. Once your training is going on, we have to add a little randomness, and you can see right here our np.random.uniform, which picks a random number to decide whether to take a random action; that just picks which row and which column it is. For choosing the action, you can see we're doing np.random.choice over the length of the available positions to get an action position. Then it skips in there and
takes a look at the board. For p in positions, it's actually storing the different boards each time you go through, so it has a record of what it did and can properly weigh the values; this simply appends a hash of the state, the last state, to our list of states. Here's our feedback reward: the reward comes in, and it's going to look at the stored value and ask, is it None, and what is the reward? And here is that formula I was telling you about up above, the one that matters because it has decay gamma times the reward. This is where, as it goes through each step, and this is really the heart of what I was talking about earlier, you have step one, which might have a reward of two, then step two, step three, step four, and so on until you get to step n, which might have a reward of 10. We're going to add that 10, but for the step right before it, say its reward is also 10 just to make the math easy, we don't add the full 10; we multiply by 0.9, so instead of a full 10 we only add 9, that's 0.9 times 10. So this formula, the decay gamma times the reward minus the current state value, basically adds those in step by step, and the first move gets its small share. You can see how the math gets pretty complicated, but this is really the key: how do we train our states? We want the final state, the win, to get the most points; if you win, you get the most points, and the first step gets the least amount of points. So you're really training this almost in reverse: you train it from the last place, where it says, okay, this is where I need to sum up my rewards, and I want to sum them up going in reverse and find the answer in reverse. It's kind of an interesting play on the mind when you're trying to figure this stuff out. And of course we want to reset the board down here, and the save-policy and load-policy methods are the different pieces that pass between the agent and the state to figure out what's going on.
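To make that reverse, decayed update concrete, here is a stripped-down sketch of a computer player with the two pieces just discussed: an exploration-rate action choice and a feed-reward method that walks the stored states backwards. The parameter values and method names are illustrative; the downloadable file differs in its details.

```python
import numpy as np

class SketchPlayer:
    def __init__(self, exp_rate=0.3, lr=0.2, decay_gamma=0.9):
        self.exp_rate = exp_rate        # chance of a purely random (exploration) move
        self.lr = lr                    # learning rate
        self.decay_gamma = decay_gamma  # how strongly earlier moves are discounted
        self.states = []                # board hashes visited during this game
        self.states_value = {}          # board hash -> estimated value

    def choose_action(self, positions, next_state_values):
        # positions: legal moves; next_state_values: current value estimate of
        # the board each move would lead to (same length as positions).
        if np.random.uniform(0, 1) <= self.exp_rate:
            return positions[np.random.choice(len(positions))]   # explore
        return positions[int(np.argmax(next_state_values))]      # exploit

    def add_state(self, board_hash):
        self.states.append(board_hash)

    def feed_reward(self, reward):
        # Walk backwards from the final move: the last state gets the most
        # credit, and each earlier state gets a decayed share of it.
        for state in reversed(self.states):
            value = self.states_value.get(state, 0.0)
            value += self.lr * (self.decay_gamma * reward - value)
            self.states_value[state] = value
            reward = value
```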
Let's go ahead and load that up. Then, finally, we want to create a human player, and the human player is going to be a little different in that you choose an action as a row and column; that's your action. If the action is in positions, meaning positions that are available, you return the action; if not, it just keeps asking until you give an action that actually works. Then we would append to the hash state, which we don't need to worry about here because it returns the action up above, and the feed-forward step; again, this is because it's a human. At the end of the game, the backpropagate-and-update-state-values part isn't being done, because we're not training the human; the model is getting its own rewards. So we've gone ahead and loaded this in here. Here are all our pieces, and the first thing we want to do is set up P1, player one, and P2, player two,
and then we're going to send our player players to our state so now it has P1 P2 and it's going to play and it's going to play 50,000 rounds now we can probably do a lot less than this and it's not going to get the full results in fact you know what uh let's go ahead and just do five um just to play with it because I want to show you something here oops somewhere in there I forgot to load something there we go I must have start forgot to run this run oops forgot a
reference there for the board rows and columns 3x3 U there is actually in the state it references that we just tack it on on the end it was supposed to be at the beginning uh so now I've only set this up with um see where are we going here I've only set this up to train five times and the reason I did that is we're going to uh come in and actually play it and then I'm going to change that and we can see how it differs on there there we go and I didn't you
make it through a run and we're going to go ahead and save the policy um so now we have our player one and our player two policy uh the way we set it up it has two separate policies loaded up in there and then we're going to come in here and we're going to do uh player one is going to be the computer experience rate zero load policy one human player human and we're going to go ahead and play this I remember I only went through it um uh just one round of training in fact
minimal training. So it puts an X there, and I'm going to go ahead and do row zero, column one; you can see this is very basic. I put in my zero, and then I'm going to go zero, zero to block it, and you can see right here it let me win; just like that I was able to win with zero, two, and the human wins. I only trained it five times. We're going to run this again, and this time, instead of five, let's do 5,000 or 50,000; I think that's what the guys in the back had, and this takes a while to train. This is where reinforcement learning really falls apart: look how simple this game is, we're talking about a 3x3 set of columns, and to train it on this I could instead build a quick Q-table with almost all the different options on there, which would go much quicker, and you would probably get the same result much faster. We're just using this as an example. So when
we look at reinforcement learning you need to be very careful what you apply it to it sounds like a good deal until you do like a large neural network where you're doing um you set the neural network to a learning increment of one so every time it goes through it learns and then you do your action so you pick from the learning uh setup and you actually try actions on the learning setup until you get the what you think is going to be the best action so you actually feed what you think is right back
through the neural network there's a whole layer there which is really fun to play with and then it has an output well think of all those processes I mean and that is just a huge amount of work it's going to do uh let's go ahead and Skip ahead here give it a moment it's going to take a a minute or two to go ahead and run now to train it uh we went ahead and let it run and it took a while this this took um I got a pretty powerful processor and it took about
five minutes plus to run it and we'll go ahead and uh run our player setup on here oops I brought in the last whoops I brought in the last round so give me just a moment to redo the policy save there we go I forgot to save the policy back in there and then go ahead and run our player again so we we've saved the policy and then we want to go ahead and load the policy for P1 as a computer and we can see the computer's gone in the bottom right corner I'm going to
go ahead and go 1, 1, which is the center, and it's gone right up to the top. If you have ever played tic-tac-toe, you know the computer has me, but we'll go ahead and play it out: row zero, column two, there it is, and then it's gone here, so I'm going to go row 0, column 1; no, row 0, there we go, and column zero, that's where I wanted. Oh, and it asks for my action again; there we go, boom. So you can see here it didn't catch the win on this one and said tie, which is kind of funny, but if we play this a bunch of times you'll find it's going to win more and more; the more we train it, the more the reinforcement happens. This lengthy training process is really the stopper on reinforcement learning. As this changes, reinforcement learning will be one of the more powerful packages evolving over the next decade or two; in fact, I would even go as far as to say it is the most important machine learning and artificial intelligence tool out there, as it learns not only a simple tic-tac-toe board but whole environments. The environment could be language: if you're translating something from one language to another, so much is lost if you don't know the context it's in, what environment it's in, and being able to attach environment and context and all those things together is going to require reinforcement learning. Again, if you want a copy of the tic-tac-toe code it's kind of fun to play with: run it, test it out with different values, switch P1 from loading policy 1 to loading policy 2, and just see how it varies; there are all kinds of things you can do with it. So what is Q-learning? Q-learning is a reinforcement learning policy that will find the next best action given a current state; it chooses this action at random and aims to maximize the reward. And so you can see
here's our standard reinforcement learning graph um by now if you're doing any reinforcement learning you should be familiar with this where you have your agent your agent takes an action the action affects the environment and then the environment sends back the reward or the feedback and the state the new state the agents in where's it at on the chessboard where's it at in the video game um if your robots out there picking trash up off the side of the road where is it at on the road consider an ad recommendation system usually when you look
up a product online you get ads which will suggest the same product over and over again using Q learning we can make an ad recommendation system which will suggest related products to our previous purchase the reward will be if user clicks on the suggested product and again you can see um you might have a lot of products on uh your web advertisement or your pages but it's still not a float number it's still a set number and that's something to be aware of when you're using Q learning and you can see here that if you
have a 100 people clicking on ads and you click on one of the ads it might go in there and say okay this person clicked on this ad what is the best set of ads based on clicking on this ad or these two ads afterwards based on where they are browsing so let's go and take a look at some important terms when we talk about Q learning uh we have States the state as represents the current position of an agent in an environment um the action the action a is the step taken by the agent
when it is in a particular state. Rewards: for every action, the agent will get a positive or negative reward. And again, when we talk about states with a Q-table, you're not usually talking about float variables, you're talking about discrete values like true or false; we'll take a closer look at that in a second. Episodes: when an agent ends up in a terminating state and can't take a new action; this might be, if you're playing a video game, that your character stepped somewhere and is now dead, or whatever.
Q-values are used to determine how good an action a, taken at a particular state s, is: Q(s, a). And temporal difference is a formula used to find the Q-value by using the value of the current state and action and the previous state and action. Then there's the Bellman equation, which is basically the equation that covers what we just looked at in all those different terms: the Bellman equation is used to determine the value of a particular state and deduce how good it is to be in and take that state; the optimal state will give us the highest optimal value. Factors influencing Q-values: the current state and action, that's your (s, a); then you have your previous state and action, referenced as (s', a'), so that's what happened before; then you have a reward for the action, your r; and you have your maximum expected future reward. You can see there's also a learning rate put in there and a discount rate. So we're looking at these just like any other model: we don't want an absolute final value here, because if you use absolute values instead of taking smaller steps, you don't really have that gradual approach to the solution; you just have it jump, and pretty soon whichever solution jumps up really high first becomes the new solution, kind of ruining the whole idea of doing a random selection. I'll go into the random selection in just a second.
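Putting those pieces together, the temporal-difference form of the update being described is usually written like this, where α is the learning rate and γ the discount rate:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]
```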
Steps in Q-learning: step one, create an initial Q-table with all values initialized to zero. Again, we're looking at rows for the states, such as start, idle, wrong action, correct action, and end, and columns for the actions, fetching, sitting, and running; of course we're just using the dog example. Then choose an action, perform it, and update the values in the table. When we're choosing an action, we're going to do something random and just randomly pick one. So you start out, and the dog sits; then, depending on that action you took, you can now update the value for sitting. After you go from start to sitting, get the value of the reward and calculate the Q-value using the Bellman equation, and so now we attach a reward to sitting. When we attach all those rewards, we continue the same way until the table is filled or an episode ends. I'm going to come back to the random side of this: there are a few different formulas they use for the random selection, and I usually let whatever Q-learning model I'm using do its standard one, because someone has usually done the math for the optimal spread. But you can look at it this way: say running has a reward of 10, sitting has a reward of 7, and fetching has a reward of 5. Even without doing something like a bell-curve weighting around the mean values, and like I said there's some math you can put in there so that running has an even higher chance, you could do a simple proportional random pick by adding them all together: 10 plus 7 plus 5 is 22, so you could draw a random number from 1 to 22, where 1 to 5 would be fetching, and so forth, with the last 10 being running. So you can look at this as what percentage of the time you're going to go for that particular option, and that gives you your random setup. Then, as you slowly increment these values, you see that if you're idle, sitting at the end gets a reward of one, and running as a wrong action gets almost no reward, so that becomes much less likely to happen; but it still might happen, it still has some percentage chance of coming up, and that's where the randomness in Q-learning programming comes in.
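That proportional pick is easy to sketch; the numbers below are just the 10 / 7 / 5 example values from above.

```python
import numpy as np

actions = ['running', 'sitting', 'fetching']
values = np.array([10.0, 7.0, 5.0])
probs = values / values.sum()          # 10/22, 7/22, 5/22

# Each draw favours 'running' but still occasionally returns 'fetching'.
picks = np.random.choice(actions, size=10, p=probs)
print(picks)
```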
The table below gives us an idea of how many times an action has been taken and how positively a correct action, or negatively a wrong action, is going to affect the next state. So let's go ahead and dive in, pull up a little piece of code, and see what this looks like in Python. In this demo we will use Q-learning to find the shortest path between two given points. If getting your learning started is half the battle, what if you could do that
for free? Visit SkillUp by Simplilearn; click on the link in the description to know more. If you've seen my videos before, I like to do this in the Anaconda Jupyter Notebook setup, just because it's really easy to see and it makes a nice demo. So here's my Anaconda; this time I'm actually using a Python 3.6 environment that I set up in here, and we'll go ahead and launch the Jupyter Notebook. Once in our Jupyter Notebook, which has the kernel loaded with Python 3, we'll go ahead and
create a new Python 3 notebook in here, and we'll call it Q-learning. To start this demo, let's go ahead and import numpy; we'll just run that so it's imported. Like a lot of these model programs, when you're building them you spend a lot of time putting it all together and then end up with a really short answer at the end, and we'll take a look at that as we get to it. So we start with our location-to-state mapping: we have L1, L2, and so on, our nine locations, one to nine, and the states are 0, 1, 2, 3, 4 and so on; it's just a mapping of each location to an integer. Then we have our actions, and the actions are simply moving from one location to another: I can go to location 0, or to location 1, 2, 3, 4, 5, 6, 7, or 8. These are the actions I can choose, and they correspond to the locations of our states.
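That mapping is just two small structures; here is a sketch of what the demo is setting up (the exact variable names in the file may differ).

```python
# Nine named locations L1..L9, each tied to an integer state 0..8,
# and the actions are simply "move to state 0..8".
location_to_state = {f'L{i + 1}': i for i in range(9)}
actions = list(range(9))

state_to_location = {state: loc for loc, state in location_to_state.items()}
print(location_to_state['L9'], state_to_location[0])   # 8 L1
```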
If you remember, earlier I mentioned that the limitation is that you don't want a continually growing table. You can actually create a dynamic Q-table where you continually add new values as they arise, but if you have float values this just becomes infinite, and then the memory in your computer is gone, or it's simply not going to work. At the same time, you might think, well, that really limits the Q-learning setup, but there are ways to use it
in conjunction with other systems and so you might look at uh well I do um been doing some work in stock um and one of the questions that comes out is to buy or sell the stock and the stake coming in might be um you might take it and create what we call buckets um where anything that you predict is going to return more than a certain amount of money um the error for that stock that you've had in the past you put those in buckets and suddenly you start putting the creating these buckets you
realize you do have a limited amount of information coming in you no longer have a float number you know have um bucket one two three and four and then you can take those buckets put them through a q learning table and come up with the best action which stock should I buy it's like gambling stock is pretty much gambling if you're doing day trading you're not doing long-term um Investments and so you can start looking at it like that a lot of the um current feed say that the best algorithms used for day Traders where
you're doing it on your own is really to ask the question do I want to trade the stock yes or no and now you have it in a q learning table and now you can take it to that next level and you can see where that can be a really powerful tool at the end of doing a basic linear regression model or something um what is the best investment and you start getting the best reward on there uh and so if we're going to have rewards these rewards we just create um it says uh if
basically if you're um this should match our Q table because it's going to be uh you have your state and you have your action across the top if you remember from the dog and so we have whatever state we're in going down and then the next action and what the reward is for it um and of course if you were actually doing a um something more connected your reward would be based on U the actual environment it's in and then we want to go ahead and create a state to location uh so we can map
the indexes. So just like we defined our rewards, we're going to do state-to-location, and you can see here it's a dictionary built by inverting the location-to-state items. We also need to define what we want for the learning rates; remember we had our two different rates, as far as learning from the past versus learning from the current step, so we'll go ahead and set the gamma to 0.75 and the alpha to 0.9, and we'll see those when we do the
formula and of course any of this code uh send a note to our simply learn team they'll get you a copy of this code on here let's go ahead and pull there we go the new next two sections um since we're going to keep it short and sweet here we go so let's go ahead and create our agent um so our agent is going to have our initialization where we send it all the information uh we'll Define our self gamma equals gamma we could have just set the gamma rate down here instead of uh submitting
it it's kind of nice to keep them separate because you can play with these numbers uh our self Alpha um then we have our location State we'll set that in here um we have our choice of actions um we're going to go ahead and just embed the rewards right into the agent so obviously this would be coming from somewhere else uh instead of from uh self-generated and then a self state to location equals our state to location uh dictionary and we go ahead and create a q learning table and I went ahead and just set
the Q learning table up to um uh 0o to zero what what what the setup is uh location to State how many of them are there uh this just creates an array of 0 to zero setup on there and then the big part is the training we have our rewards new equals a copy of self. rewards ending State equals the self location state in location so this is whatever we end up at rewards new equals ending State plus ending State equals 999 just kind of goes to a dead end and we start going through iterations
and we'll go ahead um let's do this uh so this we're going to come back and we're going to call call it on here uh let me just erase that switch it to an arrow there we go uh so what we're doing is we're going to send in here to train it we're going to say hey um I want to iterate through this a thousand times and see what happens now this part would actually be instead of iterating you might have your external environment and they're going back and forth and you iterate through outside of
here uh but just for ease of use our agent's going to come in here and iterate through this sometimes I'll put this iteration in here and I'll have it call the environment and say hey this is what I did what's the next state and the environment does its thing right in here as I iterate through it uh and then we want to go ahead and pick a random state to start with that's what's going on here you have to start somewhere um and then you have your playable actions we're going to start with just an
empty thing for playable actions and we'll fill that up so that's what choices I have and so we're going to iterate through the rewards Matrix to get the states uh directly reachable from the randomly chosen current state assign those states to a list named playable actions and so you can see here we have uh range nine I usually use length of whatever I'm looking at uh which is our locations or States as they are uh we have a reward so we want to look at the current the rewards uh the new reward is our uh
is in our chart here of rewards underscore new uh current state um plus J uh J being what is the next state we want to try and so we go aad and do our playable actions and we append J and so we're doing is we're randomly trying different things in here to see what's going to generate a better reward and then of course we go ahead and choose our next date uh so we have our random Choice playable actions and if you remember I mentioned on this let me just go ahead and uh oops let's
do a free form when we were talking about the next State uh this right here just does a random selection instead of a random uh selection you might do something where uh whatever the best selection is which might be option three here and then so you can see that it might use a bell curve and then option two over here might have a bell curve like this oops and we start looking at these averages and these spreads um or we can just add them all together and pick the one that kind of goes in all
of those uh so those are some of the options we have in here we just go with a random Choice uh that's usually where you start play with it um and then we have our reward section down here and so we want to go ahead and find well in this case a temporal difference uh so you have your rewards new plus the self gamma and this is the formula we were looking at this is bellman's equation here uh so we have our current value our learning rate our discount rate involved in there the reward system
coming in for that, and we can add it all together; this is of course our maximum expected future reward piece, so this is all of Bellman's equation that we're looking at here. Then we come up here and update our Q-table; that's all this is right here: we have self.Q[current_state, next_state], and we multiply by our alpha because we don't want to train all of it at once in case there are slight differences coming in; we want to slowly approach the answer. Then we have our route equal to the start location and next_location equal to the start location, so we're just incrementing; we took a step forward. And then, finally, remember I was telling you how we're going to do all this and just have some simple thing at the end that generates a simple path: we're going to get the optimal route, we want to find the best route in here, and so we've created a definition for the optimal route down here; scroll
down for that. In get_optimal_route we pass the information in, including the Q-table, the start location, the end location, the next location, the route, and Q, and it says: while the next location is not equal to the end location, so while we can still go, our start state equals self.location_to_state of the start location. We already have our best value for the start location; the next state looks at the Q-table and says, hey, which next state has the best value, and then we pull that next location in and just append it; that's what's going on down here. Then the start location becomes the next location, and we just go through all the steps. We'll go ahead and run this, and now that we have our Q-agent loaded, we're going to take our Q-agent and load it up with the alpha and gamma we set up above, along with the locations, actions, rewards, and state-to-location mapping, and our goal is to plot a course between L9 and L1.
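For reference, here is a compact, self-contained sketch of the whole routine just walked through. The reward matrix below is an example of the "which location connects to which" layout and may differ from the one in the downloadable file; the training loop and the greedy walk follow the same Bellman update described above.

```python
import numpy as np

location_to_state = {f'L{i + 1}': i for i in range(9)}
state_to_location = {s: loc for loc, s in location_to_state.items()}

# 1 marks a direct connection between two locations, 0 means you can't go there.
rewards = np.array([[0,1,0,0,0,0,0,0,0],
                    [1,0,1,0,1,0,0,0,0],
                    [0,1,0,0,0,1,0,0,0],
                    [0,0,0,0,0,0,1,0,0],
                    [0,1,0,0,0,0,0,1,0],
                    [0,0,1,0,0,0,0,0,0],
                    [0,0,0,1,0,0,0,1,0],
                    [0,0,0,0,1,0,1,0,1],
                    [0,0,0,0,0,0,0,1,0]])

gamma, alpha = 0.75, 0.9      # discount rate and learning rate

def get_optimal_route(start, end, iterations=1000):
    end_state = location_to_state[end]
    rewards_new = rewards.copy()
    rewards_new[end_state, end_state] = 999      # big payoff for reaching the goal
    Q = np.zeros((9, 9))

    # Training: random state, random legal move, Bellman / temporal-difference update.
    for _ in range(iterations):
        current = np.random.randint(0, 9)
        playable = [j for j in range(9) if rewards_new[current, j] > 0]
        nxt = int(np.random.choice(playable))
        td = rewards_new[current, nxt] + gamma * Q[nxt].max() - Q[current, nxt]
        Q[current, nxt] += alpha * td

    # Walk the learned Q-table greedily from start to end.
    route, state = [start], location_to_state[start]
    for _ in range(20):                          # safety cap on path length
        if state == end_state:
            break
        state = int(np.argmax(Q[state]))
        route.append(state_to_location[state])
    return route

print(get_optimal_route('L9', 'L1'))   # e.g. ['L9', 'L8', 'L5', 'L2', 'L1']
```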
We're going to go through a thousand iterations on here, and when I run that it runs pretty quick. Why is this so fast? If you've been running neural networks and all these other models, you sit here and wait a long time. Well, this is a very small amount of data, these are all integers rather than float values, and the math is not heavy on the processing end. This is where Q-tables are so powerful: if you have a small amount of information coming in,
you very quickly uh get an answer off of this even though we went through it a thousand times to train it and you'll see here we have L9 85 2 and 1 and that's based on our reward table we had set up on there and this is the shortest path going between these different uh setups in here and if you remember on our reward table uh you can see that if you start here you can go to here there's places you can't go that's how this reward table was set up so I can only go
to certain places uh so kind of a little maze setup in there and you can play with it this is really fun uh setup to play with uh and you can see how you can take this whole code and you can like I was saying earlier you can embed it into another setup in model and predictions where you put things into buckets and you're trying to guess the best investment the best course of action long as you can take that course of action and and uh uh reduce it down to a yes no um or
if you're using text you can use a one hot encoder which word is next there's all kinds of things you can do with a Q table uh depending on just how much information you're putting in there so that wraps up our demo in this demo we've uh found the shortest distance between two paths based on whatever rules or state rewards we have to get from point A to point B and what available actions there are hello and welcome to this tutorial on deep learning my name is moan and in the next about 1 one and
Now, there are several really interesting and innovative applications of deep learning, and one of them is identifying the geographic location of a picture. How does this work? We train an artificial neural network with millions of images that are tagged with their geolocation, and then when we feed in a new picture it is able to identify the geolocation of that new image. For example, you have all these images, especially ones with significant monuments or locations, and you train with millions of such images; then when you feed in another image, it may not be exactly one of those you trained on, it can be completely different, and that is the whole idea of training: it will still be able to recognize, for example, that this is a picture from Paris because it recognizes the Eiffel Tower. The way it works internally, if we look a little bit under the hood, is that these images are nothing but digital information in the form of pixels. Each image has a certain size, say a 256 x 256 pixel resolution, and each pixel has a certain grade of color. All of that is fed into the neural network, which gets trained on this pixel information, learns to recognize and extract the features, and is thereby able to identify these images and their locations. Then, when you feed in a new image, based on that training it can figure out where the image is from. So that's a little bit of how it works under the hood. What are we going to do in this tutorial? We will see what deep learning is and what we need for deep learning; since one of the main components of deep learning is the neural network, we will see what a neural network is, what a perceptron is, and how to implement logic gates like AND, OR, NAND, and so on using perceptrons.
We'll also look at the different types of neural networks, the applications of deep learning, and how neural networks are trained, and at the end we will finish with a small demo coded in TensorFlow. In order to implement deep learning code there are multiple libraries and development environments available, and TensorFlow is one of them, so the focus at the end will be on how to use TensorFlow to write a piece of code using Python as the programming language. We will take up a very common example, the hello world of deep learning: handwritten digit recognition with the MNIST database. We'll take a look at the MNIST database and how we can train a neural network to recognize handwritten numbers. That's what you will see in this particular video, so let's get started. What is deep learning?
Deep learning is a subset of a high-level concept called artificial intelligence. You must already be familiar with the term artificial intelligence; AI is the high-level concept, if you will, and in order to implement artificial intelligence applications we use what is known as machine learning. Within machine learning, a subset of machine learning is deep learning: machine learning is the more generic concept and deep learning is one type of machine learning. We will see a little later, in the following slides, in a bit more detail how deep learning differs from traditional machine learning, but to start with, one of the differentiators is that deep learning uses neural networks, and we will talk about what neural networks are and how to implement them as part of this tutorial. Going a little deeper: deep learning primarily involves working with complicated, unstructured data, compared to traditional machine learning where we normally use structured data. In deep learning the data would primarily be images, voice, or maybe text files; it is a large amount of data, and deep learning can handle complex operations. The other difference is that feature extraction happens pretty much automatically in deep learning, whereas in traditional machine learning feature engineering is done manually: we data scientists have to do the feature extraction ourselves. And for large amounts of complicated, unstructured data, deep learning gives very good performance.
As I mentioned, one of the secret sauces of deep learning is neural networks, so let's see what a neural network is. Neural networks are based on our biological neurons; the whole concept of deep learning and artificial intelligence is inspired by the human brain, which consists of billions of tiny cells called neurons. A neural network is like a simulation of the human brain: the brain has billions of biological neurons, and we try to simulate it using artificial neurons. A biological neuron has dendrites, and the corresponding component in an artificial neuron is its inputs; just as the biological neuron receives signals through its dendrites, the artificial neuron receives its inputs. Then there is the cell nucleus, which is basically the processing unit, and the artificial neuron has an equivalent piece: based on the weights and biases (we will see exactly what weights and biases are as we move on), the input gets processed and results in an output. In a biological neuron the output is sent through a synapse, and an artificial neuron has an equivalent of that in the form of its output. Biological neurons are also interconnected, billions of them, and in the same way artificial neurons are interconnected, so the output of one neuron is fed as an input to another neuron, and so on. Now, one of the very basic units of a neural network is the perceptron. What is a perceptron? A perceptron can be considered one of the fundamental units of neural networks; it consists of at least one neuron.
Sometimes it can be more than one neuron, but you can create a perceptron with a single neuron, and it can be used to perform certain functions: it can be trained as a basic binary classifier. This is how a basic perceptron looks, and it is nothing but a neuron: you have inputs x1, x2, up to xn, there is a summation function, and then there is what is known as an activation function. Based on the weighted sum of the inputs, the activation function gives an output of either a zero or a one, so we say the neuron is either activated or not. That's the way it works: you get the inputs, each input is multiplied by a weight, a bias is added, and that whole thing is fed to an activation function, which results in an output. If the output is correct it is accepted; if it is wrong, if there is an error, that error is fed back.
The neuron then adjusts its weights and biases to give a new output, and so on; that is what is known as the training process of a neuron or a neural network. There is a concept called perceptron learning, which is one of the most basic learning processes, and it works somewhat like this. You have inputs x1 to xn, and each input is multiplied by a weight; the formula is the sum of those products, Σ wᵢxᵢ, and then a bias is added to it. The bias does not depend on the input values; it is common to the one neuron. The bias value does keep changing during the training process, and once training is complete the values of the weights w1, w2, and so on, and the value of the bias, get fixed. That is basically the whole training process, and that is what is known as perceptron training: the weights and biases keep changing until you get an accurate output. The summation is of course passed through the activation function, that is, Σ wᵢxᵢ + b is fed to the activation function, and the neuron either fires or not, and based on that there is an output. That output is compared with the actual or expected value, also known as the labeled information; this is the process of supervised learning, where the correct output is already known. From the comparison we know whether there is an error, and if there is, the error is fed back and the weights and biases are updated accordingly until the error is reduced to the minimum. This iterative process is known as perceptron learning, or the perceptron learning rule: the weights and biases keep changing iteratively until the error is minimized. The whole idea is to update the weights and the bias of the perceptron until the error is minimized; the error need not be zero, and it may never actually reach zero.
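A minimal sketch of that perceptron learning rule is shown below, using an assumed toy data set (the AND truth table), an assumed learning rate, and a fixed number of epochs:

```python
import numpy as np

# Minimal sketch of the perceptron learning rule described above.
def step(z):
    return 1 if z >= 0 else 0                    # threshold activation: fire or not

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # toy inputs x1, x2
y = np.array([0, 0, 0, 1])                       # labelled outputs (here, the AND function)

rng = np.random.default_rng(0)
w = rng.normal(size=2)       # one weight per input, starting from random values
b = rng.normal()             # one bias for the neuron
lr = 0.1                     # learning rate

for epoch in range(25):                          # iterate until the error is small
    for xi, target in zip(X, y):
        output = step(np.dot(w, xi) + b)         # weighted sum plus bias, then activation
        error = target - output                  # compare with the labelled output
        w = w + lr * error * xi                  # adjust the weights when there is an error
        b = b + lr * error                       # and adjust the bias as well

print(w, b)   # once training settles, these values are fixed for the neuron
```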
The idea is to keep changing the weights and bias so that the error is the minimum possible, so this iterative process continues until either the error is zero, which is an unlikely situation, or it is the minimum possible within the given conditions. Now, in 1943 two scientists, Warren McCulloch and Walter Pitts, came up with an experiment in which they were able to implement logical functions like AND, OR, and NOR using neurons, and that was a significant breakthrough.
They were able to implement some of the most common logical gates, which take two inputs, A and B, and give a corresponding result: for an AND gate the output is A·B, for an OR gate it is A + B, and so on, and they were able to do this using a single-layer perceptron. Most of these gates could be implemented with a single-layer perceptron, except for XOR, and we will see why that is in a little bit. This is how an AND gate works: with inputs A and B, the neuron should fire only when both inputs are one, so for 0 0 the output should be zero, for 0 1 it is again zero, for 1 0 again zero, and for 1 1 the output should be one. How do we implement this with a neuron? It was found that by changing the values of the weights it is possible to achieve this logic. For example, if we give both inputs equal weights of 0.7 and take the weighted sum, then 0.7 times 0 plus 0.7 times 0 gives zero, and so on, and only in the last case, when both inputs are one, do we get a value (1.4) that crosses the threshold of one. So only in that case does the neuron activate and produce an output; in all the other cases there is no output, because the threshold value is one. That is the implementation of an AND gate using a single perceptron, or a single neuron. Similarly, for an OR gate the output should be one if either of the inputs is one, so for example 0 1 results in one, and in all cases the output is one except for 0 0. How do we implement this using a perceptron? Once again, take a perceptron with weights of, say, 1.2. When both inputs are zero the weighted sum is zero; when the inputs are 0 and 1, 1.2 times 0 is 0 and 1.2 times 1 is 1.2; the third case similarly gives 1.2; and in the last case, when both inputs are one, the sum is 2.4. During training these weights keep changing, and at the point where w1 equals 1.2 and w2 equals 1.2 the system learns to give the correct output. That is the implementation of an OR gate using a single neuron, or a single-layer perceptron.
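Here is a small sketch of those two gates with the fixed weights from the slides (0.7 for AND, 1.2 for OR) and a threshold of 1; the helper names are just for illustration:

```python
# Sketch of the AND and OR gates built from a single threshold neuron.
def fires(weighted_sum, threshold=1.0):
    return 1 if weighted_sum >= threshold else 0

def and_gate(a, b):
    return fires(0.7 * a + 0.7 * b)    # only 0.7 + 0.7 = 1.4 crosses the threshold

def or_gate(a, b):
    return fires(1.2 * a + 1.2 * b)    # any single input already gives 1.2 >= 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```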
Now, the XOR gate was one of the challenging ones. They tried to implement an XOR gate with a single-layer perceptron, but it was not possible, and for a while this was a roadblock in the progress of neural networks. Subsequently it was realized that an XOR gate can be implemented using a multi-layer perceptron, or MLP; in this case there are two layers instead of a single layer. This is how you can implement an XOR gate: x1 and x2 are the inputs, there is a hidden layer, which is why the hidden units are denoted H3 and H4, and then you take their outputs and feed them to the output unit O5 with a threshold. In the numerical calculation on the slide, the weights from x1 are 20 and -20, and likewise 20 and -20 from x2, and these inputs are fed into H3 and H4. Across the four input combinations, H3 comes out as 0 1 1 1 and H4 as 1 1 1 0, and if you look at the final output, using a sigmoid with a threshold of one, you will see that when the two inputs are equal the output is zero, and when exactly one of the inputs is one the output is one.
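A sketch of that two-layer XOR network is below. The plus and minus 20 weights come from the slide; the bias values (-10, 30, -30) are assumed here so that H3 behaves like an OR, H4 like a NAND, and the output like an AND of the two:

```python
import math

# Sketch of XOR built from a two-layer (multi-layer) perceptron.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_gate(x1, x2):
    h3 = sigmoid(20 * x1 + 20 * x2 - 10)      # roughly OR of the inputs   -> 0 1 1 1
    h4 = sigmoid(-20 * x1 - 20 * x2 + 30)     # roughly NAND of the inputs -> 1 1 1 0
    out = sigmoid(20 * h3 + 20 * h4 - 30)     # roughly AND of H3 and H4   -> 0 1 1 0
    return 1 if out >= 0.5 else 0             # threshold the sigmoid output

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "XOR:", xor_gate(x1, x2))
```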
That is what an exclusive OR gate is: it is "exclusive" because only when exactly one of the inputs is one do you get an output of one, which is what this arrangement satisfies, so the XOR gate is a special implementation built from perceptrons. Now that we have a good idea about perceptrons, let's take a look at what a neural network is. We have seen what a perceptron is and what a neuron is, so now we'll see what exactly a neural network is.
A neural network is nothing but a network of these neurons, and there are different types of neural networks, about five of them: the artificial neural network, the convolutional neural network, the recurrent neural network, the deep neural network, and the deep belief network. Each of these types is suited to a particular kind of problem: for example, convolutional neural networks are very good at image processing and image recognition, whereas RNNs are very good for speech recognition and text analysis. So each type has some special characteristics and is good at certain kinds of tasks. What are some of the applications of deep learning? Deep learning is used extensively in gaming today. You must have heard about AlphaGo, an AI created by a startup called DeepMind, which was acquired by Google; AlphaGo defeated the human world champion Lee Sedol at the game of Go. So gaming is an area where deep learning is being used extensively, and a lot of research happens in gaming
as well. In addition, there is nowadays a special type of neural network called the generative adversarial network, which can be used to synthesize images, music, or text; for example, a network can be trained to compose a certain kind of music. Then there are autonomous cars: you must be familiar with Google's self-driving car, and today a lot of automotive companies are investing in this space; deep learning is a core component of autonomous cars. The cars are trained to recognize the road, the lane markings on the road, signals, any objects in front, any obstructions, and so on, and all of this involves deep learning, so that's another major application. Then there are robots; we have seen several, including Sophia, who you may be familiar with, who was given citizenship by Saudi Arabia, and there are several such very humanlike robots whose underlying technology is, in many cases, deep learning. Medical diagnostics and healthcare is another major area where deep learning is being used, and within healthcare diagnostics there are multiple areas where deep learning, image recognition, and image processing can be applied, for example cancer detection. As you may be aware, if cancer is detected early it can be cured, and one of the challenges is the availability of specialists who can diagnose cancer from diagnostic images and various scans. The idea is to train neural networks to perform some of these activities so that the load on the cancer specialists, the oncologists, comes down. There is a lot of research happening here, and there are already quite a few applications that are claimed to perform better than human beings in this space, whether for lung cancer, breast cancer, and so on, so healthcare is a major area where deep learning is being applied. Now let's take a look at the inner workings of a neural network. Can we train a neural network to identify shapes like squares, circles, and triangles when images of them are fed in? This is how it works.
Any image is nothing but digital information in the form of pixels. In this particular case, let's say we have a 28 by 28 pixel image of a square; there is a certain way in which its pixels are lit up, and each pixel has a value, say from 0 to 255, where 0 indicates that it is black, or dark, and 255 indicates that it is completely white, or lit up. That is a measure of how each pixel is lit. So this image consists of the information in 784 pixels; everything inside the image can be compressed into those 784 values, and the way each pixel is lit up tells us what the image is. We can train neural networks to use that information and identify the images. Let's take a look at how this works: for each input, a value close to one means the pixel is white, whereas a value close to zero means it is black. This is an animation of how the whole thing works. One way of doing it is to flatten the image, take the complete 784 pixels, and feed them as input to our neural network. The network can consist of several layers: an input layer, possibly a few hidden layers, and an output layer. The input layer takes the values of the 784 pixels as input, and you get an output that falls into one of three classes: a square, a circle, or a triangle.
During the training process, initially, when you feed in this image, the network will probably say it's a circle or a triangle; as part of training we send that error back, and the weights and biases of the neurons are adjusted until it correctly identifies that this is a square. That is the whole training mechanism. Now let's take a look at a circle: in the same way, you feed in the 784 pixels, there is a certain pattern in which they are lit up, and the neural network is trained to identify that pattern. During training it will once again probably misidentify it at first as a square or a triangle, the error is fed back, and the weights and biases are adjusted until it finally gets the image right. The same goes for a triangle: you feed in another image, this time of triangles, and that is the training process. Now we have trained our neural network to classify these images as a triangle, a circle, or a square, so it can identify these three types of objects, and if you feed in a new image it will be able to tell whether it's a square, a triangle, or a circle. What is important to observe is that when you feed in a new image, the triangle does not have to be in exactly the same position: the neural network identifies patterns, so even if the triangle is, say, positioned in a corner or off to the side rather than in the middle, it will still identify it as a triangle. That is the whole idea behind pattern recognition. So how does this training process work? Here is a quick view. We have seen that a neuron receives inputs, forms the weighted sum, Σ xᵢwᵢ plus the bias, and that sum is then fed to the activation function.
That in turn gives us an output. During the training process, initially, when you feed in these images, maybe you send a square and it identifies it as a triangle, or you feed a triangle and it identifies it as a square, and that error information is fed back. Initially the weights can be random, maybe even all zeros, and then they slowly keep changing; as part of the training process the values of the weights w1, w2, up to wn change in such a way that toward the end of training the network identifies the images correctly. Until then the weights keep being adjusted, and that is known as the training process. These weights are just numeric values, for example 0.5 or 0.25, and they can be positive or negative, while the value coming in on each input is the pixel value, which as we have seen can be anything between 0 and 1; you can scale it between 0 and 1, or keep it between 0 and 255, zero being black, 255 being white, and all the other shades in between. So the inputs are numerical values, the product wᵢxᵢ is a numerical value, and the bias is also a numerical value. Keep in mind that the bias is fixed per neuron; it does not change with the inputs, whereas there is one weight per input. That is an important point to note. The bias also keeps changing during training: initially it will have a random value, but as part of the training process the values of the weights w1, w2, up to wn, and the value of b, will change, and once the training process is complete those values are fixed for that particular neuron. There will be multiple such neurons, possibly in multiple layers, and that's the way the training process works. Here is another example, this time with multiple layers: there are two hidden layers between the input layer and the output layer.
Values come in from the input layer, pass through the hidden layers, and reach the output layer, and as you can see there are weights and biases for each neuron in each layer, all of which keep changing during training. At the end of the training process all of these weights have settled on certain values; that is the trained model, and those values are fixed once training is completed. Then there is something known as the activation function, one of the key components of a neural network. Every neuron has an activation function, and there are different types in use: it could be a ReLU, a sigmoid, and so on. The activation function is what decides whether a neuron should fire or not, that is, whether the output should be a zero or a one; it takes as its input the weighted sum we discussed, Σ wᵢxᵢ + b, and produces the neuron's output.
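For reference, here is a small sketch of a few common activation functions applied to some example weighted sums; the input values are made up:

```python
import numpy as np

# Common activation functions applied to an example weighted sum z = w.x + b.
def step(z):
    return np.where(z >= 0, 1, 0)       # hard threshold: fire or don't fire

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # squashes the weighted sum into (0, 1)

def relu(z):
    return np.maximum(0, z)             # passes positive values, zeroes out the rest

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # example weighted sums
print(step(z))
print(sigmoid(z))
print(relu(z))
```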
There are different types of activation functions, which are covered in an earlier video you might want to watch. Now, as part of the training process, we feed in the inputs, the labeled training data, and the network gives an output, the predicted output, which we denote ŷ (y hat). There is also the labeled data: since this is supervised learning we already know what the output should be, and that is the actual output. In the initial stages, before training is complete, there will obviously be error, and that is measured by what is known as a cost function. The difference between the predicted output and the actual output is the error, and the cost function can be defined in different ways; there are different types of cost functions. In this case it is the average of the squares of the errors, and when the squared errors are simply added up this is sometimes called the sum of squared errors, or SSE. That error is then fed back in what is known as backward propagation, or backpropagation, which helps the network adjust its weights and biases, and the weights and biases get updated until the error value, the cost function, is at its minimum. The optimization technique used here is called gradient descent optimization, and this algorithm works by minimizing the error, the cost function. There is a lot of mathematics behind it, for example finding the local minima and the global minimum using differentiation and so on.
But the idea is this: as part of training, the whole point is to bring the error down. At certain weight values the output of the cost function is very high, so the weights, and of course the bias, have to be adjusted in such a way that the cost function is minimized, and the optimization technique used for that is gradient descent. With gradient descent you need to specify the learning rate, and the learning rate should be chosen well: if it is very high the optimization will not converge, because at some point it will overshoot and cross over to the other side, and if it is very low it might take forever to converge. So you need to find an optimum value for the learning rate, and once that is done, gradient descent optimization reduces the error function, and that is essentially the end of the training process. Here is another view of gradient descent. This is the output of the cost function, which has to be minimized using the gradient descent algorithm, and these are the parameters, of which a weight could be one. Initially we start with random values, so the cost is high; then the weights keep changing in such a way that the cost function comes down, and at some point it reaches the minimum value and would start to increase again. That is where the gradient descent algorithm decides it has reached the minimum and tries to stay there; this is known as the global minimum. These curves have been drawn in a nice smooth way for explanation, but sometimes they can be pretty erratic: there can be a local minimum here, then a peak, and so on. The whole idea of gradient descent optimization is to identify the global minimum and to find the weights and the bias at that particular point. Here is another example with multiple local minima: as the cost is coming down it may appear that a certain point is the minimum value, but it is not; the actual global minimum is further along, and the gradient descent algorithm makes an effort to reach that level and not get stuck at the local minimum. The algorithm knows how to identify the global minimum, and that is what it does during the training process.
Now, in order to implement deep learning there are multiple platforms and languages available, but the most common platform nowadays is TensorFlow, and that is why we have created this tutorial around TensorFlow. We will take you through a quick demo of how to write TensorFlow code using Python; TensorFlow is an open-source platform created by Google. Let's take a look at the details of TensorFlow. It is a library, a Python library, although it is also supported in other languages like Java and R; Python is simply the most common language used with it. It is a library for developing deep learning applications, especially those using neural networks, and it consists of primarily two parts: the tensors, and the graphs or the flow, which is where the name TensorFlow comes from. So what are tensors? Tensors are like multi-dimensional arrays; that's one way of looking at them. First of all you can have what is known as a scalar, which is just a number; then you can have a one-dimensional array, which is like a set of numbers; then a two-dimensional array, which is like a matrix; and beyond that it sometimes gets harder to picture, but this is a three-dimensional array, and TensorFlow can handle many more dimensions than that, so it can work with multi-dimensional arrays.
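To make the idea of dimensionality concrete, here is a small sketch using NumPy arrays; the same rank idea carries over directly to TensorFlow tensors:

```python
import numpy as np

# Arrays of increasing rank (dimensionality); TensorFlow tensors follow the same idea.
scalar = np.array(5)                          # rank 0: just a number
vector = np.array([1, 2, 3])                  # rank 1: a one-dimensional array
matrix = np.array([[1, 2], [3, 4]])           # rank 2: a two-dimensional array (matrix)
cube   = np.zeros((2, 3, 4))                  # rank 3: a three-dimensional array

for t in (scalar, vector, matrix, cube):
    print(t.ndim, t.shape)                    # ndim corresponds to the rank
```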
That is the strength of TensorFlow, and it is what makes deep learning computation much faster, which is why TensorFlow is used for developing deep learning applications. TensorFlow is a deep learning tool, and this is the way it works: the data flows in the form of tensors, and the way the programming works is that you first create a graph describing how to execute the computation, and then you actually execute that graph in the form of what is known as a session; we will see this in the TensorFlow code as we move forward. So all the data is managed and manipulated as tensors, and the processing happens using these graphs. There are certain terms to know, for example the rank of a tensor, which is essentially its dimensionality: if it is a scalar, just a single number, the rank is zero; a one-dimensional vector has rank one; a two-dimensional array, typically a matrix, has rank two; a three-dimensional array has rank three; and so on, and it can go beyond three as well, so you can store multi-dimensional arrays in the form of tensors. What are some of the properties of TensorFlow? Today it is the most popular deep learning platform or library, and it is open source, developed and maintained by Google.
One of the most important things about TensorFlow is that it can run on CPUs as well as GPUs. A GPU is a graphics processing unit, just as a CPU is a central processing unit. In the early days GPUs were used primarily for graphics, which is how the name came about; a GPU cannot perform generic tasks as efficiently as a CPU, but it can perform iterative, repetitive computations extremely fast, much faster than a CPU, so GPUs are really good for heavy computation. In deep learning there is a lot of iterative computation, in the form of matrix multiplication and so on, so GPUs are very well suited to it, and TensorFlow supports both GPU and CPU. There is a certain way of writing code in TensorFlow, which we will see as we get into the code. TensorFlow can also be used for traditional machine learning, although that would be overkill; still, just for understanding, it may be a good idea to start by writing code for a normal machine learning use case so that you get a hang of how TensorFlow code works, and then move into neural networks.
That is just a suggestion; if you're already familiar with how TensorFlow works, you can go straight into the neural networks part. In this tutorial we will take the use case of recognizing handwritten digits, which is like the hello world of deep learning, and the MNIST database is a nice little database of handwritten digit images, nicely formatted. Very often in deep learning and neural networks we end up spending a lot of time preparing the data for training, and with the MNIST database we can avoid that: the data is already in the right format and can be used directly for training. MNIST support also offers a bunch of built-in utility functions that we can call straight away without writing our own, and that's one of the reasons the MNIST database is so popular for training purposes when people first learn deep learning and TensorFlow. It is a collection of 70,000 handwritten digits; a large part of them are for training, then there is a test set, just as in any machine learning process, and then a validation set, and all of them are labeled, so you have the images and their labels. The images look somewhat like this: they are handwriting samples collected from a lot of individuals, people who have written the digits 0 through 9, and the images of those digits have been captured and formatted so that they are very easy to handle.
That is the MNIST database. The way we are going to implement this in TensorFlow is to feed in this data, especially the training data along with the label information. The data is basically these images stored in the form of pixel information, as we saw in one of the previous slides: an image is nothing but an arrangement of pixels, and the value of each pixel says whether it is lit up, not lit, or somewhere in between. That is how the images are stored, and that is how they are fed into the neural network for training; once the network is trained, when you provide a new image it will be able to identify it, within a certain error of course. For this we will use one of the simpler neural network configurations, called softmax, and for simplicity we will flatten the pixels: instead of taking them in a two-dimensional arrangement, we just flatten them out. The image is 28 by 28, so there are 784 pixels: pixel number one starts in the corner and the first row runs up to pixel 28, the next row runs from 29 to 56, and so on, with pixel number 784 at the end. We take all these pixels, flatten them out, and feed them as one single line into our neural network, into what is known as a softmax layer; once it is trained, it will be able to identify which digit the image shows.
In this output layer there are 10 neurons, each signifying a digit, and at any given time, when you feed in an image, only one of these 10 neurons gets activated. For example, if the network is trained properly and you feed in a nine like this, the neuron for nine gets activated and you get an output from that neuron. Let me use the pen to show you here: you feed a nine to the trained network and this neuron gets activated; feed a one and that neuron gets activated; feed a two and this one gets activated; and so on. I hope you get the idea. This is one type of neural network output, an activation known as a softmax layer, and that is what we will be using here; it is one of the simpler ones, good for quick and easy understanding.
This is how the code would look. We will go into our cloud lab environment and show you there directly, but very quickly let me run you through it here first; then we will go into the Jupyter notebook where the actual code is and run that as well. As a first step, note that we are using Python, so the syntax is Python, and the first thing to do is import the TensorFlow library. We do this with the line import tensorflow as tf; the name tf is just for convenience, and you could give it any name. Once you do this, TensorFlow is available as an object named tf, and you can call its methods and access its attributes. The MNIST database is actually an integral part of TensorFlow, which is another reason we always use this example as a first step, so you simply import the MNIST database as well using one line of code, and you slightly modify the call so that the labels come back in the format known as one-hot, by setting one_hot=True.
What one-hot means is that the label information is stored as an array; let me use the pen to show exactly what that looks like. With one_hot=True, each label is stored in the form of an array of 10 values. Let's say the number is 8: in that case almost all the values are zeros. The array has position zero, position one, position two, and so on; position 8 will hold a one, because our input is eight, and position 9 will again be zero. So this one-hot encoding loads the data in such a way that only one of the ten entries has a value of one, and based on which position holds the one we know what the label is: in this case the eighth position is one, therefore we know the value of this sample is eight. Similarly, if you have a two, the label information will look somewhat like this: the zeroth position is zero, the first position is also zero, the second position is one, because this indicates the number two, the third position is zero, and so on. That is the significance of one_hot=True. Then we can check how the data looks by displaying it.
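Here is a tiny sketch of what that one-hot representation looks like; the helper function is just for illustration:

```python
import numpy as np

# Each digit label becomes a length-10 array with a single 1 at the digit's position.
def one_hot(digit, num_classes=10):
    vec = np.zeros(num_classes)
    vec[digit] = 1.0
    return vec

print(one_hot(8))   # [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]  -> the 1 in position 8 means "8"
print(one_hot(2))   # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]  -> the 1 in position 2 means "2"
```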
As I mentioned earlier, the data is pretty much in numeric form: these are all pixel values, so you will not really see an image in this format, but there is a way to visualize the image, which I will show you in a bit. This also tells you how many images there are in each set: there are 55,000 images in training, 10,000 in the test set, and 5,000 in validation, so altogether there are 70,000 images. Let's move on. We can view the actual image using the matplotlib library, and this is the code for viewing the images. You can view them in color or in grayscale; the cmap argument tells it how we want to view the image, and we also pass the maximum and minimum pixel values. The maximum is one, because these are scaled values, so one means white and zero means black, and anything in between is somewhere between black and white.
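Here is a sketch of that viewing step in TensorFlow 1.x style, as used in the video; the data path and the image index are example values:

```python
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data   # TF 1.x helper module

# Load MNIST with one-hot labels and display one handwritten digit in grayscale.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
image = mnist.train.images[5000].reshape(28, 28)    # un-flatten 784 pixels back to 28x28
plt.imshow(image, cmap="gray", vmin=0, vmax=1)      # scaled pixels: 0 = black, 1 = white
plt.show()
```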
To train the model, there is a certain way in which you write your TensorFlow code. The first step is to create some placeholders, and then you create a model; in this case we will use the softmax model, one of the simplest ones. Placeholders are primarily there to get data from outside into the neural network; this is a very common mechanism. Then of course you have variables, which, remember, are your weights and biases. In our case there are 10 neurons, and each neuron actually has 784 weights, because each neuron takes all the inputs: if we go back to our slide, every neuron receives all 784 inputs, the first neuron receives all 784, the second also receives all 784, and each of these inputs needs to be multiplied by a weight. So the weights form a matrix of 784 values for each of the neurons, something like a 10 by 784 matrix, because there are 10 neurons. Similarly there are biases, and remember I mentioned there is only one bias per neuron, not one per input like the weights, so there are only 10 biases, because there are only 10 neurons in this case; that is what we create the biases variable for. Here is something a little new in TensorFlow: unlike regular programming languages, where everything is a variable, here the values you work with come in three different types. You have placeholders, which are primarily used for feeding data; you have variables, which can change during the course of the computation; and then there is a third type, not shown here, which is constants, fixed numbers. In a regular programming language you may have everything as variables, or at most variables and constants, but in TensorFlow you have these three types: placeholders, variables, and constants. Then you create what is known as a graph; TensorFlow programming consists of graphs and tensors, as I mentioned earlier.
The data can ultimately be considered a tensor, and the graph describes how to execute the whole implementation; the execution plan is stored in the form of a graph. In this case what we are doing is a multiplication: remember tf was created as the TensorFlow object earlier, and TensorFlow has a matrix multiplication function, matmul, which is what is being used here. We use TensorFlow's matrix multiplication to multiply the input values x with W, and then we just add b; this is very similar to one of the earlier slides where we saw Σ xᵢwᵢ. Matrix multiplication multiplies all the input values by the corresponding weights, the bias is then added, and that is the graph we have created.
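Here is a sketch of that part of the graph in TensorFlow 1.x style (in TensorFlow 2.x you would need the tf.compat.v1 APIs and would have to disable eager execution); the shapes follow the 784-input, 10-neuron setup described above:

```python
import tensorflow as tf   # TensorFlow 1.x-style code, as used in the video

# Placeholder for the flattened images, variables for weights and biases,
# and the matmul-plus-bias graph node.
x = tf.placeholder(tf.float32, [None, 784])      # any number of 784-pixel images
W = tf.Variable(tf.zeros([784, 10]))             # one weight per pixel per output neuron
b = tf.Variable(tf.zeros([10]))                  # one bias per output neuron
logits = tf.matmul(x, W) + b                     # the Sigma(w_i * x_i) + b graph node
y = tf.nn.softmax(logits)                        # softmax turns the logits into probabilities
```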
Then we need to define our loss function and our optimizer, and here again we use TensorFlow's APIs: tf.nn.softmax_cross_entropy_with_logits is the API we will use for the loss, and reduce_mean averages that cross entropy to give the error value we want to drive down. Which optimizer are we using to reduce the error? We are using the gradient descent optimizer, which we discussed a couple of slides earlier, and for that you need to specify the learning rate; remember the slide where we defined how fast you come down the cost curve, that is the learning rate. This again needs to be tested and tried to find its optimum level: it shouldn't be very high, in which case the optimization will not converge, and it shouldn't be very low, because then it will take very long. So you define the optimizer and then call its minimize method, and that is what will kick-start the training process. So far we have only been creating the graph; in order to actually execute that graph we create what is known as a session and run it.
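Here is a sketch of the loss and optimizer nodes, continuing the graph from the previous sketch (it assumes x, W, b, and logits defined there; the learning rate is an example value):

```python
# Loss and optimizer nodes (TensorFlow 1.x style, continuing the graph above).
y_true = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))

learning_rate = 0.5                              # example value; needs tuning
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
```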
We also specify how many iterations we want the training to run; in this case we are saying a thousand steps, which is the exit condition, an exit strategy in a way, so training will run for a thousand iterations. Once that is done, we can evaluate the model using some of the techniques shown here. So let us get into the code quickly and see how it works. This is our cloud environment; you can install TensorFlow on your local machine as well. I'm showing this demo on our existing cloud, but you can also install TensorFlow locally, and there is a separate video on how to set up your TensorFlow environment that you can watch if you want a local installation, or you can go for any cloud service, for example Google Cloud, Amazon, or CloudLabs, and run and try the code there. Okay, it has started, so we will log in. This is our deep learning tutorial code and our TensorFlow environment, so let's get started. We have seen a bit of a code walkthrough in the slides; now you will see the actual code in action. The first thing we need to do is import TensorFlow, and then we import the data, adjusting it so that one-hot encoding is set to True, as I explained earlier.
In this case the label values will be represented appropriately. If we check the type of the data, you can see that it is a Python Datasets object, and if we look at the images themselves, they form an array of type float32. Similarly, if you want to see the number of training images, there are 55,000, then 10,000 test images and 5,000 validation images. Now let's take a quick look at the data itself with some visualization; we will use matplotlib for this. If we look at the shape (shape gives us the dimensions of the tensors, or the arrays, if you will), then for the training data set it says 55,000 by 784; remember, 784 is nothing but 28 by 28, since 28 times 28 equals 784, so that's what it is showing. We can also take just one image, say the first image, and look at its shape: its size is, obviously, just 784. Similarly you can look at the data of the first image itself, and this is how it shows. A large part of it will probably be zeros because, as you can imagine, only certain areas of the image are written on and the rest is black; that's why you will mostly see zeros, and the remaining values are scaled.
So the values lie between zero and one: in certain locations there are values, and in other locations there are zeros, and that is how the data is stored and loaded. If we want to actually view the handwritten image, this is how you do it: you reshape the data, and matplotlib has a feature to show these images, the function imshow, and if you pass the parameters appropriately you will be able to see the different images. I can change the index to pick which image we are looking at: if I want to see what is at, say, position 5000, it shows a three; similarly position 5 shows an eight, and position 50 again an eight. By the way, if you're wondering how I'm executing this code: in case you're not familiar with Jupyter notebooks, Shift+Enter is how you execute each individual cell, and if you want to execute the entire program you can go to the menu and choose Run All. Here again we can check the maximum and minimum of the pixel values; as I mentioned, the data is scaled, so the values lie between zero and one. Now this is where we create our model: the first thing is to create the required placeholders and variables, and that's what we are doing here, as we saw in the slides.
We create one placeholder and two variables, for the weights and the biases. These two variables are actually matrices: the weights variable has 784 by 10 values, the 10 being one column per neuron (there are 10 neurons) and the 784 being one entry per pixel input, which is 28 times 28. The biases, as I mentioned, are one per neuron, so there will be 10 biases, stored in a variable named b. And this is the graph, which is basically the matrix multiplication of x with W, with the bias then added for each of the neurons, and the whole idea is to minimize the error. Let me just execute that; I think this code has run. Then we define the y value, which is basically the label value: this is another placeholder, so we had x as one placeholder and y_true as a second placeholder, and it will hold values in the form of 10-element arrays; since we chose one-hot encoding, the position that has a one indicates the label for that particular number. Then we have the cross entropy, which is nothing but the loss function, and we have the optimizer, for which we have chosen gradient descent. Then there is the training process itself, which is nothing but minimizing the cross entropy, which again is the loss function. We define all of this in the form of a graph, and remember that up to this point we have not actually executed any TensorFlow code.
We are just preparing the graph, the execution plan; that's how TensorFlow code works. The whole structure and format of this code is quite different from how we normally program, so even people with programming experience may find it a little difficult to follow at first, and it takes a bit of practice; you may want to watch this video a couple of times to understand the flow, because the way TensorFlow programming is done is different from normal programming. Some of you who have done, say, Spark programming to some extent will find it easier to understand, although even in Spark the code itself is pretty straightforward and it is only the execution behind the scenes that happens differently, whereas in TensorFlow even the code has to be written in a completely different way, and it does not get executed in the same order you have written it. That is something you need to understand, and a little practice is needed. So far what we have done up to this point is set up the variables and the graph, defined what kind of network we want to use (softmax, for example), loaded the data, viewed the data, and prepared everything, but we have not yet executed anything in TensorFlow. The next step is the execution, and the first step for doing any execution in TensorFlow is to initialize the variables.
Anytime you have variables defined in your code you always have to run this piece of code: you need to create what is known as a node for initializing the variables. This is just a node; you are still not executing anything here, you have only created a node for the initialization, so let us go ahead and create it. From here onwards is where you actually execute your code in TensorFlow, and to execute the code what you need is a session, a TensorFlow session; tf.Session will give you one. There are a couple of different ways to do this, but one of the most common is what is known as a with block: you write with tf.Session() as sess, with a colon at the end, which starts a block, and the indentation tells how far the block goes; the session is valid until this block finishes executing, and that is the purpose of creating the with block. Inside it you say sess.run(init). Now, sess.run executes whichever node is specified: sess is an instance of the session (we said tf.Session, so an instance of the session gets created and we call it sess), and then we run one of the nodes in the graph. One of those nodes is init, so we say run that particular node, and that is when the initialization of the variables happens. What this does is initialize any variables in your code; in our case W is a variable and b is a variable, and for any variables you have created you have to run this initialization, otherwise you will get an error. Then, within this with block, we specify a for loop saying we want the system to iterate for a thousand steps and perform the training.
What the loop basically does is fetch the data, these images. Remember, there are about 55,000 training images, and it cannot load all of them in one shot because that would take up a lot of memory and cause performance issues, so this is a very common way of doing deep learning training: you always train in batches. We may have 55,000 images, but we process them in batches of 100, or maybe 500, depending on the size of your system. In this case we are saying: get me 100 images at a time, and get me only the training images. Remember, we use only the training data for training, and then the test data for testing. If you are familiar with machine learning you will already know this, but in case you are not: this is not specific to deep learning; in machine learning in general you have what is known as a training data set and a test data set. You typically split your available data into two parts, use the training data set for training, and then, to see how well the model has been trained, use the test data set to check the validity, or the accuracy, of the model. That is what we are doing here. You will also observe that we are calling an MNIST helper function, mnist.train.next_batch; this is the advantage of using the MNIST database, because they have provided some very nice helper functions that are readily available.
a piece of code to fetch this data in batches that itself is a a lengthy exercise so we can avoid all that if we are using amness database and that's why we use this for the initial learning phase okay so when we say fetch what it will do is it will fetch the images into X and the labels into Y and then you use this batch of 100 images and you run the training so cs. run basically what we are doing here is we are running the training mechanism which is nothing but it passes this
through the neural network passes the images through the neural network finds out what is the output and if the output obviously the initially it will be wrong so all that feedback is given back to the neural network and thereby all the w and Bs get updated till it reaches th000 iterations in this case the exit criteria is th000 but you can also specify probably accuracy rate or something like that for the as an exit criteria so here it is it it just says that okay this particular image was wrongly predicted so you need to update
your weights and biases; that's the feedback given to each neuron, and that is run for a thousand iterations, and typically by the end of those thousand iterations the model would have learned to recognize these handwritten images. Obviously it will not be 100% accurate. Okay, so once that is done, once this has happened for a thousand iterations, you then test the accuracy of the model by using the test data set. That is what we are trying to do here; the code may appear a little complicated if you're seeing it for
the first time, because you need to understand the various methods of TensorFlow and so on, but it is basically comparing the output with what is actually there; that's all it is doing. So you have your test data, you're trying to find out what the actual value is and what the predicted value is, and seeing whether they are equal or not (tf.equal, right), how many of them are correct, and so on and so forth, and based on that the accuracy is calculated as well. So this is
the accuracy, and that is what we are trying to see: how accurate the model is in predicting these numbers or digits. Okay, so let us run this; this entire thing is in one cell, so we will have to run it in one shot. It may take a little while, let us see... and, not bad, so it has finished the thousand iterations, and what we see here as an output is the accuracy. We see that the accuracy of this model is around 91%, which is pretty good for such a short exercise: within such a short time we got roughly 90% accuracy.
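As a rough, hedged sketch of the whole flow just described, here is what the classic TensorFlow 1.x MNIST softmax example looks like end to end; the graph pieces (x, W, b, y, the cross-entropy loss and the optimizer) are assumed to mirror what was set up in the earlier cells, not copied from them verbatim:

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# assumed graph pieces from the earlier setup steps (illustrative only)
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x  = tf.placeholder(tf.float32, [None, 784])     # flattened 28x28 images
W  = tf.Variable(tf.zeros([784, 10]))            # weight variable
b  = tf.Variable(tf.zeros([10]))                 # bias variable
y  = tf.nn.softmax(tf.matmul(x, W) + b)          # softmax predictions
y_ = tf.placeholder(tf.float32, [None, 10])      # true labels
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

init = tf.global_variables_initializer()         # the "init" node

with tf.Session() as sess:                       # the "with" block
    sess.run(init)                               # run the initializer node
    for _ in range(1000):                        # 1,000 training iterations
        batch_x, batch_y = mnist.train.next_batch(100)   # 100 images per batch
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})

    # compare predicted vs. actual labels on the test set
    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))
```

A run like this is the kind that produces the roughly 91% test accuracy shown here.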
However, in real life this is probably not sufficient, so there are other ways to increase the accuracy. We will see, probably in some of the later tutorials, how to improve this accuracy, how to change the hyperparameters like the number of neurons or number of layers and so on, so that this accuracy can be increased beyond 90%. Welcome to the RNN tutorial, that's the recurrent neural network. So let's talk about a feed-forward neural network: in a
feed forward neural network information flows only in the forward direction from the input nodes through the hidden layers if any and to the output nodes there are no Cycles or Loops in the network and so you can see here we have our input layer I was talking about how it just goes straightforward into the hidden layers so each one of those connects and then connects to the next hidden layer it connects to the output layer and of course we have a nice simplified version where it has a predicted output they refer to the input as
x a lot of times and the output as y. Decisions are based on the current input: no memory about the past, no future scope. Why a recurrent neural network? Issues in the feed-forward neural network: one of the biggest issues is that, because it doesn't have a scope of memory or time, a feed-forward neural network doesn't know how to handle sequential data; it considers only the current input. So if you have a series of things, and what happened three points back affects what's happening now, and your output affects what happens next, that's very important, so whatever
I put as an output is going to affect the next one um a feed forward doesn't look at any of that it just looks at this is what's coming in and it cannot memorize previous inputs so it doesn't have that list of inputs coming in solution to feed forward neural network you'll see here where it says recurrent neural network and we have our X on the bottom going to H going to y right that's your feed forward uh but right in the middle it has a value C so it's a whole another process it's memorizing
what's going on in the hidden layers, and the hidden layers, as they produce data, feed into the next one. So your hidden layer might have an output that goes off to y, but that output goes back into the next prediction coming in. What this does is allow it to handle sequential data: it considers the current input and also the previously received inputs. And if we're going to look at the general drawing and the solution, we should also look at applications of the RNN. Image captioning: an RNN is used to caption an image by analyzing the
activities present in it, a dog catching a ball in midair. That's very tough; I mean, we have a lot of stuff that analyzes images of a dog and the image of a ball, but it's able to add one more feature in there, that it's actually catching the ball in midair. Time series prediction: any time series problem, like predicting the prices of stocks in a particular month, can be solved using an RNN, and we'll dive into that in our use case and actually take a look at some stock. One of the things you
should know about analyzing stock today is that it is very difficult, and if you're analyzing the whole stock market, the New York Stock Exchange in the US produces somewhere in the neighborhood of, if you count all the individual trades and fluctuations by the second, three terabytes of data a day. So we're only going to look at one stock, and just analyzing one stock is really tricky; here we'll give you a little jump on that, so that's exciting, but don't expect to get rich off of it immediately. Another application of the RNN
is natural language processing: text mining and sentiment analysis can be carried out using an RNN for natural language processing. And you can see right here, the term natural language processing, when you string those three words together, is very different than if I said processing language natural. So the time series is very important when we're analyzing sentiments; it can change the whole value of a sentence just by switching the words around. If you're just counting the words you might get one sentiment, where if you actually look at the order they're in you get a
completely different sentiment. When it rains look for rainbows; when it's dark look for stars. Both of these are positive sentiments, and they're based upon the order in which the sentence is going. Machine translation: given an input in one language, an RNN can be used to translate the input into a different language as output. I myself am very linguistically challenged, but if you study languages and you're good with languages, you know right away that if you're speaking English you would say big cat, and if you're speaking Spanish you would say cat big, so that translation is really
important to get the right order to get uh there's all kinds of parts of speech that are important to know by the order of the words here this person is speaking in English and getting translated and you can see here a person is speaking in English in this little diagram I guess that's denoted by the flags I have a flag I own it no um but they're speaking in English and it's getting translated into Chinese Italian French German and Spanish languages some of the tools coming out are just so cool so somebody like myself who's
very linguistically challenged I can now travel into Worlds I would never think of because I can have something translate my English back and forth readily and I'm not stuck with a communication gap so let's dive into what is a recurrent neural network recurrent neural network works on the principle of saving the output of a layer and feeding this back to the input in order to predict the output of the layer sounds a little confusing when we start breaking it down it'll make more sense and usually we have a propagation forward neural network with the input
layers, the hidden layers, and the output layer. With the recurrent neural network we turn that on its side, so here it is: now our x comes up from the bottom, into the hidden layers, into y, and they usually draw it very simplified as x to h, with c as a loop, and then to y, where a, b, and c are the parameters. A lot of times you'll see this kind of drawing. Digging closer into the h and how it works, going from left to right, you'll see that the c goes in and then
the x goes in, so the x is going upward and c is going to the right, a is going out and c is also going out; that's where it gets a little confusing. So here we have x in and c in, and then we have y out and c out, and c is based on h(t-1). So our value is based on the y, and the h value is connected to it; they're not necessarily the same value, because h can be its own thing. Usually we draw this, or we represent it, as a function:
h(t) = f_C(h(t-1), x(t)), where h(t-1), the last h output, and x(t), the new input, go in. So it's the last output of h combined with the new input x, where h(t) is the new state, f_C is a function with the parameter C (that's a common way of denoting it), h(t-1) is the old state, and x(t) is an input vector at time step t.
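A minimal sketch of that recurrence in plain numpy, assuming the common "vanilla" form where the function is a tanh over learned weights; the names W_xh, W_hh, and b_h are illustrative, not taken from the slide:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                         # initial hidden state
for x_t in rng.normal(size=(5, 3)):     # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # the state carries memory forward
print(h)
```

The point of the loop is that the same weights are reused at every time step while the hidden state h keeps accumulating context from earlier inputs.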
Well, we need to cover the types of recurrent neural networks, and the first one is the most common one, which is one-to-one, single output. A one-to-one neural network is usually known as a vanilla neural network, used for regular machine learning problems. Why? Because vanilla is usually considered just a real basic flavor, and because this is very basic, a lot of times they'll call it the vanilla neural network; it's not the formal term, but it's a slang term and people will know what you're talking about if you say that. Then we have one-to-many, so you have a single
input and you might have multiple outputs; in this case, image captioning, as we looked at earlier, where we're not just looking at it as a dog but a dog catching a ball in the air. Then you have many-to-one: the network takes in a sequence of inputs; an example is sentiment analysis, where a given sentence can be classified as expressing positive or negative sentiments, and we looked at that as we were discussing: if it rains look for a rainbow is a positive sentiment, where rain might be a negative sentiment if you're just adding up the
words in there. And then, of course, if you're going to have one-to-one, many-to-one, and one-to-many, there are also many-to-many networks: they take in a sequence of inputs and generate a sequence of outputs; an example is machine translation, so we have a lengthy sentence coming in in English and then going out in all the different languages. Just a wonderful tool and a very complicated set of computations; if you're a translator, you realize just how difficult it is to translate into different languages. One of the biggest things you need to understand
when we're working with this network is what's called the vanishing gradient problem. While training an RNN your slope can be either too small or very large, and this makes training difficult. When the slope is too small the problem is known as vanishing gradient, and you'll see here they have a nice image: loss of information through time. So if you're not pushing enough information forward, that information is lost, and then when you go to train it you start losing the third word in the sentence, or something like that, or it doesn't quite follow the
whole logic of what you're working on. The exploding gradient problem: oh, this is one that everybody runs into when working with this particular neural network; when the slope tends to grow exponentially instead of decaying, the problem is called exploding gradient. Issues in the gradient problem: long training time, poor performance, bad accuracy, and I'll add one more in there: your computer, if you're on a lower-end computer testing out a model, will lock up and give you a memory error. To explain the gradient problem, consider the following two examples to understand what should be the next word in the
sequence: the person who took my bike and ____ a thief; the students who got into engineering with ____ from Asia. And you can see in here we have our x value going in, we have the previous value going forward, and then you back-propagate the error like you do with any neural network. As we're looking for that missing word, maybe we'll have: the person who took my bike and ____ was a thief, and the students who got into engineering with a ____ were from Asia. Consider the following example, the person who took the bike, so
we'll go back to: the person who took the bike was ____ a thief. In order to understand what the next word in the sequence would be, the RNN must memorize the previous context, whether the subject was a singular noun or a plural noun; so "was a thief" is singular. The student who got into engineering: well, in order to understand what the next word in the sequence would be, the RNN must again memorize the previous context, whether the subject was singular or plural, and so you can see here the students who got into engineering
with ____ were from Asia. It might sometimes be difficult for the RNN to back-propagate to the beginning of the sequence to predict what the output should be, so when you run into the gradient problem we need a solution. The solution to the gradient problem: first we're going to look at exploding gradient, where we have three different solutions depending on what's going on. One is identity initialization, so the first thing we want to do is see if we can find a way to minimize the identities coming in; instead of having it identify everything, just the important
information we're looking at. Next is to truncate the back-propagation, so instead of passing along whatever information it's sending to the next series, we can truncate what it's sending; we can lower that particular set of layers, make those smaller. And finally there's gradient clipping, so when we're training it we can clip what that gradient looks like and narrow the training model that we're using. When you have a vanishing gradient, the opposite problem, we can take a look at weight initialization, very similar to the identity approach, but we're going to add more weights in there
so it can identify different aspects of what's coming in better; choosing the right activation function, that's huge, so we might be activating based on one thing and we need to limit that (we haven't talked too much about activation functions, so we'll look at that just minimally; there are a lot of choices out there); and then finally there are long short-term memory networks, the LSTMs, and we can make adjustments to those. So just like we can clip the gradient as it comes out, we can also expand on that: we can increase the size of the memory network so it handles more information. A quick gradient-clipping sketch follows below.
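Gradient clipping is usually a one-line change; here is a hedged sketch using the standalone Keras optimizer API (the little model is just a stand-in so the snippet runs on its own):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

model = Sequential([
    LSTM(50, input_shape=(60, 1)),   # stand-in recurrent layer
    Dense(1),
])

# clipnorm rescales any gradient whose norm exceeds 1.0,
# which keeps an exploding gradient from blowing up training
model.compile(optimizer=Adam(clipnorm=1.0), loss="mean_squared_error")
```

The clipvalue argument works the same way but caps each gradient element instead of the overall norm.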
One of the most common problems in today's setup is what they call long-term dependencies. Suppose we try to predict the last word in the text: the clouds are in the ____. You probably said sky; here we do not need any further context, it's pretty clear that the last word is going to be sky. Now suppose we try to predict the last word in the text: I have been staying in Spain for the last 10 years, I can speak fluent ____. Maybe you said Portuguese or French? No,
you probably said Spanish. The word we predict will depend on the previous few words in context; here we need the context of Spain to predict the last word in the text. It's possible for the gap between the relevant information and the point where it is needed to become very large, and LSTMs help us solve this problem. The LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules of neural
network connections; in standard RNNs this repeating module will have a very simple structure, such as a single tanh layer. LSTMs also have a chain-like structure, but the repeating module has a different structure: instead of having a single
neural network layer, there are four interacting layers communicating in a very special way. As you can see, the deeper we dig into this, the more complicated the graphs get. In here I want you to note that you have x(t-1) coming in, you have x(t) coming in, and you have x(t+1), and you have h(t-1) and h(t) coming in and h(t+1) going out, and of course on the other side is the output. In the middle we
have our tanh, but it occurs in two different places, so not only when we're computing for x(t+1) are we getting the tanh from x(t), but we're also getting that value coming in from x(t-1). The short of it is, as you look at these layers, not only does the propagation go through the first layer into the second layer and back into itself, it's also going into the third layer, so now we're kind of stacking those up, and this can get very
complicated. As you grow it in size it also grows in memory, and in the amount of resources it takes, but it's a very powerful tool to help us address the problem of complicated, long sequential information coming in, like we were just looking at in the sentence. When we're looking at our long short-term memory network, there are three steps of processing in the LSTMs that we look at. The first one is that we want to forget irrelevant parts of the previous state; you know, a lot of times words like is, as, in, and
a, unless we're trying to look at whether it's a plural noun or not, don't really play a huge part in the language, so we want to get rid of them. Then we selectively update cell state values, so we only want to update the cell state values that reflect what we're working on. And finally, we only want to output certain parts of the cell state, so whatever is coming out, we want to limit what's going out too. Let's dig a little deeper into this and see what this really looks like. So step
one decides how much of the past it should remember. The first step in the LSTM is to decide which information is to be omitted from the cell in that particular time step; it is decided by the sigmoid function. It looks at the previous state h(t-1) and the current input x(t) and computes the function, so you can see over here we have f(t) equals the sigmoid of the weight W_f applied to h(t-1) and x(t), plus, of course, a bias,
because with any of our neural networks we have a bias. So f(t) is the forget gate; it decides which information to delete that is not important from the previous time step. Consider an LSTM fed with the following inputs from the previous and present time steps: Alice is good in physics; John, on the other hand, is good in chemistry — that's the previous output. John plays football well; he told me yesterday over the phone that he had served as the captain of his college football team — that's our current input. So as we look
at this, the first step is that the forget gate realizes there might be a change in context after encountering the first full stop and compares it with the current input sentence x(t). So we're looking at that full stop and then comparing it with the input of the new sentence; the next sentence talks about John, so the information on Alice is deleted. Okay, that's important to know: we have this input coming in, and if we're going to continue on with John, then that's going to be the primary information we're looking at. The position of the subject
is vacated and is assigned to John, and so in this one we've seen that we've weeded out a whole bunch of information and we're only passing information on John, since that's now the new topic. Step two is then to decide how much this unit should add to the current state. In the second layer there are two parts: one is a sigmoid function and the other is a tanh. The sigmoid function decides which values to let through, zero or one; the tanh function gives weightage to the values which are passed, deciding
their level of importance, from minus one to one. And you can see the two formulas that come up: i(t) equals the sigmoid of the weight W_i applied to h(t-1) and x(t) plus the bias b_i, and the candidate C(t) equals the tanh of the weight W_C applied to h(t-1) and x(t) plus the bias b_C. So i(t) is the input gate; it determines which information to let through based on its significance in the current time step. If this seems
a little complicated, don't worry, because a lot of the programming is already done when we get to the case study; understanding that this is part of the program is important, though, when you're trying to figure out what to set your settings at. You should also note, when you're looking at this, that it should have some resemblance to your forward-propagation neural networks, where we have a value assigned to a weight plus a bias; those are very important steps in any of the neural network layers, whether we're propagating the information from one to the next
or we're just doing a straightforward neural network propagation. Let's take a quick look at what this looks like from the human standpoint. Again, consider the current input at x(t): John plays football well, he told me yesterday over the phone that he had served as the captain of his college football team; that's our input. The input gate analyzes the important information: John plays football and he was a captain of his college team is important; told me over the phone yesterday is less important, hence it is forgotten. This process
of adding some new information can be done via the input gate now this example is as a human form and we'll look at training this stuff in just a minute uh but as a human being if I wanted to get this information from a conversation maybe it's a Google Voice listening in on you or something like that um how do we weed out the information that he was talking to me on the phone yesterday well I don't want to memorize that he talked to me on the phone yesterday or maybe that is important but in
this case it's not. I want to know that he was the captain of the football team, I want to know that he served, I want to know that John plays football and he was a captain of the college football team; those are the things that I want to take away as a human being. Again, we measure a lot of this from the human viewpoint, and that's also how we try to train them, so we can understand these neural networks. Finally we get to step three: it decides what part of the current cell state makes it
to the output. The third step is to decide what our output will be. First we run a sigmoid layer, which decides what parts of the cell state make it to the output; then we put the cell state through the tanh to push the values to be between minus one and one, and multiply it by the output of the sigmoid gate. So when we talk about the output o(t), we set that equal to the sigmoid of the weight W_o applied to h(t-1), one step back in time, and
x(t), plus of course the bias; and h(t) equals o(t) times the tanh of the cell state C(t). So o(t) is the output gate; it allows the passed-in information to impact the output in the current time step.
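Putting the three gates together, the standard textbook notation for the LSTM cell (the same equations the narration spells out verbally) is:

```latex
\begin{aligned}
f_t &= \sigma\!\big(W_f\,[h_{t-1},\,x_t] + b_f\big) && \text{forget gate}\\
i_t &= \sigma\!\big(W_i\,[h_{t-1},\,x_t] + b_i\big) && \text{input gate}\\
\tilde{C}_t &= \tanh\!\big(W_C\,[h_{t-1},\,x_t] + b_C\big) && \text{candidate cell state}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update}\\
o_t &= \sigma\!\big(W_o\,[h_{t-1},\,x_t] + b_o\big) && \text{output gate}\\
h_t &= o_t \odot \tanh(C_t) && \text{new hidden state}
\end{aligned}
```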
Let's consider the example of predicting the next word in the sentence: John played tremendously well against the opponent and won for his team; for his contributions, brave ____ was awarded player of the match. There could be a lot of choices for the empty space. The current input, brave, is an adjective, and adjectives describe a noun, so John could be the best output after brave; thumbs up for John, awarded player of the match. And if you were to pull just the nouns out of the sentence: team doesn't look right, because that's not really the subject we're talking about; contributions, you know, brave contributions, or brave team, brave player, brave match; so you look at this and you can start to train this neural network, so it starts looking at it and goes, oh no, John is what we're talking about. So brave is an adjective and John's going
to be the best output, and we give John a big thumbs up. And then, of course, we jump into my favorite part, the case study: use case implementation of LSTM. Let's predict the prices of stocks using the LSTM network; based on the stock price data between 2012 and 2016, we're going to try to predict the stock prices of 2017. This will be a narrow set of data, we're not going to do the whole stock market; it turns out that the New York Stock Exchange generates roughly three terabytes of data per day, that's all the different
trades up and down of all the different stocks going on, each individual one, second to second or nanosecond to nanosecond. But we're going to limit that to just some very basic fundamental information, so don't think you're going to get rich off this today, but at least you can take a step forward in how to start processing something like stock prices, a very valid use for machine learning in today's markets. Use case implementation of LSTM: let's dive in. We're going to import our libraries, we're going to import the
training set, and get the scaling going. Now, if you watch any of our other tutorials, a lot of these pieces will start to look very familiar because it's a very similar setup, but let's take a look at it. Just a reminder, we're going to be using Anaconda and the Jupyter Notebook. So here I have my Anaconda Navigator; when we go under environments, I've actually set up a Keras Python 3.6 environment, so I'm in Python 3.6. The nice thing about Anaconda, especially the newer version (remember a year ago, messing with all the different
versions of Python and different environments), is that Anaconda now has a nice interface. I have this installed both on an Ubuntu Linux machine and on Windows, so it works fine on either. You can go in here and open a terminal window, and once you're in the terminal window, this is where you're going to start installing, using pip to install your different modules and everything. Now, we've already pre-installed them, so we don't need to do that here, but if you don't have them installed in your particular environment, you'll need
to do that. And of course you don't need to use Anaconda or Jupyter; you can use whatever favorite Python IDE you like. I'm just a big fan of this because it keeps all my stuff separate; you can see on this machine I have specifically installed an environment for Keras, since we're going to be working with Keras on top of TensorFlow. When we go back to home, I've gone up here to applications, and that's the environment I've loaded, and then we'll click on launch Jupyter Notebook. Now, I've already set up a lot of stuff in my Jupyter notebook
so that we're ready to go, kind of like Martha Stewart in the old cooking shows: we want to make sure we have all our tools ready so you're not waiting for them to load. If we go up here to where it says New, you can see where you can create a new Python 3 notebook; that's what we did here underneath the setup, so it already has all the modules installed. And I've actually renamed this; if you go under File you can rename it, and
I'm calling it RNN stock. Let's take a look and start diving into the code, let's get into the exciting part. Now we've looked at the tool, and of course you might be using a different tool, which is fine; let's start putting that code in there and seeing what those imports and all the loading look like. The first half is kind of boring when we hit the run button, because we're just going to be importing numpy as np, that's numerical Python, which gives you your numpy array, and the matplotlib library, because we're
going to do some plotting at the end, and pandas for our data set, and our pandas is pd. When I hit run it really doesn't do anything except load those modules. Just a quick note, let me do a quick drawing here: you'll notice when we're doing this setup, if I was to divide this up, let's overlap these. This first part that we're going to do is our data prep; there's a lot of prepping involved. In
fact, depending on what your system is, and say we're using Keras, I put an overlap here, but you'll find that almost maybe even half of the code we write is all about the data prep. The reason I overlap this with Keras, let me just put that down because that's what we're working in, is because Keras has its own preset stuff, so a lot is already pre-built in, which is really nice. So there are a couple of steps a lot of times that are in the Keras setup; we'll take a look at that to see
what comes up in our code as we go through and look at stock. And then the last part is to evaluate, and if you're working with shareholders or a classroom, whatever it is you're working with, the evaluation is the next biggest piece. So the actual code here in Keras is a little bit more, but when you're working with some of the other packages you might have like three lines and that might be it; all your stuff is in your pre-processing and your data. Since Keras is cutting edge and you
load the individual layers, you'll see that there are a few more lines here and Keras is a little bit more robust, and then you spend a lot of time, like I said, with the evaluation: you want to have something you can present to everybody else to say, hey, this is what I did, this is what it looks like. So let's go through those steps; this is kind of a general overview. Let's take a look and see what the next set of code looks like, and in here we have a dataset_train
and it's going to be read using pd, the pandas read_csv, and it's a Google stock price train CSV. Under this we have training_set equals dataset_train.iloc, and we've kind of sorted out part of that. So what's going on here? Let's look at the actual file and see what's going on there. Now, if we look at this, ignore all the extra files on this; I already have a train and a test set where it's sorted out. This is important to notice, because a
lot of times we do that as part of the pre-processing of the data: we take 20% of the data out so we can test it, and then we train on the rest of it; that's what we use to create our neural network, and that way we can find out how good it is. But let's go ahead and take a look and see what the file itself looks like. I went ahead and just opened this up in a basic WordPad text editor just so we can take a look at it;
certainly you can open it up in Excel or any other kind of spreadsheet. We note that this is comma-separated values: we have a date, open, high, low, close, volume. This is standard stuff that we import for our stock, one of the most basic sets of information you can look at in stock, and it's all free to download. In this case we downloaded it from Google, that's why we call it the Google stock price, and it specifically is Google; these are the Google stock values and, as you can see
here, we started off in early 2012. So when we look at this first setup up here, we have dataset_train equals pd.read_csv, and if you noticed on the original frame, let me just go back there, they had it set to home/ubuntu/downloads/Google stock price train; I went ahead and changed that, because we're in the same folder where I'm running the code, so I've saved this particular Python code there and I don't need to go through any special paths or have the full path on there. And then, of course,
we want to take out certain values in here, and you're going to notice that we're using our data set and we're now in pandas; pandas basically looks like a spreadsheet, and in this case we're going to do iloc, which is going to get specific locations. The first value is going to show that we're pulling all the rows in the data, and the second one is that we're only going to look at columns one and two, and if you remember here from our data, as we switch back on over the
columns, we always start with zero, which is the date, and we're going to be looking at open and high, which would be one and two; we'll just label that right there so you can see. Now, when you go back and do this you certainly can extrapolate and do this on all the columns, but for the example let's limit it a little bit here so that we can focus on just some key aspects of stock. Then we'll go up here and run the code, and again, as I said, the first half is very boring: whenever you hit the run button it doesn't do anything because we're still just loading the data and setting it up.
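Here is a sketch of that loading cell; the CSV file name is assumed to be Google_Stock_Price_Train.csv sitting next to the notebook, so adjust the path to match your own download:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# read the training CSV and keep only the "Open" column (column index 1)
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values   # all rows, column 1 only, as a 2-D array
```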
Now that we've loaded our data, we want to go ahead and scale it; we want to do what they call feature scaling, and in here we're going to pull in, from sklearn (scikit-learn) preprocessing, the MinMaxScaler import. When you look at this, you've got to remember that we want to get rid of biases in our data, so if you have something that's a really high value,
let's just draw a quick graph: maybe one stock has a value of 100 and another stock has a value of five; you start to get a bias between different stocks. So when we do this, we go ahead and say, okay, 100 is going to be the max and five is going to be the min, and then everything else goes in between, and we change this so we just squish it down (I like the word squish) so it's between one and zero: so
100 maps to 1 and 5 maps to 0, and it's usually just simple arithmetic: you take the value, subtract 5, and divide by 95 (that is, by 100 minus 5). Once we've actually created our scaler, which is going to be from 0 to 1, we want to take our training set, create a training_set_scaled, and use our scaler sc: we're going
to fit and transform the training set. We can then use this particular sc object later on our testing set, because remember we also have to scale that when we go to test our model and see how it works. We'll go ahead and click on run again; it's not going to have any output yet, because we're just setting up all the variables.
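The scaling cell, roughly as described; it continues from the training_set loaded above, and sc is kept around so the identical transform can be reused on the test data later:

```python
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))              # squish every value into [0, 1]
training_set_scaled = sc.fit_transform(training_set)  # fit on training data, then transform it
```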
Okay, so we pasted the data in here, and we're going to create the data structure with 60 time steps and one output. First, note we're running 60 time steps, and that is where this value here also comes in. The first thing we do is create our X_train and y_train variables and set them to empty Python lists; it's very important to remember what kind of array we're in and what we're working with. Then we're going to come in here and go for i in range 60 to 1258; there's our 60 time steps, and the reason we want to do this is that as we're adding the data in there, there's nothing below the 60,
so if we're going to use 60 time steps we have to start at 60, because each window includes everything underneath it; otherwise you'll get an index error. Then we're going to take our X_train and append training_set_scaled, which is a scaled value between zero and one, and when i is equal to 60 this value is going to be 60 minus 60, which is zero, so this is actually i minus 60 up to i: it's going to be 0 to 60, 1 to 61, let me just circle this part right
here, 1 to 61, 2 to 62, and so on and so on. And if you remember, I said 0 to 60; that's incorrect, because the upper bound is not counted; remember it starts at zero, so this is a count of 60 values and it actually ends at 59. Important to remember that as we're looking at this. And then the second part that we're looking at: if you remember correctly, we go from 0 to 59 for the rows at i, and then we have a comma and a zero right here, so finally we're just going to
look at the open value. Now, I know we did put it in there as one to two, but if you remember correctly, the slice doesn't count the second one, so it's just the open value we're looking at, just open. And then finally we have y_train.append of training_set_scaled at i comma 0, and if you remember correctly, the window holds 0 to 59, so there are 60 values in it, so when we do i down here, this is number 60. So what we're doing is
creating an array: we have zero to 59, and over here we have number 60, which is going into the y_train, it's being appended on there, and then this just goes all the way up. So down here is 0 to 59, and we'll call the label 60 since that's the value over here, and it goes all the way up to 1258; that's where this value here comes in, that's the length of the data we're loading. So we've loaded two arrays: we've loaded one array which is filled with
arrays of 60 values each, and we loaded one array which is just the single value we're looking at. You want to think about this as a time sequence: here's my open, open, open, open, open, open, what's the next one in the series? So we're looking at the Google stock, and each time it opens we want to know what the next one is: 0 through 59, what's 60; 1 through 60, what's 61; 2 through 61, what's 62; and so on and so on going up. And then, once we've loaded those in our for loop,
we go ahead and take X_train and y_train equals np.array of X_train and np.array of y_train; we're just converting these back into numpy arrays, so that we can use all the cool tools that we get with a numpy array, including reshaping. So if we take a look at what's going on here, we're going to take our X_train and reshape it. What the heck does reshape mean? It means we have an array, if you remember correctly, of so many rows by 60; that's how wide it is. And so,
when you do X_train.shape of zero, that gets one of the dimensions, and X_train.shape of one gets the other dimension, and we're just making sure the data is formatted correctly. So you use this to confirm that it's about 1199 by 60 in this case, coming from the 1258 minus 60, and we're making sure that it is shaped correctly, so the data is grouped into roughly 1199 arrays of 60 values each. And then the one on the end just means that, because when
you're dealing with shapes in numpy, they look at this as layers, the inner layer needs to be one value; that's like the leaf of a tree, where this is the branch, and then it branches out some more, and then you get the leaf. That's what np.reshape is doing, using the existing shapes to form it. We'll go ahead and run this piece of code; again, there's no real output.
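A sketch of that windowing and reshaping, continuing from training_set_scaled above; 1258 is the row count of this particular training file, so len(training_set_scaled) is the safer upper bound if yours differs:

```python
# build sliding windows: 60 previous opens as features, the 61st as the label
X_train, y_train = [], []
for i in range(60, 1258):
    X_train.append(training_set_scaled[i-60:i, 0])   # rows i-60 .. i-1
    y_train.append(training_set_scaled[i, 0])        # row i is the value to predict
X_train, y_train = np.array(X_train), np.array(y_train)

# Keras LSTM layers expect (samples, timesteps, features), so add a final axis of size 1
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
```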
Then we'll import the different Keras modules that we need: from keras.models we're going to import the Sequential model, since we're dealing with sequential data, and we have our layer imports; we actually have three layer types we're going to bring in: our Dense, our LSTM, which is what we're focusing on, and our Dropout. We'll discuss these three layers more in just a moment, but with the LSTM you do need the Dropout, and then the final layer will be the Dense. Let's go ahead and run this, and that'll import our modules, and you'll see we get an error here; if you read it closer it's not actually an error, it's a warning. What does this
warning mean? These things come up all the time when you're working with such cutting-edge modules that are being updated constantly; we're not going to worry too much about the warning. All it's saying is that the h5py module, which Keras depends on, is going to be updated at some point, and if you're running new stuff on Keras and you start updating your Keras system, you had better make sure that your h5py is updated too, otherwise you're going to have an error later on. You can actually just run an update on
the h5py now if you wanted to; not a big deal, we're not going to worry about that today. I said we were going to jump in and start looking at what those layers mean, and I meant that: we're going to start off with initializing the RNN, and then we'll start adding those layers in, and you'll see that we have the LSTM and then the Dropout, LSTM then Dropout, LSTM then Dropout. What the heck is that doing? Let's explore that. We'll start by initializing the RNN: regressor equals Sequential, because we're
using the sequential model, and we'll run that and load that up. Then we're going to start adding our LSTM layer and some Dropout regularization, and right there is the cue, Dropout regularization: if we go back here and remember our exploding gradient, well, that's what we're talking about; the Dropout drops out unnecessary data so we're not just shifting huge amounts of data through the network. So we go in here and add this in; I'll go ahead and run this, and we had three of them, so
let me go ahead and put all three of them in and then we can go back over them: there's the second one, and let's put one more in, put that in, and we'll go and put two more in (I said one more, but it's actually two more), and then let's add one more after that. As you can see, each time I run these they don't actually have an output, so let's take a closer look and see what's going on here. So we're going to add our first LSTM layer:
in here we're going to have units equal to 50. Units is a positive integer and it's the dimensionality of the output space; this is what's going out into the next layer, so we might have 60 coming in, but we have 50 going out. We have return_sequences set to true because it is sequence data, so we want to keep that. And then you have to tell it what shape the input is in; well, we already know the shape by just going in here and looking at the X_train shape, so input_shape equals (X_train.shape[1], 1).
That makes it really easy: you don't have to remember all the numbers you put in, 60 or whatever else is in there, you just let it tell the regressor what shape to use. And then we follow our LSTM with a Dropout layer. Understanding the Dropout layer is kind of exciting, because one of the things that can happen is we overtrain our network; that means our neural network will memorize such specific data that it has trouble predicting anything that's not in that specific realm. To fix that, each time we run through the
training pass we're going to take 0.2, or 20%, of our neurons and just turn them off, so we're only going to train on the other ones, and it's going to be random; that way, each time we pass through this, we don't overtrain. These nodes come back in the next training cycle and we randomly pick a different 20%. And finally, you'll see a big difference as we go from the first layer to the second and third and fourth: the first thing is we don't have to input the shape, because the output of the previous layer is already 50 units;
the next step automatically knows this layer is putting out 50, and because it's the next layer it automatically sets that and says, oh, 50 is coming out from our last layer, it goes into the regressor, and of course we have our Dropout, and that's what's coming into this one, and so on and so on. So for the next three layers we don't have to tell it what the shape is, it automatically understands that, and we're going to keep the units the same, we're still going to do 50
units; it's still a sequence coming through, 50 units and a sequence. Now, the next piece of code is what brings it all together; let's go ahead and take a look at that. We come in here and we put the output layer, the Dense layer, and if you remember, up here we had the three layer types: LSTM, Dropout, and Dense. Dense just says we're going to bring this all down into one output; instead of putting out a sequence, we just want to know the answer at this point. So let's go ahead and
run that. And so in here you notice all we're doing is setting things up one step at a time. So far, way up here, we've brought in our data, we've brought in our different modules, we've formatted the data for training, we've set it up: we have our X_train and our y_train, we have our source of data and the answers we know we're going to put in there, we've reshaped that, and we've come in and built our Keras setup and imported our different layers.
And in here, if you look, we have what, five total layers. Now, Keras is a little different than a lot of other systems, because a lot of other systems put this all in one line and do it automatically, but they don't give you the options of how those layers interface, and they don't give you the options of how the data comes in; Keras is cutting edge for this reason. So even though there are a lot of extra steps in building the model, this has a huge impact on the output and what we can do with these new models from Keras; the full stack we just built looks roughly like the sketch below.
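Here is roughly what that stacked regressor looks like once all the layers are added; the unit counts, dropout rate, and number of LSTM blocks follow the values mentioned in the walkthrough, so treat it as a sketch of the cell rather than a verbatim copy:

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

regressor = Sequential()

# first LSTM block: 50 units, return the full sequence for the next LSTM layer,
# and declare the input shape: 60 timesteps (X_train.shape[1]) and 1 feature
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(60, 1)))
regressor.add(Dropout(0.2))            # randomly silence 20% of units each pass

# stacked LSTM blocks: their input shape is inferred from the previous layer
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50))          # last LSTM returns only its final output
regressor.add(Dropout(0.2))

regressor.add(Dense(units=1))          # single output: the predicted open price
```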
So we brought in our Dense, we have our full model put together, our regressor, so we need to go ahead and compile it, and then we're going to fit the data. We're going to compile the pieces so they all come together, and then we're going to run our training data on there and actually create our regressor so it's ready to be used. So let's go ahead and compile that, and I can go ahead and run it, and if you've been looking at any
of our other tutorials on neural networks, you'll see we're going to use the Adam optimizer; Adam is well suited for big data. There are a couple of other optimizers out there, beyond the scope of this tutorial, but certainly Adam will work pretty well for this. And loss equals mean squared error, so when we're training, this is what we want to base the loss on: how bad is our error? Well, we're going to use the mean squared error for our error and the Adam optimizer for its update equations; you don't have to know the math behind them,
but certainly it helps to know what they're doing and where they fit into the bigger models. And then finally we're going to do our fit, fitting the RNN to the training set: we have regressor.fit with X_train, y_train, epochs, and batch_size. So we know what this is: this is our data coming in for the X_train, our y_train is the answer we're looking for, our sequential input; epochs is how many times we're going to go over the whole data set we created out of
X_train, so that's each of those rows, which includes a time sequence of 60; and batch_size is another one of those things where Keras really shines: if you were pulling this, say, from a large file, instead of trying to load it all into RAM it can pick up smaller batches and load those incrementally. We're not worried about pulling them off a file today; this isn't big enough to cause the computer too much of a problem to run, not too straining on the resources.
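The compile-and-fit cell as described, continuing from the regressor and the X_train/y_train arrays built earlier:

```python
# bring the layers together with an optimizer and a loss to minimize
regressor.compile(optimizer='adam', loss='mean_squared_error')

# train on the 60-step windows: 100 passes over the data, 32 windows per batch
regressor.fit(X_train, y_train, epochs=100, batch_size=32)
```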
But as we run this, you can imagine what would happen if I was doing a lot more than just one column and one stock, in this case Google stock. Imagine if I was doing this across all the stocks, and instead of just the open I had open, close, high, low; you can actually find yourself with about 13 different variables, times 60 because it's a time sequence. Suddenly you find yourself with a gig of memory you're loading into your RAM, which will just completely bog you down if you're not on multiple computers or a cluster; you're going to start
running into resource problems. But for this we don't have to worry about that, so let's go ahead and run it, and this will actually take a little bit on my computer because it's an older laptop; give it a second to kick in... there we go. All right, so we have the epoch counter; this is going to tell me it's running the first pass through all the data, and as it's going through it's batching them in 32 pieces, so 32 rows each time, and there are 1198 (I think I said 1199 earlier, but it's 1198,
I was off by one), and each one of these epochs is about 13 seconds, so you can imagine this is roughly 20 to 30 minutes of runtime on this computer; like I said, it's an older laptop running at 0.9 GHz on a dual processor, and that's fine. What we'll do is I'll go ahead and stop, go get a drink of coffee, and come back, and let's see what happens at the end and where this takes us. And, like any good cooking show, I've kind of got my latte in; I also had some other stuff running in the background,
so you'll see these numbers jumped up to like 19 seconds, 15 seconds, which you can scroll through, and you can see we've run it through 100 steps, or 100 epochs. So the question is, what does all this mean? One of the first things you'll notice is our loss over here: you can see it goes down until it hits about 0.0014 and stays there for three epochs in a row, so we guessed our epoch count pretty closely, since our loss has remained the same. So, to find out
what we're looking at, we're going to go ahead and load up our test data, the test data that we didn't process yet: real_stock_price, dataset_test, iloc; this is the same thing we did when we prepped the data in the first place. So let's go ahead and go through this code; you see we've labeled it part three, making the predictions and visualizing the results. The first thing we need to do is go ahead and read the data in from our test CSV; you see I've changed the path on
it for my computer, and then we'll call it the real_stock_price, and again we're doing just the one column here, the values from iloc, so it's all the rows and just the values from that one location, that's the stock open. Let's go ahead and run that, so that's loaded in there, and then let's go ahead and create our inputs; we're going to create inputs here, and this should all look familiar, it's the same thing we did before. We're going to take our dataset_total, we're
going to do a little pandas concat from the dataset_train; now remember, the end of the dataset_train is part of the data going in, and let's just visualize that a little bit: here's our train data, let me just put TR for train, and it went up to this value here, but each one of these values generated a bunch of columns, it was 60 across, and this value here equals this one, and this value here equals this one, and this value here equals this one, and so we need these top 60 to go into our new data
because that's part of the next data, or it's actually the top 59. So that's what this first setup is over here: we're going in, we're doing the real stock price, and we're going to just take the dataset_test and load that in, and then the real_stock_price is our dataset_test iloc, so we're just looking at that first column, the open price. And then for our dataset_total we're going to take pandas and we're
going to concat and take our dataset_train open and our dataset_test open, and this is one way you can reference these columns; we've referenced them a couple of different ways, we've referenced them up here with the one-to-two slice, but we know the column is labeled in the pandas set as Open, so pandas is great that way, lots of versatility there. We'll go ahead and go back up here and run this, there we go, and you'll notice this is the same as what we did before: we have our open
data set, we appended, or rather concatenated, our two data sets together, and we have our inputs equals dataset_total, length of dataset_total minus length of dataset_test minus 60, values. So we're going to run this over all of them, and you'll see why this works, because normally when you're running your test set versus your training set you run them completely separately, but when we graph this you'll see that we're just going to be looking at the part that we didn't train it with, to see how well it graphs.
And we have our inputs equals inputs.reshape, reshaping like we did before, and we're transforming our inputs, so if you remember, that's the transform to between zero and one. Finally, we want to go ahead and create our X_test: for i in range 60 to 80, here's our X_test, and we're appending our inputs from i minus 60 to i (which, remember, is 60 values), and i comma 0 on the other side, so that's just the first column, which is our open column. And once again
we take our X_test, convert it to a numpy array, and do the same reshape we did before, and then we get down to the final two lines, and here we have something new right here on these last two lines, let me just highlight or mark them: predicted_stock_price equals regressor.predict of X_test, so we're predicting using windows that reach back into the training data as well as the test period, and then we want to take this prediction and inverse the transform, because remember we put the values between zero and
one. Well, that's not going to mean very much to me, to look at a float number between zero and one; I want the dollar amounts, I want to know what the cash value is. We'll go ahead and run this, and you'll see it runs much quicker than the training; that's what's so wonderful about these neural networks: once you put them together, it takes just a second to run the same neural network that took us, what, a half hour to train.
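A sketch of the test-side preparation and prediction, continuing from dataset_train, sc, and regressor above; it concatenates the train and test 'Open' columns, keeps the 60 values leading into each test day, scales with the same scaler, predicts, and inverse-transforms back to dollar values (Google_Stock_Price_Test.csv is the assumed file name):

```python
# real values we are trying to match
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

# combined 'Open' series; keep the 60 values preceding each test day
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)          # same scaler as training, transform only

X_test = []
for i in range(60, 60 + len(dataset_test)):   # e.g. range(60, 80) for 20 test days
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)   # back to dollars
```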
Now let's go ahead and plot the data: we're going to plot what we think it's going to be, and we're going to plot it against the real data, what the Google stock actually did. So let's take a look at that in code and pull this code up. We have our plt; if you remember from the very beginning, let me just go back up to the top, we have our matplotlib.pyplot as plt, that's where that comes in. And we come down here, we're going to plot; let me get my drawing tool out again, and we're going to go ahead.
plt is basically kind of like an object; it's one of the things that always threw me when doing graphs in Python, because I always think you have to create an object and then it loads that class in there. Well, in this case plt is like a canvas you're putting stuff on, so if you've done HTML5 you'll know the canvas object; this is the canvas. So we're going to plot the real stock price, that's what it actually is, and we're going to give that the color red, so it's going to be a bright red; we're
going to label it Real Google Stock Price, and then we plot our predicted stock price in blue, labeled Predicted Google Stock Price. We give it a title, because it's always nice to give your graph a title, especially if you're going to present it to somebody, say to the shareholders in the office. The x label is going to be Time, because it's a time series; we didn't put the actual dates on here, but that's fine, we just know the points are incremented by time, and of course the y label is the actual stock price. plt.legend() tells it to build the legend, so that the red line and Real Google Stock Price show up together, and plt.show() displays the actual graph.
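A compact sketch of that plotting cell, assuming real_stock_price holds the actual test-set open prices and predicted_stock_price is the inverse-transformed prediction from above:

```python
import matplotlib.pyplot as plt

plt.plot(real_stock_price, color='red', label='Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')                 # incremented by time step, not actual dates
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
```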
Let's run it and see what that looks like. We get a nice graph, so let's talk about it a little before we wrap up. There's the legend I was telling you about, showing which line is which, we have our title and everything, and on the bottom you'll notice a time sequence. We didn't put the actual dates in here; we could have, since we know what the dates are, but we also know this is only the last slice of data we're looking at, probably around 20% of the data or less. The real Google price has this little jump up and then down, and you'll see that our prediction, instead of turning down here, just didn't go up as high and didn't drop as low. So the prediction has the same pattern, but the overall values are fairly far off. Then again, we're only looking at one column, the open price; we're not looking at how many shares were traded, like I was pointing out earlier. When we talk about stock data, right off the bat there are several columns — open, high, low, close, volume — and then the adjusted open, adjusted high, adjusted low, and adjusted close, which use a formula to estimate what the stock would really be worth, and from there there's all kinds of other features you can add. So we're only looking at one small aspect, the opening price, and as you can see we did a pretty good job: the predicted curve follows the real curve pretty well, it
has, you know, little jumps and bends that don't quite match up — this bend here doesn't quite line up with that bend there — but it's pretty darn close; we have the basic shape of it and the prediction isn't too far off. You can imagine that as we add more data and look at different aspects of the stock domain we should get a better representation each time we drill in deeper. Of course, this took half an hour for my computer to train, so if I ran it across all those extra variables it might take quite a bit longer, which is not so good for a quick tutorial like this. So now we're going to dive right into what Keras is, and we'll also go all the way through a couple of tutorials, because where you really learn a lot is when you roll up your sleeves. So, what is Keras? Keras is a high-level deep learning API written in Python for easy implementation of neural networks. It uses deep learning frameworks
such as TensorFlow, PyTorch, etc. as its backend to make computation faster. This is really nice, because as a programmer there is so much stuff out there and it's evolving so fast it can get confusing, and having a high-level layer where we can easily program these different neural networks is really powerful — powerful for getting something running quickly and for being able to start testing your models and seeing where you're going. So Keras works by using complex deep learning frameworks such as TensorFlow, PyTorch, etc. as the backend for fast computation while providing a user-friendly and easy-to-learn frontend. You can see here we have the Keras API specification, and under that you'd have implementations like tf.keras for TensorFlow, the Theano backend, and so on, and then the TensorFlow workflow that this all sits on top of. Like I said, Keras organizes everything; the heavy lifting is still done by TensorFlow or whatever underlying package you put in there. This is really nice because you don't have to dig as deeply into the heavy internals while still having a very robust package; you can get up and running rather quickly, and it doesn't hurt processing time, because all the heavy lifting is done by packages like TensorFlow — Keras is the organization on top of it. So, the working principle of Keras: Keras uses computational graphs to express and evaluate mathematical expressions. You can see here, marked in blue, expressing complex problems as a combination of
simple mathematical operators — like the percent sign, which in Python is usually the remainder operator, or raising something to a power — and it's useful for calculating derivatives via backpropagation. So when we're working with neural networks and we send the error back up to figure out how to change the weights, this makes it really easy to do without having to hand-write everything, and it's easier to implement distributed computation. For solving complex problems you specify the inputs and outputs and make sure all the nodes are connected, and what's really nice is that as your layers go in you can build some very complicated setups nowadays, which we'll look at in just a second; this just makes it really easy to start spinning things up and trying out different models. So when we look at Keras models, first you have the sequential model: a sequential model is a linear stack of layers where
the previous layer leads into the next layer, and if you've done anything else, even scikit-learn with its neural network setups, this should look familiar: you have your input layer, it goes into layer one, layer two, and then the output layer, and it's useful for simple classifier or decoder models. Down here you can see the actual code, model = keras.Sequential() with Dense layers, and you can see how easy it is.
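A compact, runnable version of that Sequential stack — the layer sizes here are illustrative, not taken from the slide:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation='relu', name='layer1'),
    layers.Dense(64, activation='relu', name='layer2'),
    layers.Dense(10, name='output'),
])
```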
In this particular example the Dense layers use the ReLU activation and are named layer1, layer2 and so forth, and they just feed right into each other, so it's really easy to stack them and Keras automatically takes care of everything else for you. Then there's the functional model, and this is really where things are at. It's newer, so make sure you update your Keras or you'll run into an error code when you try the functional model, because
this is a fairly new release. It's used for multi-input and multi-output models, the more complex models which fork into two or more branches. You can see here we have image_inputs = keras.Input(shape=(32, 32, 3)), and then dense layers like layers.Dense(64, activation='relu'); this should look similar to what you already saw before, but if you look at the graph on the right it's a lot easier to see what's going on. You have two different inputs, and one way you could think of this is that maybe one of them is a small image and one is a full-sized image; you might feed both of them into one node because it's looking for one thing, and only one of them into another node for something else. So you can start to get an idea that there's a lot of use for this kind of split, this kind of setup where we have multiple streams of information coming in, but the information is very different even though it overlaps, and you don't want to send it all through the same
neural network. And they're finding that this trains faster and can also give a better result, depending on how you split the data up and how you fork the models coming down. So in here we have the two inputs coming in: we have our image input, which is 32 by 32 by 3 — three channels, or four if you have an alpha channel — then your Dense layers, layers.Dense(64) with the very common ReLU activation, chained like x = layers.Dense(64, activation='relu')(x), then outputs = layers.Dense(10)(x) and model = keras.Model(inputs=inputs, outputs=outputs, name=...), so we add a little name on there. Again, this kind of split is setting us up to have the inputs go into different areas.
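Here's a hedged sketch of that multi-input functional model; the 32x32x3 image input matches the slide, but the second (smaller) input, the layer sizes, and the model name are assumptions for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

image_inputs = keras.Input(shape=(32, 32, 3), name='full_image')
small_inputs = keras.Input(shape=(8, 8, 3), name='thumbnail')   # assumed second input

x = layers.Flatten()(image_inputs)
x = layers.Dense(64, activation='relu')(x)

y = layers.Flatten()(small_inputs)
y = layers.Dense(64, activation='relu')(y)

combined = layers.concatenate([x, y])          # the two branches merge here
outputs = layers.Dense(10)(combined)

model = keras.Model(inputs=[image_inputs, small_inputs],
                    outputs=outputs, name='functional_model')
```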
Now, if you're already looking at Keras you probably already have an answer to the question, what are neural networks, but it's always good to get on the same page, so here's a quick overview. Neural networks are deep learning algorithms modeled after the human brain; they use multiple neurons, which are mathematical operations, to break down and solve complex mathematical problems. Just like a biological neuron, one neuron fires in and it fires out to all these other neurons, or nodes as we call them, and eventually they all come down to your output layer; you can see here we have the really standard graph of an input layer, a hidden layer, and an output layer. One of the biggest parts of any data processing is your data pre-processing, so we
always have to touch base on that. With a neural network, like many of these models, when you first start using them they're kind of a black box: you put your data in, you train it, you test it and see how good it was — and you have to pre-process that data, because bad data in means bad outputs. So for data pre-processing we will create our own example data set with Keras. The data represents a clinical trial conducted on 2,100 patients ranging from ages 13 to 100, with half the patients under 65 and the other half over 65 years of age. We want to find the probability of a patient experiencing side effects due to their age — you can think of this in today's world with COVID — and we're going to do an example of that in our live hands-on, because like I said, most of this you really need hands-on to understand. So let's bring up Anaconda; I'll open that up and open a Jupyter notebook for doing the
python code in now if you're not familiar with those you can use pretty much any of your uh setups I just like those for doing demos and uh showing people especially shareholders it really helps because it's a nice visual so let me go and flip over to our anaconda and the Anaconda has a lot of cool to tools they just added datal lore and IBM Watson Studio Cloud into the Anaconda framework but we'll be in the Jupiter lab or Jupiter notebook um I'm going to do Jupiter notebook for this because I use the lab for
like large projects with multiple pieces because it has a multiple tabs where the notebook will work fine for what we're doing and this opens up in our browser window because that's how Jupiter do sorry Jupiter notebook is set to run and we'll go under new create a a new Python 3 and uh it creates an Untitled python we'll go ahead and give this a title and we'll just call this uh cross tutorial and let's change that to Capital there we go we go and just rename that and the first thing we want to go ahead
and do is uh get some pre-processing tools involved and so we need to go ahead and import some stuff for that like our numpy do some random number Generation Um I mentioned sklearn or your s kit if you're installing sklearn the sklearn stuff it's a s kit you want to look up that should be a tool of anybody who is um doing data science if if you're not if you're not familiar with the sklearn toolkit it's huge uh but there's so many things in there that we always go back to and we want to go
ahead and create some train labels and train samples for training our data, and then just a note on what we're actually doing in here. Let me change this cell — this is kind of a fun thing you can do — we can change the code cell to Markdown, and Markdown is nice for documenting examples once you've built them. Our example data: an experimental drug was tested on 2,100 individuals between 13 and 100 years of age; half the participants are under 65 and half are over 65; 95% of participants under 65 experienced no side effects, while 95% of participants over 65 did experience side effects. That's where we're starting from, and this is just a quick example, because we're going to do another one later with more complicated information. So we want to generate our setup: we loop with for i in range(...), draw random integers, and append them to the train labels and train samples, so we're just creating some
random data. Let me go ahead and run that. Once we've created our random data — and you can certainly ask Simplilearn for a copy of the code, they'll send it to you, or you can zoom in on the video and see how we appended to our train samples — this is the kind of thing I do all the time: I was recently running something that had to do with errors following a bell-shaped curve on a standard distribution, so what did I do? I generated data on a standard distribution to see what it looks like and how my code processes it, since that was the baseline I was looking for. Here we're just generating random data for our setup.
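A hedged sketch of that data generation, with the counts following the narration (roughly 1,000 per age group plus a small group of outliers, 2,100 samples total):

```python
from random import randint

train_labels = []
train_samples = []

for i in range(50):
    # the ~5% outliers: younger patients WITH side effects, older patients WITHOUT
    train_samples.append(randint(13, 64)); train_labels.append(1)
    train_samples.append(randint(65, 100)); train_labels.append(0)

for i in range(1000):
    # the 95% majority: younger without side effects, older with side effects
    train_samples.append(randint(13, 64)); train_labels.append(0)
    train_samples.append(randint(65, 100)); train_labels.append(1)
```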
If we print the first five pieces of data in our train samples we get 49, 85, 41, 68, 19 — just the random ages we generated, that's all that is — and we generated significantly more than that, a thousand per group in the main loop. If we want to check, we can do a quick print of the length, or you could look at the shape if you're working with NumPy arrays, although len is just fine here, and there we go, it's actually 2,100, like we said in the demo setup. Then we want to look at our labels — that was the train samples we just printed, wasn't it — so we print the same thing for the train labels and run that to double check, and sure enough we have 2,100 of them, labeled 1, 1, 0, 0, 1, 0: whether they have symptoms or not, one for symptoms, zero for none. And so we
want to go ahead and take our train labels and we'll convert it into a numpy array and the same thing with our samples and let's go ahead and run that and we also Shuffle uh this is just a neat feature you can do in uh numpy right here put my drawing thing on which I didn't have on earlier um I can take the data and I can Shuffle it uh so we have our so it's it just randomizes it that's all that's doing um we've already randomized it so it's kind of an Overkill it's not
really necessary, but if you're working with a larger pipeline where the data comes in organized in some way, you want to randomize it to make sure the input doesn't follow a pattern that might create a bias in your model. Then we create a scaler, a MinMaxScaler with feature_range=(0, 1), and create scaled_train_samples by fitting and transforming the data so it's nicely scaled — that's the age, so the 49, 85, 41 we saw above are mapped into values between zero and one. This is true with any of your neural networks: you really want to scale the data to between zero and one, otherwise you create a bias. If you leave in a raw value like 100, the math behind it gets complicated — there's a lot of multiplication and addition going on in there — and that higher value will eventually dominate, creating a huge bias in how the model fits, and then it won't fit as well. One of the fun things in a Jupyter notebook is that if a variable is the last thing on a line and you're not doing anything with it, it automatically prints, so we just look at the first five scaled samples.
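A minimal sketch of the shuffle-and-scale step; the original uses a NumPy-style shuffle, but here sklearn.utils.shuffle stands in (an assumption) because it keeps samples and labels aligned:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
train_labels, train_samples = shuffle(train_labels, train_samples)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))

scaled_train_samples[:5]   # in a notebook the last expression prints automatically
```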
When it prints, everything is between zero and one, which shows us we scaled it properly and it looks good. It really helps to do these kinds of printouts halfway through; you never know what's going on in there. I can't tell you how many times I've gotten way down the line and found out that data sent to me, which I thought was scaled, was not, and then I have to go back and track it down. So let's go ahead and create our artificial neural
network, and for doing that, this is where we start diving into TensorFlow and Keras. If you don't know the history of TensorFlow it helps to look it up — we'll just use Wikipedia; careful, don't quote Wikipedia on these things or you'll get in trouble, but it's a good place to start. Back in 2011 Google Brain built DistBelief as a proprietary machine learning system, and TensorFlow became its open-source successor. So TensorFlow was a Google product, then it was open-sourced, and now it has become probably one of the de facto standards for neural networks. When you look at the TensorFlow ecosystem it has a huge following; there are other options — for example scikit-learn has its own small neural network implementation — but TensorFlow is the most robust one out there right now, and Keras sitting on top of it makes it a very powerful tool, so we can leverage the ease with which Keras builds a sequential setup on top of TensorFlow. So in here we import TensorFlow, and then the rest of this, from line two down, is all Keras: from tensorflow.keras.models we import Sequential — a specific kind of model we'll look at in a second; if you remember from the slides, that means it goes from one layer to the next layer to the next, with no funky splits or anything like that.
Then from tensorflow.keras.layers we import Activation and Dense, and we have our optimizer, Adam. This is a big thing to be aware of — how you optimize. When you're first starting out, Adam is as good as any; there are a number of optimizers out there, a couple of main ones, but Adam scales well to bigger data and usually handles smaller data just fine too, so it's probably the most commonly used, though there are more out there, and depending on what you're doing, your different layers might also use different activations. Then finally, down here you'll see that we want to use metrics, and we're going to use tensorflow.keras.metrics categorical cross-entropy so we can see how everything performs when we're done — that's all that is. A lot of times you'll see us go back and forth between TensorFlow and scikit-learn, which also has a lot of really good metrics for measuring these things.
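Here's roughly what that import cell looks like, a hedged sketch based on the description above:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
```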
Again, at the end of the story it's about how good your model does. We'll go ahead and load all that, and then comes the fun part. I actually like to spend hours messing with these next few lines, and you might think, you're going to spend hours on four lines of code? It's not the typing that takes hours, it's the choices in them; what we have here I'm going to
explain in just a second. We have a model and it's a sequential model — if you remember, we mentioned Sequential up above, where it goes from one layer to the next. Our first layer is the input layer; it's a Dense layer, and you give it how many units (we have 16), the input shape, and the activation. This is where it gets interesting, because we have ReLU on two of these layers and a softmax activation on the last one. There are so many different options for what these activations mean and how they function — how does ReLU behave, how does softmax behave — and they do very different things. We're not going to go deep into activations here; that's what you really spend hours on, looking at the different activations, and some of it is almost like playing with them as an artist, you start getting a feel for them. For example, the tanh (hyperbolic tangent) activation takes a lot of processing, so you don't see it as often, yet it can come up with better solutions, especially when you're analyzing word documents and tokenizing words; you'll see people shift from one activation to another because they're trying to build a better model, and on a huge data set the heavier ones just take too long to process. Then you see things like softmax. ReLU has a setup where anything less than zero becomes zero and then it rises linearly above zero; there's also what they call a leaky variant, which keeps a slight negative slope so that errors propagate back better, and softmax is similarly smooth so that errors translate well. All these little details make a huge difference to your model. So one of the really cool things about
data science that I like is you build your uh what they call you build defil and it's an interesting uh design setup oops I forgot the end of my code here the concept of build def fail is you want the model as a whole to work so you can test your model out so what you can do uh you can get to the end and you can do your let's see where was it over overshot down here you can test your test out the quality of your setup on there and see where did I do
my tensor flow oh here we go I did it was right above me here we go we start doing your cross entropy and stuff like that is you need a full functional set of code so that when you run it you can then test your model out and say hey it's either this model works better than this model and this is why um and then you can start swapping in these models and so when I say I spend a huge amount of time on pre-processing data is probably 80% of your programming time um well between
those two it's like 80/20. You'll spend a lot of time on the models once you get the whole code and flow down; depending on your data, your models get more and more robust as you start experimenting with different inputs, different data streams, and all kinds of things. And we can do a simple model.summary() here: there's our Sequential model with each layer, its output shape, and its parameter count — that's one of the nice things about Keras, you can see it all right there.
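A hedged sketch of the model just described, building on the imports above: a 16-unit first Dense layer taking the single age feature, a hidden layer (its size is an assumption), and a 2-unit softmax output for side effects yes/no:

```python
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),      # hidden layer size is illustrative
    Dense(units=2, activation='softmax'),    # probabilities for the two classes
])
model.summary()
```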
Boom, boom, boom — everything's set out clearly and easy to read. Once we have our model built, the next thing we want to do is train it, so the next step is of course model training. A lot of times this is just paired with the model definition because it's so straightforward, but it's nice to print out the model setup so you can track it. Here's our model; the keyword in Keras is compile, with optimizer Adam and a learning rate —
another term right there that we're just skipping right over that really becomes the meat of um the setup is your learning rate uh so whoops I forgot that I had an arrow but I'll just underline it a lot of times the learning rate set to 0.0 uh set to 0.01 uh depending on what you're doing this learning rate um can overfit and underfit uh so you'd want to look up I know we have a number of tutorials out on overfitting and underfitting that are really worth reading once you get to that point in understanding and
we have our loss, sparse categorical cross-entropy, which tells Keras what it's optimizing against, and then we're looking for a metric of accuracy. We'll go ahead and run that, and now that we've compiled our model we want to fit it. Here's our model.fit: we have our scaled train samples, our train labels, a validation split — in this case we're going to use 10% of the data for validation — and a batch size, another number you kind of play with; it doesn't make a huge difference to the results, but it does affect how long training takes to run, and it can affect the bias a little. Most of the time a batch size is between 10 and 100, depending on how much data you're processing. We also shuffle the data, go through 30 epochs, and set verbose to two.
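A sketch of those two cells, with the settings following the narration (the 0.01 learning rate is the value mentioned; tune it for your own data):

```python
model.compile(optimizer=Adam(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x=scaled_train_samples, y=train_labels,
          validation_split=0.1,   # hold out 10% of the data for validation
          batch_size=10,
          epochs=30,
          shuffle=True,
          verbose=2)
```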
Let me run this, and you can see right here each epoch prints its training progress and its loss. If you remember, up where we compiled the model we set the loss to sparse categorical cross-entropy, and this tells us how much the error goes down as training proceeds: the lower the number the better, and it just keeps dropping. Then there's the accuracy at the end of each line, and you can see it climbing — 0.619, 0.69, 0.74 — it's going up. It would be ideal if accuracy made it all the way to one, but the loss is just as important, because it's a balance: you can have 100% accuracy and your model still doesn't work because it's overfitted — again, you'll want to look up overfitting and underfitting models. We went through 30 epochs, and it's always fun to watch your code go; to be honest, the first time I run it I'm like, oh that's cool
I get to see what it does and after the second time of running it I'm like i' like to just not see that and you can repress those of course in your code uh repress the warnings in the printing and so the next step is going to be building a test set and predicting it now uh so here we go we want to go ahead and build our test set and we have uh just like we did our training set a lot of times you just split your your initial setup uh but we'll go ahead
and do a separate set on here and this is just what we did above uh there's no difference as far as um the randomness that we're using to build this set on here uh the only difference is that we already uh did our scaler up here well it doesn't matter because the the data is going to be across the same thing but this should just be just transform down here instead of fit transform uh because you don't want to refit your data um on your testing data there we go now we're just transforming it because
you never want to refit the scaler on the test data — an easy mistake to make, especially in an example like this where we're randomizing the data anyway, so it wouldn't change much here, but on real data it would leak information. Then we do our predictions; the whole reason we built the model is to take it and predict, so here's model.predict with our scaled test data, a batch size of 10, and verbose off, and now we have our predictions. We can print the first five predictions, and what we have here is, for each age, the prediction of whether we think they're going to have symptoms or not — and the first thing we notice is that it's hard to read, because each row is a pair of probabilities and we really want a yes/no answer. So we round off the predictions using NumPy's argmax, which collapses each row to a 0 or 1, and since this is a Jupyter notebook I don't have to write print, I can just put rounded_predictions[:5], and you can see a string of zeros and ones — no symptoms, symptoms, and so on — coming out of this. Just as we were talking about at the beginning, we want to take a look at how accurate this is.
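A minimal sketch of the test-set prediction step, assuming test_samples and test_labels were generated the same way as the training data:

```python
import numpy as np

test_labels = np.array(test_labels)
test_samples = np.array(test_samples)

# transform only -- the scaler was already fitted on the training data
scaled_test_samples = scaler.transform(test_samples.reshape(-1, 1))

predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)
rounded_predictions = np.argmax(predictions, axis=-1)
rounded_predictions[:5]   # e.g. 0 = no side effects, 1 = side effects
```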
There we go: confusion matrices for the accuracy check — the most important part when you get down to the end of the story, how accurate is your model, before you go play with the model and see if you can get better accuracy out of it. For this we'll use scikit-learn's metrics — that's where confusion_matrix comes from — plus itertools and of course matplotlib, because it's always nice to have a nice graph to look at; a picture is worth a thousand words. Then we call it cm, for confusion matrix, with y_true=test_labels and y_pred=rounded_predictions, and we load in our cm. I'm not going to spend too much time going over the plotting code — we have whole tutorials on plotting — but what we have here is a plot_confusion_matrix helper taking our cm, our classes, normalize=False,
a title of Confusion Matrix, and a cmap in blues. You can see all the different pieces in the helper — the colormap, the titles, whether you draw tick marks, the classes, a color bar — a lot of options for how the confusion matrix gets drawn. You can also just dump the confusion matrix into Seaborn and get an output really quickly; it's worth knowing how to do all of this, because when you're presenting to the shareholders you don't want to do it on the fly, you want to take the time to make it look really nice, like our folks in the back did. Let's run this — I forgot to define the plot labels first, so we run that — and then we call the plotting function we just wrote, plot_confusion_matrix, and dump our data into it: our confusion matrix, our classes, and the title Confusion Matrix.
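The notebook's plot_confusion_matrix helper is longer than we need here; as a hedged stand-in, a Seaborn heatmap (which the narration also mentions) gives roughly the same picture:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['no side effects', 'had side effects'],
            yticklabels=['no side effects', 'had side effects'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()
```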
Let's run that, and you can see our basic setup: roughly 195 correctly predicted as having no side effects, around 200 correctly predicted as having side effects, and only about 10 who actually had side effects that we predicted as not having them. That's pretty good — roughly a 5% error on that side, since about 10 out of 200 works out to around 5 in 100. You can do the same kind of math on the other side, about 15 versus 195, not as clean a number, where 15 people were predicted to have no side effects but did have them. These confusion matrices are so important: at the end of the day this is really where you show whatever you're working on and can say, hey, this is how good we are, or how far off we are. So
this was a uh I spent a lot of time on some of the parts uh but you can see here is really simple uh we did the random generation of data but when we actually built the model coming up here uh here's our model summary and we just have the layers on here that we built with our model on this and then we went ahead and trained it and ran the prediction now we can get a lot more complicated uh let me flip back on over here because we're going to do another uh demo so
that was our basic introduction to it we Ted talked about the uh oops here we go okay so implementing a neural network with carass after creating our samples and labels we need to create our carass neural network model we will be working with a sequential model which has three layers and this is what we did we had our input layer our hidden layers and our output layers and you can see the input layer uh coming in uh was the age Factor we had our hidden layer and then we had the output are you going to
have symptoms or not so we're going to go ahead and go with something a little bit more complicated um training our model is a two-step process we first compile our model and then we train it in our training data set uh so we have compiling compiling converts the code into a form of understandable by Machine we use the atom in the last example a gradient descent algorithm to optimize a model and then we trained our model which means it let it uh learn on training data uh and I actually had a little backwards there but
this is what we just did is we if you remember from our code we just had me go back here um here's our model that we created summarized uh we come down here and we compile it so it tells it hey we're ready to build this model and use it uh and then we train it this is the part where we go ahead and fit our model and and put that information in here and it goes through the training on there and of course we scaled the data which was really important to do and then
you saw we did the creating a confusion m matx with carass um as we are performing classifications on our data we need a confusion Matrix to check the results a confusion Matrix breaks down the various misclassifications as well as correct classifications to get the accuracy um and so you can see here this is what we did with the true positive false positive true negative false negative and that is what we went over let me just scroll down here on the end we printed it out and you can see we have a nice print out of
our confusion matrix with the true positives, false positives, false negatives, and true negatives. The blue ones are the ones we want to be the biggest numbers, because those are the correct predictions, and then we have our false predictions on here — no side effects predicted as side effects, and vice versa. Now, here's the OpenAI documentation, and you can see the new features introduced with ChatGPT-4o. These are the improvements: one is the updated and
interactive bar graphs or pie charts that you can create and these are the features that you could see here you could change the color you could download it and what we have is you could update the latest file versions directly from Google Drive and Microsoft One drive and we have the interaction with tables and charts in an new expandable view that I showed you here that is here you can expand it in the new window and you can customize and download charts for presentations and documents moreover you can create the presentation also that we'll see
in further and here we have how data analysis Works in chat jbt you could directly upload the files from Google Drive and Microsoft One drive I will show you guys how we can do that and where this option is and we can work on tables in real time and there we have customized presentation ready charts that is you can create a presentation with all the charts based on a data provided by you and moreover a comprehensive security and privacy feature so with that guys we'll move to chat jpt and here we have the chat jpt
40 version so before commencing guys there's a quick info for you if you're one of the aspiring data analyst looking for online training and graduating from the best universities or a professional who elicits to switch careers with data analytics by learning from the experts then try giving a to Simply learn Purdue postgraduate program in data analytics in collaboration with IBM you can find the link in the description box and pin command so let's get started with data analysis part so this is the PIN section or the insert section where you can have the options to
connect to Google Drive connect to Microsoft One drive and you can upload it from the computer this option was already there that is upload from computer and you can upload at least or at Max the 10 files that could be around Excel files or documents so the max limit is 10 and if you have connected to Google Drive I'll show you guys uh I'm not connecting you but you guys can connect it to and you could upload it from there also and there's another cool update that is ability to quote directly in your chat uh
so while chatting with chat jbt I'll show you guys how we can do that and you could find some new changes that is in the layout so this is the profile section it used to be at the left bottom but now it's move to the top right and making it more accessible than ever so let's start with the data analysis part and the first thing we need is data so you can find it on kager or you could ask chat gp4 to provide the data I'll will show you guys so this is the kagle website
you can sign in here and click on data sets you can find all the data sets here that would be around Computer Science Education classification computer vision or else you could move back to chat GPD and you would ask the chat GPT for model to generate a data and provide it in Excel format so we'll ask him we'll not ask him can you we'll just ask him provide a data set that I can use for data analysis and provide in CSV format so you could see that it has responded that I can provide a sample
data set and he has started generating the data set here so you could see that he has provided only 10 rows and he is saying that I will now generate this data set in CSV format first he has provided the visual presentation on the screen and now is generating the CSV format so if you want more data like if you want 100 rows or th000 rows you could specify in the prompt and chat jpt will generate that for you so we already have the data I will import that data you could import it from here
or else you can import it from your Google Drive so we have a sales data here we will open it so we have the sales data here so the first step we need to do is data cleaning so this is the crucial step to ensure that the accuracy of analysis is at its best so we can do that by handling missing values that is missing values can distort our analysis and here chat gb4 can suggest methods to impute these values such as using the mean median or a sophisticated approach Based on data patterns and after
handling the missing values we will remove duplicates and outlier detection so we'll ask CH JB clean the data if needed so we can just write a simple prompt that would be clean the data if needed and this is also a new feature you can see the visual presentation of the data here that we have 100 rows here and the columns provided that is sales ID date product category quantity and price per unit and total sales so this is also a new feature that okay uh we just head it back we'll move back to our chat
GPT chat here okay so here we are so you could see that chj has cleaned the data and he has provided that it has checked for missing values checked for duplicates and ensure consistent formatting and he's saying Okay Okay so now we will ask him that execute these steps and provide the clean data as chj has provided that these would the steps to clean the data and let's see so he has provided a new CSV file with the clean sales data we will download it and ask him to use the same file only use this
new cleaned sales data CSV file for further analysis so you could see that he is providing what analysis we can do further but once our data is clean the next step is visualization so visualiz ations help us understand the data better by providing a graphical representation so the first thing we will do is we will create a prompt for generating the histograms and we'll do that for the age distribution part so we'll write a prompt that generate a histogram generat a histogram to visualize the distribution of customer ages to visualize the distribution of customer ages
and what I was telling you guys is this code button if you just select the text and you would find this reply section just click on that and you could see that it has selected the text or what you want to get all the prompts started with chat gbd so we'll make it cross and you could see that it has has provided the histogram here and these are the new features here and we could see that he providing a notification that interactive charts of this type are not yet supported that is histogram don't have the
color Change option I will show you the color Change option in the bar chart section so these features are also new you can download the chart from here only and this is the expand chart if you click on that you could see that you could expand the chart here and continue chat with chat GPT here so this is the interactive section so you could see that he has proved the histogram that is showing the distribution of customer ages and the age range are from 18 to 70 years with the distribution visualized in 15 bins that
he has created 15 bin Here and Now moving to another visualization that we'll do by sales by region so before that I will open the CSV file that is provided by the chat GPT so you guys can also see what data he has provided so this is the clean sales data and you could see that we have columns sales ID date product category quantity price per item total sales region and sales person so now moving back to chat jity so now we will create a bar chart showing total sales by region so we'll enter this
prompt that create a bar chart showing total sales by region so what we are doing here is we are creating bar charts or histogram charts but we can do that for only two columns if we want to create these data visualization charts we need two columns to do so so you could see that he has provided the response and created the bar chart here and this is the interactive section you could see that here's an option to switch to static chart if we click on that we can't like we are not getting any information we
scroll over it, but if I enable the interactive option you can see exactly what value each bar represents. After that we have the change-color section, where you can change the color of the series to any of the colors provided here, or just type in a color code, and similarly we have two more options, download and expand chart. And if you want to see what code it used to build this bar graph, it shows you the code: you can use any IDE to reproduce it, so if you don't want the visualizations inside ChatGPT you can use your own IDE with Python — it will provide the code for you — and just read your data set through pandas and generate the bar chart yourself.
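A hedged sketch of doing that same bar chart locally; the file name and column names ('Region', 'Total Sales') are assumptions based on the cleaned file described above:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('cleaned_sales_data.csv')        # the file ChatGPT returned

sales_by_region = df.groupby('Region')['Total Sales'].sum()
sales_by_region.plot(kind='bar', color='steelblue')

plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.tight_layout()
plt.show()
```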
Now moving to the next section, category-wise sales: here we'll generate a pie chart showing the proportion of sales for each product category, so we write the prompt, generate a pie chart showing the proportion of sales for each product category. You can see it has started generating the pie chart, and this is also interactive: if you click on it you can switch to a static pie chart, and if you want to change the color you can change it for any slice — clothing, electronics, furniture, or kitchen — and similarly we have the download and expand-chart options. This is how the new ChatGPT-4o model improves on the previous version: you get more interactive pie charts, you
could change the colors, and you can just hover over the charts and read off all the underlying values. After this data visualization we move on to statistical analysis, which helps us uncover patterns and relationships in the data. The first thing we'll do is correlation analysis; correlation analysis helps us understand the relationship between two variables — in this case it can indicate whether older customers tend to spend more or less — so we'll find that out by giving ChatGPT the prompt, analyze the correlation between age and purchase amount. Here's the response: you can see a scatter plot showing the relationship between customer age and total sales, with a calculated correlation coefficient of approximately 0.16. That indicates a weak positive correlation between age and purchase amount, suggesting that as customer age increases there's only a slight tendency for total sales to increase as well.
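If you want to reproduce that check locally, here's a hedged sketch; the column names ('Customer Age', 'Total Sales') are assumptions about the generated file:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('cleaned_sales_data.csv')

corr = df['Customer Age'].corr(df['Total Sales'])   # Pearson correlation by default
print(f'Correlation between age and total sales: {corr:.2f}')

plt.scatter(df['Customer Age'], df['Total Sales'], alpha=0.5)
plt.xlabel('Customer Age')
plt.ylabel('Total Sales')
plt.title('Age vs Total Sales')
plt.show()
```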
Looking at the scatter plot, age isn't strongly tied to sales — the points are fairly spread out — and across the 40-to-50 and up-to-70 age ranges you can read off how much those ages spent in total sales. Now moving on to sales trends: here we'll perform a time series analysis of purchase amount over the given dates. Time series analysis allows us to examine how the sales amount changes over time, helping us identify trends and seasonal patterns
so for that we'll write a prompt performer time series analysis of purchase amount or given dates so you could see that chgb has us the response and here is the time series plot showing total sales or the given dates and each point on the plot represents the total sales for a particular day so through this you can find out and the businesses find out which is the seasonal part of the year and where to stock up their stocks for these kind of dates and after that you could also do customer segmentation so what does this
do is so we can use clustering here to segment customers based on age income and purchase amount so clustering groups customers into segments based on similarities this is useful for targeted marketing and personalized services and after that we have the advanced usage for data analysis here we can draw a predictive modeling table and do the Market Basket analysis and perform a customer lifetime value analysis so we will see one of those and what we'll do is we'll perform a Market Basket analysis and perform an association rule mining to find frequently both together products so the
theory behind this is that association rule mining helps identify patterns of products that are often purchased together, aiding inventory management and cross-selling strategies. For that we write the prompt, perform an association rule mining to find frequently bought together products, and see how ChatGPT-4o responds. You can see it's providing code, but we don't want code here, we want the analysis, so we tell it: don't provide code, do the market basket analysis and provide visualizations. ChatGPT responds that, given the limitations of its environment, it's not able to run the market basket analysis here, but it can show us how to perform it in an IDE: install the required libraries, prepare the data, and run the example code it provides.
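As a hedged sketch of what that looks like in your own IDE with the mlxtend library — note it assumes a transaction-level file with an order ID and a product column, which is not exactly the sample file used above:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_csv('transactions.csv')       # hypothetical file with OrderID, Product columns

# one-hot encode: one row per order, one boolean column per product
basket = (df.groupby(['OrderID', 'Product'])['Product']
            .count().unstack().fillna(0).astype(bool))

frequent_items = apriori(basket, min_support=0.02, use_colnames=True)
rules = association_rules(frequent_items, metric='lift', min_threshold=1.0)

print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())
```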
So there are some limitations to ChatGPT-4o as well — it can't do this kind of advanced analysis inside its own environment, so you run the code in your IDE and do the market basket analysis there. Now we'll ask ChatGPT, can you create a presentation based on the data set, and we'll provide a data set to it as well: we give it a sample sales data file and ask, can you create a PowerPoint presentation based on this data set and only provide data visualization graphs. You can see that ChatGPT-4o has started analyzing the data, stating that it
will start by creating data visualizations from the provided data set and compile them into a PowerPoint presentation. You can see ChatGPT-4o has provided the response, and these are all the slides with the bar graphs it has created; we download the presentation, open it, and there's the presentation created by ChatGPT-4o. Sora is here: OpenAI has introduced Sora, an advanced AI tool for creating videos, now available at sora.com. Earlier this year Sora was launched to turn text into realistic videos, showcasing exciting progress
in AI technology. Now OpenAI has released Sora Turbo, a faster and more powerful version available to ChatGPT Plus and Pro users. Sora lets users create videos in up to 1080p quality, up to 20 seconds long, and in different formats like widescreen, vertical, or square. It includes tools like a storyboard for precise control and options to remix or create videos from scratch, and there's also a community section with featured and recent videos to spark ideas. ChatGPT Plus users can make up to 50 videos per month at 480p resolution, while Pro users get access to more features like higher resolution and longer video duration. While Sora Turbo is much faster, OpenAI is still working to improve areas like handling complex action and making the technology more affordable. To ensure safe and ethical use, Sora includes features like visible watermarks, content moderation, and metadata to identify videos created with Sora. Sora makes it easier for people to create and share stories through video, and OpenAI is excited to see how users will explore new creative possibilities with this powerful tool. So welcome to the demo part of Sora; this is the landing page
when you will login in Sora so let me tell you I have the chgb plus version not the version so I have some 721 credits left okay uh later on I will tell you what are the credits okay so let's explore something here so these are some recent videos which I have created or tested you can see and this featured version is all the users of Sora which are creating videos so it's coming under featured so we can learn or we can generate some new ideas like this okay like this parot and all like this
is very cool for Learning and these are some the saved version and these are all videos and uploads like this so let's come into the credit Parts okay so you can see I have 721 credit left so if you will go this help open.com page and this page you can see what are the credit so credit are used to generate videos with Sora okay so if you will create 480p Square 5sec video it will take only 20 credit okay for 10 it will take 40 then this then this okay for 480p uh this much credit
25 credit 50 credit like this 720 is different 108 be different okay so here it is written please note that the questing multiple variation at once will be charged at the same rate as running two separate generation request okay so here this plus icon you can see so here you can upload the image or video okay so you can also do like this you can upload the image and you can create the video from that image okay and this is choose from library your personal Library this library right and this option is for the variation
okay like these are basically presets like balloon world stop motion archal filar or cardboard and the paper okay so this is the resolution okay 480p this is the fastest in video generation okay 720p will take uh like 4 4X lower and 1080p 8X lower I guess 1080p is only available in chat gbt uh pro version got it okay so we uh we are just you know doing I will I'm just uh showing you demo so I will uh choose this fastest version only okay so this is the time duration how long you want like 5
Second 10 second 15 and 20 seconds is available in pro version okay of jgb and this is how much versions want to take we I will select only two okay because it will again charge more credits to you okay and these credits are monthly basis I guess okay these credits are monthly basis okay see again recard remix Bland Loop to create content this will take again more credits okay see here chity plus up to 50 priority videos th000 credits okay per month I guess yeah per month up to 720p resolution and the 5 Second duration
and charge Pro up to 500 priority videos 10,000 credits unlimited relax videos up to 1080p resolution 20 second duration download without Watermark here you can download with Watermark I guess I don't know yeah we'll see uh about everything you okay but chgb Pro is $200 per month so huh yeah it's expensive right so yes let's uh do something creative so okay I will write here okay polar be enjoying on the S desert okay s desert yeah okay you can do Story Board as well or you can create directly videos okay so let me show you
the story board first yeah so frame by frame you can give you know different uh what to say prompt okay here you can give different prompt okay polar beer with family okay playing with scent like this okay and later on it will create a whole the video okay third you can describe again you can add image like this okay this is a story created by the chgb okay let's create Okay add it to the queue okay it's very fast actually almost done yeah see with family you can see playing with the scend okay so these
are the two variation okay you can choose either this or either that one or either that one okay I'm feeling this muches yeah so here you can again edit your story recut you can trim or extend this video in a new storyboard okay so basically record features allow you to creators to you know pinpoint and isolate the most impactful frame in a video extending uh them in either direction to build out of like complete scene okay if we choose recut okay this thing fine then remix what remix do is like the remix features allows user
to reimagine existing videos by alterating their components without losing you know that essence of the original originality you can say okay you want to you know add or remove certain things okay what if I want to remove you know that this polar beer or like this okay or you can say we can you know change colors or we can some tweak visual elements and this blend so this blend feature allows you to combine with different video if I want to upload some videos it will blend both the video this video particular with that video which
I will upload okay right and and the last Loop you know by the name Loop features you know feature make it easy to create seamless repetition of the video okay this will like this is one option is ideal for background visuals music videos like this okay so this is how you can create video in 2 minutes I can say just by giving prompt okay this one is favorite you can save it for the favorite and this this you can share options are there copy link or this unpublished and you can download see I told you
without Watermark is available in only pro version so I have this with Watermark you can download see download a video in just a click or you can download as a GFI as well right and uh add to a folder okay fine this is the notification activity right so let's create one okay monkey with family driving car on this space yeah so okay I will choose this St 16 by9 like it takes more credit of mine it's okay yeah add it to the queue if you'll go to favorites it will come this one because I chose
it, okay. And if you ask how Sora works: it's like the text-to-image generative AI models such as DALL·E 3, Stable Diffusion, and Midjourney. Sora is a diffusion model, which means it starts with each frame of the video consisting of static noise and gradually refines it. See, this one's cartoonish, but if you want a Lamborghini you can add that — I want a Lamborghini, or a Tesla, whatever — so this is how you can generate videos with Sora in a quick two minutes. Hello everyone, I am M, and welcome
to today's video, where we will be talking about LLM benchmarks, tools used to test and measure how well large language models like GPT and Google Gemini perform. If you have ever wondered how AI models are evaluated, this video will explain it in simple terms. LLM benchmarks are used to check how good these models are at tasks like coding, answering questions, translating languages, or summarizing text. These tests use sample data and a specific measurement to see how well the model performs; for example, the model might be tested with a few examples, known as few-shot learning, or none at all, known as zero-shot learning, to see how it handles new tasks. So now the question arises, why are these benchmarks important? They help developers understand where a model is strong and where it needs improvement, and they also make it easier to compare different models, helping people choose the best one for their needs. However, LLM benchmarks do have some limits: they don't always predict how well a model will work in real-world situations, and sometimes models can overfit, meaning they perform well on test data but struggle in practical use. We will also cover
how llm leaderboards rank different model ped on their benchmark scores giving us a clear picture of Which models are performing the best so stay tuned as we dive into how llm benchmarks work and why they are so important for advancing AI so without any further Ado let's get started so what are llm benchmarks llm benchmarks are standardized tools used to evaluate the performance of La language models they provide a structure way to test llms on a specific task or question using sample data and predefined metrics to measure their capabilities these Benchmark assess various skills such
as coding Common Sense reasoning and NLP tasks like machine translation question answering and text summarization the importance of llm Benchmark lies in their role in advancing model development they track the progress of an llm offering quantitive insights into where the model performs well and where impr movement is needed this feedback is crucial for guiding the fine-tuning process allowing researchers and developers to enhance model performance additionally benchmarks offers an objective comparison between different llms helping developers and organization choose the best model for their needs so how llm benchmarks work llm Benchmark follow a clear and systematic
process they present a task for llm to complete evaluate it performance using specific metrics and assign a score based on how well the m model performs so here is a breakdown of how this process work the first one is setup llm Benchmark come with pre-prepared sample data including coding challenges long documents math problem and real world conversation the task is span various areas like Common Sense reasoning problem solving question answering summary generation and translation all present to the model at the start of testing the second step is testing the model is tested on one of
Few-shot: the LLM is provided with a few examples before being prompted to complete a task, demonstrating its ability to learn from limited data. Zero-shot: the model is asked to perform a task without any prior examples, testing its ability to understand new concepts and adapt to unfamiliar scenarios. Fine-tuned: the model is trained on a dataset similar to the one used in the benchmark, aiming to enhance its performance on the specific tasks involved. A tiny illustration of the zero-shot versus few-shot distinction is shown in the sketch below.
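To make that distinction concrete, here is a minimal sketch; the translation task and example strings are made up for illustration and are not taken from any particular benchmark.

```python
# A toy illustration of how the same task is posed zero-shot vs. few-shot
# before being sent to the model under test.
task = "Translate English to French: 'Good morning' ->"

# Zero-shot: no worked examples at all, so the model must generalize on its own
zero_shot_prompt = task

# Few-shot: a few worked examples are prepended so the model can imitate the pattern
few_shot_prompt = (
    "Translate English to French: 'Thank you' -> 'Merci'\n"
    "Translate English to French: 'Good night' -> 'Bonne nuit'\n"
    + task
)

print(zero_shot_prompt)
print(few_shot_prompt)
```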
The third step is scoring: after the task is completed, the benchmark compares the model's output with the expected answer and generates a score, typically ranging from 0 to 100, reflecting how accurately the LLM performed. So now, moving forward, let's see the key metrics for benchmarking LLMs. LLM benchmarks use various metrics to assess the performance of large language models; here are some commonly used ones. The first is accuracy, or precision, which measures the percentage of correct predictions made by the model. The second is recall, also known as sensitivity, which measures the number of true positives, reflecting the correct positive predictions made by the model.
The third is the F1 score, which combines precision and recall into a single metric, weighing them equally to account for false positives and false negatives; F1 scores range from 0 to 1, where 1 indicates perfect precision and recall. The fourth is exact match, which tracks the percentage of predictions that exactly match the correct answer; it is especially useful for tasks like translation and question answering. The fifth is perplexity, which tells you how well a model predicts the next word or token; a lower perplexity score indicates better task comprehension by the model.
The sixth is BLEU (bilingual evaluation understudy), used for evaluating machine translation by comparing n-gram sequences (sequences of adjacent text elements) between the model's output and a human-produced translation. These quantitative metrics are often combined for a more thorough evaluation. In addition, human evaluation introduces qualitative factors like coherence, relevance, and semantic meaning, providing a more nuanced assessment; however, human evaluation can be time-consuming and subjective, making a balance between quantitative and qualitative measures important for a comprehensive evaluation. Before we move on to the limitations of LLM benchmarking, the sketch below shows how a few of these metrics can be computed in code.
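This is only a minimal sketch, not part of the video's demo: it assumes some toy reference answers and model outputs, uses scikit-learn for accuracy, recall, and F1, counts exact matches by hand, and derives a perplexity from an assumed average negative log-likelihood.

```python
# Toy illustration of the benchmark metrics described above.
import math
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical binary labels: 1 = model answered the benchmark item correctly
reference = [1, 0, 1, 1, 0, 1]
predicted = [1, 0, 0, 1, 1, 1]

print("accuracy:", accuracy_score(reference, predicted))
print("recall:  ", recall_score(reference, predicted))
print("f1:      ", f1_score(reference, predicted))

# Exact match: share of generated answers identical to the reference answers
refs = ["paris", "42", "blue whale"]
outs = ["paris", "41", "blue whale"]
exact_match = sum(r == o for r, o in zip(refs, outs)) / len(refs)
print("exact match:", exact_match)

# Perplexity: exp of the average negative log-likelihood the model assigns
# to the reference tokens (the per-token NLL values here are made up)
nll_per_token = [2.1, 1.7, 2.4, 1.9]
perplexity = math.exp(sum(nll_per_token) / len(nll_per_token))
print("perplexity:", round(perplexity, 2))
```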
While LLM benchmarks are valuable for assessing model performance, they have several limitations that prevent them from fully predicting real-world effectiveness; here are a few. The first is bounded scoring: once a model achieves the highest possible score on a benchmark, that benchmark loses its utility and must be updated with more challenging tasks to remain a meaningful assessment tool. The second is broad datasets: LLM benchmarks often rely on sample data from diverse subjects and tasks, so this wide scope may not effectively evaluate a model's performance in edge cases, specialized fields, or specific use cases where more tailored data would be needed. The third is finite assessment: benchmarks only test a model's current skills, and as LLMs evolve and new capabilities emerge, new benchmarks must be created to measure these advancements. The fourth is overfitting: if an LLM is trained on the same data used for benchmarking, it can lead to overfitting, where the model performs well on the test data but struggles with real tasks, resulting in scores that don't truly represent the model's broader capabilities.
So now, what are LLM leaderboards? LLM leaderboards publish rankings of LLMs based on a variety of benchmarks; leaderboards provide a way to keep track of the myriad of models and compare their performance, and they are especially helpful when deciding which model to use. Here are some. In this one you can see OpenAI o1 is leading, GPT-4o is second, and Llama (the 405B-parameter model) is third, with Claude 3.5 Sonnet also listed; this is for best in multitask reasoning. What about best in coding? Here OpenAI o1 is leading (I guess this is the Orion one), the second is Claude 3.5 Sonnet, and in third position there is GPT-4o; this is best in coding. Next come the fastest and most affordable models: the fastest are Llama 8B, then Llama 70B, and third is Gemini 1.5 Flash; in lowest latency Llama is leading again; and in cheapest models Llama 8B is leading again, with Gemini 1.5 Flash second and GPT-4o mini third. Moving forward, let's see standard benchmarks between Claude 3 Opus and GPT-4: in general they are equal; in reasoning Claude 3 Opus is leading, in coding GPT-4 is leading, in math again GPT-4 is leading, in tool use Claude 3 Opus is leading, and in multilingual tasks Claude 3 Opus is leading. Today we are exploring Hugging Face, an amazing tool that makes working with language and AI super easy. If you are curious about using advanced technology to understand or create text, this video is perfect for you. Hugging Face is a company that helps people use AI models for language tasks like translation, text analysis, or even generating new text.
They have created a library called Transformers, which comes with pre-trained models, so you don't have to build everything from scratch. It's simple, powerful, and perfect for developers, researchers, or even beginners. In this video I will show you three cool things you can do with Hugging Face: speech-to-text, which turns spoken words into written text easily (great for captions or voice apps); sentiment analysis, which finds out if a text is positive, negative, or neutral (helpful for understanding reviews or comments); and text generation, which creates human-like text that sounds natural (perfect for chatbots or creative writing). I will also explain some important basics, like how pipelines make using Hugging Face models super easy and how tokenization helps AI understand text. By the end of this video you will know how to use Hugging Face to start building your own projects; it's simple, fun, and really powerful. So, without any further ado, let's get started. Welcome to the demo part of this video. Here, as you know, we will be doing three things: first, speech-to-text recognition; second, text generation from a particular sentence or word; and third, sentiment analysis. We will perform these one by one, so let's start with speech-to-text recognition using Hugging Face. First I will rename this notebook "Hugging Face speech to text." Then I will install the Transformers library (it is already installed on my system, but I'm running it again just for you guys): pip install transformers. So what is Transformers? Transformers is a powerful Python library created by Hugging Face that allows you to download, manipulate, and run thousands of pre-trained, open-source AI models. As you can see, it says "requirement already satisfied," meaning it's already installed. These Transformer models cover multiple tasks across modalities like NLP (natural language processing), computer vision, audio, and multimodal learning. Now I will write: from transformers import pipeline, and run it. So what is a pipeline? A Transformers pipeline describes the flow of data from an origin system to a destination system; you already know that with Transformers you can run and manipulate thousands of pre-trained open-source AI models, and the pipeline describes that flow of data from origin to destination and defines how to transform the data along the way. Let's check the version: import transformers, then transformers.__version__; we currently have version 4.42.4 of Transformers. I'm using Google Colab, but you can use Jupyter Notebook, Visual Studio Code, or Colab. Now let's import the libraries, starting with import librosa. So what is librosa?
Librosa is a Python package for audio and music analysis; because we are doing speech-to-text, we need it. It provides various functions to quickly extract key audio features and metrics from audio files, and it can also be used to analyze and manipulate audio files in a variety of formats such as WAV, MP3, M4A, and so on. Next we will import torch. I hope everyone knows torch: it is PyTorch, a machine learning library based on the Torch library, used for applications such as computer vision and NLP; it was originally developed by Meta AI and is now under the Linux Foundation umbrella. Let's import one more thing: import IPython.display as display. You may be wondering what IPython is: it is an interactive command shell for Python that provides an IPython terminal and a web-based notebook platform for Python computing, with more advanced features than the standard Python interpreter; I'm using it here simply to execute a line of Python and play audio in the notebook. Next: from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer. So what are these? Wav2Vec2ForCTC is the speech-recognition model class (the same one used in the notebooks on fine-tuning a speech recognition model), and the tokenizer handles tokenization, the conversion of input into meaningful tokens; in NLP those token categories include nouns, verbs, adjectives, punctuation, etc. And the last library: import numpy as np. NumPy is a Python library for working with arrays; it also has functions for domains like linear algebra, Fourier transforms, and matrices, among others. I hope everyone knows these libraries we have imported, so let's move forward. tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h") (this is why I love Google Colab: type a few words and it shows suggestions); this is just the name of a pre-trained model we are importing. Then model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h"). Now let's run it; you can see it is loading and downloading, and you can ignore these warnings. Now let's load the audio file: I will write audio, sampling_rate = librosa.load(...) (you know about librosa now), and I have a v.m4a file.
I already have one speech recording, you could say, so I will play it (don't worry, I will show you before the final output), loading it at 16,000 Hz. There were a few issues with the file path, so I renamed the file, copied the path, and loaded it again; now it runs. So now I write audio, sampling_rate, and then display.Audio(path, autoplay=True), and listen carefully: "hello and welcome, this is an AI voice message." I guess you heard it, but let me play it again: "hello and welcome, this is an AI voice message." So that is what the recording says. Now I will build the input values: input_values = tokenizer(audio, return_tensors="pt").input_values. Then come the logits, which are the non-normalized predictions: logits = model(input_values).logits; it's running, done. Next we store the predicted IDs by taking the argmax over the logits: predicted_ids = torch.argmax(logits, dim=-1). Then I pass the predictions to the tokenizer's decode method to get the transcription: transcription = tokenizer.decode(predicted_ids[0]). Now let's see our output: the transcription reads "hello and welcome this is an AI voice message." Let's play the audio once more: "hello and welcome, this is an AI voice message." Amazing, right? So this is how you can use Hugging Face for speech-to-text transcription; the consolidated script below pulls these steps together.
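Here is the whole script in one place, a minimal sketch of the steps just walked through. The audio file name is the one used in the demo and is otherwise arbitrary, and facebook/wav2vec2-base-960h is my reading of the "base 960" checkpoint mentioned above.

```python
# Consolidated speech-to-text example with Wav2Vec2, mirroring the demo above.
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load the audio at 16 kHz, the sampling rate the model was trained on
audio, sampling_rate = librosa.load("v.m4a", sr=16_000)

# Turn the waveform into model inputs
input_values = tokenizer(audio, return_tensors="pt").input_values

# Forward pass -> logits (non-normalized predictions over the character vocabulary)
with torch.no_grad():
    logits = model(input_values).logits

# Greedy decoding: most likely token at each time step, then decode to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = tokenizer.decode(predicted_ids[0])
print(transcription)
```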
It's a very small amount of code, as you can see. So now let's move forward and do sentiment analysis and text generation. Let me open a new notebook and name it "sentiment / text generation Hugging Face." First of all, the imports: import warnings and warnings.filterwarnings("ignore"); then import numpy (you already know what NumPy is) and import pandas, which are basic Python libraries I hope everyone knows. Then I import matplotlib for the plots (import matplotlib.pyplot as plt) and seaborn for the charts (import seaborn as sns, then sns.set()). Now we'll import from scikit-learn. Scikit-learn is probably the most useful library for machine learning in Python; it contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, and clustering. From it: from sklearn.model_selection import train_test_split and from sklearn.metrics import f1_score, confusion_matrix, roc_auc_score. Then again from transformers import pipeline (everyone knows what this is by now) and import torch. Now let's run it. Colab says I have too many active sessions, so let me terminate a few; fine. Now let's do sentiment analysis. We will explore sentiment analysis using a pre-trained Transformer model: the Hugging Face library provides the convenient pipeline function, which allows us to easily perform sentiment analysis on text. Let's create a sentiment pipeline: classifier = pipeline("sentiment-analysis"), then type classifier; it will download the pre-trained sentiment-analysis pipeline. ("pipeline is not defined": okay, there was an error, and it runs now.) Now we can pass a single sentence or a list of sentences to the classifier and get the predicted sentiment labels and associated confidence scores. Just to test the classifier, let's write "this is a great movie" and run it. You can see the label is POSITIVE; the positive label typically refers to the outcome or class of interest that the model is designed to predict, and here we are just sanity-checking the sentiment analysis model. Let's check one more: "this was a great course" and then "I did not understand any of it."
Now let's check these. Yeah, perfect, and you can see the confidence score is 99%, which is almost 100%, which is amazing. If you have access to a GPU, you can also utilize it for faster processing by specifying the device parameter; a minimal sketch of this basic pipeline usage is shown below. So now what I will do first is import a dataset: airline_tweets = pd.read_csv(...), where pd is the pandas library we are using here.
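A minimal sketch of the pipeline calls used so far; the device=0 argument (first GPU) is shown only as an option and is not part of the original demo.

```python
# Basic sentiment-analysis pipeline usage, mirroring the demo above.
from transformers import pipeline

# device=0 runs on the first GPU if available; drop it (or use device=-1) for CPU
classifier = pipeline("sentiment-analysis", device=0)

print(classifier("This is a great movie"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

print(classifier([
    "This was a great course",
    "I did not understand any of it",
]))
```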
So I have twitter_tweets.csv; let it upload, done. Now let me do airline_tweets.head(): head shows me the top five rows (indices 0 to 4), and you can see tweet_id, airline_sentiment (neutral, positive, negative), and so on. Now let's do something: df = airline_tweets[["airline_sentiment", "text"]], where df means DataFrame; I need just these two columns. Then df.head(5): the sentiment is neutral and this is what the text is about; these two are the main things, the text and the sentiment. Now let's make a count plot: sns.countplot(df, x="airline_sentiment"), then plt.xlabel("Airline sentiment"), plt.ylabel("Count"), plt.show(); it draws a graph (okay, the spelling was wrong) showing neutral, positive, and negative, and as you can see the negative sentiments are the most common. Now, we have three classes, which do not match the two classes available in the Hugging Face pipeline, so we will filter out all rows labeled neutral: df = df[df["airline_sentiment"] != "neutral"]. Then df["target"] = df["airline_sentiment"].map({"positive": 1, "negative": 0}), so positive becomes 1 and negative becomes 0. Then print("number of rows", df.shape); you can see the number of rows is 11,541. Now I will build the predictions. ("predictions is not defined": right, I haven't created them yet.) So I write texts = df["text"].tolist(), then predictions = classifier(texts). Then probabilities = [pred["score"] if pred["label"].startswith("P") else 1 - pred["score"] for pred in predictions], where P means positive; otherwise we take 1 minus the score. This is taking too much time; it depends on your system, and mine took almost 17 minutes 47 seconds to complete, but finally we have the prediction values. Now predictions = np.array([1 if pred["label"].startswith("P") else 0 for pred in predictions]). Fine. Now let's check the accuracy of our model: print("accuracy", round(np.mean(df["target"] == predictions) * 100, 2), "%"). Looks good; as you can see, our accuracy is 88.9%, which is very good.
Now let's compute a confusion matrix: cm = confusion_matrix(df["target"], predictions, normalize="true"), and then plot it. I will write a small plot_confusion_matrix(confusion_matrix, labels) helper using seaborn, where the arguments are the confusion matrix (a NumPy array) and the list of labels: plt.figure(figsize=(8, 6)), sns.set(font_scale=1.4), then create the heatmap with sns.heatmap(confusion_matrix, annot=True, fmt="g", cmap="Blues", xticklabels=labels, yticklabels=labels), then plt.title("Confusion Matrix"), plt.xlabel("Predicted values"), plt.ylabel("Actual values"), plt.show(). Why is the chart not appearing? Right, I have to actually call plot_confusion_matrix(cm, ["negative", "positive"]); now the chart comes up. Here you can see the actual versus predicted values in the confusion matrix, with the negative and positive ratios. Now let's check the ROC AUC score: print(roc_auc_score(df["target"], probabilities)); it's 0.94. First let me tell you what the ROC AUC score is: it is the area under the ROC curve, and it sums up how well a model's relative scores discriminate between positive and negative instances across all classification thresholds. With an ROC AUC of 0.94, which is 94%, we can conclude that the pre-trained sentiment analysis model has achieved a high level of accuracy and effectiveness in predicting the sentiment labels; this indicates that the model is capable of accurately classifying text into positive or negative sentiment categories. The sketch below condenses this whole tweet-classification evaluation into one short script.
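A condensed sketch of the evaluation above. The CSV name and its airline_sentiment and text columns follow the demo; running the classifier over thousands of tweets on CPU will be slow, as noted in the video.

```python
# End-to-end tweet sentiment evaluation, mirroring the demo above.
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from transformers import pipeline

df = pd.read_csv("twitter_tweets.csv")[["airline_sentiment", "text"]]
df = df[df["airline_sentiment"] != "neutral"]          # keep only the two classes the pipeline knows
df["target"] = df["airline_sentiment"].map({"positive": 1, "negative": 0})

classifier = pipeline("sentiment-analysis")
predictions = classifier(df["text"].tolist())

# Probability of the positive class, and hard 0/1 labels
probs = [p["score"] if p["label"].startswith("P") else 1 - p["score"] for p in predictions]
preds = np.array([1 if p["label"].startswith("P") else 0 for p in predictions])

print("accuracy:", round(np.mean(df["target"] == preds) * 100, 2), "%")
print("f1:", f1_score(df["target"], preds))
print(confusion_matrix(df["target"], preds, normalize="true"))
print("roc auc:", roc_auc_score(df["target"], probs))
```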
OK, so now we'll do text generation. Text generation involves generating creative and coherent text based on a given prompt or starting point. First we will import the necessary dependencies and load a dataset of poems. I write poems = pd.read_csv("robert_frost.csv") (don't worry, I will add these files in the description box below). Then poems.head(5), or just head (okay, I typed read.csv instead of read_csv, fixed), and it shows the top five rows: "Stopping by Woods on a Snowy Evening," "Fire and Ice," "The Aim Was Song," with columns for collection, content, and year of publication. Now we write content = poems["content"].dropna().tolist(). To generate text we extract individual lines from the poems and use the pipeline text-generation function to create a text-generation pipeline. So I write lines = [], then for poem in content: for line in poem.split("\n"): lines.append(line.rstrip()), where rstrip trims the right side. Then lines = [line for line in lines if len(line) > 0], and show lines[:5]. You can see "Whose woods these are I think I know," then the next line "His house is in the village though," then "He will not see me," and so on. Now let's create the text-generation pipeline: gen = pipeline("text-generation"); these are the pre-trained models you already know about. Now I print lines, done, and in lines[0] we have "Whose woods these are I think I know." We can now generate text by providing a prompt and specifying parameters such as max_length and num_return_sequences, because we have this text-generation pipeline. For example: gen(lines[0], max_length=20), which generates text up to a maximum length of 20 tokens. (Okay, I had a syntax error, "expression cannot contain assignment, perhaps you meant ==", now fixed.) See, the original line was only "Whose woods these are I think I know," but the generated text continues, "...I wish to go to church because I feel like..."; this is how you can do text generation. Now let's check one more: gen(lines[1], max_length=30, num_return_sequences=2). Our input line was "His house is in the village though," and here you can see the generated continuation, "...however you might say that the place was the same with the place...," and so on; and this is the second returned sequence, so there are two, one and the second. Now let me import textwrap and create a small wrap(x) function that returns textwrap.fill(x, replace_whitespace=False, fix_sentence_endings=True). Then out = gen(lines[0], max_length=30) and print(wrap(out[0]["generated_text"])), and we set the pad token to the EOS token. So "Whose woods these are I think I know" is continued up to the maximum of 30 tokens. Now, what else can you do?
You can also give a prompt to generate text on a specific topic, like this: prompt = "Transformers have a wide variety of applications in NLP". This is my prompt, not something imported from the dataset. Then out = gen(prompt, max_length=100) and print(wrap(out[0]["generated_text"])). It's running, let's wait; okay, there was an issue at 100 tokens, so I tried 50, then ran it again at 100 and it worked. The sketch below condenses these text-generation steps.
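A condensed sketch of the text-generation calls above. The demo calls pipeline("text-generation") without naming a checkpoint, so relying on the default small GPT-2 model is an assumption here, and do_sample=True is added so that asking for two return sequences actually yields two different ones.

```python
# Text generation with a Hugging Face pipeline, mirroring the steps above.
from transformers import pipeline

gen = pipeline("text-generation")  # defaults to a small GPT-2 checkpoint

# Continue a single poem line up to ~20 tokens
out = gen("Whose woods these are I think I know", max_length=20)
print(out[0]["generated_text"])

# Ask for two alternative continuations (sampling enabled so they can differ)
outs = gen(
    "His house is in the village though",
    max_length=30,
    num_return_sequences=2,
    do_sample=True,
)
for o in outs:
    print(o["generated_text"])

# Generate on a fresh prompt instead of a dataset line
prompt = "Transformers have a wide variety of applications in NLP"
print(gen(prompt, max_length=100)[0]["generated_text"])
```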
So now you can see we can generate text just by giving a prompt. We have covered three topics here: first, speech-to-text using Hugging Face; second, sentiment analysis using Hugging Face; and third, text generation using Hugging Face. For text generation we used two methods, one with the dataset and the other by giving a prompt, like in ChatGPT. OpenAI is one of the main leaders in the field of generative AI, with its ChatGPT being one of the most popular and widely used examples. ChatGPT is powered by OpenAI's GPT family of large language models (LLMs). In August and September 2024 there were rumors about a new model from OpenAI, code-named Strawberry; at first it was unclear whether it was the next version of GPT-4o or something different. On September 12, OpenAI officially introduced the o1 model. Hi, I am M. In this video we will discuss OpenAI's o1 model and its types; after that we will run some basic prompts using o1-preview and o1-mini, and at the end we will see a comparison between the o1 models and GPT-4o. Without any further ado, let's get started. What is OpenAI o1? The o1 family is a group of LLMs that have been improved to handle more complex reasoning. These models are designed to offer a different experience from GPT-4o, focusing on thinking through problems more thoroughly before responding; unlike older models, o1 is built to solve challenging problems that require multiple steps and deep reasoning. The o1 models also use a technique called chain-of-thought prompting, which allows the model to think through a problem step by step. OpenAI o1 consists of two models: o1-preview and o1-mini. The o1-preview model is meant for more complex tasks, while o1-mini is a smaller, more affordable version. So what can o1 do? It can handle many tasks just like other GPT models from OpenAI, such as answering questions, summarizing content, and creating new material; however, o1 is especially good at more complex tasks, including the following. First, enhanced reasoning: the o1 models are designed for advanced problem solving, particularly in subjects like science, technology, engineering, and math. Second, brainstorming and ideation: with its improved reasoning, o1 is great at coming up with creative ideas and solutions in various fields. Third, scientific research: o1 is well suited to tasks like annotating cell-sequencing data or solving the complex math needed in areas like quantum optics. Fourth, coding: the o1 models can write and fix code, performing well on coding tests like HumanEval and Codeforces, and helping developers build multi-step workflows. Fifth, mathematics: o1 is much better at math than previous models, scoring 83% on a qualifying test for the International Mathematics Olympiad compared to GPT-4o's 13%, and it also did well in other math competitions like AIME, making it useful for generating complex formulas for physics. And last, self fact-checking: o1 can check the accuracy of its own responses, helping to improve the reliability of its answers. You can use the o1 models in several ways: ChatGPT Plus and Team users have access to the o1-preview and o1-mini models and can manually choose them in the model picker.
Although free users don't have access to the o1 models yet, OpenAI is planning to offer o1-mini to them in the future. Developers can also use these models through OpenAI's API, and they are available on third-party platforms like Microsoft Azure AI Studio and GitHub Models. So yes, guys, I have ChatGPT open here with the GPT-4o model and o1-preview, as you can see. I have the Plus plan, the paid version of ChatGPT, so I can access the o1-preview and o1-mini models. We will go with the o1-preview model, put the same prompts into both GPT-4o and o1-preview, and see what differences come up. We will do some math questions, some coding, some advanced reasoning, and quantum physics as well. So let's start: I have some prompts already written with me, and the first one is number theory. I will copy it from here, paste it into both, and run it in 4o and o1-preview. Here you can see it's thinking; this is what I was saying about chain of thought. These are the chain-of-thought steps: first breaking down the primes, then identifying the GCD. Now see the difference between the outputs. GPT-4o's output is simply: 561 is not a prime number, and the GCD (greatest common divisor) of 48 and 180 is 12. ChatGPT o1-preview, on the other hand, gives the output step by step: first, determine whether 561 is a prime number (the number 561 is not a prime number, it's a composite number, because it has these factors); then the second step, the greatest common divisor, where it finds 12; and the final answer: no, 561 is not a prime number (it is composite), and the greatest common divisor of 48 and 180 is 12. Just see the difference between the two models; this is why the o1 models are so strong for math, coding, advanced reasoning, and quantum physics. So let's go to our second test.
Here, if you look, you can see the attach-file option in ChatGPT-4o, where you can upload from your computer; but in o1 there is no attach-file option, which is one small drawback. So let me upload the file in 4o and open it; this is the question I have. I will copy it, run it in both, and watch: 4o starts giving the answer right away, while o1 is still thinking, solving the equation, then analyzing the relationship. So ChatGPT o1 takes time, but it gives you a more accurate, more step-by-step answer. Here you can see "solve for x" and the question, with the steps laid out in a more structured way, a well-structured way; o1-preview presents it in a well-structured way, as does o1-mini. In 4o they wrote just points one and two, whereas here, if you look: question one, solve for x, with step one, step two, and step three, and then the answer x = 3, while 4o simply wrote "we know this and this" and x = 3. For the second question, 4o writes "expanding the left-hand side," and so on, but o1 writes "step one: square both sides of the given equation; start by squaring both sides." 4o has it written, but not laid out as well. This is why o1 is better for math. So now let's check the coding part. I have one question; let me see what output each gives. I will copy it, paste it in both, and run it.
4o starts giving an answer right away, and o1 is still adjusting the parameters for the code generation, because o1 thinks first, then analyzes, and only after that gives you the answer. Here the 4o code is done, and o1 is still thinking through step one: it sets up the development environment with a pip install, then the next steps, while 4o shows nothing like that. But I will ask both, "give me the code in a single tab," so I can just copy and paste it. What I will do is open an online compiler and paste the code directly. Let me open the W3Schools compiler, and the same for the other one. I copy the code and paste it here, and do the same for the second; okay, it produces something, cool. So yes, now you can see the difference between the outputs: this is the output of 4o and this is the output of o1-preview. This is the difference; this is why o1 takes time, but it gives you a more accurate result, presented well. Now let's check something else: moving on, let's see an advanced reasoning question. This is the first logical puzzle, so I will copy it and paste it here.
This one is for 4o and this one for o1-preview. Why am I not comparing o1-preview with o1-mini? Because they are broadly the same, with only slight differences, while here we can see more of a difference between the old model and the new one. Now see: 4o's answer ends in just this much, but o1 explains it in a better way: "thought for 7 seconds," then an explanation with case one, case two, a conclusion for both scenarios, and a summary; whereas 4o gives one small explanation and that's it. So they created o1-preview to describe things more thoroughly. Now let's see some scientific reasoning as well. Let me copy it here: 4o is already answering, while o1 is still thinking; see, it thought for 16 seconds. So again I will say that ChatGPT o1 is much better than ChatGPT-4o for this: GPT-4o is great for content writing and the like, but o1-preview and o1-mini are very good for reasoning, math, coding, and quantum physics, in other words advanced reasoning.
GPT-4o, meanwhile, is good for generating text, like marketing copy, emails, and all of that. So now let's see a comparison between the o1 models and GPT-4o. When new models are released, their capabilities are revealed through benchmark data in the technical reports. The new OpenAI model excels at complex reasoning tasks: it surpasses human PhD-level accuracy in physics, chemistry, and biology on the GPQA benchmark. Coding becomes easier with o1, as it ranked in the 89th percentile on competitive programming questions from Codeforces. The model is also outstanding in math: on a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o solved only 13% of problems, while o1 achieved 83%, which is truly next level. On standard ML benchmarks it shows huge improvements across the board (MMLU measures multitask accuracy, and GPQA measures reasoning capabilities). For human evaluation, OpenAI asked people to compare o1-mini with GPT-4o on difficult, open-ended tasks across different topics, using the same method as the o1-preview versus GPT-4o comparison: like o1-preview, o1-mini was preferred over GPT-4o for tasks that require strong reasoning skills, but GPT-4o was still favored for language-focused tasks. On model speed, as a concrete example, responses from GPT-4o, o1-mini, and o1-preview were compared on a reasoning question: while GPT-4o did not answer correctly, both o1-mini and o1-preview did, and o1-mini reached the answer around 3 to 5x faster. As for limitations and what's next: due to its specialization on STEM (science, technology, engineering, and math) reasoning, o1-mini's factual knowledge on non-STEM topics such as dates, biographies, and trivia is comparable to small LLMs such as GPT-4o mini. OpenAI will improve these limitations in future versions, as well as experiment with extending the models to other modalities and specialties beyond STEM. For developers who want to try these models outside the ChatGPT interface, a rough sketch of an API call is shown below.
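This is only a hedged sketch (the video itself stays inside the ChatGPT UI) showing roughly how an o1-series model could be called with the OpenAI Python SDK; the model name, SDK usage, and prompt are assumptions, and availability depends on your API access.

```python
# Hypothetical example of calling an o1-series model via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the smaller, cheaper variant
    messages=[
        # o1 models reason through the steps themselves, so a plain user message is enough
        {"role": "user", "content": "Is 561 a prime number? And what is gcd(48, 180)?"}
    ],
)
print(response.choices[0].message.content)
```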
So let's talk about the very first top AI tool, which is Skor.ai. This AI tool aims to provide the best AI-driven mock interview experience: you can take mock interviews for any job role and subject, receiving instant, constructive feedback. Skor's AI interviewer listens to your responses and asks a dynamic flow of questions to test your depth of knowledge. With the Essential plan you gain access to six AI mock interviews per month, with instant feedback, ideal answers, and performance reports to help you upgrade your skills, for just $8 per month. The Premium plan allows you to enjoy 15 AI mock interviews per month, with the same instant feedback, ideal answers, and performance reports, along with priority support, maximizing your learning for only $15 per month. Next up we have Visos. Visos' advanced AI coach is designed to elevate your job interview skills, harnessing insights from over 10,000 companies. It offers tailored sessions that simulate real interviews, providing personalized feedback and performance analytics; it transcribes your performance and delivers a conversational experience akin to an actual interview. It also offers customized feedback and performance analytics, helping you refine both your answers and your communication skills. During practice sessions you can craft questions specific to your desired role and industry, ensuring relevance and depth in your preparation, and it supports various interview styles; its services are available for free. Next up we have interviewing.io, an AI-powered platform that provides candidates with simulated technical interviews. The platform uses machine learning algorithms to assess your coding skills, problem-solving abilities, and communication style; you can get four simulated technical interviews for $225. Next up we have Mock Mate, an AI-powered tool that allows you to practice answering common interview questions; it provides feedback on your answers and gives tips on how to improve your performance, and the service is available for $29 per month. Moving on to ChatGPT: ChatGPT is an AI chatbot that can be used to practice answering interview questions in a conversational setting; the chatbot can be customized to simulate the style of an actual interview, and it provides feedback on your answers. The free plan allows you to get started with writing assistance, problem solving, and much more.
You will have limited access to GPT-4o mini, including data analysis, file uploads, vision, web browsing, and custom GPTs, all of this available at no cost. Then we have the Plus plan, for $20 per month: the Plus plan amplifies your productivity by offering early access to new features, access to GPT-4o and GPT-4o mini, and up to 5x more messages for GPT-4o; you get access to data analysis, file uploads, vision, web browsing, and DALL-E image generation, and you can also create and use custom GPTs. Then we have My Interview Practice, an AI tool that allows you to practice interviews in a simulated environment. The platform uses AI to generate interview questions based on your chosen field and provides instant feedback on your responses; this tool is particularly useful for those who want to practice their interview skills in a low-pressure environment. The Plus plan is $49 per month for one month of access, and the Premium plan is $57 per month with access to additional features and many more benefits. Next up we have Udle, which offers real-time interview coaching using AI: it analyzes your speech and body language, providing feedback on how to improve your performance, and its services are available for free. Now you might be wondering how AI tools can help you ace your job interviews. AI tools can assist in several ways. First, researching companies: AI tools provide information from a company's website, news articles, and social media, helping you tailor your answers to the interviewer's questions. Practice questions: AI tools help you practice answering common interview questions, boosting your confidence and preparation. Feedback: AI tools offer feedback on your answers, highlighting areas for improvement and refining your responses. Personalized advice: AI tools provide tips and advice on improving your interview skills, giving you an edge in your next job interview. Choose the right AI tool: select one that fits your needs and preferences, and use it consistently, starting a few weeks before your interview, to maximize its benefits. After practicing with the AI tool, get feedback from others to identify areas for further improvement. Dress professionally and be punctual for your interview to make a positive first impression, and show enthusiasm and confidence during the interview to leave a lasting impression on the interviewer.
AI tools can significantly aid in interview preparation; by using them effectively, you can enhance your chances of success in your next job interview. Imagine you are managing a global supply chain company where you have to handle orders, shipments, and demand forecasting, but unexpected issues arise: sudden shortages, transport delays, and changes in demand. Instead of relying on manual adjustments, what if an AI agent could handle everything automatically? This AI wouldn't just suggest actions; it would decide, execute, and continuously improve its strategies. That's the power of agentic AI. With that said, guys, I welcome you all to today's tutorial on what agentic AI is. Let us start by understanding the first wave of artificial intelligence, which was predictive analytics, or, we could say, data analytics and forecasting. Predictive AI focused on analyzing historical data, identifying patterns, and making forecasts about future events; these models did not generate any new content, but instead predicted outcomes based on statistical models and machine learning. Now, technically, how did it work? Basically, we had an ML model taking structured data, which could be past user activity, transaction records, or sensor readings; for example, consider a Netflix user's watch history, with movie genres, watch time, and user ratings. After this we did feature engineering and pre-processing: in the feature engineering process we extracted key features like user watch-time trends, preferred genres, and watch frequency, and we could also apply scaling, normalization, and encoding techniques to make the data more usable for the ML model. Then we used the ML models, for example time-series forecasting models like ARIMA or LSTMs, which predicted future movie preferences based on the historical data, and as the output, Netflix's AI recommends new shows or movies based on similar user patterns. That is how the Netflix model worked, incorporating a machine learning model, and this was exactly the first wave of AI. A tiny forecasting sketch in this spirit is shown below.
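As a small illustration of the forecasting models just mentioned, here is a minimal sketch using statsmodels' ARIMA on made-up weekly watch-time numbers; the data and the (1, 1, 1) order are arbitrary choices for the example, not anything Netflix actually uses.

```python
# Toy time-series forecast in the spirit of the "first wave" predictive AI described above.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical weekly watch-time (hours) for one user
watch_time = pd.Series([4.0, 4.5, 5.1, 4.8, 5.6, 6.0, 5.9, 6.4, 6.8, 7.1])

# Fit a simple ARIMA(1, 1, 1) model and forecast the next three weeks
model = ARIMA(watch_time, order=(1, 1, 1))
fitted = model.fit()
print(fitted.forecast(steps=3))
```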
Now let us discuss the second wave of AI, which was basically content creation and conversational AI. LLM models like ChatGPT became very popular during this second wave. What was happening is that generative AI was taking input data and producing new content such as text, images, videos, or even code; these models learn patterns from large datasets and generate human-like outputs. Now let us understand a bit how this technology works. First there is a data input, basically a prompt from the user. Suppose, in ChatGPT, we give a new prompt such as "write article on AI." After this comes tokenization and pre-processing: the input text I have written here, "write article on AI," is split into smaller parts, for example "write" as one token, "article" as the next, and similarly for the other words. Then these tokens are converted into word embeddings, meaning numerical vectors that represent words in a higher-dimensional space. Then neural network processing is performed: the LLM processes the input using attention mechanisms, in models like GPT-4, BERT, and Llama, and with the help of self-attention layers it understands the context and predicts the next word. A small tokenization sketch is shown below.
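To make the tokenization step concrete, here is a minimal sketch using a GPT-2 tokenizer as a stand-in; the tokenizer ChatGPT actually uses is not specified in the video, so this choice is an assumption.

```python
# Splitting a prompt into sub-word tokens and numeric ids, as described above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "write article on AI"
print(tokenizer.tokenize(prompt))   # sub-word pieces, e.g. ['write', 'Ġarticle', 'Ġon', 'ĠAI']
print(tokenizer.encode(prompt))     # the numeric ids the model actually consumes
```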
Now, as a result, you get an output like this, and that was the generative AI phase, our second evolution of AI. Now let's talk about the third wave, which is agentic AI, or autonomous AI agents. What is this? Agentic AI goes beyond text generation: it integrates decision making, action execution, and autonomous learning. These AI systems don't just respond to prompts; they independently plan, execute, and optimize processes. You could picture it like this. The first step is the user input, or receiving a goal: the user provides a high-level instruction, for example "optimize warehouse shipments for maximum efficiency," and unlike generative AI, which would just generate text, agentic AI executes real-world actions. After the goal is given, the next step is querying the databases: the AI pulls real-time data from multiple sources. It could be a traditional database, SQL or NoSQL, from which it fetches inventory levels or shipment history; it could be a vector database, from which it retrieves unstructured data like past customer complaints; and with the help of external APIs it connects to forecasting services, fuel-price APIs, or supplier ERP systems. The third step is LLM decision making: after querying the databases, the AI agent processes the data through an LLM-based reasoning engine, applying decision rules. For example, if inventory is low, it can automate supplier restocking orders; if shipment costs are increasing, it reroutes shipments through cheaper vendors; and if weather conditions impact a route, it adjusts the delivery schedules. Now you can see how agentic AI behaves in the decision-making process. The next step is action execution via APIs: the AI executes tasks without human intervention, triggering an API call to reorder stock from a supplier, updating warehouse robot workflows to prioritize fast-moving products, or even sending emails and notifications to logistics partners about the changes that are going to happen. And after this, finally, it is continuously learning, which is the data flywheel: the AI monitors the effectiveness of its actions (was the restocking efficient, did rerouting shipments reduce costs?), and the data flywheel continuously improves future decisions, basically using reinforcement learning and fine-tuning to optimize its logic. An illustrative, purely hypothetical agent loop in this spirit is sketched below.
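The following is an entirely hypothetical sketch of that perceive-reason-act-learn loop for the warehouse example; every function here is a stub standing in for real databases, APIs, and an LLM reasoning engine, and none of the names are a real library API.

```python
# Illustrative agent loop for the warehouse-shipments example described above.
def query_inventory_db():
    # Stub for a SQL/NoSQL inventory lookup
    return {"sku": "SKU-42", "stock_level": 120, "reorder_threshold": 200}

def call_forecast_api():
    # Stub for an external demand / fuel-price forecasting service
    return {"demand_change": 0.30, "shipping_cost_trend": "rising"}

def execute_api_call(action, argument):
    # Stub for triggering a real-world action through an API
    print(f"executing {action} -> {argument}")

def record_feedback(goal, actions):
    # Stub for the "data flywheel": log outcomes so future decisions improve
    print(f"logging outcome of {len(actions)} action(s) for goal: {goal}")

def run_shipment_agent(goal):
    # 1. Perceive: gather real-time state from data sources
    inventory = query_inventory_db()
    forecast = call_forecast_api()

    # 2. Reason: simple decision rules standing in for the LLM-based reasoning engine
    actions = []
    if inventory["stock_level"] < inventory["reorder_threshold"]:
        actions.append(("reorder_stock", inventory["sku"]))
    if forecast["shipping_cost_trend"] == "rising":
        actions.append(("reroute_shipments", "cheaper_vendor"))

    # 3. Act: execute each decision autonomously via API calls
    for action, argument in actions:
        execute_api_call(action, argument)

    # 4. Learn: feed outcomes back to improve future decisions
    record_feedback(goal, actions)

run_shipment_agent("optimize warehouse shipments for maximum efficiency")
```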
Now let's have a quick recap comparing all three waves of AI. Predictive AI's main focus was forecasting trends, generative AI's was creating content, and agentic AI, which is the final stage right now, is about making decisions and taking action; so you can see how AI evolved through these stages. If you look at the learning approach, predictive AI was basically analyzing historical data, generative AI was learning from patterns (for text and image generation), and agentic AI uses reinforcement learning, or self-learning, to improve its approach. If we look at user involvement, in predictive AI a human asks for the forecast, in generative AI a human gives the prompts, but in agentic AI the prompts, or human intervention, become minimal. In terms of technology, predictive AI was using machine learning and time-series analytics, generative AI was using Transformers like GPT, Llama, and BERT, and agentic AI uses LLMs plus APIs plus autonomous execution. We have discussed the workflow briefly, and moving ahead we will also walk through an example of how all these agentic AI steps work. Based on examples, you can picture predictive AI as the Netflix recommendation model, generative AI as ChatGPT writing articles, and agentic AI as AI incorporated into supply chains and how things work out there. So guys, I hope you now have a brief idea of the three waves of AI; let us move ahead and understand the exact difference between generative AI and agentic AI.
Let us first deep dive into what exactly generative AI is. As you can see here, generative AI models generally take an input query, process it using an LLM (large language model), and return a static response without taking any further action. For example, a chatbot like ChatGPT takes input from the user: as I showed earlier, suppose I give an input like "write a blog post on AI in healthcare." When this user query goes to the large language model, the model tokenizes the input query, retrieves relevant knowledge from its training data, and generates text based on patterns. We give the prompt, the LLM processes it, and we get the output; that is basically how generative AI works. Here you can see we have the GPT models, DALL-E, and Codex, which are some amazing generative AI models. Now let us discuss DALL-E, which is a realistic image-generation model. DALL-E is described as the realistic image-generation model from OpenAI, and it is part of the generative AI category alongside GPT, which was created for human-like language generation, and Codex, which can be used for advanced code generation. So let us discuss DALL-E a bit. DALL-E is a deep learning model designed to generate realistic images from text prompts; it can create highly detailed and creative visuals based on descriptions provided by users. Some of its aspects: text-to-image generation, where users input text prompts and DALL-E generates unique images based on those descriptions (the images are highly realistic and creative, and it can produce photorealistic images, artistic illustrations, and even surreal or imaginative visuals); and customization and variability, where it allows variations of an image and edits based on text instructions and multiple styles. So this is also a generative AI model, and the tool plays an amazing role. I will show you one example of how generative AI works for image generation: as you can see, I have opened this generative AI tool, DALL-E; let us give DALL-E a prompt and see how the image is generated.
Let's say we want a futuristic city at sunset filled with neon skyscrapers, with flying cars and holographic billboards, streets bustling with humanoid robots, and people wearing high-tech gear; let's include some technology. Now let us see how DALL-E creates the image; this is how generative AI works. Wait a few seconds for the output to come up, and now you can see that this image has been generated by AI based on our prompt. For developers, roughly the same thing can be done programmatically; a hedged sketch using the OpenAI Images API is shown below.
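The video only uses the DALL-E web tool, so this is just a hedged sketch of how a similar image request could be made through the OpenAI Python SDK; the model name, size, and prompt text are assumptions.

```python
# Hypothetical example of generating an image with the OpenAI Images API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt=(
        "A futuristic city at sunset filled with neon skyscrapers, flying cars, "
        "holographic billboards, and streets bustling with humanoid robots"
    ),
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```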
Based on the prompt we gave, DALL-E produced an image matching our description, so this is one amazing generative AI tool worth exploring. Now let us discuss agentic AI, that is, autonomous decision making and action execution. Looking at this diagram, agentic AI, unlike generative AI, does not just generate responses; it also executes tasks autonomously based on the given query. For example, take AI managing warehouse inventory, and suppose we want to optimize warehouse shipments for the next quarter. First, the agent receives its goal. It then queries external data sources, for example inventory databases or logistics APIs, and retrieves real-time inventory levels and demand forecasts. Next it makes autonomous decisions and keeps the resulting outcome under observation: it analyzes current warehouse stock and product demand for the next quarter, checks supplier availability, and automates restocking if inventory falls below a given threshold. Based on this, the output might look something like: current inventory level at 75% capacity, demand forecast showing a 30% increase expected in Q2, and reordering initiated. That is the kind of result we get in this supply chain management example; a minimal sketch of this decision logic follows.
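The sketch below illustrates the warehouse example in Python; the data-access functions (get_inventory_level, get_demand_forecast, place_purchase_order) are hypothetical placeholders standing in for real inventory databases and logistics APIs, and the threshold and projection formula are purely illustrative.

```python
# Illustrative warehouse agent: read inventory and demand data, then decide
# autonomously whether to reorder. All data sources are hypothetical stubs.

REORDER_THRESHOLD = 0.80  # reorder when projected coverage drops below 80%

def get_inventory_level() -> float:
    return 0.75          # e.g. warehouse currently at 75% capacity

def get_demand_forecast() -> float:
    return 0.30          # e.g. +30% demand expected next quarter

def place_purchase_order(scale: float) -> str:
    return f"Reordering initiated (scaling factor {scale:.2f})"

def warehouse_agent() -> str:
    inventory = get_inventory_level()
    demand_increase = get_demand_forecast()
    projected_coverage = inventory / (1 + demand_increase)  # crude projection
    if projected_coverage < REORDER_THRESHOLD:
        return place_purchase_order(1 + demand_increase)
    return "Inventory sufficient; no action taken"

print(warehouse_agent())   # -> "Reordering initiated (scaling factor 1.30)"
```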
As we have seen, in generative AI the user gives a prompt and an LLM generates the output, but agentic AI goes further: it takes action beyond just generating text. In this scenario it queries the inventory databases, automates the purchase orders, selects the shipping providers that best suit the company, and continuously refines its strategies based on real-time feedback. Let's recap once more. In terms of function, generative AI is concerned with producing written or visual content, and even code, from pre-existing input, whereas agentic AI is all about decision making and taking actions toward a specific goal; it is focused on achieving objectives by interacting with the environment and making autonomous decisions. Generative AI relies on existing data to predict and generate content based on patterns it learned during training, but it does not adapt or evolve from its experiences, whereas agentic AI is adaptive: it learns from its actions and experiences and improves over time by analyzing feedback and adjusting its behavior to meet objectives more effectively. With generative AI, human input is essential; the prompt goes into the LLM, which then generates an output based on it. Once you set up an agentic AI, it requires minimal human involvement: it operates autonomously, making decisions and adapting to changes without continuous human guidance, and it can even learn in real time. That is the beauty of an agent.
We gave one example of generative AI, namely giving a prompt to ChatGPT or DALL-E, and one example of agentic AI, a supply chain management system. Now let us dive a bit deeper into the technical side of how agentic AI actually works. There is a four-step process. The first step is perceiving, where the system gathers and processes information from databases, sensors, and digital environments. The next step is reasoning, where a large language model acts as the decision-making engine and generates solutions. The third step is acting, where the agent integrates with external tools and software to execute the given task autonomously. Finally, the agent is learning, continuously improving through a feedback loop that is also known as the data flywheel. A minimal code skeleton of this loop is sketched below; after that, let us explore each step one by one.
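As a rough illustration of this four-step loop, here is a minimal Python skeleton; every method is a placeholder, and a real agent would plug in databases, an LLM, external APIs, and a feedback store at the marked points.

```python
# Minimal skeleton of the perceive -> reason -> act -> learn loop.
class Agent:
    def __init__(self, goal: str):
        self.goal = goal
        self.feedback_log = []          # the "data flywheel"

    def perceive(self) -> dict:
        # gather data from databases, sensors, APIs, or user input
        return {"observation": "placeholder data"}

    def reason(self, observation: dict) -> str:
        # an LLM would interpret the goal plus observation and plan a step
        return f"plan derived from {observation} for goal: {self.goal}"

    def act(self, plan: str) -> str:
        # call external tools / APIs to execute the plan
        return f"executed: {plan}"

    def learn(self, result: str) -> None:
        # store outcomes so future decisions improve
        self.feedback_log.append(result)

    def run_step(self) -> None:
        obs = self.perceive()
        plan = self.reason(obs)
        result = self.act(plan)
        self.learn(result)

Agent("optimize warehouse shipments").run_step()
```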
Perceiving is the first step, where the agentic AI collects data from multiple sources. The data could come from databases, both traditional and vector databases; from APIs, fetching real-time information from external systems; from IoT sensors, for real-world applications like robotics and logistics; and from user inputs such as text commands, voice commands, or chatbot interactions. Technically, the first part of perceiving is data extraction, where the AI agent queries structured databases such as SQL or NoSQL stores for relevant records and uses vector databases to retrieve semantically similar data for context-aware responses, for instance finding similar past complaints. After data extraction comes feature extraction and pre-processing, where the AI filters the relevant features out of the raw data; for example, a fraud detection AI scans transaction logs for anomalies. The third part is entity recognition and object detection, where the AI uses computer vision to detect objects in images and applies named entity recognition, a technique for extracting the critical terms from the given text.
So perceiving has three sub-steps: data extraction; feature extraction and pre-processing; and entity recognition and object detection. Let us take a very simple example, an AI-based customer support system. Consider an agentic AI assistant for customer service, and say a customer asks, "Where is my order?" The AI queries multiple sources: the e-commerce order database to retrieve the order status, the logistics API to track the real-time shipment location, and the customer interaction history to provide a personalized response. The result is that the AI fetches the tracking details, identifies any delays, and suggests the best course of action, as the sketch below illustrates.
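Here is a small Python sketch of that perceiving flow; the three query functions are hypothetical stand-ins for an e-commerce database, a logistics API, and a CRM history store, returning canned data for illustration.

```python
# "Where is my order?" -- the agent pulls from several sources before answering.
def query_order_database(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

def query_logistics_api(order_id: str) -> dict:
    return {"order_id": order_id, "location": "regional hub", "delay_days": 1}

def query_interaction_history(customer_id: str) -> list:
    return ["asked about delivery options last week"]

def answer_order_query(customer_id: str, order_id: str) -> str:
    order = query_order_database(order_id)
    shipment = query_logistics_api(order_id)
    history = query_interaction_history(customer_id)
    note = " (personalized: customer asked about delivery recently)" if history else ""
    return (f"Order {order['order_id']} is {order['status']}, currently at "
            f"{shipment['location']}, running {shipment['delay_days']} day(s) late{note}.")

print(answer_order_query("C123", "ORD-42"))
```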
The next step is reasoning, and it is the AI's understanding, decision making, and problem solving that make agentic AI so much more capable. Once the AI has perceived the data, it starts reasoning over it. The LLM acts as the reasoning engine, orchestrating the AI processes and integrating with specialized models for various functions. The key components used in reasoning are, first, LLM-based decision making: AI agents use models such as GPT-4, Claude, or Llama to interpret user intent and generate responses, and they coordinate with smaller AI models for domain-specific tasks such as financial prediction or medical diagnostics. Second, retrieval-augmented generation, or RAG, with which the AI enhances accuracy by retrieving proprietary data from the company's databases; for example, instead of relying only on GPT-4's built-in knowledge, the AI can fetch company-specific policies to generate accurate answers.
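The following toy sketch shows the retrieve-then-generate idea behind RAG, assuming nothing beyond numpy; the character-frequency "embedding" and the returned prompt string are deliberate simplifications, since a real pipeline would use a trained embedding model and send the assembled prompt to an LLM.

```python
# Toy RAG sketch: retrieve the most similar company document, then build the
# prompt that would be sent to an LLM along with the user's question.
import numpy as np

DOCS = [
    "Refunds are processed within 7 business days.",
    "Claims above $10,000 require human approval.",
    "Shipping is free for orders over $50.",
]

def embed(text: str) -> np.ndarray:
    # placeholder embedding: normalized character-frequency vector
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query: str) -> str:
    scores = [float(embed(query) @ embed(doc)) for doc in DOCS]
    return DOCS[int(np.argmax(scores))]

def build_prompt(query: str) -> str:
    context = retrieve(query)
    # a real agent would pass this prompt to an LLM; here we just show it
    return f"Answer using company policy.\nPolicy: {context}\nQuestion: {query}"

print(build_prompt("Do large claims need approval?"))
```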
The final reasoning component is AI workflow and planning: multi-step reasoning in which the AI breaks a complex task down into logical steps. For example, if asked to automate a financial report, the AI retrieves the transaction data, analyzes the trends, and formats the results. You could also use this in supply chain management: consider a logistics company using agentic AI to optimize its shipping routes. A supply chain manager asks the agent to find the best shipping route to reduce delivery cost, so the AI processes real-time fuel prices, traffic conditions, and weather reports, and using the LLM plus data retrieval it finds the optimized routes and selects the cheapest carrier.
The result is that the AI chooses the best delivery option, so the cost goes down and efficiency improves. That is just one of the use cases. After perceiving and reasoning comes the third step, acting. In this step the AI takes autonomous actions: unlike generative AI, which stops at generating content, agentic AI takes real-world action. How does it execute tasks autonomously? First, through integration with APIs and software: the AI can send automated API calls to business systems, for example reordering stock through a supplier's API whenever an inventory level drops too low. It can also automate workflows, executing multi-step workflows without human supervision; for instance, the AI can handle insurance claims by verifying the documents, checking the policies, and approving the payouts. Finally, the AI operates within predefined business rules to prevent unauthorized actions, which is where much of the work on responsible AI is focused. For example, the AI might automatically process claims up to, say, $10,000 but require human approval for higher amounts, so for insurance and policy workflows agentic AI can be genuinely helpful, as the sketch below shows.
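A minimal sketch of such a guardrail, with a hypothetical process_claim function and the illustrative $10,000 limit from above, might look like this:

```python
# Guardrail sketch: the agent acts autonomously only within predefined business
# rules and escalates everything else to a human reviewer.
APPROVAL_LIMIT = 10_000  # claims above this amount need human sign-off

def process_claim(claim_id: str, amount: float, documents_verified: bool) -> str:
    if not documents_verified:
        return f"Claim {claim_id}: held pending document verification"
    if amount <= APPROVAL_LIMIT:
        return f"Claim {claim_id}: approved automatically for ${amount:,.2f}"
    return f"Claim {claim_id}: escalated to a human reviewer (${amount:,.2f})"

print(process_claim("CL-001", 4_500.00, True))    # approved automatically
print(process_claim("CL-002", 25_000.00, True))   # escalated to a human
```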
For example, say we have an agent managing an IT support system, and a user reports that the email server is down. The AI can diagnose the issue, restart the server, and confirm the resolution, and if the problem remains unresolved it escalates to a human technician. The result is that the AI fixes issues autonomously and reduces downtime; this is where the acting step comes into the picture. The final step is learning: with the help of the data flywheel, the feedback loop shown here, the agent keeps learning continuously. How does the AI learn over time? First, through data collection: the AI logs successful and failed actions, so if users correct AI-generated responses, the AI learns from those corrections. Second, through model fine-tuning and reinforcement learning: the AI adjusts its decision-making models to improve future accuracy and uses reinforcement learning to optimize workflows based on past performance. Third, through automated data labeling and self-correction: the AI labels and categorizes past interactions to refine its knowledge base, for example by autonomously updating frequently asked answers based on recurring user queries. As one example, consider a bank with an AI-powered fraud detection system. The AI analyzes financial transactions and flags suspicious activity, and when flagged transactions turn out to be false alarms, it learns to reduce those false alerts. Over time the AI improves its fraud detection accuracy while minimizing disruption for customers, getting smarter at cutting both false alerts and actual financial fraud.
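As a toy illustration of this learning step, the sketch below nudges a single alert threshold based on false-alert feedback; a production system would retrain a model rather than adjust one number, so treat this purely as an illustration of the feedback loop.

```python
# Toy learning loop: false-alert feedback raises the alert threshold so similar
# legitimate transactions are not flagged again.
class FraudDetector:
    def __init__(self, threshold: float = 500.0):
        self.threshold = threshold

    def is_suspicious(self, amount: float) -> bool:
        return amount > self.threshold

    def learn_from_feedback(self, amount: float, was_false_alert: bool) -> None:
        if was_false_alert:
            # loosen slightly so comparable legitimate transactions pass next time
            self.threshold = max(self.threshold, amount * 1.05)

detector = FraudDetector()
print(detector.is_suspicious(800))        # True: flagged
detector.learn_from_feedback(800, True)   # analyst marks it as a false alert
print(detector.is_suspicious(800))        # False: the detector adapted
```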
Let's have a quick recap of what we just covered. Agentic AI works in four steps: perceiving, where the AI gathers data from databases, sensors, and APIs; reasoning, where it uses an LLM to interpret the task, apply logic, and generate a solution; acting, where it integrates with external systems and automates the task; and learning, where the AI improves over time via the feedback loop, also called the data flywheel. Now let's look at this diagram and understand what it is telling us. The first thing you see is the AI agent, an autonomous system capable of perceiving its environment, making decisions, and executing actions without human intervention. The AI agent acts as the central intelligence in this diagram: it interacts with the user and various data sources, processes the input, queries databases, makes decisions using a large language model, executes actions, and learns from the resulting feedback. Next you see the LLM. Large language models are advanced AI models trained on massive amounts of text data to understand, generate, and reason over natural language. Here the LLM acts as the reasoning engine: it interprets user inputs and makes informed decisions, retrieves relevant data from the databases, generates responses, and can coordinate with multiple AI models for different tasks such as content generation, prediction, or decision making.
For example, when a user asks the chatbot, "What is my account balance?", the LLM processes the query, retrieves the relevant data, and responds with the balance. Now look at the kinds of databases the LLM interacts with: we have traditional databases and vector databases. The AI agent queries structured databases, which could hold customer records, inventory data, or transaction logs; traditional databases store well-defined, structured information. For example, when a bank assistant processes a query like "show my last five transactions", it fetches that information from a traditional SQL-based database. A vector database, on the other hand, is a specialized database for storing unstructured data such as text embeddings, images, or audio representations. Unlike traditional databases that store exact values, vector databases store data as points in a high-dimensional mathematical space, which allows AI models to search for semantically similar data instead of exact matches. The AI retrieves contextual information from the vector database, which enhances decision making and improves the AI's memory by letting the system search for conceptually similar past interactions. For example, a customer support chatbot might query a vector database to find similar past tickets when responding to a customer, and a recommendation engine could use one to find products similar to a user's past preferences. Some popular vector databases are FAISS (Facebook AI Similarity Search), Pinecone, and Weaviate.
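Here is a minimal similarity-search sketch using FAISS, assuming the faiss-cpu package is installed; the stored vectors are random placeholders standing in for real ticket or product embeddings.

```python
# Minimal FAISS similarity search: find the stored vectors closest to a query.
import numpy as np
import faiss

dim = 64
np.random.seed(0)
stored_vectors = np.random.random((1000, dim)).astype("float32")   # past tickets
query_vector = np.random.random((1, dim)).astype("float32")        # new ticket

index = faiss.IndexFlatL2(dim)     # exact L2-distance index
index.add(stored_vectors)

distances, ids = index.search(query_vector, 3)   # 3 most similar past tickets
print("closest ticket ids:", ids[0], "distances:", distances[0])
```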
Next in the diagram is the action component: once reasoning is done, the agent has the ability to execute tasks autonomously, integrating with external tools, APIs, or automation software to complete the given task. It does not merely provide information; it actually performs the action. For example, in customer support the AI can automatically reset a user's password after verifying their identity, and in finance the AI can approve a loan based on predefined eligibility criteria. Then we have the data flywheel, a continuous feedback loop in which the AI learns from past interactions, refines its models, and keeps improving over time. Every time the AI handles data, takes an action, or receives feedback, that information is fed back into the model, creating a self-improving system that gets smarter over time. The data flywheel lets the AI learn from every interaction and become more efficient by continuously optimizing its responses and refining its strategies. One of the best places to use this is fraud detection, where the AI learns from past fraud cases and detects new fraudulent patterns more effectively; chatbots can likewise learn from user feedback and improve their responses. Finally, there is model customization, where you fine-tune the AI models for specific business needs or industry requirements. AI models are not static; they can be adapted and optimized for specific tasks, and custom fine-tuning improves accuracy in domain-specific applications such as finance, healthcare, or cybersecurity. A financial institution might fine-tune an LLM to generate investment advice based on historical market trends, which would be one use case.
In healthcare, a provider might fine-tune an AI model to interpret medical reports and recommend treatments. Based on this diagram, you should now have a good idea of how agentic AI works. As for the future of agentic AI, it looks very promising, because these systems keep improving themselves and finding new ways to be useful. With better machine learning algorithms and smarter decision making, AI systems will become more independent, handling complex tasks on their own, and industries like healthcare, finance, and customer service are already starting to see how much impact AI agents can have, whether in efficiency, personalization, resource management, or elsewhere. As these systems continue to learn and adapt, they will open up even more possibilities, helping businesses grow and improving how we live and work. In conclusion, agentic AI is paving the way for new opportunities: unlike earlier generations of AI, which assisted with generating content, predicting from data, or responding to queries, agentic AI can perform tasks independently with minimal human effort. It has become self-reliant in decision making and is making a big difference in industries like healthcare, logistics, and customer service, enabling companies to be more efficient and, as a result, to provide better services to their clients.
Today we are diving into the fascinating world of Google Quantum AI. We'll break it down step by step: what Google Quantum AI is, how it differs from classical computers, why it's a game-changer, and the real problems it's solving. We'll also explore the latest developments, the innovative hardware, the challenges they face, and why, despite the hurdles, it's still an incredibly exciting field with a bright future. Stick with me, because by the end you'll be amazed at how this technology is shaping tomorrow. So let's get started. The universe operates on quantum mechanics, constantly adapting and evolving to overcome the hurdles it encounters. Quantum computing mirrors this dynamic nature: it doesn't just work within its environment, it responds to it, and this unique trait opens the door to groundbreaking solutions for tomorrow's toughest challenges. So what is Google Quantum AI? Quantum AI is Google's leap into the future of computing, a cutting-edge project where they are building powerful quantum computers and exploring how these machines can solve problems that traditional computers struggle with or cannot solve at all. If you're not aware, classical computers use bits, which are either zero or one, and solve tasks step by step, which is great for everyday use. Quantum computers use qubits, which can be zero, one, or both simultaneously, allowing them to tackle certain complex problems much faster. Think of Google Quantum AI like this: suppose you're trying to design a new medicine to fight a disease. A regular computer would analyze molecules step by step, which could take years, but Google Quantum AI can simulate how molecules interact at the quantum level almost instantly, speeding up drug discovery and potentially saving millions of lives by finding treatments faster. Now you may wonder why it is so necessary. Google Quantum AI matters because some problems are simply too big and complex for regular computers to solve efficiently, challenges like developing life-saving medicines, creating unbreakable cybersecurity, optimizing traffic systems, or even understanding how the universe works. Regular computers could take years or even centuries to crack these problems, while quantum computers could solve them in minutes or hours. So what problems is it actually solving? To list a few: drug discovery, simulating molecules to find new treatments faster; cybersecurity, developing ultra-secure encryption systems to keep your data safe; AI advancements, training AI models more quickly and accurately; and climate modeling, understanding climate change to create better solutions for global warming.
In simple terms, Google Quantum AI is here to tackle the impossible problems and bring futuristic solutions to today's challenges; it's like upgrading the world's brain to think smarter and faster. Google Quantum AI has been at the forefront of quantum computing advancements, pushing boundaries from the groundbreaking Sycamore processor to the latest innovation, Willow. In 2019, Google introduced Sycamore, a 53-qubit processor that achieved what is called quantum supremacy. Qubits, or quantum bits, are the core of quantum computers: unlike regular bits, which are either zero or one, a qubit can be zero, one, or both at once, a property called superposition, which lets quantum computers process vast amounts of data simultaneously. Qubits are powerful but fragile, need precise control, and hold the key to solving complex problems.
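To give a feel for superposition, here is a small classical simulation of a single qubit with numpy: a two-amplitude state vector and the Hadamard gate. This runs on an ordinary computer and only mimics the math; it is not quantum hardware.

```python
# Classical illustration of one qubit in superposition.
import numpy as np

ket_zero = np.array([1.0, 0.0])                  # the |0> state
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard gate

state = H @ ket_zero                             # equal superposition of |0> and |1>
probabilities = np.abs(state) ** 2               # Born rule: measurement probabilities

print("amplitudes:", state)                      # [0.707..., 0.707...]
print("P(0), P(1):", probabilities)              # [0.5, 0.5]
```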
Sycamore solved a problem in just 200 seconds that would have taken the world's fastest supercomputer over 10,000 years. This was a big moment: it showed quantum computers could do things that classical computers couldn't. After Sycamore, scientists recognized a key issue: quantum computers are very sensitive to errors, and even small disturbances can corrupt calculations. To fix this, Google started working on error correction, making its systems more accurate and reliable for real-world use.
In 2024, Google launched Willow, a 105-qubit processor. This chip is smarter and more powerful, and it can correct errors as they happen, so Willow shows how much closer we are to building quantum computers that can solve practical problems. Google's logical qubits have reached a huge breakthrough: they now operate below the critical quantum error correction threshold. Sounds exciting, but what does that mean? Let's break it down. Quantum computers use qubits, which are very powerful but also very fragile; they can easily be disrupted by noise or interference, causing errors. To make quantum computers practical, these errors need to be corrected while the machine runs complex calculations, and this is where logical qubits come in: they group multiple physical qubits together to create a more stable and reliable unit for computing. The error correction threshold is like a magic line: if errors can be corrected faster than they appear, the system becomes scalable and much more reliable. By getting their logical qubits to operate below this threshold, Google has shown that its quantum computers can handle errors effectively, paving the way for larger and more powerful quantum systems.
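A very loose classical analogy for logical qubits is the repetition code sketched below: one bit is encoded into three copies and decoded by majority vote, so a single flipped copy no longer corrupts the value. Real quantum error correction is far more involved, but the idea of combining many fragile units into one reliable unit is similar.

```python
# Classical repetition code: a rough analogy for "many physical units -> one
# reliable logical unit". Not actual quantum error correction.
def encode(bit: int) -> list:
    return [bit, bit, bit]

def decode(copies: list) -> int:
    return 1 if sum(copies) >= 2 else 0   # majority vote

codeword = encode(1)
codeword[0] ^= 1                          # a noise event flips one copy
print(decode(codeword))                   # still recovers 1
```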
So what is the hardware approach behind Google Quantum AI that made this possible? Google Quantum AI's hardware approach focuses on making quantum computers stable and reliable enough for practical use. They group qubits, the building blocks of quantum computers, to work together, allowing the system to fix errors as they happen, and by keeping the chips at extremely cold temperatures they reduce interference, which keeps the calculations accurate. This setup helps the system handle bigger and more complex tasks, like simulating molecules for drug discovery, improving AI models, and creating stronger encryption for data security; it's a big step toward making quantum computing a tool for solving real-world problems. Still, while Google Quantum AI has achieved incredible milestones, it faces some key limitations. Fragile qubits: qubits are extremely sensitive to noise and interference, which can cause errors, and keeping them stable requires ultra-cold temperatures and precise control. Error correction challenges: although Google has made progress, quantum error correction still isn't perfect and needs more work before quantum computers can scale to solve real-world problems reliably. Limited applications: right now quantum computers are great for specialized problems like optimization and simulation, but for everyday computing tasks classical computers are still better. Hardware complexity: building and maintaining a quantum computer is incredibly expensive and complicated, and the advanced cooling systems and infrastructure make it hard to deploy these systems widely. Still in early stages: quantum computers, including Google's, remain experimental and are not yet ready for large-scale practical use in industry. Despite these challenges, Google Quantum AI is paving the way for a future where quantum computing tackles problems that regular computers can't handle, like finding new medicines, predicting climate change, and building smarter AI. It's an exciting start to a whole new era of technology, full of possibilities we are just beginning to explore.
The future of Google Quantum AI is incredibly exciting, with the potential to solve real-world problems that traditional computers can't handle. It's set to revolutionize industries like healthcare, by speeding up drug discovery; finance, through advanced optimization; and energy, with better materials modeling. Quantum AI could also lead to breakthroughs in AI by training smarter models faster and to creating unbreakable encryption for stronger data security. As Google improves its hardware and error correction, its quantum systems will become more powerful and reliable, paving the way for large-scale practical applications. The possibilities are endless, and Google Quantum AI is at the forefront of shaping a transformative future. So thank you for joining our Gen AI full course by Simplilearn; we hope this training has provided you with valuable insights. Don't forget to subscribe to our channel for more expert-led courses and tutorials, and see you in the next video. Staying ahead in your career requires continuous learning and upskilling. Whether you're a student aiming to learn today's top skills or a working professional looking to advance your career, we've got you covered: explore our catalog of certification programs in cutting-edge domains including data science, cloud computing, cybersecurity, AI and machine learning, and digital marketing, designed in collaboration with leading universities and top corporations and delivered by industry experts. Choose any of our programs and set yourself on the path to career success. Click the link in the description to know more. Hi there, if you liked this video, subscribe to the Simplilearn YouTube channel and click here to watch similar videos; to nerd up and get certified, click here.