Learn R in 39 minutes

641.74k views
Equitable Equations
Got 40 minutes? You can learn R and still have time for high fives afterwards. If this vid helps you...
Video Transcript:
Hey everybody, this is a short but, I think, shockingly thorough introduction to data analysis with R. One of the amazing things about R, I think, is how quickly you can just jump in and get stuff done, even if you don't have any programming experience. My whole YouTube channel is dedicated to data analysis with R. I encourage you to subscribe. I'll throw in links to different videos going into some of these topics in more depth as we go through.
If you haven't already, you'll need to install R. I strongly recommend that you also install RStudio. R is the programming language; RStudio is the front end that almost all of us use when we're working with R. If you just Google "RStudio" or "RStudio Desktop" and follow the installation links, you'll wind up at a page like this, where it prompts you to install both R and the front end, RStudio. All of this is free and generally pretty hassle-free to install. Once you've done that, open up RStudio. That is the only one you'll ever need to click on; you don't need to interact with the actual R icon directly. And you'll get something like this.
There's a lot going on, but when you're first starting out, if you want, you can just view it as a graphing calculator and do things like 5 + 7 or the absolute value of -17, and get answers out. You can also work with variables here, so we can assign x to be the value, I don't know, -12, and you can see that the value x is now stored as -12.
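As a minimal sketch, the console commands described above would look roughly like this when typed at the R prompt. The exact keystrokes are not shown in the transcript, so treat this as an illustration rather than the author's exact input.

```r
# Use the console like a calculator
5 + 7       # 12
abs(-17)    # 17

# Assign a value to a variable with the left arrow, then print it
x <- -12
x           # -12
```

Typing a variable's name on its own line prints its current value, which is a quick way to check what is stored.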
I use a left arrow for assignment. That's encouraged in R. An equal sign will also work, but for some technical reasons we generally don't do that. And then you can do your usual operations on that variable, so I could do x + 7 to get -5, or I could do the absolute value of x.
You can also assign variables to be entire vectors of values, so let's let y be equal to -12, 6, 0, and -1. You can see now I have a value stored for y, but there are actually four numbers in there, so it's sort of an ordered n-tuple: -12, 6, 0, -1. And I can do operations on this vector. For instance, I can double the whole thing. Notice that's happening component-wise. I can apply functions to it, like the absolute value of y, and expect those to be done component-wise. I could take a sine, a tangent, an exponential, whatever I like.
If you're watching this video, though, you're probably less interested in the programming aspects and more in the data aspects, and that makes sense: R is fundamentally a language set up for working with data. So let's import a data set and take a look at it. Here in the lower right I have a file browser, so I'll click on that, and you can see you can navigate around your machine until you find the data set that you want to work with. If you have an Excel spreadsheet, for instance, or a CSV file, importing a data set is extremely easy. Just find it by following these breadcrumbs, clicking on the folders that you want, and then click on the file that you want and go to Import Dataset. That'll pull up a window with lots of different options that you can more or less ignore when you're getting started. Just click Import.
Okay, so a few things have happened. Before I talk about any of them, I just want to mention what this data set is that we're looking at. This is the Scooby-Doo database, and I got it from the TidyTuesday project. Every week, TidyTuesday posts a new and interesting data set for us to practice our R
skills: data cleaning, visualization, and analysis more broadly. You can also, of course, work with these data sets in other languages, but this definitely revolves around the R ecosystem. I strongly recommend checking out TidyTuesday. The Scooby-Doo data set was featured at one point or another.
Okay, back to the code. The importing is actually happening here in the second line, the read_excel command. You can see that inside it has found the file in question (you can see the file path written in there) and assigned it to the variable named scooby. And if you look in my Environment tab, up here on the upper right, there's now a data set: 549 observations (rows) of 75 variables (columns). This line here, the View command, is what actually opened up that data set so that we could see it. It put it in the viewer in an interactive way, and we can scan through it and see some of the variable names: who caught the villain, Velma, Shaggy, and so on; did Shaggy get a Scooby Snack; stuff like that. Lots of fun stuff here.
This first command, library, let's talk about that for a second. R is an old language, but over time it has been expanded and developed by its large and vibrant community of users. The add-on sets of functions that they have created, and which are available to us, are called packages, and this library command is opening up a package of functions called readxl that gives us some additional functionality for working with Excel spreadsheets. You can think of this package, readxl, like an app on your mobile phone, and the library command is opening that app and giving us access to its functions. But if you haven't already installed that app, if you haven't already installed this package of functions, R is not going to know what to do with this library command. It won't know how to open the app. So you need to start by installing it with install.packages("readxl"). I won't actually execute that, because I already have it installed, and actually I already have the package loaded, so that's not something I need to do. You only need to install the package once, but every session, every time you open up R and want to use it, you have to library it. You have to actually open that app.
Okay, so lots of interesting stuff here, lots of interesting variables. We might want, for instance, to get some summary information on some of these variables, like what's the average run time of all the episodes in this database. Not surprisingly, that's a mean command. Now, I want to get the average of this column in the Scooby data set, and the syntax in R is to first name the data set, so scooby. Notice the autocomplete suggestion; you can use either Tab or Enter to accept it. So I want the mean, in the Scooby-Doo data set, of the runtime variable, and I'll specify the column I want with a dollar sign. Then if I start typing runtime, I can select the variable that I want or just type it entirely. A little more than 19 minutes for the average run time of all these episodes.
Let's do the same thing for the average IMDb rating. We'll see there's a little bit of a subtlety. Notice how fast I was able to key that in: I used the up arrow to get to the previous command, and if I want to go two commands back, I press the up arrow twice. This can save you some typing time. So up arrow, then backspace over runtime, and I'll replace it with IMDb. When I execute that to get the average IMDb rating of all these episodes, I get an NA. The reason is, if I go all the way down to the bottom here, there are some NAs in this column, some missing values. There's literally no data for more recent episodes of Scooby-Doo in here; they just didn't have IMDb ratings when this set was made. And so when R tries to figure out what the average IMDb rating is, it just doesn't know
because these NAs could be very high or very low, and so those NAs propagate. The mean command (I'm using the up arrow here), like many functions in R, has optional arguments, and I'm going to put in one of those optional arguments right now: na.rm, remove the NAs, equals TRUE. You can leave out na.rm and it will default to FALSE, leave them in. Here I'm overriding that and saying take them out. The average IMDb rating in this set, for the episodes that actually have IMDb ratings, is 7.34.
Okay, now we've already done a bunch of commands here, and things are starting to get a little bit busy. If you're doing a fuller data analysis, going line by line like this can be problematic. You can lose track of your work, and it's also more difficult to recreate things later on. So what we'd like is a document that actually contains all of this. In this video we're going to see two ways. The most fundamental, though, is a script, so I'm going to go up to this little piece of paper here and get a new R script. That's going to open up a new tab, right next to my data set, that's essentially just a text file. The idea is that we can code line by line here and then save this file later on, just using the disk icon, anywhere we want. So, for instance, I could put in a library command like library(readxl).
Going forward, I'm going to want to use an entire ecosystem of packages that has become very popular over time, and that is the tidyverse family of packages. These are produced by Posit, the same company that makes this front end, RStudio, and were largely developed by their chief data scientist, Hadley Wickham. The tidyverse family of packages has really revolutionized R and data analysis just over the last decade. So I'm going to execute that. If I just hit Enter right now, nothing's actually going to happen, except I'm going to get a new line. This is literally a text file, and so R doesn't know that I'm
actually wanting to execute that code. It just thinks I want to go to a new line to write some more stuff. If I want to actually execute the code, I have to hit Command+Enter on a Mac, which I'm on, or Control+Enter on a PC, and that will send the line of code down to the console and execute it.
The tidyverse consists of eight core packages. You can see them listed here. They all have some pretty important purposes in R. I would say ggplot2 and dplyr in particular have become absolute standards in R programming. If you are using R today, you absolutely have to know those two. The others have been largely adopted as well, and it's certainly worth learning them all. In this video I'm going to talk about those two packages a little bit. I don't think I'll get into any of the others particularly.
Okay, the first command I want to show you other than loading packages, or the first helpful tip I want to point out, is this data command. When I run it, it's going to show me a long list of data sets that are built into R, which you can use to practice some of the skills that I'm going to be teaching. Some of them are built in with base R and come no matter what else you do (Titanic is a famous one), and some load with these other packages. So, for instance, if I scan down far enough: data sets in the package dplyr. These are some data sets included with the dplyr package, which was included in the tidyverse, that you can use to work on your data wrangling. Then ggplot2, and so on.
In the next few examples I'm going to be using the mpg data set, so I could do View(mpg). We've already seen that command when we imported the Scooby data set. I'll hit Command+Enter to execute it, and you can see the command was sent down here, and now the mpg data set gets opened up in the viewer. I'm going to close a couple of these other windows for neatness.
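To make the idea of a script concrete, here is a minimal sketch of what a saved script collecting the commands so far might contain. The vector named ratings is a made-up stand-in for a column with missing values (the Scooby-Doo file itself is not reproduced here), and the sketch assumes the tidyverse has already been installed once with install.packages("tidyverse").

```r
# A hypothetical script file collecting the commands discussed so far
library(tidyverse)   # load the tidyverse packages for this session

# Vectors and component-wise operations
y <- c(-12, 6, 0, -1)
2 * y
abs(y)

# Missing values propagate through mean() unless they are removed.
# 'ratings' is an invented example vector, not the real IMDb column.
ratings <- c(7.1, 8.3, NA, 6.9)
mean(ratings)                 # NA, because one value is missing
mean(ratings, na.rm = TRUE)   # mean of the three non-missing values

# These two only make sense in an interactive session such as RStudio
if (interactive()) {
  print(data())   # list the built-in practice data sets
  View(mpg)       # open mpg (from ggplot2) in the viewer
}
```

In RStudio, each line can be sent to the console with Command+Enter or Control+Enter, as described above.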
Okay, so I don't know anything about the mpg data set, and I want to learn a little bit more about it. When you want to learn more about a function or a data set that is either built in or loaded in with a package, you can ask about it with a question mark, in this case ?mpg. When I hit Command+Enter, that will open up something in this Help tab. In this case we see we have fuel economy data from 1999 to 2008 for 38 popular models of cars, and then we have a little bit of a data dictionary telling us more about it. We can get help files on all sorts of functions, as I mentioned, for instance mean, to get the arithmetic mean, and you can see the sorts of arguments we might use as well as some of the optional arguments.
Okay, one other command I would like to show, which I think is very helpful when you're encountering a new data set, is the glimpse command. You just feed it the name of the data set, and the glimpse command (let me make this a little bit more visible for the moment) gives you a top-level overview of your data set: how many rows, how many columns, what the variable names are (remember, after a dollar sign R is typically specifying a variable name in a data set), what the first few values are, and what sort of variable each one is.
Now, if you have programming experience, you know that different variables can have different types, and you've probably had to suffer through a fair amount of information on different data types. In R that certainly exists; however, we are able to suppress some of the technical stuff in our data analysis and just acknowledge that data can be either categorical, like audi or a4, or quantitative, like 1.8 or 1999.
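A short sketch of the overview step described above. It loads dplyr and ggplot2 individually instead of the whole tidyverse, which is an assumption made to keep the example small; glimpse comes with dplyr and the mpg data set comes with ggplot2.

```r
library(dplyr)     # provides glimpse()
library(ggplot2)   # provides the mpg data set

# In an interactive session, ?mpg and ?mean open the help pages.

# Top-level overview: rows, columns, variable names, types, first values
glimpse(mpg)

# Row and column counts on their own
dim(mpg)
```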
Now, at a deeper level of sophistication, there are decimals (doubles) and integers, and there are factor variables versus character vectors. But another nice thing about R is that, for most purposes, you don't have to think too hard about that, so I'm not going to say anything more about it in this video.
Okay, so a very common data analysis task that you might have on a set like this is to subset it by rows. For instance, I might be interested in only the front-wheel-drive cars, or only the cars that have city mileage of at least 20 miles per gallon. So let's do both of those things. The fundamental command we're going to use to subset by row is the filter command, and of course we can learn more about that with ?filter. In fact, maybe I'll just do that: ?filter. So I want this one, "Subset rows using column values". I'm going to start out by getting the cars whose city mileage is at least 20 miles per gallon. The first argument here should be the data set, so that's mpg, and then I have to specify the condition I want: city mileage should be at least 20, so greater than or equal to. And when I Command+Enter this, it's not going to be exactly what I would hope for. I'll explain why. Let me get a little better view on this.
Okay, so what happened is it just printed the result out. You'll see I now have 56 rows, as opposed to many more before. If I go up and look at that glimpse command, I can see I had 234. What I would really like to do is take this filtered set, save it as a new value, so maybe mpg_efficient, and then be able to do operations with that. Right off the bat, maybe I want to view mpg_efficient, and now here in my viewer I can see that all these cars have city mileage of at least 20.
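The filtering step above can be sketched as follows. The column name cty for city mileage comes from the mpg data set; loading dplyr and ggplot2 separately is an assumption made to keep the example self-contained.

```r
library(dplyr)     # provides filter()
library(ggplot2)   # provides the mpg data set

# Without assignment, the filtered rows are only printed
filter(mpg, cty >= 20)

# Save the result under a new name so it can be reused
mpg_efficient <- filter(mpg, cty >= 20)

nrow(mpg)             # rows before filtering
nrow(mpg_efficient)   # rows after filtering
min(mpg_efficient$cty)
```

Saving the result leaves the original mpg untouched, so both versions stay available for later steps.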
Great. Let's do one more filter. Let's take mpg, call this one mpg_ford, and do a filter. So that was manufacturer, I think. Yes: manufacturer should be "ford". Now, if I hit Command+Enter right now, I'm going to get an error: "we detected a named input". Remember when I used an optional argument on that mean command earlier, which was pretty far back? I named the argument: na.rm = TRUE. So here R is looking for an argument called manufacturer. That's not what I mean; I want a value of a variable. So the equal sign that I'm looking for is different from the equal sign that R thinks I mean. To specify this kind of logical equality, I want a double equal sign, and now that'll work. And I misspelled manufacturer; now it will actually work. Let's take a view of that mpg_ford. There it is, and you can see now it's all Fords. Great. Let me neaten that up a little bit.
I think the next most common data task that you might have is to add or change a column in a data set, so I'm going to do that. One of the things I notice in this set is that the units of measure are not metric but standard: miles per gallon. I know many people in my audience will be unfamiliar or less comfortable with miles per gallon than with, for instance, kilometers per liter. So let's take the mpg data set and add a new column that is going to have the city mileage in that new unit of measure. The command I'm looking for is mutate. mutate is going to add or change a variable in my data set. As with filter, the first argument should be the name of the data set, and after that I need to specify the name of the column that I want to add or change. In this case it's going to be cty_metric, and I need to specify a formula for this new column. So I Googled the conversion factor for converting miles per gallon to kilometers per liter, and it
is this number right here, so let me just copy and paste that. I'm going to do that times the city mileage that was in miles per gallon. Command+Enter, and you can see I now have a data set called mpg_metric. Instead of having 11 variables, it now has 12.
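Here is a hedged sketch of the last two steps, the Ford filter and the new metric column. The transcript does not state the conversion factor that was pasted in, so the value below is an assumption: it is computed from 1 mile = 1.609344 km and 1 US gallon = 3.785411784 liters, which gives roughly 0.425 km per liter for each mile per gallon. The name km_per_l_per_mpg is made up for this illustration.

```r
library(dplyr)     # provides filter() and mutate()
library(ggplot2)   # provides the mpg data set

# A single equal sign names an argument, so this line raises an error:
# filter(mpg, manufacturer = "ford")

# A double equal sign tests for equality, which is what is wanted here
mpg_ford <- filter(mpg, manufacturer == "ford")

# Assumed conversion factor (US gallons), not taken from the video
km_per_l_per_mpg <- 1.609344 / 3.785411784

# Add a new column holding city mileage in kilometers per liter
mpg_metric <- mutate(mpg, cty_metric = km_per_l_per_mpg * cty)

ncol(mpg)          # number of columns before
ncol(mpg_metric)   # one more column after mutate()
```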