NVIDIA's $249 Secret Weapon for Edge AI - Jetson Orin Nano Super: Driveway Monitor

454.45k views · 2,711 words
Dave's Garage
FREE GIVEAWAY OF JENSEN-HUANG-SIGNED ORIN NANO SUPER! See Below! Join Dave as he explores NVIDIA's ...
Video Transcript:
Hey, I'm Dave, welcome to my shop. We're going to be living on the edge, literally, today, with a brand new Jetson Orin Nano: a single-board computer with six Arm cores and 1,024 CUDA cores. It's a pint-sized powerhouse that is quite unlike the desktops or Raspberry Pis that you might be more familiar with. This episode is sponsored by NVIDIA, a rare occurrence for my channel, but fret not: NVIDIA has no editorial control here. They just FedExed me the Orin Nano developer kit to tinker with ahead of its release, so let's dive in and see what came in the box.

All right, let's have a look at what the FedEx man brought today. Open up the box, take out a bag of air, not very exciting, but next to it we've got an Orin Nano. Notice the SD card that was taped to the back of the box. I did not, and I had to fetch that from the garbage later. But for now, let's open it up and see what's inside. Kudos to NVIDIA on the packaging; it's actually pretty nice for a developer kit. You get a little pamphlet here that tells you next to nothing, you get a charger, which I will set aside, and of course you get the Orin Nano itself. Next I'll pull the power cable out, and that's everything that's involved here. Let's boot it up.

Now, when you think of NVIDIA, chances are your mind jumps straight to their GPUs, whether that's for gaming, machine learning, or crunching numbers on the world's most powerful supercomputer clusters. But what's a Jetson? Well, Jetson is NVIDIA's answer to edge AI, and that's all about bringing computational muscle closer to where the action is actually happening, be it in robots, drones, cameras, or, as we'll see today, in my driveway. When you need your AI to be local and you can't strap a big desktop to it, edge computing hardware like the Orin Nano is the solution. The Jetson Orin Nano fits into a particular but interesting niche. It looks a little bit like a Raspberry Pi on steroids: a compact form factor, but with far greater performance under the hood. And that's not hyperbole. It boasts a GPU with 1,024 NVIDIA CUDA cores, making it an
ideal playground for AI experiments. Don't get me wrong, this isn't going to replace your desktop for high-performance gaming or anything like that, but for the price it's an incredible little platform to explore what AI can do without selling your soul or your GPU budget. And I'm pleased to report that they've slashed the price down to $249, which is pretty impressive for a machine with, as I said, 1,024 CUDA cores, 8 GB of RAM, and six Arm cores.

I'll confess that my first adventures with the Orin Nano were anything but cutting edge. They had actually included a bootable microSD card with the Orin, but I didn't see it taped to the side of the box, and that means I went through the whole mundane process of downloading the SD card image from NVIDIA's website, fidgeting with the tiniest microSD card slot that I've ever seen, and eventually booting into Ubuntu Linux. If there's a golden rule of developer boards, it's this: your patience is tested long before your programming skills are. I spent far too long poking around and prodding at the microSD port, but once that hurdle was cleared it was smooth sailing. Fortunately it's not something you have to do very often, as otherwise it might be a concern.

One thing I should mention: I added a 1 TB Samsung 970 EVO SSD to give the Orin Nano a bit of breathing room. Now, during the initial Ubuntu setup, it defaulted to installing the operating system on the microSD card instead of the SSD. Not ideal. After some tinkering, I cloned the system from the SD card onto the SSD using Linux command-line tools like dd, e2fsck, and resize2fs to make everything fit, and with that, the system was now booting off the SSD. The performance was definitely night and day in terms of disk; it's worth the effort if you're planning to do anything intensive with it. I even repeated the setup to confirm that I wasn't given a choice of install drive, which I still find odd.

Now, what makes the Orin Nano particularly intriguing is its support for NVIDIA's AI ecosystem,
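The clone-and-resize step just mentioned might look roughly like this. The transcript only names dd, e2fsck, and resize2fs, so the device paths and partition numbers below are assumptions; verify them with lsblk before running anything, because dd to the wrong target destroys data.

```shell
# Check which device is the SD card and which is the NVMe SSD first.
# The paths below (/dev/mmcblk0 for SD, /dev/nvme0n1 for SSD) are assumptions.
lsblk

# 1. Clone the whole SD card onto the SSD (ideally from a live environment,
#    not while the SD card's root filesystem is in active use).
sudo dd if=/dev/mmcblk0 of=/dev/nvme0n1 bs=4M status=progress conv=fsync

# 2. Check the copied root filesystem (partition number is an assumption).
sudo e2fsck -f /dev/nvme0n1p1

# 3. Grow the filesystem to use the 1 TB drive.
sudo resize2fs /dev/nvme0n1p1
```

Note that resize2fs only grows the filesystem; if the cloned partition is still SD-card-sized you'd first need to expand the partition itself (for example with parted), and the boot configuration has to point at the SSD.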
including TensorRT, CUDA, and a host of pre-trained models. That makes it a solid candidate for AI enthusiasts like me who might not be ready to train their own GPT model from scratch but still want to dabble in the technology that powers things like Tesla's self-driving cars or Amazon's new Alexa.

With that in mind, I decided to put the Orin Nano to work on a simple yet practical AI application: a driveway monitor. And this isn't your run-of-the-mill beam detector; no, this is a custom Python script that uses a YOLOv8 object detection model to identify vehicles entering and leaving my driveway. The goal: to teach the Jetson not just to detect motion, but to understand what it's seeing and to notify me accordingly.

The script is where the magic happens. At its core, it uses the Ultralytics YOLO library, running directly on the GPU, to analyze video frames from my security camera feed in real time. YOLO, or You Only Look Once, is an object detection model that, true to its name, analyzes an entire frame in a single pass, making it extremely fast, and speed does matter when you're dealing with live video streams.

So let's break the script down. The script initializes the YOLO model and configures it to run on the Orin Nano's GPU. This isn't just about speed; it's about maximizing the hardware's potential. And here's the kicker: YOLO comes pre-trained on a massive dataset, so right out of the box it already knows how to recognize cars, trucks, buses, and more. My job was to narrow its focus to vehicles and tweak confidence thresholds to avoid any false positives. After all, I don't want it mistaking my dog for a Corvette. The script also includes a rudimentary tracking system to keep tabs on individual vehicles. I calculate the overlap between detected bounding boxes to decide whether an object is new or just the same car moving around. That way, it doesn't announce a vehicle arriving every time somebody nudges their car forward a few inches. And here's the fun part: the system doesn't just detect the vehicles, it
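The video doesn't show the script's code, so here's a minimal sketch of the pipeline described above. The Ultralytics API calls are real, but the camera URL, the confidence and overlap thresholds, and the espeak call standing in for the intercom text-to-speech are my assumptions, not details from the video.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


class DrivewayTracker:
    """Tracks vehicles across frames by bounding-box overlap."""

    def __init__(self, iou_thresh=0.3, miss_limit=5):
        self.iou_thresh = iou_thresh  # overlap needed to call it the same car
        self.miss_limit = miss_limit  # frames absent before "vehicle leaving"
        self.tracks = {}              # track id -> [box, frames_missed]
        self.next_id = 0

    def update(self, boxes):
        events = []
        matched = set()
        # Match each existing track to its best-overlapping detection.
        for track in self.tracks.values():
            best = max(boxes, key=lambda b: iou(track[0], b), default=None)
            if best is not None and iou(track[0], best) >= self.iou_thresh:
                track[0], track[1] = best, 0
                matched.add(tuple(best))
            else:
                track[1] += 1
        # Detections that match no existing track are new arrivals.
        for b in boxes:
            if tuple(b) not in matched:
                self.tracks[self.next_id] = [b, 0]
                self.next_id += 1
                events.append("vehicle arriving")
        # Tracks that have been absent too long have left.
        for tid in [t for t, tr in self.tracks.items() if tr[1] >= self.miss_limit]:
            del self.tracks[tid]
            events.append("vehicle leaving")
        return events


def main():
    # Heavy dependencies are only needed for the live loop.
    import subprocess
    import cv2                        # pip install opencv-python
    from ultralytics import YOLO      # pip install ultralytics

    model = YOLO("yolov8n.pt")        # pre-trained on COCO out of the box
    cap = cv2.VideoCapture("rtsp://camera.local/stream")  # hypothetical feed
    tracker = DrivewayTracker()
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # COCO classes 2/5/7 are car/bus/truck; conf=0.5 trims false positives.
        result = model(frame, conf=0.5, classes=[2, 5, 7],
                       device=0, verbose=False)[0]
        boxes = [tuple(b) for b in result.boxes.xyxy.tolist()]
        for event in tracker.update(boxes):
            subprocess.run(["espeak", event])  # stand-in for the intercom TTS


if __name__ == "__main__":
    main()
```

The IoU matching is what keeps the monitor from announcing a new arrival every time a parked car shifts a few inches: any detection that overlaps an existing track above the threshold is treated as the same vehicle.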
notifies me over the intercom using text-to-speech modules. If a car pulls up, it announces "vehicle arriving"; if it leaves, "vehicle leaving". It might seem like a gimmick, but it's been surprisingly effective out here in the shop. The key is keeping the announcements infrequent enough that they don't turn into background noise. In the final setup, the script processes video frames at a few frames per second on the Orin, but that's fast enough for my purposes, and the Orin Nano barely breaks a sweat doing it.

The tracking system also assigns unique IDs to vehicles and keeps a history of their movements over time. I could extend this to include more advanced analytics, say recognizing specific cars or who might be driving them, or alerting when an unknown vehicle arrives. The Orin Nano's architecture makes it possible to handle all of this in real time. It offloads the heavy lifting, like the neural network inference, to its CUDA cores, freeing up the CPU for other tasks. It's this seamless interplay between the hardware and the software that sets the Jetson apart from, say, a Raspberry Pi or similar boards. And because it's from NVIDIA, it works with CUDA, and working with CUDA is almost a prerequisite for doing AI these days.

Now let's pivot to a completely different AI use case for the Orin Nano: running large language models locally with Ollama and the Llama 3.2 model. If you've ever been fascinated by how ChatGPT-like systems generate human-like responses, you're going to love this experiment. The idea is just to see how well the Orin Nano can handle processing a massively large model locally, no cloud involved, and then compare its performance to something like an M2 Ultra Mac Pro. To give the Orin a better shot, we're going to up its power to the MAXN setting. Doing so required that I update the firmware from the NVIDIA site, which I did, and then the machine came back up with the new MAXN power setting, which I selected for maximum performance.

Now, before we look at setting up Ollama on the Orin Nano, let's take a quick look at running it on the Pi 4 first. I used my best Pi 4, an 8 GB model, so it would have the memory needed to even have a chance at running the model. And when I ran it, I found myself ambivalent in the literal sense of the word, because I was of two minds about it. First, it was incredibly impressive to me that a Raspberry Pi can run a large language model at all. It's like when a dog plays a piano: it's not how well they do it, it's that they do it at all. And like the dog playing the piano, the Pi does it, but not very well. It runs at a speed of about a token a second, so it's far too slow to do anything responsive or truly useful. I'd say you're certainly not going to have any kind of useful back-and-forth conversation with it. So let's see if the Orin Nano, with its CUDA cores, fares any better.

The first step in this experiment was to install Ollama, the local platform for running Llama models. Ollama simplifies the process of using large language models on your local machine by providing a streamlined framework for downloading and running these models efficiently. To install Ollama, I ran the script provided on the ollama.com homepage. Next, I downloaded the Llama 3.2 model. This model is one of the most advanced open-source large language models available, known for its high accuracy and capability to generate detailed, coherent responses. Using Ollama's CLI, downloading the model was as straightforward as "ollama pull llama3.2", and with the model installed, I was ready to test its performance on the Orin Nano.

To measure throughput, I used Ollama's verbose mode. This mode provides detailed insights into the model's operations, with metrics such as the tokens generated per second, GPU use, and latency per token. These statistics help paint a clearer picture of how the hardware handles intensive AI workloads, offering valuable data points for optimization and performance tuning.

The specific test involved asking Llama 3.2 to generate a 500-word story based on a simple prompt: tell me a story about robots that learn to paint. The Orin Nano tackled this task admirably, particularly given the challenge of running a model as large and complex as Llama 3.2. Processing a large language model locally requires not only substantial computational power but also efficient resource allocation. The Orin Nano's reliance on its CUDA cores and six Arm CPU cores demonstrated its optimized architecture for AI workloads. Using all six Arm cores for CPU-side operations and offloading as much as possible to its CUDA cores, the system managed to generate around 21 tokens per second. While this might not sound blazing fast compared to cloud GPUs or high-end desktops, it's important to remember that this is a 15-watt device, and it's at least an order of magnitude faster than the Pi, and then some. The verbose output showed steady token generation, with GPU utilization hovering around 60%. The story itself was rich and detailed, and while the processing time was longer than you'd experience on a high-end workstation, the Orin Nano proved more than capable of running cutting-edge language models. In the end, those 21 tokens per second are easily fast enough to make it responsive enough for fluid text-to-speech, answering questions, or using the model to solve problems in real time.

For comparison, I ran the same test on an M2 Ultra Mac Pro, and it's a fairly maxed-out machine as well, with the maximum number of GPU cores, I think it's 76 in the Mac world. As expected, the Mac outperformed the Orin Nano by a factor of about five, generating tokens at an impressive 113 tokens per second. This performance is largely due to the M2's unified memory architecture and highly efficient Neural Engine, both of which are optimized for handling AI tasks. The significant difference in token generation speeds highlights the disparity in computational power between the two systems, but it also underscores the efficiency of the Orin
Nano given its limitations. What's fascinating, however, is how close the Orin Nano comes given its size and power constraints. The Mac Pro represents the pinnacle of Apple's desktop processing power, with its custom silicon optimized for AI tasks; it also costs more than $10,000. The Orin Nano, on the other hand, is a $249 developer board designed for edge computing. Despite this, it holds its own in a way that's nothing short of remarkable.

Now, if you need even more performance out of the system, we can go to a more compact version of Llama 3.2 with only a billion parameters. Doing so boosted the speed to an impressive 34 tokens per second, a very fast generation rate.

So why would you use an Orin Nano instead of a more powerful system? Well, the answer lies in its niche. Edge computing applications often prioritize low power consumption, compact form factors, and local processing capabilities. The Orin Nano can run AI models like Llama 3.2 in environments where a full-fledged desktop or server isn't feasible; think of robots, IoT devices, drones, and that sort of thing. Imagine embedding a language model in a drone for natural language processing as it's flying, allowing it to interact seamlessly with the operators or other devices in real time.

And so the Jetson Orin Nano continues to impress with its versatility and raw performance for its size, particularly when compared to other edge computing solutions like a Raspberry Pi or the Coral TPU. Its ability to seamlessly integrate with NVIDIA's AI ecosystem, coupled with its low power consumption and robust hardware, makes it an exceptional choice for developers and researchers looking to push the boundaries of AI on a budget. The device strikes a compelling balance between cost, performance, and functionality, I think, solidifying its place in the edge AI landscape. From driveway monitoring to running large language models, this pint-sized AI powerhouse proves that you don't need a data center to do serious AI work. While the M2 Ultra Mac Pro may dominate in raw speed, the Orin Nano's ability to run models like Llama 3.2
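For reference, the whole Ollama experiment described above reduces to a handful of commands. The install one-liner is the script from the ollama.com homepage, and the model tags are Ollama's published ones: llama3.2 for the default model, llama3.2:1b for the billion-parameter variant mentioned above. The prompt is the one from the video.

```shell
# Install Ollama via the script from the ollama.com homepage
curl -fsSL https://ollama.com/install.sh | sh

# Download the Llama 3.2 model
ollama pull llama3.2

# Run the test prompt with per-request stats (token counts, eval rate)
ollama run llama3.2 --verbose "Tell me a story about robots that learn to paint"

# The compact one-billion-parameter variant trades quality for speed
ollama run llama3.2:1b --verbose "Tell me a story about robots that learn to paint"
```

The --verbose flag is what prints the token counts and generation durations used for the tokens-per-second comparisons in this episode.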