ARRAYLIST VS LINKEDLIST

83.4k views3602 WordsCopy TextShare

Core Dumped

In this one, we explore how ArrayLists and LinkedLists works at memory level and how scripting langu...

Video Transcript:

arrays are a fundamental tool for developers but their behavior can vary significantly depending on the programming language being used in systems programming languages it's necessary to specify both the type and size of elements the arrays will store scripting languages offer more flexibility allowing for the addition of an unlimited number of elements while permitting the inclusion of diverse data types within the same array so our scripting languag is just better in today's video we'll look at how scripting languages handle this issue why it exists and how lower level programming languages overcome it we'll also discuss why

JavaScript arrays aren't even real arrays hi friends my name is George and this is core dumped before starting I want to send special thanks to the first members of this Channel and to today's sponsor codec Crafters more about them later in the video arrays are quite simple they contain elements each occupying a position in programming the first position is usually zero the second is one and so for these positions are called indexes which we use to retrieve and change the element at that position in the array at a low level the idea is to maintain

each array element adjacent to one another in memory but there's a problem here since memory also holds other values arrays should not inherently expand disregarding this fact and adding elements could potentially overwrite other values that might be required later in this example immediately after adding elements to the array a numeric Rec addition operation follows the problem here is that computers don't access values in terms of variables but in terms of memory addresses when the operation is executed the machine code produced by the compiler instructs the CPU to add the numbers located in the specified memory

addresses which is right because it refers to where both values were initially stored however since both values were overwritten by the array the CPU will execute the addition between incorrect values as you can see see this presents a significant issue one that is commonly addressed by simply not permitting programmers to add elements to arrays this explains why arrays and systems programming languages lack methods such as append insert or push as seen in Python JavaScript or other scripting languages apart from the fact that arrays are not objects at low level let's see how systems programming languages

enforce this basic rule when the array is declared we specify both the type of elements it will contain and its length indicating how many of those specific elements the array will contain the reason behind the inability to have different types within the same array lies in the use of this information to determine the required memory space for the array the type specification informs the compiler about the number of bytes each element occupies in memory by multiplying this value by the array's length the precise amount of memory required to store the entire array can be determined

whether the array is stored in the stack or the the Heap this information is essential to know how much to move the stack pointer and allocating a sufficiently large memory space respectively you can check out my two previous videos for more details about this okay here you see a clear distinction between elements but remember this is only an illustration in the bare Hardware things look more like this and the only information available to us is the memory address where the array begins referred to as the Base address this is how indexing Works since we know

the number of bytes each element occupies within the array when we want to access a specific element we can calculate the memory address where that element starts to achieve this we add the product of the element's size and its position referred to as the index to the memory Base address this explains why we utilize zero instead of one for the first element when attempting to access the initial element moving the pointer is unnecessary as far as I know the C programming language introduced this sugar syntax to simplify a process known as pointer arithmetic and the

reason why almost any major programming language out there uses indexing starting with zero is because their syntax is inspired by C pointer arithmetic is a concept very important in C I want to make a video about it in the future but for now here's a very good explanation about it from one of my favorite channels to conclude this section of the video it's important to note that the array itself does not contain information about its size at this level of abstraction an array is simply a block of memory lacking an inherent property such as length

to verify the validity of an index being passed therefore without caution there's a high likelihood of attempting to access an element outside the actual bounds of the array when this happens there are two possible scenarios if the array is allocated at or near the end of the memory Gra granted to our process the operating system will detect attempts to access memory locations beyond our allotted space in such instances the operating system terminates the execution of our process resulting in a segmentation fault error if the array is allocated further away from the memory limit the operating

system will not terminate the execution since we'll be accessing memory within our process's boundaries in this example even though we're accessing values out of the array it doesn't matter if that region is holding other value as everything is ones and zeros at low level the accessed region will be interpreted as if it were of the same type as the elements within the array allowing the program to continue execution you might think the second situation is the best case scenario because our program won't crash but stop the video for a second and think about it is

it preferable for our program to crash or to continue execution under the assumption that the array actually had a Fifth Element I mean just imagine if this array represents purchases made by an individual okay now that we understand why arrays can't grow and why they are limited to storing only one kind of element at a time let's address the non-dynamic size Behavior by utilizing data structures but before that a quick message from today's sponsor if you follow this channel you're probably the kind of person who loves to understand how things work if you've ever wondered

how your favorite developer tools like git redus or Docker work under the hood the best way to gain that knowledge is by recreating them yourself codec Crafters offers challenge-based courses that guide you through crafting these tools step by step if there is a concept that you don't know their platform will provide you with the necessary information to help you continue progressing the best part is you can start the courses by choosing a programming language of your choice so you're not only recreating these tools you're also improving your programming skills with the language you prefer if

you're interested and want to support me you can sign up for code crafters and choose the plan that best suits you using the referral Link in the description below I've been thinking about solving these challenges on stream but I don't know if that's something you guys are interested in I'm looking forward to hearing your thoughts in the comments you can follow me on Twitter for updates on this matter and now back to the video data structures in case you haven't watched my previous video let me quickly explain linked lists to you the simplest method to

obtain a dynamically sized list of elements is by storing them wherever there is available memory space to do this we create a custom data type called a node a node contains one item and one pointer the item represents the element of the list and the pointer directs to the address where the next node containing the next element resides in memory we only need to maintain a reference to the first node typically in the stack when we wish to add a new element to the list we search for an available space big enough for that node

then we update the last node of the list to point to the new node which becomes the last one this approach eliminates the concern about the list of elements growing however linked lists suffer from a significant drawback they are not efficient for caching data to give you some context cache is a small extremely fast memory embedded directly in the CPU its sole purpose is to store a copy of a small region of memory when the CPU requires data fetching it from memory can take a considerable amount of time relative to the CPU speed of course

but if the required value is cached the process of loading data into registers can be orders of magnitude faster when the data required for a CPU operation is found in Cache we refer to it as a cach hit conversely if the data is not in Cache we refer to it as a cach Miss due to its limited size the cache cannot store a copy of the entire memory for Optimal Performance our data should ideally be as compact as possible allowing it to fit into the cach and increasing the chances of cash hits this may sound

like a trivial concept but when our programs are optimized to take advantage of caching their performance can increase tremendously furthermore in modern Computing cash isn't just a feature it's the norm therefore cash misses will be our primary concern for the remainder of this video the problem as linked lists elements are scattered throughout memory they are very unlikely to be cached here arrays are a clear winner as they keep elements compacted in memory but now we are back where we started arrays are too Limited in functionalities here's where array lists come into play as the name

suggests we still use arrays but in a smarter way instead of just holding a reference to an array we create a data type that includes that reference plus additional information about the array when created this data type allocates memory for an empty array the capacity attribute indicates how many elements the array can hold which in this example is five the length attribute indicates how many elements the array contains at any given time initially since the array is allocated memory with no elements its length is zero the idea behind this data type is that it'll handle

the array for us so it comes with some Associated methods as the name suggests the push method accepts a value and adds it to the array every time we add an element to the array the length attribute increases but note that eventually the array will be filled with elements when this happens if we try to add more elements we'll exceed the arrays bounds which is exactly what we're trying to prevent by using this data structure to prevent adding values beyond the array's capacity we need to check whether the array is full before adding a new

element this is as simple as verifying if the length is equal to the capacity if the array is not full we can safely add the value of the new element however if the array is full we need to allocate a larger array to accommodate more elements typically we allocate a new array twice as long as the one that is full during this process we update both the capacity and the reference to the new array this involves copying all the values from the previous array to the new one once the elements are copied to the new

array the previous array is no longer needed so we free its memory finally with free space available in the new array we can add the new element and update the length accordingly and that's it now we have a list of elements whose size can grow dynamically and please don't tell me in the comments that this code can be written in better ways it was written by my cousin who knows more than you now it's important to mention that this process of creating a new larger array when the current one is full involves a heap allocation

which can have performance implications as we discussed in a previous episode additionally copying elements around in memory also incurs additional time if you know the approximate or exact number of elements the array list will contain and you're concerned about performance it might be wise to create the array list with a specific initial capacity this can reduce the need for frequent resizing operations which means fewer allocations which in turn means better performance this is why in languages like rust you usually create a vector the equivalent of an array list using the new method but there's also

the WID capacity method available another characteristic that array lists inherit from arrays is constant time access since elements are internally stored in an array accessing an element at a specific position simply involves retrieving the element from the array by indexing it this operation has a constant time complexity to prevent accessing elements beyond the array boundaries we perform a validation before indexing the array this way if the provided index is greater than or equal to the array length we can return null none or a proper error this effectively solves the problem of accessing outof bound elements

causing undefined behaviors which arrays by themselves cannot avoid though it adds an extra step the time complexity remains constant the reason I'm bringing this up is because in linked lists accessing elements is not that easy unlike arrays linked lists don't provide direct access to Elements by index as elements are scattered in memory there is no formula to determine the addresses where elements reside accessing an element at a specific position in a linked list requires traversing the list from the head until reaching the node in the desired position since each node must be traversed individually the

time complexity for indexing in quotes a linked list is linear the time complexity is not the only performance challenge here even if the desired element is in the cache the other nodes along the path to find that element are often not cached which can significantly slow down the traversing process when is up to remove operations after deleting an element from the array all the subsequent elements must be shifted back one position in the array so keep in mind that copying data in memory can be timec consuming if the removed element is closer to the end

of the array the shifting process is faster since there are fewer elements to shift if the removed element is closer to the beginning of the array there are more elements that need to be shifted which can significantly slow down the process of course this also depends on the number of elements within the array list now linked lists Fanatics would argue that remove operations are faster because everything we must do to remove an element is to update its previous node to point to the following node and then free The Unwanted nodes memory and yes it is

true as no shifting operations are needed the time complexity of this operation is constant but we also should consider that in order to update this node we first need to find its value in memory and as I've mentioned like 20,000 times accessing an element in a linked list is not efficient due to caching but before putting shame on linked lists as we're going to do with JavaScript in a moment consider that contrary to what is usually said linked list could still be a better choice in certain situations and listen correctly please I said it could

not all architectures base their efficiency on caches nor in data shifting instructions I'll read your comments about your opinions on this matter at the end of the day picking the right data structure depends on several factors when you get into the low-level World you'll usually find yourself benchmarking multiple Solutions attempting to ring out every drop of efficiency from the hardware now before getting into the final part you might have noticed that in both cases we created the data structure for a specific specific kind of element here the array list Can Only Hold 16bit signed integers

while the linked list Can Only Hold 8 bit unsigned integers so how can we solve this without having to create a different version for each type we desire well if you are thinking about generic types yes you are right but just to clarify generics don't work the same way in all programming languages in Rust for example when you declare a generic type the compiler identifies each instance where you use that generic generates source code specific to that type and substitutes the generic type with the generated type this process is invisible to us as it's accomplished

at compile time as a result in memory we get data structured exactly as we expected another approach is employed by Java for custom types if the type contained by the array is not primitive in instead of creating an array of values the VM is actually creating an array of pointers pointing to the memory addresses where the elements resides in memory here the language is designed to hide all of those pointers from us so we think we are dealing with arrays of values when in reality we are not if you are thinking this is bad for

caching yes this is one of the reasons why Java is considered a pretty fast language but still slower than systems programming languages with this approach linked lists can get even worse because nodes won't contain the actual element and the pointer to the next node but a pointer to the element and a pointer to the next node a cache nightmare one that hilariously is very common to see in languages like C where beginners are tempted to create something like this due to the lack of generics and this leads us to the final part of this video

how do scripting languages handle arrays all right let's quickly examine both the JavaScript and python approaches the first thing to remember is that scripting languages are typically not compiled into machine code instead they rely on a special program to interpret the script allowing it to mimic or emulate the desired behavior of course to execute that emulation process the real Hardware is used for this reason such programs are often referred to as interpreters or virtual machines when we declare an array behind the scenes The Interpreter generates a specialized data structure to simulate an array of elements

ments to understand what python does we can explore the implementation of the official interpreter which is open source here we encounter a type called py list object this is what The Interpreter generates when we create a python list the documentation comment here states that this type is nothing but an array of object pointers so it's similar to what Java does for non-primitive types an array of memory addresses pointing to the values but there's a key difference here in Java all pointers with within the array point to values of the same type in Python these pointers

point to objects implying they can point to values of different types this explains us being able to create an array and still being able to add multiple kind of elements to it because all of these pointers are of the same size when we index a python list The Interpreter simulates indexing the list by first indexing the array of pointers and then reading the value that the pointer is pointing to and finally let's talk about about JavaScript in simple terms JavaScript arrays aren't actually arrays they're hashmaps JavaScript doesn't concern itself much with arrays in memory instead

when a JavaScript array is declared the engine creates a hashmap where the keys correspond to the indexes you can verify this yourself create a Javascript file declare an empty array and then insert some elements at the position hello as you'll observe the indexing process is essentially syntax sugar for retrieving an element from the hashmap by its key now even though a lot of people claim that I don't like JavaScript I have to admit that this is a very bizarre but clever solution what's particularly fascinating about this is that if we add an element at position

1 million the engine won't need to allocate memory for an array that large instead it simply inserts another element into the table accessible with the key 1 million very cool huh and that about wraps for now if you notice that the anim quality decreases in the JavaScript part it's not a coincidence this video was particularly hard to animate so hit the like button if you learned something and if you want to learn more consider subscribing and following me in other platforms the subscriber count has double since the last video so again thanks a lot for

your support