Prefix Sums - Problems, Code in C & Python

47.2k views3367 WordsCopy TextShare

Errichto Algorithms

Prefix sums are the sums of the first K elements in an array. You can use them to quickly get the ra...

Video Transcript:

let's talk about perfect sums including C plus and python implementation first here's an exercise for you if I tell you that this long sum is equal to 50. and then I erase the first two elements three and five then what is the sum of remaining elements to be able to quickly compute that it's 50 minus 8 the sum of the first two so 42. and that's kind of the idea for what we will do today with prefix sums now we need some definitions prefix of a string or a sequence is a range that starts at the beginning with index 0.

similarly suffix would be something that ends at the end at index n minus one but will work with prefixes mainly prefix sum is the sum of elements inside and then prefix of length K let's say is the first K element that's it how can we compute prefix sums and store it we compute it anyway when we want to compute the sum of N elements if I go for this array from left to right and I add up elements while writing down the partial results 3 plus 2 is 5 then it's 6 9 10 13 and 16. 16 is the sum of all the N elements then those are prefix sums for example 9 is the sum of the first four elements so we would say that prefix of 4 a perfect sum of 4 is equal to 9. for convenience in code later on we often add here we insert the extra zero at the beginning we say that this is sum of an empty prefix prefix of length zero with that answering queries gets easier so let's say this is a zero here but we need to be careful with indices so we have our definition pref of k is the sum of first K elements if you forget any kind of formulas you still need to remember this and just from this you should be able to derive everything else we have array a of size 7 here index from 0 to 6 and we have array prefix that goes from zero where prefix of 0 is 0 is just the sum of first zero elements and then prefix of one will be the sum of first one elements so just a of zero and so on eventually at the end we have prefix of seven and it's the sum of seven elements now what's the formula here well we have six six prefer three it's the sum of first three elements and then I want here as pref of 4 the sum of four elements so including the fourth one so I need to add six plus three and it will be nine then ten Fourteen and so on the formula in your code will be this pref of 4 was perform three plus a of three generally that or depending on whether you start your for loop from zero or one maybe you will use the equivalent formula here at the bottom once we compute this prefix sum array in yellow I can easily get the sum of values in a range of indices just using a single subtraction if I want to know the sum of values in the range from index 2 to index 5 this is the sum of the first six values so from zero to five minus the sum of the first two values like we did at the beginning of the video so pref of 6 minus pref of 2.

in our prefix sum RH 14-5 which is 9 and you can verify that it's 9 indeed more generally if we have a range from L to R then we take the first R plus one element so from 0 to R and we subtract the first L element from 0 to L minus 1. so we take pref of R plus 1 minus pref of L it might be confusing this shift of indices but thanks to that we never go out of bounds otherwise without the extra zero at the beginning representing an empty prefix you would here in the formula get pref of L minus 1 and you would sometimes go out of bounds if L is equal to zero so here's the second formula we need in our code the first one was here and you can combine that to get a working solution the most basic problem is just this you're given an array of size n and q queries in every query get the sum of values in range of indices from A to B The Brute Force is on the right for every query you can create a variable and add up all the elements in this range by using a for Loop the time complexity of such brute force is for each query you go through all elements from A to B in the worst case there are n elements so the time complexity is quadratic of Q times n we can do much better with perfect sums here's C plus plus and python code they are the same so we can read whichever you prefer in C plus plus in C plus plus we create an array of size n plus 1 from 0 to n we then you go with a for Loop and apply the formula matching the definition from before about pref of K for example pref of 2 now is the sum of the first two elements here diary was named X so X of 0 plus X1 you can if you need to debug something just print the array and verify that this is correct then for every query we need to print the difference between two perfect sums and in this task in from cscs the indices and B where one based they went from 1 to n instead of zero to n minus one so you need to either subtract one so a minus minus B minus minus or you need to fix the formula now if for example you want the sum from indices three to five one based then it's the sum of the first five elements minus the sum of first two elements so pref of B minus pref of a minus 1. in Python on the right it's the same this formula for creating prefix sums changes a little bit because we use a for loop from 0 instead of from one you can read both codes and compare it should be still equivalent one thing you need to be careful about in C plus plus is the type if you add up a lot of numbers you need long long so here you don't need int you cannot use int you need to use long look the time complexity of both codes is O of n for reading the input first the array and creating the prefix sum array this is also space complexity Plus for every query you just print a single difference so it's constant per query the total time complexity is linear of n plus Q let's see a more difficult task we are given a string and Q queries given a range of indices from L to R find the most common character in The substring from L to R for example if we ask about this substring from index free to something then I see that character a appears three times B appears once C appears two times so apparently the answer is character a letter a is the most frequent let's say that the alphabet just consists of lowercase English characters and if there is a tie we will print the letter that is first in the alphabet or something it doesn't matter too much this task doesn't have much to do with prefix sums it seems because we don't want to get sum of a range The Brute Force will again just iterate this range from L to R to compute frequencies but let's think about those frequencies we need to get those values and then choose the maximum one of them of those up to 26 values this is the size of the English alphabet and can we do that so can we say that for example for character a this range and its frequency free is the difference between longer prefix and shorter prefix because that was a solution for the previous task then what is the sum for character a for this longer prefix well it's the frequency that's what we need a appears here four times if I'm not mistaken so far in this shorter prefix it appears once and the difference of that for -1 is 3 that's what we need for every character then for every letter of the alphabet and there are 26 of them we need to compute perfect sums using the frequency of those characters we need to know how many times they appear in every prefix then for a query of indices from L to R for every letter we can very quickly get the frequency we don't need to iterate everything we still have a for Loop for 26 characters we cannot avoid that at least I don't know how to do it and the query will be answered fast in 26 operations not really operations in a for Loop of size 26.

let's see the codes I would say that python code here is simpler so let's start with that those two codes are different this time because there are many different ways to implement this task it's educational if you try to understand both but if not that's fine at least one of them and here in Python I'm using a dictionary please note that dictionaries are slow and if possible you should use arrays or python lists here just to demonstrate the most just to show an easy to understand code I used dictionaries the dictionary for every letter of the alphabet A through Z will give me a prefix sum array for example pref of a will be that first prefix sum and if the second dimension is 5 then it's number of occurrences of a in the first five letters it's good to write down such a definition because then if you have doubts about indices you can see above at your comment with the definition and then you will adjust the indices in your for Loops in your formulas for every character a for Z create an array of zeros where we will apply prefix sums put one whenever there is a character because frequency of that character is one there and then apply prefix sums from left to right you don't need to create a separate separate array you can just do this it's important to understand those two lines line 13 and 14 in Python code and they show that you don't need to create always a separate array to run prefix sums you can apply prefix sums from left to right on one array always propagating value to the right or if TMP of I is already the sum up to index I then I plus 1 should just get that adds that to itself and then we put this whole prefix some array in the dictionary to answer queries we read the range LR and we try every letter of the alphabet always frequency is the difference of two prefix sums if this frequency is bigger than the maximum so far we overwrite eventually we'll print the answer C plus plus code is a little bit different for efficiency for a better constant factor and speed I'm using here a two-dimensional array in line eight it's a vector here actually n plus 1 by 26 so also dimensions are swapped compared to python code just to show you that you're not forced into one particular solution here's a definition pref of 5 will mean up to index four so the first five characters and zero is the letter of the alphabet zero is a b is 1 Z is 25. we go if we iterate and this is from 0 to n minus one of the string and always if you knew that 26 frequencies for the previous index you copy that and then you get the value of a letter here like a is value zero and you increase this frequency by one that's a way to go from left to right and to compute frequencies of everything while again storing them you have space complexity 26 multiplied by n sometimes it can be an issue the same was in Python then for queries we read Ln R in link21 we try all the 26 characters I'm using here arrays not characters anymore so it's just a for Loop of integers from 0 to 25 their frequency is difference of Two Perfect sums and if it's better than maximum so far then overwrite time complexity in both cases is for pre-processing for this first part of computing prefix sums 26 multiplied by n it's also the space complexity then for every query we have 26 operations or a for Loop of size 26 usually we don't put such constants in time complexity but this is significant we shouldn't really drop it and maybe it's better to put a variable here for the alphabet size let's say that a equals 26 is the alphabet size in some other version of this problem alphabet size could be bigger not just 26 characters of the lowercase English characters in the alphabet maybe you would have all the ASCII values then it would be 256 or 128 maybe it could be numbers from 1 to a thousand and then for every number from one to thousand you would get a separate prefix sum still I would call this a linear time complexity kind of n plus Q so much better than the quadratic time complexity of a Brute Force please note that prefixes will work well with for example summing up a range or Computing the xor because you can get xor of this prefix and subtractor here again xor shorter range but it doesn't work with all the operations for example maximum doesn't really work with that if I tell you that maximum of this longer range is 20 and maximum of this shorter range let's say is also 20 that's that ugly case then there is no way of knowing what is the maximum in this range for example maybe the values are 27 6 7 4. then the prefix maximum array doesn't even contain the maximum for this shorter range so prefix of R of l l minus 1 or anything like that is not useful a prefix maximum re might help in solving some problems but certainly it's not a problem about Computing maximum in a range from L to R here's the last problem I will discuss we're given an array a of size n and q updates Updates this time not queries so we will modify the array an update consists of a range of indices LR and the value X you need to increase all elements at indices in this range by plus X print the final sequence you don't need to print anything between the updates length of the sequence and the number of queries are both up to hundred thousand links to all those problems are in the description if you want to try them yourself as always we are not happy with a quadratic Brute Force and I will show you how we can add 5 to everything in this range basically by doing two operations it's treating a little bit but let's try I will put plus 5 here at this starting index l and at the end before printing the final sequence we'll apply prefix sums like in the previous problem where where we propagate this value to the right we make a for Loop and we do pref of I plus equal pref of I minus 1.

this plus 5 will probably propagate everywhere to the end and that's not what we wanted I wanted to add plus 5 up to here up to index r how can we cancel that how can we negate the plus 5 in this suffix in the last few elements the elements after index are we can subtract 5 here when there is an update with lrx we need to add X at index l subtract X at index R plus 1 and that's it for now we do it for all the updates and at the end just once we run prefix sums from left to right from index 0 to index n minus 1 this time do we then maintain prefix sums or something like that not really we just at the end before printing the sequence we apply the part of prefix sum algorithm I would say here is a pseudo code and we create an array of zeros because it's very important you don't want to run prefix sums on this array it doesn't make any sense the array should stay we put it on the side we separately create an array of zeros for every query we do those two changes increase and decrease then we go from left to right and we run the prefix sums from left to right so we propagate all the values to the right if we added five somewhere it will automatically be added to everything to the right and then for every index I we print the initial value from that index plus whatever we computed prefof I is the total contribution of updates to this index and I'm saying here increase decrease actually X could be negative it doesn't matter that this value is positive everything will work it's just that maybe we'll here do -20 and plus 20.