How to solve "Array Manipulation" on HackerRank
The Array Manipulation challenge, as presented, is fairly simple: for a set of ranges and values, add each value to each element of an array that falls within its given range. The difficulty comes from the scale: if there are ranges that each span elements, the simplest solution requires 1,000,000,000,000 (that’s one quadrillion) additions and loop iterations. And that number of operations just will not complete in the 2 second time limit for C++ submissions.
So, what’s the solution? Well, fortunately, there are at least two, and they’re fairly simple.
The binary tree approach
My preferred approach is to break the result array down into a binary tree of sub-ranges. For each range presented to us in our input, we add the given value to the smallest number of sub-ranges that completely cover the given range. Once this is done, we can efficiently calculate the value of any element of the final array by walking the tree from root to leaf and keeping a sum of the values of each node we pass along the way. Since we’re interested in the maximum value of any element here, we’ll recursively walk the entire tree and keep track of the maximum value we calculate for any leaf.
To get an understanding of the tree model, let’s look at some example operations. Say we’re given an 8 element array to work with. The initial tree looks like this, with the plain nodes at the bottom of the tree representing the sum of all of the ancestors for the leaf node it connects to:
Now, let’s add our first range. If we add 13 to the elements in the range , we end up with a tree that looks like this:
The range can be completely covered by the three sub-ranges , , and , so we add 13 to the value for each of those nodes. None of their parents or children are modified. As you can see, we only needed to perform 3 additions to cover a range of 7 nodes. The worst case insertion complexity will generally be with respect to the width of the range.
For our second operation, let’s add 19 to the elements in the range :
This range is covered by the sub-ranges , , and . The first of those has not been modified yet, so its value becomes 19. The latter two were touched by the previous operation, so their values become 32. The totals are updated accordingly, and we can see that the maximum value for any leaf node is 32.
So, with that understanding, let’s convert it from theory to code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
|
The prefix sum array approach
The above approach allows us to efficiently find the value for any index in the final array after we’ve completed the initial calculations. That feature is often quite useful in the real world, but not technically necessary for this problem. There is a simpler alternate approach that lacks efficient random access that we could use here instead: the prefix sum array.
A prefix sum array, or partial sum array, is a transformation of an array where each element stores the sum of the element at the same index in the original array and all of the elements to its left:
1 2 3 4 5 |
|
After which prefixSum
contains { 1, 3, 6, 10, 15 }
.
Or, mathematically, given a value array , its prefix array is defined as:
or, equivalently:
So, how is this useful in our challenge? Well, it turns out that the final result
array of all of our range additions is essentially a prefix sum array. For
example, let’s say we want to add 3 to all elements in the range .
We can add 3 to element 1 in the values
array and subtract it after element
3:
1 2 3 4 5 6 7 8 |
|
After which prefixSum
contains { 0, 3, 3, 3, 0 }
.
The advantage of this approach is that we only need to perform two operations for each ranged addition when we build the values array, and then one addition for each element in the array to calculate the result. That gives us a total complexity of for ranged additions in an element array, as opposed to a worst case complexity of if we added the value to each element in the range for each query.
The final code here is markedly simpler but, again, at the expense of much less efficient random access to the result:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|