Hello, This is a 47 part series that tries to give an introduction to algorithms. The content that I am using here to write this series is from MIT 6.006 Introduction to Algorithms, Fall 2011. In this first part of the series, we are going to talk about the way of Algorithmic Thinking using a fairly easy Algorithm called Peak Finding.
Before starting out let’s first define Algorithmic Thinking
According to the professor of MIT 6.006 Introduction to Algorithms Srini Devadas and I quote “Algorithmic Thinking is all about efficient procedures for solving problems on large inputs”
What actually does large inputs mean?
Some fair example of large inputs are
- US — highway system
- Human genome (Which has billions letters in its alphabet)
- Social network (like facebook and twitter)
Here in 21st century definition of large input is in trillions. The World is moving faster than ever, things are getting bigger, we have the computational power that could handle large data (trillions) this does not mean efficiency is the main concern. I agree we can scan billions of element in a matter of second but if you had an algorithm that required cubit complexity suddenly we are not talking about 10 to the power 9 we are talking about 10 to the power 27 and even current computer can’t handle that kind of numbers. So efficiency is a concern as input gets larger it becomes more of a concern.
So in this series we mostly concern about
- Efficient procedures for solving large scale problems and
We are going to tackle above concern using the classic data structure like arrays, linked list, stack and queue along with classic algorithms like Search Algorithms, Sort algorithms, and Tree Algorithms.
This series is not about algorithmic design it’s about algorithmic analysis. We are going to do a lot of analysis and think efficient procedures to solve large-scale problems. We also concern about Scalability because back in the day’s large input was in thousands, today it is in trillions it’s just a matter of time we call 10 to the power 18 fairly a large input. Hence the algorithm we design should be scalable to the growth of the input.
Let’s start with the one dimensional version of peak Finder
Let us consider a number of arrays, we are representing them in symbols ( a — i ), we also assume that all the numbers are positive numbers.
Here position 2 is a peak if and only if b >= a and b >=c.
If you are equal and greater than the elements on left and right side than you are the peak. In case of the edges, you only have to look at only one side. For example, position 9 is a peak if i >= h.
So the problem we solve right now is represented as “Find a peak if exists”. I highly emphasis on the part “if exists”, this is an approach of Algorithmic Thinking. What we are trying to advocate for this problem is that the algorithms we design should be general. We use “if exists” because whenever we want to argue about the correctness of the algorithm we have a proof of concept that we will find or not find the peak from the given set of data. In our case, we will always find a peak but if we change the problem definition we will still have the starting point to go attack the second version of the problem. In this case we have defined that there is greater than and equal to (b >= a and b >=c) we can easily argue that any array will definitely have a peak but let’s tweak this problem a bit and say we only have a greater than, then we can’t for sure say there will be a peak.
Many time you are asked to do something, and you can’t answer the question or find something that satisfies all the constraints required. And in that case, you want to be able to give an argument that you searched hard but could not find it. Otherwise, there is always a case that you didn’t search hard enough. You searched hard and could not find the answer is the proof of concept that the solution might not be available.
Now let’s look at a Straightforward Algorithm
Let us assume that the peak is in the middle, the numbers start increasing from left up to the middle and start decreasing. Here the algorithm will have to look at n/2 elements to find a peak.
Let us again assume that the peak is all the way to the right, so you start searching peak from the left all the way to the right, you will be looking at n elements to find a peak.
So in the worst case scenario, the complexity will be Θ(n), i.e it has to look at all the elements in the array. So what we are really saying here is that the asymptotic complexity of the algorithm is linear.
Divide and Conquer
This is a divide and conquer algorithm. We are mostly going to look at the n/2 position.
If [n/2] < [n/2–1] then only look at left half from 1 to [n/2–1] to look for a peak
Else if [n/2] < [n/2+1] then only look at right half from [n/2+1] to n
Else n/2 is the peak.
Given the problem, we agree that this algorithm is correct and finds a peak. Given the fact that we agreed on the correctness of the algorithm now let us talk about the complexity of the algorithm.
First, let’s define a recurrence relation in terms of T(n) to this recursive algorithm Divide and Conquer
T(n) = T(n/2) + Θ(1)
Standing on the base of computational standpoint this algorithm does T(n) amount of work on the input size of n.
Here on the equation Theta 1 corresponds to the two comparisons we have to do since 2 is constant we represent it as Θ(1)
So we take the above equation and expand it eventually we will get to the best case which is T(1) = Θ(1).
We will reach to the array with a single value, for this array will return the value as a peak.
So we could also write the equation as
T(n) = Θ(1) + …… + Θ(1) [This is a expanded form of the above equation]
We gonna expand it log n times. So the complexity of the algorithm is Θ(log n)
So if you compare divide and conquer with straightforward algorithm there is an exponential difference in terms of complexity. Divide and Conquer is way faster than the straightforward algorithm. So we can conclude that it is always better to reduce complexity as the input gets large.
Now let’s look at the two dimensional version of peak finder
As we can guess a is a 2D peak if and only if
a >= b, a >= c, a >= d and a >= e
So we have again used greater than and equal to here as well so it’s similar to that of one dimensional that the peak will exist.
In this version also let’s start with a Straightforward algorithm called Greedy Ascent Algorithm
In Greedy Ascent Algorithm, we have to make a choice from where to start.
So if we say we want to start with 12, we are going to look for something to left. And if it’s greater than, we’re going to follow that direction. If it’s not, then you’re going the other direction. So, in this case, we will go to 12, 13, 14, 15, 16, 17,19, and 20. And we will find a peak. In this algorithm, if we try to find a peak we might have to touch the half part of the elements or even worse all the parts of the elements in a matrix. So if we try to do the worst case analysis of the algorithm we will find that it would be Θ(nm) where n is the number of rows and m be the number of columns. In the case where n = m the worst case complexity would be Θ(n²).
Now let’s try to improve the complexity by Extending 1D Divide and Conquer to 2D.
Attempt # 1
Let’s pick middle column j = m/2 and find a 1D peak at (i, j). Use (i, j) as a start point on row i to find 1D-peak on row i. I am really happy that we reduced the complexity to Θ(log n) as the complexity to find a peak in the 1D array is Θ(log n). But the problem is that this algorithm is efficient but not correct. As of old saying goes by
“It is better to have an algorithm that is inefficient but correct rather have efficient incorrect algorithm”.
So what’s the problem with this algorithm?
The problem is 2D peak my not exist in row i.
Let’s look at the example:
Let’s choose the 3rd column from left as a middle. We start finding a peak and returned 12 as a peak, it’s quite possible to return 12 as a peak even though 19 is the actual peak because the value that surrounds 12 are less than 12. So I choose 12 as a pick and start finding peak on a row where 12 is located. Looking at the row the peak is at 14. And the algorithm will return 14 as a peak of the matrix. It’s true that 14 is a peak in a 1D case but looking from the perspective of a 2D 14 is not a peak which means the algorithm is incorrect. This looks like an efficient algorithm but does not work.
So the last algorithm that will solve this problem is:
Attempt # 2
- Pick middle column j = m/2
- Find global maximum on column j at (i, j)
- Compare (i, j − 1),(i, j),(i, j + 1)
- Pick left columns if(i, j − 1) > (i, j)
- Similarly for right if (i, j) < (i, j + 1)
- (i, j) is a 2D-peak if neither condition holds
- Solve the new problem with half the number of columns.
- When you have a single column, find global maximum and you‘re done
Example of Attempt # 2
So the recurrence relation in terms of T(n,m) to this recursive algorithm is
T(n, m) = T(n, m/2) + Θ(n)
Why is this the equation because n is the number of rows and m is the number of columns, In one case we will be breaking things down into half number of columns which is m/2 and In order to find the global maximum we will be doing Θ(n) work.
So we take the above equation and expand it eventually we will get to the best case which is
T(n,1) = Θ(n).
So we could also write the equation as
T(n, m) = Θ(n) + …… + Θ(n) [This is a expanded form of the above equation]
We gonna expand it log m times. So the complexity of the algorithm is Θ(n log m)
Well, this was quite a long blog. Hope you got what I meant in this blog. If you want the reference from where I took content to write this blog then the reference has been listed below