The following sorting algorithms are covered in this post: Selection Sort, Bubble Sort, Insertion Sort, Merge Sort, Heap Sort, Quick Sort, and Counting Sort.
Selection Sort
Reference: https://www.geeksforgeeks.org/selection-sort/
The Selection Sort algorithm sorts an array by repeatedly finding the minimum element from the unsorted part and putting it at the beginning.
The logic of the algorithm is to maintain two subarrays, a sorted one and an unsorted one, within the given array.
In every iteration of Selection Sort, the minimum element of the unsorted subarray is picked and moved to the end of the sorted subarray (by swapping).
Steps:
- Selection
- Swapping
- Counter Shift
Example:
11, 25, 12, 22, 64
11, 12, 25, 22, 64
11, 12, 22, 25, 64
11, 12, 22, 25, 64
The good thing about selection sort is that it never makes more than O(n) swaps, so it can be useful when memory writes are a costly operation.
The default implementation is not stable. However, it can be made stable.
Code Snippet
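The original snippets did not survive extraction. Below is a minimal C++ sketch (the function and variable names are illustrative), using the example array from above:

#include <algorithm>
#include <iostream>

// Repeatedly select the minimum of the unsorted suffix and swap it to the
// front, growing the sorted prefix by one element per pass.
void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;                      // index of the smallest element seen so far
        for (int j = i + 1; j < n; j++)
            if (arr[j] < arr[minIdx])
                minIdx = j;
        std::swap(arr[i], arr[minIdx]);      // move it to the end of the sorted prefix
    }
}

int main() {
    int arr[] = {11, 25, 12, 22, 64};
    selectionSort(arr, 5);
    for (int x : arr) std::cout << x << ' '; // 11 12 22 25 64
}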
Properties
Time Complexity: O(n^2), as there are two nested loops
Auxiliary Space: O(1)
Sorting in Place: Yes
Stable: No. The default implementation is not stable. However, it can be made stable. Please see stable selection sort for details.
Bubble Sort
Reference: https://www.geeksforgeeks.org/bubble-sort/
Bubble sort is the simplest sorting algorithm. It works by repeatedly swapping adjacent elements if they are in the wrong order.
The algorithm always needs one more whole pass without any swap to know that the array is sorted and it is done.
Due to its simplicity, bubble sort is often used to introduce the concept of a sorting algorithm. In computer graphics, it is popular for detecting a very small error (like a swap of just two elements) in almost-sorted arrays and fixing it with just linear complexity (2n). For example, it is used in a polygon filling algorithm, where bounding lines are sorted by their x coordinate at a specific scan line (a line parallel to the x axis) and with incrementing y. Their order changes (two elements are swapped) only at intersections of two lines.
Code Snippet
Intuitive Implementation
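The original snippet is missing; a minimal C++ sketch of the straightforward version:

#include <utility>

// Repeatedly swap adjacent out-of-order elements; after pass i the largest
// i+1 elements have "bubbled" to the end of the array.
void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - i - 1; j++)
            if (arr[j] > arr[j + 1])
                std::swap(arr[j], arr[j + 1]);
}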
Optimized Implementation
The implementation above always runs in O(n^2) time even if the array is already sorted. It can be optimized by stopping the algorithm if the inner loop did not cause any swap.
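A sketch of that optimization, assuming the same array-plus-length interface as above; the swapped flag is the only addition:

#include <utility>

// Same as above, but stop early: if a full inner pass makes no swap,
// the array is already sorted.
void bubbleSortOptimized(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                std::swap(arr[j], arr[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;   // no swaps in this pass: already sorted
    }
}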
Complexity
Time Complexity
- Worst and Average Case: O(n^2). The worst case occurs when the array is reverse sorted.
- Best Case: O(n). The best case occurs when the array is already sorted.
Auxiliary Space: O(1)
Sorting in Place: Yes
Stable: Yes
Boundary Cases
Bubble sort takes minimum time (Order of n) when elements are already sorted.
Insertion Sort
Reference: https://www.geeksforgeeks.org/insertion-sort/
Insertion sort is a simple sorting algorithm that works the way we sort playing cards in our hands.
Example:
12, 11, 13, 5, 6
11, 12, 13, 5, 6
11, 12, 13, 5, 6
5, 11, 12, 13, 6
5, 6, 11, 12, 13
In each iteration, Insertion Sort performs two operations:
- Scan backwards through the already-sorted part of the list, comparing the current element with each element.
- Swap (shift) the elements that are out of order until the current element reaches its correct position.
Insertion sort is used when the number of elements is small. It can also be useful when the input array is almost sorted and only a few elements are misplaced in a big array.
Binary Insertion Sort
Binary search can be used to reduce the number of comparisons in normal insertion sort. Binary insertion sort uses binary search to find the proper location to insert the selected item at each iteration.
In normal insertion sort, the i-th iteration takes O(i) comparisons in the worst case. We can reduce this to O(log i) comparisons by using binary search. The algorithm as a whole, however, still has a worst-case running time of O(n^2) because of the series of swaps required for each insertion.
In other words, applying Binary Search to calculate the position of the data to be inserted doesn’t reduce the time complexity of insertion sort. This is because inserting a data item at the appropriate position involves two steps:
- Calculate the position.
- Shift the data from the position calculated in step 1 to create a gap where the data will be inserted.
Using Binary Search reduces the time complexity of step 1 from O(n) to O(log n). But the time complexity of step 2 still remains O(n). So, the overall complexity remains O(n^2).
More specifically, for each insertion, we use Binary Search for comparisons and finding the position, which takes O(log n). The number of movements is O(n). Hence, the complexity is O(n + log n) = O(n). On top of that, we will have to do this n times for n insertions. Therefore, the overall complexity remains O(n^2).
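To make this concrete, here is a hedged C++ sketch of binary insertion sort; std::upper_bound is used for the binary search so the sort stays stable:

#include <algorithm>

// Binary insertion sort: binary search finds the insertion point in
// O(log i), but shifting the larger elements right is still O(i).
void binaryInsertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        // first position in arr[0..i) whose element is > key (keeps the sort stable)
        int pos = std::upper_bound(arr, arr + i, key) - arr;
        for (int j = i; j > pos; j--)   // shift right to open a gap at pos
            arr[j] = arr[j - 1];
        arr[pos] = key;
    }
}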
Code Snippet
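The original snippets are missing; a minimal C++ sketch of the standard shifting-based insertion sort:

// Standard insertion sort: take arr[i] and shift the larger elements of the
// sorted prefix arr[0..i-1] one step right until the slot for arr[i] is found.
void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];   // shift the larger element right
            j--;
        }
        arr[j + 1] = key;          // insert into the gap
    }
}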
Properties
Insertion sort runs in O(n) time in its best case and in O(n^2) in its worst and average cases.
Time Complexity:
- Best Case: O(n)
- Worst Case: O(n^2)
Auxiliary Space: O(1)
Algorithmic Paradigm: Incremental Approach
Sorting in Place: Yes
Stable: Yes
Boundary Case
Insertion sort takes maximum time to sort if elements are sorted in reverse order. And it takes minimum time (Order of n) when elements are already sorted.
Merge Sort
Reference: https://www.geeksforgeeks.org/merge-sort/
Similar to Quick Sort, Merge Sort is a Divide and Conquer algorithm. It divides the input array into two halves, calls itself for the two halves, and then merges the two sorted halves. The merge() function is used for merging the two halves: merge(arr, l, m, r) is the key process that assumes arr[l..m] and arr[m+1..r] are sorted and merges the two sorted subarrays into one.
Merge Sort is useful for sorting linked lists in O(n log n) time. In the case of linked lists, the situation is different, mainly due to the difference in memory allocation of arrays and linked lists. Unlike arrays, linked list nodes may not be adjacent in memory. Also, unlike arrays, in a linked list we can insert items in the middle in O(1) extra space and O(1) time. Therefore, the merge operation of merge sort can be implemented without extra space for linked lists.
In arrays, we can do random access because elements are contiguous in memory. Unlike arrays, we cannot do random access in a linked list. Quick Sort requires a lot of this kind of access: to access the i-th index in a linked list, we have to traverse each and every node from the head to the i-th node, as we don’t have a contiguous block of memory. Therefore, the overhead increases for Quick Sort. Merge Sort accesses data sequentially and its need for random access is low.
Logic of Merge Sort
merge(arr[], l, m, r)
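The rest of the logic block was lost. A minimal C++ rendering of the recursive driver, assuming a merge() with the signature above:

void merge(int arr[], int l, int m, int r);  // merges sorted runs arr[l..m] and arr[m+1..r]

// Recursively split arr[l..r], then merge the two sorted halves.
void mergeSort(int arr[], int l, int r) {
    if (l >= r) return;               // zero or one element: already sorted
    int m = l + (r - l) / 2;          // middle point, written to avoid overflow of (l + r)
    mergeSort(arr, l, m);             // sort left half arr[l..m]
    mergeSort(arr, m + 1, r);         // sort right half arr[m+1..r]
    merge(arr, l, m, r);              // merge the two sorted halves
}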
Code Snippet
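The original snippet is missing; a sketch of the merge() step itself, using the O(n) temporary storage noted under Properties below:

#include <vector>

// Merge the sorted runs arr[l..m] and arr[m+1..r] into one sorted run.
void merge(int arr[], int l, int m, int r) {
    std::vector<int> left(arr + l, arr + m + 1);        // copy of arr[l..m]
    std::vector<int> right(arr + m + 1, arr + r + 1);   // copy of arr[m+1..r]
    int i = 0, j = 0, k = l;
    while (i < (int)left.size() && j < (int)right.size())
        arr[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];  // <= keeps it stable
    while (i < (int)left.size())  arr[k++] = left[i++];  // copy any leftovers
    while (j < (int)right.size()) arr[k++] = right[j++];
}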
Properties
Time Complexity: O(n log n) in all 3 cases (worst, average, and best)
Auxiliary Space: O(n)
Algorithm Paradigm: Divide and Conquer
Sorting in Place: No in a typical implementation
Stable: Yes
Merge Sort for Linked List
Merge Sort is often preferred for sorting a linked list. The slow random access performance of a linked list makes some other algorithms, such as Quick Sort, perform poorly, and others, such as Heap Sort, completely impossible.
Logic of Merge Sort for Linked List
mergeSort(headRef): split the list into two halves, recursively sort each half, and then merge the two sorted halves, updating the head reference.
Code Snippet
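The original snippet began with the linked-list node definition; below is a minimal C++ sketch of the whole routine. The helper names frontBackSplit and sortedMerge are illustrative, not from the original:

struct Node { int data; Node* next; };   // singly linked list node

// Split the list into two halves using the slow/fast pointer technique.
void frontBackSplit(Node* head, Node** front, Node** back) {
    Node* slow = head;
    Node* fast = head->next;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
    }
    *front = head;
    *back = slow->next;
    slow->next = nullptr;                 // cut the list after the middle
}

// Merge two sorted lists in O(1) extra space by relinking the nodes.
Node* sortedMerge(Node* a, Node* b) {
    if (a == nullptr) return b;
    if (b == nullptr) return a;
    if (a->data <= b->data) { a->next = sortedMerge(a->next, b); return a; }
    b->next = sortedMerge(a, b->next);
    return b;
}

// Recursive driver: split, sort both halves, merge.
void mergeSort(Node** headRef) {
    Node* head = *headRef;
    if (head == nullptr || head->next == nullptr) return;   // 0 or 1 node: sorted
    Node *a, *b;
    frontBackSplit(head, &a, &b);
    mergeSort(&a);
    mergeSort(&b);
    *headRef = sortedMerge(a, b);
}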
Heap Sort
Reference: https://www.geeksforgeeks.org/heap-sort/
Heap Sort is a comparison-based sorting technique based on the Binary Heap data structure. It is similar to Selection Sort: we first find the maximum element and place it at the end, then repeat the same process for the remaining elements.
What is Binary Heap
A complete Binary Tree is a Binary Tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
A Binary Heap is a Complete Binary Tree where items are stored in a special order, such that the value in a parent node is greater (or smaller) than the values in its two children nodes. The former is called a max heap and the latter a min heap. The heap can be represented by a Binary Tree or an array.
Why array based representation for Binary Heap
Since a Binary Heap is a Complete Binary Tree, it can be easily represented as an array, and the array-based representation is space-efficient. If the parent node is stored at index i, the left child can be found at index 2 * i + 1 and the right child at 2 * i + 2, assuming the indexing starts at 0.
Heap Sort Algorithm for Sorting in Increasing Order
- Build a max heap from the input data.
- At this point, the largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1. Finally, heapify the root of tree.
- Repeat above steps while size of heap is greater than 1.
How to Build the Heap
Heapify procedure can be applied to a node only if its children nodes are heapified. So, the heapification must be performed in the bottom up order.
Applications of Heap Sort
The Heap Sort algorithm has limited uses because Quick Sort and Merge Sort are better in practice. Nevertheless, the Heap data structure itself is widely used.
Code Snippet
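The original snippet is missing; a minimal C++ sketch with the two phases described above (build the max heap bottom-up, then repeatedly move the root to the end and heapify):

#include <algorithm>

// Sift down the element at index i so the subtree rooted there satisfies
// the max-heap property; n is the current heap size.
void heapify(int arr[], int n, int i) {
    int largest = i;
    int left = 2 * i + 1;      // children of i in the array representation
    int right = 2 * i + 2;
    if (left < n && arr[left] > arr[largest])   largest = left;
    if (right < n && arr[right] > arr[largest]) largest = right;
    if (largest != i) {
        std::swap(arr[i], arr[largest]);
        heapify(arr, n, largest);   // continue sifting down
    }
}

void heapSort(int arr[], int n) {
    // Build a max heap bottom-up, starting from the last non-leaf node.
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    // Repeatedly move the current maximum (the root) to the end and shrink the heap.
    for (int i = n - 1; i > 0; i--) {
        std::swap(arr[0], arr[i]);
        heapify(arr, i, 0);
    }
}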
Properties
Time Complexity: O(n log n)
Time complexity of heapify is O(log n).
Time complexity of createAndBuildHeap() is O(n).
Heap Sort is an in-place algorithm
Its typical implementation is not stable, but can be made stable.
Quick Sort
Reference: https://www.geeksforgeeks.org/quick-sort/
Similar to Merge Sort, Quick Sort is a Divide and Conquer algorithm. It picks an element as the pivot and partitions the given array around the picked pivot. There are many different versions of Quick Sort that pick the pivot in different ways.
- Always pick the first element as pivot.
- Always pick the last element as pivot (implemented below).
- Pick a random element as pivot.
- Pick median as pivot.
The key process in Quick Sort is partition(). The target of partition is, given an array and an element x of the array as the pivot, to put x at its correct position in the sorted array, put all smaller elements (smaller than x) before x, and put all greater elements (greater than x) after x. All of this should be done in linear time.
Pseudo Code for Recursive quickSort Function
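The original block was truncated to its first comment line; here it is rendered as C++ for concreteness, assuming a partition() as described in the next subsection:

int partition(int arr[], int low, int high);  // returns the final index of the pivot

// low -> starting index, high -> ending index
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);  // pi: partitioning index, arr[pi] is now in place
        quickSort(arr, low, pi - 1);         // sort the elements before the pivot
        quickSort(arr, pi + 1, high);        // sort the elements after the pivot
    }
}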
Partition Algorithm
There can be many ways to do partition; the following code adopts the method given in the CLRS book. The logic is simple: we start from the leftmost element and keep track of the index of smaller (or equal) elements as i. While traversing, if we find a smaller element, we swap the current element with arr[i]; otherwise, we ignore the current element.
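The original block was truncated; a sketch of the Lomuto-style partition described above (last element as pivot):

#include <algorithm>

// Lomuto partition: take arr[high] as the pivot, keep i as the boundary of the
// "smaller or equal" region, and finally place the pivot right after it.
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;                      // last index of the <= pivot region
    for (int j = low; j < high; j++) {
        if (arr[j] <= pivot) {
            i++;
            std::swap(arr[i], arr[j]);    // grow the <= pivot region
        }
    }
    std::swap(arr[i + 1], arr[high]);     // put the pivot in its final position
    return i + 1;
}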
When does the Worst Case of Quick Sort Occur
The answer depends on the strategy for choosing the pivot. In early versions of Quick Sort where the leftmost (or rightmost) element is chosen as the pivot, the worst case occurs in the following cases:
- Array is already sorted in the same order.
- Array is already sorted in reverse order.
- All elements are the same (a special case of cases 1 and 2).
Since these cases are very common use cases, the problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition, or (especially for longer partitions) choosing the median of the first, middle, and last element of the partition as the pivot. With these modifications, the worst case of Quick Sort is less likely to occur, but it can still occur if the input array is such that the maximum (or minimum) element is always chosen as the pivot.
Why QuickSort is Preferred over MergeSort for Sorting Arrays
Quick Sort in its general form is an in-place sort (i.e. it doesn’t require any extra storage), whereas Merge Sort requires O(n) extra storage, n denoting the array size, which may be quite expensive.
Allocating and de-allocating the extra space used by Merge Sort increases the running time of the algorithm. Comparing average complexity, we find that both types of sorts have O(n log n) average complexity, but the constants differ. For arrays, Merge Sort loses due to the use of extra O(n) storage space.
Most practical implementations of Quick Sort use the randomized version, which has an expected time complexity of O(n log n). The worst case is still possible in the randomized version, but it doesn’t occur for a particular pattern (like a sorted array), and randomized Quick Sort works well in practice.
Quick Sort is also a cache friendly sorting algorithm as it has good locality of reference when used for arrays.
Quick Sort is also tail recursive, so tail call optimization can be applied.
Why MergeSort is Preferred over QuickSort for Linked Lists
In the case of linked lists, the situation is different, mainly due to the difference in memory allocation of arrays and linked lists. Unlike arrays, linked list nodes may not be adjacent in memory. Also, unlike arrays, in a linked list we can insert an item in the middle in O(1) time and O(1) extra space. Therefore, the merge operation of Merge Sort can be implemented without extra space for linked lists.
In arrays, we can do random access because elements are contiguous in memory. Say we have an integer (4-byte) array A and let the address of A[0] be x; then, to access A[i], we can directly access the memory at (x + i*4). Unlike arrays, we cannot do random access in a linked list. Quick Sort requires a lot of this kind of access. In a linked list, to access the i-th index, we have to traverse each and every node from the head to the i-th node, as we don’t have a contiguous block of memory. Therefore, the overhead increases for Quick Sort. Merge Sort accesses data sequentially and its need for random access is low.
Code Snippet
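The original snippet is missing. As a usage sketch, assuming the quickSort() and partition() functions sketched above are in scope:

#include <iostream>

int partition(int arr[], int low, int high);   // Lomuto partition, as sketched above
void quickSort(int arr[], int low, int high);  // recursive driver, as sketched above

int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    for (int i = 0; i < n; i++)
        std::cout << arr[i] << ' ';            // 1 5 7 8 9 10
    std::cout << '\n';
}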
Properties
The time taken by Quick Sort depends upon the input array and partition strategy.
- Worst Case: The worst case occurs when the partition process always picks the greatest or smallest element as the pivot. If we consider the above partition strategy where the last element is always picked as the pivot, the worst case occurs when the array is already sorted in increasing or decreasing order. The worst-case recurrence is T(n) = T(n-1) + O(n), which solves to O(n^2).
- Best Case: The best case occurs when the partition process always picks the middle element as the pivot. The best-case recurrence is T(n) = 2T(n/2) + O(n), which solves to O(n log n).
- Average Case: To do an average case analysis, we would need to consider all possible permutations of the array and calculate the time taken by each, which does not look easy. We can get an idea of the average case by considering the case when partition puts O(n/10) elements in one set and O(9n/10) elements in the other set. The recurrence T(n) = T(n/10) + T(9n/10) + O(n) also solves to O(n log n).
Although the worst-case time complexity of Quick Sort is O(n^2), which is more than that of many other sorting algorithms like Merge Sort and Heap Sort, Quick Sort is faster in practice because its inner loop can be efficiently implemented on most architectures and for most real-world data. Quick Sort can be implemented in different ways by changing the choice of pivot, so that the worst case rarely occurs for a given type of data. However, Merge Sort is generally considered better when the data is huge and stored in external storage.
Is Quick Sort Stable
The default implementation is not stable. However, any sorting algorithm can be made stable by considering indexes as comparison parameter.
Is Quick Sort In-Place
As per the broad definition of in-place algorithm, it qualifies as an in-place sorting algorithm as it uses extra space only for storing recursive function calls, but not for manipulating the input.
How to Optimize QuickSort So That It Takes O(log n) Extra Space in Worst Case
Tail Call Optimization
- https://www.geeksforgeeks.org/tail-call-elimination/
- https://www.geeksforgeeks.org/quicksort-tail-call-optimization-reducing-worst-case-space-log-n/
In Quick Sort, the partition function is in-place, but we need extra space for the recursive function calls. A simple implementation of Quick Sort makes two calls to itself and, in the worst case, requires O(n) space on the function call stack.
The worst case happens when the selected pivot always divides the array such that one part has 0 elements and the other part has (n-1) elements.
Can we reduce the auxiliary space for function call stack?
We can limit the auxiliary space to O(log n). The idea is based on tail call elimination. We can convert the code so that it makes only one recursive call in Quick Sort.
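The original block was truncated to the comment noting O(n) space. A sketch of that first transformation (the function name is illustrative): the second recursive call is replaced by a loop, but the stack can still grow to O(n):

int partition(int arr[], int low, int high);   // Lomuto partition, as sketched earlier

// Space complexity: O(n) in the worst case. Only one recursive call remains;
// the second call has been turned into iteration on the right part.
void quickSortOneCall(int arr[], int low, int high) {
    while (low < high) {
        int pi = partition(arr, low, high);
        quickSortOneCall(arr, low, pi - 1);   // recurse on the left part
        low = pi + 1;                         // loop on the right part
    }
}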
Although we have reduced the number of recursive calls, the above code can still use O(n) auxiliary space in the worst case. In the worst case, it is possible that the array is divided such that the first part always has (n-1) elements. For example, this may happen when the last element is chosen as the pivot and the array is sorted in decreasing order.
We can optimize the above code to make a recursive call only for the smaller part after partition.
If the left part is smaller, then we make the recursive call for the left part; else, for the right part. In the worst case (for space), when both parts are of equal size in all recursive calls, we use O(log n) extra space.
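The original block was truncated to the comment noting O(log n) space. A sketch of the smaller-side-first version (the function name is illustrative):

int partition(int arr[], int low, int high);   // Lomuto partition, as sketched earlier

// Space complexity: O(log n). Recurse only on the smaller part and loop on the
// larger one, so the recursion depth is bounded by O(log n).
void quickSortLogSpace(int arr[], int low, int high) {
    while (low < high) {
        int pi = partition(arr, low, high);
        if (pi - low < high - pi) {            // left part is smaller
            quickSortLogSpace(arr, low, pi - 1);
            low = pi + 1;                      // continue with the right part
        } else {                               // right part is smaller (or equal)
            quickSortLogSpace(arr, pi + 1, high);
            high = pi - 1;                     // continue with the left part
        }
    }
}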
Quick Sort is also tail recursive.
Note that Merge Sort is NOT tail recursive; this is also one of the reasons why Quick Sort performs better.
Hoare’s Partition Scheme
Hoare’s partition scheme works by initializing two indexes that start at the two ends; the two indexes move toward each other until an inversion is found (a value greater than the pivot on the left side and a value smaller than the pivot on the right side). When an inversion is found, the two values are swapped and the process is repeated.
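The original block was truncated. A sketch of Hoare's scheme (the function name is illustrative); note that with this partition the recursive calls are quickSort(arr, low, j) and quickSort(arr, j + 1, high):

#include <algorithm>

// Hoare's scheme: the first element is the pivot; i and j start at the two
// ends and move toward each other, swapping inversions as they are found.
int hoarePartition(int arr[], int low, int high) {
    int pivot = arr[low];
    int i = low - 1, j = high + 1;
    while (true) {
        do { i++; } while (arr[i] < pivot);   // find an element >= pivot on the left
        do { j--; } while (arr[j] > pivot);   // find an element <= pivot on the right
        if (i >= j) return j;                 // pointers crossed: j is the split point
        std::swap(arr[i], arr[j]);
    }
}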
Random Pivoting
In Quick Sort, we first partition the array in place such that all elements to the left of the pivot element are smaller, and all elements to the right of the pivot are greater than the pivot element. Then, we recursively call the same procedure for left and right subarrays.
Unlike Merge Sort, we don’t need to merge the two sorted arrays. Thus, Quick Sort requires less auxiliary space than Merge Sort, which is why it is often preferred to Merge Sort. Using a randomly generated pivot, we can ensure an expected O(n log n) running time regardless of the input ordering.
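The original block was truncated. A sketch of randomized partitioning, reusing the Lomuto partition from earlier (the function name is illustrative):

#include <algorithm>
#include <cstdlib>

int partition(int arr[], int low, int high);   // Lomuto partition, as sketched earlier

// Pick a random index in [low, high], swap that element with the last one,
// then reuse the ordinary last-element-as-pivot partition.
// (srand should be seeded once elsewhere, e.g. in main.)
int randomizedPartition(int arr[], int low, int high) {
    int randomIdx = low + std::rand() % (high - low + 1);
    std::swap(arr[randomIdx], arr[high]);
    return partition(arr, low, high);
}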
Counting Sort
Reference: https://www.geeksforgeeks.org/counting-sort/
Counting Sort is a sorting technique based on keys within a specific range. It works by counting the number of objects having each distinct key value (a kind of hashing), and then doing some arithmetic to calculate the position of each object in the output sequence.
The problem with the basic counting sort is that we cannot sort the elements if there are negative numbers, because there are no negative array indices. So what we do is find the minimum element and store the count of that minimum element at index zero.
Points to be noted:
Counting sort is efficient if the range of the input data is not significantly greater than the number of objects to be sorted. Consider the situation where the input values lie in the range 1 to 10K but the data is just 10, 5, 10K, 5K.
It is not a comparison-based sort. Its running time is O(n+k), with space proportional to the range of the data.
It is often used as a sub-routine to another sorting algorithm, like radix sort.
Counting sort uses a partial hashing to count the occurrence of the data object in O(1).
Counting sort can be extended to work for negative inputs also.
Code Snippet
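The original snippet is missing; a minimal C++ sketch that also applies the minimum-element offset described above, so negative inputs are handled:

#include <algorithm>
#include <iostream>
#include <vector>

// Counting sort: count occurrences of each key, turn the counts into prefix
// sums (final positions), then place each element into the output array.
// Offsetting by the minimum value handles negative inputs as well.
void countingSort(std::vector<int>& arr) {
    if (arr.empty()) return;
    int mn = *std::min_element(arr.begin(), arr.end());
    int mx = *std::max_element(arr.begin(), arr.end());
    int k = mx - mn + 1;                                   // size of the key range
    std::vector<int> count(k, 0);
    for (int x : arr) count[x - mn]++;                     // count each key
    for (int i = 1; i < k; i++) count[i] += count[i - 1];  // prefix sums: positions
    std::vector<int> output(arr.size());
    for (int i = (int)arr.size() - 1; i >= 0; i--)         // backward pass keeps it stable
        output[--count[arr[i] - mn]] = arr[i];
    arr = output;
}

int main() {
    std::vector<int> a = {1, 4, 1, 2, 7, 5, 2};
    countingSort(a);
    for (int x : a) std::cout << x << ' ';                 // 1 1 2 2 4 5 7
}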
Properties
Time Complexity: O(n+k), where n is the number of elements in input array and k is the range of input.
Auxiliary Space: O(n+k)