Overview of Data Structures

Reference: https://www.geeksforgeeks.org/data-structures/

Data structure is a way of organizing data so that it can be used effectively. It is used to reduced the time and space complexity for different tasks.

Linear Data Structures

Reference: https://www.geeksforgeeks.org/overview-of-data-structures-set-1-linear-data-structures/

Array

Array is a data structure used to store homogeneous elements at contiguous locations. Size of an array must be provided before storing data.

Assume that the size of array is n.
Access Time: O(1)
This is possible because elements are stored at contiguous locations.

Search Time:
O(n) for Sequential Search
O(log n) for Binary Search, if array is sorted

Insertion Time: O(n)
The worst case occurs when insertion happens at the beginning of an array and requires shifting all of the elements.

Deletion Time: O(n)
The worst case occurs when deletion happens at the beginning of an array and requires shifting all of the elements.

Linked List

A linked list is a linear data structure, like arrays, where each element is a separate object. Each element, a node, of a list is comprising of two items - the data and a reference to the next node.

Type of Linked List:

Singly Linked List
In this type of linked list, every node stores address or reference of next node in list, and the last node has next address or reference as NULL. For example:
1
1 -> 2 -> 3 -> 4 -> NULL
Doubly Linked List
In this type of linked list, there are two references associated with each node. One of the reference points to the next node, and one to the previous node. Advantage of this data structure is that we can traverse in both directions. Also, for deletion, we don’t need to have explicit access to previous node.
1
NULL <- 1 <-> 2 <-> 3 -> NULL
Circular Linked List
Circular linked list is a linked list where all nodes are connected to form a circle. There is no NULL at the end. A circular linked list can be a singly circular linked list or doubly circular linked list. Advantage of this data structure is that any node can be made as starting node. This is useful in implementation of circular queue in linked list.
1
1 -> 2 -> 3 -> 4 -> 1
where the next pointer of the last node is pointing to the first.

Access Time of an element: O(n)

Search Time of an element: O(n)

Insertion Time of an element:
O(1), if we are at the position where we have an element

Deletion Time of an element:
O(1), if we know the address of the node that is previous to the node to be deleted.

Stack

A stack or LIFO (last in first out) is an abstract data type that serves as a collection of elements, with two principal operations: push(), which adds an element to the collection, and pop(), which removes the last element that was added. In stack, both the operations of push() and pop() take place at the same end that is top of the stack. It can be implemented by using both array and linked list.

Stack are used for maintaining function calls, the last called function must finish execution first. We can always remove recursion with the help of stacks. Stacks are also used in cases where we have to reverse a word, check for balanced parenthesis, and in editors where the word you typed that last is the first to be removed when you used undo operation. Similarly, to implement back functionality in web browsers.

Access Time: O(n) in worst case
Insertion Time: O(1)
Deletion Time: O(1)

Note that, insertion and deletion are allowed on one end.

Queue

A queue or FIFO, first in first out, is an abstract data type that serves as a collection of elements, with two principal operations: enqueue(), the process of adding an element to the collection, dequeue(), the process of removing the first element that was added. It can be implemented by using both array and linked list.

Examples of using queue:
Any situation where resources are shared among multiple users and served on first come first serve basis, such as CPU scheduling, Disk scheduling.

Another application of queue is when data is transferred asynchronously (data not necessarily received at the same rate as sent) between two processes. Examples includes IO buffers, pipes, file IO, etc.

Access Time: O(n) in worst case
Insertion Time: O(1)
Deletion Time: O(1)

Note that, the element is added to the rear side, and the element is removed from the front side.

Circular Queue
The advantage of this data structure is that it reduces wastage of space in case of array implementation. As the insertion of the (n+1)-th element is done at the 0-th index if it is empty.

Tree, Heap, Hashing, Trie, Graph

References:

Binary Tree

Unlike arrays, linked list, stack, and queues, which are liner data structures, trees are hierarchical data structures. A binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child. It is implemented mainly using Links.

Representation

A tree is represented by a pointer to the top most node, root, in tree. If the tree is empty, then the value of root is NULL. A Binary Tree node contains following parts.

Data
Pointer to the left child
Pointer to the right child

Traversal

A Binary Tree can be traversed in two ways:

Depth First Traversal
- In-order (Left-Root-Right)
- Pre-order (Root-Left-Right)
- Post-order (Left-Right-Root)
Breadth First Traversal
- Level Order Traversal

Properties

The maximum number of nodes at level l = 2^(l-1)
Maximum number of nodes = 2^(h+1) - 1, where h is the height of a tree. Hight is considered as the maximum number of edges on a path from root to leaf.
Minimum possible height = ceil(Log2(n+1)) - 1
In Binary Tree, the number of leaf nodes is always one more than nodes with two children
Time complexity of tree traversal: O(n), if tree is unbalanced

Examples

One reason to use Binary Tree or tree in general is for the things that form a hierarchy. They are useful in file structures where each file is located in a particular directory and there is a specific hierarchy associated with files and directories.

Another example where trees are useful is storing hierarchical objects, like JavaScript Document Object Model, which considers HTML page as a tree with nesting of tags as parent-child relations.

Binary Search Tree

Binary Search Tree is a Binary Tree with following additional properties:

The left subtree of a node contains only nodes with keys less than the node’s key.
The right subtree of a node contains only nodes with keys greater than the node’s key.
The left and right subtree each must also be a Binary Search Tree.

BST provides moderate access / search (quicker than linked list and slower than arrays)

BST provides moderate insertion / deletion (quicker than arrays and slower than linked list)

Complexity

Search: O(h)
Insertion: O(h)
Deletion: O(h)
Extra Space: O(n) for pointers

h: The height of BST
n: The number of nodes in BST

If Binary Search Tree is Height Balanced, then h = O(log n)

Self Balancing BST
Like AVL Tree, Red-Black Tree, and Splay Tree, make sure that height of BST remains O(log n)

Examples

BST is mainly used in search application, where data is constantly entering / leaving, and data needs to printed in sorted order. For example, in implementation in E-commerce websites where a new product is added or product goes out of stock, all products are listed in sorted order.

Binary Heap

A Heap is a special Tree based data structure in which the tree is a complete binary tree.

A Binary Heap is a Binary Tree with following properties:

It’s a complete tree, meaning that all levels are completely filled except possibly the last level has all keys as left as possible. This property of Binary Heap makes it suitable to be stored in an array.
A Binary Heap is either Min Heap or Max Heap. In a Min Binary Heap, the key at root must be minimum among all keys present in Binary Heap. The same property must be recursively true for all nodes in Binary Tree. Max Binary Heap is similar to Min Heap. It is mainly implemented using array.

Min Heap

       10
     /    \
  15       30
  / \      / \
40   50  100  40

Max Heap

      100
     /   \
  40      50
  / \     / \
10   15  50  40

Complexity

Insertion: O(log n)
deletion: O(log n)

Get minimum in Min Heap: O(1)
Extract minimum Min Heap: O(log n)
Decrease key in Min Heap O(log n)

Get maximum in Max Heap: O(1)
Extract maximum Max Heap: O(log n)
Decrease key in Max Heap O(log n)

Examples

Binary Heap is used in implementing efficient Priority Queues, which in turn are used for scheduling processes in operating systems. Priority Queues are also used in Dijstra’s and Prim’s graph algorithms.

The heap data structure can be used to efficiently find the k smallest (or largest) elements in an array, merging k sorted arrays, median of a stream, etc.

Heap is a special data structure and it cannot be used for searching of a particular element.

Hashing

Hash Function

A function that converts a given big input key to a small practical integer value. The mapped integer value is used as an index in hash table. A good hash function should have following properties:

Efficiently computable
Should uniformly distribute the keys, meaning each table position equally likely for each key

Hash Table

An array that stores pointers to record corresponding to a given number. An entry in hash table is NULL if no existing number has hash function value equal to the index for the entry.

Collision Handling

Since a hash function gets us a small number for a key which is a big integer or string, there is possibility that two keys result in same value. The situation where newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique. Following are the ways to handle collision:

Chaining
The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.

Open Addressing
In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NULL. When searching for an element, we one by one examine table slot until the desired element is found or it is clear that the element is not in the table.

Complexity

Space: O(n)
Search: O(1) in average, O(n) in worst case
Insertion: O(1) in average, O(n) in worst case
Deletion: O(1) in average, O(n) in worst case

Hashing seems better than BST for all the operations. However, in hashing, elements are unordered and in BST elements are stored in an ordered manner. Also, BST is easy to implement, but hash functions can sometimes be very complex to generate. In BST, we can also efficiently find floor and ceil of values.

Examples

Hashing can be used to remove duplicates from a set of elements, it can also be used to find frequency of all items. For example, in web browsers, we can check visited urls using hashing. In firewalls, we can use hashing to detect spam. We need to hash IP addresses. Hashing can be used in any situation where want search(), insert(), and delete() in O(1) time.

Trie

Trie is an efficient data structure for searching words in dictionaries. Search complexity with Tire is linear in terms of word (or key) length to be searched. If we store keys in binary search tree, a well balanced BST will need time proportional to (m * log n), where m is the maximum string length and n is the number of keys in tree. Using Trie, we can search the key in O(m) time. So it is faster than BST.

Hashing also provides word search in O(n) time on average. But, the advantages of Trie are that there are no collision (like hashing), so worst case time complexity is O(n). Also, the most important thing is Prefix search. With Trie, we can find all words beginning with a prefix, which is not possible with Hashing. The only problem with Trie is that it requires a lot of extra space. Tries are also know as radix tree or prefix tree.

The basic Trie structure can be defined as follows:

struct TrieNode
{
    int value;
    TrieNode *children[ALPHABET_SIZE];
};

Complexity

Insertion Time: O(m), where m is the length of the string
Deletion Time: O(m)
Search Time: O(m)
Space: O(ALPHABET_SIZE * m * n), where n is the number of the keys in the trie, ALPHABET_SIZE is 26 if we are only considering upper case Latin characters

Examples

The most common use of Tries is to implement dictionaries due to prefix search capability. Tries are also well suited for implementing approximate matching algorithms, including those used in spell checking. It is also used for searching contact from mobile contact list or phone directory.

Graph

Graph is a data structure that consists of following two components:

A finite set of vertices also called as nodes
A finite set of ordered pair of the form (u,v) called as edge. The pair is ordered because (u,v) is not same as (v,u) in case of directed graph (di-graph). The pair of form (u,v) indicates that there is an edge from vertex u to vertex v. The edges may contain weight / value / cost.

V -> Number of Vertices
E -> Number of Edges

Graph can be classified on the basis of many things, below are the two most common classifications:

Direction:
Undirected Graph: The graph in which all the edges are bi-directional.
Directed Graph: The graph is which all the edges are uni-directional.
Weight:
Weighted Graph: The graph in which weight is associated with the edges.
Unweighted Graph: The graph in which their is no weight associated to the edges.

Representation

There are three basic ways to represent a graph in memory.

Objects and Pointers
Adjacency Matrix
Adjacency List

Complexity

Adjacency Matrix:
Time: Traversal - O(v^2) by BFS or DFS
Space: O(V^2)
Adjacency List:
Time: Traversal - O(e log v) by BFS or DFS
Space: O(V+E)

Examples

The most common example of the graph is to find the shortest path in any network. Used in Google Map or Bing. Another common use application of graph is social networking websites, where the friend suggestion depends on the number of intermediate suggestions and other things.