Data Structures
Data Structure
A Data Structure is a specialized format for organizing, processing, retrieving,
and storing data in a computer's memory. Choosing the right data structure is the most crucial step
for writing efficient algorithms and software systems.
1. Introduction to Data Organization
Different tasks require fundamentally different ways to store data. If you need to search a dictionary, you want it alphabetized (sorted). If you need to track the history of web pages you visited, you want to easily go back to the most recent one (a Stack).
Categories of Data Structures
- Linear Data Structures: Data elements are arranged sequentially, one after the other. Each element is connected to its previous and next element (e.g., Arrays, Linked Lists, Stacks, Queues).
- Non-Linear Data Structures: Data elements are not in a strict sequence; they are arranged hierarchically or interconnectedly, allowing a single element to connect to multiple other elements (e.g., Trees, Graphs).
Key Takeaways
- A data structure is a specialized way of storing and organizing data so it can be used efficiently based on the specific needs of an algorithm.
- Linear structures store data sequentially, while non-linear structures store data hierarchically or interconnectedly.
2. Linear Data Structures
The most fundamental ways to store collections of data in a sequential manner.
2.1 Arrays
An Array is a collection of elements of the same data type stored in contiguous (adjacent) memory locations.
Array Characteristics
- Fixed Size (typically): In statically typed languages like C or Java, the size (amount of memory allocated) must be defined when the array is created and cannot easily change. (Dynamic arrays like
ArrayListin Java orlistin Python handle resizing automatically under the hood). - Fast Access (O(1) Time Complexity): You can instantly access any element using its mathematical index (e.g.,
arr[3]). The computer calculates the exact memory address instantly:BaseAddress + (Index * DataSize). - Slow Insertion/Deletion (O(n) Time Complexity): To insert a new element at the beginning or middle, you must physically shift all subsequent elements over by one memory slot to make room, which is computationally expensive for large arrays.
2.2 Linked Lists
A Linked List is a sequence of elements (called Nodes) where each node contains its data and a pointer (memory address) to the next node in the sequence. Memory is not contiguous; nodes are scattered randomly throughout RAM.
Linked List Characteristics
- Dynamic Size: It can grow and shrink easily during runtime by simply allocating a new node in memory and updating pointers. No memory is wasted.
- Fast Insertion/Deletion (O(1) if location is known): To insert a node in the middle, you only need to update the pointers of the two neighboring nodes. You don't have to shift any other data.
- Slow Access (O(n)): To find the 5th element, you cannot calculate its memory address instantly. You must start at the
Headnode and follow the pointers sequentially until you reach the 5th node. - Types: Singly Linked (points to next), Doubly Linked (points to next and previous), Circular Linked (last node points back to the first).
Key Takeaways
- Arrays provide extremely fast index-based access (O(1)) but are slow to modify in the middle because memory is contiguous and fixed.
- Linked Lists provide fast, dynamic insertions/deletions by updating pointers, but are slow to search or access because you must traverse from the start (O(n)).
3. Abstract Data Types (ADTs): Stacks and Queues
ADTs define how data should be accessed logically (the rules of interaction), but not exactly how it's stored physically in memory. Stacks and Queues can be implemented using either Arrays or Linked Lists under the hood.
3.1 Stacks (LIFO)
A Stack follows the Last-In, First-Out (LIFO) principle. Imagine a stack of physical plates: you add plates to the top, and you must remove plates from the top.
Stack Operations
- Push: Add an element to the top of the stack.
- Pop: Remove and return the top element from the stack.
- Peek (Top): View the top element without removing it.
Real-World Stack Uses
The 'Undo' feature in text editors. Browser history (Back button). The Call Stack used by the CPU to manage function calls, local variables, and recursion.
3.2 Queues (FIFO)
A Queue follows the First-In, First-Out (FIFO) principle. Imagine a line of people at a grocery store: the first person to get in line is the first person served.
Queue Operations
- Enqueue: Add an element to the back (rear) of the queue.
- Dequeue: Remove and return an element from the front (head) of the queue.
- Priority Queue: A variation where elements are dequeued based on an assigned priority rather than strictly arrival order.
Real-World Queue Uses
Printer job scheduling. Handling incoming requests on a web server. Task scheduling in Operating Systems (CPU time slicing). Buffer management in networking.
Key Takeaways
- Stacks operate on a LIFO (Last-In, First-Out) principle, useful for tracking history, reversing data, and managing function calls.
- Queues operate on a FIFO (First-In, First-Out) principle, useful for fair scheduling, asynchronous processing, and buffering data.
4. Non-Linear Data Structures
Used when data naturally forms a hierarchy or complex, multi-directional relationships.
4.1 Trees
A Tree is a hierarchical data structure consisting of nodes connected by edges. The topmost node is the Root. Nodes with no children are called Leaves.
Important Tree Types
- Binary Tree: Each node can have a maximum of two children (typically designated as Left and Right).
- Binary Search Tree (BST): A specialized Binary Tree enforcing a strict rule: for any given node, all values in its Left subtree are smaller, and all values in its Right subtree are larger. This allows incredibly fast searching (O(log n)), similar to a binary search on an array, but with fast, dynamic insertions.
- Balanced Trees (AVL, Red-Black): Self-adjusting BSTs that ensure the tree doesn't become lopsided (which would degrade search performance to O(n)).
- Heaps: A complete binary tree usually implemented with an array, used to instantly find the maximum (Max-Heap) or minimum (Min-Heap) value. They form the basis of Priority Queues and the HeapSort algorithm.
Real-World Tree Uses
File systems (directories and subdirectories). Parsing HTML/XML documents (the Document Object Model or DOM Tree). Fast database indexing (B-Trees). Decision logic in AI.
4.2 Graphs
A Graph is a network of nodes (called Vertices) connected by lines (called Edges). Unlike trees, graphs can have cycles (loops), disconnected components, and do not have a designated root node.
Graph Types & Terminology
- Directed Graph (Digraph): The edges have specific directions (indicated by arrows), meaning you can only travel one way between vertices (like one-way streets or Twitter followers).
- Undirected Graph: The edges have no direction, meaning the relationship is mutual and travel is two-way (like Facebook friends).
- Weighted Graph: Each edge has a numerical value (weight/cost) associated with it, representing physical distance, financial cost, or travel time.
Real-World Graph Uses
Social networks (mapping relationships). GPS routing apps like Google Maps (finding the shortest path between cities on a weighted graph using algorithms like Dijkstra's). Computer networks (LAN/WAN topologies).
Key Takeaways
- Trees naturally model hierarchical data. Binary Search Trees (BST) provide excellent O(log n) efficiency for searching, inserting, and deleting.
- Graphs model highly interconnected networks of data (vertices connected by edges), essential for mapping relationships and routing algorithms.
- Heaps are specialized trees optimized specifically for priority-based operations.
5. Hash Tables (Hash Maps / Dictionaries)
A Hash Table is a data structure that implements an associative array abstract data type, a structure that can map keys to values for nearly instantaneous lookup.
How Hashing Works
- Hash Function: Takes an arbitrary key (like a user's string ID "JohnDoe99") and mathematically converts it into a specific integer index. A good hash function is fast and distributes indices evenly.
- Buckets (Underlying Array): The resulting integer index points directly to a specific slot (bucket) in an underlying array where the actual value (e.g., the User Object) is stored.
- Collisions: Sometimes the hash function inevitably generates the exact same index for two completely different keys. This collision must be handled.
- Collision Resolution (Chaining): The most common method. Instead of storing the value directly in the array bucket, the bucket holds a Linked List. If a collision occurs, the new value is simply appended to the linked list at that bucket.
Performance Characteristics
When well-designed with a good hash function and minimal collisions, Hash Tables provide an incredible O(1) average time complexity for inserting, deleting, and searching data, making them the fastest data structure for lookup tasks. They are the backbone of modern programming (e.g., Python
dict, JavaScript Object or Map, Java HashMap).Key Takeaways
- Hash tables map keys to specific array indices using a hash function, allowing constant O(1) average lookup times.
- Collisions occur when two different keys map to the same array index and are typically resolved using Chaining (linked lists) or Open Addressing.
- They are widely used for caching data, database indexing, and creating associative arrays/dictionaries.
Summary
Key Takeaways
- Arrays provide contiguous memory for fast O(1) index access but suffer from slow O(n) insertions/deletions.
- Linked Lists provide dynamic memory with fast O(1) pointer-based insertions but suffer from slow O(n) sequential search.
- Stacks (LIFO) and Queues (FIFO) are Abstract Data Types that enforce strict access rules for ordering data logic.
- Trees (BST, Heaps) organize data hierarchically for fast O(log n) searching, sorting, and priority queuing.
- Graphs model complex, non-hierarchical networks using Vertices and Edges.
- Hash Tables use mathematical hash functions to achieve the ultimate O(1) key-value lookup performance.