Section 25.13 Constants and Optimization
Although we generally ignore constant factors when expressing time complexity in Big-O notation (and doing so is perfectly reasonable), there are situations where they do matter and we need to keep them in mind:
- When the input size is small, constant factors can have a significant impact on performance.
- When comparing algorithms with the same Big-O notation, constant factors can help determine which algorithm is more efficient in practice.
- When optimizing code, reducing constant factors can lead to performance improvements even if the overall time complexity remains the same.
Subsection 25.13.1 Hidden Constant Factors
The constant factors that tend to matter most are actually hidden from us. Consider these lines of code:
int sum = 0;
for (int i = 0; i < n; i++) {
    sum += i;
    cout << "Sum so far: " << sum << endl;
}
cout << "Sum: " << sum << endl;
The loop runs \(O(n)\) times, so the overall time complexity is \(O(n)\text{.}\) But system calls to do things like output to the console are very expensive compared to simple arithmetic operations. If we remove the output statement from inside the loop, the overall time taken by the program could be significantly reduced, even though the time complexity remains \(O(n)\text{.}\)
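To see the difference concretely, here is a minimal sketch (not part of the original example) that times both versions of the loop using the <chrono> library. The value of n, the variable names, and the timing approach are all assumptions made for illustration, and the exact numbers will vary from machine to machine.

#include <chrono>
#include <iostream>
using namespace std;

int main() {
    const int n = 10000;

    // Version 1: arithmetic only.
    auto start = chrono::steady_clock::now();
    long long quietSum = 0;
    for (int i = 0; i < n; i++) {
        quietSum += i;
    }
    auto middle = chrono::steady_clock::now();

    // Version 2: same arithmetic, plus a console write (and flush via endl) each iteration.
    long long noisySum = 0;
    for (int i = 0; i < n; i++) {
        noisySum += i;
        cout << "Sum so far: " << noisySum << endl;
    }
    auto finish = chrono::steady_clock::now();

    // Both loops are O(n), but the second pays a large constant cost per iteration.
    cout << "Quiet loop: "
         << chrono::duration_cast<chrono::microseconds>(middle - start).count()
         << " microseconds" << endl;
    cout << "Printing loop: "
         << chrono::duration_cast<chrono::microseconds>(finish - middle).count()
         << " microseconds" << endl;
    cout << "Sums: " << quietSum << " " << noisySum << endl;
    return 0;
}

The printing loop is typically far slower, even though both loops perform exactly the same amount of arithmetic.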
Other operations that involve system calls include file I/O, network communication, and memory allocation. Each of these can carry a significant constant cost that affects the overall performance of an algorithm, especially for smaller input sizes.
Another common source of constant factors is memory access patterns. Modern memory systems have multiple levels of cache, and accessing data that is already in cache is much faster than accessing data from main memory. The speed difference between finding data in cache versus main memory can be on the order of 100x or more.
When data is retrieved from main memory, it is often fetched in blocks (cache lines) that contain multiple contiguous bytes. So we might access index 10 of an array, but the system actually retrieves a block of data that includes indices 8 through 15 from main memory into the cache. Subsequent accesses to indices 8, 9, 11, 12, 13, 14, and 15 will then be much faster because they are already in cache.
This means that algorithms that access memory in a sequential or localized manner (e.g., accessing indices 1, 2, 3, 4, ...) can benefit from better cache performance, leading to lower constant factors. In contrast, algorithms that access memory in a random or scattered manner (e.g., accessing indices 1, 523, 17, 249, ...) may have to go back to main memory much more frequently.
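As a rough illustration of this effect, the sketch below traverses the same contiguous block of memory twice, once sequentially and once with a large stride. The grid size and the timing code are assumptions for the example, and the measured gap will depend on your machine's cache.

#include <chrono>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    const int n = 4000;              // grid dimension, chosen for illustration
    vector<int> grid(n * n, 1);      // n x n grid stored row by row in one block

    // Sequential access: visit the underlying memory in order (row by row).
    auto t0 = chrono::steady_clock::now();
    long long rowSum = 0;
    for (int r = 0; r < n; r++) {
        for (int c = 0; c < n; c++) {
            rowSum += grid[r * n + c];
        }
    }
    auto t1 = chrono::steady_clock::now();

    // Strided access: same work, but each step jumps n elements ahead (column by column).
    long long colSum = 0;
    for (int c = 0; c < n; c++) {
        for (int r = 0; r < n; r++) {
            colSum += grid[r * n + c];
        }
    }
    auto t2 = chrono::steady_clock::now();

    // Both versions are O(n^2), but the sequential one makes better use of the cache.
    cout << "Row by row: "
         << chrono::duration_cast<chrono::milliseconds>(t1 - t0).count() << " ms" << endl;
    cout << "Column by column: "
         << chrono::duration_cast<chrono::milliseconds>(t2 - t1).count() << " ms" << endl;
    cout << "Sums: " << rowSum << " " << colSum << endl;
    return 0;
}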
Data structures that rely on pointers can also lead to poor cache performance because the nodes may be scattered throughout memory. In contrast, data structures like arrays or contiguous vectors tend to have better cache locality, leading to improved performance due to lower constant time factors.
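A similar comparison can be made between a contiguous container and a node-based one. The sketch below (with a size chosen arbitrarily for illustration) sums the elements of a std::vector and a std::list of the same length; both traversals are \(O(n)\text{,}\) but the list's nodes are typically scattered across memory.

#include <chrono>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>
using namespace std;

int main() {
    const int n = 1000000;

    vector<int> contiguous(n, 1);   // elements stored back to back in memory
    list<int> scattered(n, 1);      // each node is a separate allocation

    auto t0 = chrono::steady_clock::now();
    long long vectorSum = accumulate(contiguous.begin(), contiguous.end(), 0LL);
    auto t1 = chrono::steady_clock::now();
    long long listSum = accumulate(scattered.begin(), scattered.end(), 0LL);
    auto t2 = chrono::steady_clock::now();

    // Same number of additions either way; the difference is in memory layout.
    cout << "vector traversal: "
         << chrono::duration_cast<chrono::microseconds>(t1 - t0).count()
         << " microseconds" << endl;
    cout << "list traversal: "
         << chrono::duration_cast<chrono::microseconds>(t2 - t1).count()
         << " microseconds" << endl;
    cout << "Sums: " << vectorSum << " " << listSum << endl;
    return 0;
}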
Subsection 25.13.2 What to Do
So what should we do about constant factors? For now, not much. The biggest gains in efficiency come from reducing the overall time complexity of an algorithm. An \(O(n)\) algorithm with bad constant factors will still vastly outperform an \(O(n^2)\) algorithm with small constant factors for sufficiently large \(n\text{.}\)
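For a quick sense of why, suppose one algorithm takes roughly \(1000n\) steps and another takes roughly \(n^2\) steps (the constants here are made up for illustration). At \(n = 100\) the quadratic algorithm is actually faster (10,000 steps versus 100,000), but at \(n = 1,000,000\) the linear algorithm takes about \(10^9\) steps while the quadratic one takes \(10^{12}\text{,}\) a thousand times more.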
The main constant factors to keep in mind as you learn about data structures are the costs associated with system calls (allocating memory) and memory access patterns (cache performance). They help explain why certain data structures perform better than others in practice, even when their Big-O time complexities are the same.
Worrying about improving constant factors too early can lead to more complex code. For example, programmers sometimes try to avoid calling new repeatedly for small allocations because of the overhead of the system calls (asking the OS for memory). Instead, they might allocate a large block of memory with a single call to new and then manage smaller allocations within that block themselves. While this can improve performance by reducing the number of system calls, it also adds complexity to the code and increases the risk of memory management errors.
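The Arena class below is a hypothetical, heavily simplified sketch of that idea, shown only to make the trade-off concrete: it performs one large allocation up front and then hands out pieces of it, but it ignores alignment, never reuses space, and would need considerably more care to be safe in real code.

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>
using namespace std;

// A deliberately simplified "bump" arena: grab one large block up front with a
// single allocation, then hand out pieces of it without further calls to new.
// Illustration only: no alignment handling, no per-object deallocation;
// everything is released at once when the arena is destroyed.
class Arena {
public:
    explicit Arena(size_t bytes) : buffer(bytes), used(0) {}

    void* allocate(size_t bytes) {
        if (used + bytes > buffer.size()) {
            return nullptr;            // this block is full
        }
        void* spot = buffer.data() + used;
        used += bytes;
        return spot;
    }

private:
    vector<char> buffer;   // the one large allocation
    size_t used;           // bytes handed out so far
};

int main() {
    Arena arena(1024);                                  // one allocation up front
    int* a = new (arena.allocate(sizeof(int))) int(1);  // placement new: no extra allocation
    int* b = new (arena.allocate(sizeof(int))) int(2);
    cout << *a + *b << endl;                            // prints 3
    return 0;
}

Even in this toy form, the extra bookkeeping (tracking used space, handling the case where the block runs out) is exactly the kind of added complexity the paragraph above warns about.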
"Premature optimization is the root of all evil." (Donald Knuth)
It is also sometimes hard to predict if extra work will actually improve constant factors. For example, consider a sorting algorithm that first scans through the data to check if it is already sorted before proceeding with the sorting process. While this extra scan adds overhead, it can significantly reduce the overall time taken for nearly-sorted data. However, for random data, the extra scan may just add unnecessary work without any benefit. Deciding whether such optimizations are worthwhile often requires empirical testing and profiling.
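A minimal sketch of this kind of pre-check is shown below, using std::is_sorted before std::sort; the function name and test data are made up for the example.

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Sort only if needed: an O(n) pre-check that can skip the O(n log n) sort
// when the data is already in order, but is wasted work otherwise.
void sortIfNeeded(vector<int>& data) {
    if (is_sorted(data.begin(), data.end())) {
        return;                        // already sorted: skip the expensive step
    }
    sort(data.begin(), data.end());
}

int main() {
    vector<int> alreadySorted = {1, 2, 3, 4, 5};   // pre-check pays off here
    vector<int> shuffled = {5, 3, 1, 4, 2};        // pre-check just adds an extra pass

    sortIfNeeded(alreadySorted);
    sortIfNeeded(shuffled);

    for (int value : shuffled) {
        cout << value << " ";
    }
    cout << endl;                                  // prints: 1 2 3 4 5
    return 0;
}

Whether the extra \(O(n)\) pass is worth it depends entirely on how often the input is already sorted, which is exactly the kind of question that empirical testing and profiling answer.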
So although you should have an awareness of the hidden constant factors that can impact performance, you should not be overly concerned about trying to "fix" or work around them as you are learning about algorithms and data structures.
