Section 25.5 Estimations with Big-O

The whole point of Big-O is to tell us how quickly the amount of work grows as the input size increases. In a linear algorithm, if the input size \(n\) doubles, we would expect the amount of work to approximately double as well. If \(n = 10\) and \(f(n) = n\) is how we estimate work, \(f(n) = 10\text{.}\) Doubling \(n\) to 20 would result in \(f(n) = 20\text{.}\)
In an algorithm with quadratic growth, the work will grow much faster. For example, if \(n = 10\) and \(f(n) = n^2\text{,}\) then \(f(n) = 10^2 = 100\text{.}\) When \(n\) is 10, we are estimating 100 units of work. But if \(n\) doubles to become \(20\text{,}\) then \(n^2 = 20^2 = 400\text{.}\) Doubling the input size resulted in four times as much work.
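To make this concrete, here is a minimal Python sketch (ours, not part of the text) that doubles \(n\) and prints the estimated work under each growth function:

    # Compare how estimated work grows when the input size doubles.
    def linear_work(n):
        return n        # f(n) = n

    def quadratic_work(n):
        return n ** 2   # f(n) = n^2

    for n in (10, 20):
        print(n, linear_work(n), quadratic_work(n))
    # n = 10: linear 10, quadratic 100
    # n = 20: linear 20, quadratic 400 -- doubling n quadrupled the quadratic work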
One natural way we can use this estimation of work is to predict how long in real time (sometimes referred to as wall time or wall-clock time) it will take to solve a problem with a large input size based on the time it takes to solve a smaller version of the problem.
We assume that the amount of work done per unit of time will remain constant. Thus this proportion must hold:
\begin{gather*} \frac{\textrm{work for job 1}}{\textrm{work for job 2}} = \frac{\textrm{time for job 1}}{\textrm{time for job 2}} \end{gather*}
The work to do job 1 is to the work to do job 2 as the time for job 1 is to the time for job 2.
If we know, or can estimate, three of those four values, we can solve for the fourth. Typically, we would measure the time to do some small version of the problem (job 1), use Big-O to estimate the work for both the small and large versions of the problem (job 1 and job 2), and then solve for the time for job 2.
The key is to remember that the work does not necessarily equal the size of the problem. Instead, we have to use the size of the problem and the Big-O of the algorithm we are applying to calculate the approximate amount of work.
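As a sketch of this process, the hypothetical Python helper below (the name estimate_time is ours, not the text's) solves the proportion for the unknown time, given the two work estimates and one measured time:

    def estimate_time(work1, work2, time1):
        # Solve work1 / work2 = time1 / x for x, assuming constant
        # work per unit of time.
        return time1 * work2 / work1

The examples that follow are all instances of this one calculation.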
For example, say we have a list of 1000 things (see the sketch after this list for these calculations)…
  • If we want to do a Binary Search, the Big-O is \(O(\log_2 n)\text{.}\) That means the estimated work would be \(\log_2(1000)\) or ~9.9657 units of work.
  • If we want to do a Linear Search, the Big-O is \(O(n)\text{.}\) That means the estimated work would just be 1,000 units of work.
  • If we want to do a Selection Sort, the Big-O is \(O(n^2)\text{.}\) That means the estimated work would be \({1000}^2\) or 1,000,000 units of work.
  • If we want to do a Merge Sort, the Big-O is \(O(n \log n)\text{.}\) That means the estimated work would be \(1000 \log_2(1000)\) or \(1000 \cdot 9.9657\) or approximately 9,965.7 units of work.
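A short sketch reproducing these four estimates with Python's math.log2:

    import math

    n = 1000
    print("binary search: ", math.log2(n))      # ~9.97 units
    print("linear search: ", n)                 # 1,000 units
    print("selection sort:", n ** 2)            # 1,000,000 units
    print("merge sort:    ", n * math.log2(n))  # ~9,966 units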

Note 25.5.1.

Recall that when we say \(\log n\) you should understand it as \(\log_2 n\) (log base 2) unless otherwise specified.

Example 25.5.2. Sample problem 1.

I have timed selection sort on 10,000 items and it takes 0.243 seconds. I want to estimate the time it will take to sort 50,000 items. Because selection sort is an \(O(n^2)\) algorithm, I know I need to square the problem sizes to estimate the amount of work required for each of the two jobs. So I can set up the proportion like this, where \(x\) is “time for job 2”:
\begin{gather*} \frac{ 10000^2 }{ 50000^2 } = \frac{0.243\textrm{ seconds}}{x} \end{gather*}
So…
\begin{gather*} \frac{100000000}{2500000000} = \frac{0.243\textrm{ seconds}}{x} \end{gather*}
To solve a proportion, we can cross multiply. Doing so gives us:
\begin{gather*} 100000000x = 0.243\textrm{ seconds} \cdot {2500000000} \end{gather*}
Solving for \(x\) gives:
\begin{gather*} x = 6.075\textrm{ seconds} \end{gather*}
Thus the estimated time to sort 50,000 items with selection sort is about 6.075 seconds.
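The same arithmetic as a quick Python sketch:

    # time for job 2 = time for job 1 * (work for job 2 / work for job 1)
    time1 = 0.243                 # measured seconds for 10,000 items
    work1 = 10_000 ** 2           # selection sort is O(n^2)
    work2 = 50_000 ** 2
    print(time1 * work2 / work1)  # 6.075 seconds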
As long as we are comparing the same algorithm performed on different input sizes, the fact that we are ignoring the constant factors in Big-O notation does not affect the accuracy of our estimates.
Say that while estimating the time to selection sort 50,000 things from a measured time for 10,000 things, we know that our implementation of selection sort actually does \(5n^2\) work. If we included the constant factor in our proportion, it would be:
\begin{gather*} \frac{ 5 \cdot 10000^2 }{ 5 \cdot 50000^2 } = \frac{0.243\textrm{ seconds}}{x} \end{gather*}
The constant factors of 5 cancel out and do not affect the result.
Non-dominant terms do not cancel out, but they will not do much to affect the result for large input sizes. If our algorithm’s work is \(n^2 + 3n\) and we set up the proportion including the \(3n\) term, we would have:
\begin{gather*} \frac{ 10000^2 + 3 \cdot 10000 }{ 50000^2 + 3 \cdot 50000 } = \frac{0.243\textrm{ seconds}}{x} \end{gather*}
This becomes:
\begin{gather*} \frac{100030000}{2500150000} = \frac{0.243\textrm{ seconds}}{x} \end{gather*}
This solves to approximately 6.074 seconds, which is almost identical to the result when we ignored the non-dominant term.
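A quick sketch comparing the two versions of the proportion confirms this:

    time1 = 0.243
    # Including the non-dominant 3n term...
    with_3n = time1 * (50_000**2 + 3 * 50_000) / (10_000**2 + 3 * 10_000)
    # ...versus dropping it.
    without_3n = time1 * 50_000**2 / 10_000**2
    print(with_3n, without_3n)    # ~6.0735 vs 6.075 seconds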
Another use of Big-O estimations is to compare the time required for two different algorithms on the same input size. Maybe I have an algorithm that runs in \(O(n^2)\) time and I want to estimate how long it would take to run an \(O(n \log n)\) algorithm on the same input size.
Here we can’t rely on the constant factors cancelling each other out. We are comparing different algorithms that will have different constant factors. But, assuming we focus on large values of \(n\) and the constants are not huge, we can safely ignore them because the growth rates of the two functions are the dominant factor. Our estimate will be less accurate than when we are focusing on one algorithm, but can still provide useful insight.

Example 25.5.3. Sample problem 2.

I have timed a linear search on 10,000,000 items and it takes 8.12 seconds (call this job 1). I want to estimate the time it will take to use binary search instead (job 2).
The problem sizes are the same for both jobs: 10,000,000 items. However, the algorithms will require different amounts of work. Linear search is an \(O(n)\) algorithm, so the work for job 1 will be 10,000,000. For job 2, we are using an \(O(\log_2 n)\) algorithm, so the work will be \(\log_2(10000000)\text{:}\)
\begin{gather*} \frac{10000000}{\log_2(10000000)} = \frac{8.12\textrm{ seconds}}{x} \end{gather*}
So…
\begin{gather*} \frac{10000000}{23.25} = \frac{8.12\textrm{ seconds}}{x} \end{gather*}
Cross multiplying gives:
\begin{gather*} 10000000x = 8.12\textrm{ seconds} \cdot 23.25 \end{gather*}
Solving for time for job 2 gives:
\begin{gather*} x = 0.000019\textrm{ seconds} \end{gather*}
That answer is almost certainly not exactly correct. But even if it is off by a factor of 100, it is clear that binary search is significantly faster!
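The same estimate as a quick Python sketch:

    import math

    time1 = 8.12                   # measured seconds for linear search (job 1)
    work1 = 10_000_000             # O(n) work for job 1
    work2 = math.log2(10_000_000)  # O(log n) work for job 2, ~23.25
    print(time1 * work2 / work1)   # ~0.0000189 seconds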

Checkpoint 25.5.1.

Hint 1.
The work for 3,000,000 items using \(O(\log_{2}n)\) is \(\log_{2}(3000000)\) or ~21.5165.
Hint 2.
Your answer will be VERY close to the original time. \(\log_{2}(3000000)\) and \(\log_{2}(4000000)\) are very close in value.

Checkpoint 25.5.2.

Hint.
The work for 100,000 items using \(O(n \log_{2} n)\) is \(100000 \cdot \log_{2}(100000)\) or ~1,660,964.

Checkpoint 25.5.3.

Hint.
The work for 20,000 items using an \(O(n)\) algorithm is proportional to 20,000. The work for 20,000 items using an \(O(n^2)\) algorithm is proportional to 400,000,000.