With this definition, we can identify five important characteristics of algorithms"- Algorithms are well-ordered.
- Algorithms have unambiguous operations.
- Algorithms have effectively computable operations.
- Algorithms produce a result.
- Algorithms halt in a finite amount of time.
The sorting algorithms you will learn in the next few lessons share two basic operations in common. These operations are the comparison operation and the swap operation. We will look at each one in more detail before we examine our sorting algorithms.
The Comparison Operation : The comparison operation is simply a way of determining which item in a list should come first. If we are sorting a list of numbers from smallest to largest, the comparison operation tells us to place the number with the least value first. If we are sorting a list of letters alphabetically, the comparison operation tells us to place 'a' before 'b', 'b' before 'c', and so on. We will see that a sorting algorithm must usually perform many comparisons in order to correctly sort a list.
The Swap Operation: The swap operation is one way we move items as we are sorting. By swapping small items with large ones, we can place all the items in the correct order. When we use computers for sorting, the swap operation can be a little tricky because of the way computers copy data from one memory location to another. Using the animation below, see if you can correctly determine the algorithm for the swap operation.
Simple Sort:
Of course, we all know that the Simple Card Sort algorithm is not very useful to a computer. However, we can use the same idea as in our Simple Card Sort to write a Simple Sort that can be used by a computer. Let's see what this algorithm looks like and how it can be used to sort numbers in a computer.
Simple Sort Algorithm
- Get a list of unsorted numbers
- Repeat steps 3 through 6 until the unsorted list is empty
- Compare the unsorted numbers
- Select the smallest unsorted number
- Move this number to the sorted list
- Store a maximum value in the place of the smallest number
- Stop
|
Notice that most of the steps in our new algorithm are the same as the steps in the Simple Card Sort. The exception is step 6. We need this extra step because we are no longer moving cards from hand to hand but copying numbers in computer memory. For our algorithm to work, we must replace our original number with a special marker so it will not be considered again. The steps below illustrate how the Simple Sort algorithm works on a computer.
- First, we give the computer a list of unsorted numbers. These numbers are stored in a group of contiguous memory cells called an array. Each memory cell of the array holds a single number.
- As the computer sorts these numbers, it will repeatedly compare them to find the smallest number. This is similar to the comparisons we made when sorting our hand of cards. Each time we compared two cards and kept the smaller of the two. Then we compared this card to the remaining cards until we found a smaller one or checked all the cards. The computer uses the same process only with numbers rather than cards.
Once the smallest number is found, the computer will copy this number to a new array of memory cells and replace the old number with a special number called MAX. MAX is the largest number a single memory cell can hold. None of the remaining numbers can be larger than MAX, so this number is a good choice for marking memory cells that have already been sorted.
Unsorted Array
Sorted Array |
- Next, the computer begins searching for the smallest number in the unsorted list. Although it is easy for us to scan the numbers and select the 2 as smallest, the computer must compare all the memory cells in the unsorted array to be certain which number is smallest. This means the computer must perform six comparisons: (7 < 8), (7 > 5), (5 > 2), (2 < 4), (2 < 6), and finally (2 < 3). Once the comparisons are done, the computer copies 2 to the sorted array and replaces the original 2 with MAX.
Unsorted Array
Sorted Array |
- Now the computer begins searching for the smallest number again. Six more comparisons are required to determine that 3 is smallest: (7 < 8), (7 > 5), (5 < MAX), (5 > 4), (4 < 6), and finally (4 > 3). Now we can see the importance of replacing 2 with MAX in our previous step. If we had not made this change, then 2 would have been selected as the smallest number again. After copying 3 to the sorted array, the computer also replaces the original with MAX.
Unsorted Array
Sorted Array |
- With six more comparisons, the computer selects 4 as the smallest number, copies it to the sorted array, and replaces the original with MAX.
Unsorted Array
Sorted Array |
- The numbers 5, 6, 7, and 8 are also selected in turn by six comparisons, a copy, and a replacement of the original. Once all the memory cells in the unsorted array have been considered, the sorted array contains our original numbers in sorted order.
Unsorted Array
Sorted Array |
Insertion Sort
Now let's look at how the Insertion Sort algorithm would work inside a computer. Below is our modified algorithm for sorting a list of numbers.
Insertion Sort Algorithm
- Get a list of unsorted numbers
- Set a marker for the sorted section after the first number in the list
- Repeat steps 4 through 6 until the unsorted section is empty
- Select the first unsorted number
- Swap this number to the left until it arrives at the correct sorted position
- Advance the marker to the right one position
- Stop
|
This time the steps of our modified algorithm are almost identical to the steps in our card sorting algorithm. We have simply substituted numbers for playing cards and a list of numbers for a hand of cards. The steps below illustrate how the Insertion Sort algorithm works on a computer.
- First, we give the computer a list of unsorted numbers and store them in an array of memory cells.
- To begin the sort, the computer divides the sorted and unsorted sections of the list by placing a marker after the first number. To sort the numbers, it will repeatedly compare the first unsorted number with the numbers in the sorted section. If the unsorted number is smaller than its sorted neighbor, the computer will swap them.
- The first number in the unsorted section is 8, so the computer compares it with the number to the left. Since 8 is greater than 7, these numbers do not need to swapped and the computer simply advances the marker one position. Notice that only one comparison was needed to sort the 8.
- Now the first number in the unsorted section is 5. 5 is less than 8, so the computer swaps these numbers.
5 is also less than 7, so the computer swaps these numbers as well.
Now 5 is in the correct order, so the computer advances the marker one position. This time two comparisons and two swaps were needed to sort the number.
- Now the first number in the unsorted section is 2. 2 is less than 8, 7, and 5, so after three comparisons and three swaps, 2 arrives at the correct sorted position, and the computer advances the sort marker.
- Now the first number in the unsorted section is 4. 4 is less than 8, 7, and 5 but it is not less than 2. This time the computer performs four comparisons and three swaps to put the 4 in the correct order. Only three swaps were needed since the 2 and the 4 did not need to be switched. After these comparisons and swaps, the computer advances the sort marker.
- Now 6 is the first number in the unsorted section. After three comparisons and two swaps, the computer places the 6 in the correct position between 5 and 7. Notice that the computer did not need to compare the 6 with the 2 or the 4 since it already knows these numbers are less than 5. Once the computer finds a number in the sorted section less than 6, it knows it has found the correct position for 6 and it can advance the sort marker.
- The final unsorted number is 3. To find the correct position for 3, the computer must compare it with every number in the unsorted section. However, only five swaps are required since the first number (2) is less than 3. After moving 3 to the correct position and advancing the sort marker, the Insertion Sort is complete since the unsorted section is empty.
Selection Sort
Now that you have a general idea of how the Selection Sort works, let's see how the computer would perform this sort with numbers. Below is our modified algorithm for sorting a list of numbers.
Selection Sort Algorithm
- Get a list of unsorted numbers
- Set a marker for the unsorted section at the front of the list
- Repeat steps 4 - 6 until one number remains in the unsorted section
- Compare all unsorted numbers in order to select the smallest one
- Swap this number with the first number in the unsorted section
- Advance the marker to the right one position
- Stop
|
Again our modified algorithm is almost identical to our card sorting algorithm. We only needed to substitute numbers for playing cards and a list of numbers for a hand of cards. One other slight change is that steps 4 and 5 of the Selection Card Sort have been combined. This is typical since a computer will usually keep track of the smallest number while it compares all the numbers. The steps below illustrate how the Selection Sort algorithm works on a computer.
- First, we give the computer a list of unsorted numbers and store them in an array of memory cells.
- To begin the sort, the computer divides the sorted and unsorted sections of the list by placing a marker before the first number. To sort the numbers, the computer will repeatedly search the unsorted section for the smallest number, swap this number with the first number in the unsorted section, and update the sort marker.
- To find the smallest number in the unsorted section, the computer must make six comparisons: (7 < 8), (7 > 5), (5 > 2), (2 < 4), (2 < 6), and (2 > 3). After these comparisons, the computer knows that 2 is the smallest number, so it swaps this number with 7, the first number in the unsorted section, and advances the sort marker.
- Now five more comparisons are needed to find the smallest number in the unsorted section: (8 > 5), (5 < 7), (5 > 4), (4 < 6), and (4 > 3). After these comparisons, the computer swaps 3, the smallest number in the unsorted section, with 8, the first number in the unsorted section, and advances the sort marker.
- This time four comparisons are needed to determine that 4 is the smallest number in the unsorted section: (5 < 7), (5 > 4), (4 < 6), and (4 < 8). After these comparisons, the computer swaps 4 with 5 and then advances the sort marker.
- After three more comparisons, the computer identifies 5 as the smallest unsorted number: (7 > 5), (5 < 6), and (5 < 8). Then the computer swaps 5 with 7 and advances the sort marker.
- This time only two comparisons are needed to determine that 6 is the smallest number: (7 > 6) and (6 < 8). After these two comparisons, the computer swaps 6 with 7 and then advances the sort marker.
- Now we only need a single comparison to find the right position for 7: (7 < 8). Since 7 is the smallest number and it is also the first number in the unsorted section, the computer does not need to swap this number. It only needs to advance the sort marker. Now there is only one number in the unsorted section, so the list of numbers is sorted and the Selection Sort algorithm is complete.
Algorithm Analysis:
Types of Efficiency Measures
Space Amount of Computer Memory
Time # no of items copied, # no of item swapped, # no of item compared.
Comparision of Sort
Now that you have completed your calculations, let's summarize the results. We already know that the Insertion Sort and the Selection Sort were the most space-efficient, but we have yet to determine which sort is the most time-efficient. We will see that this answer is a little more difficult to determine.
Algorithm | # of memory cells |
Simple Sort Insertion Sort Selection Sort | 14 8 8 |
Algorithm | # of copies | # of comparisons |
Simple Sort Insertion Sort Selection Sort | 7 45 15 | 42 19 21 |
Notice that the Simple Sort required the least amount of copies. We would expect this to be true since it does not swap numbers while sorting. Instead the numbers are copied to a new list in the computer. This is a common tradeoff between time and space. Although the Simple Sort loses space efficiency by using two lists, it gains time efficiency because less copies are required. Of course this does not mean that it is always best to use the Simple Sort to gain more speed. If we are trying to sort a list of 5 million names the Simple Sort would use too much space in the computer's memory. It would be much better to swap items within the list rather than create two lists.
For number of comparisons, the Selection Sort and Insertion Sort were nearly the same. The Simple Sort, however, required twice as many comparisons. We can see the reason for this difference by thinking about how the algorithms work. Each algorithm repeatedly searches for the smallest number and then places this number in the correct position. For the Insertion Sort and the Selection Sort, each iteration of this process reduces the unsorted section by one number. During the next search, the computer does not need to make as many comparisons to find the smallest number. The Simple Sort, however, replaces sorted numbers with a marker called MAX. Each time the computer searches for the smallest number, it must compare all seven memory cells. This approach is much less efficient.
Given the particular set of seven numbers we sorted, the Selection Sort was the most time-efficient. However, it is important to understand that this may not be true for every set of seven numbers. Consider the following example.
If we use the Insertion Sort on these numbers only 8 comparisons and 1 swap would be needed to sort them. However, if we use the Selection Sort, 21 comparisons and 1 swap would be needed. In this case, the Insertion sort is more efficient.
Worst Case Comparison
Now that we have formulas to describe the worst case time efficiency of our sorting algorithms, we can see how the algorithms will perform as we increase the number of items to be sorted. First, let's review our formulas. Notice that the formulas for swaps have been converted to copies by multiplying by 3. This allows us to correctly compare Simple Sort with the other two sorts.
Algorithm | Comparisons | Copies |
Simple Sort | n * (n-1) | n |
Insertion Sort | (n-1) + (n-2) + ... + 1 | 3 * [(n-1) + (n-2) + ... + 1] |
Selection Sort | (n-1) + (n-2) + ... + 1 | 3 * (n-1) |
Using our formulas, we will compute the number of comparisons and copies needed to sort several lists of items. The number of items in each list increases by a power of ten.
# of Items | Simple Sort | Insertion Sort | Selection Sort |
10 | 90 | 45 | 45 |
100 | 9,900 | 4,950 | 4,950 |
1,000 | 999,000 | 499,500 | 499,500 |
10,000 | 99,990,000 | 49,995,000 | 49,995,000 |
From our table of comparisons, we can see two important facts. First, the Simple Sort always requires twice as many comparisons as the Insertion and Selection Sorts. This means we can rewrite the formula for these sorts as follows: (n * (n-1)) / 2. This expression is much easier to use than the summation (n-1) + (n-2) + ... + 1, especially when n is large.
Second, all three sorts require a tremendous number of comparisons as the number of items increases. In fact, we can see with the Simple Sort that the number of comparisons is very close to n2. Take another look at the formula you derived for the number of comparison the Simple Sort requires in the worst case. If we multiply the n into the parentheses, the formula becomes n2 - n. Now you see why our total comparisons is very near n2. You can also see why the number of comparisons needed is growing so quickly.
# of Items | Simple Sort | Insertion Sort | Selection Sort |
10 | 10 | 135 | 27 |
100 | 100 | 14,850 | 297 |
1,000 | 1,000 | 1,498,500 | 2,997 |
10,000 | 10,000 | 149,985,000 | 29,997 |
Now let's compare the number of copies each algorithm requires. Notice how quickly the number of copies required by the Insertion Sort is growing. This makes us suspect that the formula for the Insertion Sort contains an n2 term. In fact, the formula does contain such a term, but we must rewrite the formula to see it. Using the relationship we discovered earlier, we can replace the summation (n-1) + (n-2) + ... + 1 with (n * (n-1)) / 2 and simplify.
# of copies = 3 * [(n-1) + (n-2) + ... + 1] # of copies = 3 * [(n * (n-1)) / 2] # of copies = (3 / 2) * [n * (n-1)] # of copies = (3 / 2) * [n2 - n] |
The simplified formula shows the n2 term that we suspected. Notice that the other two formulas only have an n term rather than a n2 term. This is the reason that they grow slowly as the number of items increases.
Order Notation
In the last lesson, we discovered that we can represent the efficiency of our sorting algorithms with formulas. We derived these formulas by identifying the most important operations in the algorithm (e.g. comparisons, copies, or swaps) and then counting how many of these operations were required to sort a specific number of items. Next, we found a general pattern in the number of operations required at each step and used this pattern to write our formula. One such formula that we derived was the formula to describe the number of comparisons the Simple Sort requires in the worst case. The formula is given below.
# of comparisons = n * (n-1) # of comparisons = n2 - n |
Since the largest term in this formula is n2, we discovered that the number of comparisons needed grows very quickly. We also saw a couple other formulas whose largest term was n, and we found these formulas only showed moderate growth. From these examples, we can see that the size of the largest term in an algorithm's efficiency formula has a very great effect on the performance of an algorithm. In fact, algorithms are classified based on the size of this term. This classification is called "order notation" and it is used to compare the amount of work that different algorithms must perform to do the same job. An algorithm which has n2 as its highest term would be called an O(n2) algorithm (pronounced "order-n-squared algorithm"). An algorithm with n as its highest term would be called an O(n) algorithm (pronounced "order-n algorithm"). Two other common classes of algorithms are O(2n) and O(log n). The graph below shows you how the performance of these four classes of algorithms compares.
|
Although we used sorting algorithms to introduce the idea of efficiency and order notation, these ideas apply to all algorithms. For example, suppose we wrote a search algorithm to find a telephone number in a large list of phone numbers. We could also describe the efficiency of this algorithm using order notation. In this case, n would represent the size of phone directory (i.e. the number of phone numbers). The most important operation in this algorithm would be comparing numbers, so we would be interested in knowing how many comparisons our algorithm would need to find a particular phone number. If our algorithm is efficient, then the largest term in the efficiency formula will be n or log n. If our algorithm is inefficient, then our formula will have a larger term like n2 and 2n.
Summary
Let's take a quick review of all that we have learned about algorithms.
- We began by carefully defining what an algorithm is. In our definition, we saw five important characteristics that an algorithm must have:
- Algorithms are well-ordered.
- Algorithms have unambiguous operations.
- Algorithms have effectively computable operations.
- Algorithms produce a result.
- Algorithms halt in a finite amount of time.
- We discussed three different ways to represent algorithms: plain English, programming languages, and structured English. We found that plain English was too wordy and ambiguous for specifying algorithms. On the other hand, programming languages require knowledge of many details that are not relevant to algorithms. Our compromise was to use structured English to represent algorithms.
- We learned three different sorting algorithms: the Simple Sort, the Insertion Sort, and the Selection Sort. We found that these algorithms actually provide a solution to a class of problems. They are useful not only for sorting cards, but also for names, numbers, dates, etc.
- We compared the space efficiency and the time efficiency of the three sorting algorithms. Although the Selection Sort was the most efficient sort in our example, we found that these results can change depending on the number of items sorted and the order of the items. To perform a better comparison, we found formulas that describe the performance of our sorting algorithms in the worst case.
- We compared the efficiency formulas of the sorts and introduced the idea of order notation. We discovered that the performance of an algorithm depends heavily upon the size of the largest term in the efficiency formula. In order notation, this term defines what class an algorithm belongs to. Four common classes of algorithms are O(log n), O(n), O(n2), and, O(2n).
No comments:
Post a Comment