17-binary-search.py
python-for-j277 / algorithms  ·  Lesson 17
Lesson 17 — Algorithms

Binary Search

A much faster search — but the list must be sorted first.

Binary search is a clever way to find an item by repeatedly halving the part of the list you are still looking in. It is far faster than a linear search on big lists — but it only works if the list is sorted first. Take your time with this one; it rewards a careful read.

The idea: halve the problem each time

Think about finding a word in a dictionary. You do not start at page 1 and read every word. You open it near the middle, see whether your word comes before or after, and throw away the half it cannot be in. You repeat that until you land on it. Binary search does exactly this.

The 'higher or lower' guessing game is the same trick: guess the middle number, find out if the answer is higher or lower, and you have instantly ruled out half the possibilities.

The list must be sortedBinary search only works on a sorted list. Throwing away 'the half it cannot be in' only makes sense if the values are in order. On an unsorted list it gives wrong answers.

First, sort the list with sorted()

Because binary search needs order, you usually sort the list first. Python's sorted() function takes a list and returns a new list with the items in order, smallest to largest.

sorted.py
scores = [56, 12, 91, 23, 8]
ordered = sorted(scores)
print(ordered)
print(scores)
Output
[8, 12, 23, 56, 91]
[56, 12, 91, 23, 8]
How it works
  • sorted() returns a brand-new list with the values in ascending order.
  • It does not change the original — scores is still in its first order afterwards.
  • Store the result in a variable (here ordered) so you can use the sorted version.

sorted() works on text too, putting strings into alphabetical order.

sortwords.py
names = ["Cara", "Ava", "Ben"]
print(sorted(names))
Output
['Ava', 'Ben', 'Cara']
sorted() versus .sort()sorted(list) hands back a new sorted list and leaves the original alone. There is also a .sort() method that re-orders the original list in place. For now, sorted() is the clearer choice.

How binary search works

Binary search keeps track of the part of the list it is still searching, using two markers: low (the first index) and high (the last index). Each pass it checks the middle item.

How it works
  • Find the middle index: mid = (low + high) // 2// keeps it a whole number.
  • If the middle item equals the target, you have found it — stop and return mid.
  • If the target is bigger than the middle item, it must be in the right half, so move low to mid + 1.
  • If the target is smaller, it must be in the left half, so move high to mid - 1.
  • Repeat while low <= high. If low ever passes high, the item is not there.
Every step throws away halfAfter each comparison you discard half of what is left. That is why binary search is so quick — it does not look at most of the list at all.

A worked example

Let us search this sorted list of 10 numbers for the target 23. The indexes run from 0 to 9.

trace.py
numbers = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
#           0  1  2   3   4   5   6   7   8   9
target = 23

Follow the markers as the search narrows. low and high close in until mid lands on the target:

Steplowhighmidnumbers[mid]What happens
10941616 < 23 → search the right half, set low = 5
25975656 > 23 → search the left half, set high = 6
35652323 = 23 → found at index 5

It found 23 in just three checks. A linear search would have checked items 0, 1, 2, 3, 4 and 5 — six checks — to reach the same place.

Binary search in Python

binary.py
def binary_search(items, target):
    low = 0
    high = len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

numbers = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(numbers, 23))
print(binary_search(numbers, 40))
Output
5
-1
How it works
  • low and high start at the two ends of the list.
  • The loop runs while there is still a section left to search (low <= high).
  • Each pass works out mid and compares the middle item with the target.
  • Finding the target returns its index straight away; return ends the function.
  • 40 is not in the list, so low eventually passes high, the loop ends and it returns -1.

Sort, then search

In real use the data may not arrive in order, so sort it first, then search the sorted version.

sortsearch.py
data = [56, 12, 91, 23, 8]
data = sorted(data)
print(data)
print(binary_search(data, 23))
Output
[8, 12, 23, 56, 91]
2
How it works
  • sorted() puts the data in order: [8, 12, 23, 56, 91].
  • 23 now sits at index 2 in the sorted list, so that is what is returned.
  • Always search the sorted list — the index refers to that order, not the original.

Binary search vs linear search

Both find an item in a list, but they work very differently and suit different situations.

Linear searchBinary search
Works onAny listSorted lists only
MethodCheck each item from the startHalve the search area each time
Worst case (n items)Up to n checksAbout log₂n checks
Speed on big listsSlowVery fast

The difference grows enormous as the list gets bigger. Each extra check by binary search can cover roughly double the items:

List sizeLinear (worst case)Binary (worst case)
10 items10 checks4 checks
1,000 items1,000 checks10 checks
1,000,000 items1,000,000 checks20 checks
The trade-offBinary search is dramatically faster, but it needs a sorted list. Sorting takes time too, so for a small or one-off search a linear search can be the simpler choice. For large lists searched many times, sorting once and using binary search wins easily.

What you have learned

  • Binary search repeatedly halves the search area, so it is very fast on large lists.
  • It only works on a sorted list — use sorted() to put items in order first.
  • It tracks the section left to search with low and high, checking the middle each time.
  • It returns the index if found, or -1 if low passes high.
  • Linear search works on any list but is slower; binary search is faster but needs sorting.