Advanced Python: Coroutines

Coroutines are extremely powerful constructs that are often confused with generators. The difference between a generator and a coroutine: a generator only produces values, while a coroutine can also consume them.
How do coroutines work?

  • yield occurs and the generator pauses.
  • send() occurs from outside the function and the generator wakes up.
  • the value sent in is assigned to the variable on the left side of the yield expression.
  • the generator continues processing until it encounters another yield statement.
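The four steps above can be traced with a tiny example. This is only a sketch: the shout function and its variable names are made up for illustration and are not part of the two examples that follow.

```python
def shout():
    """Hypothetical coroutine: upper-cases whatever is sent to it."""
    text = None
    while True:
        # Pause here and hand `text` to the caller. When send() is
        # called, its argument becomes the result of this yield.
        received = yield text
        text = received.upper()

gen = shout()
first = next(gen)            # runs to the first yield, which yields None
second = gen.send("hello")   # wakes up, processes, pauses at yield again
third = gen.send("bye")
print(first, second, third)  # None HELLO BYE
```

Each send call performs one full cycle: wake up, assign, run to the next yield, pause.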

Let’s take two examples.

  1. Scoreboard for a one-on-one match.

Suppose we want to make a score board for a basketball match between Houston and Golden State. There are multiple ways to do this, but let’s use coroutines.

First we need to define the coroutine function.

def tally():
    score = 0
    while True:
        increment = yield score
        score += increment

Then we create one coroutine object per team, Houston and Golden State, and start each one with next:

houston = tally()
next(houston) # returns 0
golden_state = tally()
next(golden_state) # returns 0

Why call next? When next is called on a coroutine object, the code inside runs until it reaches the first yield, returns the value on the right side of yield (here, score), then pauses and waits for the next call.
So the first two values returned are both zero.
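The need for that first next call can be checked directly. The sketch below repeats tally so that it runs on its own; the variable names fresh, primed, raised and start are illustrative. It relies on two standard behaviours of Python generators: sending a non-None value to a generator that has not started raises TypeError, and send(None) is equivalent to next.

```python
def tally():
    score = 0
    while True:
        increment = yield score
        score += increment

fresh = tally()
raised = False
try:
    # The generator has not reached a yield yet, so there is
    # nothing that could receive the value 2.
    fresh.send(2)
except TypeError:
    raised = True
print(raised)  # True

primed = tally()
start = primed.send(None)  # same effect as next(primed)
print(start)  # 0
```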

Then Houston scores 2 points, and right after that Golden State answers with 3 points.

houston.send(2) # returns 2
golden_state.send(3) # returns 3

Next, Houston scores another 2 points and Golden State replies with 2 points of its own.

houston.send(2) # returns 4
golden_state.send(2) # returns 5

Here is how a coroutine works: whenever you call send, the coroutine receives the value you sent, runs until it reaches the next yield, yields the value on the right side of that yield, and pauses until the next send call.
Take Houston’s score. It starts at 0, so the first next call returns 0. Then we send 2: increment is assigned 2, score increases by 2, the loop reaches yield again, and the coroutine yields score (2) and pauses there. Finally we send 2 again: increment is assigned 2, score increases by 2, and the coroutine yields score once more (now 4). Since the yield sits inside an infinite loop, the coroutine never finishes on its own; it keeps accepting values for as long as we send them.
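The Houston walkthrough can be replayed in one go. This is a minimal sketch: the history list and the finished flag are illustrative additions, and close is the standard generator method for stopping a coroutine from the outside, which is how the infinite loop ends in practice.

```python
def tally():
    score = 0
    while True:
        increment = yield score
        score += increment

houston = tally()
history = [next(houston)]      # start the coroutine; the score is 0
for points in (2, 2):          # the two Houston plays from the text
    history.append(houston.send(points))
print(history)                 # [0, 2, 4]

houston.close()                # raises GeneratorExit at the paused yield
finished = False
try:
    next(houston)
except StopIteration:
    finished = True            # a closed generator produces nothing more
print(finished)                # True
```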

  2. Log file parser.
Let’s say we have a log file with the content below.

unrelated log messages
sd 0:0:0:0 Attached Disk Drive
unrelated log messages
sd 0:0:0:0 (SERIAL=ZZ12345)
unrelated log messages
sd 0:0:0:0 [sda] Options
unrelated log messages
XFS ERROR [sda]
unrelated log messages
sd 2:0:0:1 Attached Disk Drive
unrelated log messages
sd 2:0:0:1 (SERIAL=ZZ67890)
unrelated log messages
sd 2:0:0:1 [sdb] Options
unrelated log messages
sd 3:0:1:8 (SERIAL=WW11111)
unrelated log messages
sd 3:0:1:8 [sdc] Options
unrelated log messages
XFS ERROR [sdc]
unrelated log messages

The task is: obtain the serial numbers of all drives that have an XFS ERROR.

We use regular expressions for this task, so what we send to the coroutine is the pattern that the re module uses to find matching lines.

Let’s define a function called get_serials. It takes a file_name as input and yields the serial numbers of all devices (drives) that have an XFS ERROR.

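import re

def get_serials(file_name):
    # Raw strings keep the regex backslashes intact.
    ERROR_PAT = r'XFS ERROR (\[sd[a-z]\])'
    matcher = match_regex(file_name, ERROR_PAT)
    try:
        device = next(matcher)
        while True:
            # Find the bus of the failing device, then its serial number.
            bus = matcher.send(r'(sd \S+) {}'.format(re.escape(device)))
            serial = matcher.send(r'{} \(SERIAL=([^)]*)\)'.format(bus))
            yield serial
            device = matcher.send(ERROR_PAT)
    except StopIteration:
        # match_regex ran out of lines; end this generator cleanly.
        return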

What is missing from the code above is the match_regex function, which we haven’t defined yet. It will be our coroutine: we send it a pattern, and it returns the captured part of the next line that matches that pattern.

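def match_regex(file_name, regex):
    with open(file_name) as f:
        lines = f.readlines()
    # Scan from the end of the file toward the beginning.
    for line in reversed(lines):
        match = re.match(regex, line)
        if match:
            # Yield the first captured group, then wait for the next pattern.
            t = match.groups()[0]
            regex = yield t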

Here we read the log file in reverse order, because an error line appears after the lines that identify the drive.
Note the line regex = yield t. The first next call on match_regex inside get_serials runs the code in match_regex with the initial pattern. It yields the captured group of the first matching line, which is [sdc] from the line XFS ERROR [sdc]. Every later send call replaces regex with a new pattern before the scan continues.
Each iteration of the while loop in get_serials finds the serial number of one drive that has an XFS ERROR. The first send call sends the pattern that finds the drive’s bus, which is something like sd 2:0:0:1; the second send call finds the serial number based on that bus value. The process repeats until there is nothing left to parse.
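To see what each pattern captures, the three regular expressions can be tried on single lines outside the coroutine. This is only a sketch: the input strings are written by hand in the format of the log above, and the names bus_pat and serial_pat are illustrative.

```python
import re

ERROR_PAT = r'XFS ERROR (\[sd[a-z]\])'

# Step 1: the error line gives the device name, brackets included.
device = re.match(ERROR_PAT, 'XFS ERROR [sdc]').group(1)

# Step 2: re.escape makes the brackets literal, so the pattern
# matches the line that ties the device to its bus.
bus_pat = r'(sd \S+) {}'.format(re.escape(device))
bus = re.match(bus_pat, 'sd 3:0:1:8 [sdc] Options').group(1)

# Step 3: the bus locates the line that carries the serial number.
serial_pat = r'{} \(SERIAL=([^)]*)\)'.format(bus)
serial = re.match(serial_pat, 'sd 3:0:1:8 (SERIAL=WW11111)').group(1)

print(device)  # [sdc]
print(bus)     # sd 3:0:1:8
print(serial)  # WW11111
```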
The returned values (serial numbers) are: WW11111, ZZ12345
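The whole pipeline can be checked end to end with a temporary file. The sketch below makes several assumptions. SAMPLE_LOG is a shortened, hand-written log in the format shown earlier, with the XFS ERROR lines placed after the drives they refer to. The patterns are raw strings. The try/except StopIteration in get_serials is there because, since Python 3.7 (PEP 479), a StopIteration that escapes inside a generator is turned into a RuntimeError.

```python
import os
import re
import tempfile

def match_regex(file_name, regex):
    with open(file_name) as f:
        lines = f.readlines()
    for line in reversed(lines):
        match = re.match(regex, line)
        if match:
            # Yield the captured group and receive the next pattern.
            regex = yield match.groups()[0]

def get_serials(file_name):
    ERROR_PAT = r'XFS ERROR (\[sd[a-z]\])'
    matcher = match_regex(file_name, ERROR_PAT)
    try:
        device = next(matcher)
        while True:
            bus = matcher.send(r'(sd \S+) {}'.format(re.escape(device)))
            serial = matcher.send(r'{} \(SERIAL=([^)]*)\)'.format(bus))
            yield serial
            device = matcher.send(ERROR_PAT)
    except StopIteration:
        return  # no more lines to scan

# Hand-written sample in the format of the log above (an assumption).
SAMPLE_LOG = """\
unrelated log messages
sd 0:0:0:0 Attached Disk Drive
sd 0:0:0:0 (SERIAL=ZZ12345)
sd 0:0:0:0 [sda] Options
XFS ERROR [sda]
sd 2:0:0:1 Attached Disk Drive
sd 2:0:0:1 (SERIAL=ZZ67890)
sd 2:0:0:1 [sdb] Options
unrelated log messages
sd 3:0:1:8 (SERIAL=WW11111)
sd 3:0:1:8 [sdc] Options
XFS ERROR [sdc]
unrelated log messages
"""

with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write(SAMPLE_LOG)
    path = tmp.name

try:
    serials = list(get_serials(path))
finally:
    os.remove(path)

print(serials)  # ['WW11111', 'ZZ12345']
```

The drive sdb has no error line, so its serial number ZZ67890 is skipped.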

Those are two examples that demonstrate the use of coroutines. In real-life software development we rarely see coroutines in action, but they are still worth knowing.