Understanding The Python Reduce Function With Examples

While there are plenty of Python libraries available for data manipulation, they can be overkill for simple data transformations. The Python standard library comes with functools.reduce(), one of the most widely used functions in functional programming, which is handy for simple data transformations. If you've heard of reduce() or functional programming but are unsure of what they really are and how they could help you write better Python code, this is the article for you.

In this tutorial, you'll:

  • Understand the basic principles of functional programming
  • Understand what the Python reduce() function is
  • Learn how the Python reduce() function can be used for basic data massaging and deriving insights from data
  • Be introduced to accumulate(), which offers functionality similar to reduce()
  • Recap what lambda functions and iterables are

After reading this tutorial, you should be able to use Python reduce() to derive useful information from raw data, and to perform data transformation. You should also be able to identify the differences and use cases for reduce() and accumulate and use them accordingly.

What Is Functional Programming in Python?

Functional programming is a programming paradigm that focuses on writing declarative code and pure functions without any side effects. Without going deep into its philosophy and history, when writing functional-style code, you write functions that:

  • Don't change their input arguments or any variables outside their own scope
  • Don't make any network requests
  • Don't print anything to the console
  • Don't include any randomness
  • Always return the same value as long as the input arguments remain unchanged
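
To make the list above concrete, here is a minimal sketch contrasting a pure function with an impure one. The function names doubled and double_in_place are hypothetical and exist only for this illustration.

```python
# Hypothetical helper names, used only to illustrate the idea of purity.
def doubled(values):
    """Pure: builds and returns a new list, leaving `values` untouched."""
    return [value * 2 for value in values]


def double_in_place(values):
    """Impure: overwrites the caller's list, which is a side effect."""
    for index, value in enumerate(values):
        values[index] = value * 2
    return values


original = [1, 2, 3]
result = doubled(original)
# original is still [1, 2, 3]; result is a separate list [2, 4, 6]

mutated = [1, 2, 3]
double_in_place(mutated)
# mutated is now [2, 4, 6]: the input itself was changed
```

Calling doubled() twice with the same list always gives the same answer, while calling double_in_place() twice gives a different answer each time because the input keeps changing.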

Python is a multi-paradigm language and supports functional programming. It comes with much more than reduce() in terms of functional programming. To learn more about functional programming, you can see the article about map() in Python and the HOWTO guide in the official documentation.

What Is Python Reduce?

Let's get down to business: what exactly is reduce() in Python? You'll start off with a naive example: multiplying a series of integers.

>>> # Using for loop, the imperative way
>>> multipliers = [2, 10, 4, 16]
>>> accumulation = 1
>>> for number in multipliers:
...     accumulation *= number

>>> accumulation
1280

In the code snippet, you used a for loop to multiply all the numbers and stored the result in the variable accumulation.

Next, let's take a look at an alternative solution with reduce():

>>> # Using reduce(), a functional approach
>>> from functools import reduce

>>> multipliers = [2, 10, 4, 16]

>>> accumulation = reduce(
...     lambda acc, number: acc * number,
...     multipliers
... )

>>> accumulation
1280

In this snippet, instead of using a for loop, you imported reduce() from functools and used it to multiply the numbers, storing the result in the variable accumulation.

This is just to give you a brief idea of how reduce() can be leveraged as an alternative solution to multiply integers.

In fact, Python 3.8 added a new function, math.prod(), that multiplies all the numbers in an iterable.
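
As a quick check, the sketch below runs both approaches on the same list. It assumes Python 3.8 or newer, since math.prod() is not available in earlier versions.

```python
import math
from functools import reduce

multipliers = [2, 10, 4, 16]

# The reduce() version from the example above
product_with_reduce = reduce(lambda acc, number: acc * number, multipliers)

# The dedicated standard library function (Python 3.8+)
product_with_prod = math.prod(multipliers)

# Both names are bound to 1280
```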

The purpose of the snippet above is to show you an alternative to writing imperative code with for loops. You'll see how reduce() can be used in more complex examples in no time.

Function Signature of Python Reduce

Let's dive into the function signature of reduce():

functools.reduce(function, iterable[, initializer])

reduce() takes:

  • function: the first argument, the function to apply; this tutorial refers to it as the predicate of reduce()
  • iterable: the second argument, which supplies the values to be passed to function
  • initializer: the third argument, the value to start with

Notice that the arguments in square brackets [ ] are optional.

An iterable is an object that is capable of returning its elements one at a time. Examples of iterables include all sequence types such as str, list, and tuple, non-sequence containers such as set and dict, and any object that implements the __iter__() method. Iterables can be used with for loops and with functions that accept an iterable. To learn more about iterables, you can read this Real Python article and this glossary for the full definition.
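
Because reduce() only needs an iterable, it is not limited to lists. The short sketch below, with variable names chosen just for this illustration, feeds it a string, a tuple, and a generator expression.

```python
from functools import reduce

# A str yields its characters one at a time; prepending each one reverses it
reversed_text = reduce(lambda acc, char: char + acc, "abc")
# 'cba'

# A tuple works the same way as a list; max() is used as the two-argument function
largest = reduce(max, (3, 9, 4))
# 9

# A generator expression is consumed lazily, one square at a time
sum_of_squares = reduce(lambda acc, n: acc + n, (n * n for n in range(1, 4)))
# 1 + 4 + 9 == 14
```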

Next, you'll learn how the Python reduce() function works. It is roughly equivalent to:

>>> # Rough implementation of reduce(), taken from Python official documentation:
>>> # https://docs.python.org/3/library/functools.html#functools.reduce

>>> def reduce(function, iterable, initializer=None):
...     it = iter(iterable)
...     if initializer is None:
...         value = next(it)
...     else:
...         value = initializer
...     for element in it:
...         value = function(value, element)
...     return value

reduce() applies the provided function cumulatively to the elements of the iterable, then returns the final result. In other words, reduce() builds up the result by updating it as it walks through every item of the input iterable.

If an initializer is provided, it is used as the starting value before any element of the iterable is processed. You'll see examples of when the initializer is needed in the following sections.
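
One situation where the initializer matters is an empty iterable. The sketch below uses a hypothetical helper named add to show what functools.reduce() does with and without a starting value.

```python
from functools import reduce


def add(acc, number):
    return acc + number


# With an initializer, an empty iterable simply returns the initializer
with_initializer = reduce(add, [], 0)
# 0

# With a single element and no initializer, that element is returned as-is
single_element = reduce(add, [5])
# 5

# Without an initializer, an empty iterable raises TypeError
try:
    reduce(add, [])
except TypeError as error:
    empty_error = error
else:
    empty_error = None
```

Passing an initializer is therefore a simple way to make a reduction safe when the input might be empty.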

However, you must also know that the predicate function is expected to behave this way:

  1. It takes 2 positional arguments: the first is the accumulated value, the second is the current element of the iterable
  2. It always returns the new accumulated value

This is the common code pattern of a predicate:

def predicate(accumulator, current_element):
    # do something
    return accumulator

In the previous example of multiplying a list of integers, you may also have noticed the lambda keyword. If you are unfamiliar with it, a lambda is essentially an anonymous function that implicitly returns the value of its single expression. To read more about lambda functions, you can refer to this article.
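
To see that a lambda is just a compact function, the sketch below writes the multiplication predicate both ways. The name multiply is a hypothetical one picked for this comparison.

```python
from functools import reduce

multipliers = [2, 10, 4, 16]


# A named function with an explicit return statement
def multiply(acc, number):
    return acc * number


# The same logic as an anonymous function: the expression is the return value
product_with_def = reduce(multiply, multipliers)
product_with_lambda = reduce(lambda acc, number: acc * number, multipliers)

# Both produce 1280
```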

Python Reduce in Examples

In this section, you will look at multiple use cases of reduce().

Example 1: Calculating the Number of Occurrences

In this example, you'll see how you can leverage the Python reduce() function to count the number of even numbers in a list.

First, let's try to achieve this with a for loop:

>>> values = [22, 4, 12, 43, 19, 71, 20]
>>> # Using for loop, the imperative way
>>> count = 0
>>> for number in values:
...     if not number % 2:
...         count += 1

>>> count
4

Now that you've counted the occurrences of even numbers with a for loop, you can try the functional programming way with reduce():

>>> from functools import reduce
>>> values = [22, 4, 12, 43, 19, 71, 20]
>>> count = reduce(
...     lambda acc, num: acc if num % 2 else acc + 1,
...     values
... )
>>> count
25

However, if you run the code above, you'll get 25 instead of 4.

Here is where initializer, the third argument of reduce(), comes in. If the initializer is present, it is used as the value of acc (the first argument of the predicate) in the first iteration.

In the previous example, since the initializer is absent, the acc of the first iteration is instead the first element of the iterable, 22, and the iteration starts from the second element. The three remaining even numbers (4, 12, and 20) each add 1, which gives 25.
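
To watch acc change step by step, here is a small sketch built around a hypothetical helper, trace_reduce(), which mimics reduce() without an initializer and records every value of acc. It is not part of the standard library.

```python
def trace_reduce(function, iterable):
    """Mimic reduce() without an initializer, recording every value of acc."""
    it = iter(iterable)
    acc = next(it)  # The first element becomes the starting value
    steps = [acc]
    for element in it:
        acc = function(acc, element)
        steps.append(acc)
    return steps


values = [22, 4, 12, 43, 19, 71, 20]
steps = trace_reduce(lambda acc, num: acc if num % 2 else acc + 1, values)
# steps == [22, 23, 24, 24, 24, 24, 25]
```

The first entry is the element 22 itself rather than a count, which is why the final value ends up at 25.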

Now that you understand how the initializer works, let's correct the faulty code by adding one:

>>> # Using reduce() with initializer
>>> from functools import reduce
>>> values = [22, 4, 12, 43, 19, 71, 20]
>>> count = reduce(
...     lambda acc, num: acc if num % 2 else acc + 1,
...     values,
...     0    # Initializer
... )
>>> count
4

You set the initializer to 0. Run the code above and you should get 4 as the value of count.

Let's zoom into the predicate further:

lambda acc, num: acc if num % 2 else acc + 1

From the lambda function above:

  • If the number is even, the incremented acc is passed to the next iteration
  • Else, return the current value of acc

Don't forget that the value of the first argument (acc in this example) of the predicate will always be the value returned from the previous iteration!

Example 2: Creating a New dict Structure

In this example, you'll be given a scenario in which you need to extract RSVP responses to event invitations. You'll get a list of invitees as below:

>>> list_of_invitees = [
...     {"email": "alex@example.com", "name": "Alex", "status": "attending"},
...     {"email": "brian@example.com", "name": "Brian", "status": "declined"},
...     {"email": "carol@example.com", "name": "Carol", "status": "pending"},
...     {"email": "derek@example.com", "name": "Derek", "status": "attending"},
...     {"email": "ellen@example.com", "name": "Ellen", "status": "attending"}
... ]

Let's say you want to summarize the RSVP status of the invitations by creating a dictionary like this:

{
   "alex@example.com": "attending",
   "brian@example.com": "declined",
   "carol@example.com": "pending",
   "derek@example.com": "attending",
   "ellen@example.com": "attending"
}

The snippet above will be the result of your data transformation. The data is transformed from a list of dictionaries, each of which contains the invitee's email, name, and RSVP status, into a single dictionary of RSVP statuses keyed by the invitees' emails.

To achieve it, you can take advantage of the Python reduce function. First, you define your predicate function.

>>> def transform_data(acc, invitee):
...     acc[invitee["email"]] = invitee["status"]
...     return acc

In transform_data(), you add the invitee["email"] as a new key into the acc dictionary and then assign the invitee["status"] as the value.

As you can see, the predicate doesn't need to be a lambda function. It can be an ordinary function, method, or any Python callable.

Then, feed the reduce() function with your predicate transform_data, the iterable list_of_invitees, and an empty dictionary {} as the initializer, and you'll see the results show up in the blink of an eye.

>>> results = reduce(
...     transform_data,
...     list_of_invitees,
...     {}    # Initializer
... )

>>> results
{'alex@example.com': 'attending',
'brian@example.com': 'declined',
'carol@example.com': 'pending',
'derek@example.com': 'attending',
'ellen@example.com': 'attending'}

Congratulations! You just created a new dictionary structure with the reduce() function. The keys of your dictionary are the invitees' emails, and the value of each key is the corresponding RSVP status.
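
If you want to double-check a reduction like this one, a dict comprehension gives the same mapping. The sketch below uses a shortened copy of the invitee list and is meant only as a cross-check, not as a replacement for the reduce() approach shown above.

```python
from functools import reduce

# Shortened copy of the invitee list, kept small for illustration
list_of_invitees = [
    {"email": "alex@example.com", "name": "Alex", "status": "attending"},
    {"email": "brian@example.com", "name": "Brian", "status": "declined"},
    {"email": "carol@example.com", "name": "Carol", "status": "pending"},
]


def transform_data(acc, invitee):
    acc[invitee["email"]] = invitee["status"]
    return acc


with_reduce = reduce(transform_data, list_of_invitees, {})

# Equivalent dict comprehension, used here only as a cross-check
with_comprehension = {
    invitee["email"]: invitee["status"] for invitee in list_of_invitees
}
# with_reduce == with_comprehension
```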

It is normal to be unsure whether to use an initializer and, if so, what to pass as the initializer. A thought process you can follow is:

  1. Ask yourself: "Do the data structure of the expected result and the elements of your iterable match?" If not, use an initializer; otherwise, you may not need one
  2. If an initializer is required, its data structure or data type should generally be the same as that of the expected result

Of course, this thought process is not foolproof, as no two situations are the same.
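
The two questions above can be sketched with a small list of strings. The variable names are chosen only for this illustration.

```python
from functools import reduce

words = ["reduce", "map", "filter"]

# The result (a str) has the same type as the elements (str):
# the first element is a valid starting value, so no initializer is needed.
longest = reduce(
    lambda acc, word: word if len(word) > len(acc) else acc,
    words
)
# 'reduce'

# The result (an int) differs from the elements (str):
# start from 0, which has the same type as the expected result.
total_length = reduce(
    lambda acc, word: acc + len(word),
    words,
    0    # Initializer
)
# 6 + 3 + 6 == 15
```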

Example 3: Derive Insights From a List of Event Attendees

In the third example, you'll be given a list of attendees (note that this example has nothing to do with Example 2), and your task is to tell:

  1. The number of attendees who brought guests and the total number of guests
  2. How many vegans and non-vegans attended

>>> # Your given list of attendees
>>> list_of_attendees = [
...     {"name": "Zeke", "vegan": True, "brought_guests": True,
...      "guests": [{"name": "Amanda", "vegan": False},
...                 {"name": "Wayne", "vegan": True}]},
...     {"name": "Xavier", "vegan": True, "brought_guests": False},
...     {"name": "Yohanna", "vegan": False,
...      "brought_guests": True,
...      "guests": [{"name": "Lily", "vegan": True},
...                 {"name": "Stefano", "vegan": True}]},
...     {"name": "Kael", "vegan": False, "brought_guests": False},
...     {"name": "Landon", "vegan": True, "brought_guests": False},
... ]

Task 1: Calculate the Number of Attendees Who Brought Guests

The expected output should be a dictionary:

 {
     "guest_who_brought_guests": 2,
     "total_guests": 9
 }

Feel free to open up your Python interpreter and tickle your brain before moving to the solution presented next.

Sample Solution to Task 1

>>> def derive_guest_count(acc, attendee):
...     acc["total_guests"] += 1
...
...     if attendee["brought_guests"]:
...         acc["guest_who_brought_guests"] += 1
...         acc["total_guests"] += len(attendee["guests"])
...
...     return acc

>>> results = reduce(
...     derive_guest_count,
...     list_of_attendees,
...     {   # Initializer
...         "guest_who_brought_guests": 0,
...         "total_guests": 0
...     }
... )

Let's dive into the predicate (derive_guest_count()).

  • In the first line of the function body, you increment the count of total_guests.
  • Subsequently, you check whether the attendee brought guests by using an if statement. If so, you increment the count of guest_who_brought_guests and add the number of guests who came along (len(attendee["guests"])) to the count of total_guests.

After defining the predicate, you feed the reduce() function with your predicate, the list of attendees (list_of_attendees), and an initializer dictionary with the keys:

  • guest_who_brought_guests: to keep track of the number of attendees who brought guests
  • total_guests: to keep the count of all guests who attended
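
The sample solution does not display the value of results, so here is a self-contained sketch that reruns it. The attendee data is a copy of the list from this example, trimmed to the fields the predicate actually reads.

```python
from functools import reduce

# Copy of the attendee data from this example, trimmed to the fields used here
list_of_attendees = [
    {"name": "Zeke", "brought_guests": True,
     "guests": [{"name": "Amanda"}, {"name": "Wayne"}]},
    {"name": "Xavier", "brought_guests": False},
    {"name": "Yohanna", "brought_guests": True,
     "guests": [{"name": "Lily"}, {"name": "Stefano"}]},
    {"name": "Kael", "brought_guests": False},
    {"name": "Landon", "brought_guests": False},
]


def derive_guest_count(acc, attendee):
    acc["total_guests"] += 1  # The attendee counts as one guest

    if attendee["brought_guests"]:
        acc["guest_who_brought_guests"] += 1
        acc["total_guests"] += len(attendee["guests"])

    return acc


results = reduce(
    derive_guest_count,
    list_of_attendees,
    {"guest_who_brought_guests": 0, "total_guests": 0},  # Initializer
)
# results == {"guest_who_brought_guests": 2, "total_guests": 9}
```

Five attendees plus the four guests they brought give the expected total of 9.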

Task 2: Calculate the Number of Vegans and Non-Vegans

In the second task, with the same list of attendees, you're asked to derive the numbers of vegans and non-vegans. The expected output should be:

{
    "vegan": 6,
    "non_vegan": 3
}

Again, try to come up with a solution before moving forward.

Sample Solution to Task 2

>>> def derive_vegan_info(acc, attendee):
...     if attendee["vegan"]:
...         acc["vegan"] += 1
...     else:
...         acc["non_vegan"] += 1
...
...     if attendee.get("brought_guests"):
...         for guest_brought in attendee["guests"]:
...             # Check guests recursively
...             acc = derive_vegan_info(acc, guest_brought)
...
...     return acc

>>> results = reduce(
...     derive_vegan_info,
...     list_of_attendees,
...     {"vegan": 0, "non_vegan": 0}    # Initializer
... )
>>> results
{'vegan': 6, 'non_vegan': 3}

Let's take a look at the predicate (derive_vegan_info):

  • You first increment the count of vegan if the value of attendee["vegan"] is True. Otherwise, you increment the value of non_vegan
  • Next, you check whether the attendee brought along any guests. If so, you iterate through the list of guests they brought. For every such guest, you derive the vegan info recursively by invoking derive_vegan_info with the accumulated value (acc) and the guest's info (guest_brought).

If you are new to the concept of recursion, check this Real Python article.

By now, I hope you understand how the Python reduce() function works. Keep in mind that it takes a bit of practice to get accustomed to it.

The Close Sibling of Python Reduce Function: accumulate()

In Python, the reduce() function is often associated with its close relative, itertools.accumulate(). With reduce(), all you get is the end result after all the iterations have run internally. What happens if you need the intermediate results of each iteration? This is where accumulate() shines.

Just as reduce() has to be explicitly imported from functools, accumulate() has to be imported from the itertools module.

Remember that at the very beginning of this tutorial, you saw multiplication using reduce(). Let's visualize the intermediate results with accumulate():

>>> from itertools import accumulate
>>> numbers = [2, 10, 4, 16]
>>> accumulation = accumulate(
...     numbers,    # Iterable
...     lambda acc, number: acc * number    # Predicate
... )

>>> list(accumulation)
[2, 20, 80, 1280]

Since the return value of accumulate() is an iterator of type itertools.accumulate, you have to explicitly convert it to a sequence (in this case, a list) to view the results.

Readers with sharp eyes will notice the function signature of accumulate() is slightly different. The first argument is iterable and the predicate is now the second positional argument.

Let's take a look at the function signature of itertools.accumulate():

itertools.accumulate(iterable[, func, *, initial=None])

Notice that the arguments in square brackets [ ] are optional. The default value of the second positional argument, func, is operator.add. The operator module exports a set of functions corresponding to the intrinsic operators of Python. For example, operator.add(x, y) is equivalent to the expression x + y.

That is, if the second argument func is absent, accumulate() by default produces the running sums of the elements in the iterable.

Note that the optional keyword argument initial was added in Python 3.8. If you are using an earlier version, the function signature is itertools.accumulate(iterable[, func]).
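
The default func and the initial keyword can be seen side by side in the short sketch below. The second call assumes Python 3.8 or newer because it relies on initial.

```python
from itertools import accumulate

numbers = [2, 10, 4, 16]

# No func given: accumulate() falls back to addition and yields running sums
running_sums = list(accumulate(numbers))
# [2, 12, 16, 32]

# With initial (Python 3.8+), the output starts with that value,
# so it has one more element than the input
running_products = list(
    accumulate(numbers, lambda acc, number: acc * number, initial=1)
)
# [1, 2, 20, 80, 1280]
```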

The operator module also exports mul(), which is the multiplication counterpart of add(). To further simplify the accumulate() example above, you can instead write:

>>> import operator
>>> from itertools import accumulate
>>> numbers = [2, 10, 4, 16]
>>> accumulation = accumulate(
...     numbers,    # Iterable
...     operator.mul    # This line has changed
... )

>>> list(accumulation)
[2, 20, 80, 1280]

Another difference between reduce() and accumulate() is that the latter returns an itertools.accumulate object, which is an iterator. To iterate through the results, feed the returned object into a for loop or any function that takes an iterable. With itertools.accumulate(), you get all the values obtained in the accumulation process, while with functools.reduce() you get just the last one.
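
The relationship between the two functions can be checked directly. The sketch below also shows one consequence of accumulate() returning an iterator: once consumed, it yields nothing on a second pass.

```python
import operator
from functools import reduce
from itertools import accumulate

numbers = [2, 10, 4, 16]

final_product = reduce(operator.mul, numbers)
running_products = list(accumulate(numbers, operator.mul))
# The last intermediate value equals the result of reduce():
# running_products[-1] == final_product == 1280

# An itertools.accumulate object can only be consumed once
lazy = accumulate(numbers, operator.mul)
first_pass = list(lazy)   # [2, 20, 80, 1280]
second_pass = list(lazy)  # [] because the iterator is already exhausted
```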

Anti-Patterns of Python Reduce

The anti-patterns of using the Python reduce() function largely stem from the principles of functional programming. While using reduce():

  1. You shouldn't mutate any arguments other than the accumulated value
  2. You shouldn't create any side effects in your predicate function

Here is a demonstration of a bad predicate, using the list of attendees you've seen in Example 3:

# A bad predicate
def derive_guest_count(acc, attendee):
    # Anti-pattern 1: Mutating the input argument
    attendee["processed"] = True

    # Anti-pattern 2: Creating side-effect, printing to console
    print(f"Processing {attendee['name']}")

    # The lines below remain unchanged
    acc["total_guests"] += 1

    if attendee["brought_guests"]:
        acc["guest_who_brought_guests"] += 1
        acc["total_guests"] += len(attendee["guests"])

    return acc

In the code above, the predicate mutates the input argument attendee by setting its processed key to True.

It also prints a statement to the console using the print() function.

Both of these are common mistakes even for some experienced Pythonistas.
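
For contrast, here is one possible way to write the guest-counting predicate with no side effects at all. This is a sketch rather than the only correct form: the name count_guests_pure is hypothetical, the attendee list is a shortened copy, and it goes one step further than the rules above by also leaving the accumulator untouched and returning a fresh dictionary on every call.

```python
from functools import reduce


def count_guests_pure(acc, attendee):
    """Return a new accumulator instead of modifying acc or attendee."""
    brought = attendee["brought_guests"]
    extra_guests = len(attendee["guests"]) if brought else 0
    return {
        "guest_who_brought_guests": acc["guest_who_brought_guests"] + int(brought),
        "total_guests": acc["total_guests"] + 1 + extra_guests,
    }


# Shortened copy of the attendee list, kept small for illustration
attendees = [
    {"name": "Zeke", "brought_guests": True,
     "guests": [{"name": "Amanda"}, {"name": "Wayne"}]},
    {"name": "Xavier", "brought_guests": False},
]

initial = {"guest_who_brought_guests": 0, "total_guests": 0}
results = reduce(count_guests_pure, attendees, initial)
# results == {"guest_who_brought_guests": 1, "total_guests": 4}
# initial is unchanged, so it can safely be reused for another reduction
```

The trade-off is that a new dictionary is created at every step, which costs a little more than updating acc in place.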

Conclusion

In this tutorial:

  • You've learned how reduce() in Python works and how it can be used for deriving insights and transforming data, as well as some of its anti-patterns
  • You've been introduced to a similar function, accumulate(), which gathers the intermediate results, and you've seen how it differs from reduce()
  • You've been introduced to the operator module, which houses many convenient functions that correspond to Python operators

Now, you can leverage reduce() to massage data and to extract useful information from raw data without relying on external packages. Any ideas or use cases where reduce() or accumulate() can be put to good use? Tell us how you use them by leaving a comment below. Have fun reducing!
