While there are plenty of Python libraries available for data manipulation, they can be overkill for simple data transformation. The Python standard library comes with the functools.reduce() function, one of the most widely used functions in functional programming, which is handy for simple data transformation. If you have heard of reduce() or functional programming but are unsure of what they really are and how they could help you write better Python code, this is the article for you.
In this tutorial, you'll:
- Understand the basic principles of functional programming
- Understand what the Python reduce() function is
- Learn how the Python reduce() function can be used for basic data massaging and deriving insights from data
- Be introduced to accumulate(), which offers functionality similar to the reduce() function
- Recap what lambda functions and iterables are
After reading this tutorial, you should be able to use Python reduce() to derive useful information from raw data and to perform data transformation. You should also be able to identify the differences between reduce() and accumulate(), recognize their use cases, and use them accordingly.
What Is Functional Programming in Python?
Functional programming is a programming paradigm that focuses on writing declarative code and pure functions without any side effects. Without going deep into its philosophy and history, when writing functional-style code, you write functions that:
- Don't change any input arguments or variables within their scope
- Don't make any network requests
- Don't print anything to the console
- Don't include any randomness
- Always return the same value as long as the input arguments remain unchanged
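As a small illustration of the rules above, the sketch below contrasts an impure function with a pure one. Both function names are hypothetical and exist only for this example.

```python
# Hypothetical helpers, for illustration only.

def add_tax_impure(prices, rate):
    # Impure: mutates the caller's list in place and prints to the console.
    for index, price in enumerate(prices):
        prices[index] = price * (1 + rate)
    print("Updated prices:", prices)
    return prices


def add_tax_pure(prices, rate):
    # Pure: builds and returns a new list; the input list stays untouched
    # and the same arguments always produce the same result.
    return [price * (1 + rate) for price in prices]


original = [10.0, 20.0]
taxed = add_tax_pure(original, 0.5)
# original is still [10.0, 20.0]; taxed is [15.0, 30.0]
```

Calling add_tax_impure(original, 0.5) instead would overwrite the contents of original, which is exactly the kind of hidden change that functional-style code tries to avoid.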
Python is a multi-paradigm language and supports functional programming. It offers much more than reduce() in terms of functional programming. To learn more about functional programming, you can see the article about map() in Python and the HOWTO guide in the official documentation.
What Is Python Reduce?
Let's get down to business: what exactly is reduce() in Python? You'll start off with a naive example, multiplying a series of integers.
>>> # Using for loop, the imperative way
>>> multipliers = [2, 10, 4, 16]
>>> accumulation = 1
>>> for number in multipliers:
...     accumulation *= number
>>> accumulation
1280
In this code snippet, you used a for loop to multiply all the numbers and stored the result in the variable accumulation. Next, let's take a look at an alternative solution with reduce():
>>> # Using reduce(), a functional approach
>>> from functools import reduce
>>> multipliers = [2, 10, 4, 16]
>>> accumulation = reduce(
... lambda acc, number: acc * number,
... multipliers
... )
>>> accumulation
1280
In this code snippet, instead of using a for loop, you imported reduce() from functools and used it to multiply the numbers, storing the result in the variable accumulation.
This is just to give you a brief idea of how reduce() can be leveraged as an alternative solution to multiply integers.
In fact, since Python 3.8, the function math.prod() has been available to multiply the numbers in an iterable.
The purpose of the snippet above is to show you an alternative to writing imperative code with for loops. You'll see how reduce() can be used in more complex examples in no time.
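For comparison, here is a minimal sketch that computes the same product with both reduce() and math.prod(). It assumes Python 3.8 or later, since math.prod() is not available in earlier versions.

```python
import math
from functools import reduce

multipliers = [2, 10, 4, 16]

# The reduce() version from the example above.
product_with_reduce = reduce(lambda acc, number: acc * number, multipliers)

# The dedicated standard-library function (Python 3.8+).
product_with_prod = math.prod(multipliers)

# Both values are 1280.
```

When a dedicated function such as math.prod() or sum() already exists, it is usually the more readable choice; reduce() becomes more useful when no such function exists.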
Function Signature of Python Reduce
Let's dive into the function signature of reduce():
functools.reduce(function, iterable[, initializer])
reduce() takes:
- function: the first argument, which defines the function to apply, also known as the predicate of the reduce function
- iterable: the second argument, with the values to be passed to function
- initializer: the third argument, which is the value to start with
Notice that the arguments in square brackets [ ] are optional.
An iterable is an object that is capable of returning one element at a time. Examples of iterables include all sequence types such as str, list, and tuple, other containers such as set and dict, and any object that implements the __iter__() method. Iterables can be used with for loops and with functions that accept an iterable. To learn more about iterables, you can read this Real Python article and this glossary entry for the full definition.
Next, you'll learn how the Python reduce() function works. It is roughly equivalent to:
>>> # Rough implementation of reduce(), taken from the official Python documentation:
>>> # https://docs.python.org/3/library/functools.html#functools.reduce
>>> def reduce(function, iterable, initializer=None):
...     it = iter(iterable)
...     if initializer is None:
...         value = next(it)
...     else:
...         value = initializer
...     for element in it:
...         value = function(value, element)
...     return value
reduce() applies the provided function cumulatively to the elements of the iterable, then returns the final result. In other words, reduce() builds up the result by updating it while walking through every item of the input iterable.
If an initializer is provided, it is used as the starting value, placed before the elements of the iterable. You'll see examples of when the initializer is needed in the subsequent parts.
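The effect of the initializer is easiest to see at the edges. The sketch below, which uses a hypothetical add helper, shows what the built-in functools.reduce() does with an empty iterable and with a single-element iterable.

```python
from functools import reduce


def add(acc, item):
    # Hypothetical helper used only for this demonstration.
    return acc + item


# With an initializer, an empty iterable simply returns the initializer.
empty_with_initializer = reduce(add, [], 0)

# With a single element and no initializer, that element is returned
# and the function is never called.
single_element = reduce(add, [5])

# With no elements and no initializer there is nothing to return,
# so the built-in reduce() raises a TypeError.
try:
    reduce(add, [])
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

Passing an initializer is therefore a simple way to make a reduction safe for input that may be empty.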
However, you must also know that the predicate function is constrained to behave this way:
- It takes 2 positional arguments: the first is the accumulated value, the second is the current element of the iterable
- It always returns the updated accumulated value, which is passed to the next call
This is the common code pattern of a predicate:
def predicate(accumulator, current_element):
    # do something
    return accumulator
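As one concrete instance of this pattern, the following sketch finds the largest number in a list. The function name keep_larger is hypothetical; in real code the built-in max() would be the natural choice.

```python
from functools import reduce


def keep_larger(accumulator, current_element):
    # The accumulator holds the largest value seen so far;
    # whatever is returned here becomes the accumulator of the next call.
    if current_element > accumulator:
        return current_element
    return accumulator


largest = reduce(keep_larger, [22, 4, 12, 43, 19, 71, 20])
# largest is 71
```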
In the previous example of multiplying a list of integers, you may also have noticed the lambda keyword. If you are unfamiliar with it, lambda creates an anonymous function whose body is a single expression, and the value of that expression is returned implicitly. To read more about lambda functions, you can refer to this article.
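To show that a lambda is just a compact way to write a small function, here is a sketch that runs the multiplication example with a lambda and with an equivalent named function. The name multiply is an assumption made for this example.

```python
from functools import reduce

multipliers = [2, 10, 4, 16]


def multiply(acc, number):
    # Named equivalent of: lambda acc, number: acc * number
    return acc * number


result_with_lambda = reduce(lambda acc, number: acc * number, multipliers)
result_with_def = reduce(multiply, multipliers)

# Both results are 1280.
```

A named function is often easier to read and test once the logic grows beyond a single short expression.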
Python Reduce in Examples
In this section, you will look at multiple use cases of reduce().
Example 1: Calculating the Number of Occurrences
In this example, you'll see how you can leverage the Python reduce() function to count the number of even numbers in a list.
First, let's try to achieve this with a for loop:
>>> values = [22, 4, 12, 43, 19, 71, 20]
>>> # Using for loop, the imperative way
>>> count = 0
>>> for number in values:
...     if not number % 2:
...         count += 1
>>> count
4
Now that you've counted the occurrences of even numbers with a for loop, you can try the functional programming way with reduce():
>>> from functools import reduce
>>> values = [22, 4, 12, 43, 19, 71, 20]
>>> count = reduce(
... lambda acc, num: acc if num % 2 else acc + 1,
... values
... )
>>> count
25
However, if you run the code above, you'll get 25 instead of 4.
Here is where initializer, the third argument of the reduce() function, comes in. If the initializer is present, it is used as the value of acc (the first argument of the predicate function) in the first iteration.
In the previous example, since the initializer is absent, the acc of the first iteration is instead the first element of the iterable. In other words, acc starts at 22, which is never tested for evenness, and each of the three remaining even numbers (4, 12, and 20) adds 1 to it, giving 25.
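To see where 25 comes from, the sketch below records every pair of arguments that reduce() passes to the predicate when no initializer is given. The calls list and the name count_even_traced are assumptions made for this example, and the recording is a deliberate side effect used only for inspection.

```python
from functools import reduce

values = [22, 4, 12, 43, 19, 71, 20]
calls = []  # (acc, num) pairs, recorded for debugging purposes only


def count_even_traced(acc, num):
    calls.append((acc, num))
    return acc if num % 2 else acc + 1


result = reduce(count_even_traced, values)

# calls is:
# [(22, 4), (23, 12), (24, 43), (24, 19), (24, 71), (24, 20)]
# result is 25
```

The first call already receives 22 as acc, so the count starts from the first element's value rather than from zero.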
Now that you understand how the initializer works, let's correct the faulty code by adding an initializer:
>>> # Using reduce() with initializer
>>> from functools import reduce
>>> values = [22, 4, 12, 43, 19, 71, 20]
>>> count = reduce(
... lambda acc, num: acc if num % 2 else acc + 1,
... values,
... 0 # Initializer
... )
>>> count
4
You set the initializer to 0. If you run the code above, you should get 4 as the value of count.
Let's zoom into the predicate further:
lambda acc, num: acc if num % 2 else acc + 1
From the lambda function above:
- If the number is even, the incremented acc is passed to the next iteration
- Else, the current value of acc is returned unchanged
Don't forget that the value of the first argument (acc in this example) of the predicate will always be the value returned from the previous iteration!
Example 2: Creating a New dict Structure
In this example, you'll be given a scenario in which you need to extract RSVP responses to event invitations. You'll get a list of invitees as below:
>>> list_of_invitees = [
... {"email": "alex@example.com", "name": "Alex", "status": "attending"},
... {"email": "brian@example.com", "name": "Brian", "status": "declined"},
... {"email": "carol@example.com", "name": "Carol", "status": "pending"},
... {"email": "derek@example.com", "name": "Derek", "status": "attending"},
... {"email": "ellen@example.com", "name": "Ellen", "status": "attending"}
... ]
Let's say you want to summarize the RSVP status of the invitations by creating a dictionary like this:
{
"alex@example.com": "attending",
"brian@example.com": "declined",
"carol@example.com": "pending",
"derek@example.com": "attending",
"ellen@example.com": "attending"
}
The snippet above will be the result of your data transformation. The data is transformed from a list of dictionaries, each of which contains the invitee's email, name, and RSVP status, into a dictionary of RSVP statuses that is accessible using the invitees' emails.
To achieve it, you can take advantage of the Python reduce function. First, you define your predicate function.
>>> def transform_data(acc, invitee):
...     acc[invitee["email"]] = invitee["status"]
...     return acc
In transform_data(), you add invitee["email"] as a new key in the acc dictionary and assign invitee["status"] as its value.
As you can see, the predicate doesn't need to be a lambda function. It can be an ordinary function, method, or any Python callable.
Then, feed the reduce() function with your predicate transform_data, the iterable list_of_invitees, and an initializer of an empty dictionary {}, and you'll see the results show up in the blink of an eye.
>>> results = reduce(
... transform_data,
... list_of_invitees,
... {} # Initializer
... )
>>> results
{'alex@example.com': 'attending',
'brian@example.com': 'declined',
'carol@example.com': 'pending',
'derek@example.com': 'attending',
'ellen@example.com': 'attending'}
Congratulations! You just created a new dictionary structure with the reduce() function. The keys of your dictionary are the invitees' emails, and the value of each key is the corresponding RSVP status.
It is normal to be unsure whether to use an initializer and, if so, what to pass as the initializer. The following thought process can guide you:
- Ask yourself: "Do the data structure of the expected result and the elements of your iterable match?" If not, use an initializer; otherwise, you may not need it
- If an initializer is required, its data structure or data type should generally be the same as that of the expected result
Of course, this thought process is not foolproof, as no two situations are the same.
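The sketch below applies this thought process to two small cases built from a shortened version of the invitee list. The names group_by_status and pick_longer are hypothetical and exist only for this illustration.

```python
from functools import reduce

invitees = [
    {"email": "alex@example.com", "name": "Alex", "status": "attending"},
    {"email": "brian@example.com", "name": "Brian", "status": "declined"},
    {"email": "carol@example.com", "name": "Carol", "status": "pending"},
    {"email": "derek@example.com", "name": "Derek", "status": "attending"},
]


def group_by_status(acc, invitee):
    # Elements are invitee dicts, but the result maps a status to a list of
    # names. The shapes differ, so an initializer (an empty dict) is needed.
    acc.setdefault(invitee["status"], []).append(invitee["name"])
    return acc


grouped = reduce(group_by_status, invitees, {})
# {'attending': ['Alex', 'Derek'], 'declined': ['Brian'], 'pending': ['Carol']}


def pick_longer(acc, name):
    # Elements and result are both plain strings, so no initializer is needed.
    return acc if len(acc) >= len(name) else name


names = [invitee["name"] for invitee in invitees]
longest_name = reduce(pick_longer, names)
# 'Brian' (the first of the five-letter names is kept on ties)
```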
Example 3: Derive Insights From a List of Event Attendees
In the third example, you'll be given a list of attendees (note that this example has nothing to do with Example 2), and your task is to tell:
- The number of attendees who brought guests and the total number of guests
- How many vegans and non-vegans attended
>>> # Your given list of attendees
>>> list_of_attendees = [
... {"name": "Zeke", "vegan": True, "brought_guests": True,
... "guests": [{"name": "Amanda", "vegan": False},
... {"name": "Wayne", "vegan": True}]},
... {"name": "Xavier", "vegan": True, "brought_guests": False},
... {"name": "Yohanna", "vegan": False,
... "brought_guests": True,
... "guests": [{"name": "Lily", "vegan": True},
... {"name": "Stefano", "vegan": True}]},
... {"name": "Kael", "vegan": False, "brought_guests": False},
... {"name": "Landon", "vegan": True, "brought_guests": False},
... ]
Task 1: Calculate the Number of Attendees Who Brought Guests
The expected output should be a dictionary:
{
"guest_who_brought_guests": 2,
"total_guests": 9
}
Feel free to open up your Python interpreter and tickle your brain before moving to the solution presented next.
Sample Solution to Task 1
>>> def derive_guest_count(acc, attendee):
...     acc["total_guests"] += 1
...
...     if attendee["brought_guests"]:
...         acc["guest_who_brought_guests"] += 1
...         acc["total_guests"] += len(attendee["guests"])
...
...     return acc
>>> results = reduce(
... derive_guest_count,
... list_of_attendees,
... { # Initializer
... "guest_who_brought_guests": 0,
... "total_guests": 0
... }
... )
Let's dive into the predicate (derive_guest_count()).
- In the first line of the function body, you increment the count of total_guests.
- Subsequently, you check if the attendee brought guests by using an if statement. If so, you increment the count of guest_who_brought_guests and add the number of guests who came along (len(attendee["guests"])) to the count of total_guests.
After defining the predicate, you feed the reduce() function with your predicate, the list of attendees (list_of_attendees), and an initializer of a dictionary with the keys:
- guest_who_brought_guests: to keep track of the number of attendees who brought guests
- total_guests: to keep the count of total guests who attended
Task 2: Calculate the Number of Vegans and Non-Vegans
In the second task, with the same list of attendees, you're asked to derive the number of vegans and non-vegans. The expected output should be:
{
"vegan": 6,
"non_vegan": 3
}
Again, try to come up with a solution before moving forward.
Sample Solution to Task 2
>>> def derive_vegan_info(acc, attendee):
...     if attendee["vegan"]:
...         acc["vegan"] += 1
...     else:
...         acc["non_vegan"] += 1
...
...     if attendee.get("brought_guests"):
...         for guest_brought in attendee["guests"]:
...             # Check guests recursively
...             acc = derive_vegan_info(acc, guest_brought)
...
...     return acc
>>> results = reduce(
... derive_vegan_info,
... list_of_attendees,
... {"vegan": 0, "non_vegan": 0}
... )
>>> results
{'vegan': 6, 'non_vegan': 3}
Let's take a look at the predicate (derive_vegan_info):
- You first increment the count of vegan if the value of attendee["vegan"] is True. Otherwise, you increment the value of non_vegan.
- Next, you check if the attendee brought along any guests. If so, you iterate through the list of extra guests. For every extra guest, you derive the vegan info recursively by invoking derive_vegan_info and passing the accumulated value (acc) and the info of the extra guest (guest_brought). Because the guest dictionaries have no brought_guests key, attendee.get("brought_guests") returns None for them, which ends the recursion.
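If recursion feels heavy for this task, one possible alternative is to flatten attendees and their guests into a single stream first and then reduce over that stream. The sketch below is only one way to do it, uses a shortened attendee list, and introduces the hypothetical names everyone and count_vegans.

```python
from functools import reduce

attendees = [
    {"name": "Zeke", "vegan": True, "brought_guests": True,
     "guests": [{"name": "Amanda", "vegan": False},
                {"name": "Wayne", "vegan": True}]},
    {"name": "Xavier", "vegan": True, "brought_guests": False},
]


def everyone(people):
    # Yield each attendee, followed by any guests they brought along.
    for person in people:
        yield person
        yield from person.get("guests", [])


def count_vegans(acc, person):
    # No recursion needed: every person arrives here as a flat element.
    key = "vegan" if person["vegan"] else "non_vegan"
    acc[key] += 1
    return acc


counts = reduce(count_vegans, everyone(attendees), {"vegan": 0, "non_vegan": 0})
# {'vegan': 3, 'non_vegan': 1}
```

This keeps the predicate focused on a single person, while the generator takes care of walking the nested structure.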
If you are new to the concept of recursion, check this Real Python article.
By now, you should understand how the Python reduce function works. Keep in mind that it takes some practice to get accustomed to it.
The Close Sibling of Python Reduce Function: accumulate()
In Python, the reduce function is sometimes associated with its close relative, itertools.accumulate(). With reduce(), all you get is the end result after all the iterations run internally. What happens if you need the intermediate results of each iteration? Here is where accumulate() shines.
Similar to reduce(), which has to be explicitly imported from functools, accumulate() has to be imported from the itertools module.
Remember that at the very beginning of this tutorial, you saw multiplication using reduce(). Let's view the intermediate results with accumulate():
>>> from itertools import accumulate
>>> numbers = [2, 10, 4, 16]
>>> accumulation = accumulate(
... numbers, # Iterable
... lambda acc, number: acc * number # Predicate
... )
>>> list(accumulation)
[2, 20, 80, 1280]
Since the return value of accumulate() is an object of type itertools.accumulate, you have to explicitly convert it to a sequence (in your case, a list) to view the results.
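One consequence worth knowing is that the returned object is an iterator, so it can only be consumed once. The following sketch demonstrates this with the same numbers.

```python
from itertools import accumulate

running = accumulate([2, 10, 4, 16], lambda acc, number: acc * number)

first_pass = list(running)   # consumes the iterator: [2, 20, 80, 1280]
second_pass = list(running)  # nothing is left to consume: []
```

If you need the intermediate results more than once, store the list from the first conversion instead of converting the accumulate object again.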
Readers with sharp eyes will notice the function signature of accumulate() is slightly different. The first argument is the iterable, and the predicate is now the second positional argument.
Let's take a look at the function signature of itertools.accumulate():
itertools.accumulate(iterable[, func, *, initial=None])
Notice that the arguments in square brackets [ ] are optional. The default value of the second positional argument func is operator.add. The operator module exports a set of functions corresponding to the intrinsic operators of Python. For example, operator.add(x, y) is equivalent to the expression x + y.
That being said, if the second argument func is absent, accumulate() by default produces the running sums of the elements in the iterable.
Note that the optional keyword argument initial was added in Python 3.8. If you are using a prior version, the function signature will be itertools.accumulate(iterable[, func]).
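The defaults described above can be sketched as follows. The example assumes Python 3.8 or later because it uses the initial keyword argument.

```python
import operator
from itertools import accumulate

numbers = [2, 10, 4, 16]

# Without func, accumulate() yields running sums.
running_sums = list(accumulate(numbers))                 # [2, 12, 16, 32]

# Passing operator.add explicitly gives the same result.
explicit_add = list(accumulate(numbers, operator.add))   # [2, 12, 16, 32]

# initial is yielded first, so the output has one more item than the input.
with_initial = list(accumulate(numbers, initial=100))    # [100, 102, 112, 116, 132]
```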
The operator module also exports mul(), which is the multiplication counterpart of add(). To further simplify the accumulate() example above, you can instead write:
>>> import operator
>>> from itertools import accumulate
>>> numbers = [2, 10, 4, 16]
>>> accumulation = accumulate(
... numbers, # Iterable
... operator.mul # This line has changed
... )
>>> list(accumulation)
[2, 20, 80, 1280]
Another difference between reduce() and accumulate() is that the latter returns an itertools.accumulate object, which is an iterator. To iterate through the results, feed the returned object into a for loop or any function that takes an iterable. With itertools.accumulate(), you get all the values obtained in the accumulation process, while with functools.reduce() you get just the last one.
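The relationship between the two functions can be checked directly: the last value produced by accumulate() matches the value returned by reduce(). The sketch below also shows how the lazy results can be consumed without building a list first.

```python
import operator
from functools import reduce
from itertools import accumulate

numbers = [2, 10, 4, 16]

# reduce() returns only the final value.
final_value = reduce(operator.mul, numbers)

# Unpacking keeps only the last item yielded by accumulate().
*_, last_running_value = accumulate(numbers, operator.mul)

# Because accumulate() is lazy, you can stop as soon as a condition is met.
first_over_50 = next(
    product for product in accumulate(numbers, operator.mul) if product > 50
)

# final_value and last_running_value are both 1280; first_over_50 is 80.
```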
Anti-Patterns of Python Reduce
The anti-patterns of using the Python reduce function largely stem from the principles of functional programming. While using Python reduce:
- You shouldn't mutate any arguments other than the accumulation value
- You shouldn't create any side effects in your predicate function
Here is a demonstration of a bad predicate, using the list of attendees you've seen in Example 3:
# A bad predicate
def derive_guest_count(acc, attendee):
    # Anti-pattern 1: Mutating the input argument
    attendee["processed"] = True
    # Anti-pattern 2: Creating a side effect, printing to the console
    print(f"Processing {attendee['name']}")
    # The lines below remain unchanged
    acc["total_guests"] += 1
    if attendee["brought_guests"]:
        acc["guest_who_brought_guests"] += 1
        acc["total_guests"] += len(attendee["guests"])
    return acc
In the code above, the predicate mutates the input argument attendee by setting the value of the key processed to True.
It also prints a statement to the console using the print() function.
Both of these are common mistakes even for some experienced Pythonistas.
Conclusion
In this tutorial:
- You've learned how reduce() in Python works and how it can be used for deriving insights and data transformation, as well as some of the anti-patterns
- You've been introduced to a similar function, accumulate(), which gathers the intermediate results, and understood how it differs from reduce()
- You've been introduced to the operator module, which houses many convenient functions that correspond to Python operators
Now, you can leverage reduce() to massage data and to extract useful information from raw data without relying on external packages. Any ideas or use cases where reduce() or accumulate() can be taken advantage of? Tell us how you use them by leaving a comment below. Have fun reducing!