Which is fastest: filter(), list comprehension, or .count()?
Counting the number of times a value appears in a list is a common task, whether you're counting failed checklist items, students who passed a test, or even red candy pieces.
As with most things in programming, there are many ways to solve the same problem. We're going to look at three options: list comprehension, filter(), and the .count() method.
Here are two-sentence overviews of each and why they are included.
- List comprehension: Comprehensions are a hallmark of Python that let you use functions, if logic, and even nested iteration to define a list. When you're trying to think Pythonically, your brain tends to ask, "Can I do this with a comprehension?"
- The filter() function: The filter() function, which exists as a method in many other popular languages, tests each item against an expression and keeps only those that evaluate to true. If you're coming from another language, you're probably familiar with it, so it naturally surfaces as an option.
- The .count() method: Just as filter is built into other languages, Python lists have a built-in .count() method. It's often in our nature to problem-solve from scratch; however, it's worth checking what's readily available first.
How To Count the Occurrences in a Python List
Using list comprehension
List comprehension is a shorthand technique for creating a new list. In Python, we're able to embed a for loop and a conditional if clause inside square brackets to form our new list.
What we'll do is set the condition to include only the item we're looking for while looping through the original list. Once we have the filtered list, we use the len() function to get our count.
    numbers = [1,1,2,4,5,3,2,1,6,3,1,6]
    sixes = [
        number
        for number in numbers
        if number == 6
    ]
    count_sixes = len(sixes)
Notice I split the list comprehension across multiple lines for clarity. The first line inside the brackets is the output expression, which here is just the temporary variable number. Since we're not modifying the value, nothing special happens here. The second line is the for loop that'll assign each item in numbers to the temporary variable number. The third line is the condition that'll determine which items in the list are included.
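The same pattern works for conditions other than equality. As a minimal sketch, assuming a made-up list of test scores and a passing mark of 60 (neither comes from the example above), counting the students who passed a test could look like this:

```python
# Hypothetical test scores and passing mark, for illustration only.
scores = [55, 72, 91, 48, 60, 83, 39, 77]
passing_mark = 60

# Keep only the scores that meet the condition, then count them.
passed = [
    score
    for score in scores
    if score >= passing_mark
]
count_passed = len(passed)

# A generator expression gives the same count without building the list.
count_passed_lazy = sum(1 for score in scores if score >= passing_mark)

print(count_passed, count_passed_lazy)
```

The generator version is worth knowing when the filtered list itself is never needed, since it avoids holding the intermediate list in memory.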
Using the filter() function
The filter() function takes two arguments: a function and an iterable. The iterable will be our list, and the function will be a lambda expression, which defines a small anonymous function in a single line.
    numbers = [1,1,2,4,5,3,2,1,6,3,1,6]
    sixes = list(filter(lambda number: number == 6, numbers))
    count_sixes = len(sixes)
The lambda expression shares many similarities with the list comprehension example. To be clear on its syntax, the temporary variable is defined after the lambda keyword and before the colon. The expression follows, and only items for which it evaluates to True are included in the result.
It's also important to recognize that filter() returns a filter object rather than a list. Thus, we need to convert its return value into a list before we can determine the length.
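To illustrate the point about the filter object, here is a small sketch, assuming Python 3, where filter() returns a lazy iterator. It shows that len() cannot be called on the filter object directly and that the object is exhausted after a single pass. The variable names are chosen for illustration.

```python
numbers = [1, 1, 2, 4, 5, 3, 2, 1, 6, 3, 1, 6]

sixes_filter = filter(lambda number: number == 6, numbers)

# A filter object is an iterator, so it has no length.
try:
    len(sixes_filter)
    has_len = True
except TypeError:
    has_len = False

# Converting to a list consumes the iterator.
first_pass = list(sixes_filter)
second_pass = list(sixes_filter)  # already exhausted, so this is empty

print(has_len, first_pass, second_pass)
```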
Using the .count() list method
The .count() method is built into the list class, and unlike list comprehensions and filter(), we won't be using an expression here. To use .count(), all you need to do is pass the value to match within the parentheses.
    numbers = [1,1,2,4,5,3,2,1,6,3,1,6]
    count_sixes = numbers.count(6)
Super simple. It's these types of native methods that many who dive in before reading the manual miss out on.
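Because .count() compares each element to its argument using equality, it works for any element type, not just integers. A short sketch, using a made-up checklist to echo the failed-checklist scenario from the introduction:

```python
# Hypothetical checklist results, for illustration only.
checklist = ["pass", "fail", "pass", "pass", "fail", "skipped"]
failed = checklist.count("fail")

# Matching is by equality (==), so values that compare equal are counted together.
mixed = [1, 1.0, True, "1"]
ones = mixed.count(1)

# A value that never appears simply gives 0; no error is raised.
missing = checklist.count("unknown")

print(failed, ones, missing)
```

The equality rule is the one thing to watch for: 1, 1.0, and True all compare equal in Python, so they are counted as the same value, while the string "1" is not.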
Determining the Fastest Technique
To test which technique is the fastest, I put together a simple test script that creates a list of 1,000,000 random integers from 1 to 100,000, as well as a search value that's also between 1 and 100,000.
The time for each technique is recorded, and the scenario runs 10 times. At the end, the average time for each technique is used for the comparison.
    from random import randint
    import time

    # Collected timings (in seconds) for each technique
    count_time = []
    list_time = []
    filter_time = []

    for _ in range(0, 10):
        numbers = [randint(1, 100000) for _ in range(0, 1000000)]
        # Search value drawn from the same range as the list values
        search = randint(1, 100000)

        # list.count()
        count_start = time.time()
        count_matches = numbers.count(search)
        count_time.append(time.time() - count_start)

        # List comprehension
        list_start = time.time()
        list_matches = len([x for x in numbers if x == search])
        list_time.append(time.time() - list_start)

        # filter()
        filter_start = time.time()
        filter_matches = len(list(
            filter(lambda x: x == search, numbers)
        ))
        filter_time.append(time.time() - filter_start)

    # Average time per technique
    print(sum(count_time) / len(count_time))
    print(sum(list_time) / len(list_time))
    print(sum(filter_time) / len(filter_time))
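An alternative way to take the same kind of measurement is the standard library's timeit module, which uses a higher-resolution clock than time.time(). The following is only a sketch, not the script that produced the results in this article. The function names and the smaller list size are assumptions made so that it runs quickly.

```python
from random import randint
from timeit import timeit

# Smaller list than the original test so the sketch runs quickly.
numbers = [randint(1, 100_000) for _ in range(100_000)]
search = randint(1, 100_000)

def with_count():
    return numbers.count(search)

def with_comprehension():
    return len([x for x in numbers if x == search])

def with_filter():
    return len(list(filter(lambda x: x == search, numbers)))

runs = 10
results = {}
for technique in (with_count, with_comprehension, with_filter):
    # timeit returns the total time for `runs` calls, in seconds.
    total = timeit(technique, number=runs)
    results[technique.__name__] = total / runs

for name, average in results.items():
    print(f"{name}: {average:.6f} s")
```

Wrapping each technique in a function keeps the three measurements symmetrical and makes it easy to confirm that all of them return the same count before comparing their speed.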
The results of the speed test, as average seconds per run, were as follows:
- Count method: 0.018813681602478028
- Filter function: 0.03895139694213867
- List comprehension: 0.09621889591217041
None of the three techniques was particularly slow, but the .count() method was far and away the best performer. Coming in second, the filter() function was closer to the speed of .count() than it was to list comprehension.
Given that .count() is not only the fastest but also the easiest to read and the shortest to write, I believe it's the best of the three options for analogous use cases, that is, counting exact matches of a value.