Python Progress Bars with Tqdm by Example


Learn how to leverage progress bars within your Jupyter notebooks and Python applications


Progress bars set expectations, give an impression of activity, and can calm the nerves. We’ve had them for years; some are exciting, some are boring, and some are just not well documented.

A progress bar, in essence, fills up according to the percentage of progress made in accomplishing a task. The movement of the bar is based on certain milestones in the task. This is usually done by counting the input items to be processed beforehand, then calculating progress by dividing number_of_items_processed by total_input_items. Of course, this is an oversimplification of the problem. To derive a more accurate ETA (estimated time of arrival), there are other factors to consider, such as network speed, latency, and, if data is persisted to local storage, write speed.
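
The arithmetic described above can be sketched in a few lines. The function name progress_snapshot and the keys it returns are made up for illustration; tqdm's own estimate is more refined (it smooths the measured rate), so treat this as a naive baseline rather than a description of the library's internals.

```python
def progress_snapshot(processed, total, elapsed_seconds):
    """Return percent complete and a naive ETA from simple counters.

    Hypothetical helper for illustration only; not part of tqdm.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100.0 * processed / total
    # Average throughput so far, in items per second.
    rate = processed / elapsed_seconds if elapsed_seconds > 0 else 0.0
    # Without a measured rate there is nothing to extrapolate from.
    eta_seconds = (total - processed) / rate if rate > 0 else None
    return {"percent": percent, "rate": rate, "eta_seconds": eta_seconds}


snapshot = progress_snapshot(processed=25, total=100, elapsed_seconds=5.0)
print(snapshot)
```

Because the ETA is extrapolated from the average rate, it drifts whenever throughput changes, which is exactly where factors like network speed and write speed come in.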

If you are reading this, you’re probably, like me, frustrated with the limited tqdm examples online that don’t do a good job of illustrating how this package works in various use cases. The examples seem to assume you already know how the package works. If you are not good at weeding through GitHub issues, you might not find the examples you need among the back-and-forth conversations between the community and the devs.

Before we dive in, let’s understand what tqdm actually means. When I first started learning about tqdm, the name threw me off. I didn’t understand why the package was named that; it seemed totally unrelated. That’s because I don’t understand Arabic. Tqdm is short for taqaddum in Arabic, which means progress.

The tqdm package is one of the more comprehensive progress bar packages for Python and is handy when you want to build scripts that keep users informed of the status of your application. Tqdm works on any platform (Linux, Windows, Mac, FreeBSD, NetBSD, Solaris/SunOS) in any console or in a GUI, and is also friendly with IPython/Jupyter notebooks, which we will see in one of the examples with pandas.

Do note that tqdm doesn’t play well with Python’s core logging library. You may have to resort to hacks to get the same seamless progress bars. Progress bars generated by tqdm rely on the carriage return (\r) and line feed (\n) control characters, so it’s important to understand what happens when they’re used in an environment that does not support them. For example, within a Jenkins console log or third-party logging services like Splunk, CloudWatch, and Loggly, to name a few, the output may not be what you expect: each update of the bar is streamed onto its own line.
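
To make this concrete, here is a rough, standard-library-only sketch of how a console bar redraws itself. render_bar and draw are hypothetical helpers, not tqdm functions, and the real library does considerably more. The point is only that every frame begins with a carriage return, so a sink that does not interpret it ends up with one line per frame.

```python
import io


def render_bar(done, total, width=20):
    """Build one text frame such as '[#####---------------]  25%'."""
    filled = int(width * done / total)
    percent = 100 * done // total
    return "[{}{}] {:3d}%".format("#" * filled, "-" * (width - filled), percent)


def draw(stream, total):
    """Redraw the bar in place by starting every frame with a carriage return."""
    for done in range(total + 1):
        stream.write("\r" + render_bar(done, total))
    stream.write("\n")


buffer = io.StringIO()
draw(buffer, total=4)

# A terminal would show only the last frame. A sink that treats "\r" as a
# line break shows every frame, one per line:
frames = buffer.getvalue().strip("\n").split("\r")[1:]
for frame in frames:
    print(frame)
```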


This short tutorial will give you some examples to help you get up to speed without breaking your back in the process. The examples here complement what tqdm already showcases in its Git repository, with added insight into how the code works. Feel free to add examples in the comments, so that this resource may serve as a reference point for other developers. Now, let’s get started setting up tqdm on your local machine.

Prerequisites

Python 3 must be installed on your machine. If you are on a Mac, you can use Brew or follow the setup instructions on the Python site. If you’re on Windows, the Python MSI should do the legwork for you, at least in terms of configuring the path variables and installing Python.

$ brew install python3

Take note that pip3 is bundled with Python 3. To install virtualenv via pip, run:

$ pip3 install virtualenv

Identify the directory you would like to write your code in and create a virtual environment:

$ virtualenv -p python3 <your-desired-path>

Activate the virtualenv:

$ source <your-desired-path>/bin/activate

In the event that you wish to deactivate the virtualenv, you can execute this command:

$ deactivate

Execute:

$ pip install tqdm
$ pip freeze > requirements.txt

Creating a virtual environment to run your python code is a best practice you should adhere to. Python virtual environments create an isolated environment for Python projects. This means that each project can have its own dependencies, regardless of what dependencies every other project has. Activating instructs pip, when invoked, to install packages into the virtual environment folder you created. Deactivating turns off the link to the virtual environment within your terminal session. Running pip freeze allows you to take a snapshot of the current version of packages that work with your application. Let’s walk through some tqdm use-cases.
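
If you are ever unsure whether the environment is actually active before running pip, a quick check from Python can help. This is a minimal sketch; in_virtualenv is a hypothetical helper name, and the check relies on how venv and virtualenv adjust sys.prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment (hypothetical helper)."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtualenv active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```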

Adding Progress Bars to for Loops

Instead of printing out indices or other info at each iteration of your Python loops to see the progress, you can easily add a progress bar, as in the example below. Adding progress bars to loops keeps you informed when running long scripts. If you’re running on a Windows machine, you may need to install the colorama pip package.

import time
import sys
from tqdm import trange


def do_something():
    time.sleep(1)


def do_another_something():
    time.sleep(1)


for i in trange(10, file=sys.stdout, desc='outer loop'):
    do_something()

    for j in trange(100, file=sys.stdout, leave=False, unit_scale=True, desc='inner loop'):
        do_another_something()

This gives us a pretty, nested progress bar: the outer loop iterates ten times, and for each outer iteration the inner loop iterates 100 times. By default, tqdm prints to the sys.stderr output stream. To redirect it to the standard output stream, the following argument does the trick: file=sys.stdout

Figure: tqdm nested progress bars in nested for loops
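
trange(n) is shorthand for tqdm(range(n)), and the same wrapping works for any iterable. The sketch below is a small illustration of that, with a hypothetical squares helper. It writes the bar to an in-memory stream so it stays quiet when run; pass sys.stdout instead to see the bar.

```python
import io

from tqdm import tqdm


def squares(values, stream):
    """Square each value while tqdm reports progress to the given stream."""
    results = []
    # len(values) is used as the total because a list has a known length.
    for value in tqdm(values, desc="squares", file=stream):
        results.append(value * value)
    return results


stream = io.StringIO()  # stand-in for sys.stdout in this sketch
results = squares([0, 1, 2, 3], stream)
print(results)
```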

Predictive Manual Updates of Progress Bar

There are instances when you need to take control and manually update the progress bar at certain intervals, for example, when downloading a multi-part file in chunks or streaming data. Think of this as periodic updates or pulsing at a specific interval. The tqdm package allows us to invoke the progress bar’s update function manually, as shown in the example below:

import time
import sys
from tqdm import tqdm


def do_something():
    time.sleep(1)


with tqdm(total=100, file=sys.stdout) as pbar:
    for i in range(10):
        do_something()
        # Manually update the progress bar, useful for streams such as reading files.
        pbar.update(10)
        # Updates in increments of 10, stops at 100

In the code above, the total argument is the expected number of iterations (or units of work), set here to 100. Each call to update() adds ten to the running count until 100% is reached. When tqdm wraps an iterable and total is unspecified, len(iterable) is used if possible. If the total cannot be determined, only basic progress statistics are displayed (no ETA, no progress bar), which may be less useful but still shows that work is ongoing in the background.
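
For streams, a common variation is to advance the bar by the number of bytes actually read rather than by a fixed step. The following is a minimal sketch of that idea using in-memory buffers; copy_with_progress and its parameters are hypothetical names chosen for this example.

```python
import io

from tqdm import tqdm


def copy_with_progress(source, destination, total_bytes, chunk_size=4096, stream=None):
    """Copy source to destination in chunks, advancing the bar by bytes copied."""
    copied = 0
    # stream=None lets tqdm fall back to its default output (sys.stderr).
    with tqdm(total=total_bytes, unit="B", unit_scale=True, file=stream) as pbar:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            destination.write(chunk)
            copied += len(chunk)
            # The final chunk may be shorter than chunk_size, so use its real length.
            pbar.update(len(chunk))
    return copied


payload = b"x" * 10000
source, destination = io.BytesIO(payload), io.BytesIO()
copied = copy_with_progress(source, destination, len(payload), stream=io.StringIO())
print(copied)
```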

Download Large Files with Tqdm Progress Bar

For this example, you need to install the requests and validators packages into your Python site-packages via pip.

$ pip install requests validators

# Copyright 2019 tiptapcode Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import os
import sys

import tqdm
import requests
import validators


class FileDownloader(object):

    def get_url_filename(self, url):
        """
        Discover file name from HTTP URL. If none is discovered, derive the name
        from the HTTP redirect Location header
        :param url: Url link to file to download
        :type url: str
        :return: Base filename
        :rtype: str
        """
        try:
            if not validators.url(url):
                raise ValueError('Invalid url')
            filename = os.path.basename(url)
            basename, ext = os.path.splitext(filename)
            if ext:
                return filename
            header = requests.head(url, allow_redirects=False).headers
            return os.path.basename(header.get('Location')) if 'Location' in header else filename
        except requests.exceptions.HTTPError as errh:
            print("Http Error:", errh)
            raise errh
        except requests.exceptions.ConnectionError as errc:
            print("Error Connecting:", errc)
            raise errc
        except requests.exceptions.Timeout as errt:
            print("Timeout Error:", errt)
            raise errt
        except requests.exceptions.RequestException as err:
            print("OOps: Something Else", err)
            raise err

    def download_file(self, url, filename=None, target_dir=None):
        """
        Stream downloads files via HTTP
        :param url: Url link to file to download
        :type url: str
        :param filename: filename overrides filename defined in Url param
        :type filename: str
        :param target_dir: target destination directory to download file to
        :type target_dir: str
        :return: Absolute path to target destination where file has been downloaded to
        :rtype: str
        """
        if target_dir and not os.path.isdir(target_dir):
            raise ValueError('Invalid target_dir={} specified'.format(target_dir))
        local_filename = self.get_url_filename(url) if not filename else filename

        req = requests.get(url, stream=True)
        file_size = int(req.headers['Content-Length'])
        chunk_size = 1024  # 1 KB
        num_bars = int(file_size / chunk_size)

        base_path = os.path.abspath(os.path.dirname(__file__))
        target_dest_dir = os.path.join(base_path, local_filename) if not target_dir else os.path.join(target_dir, local_filename)
        with open(target_dest_dir, 'wb') as fp:
            for chunk in tqdm.tqdm(req.iter_content(chunk_size=chunk_size), total=num_bars, unit='KB',
                                   desc=local_filename, leave=True, file=sys.stdout):
                fp.write(chunk)

        return target_dest_dir


if __name__ == "__main__":
    links = ['https://nodejs.org/dist/v12.13.1/node-v12.13.1.pkg', 'https://aka.ms/windev_VM_virtualbox']
    downloader = FileDownloader()
    for url in links:
        downloader.download_file(url)
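
Two details in the downloader are worth hardening. Integer division undercounts the chunks by one whenever the file size is not an exact multiple of the chunk size, and a server that omits the Content-Length header (for example, when it uses chunked transfer encoding) makes the header lookup raise a KeyError. The sketch below shows one way to handle both; content_length and chunk_count are hypothetical helpers, and a plain dict stands in for the response headers.

```python
import math


def content_length(headers):
    """Return the declared body size in bytes, or None when the header is absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None


def chunk_count(file_size, chunk_size):
    """Number of chunks needed to cover file_size, rounding the last one up."""
    if file_size is None:
        # Unknown size: tqdm accepts total=None and then shows only counters.
        return None
    return math.ceil(file_size / chunk_size)


print(chunk_count(content_length({"Content-Length": "1025"}), 1024))
print(chunk_count(content_length({}), 1024))
```

The result can be passed as the total argument in place of num_bars.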

Threaded Progress Bars

In this example, we see how to use tqdm from Python threads. Threads here should not be confused with processes. If you want to take advantage of all the cores on your computer, then multiprocessing is the way to go. The tqdm position argument allows us to specify the line offset at which to print a bar (starting from 0). If it’s unspecified, it is chosen automatically. For our example, it’s important to specify this value to manage multiple bars at once (e.g., from threads). If you omit this argument, bars from different threads may overwrite each other.

import time
from random import randrange
from multiprocessing.pool import ThreadPool

from tqdm import tqdm


def func_call(position, total):
    text = 'progressbar #{position}'.format(position=position)
    with tqdm(total=total, position=position, desc=text) as progress:
        for _ in range(0, total, 5):
            progress.update(5)
            time.sleep(randrange(3))


pool = ThreadPool(10)
tasks = range(5)
for i, url in enumerate(tasks, 1):
    pool.apply_async(func_call, args=(i, 100))
pool.close()
pool.join()
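
The same pattern can be written with concurrent.futures from the standard library. This is a sketch rather than a drop-in replacement: worker is a hypothetical function, the bars are written to an in-memory stream to keep the example quiet, and the sleeps are left out. One practical difference is that calling result() on each future re-raises any exception from the worker, whereas apply_async hides errors unless you call get() on its return value.

```python
import io
from concurrent.futures import ThreadPoolExecutor

from tqdm import tqdm


def worker(position, total, stream):
    """Advance a dedicated bar, pinned to its own line by the position argument."""
    description = "worker #{}".format(position)
    with tqdm(total=total, position=position, desc=description, file=stream) as progress:
        for _ in range(total):
            progress.update(1)
        return progress.n


stream = io.StringIO()  # stand-in for the terminal in this sketch
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(worker, position, 20, stream) for position in range(3)]
    # result() blocks until the worker finishes and surfaces its exceptions.
    totals = [future.result() for future in futures]
print(totals)
```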

How to Apply Tqdm to Pandas Dataframe

Tqdm extends the pandas apply and map methods to produce a tqdm progress bar. You can use progress_apply instead of apply and progress_map instead of map, as you can see in the example below. On each row or cell iteration, the tqdm update hook is invoked against the total amount of data in the dataframe; thus, an ETA can be derived.


To run this program ensure you have requests, tqdm, and pandas installed:

pip install requests tqdm pandas

import time

import pandas as pd
import requests
from tqdm import tqdm


def percent_off(product_price, discount):
    try:
        discount = float(discount)
        if discount < 0 or discount > 100:
            raise ValueError('discount amount should be between 0 and 100%')
        value = product_price - (product_price * (discount / 100.0))
        time.sleep(0.0001)
        return value
    except ValueError as e:
        print('invalid product_price or discount amount', e)
        raise e


def apply_discount(percentage):
    df = pd.DataFrame(pd.read_json('products.json'))
    df.insert(4, 'discount', 0)
    tqdm.pandas(desc='apply_{}_percent_off'.format(percentage))
    df['discount'] = df['price'].progress_apply(lambda x: percent_off(x, percentage))
    return df


# Download the sample Best Buy products JSON file.
# It sucks, right, that you do not see a progress bar while downloading this large file?
r = requests.get('https://github.com/BestBuyAPIs/open-data-set/raw/master/products.json', allow_redirects=True)
open('products.json', 'wb').write(r.content)

# Now imagine performing a large pandas dataframe calculation
df = apply_discount(5)
df  # use this for a nice HTML output in Jupyter notebooks, else print to stdout

To run the program in Jupyter notebooks, you will need to install Jupyter Notebook with pip:

python3 -m pip install jupyter

To run the notebook, run the following command in the terminal (Mac/Linux) or command prompt (Windows). This should open your browser with Jupyter running on the default port:

jupyter notebook

How to Add Color to Your Tqdm Progress Bar

If you don’t find the idea of adding colors to progress bars distracting, then this example might be for you. Tqdm can work with colorama, a simple package for cross-platform colored terminal text in Python. Cross-platform printing of colored text can then be done using colorama’s constant shorthand for ANSI escape sequences. Examples and source code for colorama can be found on its project page.

from tqdm import trange
from colorama import Fore

# Cross-platform colored terminal text.
color_bars = [Fore.BLACK, Fore.RED, Fore.GREEN, Fore.YELLOW,
              Fore.BLUE, Fore.MAGENTA, Fore.CYAN, Fore.WHITE]

for color in color_bars:
    for i in trange(int(7e7), bar_format="{l_bar}%s{bar}%s{r_bar}" % (color, Fore.RESET)):
        pass
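
Newer tqdm releases also accept a colour argument directly, which avoids building the bar_format string by hand. The sketch below assumes your installed tqdm is recent enough to support it (older versions reject the keyword) and writes to an in-memory stream; remove the file argument to see the colored bar on your terminal.

```python
import io

from tqdm import trange

stream = io.StringIO()  # stand-in for the terminal in this sketch
completed = 0
# colour takes a colour name or a hex string such as "#00ff00".
for _ in trange(100, colour="green", desc="green bar", file=stream):
    completed += 1
print(completed)
```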

How to Use Python Logger With Tqdm

The following example illustrates how to route tqdm output into the Python logging framework. The idea is to create a custom file-like class that inherits from io.StringIO and channels whatever tqdm writes into a logger. Using buffer modules such as StringIO lets us treat data like a normal file that we can use for further processing.

# Copyright 2019 tiptapcode Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import io
import os
import sys
import logging

import validators
from urllib import request
from tqdm import tqdm


class ProgressBar(tqdm):

    def update_progress(self, block_num=1, block_size=1, total_size=None):
        if total_size is not None:
            self.total = total_size
        self.update(block_num * block_size - self.n)  # will also set self.n = block_num * block_size


class DownloadFileHandler(object):

    @staticmethod
    def download_file_by_url(url, download_dir=None):
        if not validators.url(url):
            raise ValueError('Invalid url := {}'.format(url))
        if download_dir is None:
            # Default to the directory this script lives in
            download_dir = os.path.abspath(os.path.dirname(__file__))
        elif not os.path.isdir(download_dir):
            raise FileNotFoundError('Directory specified := {} does not exist'.format(download_dir))

        filename = os.path.basename(url)
        download_destination = os.path.join(download_dir, filename)

        # The magic happens here: in order to log to the Python logger we need to create
        # a custom logger that channels the output stream to the log stream
        with ProgressBar(
                file=TqdmSystemLogger(logger, suppress_new_line=False),
                unit='B', unit_scale=True, miniters=1, desc=filename
        ) as progressBar:
            # request.urlretrieve invokes the reporthook callback as blocks arrive.
            # The reporthook argument should be a callable that accepts a block number,
            # a read size, and the total file size of the URL target.
            # The data argument should be valid URL encoded data.
            # tqdm uses this data to derive a progress bar; as we know the total file size we can estimate ETA
            request.urlretrieve(url, filename=download_destination,
                                reporthook=progressBar.update_progress, data=None)

        return download_destination


class SystemLogger(object):

    def __init__(self):
        pass

    @staticmethod
    def get_logger(name, level=None):
        root_logger = logging.getLogger(name)
        root_logger.setLevel(level if level else logging.INFO)

        # An attempt to replace logger output so as to print on the same line; may not work on some terminals,
        # only applicable to logging to sys.stdout
        # formatter = logging.Formatter('\x1b[80D\x1b[1A\x1b[K%(message)s')
        formatter = logging.Formatter(fmt='%(levelname)s:%(name)s: %(message)s (%(asctime)s; %(filename)s:%(lineno)d)',
                                      datefmt="%d-%m-%YT%H:%M:%S%z")

        handler_stdout = logging.StreamHandler(sys.stdout)
        handler_stdout.setFormatter(formatter)
        handler_stdout.setLevel(logging.WARNING)
        handler_stdout.addFilter(type('', (logging.Filter,), {'filter': staticmethod(lambda r: r.levelno <= logging.INFO)}))
        handler_stdout.flush = sys.stdout.flush
        root_logger.addHandler(handler_stdout)

        handler_stderr = logging.StreamHandler(sys.stderr)
        handler_stderr.setFormatter(formatter)
        handler_stderr.setLevel(logging.WARNING)
        handler_stderr.flush = sys.stderr.flush
        root_logger.addHandler(handler_stderr)
        return root_logger


class TqdmSystemLogger(io.StringIO):

    def __init__(self, logger, suppress_new_line=True):
        super(TqdmSystemLogger, self).__init__()
        self.logger = logger
        self.buf = ''
        # only tested and works inside the PyCharm terminal logging to sys.stdout
        # by replacing the default newline terminator we force the logger to overwrite the output on screen,
        # thus giving us a progress depiction in a single line instead of multiple lines
        if suppress_new_line:
            for handler in self.logger.handlers:
                if isinstance(handler, logging.StreamHandler):
                    handler.terminator = ""

    def write(self, buf):
        self.buf = buf.strip('\r\n\t ')

    def flush(self):
        self.logger.log(self.logger.level, '\r' + self.buf)


try:
    logger = SystemLogger.get_logger('DownloadFileHandler', level=logging.WARNING)
    # Download a file to this script's directory and log output to the Python logger
    DownloadFileHandler.download_file_by_url('https://nodejs.org/dist/v12.13.1/node-v12.13.1-darwin-x64.tar.gz')
except Exception as e:
    print(str(e))
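
The code above sends tqdm output into a logger. The opposite direction is often useful too: emitting ordinary log messages while a bar is active without tearing the bar apart. A minimal sketch of that, assuming a small custom handler (TqdmLoggingHandler is a name made up here) built on tqdm.write, which prints a message and then redraws any active bars:

```python
import io
import logging

from tqdm import tqdm


class TqdmLoggingHandler(logging.Handler):
    """Hypothetical handler that prints log records through tqdm.write."""

    def __init__(self, stream=None, level=logging.NOTSET):
        super().__init__(level)
        self.stream = stream  # None lets tqdm.write use sys.stdout

    def emit(self, record):
        try:
            tqdm.write(self.format(record), file=self.stream)
        except Exception:
            self.handleError(record)


stream = io.StringIO()  # stand-in for the terminal in this sketch
demo_logger = logging.getLogger("tqdm_write_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(TqdmLoggingHandler(stream=stream))

for step in tqdm(range(3), desc="steps", file=stream):
    demo_logger.info("finished step %d", step)
```

Recent tqdm releases also ship a ready-made helper along these lines in tqdm.contrib.logging, which may be preferable to a hand-rolled handler if your installed version includes it.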

Adding Tqdm to Python Subprocesses

Python subprocesses are used for accessing system commands, for example, executing Windows terminal commands, or bash commands if you are running on a Unix-based system. The subprocess module allows us to spawn processes, connect to their input/output/error pipes, and obtain their return codes.

import sys
import subprocess
from tqdm import tqdm


def create_test_bash_script():
    """
    Create a bash script that generates numbers 1 to 1000000
    This is just for illustration purpose to simulate a long running bash command
    """
    with open('hello', 'w') as bash_file:
        # The shebang must be the very first line of the file to take effect
        bash_file.write('''\
#!/bin/bash
# Tested using bash version 4.1.5
for ((i=1;i<=1000000;i++));
do
    # your-unix-command-here
    echo $i
done
''')


def run_task(cmd):
    try:
        # create a default tqdm progress bar object; unit='B' defines the string
        # used for the unit of each iteration, in our case bytes
        with tqdm(unit='B', unit_scale=True, miniters=1, desc="run_task={}".format(cmd)) as t:
            # subprocess.PIPE gets the output of the child process
            process = subprocess.Popen(cmd, shell=True, bufsize=1, universal_newlines=True,
                                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)

            # read subprocess output line-by-line as soon as its stdout buffer is flushed in Python 3:
            for line in process.stdout:
                # Update the progress. Since we do not have a predefined iterator,
                # tqdm doesn't know beforehand when to end and can't generate a progress bar,
                # hence elapsed time will be shown. This is good enough, as we know
                # something is in progress
                t.update()
                # forces stdout to "flush" the buffer
                sys.stdout.flush()

            # We explicitly close stdout
            process.stdout.close()

            # wait for the return code
            return_code = process.wait()

            # if return code is not 0 this means our script errored out
            if return_code != 0:
                raise subprocess.CalledProcessError(return_code, cmd)

    except subprocess.CalledProcessError as e:
        sys.stderr.write(
            "common::run_command() : [ERROR]: output = {}, error code = {}\n".format(e.output, e.returncode))


create_test_bash_script()
# run your terminal command using run_task below
run_task('chmod 755 hello && ./hello')
run_task('xx*3238')  # this will fail: not a valid command

In the example above, we iteratively stream the output generated by the executed command and use it to update the tqdm progress bar. Since we do not have an iterator with a predefined length, we can’t anticipate the end of the iteration; hence tqdm defaults to showing elapsed time.

Figure: tqdm elapsed time in absence of a progress bar

The elapsed time may be desirable in instances you do not wish to have verbose output in your terminal.

One thing to note: the official Python documentation carries a warning about using the shell=True argument.

“Invoking the system shell with shell=True can be a security hazard if combined with untrusted input”
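
One way to sidestep that hazard is to pass the command as a list of arguments, so no shell is involved. The sketch below does this with a small Python child process standing in for your real command; count_output_lines and expected_lines are hypothetical names. When you do know how many lines to expect, passing that number as the total gives you a real bar and an ETA instead of elapsed time only.

```python
import io
import subprocess
import sys

from tqdm import tqdm


def count_output_lines(args, expected_lines=None, stream=None):
    """Run a command without a shell and advance the bar once per output line."""
    lines_seen = 0
    with tqdm(total=expected_lines, unit="line", file=stream) as pbar:
        # args is a list, so nothing is interpreted by a shell.
        with subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True) as process:
            for _ in process.stdout:
                lines_seen += 1
                pbar.update(1)
        # Leaving the Popen context closes the pipe and waits for the child.
        if process.returncode != 0:
            raise subprocess.CalledProcessError(process.returncode, args)
    return lines_seen


# A tiny child process that prints 50 lines, used here in place of a real command.
child = [sys.executable, "-c", "for i in range(50): print(i)"]
lines = count_output_lines(child, expected_lines=50, stream=io.StringIO())
print(lines)
```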

Final Thoughts

I hope these examples prove useful in your daily grind. Stay tuned for more!
