I will preface this piece by saying that this is just me sharing some ideas and code that I've used in a personal project. I am not claiming this is a unique creation nor that you or anyone else should take up using it.
I recently saw a Mastodon post about a text function library for Python that drew progress bars in text-mode displays. That reminded me of the progress logging method I devised and prompted me to consider writing about it.
As I was writing my program "Foldatry" I felt a need to be able to see how it was going and where it was up to. For the most part, it works by crawling around folder structures, making records of what it finds and then comparing those findings in various ways.
By its very nature, its job is to explore things not already known. As a consequence, a "progress bar" as normally conceived just isn't possible.
In particular, my program is intended to operate on potentially very large, complex directory structures, so I needed something that would be workable regardless of the number of files and folders it was applied to.
Foldatry is intended to be used either via its GUI or via the command line. In both modes, it shows what it is doing mainly by writing lines of text to logs. These logs can be shown in the GUI and/or written to text log files. Hence for showing progress, I wanted something that would write meaningful progress indicators but never blow out the size of the logs.
When I think of handling things of unknown scale, my mind goes to logarithmic methods. In some of my professional data work I've made literal use of "log" functions (and in SQL at that) but for this I wanted something a bit clearer, and frankly, less "mathematical".
Nonetheless, I knew I wanted something that changed its behaviour at successive powers of ten. As a simple idea, I wanted it to show 1, 2, 3, 4 etc up to 10, then 20, 30, 40 etc up to 100, then 200, 300 etc, and so on. While that may seem like quite a few lines of progress log, when applied to counts in the many thousands it soon settles to a reasonably small number of lines.
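To put a number on how few lines that is, here is a minimal sketch that generates the milestone values directly and counts them. The names `milestones` and `count_milestones` are made up for this illustration and are not part of Foldatry.

```python
def milestones(limit):
    """Yield 1..10, then 20..100, then 200..1000, and so on, up to limit."""
    n, step = 1, 1
    while n <= limit:
        yield n
        n += step
        # On reaching the next power of ten, widen the step tenfold.
        if n == step * 10:
            step *= 10


def count_milestones(limit):
    """Count how many progress lines a run of `limit` items would produce."""
    return sum(1 for _ in milestones(limit))


for limit in (100, 10_000, 1_000_000):
    print(limit, "items ->", count_milestones(limit), "progress lines")
```

This gives 19 lines for 100 items, 37 for 10,000 and 55 for a million: nine more lines for each extra power of ten.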
Of course, an idea is of no use unless we have code to implement it.
Here's the initial function. I've not bothered to revisit it and make it more Pythonic - frankly it works, and it was only meant as something to quickly put in place during development.
Oh - by the way - by accident I mistyped "progress" as "progess" when naming the first function and left it intact as it amused me and was a more unique search string than "progress" - my apologies for any discomfort that gives some readers.
def is_n_to_show_progess( n ) :
    if n < 11:
        # show every count up to 10
        b = True
    elif n < 101 :
        # then every 10 up to 100, every 100 up to 1000, and so on
        b = (n % 10) == 0
    elif n < 1001 :
        b = (n % 100) == 0
    elif n < 10001 :
        b = (n % 1000) == 0
    elif n < 100001 :
        b = (n % 10000) == 0
    elif n < 1000001 :
        b = (n % 100000) == 0
    elif n < 10000001 :
        b = (n % 1000000) == 0
    else :
        # beyond ten million nothing more is shown
        b = False
    return b

To use this function, pass it a counter:

i = 0
for thing in the_things:
    i += 1
    if is_n_to_show_progess( i ):
        print( "Now at " + str(i) )
It's not rocket science to work out what this will output, but let's make that explicit anyway.
Now at 1
Now at 2
Now at 3
Now at 4
Now at 5
Now at 6
Now at 7
Now at 8
Now at 9
Now at 10
Now at 20
Now at 30
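For anyone who would rather not maintain a ladder of branches, the same pattern can be sketched from the number of digits in the count. This is an alternative illustration, not the code Foldatry uses. The name `is_milestone` is hypothetical, and unlike the function above it returns False for counts below 1 and keeps reporting beyond ten million.

```python
def is_milestone(n):
    """True for 1..10, then multiples of 10 up to 100, of 100 up to 1000, etc."""
    if n < 1:
        return False
    # The step is the largest power of ten not exceeding n:
    # 1 for 1-9, 10 for 10-99, 100 for 100-999, and so on.
    step = 10 ** (len(str(n)) - 1)
    return n % step == 0


shown = [n for n in range(1, 2501) if is_milestone(n)]
print(shown)
```

Counting up to 2500 this way shows 29 values, the last few being 800, 900, 1000 and 2000.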
For one place where I was using this, I found that I was nearly always reaching a count near or below 20, so I felt I may as well have a variant that I could tell to show me all the count points below some specified point. The implementation was trivial.

def is_n_to_show_progess_showing_all_below_x( n, x ) :
    if n < x :
        b = True
    else :
        b = is_n_to_show_progess( n )
    return b
The above functions all work fine if the circumstances are that every counter value is reached in sequence. But sometimes, we get to increment counters in jumps - for example a sub-call may bring back a variable size amount to add to the running count.
For that case, we need a function that can use the same idea but be given a range of values to consider. Here is how that looks:
def is_n_range_to_show_progess( from_n, upto_n ) :
    def inrange_check( factor, f_n, u_n ) :
        # True if the jump from f_n to u_n lands on a multiple of factor,
        # is at least factor long, or passes over a multiple of factor
        boo = ( (u_n % factor) == 0 ) or ( (u_n - f_n) >= factor ) or ( (u_n % factor) < (f_n % factor) )
        return boo
    if upto_n < 101 :
        b = inrange_check( 10, from_n, upto_n )
    elif upto_n < 1001 :
        b = inrange_check( 100, from_n, upto_n )
    elif upto_n < 10001 :
        b = inrange_check( 1000, from_n, upto_n )
    elif upto_n < 100001 :
        b = inrange_check( 10000, from_n, upto_n )
    elif upto_n < 1000001 :
        b = inrange_check( 100000, from_n, upto_n )
    elif upto_n < 10000001 :
        b = inrange_check( 1000000, from_n, upto_n )
    else :
        b = False
    return b

The way to use it is to keep one variable that holds what the count was before the possible jump and another for what it has become after the jump in value.
Here is the place where it is used in Foldatry:
was_file_Count = g_file_count
new_file_Count = g_file_count + found_n_files_deep_match + found_n_files_deep_mismatch + found_n_files_deep_errors
if cmntry.is_n_range_to_show_progess( was_file_Count, new_file_Count ) :
    p_multi_log.do_logs( "dbs", "Files:" + show_at_str )
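To see the range check in action outside Foldatry, here is a small self-contained sketch. It assumes a compact stand-in for the function above: `range_crosses_milestone` is a hypothetical name, it derives the factor from the digit count instead of a ladder of branches, and it has no cut-off at ten million. The batch sizes are invented for the demonstration.

```python
def range_crosses_milestone(from_n, upto_n):
    """True if the jump from from_n to upto_n reaches or passes a milestone."""
    # Factor is 10 up to 100, 100 up to 1000, 1000 up to 10000, and so on.
    factor = 10 ** max(1, len(str(upto_n - 1)) - 1)
    return (
        upto_n % factor == 0                    # landed exactly on a multiple
        or upto_n - from_n >= factor            # jump at least one factor long
        or upto_n % factor < from_n % factor    # wrapped past a multiple
    )


batch_sizes = [4, 3, 5, 6, 40, 45, 200, 50]  # made-up results of sub-calls
count = 0
reported = []
for batch in batch_sizes:
    was_count = count
    count += batch
    if range_crosses_milestone(was_count, count):
        reported.append(count)
        print("Now at " + str(count))
```

With these batches the running count goes 4, 7, 12, 18, 58, 103, 303, 353, and the lines shown are for 12, 58, 103 and 303 - each being the first count seen after one or more milestones were passed.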
Later on, in using this method, I was becoming dissatisfied with the progress indications when the counts got up into high numbers. While the output was good for logs, that I would look at long after the whole run was finished, it was not very helpful as I was watching the program running live. I felt that what I wanted there was something that would show me progress at some general time interval rather than just those power-of-ten milestones.
Here's what I came up with. Unlike the earlier versions, this now had a dependency, but I considered it enough of a stock module to depend on.
Here's the code, then I'll describe what it does and how to use it.
import time as im_time

def is_n_to_show_progess_timed_get_mark():
    return im_time.perf_counter()

def is_n_to_show_progess_timed( p_n, min_time_diff, max_time_diff, prev_time_mark) :
    i_t = im_time.perf_counter()
    i_d = i_t - prev_time_mark  # seconds since the last progress line was shown
    if i_d > max_time_diff :
        # too long since the last line, so show whatever count we are at
        r_b = True
    elif i_d > min_time_diff :
        # enough time has passed, so show the count if it is a milestone
        if p_n < 11:
            r_b = True
        elif p_n < 101 :
            r_b = (p_n % 10) == 0
        elif p_n < 1001 :
            r_b = (p_n % 100) == 0
        elif p_n < 10001 :
            r_b = (p_n % 1000) == 0
        elif p_n < 100001 :
            r_b = (p_n % 10000) == 0
        elif p_n < 1000001 :
            r_b = (p_n % 100000) == 0
        elif p_n < 10000001 :
            r_b = (p_n % 1000000) == 0
        else :
            r_b = False
    else :
        # too soon after the last line, so show nothing
        r_b = False
    if r_b :
        r_new_time_mark = im_time.perf_counter()
    else:
        r_new_time_mark = prev_time_mark
    return r_b, r_new_time_mark

def is_n_to_show_progess_timed_stock( p_n, p_time_mark):
    min_time_diff = 5 # milestones less than 5 seconds after the last line are skipped
    max_time_diff = 15 # once 15 seconds have passed, show the count whatever it is
    r_b, r_new_time_mark = is_n_to_show_progess_timed( p_n, min_time_diff, max_time_diff, p_time_mark)
    return r_b, r_new_time_mark

So far, in practice, I am using the last of these, is_n_to_show_progess_timed_stock, which merely bakes in a minimum time of 5 seconds and a maximum of 15 seconds - so we'll presume those values for this description. Just note that the middle function, is_n_to_show_progess_timed, can always be called directly with any preferred number of seconds.
Here is how we use it:
- First we get a time mark value. We just do this so we can pass it in the next call.
- Then, we call our progress function, giving it our counter and the time mark - we get back a boolean, and a time mark to replace the one we passed.

i_time_mark = is_n_to_show_progess_timed_get_mark()
n = 0
for thing in the_things:  # do things in some kind of loop
    n += 1
    b, i_time_mark = is_n_to_show_progess_timed_stock( n, i_time_mark)
    if b :
        print( "show progress " + str(n) )
Here is an example of the log entries written with this method.
File: 2023-08-06 11:29:09,447 - Matchsubtry traverse_tree_recurse: File count = 2
File: 2023-08-06 11:29:21,900 - Matchsubtry traverse_tree_recurse: File count = 3
File: 2023-08-06 11:29:57,633 - Matchsubtry traverse_tree_recurse: File count = 5
File: 2023-08-06 11:30:30,194 - Matchsubtry traverse_tree_recurse: File count = 8
File: 2023-08-06 11:30:35,686 - Matchsubtry traverse_tree_recurse: File count = 9
File: 2023-08-06 11:31:07,213 - Matchsubtry traverse_tree_recurse: File count = 11
File: 2023-08-06 11:31:48,356 - Matchsubtry traverse_tree_recurse: File count = 14
File: 2023-08-06 11:32:33,264 - Matchsubtry traverse_tree_recurse: File count = 19
File: 2023-08-06 11:32:39,236 - Matchsubtry traverse_tree_recurse: File count = 20
File: 2023-08-06 11:33:07,987 - Matchsubtry traverse_tree_recurse: File count = 22
File: 2023-08-06 11:33:46,181 - Matchsubtry traverse_tree_recurse: File count = 25
File: 2023-08-06 11:34:25,549 - Matchsubtry traverse_tree_recurse: File count = 28
File: 2023-08-06 11:35:00,026 - Matchsubtry traverse_tree_recurse: File count = 30
File: 2023-08-06 11:35:35,388 - Matchsubtry traverse_tree_recurse: File count = 33
File: 2023-08-06 11:35:55,221 - Matchsubtry traverse_tree_recurse: File count = 36
You may notice several things going on there:
- not all the values between 1 and 10 were written - that's because some of those happened faster than the minimum specified time
- values in between the multiples of 10 are showing up - that's because the maximum specified time had passed
- values at the multiples of 10 still showed up - these aren't guaranteed to, because a milestone is only shown once the minimum time has passed since the previous line, but on a run like this most of them will.
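Those three effects can be reproduced without waiting for real time to pass. The sketch below is an illustration only: it takes the current time as a `now` parameter instead of calling `perf_counter`, uses a compact hypothetical `is_milestone` in place of the ladder of branches, and the arrival times are invented.

```python
def is_milestone(n):
    """True for 1..10, then multiples of 10 up to 100, of 100 up to 1000, etc."""
    return n >= 1 and n % 10 ** (len(str(n)) - 1) == 0


def timed_show(n, min_gap, max_gap, prev_mark, now):
    """Return (show, new_mark), using the same three-way time test as above."""
    elapsed = now - prev_mark
    if elapsed > max_gap:
        show = True                 # overdue: show any count
    elif elapsed > min_gap:
        show = is_milestone(n)      # allowed: show only milestones
    else:
        show = False                # too soon: show nothing
    return show, (now if show else prev_mark)


# Simulated seconds at which counts 1, 2, 3, ... are reached (made up).
arrival_times = [1, 7, 9, 13, 30, 31, 32, 33, 34, 40, 58, 60]
mark = 0
shown = []
for n, now in enumerate(arrival_times, start=1):
    show, mark = timed_show(n, 5, 15, mark, now)
    if show:
        shown.append(n)
print(shown)
```

With these times the counts shown are 2, 4, 5, 10 and 11: counts 1, 3 and 6 to 9 arrive too soon after the previous line, 10 appears as a milestone, and 11 appears only because more than 15 seconds had passed.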
These functions are a simple idea that I've found useful. I suspect I may yet think of some other variations on this idea, but will just let ongoing events prompt me to do so.