Very, very interesting piece. At my previous job, I spent perhaps an entire month (plus all-nighters) working on a progress bar for the restore page of a backup program. The catch was that it was a multi-stage process, and we wanted one and only one bar to represent the progress as smoothly as possible.<p>It's by far one of the most deceptively difficult problems I've worked on. If anyone cares for more info, this is what the restore process looked like (with a rough sketch of the shape after the list):<p><pre><code> * Single thread that determines the list of files that need to be restored, asynchronously feeding to
* Multiple threads that download the files from our servers as ZIP files, then each queue their results to either
* A thread pool with a dynamically adjusted number of worker threads for the unzip and decrypt process, extracting the files to their final destination, OR
* A different thread pool with a fixed number of threads that does block-level (byte-level differential) restore, which may, when processing a file, need to feed more files back to the first thread in this list
</code></pre>
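Structurally, it was a producer/consumer pipeline with a feedback loop. Here's a minimal Python sketch of that shape, stdlib only; every name is hypothetical and the real code wasn't Python:<p><pre><code>import queue
from concurrent.futures import ThreadPoolExecutor

download_q = queue.Queue()                       # stage 1 -> stage 2
unzip_pool = ThreadPoolExecutor(max_workers=8)   # stage 3 (dynamically sized in reality)
diff_pool = ThreadPoolExecutor(max_workers=2)    # stage 4 (fixed size)

def lister(files_to_restore):
    # Stage 1: one thread enumerates files and feeds the downloaders.
    for f in files_to_restore:
        download_q.put(f)

def downloader(fetch_zip):
    # Stage 2: several of these pull ZIPs, then hand off to stage 3 or 4.
    while True:
        f = download_q.get()
        blob = fetch_zip(f)  # network call, passed in to keep the sketch self-contained
        pool = diff_pool if f.get("differential") else unzip_pool
        pool.submit(restore_one, f, blob)
        download_q.task_done()

def restore_one(f, blob):
    # Stages 3/4: unzip+decrypt, or block-level diff restore. A diff
    # restore can discover base files it still needs, and those loop
    # back to the front of the pipeline.
    for base in f.get("missing_bases", []):
        download_q.put(base)
</code></pre>That feedback loop in the last stage is a big part of why the backup process had to record so much metadata up front, as described next: the download total isn't knowable from the restore list alone.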
To pull this off, we had to modify the backup process to store enough info to calculate, the moment the user presses "start restore," the total number of files to be restored, the total number of files to be downloaded (the two may differ because a single ZIP can hold many small files to minimize latency and overhead, and because the byte-level differential backup has "backup run" outputs), the total bytes to be restored, the total bytes to be downloaded, the total bytes to be unzipped, etc. etc. etc.<p>The progress bar had to move "smoothly enough" across the entire restore. If you're restoring a hundred and one files, 100 of which are tiny and packed into a single ZIP, while the 101st is a huge differentially-backed-up file, the 101st will take forever and the progress should reflect that. For multi-GB restores, the math could give you 100% (with rounding) for over 10 minutes, so you need to jam the progress bar at 99% until it's actually done or you'll get complaints. Likewise, you can't keep it stuck at 0%, even if the math rounds down to 0%, or you'll get complaints.<p>At the end of the day, the formula I came up with weighted for the following (a sketch of its shape follows the list):<p><pre><code> * The number of files to be downloaded
* The number of files to be restored
* The size of the files to be restored
* The number of files to be decrypted
* The number of files to be differentially restored
* The number of files required to differentially restore a single file
* The size of the files required to differentially restore a single file
* The size of files cached locally that can skip download
* The number of files coalesced in a single ZIP archive
* The destination drive (externals are slower than internals)
* The average download speed
</code></pre>
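Stripped of the real constants, the shape of the final calculation was a weighted average of per-factor completion fractions. A hedged sketch, with made-up weights and names (the real ones came out of charting and user testing):<p><pre><code>WEIGHTS = {
    "bytes_downloaded": 0.35,   # all weights here are invented for illustration
    "bytes_restored": 0.30,
    "files_restored": 0.15,
    "files_decrypted": 0.10,
    "diff_bytes": 0.10,
}

def overall_progress(done, total):
    # done/total map each factor to units finished and units expected.
    # Every total is known the moment the user hits "start restore,"
    # because the backup stored enough metadata to precompute it.
    active = {k: w for k, w in WEIGHTS.items() if total.get(k)}
    if not active:
        return 1.0
    weighted = sum(w * min(1.0, done[k] / total[k]) for k, w in active.items())
    return weighted / sum(active.values())  # renormalize over present factors
</code></pre>The min() clamp matters: without it, a factor that overshoots its estimate (exactly the ZIP fringe case described next) can push the blend past 100% or, after compensating adjustments, below 0%.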
Fringe cases abounded. Take restoring a single file that was stored in a single ZIP alongside a hundred other files: tested with a naïve algorithm, it would end up reporting negative progress, because adjusting for the factors listed above means budgeting a (relatively) "huge" number of bytes for download when you're actually only grabbing a single "small" file out of the ZIP. Basically, if you adjust the weight of one factor for one case, you'll end up getting non-smooth progress bars for other cases as a result. It took a lot of charting, a lot of trial and error, and a lot of user feedback (each time from people who had never participated in the tests before, to prevent any sort of bias or preconceived notions) to finally get it right. That code is now classified as "no one touches it, no matter how trivial you think the fix for the bug you found would be."<p>So, yes. Progress bars lie. It takes a shitload of work to pull these lies off, and if they told the truth, your users would really make sure you never heard the end of it. Even if you've already technically done 80% of the work, if only 20% of the required time has elapsed, that progress bar had damn well better not say 80%. Or 20% either, for that matter.
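<p>For what it's worth, the display-side safeguards are the one part simple enough to show outright. A sketch, with illustrative thresholds rather than the product's actual values:<p><pre><code>def displayed_percent(raw_fraction, last_shown, finished):
    # Never show 100% until the restore is truly done, never show 0%
    # once it has started, and never move backwards.
    if finished:
        return 100
    pct = max(1, min(99, round(raw_fraction * 100)))  # pin inside [1, 99]
    return max(pct, last_shown)                       # monotonically non-decreasing
</code></pre>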