
A Python script to test download mirrors

Source: code.activestate.com
License: Free
Minimum requirements: Python

Description
The concept of the script is straightforward: read the mirrors page from Red Hat's website, make a list of all the mirrors, test how long it takes to download from each, and present a sorted list of the results.

The first task, reading and parsing the Red Hat mirrors list, is handled with the urllib and HTMLParser modules, respectively. I chose HTMLParser over the more comprehensive parser in sgmllib because it's a bit less work to override the default parser for simple tasks. After the parser sees the content comment in the HTML source, it starts recording the URL of any anchor tag whose link has a scheme of 'http'; it stops recording after it sees the end of the content comment. Currently, there happen to be no absolute URLs on the mirror page outside the content block, but I didn't want to rely on that fact.
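
A minimal Python 3 sketch of such a parser (using `html.parser`, the Python 3 home of HTMLParser) follows. It is not the original script's code: the comment texts `content` and `end content` are assumed markers, since the real page's comments are not given, and `MirrorListParser` and `parse_mirrors` are hypothetical names. Only anchors with an absolute `http` URL between the two comments are kept.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed comment markers; the real mirror page may use different text.
START_MARK = "content"
END_MARK = "end content"


class MirrorListParser(HTMLParser):
    """Collect absolute http links that appear between two marker comments."""

    def __init__(self):
        super().__init__()
        self.recording = False
        self.urls = []

    def handle_comment(self, data):
        # Comment text arrives without the <!-- and --> delimiters.
        text = data.strip().lower()
        if text == START_MARK:
            self.recording = True
        elif text == END_MARK:
            self.recording = False

    def handle_starttag(self, tag, attrs):
        if not self.recording or tag != "a":
            return
        href = dict(attrs).get("href")
        # Relative links have an empty scheme and are skipped.
        if href and urlparse(href).scheme == "http":
            self.urls.append(href)


def parse_mirrors(html):
    """Return the mirror URLs found inside the content block of html."""
    parser = MirrorListParser()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    page = ('<a href="http://outside.example/">x</a>'
            '<!-- content -->'
            '<a href="http://mirror1.example/pub/">m1</a>'
            '<a href="/relative/">r</a>'
            '<!-- end content -->')
    print(parse_mirrors(page))  # prints ['http://mirror1.example/pub/']
```

Overriding only `handle_comment` and `handle_starttag` is what keeps this short: every other event falls through to HTMLParser's default no-op handlers.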

To test the bandwidth of each mirror site, I simply test how long it takes to download the index page of the mirror. This is not a perfect test, but it gives reasonably good results without depending on knowledge of the site structure.
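
The measurement itself can be sketched in a few lines of Python 3, assuming `urllib.request.urlopen` is used to fetch the index page. `time_download` and its `opener` parameter are hypothetical names. The injectable opener only exists so the function can be exercised without a network. Note that the wall-clock time includes DNS lookup and connection setup, not just the transfer.

```python
import time
import urllib.request


def time_download(url, opener=urllib.request.urlopen):
    """Return the wall-clock seconds needed to fetch and read the page at url.

    The timing covers connection setup as well as the transfer itself.
    `opener` can be replaced by a fake so the function runs offline.
    """
    start = time.perf_counter()
    with opener(url) as response:
        response.read()  # read the whole body so the transfer is actually timed
    return time.perf_counter() - start
```

Calling `time_download("http://mirror.example.org/")` (a placeholder URL) returns the elapsed seconds as a float, or raises whatever exception `urlopen` raises for an unreachable mirror.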

The bandwidth test demonstrates a few important paradigms for multithreading, in Python or in any other language:

1. Let the underlying libraries do as much work as possible.
2. Isolate your threads from the rest of the program.

The main thread creates a work queue of URLs to be tested and a result queue for collecting results, then starts a number of threads to do the work and waits for those threads to exit. Because the Queue class is a thread-safe container, no two threads will ever get the same work unit, and storing results from multiple threads will never leave the queue in an inconsistent state.
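
A minimal Python 3 sketch of this structure, using the standard `queue` and `threading` modules, is shown below. `run_pool` and its `test_url` callback are hypothetical names; `test_url` stands in for the real download test so the pool logic can be read on its own. Because every URL is queued before any thread starts, `get_nowait()` raising `queue.Empty` is a safe signal for a worker to exit.

```python
import queue
import threading


def run_pool(urls, test_url, num_threads=4):
    """Test every URL with a fixed pool of worker threads.

    test_url(url) returns an outcome for one URL (a stand-in for the real
    download test). Returns a list of (url, outcome) tuples.
    """
    work = queue.Queue()
    results = queue.Queue()
    for url in urls:
        work.put(url)

    def worker():
        while True:
            try:
                # Queue hands each URL to exactly one thread.
                url = work.get_nowait()
            except queue.Empty:
                return  # no work left, so the thread exits
            try:
                outcome = test_url(url)
            except Exception as exc:
                # Record the failure instead of letting the worker die.
                outcome = f"error: {exc.__class__.__name__}"
            results.put((url, outcome))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main thread waits for every worker to exit

    # All workers have finished, so draining the queue here is race-free.
    collected = []
    while not results.empty():
        collected.append(results.get_nowait())
    return collected
```

The workers never touch shared state except the two queues, which is the isolation the list above recommends.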

Initially, each worker thread downloaded the mirror index page directly, but this caused the process to run for a long time (over three minutes) when some sites were heavily loaded. To avoid this, I defined a maximum time to attempt a download and made each worker thread spawn a new daemon thread to do the download. The worker thread can use Thread.join() with a timeout to wait on the subthread; timeouts are counted as failures. Note that I pass an empty list to the subthread to collect the results. Threads in Python don't have a convenient way to return a status code to the caller; by passing a mutable object such as a list, the subthread can append a value to the list to indicate a result. Because join() returns None whether or not the subthread finished, the worker thread instead checks the list: if it is still empty when join() returns, the download timed out.
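
The timeout technique can be sketched in Python 3 as follows. `fetch_with_timeout` is a hypothetical helper, and `fetch` is any callable that performs the download (for example, a function that calls `urllib.request.urlopen(url).read()`). The empty list plays the role described above: the daemon subthread appends either the elapsed time or a failure string, and an empty list after `join()` means the download timed out.

```python
import threading
import time


def fetch_with_timeout(fetch, url, max_seconds):
    """Run fetch(url) in a daemon thread and give up after max_seconds.

    Returns the elapsed time in seconds on success, or a string
    describing the failure.
    """
    box = []  # mutable container the subthread appends its result to

    def target():
        start = time.perf_counter()
        try:
            fetch(url)
        except Exception as exc:
            box.append(f"failed: {exc.__class__.__name__}")
        else:
            box.append(time.perf_counter() - start)

    # daemon=True: a stuck download cannot keep the interpreter alive at exit.
    subthread = threading.Thread(target=target, daemon=True)
    subthread.start()
    subthread.join(max_seconds)  # returns None whether or not it finished
    if not box:
        return "timed out"
    return box[0]
```

A timed-out subthread is simply abandoned. It may keep running in the background, but as a daemon thread it will not stop the script from exiting.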

The worker threads put the results for each URL into a results queue. For successful tests, they put a tuple of the URL and the time it took to download; for unsuccessful tests, they put a tuple of the URL and a string describing the type of failure. When the main thread has detected that all worker threads have exited, it separates successes from failures, sorts the two lists, and prints them in aligned columns.
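
The final reporting step might look like the following Python 3 sketch. It assumes, as the paragraph describes, that a success is recorded as a float number of seconds and a failure as a string. `report` is a hypothetical name, and the column layout is an illustrative choice rather than the script's actual output format. Successes are sorted fastest first, and failures are sorted by URL.

```python
def report(results):
    """Format (url, outcome) tuples as aligned columns.

    A float outcome is a download time in seconds; a str outcome is a
    failure description. Successes come first, fastest mirror at the top.
    """
    results = list(results)
    successes = sorted(
        ((url, secs) for url, secs in results if isinstance(secs, float)),
        key=lambda pair: pair[1],
    )
    failures = sorted((url, why) for url, why in results if isinstance(why, str))

    # Pad every URL to the longest one so the second column lines up.
    width = max((len(url) for url, _ in results), default=0)
    lines = [f"{url:<{width}}  {secs:8.3f}s" for url, secs in successes]
    lines += [f"{url:<{width}}  {why}" for url, why in failures]
    return "\n".join(lines)


if __name__ == "__main__":
    print(report([
        ("http://b.example/", 2.5),
        ("http://a.example/", 0.5),
        ("http://c.example/", "timed out"),
    ]))
```

Keeping the formatting in a function that returns a string, rather than printing directly, makes the output easy to check or redirect.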

Note that the script could be written without the second-level threads. Using them helps isolate the failure-prone download from the more reliable worker thread pool, at the cost of a few more ephemeral threads, and provides a good demonstration of how and when to use daemon threads to keep a script from hanging indefinitely at shutdown.

This script is useful for telling which mirrors are most heavily loaded, but it has shortcomings. Some HTTP-based mirrors are actually redirects to FTP mirrors, and some seem to apply different bandwidth throttles to index pages and ISO downloads. Additionally, the script can't tell which of the mirrors actually have up-to-date files; this can't easily be fixed without knowledge of each mirror site, since mirror sites differ in their directory structure. But this at least gives the would-be upgrader an idea of where to look.
Similar scripts
Convert PDF to TIFF: This script is a very short code snippet illustrating how to convert individual pages of PDF documents to TIFF files, one TIFF file per page. It works only on Mac OS X with PyObjC installed. As a recipe the code is ...
Count PDF pages: The Count PDF pages script is a simple, pure-Python way to count the pages of a PDF.
Iterate over .MP4 atoms: This script yields the atoms contained in an MP4 file. It is mostly used for extracting the tags contained in it (artist, title, etc.) using a convenience class (M4ATags). This script could be implemented as a generator.
Counting pages of PDF documents on Mac OS X: Given that PDF is a "native" data format on Mac OS X, it is very easy to get access to some properties of such documents. One is the number of pages. Using Python the necessary code to do this is ...
Disk: This script provides a simple simulation of secondary memory and is primarily designed to provide a driver interface to a virtual hard drive. The interface is simple and allows the simulation of IO errors. Also provided are methods that allow ...
Cross Platform Excel Parsing With Xlrd: This script easily extracts data from Microsoft Excel files using a wrapper class for xlrd. The class allows you to create a generator which returns Excel data one row at a time as either a list or a dictionary. This script ...
A Singleton log file creator: This class is a basic Singleton log file creator. It allows separate classes/modules to log their activities to the same file (even the same line if they want to). This is a quite basic log file creator, intended to assist in ...
Backup your files: The Backup your files script makes backup versions of your files. It can also be used for non-Python source code.
Handling of command line arguments: This script handles arguments for small scripts that need to: - read some command line options - read some command line positional arguments - iterate over all lines of some files given on the command line, or stdin if none ...
Supported OS: All
Version: 1.1