GXP |
|
Distributed processing by make †Introduction †make is a popular, proven-to-work solution for describing and executing tasks with dependencies among them. Its original purpose was automating software builds, but many people know that it is good for any kind of workflows composed of existing programs. The description of make may be somewhat cryptic for beginners but goes straight to the point. It has a natural fault tolerance model in which when some tasks failed due to machine crashes or other temporary reasons, simply re-executing make will perform the rest of the process rather than all. It essentially uses intermediate files as checkpoints. It has a natural parallel execution model where tasks without dependencies may be run in parallel. Finally it has been used by many programmers so the learning barrier is quite low for many people. It is natural for us to take advantage of them. There seem many similar tools and research projects out there (SGE's built-in dmake, pgmake, distmake, ...), but this one is different in a couple of important ways.
Getting Started †Ingredients †
make --version GNU Make 3.81 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This program built for x86_64-pc-linux-gnu If the command is not found, 'apt-get install make' will do on debian systems. A package will be available on virtually all Unix platforms. The last resort is to build it from source. Step 1: Prepare Makefile †
Here is a simple example. all : 1.result 2.result 3.result 4.result 5.result
%.result : %.data
your_program $< $@
Running make with this Makefile will result in executing: your_program 1.data 1.result your_program 2.data 2.result your_program 3.data 3.result your_program 4.data 4.result your_program 5.data 5.result in sequence, all in the local host. If you want to run them distributed, follow the following two steps. Step 2: Grab resources with GXP †I assume you are familiar with how to do this with gxpc use and gxpc explore. Step 3: Run it †gxpc make -j instead of the usual make. This will launch the above five tasks in parallel (by -j option given to the command line), and dispatch them to free resources (by the job scheduler started by the above command). One host will execute a task at a time, so if there are only, say, two hosts, three tasks will wait in a queue and will be dispatched when one becomes free. Features †
Extensions to Makefile syntax †You can just write the regular Makefile and just 'gxpc make -j N' will distribute commands to workers. There may be circumstances, however, where you may want to control the execution of jobs more precisely. For example, it may often be true you want to execute certain commands on your local host. There may be cases where certain types of commands should be subject to more strict concurrency control (e.g., the number of tasks downloading big data files from a central web server must be no less than five). To this end, Makefile syntax is slightly extended, but fortunately, the default GNU make needs no modification. NOTE Extensions described here are still experimental. The exact syntax and semantics are likely to be changed/extended in future. In front of a command line in a Makefile, you may write some equal signs, optionally followed by resource specifications. For example, let's say the original rule in a Makefile looks like this: a.result : a.src my_program a.src a.result All of the following rules are also valid. a.result : a.src = my_program a.src a.result a.result : a.src == my_program a.src a.result a.result : a.src === my_program a.src a.result a.result : a.src =(webserver) my_program a.src a.result a.result : a.src =(webserver:3) my_program a.src a.result a.result : a.src =(webserver:3,fileserver:2) my_program a.src a.result The first one ('=') has no effect. It is the same as writing nothing. The second one ('==') says this command must be executed locally. The thrid one has the same effect, but it is slightly faster and is NOT subject to any concurrency control. The fourth and following ones specify the jobs should be distributed, with resource requirements to limit concurrency of certain types of jobs. 'webserver' and 'fileserver' in the above example can be arbitrary alphabetical strings and need not correspond at all to any actual resource. You may consider that each resource name such as 'webserver' and 'fileserver' is a semaphore, and a job consuming resource x will try to decrement the value of samephore x by one. If a resource name is followed by :n (like 'webserver:3' or 'fileserver:2' above), it specifies the count of the resources the job requires. It is like the job tries to decrement the semaphore by that value. The initial value of a semaphore can be given by the --sem command line option. For example, --sem x:n says there are initially n units of resources named x. To summarize, when the scheduler dispatches jobs, it limits their concurrencies in such a way that for each resource x, the total counts required by all running jobs are not more than the initial count of x given by a --sem x:n option. If a job requires a resource x and you omit its initial value, it is assumed to be infinite. The effect is simply to ignore the requirement on x altogether (this spec is likely to be reconsidered in future). Concurrency Control †
Command Line Reference †Synopsis †gxpc make [ '''GNU make options''' ] [-- '''job scheduler options'' ] GNU make options are whatever options you like to pass to the underlying GNU make. You can specify -j N, -k, -n, among other things. Following '--', the rest of the lines are not passed to GNU make and given to the job scheduler. Following the list of job scheduler options. You can see them by: gxpc make -- --help Options †Among others, you may find gxpc make -n gxpc make -- -n particularly useful. The former simply lets GNU make print what will be executed. Tha latter, on the other hand, lets GNU make execute commands, and our job scheduler pretends they all succeed instantly. As a side effect, you state.html will be generated. This way, you can see the list of commands that will be executed in the nice html table.
The syntax of SEM_SPECS is as follows
Current Limitation and Performance †General cautions †
I ran a simple test to assess performance of the system. Dispatching null tasks to varying number of processors. †This test runs many trivial tasks that finish immediately, distributes them to varying numbers of processors, and records the total elapsed time of the make command (measured by 'time gxpc make -j'). Specifically, Makefile is this. all : $(shell for i in `seq 1 $(N)`; do echo $$i.result ; done)
%.result :
= hostname
(It defines N targets 1.result, 2.result, ..., N.result and execute a useless hostname command for each X.result). As a result, N instances of hostname command are distributed. We ran this test using three types of processors as the master (GNU make and our scheduler). There are always about 1200 dispatching targets (workers). Each represents a single CPU core. With various values of N, the elapsed time of the whole make command was like this. What does this graph tell us? It suggests that dispatching a single task takes 40ms to 180ms depending on the master processor. It was about 40ms on dual quad core Xeon E5410, 80ms on dual dual core Xeon 5140, and 180ms on single dual core Core2Duo 6400. We refer to thse numbers as '''the master overehad per task.''' Dispatching N tasks takes N * (the master over head per task) on the master, no matter how small tasks are and no matter how many worker processors you have. Note that all master overheads are paid by the master and thus serialized. The curve gives the lower limit to finish N tasks. These numbers also suggest that the master overhead is almost reciprocal of the number of cores in the master. It is understandable because the work of the master is distributed to many processes (spawned by make) and is thus highly concurrent. Dispatching tasks of varying granularities to 1,000 processors. †While the master overhead per task is clearly a scalability limitation, the above number suggests that for processes taking sufficiently long time (compared to N * the master overhead per task), the relative overhead may not be too significant. For example, if a single process takes 10mins (600sec) to finish, dispatching 1,000 such processes to 1,000 processors will finish in 1000 * (the master overhead per task) + 600sec = 780sec, or with 600/780 = 77% efficiency on the Core2Duo 6400 master. The system is useful in regimes where N * (the master overhead per task) is small compared to typical execution time of a single process. This test performs a measurement with this perspective. Makefile is this. ALL=$(shell for i in `seq 1 1000`; do echo $$i.result ; done)
all : $(ALL)
%.result :
= ~/cpu.py $(T)
'~/cpu.py T' is a simple command that just consumes CPU T seconds and quits. #!/usr/bin/env python
import sys,time
def main():
start_t = time.time()
end_t = start_t + 10.0
if len(sys.argv) > 1:
end_t = start_t + int(sys.argv[1])
while time.time() < end_t:
pass
main()
We run 'gxpc make -j T=XXX' with XXX from 10 to 120 and record its elapsed the time. There are always about 1200 dispatching targets, but the number of tasks is kept 1000 (due to the file descriptor limitations mentioned above). The result is this. What does this graph tell us? The elapsed time is clearly around 40/80/180sec + T (depending on the master). It confirms that the elapsed time really consists of a constant overhead and the actual job execution time (T). When T=120sec, the elapsed time is 300sec on the Core2Duo master and the efficiency is a mere 40%. The graph, however, suggests that if T=600sec (10mins), the efficiency will be the predicted 77%. |
|