    Features

    A rough list of currently implemented features, as of svn revision 2350:

    Core

    • XML documents are always validated against their XSD schema. See sample inputs in tests/.
    • The workflow is simple: conversion from the specific description to the generic description, planning, then deployment (followed by a small control script).
    • Installed files: the adage binary, XSD schemas, specific launchers, and the adage-client, adage-satellite, adage-satellite-daemon and adage-oar wrapper scripts.

    Interface

    • Command-line interface; run adage -h for basic help.
    • The -D switch dumps a document after its validation/creation (-f dumps it to a file).
    • -j/-o/-g handle OAR interaction; adage reads $prefix/share/adage/xml/all-g5k.res to get a view of all available resources.
    • By default, adage only converts the specific document to the generic one. Use -p if you only want to run the planner, -n if you want a dry run of the plan, and -x if you want the application to actually be deployed.
    • Debugging output is huge if you compiled with --enable-debug, and still large otherwise. This is normal at the current stage of development. Use -l 2 or -l 3 to reduce the verbosity.

    Planners

    • Controlled by the <planner> tag in the control parameters; round-robin is the default. Heuristic planners are currently being developed by Benjamin Depardon.
    • Round-robin is done on the available resources. Placement constraints are set in the control parameters (see the documentation). Within round-robin placement, we try to balance the distribution of processes across hosts when there are fewer hosts than processes to deploy: when we search for nodes matching the placement constraints of a particular process group, we return a list sorted as “least already used/planned first”.
    • Example: we have to plan 3 instances of process group A and 1 instance of process group B on nodes 1 and 2. First, we put one instance of A on node 1, one instance on node 2 and the remaining instance on node 1. Then we get a list of available nodes sorted by “load” as “node 2 node 1” (because node 1 has 2 instances and node 2 only one), and we put the instance of B on node 2 to balance the load.
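
    The example above can be reproduced with a minimal Python sketch of “least already used/planned first” placement. This is not adage's actual planner code: the function name plan_round_robin and the data shapes are assumptions made for illustration, and placement constraints are left out (every node is assumed to match every process group).

```python
def plan_round_robin(groups, nodes):
    """Place process group instances on nodes, least-loaded node first.

    groups: list of (group_name, instance_count) pairs (hypothetical shape).
    nodes:  list of node names; all nodes are assumed to satisfy the
            placement constraints of every group.
    Returns a list of (group_name, node) placements in planning order.
    """
    load = {node: 0 for node in nodes}
    placements = []
    for name, count in groups:
        for _ in range(count):
            # "Least already used/planned first"; on a tie, min() keeps
            # the first node in the original list order.
            target = min(nodes, key=lambda n: load[n])
            placements.append((name, target))
            load[target] += 1
    return placements


placements = plan_round_robin([("A", 3), ("B", 1)], ["node1", "node2"])
for group, node in placements:
    print(group, "->", node)
```

    With 3 instances of A and 1 instance of B on two nodes, the instance of B lands on node 2, as in the example.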

    Resource handling

    • Resources can come from various sources: an oar job id, an oargrid job id, a node list in a file, a hand-written file… They are all merged into a single resource XML document.
    • We support various node properties (number of CPUs/cores, processor speed, contact/transfer method, scratch dir, ...) and a notion of resource group, where a set of nodes shares a particular resource such as a networked file system (used later by the file transfer feature; only one NFS per node is supported at the moment).

    File transfer

    • Files are related to a process and are described in the generic document (your application plugin has to support it).
    • 5 types of files: process binary, shared object, library, data and configuration file.
    • Which types of files are transferred is controlled by the “type” attribute of the <transfer_files> tag in the control parameters.
    • The transfer method is either rcp, oarcp or scp, and the destination (on remote nodes) is either the scratch dir (generally /tmp) or the shared dir (your homedir or /site/data0 or /nfs-shared).
    • A boolean (binlib_in_commondir) controls whether libraries and binaries are transferred to a common directory. Otherwise, they are located in the rundir/pwd of the process.
    • First, all files are gathered in subdirectories of a local tree (/tmp/$USER/prestage-tree-adage-$PID/$NODE), then we do a single transfer per node of the whole part of the tree corresponding to the target $NODE.
    • Even if we don't want to transfer any real file, this step is mandatory because it creates the remote directory hierarchy that will hold the rundirs/pwd of the remote processes, and the specific launchers needed remotely are transferred anyway.
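
    The staging layout described above can be sketched in a few lines of Python. This only illustrates the path scheme and the “one transfer per node” rule; the helper names prestage_dir and plan_transfers and the files_by_node mapping are hypothetical, and no file is actually copied.

```python
from pathlib import PurePosixPath


def prestage_dir(user, pid, node, root="/tmp"):
    """Local staging subtree for one node:
    /tmp/$USER/prestage-tree-adage-$PID/$NODE"""
    return PurePosixPath(root) / user / f"prestage-tree-adage-{pid}" / node


def plan_transfers(files_by_node, user, pid):
    """Return one (local_subtree, node) pair per target node.

    files_by_node maps a node name to the list of files staged for it
    (hypothetical shape). A node with an empty list still gets a
    transfer, since the step also creates the remote rundirs and
    ships the specific launchers.
    """
    return [(str(prestage_dir(user, pid, node)), node)
            for node in sorted(files_by_node)]


transfers = plan_transfers({"node2": [], "node1": ["bin/server"]}, "alice", 4242)
for subtree, node in transfers:
    print(subtree, "->", node)
```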

    Deployment

    • All process groups are scheduled according to their dependencies in a state machine; we first launch the ones with no dependencies.
    • A process group instance is launched with a single remote command, which generates a callback message carrying the PIDs of the launched processes. We can then update the state machine.
    • The adage-satellite script is launched remotely with all the processes to launch as arguments. It launches each process by calling the specific launcher associated with the application.
    • Each process has a unique rundir/pwd: /$BASE(either scratch or shared dir)/$UID/adage-$PID/$process_group_id/#_of_pg_instance/$process_id/#_of_process_instance.
    • Global environment variables can be set in the control parameters with the <env> tag.
    • Scripts for getting the status of a deployment and for cleaning it up are automatically generated in your pwd.
    • If adage stalls endlessly during deployment, a callback has not been received and the state machine is blocked; most likely a remote process was not launched correctly.
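
    The dependency-driven launch order can be illustrated with a small Python sketch. It is a simplified model, not adage's state machine: the function launch_order and the dependencies mapping are assumptions, callbacks are treated as arriving immediately, and a blocked state (the stall described above) is reported as an error instead of waiting forever.

```python
def launch_order(dependencies):
    """Compute successive launch waves of process groups.

    dependencies maps each process group to the set of groups it
    depends on (hypothetical shape). Groups with no dependencies form
    the first wave; a group becomes launchable once all the groups it
    depends on have been started.
    """
    remaining = {group: set(deps) for group, deps in dependencies.items()}
    started = set()
    waves = []
    while remaining:
        ready = sorted(g for g, deps in remaining.items() if deps <= started)
        if not ready:
            # Nothing can progress: unknown or cyclic dependencies.
            raise RuntimeError("blocked: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for group in ready:
            # In a real deployment, the callback carrying the PIDs would
            # be awaited here before marking the group as started.
            started.add(group)
            del remaining[group]
    return waves


waves = launch_order({
    "db": set(),
    "monitor": set(),
    "server": {"db"},
    "client": {"server"},
})
print(waves)
```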

    Plugins

    • See each plugin page for more information on supported features.

    To-Do

    A rough high-level list of planned or possible features (not sorted by priority):

    • implement an advanced meta-plugin wrapper
    • publish real versions and milestones of adage with binary packages (at least deb and rpm)
    • add cardinality information to accelerator structs?
    • daemonize the satellite with a client-server model too ⇒ gather more information on the running applications, fine-grained control
