
This is an outdated version of the HTCondor Manual. You can find current documentation at http://htcondor.org/manual.


Subsections
  • 2.5.1 Sample submit description files
  • 2.5.2 About Requirements and Rank
  • 2.5.3 Submitting Jobs Using a Shared File System
  • 2.5.4 Submitting Jobs Without a Shared File System: HTCondor's File Transfer Mechanism
  • 2.5.5 Environment Variables
  • 2.5.6 Heterogeneous Submit: Execution on Differing Architectures

A job is submitted for execution to HTCondor using the condor_submit command. condor_submit takes as an argument the name of a file called a submit description file. This file contains commands and keywords to direct the queuing of jobs. In the submit description file, HTCondor finds everything it needs to know about the job. Items such as the name of the executable to run, the initial working directory, and command-line arguments to the program all go into the submit description file. condor_submit creates a job ClassAd based upon the information, and HTCondor works toward running the job.

The contents of a submit file can save time for HTCondor users. It is easy to submit multiple runs of a program to HTCondor. To run the same program 500 times on 500 different input data sets, arrange your data files accordingly so that each run reads its own input, and each run writes its own output. Each individual run may have its own initial working directory, stdin, stdout, stderr, command-line arguments, and shell environment. A program that directly opens its own files will read the file names to use either from stdin or from the command line. A program that opens a static file name every time will need to use a separate subdirectory for the output of each run.

The condor_submit manual page contains a complete description of how to use condor_submit. It also includes descriptions of all the commands that may be placed into a submit description file. In addition, the index lists entries for each command under the heading of Submit Commands.


2.5.1 Sample submit description files

In addition to the examples of submit description files given in the condor_submit manual page, here are a few more.

2.5.1.1 Example 1

Example 1 is one of the simplest submit description files possible. It queues up one copy of the program foo (which had been created by condor_compile) for execution by HTCondor. Since no platform is specified, HTCondor will use its default, which is to run the job on a machine which has the same architecture and operating system as the machine from which it was submitted. No input, output, and error commands are given in the submit description file, so the files stdin, stdout, and stderr will all refer to /dev/null. The program may produce output by explicitly opening a file and writing to it. A log file, foo.log, will also be produced that contains events the job had during its lifetime inside of HTCondor. When the job finishes, its exit conditions will be noted in the log file. It is recommended that you always have a log file so you know what happened to your jobs.
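The submit description file itself is not reproduced on this page. A minimal sketch consistent with the description above could look like the following; the standard universe is an assumption, inferred from foo having been built with condor_compile.

```
# Example 1 (sketch): about the simplest submit description file possible.
# Universe = standard is an assumption based on the use of condor_compile.
Executable = foo
Universe   = standard
Log        = foo.log
Queue
```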

2.5.1.2 Example 2

Example 2 queues two copies of the program mathematica. The first copy will run in directory run_1, and the second will run in directory run_2. For both queued copies, stdin will be test.data, stdout will be loop.out, and stderr will be loop.error. There will be two sets of files written, as the files are each written to their own directories. This is a convenient way to organize data if you have a large group of HTCondor jobs to run. The example file shows program submission of mathematica as a vanilla universe job. This may be necessary if the source and/or object code to mathematica is not available.

The request_memory command is included to ensure that the mathematica jobs match with and then execute on pool machines that provide at least 1 GByte of memory.
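A sketch of a submit description file matching this description follows. The log file name loop.log is an assumption (the text does not name one), and the memory request is written in MiB rather than with a unit suffix.

```
# Example 2 (sketch): two vanilla universe runs, each in its own directory.
# loop.log is an assumed log file name.
Executable     = mathematica
Universe       = vanilla
Input          = test.data
Output         = loop.out
Error          = loop.error
Log            = loop.log
# Memory request in MiB; 1024 MiB corresponds to 1 GByte.
Request_Memory = 1024

Initialdir     = run_1
Queue

Initialdir     = run_2
Queue
```

Each Queue command picks up the Initialdir value in effect at that point, so the two jobs read and write the same file names in different directories.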

2.5.1.3 Example 3

The submit description file for Example 3 queues 150 runs of program foo which has been compiled and linked for LINUX running on a 32-bit Intel processor. This job requires HTCondor to run the program on machines which have greater than 32 Mbytes of physical memory, and expresses a preference to run the program on machines with more than 64 Mbytes. It also advises HTCondor that this standard universe job will use up to 28000 Kbytes of memory when running. Each of the 150 runs of the program is given its own process number, starting with process number 0. So, files stdin, stdout, and stderr will refer to in.0, out.0, and err.0 for the first run of the program, in.1, out.1, and err.1 for the second run of the program, and so forth. A log file containing entries about when and where HTCondor runs, checkpoints, and migrates processes for all the 150 queued programs will be written into the single file foo.log.
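A sketch of a submit description file along these lines is shown below. The attribute values "LINUX" and "INTEL" are the usual OpSys and Arch strings for 32-bit Intel Linux machines; the exact form of the comparisons is an assumption chosen to match the prose.

```
# Example 3 (sketch): 150 standard universe runs of foo.
Executable   = foo
Universe     = standard
# Require 32-bit Intel Linux machines with more than 32 Mbytes of memory.
Requirements = (Memory > 32) && (OpSys == "LINUX") && (Arch == "INTEL")
# Prefer machines with more than 64 Mbytes: true ranks as 1.0, false as 0.0.
Rank         = (Memory > 64)
# Advise HTCondor of the expected memory use, in Kbytes.
Image_Size   = 28000

# $(Process) expands to 0, 1, ..., 149, giving each run its own files.
Input        = in.$(Process)
Output       = out.$(Process)
Error        = err.$(Process)
Log          = foo.log

Queue 150
```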


2.5.2 About Requirements and Rank

The requirements and rank commands in the submit description file are powerful and flexible. Using them effectively requires care, and this section presents those details.

Both requirements and rank need to be specified as valid HTCondor ClassAd expressions. However, default values are set by the condor_submit program if these are not defined in the submit description file. From the condor_submit manual page and the above examples, you see that writing ClassAd expressions is intuitive, especially if you are familiar with the programming language C. There are some pretty nifty expressions you can write with ClassAds. A complete description of ClassAds and their expressions can be found in section 4.1.

All of the commands in the submit description file are case insensitive, except for the ClassAd attribute string values. ClassAd attribute names are case insensitive, but ClassAd string values are case preserving.

Note that the comparison operators (<, >, <=, >=, and ==) compare strings case insensitively. The special comparison operators =?= and =!= compare strings case sensitively.

A requirements or rank command in the submit description file may utilize attributes that appear in a machine or a job ClassAd. Within the submit description file (for a job) the prefix MY. (on a ClassAd attribute name) causes a reference to the job ClassAd attribute, and the prefix TARGET. causes a reference to a potential machine or matched machine ClassAd attribute.


The condor_status command displays statistics about machines within the pool. The -l option displays the machine ClassAd attributes for all machines in the HTCondor pool. The job ClassAds, if there are jobs in the queue, can be seen with the condor_q -l command. This shows all the defined attributes for current jobs in the queue.

A list of defined ClassAd attributes for job ClassAds is given in the unnumbered Appendix. A list of defined ClassAd attributes for machine ClassAds is also given in the unnumbered Appendix.


2.5.2.1 Rank Expression Examples

When considering the match between a job and a machine, rank is used to choose a match from among all machines that satisfy the job's requirements and are available to the user, after accounting for the user's priority and the machine's rank of the job. The rank expressions, simple or complex, define a numerical value that expresses preferences.

The job's Rank expression evaluates to one of three values. It can be UNDEFINED, ERROR, or a floating point value. If Rank evaluates to a floating point value, the best match will be the one with the largest, positive value. If no Rank is given in the submit description file, then HTCondor substitutes a default value of 0.0 when considering machines to match. If the job's Rank of a given machine evaluates to UNDEFINED or ERROR, this same value of 0.0 is used. Therefore, the machine is still considered for a match, but has no ranking above any other.

A boolean expression evaluates to the numerical value of 1.0 if true, and 0.0 if false.

The following Rank expressions provide examples to follow.

For a job that desires the machine with the most available memory:
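One way to write this, assuming the standard machine ClassAd attribute Memory (reported in Mbytes) is the quantity of interest:

```
# Larger Memory values rank higher.
Rank = Memory
```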

For a job that prefers to run on a friend's machine on Saturdays and Sundays:
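A sketch of such an expression, using the machine ClassAd attribute ClockDay (0 for Sunday through 6 for Saturday). The host name friend.cs.wisc.edu is a hypothetical placeholder.

```
# True (1.0) only on weekends and only for the friend's machine.
# friend.cs.wisc.edu is a hypothetical host name.
Rank = ( (ClockDay == 0) || (ClockDay == 6) ) && (Machine == "friend.cs.wisc.edu")
```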

For a job that prefers to run on one of three specific machines:
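A sketch of such an expression, with three illustrative host names:

```
# Any of the three machines ranks 1.0; every other machine ranks 0.0.
Rank = (Machine == "friend1.cs.wisc.edu") || (Machine == "friend2.cs.wisc.edu") || (Machine == "friend3.cs.wisc.edu")
```

All three machines receive the same rank here, so this form expresses no preference among them.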

For a job that wants the machine with the best floating point performance (on Linpack benchmarks), Rank can be set to the machine attribute kflops.

This particular example highlights a difficulty with Rank expression evaluation as currently defined. While all machines have floating point processing ability, not all machines will have the kflops attribute defined. For machines where this attribute is not defined, Rank will evaluate to the value UNDEFINED, and HTCondor will use a default rank of the machine of 0.0. The Rank attribute will only rank machines where the attribute is defined. Therefore, the machine with the highest floating point performance may not be the one given the highest rank.

So, it is wise when writing a Rank expression to check if the expression's evaluation will lead to the expected resulting ranking of machines. This can be accomplished using the condor_status command with the -constraint argument. This allows the user to see a list of machines that fit a constraint. To see which machines in the pool have kflops defined, use condor_status -constraint kflops. Alternatively, to see a list of machines where kflops is not defined, use condor_status -constraint "kflops=?=undefined".

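For a job that prefers specific machines in a specific order, Rank can sum one weighted comparison per machine: (machine == "friend1.cs.wisc.edu")*3 + (machine == "friend2.cs.wisc.edu")*2 + (machine == "friend3.cs.wisc.edu"). If the machine being ranked is friend1.cs.wisc.edu, then the expression (machine == "friend1.cs.wisc.edu") is true, and gives the value 1.0. The expressions (machine == "friend2.cs.wisc.edu") and (machine == "friend3.cs.wisc.edu") are false, and give the value 0.0. Therefore, Rank evaluates to the value 3.0. In this way, machine friend1.cs.wisc.edu is ranked higher than machine friend2.cs.wisc.edu, machine friend2.cs.wisc.edu is ranked higher than machine friend3.cs.wisc.edu, and all three of these machines are ranked higher than others.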


2.5.3 Submitting Jobs Using a Shared File System

If vanilla, java, or parallel universe jobs are submitted without using the File Transfer mechanism, HTCondor must use a shared file system to access input and output files. In this case, the job must be able to access the data files from any machine on which it could potentially run.

As an example, suppose a job is submitted from blackbird.cs.wisc.edu, and the job requires a particular data file called /u/p/s/psilord/data.txt. If the job were to run on cardinal.cs.wisc.edu, the file /u/p/s/psilord/data.txt must be available through either NFS or AFS for the job to run correctly.

HTCondor allows users to ensure their jobs have access to the right shared files by using the FileSystemDomain and UidDomain machine ClassAd attributes. These attributes specify which machines have access to the same shared file systems. All machines that mount the same shared directories in the same locations are considered to belong to the same file system domain. Similarly, all machines that share the same user information (in particular, the same UID, which is important for file systems like NFS) are considered part of the same UID domain.

The default configuration for HTCondor places each machine in its own UID domain and file system domain, using the full host name of the machine as the name of the domains. So, if a pool does have access to a shared file system, the pool administrator must correctly configure HTCondor such that all the machines mounting the same files have the same FileSystemDomain configuration. Similarly, all machines that share common user information must be configured to have the same UidDomain configuration.

When a job relies on a shared file system, HTCondor uses the requirements expression to ensure that the job runs on a machine in the correct UidDomain and FileSystemDomain. In this case, the default requirements expression specifies that the job must run on a machine with the same UidDomain and FileSystemDomain as the machine from which the job is submitted. This default is almost always correct. However, in a pool spanning multiple UidDomains and/or FileSystemDomains, the user may need to specify a different requirements expression to have the job run on the correct machines.

For example, imagine a pool made up of both desktop workstations and a dedicated compute cluster. Most of the pool, including the compute cluster, has access to a shared file system, but some of the desktop machines do not. In this case, the administrators would probably define the FileSystemDomain to be cs.wisc.edu for all the machines that mounted the shared files, and to the full host name for each machine that did not. An example of such a host name is jimi.cs.wisc.edu.

In this example, a user wants to submit vanilla universe jobs from her own desktop machine (jimi.cs.wisc.edu) which does not mount the shared file system (and is therefore in its own file system domain, in its own world). But, she wants the jobs to be able to run on more than just her own machine (in particular, the compute cluster), so she puts the program and input files onto the shared file system. When she submits the jobs, she needs to tell HTCondor to send them to machines that have access to that shared data, so she specifies a different requirements expression than the default:
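A sketch of such a requirements expression is given below. Only the FileSystemDomain value comes from the scenario above; whether a matching UidDomain clause is also needed depends on how the pool is configured, so it is left out here as an open assumption.

```
# Match only machines that mount the shared file system.
# TARGET. makes explicit that the machine's attribute is meant,
# not the job's own FileSystemDomain attribute.
Requirements = (TARGET.FileSystemDomain == "cs.wisc.edu")
```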

WARNING: If there is no shared file system, or the HTCondor pool administrator does not configure the FileSystemDomain setting correctly (the default is that each machine in a pool is in its own file system and UID domain), and a user submits a job that cannot use remote system calls (for example, a vanilla universe job) without enabling HTCondor's File Transfer mechanism, the job will only run on the machine from which it was submitted.


2.5.4 Submitting Jobs Without a Shared File System: HTCondor's File Transfer Mechanism

HTCondor works well without a shared file system. The HTCondor file transfer mechanism permits the user to select which files are transferred and under which circumstances. HTCondor can transfer any files needed by a job from the machine where the job was submitted into a remote scratch directory on the machine where the job is to be executed. HTCondor executes the job and transfers output back to the submitting machine. The user specifies which files and directories to transfer, and at what point the output files should be copied back to the submitting machine. This specification is done within the job's submit description file.


2.5.4.1 Specifying If and When to Transfer Files

To enable the file transfer mechanism, place two commands in the job's submit description file: should_transfer_files and when_to_transfer_output. By default, they will be:
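The default values are not reproduced on this page. For the generation of HTCondor this manual covers they are understood to be the following; treat this as an assumption and confirm it against the condor_submit manual page for your version.

```
# Assumed defaults; verify against your HTCondor version.
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
```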


Setting the should_transfer_files command explicitly enables or disables the file transfer mechanism. The command takes on one of three possible values:

  1. YES: HTCondor transfers both the executable and the file defined by the input command from the machine where the job is submitted to the remote machine where the job is to be executed. The file defined by the output command as well as any files created by the execution of the job are transferred back to the machine where the job was submitted. When they are transferred, and the directory location of the files, is determined by the command when_to_transfer_output.
  2. IF_NEEDED: HTCondor transfers files if the job is matched with and to be executed on a machine in a different FileSystemDomain than the one the submit machine belongs to, the same as if should_transfer_files = YES. If the job is matched with a machine in the local FileSystemDomain, HTCondor will not transfer files and relies on the shared file system.
  3. NO: HTCondor's file transfer mechanism is disabled.

The when_to_transfer_output command tells HTCondor when output files are to be transferred back to the submit machine. The command takes on one of two possible values:

  1. ON_EXIT: HTCondor transfers the file defined by the output command, as well as any other files in the remote scratch directory created by the job, back to the submit machine only when the job exits on its own.
  2. ON_EXIT_OR_EVICT: HTCondor behaves the same as described for the value ON_EXIT when the job exits on its own. However, each time the job is evicted from a machine, files are transferred back at eviction time. The files that are transferred back at eviction time may include intermediate files that are not part of the final output of the job. Before the job starts running again, all of the files that were stored when the job was last evicted are copied to the job's new remote scratch directory.

    The purpose of saving files at eviction time is to allow the job to resume from where it left off. This is similar to using the checkpoint feature of the standard universe, but just specifying ON_EXIT_OR_EVICT is not enough to make a job capable of producing or utilizing checkpoints. The job must be designed to save and restore its state using the files that are saved at eviction time.

    The files that are transferred back at eviction time are not stored in the location where the job's final output will be written when the job exits. HTCondor manages these files automatically, so usually the only reason for a user to worry about them is to make sure that there is enough space to store them. The files are stored on the submit machine in a temporary directory within the directory defined by the configuration variable SPOOL. The directory is named using the ClusterId and ProcId job ClassAd attributes. The directory name takes the form cluster<X>.proc<Y>.subproc0, where <X> is the value of ClusterId, and <Y> is the value of ProcId. As an example, if job 735.0 is evicted, it will produce the directory cluster735.proc0.subproc0.

The default values for these two submit commands make sense as used together. If only should_transfer_files is set, and set to the value NO, then no output files will be transferred, and the value of when_to_transfer_output is irrelevant. If only when_to_transfer_output is set, and set to the value ON_EXIT_OR_EVICT, then the default value for an unspecified should_transfer_files will be YES.

Note that the combination of should_transfer_files = IF_NEEDED and when_to_transfer_output = ON_EXIT_OR_EVICT would produce undefined file access semantics. Therefore, this combination is prohibited by condor_submit.

2.5.4.2 Specifying What Files to Transfer

If the file transfer mechanism is enabled, HTCondor will transfer the following files before the job is run on a remote machine.

  1. the executable, as defined with the executable command
  2. the input, as defined with the input command
  3. any jar files, for the java universe, as defined with the jar_files command
If the job requires other input files, the submit description file should utilize the transfer_input_files command. This comma-separated list specifies any other files or directories that HTCondor is to transfer to the remote scratch directory, to set up the execution environment for the job before it is run. These files are placed in the same directory as the job's executable. For example:
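A sketch of the relevant portion of a submit description file, using the file names file1 and file2 from the surrounding text; the choice of ON_EXIT is an assumption:

```
# Explicitly enable file transfer and list the extra input files.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = file1,file2
```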

This example explicitly enables the file transfer mechanism, and it transfers the executable, the file specified by the input command, any jar files specified by the jar_files command, and files file1 and file2.

If the file transfer mechanism is enabled, HTCondor will transfer the following files from the execute machine back to the submit machine after the job exits.

  1. the output file, as defined with the output command
  2. the error file, as defined with the error command
  3. any files created by the job in the remote scratch directory; this only occurs for jobs other than grid universe, and for HTCondor-C grid universe jobs; directories created by the job within the remote scratch directory are ignored for this automatic detection of files to be transferred.

A path given for output and error commands represents a path on the submit machine. If no path is specified, the directory specified with initialdir is used, and if that is not specified, the directory from which the job was submitted is used. At the time the job is submitted, zero-length files are created on the submit machine, at the given path for the files defined by the output and error commands. This allows job submission to fail immediately if these files cannot be written by HTCondor.

To restrict the output files or permit entire directory contents to be transferred, specify the exact list with transfer_output_files. Delimit the list of file names, directory names, or paths with commas. When this list is defined, and any of the files or directories do not exist as the job exits, HTCondor considers this an error, and places the job on hold. When this list is defined, automatic detection of output files created by the job is disabled. Paths specified in this list refer to locations on the execute machine. The naming and placement of files and directories relies on the term base name. For example, the path a/b/c has the base name c. It is the file name or directory name with all directories leading up to that name stripped off. On the submit machine, the transferred files or directories are named using only the base name. Therefore, each output file or directory must have a different name, even if they originate from different paths.

For grid universe jobs other than HTCondor-C grid jobs, files to be transferred (other than standard output and standard error) must be specified using transfer_output_files in the submit description file, because automatic detection of new files created by the job does not take place.

Here are examples to promote understanding of what files and directories are transferred, and how they are named after transfer. Assume that the job produces the following structure within the remote scratch directory: files o1 and o2, and a directory d1 containing files o3 and o4.

If the submit description file sets transfer_output_files = o1,o2,d1 then transferred back to the submit machine will be files o1 and o2, and directory d1 with its contents o3 and o4. Note that the directory d1 and all its contents are specified, and therefore transferred. If the directory d1 is not created by the job before exit, then the job is placed on hold. If the directory d1 is created by the job before exit, but is empty, this is not an error.

If, instead, the submit description file sets transfer_output_files = o1,o2,d1/o3 then transferred back to the submit machine will be files o1, o2, and o3. Note that only the base name is used in the naming and placement of the file specified with d1/o3.

2.5.4.3 File Paths for File Transfer

The file transfer mechanism specifies file names and/or paths on both the file system of the submit machine and on the file system of the execute machine. Care must be taken to know which machine, submit or execute, is utilizing the file name and/or path.

Files in the transfer_input_files command are specified as they are accessed on the submit machine. The job, as it executes, accesses files as they are found on the execute machine.

There are three ways to specify files and paths for transfer_input_files:

  1. Relative to the current working directory as the job is submitted, if the submit command initialdir is not specified.
  2. Relative to the initial directory, if the submit command initialdir is specified.
  3. Absolute.

Before executing the program, HTCondor copies the executable, an input file as specified by the submit command input, along with any input files specified by transfer_input_files. All these files are placed into a remote scratch directory on the execute machine, in which the program runs. Therefore, the executing program must access input files relative to its working directory. Because all files and directories listed for transfer are placed into a single, flat directory, inputs must be uniquely named to avoid collision when transferred. A collision causes the last file in the list to overwrite the earlier one.

Both relative and absolute paths may be used in transfer_output_files. Relative paths are relative to the job's remote scratch directory on the execute machine. When the files and directories are copied back to the submit machine, they are placed in the job's initial working directory as the base name of the original path. An alternate name or path may be specified by using transfer_output_remaps.

A job may create files outside the remote scratch directory but within the file system of the execute machine, in a directory such as /tmp, if this directory is guaranteed to exist and be accessible on all possible execute machines. However, HTCondor will not automatically transfer such files back after execution completes, nor will it clean up these files.

Here are several examples to illustrate the use of file transfer. The program executable is called my_program, and it uses three command-line arguments as it executes: two input file names and an output file name. The program executable and the submit description file for this job are located in directory /scratch/test.

Here is the directory tree as it exists on the submit machine, for all the examples:

Example 1

This first example explicitly transfers input files. These input files to be transferred are specified relative to the directory where the job is submitted. An output file specified in the arguments command, out1, is created when the job is executed. It will be transferred back into the directory /scratch/test.
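A sketch of a submit description file for this case follows. The input file names in1 and in2, their location in a files subdirectory, and the logs subdirectory are assumptions made for illustration; only my_program, out1, and /scratch/test come from the text.

```
# File transfer Example 1 (sketch): inputs named relative to /scratch/test.
# in1, in2, files/ and logs/ are assumed names.
Executable = my_program
Universe   = vanilla
Error      = logs/err.$(Cluster)
Output     = logs/out.$(Cluster)
Log        = logs/log.$(Cluster)

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = files/in1,files/in2

# The program sees only base names inside the flat scratch directory.
Arguments  = in1 in2 out1
Queue
```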

The log file is written on the submit machine, and is not involved with the file transfer mechanism.

Example 2

This second example is identical to Example 1, except that absolute paths to the input files are specified, instead of relative paths to the input files.
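Under the same assumed file layout (in1 and in2 inside /scratch/test/files), the only line that would change relative to Example 1 is:

```
# Absolute submit-machine paths; the files still arrive as in1 and in2.
transfer_input_files = /scratch/test/files/in1,/scratch/test/files/in2
```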

Example 3

This third example illustrates the use of the submit command initialdir, and its effect on the paths used for the various files. The expected location of the executable is not affected by the initialdir command. All other files (specified by input, output, error, transfer_input_files, as well as files modified or created by the job and automatically transferred back) are located relative to the specified initialdir. Therefore, the output file, out1, will be placed in the files directory. Note that the logs2 directory must exist for this example to work correctly.
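A sketch of this case, assuming that logs2 is a subdirectory of files and that the inputs are named in1 and in2:

```
# File transfer Example 3 (sketch): paths resolved relative to initialdir.
# in1 and in2 are assumed names; logs2 is assumed to live inside files/.
Executable = my_program
Universe   = vanilla
Error      = logs2/err.$(Cluster)
Output     = logs2/out.$(Cluster)
Log        = logs2/log.$(Cluster)

initialdir = files

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = in1,in2

Arguments  = in1 in2 out1
Queue
```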

Example 4 - Illustrates an Error

This example illustrates a job that will fail. The files specified using the transfer_input_files command work correctly (see Example 1). However, relative paths to files in the arguments command cause the executing program to fail. The file system on the submission side may utilize relative paths to files, but those files are placed into the single, flat, remote scratch directory on the execute machine.
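The problematic lines might look like this, again assuming input files in1 and in2 in a files subdirectory:

```
# Fine: these paths are resolved on the submit machine.
transfer_input_files = files/in1,files/in2
# Wrong: there is no files/ directory inside the remote scratch directory.
Arguments = files/in1 files/in2 files/out1
```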

This example fails with the following error:

Example 5 - Illustrates an Error

As with Example 4, this example illustrates a job that will fail. The executing program's use of absolute paths cannot work.
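A sketch of the failing combination, with the same assumed file names as in the earlier examples:

```
# Fine: absolute paths are valid for locating inputs on the submit machine.
transfer_input_files = /scratch/test/files/in1,/scratch/test/files/in2
# Wrong: /scratch/test/files need not exist on the execute machine.
Arguments = /scratch/test/files/in1 /scratch/test/files/in2 /scratch/test/files/out1
```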

The job fails with the following error:

Example 6

This example illustrates a case where the executing program creates an output file in a directory other than within the remote scratch directory that the program executes within. The file creation may or may not cause an error, depending on the existence and permissions of the directories on the remote file system.
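A sketch of the relevant commands, assuming the same input files as before and an output path of /tmp/out1:

```
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = files/in1,files/in2
# The file is outside the scratch directory, so it must be listed explicitly.
transfer_output_files   = /tmp/out1

Arguments = in1 in2 /tmp/out1
```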

The output file /tmp/out1 is transferred back to the job's initial working directory as /scratch/test/out1.

2.5.4.4 Behavior for Error Cases

This section describes HTCondor's behavior for some error cases in dealing with the transfer of files.
Disk Full on Execute Machine
When transferring any files from the submit machine to the remote scratch directory, if the disk is full on the execute machine, then the job is placed on hold.
Error Creating Zero-Length Files on Submit Machine
As a job is submitted, HTCondor creates zero-length files as placeholders on the submit machine for the files defined by output and error. If these files cannot be created, then job submission fails.

This job submission failure avoids having the job run to completion, only to be unable to transfer the job's output due to permission errors.

Error When Transferring Files from Execute Machine to Submit Machine
When a job exits, or potentially when a job is evicted from an execute machine, one or more files may be transferred from the execute machine back to the machine on which the job was submitted.

During transfer, if any of the following three similar types of errors occur, the job is put on hold as the error occurs.

  1. If the file cannot be opened on the submit machine, for example because the system is out of inodes.
  2. If the file cannot be written on the submit machine, for example because the permissions do not permit it.
  3. If the write of the file on the submit machine fails, for example because the system is out of disk space.


2.5.4.5 File Transfer Using a URL

Instead of file transfer that goes only between the submit machine and the execute machine, HTCondor has the ability to transfer files from a location specified by a URL for a job's input file, or from the execute machine to a location specified by a URL for a job's output file(s). This capability requires administrative set up, as described in section 3.12.2.

The transfer of an input file is restricted to vanilla and vm universe jobs only. HTCondor's file transfer mechanism must be enabled. Therefore, the submit description file for the job will define both should_transfer_files and when_to_transfer_output. In addition, the URL for any file specified with a URL is given in the transfer_input_files command. An example portion of the submit description file for a job that has a single file specified with a URL:
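A sketch of such a portion follows. The URL is a hypothetical placeholder, and which URL schemes work depends on how the pool administrator has set up the feature.

```
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Hypothetical URL; www.example.com stands in for a real server.
transfer_input_files    = http://www.example.com/path/to/filename
```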

The destination file is given by the file name within the URL.


For the transfer of the entire contents of the output sandbox, which are all files that the job creates or modifies, HTCondor's file transfer mechanism must be enabled. In this sample portion of the submit description file, the first two commands explicitly enable file transfer, and the added output_destination command provides both the protocol to be used and the destination of the transfer. Note that with this feature, no files are transferred back to the submit machine. This does not interfere with the streaming of output.
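A sketch of the sample portion described here. The output_destination value is a placeholder showing only the general shape, protocol followed by destination; it is not a working URL.

```
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Placeholder: substitute a protocol your pool supports and a real destination.
output_destination      = urltype://path/to/destination/directory
```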

If only a subset of the output sandbox should be transferred, the subset is specified by further adding a submit command of the form:
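For instance, with hypothetical file names file1 and file2:

```
# Only these files from the output sandbox go to output_destination.
transfer_output_files = file1, file2
```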

2.5.4.6 Requirements and Rank for File Transfer

The requirements expression for a job must depend on the should_transfer_files command. The job must specify the correct logic to ensure that the job is matched with a resource that meets the file transfer needs. If no requirements expression is in the submit description file, or if the expression specified does not refer to the attributes listed below, condor_submit adds an appropriate clause to the requirements expression for the job. condor_submit appends these clauses with a logical AND, &&, to ensure that the proper conditions are met. Here are the default clauses corresponding to the different values of should_transfer_files:

  1. should_transfer_files = YES results in the addition of the clause (HasFileTransfer). If the job is always going to transfer files, it is required to match with a machine that has the capability to transfer files.
  2. should_transfer_files = NO results in the addition of (TARGET.FileSystemDomain == MY.FileSystemDomain). In addition, HTCondor automatically adds the FileSystemDomain attribute to the job ClassAd, with whatever string is defined for the condor_schedd to which the job is submitted. If the job is not using the file transfer mechanism, HTCondor assumes it will need a shared file system, and therefore, a machine in the same FileSystemDomain as the submit machine.
  3. should_transfer_files = IF_NEEDED results in the addition of (HasFileTransfer || (TARGET.FileSystemDomain == MY.FileSystemDomain)). If HTCondor will optionally transfer files, it must require that the machine is either capable of transferring files or in the same file system domain.

To ensure that the job is matched to a machine with enough local disk space to hold all the transferred files, HTCondor automatically adds the DiskUsage job attribute. This attribute includes the total size of the job's executable and all input files to be transferred. HTCondor then adds an additional clause to the Requirements expression that states that the remote machine must have at least enough available disk space to hold all these files.
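Putting these pieces together, and assuming a job submitted with should_transfer_files = IF_NEEDED and no user-written requirements on these attributes, the appended clauses would look roughly like the sketch below. The exact expression varies between HTCondor versions; condor_q -l shows what was actually generated for a given job.

```
# Sketch of the automatically appended clauses, joined with &&.
# Disk is the machine's available disk space; DiskUsage is the job attribute.
Requirements = (HasFileTransfer || \
                (TARGET.FileSystemDomain == MY.FileSystemDomain)) && \
               (Disk >= DiskUsage)
```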

If should_transfer_files = IF_NEEDED and the job prefers to run on a machine in the local file system domain over transferring files, but is still willing to allow the job to run remotely and transfer files, the Rank expression works well: rank machines by whether TARGET.FileSystemDomain equals MY.FileSystemDomain.
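A minimal sketch of such a preference follows. A comparison that evaluates to true contributes 1.0 to the rank, so machines in the submit machine's file system domain are preferred over all others.

```
should_transfer_files = IF_NEEDED

# Prefer machines that share the submit machine's file system domain.
rank = (TARGET.FileSystemDomain == MY.FileSystemDomain)
```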

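The Rank expression is a floating point value, so if other items are considered in ranking the possible machines this job may run on, add the items to the expression. For instance, the machine's kflops value can be added to the file system domain comparison.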
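Combining floating point speed with the file system domain preference could be sketched as:

```
# kflops is the machine's advertised floating point speed.
# The comparison adds only 1.0 when the domains match.
rank = kflops + (TARGET.FileSystemDomain == MY.FileSystemDomain)
```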

The value of kflops can vary widely among machines, so this Rank expression will likely not do as it intends. To place emphasis on the job running in the same file system domain, but still consider floating point speed among the machines in the file system domain, weight the part of the expression that is matching the file system domains, for example by multiplying it by a large constant.
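A weighted version might look like the sketch below. The factor 10000 is an illustrative assumption; the intent is only that it be large relative to the spread of kflops values in the pool.

```
# Weight the file system domain match so that it dominates kflops.
rank = kflops + (10000 * (TARGET.FileSystemDomain == MY.FileSystemDomain))
```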

2.5.5 Environment Variables

The environment under which a job executes often contains information that is potentially useful to the job. HTCondor allows a user to both set and reference environment variables for a job or job cluster.

Within a submit description file, the user may define environment variables for the job's environment by using the environment command. See the condor_submit manual page at section 10 for more details about this command.

The submitter's entire environment can be copied into the job ClassAd for the job at job submission. The getenv command within the submit description file does this, as described at section 10.

If the environment is set with the environment command and getenv is also set to true, values specified with environment override values in the submitter's environment, regardless of the order of the environment and getenv commands.

Commands within the submit description file may reference the environment variables of the submitter as a job is submitted. Submit description file commands use $ENV(EnvironmentVariableName) to reference the value of an environment variable.
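The three mechanisms described above can be sketched in one fragment. The variable names, their values, and the runs subdirectory are hypothetical.

```
# Define two variables in the job's environment (hypothetical names).
environment = "DATA_SET=run42 VERBOSE=1"

# Also copy the submitter's environment; values given with the
# environment command take precedence over inherited ones.
getenv = True

# Reference the submitter's HOME variable at submit time.
initialdir = $ENV(HOME)/runs
```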

HTCondor sets several additional environment variables for each executing job that may be useful for the job to reference.

  • _CONDOR_SCRATCH_DIR gives the directory where the job may place temporary data files. This directory is unique for every job that is run, and its contents are deleted by HTCondor when the job stops running on a machine, no matter how the job completes.
  • _CONDOR_SLOT gives the name of the slot (for SMP machines) on which the job is run. On machines with only a single slot, the value of this variable will be 1, just like the SlotID attribute in the machine's ClassAd. This setting is available in all universes. See section 3.5.10 for more details about SMP machines and their configuration.
  • CONDOR_VM is equivalent to _CONDOR_SLOT described above, except that it is only available in the standard universe. NOTE: As of HTCondor version 6.9.3, this environment variable is no longer used. It will only be defined if the ALLOW_VM_CRUFT configuration variable is set to True.
  • X509_USER_PROXY gives the full path to the X.509 user proxy file if one is associated with the job. Typically, a user will specify x509userproxy in the submit description file. This setting is currently available in the local, java, and vanilla universes.
  • _CONDOR_JOB_AD is the path to a file in the job's scratch directory which contains the job ad for the currently running job. The job ad is current as of the start of the job, but is not updated while the job runs. The job may read attributes and their values out of this file as it runs, but any changes will not be acted on in any way by HTCondor. The format is the same as the output of the condor_q -l command. This environment variable may be particularly useful in a USER_JOB_WRAPPER.
  • _CONDOR_MACHINE_AD is the path to a file in the job's scratch directory which contains the machine ad for the slot the currently running job is using. The machine ad is current as of the start of the job, but is not updated while the job runs. The format is the same as the output of the condor_status -l command.
  • _CONDOR_JOB_IWD is the path to the job's initial working directory.
  • _CONDOR_WRAPPER_ERROR_FILE is only set when the administrator has installed a USER_JOB_WRAPPER. If this file exists, HTCondor assumes that the job wrapper has failed and copies the contents of the file to the StarterLog for the administrator to debug the problem.

2.5.6 Heterogeneous Submit: Execution on Differing Architectures

If executables are available for the different platforms of machines in the HTCondor pool, HTCondor can be allowed the choice of a larger number of machines when allocating a machine for a job. Modifications to the submit description file allow this choice of platforms.

A simplified example is a cross submission. An executable is available for one platform, but the submission is done from a different platform. Given the correct executable, the requirements command in the submit description file specifies the target architecture. For example, a job whose executable is compiled for a 32-bit Intel processor running Windows Vista, submitted from an Intel architecture machine running Linux, would add a requirement naming the Windows architecture and operating system. Without this requirement, condor_submit will assume that the program is to be executed on a machine with the same platform as the machine where the job is submitted.
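For the Windows case just described, the requirement might be written as in the sketch below. The attribute values are assumptions: the strings that machines advertise for Arch and OpSys depend on the HTCondor version (older releases used version-specific OpSys values for Windows), so check the output of condor_status for the pool in question.

```
# Match only 32-bit Intel machines running Windows (values are assumptions).
requirements = (Arch == "INTEL") && (OpSys == "WINDOWS")
```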

Cross submission works for all universes except scheduler and local. See section 5.3.9 for how matchmaking works in the grid universe. The burden is on the user to both obtain and specify the correct executable for the target architecture. To list the architecture and operating systems of the machines in a pool, run condor_status.

2.5.6.1 Vanilla Universe Example for Execution on Differing Architectures

A more complex example of a heterogeneous submission occurs when a job may be executed on many different architectures to gain full use of a diverse architecture and operating system pool. If the executables are available for the different architectures, then a modification to the submit description file will allow HTCondor to choose an executable after an available machine is chosen.

A special-purpose Machine Ad substitution macro can be used in string attributes in the submit description file. The macro has the form $$(MachineAdAttribute). The $$() informs HTCondor to substitute the requested MachineAdAttribute from the machine where the job will be executed.

An example of the heterogeneous job submission has executables available for two platforms: RHEL 3 on both 32-bit and 64-bit Intel processors. This example uses povray, a popular free rendering engine, to render images.

The substitution macro chooses a specific executable after a platform for running the job is chosen. These executables must therefore be named based on the machine attributes that describe a platform. Executables whose names end in the OpSys and Arch values of their platform will work correctly for a macro that appends those two attributes to the program name.
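For the two platforms in this example, the naming scheme could be sketched as follows. The OpSys and Arch strings shown are assumptions about what the machines advertise; condor_status reports the actual values.

```
# Executable name built from the matched machine's attributes:
executable = povray.$$(OpSys).$$(Arch)

# File names this resolves to on the two platforms:
#   povray.LINUX.INTEL     (32-bit Intel, Linux)
#   povray.LINUX.X86_64    (64-bit Intel, Linux)
```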

The executables or links to executables with this name are placed into the initial working directory so that they may be found by HTCondor. A submit description file that queues three jobs for this example:
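A sketch of such a file is shown here. The log, output, and scene file names and the povray arguments are illustrative assumptions.

```
####################
# Sketch of a heterogeneous submission
####################

universe     = vanilla
Executable   = povray.$$(OpSys).$$(Arch)
Log          = povray.log
Output       = povray.out.$(Process)
Error        = povray.err.$(Process)

# Accept either of the two platforms for which an executable exists.
Requirements = (Arch == "INTEL"  && OpSys == "LINUX") || \
               (Arch == "X86_64" && OpSys == "LINUX")

Arguments    = +W1024 +H768 +Iimage1.pov
Queue

Arguments    = +W1024 +H768 +Iimage2.pov
Queue

Arguments    = +W1024 +H768 +Iimage3.pov
Queue
```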

These jobs are submitted to the vanilla universe to ensure that once a job is started on a specific platform, it will finish running on that platform. Switching platforms in the middle of job execution cannot work correctly.

There are two common errors made with the substitution macro. The first is the use of a non-existent MachineAdAttribute. If the specified MachineAdAttribute does not exist in the machine's ClassAd, then HTCondor will place the job in the held state until the problem is resolved.

The second common error occurs due to an incomplete job set up. For example, the submit description file given above depends on an executable being available for each platform. If one is missing, HTCondor reports back that an executable is missing when it happens to match the job with a resource that requires the missing binary.

2.5.6.2 Standard Universe Example for Execution on Differing Architectures

Jobs submitted to the standard universe may produce checkpoints. A checkpoint can then be used to start up and continue execution of a partially completed job. For a partially completed job, the checkpoint and the job are specific to a platform. If migrated to a different machine, correct execution requires that the platform remain the same.

In previous versions of HTCondor, the author of the heterogeneous submission file would need to write extra policy expressions in the requirements expression to force HTCondor to choose the same type of platform when continuing a checkpointed job. However, since it is needed in the common case, this additional policy is now automatically added to the requirements expression. The additional expression is added provided the user does not use CkptArch in the requirements expression. HTCondor will remain backward compatible for those users who have explicitly specified CkptRequirements, implying use of CkptArch, in their requirements expression.

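The expression added when the attribute CkptArch is not specified is named CkptRequirements. By default it requires, for each of the pairs CkptArch and Arch, and CkptOpSys and OpSys, that either the two values match or the checkpoint attribute is undefined.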
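Written out as a ClassAd expression, that default can be sketched as shown below; the precise text generated by a given HTCondor version may differ.

```
# Added automatically and joined to the user's requirements with &&.
CkptRequirements = ((CkptArch == Arch)   || (CkptArch =?= UNDEFINED)) && \
                   ((CkptOpSys == OpSys) || (CkptOpSys =?= UNDEFINED))
```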


The behavior of the CkptRequirements expression and its addition to requirements is as follows. The CkptRequirements expression guarantees correct operation in the two possible cases for a job. In the first case, the job has not produced a checkpoint. The ClassAd attributes CkptArch and CkptOpSys will be undefined, and therefore the meta operator (=?=) evaluates to true. In the second case, the job has produced a checkpoint. The Machine ClassAd is restricted to require further execution only on a machine of the same platform. The attributes CkptArch and CkptOpSys will be defined, ensuring that the platform chosen for further execution will be the same as the one used just before the checkpoint.

Note that this restriction of platforms also applies to platforms where the executables are binary compatible.

The complete submit description file for this example:
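A sketch of such a file follows, reusing the povray naming scheme from the vanilla universe example. The log, output, and scene file names and the arguments are illustrative assumptions. No checkpoint-related clause appears because HTCondor adds it automatically.

```
####################
# Sketch of a heterogeneous submission, standard universe
####################

universe     = standard
Executable   = povray.$$(OpSys).$$(Arch)
Log          = povray.log
Output       = povray.out.$(Process)
Error        = povray.err.$(Process)

# HTCondor appends CkptRequirements so that a checkpointed job
# restarts only on the platform where it began running.
Requirements = ( (Arch == "INTEL"  && OpSys == "LINUX") || \
                 (Arch == "X86_64" && OpSys == "LINUX") )

Arguments    = +W1024 +H768 +Iimage1.pov
Queue

Arguments    = +W1024 +H768 +Iimage2.pov
Queue

Arguments    = +W1024 +H768 +Iimage3.pov
Queue
```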

2.5.6.3 Vanilla Universe Example for Execution on Differing Operating Systems

The addition of several related OpSys attributes assists in selection of specific operating systems and versions in heterogeneous pools.

Here is a more compact way to specify a RedHat 6 platform.
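As a sketch, assuming the pool's machines advertise the OpSysName, OpSysMajorVer, and OpSysAndVer attributes, the longer and the more compact forms of the requirement might compare as follows. The value strings are assumptions and should be checked against condor_status output.

```
# Longer form, testing name and major version separately:
# requirements = (OpSysName == "RedHat") && (OpSysMajorVer == 6)

# More compact form using the combined attribute:
requirements = (OpSysAndVer == "RedHat6")
```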


