Notice
The command line options below pertain only to pre-5.2
versions. Beginning with the 5.2 version, command line options (and
configuration file options) are represented in an XML format.
For the command line options for the latest version of
pomdp-solve, see:
Latest Command Line
Options.
Command Line Options: Version 5.0 and Earlier
There are a lot of command line options to the program which break
down roughly into a number of categories:
- General options;
- Resource limit options;
- Value iteration options;
- Optimization options;
- Algorithm options;
- Debug options.
If either the "-h" or "-help" options appears
anywhere on the command line, e.g., :
- pomdp-solve -h
- pomdp-solve -help
appears anywhere on the command line, then you get the full list of
all available options, grouped in the categorizes shown above. The
command line arguments are described below in alphabetical order.
Auto Saving
Because solving a POMDP can oftem take a lot of time, the running
instances tend to get aborted (either deliberately or mistakenly). To
make sure that as little work as possible is lost, the program saves
the latest answer it has computed in a temporray file
"##pomdp-solve-ZZZ##" where the 'ZZZ' is replaced with the process
identification number (PID) of the process. Thus, an inadvertly killed
program can be restarted, provided you know the PID of the process,
with the last computed solution. The PID of the process is printed at
the top of the argument list when the program is first started. Note
that this is a little different than the '-save_all' argument, since
saving all solutions gives a different filename for each epoch,
whereas this "auto-save" feature just keeps over-writing the same file
with the latest solution. This also means that a bunch of irregularly
terminated pomdp-solve programs will leave these temporary files
around, so some clean-up might be necessary.
Command Line Options
-alg_rand (int)
Some algorithms can save some LPs by initializing the set of vectors it
produces by generating vectors for a set of random points. This
argument specifies how many random points to use in this
initialization. The default is zero.
-discount (0-1)
The POMDP parameter file specified with the '-p' argument has a
discount factor in it. If this argument is used, the given value will
over-ride the one in that file. Default is not to override and valid
values are between zero and one.
-dom_check [ false, true ]
The single most important contribution to reducing the number of LPs
required is to use a sinple componentwise domination check. Anywhere
a set of vectors is pruned down from another set, you have the option
of doing this domination check. The default is true.
-end_epsilon (0-infty)
Must be used in conjunction with the '-vi_variation
adjustable_epsilon' argument. For the automatic epsilon adjustment,
you need to tell the algorithm at what point to stop adjusting the
epsilon. This defines the limits on the epsilon precision that is
used. The default value is 1e-3.
-enum_purge [ none, domonly, normal_prune, epsilon_prune ]
Must be used in conjunction with the '-method enum' argument. This
defines what type of purging is used to reduce a set of vectors down
to its parsimonious representation. The 'none' option means that no
pruning is done and the vectors are passed to the next value iteration
epoch. The 'domonly' option uses only simple component-wise vector
domination checks to prune the set. The default is 'normal_prune'
which is a domination check followed by using the LP-based Lark/White
pruning algorithm. Finally, 'epsilon_prune' uses an algorithm which
approximates the set to within epsilon of the true set.
-epsilon (0-infty)
This argument defines the precision with which the pruning operations
occur. The exception is the epsilon pruning algorithm which uses the
argument '-prune_epsilon' to define its tolerance. The default is
1e-9.
-epsilon_adjust (0-infty)
Must be used in conjunction with the '-vi_variation
adjustable_epsilon' argument. For the automatic epsilon adjustment,
you need to tell the algorithm how the epsilon should be adjusted.
This is the adjustment factor which is used to divide the current
epsilon. The default is '10', which reduces epsilon to 1/10 its value
every time it is adjusted.
-history_delta (int)
Must be used in conjunction with the '-vi_variation
adjustable_epsilon' argument. For the automatic epsilon adjustment,
you need to tell the algorithm when to adjust the epsilon. This is a
function of looking at the last N epochs and seeing if the differenceqs
in size are within 'delta' of each other. If so, then the epsilon is
changed. This defines the 'delta' value and its default value is '3'.
-history_length (int)
Must be used in conjunction with the '-vi_variation
adjustable_epsilon' argument. For the automatic epsilon adjustment,
you need to tell the algorithm when to adjust the epsilon. This is a
function of looking at the last N epochs and seeing if the
differences in size are within 'delta' of each other.
If so, then the epsilon is changed. This defines the 'N' parameter
and its default value is '5'.
-horizon (int)
Allows you to specify how many iterations (epochs) of value iteration
to run, thereby making it a finite horizon solution. If this value is
less than zero, then an infinite horizon solution is assumed. The
default value for this is '-1', i.e., inifnite horizon.
-inc_prune [ normal, restricted_region, generalized ]
Must be used in conjunction with the '-method incprune' argument. The
Incremental Pruning algorithm has a basic normal form and a more
generalized form. Somewhere in between is the restricted region
variation. The normal one is conceptually and computationally much
simpler, but the genralized form has better theoretical properties.
This option selects which version of incremental pruning to use and
the default is 'normal'.
-terminal_values (alpha-file)
Without this argument, value iteration assumes that the terminal
values (or initial values, depending on which side you view the
iteration from) are all zero. You can specify different terminal
values as a piecewise linear set of vectors that are in the specified
file. The format of this file needs to be the same as what the
program outputs for the value functions answers. Thus the answer to
one run of the program can be used as the input to another,
essentially continuing the work from the place that it was left off.
It currently requires a file in a very specific, somewhat brittle
format described in 'alpha-file-spec.txt'.
-lp_epsilon (0-infty)
It sets the precision of the LP solver. Currently this option does
nothing for non-CPLEX versions. For CPLEX versions is sets some
internal precision parameters, but I do not really know what affect it
has.
-max_soln_size (int)
This argument will result in the program adjusting the precision to
try to attain a set of vectors close to this value. (Currently
unimplemented.)
-memory_limit (bytes)
Limits the amount of memory used by the program to this number of
bytes. Default is to have unlimited memory (up to virtual memory
size.) Currently this is untested on non-Sun Solaris OSes.
-method [ enum, twopass, linsup, witness, incprune ]
Defines which algorithm to use to solve the POMDP. The 'enum' option
is the exhaustive enumeration methods as first described by Sondik in
his 1971 thesis and later discussed by Monahan in his 1982 survey
paper. The 'twopass' is described by Sondik in his thesis as well.
The 'linsup' is the linear support algorithm developed by Cheng in his
1988 thesis. The linear support is only currently supported for CPLEX
versions and the new code is untested at this time. The 'witness'
algorithm was developed by Littman, Cassandra and Kaelbling and is
described in Littman's (1996) and Cassandra's (1998) theses. The
'incprune' is incremental pruning first developed by Zhang and Liu in
1996 and refined and genralized in a UAI'97 paper by Cassandra,
Littman and Zhang. The default algorithm is 'incprune'.
-o (file-prefix)
The result of solving a POMDP is a set of vectors representing the
value function and also a policy graph file mapping the solution to
the vectors of the previous solution. These files are suffixed with
'.alpha' and '.pg' respectively and use the prefix specified here.
The format of these files are described in the files
'alpha-file-spec.txt' and 'pg-file-spec.txt'. The default output file
prefix is 'solution'.
-p (pomdp-file)
This is the filename of the POMDP problem to be solved. Usually this
ends in the suffix '.POMDP'. The POMDP file must conform to the
specific syntax and semantics, which is described in
'pomdp-file-spec.txt'. This parameter is required.
-proj_purge [ none, domonly, normal_prune, epsilon_prune ]
The first step of a single DP iteration does a projection of the
previous iterations value function. After this projection, so of the
resulting vectors might be useless and can be purged away. This
option says how to purge these projection sets.
-prune_epsilon (0-infty)
This goes with the '-vi_variation epsilon_prune' arguemnt and defines
what level of precision to use in the approximation. The default
value is 1e-3.
-prune_rand (int)
The pruning of sets of vectors can save a few LPs by having a set of
point from which to initialize the process. One way to generate these
point is randomly. This argument defines how many random point to use
for the initialization of the prune operations. The default value is
zero.
-q_purge [ none, domonly, normal_prune, epsilon_prune ]
The incremental pruning, witness and two pass algorithms solve the
problem one action at a time and then merge the results into a set
(called the Q value functions). This merged set can then be pruned to
a parsimonious set. This argument defines how the pruning of this set
should be done.
-rand_seed (N1:N2:N3)
To make the execution reproducible, the rand seed used by the program
can be set using this argument. The program will print out the random
seed to start with so this can be used as the input. This will only
affect runs which use random numbers (e.g., using random point
initializations.) If this argument is not used, it initializes to some
(pseudo) random seed.
-save_all
Normally, only the value function and policy graph solution of the
last iteration is saved in a file. However, with this command line
flag you can have the program save every iterations' solution. The
files will have the prefix (either the default 'solution' or that
specified with the '-o' option) and the '.alpha' or '.pg' suffix with
the epoch number appended to that name.
-start_epsilon (0-infty)
Must be used in conjunction with the '-vi_variation
adjustable_epsilon' argument. For the automatic epsilon adjustment,
you need to tell the algorithm when to adjust the epsilon. This is a
function of looking at the last N epochs and seeing if the differenceqs
in size are within 'delta' of each other. If so, then the epsilon is
changed. This argument defines the the value of epsilon for the first
ietration.
-stat_summary
As the algorithm progresses through the epochs, it maintain
information about each epoch such as the size of the solution, the
times for each aspect of the program and the actual epsilon from
optimal. With this command line flag, when the program finishes you
get a summary of the execution in a tabular format.
-stdout
Typically the program prints out some trace of the execution as it
solves the problem. The amount that gets printed out depends upon the
verboseness set (the '-verbose' command line argument). All of this
information is, by default, written to stdout. If you want to write
it to a file, use this option to give the name of that file.
-stop_criteria [ exact, weak ]
When the finite horizon is run for longer and longer epochs, the value
function approaches the infinite horizon value function (assuming a
discount factor of less than 1.0 is used.) When solving the infinite
horizon version of the problem, you need to define when the iteration
has converged. Since the finite precision of the machine makes the
real true exact solution nearly impossible to get, we use one of two
different stopping criteria to terminate with an answer close to the
true infinite horizon answer. There are currently two stopping
criteria. The 'exact' criteria requires that successive iterations
match identicalls, vector for vector and component for component. The
'weak' option is a little more flexible and just ensures that the two
sets do not differ more than some 'delta' in value from each other.
This delta is specified with the '-delta' argument. This is optional
and the default value is 'weak'.
-succinct
When this option is specified, the output of the program becomes very
terse, listing only a single line for each epoch. The exact format
depends on the method being used to solve the pomdp. They are as
follows:
linsup: NV V T
enum: NE ND V T
witness/incprune: Q1 Q2 ... QN V T
Key:
NV - number of vertices enumerated in linear support algorithm.
V - number of vectors in current value function
T - time required to solve for this epoch
NE - number of vectors enumerated with enumeration algorithm.
ND - number of vectors dominated with enumeration algorithm.
QX - number of vectors in the Q functions for action
X (witness and incprune algorithms only.)
This option is good for use with scripts that compile the data, since
it presents an easy to parse output. The '-stat_summary' makes this a
bit unnecessary since it provides the relevant data in a sparse format
-time_limit (int)
Limits the amount of time used by the program to this number of
seconds. (Currently untested on non-Sun Solaris OSes.) This only
guarantees a minimal time, since the time-out signal generated might
take a while to take effect. The program prints out a message about
timing out and the total execution time, which will be slightly more
than the preset limit.
-verbose [ context, lp, global, timing, stats, cmdline, main, alpha,
incprune, lpinterface, vertexenum, mdp, pomdp, param,
proj, crosssum, agenda, enum, twopass, linsup, witness,
parsimonious, region ]
The program offers the generation of more output which is roughly
divided into a bunch of classes roughly corresponding to the differnt
source files. To turn on a more verbose reporting of the internal
operation of these differnt aspects of the program, you use this
command line option. If you want to specify multiple verbose
arguments, then use a comma delimited list of the ones form the list
above. Note that on the command line, there can be *no* spaces on
either side of the commas (or anywhere else in the list of verbose
arguments).
-vi_variation [ normal, adjustable_epsilon, fixed_soln_size ]
Although the basic dynamic programming (DP) algorithm for POMDPs is
used (i.e., value iteration), the program offers a few variations of
how this proceeds. The 'normal' option means just that. The
'adjustable_epsilon' uses a simple algorithm for adjusting the
precision of the solution as the number of DP stages increase.
Essentially, its purpose is to start at a coarse aproximation, and
when it looks like that solution is settling, make the next stage's
solution more precise. The user has control over a few of the
parameters of this algorithm. See the command line arguments:
-start_epsilon
-end_epsilon
-epsilon_adjust
-history_length
-history_size
Finally, an unimplemented feature is the '-fixed_soln_size' argument.
This will eventually be a simple algorithm that tries to adjust the
solution precision to keep the solution size a constant size. A sort
of way to answer the question: "How precise a solution can be achieved
with N vectors?"
-weak_stop_delta (0-infty)
When solving the infinite horizon version of the problem, you need to
define when the iteration has converged. There are currently two
stopping criteria (see the '-stop_criteria' option). For the 'weak'
stopping criteria is just ensures that the two sets do not differ more
than some 'delta' in value from each other. This argument defines
what that delta is and has default value 1e-9.
-witness_points [ false, true ]
There are a number of unimplemented optimizations which will be added
to the code concerning the use of "witness" points to save from soing
a certain amount of LPs. Essentially, you can use points generated
from LPs to avoid future LPs, if you save the LP results the right
way. The magic to save the points is there in the code, but not the
magic to actually use the points.