About this Page
This page describes the format of the policy graph files written by
the 'pomdp-solve' program (usually with the suffix ".pg").
About Policy Graphs
If the solution to an infinite horizon POMDP problem converges,
then a finite state controller can be created from the value
function's partitioning of the belief space. With this finite
state controller, one can execute the optimal policy without
needing to track the belief state. Using the controller first
requires knowing which policy graph node to start in. This node is
found by choosing the alpha vector whose dot product with the
initial belief state is maximal. Because the alpha vectors line up
one-to-one with the nodes of the output policy graph, that "best"
alpha vector determines the starting node of the finite state controller.
Each node of the policy graph dictates the action to take.
After the action is taken, the observation received is used to look up
the next node in the policy graph, and hence the next action to take.
Repeating this cycle executes the optimal policy.
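The execution procedure above can be sketched in Python as follows. This is a minimal illustration, not part of 'pomdp-solve': the function names `start_node` and `run_controller` are hypothetical, the policy graph is assumed to be already loaded into a dictionary mapping each node ID to its action and its list of successor node IDs, and the alpha vectors are assumed to be listed in the same order as the policy graph nodes.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical in-memory form of a policy graph:
# node ID -> (action index, successor node ID for each observation).
PolicyGraph = Dict[int, Tuple[int, List[int]]]


def start_node(alphas: Sequence[Sequence[float]], node_ids: Sequence[int],
               belief: Sequence[float]) -> int:
    """Return the ID of the node whose alpha vector has the largest dot
    product with the initial belief.

    Assumes alphas[i] and node_ids[i] describe the same node, i.e. both
    are listed in file order. Ties go to the earliest vector.
    """
    best = max(range(len(alphas)),
               key=lambda i: sum(a * b for a, b in zip(alphas[i], belief)))
    return node_ids[best]


def run_controller(graph: PolicyGraph, node: int,
                   observations: Sequence[int]) -> List[int]:
    """Execute the controller against a given observation sequence and
    return the actions taken.

    Assumes each observation is a position in the node's successor list.
    """
    actions = []
    for obs in observations:
        action, successors = graph[node]
        actions.append(action)   # the current node dictates the action
        node = successors[obs]   # the observation selects the next node
    return actions
```

For example, with the two-node graph `{0: (1, [0, 1]), 1: (0, [1, 0])}` and alpha vectors `[1, 0]` and `[0, 1]`, the belief `[0.2, 0.8]` selects node 1 as the start node. In a live system, each observation would come from the environment after the action is executed rather than from a prepared list; note that no belief update appears anywhere in the loop.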
Each line of the file represents one node of the policy graph and its
contents are:
N A Z1 Z2 Z3 ...
Here 'N' is the node ID, a unique integer identifying the node.
Nodes are numbered sequentially, and they line up, in order, with the
value function vectors in the corresponding output '.alpha' file.
'A' is the action for this node: an integer giving the 0-based index
of the action in the list of actions defined in the POMDP file.
These are followed by a list of node IDs, one for each observation,
so the length of the list equals the number of observations in the
POMDP. This list specifies the transitions in the policy graph: the
n'th entry in the list is the ID of the node that follows this one
when the n'th observation (in the order defined in the POMDP file)
is received.
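A reader for this format might look like the following sketch. The function `parse_policy_graph` is hypothetical and not part of 'pomdp-solve'; it assumes whitespace-separated integer fields with one node per line, and it takes the number of observations from the caller (for example, from the POMDP file) so that it can check the length of each transition list.

```python
from typing import Dict, List, Tuple


def parse_policy_graph(text: str, num_observations: int
                       ) -> Tuple[List[int], Dict[int, Tuple[int, List[int]]]]:
    """Parse '.pg' contents of the form 'N A Z1 Z2 ...'.

    Returns the node IDs in file order (which lines up with the '.alpha'
    vectors) and a dict mapping node ID -> (action, successor node IDs).
    """
    order: List[int] = []
    graph: Dict[int, Tuple[int, List[int]]] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2 + num_observations:
            raise ValueError(f"line {line_no}: expected "
                             f"{2 + num_observations} fields, got {len(fields)}")
        node_id, action = int(fields[0]), int(fields[1])
        successors = [int(z) for z in fields[2:]]
        order.append(node_id)
        graph[node_id] = (action, successors)
    # Every transition must point at a node defined somewhere in the file.
    for node_id, (_, successors) in graph.items():
        for z in successors:
            if z not in graph:
                raise ValueError(f"node {node_id}: unknown successor node {z}")
    return order, graph
```

Keeping the file order separately from the dictionary means the i'th alpha vector can be matched to the i'th node without assuming that node IDs start at any particular number.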