######################################################################
# FILENAME: 4x3.POMDP
# Stuart Russell's 4x3 maze
#
# The maze looks like this:
#
#   ######
#   #   +#
#   # # -#
#   #    #
#   ######
#
# The + indicates a reward of 1.0, the - a penalty of -1.0.
# The # in the middle of the maze is an obstruction.
# Rewards and penalties are associated with states, not actions.
# The default reward/penalty is -0.04.
# The original problem has no discounting, but there is an absorbing
# state that + and - transition to automatically.  The absorbing state
# cannot be exited.
#
# States are numbered from left to right:
#
# 0  1  2  3
# 4     5  6
# 7  8  9  10
#
# I removed the absorbing state.  In this file, + (state 3) and
# - (state 6) instead transition to the uniform start distribution over
# the other nine states, and a discount of 0.95 is used.
#
# The actions, NSEW, have the expected result 80% of the time, and
# transition in a direction perpendicular to the intended one with a 10%
# probability for each direction.  Movement into a wall returns the agent
# to its original state.
#
# Observation is limited to two wall detectors that can detect when
# a wall is to the left or right.  This gives the following possible
# observations:
#
# left, right, neither, both, good, bad, and absorb
#
# good = +1 reward, bad = -1 penalty.  The absorb observation is not
# declared below because the absorbing state was removed.

discount: 0.95
values: reward
states: 11
actions: n s e w
observations: left right neither both good bad

start: 0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111

T: n
0.9 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.1 0.8 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.1 0.8 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.8 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.8 0.0 0.0 0.1 0.1 0.0 0.0 0.0 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.0 0.0 0.0 0.0 0.8 0.0 0.0 0.1 0.1 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.8 0.1 0.0
0.0 0.0 0.0 0.0 0.0 0.8 0.0 0.0 0.1 0.0 0.1
0.0 0.0 0.0 0.0 0.0 0.0 0.8 0.0 0.0 0.1 0.1

T: s
0.1 0.1 0.0 0.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0
0.1 0.8 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.1 0.0 0.1 0.0 0.8 0.0 0.0 0.0 0.0 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.8 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.0 0.0 0.8 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0.1 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.8 0.1 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.8 0.1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9

T: e
0.1 0.8 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.2 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.1 0.8 0.0 0.1 0.0 0.0 0.0 0.0 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.1 0.0 0.0 0.0 0.8 0.0 0.0 0.1 0.0 0.0 0.0
0.0 0.0 0.1 0.0 0.0 0.0 0.8 0.0 0.0 0.1 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.8 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.8 0.0
0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.8
0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.9

T: w
0.9 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0
0.8 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.8 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.1 0.0 0.0 0.0 0.8 0.0 0.0 0.1 0.0 0.0 0.0
0.0 0.0 0.1 0.0 0.0 0.8 0.0 0.0 0.0 0.1 0.0
0.111111 0.111111 0.111111 0.0 0.111111 0.111111 0.0 0.111112 0.111111 0.111111 0.111111
0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.9 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 0.2 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.8 0.1 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.8 0.1

O: *
1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0

R: * : 0 : * : * -0.04
R: * : 1 : * : * -0.04
R: * : 2 : * : * -0.04
R: * : 3 : * : * 1.0
R: * : 4 : * : * -0.04
R: * : 5 : * : * -0.04
R: * : 6 : * : * -1.0
R: * : 7 : * : * -0.04
R: * : 8 : * : * -0.04
R: * : 9 : * : * -0.04
R: * : 10 : * : * -0.04