# Markov Decision Process (MDP) Toolbox: example module¶

The example module provides functions to generate valid MDP transition and reward matrices.

## Available functions¶

forest()
A simple forest management example
rand()
A random example
small()
A very small example
mdptoolbox.example.forest(S=3, r1=4, r2=2, p=0.1, is_sparse=False)[source]

Generate a MDP example based on a simple forest management scenario.

This function is used to generate a transition probability (A × S × S) array P and a reward (S × A) matrix R that model the following problem. A forest is managed by two actions: ‘Wait’ and ‘Cut’. An action is decided each year with first the objective to maintain an old forest for wildlife and second to make money selling cut wood. Each year there is a probability p that a fire burns the forest.

Here is how the problem is modelled. Let {0, 1 . . . S-1 } be the states of the forest, with S-1 being the oldest. Let ‘Wait’ be action 0 and ‘Cut’ be action 1. After a fire, the forest is in the youngest state, that is state 0. The transition matrix P of the problem can then be defined as follows:

           | p 1-p 0.......0  |
| .  0 1-p 0....0  |
P[0,:,:] = | .  .  0  .       |
| .  .        .    |
| .  .         1-p |
| p  0  0....0 1-p |

| 1 0..........0 |
| . .          . |
P[1,:,:] = | . .          . |
| . .          . |
| . .          . |
| 1 0..........0 |


The reward matrix R is defined as follows:

         |  0  |
|  .  |
R[:,0] = |  .  |
|  .  |
|  0  |
|  r1 |

|  0  |
|  1  |
R[:,1] = |  .  |
|  .  |
|  1  |
|  r2 |

S : int, optional
The number of states, which should be an integer greater than 1. Default: 3.
r1 : float, optional
The reward when the forest is in its oldest state and action ‘Wait’ is performed. Default: 4.
r2 : float, optional
The reward when the forest is in its oldest state and action ‘Cut’ is performed. Default: 2.
p : float, optional
The probability of wild fire occurence, in the range ]0, 1[. Default: 0.1.
is_sparse : bool, optional
If True, then the probability transition matrices will be returned in sparse format, otherwise they will be in dense format. Default: False.
Returns: out – out[0] contains the transition probability matrix P and out[1] contains the reward matrix R. If is_sparse=False then P is a numpy array with a shape of (A, S, S) and R is a numpy array with a shape of (S, A). If is_sparse=True then P is a tuple of length A where each P[a] is a scipy sparse CSR format matrix of shape (S, S); R remains the same as in the case of is_sparse=False. tuple

Examples

>>> import mdptoolbox.example
>>> P, R = mdptoolbox.example.forest()
>>> P
array([[[ 0.1,  0.9,  0. ],
[ 0.1,  0. ,  0.9],
[ 0.1,  0. ,  0.9]],

[[ 1. ,  0. ,  0. ],
[ 1. ,  0. ,  0. ],
[ 1. ,  0. ,  0. ]]])
>>> R
array([[ 0.,  0.],
[ 0.,  1.],
[ 4.,  2.]])
>>> Psp, Rsp = mdptoolbox.example.forest(is_sparse=True)
>>> len(Psp)
2
>>> Psp[0]
<3x3 sparse matrix of type '<... 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
>>> Psp[1]
<3x3 sparse matrix of type '<... 'numpy.int64'>'
with 3 stored elements in Compressed Sparse Row format>
>>> Rsp
array([[ 0.,  0.],
[ 0.,  1.],
[ 4.,  2.]])
>>> (Psp[0].todense() == P[0]).all()
True
>>> (Rsp == R).all()
True


Generate a random Markov Decision Process.

Parameters: S (int) – Number of states (> 1) A (int) – Number of actions (> 1) is_sparse (bool, optional) – False to have matrices in dense format, True to have sparse matrices. Default: False. mask (array, optional) – Array with 0 and 1 (0 indicates a place for a zero probability), shape can be (S, S) or (A, S, S). Default: random. out – out[0] contains the transition probability matrix P and out[1] contains the reward matrix R. If is_sparse=False then P is a numpy array with a shape of (A, S, S) and R is a numpy array with a shape of (S, A). If is_sparse=True then P and R are tuples of length A, where each P[a] is a scipy sparse CSR format matrix of shape (S, S) and each R[a] is a scipy sparse csr format matrix of shape (S, 1). tuple

Examples

>>> import numpy, mdptoolbox.example
>>> numpy.random.seed(0) # Needed to get the output below
>>> P, R = mdptoolbox.example.rand(4, 3)
>>> P
array([[[ 0.21977283,  0.14889403,  0.30343592,  0.32789723],
[ 1.        ,  0.        ,  0.        ,  0.        ],
[ 0.        ,  0.43718772,  0.54480359,  0.01800869],
[ 0.39766289,  0.39997167,  0.12547318,  0.07689227]],

[[ 1.        ,  0.        ,  0.        ,  0.        ],
[ 0.32261337,  0.15483812,  0.32271303,  0.19983549],
[ 0.33816885,  0.2766999 ,  0.12960299,  0.25552826],
[ 0.41299411,  0.        ,  0.58369957,  0.00330633]],

[[ 0.32343037,  0.15178596,  0.28733094,  0.23745272],
[ 0.36348538,  0.24483321,  0.16114188,  0.23053953],
[ 1.        ,  0.        ,  0.        ,  0.        ],
[ 0.        ,  0.        ,  1.        ,  0.        ]]])
>>> R
array([[[-0.23311696,  0.58345008,  0.05778984,  0.13608912],
[-0.07704128,  0.        , -0.        ,  0.        ],
[ 0.        ,  0.22419145,  0.23386799,  0.88749616],
[-0.3691433 , -0.27257846,  0.14039354, -0.12279697]],

[[-0.77924972,  0.        , -0.        , -0.        ],
[ 0.47852716, -0.92162442, -0.43438607, -0.75960688],
[-0.81211898,  0.15189299,  0.8585924 , -0.3628621 ],
[ 0.35563307, -0.        ,  0.47038804,  0.92437709]],

[[-0.4051261 ,  0.62759564, -0.20698852,  0.76220639],
[-0.9616136 , -0.39685037,  0.32034707, -0.41984479],
[-0.13716313,  0.        , -0.        , -0.        ],
[ 0.        , -0.        ,  0.55810204,  0.        ]]])
>>> numpy.random.seed(0) # Needed to get the output below
>>> Psp, Rsp = mdptoolbox.example.rand(100, 5, is_sparse=True)
>>> len(Psp), len(Rsp)
(5, 5)
>>> Psp[0]
<100x100 sparse matrix of type '<... 'numpy.float64'>'
with 3296 stored elements in Compressed Sparse Row format>
>>> Rsp[0]
<100x100 sparse matrix of type '<... 'numpy.float64'>'
with 3296 stored elements in Compressed Sparse Row format>
>>> # The number of non-zero elements (nnz) in P and R are equal
>>> Psp[1].nnz == Rsp[1].nnz
True

mdptoolbox.example.small()[source]

A very small Markov decision process.

The probability transition matrices are:

    | | 0.5 0.5 | |
| | 0.8 0.2 | |
P = |             |
| | 0.0 1.0 | |
| | 0.1 0.9 | |


The reward matrix is:

R = |  5 10 |
| -1  2 |

Returns: out – out[0] is a numpy array of the probability transition matriices. out[1] is a numpy arrray of the reward matrix. tuple

Examples

>>> import mdptoolbox.example
>>> P, R = mdptoolbox.example.small()
>>> P
array([[[ 0.5,  0.5],
[ 0.8,  0.2]],

[[ 0. ,  1. ],
[ 0.1,  0.9]]])
>>> R
array([[ 5, 10],
[-1,  2]])