Markov Decision Process (MDP) Toolbox: util module
The util module provides functions to check that an MDP is validly defined, as well as helper functions for working with MDPs while they are being solved.
- Check that an MDP is properly defined
- Check that a matrix is square and stochastic
- Calculate the span of an array
- Check if a matrix has only non-negative elements
- Check if a matrix is square
- Check if a matrix is row stochastic
- mdptoolbox.util.check(P, R)
Check if P and R define a valid Markov Decision Process (MDP).
Let S = number of states, A = number of actions.
- P (array) – The transition matrices. It can be a three dimensional array with a shape of (A, S, S). It can also be a one dimensional array with a shape of (A, ), where each element contains a matrix of shape (S, S), which may be sparse.
- R (array) – The reward matrix. It can be a three dimensional array with a shape of (A, S, S). It can also be a one dimensional array with a shape of (A, ), where each element contains a matrix with a shape of (S, S), which may be sparse. It can also be an array with a shape of (S, A), which may be sparse.
Raises an error if P and R do not define an MDP.
>>> import mdptoolbox, mdptoolbox.example
>>> P_valid, R_valid = mdptoolbox.example.rand(100, 5)
>>> mdptoolbox.util.check(P_valid, R_valid) # Nothing should happen
>>>
>>> import numpy as np
>>> P_invalid = np.random.rand(5, 100, 100)
>>> mdptoolbox.util.check(P_invalid, R_valid) # Raises an exception
Traceback (most recent call last):
...
StochasticError:...
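The shape conventions for P and R can be illustrated with a small sketch. This is not the toolbox's own code: `mdp_dimensions` is a hypothetical helper, plain Python lists stand in for arrays, and only two of the reward layouts described above are covered (the (S, A) table and the one-matrix-per-action form).

```python
def mdp_dimensions(P, R):
    """Return (A, S) when P and R agree on shape, else raise ValueError.

    Hypothetical helper for illustration only. P is a list of A matrices
    of size S x S; R is either an S x A table or a list of A matrices of
    size S x S. Matrices are plain lists of rows.
    """
    A, S = len(P), len(P[0])

    def is_s_by_s(matrix):
        return len(matrix) == S and all(len(row) == S for row in matrix)

    if not all(is_s_by_s(matrix) for matrix in P):
        raise ValueError("every transition matrix must be S x S")

    # Tell the two reward layouts apart by looking at one entry:
    # a number means the (S, A) table, a list means per-action matrices.
    if isinstance(R[0][0], (int, float)):
        ok = len(R) == S and all(len(row) == A for row in R)
    else:
        ok = len(R) == A and all(is_s_by_s(matrix) for matrix in R)
    if not ok:
        raise ValueError("R does not match the shape of P")
    return A, S


# Two actions, three states.
P = [
    [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.2, 0.3, 0.5]],
    [[1.0, 0.0, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]],
]
R_table = [[0.0, 1.0], [2.0, 0.0], [-1.0, 5.0]]          # shape (S, A)
R_per_action = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]  # A matrices, S x S

dims_table = mdp_dimensions(P, R_table)
dims_per_action = mdp_dimensions(P, R_per_action)
```

Both calls report the same pair (A, S), since each reward layout describes the same number of actions and states as P.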
Check if matrix is square and row-stochastic.
To pass the check the following conditions must be met:
- The matrix should be square, so the number of columns equals the number of rows.
- The matrix should be row-stochastic, so each row should sum to one.
- Each value in the matrix must be non-negative.
If the check does not pass then a mdptoolbox.util.Invalid
Parameters: matrix (numpy.ndarray, scipy.sparse.*_matrix) – A two dimensional array (matrix).
Returns None if no error has been detected, else it raises an error.
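The three conditions can be sketched in plain Python as follows. This is an illustrative stand-in rather than the toolbox's implementation: the function name `check_square_stochastic`, the use of `ValueError`, and the tolerance `tol` are all assumptions made for the example.

```python
import math


def check_square_stochastic(matrix, tol=1e-10):
    """Return None if matrix passes all three conditions, else raise ValueError.

    Hypothetical sketch: matrix is a list of rows, and tol is an assumed
    tolerance for comparing floating-point row sums with one.
    """
    n = len(matrix)
    # Condition 1: as many columns as rows.
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix is not square")
    # Condition 2: every row sums to one (within tol).
    for row in matrix:
        if abs(math.fsum(row) - 1.0) > tol:
            raise ValueError("a row does not sum to one")
    # Condition 3: no negative entries.
    if any(value < 0 for row in matrix for value in row):
        raise ValueError("matrix has a negative element")
    return None


result = check_square_stochastic([[0.5, 0.5], [0.0, 1.0]])
```

Note that a matrix such as `[[1.5, -0.5], [0.0, 1.0]]` has rows that sum to one but still fails, which is why the non-negativity condition is checked separately.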
Return the span of array, the difference between its largest and smallest elements.
span(array) = max_s array(s) - min_s array(s)
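As a worked example of the formula, the sketch below computes the span of a short list. The helper name `span` and the value-function numbers are made up for illustration. Applying the span to the change between two successive value estimates, as a measure of how much a solver is still moving, is one common use and is an assumption here rather than something the text states.

```python
def span(values):
    """Largest element minus smallest element (hypothetical helper)."""
    return max(values) - min(values)


simple = span([3, 7, 1])  # 7 - 1

# Change in a made-up value function between two sweeps of a solver.
V_prev = [0.0, 1.0, 4.0]
V_next = [0.5, 1.25, 4.125]
variation = [new - old for new, old in zip(V_next, V_prev)]
variation_span = span(variation)  # 0.5 - 0.125
```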
Check that matrix is non-negative.
Returns: is_stochastic – True if matrix is non-negative, False otherwise.
Return type: bool
Check that matrix is square.
Returns: is_square – True if matrix is square, False otherwise.
Return type: bool
Check that matrix is row stochastic.
Returns: is_stochastic – True if matrix is row stochastic, False otherwise.
Return type: bool
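A practical detail when testing for row stochasticity is floating-point rounding: a row of probabilities that should sum to one may miss it by a tiny amount. The sketch below shows the effect using a hypothetical `is_stochastic` helper with an assumed tolerance `tol`. It is not the toolbox's implementation, and the tolerance the toolbox actually uses is not stated here.

```python
def is_stochastic(matrix, tol=1e-10):
    """True if every row sums to one within tol (hypothetical helper)."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


# Ten states, uniform transition probabilities of 0.1 in every row.
uniform = [[0.1] * 10 for _ in range(10)]

# Exact equality is too strict: adding 0.1 ten times does not give exactly 1.0.
exact = all(sum(row) == 1.0 for row in uniform)

# Comparing within a small tolerance accepts the matrix.
tolerant = is_stochastic(uniform)
```

Here `exact` is False while `tolerant` is True, even though the matrix is row stochastic in the mathematical sense.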