Creating, viewing and sampling a Bayesian network
This demo illustrates the creation, viewing and sampling of the example sprinkler Bayesian network from Artificial Intelligence: A Modern Approach (1st Edition). The terms variable and node are used interchangeably.
Creating the structure
The first component of a Bayesian network is its structure, a directed acyclic graph (DAG). In MATLAB®, graphs are represented as sparse matrices. A nonzero element (i, j) of the matrix denotes an edge from node i to node j. The structure of the sprinkler network has four nodes and the edges 1->2, 1->3, 2->4 and 3->4.
% create structure
structure = sparse([1 1 2 3], [2 3 4 4], 1, 4, 4);
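The sparse-matrix encoding can be inspected with base MATLAB functions alone. The sketch below rebuilds the same matrix, lists its edges and checks a simple sufficient condition for acyclicity; the variable names A, edges, parentsOfNode4 and isStrictlyUpper are illustrative and not part of the toolbox.

```matlab
% Rebuild the structure from the demo so this snippet runs on its own.
structure = sparse([1 1 2 3], [2 3 4 4], 1, 4, 4);

% Dense view: row i, column j is nonzero when there is an edge i -> j.
A = full(structure);
disp(A);

% List the edges as (parent, child) pairs.
[parent, child] = find(structure);
edges = sortrows([parent, child]);
disp(edges);

% The parents of node 4 are the rows with a nonzero entry in column 4.
parentsOfNode4 = find(structure(:, 4))';

% A strictly upper-triangular adjacency matrix only has edges i -> j with
% i < j, so 1, 2, ..., n is a topological order and no directed cycle exists.
isStrictlyUpper = isequal(A, triu(A, 1));
```

The triangularity test is sufficient but not necessary: a DAG whose nodes are numbered in a different order would fail it while still being acyclic.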
Creating the conditional probability distributions
The second component of a Bayesian network is the conditional probability distributions (CPDs) of the nodes given values of their parents. In general, a CPD is the probability distribution of a set of response variables given values of a set of explanatory variables. The CPDs in a Bayesian network are CPDs of a single response variable, the explanatory variables being the parents of that variable.
The variables of the sprinkler network are cloudy, sprinkler, rain, and wetGrass. All of them are binary variables taking values false and true. We represent these values by Statistics Toolbox™ categorical arrays with levels false and true.
We create a tabular CPD for each variable. Tabular CPDs are encoded as tables. For each tabular CPD, we supply the variable names, the variable values, the CPD-variable types and the values of the CPD. A CPD-variable type is either Explanatory or Response. The values of a tabular CPD form an N-D array with one dimension per variable. Each element of the N-D array is the probability of the corresponding combination of variable values.
import org.mensxmachina.stats.cpd.cpdvartype;
import org.mensxmachina.stats.cpd.tabular.tabcpd;

% create variable values -- same for all variables
varValues = nominal([1; 2], {'false', 'true'});

% create CPDs

E = cpdvartype.Explanatory;
R = cpdvartype.Response;

cloudyCpd = tabcpd(...
    {'cloudy'}, ...
    {varValues}, ...
    R, ...
    reshape([0.5 0.5], 2, 1))

sprinklerCpd = tabcpd(...
    {'cloudy', 'sprinkler'}, ...
    {varValues, varValues}, ...
    [E R], ...
    reshape([0.5 0.5; 0.9 0.1], 2, 2))

rainCpd = tabcpd(...
    {'cloudy', 'rain'}, ...
    {varValues, varValues}, ...
    [E R], ...
    reshape([0.8 0.2; 0.2 0.8], 2, 2))

wetGrassCpd = tabcpd(...
    {'sprinkler', 'rain', 'wetGrass'}, ...
    {varValues, varValues, varValues}, ...
    [E E R], ...
    reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], 2, 2, 2))

% put them all together
cpd = {cloudyCpd, sprinklerCpd, rainCpd, wetGrassCpd};
cloudyCpd =

    cloudy = false    cloudy = true
    0.5               0.5

sprinklerCpd =

                      sprinkler = false    sprinkler = true
    cloudy = false    0.5                  0.5
    cloudy = true     0.9                  0.1

rainCpd =

                      rain = false    rain = true
    cloudy = false    0.8             0.2
    cloudy = true     0.2             0.8

wetGrassCpd =

                                       wetGrass = false    wetGrass = true
    sprinkler = false, rain = false    1                   0
    sprinkler = true, rain = false     0.1                 0.9
    sprinkler = false, rain = true     0.1                 0.9
    sprinkler = true, rain = true      0.01                0.99
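The link between the flat value vector passed to reshape and the displayed table follows from MATLAB's column-major ordering. The sketch below uses only base MATLAB and the wetGrass values from the demo; it assumes, as the displayed table suggests, that the array dimensions follow the order of the variable names. The names P, pWetGivenSprinklerOnly, rowSums and isNormalized are illustrative.

```matlab
% Values of the wetGrass CPD, copied from the demo.
% Assumed dimension order: {'sprinkler', 'rain', 'wetGrass'};
% index 1 means false and index 2 means true in every dimension.
P = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], 2, 2, 2);

% reshape fills the first dimension fastest (column-major order), so
% P(2, 1, 2) is P(wetGrass = true | sprinkler = true, rain = false).
pWetGivenSprinklerOnly = P(2, 1, 2);

% Summing over the response dimension (the third) must give 1 for every
% combination of parent values.
rowSums = sum(P, 3);
isNormalized = all(abs(rowSums(:) - 1) < 1e-12);
```

Checking that the array sums to 1 along the response dimension is a quick way to catch values entered in the wrong order.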
Creating the network
We create the sprinkler network by supplying its structure and its CPDs.
import org.mensxmachina.pgm.bn.bayesnet;

% create Bayesian network
BayesNet = bayesnet(structure, cpd);
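Together, the structure and the CPDs define a joint distribution: by the chain rule for Bayesian networks, the probability of a full assignment is the product of each node's CPD entry given its parents. The sketch below computes that product with plain arrays rather than the toolbox objects; the array names and the joint table are illustrative assumptions, and only the numbers come from the demo.

```matlab
% CPD values copied from the demo; index 1 = false, index 2 = true.
pCloudy    = [0.5; 0.5];                 % P(cloudy)
pSprinkler = [0.5 0.5; 0.9 0.1];         % rows: cloudy, columns: sprinkler
pRain      = [0.8 0.2; 0.2 0.8];         % rows: cloudy, columns: rain
pWetGrass  = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], 2, 2, 2); % (sprinkler, rain, wetGrass)

% Build the full joint table by multiplying the four CPDs.
joint = zeros(2, 2, 2, 2);               % (cloudy, sprinkler, rain, wetGrass)
for c = 1:2
    for s = 1:2
        for r = 1:2
            for w = 1:2
                joint(c, s, r, w) = pCloudy(c) * pSprinkler(c, s) ...
                    * pRain(c, r) * pWetGrass(s, r, w);
            end
        end
    end
end

% Probability of cloudy = true, sprinkler = false, rain = true,
% wetGrass = true, that is 0.5 * 0.9 * 0.8 * 0.9.
pExample = joint(2, 1, 2, 2);

% The joint table must sum to 1.
total = sum(joint(:));
```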
Viewing the structure
Bayesian networks are displayed by Bayesian network viewers. Here we use a biograph-based viewer, which relies on a Bioinformatics Toolbox™ biograph, to view the structure of the sprinkler network.
import org.mensxmachina.pgm.bn.viewers.biograph.biographbayesnetviewer;

% create Bayesian network Viewer
Viewer = biographbayesnetviewer(BayesNet);

% view the Bayesian network structure
Viewer.viewbayesnetstructure();
Sampling the network
A Bayesian network is itself a CPD and can be sampled. We sample a CPD by supplying a Statistics Toolbox™ dataset array containing values of the explanatory variables of the CPD. The sample is a dataset containing values for the response variables of the CPD.
We draw a random sample of 10 observations from the sprinkler network by supplying an empty 10-by-0 dataset array: the 10 rows set the number of observations, and there are no columns because a Bayesian network has no explanatory variables.
% get a random sample from the Bayesian network
D = random(BayesNet, dataset.empty(10, 0))
Sampling...
Creating column #1, 'cloudy' (1 of 4, 25.00%)...
Creating column #3, 'rain' (2 of 4, 50.00%)...
Creating column #2, 'sprinkler' (3 of 4, 75.00%)...
Creating column #4, 'wetGrass' (4 of 4, 100.00%)...

D = 

    cloudy    sprinkler    rain     wetGrass
    true      false        true     true
    false     false        false    false
    false     false        false    false
    false     false        false    false
    true      false        true     true
    false     true         false    true
    false     true         true     true
    false     false        false    false
    false     false        false    false
    false     false        false    false
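The progress log creates the columns in the order cloudy, rain, sprinkler, wetGrass, so every node is drawn after its parents. That is the pattern of forward (ancestral) sampling, which the sketch below reproduces with plain arrays and rand. It is an illustration under that assumption, not the toolbox's implementation, and all names in it are hypothetical.

```matlab
% Forward (ancestral) sampling sketch; index 1 = false, index 2 = true.
pCloudy    = [0.5; 0.5];                 % P(cloudy)
pSprinkler = [0.5 0.5; 0.9 0.1];         % (cloudy, sprinkler)
pRain      = [0.8 0.2; 0.2 0.8];         % (cloudy, rain)
pWetGrass  = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], 2, 2, 2); % (sprinkler, rain, wetGrass)

n = 10;
sample = zeros(n, 4);                    % columns: cloudy, sprinkler, rain, wetGrass
for k = 1:n
    % Visit the nodes in a topological order so parents are drawn first.
    % Each draw yields 2 (true) with the conditional probability of true.
    c = 1 + (rand < pCloudy(2));
    r = 1 + (rand < pRain(c, 2));
    s = 1 + (rand < pSprinkler(c, 2));
    w = 1 + (rand < pWetGrass(s, r, 2));
    sample(k, :) = [c s r w];
end

% Map the indices back to labels for display.
labels = {'false', 'true'};
disp(labels(sample));
```

Because P(wetGrass = true | sprinkler = false, rain = false) is 0, any row with both parents false also has wetGrass false, as in the sample shown above.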