Creating, viewing and sampling a Bayesian network

This demonstration shows how to create, view and sample the example Grass Bayesian network. The terms 'variable' and 'node' are used interchangeably.

Creating the graph

The first component of a Bayesian network is its directed acyclic graph (DAG). We create the graph of the Grass network as a sparse matrix. A nonzero element (i, j) means that node i is a parent of node j.

node = struct('Cloudy', 1, ...
              'Sprinkler', 2, ...
              'Rain', 3, ...
              'WetGrass', 4);

G = sparse(4, 4);
G(node.Cloudy, node.Sprinkler) = 1;
G(node.Cloudy, node.Rain) = 1;
G([node.Rain node.Sprinkler], node.WetGrass) = 1;
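
The adjacency-matrix convention above can be checked with a few base MATLAB calls. The following is a minimal sketch, not part of the toolbox: it rebuilds the same graph so it runs on its own, and the variable names parentsOfWetGrass, childrenOfCloudy, roots and isAcyclic are introduced here for illustration only.

```matlab
% Rebuild the Grass graph so this snippet runs on its own.
node = struct('Cloudy', 1, 'Sprinkler', 2, 'Rain', 3, 'WetGrass', 4);
G = sparse(4, 4);
G(node.Cloudy, [node.Sprinkler node.Rain]) = 1;
G([node.Sprinkler node.Rain], node.WetGrass) = 1;

% Parents of a node are the nonzero rows of its column.
parentsOfWetGrass = find(G(:, node.WetGrass))';   % Sprinkler and Rain

% Children of a node are the nonzero columns of its row.
childrenOfCloudy = find(G(node.Cloudy, :));       % Sprinkler and Rain

% Root nodes have no parents, i.e. an all-zero column.
roots = find(~any(G, 1));                         % Cloudy only

% A graph with nonnegative edge weights on n nodes is acyclic exactly
% when the n-th power of its adjacency matrix has no nonzero entries
% (entry (i, j) of G^n is nonzero only if a walk of length n exists).
n = size(G, 1);
isAcyclic = nnz(G^n) == 0;
```

The nilpotency test is convenient for a small graph like this one; for large graphs a topological sort would likely be the cheaper check.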

Creating the number of levels vector

An important aspect of categorical variables is the number of levels (values) they can take. We create a vector holding the number of levels of each network node. Since all nodes of the Grass network are binary, every entry is 2.

% create the number of levels vector
numLevels = [2 2 2 2];

Creating the conditional probability tables

The second component of a Bayesian network is the set of node parameters. In a categorical Bayesian network, these are conditional probability tables (CPTs), one for each node. We create a cell array of ND-arrays, where each ND-array specifies the CPT of one node. The first dimension of an ND-array corresponds to the node itself and the remaining dimensions correspond to its parents. For example, Param{4}(2, 2, 1) is the probability of the 4th node taking its 2nd level, given that its 1st parent takes its 2nd level and its 2nd parent takes its 1st level. The sums along the 1st dimension must all be 1. The size of each dimension equals the number of levels of the corresponding node. The exception is root nodes, whose CPT is a column vector: its second dimension always has size 1 and does not correspond to a node (remember that in MATLAB an array has at least 2 dimensions). Since all nodes of the Grass network are binary, all CPT dimensions have size 2, except the second dimension of the CPT of the root node Cloudy.

% create the CPTs
Param{node.Cloudy} = [0.5; 0.5];
Param{node.Sprinkler} = reshape([0.5 0.5 0.9 0.1], [2 2]);
Param{node.Rain} = reshape([0.8 0.2 0.2 0.8], [2 2]);
%Param{node.WetGrass} = reshape([0.01 0.99 1.0 0.0 0.1 0.9 0.1 0.9], [2 2 2]);
Param{node.WetGrass} = reshape([1.0 0.0 0.9 0.1  0.1 0.9 0.01 0.99], [2 2 2]);

% demonstrate for each node that the sums along the first dimension of its
% CPT are all 1
reshape(sum(Param{node.Cloudy}, 1), 1, [])
reshape(sum(Param{node.Sprinkler}, 1), 1, [])
reshape(sum(Param{node.Rain}, 1), 1, [])
reshape(sum(Param{node.WetGrass}, 1), 1, [])
ans =

     1


ans =

     1     1


ans =

     1     1


ans =

     1     1     1     1
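
To make the CPT indexing concrete, here is a minimal sketch that reads individual entries and multiplies them into one joint probability using the chain rule of the network. It assumes that level 1 means 'false' and level 2 means 'true' (the labels assigned later in this demonstration) and that the parent dimensions of the WetGrass CPT follow node index order, Sprinkler first and Rain second. The names F, T, pC, pS, pR, pW and joint are illustrative only.

```matlab
% Same CPTs as above, indexed 1=Cloudy, 2=Sprinkler, 3=Rain, 4=WetGrass.
Param = {[0.5; 0.5], ...
         reshape([0.5 0.5 0.9 0.1], [2 2]), ...
         reshape([0.8 0.2 0.2 0.8], [2 2]), ...
         reshape([1.0 0.0 0.9 0.1 0.1 0.9 0.01 0.99], [2 2 2])};

F = 1;  % assumed level index of 'false'
T = 2;  % assumed level index of 'true'

% First subscript: level of the node; remaining subscripts: parent levels.
pC = Param{1}(T);        % P(Cloudy = true)
pS = Param{2}(F, T);     % P(Sprinkler = false | Cloudy = true)
pR = Param{3}(T, T);     % P(Rain = true | Cloudy = true)
pW = Param{4}(T, F, T);  % P(WetGrass = true | Sprinkler = false, Rain = true)

% Chain rule: the joint probability is the product of the CPT entries.
joint = pC * pS * pR * pW;   % 0.5 * 0.9 * 0.8 * 0.9 = 0.324
```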

Creating annotations for the Bayesian network

Next, we create a cell array of strings containing the names of our variables, and a cell array of cell arrays of strings containing the labels of the levels of each variable. The variable names will appear when we visualize the network using the view method. Both the variable names and the level labels will also be used in the datasets sampled from the network using the randomds method. Consult the MATLAB® Statistics Toolbox (TM) documentation to learn more about levels and labels of categorical arrays and datasets. Finally, we create a string with a description of the network.

% create variable names
VarNames = fieldnames(node)';

% create level labels
Labels = {{'false', 'true'}, ...
          {'false', 'true'}, ...
          {'false', 'true'}, ...
          {'false', 'true'}};

% create description
description = ['Example from Russell and Norvig, "Artificial ' ...
               'Intelligence: a Modern Approach", Prentice Hall, ' ...
               '1995, page 454.'];
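
As a quick sanity check on the annotations, the sketch below confirms that each variable has one label per level and shows how a numeric level maps to its label. It uses base MATLAB only; the vector levels and the cell array assignment are hypothetical names for an example configuration, not something the toolbox produces.

```matlab
% Annotations as defined above, repeated so the snippet runs on its own.
numLevels = [2 2 2 2];
VarNames = {'Cloudy', 'Sprinkler', 'Rain', 'WetGrass'};
Labels = repmat({{'false', 'true'}}, 1, 4);

% Each variable needs exactly as many labels as it has levels.
assert(isequal(cellfun(@numel, Labels), numLevels));

% Hypothetical configuration, one level index per variable.
levels = [2 1 2 2];

% Translate each level index into 'Name = label' text.
assignment = cell(1, numel(VarNames));
for i = 1:numel(VarNames)
    assignment{i} = sprintf('%s = %s', VarNames{i}, Labels{i}{levels(i)});
end
```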

Creating the Bayesian network

We create an object bnet of the org.mensxmachina.pgm.CategoricalBayesianNetwork class. The constructor takes the graph G, the number of levels numLevels and the node parameters Param as input arguments, followed by the variable names, level labels and description as parameter name / value pairs. We also provide a 'ClassNames' parameter declaring that all of our variables are nominal. Consequently, datasets sampled using randomds will consist only of nominal variables.

% construct the network object
bnet = org.mensxmachina.pgm.CategoricalBayesianNetwork(...
    G, numLevels, Param, 'VarNames', VarNames, 'Labels', Labels, ...
    'ClassNames', 'nominal', 'Description', description);

Visualizing the Bayesian network graph

We visualize the Bayesian network graph using the view method. Notice that the nodes are labelled with the variable names we supplied.

% view the network graph
bnet.view();

Sampling the Bayesian network

We create a dataset of 10 samples from the network, using the randomds method. Notice that the variable names and level labels are the ones we supplied.

% sample the network
a = bnet.randomds(10)
Sampling...

Creating column #1, 'Cloudy'...

Creating column #3, 'Rain'...

Creating column #2, 'Sprinkler'...

Creating column #4, 'WetGrass'...

a = 

    Cloudy    Sprinkler    Rain     WetGrass
    true      false        true     true    
    false     false        true     true    
    false     true         false    false   
    true      false        true     true    
    false     false        false    false   
    false     true         false    false   
    true      false        true     true    
    true      false        true     true    
    true      false        true     true    
    true      false        false    false
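
The log above creates the columns with parents before children (Cloudy, then Rain and Sprinkler, then WetGrass). One common way to sample a Bayesian network in such an order is ancestral (forward) sampling. The sketch below implements it in base MATLAB as an illustration only: the internals of randomds are not shown in this demonstration, so this should not be read as its implementation. It assumes that the parent dimensions of each CPT follow ascending node index; order, nSamples and X are names introduced here.

```matlab
% Graph and CPTs as above, indexed 1=Cloudy, 2=Sprinkler, 3=Rain, 4=WetGrass.
G = sparse(4, 4);
G(1, [2 3]) = 1;
G([2 3], 4) = 1;
Param = {[0.5; 0.5], ...
         reshape([0.5 0.5 0.9 0.1], [2 2]), ...
         reshape([0.8 0.2 0.2 0.8], [2 2]), ...
         reshape([1.0 0.0 0.9 0.1 0.1 0.9 0.01 0.99], [2 2 2])};

order = [1 2 3 4];        % a topological order: parents before children
nSamples = 10;
X = zeros(nSamples, 4);   % X(s, i) is the sampled level of node i

for s = 1:nSamples
    for i = order
        % Parents of node i in ascending index order (assumed CPT order).
        pa = find(G(:, i))';
        % Subscripts: all levels of node i, then the sampled parent levels.
        idx = [{':'} num2cell(X(s, pa))];
        p = Param{i}(idx{:});
        % Inverse-CDF draw: count the cumulative thresholds rand exceeds.
        X(s, i) = 1 + sum(rand > cumsum(p(1:end-1)));
    end
end
```

Each row of X holds level indices; indexing the level labels with them (for example Labels{i}(X(:, i))) would turn a column into 'false'/'true' strings like those in the dataset above.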