

To address your question, my understanding is that MCMC (Markov chain Monte Carlo) algorithms are used in machine learning primarily to compute approximate probability estimates from probability distributions, mainly those represented by Bayesian networks. Weka's Bayesian network classifiers (which are not covered in this MOOC) assume complete data (i.e. no missing values), which means that exact inference (i.e. computation of probability estimates) is trivial and very fast. If missing values do occur in the data, they are imputed using ReplaceMissingValues, which is a hack but appears to work reasonably well in practice.

Conjugate gradient descent is an optimization method. This MOOC uses Weka 3.6, but recent versions of Weka 3.7 do have an implementation of a particular variant of non-linear conjugate gradient descent (in ).
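To make the MCMC point concrete, here is a minimal sketch (my own toy example, not course code and not Weka's implementation) of Gibbs sampling, one of the simplest MCMC algorithms, applied to the classic "sprinkler" Bayesian network. The network structure and probability tables below are assumptions chosen purely for illustration. With evidence Sprinkler=1 and WetGrass=1, the sampler estimates P(Rain=1 | evidence), and because this network is tiny we can check the estimate against exact inference by enumeration:

```python
import random

# Toy "sprinkler" network, all variables binary (illustrative values only):
#   Cloudy -> Sprinkler, Cloudy -> Rain, (Sprinkler, Rain) -> WetGrass
P_C = 0.5                                      # P(Cloudy = 1)
P_S_given_C = {1: 0.1, 0: 0.5}                 # P(Sprinkler = 1 | Cloudy)
P_R_given_C = {1: 0.8, 0: 0.2}                 # P(Rain = 1 | Cloudy)
P_W_given_SR = {(1, 1): 0.99, (1, 0): 0.9,     # P(WetGrass = 1 | S, R)
                (0, 1): 0.9,  (0, 0): 0.0}

# Evidence is fixed throughout: Sprinkler = 1, WetGrass = 1.

def exact_p_rain():
    """Exact P(Rain=1 | S=1, W=1) by enumerating the free variables C, R."""
    num = den = 0.0
    for c in (0, 1):
        pc = P_C if c else 1 - P_C
        for r in (0, 1):
            pr = P_R_given_C[c] if r else 1 - P_R_given_C[c]
            joint = pc * P_S_given_C[c] * pr * P_W_given_SR[(1, r)]
            den += joint
            if r:
                num += joint
    return num / den

def gibbs_p_rain(n_samples=20000, burn_in=1000, seed=0):
    """Approximate the same query with Gibbs sampling (an MCMC method)."""
    rng = random.Random(seed)
    c, r = 1, 1                    # arbitrary initial state for C and R
    rain_count = 0
    for i in range(burn_in + n_samples):
        # Resample Cloudy from its full conditional (depends on S=1 and R).
        p1 = P_C * P_S_given_C[1] * (P_R_given_C[1] if r else 1 - P_R_given_C[1])
        p0 = (1 - P_C) * P_S_given_C[0] * (P_R_given_C[0] if r else 1 - P_R_given_C[0])
        c = 1 if rng.random() < p1 / (p1 + p0) else 0
        # Resample Rain from its full conditional (depends on C and W=1).
        p1 = P_R_given_C[c] * P_W_given_SR[(1, 1)]
        p0 = (1 - P_R_given_C[c]) * P_W_given_SR[(1, 0)]
        r = 1 if rng.random() < p1 / (p1 + p0) else 0
        if i >= burn_in:
            rain_count += r
    return rain_count / n_samples

print("exact:", exact_p_rain())    # close to 0.32 for these tables
print("gibbs:", gibbs_p_rain())    # should land near the exact value
```

The key point the answer makes is visible here: with the evidence nodes fixed, each resampling step only needs local conditional probabilities, so the chain's long-run frequencies approximate the posterior even when exact enumeration would be intractable for a large network. In this two-free-variable toy the exact answer is cheap, which is what makes the comparison possible.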
