Introduction

A magnetic resonance imaging (MRI) scan of the brain presents an important application of image segmentation techniques. For example, an accurately segmented scan can assist in the diagnosis of brain disease and in the preparation of surgery. The three main tissue classes that are normally segmented in a scan are white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Accurate segmentation requires high contrast between these tissue types. Improved tissue contrast is generated by varying the parameters of the pulse sequences that cause the hydrogen atoms in the body to emit radio-frequency signals when subjected to a magnetic field. The contrast achieved by MRI is an improvement over computed tomography. However, results still suffer from the following obstacles to segmentation:

  1. Electronic noise – disturbance in the electrical signal
  2. A bias field – an unwanted artifact producing inhomogeneous intensities within tissue regions
  3. The partial volume effect – overlap of tissue classes within a voxel

The paper Segmentation of Brain MR Images Through a Hidden Markov Random Field Model and the Expectation-Maximization Algorithm by Yongyue Zhang et al. provides justification for a Hidden Markov Random Field (HMRF) model, together with an Expectation-Maximization (EM) method of iterative parameter updating, as a framework that incorporates the spatial dependencies innate to three dimensional scans. It will be demonstrated that this is an improvement over the classical approach using a finite mixture (FM) model.

In this project we adopt this framework to segment an MRI scan obtained from the mritc package. The success of this approach offers a means for automatic brain MR image segmentation.

NIfTI data format

The Neuroimaging Informatics Technology Initiative (NIfTI) format is an improvement on the ANALYZE format, retaining a header/image combination of data but also allowing for storage of auxiliary information inside the file. The readMRI function from the mritc package can be used to obtain a three dimensional array with the appropriate image dimensions.

An image pixel can be understood as a sample of some original image we would like to display on a screen. If we imagine a volumetric display (giving a three dimensional representation of an object), then a voxel extends the concept as a sample from some three dimensional object (as shown on the right). Each voxel also contains a value; in our case this value represents the signal intensity, which ranges from 0 to 250.

Source for figure to the right: https://commons.wikimedia.org/wiki/File:Voxels.svg

# Read in NIfTI data
T1 <- readMRI(system.file("extdata/t1.rawb.gz", package="mritc"),
              c(91, 109, 91), format="rawb.gz")
# Read in the mask for non-brain matter
mask <- readMRI(system.file("extdata/mask.rawb.gz", package="mritc"),
                c(91, 109, 91), format="rawb.gz")

Constructing a Markov Random Field

To make inferences on the data (comprising the entire scan) we consider a three dimensional image graph denoted \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\), where \(\mathcal{V}\) is the set of nodes corresponding to the \(N = 91 \times 109 \times 91\) voxels making up the three dimensional image, and \(\mathcal{E}\) is the set of undirected edges connecting the voxels. As mentioned earlier, the set of hidden classes attributable to each node is \(\mathcal{S} =\{WM, GM, CSF \}\), and these are to be estimated from the signal intensities. Below are density plots of the intensities taken at three different depths along the sagittal plane. The binary mask removes non-brain matter from the data.
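A minimal sketch of how such density plots could be produced with base R is given below; the sagittal slice indices are chosen purely for illustration and are not necessarily those used for the original figure.

# Density of brain-voxel intensities at three illustrative sagittal depths
depths <- c(30, 45, 60)
dens <- lapply(depths, function(d) density(T1[d, , ][mask[d, , ] == 1]))
plot(dens[[1]], xlab = "Signal intensity",
     main = "Intensity densities (sagittal slices)",
     ylim = range(sapply(dens, function(d) d$y)))
lines(dens[[2]], col = "red")
lines(dens[[3]], col = "blue")
legend("topright", legend = paste("slice", depths),
       col = c("black", "red", "blue"), lty = 1)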

Heuristically, we assume the three modes present in the above density plots correspond to our three classes. CSF often attains lower values, GM tends to land somewhere in the middle, and WM most often has the highest intensities. From this information, a mixture of three Gaussian distributions seems like an appropriate model. The limitation of this approach is that it accounts for no spatial information.

To incorporate the greater likelihood that neighbouring voxels share a class label, an edge \((i,j) \in \mathcal{E}\) must be accompanied by a bias that encodes such a tendency between voxels. The Markov model accounts only for the dependencies between voxels sharing a common edge and does not include higher order dependencies among neighbours. That is, for voxel \(i\) we define the set of its neighbours to be \(\mathcal{N_i} = \{j: (i,j) \in \mathcal{E} \}\).

Source for figure to the right: Photogrammetric Engineering & Remote Sensing, Volume 84, Number 6

The sequence of voxel labels, denoted \(\{X_1, \dots, X_{N}\}\), is said to be a Markov Random Field (MRF) on the state space \(\mathcal{S}\) with respect to \(\mathcal{N}\) if and only if it has the following properties:

  1. \(P(\mathbf{x}) > 0\), for all \(\mathbf{x}\)
  2. Markovianity: each \(X_i\) depends on the rest of the field only through its neighbouring sites \(\mathcal{N_i}\), i.e. \(P(x_i \, | \, x_{\mathcal{V} \setminus \{i\}}) = P(x_i \, | \, x_{\mathcal{N}_i})\) for all \(i\)

The Hammersley-Clifford theorem states that an MRF with the properties listed above is equivalently a Gibbs random field – that is, we may express the MRF as a Gibbs distribution: \[ P(\mathbf{x}) = \frac{1}{Z}\exp\bigg\{-\sum_{c\in\mathcal{C}}\Psi_c(\mathbf{x}) \bigg\} \] where \(Z\) is the normalizing constant (or partition function) that makes the distribution sum to unity, and \(E(\mathbf{x}) = \sum_{c \in \mathcal{C}}\Psi_c(\mathbf{x})\) is known as an energy function. Here \(\mathcal{C}\) denotes the collection of cliques in the graph \(\mathcal{G}\). A clique \(c\) is a fully connected subgraph of \(\mathcal{G}\) – meaning every distinct pair of voxel sites \(i \neq j\) in \(c\) are neighbours (\(i \in \mathcal{N_j} \wedge j \in \mathcal{N_i}\)).

The clique potential \(\Psi_c(\mathbf{x})\) should decrease \(E(\mathbf{x})\) when neighbouring voxels share the same class label, and thus increase the joint probability \(P(\mathbf{x})\) of smooth label configurations (see the sketch below).
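A minimal sketch of a Potts-style pairwise potential is shown below (an illustration of the idea, not the mritc implementation): every edge whose endpoints disagree contributes a penalty \(\beta\) to the energy, so smoother labelings receive higher probability.

# Potts-style energy: each disagreeing neighbour pair adds beta to E(x)
potts_energy <- function(labels, edges, beta = 0.5) {
  # edges: two-column matrix, each row an undirected edge (i, j)
  sum(beta * (labels[edges[, 1]] != labels[edges[, 2]]))
}

# Toy example: four voxels in a line with labels 1-1-2-2
edges  <- cbind(1:3, 2:4)
labels <- c(1, 1, 2, 2)
potts_energy(labels, edges)  # 0.5: only the middle edge disagrees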

A Special Case of a Hidden Markov Model

Without the spatial structure imposed by the neighbourhood system \(\{\mathcal{N_i}, i = 1, 2, \dots, N\}\), the sequence of voxel labels could be characterized by a first order Markov chain whose states are hidden and cannot be observed, giving a standard hidden Markov model (HMM). However, to model the spatial properties of a two or three dimensional image, we assume the generating stochastic process is the more general Markov random field described in the previous section.

In addition to the hidden random field \(\{X_i, i = 1, 2, \dots, N \}\) we must model the observable random field \(Y = \{Y_i, i = 1,2, \dots, N \}\) having the finite state space \(\mathcal{D} = \{0,1, \dots, 250\}\) representing the signal intensities attributable to each voxel. We call \(Y\) the emitted random field, where \(Y_i \perp Y_j \, | \, \mathbf{x}\) for \(i \neq j\).

Given a sequence of class labels \(\{x_i \}\) the likelihood of each observed intensity value has the conditional probability distribution \(p(y_i \, | \, x_i, \theta_s)\), with \(\theta_s\) dependent on the states \(s \in \mathcal{S}\). Accounting for the local dependencies within neighbourhoods \(\mathcal{N}_i\) from the MRF, each pair \((X_i, Y_i)\) has the joint probability

\[ P(y_i , x_i \, | \, x_{\mathcal{N}_i}) = P(y_i \, | \, x_i)P(x_i \, | \, x_{\mathcal{N}_i}) \] by applying Bayes’ rule (Zhang et al. 2001, eq (8)). Thus, by marginalizing over the states \(s \in \mathcal{S}\) we obtain \[ \begin{aligned} p(y_i \, | \, x_{\mathcal{N}_i}, \, \theta) &= \sum_{s \in \mathcal{S}}p(y_i, s \, | \, x_{\mathcal{N}_i}, \theta) \\ &= \sum_{s \in \mathcal{S}} f(y_i; \theta_s)p(s \, | \, x_{\mathcal{N}_i}) \end{aligned} \] where \(\theta = \{\theta_s, s \in \mathcal{S} \}\). Zhang et al. (2001) define this to be the hidden Markov random field model. We assume that the intensity values are independent and normally distributed with parameters \(\theta_s = (\mu_s, \sigma_s)^T\) depending on the tissue class. That is,

\[ p(\mathbf{y} \, | \, \mathbf{x}_{\mathcal{N}}, \, \theta) = \prod_{i=1}^N\sum_{s\in \mathcal{S}}\mathcal{N}(y_i \, ; \, \mu_s, \sigma_s)p(s \, |\, x_{\mathcal{N}_i}) \]

It can be seen that if the class labels were assumed to be spatially independent, so that \(p(s \, | \, x_{\mathcal{N}_i})\) reduces to a single global proportion for each class, the equation above reduces to the finite normal mixture model – thus highlighting the key difference between the HMRF model and the FM model. The parameters of an FM model can be learned through a standard EM approach, which we compare against below.
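The reduction is easy to see in code. The sketch below (with made-up parameter values, for illustration only) evaluates the FM density: each intensity is scored by a weighted sum of the three Gaussian components, with one global proportion per class in place of the voxel-specific prior \(p(s \, | \, x_{\mathcal{N}_i})\).

# Finite normal mixture density with global mixing proportions
fm_density <- function(y, prop, mu, sigma) {
  rowSums(sapply(seq_along(prop),
                 function(s) prop[s] * dnorm(y, mu[s], sigma[s])))
}

# Illustrative parameters only (CSF, GM, WM)
fm_density(c(40, 110, 180),
           prop = c(0.2, 0.5, 0.3),
           mu = c(45, 110, 170), sigma = c(15, 20, 15))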

Tissue Classification using Iterated Conditional Modes

Class labels \(\mathbf{x}\) will be estimated via the maximum a-posteriori criterion. That is, \[ \begin{aligned} \hat{\mathbf{x}} &= \text{arg}\max_{\mathbf{x}} \{P(\mathbf{y} \, | \, \mathbf{x})P(\mathbf{x}) \} \\ &= \text{arg}\max_{\mathbf{x}} \bigg\{\prod_{i=1}^N\mathcal{N}(y_i \, ; \, \mu_{x_i}, \sigma_{x_i}) \, \big[e^{-E(\mathbf{x})} \big] \bigg\} \\ &= \text{arg}\min_{\mathbf{x}}\bigg\{\sum_{i=1}^N\bigg[\frac{(y_i-\mu_{x_i})^2}{2\sigma^2_{x_i}} + \log \sigma_{x_i} \bigg] + \sum_{c\in\mathcal{C}}\Psi_c(\mathbf{x})\bigg\} \end{aligned} \] Multiple authors have suggested using the iterated conditional modes (ICM) algorithm, which maximizes in turn the conditional probability of each voxel's label given the current labels of the others. It has the advantage of fast convergence (only a few iterations are typically required), but can become trapped at local maxima (or minima in this case).
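A minimal sketch of one ICM sweep under this criterion is given below, using an edge list as in the Potts example and a constant penalty \(\beta\) per disagreeing neighbour; the actual mritc routines work on the block structure described later and are implemented in C.

# One ICM sweep: set each label to its conditional mode given the
# current labels of its neighbours
icm_sweep <- function(y, labels, edges, mu, sigma, beta = 0.5) {
  nei <- lapply(seq_along(y), function(i)
    c(edges[edges[, 1] == i, 2], edges[edges[, 2] == i, 1]))
  for (i in seq_along(y)) {
    cost <- sapply(seq_along(mu), function(s)
      (y[i] - mu[s])^2 / (2 * sigma[s]^2) + log(sigma[s]) +
        beta * sum(labels[nei[[i]]] != s))
    labels[i] <- which.min(cost)
  }
  labels
}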

Incorporating Partial Volume Effect

When considering the MRF model \(p(\mathbf{x})\) we may decide to include the realistic assumption that not all voxels are homogeneous. It is possible that multiple tissue classes occupy the same voxel; this issue is known as the partial volume effect. Suppose we introduce extra classes CSF/GM with \(\mu_{CSF/GM}=\frac{1}{2}(\mu_{CSF} + \mu_{GM})\), \(\sigma^2_{CSF/GM}=\frac{1}{2}(\sigma^2_{CSF} + \sigma^2_{GM})\) and GM/WM with parameters defined similarly. The partial volume HMRF-EM model can be fitted with mritc.pvhmrfem(...) from the mritc package.
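As a sketch, the five-class parameter vectors could be constructed from the pure-tissue estimates (in the order CSF, GM, WM) as below; the commented call assumes that mritc.pvhmrfem accepts the same neighbour/block arguments as mritc.hmrfem used in the experiments, which should be checked against the package documentation.

# Five classes: CSF, CSF/GM, GM, GM/WM, WM
mu5    <- c(mu[1], mean(mu[1:2]), mu[2], mean(mu[2:3]), mu[3])
sigma5 <- sqrt(c(sigma[1]^2, mean(sigma[1:2]^2), sigma[2]^2,
                 mean(sigma[2:3]^2), sigma[3]^2))
# Assumed calling convention (see ?mritc.pvhmrfem):
# tc.pv <- mritc.pvhmrfem(y, mrispatial$neighbors, mrispatial$blocks,
#                         mu = mu5, sigma = sigma5, verbose = FALSE)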

Segmentation with Bias Field Correction

An MRI machine generates a magnetic field in order to cause hydrogen atoms within the brain to emit radio-frequency signals. These desired frequencies can be corrupted by another low-frequency artifact known as a bias field. It is generally considered to be noise caused by the equipment – thus scans produced by older machines are more suspect. An automatic correction is available through mritc.hmrf with bias = TRUE.

  1. Expectation Step
  2. Maximization Step
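In general terms, each iteration alternates between these two steps; the bias-corrected variant follows the same structure, with the bias field folded into the parameters being updated: \[ \begin{aligned} \textbf{E-step:} \quad & Q(\theta \, | \, \theta^{(t)}) = \mathbb{E}\big[\log P(\mathbf{x}, \mathbf{y} \, | \, \theta) \, \big| \, \mathbf{y}, \theta^{(t)}\big] \\ \textbf{M-step:} \quad & \theta^{(t+1)} = \text{arg}\max_{\theta} \, Q(\theta \, | \, \theta^{(t)}) \end{aligned} \]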

Source for figure on the right: A New Multistage Medical Segmentation Method Based on Superpixel and Fuzzy Clustering

Experiments

In the following implementations we adopt the pure voxel assumption. That is, we do not consider the partial volume effect. This is because we only have data containing the true labels for three classes, and therefore cannot produce a confusion table assuming the extra two combination classes discussed earlier.

Initializing Parameters with Otsu’s Method

Neither the EM algorithm (for model parameters) nor the ICM algorithm (for label estimation) is guaranteed to converge to a global optimum. Thus, the choice of initial conditions can determine the success of the segmentation.

Otsu’s method of global thresholding can be used to determine estimates of the means and variances for each class from the grayscale histogram. For example, beginning with CSF, the algorithm assumes that two classes exist (CSF and everything else). There are 250 bins of the normalized histogram \(h(x)\), which is used to estimate the true density \(p(x)\). To find an optimal threshold \(T\), we minimize the weighted intra-class variance, i.e. \[ \text{arg}\min_T \, w_{0,T}\sigma_{0,T}^2 + w_{1,T}\sigma_{1,T}^2 \] where \(w_{0,T}\), \(w_{1,T}\) are the class probabilities, and \(\sigma_{0,T}^2\), \(\sigma_{1,T}^2\) are the class variances induced by the threshold \(T\). Once the optimal threshold is found, we have an estimate of the mean and variance of the voxel intensities corresponding to CSF. The process is repeated for the other two tissue types.
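For intuition, a naive two-class sketch of this criterion is given below (initOtsu in mritc generalizes the search to multiple thresholds and is what we actually use in the next code block).

# Naive Otsu: pick the threshold minimizing the weighted within-class variance
otsu2 <- function(y) {
  candidates <- sort(unique(y))[-1]
  wcv <- sapply(candidates, function(T) {
    lo <- y[y < T]; hi <- y[y >= T]
    if (length(lo) < 2 || length(hi) < 2) return(Inf)
    (length(lo) * var(lo) + length(hi) * var(hi)) / length(y)
  })
  candidates[which.min(wcv)]
}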

y <- T1[mask == 1]          # intensities of brain voxels only
initial <- initOtsu(y, 2)   # two thresholds -> three tissue classes
prop <- initial$prop        # initial mixing proportions
mu <- initial$mu            # initial class means
sigma <- initial$sigma      # initial class standard deviations

The estimates for initial parameters can be fit to the distribution of signal intensities as shown above. By inspection they appear to be quite reasonable.
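A sketch of how such an overlay might be produced from the initial estimates is shown below.

# Overlay the three initial Gaussian components on the intensity histogram
hist(y, breaks = 100, freq = FALSE, xlab = "Signal intensity",
     main = "Otsu initialization vs. observed intensities")
xs <- seq(min(y), max(y), length.out = 500)
for (s in 1:3)
  lines(xs, prop[s] * dnorm(xs, mu[s], sigma[s]), col = s + 1, lwd = 2)
legend("topright", legend = c("CSF", "GM", "WM"), col = 2:4, lwd = 2)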

Model Fitting and Evaluation

In this section we compare the two proposed models. The FM model, which again does not account for any spatial dependencies between voxels, can be fitted using the EM algorithm with the initial values determined in the previous section by Otsu’s method.

tc.em <- mritc.em(y, prop=prop, mu=mu, sigma=sigma, verbose = FALSE)

The HMRF-EM model is then fitted using a neighbourhood structure of six neighbours per voxel. The makeMRIspatial function from the mritc package can be used to obtain various spatial features of the data to be used as inputs to the mritc.hmrfem function.

mrispatial <- makeMRIspatial(mask, nnei = 6, sub = FALSE)

This creates the neighbourhood structure and the blocks, which partition the voxels so that voxels within one block are conditionally independent given the voxels of the other blocks.

tc.hmrfem <- mritc.hmrfem(y, mrispatial$neighbors, mrispatial$blocks, mu=mu, sigma=sigma, verbose = FALSE)

Next, we create the matrix of true proportions of each class per voxel for evaluation using the measureMRI function. As input, the truth matrix has a row for each voxel and columns giving the actual probability that the voxel belongs to each tissue type.

csf <- readMRI(system.file("extdata/csf.rawb.gz", package = "mritc"),
 c(91, 109, 91), format = "rawb.gz")
gm <- readMRI(system.file("extdata/gm.rawb.gz", package = "mritc"),
 c(91, 109, 91), format = "rawb.gz")
wm <- readMRI(system.file("extdata/wm.rawb.gz", package = "mritc"),
 c(91, 109, 91), format = "rawb.gz")

truth <- cbind(csf[mask == 1], gm[mask == 1], wm[mask == 1]) # true proportions
truth <- truth/255 # scale to true class probabilities in [0, 1]

The measureMRI function outputs density plots comparing the actual density against the estimated density for each tissue class, with labels (1) for CSF, (2) for GM, and (3) for WM.

res.hmrf <- measureMRI(T1[mask == 1], truth, tc.hmrfem$prob)
res.em <- measureMRI(T1[mask == 1], truth, tc.em$prob)

The predicted results appear best for the HMRF-EM method across all tissue types. The EM estimates obtained with the FM model are least accurate at distinguishing between WM and GM. This can be explained by the fact that CSF and WM do not often appear adjacent, and so are more easily separable. To improve the distinction between WM and GM, which frequently occur together, it is necessary to incorporate this spatial relationship. This improvement is evidenced in the superior performance of the HMRF-EM results. To further support this, we can look at the misclassification rates (MCR) and a confusion matrix, as sketched below.
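A sketch of how such a table and the MCR can be computed from the fitted posterior probabilities is given below, assigning each voxel to its most probable class (1 = CSF, 2 = GM, 3 = WM).

# Hard class assignments from the posterior probabilities
pred.hmrf <- max.col(tc.hmrfem$prob)
pred.em   <- max.col(tc.em$prob)
actual    <- max.col(truth)       # hard labels from the anatomic model
table(actual, pred.hmrf)          # confusion matrix, HMRF-EM
mean(actual != pred.hmrf)         # MCR, HMRF-EM
mean(actual != pred.em)           # MCR, FM (EM)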

The confusion matrix for the FM model shows decent performance in properly identifying CSF and GM, with a classification accuracy of 90% and 96% respectively. Again, we can see that the limitations of the FM model account for the lower accuracy given for WM classification (79%). The confusion matrix on the left gives the improved results from the HMRF-EM framework. The MCR for the FM model and the HMRF model are 0.112 and 0.087 respectively.

If the cost of misclassification for one tissue type is greater than another, then accuracy is not an ideal performance metric. Depending on the goals of the analysis, a scan may be taken with adjustments to the machine’s parameters that better separate specific signal intensities. It is also worth noting that the MCR compares the voxel classifications to the discrete anatomic model, which introduces rounding error. In this sense, any partial volume effects are not accounted for.

Example Segmentation Results

The following plots are an example of a complete segmentation. In each plane, green pixels mark CSF, red pixels mark GM, and yellow pixels mark WM. To view an interactive visualization that allows one to adjust the depth of each plane, see my Interactive Result Plots made with R Shiny.

Transverse Plane

Sagittal Plane

Coronal Plane
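A sketch of how one such slice might be rendered from the HMRF-EM result is given below; the slice index, and the assumption that the third array dimension corresponds to the transverse plane, are illustrative.

# Rebuild a 3D label map (0 = background) and draw one transverse slice
seg <- array(0, dim = dim(T1))
seg[mask == 1] <- max.col(tc.hmrfem$prob)
image(seg[, , 45], col = c("black", "green", "red", "yellow"),
      breaks = c(-0.5, 0.5, 1.5, 2.5, 3.5), axes = FALSE,
      main = "Transverse slice (illustrative depth)")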

Conclusion

The framework implemented in this project is based on the work of Zhang et al. (2001). The research has demonstrated that the HMRF model is more robust than, and an overall improvement over, the popular FM model. We also addressed specific problems in obtaining an accurate segmentation, including the partial volume effect and the presence of a bias field. The mritc package was used throughout and contains numerous functions to handle specific data objectives. Furthermore, because MRI data can be extremely large, the package performs some of its heavy computations in C code with the OpenMP framework for shared-memory parallel computing, making it very efficient. Thus, the HMRF model provides an improved automatic approach to brain MRI segmentation, offering more accurate tissue classification than more traditional models.

 

A work by Derek Wayne

derekwayne4@gmail.com