# Prof. Dr. Michael Nothnagel: Diploma thesis

**Classification procedures in discriminant analysis: a comparative and integrating overview**

My diploma thesis ("Klassifikationsverfahren der Diskriminanzanalyse: eine vergleichende und integrierende Übersicht", Humboldt University, Berlin, Germany, 1999) gives a comparative overview of the various methods in the statistical field of **discriminant analysis**. Its aim is to obtain an optimal rule for mapping, or classifying, objects or subjects into one of several a-priori known groups or populations by means of their features or measurements.

**Keywords**: *discrimination, classification, supervised learning, machine learning, pattern recognition, Bayes risk, bias variance decomposition, exhaustive search, linear discriminant analysis, quadratic discriminant analysis, logistic regression, multinomial model, neural networks, classification and regression trees CART, nearest neighbours, kernel methods, support vector machines, projection pursuit, bagging, boosting, arcing, diskriminanzanalyse, klassifikation, S-Plus, humboldt-universitat, humboldt university berlin, berliner schule*

### Download

The thesis is available in **German** in the following document formats:

File | Size | Format | Software
---|---|---|---
mnda.ps | 1.6 MB | PostScript |
mnda.ps.gz | 496 kB | UNIX-zipped PostScript |
mnda.zip | 508 kB | Windows-zipped PostScript | WinZip
mnda.pdf | 1.2 MB | PDF |
mndalink.pdf | 1.4 MB | PDF with links |

Each file contains 149 pages and is formatted for double-sided printing.

### Extended Summary

- What is discriminant analysis?
- How to obtain a discriminant rule?
- How well does a discriminant rule perform?
- How to find the optimal discriminant rule?

**What is discriminant analysis?**

Discriminant analysis is a **set of statistical methods**. It aims to map subjects or objects into one of several groups or populations by means of their features and measurements (**classification**) or to find the essential features for such a classification. Groups or populations have to be known in advance (a-priori), thus separating discriminant analysis from group-building methods like cluster analysis. An example is the use of a statistical rule to classify patients as ill or healthy.
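As a minimal illustration of such a classification rule (the data and threshold below are hypothetical, not from the thesis), a patient might be assigned to one of two a-priori known groups from a single measurement:

```python
# Toy example: classify patients as "ill" or "healthy" by a simple
# threshold rule on one measurement (hypothetical data and cut-off).

def classify(measurement, threshold=6.5):
    """Assign a patient to one of two a-priori known groups."""
    return "ill" if measurement > threshold else "healthy"

patients = [4.2, 7.1, 6.9, 5.0]
labels = [classify(m) for m in patients]
print(labels)  # ['healthy', 'ill', 'ill', 'healthy']
```

Real discriminant rules are estimated from data rather than fixed by hand; the sections below sketch how.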

The thesis lays out the **foundations** of discriminant analysis and presents the **main methods**, ranging from classical approaches to the newest developments (as of 1999).

**How to obtain a discriminant rule?**

A **training set** in which the class labels of the objects/subjects are known is used to estimate a discriminant rule. Broadly, the methods fall into two groups, parametric and non-parametric methods:

**Parametric methods** assume that global parameters describe the distributions in the groups, the ratio of these distributions, or the form of the separating hyperplane. Linear discriminant analysis and logistic discriminant analysis are examples of this group.
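To make the parametric idea concrete, here is a sketch of linear discriminant analysis reduced to its simplest case: one feature, two groups, equal priors, and a common variance (the data are illustrative, not from the thesis). Under these assumptions the LDA rule assigns each observation to the group with the nearer estimated mean, i.e. a midpoint threshold between the two means:

```python
# Sketch of 1-D linear discriminant analysis for two groups with
# equal priors and common variance (illustrative data).

def fit_lda_1d(x0, x1):
    """Estimate the group means from training samples x0 and x1;
    the resulting rule assigns x to the group with the nearer mean."""
    m0 = sum(x0) / len(x0)
    m1 = sum(x1) / len(x1)
    def rule(x):
        return 0 if abs(x - m0) < abs(x - m1) else 1
    return rule

rule = fit_lda_1d([1.0, 1.2, 0.8], [3.0, 3.2, 2.8])
print(rule(0.9), rule(3.1))  # 0 1
```

With several features the same principle applies, but the rule involves the pooled covariance matrix rather than a simple distance to the means.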

**Non-parametric methods** lack those global assumptions but instead impose other, rather local conditions, like local smoothness of some function or non-linear functional relationships. Artificial neural networks, nearest neighbours, classification and regression trees (CART), and support vector machines (SVM) belong to this group.
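The nearest-neighbour rule is the most transparent member of this group: a new point is assigned to the majority class among its k nearest training points, with no distributional assumptions at all. A minimal sketch on made-up 2-D data:

```python
# Sketch of a k-nearest-neighbour rule (k = 3) on made-up 2-D data.

from collections import Counter

def knn_classify(train, point, k=3):
    """train: list of ((x, y), label) pairs; returns the majority
    label among the k training points closest to `point`."""
    dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(train, key=lambda t: dist(t[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify(train, (1, 1)))  # A
```

The local condition here is implicit: points close to each other are assumed to belong to the same class.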

The thesis presents options for solving **multi-group problems** and ways to combine several discriminant rules, for example **bagging**, **arcing** and **voting**.
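Bagging combined with voting can be sketched as follows: several rules are fitted on bootstrap resamples of the training set and combined by majority vote (the base learner here, a midpoint-threshold classifier on one feature, is a stand-in chosen for brevity):

```python
# Sketch of bagging: bootstrap resamples, one simple rule per
# resample, combined by majority vote (illustrative data).

import random
from collections import Counter

def fit_stump(sample):
    c0 = [x for x, y in sample if y == 0]
    c1 = [x for x, y in sample if y == 1]
    if not c0 or not c1:          # degenerate resample: constant rule
        label = sample[0][1]
        return lambda x: label
    t = (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2
    return lambda x: 0 if x < t else 1

def bagging(train, n_rules=25, seed=0):
    rng = random.Random(seed)
    rules = [fit_stump([rng.choice(train) for _ in train])
             for _ in range(n_rules)]
    def vote(x):                  # majority vote over all rules
        return Counter(r(x) for r in rules).most_common(1)[0][0]
    return vote

train = [(0.8, 0), (0.9, 0), (1.0, 0), (1.1, 0), (1.2, 0),
         (2.8, 1), (2.9, 1), (3.0, 1), (3.1, 1), (3.2, 1)]
rule = bagging(train)
print(rule(0.9), rule(3.1))
```

Averaging over resamples mainly reduces the variance of unstable base rules, which is why bagging helps trees far more than it helps already-stable linear rules.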

For every method, the underlying **assumptions**, **features** of the method, and its **optimality criteria** are discussed.

**How well does a discriminant rule perform?**

Theoretical results and real-world experience show that there is no single optimal method for all possible situations. When applying discriminant analysis methods, one faces the problem of choosing the best method for the particular case from the variety of methods available. The thesis presents and evaluates several measures of goodness, together with their estimates, for comparing different rules. It recommends the **Bayes risk**, the weighted sum of the rule's misclassification costs, as the optimal measure to use.
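The empirical counterpart of this risk is easy to state: average the cost of each misclassification over a labelled test set. A sketch with hypothetical costs and predictions (note the asymmetric costs, as one would use when a missed diagnosis is worse than a false alarm):

```python
# Sketch of an empirical risk estimate: the average misclassification
# cost of a rule on a labelled test set (hypothetical costs).

def empirical_risk(y_true, y_pred, cost):
    """cost[(i, j)] = cost of classifying a class-i object as class j."""
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred)) / len(y_true)

cost = {(0, 0): 0.0, (1, 1): 0.0,
        (0, 1): 1.0,   # false alarm
        (1, 0): 5.0}   # missed case, penalised more heavily
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 0]
print(empirical_risk(y_true, y_pred, cost))  # (0 + 1 + 0 + 5) / 4 = 1.5
```

With equal unit costs this reduces to the familiar misclassification rate.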

**How to find the optimal discriminant rule?**

Bias and variance of an estimate influence the Bayes risk in ways that are difficult to predict precisely. The effect of a bias-variance trade-off is thus not clear beforehand. The thesis suggests comparing different rules via the most exact risk estimates possible, obtained by **resampling methods** such as cross-validation and the bootstrap.
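Cross-validation, the simplest of these resampling estimates, can be sketched as follows: the data are split into k folds, each fold serves once as the test set for a rule fitted on the remaining folds, and the fold-wise error rates are averaged (the base rule below is an illustrative midpoint classifier, not a method from the thesis):

```python
# Sketch of k-fold cross-validation for estimating the risk of a
# discriminant rule (illustrative data and base rule).

def kfold_error(data, fit, k=3):
    """Average misclassification rate over k folds."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [d for j in range(k) if j != i for d in folds[j]]
        rule = fit(train)
        errors.append(sum(rule(x) != y for x, y in test) / len(test))
    return sum(errors) / k

def fit_midpoint(train):
    c0 = [x for x, y in train if y == 0]
    c1 = [x for x, y in train if y == 1]
    t = (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2
    return lambda x: 0 if x < t else 1

data = [(0.8, 0), (1.0, 0), (1.2, 0), (2.8, 1), (3.0, 1), (3.2, 1)]
print(kfold_error(data, fit_midpoint))  # 0.0 on this separable data
```

Because every observation is tested on a rule that never saw it during fitting, the estimate is far less optimistic than the error on the training set itself.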

**More options** for the bias-variance trade-off are listed, such as variable selection, penalty terms in optimization problems, transformations of variables, and the use of invalid models; these are discussed with regard to the particular methods.

Because the Bayes risk is difficult to handle theoretically, the thesis recommends an **exhaustive search** over many methods and subsets of variables, including transformed and combined ones, to obtain a set of (nearly) optimal methods with similarly low risk.
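The subset-enumeration part of such a search can be sketched in a few lines: every non-empty subset of features is scored by a risk estimate and the candidates are ranked (the scoring function below is a toy stand-in, not a risk estimate from the thesis):

```python
# Sketch of an exhaustive search over feature subsets, ranked by an
# estimated risk (toy scoring function for illustration).

from itertools import combinations

def exhaustive_search(features, score):
    """Return all non-empty feature subsets sorted by estimated risk."""
    subsets = [c for r in range(1, len(features) + 1)
               for c in combinations(features, r)]
    return sorted(subsets, key=score)

# Toy risk: pretend features "a" and "c" carry the discriminating
# information, with a small penalty per extra feature.
def toy_risk(subset):
    return 0.1 * len(subset) + (0.0 if {"a", "c"} <= set(subset) else 0.5)

ranking = exhaustive_search(["a", "b", "c"], toy_risk)
print(ranking[0])  # ('a', 'c')
```

Since the number of subsets grows as 2^p, such a search is only feasible for a moderate number of variables; in practice the risk estimate plugged in here would itself come from resampling.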

**Lowest risk - optimal rule?**

The means of comparing different rules, the Bayes risk estimate, is itself prone to error since the sample size is limited. Thus, the decision for a particular discriminant rule should not depend on minimal risk alone. The thesis lists **other criteria**, such as speed, interpretability, or the cost of the data needed, and suggests applying them to reach a final decision on which method to use from the subset of methods with similarly low risk.

Thus, the diploma thesis paves a way from the foundations of the field to questions of everyday practice, which will hopefully be useful for scientists and physicians, too.