
Calculating Binary Measures

One big disadvantage of Minitab is that, unlike SPSS, it does not provide Binary similarity measures. However, for a small number of cases it is feasible to do the calculations by hand.

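The key to understanding Binary measures is the matrix of agreement. One of these must be calculated for each pair of cases and it has four numbers labelled a, b, c and d where:

a = the number of variables on which both cases score 1
b = the number of variables on which the first case scores 1 and the second case scores 0
c = the number of variables on which the first case scores 0 and the second case scores 1
d = the number of variables on which both cases score 0

You may also need n, the number of variables (n = a + b + c + d).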

For example, using the SAQ data file (I have re-labelled the variables V1-V5 to avoid confusion with the matrix of agreement names a - d):

Case	V1	V2	V3	V4	V5
1	1	1	1	0	1
2	1	1	1	0	0
3	1	0	0	1	1
4	1	0	1	1	1
5	0	0	1	1	0

In the following examples I have labelled each variable with a, b, c or d as appropriate.

Case 1 v case 2.

Case	V1	V2	V3	V4	V5
1	1	1	1	0	1
2	1	1	1	0	0
Label	a	a	a	d	b

Therefore a = 3, b = 1, c = 0, d = 1 and n = 5

Case 2 v case 3

Case	V1	V2	V3	V4	V5
2	1	1	1	0	0
3	1	0	0	1	1
Label	a	b	b	c	c

Therefore a = 1, b = 2, c = 2, d = 0 and n = 5
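The counting in the two examples above can also be sketched in a few lines of Python. This is only an illustration, not something Minitab provides: the function name `agreement_counts` and the `cases` dictionary are my own, and the labelling assumes a = both cases 1, b = first case 1 and second case 0, c = first case 0 and second case 1, d = both cases 0, which is what the worked examples imply.

```python
def agreement_counts(first, second):
    """Return (a, b, c, d) for two equal-length lists of 0/1 values.

    Hypothetical helper for illustration only.
    """
    if len(first) != len(second):
        raise ValueError("both cases need the same number of variables")
    a = b = c = d = 0
    for x, y in zip(first, second):
        if x not in (0, 1) or y not in (0, 1):
            raise ValueError("values must be 0 or 1")
        if x == 1 and y == 1:
            a += 1  # both cases score 1
        elif x == 1 and y == 0:
            b += 1  # first case 1, second case 0
        elif x == 0 and y == 1:
            c += 1  # first case 0, second case 1
        else:
            d += 1  # both cases score 0
    return a, b, c, d


# The five SAQ cases, variables V1-V5
cases = {
    1: [1, 1, 1, 0, 1],
    2: [1, 1, 1, 0, 0],
    3: [1, 0, 0, 1, 1],
    4: [1, 0, 1, 1, 1],
    5: [0, 0, 1, 1, 0],
}

print(agreement_counts(cases[1], cases[2]))  # (3, 1, 0, 1)
print(agreement_counts(cases[2], cases[3]))  # (1, 2, 2, 0)
```

Note that swapping the order of the two cases swaps b and c but leaves a and d unchanged.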

Once you have the frequencies for a, b, c and d you substitute them into the appropriate formula for a similarity or distance measure.

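The simple matching coefficient is (a + d)/n, therefore for case 1 v case 2 it is (3 + 1)/5 = 0.8 and for case 2 v case 3 it is (1 + 0)/5 = 0.2.

The Dice coefficient is 2a/(2a + b + c), therefore for case 1 v case 2 it is 6/(6 + 1 + 0) = 0.857 and for case 2 v case 3 it is 2/(2 + 2 + 2) = 0.333.

The Pattern Difference coefficient is bc/n², therefore for case 1 v case 2 it is (1 × 0)/25 = 0 and for case 2 v case 3 it is (2 × 2)/25 = 0.16.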
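If you would rather not do the arithmetic by hand, the three formulas can be checked with a short Python sketch. The function names are my own, and n is taken to be a + b + c + d. The counts passed in are those from the two worked comparisons (case 1 v case 2 and case 2 v case 3).

```python
def simple_matching(a, b, c, d):
    # (a + d) / n, where n = a + b + c + d
    return (a + d) / (a + b + c + d)


def dice(a, b, c, d):
    # 2a / (2a + b + c); undefined when a, b and c are all zero
    return 2 * a / (2 * a + b + c)


def pattern_difference(a, b, c, d):
    # bc / n squared
    n = a + b + c + d
    return b * c / n ** 2


case1_v_case2 = (3, 1, 0, 1)  # a, b, c, d
case2_v_case3 = (1, 2, 2, 0)

for label, counts in [("1 v 2", case1_v_case2), ("2 v 3", case2_v_case3)]:
    print(
        label,
        round(simple_matching(*counts), 3),
        round(dice(*counts), 3),
        round(pattern_difference(*counts), 3),
    )
# 1 v 2 0.8 0.857 0.0
# 2 v 3 0.2 0.333 0.16
```

Simple matching and Dice are similarities (larger means more alike), whereas Pattern Difference is a dissimilarity (larger means less alike).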

Calculating and viewing distance matrices in Minitab

Minitab can calculate, store and display distance matrices should you wish to see them, although it is not something that I have ever found particularly informative. The most important part of a Cluster Analysis, and the only part that you need for interpretation, is the Dendrogram.

However, should you wish to see the distance matrix this is what you need to do.

These explanations use the SAQ interval variable data. It is assumed that C1 is Case, C2 is x and C3 is y.

Case	x	y
1	1	2
2	3	2
3	6	6
4	10	7
5	8	8

First select the analysis options. In this example the Linkage Method is "Complete" and the Distance Measure is "Euclidean". Finally, don't forget to tick the Show Dendrogram option to see the dendrogram!

Minitab Cluster Observation Command Window
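To see what the "Complete" and "Euclidean" options mean in practice, here is a minimal Python sketch of complete-linkage clustering on the five x, y cases. It is an illustration of the general technique only, not Minitab's own implementation, and the function name `complete_linkage` is my own. At each step the two closest clusters are joined, where the distance between two clusters is the largest Euclidean distance between any pair of their members.

```python
import math

# SAQ interval data: case -> (x, y)
points = {1: (1, 2), 2: (3, 2), 3: (6, 6), 4: (10, 7), 5: (8, 8)}


def complete_linkage(points):
    """Return the list of merges as (cluster, cluster, distance) tuples."""
    clusters = [(label,) for label in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between members (math.dist needs Python 3.8+)
                dist = max(
                    math.dist(points[p], points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


for left, right, dist in complete_linkage(points):
    print(left, right, round(dist, 3))
# (1,) (2,) 2.0
# (4,) (5,) 2.236
# (3,) (4, 5) 4.123
# (1, 2) (3, 4, 5) 10.296
```

The merge order and distances are the information that a dendrogram displays: cases 1 and 2 join first, then 4 and 5, then case 3 joins 4 and 5, and finally the two groups join.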

If you wish to store the distance matrix you must click the Storage button on the Cluster Observations command window. This opens up another window and you enter a matrix name into the appropriate place (labelled "Distance Matrix"). Note that matrices have names consisting of the letter M (or m) followed by a number, e.g. m1, M4, m101, etc. In the following example I have used the name M1.

Minitab Cluster Observation Storage options window

Now you need to find a way of displaying the distance matrix. The last command from the Data menu is "Display Data".

Minitab Data drop-down menu

If you select this another window appears and you select M1 as the data you wish to display.

Minitab Display Data Command Window

The output from this command appears in the Minitab Session window.

MTB > Print M1.

Data Display

Matrix M1

0.000000 2.000000 6.403124 10.295630 9.219544
2.000000 0.000000 5.000000 8.602325 7.810250
6.403124 5.000000 0.000000 4.123106 2.828427
10.295630 8.602325 4.123106 0.000000 2.236068
9.219544 7.810250 2.828427 2.236068 0.000000

MTB >

You can compare this to the results from the SAQ for Euclidean Distances:

Case	1	2	3	4	5
1	0.0
2	2.0	0.0
3	6.4	5.0	0.0
4	10.3	8.6	4.1	0.0
5	9.2	7.8	2.8	2.2	0.0
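As a cross-check outside Minitab, the same Euclidean distance matrix can be reproduced with a few lines of Python. This is just a sketch using the five x, y cases listed earlier; the variable names are my own.

```python
import math

# SAQ interval data: case -> (x, y)
points = {1: (1, 2), 2: (3, 2), 3: (6, 6), 4: (10, 7), 5: (8, 8)}


def euclidean(p, q):
    # Square root of the sum of squared differences on each variable
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


labels = sorted(points)
matrix = [[euclidean(points[i], points[j]) for j in labels] for i in labels]

# Print with six decimal places, as in the Minitab Session window
for row in matrix:
    print(" ".join(f"{value:9.6f}" for value in row))
```

For example, the distance between cases 1 and 3 is the square root of (6 - 1)² + (6 - 2)² = 41, which is 6.403124 to six decimal places.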