Now consider a hypothetical situation in which the examiners do exactly that, that is, assign grades by tossing a coin: heads = pass, tails = fail [Table 1, situation 2]. In this case, one would expect 25% (= 0.50 × 0.50) of students to receive a “pass” grade from both examiners and 25% to receive a “fail” grade from both, i.e. an overall “expected” agreement rate of 50% (= 0.25 + 0.25 = 0.50). The observed agreement rate (80% in situation 1) must therefore be interpreted bearing in mind that 50% agreement was expected purely by chance. These examiners could have improved on chance by 50% (best possible agreement minus agreement expected by chance = 100% - 50% = 50%), but achieved only 30% (observed agreement minus agreement expected by chance = 80% - 50% = 30%). Their actual performance in being concordant is therefore 30% / 50% = 60%. This chance-corrected figure is the kappa statistic. But what does a kappa of 0 mean? It means that the observed agreement is no better than what chance alone would produce.

If two instruments or techniques are used to measure the same variable on a continuous scale, Bland-Altman plots can be used to estimate agreement. Such a plot is a scatter plot of the difference between the two measurements (Y axis) against the average of the two measurements (X axis). It therefore offers a graphical representation of the bias (the mean difference between the two observers or techniques) together with the 95% limits of agreement.
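As a rough illustration of the Bland-Altman quantities described above, the sketch below computes the bias and the 95% limits of agreement for paired measurements, along with the (mean, difference) points that would be plotted. It assumes the conventional definition of the limits as the mean difference ± 1.96 sample standard deviations. The function names `bland_altman` and `plot_points` and the readings are made up for this example and do not come from the article.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Assumption: limits of agreement = mean difference +/- 1.96 * sample SD
    of the differences, which is the conventional 95% definition.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # average difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def plot_points(a, b):
    """Return (x, y) pairs for the plot: x = mean of the pair, y = difference."""
    return [((x + y) / 2, x - y) for x, y in zip(a, b)]


# Hypothetical readings from two measurement techniques (illustrative only).
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 16.0]

bias, lower, upper = bland_altman(method_a, method_b)
print(f"bias = {bias:.3f}, limits of agreement = [{lower:.3f}, {upper:.3f}]")
```

The points returned by `plot_points` can be passed to any plotting library, with horizontal lines drawn at the bias and at the two limits.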

The 95% limits of agreement are given by the formula: limits of agreement = mean observed difference ± 1.96 × standard deviation of the observed differences.

It is important to note that, in each of the three situations in Table 1, the pass percentages are the same for both examiners, and if the two examiners were compared using the usual 2 × 2 test for paired data (McNemar's test), there would be no difference between their performances; by contrast, the agreement between observers varies considerably across the three situations. The key point is that “agreement” quantifies the concordance between the two examiners for each of the “pairs” of marks, not the similarity of the overall pass percentage between the examiners.

Krippendorff's alpha[16][17] is a versatile statistic that assesses the agreement among observers who categorize, rate, or measure a given set of objects in terms of the values of a variable. It generalizes several specialized agreement coefficients: it accepts any number of observers, applies to nominal, ordinal, interval and ratio levels of measurement, can handle missing data, and corrects for small sample sizes.

The most basic measure of inter-rater reliability is the percentage of agreement between raters. If you answer at random, ticking one option per question, you may still get 20% of the answers correct purely by chance…
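To make the difference between raw percentage agreement and chance-corrected agreement concrete, the coin-toss arithmetic from the hypothetical examiners (80% observed agreement, 50% expected by chance) can be sketched as follows. This is only a minimal sketch of the standard kappa calculation for two raters and two categories; the function names `expected_agreement` and `kappa_from_rates` are hypothetical and chosen for this example.

```python
def expected_agreement(pass_rate_a, pass_rate_b):
    """Agreement expected by chance for two raters and two categories.

    Assumes the raters grade independently: both pass, plus both fail.
    """
    both_pass = pass_rate_a * pass_rate_b
    both_fail = (1 - pass_rate_a) * (1 - pass_rate_b)
    return both_pass + both_fail


def kappa_from_rates(observed, expected):
    """Chance-corrected agreement: (observed - expected) / (1 - expected)."""
    if not 0.0 <= expected < 1.0:
        raise ValueError("expected agreement must be in [0, 1)")
    return (observed - expected) / (1.0 - expected)


# Coin-toss examiners: each passes 50% of students at random.
p_e = expected_agreement(0.5, 0.5)     # 0.25 + 0.25 = 0.50
kappa = kappa_from_rates(0.80, p_e)    # (0.80 - 0.50) / (1 - 0.50)
print(f"expected = {p_e:.2f}, kappa = {kappa:.2f}")
```

With observed agreement equal to the chance-expected 50%, `kappa_from_rates(0.50, 0.50)` returns 0, which is why raw percentage agreement alone can overstate how well two raters agree.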