Statistical values
  • 29 Mar 2024

Article Summary

This article describes the statistical values that are used on the Analysis page. See also the article Analysis.

This article contains a description of a number of statistical values that appear in various places in RemindoTest. These values provide information about the quality of administered items, the test (matrix) as a whole and about the scores of candidate groups.

Item level

P′ value

The P′ value (pronounced: P-accent value) shows how the average obtained score compares to the maximum achievable score, and is therefore an indication of the level of difficulty of an item. A low P′ value indicates a poor score on this item; the item may be too difficult or irrelevant for the test. A high P′ value indicates that this item has been scored well; the item may be too easy. The P′ value is between 0 and 1. In the test environment, you can set the values at which a warning should be issued via Settings > Manage general settings > Demand analysis notification settings.

The P′ value of a question can be found in the management environment via Questions > Statistics tab. In the administration environment it can be found under Results > Analyses > Result details tab and Analyze questions tab.

Example

For an item with a maximum score of 10, the candidates achieve scores of 5.5, 7, 9.5, 6.5 and 3. The P′ value is then ((5.5 + 7 + 9.5 + 6.5 + 3) / 5) / 10 = 0.63.
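The calculation in this example can be reproduced in a few lines of Python. This is a minimal sketch of the formula as described here, not RemindoTest's own implementation; the function name `p_prime` is hypothetical.

```python
def p_prime(scores, max_score):
    """Average obtained score divided by the maximum achievable score."""
    if not scores or max_score <= 0:
        raise ValueError("need at least one score and a positive maximum score")
    return (sum(scores) / len(scores)) / max_score


# Scores from the example above, for an item with a maximum score of 10.
value = p_prime([5.5, 7, 9.5, 6.5, 3], 10)
print(round(value, 2))
```

For a dichotomous item (every score is either 0 or the maximum), the same function returns the proportion of correct answers.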

Attention!

RemindoTest only shows the P′ value, not the P value. The P value shows the percentage of correct answers and only works for dichotomous questions, where an answer is either right or wrong. The P′ value indicates the proportion of the maximum score that was obtained on the question and thus gives a more nuanced picture than the P value.


Rit/Rir value

The Rit value (“Rit” for Relation Item Test) indicates the correlation (coherence) between the scores on this item and the scores on all items (the test as a whole). A low Rit value indicates that scores on this item are not correlated with scores on the test as a whole; this item may not belong in this test. The higher the Rit value, the better the item fits within the test, and the higher the quality of the test as a whole. The Rit value is between -1 and 1. The Rit value has the disadvantage that the item in question is itself part of the test total, and therefore correlates with itself. This effect increases as the test contains fewer items.

To avoid this problem, the Rir value is also used (“Rir” for Relation Item Rest), where the correlation is calculated between the scores on this item and the scores on the other items within the test. Because the correlation of the item with itself disappears (this is always 1), the Rir value is never higher than the Rit value.

The Rit and Rir values of a question can be found in the management environment: Questions > Statistics tab. They can be found in the administration environment under Results > Analyses > Result details tab and Analyze questions tab.
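The difference between the two values can be illustrated with a short Python sketch. It assumes a plain Pearson correlation and uses made-up scores; the function names `pearson` and `rit_rir` are hypothetical, and the exact calculation used by RemindoTest may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(var_x * var_y)


def rit_rir(score_matrix, item_index):
    """Return (Rit, Rir) for one item.

    score_matrix holds one row per candidate and one column per item.
    """
    item = [row[item_index] for row in score_matrix]
    total = [sum(row) for row in score_matrix]    # scores on all items
    rest = [t - i for t, i in zip(total, item)]   # scores on the other items
    return pearson(item, total), pearson(item, rest)


# Made-up scores: 5 candidates, 3 items.
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
rit, rir = rit_rir(scores, 0)
print(f"Rit = {rit:.2f}, Rir = {rir:.2f}")
```

With only three items, the item makes up a large part of the total score, so the gap between Rit and Rir is clearly visible.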


A value

The A value (“A” for alternative, that is, an answer option or distractor) is the proportion of candidates who chose this distractor. The A value is between 0 and 1. The sum of all A values is always 1. If this item consists of one interaction with one correct distractor, then the A value of this correct distractor is equal to the P value of the item (not the P′ value). Since distractors are only defined for interactions, and not for items, A values are also only defined for interactions.
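As an illustration, the A values of a single-choice interaction can be computed by counting how often each option was chosen. This is a sketch with made-up responses; it assumes every candidate chose exactly one option, and the function name `a_values` is hypothetical.

```python
from collections import Counter


def a_values(chosen_options, all_options):
    """Proportion of candidates that chose each answer option."""
    counts = Counter(chosen_options)
    n = len(chosen_options)
    return {option: counts[option] / n for option in all_options}


# Made-up responses of 10 candidates to one multiple-choice interaction.
responses = ["A", "B", "A", "C", "A", "A", "B", "D", "A", "A"]
values = a_values(responses, ["A", "B", "C", "D"])
print(values)
```

If "A" is the only correct option, its A value is also the proportion of candidates who answered the item correctly.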

The A value of a question can be found in the management environment: Questions > Statistics tab.


Rat/Rar value

The Rat and Rar values (“a” for alternative, that is, distractor) are related to the Rit and Rir values (“i” for item), but work at the distractor level rather than at the item level. These values show the extent to which the choice of a particular distractor correlates with the candidate's score for the test as a whole. If the Rat or Rar value for an incorrect distractor is higher than the value for a correct distractor, the distractor may be correct after all, or it may be a trick question.
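One common way to obtain such values is to correlate a 0/1 indicator ("did the candidate choose this distractor?") with the test score (Rat) or with the score on the other items (Rar). The sketch below follows that assumption with made-up data; the function names are hypothetical, and RemindoTest's own calculation may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are equal")
    return cov / sqrt(var_x * var_y)


def rat_rar(chosen_options, item_scores, total_scores, option):
    """Return (Rat, Rar) for one answer option of one item."""
    chose = [1 if c == option else 0 for c in chosen_options]
    rest = [t - i for t, i in zip(total_scores, item_scores)]
    return pearson(chose, total_scores), pearson(chose, rest)


# Made-up data for 6 candidates; option "B" is the correct answer (1 point).
chosen = ["B", "B", "A", "B", "C", "A"]
item_scores = [1, 1, 0, 1, 0, 0]
total_scores = [9, 8, 4, 7, 3, 5]

for option in ["A", "B", "C"]:
    rat, rar = rat_rar(chosen, item_scores, total_scores, option)
    print(f"{option}: Rat = {rat:.2f}, Rar = {rar:.2f}")
```

In this data the correct option "B" has positive values and the incorrect option "A" has negative values, which is the pattern expected of a well-functioning item.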

The Rat and Rar values of a question can be found in the management environment: Questions > Statistics tab.


Mean (µ)

This is the average of all candidates' scores on an item.
The average of a question can be found in the administration environment under Results > Analyses > Analyze questions tab.

Standard deviation (σ)

The standard deviation, shown as STDEV (short for standard deviation), represents how far apart candidates' achieved scores on a question are. The smaller the standard deviation, the closer the scores are to each other. With a standard deviation of 0, every candidate achieved the same score. A large standard deviation indicates a wide spread between good and poor scores, and thus strong discriminatory power. Furthermore, the higher the maximum attainable score of an item, the higher the standard deviation can be. The standard deviation is at least 0.

The standard deviation of a question can be found in the administration environment under Results > Analyses > Result details tab and Analyze questions tab.
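The mean and standard deviation of item scores can be checked with Python's standard `statistics` module. This article does not state whether RemindoTest uses the population or the sample formula, so the sketch below prints both; the scores are made up.

```python
from statistics import mean, pstdev, stdev

item_scores = [5.5, 7, 9.5, 6.5, 3]

print("mean:", mean(item_scores))
print("population SD:", round(pstdev(item_scores), 2))
print("sample SD:", round(stdev(item_scores), 2))

# When every candidate achieves the same score, the standard deviation is 0.
print("SD of identical scores:", pstdev([4, 4, 4, 4]))
```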


Test level

Cronbach's alpha / reliability

Cronbach's alpha is a measure of the internal correlation between scores on items, and can be thought of as a test-level Rir value. It reflects the reliability of a test: the higher the value (the closer to 1), the more reliable the test. The maximum value of Cronbach's alpha is 1. In theory the value can fall below 0 (it has no lower bound), but in practice it usually lies between 0 and 1. A value that is too high (greater than 0.90) may indicate that the same construct is measured too many times (redundancy). A low value (below 0.50) is considered unacceptable.

Cronbach's alpha can also be calculated in the management environment for a test matrix (via Tests > [desired test matrix] > Statistics tab), based on the data of previously administered items. This function also lets you select which questions to analyze: you can choose to include or exclude specific test items within this blueprint. Adjustments can then be made to the blueprint based on the reliability.
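For reference, the standard textbook formula is alpha = k / (k - 1) × (1 - sum of the item variances / variance of the total scores), where k is the number of items. The sketch below implements that formula with made-up scores; the function name `cronbach_alpha` is hypothetical, and it assumes every candidate answered every item, which may not hold for every blueprint.

```python
from statistics import pvariance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a matrix with one row per candidate
    and one column per item."""
    k = len(score_matrix[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_vars = [pvariance([row[j] for row in score_matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in score_matrix])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up scores: 5 candidates, 3 items.
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))

# Items that tend to contradict each other can push alpha below 0.
negative = cronbach_alpha([[1, 0], [0, 1], [1, 0], [0, 1], [1, 1]])
print(round(negative, 2))
```

The ratio of variances is the same whether population or sample variances are used, as long as the same kind is used for the items and for the totals.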

Attention!

In the administration environment it is sometimes not possible to calculate the reliability. This occurs when a blueprint presents different questions to candidates, so that not enough data is collected for one or more questions. This can be solved by calculating the reliability in the management environment and selecting more test samples there.

Cronbach's alpha can be found in the management environment under Tests > Statistics tab and in the administration environment under Results > Analyses.


Group analysis

The Group Analysis tab under Results > Analyses contains a number of specific statistical values that can only be found there:

95th percentile score

The 95th percentile score is given for the entire test and possibly per item (when feedback items are set). This score indicates that 95% of the scores are lower than or equal to this score.
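One way to compute such a score is the nearest-rank method: sort the scores and take the smallest score for which at least 95% of all scores are lower or equal. This article does not state which percentile method RemindoTest uses, so the sketch below is an assumption; the function name `percentile_score` is hypothetical.

```python
def percentile_score(scores, percent=95):
    """Nearest-rank percentile: the smallest score such that at least
    `percent` percent of all scores are lower than or equal to it."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    # Integer ceiling of percent * n / 100 gives the 1-based rank.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# 20 made-up test scores: 1, 2, ..., 20.
test_scores = list(range(1, 21))
p95 = percentile_score(test_scores)
print(p95)
```

Other percentile methods interpolate between neighbouring scores and can give slightly different results on small groups.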

Average guess score

The average guess score is the average guess probability (percentage) across the items, converted to the number of items a candidate can be expected to guess correctly.
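A possible reading of this definition is sketched below. It assumes single-choice items where the guess probability is 1 divided by the number of answer options; how RemindoTest determines the guess probability for other question types is not described here, and the function name `average_guess_score` is hypothetical.

```python
def average_guess_score(options_per_item):
    """Expected number of items guessed correctly, assuming each item is a
    single-choice question with a guess probability of 1 / number of options."""
    guess_probs = [1 / n for n in options_per_item]
    average_prob = sum(guess_probs) / len(guess_probs)
    # Convert the average probability back to a number of items.
    return average_prob * len(guess_probs)


# Made-up test: 10 items with 4 options and 10 items with 2 options.
expected = average_guess_score([4] * 10 + [2] * 10)
print(expected)
```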

Score standard deviation

The score standard deviation indicates the dispersion of the distribution: the extent to which the scores differ from one another. It can be found in the group analysis when comparing candidate groups.

Grade standard deviation

The grade standard deviation indicates the dispersion of the distribution: the extent to which the grades differ from one another. It can be found in the group analysis when comparing candidate groups.

Disclaimer: This text was automatically translated from the Dutch version.

