Reliability slide ppt


Presented by Third Group:

Dewi Mariani
Kurrotul Ainiyah
Rahmah Hidayatul Amini
Ringe Ringe Preshqoury Limantain

Definition of Reliability

Reliability is the degree to which a test consistently measures whatever it measures. It is the extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials. Without the agreement of independent observers able to replicate research procedures, or the ability to use research tools and procedures that produce consistent measurements, researchers cannot draw sound conclusions from their results.

The Five Types Of Reliability

  • Equivalency
  • Stability
  • Internal Consistency
  • Inter-Rater
  • Intra-Rater

Equivalency is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency is determined by relating two sets of test scores to one another to highlight the degree of relationship between them.
Stability is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date. Results are compared and correlated with the initial test to give a measure of stability.
Internal Consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the measuring instruments used in a study.
Inter-Rater Reliability is the extent to which two or more individuals (coders or raters) agree. Inter-rater reliability assesses the consistency of how a measuring system is implemented.
Intra-Rater Reliability is a type of reliability assessment in which the same assessment is completed by the same rater on two or more occasions.
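
The definitions above describe comparing or correlating two sets of results. As a minimal sketch, assuming the Pearson correlation is used for stability and Cohen's kappa for agreement between two raters (both are common choices, but neither is named in the slides), the calculations could look like this. All scores and rater labels below are invented for illustration.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Proportion of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if the raters labelled independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical data: the same six students tested on two occasions.
first_sitting = [52, 61, 70, 45, 88, 67]
second_sitting = [55, 60, 72, 47, 85, 70]
stability = pearson_r(first_sitting, second_sitting)

# Hypothetical data: two raters judging the same eight scripts.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
agreement = cohen_kappa(rater_1, rater_2)

print(f"stability (test-retest r): {stability:.2f}")
print(f"inter-rater agreement (kappa): {agreement:.2f}")
```

The same correlation could be applied to two parallel forms (equivalency) or to one rater's scores on two occasions (intra-rater reliability); only the source of the two score lists changes.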

How to make tests more reliable

Take enough samples of behavior.
Other things being equal, the more items that you have on a test, the more reliable that test will be. This seems intuitively right. If we wanted to know how good an archer someone was, we wouldn’t rely on the evidence of a single shot at the target.
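
The effect of test length can be illustrated with the Spearman-Brown formula from classical test theory. The slides do not mention this formula; it is brought in here as an assumption, and it only holds if the added items are comparable to the existing ones. The starting reliability of 0.60 is a made-up figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the new items are comparable in quality to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.60.
current = 0.60
doubled = spearman_brown(current, 2)  # about 0.75
tripled = spearman_brown(current, 3)  # about 0.82

print(f"current: {current:.2f}, doubled: {doubled:.2f}, tripled: {tripled:.2f}")
```

Each further increase in length yields a smaller gain, which is one reason a test cannot simply be made longer without limit.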

Exclude items which do not discriminate well between weaker and stronger students.
Items on which strong students and weak students perform with similar degrees of success contribute little to the reliability of a test.
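
One simple way to check discrimination is an upper-lower index: the difference between the number of correct answers in the strongest and weakest groups, divided by the group size. This is a sketch of one common approach, not a method taken from the slides; the function name, the 50% split, and the data are all assumptions made for illustration.

```python
def discrimination_index(item_correct, total_scores, fraction=0.5):
    """Upper-lower discrimination index for one item.

    item_correct: 1 if the student answered the item correctly, else 0.
    total_scores: each student's total test score, in the same order.
    fraction: share of students placed in each of the two groups.
    """
    n = len(total_scores)
    group_size = max(1, int(n * fraction))
    # Rank student indices from highest to lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical results for eight students.
totals = [38, 35, 31, 29, 22, 20, 15, 12]
item_a = [1, 1, 1, 1, 0, 1, 0, 0]  # mostly answered by stronger students
item_b = [1, 0, 1, 0, 1, 0, 1, 0]  # answered equally by both groups

d_a = discrimination_index(item_a, totals)
d_b = discrimination_index(item_b, totals)
print(f"item A: {d_a:.2f}, item B: {d_b:.2f}")
```

An index near zero, as for item B here, marks an item on which strong and weak students perform alike, so it is a candidate for exclusion.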

Do not allow candidates too much freedom.
In some kinds of language test there is a tendency to offer candidates a choice of questions and then to allow them a great deal of freedom in the way that they answer the ones that they have chosen.

Write unambiguous items.
It is essential that candidates should not be presented with items whose meaning is not clear or to which there is an acceptable answer which the test writer has not anticipated.

Provide clear and explicit instructions.
This applies both to written and oral instructions. If it is possible for candidates to misinterpret what they are asked to do, then on some occasions some of them certainly will.

Ensure that tests are well laid out and perfectly legible.
Too often, institutional tests are badly typed, have too much text in too small a space, and are poorly reproduced. As a result, students are faced with additional tasks which are not ones meant to measure their language ability.

Make candidates familiar with format and testing techniques.
If any aspect of a test is unfamiliar to candidates, they are likely to perform less well than they would do otherwise, for example on subsequently taking a parallel version.

Provide uniform and non-distracting conditions of administration.
The greater the differences between one administration of a test and another, the greater the differences one can expect between a candidate's performance on two occasions.

Use items that permit scoring which is as objective as possible.
This may appear to be a recommendation to use multiple choice items, which permit completely objective scoring.

Make comparisons between candidates as direct as possible.
This reinforces the suggestion already made that candidates should not be given a choice of items and that they should be limited in the way that they are allowed to respond.

Provide a detailed scoring key.
This should specify acceptable answers and assign points for acceptable partially correct responses.

Train scorers.
This is especially important where scoring is most subjective.

Agree acceptable responses and appropriate scores at the outset of scoring.
A sample of scripts should be taken immediately after the administration of the test.

Identify candidates by number, not name.
Scorers inevitably have expectations of candidates that they know, and these expectations can affect the way that they score.

Employ multiple, independent scoring.
As a general rule, and certainly where testing is subjective, all scripts should be scored by at least two independent scorers.
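
A minimal sketch of how two independent sets of scores might be combined is shown below. The slides only require at least two independent scorers; averaging the two scores and flagging large gaps for review are assumptions made for this example, as are the function name, the `max_gap` threshold, and the candidate numbers.

```python
def combine_scores(scores_a, scores_b, max_gap=2):
    """Average two scorers' marks and flag large disagreements.

    scores_a, scores_b: dicts mapping candidate number to score.
    max_gap: largest difference accepted without review (hypothetical).
    """
    results = {}
    for candidate_id in scores_a:
        a = scores_a[candidate_id]
        b = scores_b[candidate_id]
        results[candidate_id] = {
            "mean": (a + b) / 2,
            "needs_review": abs(a - b) > max_gap,
        }
    return results


# Hypothetical scores; candidates are identified by number, not name.
scorer_a = {"C001": 14, "C002": 9, "C003": 17}
scorer_b = {"C001": 15, "C002": 13, "C003": 17}

combined = combine_scores(scorer_a, scorer_b)
for candidate_id, result in combined.items():
    print(candidate_id, result)
```

Scripts flagged for review would then go to a further scorer rather than being settled by the average alone.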


Note: This material may be the result of students’ summarizing, paraphrasing, etc. from references, mainly Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press. This was the main book used by the students for discussion in the English Learning Assessment class at the State Islamic Institute of Palangka Raya.