Reliability


Presented by Third Group:

Dewi Mariani
Kurrotul Ainiyah
Rahmah Hidayatul Amini
Ringe Ringe Preshqoury Limantain

Definition of Reliability

Reliability is the degree to which a test consistently measures whatever it measures. It is the extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials. Without the agreement of independent observers able to replicate research procedures, or the ability to use research tools and procedures that produce consistent measurements, researchers would be unable to draw satisfactory conclusions, formulate theories, or make claims about the generalizability of their research.

The Five Types Of Reliability

  • Equivalency
  • Stability
  • Internal Consistency
  • Inter-Rater
  • Intra-Rater

Equivalency is the extent to which two items measure identical concepts at an identical level of difficulty. It is determined by relating two sets of test scores to one another to highlight the degree of relationship between them.
Stability is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date. Results are compared and correlated with the initial test to give a measure of stability.
Internal Consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the measuring instruments used in a study.
Inter-Rater Reliability is the extent to which two or more individuals (observers or raters) agree. Inter-rater reliability assesses the consistency of how a measuring system is implemented.
Intra-Rater Reliability is a type of reliability assessment in which the same assessment is completed by the same rater on two or more occasions.
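Several of these reliability types are usually reported as numbers. As a minimal sketch (the scores and helper functions below are invented for illustration and are not from Hughes), stability can be estimated by correlating two administrations of the same test, and internal consistency by Cronbach's alpha over item-level scores:

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (len(x) * pstdev(x) * pstdev(y))

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per test item."""
    k = len(items)
    totals = [sum(candidate) for candidate in zip(*items)]
    item_var = sum(pstdev(i) ** 2 for i in items)
    return k / (k - 1) * (1 - item_var / pstdev(totals) ** 2)

# Invented scores: five candidates sitting the same test twice
first = [12, 15, 9, 18, 11]
second = [13, 14, 10, 17, 12]
stability = pearson(first, second)  # near 1.0 -> stable over time

# Invented item-level scores (3 items x 5 candidates, scored 0/1)
items = [[1, 1, 0, 1, 0],
         [1, 0, 0, 1, 1],
         [1, 1, 0, 1, 0]]
alpha = cronbach_alpha(items)  # higher -> more internally consistent
```

Equivalency could be estimated the same way, by correlating scores on two parallel forms rather than two administrations of the same form.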

How to make tests more reliable

Take enough samples of behavior.
Other things being equal, the more items that you have on a test, the more reliable that test will be. This seems intuitively right. If we wanted to know how good an archer someone was, we wouldn’t rely on the evidence of a single shot at the target.

Exclude items which do not discriminate well between weaker and stronger students.
Items on which strong students and weak students perform with similar degrees of success contribute little to the reliability of a test.
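One common way to check this (a sketch with invented scores, not a procedure given by Hughes) is the classical discrimination index: compare how often the top-scoring and bottom-scoring groups of candidates answer the item correctly.

```python
def discrimination_index(item_scores, total_scores, frac=0.27):
    """Classical D index: proportion correct in the top group minus
    proportion correct in the bottom group (item scored 0/1)."""
    n = max(1, round(len(total_scores) * frac))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:n], order[-n:]
    p = lambda group: sum(item_scores[i] for i in group) / n
    return p(high) - p(low)

# Invented data: 10 candidates, one item's 0/1 scores and total test scores
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [38, 35, 33, 30, 28, 25, 22, 20, 15, 12]
d = discrimination_index(item, totals)  # near +1: strong discriminator
```

Items with an index near zero (or negative, meaning weak students outperform strong ones) are the candidates for exclusion.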

Do not allow candidates too much freedom.
In some kinds of language test there is a tendency to offer candidates a choice of questions and then to allow them a great deal of freedom in the way that they answer the ones that they have chosen. The freer candidates are, the less comparable their performances will be, and the less reliably those performances can be scored.

Write unambiguous items.
It is essential that candidates should not be presented with items whose meaning is not clear, or to which there is an acceptable answer which the test writer has not anticipated.

Provide clear and explicit instructions.
This applies both to written and oral instructions. If it is possible for candidates to misinterpret what they are asked to do, then on some occasions some of them certainly will.

Ensure that tests are well laid out and perfectly legible.
Too often, institutional tests are badly typed, have too much text in too small a space, and are poorly reproduced. As a result, students are faced with additional tasks which are not ones meant to measure their language ability.

Make candidates familiar with format and testing techniques.
If any aspect of a test is unfamiliar to candidates, they are likely to perform less well than they would do otherwise (and than they would on subsequently taking a parallel version).

Provide uniform and non-distracting conditions of administration.
The greater the differences between one administration of a test and another, the greater the differences one can expect between a candidate's performance on the two occasions.

Use items that permit scoring which is as objective as possible.
This may appear to be a recommendation to use multiple choice items, which permit completely objective scoring.

Make comparisons between candidates as direct as possible.
This reinforces the suggestion already made that candidates should not be given a choice of items and that they should be limited in the way that they are allowed to respond.

Provide a detailed scoring key.
This should specify acceptable answers and assign points for partially correct responses.

Train scorers.
This is especially important where scoring is most subjective.

Agree acceptable responses and appropriate scores at the outset of scoring.
A sample of scripts should be taken immediately after the administration of the test.

Identify candidates by number, not name.
Scorers inevitably have expectations of candidates that they know.

Employ multiple, independent scoring.
As a general rule, and certainly where testing is subjective, all scripts should be scored by at least two independent scorers.
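As a sketch of how double scoring might be reconciled (the tolerance and scores here are invented, not prescribed by Hughes), scripts on which the two independent scorers disagree markedly can be flagged for a third opinion, and the remaining scores averaged:

```python
# Invented scores from two independent raters for five scripts
rater_a = [14, 11, 17, 9, 13]
rater_b = [15, 10, 17, 12, 13]

# Flag scripts whose scores differ by more than the agreed tolerance
tolerance = 2
flagged = [i for i, (a, b) in enumerate(zip(rater_a, rater_b))
           if abs(a - b) > tolerance]

# Report the mean of the two independent scores for the rest
final = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
```

Here only script 3 (scored 9 vs. 12) exceeds the tolerance and would go to a third scorer.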


Note: This material is largely the result of students' summarizing and paraphrasing of references, mainly Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press. This was the main book used by the students for discussion in the English Learning Assessment class at the State Islamic Institute of Palangka Raya.


Validity


Presented by:

Anisa Rahmadhani
Ahmad Rizky Septiadi
Irfan Rinaldi Bimantara
Lydia Anggraini
Norlaila Hayani

Definition of Validity

A test is valid if it measures accurately what it is intended to measure.

Types of Validity

  • Content Validity
  • Criterion-related Validity
  • Construct Validity
  • Validity in Scoring
  • Face Validity

1. Content Validity

  • The test content is a representative sample of the language skills being tested.
  • The test is content valid if it includes a proper sample of the relevant structures and skills.

Importance of content validity:

  • The greater a test’s content validity, the more likely its construct validity.
  • A test without content validity is likely to have a harmful backwash effect since areas that are not tested are likely to become ignored in teaching and learning.

2. Criterion-related Validity
The degree to which results on the test agree with those provided by an independent criterion.

Kinds of criterion-related Validity
Concurrent Validity
is established when the test and the criterion are administered at the same time.

Predictive Validity

  • Concerns the degree to which a test can predict candidates’ future performance.

3. Construct Validity
The degree to which a test measures what it claims, or purports, to be measuring.

Construct: A construct is an attribute, ability, or skill that exists in the human brain and is defined by established theories.

  • Intelligence, motivation, anxiety, proficiency, and fear are all examples of constructs.
  • They exist in theory and have been observed to exist in practice.
  • Constructs exist in the human brain and are not directly observable.
  • There are two types of construct validity: convergent and discriminant validity. Construct validity is established by looking at numerous studies that use the test being evaluated.

4. Validity in Scoring

  • A reading test may call for short written responses.
  • If the scoring of these responses takes spelling and grammar into account, then it is not valid, since it measures more than the reading ability it is meant to assess.

5. Face Validity

  • The way the test looks to the examinees, test administrators, educators, and the like.
  • If you want to test students' pronunciation but you do not ask them to speak, your test lacks face validity.
  • If your test contains items or materials which are not acceptable to candidates, teachers, educators, etc., your test lacks face validity.

How to Make Tests More Valid?

  • Write explicit specifications for the test, which include all the constructs to be measured.
  • Make sure that you include a representative sample of the content.
  • Use direct testing.
  • Make sure the scoring is valid.
  • Make the test reliable.

 


Note: This material is largely the result of students' summarizing and paraphrasing of references, mainly Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press. This was the main book used by the students for discussion in the English Learning Assessment class at the State Islamic Institute of Palangka Raya.
