Reliability

Presented by Third Group:

Dewi Mariani
Kurrotul Ainiyah
Rahmah Hidayatul Amini
Ringe Ringe Preshqoury Limantain

Definition of Reliability

Reliability is the degree to which a test consistently measures whatever it measures. Reliability is the extent to which an experiment, test, or any measuring procedure shows the same result on repeated trials. Without the agreement of independent observers able to replicate research procedures, or the ability to use research tools and procedures that produce consistent measurements, researchers would be unable to draw conclusions, formulate theories, or make claims about the generalizability of their research.

The Five Types Of Reliability

  • Equivalency
  • Stability
  • Internal Consistency
  • Inter-Rater
  • Intra-Rater

Equivalency is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency is determined by relating two sets of test scores to one another to highlight the degree of relationship between them.
Stability is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date. Results are compared and correlated with the initial test to give a measure of stability.
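
To make this concrete, stability (test-retest reliability) is commonly quantified as the correlation between the two sets of scores; the same calculation serves for equivalency, with the two sets coming from parallel forms rather than repeated administrations. A minimal Python sketch with invented scores:

```python
# Test-retest reliability: correlate scores from two administrations
# of the same test to the same subjects (scores invented for illustration).

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

first_administration = [12, 15, 9, 18, 14]
retest = [13, 14, 10, 17, 15]
print(f"test-retest reliability: {pearson_r(first_administration, retest):.2f}")
```
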
Internal Consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the measuring instruments used in a study.
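
One widely used index of internal consistency is Cronbach's alpha, which rises as the items in a test covary. A small sketch, using invented 0/1 item scores:

```python
# Cronbach's alpha: a common index of internal consistency.
# Rows are test takers, columns are items (invented 0/1 item scores).

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(score_matrix):
    k = len(score_matrix[0])                # number of items
    items = list(zip(*score_matrix))        # scores grouped per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

score_matrix = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(f"alpha = {cronbach_alpha(score_matrix):.2f}")
```
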
Inter-Rater Reliability is the extent to which two or more individuals (observers or raters) agree. Inter-rater reliability assesses the consistency with which a measuring system is implemented.
Intra-Rater Reliability is a type of reliability assessment in which the same assessment is completed by the same rater on two or more occasions.
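
For categorical judgments, agreement between raters (or between one rater's two occasions) is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A sketch with invented ratings:

```python
# Cohen's kappa: agreement between two raters on categorical ratings,
# corrected for chance agreement (ratings invented for illustration).

from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum((count_a[c] / n) * (count_b[c] / n)
                   for c in set(ratings_a) | set(ratings_b))
    return (observed - expected) / (1 - expected)

rater1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(f"kappa = {cohen_kappa(rater1, rater2):.2f}")  # ~0.67 here
```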

How to make tests more reliable

Take enough samples of behavior.
Other things being equal, the more items that you have on a test, the more reliable that test will be. This seems intuitively right. If we wanted to know how good an archer someone was, we wouldn’t rely on the evidence of a single shot at the target.
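
Classical test theory formalizes this intuition in the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened (with comparable items) by a factor n. A minimal sketch:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened by a factor n with comparable items.

def spearman_brown(reliability, n):
    return (n * reliability) / (1 + (n - 1) * reliability)

# A test with reliability 0.70, doubled in length (n = 2):
print(f"predicted reliability: {spearman_brown(0.70, 2):.2f}")  # ~0.82
```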

Exclude items which do not discriminate well between weaker and stronger students.
Items on which strong students and weak students perform with similar degrees of success contribute little to the reliability of a test.
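
A simple screening statistic for this is the discrimination index D: the proportion of a high-scoring group answering the item correctly minus the proportion of a low-scoring group doing so. A sketch with invented counts:

```python
# Discrimination index D: proportion correct in a high-scoring group
# minus proportion correct in a low-scoring group (invented counts).
# Items with D near zero (or negative) are candidates for revision.

def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# 9 of 10 strong students but only 3 of 10 weak students answered correctly:
print(f"D = {discrimination_index(9, 3, 10):.2f}")  # 0.60: discriminates well
```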

Do not allow candidates too much freedom.
In some kinds of language test there is a tendency to offer candidates a choice of questions and then to allow them a great deal of freedom in the way that they answer the ones that they have chosen. The more freedom candidates are given, the less directly their performances can be compared, and the less reliable the test becomes.

Write unambiguous items.
It is essential that candidates should not be presented with items whose meaning is not clear or to which there is an acceptable answer which the test writer has not anticipated.

Provide clear and explicit instructions.
This applies both to written and oral instructions. If it is possible for candidates to misinterpret what they are asked to do, then on some occasions some of them certainly will.

Ensure that tests are well laid out and perfectly legible.
Too often, institutional tests are badly typed, have too much text in too small a space, and are poorly reproduced. As a result, students are faced with additional tasks which are not meant to measure their language ability.

Make candidates familiar with format and testing techniques.
If any aspect of a test is unfamiliar to candidates, they are likely to perform less well than they would do otherwise (on subsequently taking a parallel version, for example).

Provide uniform and non-distracting conditions of administration.
The greater the differences between one administration of a test and another, the greater the differences one can expect between a candidate's performance on the two occasions.

Use items that permit scoring which is as objective as possible.
This may appear to be a recommendation to use multiple choice items, which permit completely objective scoring, but this is not necessarily intended; items requiring a short, unambiguous constructed response can also be scored with near-complete objectivity.

Make comparisons between candidates as direct as possible.
This reinforces the suggestion already made that candidates should not be given a choice of items and that they should be limited in the way that they are allowed to respond.

Provide a detailed scoring key.
This should specify acceptable answers and assign points for partially correct responses.
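
As an illustration of how explicit such a key can be, it may be written as data, so that assigning points requires no scorer judgment. The item and responses below are invented:

```python
# A scoring key written as data: acceptable answers, including partially
# correct ones, are mapped to points so that scoring requires no judgment.
# The item and responses are invented for illustration.

SCORING_KEY = {
    "because he missed the train": 2,  # fully acceptable
    "he missed the train": 2,
    "because of the train": 1,         # partially correct
}

def score(response):
    return SCORING_KEY.get(response.strip().lower(), 0)

print(score("Because he missed the train"))  # 2
print(score("because of the train"))         # 1
print(score("he was late"))                  # 0
```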

Train scorers.
This is especially important where scoring is most subjective.

Agree acceptable responses and appropriate scores at the outset of scoring.
A sample of scripts should be taken immediately after the administration of the test.

Identify candidates by number, not name.
Scorers inevitably have expectations of candidates that they know, and these expectations can affect the scores that they give.

Employ multiple, independent scoring.
As a general rule, and certainly where testing is subjective, all scripts should be scored by at least two independent scorers.
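
One way such double scoring might be organized is sketched below: each script receives two independent scores, close pairs are averaged, and discrepant pairs are routed to a third scorer. The scores and tolerance are invented:

```python
# Double scoring with a discrepancy check: each script is scored by two
# independent scorers; close pairs are averaged, discrepant pairs are
# routed to a third scorer. Scores and tolerance are invented.

TOLERANCE = 2  # maximum acceptable difference between the two scorers

def reconcile(scorer1, scorer2):
    for script, (s1, s2) in enumerate(zip(scorer1, scorer2), start=1):
        if abs(s1 - s2) > TOLERANCE:
            print(f"script {script}: {s1} vs {s2} -> refer to third scorer")
        else:
            print(f"script {script}: final score {(s1 + s2) / 2}")

reconcile([14, 18, 9, 12], [15, 12, 10, 12])
```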

Download Slide PowerPoint version: Reliability

Note: This material is the result of the students' summarizing and paraphrasing of references, mainly Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press, the main book used for discussion in the English Learning Assessment class at the State Islamic Institute of Palangka Raya.


Validity

Presented by:

Anisa Rahmadhani
Ahmad Rizky Septiadi
Irfan Rinaldi Bimantara
Lydia Anggraini
Norlaila Hayani

Definition of Validity

A test is valid if it measures accurately what it is intended to measure.

Types of Validity

  • Content Validity
  • Criterion-related Validity
  • Construct Validity
  • Validity in Scoring
  • Face Validity

1. Content Validity

  • A test has content validity if its content is a representative sample of the language skills being tested.
  • The test is content valid if it includes a proper sample of those skills.

Importance of content validity:

  • The greater a test’s content validity, the more likely its construct validity.
  • A test without content validity is likely to have a harmful backwash effect since areas that are not tested are likely to become ignored in teaching and learning.

2. Criterion-related Validity
The degree to which results on the test agree with those provided by an independent criterion.

Kinds of Criterion-related Validity
Concurrent Validity

  • Concurrent validity is established when the test and the criterion are administered at the same time.
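
Concurrent validity is typically reported as a validity coefficient: the correlation between scores on the test and on the criterion. A minimal sketch with invented scores (statistics.correlation requires Python 3.10 or later):

```python
# A validity coefficient: the correlation between scores on the new test
# and on an established criterion measure administered at the same time.
# Scores are invented; statistics.correlation requires Python 3.10+.

from statistics import correlation

new_test = [55, 62, 48, 70, 66, 59]
criterion = [58, 60, 50, 72, 64, 61]

print(f"validity coefficient: {correlation(new_test, criterion):.2f}")
```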

Predictive Validity

  • Concerns the degree to which a test can predict candidates’ future performance.

3. Construct Validity
The degree to which a test measures what it claims, or purports, to be measuring.

Construct: A construct is an attribute, an ability, or skill that happens in the human brain and is defined by established theories.

  • Intelligence, motivation, anxiety, proficiency, and fear are all examples of constructs.
  • They exist in theory and have been observed to exist in practice.
  • Constructs exist in the human brain and are not directly observable.
  • There are two types of construct validity: convergent and discriminant validity. Construct validity is established by looking at numerous studies that use the test being evaluated.

4. Validity in Scoring

  • A reading test may call for short written responses.
  • If the scoring of these responses takes into account spelling and grammar, then the scoring is not valid, since spelling and grammar are not part of what a reading test is meant to measure.

5. Face Validity

  • The way the test looks to the examinees, test administrators, educators, and the like.
  • If you want to test students' pronunciation, but you do not ask them to speak, your test lacks face validity.
  • If your test contains items or materials which are not acceptable to candidates, teachers, educators, etc., your test lacks face validity.

How to Make Tests More Valid?

  • Write explicit specifications for the test, which include all the constructs to be measured.
  • Make sure that you include a representative sample of the content.
  • Use direct testing.
  • Make sure the scoring is valid.
  • Make the test reliable.


Download Slide PowerPoint version: Validity

Note: This material is the result of the students' summarizing and paraphrasing of references, mainly Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press, the main book used for discussion in the English Learning Assessment class at the State Islamic Institute of Palangka Raya.


Kinds of Test and Testing

Presented by:

Ahmad Sahiba
Dewi Aprila kartika
Kurniawan Dwi H
Maulida
Septy noor amalia

Contents of slide:

  • Proficiency tests
  • Achievement tests
  • Diagnostic tests
  • Placement tests
  • Direct and indirect testing
  • Discrete point and integrative testing
  • Norm-referenced and criterion- referenced testing
  • Objective testing and subjective testing


Proficiency tests

  • Proficiency tests are designed to measure people’s ability in a language, regardless of any training they may have had in that language.
  • This test is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.
  • This test is not based on courses that candidates may previously have taken.
  • For example: the TOEFL test, FCE, and CPE.

Achievement tests

  • Achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.
  • Two kinds of these tests:
  1. Final Achievement Tests. Final achievement tests are those administered at the end of a course of study. They contribute to summative assessment. These tests have been referred to as taking a syllabus-content approach, in which the test's content is based directly on a detailed course syllabus or on the books and other materials used.
  2. Progress Achievement Tests. Progress achievement tests are intended to measure the progress that students are making (formative assessment). One way of measuring progress would be to administer final achievement tests repeatedly, the increasing scores indicating the progress made.

Diagnostic tests 

  • Diagnostic tests are used to identify learners’ strengths and weaknesses. They are intended primarily to ascertain what learning still needs to take place.

Placement tests

  • Placement tests are intended to provide information that will help to place students at the stage (or in the part) of the teaching program most appropriate to their abilities.
  • Typically used to assign students to classes at different levels, as in the sketch below.
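
One simple way a placement decision might be implemented is as score banding; the bands below are invented for illustration:

```python
# Placement as score banding: map a placement-test score to the most
# appropriate class level. The bands are invented for illustration.

LEVELS = [  # (minimum score, level), in descending order
    (80, "Advanced"),
    (60, "Upper-Intermediate"),
    (40, "Intermediate"),
    (0, "Elementary"),
]

def place(score):
    for minimum, level in LEVELS:
        if score >= minimum:
            return level

for s in (35, 55, 72, 90):
    print(f"score {s} -> {place(s)}")
```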

Direct and indirect testing

  • A direct test requires the candidate to perform precisely the skill that we wish to measure.
  • An indirect test attempts to measure the abilities that underlie the skills in which we are interested.
  • The main problem with indirect tests is that the relationship between performance on them and performance of the skills in which we are usually more interested tends to be rather weak in strength and uncertain in nature.

Discrete point and integrative testing

  • Discrete point testing refers to the testing of one element at a time, item by item. For example, a test may take the form of a series of items, each testing a particular grammatical structure.
  • Integrative testing requires the candidate to combine many language elements in the completion of a task. For example, writing composition, making notes, etc.

Norm-referenced and criterion- referenced testing

  • Norm-referenced testing relates one candidate's performance to that of other candidates; we are not told directly what the student is capable of doing in the language.
  • For example: student A obtained a score that places him in the top 10 per cent of candidates who have taken that test.
  • A criterion-referenced test classifies people according to whether or not they are able to perform some task or set of tasks satisfactorily.
  • The tasks are set, and those who perform them satisfactorily 'pass'; those who don't, 'fail'. The sketch below contrasts the two ways of reporting.
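
The contrast can be made concrete by reporting the same invented raw scores both ways: as a percentile rank within the group (norm-referenced) and as pass/fail against a fixed cutoff (criterion-referenced):

```python
# The same invented raw scores reported two ways: a percentile rank
# within the group (norm-referenced) and pass/fail against a fixed
# cutoff (criterion-referenced).

scores = {"A": 78, "B": 52, "C": 91, "D": 64, "E": 85}
CUTOFF = 70  # score taken to show the tasks can be performed satisfactorily

def percentile_rank(name):
    others = [s for n, s in scores.items() if n != name]
    below = sum(s < scores[name] for s in others)
    return 100 * below / len(others)

for name, raw in sorted(scores.items()):
    verdict = "pass" if raw >= CUTOFF else "fail"
    print(f"{name}: {raw} -> beats {percentile_rank(name):.0f}% of the group; "
          f"criterion-referenced: {verdict}")
```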

Objective testing and subjective testing

  • A test is objective if no judgment is required on the part of the scorer. A multiple choice test, with the correct responses unambiguously identified, would be a case in point (see the sketch below).
  • A test is subjective if judgment is called for. The less subjective the scoring, the greater the agreement there will be between two different scorers.
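
With an unambiguous answer key, scoring reduces to a mechanical comparison, so any two scorers (or a program) produce identical results. A minimal sketch with an invented key:

```python
# Objective scoring: with an unambiguous answer key, scoring is a
# mechanical comparison and any two scorers agree. Key is invented.

ANSWER_KEY = ["b", "d", "a", "c", "b"]

def score(responses):
    return sum(given == correct
               for given, correct in zip(responses, ANSWER_KEY))

print(score(["b", "d", "a", "a", "b"]))  # 4 out of 5
```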

Download Slide PowerPoint version: Kinds of Test and Testing

Note: This material is the result of the students' summarizing and paraphrasing of references, mainly Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press, the main book used for discussion in the English Learning Assessment class at the State Islamic Institute of Palangka Raya.
