The Iowa Tests
The Riverside Scoring Service is an ongoing project, so please check back with us to review new score reports and additional sections on report interpretation. Additional information regarding score report interpretation may be found in the Form A Interpretive Guide for Teachers and Counselors.
Three of the fundamental purposes for testing are (1) to identify students' areas of relative strength and weakness in subject areas, (2) to monitor year-to-year growth in the basic skills, and (3) to describe each student's developmental level within a test area. To accomplish any one of these purposes, it is important to select the type of score from among those reported that will permit the proper interpretation. Scores such as percentile ranks, grade equivalents, and standard scores differ from one another in the purposes they can serve, the precision with which they describe achievement, and the kind of information they provide. A closer look at these types of scores will help differentiate the functions they can serve and the meanings they can convey. Additional detail can be found in the Interpretive Guide for Teachers and Counselors.
Raw Score (RS)
The first unadjusted score obtained in scoring a test. A Raw Score is usually determined by tallying the number of questions answered correctly or by the sum or combination of the item scores (i.e., points). However, a raw score could also refer to any number directly obtained by the test administration (e.g., raw score derived by formula-scoring, amount of time required to perform a task, the number of errors, etc.). In individually administered tests, raw scores could also include points credited for items below the basal. Raw Scores typically have little meaning by themselves. Interpretation of Raw Scores requires additional information such as the number of items on the test, the difficulty of the test items, norm-referenced information (e.g., Percentile Ranks, Grade Equivalents, Stanines, etc.), and/or criterion-referenced information (e.g., cut-scores).
Percent Correct (PC)
The percentage of the total number of points that a student received on a test. The percent correct score is obtained by dividing the student's raw score by the total number of points possible and multiplying the result by 100. For multiple-choice tests, this is the same as dividing the student's raw score by the number of questions (i.e., each item is worth one point) and multiplying by 100. Percent Correct scores are typically used in criterion-referenced interpretations and are only helpful if the overall difficulty of the test is known.
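The computation described above can be sketched as follows (an illustrative helper, not part of any Riverside scoring software):

```python
def percent_correct(raw_score, points_possible):
    """Percent Correct: the raw score divided by the total points
    possible, multiplied by 100."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    return 100.0 * raw_score / points_possible

# A student answering 18 of 24 one-point multiple-choice items correctly:
print(percent_correct(18, 24))  # 75.0
```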
Grade Equivalent (GE)
A grade equivalent score represents the typical performance of students tested in a given month of the school year at a particular grade. For example, a grade equivalent of 5.3 represents the score achieved by the median student in fifth grade after three months of instruction.
Developmental Standard Score (SS)
Standard scores are continuous across all levels and forms of a specific test. Because they are built on equal-interval scales, the magnitude of a given difference between two scores represents the same amount of difference in performance wherever it occurs on the scale. For example, the difference between standard scores of 15 and 20 is the same as the difference between standard scores of 45 and 50.
Percentile Rank (PR)
The percentage of scores in a specified distribution that fall at or below the point of a given score. Percentile Ranks range in value from 1 to 99, and indicate the status or relative standing of an individual within a specified group (e.g., norms group), by indicating the percent of individuals in that group who obtained lower scores. For example, if a student earned a 72nd Percentile Rank in Language, this would mean he or she scored better than 72 percent of the students in a particular norm group who were administered that same test of Language. This also implies that only 28 percent (100 - 72) of the norm group scored the same or higher than this student. Note however, an individual's percentile rank can vary depending on which group is used to determine the ranking. A student is simultaneously a member of many groups: classroom, grade, building, school district, state, and nation. Test developers typically publish different sets of percentile ranks to permit schools to make the most relevant comparisons possible.
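A minimal sketch of this calculation is shown below. Publishers differ in how ties are handled (scores exactly at the given point); following the worked example in the text, this sketch counts only strictly lower scores and clamps the result to the 1-99 range used on score reports. The norm group here is hypothetical:

```python
def percentile_rank(score, norm_scores):
    """Percent of a norm group scoring below the given score, rounded
    and clamped to the 1-99 range used on score reports.
    (Tie-handling conventions vary among publishers; this sketch counts
    only strictly lower scores, matching the example in the text.)"""
    below = sum(1 for s in norm_scores if s < score)
    pr = round(100.0 * below / len(norm_scores))
    return max(1, min(99, pr))

toy_norm_group = list(range(1, 101))        # 100 hypothetical scores, 1..100
print(percentile_rank(73, toy_norm_group))  # 72: scored better than 72 of 100
```

Running the function against a different norm group (classroom, district, nation) will generally produce a different rank for the same score, which is the point made above.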
An achievement test is built to help determine how much skill or knowledge students have in a certain area. We use such tests to find out whether students know as much as we expect they should, or whether they know particular things we regard as important. By itself, the raw score from an achievement test does not indicate how much a student knows or how much skill she or he has. More information is needed to decide "how much." The test score must be compared or referenced to something in order to bring meaning to it. That "something" typically is (a) the scores other students have obtained on the test or (b) a series of detailed descriptions that tell what students at each score point know or which skills they have successfully demonstrated. These two ways of referencing a score to obtain meaning are commonly called norm-referenced and criterion-referenced score interpretations.
Norm-Referenced Interpretation
Standardized achievement batteries like the ITBS and ITED are designed mainly to provide for norm-referenced interpretations of the scores obtained from them. For this reason they are commonly called norm-referenced tests. However, the scores also permit criterion-referenced interpretations, as do the scores from most other tests. Thus, norm-referenced tests are devised to enhance norm-referenced interpretations, but they also permit criterion-referenced interpretation.
A norm-referenced interpretation involves comparing a student's score with the scores other students obtained on the same test. How much a student knows is determined by the student's standing or rank within the reference group. High standing is interpreted to mean the student knows a lot or is highly skilled, and low standing means the opposite. Obviously, the overall competence of the norm group affects the interpretation significantly. Ranking high in an unskilled group could represent lower absolute achievement than ranking low in an exceptional group.
Most of the scores on ITBS and ITED score reports are based on norm-referencing, i.e., comparing with a norm group. In the case of percentile ranks, stanines, and normal curve equivalents, the comparison is with a single group of students in a certain grade who tested at a certain time of year. These are called status scores because they show a student's position or rank within a specified group. However, in the case of grade equivalents and developmental standard scores, the comparison is with a series of reference groups. For example, the performances of students from third grade, fourth grade, fifth grade, and sixth grade are linked together to form a developmental continuum. (In reality, the scale is formed with grade groups from kindergarten up through the end of high school.) These are called developmental scores because they show the students' positions on a developmental scale. Thus, status scores depend on a single group for making comparisons and developmental scores depend on multiple groups that can be linked to form a growth scale.
An achievement battery like the ITBS or ITED is a collection of tests in several subject areas, all of which have been standardized with the same group of students. That is, the norms for all tests have been obtained from a single group of students at each grade level. This unique aspect of the achievement battery makes it possible to use the scores to determine skill areas of relative strength and weakness for individual students or class groups, and to estimate year-to-year growth. The use of a battery of tests having a common norm group enables educators to make statements such as "Suzette is better in mathematics than in reading" or "Danan has shown less growth in language skills than the typical student in his grade." If norms were not available, there would be no basis for statements like these.
Norms also allow students to be compared with other students and schools to be compared with other schools. If making these comparisons were the sole reason for using a standardized achievement battery, then the time, effort, and cost associated with testing would have to be questioned. However, such comparisons do give educators the opportunity to look at the achievement levels of students in relation to a nationally representative student group. Thus, teachers and administrators get an "external" look at the performance of their students, one that is independent of the school's own assessments of student learning. As long as our population continues to be highly mobile and students compete nationally rather than locally for educational and economic opportunities, student and school comparisons with a national norm group should be of interest to students, parents, and educators.
A common misunderstanding about the use of norms has to do with the effect of testing at different times of the year. For example, it is widely believed that students who are tested in the spring of fourth grade will score higher than those who are tested in the fall of fourth grade with the same test. In terms of grade-equivalent scores, this is true because students should have moved higher on the developmental continuum from fall to spring. But in terms of percentile ranks, this belief is false. If students have made typical progress from fall to spring of grade 4, their standing among fourth-grade students should be the same at both times of the year. (The student whose percentile rank in reading is 60 in the fall is likely to have the same percentile rank when given the same test in the spring.) The reason for this, of course, is that separate norms for fourth grade are available for the fall and the spring. Obviously, the percentile ranks would be as different as the grade equivalents if the norms for fourth grade were for the entire year, regardless of the time of testing. Those who believe students should be tested only in the spring because their scores will "look better" are misinformed about the nature of norms and their role in score interpretation.
Scores from a norm-referenced test do not tell what students know and what they do not know. They tell only how a given student's knowledge or skill compares with that of others in the norm group. Only after reviewing a detailed content outline of the test or inspecting the actual items is it possible to make interpretations about what a student knows. This caveat is not unique to norm-referenced interpretations, however. In order to use a test score to determine what a student knows, we must examine the test tasks presented to the student and then infer or generalize about what he or she knows.
Criterion-Referenced Interpretation
A criterion-referenced interpretation involves comparing a student's score with a subjective standard of performance rather than with the performance of a norm group. Deciding whether a student has mastered a skill or demonstrated minimum acceptable performance involves a criterion-referenced interpretation. Usually percent-correct scores are used and the teacher determines the score needed for mastery or for passing.
The user must establish some performance standards (criterion levels) against which comparisons can be made. For example, how many math estimation questions does a student need to answer correctly before we regard his/her performance as acceptable or "proficient?" This can be decided by examining the test questions on estimation and making a judgment about how many the minimally prepared student should be able to get right. The percent of estimation questions identified in this way becomes the criterion score to which each student's percent-correct score should be compared.
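The comparison described above reduces to checking a percent-correct score against a locally chosen cut score. A minimal sketch, in which the 70% criterion is a hypothetical value a teacher might set:

```python
def meets_criterion(num_correct, num_items, cut_percent):
    """Compare a student's percent-correct score against a locally set
    criterion (cut) score. The cut score is a judgment made by the
    teacher or school, not a value supplied by the test publisher."""
    return 100.0 * num_correct / num_items >= cut_percent

# Hypothetical: a teacher decides 70% is "proficient" on 10 estimation items.
print(meets_criterion(8, 10, 70))  # True: 80% meets the 70% criterion
print(meets_criterion(6, 10, 70))  # False: 60% falls short
```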
When making a criterion-referenced interpretation, it is critical that the content area covered by the test – the domain – be described in detail. It is also important that the test questions for that domain cover the important areas of the domain. In addition, there should be enough questions on the topic to provide the students ample opportunity to show what they know and to minimize the influence of errors in their scores.
The percent-correct score is the type used most widely for making criterion-referenced interpretations. Criterion scores that define various levels of performance on the tests are generally percent-correct scores arrived at through teacher analysis and judgment. Several score reports available through the Riverside Scoring Service include percent-correct skill scores that can be used to make criterion-referenced interpretations: Group Skills Analysis, Group Item Analysis, Individual Performance Profile, and Group Performance Profile.
Interpreting Scores from Special Test Administrations
When students have been tested with accommodations or modifications, should their answer documents be scored separately, or should they be included with those of other students? Should the scores of such students be included with the scores of all other students in group averages? Can the norms for the test be used? Should scores be interpreted differently? These are some of the many important questions that arise when testing accommodations/modifications have been used. Of course, school policy or state requirements may determine how each of these questions is answered in any given locale, but in the absence of such regulations, the rest of this section provides some ideas about how to resolve these issues.
To the extent that the accommodations used with a student were chosen carefully and judged to be necessary, the anticipated effect is to reduce the impact of that student's disability on the assessment process. That is, the student responses are like those we would expect the student to make if that student had no disability. Consequently, it seems reasonable to use that student's scores in the same ways we would use the scores of all other students. The student's answer document should be placed among the others for scoring, the student's scores should be included with all others in group averages, and the various derived scores (e.g., grade equivalents and percentile ranks) should be interpreted as though the student had been tested without any accommodations.
Total and Composite Scores Reported
The lists below identify the Total and Composite scores that can be obtained with each test level from the Complete, Core, and Survey Batteries. This information is needed for making decisions about which tests must be given, at a minimum, to ensure that all the scores needed by your school will be obtained. Of course, this information also is helpful in interpreting scores that appear on various score reports.
All Total and Composite scores are obtained by averaging the developmental standard scores from certain component tests. (An exception is the Reading Total score from Level 6. It is formed by adding the raw scores from the Words and Comprehension portions of the Reading test.) The average standard score can be converted to a percentile rank, grade equivalent, or other type of score for interpretation purposes. Thus, a report might show a national percentile rank for an average standard score, but it will never show an average national percentile rank.
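The averaging step can be sketched as follows; the component standard scores are hypothetical values, and the subsequent conversion to a percentile rank or grade equivalent requires the publisher's norms tables, which are not shown:

```python
def total_standard_score(component_scores):
    """Average the developmental standard scores of the component tests.
    Note: it is this average that gets converted to a percentile rank or
    grade equivalent; percentile ranks themselves are never averaged."""
    return sum(component_scores) / len(component_scores)

# Hypothetical Vocabulary and Reading Comprehension standard scores:
print(total_standard_score([210, 218]))  # 214.0
```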
Scores | |
Level 5 Complete Battery | |
Vocabulary | V |
Word Analysis | WA |
Listening | Li |
Language | L |
Mathematics | M |
Core Total | CT = (V + L + M) / 3 |
Reading Profile Total | RPT = (V + WA + Li) / 3 or (V + WA + Li + RW) / 4 |
Level 6 Complete Battery | |
Vocabulary | V |
Word Analysis | WA |
Listening | Li |
Language | L |
Mathematics | M |
Core Total | CT = (V + L + M) / 3 |
Reading Words | RW |
Reading Comprehension | RC |
Reading Total | RT = RW + RC (sum of raw scores) |
Reading Profile Total | RPT = (V + WA + Li + RW + RC) / 5 |
Levels 7 and 8 Complete and Core Battery | |
Vocabulary | V |
Word Analysis | WA |
Reading Comprehension | RC |
Reading Total | RT = (V + RC) / 2 |
Listening | Li |
Language | L |
Spelling | L1 |
Math Concepts | M1 |
Math Problems | M2 |
Math Computation | M3 |
Math Total* | MT^{+} = (M1 + M2 + M3) / 3 MT^{-} = (M1 + M2) / 2 |
Core Total* | CT^{+} = (RT + L + MT^{+}) / 3 CT^{-} = (RT + L + MT^{-}) / 3 |
Social Studies | SS |
Science | SC |
Sources of Information | SI |
Composite* | CC^{+} = (RT + WA + Li + L + MT^{+} + SS + SC + SI) / 8 CC^{-} = (RT + WA + Li + L + MT^{-} + SS + SC + SI) / 8 |
Reading Profile Total | RPT = (V + RC + WA + Li + L1) / 5 |
Levels 7 and 8 Survey Battery | |
Reading Total | RT |
Language Total | LT |
Math Total* | MT^{+} = MT with Computation MT^{-} = MT without Computation |
Survey Total* | CT^{+} = (RT + LT + MT^{+}) / 3 CT^{-} = (RT + LT + MT^{-}) / 3 |
*In the lists above, the Math Total score with the Math Computation score included in it (MT^{+}) can be replaced, at the user's option, with the Math Total score without the Math Computation score included in it (MT^{-}). This decision needs to be made when scoring services are ordered. If the replacement is made, the Core Total and the Composite scores, which include the Math Total score, are also affected. The asterisks (*) show the affected scores.
Level 9 Complete and Core Battery | |
Vocabulary | V |
Reading Comprehension | RC |
Reading Total | RT = (V + RC) / 2 |
Spelling | L1 |
Capitalization | L2 |
Punctuation | L3 |
Usage and Expression | L4 |
Language Total | LT = (L1 + L2 + L3 + L4) / 4 |
Math Concepts and Estimation | M1 |
Math Problem Solving | M2 |
Math Computation | M3 |
Math Total* | MT^{+} = (M1 + M2 + M3) / 3 MT^{-} = (M1 + M2) / 2 |
Word Analysis | WA |
Listening | Li |
Core Total* | CT^{+} = (RT + LT + MT^{+}) / 3 CT^{-} = (RT + LT + MT^{-}) / 3 |
Social Studies | SS |
Science | SC |
Maps and Diagrams | S1 |
Reference Materials | S2 |
Sources of Information Total | ST = (S1 + S2) / 2 |
Composite* | CC^{+} = (RT + LT + MT^{+} + SS + SC + ST) / 6 CC^{-} = (RT + LT + MT^{-} + SS + SC + ST) / 6 |
Reading Profile Total | RPT = (V + RC + WA + Li + L1) / 5 |
Levels 10-14 Complete and Core Battery | |
Vocabulary | V |
Reading Comprehension | RC |
Reading Total | RT = (V + RC) / 2 |
Spelling | L1 |
Capitalization | L2 |
Punctuation | L3 |
Usage and Expression | L4 |
Language Total | LT = (L1 + L2 + L3 + L4) / 4 |
Math Concepts and Estimation | M1 |
Math Problem Solving | M2 |
Math Computation | M3 |
Math Total* | MT^{+} = (M1 + M2 + M3) / 3 MT^{-} = (M1 + M2) / 2 |
Core Total* | CT^{+} = (RT + LT + MT^{+}) / 3 CT^{-} = (RT + LT + MT^{-}) / 3 |
Social Studies | SS |
Science | SC |
Maps and Diagrams | S1 |
Reference Materials | S2 |
Sources of Information Total | ST = (S1 + S2) / 2 |
Composite* | CC^{+} = (RT + LT + MT^{+} + SS + SC + ST) / 6 CC^{-} = (RT + LT + MT^{-} + SS + SC + ST) / 6 |
Levels 9-14 Survey Battery | |
Reading Total | RT |
Language Total | LT |
Math Total* | MT^{+} = MT with Computation MT^{-} = MT without Computation |
Survey Total* | CT^{+} = (RT + LT + MT^{+}) / 3 CT^{-} = (RT + LT + MT^{-}) / 3 |
*In the lists above, the Math Total score with the Math Computation score included in it (MT^{+}) can be replaced, at the user's option, with the Math Total score without the Math Computation score included in it (MT^{-}). This decision needs to be made when scoring services are ordered. If the replacement is made, the Core Total and the Composite scores, which include the Math Total score, are also affected. | |
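How the Computation option ripples through the Levels 9-14 scores can be sketched as follows, using the formulas from the lists above. All inputs are developmental standard scores, and the values in the example are hypothetical:

```python
def levels_9_14_totals(RT, LT, M1, M2, M3, SS, SC, ST,
                       include_computation=True):
    """Math Total, Core Total, and Composite for Levels 9-14, following
    the formulas in the lists above. Swapping MT+ for MT- changes every
    score that includes the Math Total."""
    MT = (M1 + M2 + M3) / 3 if include_computation else (M1 + M2) / 2
    CT = (RT + LT + MT) / 3
    CC = (RT + LT + MT + SS + SC + ST) / 6
    return MT, CT, CC

print(levels_9_14_totals(200, 210, 190, 200, 180, 205, 195, 200))
# (190.0, 200.0, 200.0) with Computation included
print(levels_9_14_totals(200, 210, 190, 200, 180, 205, 195, 200,
                         include_computation=False)[0])  # 195.0
```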
Levels 15-17/18 Complete and Core Battery | |
Scores on these tests are combined to form what are called Total or Composite scores. Six such scores are available for the ITED, as defined below:
1. The Reading Total (RT) is the average of the Reading Comprehension (R) standard score and the Vocabulary (V) standard score: RT = (R + V) / 2
2. The Mathematics Total including Computation (MT) is a function of the Mathematics: Concepts and Problem Solving (M) standard score and the Computation (Comp) standard score: MT = (M + Comp) / 2
3. The Core Total without Computation (CT^{-}) is the average of the RT, Language (L), and Mathematics (M) standard scores: CT^{-} = (RT + L + M) / 3
4. The Core Total with Computation (CT^{+}) is defined as: CT^{+} = (RT + L + MT) / 3
5. The Complete Composite without Computation (CC^{-}) is the average of the RT, L, M, Social Studies (SS), Science (SC), and Sources of Information (SI) standard scores: CC^{-} = (RT + L + M + SS + SC + SI) / 6
6. The Complete Composite with Computation (CC^{+}) is defined as: CC^{+} = (RT + L + MT + SS + SC + SI) / 6