Chapter 3 Pondertime (p. 80, #3, #5)
3. What kinds of educational assessment procedures do you think should definitely require assembly of reliability evidence? Why?

Popham (2011) describes three types of reliability evidence: stability, alternate-form, and internal consistency (pp. 62 ff.). If we suppose that some assessment procedures are high stakes, then it seems logical to demand that those procedures be reliable. In other words, because we want to have confidence in high-stakes decisions, the instruments we use should not vary in test-retest situations, should measure accurately regardless of their particular form, and should not give mixed results at different points in the procedure. So what are high-stakes tests? If I may be so bold (to co-opt some of Popham's informal style), any test that groups a student, tracks a student, or determines some significant future course of action could be deemed a high-stakes test. Conversely, am I saying that low-stakes assessments need not require any assembly of reliability evidence? Yes, low-stakes assessments do not require reliability analyses, and that agrees with Popham's recommendation (p. 75): "In general, if you construct your own classroom tests with care, those tests will be sufficiently reliable for the decisions you will base on the tests' results."

References

Popham, W. J. (2011). Classroom Assessment: What Teachers Need to Know (6th ed.). Boston, MA: Pearson Education, Inc.
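As an aside on what assembling such evidence can look like in practice, here is a minimal sketch, with invented scores, of how a test-retest (stability) coefficient and Cronbach's alpha (one common internal-consistency index) are often computed. The data and the choice of Cronbach's alpha are my own illustration, not something Popham prescribes.

```python
import numpy as np

# Stability (test-retest): the same five students take the same test twice.
first_sitting = np.array([14, 18, 11, 20, 16])
second_sitting = np.array([15, 17, 12, 19, 17])
stability_r = np.corrcoef(first_sitting, second_sitting)[0, 1]

# Internal consistency (Cronbach's alpha): five students by four items,
# each item scored 0-5. All numbers are made up for illustration.
items = np.array([
    [4, 3, 5, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
])
k = items.shape[1]                                 # number of items
item_variances = items.var(axis=0, ddof=1)         # variance of each item across students
total_variance = items.sum(axis=1).var(ddof=1)     # variance of students' total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"test-retest r = {stability_r:.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```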
5. What is your reaction to classification consistency as an approach to the determination of reliability?
I would react with shock and surprise had Popham (2011) not already warned us that "even those educators who know … sometimes mush the three brands of reliability together" (p. 73). In checking some other articles, I found an argument going back and forth in Educational Research in 2009 and 2010 on this very topic. Newton (2009) claimed that, based on "internal consistency … a substantial percentage of students would receive different levels [scores] were the testing process to be replicated." Bramley (2010) then wrote back to "show that it is not possible to calculate classification accuracy from classification consistency."
The argument Bramley (2010) uses to refute Newton essentially reminds us that classification accuracy depends on uncertainties of measurement in the tested population, which can vary widely even when classifications are consistent from one replication to the next. Interestingly, once you admit that the two quantities are distinct, it is fair to ask how much they differ in practice. Bramley's (2010) last sentence reads, "The author's experience with both simulated and real data suggests that values for classification accuracy and consistency are often quite close – within about 5 percentage points." Talk about a storm in a teacup!

References

Bramley, T. (2010). A response to an article published in Educational Research's Special Issue on Assessment (June 2009): What can be inferred about classification accuracy from classification consistency? Educational Research, 52(3), 325-330. SPU EBSCO url.

Newton, P. E. (2009). The reliability of results from national curriculum testing in England. Educational Research, 51(2), 181-212. SPU EBSCO url.

Popham, W. J. (2011). Classroom Assessment: What Teachers Need to Know (6th ed.). Boston, MA: Pearson Education, Inc.
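To convince myself of the distinction, here is a rough simulation sketch (my own construction, not Bramley's actual method) in which students are classified pass/fail against a cut score on two hypothetical testing occasions. Consistency asks whether the two occasions agree with each other; accuracy asks whether an occasion agrees with the classification the student's true score would give. The mean, spread, cut score, and standard error of measurement are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 100_000
cut_score = 60.0
sem = 5.0  # assumed standard error of measurement

true_scores = rng.normal(65, 10, n_students)            # hypothetical "true" scores
occasion_1 = true_scores + rng.normal(0, sem, n_students)  # first testing occasion
occasion_2 = true_scores + rng.normal(0, sem, n_students)  # hypothetical replication

true_pass = true_scores >= cut_score
pass_1 = occasion_1 >= cut_score
pass_2 = occasion_2 >= cut_score

consistency = np.mean(pass_1 == pass_2)   # same classification on both occasions
accuracy = np.mean(pass_1 == true_pass)   # observed classification matches the true one

print(f"classification consistency = {consistency:.3f}")
print(f"classification accuracy    = {accuracy:.3f}")
```

Running something like this tends to give two numbers that are close but not equal, which is at least consistent in spirit with Bramley's "within about 5 percentage points" remark.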
Chapter 4 Pondertime (p. 109, #2, #5)
I start with a quote that spoke to me recently from Mighton (2011), who believes "that some educators have been so seduced by the language they use that they can't clearly see the issues anymore." With that as a caveat of sorts, on with my answer. I'm sympathetic to the argument Popham (2011) makes on p. 102 that construct-related validity evidence is the more powerful concept, since it can adequately express the meaning of both content-related and criterion-related validity evidence. The power of construct-related validity evidence lies in the use of empirical evidence and the definition of the construct. Thus, criterion-related evidence of validity can be thought of as construct-related since it relies on a predictor construct, while content-related evidence can be thought of as construct-related since it uses empirical evidence (say, judgments from specialists) to define an unobservable construct (content that is useful) and to show that it has been suitably measured.
References

Mighton, J. (2011). The End of Ignorance: Multiplying Our Human Potential. Vintage Canada.

Popham, W. J. (2011). Classroom Assessment: What Teachers Need to Know (6th ed.). Boston, MA: Pearson Education, Inc.
5. What kind(s) of validity evidence do you think classroom teachers need to assemble regarding their classroom assessment devices?
Content-related evidence would show that assessments have good representativeness relative to curricular aims, and assuming the curricular aims were aligned with state/school standards, teachers could be relatively sure that their inferences about each student's success in the next unit or level (i.e., their grade) were valid.
I don't see much value in criterion-related validity evidence for a classroom teacher, since it is mostly predictive. That is to say, in the day-to-day operation of the classroom, I doubt that time should be spent predicting student performance on a criterion that will be evaluated, at the earliest, in the next grade level. However, the statistician in me would love to see the correlations that could be built between a student's performance on a summative assessment in grade X, Unit Y, and that student's performance upon reaching grade X+1, Unit Z. The amount of data that would need to be collected, crunched, and compared would probably make for pretty expensive corroboration of the time-honored truths that "if you don't do well at arithmetic, you won't get algebra" and "if you don't get algebra, you probably won't get geometry or trigonometry or statistics, and don't even think about calculus." Which is sad, because those subjects are all pretty different from each other, and learners can be quite diverse in their abilities or interests in ways a test in arithmetic (basic math) can't necessarily predict.* (A rough sketch of what such a correlation check might look like follows below.)

As for construct-related validity evidence, while it may be powerful as a concept, it is probably too much overhead for a classroom teacher to worry about in the day-to-day functioning of the classroom. However, for as long as the debate over standardized tests is raging, I think a classroom teacher needs to be cognizant of types of validity evidence and what assumptions are being made by theorists and administrators that impact the functioning and procedures of the day-to-day classrooms.

* I make this point based on some accounts in Mighton (2011), where he describes students he tutored in middle school who later went on to higher degrees in mathematics.

Mighton, J. (2011). The End of Ignorance: Multiplying Our Human Potential. Vintage Canada.
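For what it's worth, the correlation itself is cheap to compute once matched score pairs exist; the expensive part is collecting and matching the records across years. A hypothetical sketch, using invented scores for ten students, might look like this:

```python
import numpy as np

# Invented, matched score pairs: each student's grade-X, Unit-Y summative
# score alongside that same student's later grade-(X+1), Unit-Z score.
# A real analysis would pull matched records from a gradebook or SIS.
grade_x_unit_y = np.array([72, 85, 90, 65, 78, 88, 55, 95, 70, 82])
grade_x1_unit_z = np.array([68, 80, 92, 60, 75, 85, 58, 90, 74, 79])

r = np.corrcoef(grade_x_unit_y, grade_x1_unit_z)[0, 1]
print(f"predictive correlation r = {r:.2f}")
print(f"share of variance 'explained' (r squared) = {r**2:.2f}")
```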
As I was reading this part of the chapter, I created a diagram to help me keep track of the difference between alignment and representativeness.
Comments
Hi John,
When I was reading the chapters, I was struggling to understand how they would relate to everyday classrooms. I think your comment “for as long as the debate over standardized tests is raging, I think a classroom teacher needs to be cognizant of types of validity evidence and what assumptions are being made by theorists and administrators that impact the functioning and procedures of the day-to-day classrooms” says it all.
Your references in the question on classification consistency are interesting. I'm still trying to make sure I understand them. So is Bramley saying that even if a test has classification consistency, you cannot necessarily infer that students are being classified into the "correct" groups, because consistency is about reliability and not accuracy? It only means that similar numbers of students are grouped the same way each time the test is taken? So it's not about how valid the classifications are for each student but how consistent the test is overall? But then he points out that they do tend to line up quite well anyway. Let me know if I'm off here; I'll move forward on the assumption that I understood your quotes.
In the end, what do you think? Is classification consistency a good way for educators or researchers to assess reliability? Do you think we can assume reliability when particular students may change groups but the total in each classification is similar?
It would be nice to know that my tests have criterion-related validity, meaning that the test scores are good predictors of students' success in their year-two or college-level course. It would be a tragedy for them to do well on my exams but not be prepared for the next level of their education. That being said, I would not want to take on the evaluation necessary to make that determination.