As reported on Teach the Vote last week, the Commission on Next Generation Assessments and Accountability has released its final recommendations and considerations for further study. Below are ATPE Lobbyist Monty Exter’s comments on the commission’s recommendations.
Click on each recommendation below to view ATPE’s analysis.
Final Recommendations of the Texas Commission on Next Generation Assessments and Accountability (TCONGAA):
- Implement an Individualized, Integrated System of Multiple Assessments Using Computerized-Adaptive Testing and Instruction.
- Allow the Commissioner of Education to Approve Locally Developed Writing Assessments.
- Support the Continued Streamlining of the Texas Essential Knowledge and Skills (TEKS).
- Limit State Testing to the Readiness Standards.
- Add College-Readiness Assessments to the Domain IV (Postsecondary Readiness) Indicators and Fund, with State Resources, a Broader Administration of College-Readiness Assessments.
- Align the State Accountability System with ESSA Requirements.
- Eliminate Domain IV (Postsecondary Readiness) from State Accountability Calculations for Elementary Schools.
- Place Greater Emphasis on Growth in Domains I–III in the State Accountability System.
- Retain the Individual Graduation Committee (IGC) Option for Graduation as Allowed by TEC, §28.0258.
1. Implement an Individualized, Integrated System of Multiple Assessments Using Computerized-Adaptive Testing and Instruction.
Recommendation one, potentially the most sweeping of the commission’s recommendations, breaks down into two main parts: computer adaptive testing (CAT) and a multiple assessment framework. Each part has pros and cons.
As its name suggests, CAT refers to an assessment administered via computer that varies (adapts) as the test taker answers questions. The more questions the test taker gets right, the harder subsequent questions become; the fewer questions the test taker gets right, the easier subsequent questions become. Unlike a grade-level-specific proficiency test (such as STAAR) that simply indicates whether or not a student meets grade-level proficiency, CAT should give a much clearer picture of where along a broader spectrum a student is performing. These tests are, at least in theory, better for measuring growth and for more precisely identifying the proficiency levels of high-achieving and low-achieving students. Those attributes likely make CAT a superior type of assessment compared to the current STAAR tests, particularly since our accountability system values student growth data. The biggest hurdle (I won’t even call it a con) to moving from STAAR to a CAT is technology infrastructure. Under current test security protocols, which require essentially all students statewide to take a particular test simultaneously, every student would need access to a screen, a keyboard, and either the web or a state-level (maybe district-level) network in order to take the assessment.
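The adaptive logic described above can be sketched in a few lines. This is purely illustrative, not any testing vendor’s actual algorithm; real CAT systems select items using statistical models (such as item response theory), but the basic up-after-correct, down-after-incorrect idea looks like this:

```python
# Illustrative sketch of computer-adaptive difficulty selection.
# Difficulty moves up after a correct answer and down after an
# incorrect one, within a fixed range. (Hypothetical simplification,
# not an actual CAT vendor's algorithm.)

def next_difficulty(current, answered_correctly, floor=1, ceiling=10):
    """Return the difficulty level for the next question."""
    if answered_correctly:
        return min(current + 1, ceiling)
    return max(current - 1, floor)

# A student who answers right, right, wrong climbs and then steps back:
level = 5
for correct in (True, True, False):
    level = next_difficulty(level, correct)
print(level)  # 6
```

Because each student’s path through the question pool differs, the test can pinpoint where on the difficulty spectrum a student lands rather than reporting only pass/fail at one grade level.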
While the state could lessen the hardware requirement by modifying test security protocols so that kids could take the test in batches (a potential scheduling nightmare that could seriously waste instructional time if not implemented exceedingly well), the networking capacity is a fairly immutable requirement. Thankfully, the federal government has solved this problem for us, if the state acts quickly. For a short window of time, the federal E-rate program is offering a 9:1 match on state dollars to increase physical network capacity in exactly the way we need. So if the state puts in $25 million, the feds will put in $225 million, and we can dramatically increase broadband capacity for underserved communities. This would be a big plus whether or not we move to CATs.
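The arithmetic behind the 9:1 match is simple enough to verify directly (the dollar figures are the ones cited above; the only input is the match ratio):

```python
# The 9:1 E-rate match described above: each state dollar draws
# nine federal dollars. Figures are those cited in the article.

def erate_match(state_dollars, match_ratio=9):
    """Return (federal_dollars, total_dollars) for a given state outlay."""
    federal = state_dollars * match_ratio
    return federal, state_dollars + federal

federal, total = erate_match(25_000_000)
print(federal)  # 225000000
print(total)    # 250000000
```

In other words, a $25 million state appropriation would leverage a quarter-billion dollars of total network investment.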
That all sounds great, but CAT is not a panacea. Using CAT does not ensure that the test questions will be written in a fair, direct, or developmentally appropriate way, a major criticism of the current test. CAT, in and of itself, also doesn’t solve the issue of overreliance on a single measurement. Currently the state almost exclusively relies on standardized testing data as a proxy for overall student performance, but such data at best provides a fairly narrow window on student knowledge or ability.
This leads to the second half of the recommendation where the commission recommends using multiple smaller assessments administered closer to the time concepts are actually taught. Those tests would be rolled up into a single summative score as opposed to the one-test-on-one-day model that STAAR currently utilizes. There are definitely potential pros to going with this recommendation, primarily that it would likely decrease the perceived impact of the test and therefore student stress levels. By breaking the assessment process into smaller, more manageable tests and spreading them over the full course of the school year, state standardized assessments could become more normalized, instead of a once-a-year event that students have to psych themselves up for (or psych themselves out about). Smaller tests could also lead to less test fatigue, giving students a better chance at doing their best work on the whole test.
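One hypothetical way to roll several smaller assessments up into a single summative score is a straight average. The commission specifies no formula, so the equal weighting here is purely an assumption for illustration:

```python
# Hypothetical roll-up of several interim assessment scores into one
# summative score, as the recommendation envisions. Equal weighting
# is an assumption; the commission specifies no formula.

def summative_score(interim_scores):
    """Average a list of interim assessment scores (0-100 scale)."""
    if not interim_scores:
        raise ValueError("at least one interim score is required")
    return sum(interim_scores) / len(interim_scores)

print(summative_score([72, 80, 88, 84]))  # 81.0
```

A real system would likely weight assessments by the scope of material covered, but the key point is that no single testing day determines the final score.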
If, while reading the last paragraph, you were thinking to yourself that this sounds a lot like the pop quizzes and smaller unit tests that teachers have been using to assess student comprehension since before your grandparents were born (i.e., formative testing), you would be right. The fact that teachers should be, and almost universally are, using formative testing as an important part of their instructional practice did not escape the proponents of this recommendation. They feel that this recommendation would reduce the overall amount of testing because it could replace a summative test (STAAR) with a summative result built from assessments already happening in most classrooms.
There are, unfortunately, three main concerns with this line of thought. To start, is formative assessment data appropriate for deriving summative conclusions? There is at least some indication that the answer is no. The invited expert who testified before the commission cautioned against using formative assessments to draw summative conclusions. Formative assessments, by their nature, are designed to determine where a student is on a particular skill while still inside the learning process; the expectation is that the educator will use the information gleaned from the assessment to continue refining the student’s understanding of the skills tested. This is different from a summative assessment, where a student is supposed to demonstrate mastery of an already learned skill. This distinction creates potential problems with using formative data for accountability purposes, as the student would not be expected to have attained mastery yet.
Assuming you get past this first concern, you then have to ask whether it is really a benefit to subject these inherently individualized assessments to a state-level standardization process. One of the primary complaints about the current system is the separation of teachers and teaching from the development and administration of the test. Developing assessments that flow naturally from the curriculum being taught, and that are contextually and developmentally appropriate for the class of students being assessed, is a skill that must be honed, and one the state should be helping teachers develop. When you disassociate teaching and testing by removing the teacher from the assessment process, the result is negative from both an assessment and a curriculum standpoint. From an assessment standpoint, a standardized test is by its nature less contextually and developmentally individualized to the students actually taking it. If the goal is to give educators the most accurate information about their students’ acquisition of the skills being taught, rather than a measure of their students’ ability to cope with a standardized testing instrument relative to all other students in the state, teacher-developed assessments are preferable. Assessments that measure a standard set of state-determined skills while remaining individualized to the group of students being taught give students the best opportunity to demonstrate mastery of those skills; standardized tests, by contrast, measure a student’s ability to cope with a testing instrument in addition to mastery of the skills themselves.
Finally, even if you overlook these concerns, the commission did not call for the replacement of STAAR with this system. While the initial recommendation called for replacing STAAR with a new multiple-assessment CAT framework, that language was ultimately removed. The recommendation now reads more like an addition to STAAR than a replacement of it. It’s questionable whether we need a massive new statewide standardized assessment system to replace our existing massive statewide standardized assessment system. It is certain that we do not need a new system in addition to the current one.
2. Allow the Commissioner of Education to Approve Locally Developed Writing Assessments.
Commission recommendation number two is a solid recommendation that addresses the substantial and specific problem of replacing the current, extremely poorly designed STAAR writing test. The recommendation, which was primarily driven by Chairman Aycock, recognizes that writing is a process-driven, highly individual skill that is not well assessed through an overly standardized process. The evaluation of writing ability should not rely on a testing scheme that produces a work product absent any meaningful review and revision by the student, nor should a student’s writing be graded by non-experts working from a rubric.
The recommendation will allow districts to develop a process that takes a more comprehensive and holistic look at students’ writing skills. Districts can rely on the professional judgment of credentialed educators to assess students’ work, rather than on low-paid graders with uncertain qualifications who use a rubric to judge work produced under an inherently flawed process.
3. Support the Continued Streamlining of the Texas Essential Knowledge and Skills (TEKS).
Allowing the continued streamlining of the TEKS is a fine recommendation, but with no suggestion as to how the SBOE might improve the TEKS writing process, it is also a shallow and somewhat meaningless recommendation. Recommendations on how the SBOE can write deeper, but more manageable TEKS or how the legislature might support the SBOE in this process (this recommendation includes no recommended statutory changes) would be much more beneficial.
There are inherent tensions that need to be addressed in the TEKS writing process, including:

- the relationship between the length of the standards and the length of the test;
- the creation of standards that allow depth and creativity for both students and educators while giving new and inexperienced educators the support they need to teach subject matter they may not yet be fully versed in; and
- the tension between developing standards that are better for testing (discrete and many) versus better for teaching (topical and naturally fewer).

It would have been preferable if the recommendation had called for the SBOE to modify its TEKS writing process, either internally or with legislative direction, to create two documents: one that identifies only the broader areas of essential knowledge, which would be designated the State Standards, and a separate supporting document with detailed examples of how a practitioner might structure curriculum to cover those broad areas.
4. Limit State Testing to the Readiness Standards.
While this recommendation seems appealing and simple, it is in fact a poor substitute for actually developing a better TEKS drafting process like the one described above. It also happens to fall outside what is allowable under federal law, which requires that a state assess all of its standards in the subjects required to be tested.
5. Add College-Readiness Assessments to the Domain IV (Postsecondary Readiness) Indicators and Fund, with State Resources, a Broader Administration of College-Readiness Assessments.
Increasing opportunities for students to be exposed to the possibility of pursuing post-secondary education is always a worthwhile goal. The state should fund these test administrations at a greater rate than it already is, and districts should absolutely be rewarded for encouraging as many students as possible to take college entrance exams and for assisting students to perform well on them.
On a cautionary note, legislators should be mindful not to design a system that encourages districts to discourage lower-performing students from taking these exams. A system that punishes districts for lower SAT/ACT scores, or that does not reward districts for increased participation rates, would undermine the goal of increasing student access to postsecondary opportunities.
6. Align the State Accountability System with ESSA Requirements.
This recommendation calls for a response of: well, duh. One would certainly hope that the state would align its system with federal requirements, particularly when the new federal law was written with state-driven accountability in mind.
7. Eliminate Domain IV (Postsecondary Readiness) from State Accountability Calculations for Elementary Schools.
Whether or not attendance rates should be removed from the accountability system simply because nearly all schools are doing an exceedingly good job on the metric is debatable. However, to say that the state can find no “indicators of student achievement [that lead to post-secondary readiness and are] not associated with performance on standardized assessments,” is a shocking statement.
Clearly the state is willing to recognize that being present in class is an indicator of student achievement that leads to post-secondary readiness. (It is simply loath to give credit to everyone for accomplishing that measure, as universal success doesn’t produce differentiation, i.e., doesn’t create much of a bell curve.) However, despite study after study pointing to the fact that the quality (relevant experience plus innate ability) of a student’s teachers is the single best indicator of that student’s long-term academic success, the state/legislature/commissioner/this commission still refuses to hold districts accountable for the equitable distribution of quality educators. While such a measure may not be an appropriate metric for campus-level accountability ratings, it is certainly applicable and appropriate at both the elementary and secondary levels as part of the district-wide accountability rating.
8. Place Greater Emphasis on Growth in Domains I–III in the State Accountability System.
In the abstract, emphasizing student growth as a preeminent goal of the education system sounds great, and it is. In reality, however, it is the nature of the system that high-performing students are hard pressed to show dramatic growth on a grade-level-specific minimum-standards test. It is equally true that low-performing students, while they may or may not show dramatic growth, are less likely to show proficiency on the same test. One has to assume that is why the current accountability system allows campuses or districts to pass either domain one or two (proficiency or growth), in addition to both domains three and four, to be considered to have met standard. Assuming the commissioner adopts this recommendation, it will be interesting to see how he both meaningfully increases the emphasis on growth and maintains the delicate and appropriate balance currently being achieved.
On a side note, it is important to continue to recognize that our testing system is not optimized/designed to measure student growth and that using it to do so is somewhat dubious at best.
9. Retain the Individual Graduation Committee (IGC) Option for Graduation as Allowed by TEC, §28.0258.
While it should be recognized that these committees are not a substitute for fixing what is broken in the current assessment and accountability system, they should certainly continue to exist, at least while the system continues to be broken and perhaps indefinitely as a valuable student safety net.