Thirteen Strategies to Measure College Teaching

A Consumer’s Guide to Rating Scale Construction, Assessment, and Decision-Making for Faculty, Administrators, and Clinicians

May 2006
  • Publisher
    Stylus Publishing
  • Published
    5th May 2006
  • ISBN 9781579221935
  • Language English
  • Pages 288 pp.
  • Size 6" x 9"
Lib E-Book

Library E-Books

We are signed up with aggregators who resell networkable e-book editions of our titles to academic libraries. These editions, priced at par with simultaneous hardcover editions of our titles, are not available direct from Stylus.

These aggregators offer a variety of plans to libraries, such as simultaneous access by multiple library patrons, and access to portions of titles at a fraction of list price under what is commonly referred to as a "patron-driven demand" model.

December 2011
  • Publisher
    Stylus Publishing
  • Published
    30th December 2011
  • ISBN 9781620360507
  • Language English
  • Pages 288 pp.
  • Size 6" x 9"

* Student evaluations of college teachers: perhaps the most contentious issue on campus
* This book offers a more balanced approach
* Evaluation affects pay, promotion, and tenure, so it is of intense interest to all faculty
* Major academic marketing and publicity
* Combines original research with Berk’s signature wacky humor

To many college professors, the words "student evaluations" trigger mental images of the shower scene from Psycho, with those bloodcurdling screams. They’re thinking: "Why not just whack me now, rather than wait to see those ratings again?"

This book takes off from the premise that student ratings are a necessary but not sufficient source of evidence for measuring teaching effectiveness. It is a fun-filled--but solidly evidence-based--romp through more than a dozen other methods that include measurement by self, peers, outside experts, alumni, administrators, employers, and even aliens.

As the major stakeholders in this process, both faculty AND administrators, plus clinicians who teach in schools of medicine, nursing, and the allied health fields, need to be involved in writing, adapting, evaluating, or buying items to create the various scales to measure teaching performance. This is the first basic introduction in the faculty evaluation literature to take you step-by-step through the process to develop these tools, interpret their scores, and make decisions about teaching improvement, annual contract renewal/dismissal, merit pay, promotion, and tenure. It explains how to create appropriate, high-quality items and detect those that can introduce bias and unfairness into the results.

Ron Berk also stresses the need for “triangulation”--the use of multiple, complementary methods--to provide the properly balanced, comprehensive and fair assessment of teaching that is the benchmark of employment decision making.

This is a must-read to empower faculty, administrators, and clinicians to use appropriate evidence to make decisions accurately, reliably, and fairly. Don’t trample each other in your stampede to snag a copy of this book!

"The humor is delightful and the information critical to understanding the process of evaluating assessment instruments. The University of North Texas has formed a committee to examine student evaluation forms as a first step to help measure overall teacher effectiveness. I have recommended that the other members purchase a copy."

Paula Iaeger, GSA in the Office of the Provost and VP for Academic Affairs

“The evaluation of teaching is something that is done virtually wherever teaching itself is done. At too many places, though, it is done in a shallow, haphazard fashion.

Ron Berk’s book aims at evangelizing the rest of academia with the good news of how to do it right. This is ground that other well-respected academics have covered, but perhaps none aimed quite as much at the average faculty member. Berk does an excellent job at directing the reader to the relevant work that has been done in the field. The book is laid out in a logical fashion, with an introduction that describes the motivation for the book, followed by a chapter summarizing the thirteen strategies (i.e. sources of evidence used for evaluating teaching). This chapter gives an excellent overview of what Berk calls 360° Multisource Assessment, which is another way of saying that you should take many sources of evidence into consideration when assessing college teaching. This chapter is a good overview of building a teaching evaluation system, and can be read as a stand-alone topic. The book’s subtitle is actually a much better description of the main point of the book. For those faculty members or administrators who have been tasked with the development or overhaul of such a system at their college, the overview chapter may be the only one that is needed. For those who must develop student evaluation forms and other ratings instruments, the rest of the book contains invaluable information. Berk provides a step-by-step procedure for determining how the rating scales should be constructed, what questions (items) should be asked, and what type of anchors (response choices) is appropriate. He provides examples of rating scales and items, both good and bad. Very importantly, he also provides clear instructions on how to field test the rating scale and how to determine its validity and reliability.”

International Journal for the Scholarship of Teaching and Learning

“Berk’s list of strategies is (as advertised) one of the most complete discussions of these issues. This book can be used by both novices and experienced practitioners as a guide to better practice. That’s why it is worth reading.”

from the Foreword by Michael Theall, Associate Professor, Education, & Director, Center for the Advancement of Teaching And Learning at Youngstown State

ACKNOWLEDGMENTS; A FOREWORD (IN BERKIAN STYLE) BY MIKE THEALL; INTRODUCTION

1 TOP 13 SOURCES OF EVIDENCE OF TEACHING EFFECTIVENESS; A Few Ground Rules; Teaching Effectiveness: Defining the Construct; National Standards; Beyond Student Ratings; A Unified Conceptualization; Thirteen Sources of Evidence; Student Ratings; Peer Ratings; External Expert Ratings; Self-Ratings; Videos; Student Interviews; Exit and Alumni Ratings; Employer Ratings; Administrator Ratings; Teaching Scholarship; Teaching Awards; Learning Outcome Measures; Teaching Portfolio; BONUS: 360° Multisource Assessment; Berk’s Top Picks; Formative Decisions; Summative Decisions; Program Decisions; Decision Time

2 CREATING THE RATING SCALE STRUCTURE; Overview of the Scale Construction Process; Specifying the Purpose of the Scale; Delimiting What Is to Be Measured; Focus Groups; Interviews; Research Evidence; Determining How to Measure the “What”; Existing Scales; Item Banks; Commercially Published Student Rating Scales; Universe of Items; Structure of Rating Scale Items; Structured Items; Unstructured Items

3 GENERATING THE STATEMENTS; Preliminary Decisions; Domain Specifications; Number of Statements; Rules for Writing Statements; 1. The statement should be clear and direct; 2. The statement should be brief and concise; 3. The statement should contain only one complete behavior, thought, concept; 4. The statement should be a simple sentence; 5. The statement should be at the appropriate reading level; 6. The statement should be grammatically correct; 7. The statement should be worded strongly; 8. The statement should be congruent with the behavior it is intended to measure; 9. The statement should accurately measure a positive or negative behavior; 10. The statement should be applicable to all respondents; 11. The respondents should be in the best position to respond to the statement; 12. The statement should be interpretable in only one way; 13. The statement should NOT contain a double negative; 14. The statement should NOT contain universal or absolute terms; 15. The statement should NOT contain nonabsolute, warm-and-fuzzy terms; 16. The statement should NOT contain value-laden or inflammatory words; 17. The statement should NOT contain words, phrases, or abbreviations that would be unfamiliar to all respondents; 18. The statement should NOT tap a behavior appearing in any other statement; 19. The statement should NOT be factual or capable of being interpreted as factual; 20. The statement should NOT be endorsed or given one answer by almost all respondents or by almost none

4 SELECTING THE ANCHORS; Types of Anchors; Intensity Anchors; Evaluation Anchors; Frequency Anchors; Quantity Anchors; Comparison Anchors; Rules for Selecting Anchors; 1. The anchors should be consistent with the purpose of the rating scale; 2. The anchors should match the statements, phrases, or word topics; 3. The anchors should be logically appropriate with each statement; 4. The anchors should be grammatically consistent with each question; 5. The anchors should provide the most accurate and concrete responses possible; 6. The anchors should elicit a range of responses; 7. The anchors on bipolar scales should be balanced, not biased; 8. The anchors on unipolar scales should be graduated appropriately

5 REFINING THE ITEM STRUCTURE; Preparing for Structural Changes; Issues in Scale Construction; 1. What rating scale format is best?; 2. How many anchor points should be on the scale?; 3. Should there be a designated midpoint position, such as “Neutral,” “Uncertain,” or “Undecided,” on the scale?; 4. How many anchors should be specified on the scale?; 5. Should numbers be placed on the anchor scale?; 6. Should a “Not Applicable” (NA) or “Not Observed” (NO) option be provided?; 7. How can response set biases be minimized?

6 ASSEMBLING THE SCALE FOR ADMINISTRATION; Assembling the Scale; Identification Information; Purpose; Directions; Structured Items; Unstructured Items; Scale Administration; Paper-Based Administration; Online Administration; Comparability of Paper-Based and Online Ratings; Conclusions

7 FIELD TESTING AND ITEM ANALYSES; Preparing the Draft Scale for a Test Spin; Field Test Procedures; Mini-Field Test; Monster-Field Test; Item Analyses; Stage 1: Item Descriptive Statistics; Stage 2: Interitem and Item-Scale Correlations; Stage 3: Factor Analysis

8 COLLECTING EVIDENCE OF VALIDITY AND RELIABILITY; Validity Evidence; Evidence Based on Job Content Domain; Evidence Based on Response Processes; Evidence Based on Internal Scale Structure; Evidence Related to Other Measures of Teaching Effectiveness; Evidence Based on the Consequences of Ratings; Reliability Evidence; Classical Reliability Theory; Summated Rating Scale Theory; Methods for Estimating Reliability

9 REPORTING AND INTERPRETING SCALE RESULTS; Generic Levels of Score Reporting; Item Anchor; Item; Subscale; Total Scale; Department/Program Norms; Subject Matter/Program-Level State, Regional, and National Norms; Criterion-Referenced versus Norm-Referenced Score Interpretations; Score Range; Criterion-Referenced Interpretations; Norm-Referenced Interpretations; Formative, Summative, and Program Decisions; Formative Decisions; Summative Decisions; Program Decisions; Conclusions

References; Appendices; A. Sample “Home-Grown” Rating Scales; B. Sample 360° Assessment Rating Scales; C. Sample Reporting Formats; D. Commercially Published Student Rating Scale Systems; Index.

Ronald A. Berk

Ronald A. Berk is Professor Emeritus of Biostatistics and Measurement and former Assistant Dean for Teaching, The Johns Hopkins University. He received the University’s Alumni Association Excellence in Teaching Award in 1993 and the Caroline Pennington Award for Teaching Excellence in 1997, and was inducted as a Fellow in the Oxford Society of Scholars in 1998. He has published 11 books and 130 journal articles and chapters. These publications reflect his unwavering commitment to mediocrity and his motto: “Go for the Bronze!” He is a popular speaker on teaching and assessment throughout the U.S. and Europe.

student evaluations; tenure; faculty evaluation; assessment of teaching