Assessment and Evaluation


Educational Measurement


Interpreting student learning

Principal Metaphors

The word “evaluation” is derived from Old French, é + value, “out + worth = appraise.” The notion was first used in commerce and later picked up as a metaphor by educators. That is, assigning a mark was originally understood as analogous to determining the worth of a product. In a similar vein, the word “assessment” is derived from the Medieval Latin assessare, “set a tax upon,” and so it originally had almost the same meaning as evaluation. As it turns out, almost all of the terms associated with formal review of student learning are anchored to this emphasis. Others include:
  • Grading – adapted from the practice of ranking manufactured goods according to quality (i.e., how well they meet pre-defined standards). It is derived from the Middle English gree, “step or degree in a series,” which traces back to the Latin gradus, “step.”
  • Scoring – similar in meaning to grading within both education and business, it derives from the act of making notches or incisions (i.e., to signal different levels of quality). It traces back to the Proto-Germanic skura, “to cut,” which is also the root of scar and shear.
  • Marking – adapted from the practice of placing prices on items for sale. The word traces back across a range of European languages, in which it originally referred to specifying borders and indicating margins.
This cluster of associations dominates conceptions and practices of assessment in contemporary schools – but it’s important to recognize that it doesn’t stand alone. Rather, this cluster is part of a grander flock of associations in which knowledge is characterized, metaphorically, as an objectively real thing. A small piece of a cascade of associations and entailments is:
  • Knowledge is a thing
  • Learning is acquiring that thing
  • Assessment and evaluation are about determining/assigning worth (of/to that thing).
Again, the above points to the core of the currently dominant cluster of associations that frame assessment. Here are some other flocks of association that might lead to other attitudes toward assessment:

Knowledge is a fluid → Learning is soaking up the fluid → Assessment is a measure of the learner’s capacity (to hold that fluid).

Knowledge is out there → Learning is internalizing → Assessment rates the accuracy of internalized representations.

Knowledge is a goal → Learning is progressing toward (and ultimately attaining) that goal → Assessment tracks achieved goals and progress toward other goals.

Knowledge is an organism’s repertoire of behavior → Learning is change in behavior due to environmental influences → Assessment measures the effectiveness of rewards/punishments.

Knowledge is appropriate, situated action → Learning is enculturation, en-habiting → Assessment is gauging fitness of one’s actions.

We could go on. But we’ll assume the above details are sufficient to illustrate our main points: (1) There are multiple interpretations of assessment at play in contemporary education. (2) A few dominate, likely because they mesh with prevailing assumptions/metaphors about the nature of knowledge. These few, dominant notions are most often revealed by their uninterrogated alignment with the Attainment Metaphor (evident, e.g., in educational aims, curriculum trajectories, etc.) and the Acquisition Metaphor (manifest in, e.g., obsessions with learning objects/objectives and valuation of those objects). (3) More defensible conceptions are rendered difficult to understand, and even more difficult to enact, in a context dominated by just a few webs of association.


Practices of Assessment and Evaluation are distributed across the history of schooling, but the most prominent models and modes are rooted in notions of objective measurement and standardized production that rose to cultural prominence during the Scientific and Industrial Revolutions.


Technically, Assessment and Evaluation encompass any activity intended to monitor student learning. Such activity can be planned or spontaneous; it can be focused on improving an individual’s learning or more generally concerned with monitoring an educational system; it can be systematic or haphazard; it can be grounded in the science of learning, or it can be beholden to inherited and indefensible assumptions. Most often, dollops of all of these elements are rolled in together. A review of subdiscourses of Assessment and Evaluation affords more nuanced insight into the fault lines of opinion and practice that are currently represented among educators. In the following list, we attempt to emphasize this point by highlighting some of the underlying metaphors that orient practitioners’ attitudes and practices. To this end, we are following a tenet of Conceptual Metaphor Theory: these details are provided in part to offer clues as to why some indefensible beliefs are so pervasive, and why so many evidence-based theories and practices are rarely enacted.
  • Formative Assessments – Usually contrasted with Summative Assessments, Formative Assessments include all in-process evaluations of student work, generally intended to offer timely and focused feedback to support more sophisticated learning processes and products. (Most often, the Attainment Metaphor figures centrally in discussions of Formative Assessments – which is to be expected, given the alignment of Formative Assessments with notions of progress toward attaining a goal.)
  • Summative Assessments – Usually contrasted with Formative Assessments, Summative Assessments are end-point evaluations of student learning, typically expressed as “final grades,” and often high stakes. (While the notion of “end point” indicates a reliance on the Attainment Metaphor, most Summative Assessments align more strongly with the Acquisition Metaphor, through which learning is interpreted as an entity to be measured and the projects of learning are regarded as objects.)
  • Progress Tests – Typically associated with large educational systems or multi-year programs of study, Progress Tests are examinations administered at regular intervals to gauge levels of student mastery at specific stages in well-defined programs of study. (The notion of “progress” through a well-defined program of study reveals that Progress Tests are anchored to the Attainment Metaphor. Arguably, they are just as reliant on the Acquisition Metaphor, since proponents focus at least as much on “how much” has been learned as “how far” learners have moved.)
  • Rubrics – A rubric is a grading guide, typically presented in grid form, one dimension of which is used to identify essential qualities of assigned work and the other dimension of which indexes work quality to grades. Proponents highlight that expectations are clarified for learners and fairness of grading is more transparent. Detractors counter that Rubrics often press student attitudes and efforts toward “make sure to give teachers what they want,” as opposed to, say, “engage authentically with a matter of profound interest.” Rubrics can be parsed into two main types – one or both of which, depending on how criteria and expectations are articulated, can be made to fit with almost every prominent discourse on learning:
    • Holistic Rubrics – With criteria typically presented in the form of descriptive prose, Holistic Rubrics are used to offer overall, global impressions of tasks or achievements.
    • Analytic Rubrics – Typically presented as checklists or grids, Analytic Rubrics correlate levels of performance with scores, enabling nuanced and weighted feedback across multiple criteria.
  • Concept Inventory – Typically associated with very well-defined sets of facts or skills, a Concept Inventory is a formal, rigorous accounting of one’s mastery of specific details – most often presented in the form of a multiple-choice or other limited-response examination. (The notion of “inventory” reveals that Concept Inventory is tethered to the Acquisition Metaphor.)
  • Confidence-Based Learning – Focused on memorized details, Confidence-Based Learning aims to evaluate both the level of one’s mastery and one’s confidence in that mastery. Reliant on multiple-choice (and similar) tests, Confidence-Based Learning is designed to minimize the effects that guessing can have on scores. (Confidence-Based Learning is unusual in its focused and unwavering alignment with the Acquisition Metaphor, evident in its treatment of learning as measurable objects, its faith in objective measurement, and its uncritical separation of what one thinks one knows from what one actually/objectively knows.)
  • Authentic Assessment – Commonly associated with one or more Activist Discourses, an Authentic Assessment is one that attends not just to mastery of content, but to the worthwhileness, significance, and meaningfulness of learning. It is associated with a range of learner-focused strategies. (Metaphors that link learning to empowerment or voice are common in discussions of Authentic Assessment.)
  • Standards-Based Assessments (Standardized Examinations) – Often contrasted with Norm-Referenced Assessments, Standards-Based Assessments are Criteria-Referenced Assessments that are rigidly indexed to well-defined programs of study and designed to provide quantitative information on the extent to which learners have mastered the content of those programs. (Standards-Based Assessments are most prominently aligned with the Acquisition Metaphor, evident in the treatment of learning as measurable objects and the faith in objective measurement.)
  • Criteria-Referenced Assessments – Often contrasted with Norm-Referenced Assessments, Criteria-Referenced Assessments include any formal tool designed to provide information on the extent of one’s learning of specific content. Technically, Standards-Based Assessments are Criteria-Referenced Assessments – although the former tend to be standardized and large scale, while the latter category also includes quizzes prepared by teachers, tests prepared by textbook publishers, and so on. (Criteria-Referenced Assessments are typically associated with the Acquisition Metaphor, evident in the treatment of learning as measurable object(ive)s.)
  • Norm-Referenced Assessments – Often contrasted with Standards-Based Assessments and Criteria-Referenced Assessments, Norm-Referenced Assessments present information on learners’ mastery of content in terms of rankings relative to one another, rather than in terms of measures of achievement. (Norm-Referenced Assessments rely mainly on the Attainment Metaphor, as learning is typically interpreted in terms of progress along a defined trajectory – based on which it makes sense to rate learners against one another. Contrast with Measures of Ability and Aptitude, under Learning (Dis)Abilities Theories.) The following is a small sampling of some of the more prominent, non-discipline-specific Norm-Referenced Assessments:
    • Wide Range Achievement Test, Fifth Edition (WRAT5) (Sidney W. Bijou, Joseph Jastak; originally developed in the 1940s; most recent revision in the 2010s): a brief, group-normed, individually administered test focused on word recognition, sentence comprehension, spelling, and computation. For ages 5 to 85+.
    • Peabody Individual Achievement Test-Revised/Normative Update (PIAT-R/NU) (originally developed in the 1970s; revised in the 1980s; updated in the 1990s): an hour-long, individually administered test that yields nine scores (General Information, Reading Recognition, Reading Comprehension, Total Reading, Mathematics, Spelling, Total Test, Written Expression, and Written Language). For ages 5 to 22.
    • Woodcock–Johnson Tests of Achievement, Fourth Edition (WJ IV ACH) (Richard Woodcock, K. S. McGrew, N. Mather; originally developed in the 1970s with the Woodcock–Johnson Psychoeducational Battery; most recently revised in the 2010s): an individually administered test, comprising 11 subtests in the standard battery and 11 more in the extended battery. It is designed to assess both academic achievement and cognitive development, and it includes measures of skills in reading, writing, oral language, mathematics, and academic knowledge. For ages 2 to 80+.
    • Kaufman Test of Educational Achievement, Third Edition (KTEA-3) (Alan S. Kaufman, Nadeen L. Kaufman; originally developed in the 1990s; most recent revision in the 2010s): an individually administered test to identify both achievement gaps and learning disabilities. It comprises 19 subtests across core academic skills in reading, mathematics, written language, and oral language. For ages 4 to 25.
    • Wechsler Individual Achievement Test, Second Edition (WIAT-II) (David Wechsler; originally developed in the 1990s, revised in the 2000s): an individually administered test used to assess academic achievement across any or all of four areas (Reading, Math, Writing, Oral Language). For ages 4 to 85.
  • Holistic Grading (Global Grading, Impressionistic Grading, Nonreductionist Grading, Single-Impression Scoring) – Holistic Grading is a method of evaluating essays and other compositions, based on overall quality and global impressions – usually in comparison to an exemplar of some sort. Holistic Grading is used in both classroom-based work and large-scale assessments.
  • Differentiated Assessments – Associated with Differentiated Instruction, Differentiated Assessments refer to any strategy developed, adopted, or adapted by a teacher to make sense of each individual student’s current level of mastery and consequent needs. (See Differentiated Instruction for the cluster of metaphors most often associated with the discourse.)
  • Portfolio Assessments – Often positioned as an alternative to testing, and in an explicit move to foreground quality over quantity (and quantification), Portfolio Assessments are curated collections of artefacts that are intended to afford insight into learner growth over time. (Portfolio Assessments treat learning in terms of growth and development, and they are thus more frequently associated with Coherence Discourses.)
  • Peer Assessment – As the phrase suggests, Peer Assessment involves students in judging the performance of one another. The notion is not well theorized or defined – and, consequently, associated practices span various spectra of sensibilities. For instance, Peer Assessment may be focused on either formative feedback or summative evaluations, it may be framed as a process based on caring and support or as the objective application of defined taxonomies. And so on. (Given the range of practices and sensibilities associated with Peer Assessment, it cannot be interpreted in terms of singular or prominent clusters of notions.)
  • Performance Assessment (Performance-Based Assessment; Alternative Assessment; Authentic Assessment) – Understood in most general terms, a Performance Assessment is a non-standardized task that presents an opportunity for one to demonstrate a competence in a manner that permits a summative interpretation according to pre-specified criteria. Performance Assessments tend to be contextualized and to be practical or applied in nature. Other aspects (e.g., individual vs. group; close-ended vs. open-ended) vary. (Performance Assessment techniques are usually associated with Coherence Discourses – that is, fitted to understanding learning in terms of construing, mastering, and participating, and learners as active agents and interactive participants.)
  • Self-Assessment – As the term suggests, Self-Assessment is about involving students in evaluating their own work. Principally, advice and commentaries focus on the positive impacts on learners’ awareness of task requirements, their own effort, and their own understandings. Self-Assessment has been shown to be strongly correlated with improved achievement. (While described in different ways, most commentaries on Self-Assessment are strongly aligned with the Attainment Metaphor, especially notions of progress, awareness of location, and pacing.)
  • Assessment Design – Aligned with Instructional Design Models (contrast: Design Thinking), Assessment Design is that aspect of program planning that attends to the sorts of practice tasks and evaluation tools that are useful in monitoring learning progress on an assumed-to-be-linear learning trajectory. (Assessment Design strongly aligns with the Attainment Metaphor.)
  • Feedback – In everyday educational usage, Feedback has come to refer to any direct commentary on one’s work. The original meaning was very different. Coined in the 1920s in the context of electronics, Feedback described when a portion of an outputted signal “fed back” into an input signal – sometimes with sufficient strength to trigger a self-amplifying loop. Applied to teaching and learning, then, Feedback can be interpreted as a critique of teacher-centered, direct instruction as it hints at the importance of learner self-determination and small-but-timely nudges from the teacher. That is, as a metaphor, Feedback can be construed as fitted to most Coherence Discourses. However, there’s no evidence the metaphor was embraced to do that sort of conceptual work.
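The contrast drawn in the list above between Criteria-Referenced Assessments and Norm-Referenced Assessments can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a description of how any published test is scored: the learner names, the raw scores, the cutoff of 70, and the simple "percentage of the group scoring strictly lower" ranking rule are all hypothetical. The same raw scores are interpreted twice, once against a fixed standard and once against the other learners.

```python
def criterion_referenced(scores, cutoff):
    """Interpret each raw score against a fixed, pre-defined standard."""
    return {
        name: "meets standard" if score >= cutoff else "below standard"
        for name, score in scores.items()
    }


def norm_referenced(scores):
    """Interpret each raw score relative to the group.

    Returns, for each learner, the percentage of the group that scored
    strictly lower (a simplified stand-in for a percentile rank).
    """
    group_size = len(scores)
    return {
        name: 100 * sum(other < score for other in scores.values()) / group_size
        for name, score in scores.items()
    }


# Hypothetical raw scores, invented for illustration only.
scores = {"Ana": 62, "Ben": 71, "Chi": 71, "Dev": 88}

by_standard = criterion_referenced(scores, cutoff=70)  # hypothetical cutoff
by_ranking = norm_referenced(scores)

for name in scores:
    print(name, scores[name], by_standard[name], by_ranking[name])
```

Under the first reading, three of the four learners "meet the standard" regardless of how anyone else performed. Under the second, the same learners are spread along a ranking, and any one learner's result would change if the rest of the group scored differently.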


Arguably, information on what educators really believe about learning is on fullest display in assessment practices. Within formal education, more often than not, assessment is the tail that wags the dogs of curriculum content and teaching practice. Phrased differently, assessment exerts a powerful influence on both explicit descriptions and implicit understandings of “learning” and “learners.” The small subset of discourses that swirl around the topic, then, could be argued to be disproportionately impactful.

Authors and/or Prominent Influences


Status as a Theory of Learning

In a sense, most discourses on Assessment and Evaluation are anti-theoretical. That is, they are encountered as end-products of belief systems – the inevitable consequences of assumptions and assertions that have already been embraced.

Status as a Theory of Teaching

Assessment and Evaluation are not often described as “theories of teaching,” but there is widespread recognition that these discourses have powerful shaping influences on what happens in classrooms.

Status as a Scientific Theory

For the most part, Assessment and Evaluation can serve as case studies of what happens when grounding assumptions about learning are ignored. That is, on the whole, Assessment and Evaluation cannot be described as scientific – despite the fact that massive examination-producing and data-analysis industries have arisen around formal testing and student ranking.


  • Analytic Rubrics
  • Assessment Design
  • Authentic Assessment
  • Concept Inventory
  • Confidence-Based Learning
  • Criteria-Referenced Assessments
  • Differentiated Assessments
  • Feedback
  • Formative Assessments
  • Holistic Grading (Global Grading, Impressionistic Grading, Nonreductionist Grading, Single-Impression Scoring)
  • Holistic Rubrics
  • Kaufman Test of Educational Achievement, Third Edition (KTEA-3)
  • Norm-Referenced Assessments
  • Peabody Individual Achievement Test-Revised/Normative Update (PIAT-R/NU)
  • Peer Assessment
  • Performance Assessment (Performance-Based Assessment; Alternative Assessment; Authentic Assessment)
  • Portfolio Assessments
  • Progress Tests
  • Rubrics
  • Self-Assessment
  • Standards-Based Assessments (Standardized Examinations)
  • Summative Assessments
  • Wide Range Achievement Test, Fifth Edition (WRAT5)
  • Woodcock–Johnson Tests of Achievement, Fourth Edition (WJ IV ACH)
  • Wechsler Individual Achievement Test, Second Edition (WIAT-II)

Please cite this article as:
Davis, B., & Francis, K. (2020). “Assessment and Evaluation” in Discourses on Learning in Education.
