The Ethics of Grading (About Grades, Part 2)


Grading is a method of measuring students’ performance, and the type of grading system employed reflects its underlying educational ethics. The simpler the grading system, the simpler the assertions graders make about those they grade. The more multi-faceted the grading system, the more factors an assessment entails, and in that case the assessment criteria need to be justified. Simple grading methods such as multiple-choice, true-false and matching tests, or simply counting errors and points, are common in primary and secondary education, and they typically come with little or no justification.

But what are the ethics behind grading systems? Do some grading systems violate ethical principles, and if they do, how should students be assessed instead? Let us take a closer look at the relationship between ethics and academic assessment.

Low-level Grading and its Underpinnings in Social Darwinism, Liberalism and Behaviourism

Simple, low-level grading rests on a set of assumptions: that students should be graded individually, regardless of social context and prior conditions; that grades are a truthful account of a student’s performance and aptitude; and that final grades are fair. One could argue that such grading is rooted in both liberalism and social Darwinism: only the fittest survive (at least from the teacher’s perspective), and teachers bear no responsibility whatsoever for their students’ learning, analogous to the view that governments should not interfere in markets. In this Darwinian outlook, some students are simply more gifted than others. It is nature over nurture, genetics over pedagogy. In the logic of natural selection, traditional grading simply ‘separates the wheat from the chaff’. Worse, when graded groups are compared with non-graded groups, grading has been found to undermine intrinsic motivation by reducing students’ sense of autonomy over the task (Pulfrey et al., 2013).

As a tool of behavioural control, good grades are commonly set out as rewards (‘An A, well done!’), while bad grades serve as punishment. In principle, there is little difference between pupils receiving grades and a rat in a Skinner box receiving food as reinforcement or electric shocks as punishment. The main task for students inside this learning box is to store content temporarily in short-term memory in order to pass exams, and to forget the acquired knowledge shortly afterwards. The German philosopher Richard David Precht has described this cycle as ‘bulimic learning’ (Bulimielernen). One side effect of bulimic learning is that students learn that knowledge is disposable rather than part of an all-encompassing, lifelong learning process. Another is that graded students perceive themselves as being in a socially competitive situation and tend to prefer confirming over disconfirming evidence; that is, grading compromises critical thinking (Hayek et al., 2014).

Because grades are applied universally, across the world and across institutions, learners’ belief in grades is systematically reinforced along their educational trajectories.

Mid-level Grading and its Meritocratic-liberal Stance

More complex, criteria-based grading requires a justification of its criteria and is consequently open to debate. Some institutions of higher learning assess higher-order learning outcomes such as the use of specified evidence, the quality of the evidence cited, the ability to understand and differentiate concepts, to relate facts to ideas, to frame a general problem within a local context, to argue cases and integrate multiple perspectives, to choose adequate methods of analysis, to employ critical thinking and to demonstrate overall consistency. Such comprehensive, mid-level assessment takes time and requires educators to design adequate scoring rubrics.

The advantage of mid-level assessment is that students not only know why they have received a specific grade (which could still amount to being given, or not given, a ‘model answer’) but also learn which areas need more effort. Scoring rubrics therefore have the advantage of serving as formative feedback to learners. A disadvantage is that they are typically limited to assessing cognitive skills.

From an ethical perspective, we could label mid-level scoring a meritocratic approach: although assessed individually, all students are given a fair chance to improve their identified weaknesses and to build on their strengths, gaining merit through continuous improvement. This concept is also liberal in the sense that individuals are given an opportunity (or a right) to improve, at least at face value, while it ignores an individual’s ability (or inability) to capitalise on that opportunity. Meritocratic-liberal assessment still rests on the assumption that learning happens primarily individually and independently of social context, contrary to evidence from psychological research.

Learners’ different social starting points and contextual limitations are not addressed by any intervention. For example, some students have highly supportive parents while others do not, and some have the financial means to pay for tutors or to take part in international exchange programs while others do not. To that extent, mid-level grading does not directly promote equal opportunity: weaker students are told which areas of study to focus on, but they receive no support in improving their performance. It is like telling a thirsty person in the middle of a desert not to worry about water because the next oasis is only a couple of hundred miles away. This is why meritocratic-liberal assessment is better suited to homogeneous classes, in which students are at approximately the same level, than to heterogeneous, socially diverse student populations. But what about exceptionally gifted students?

High-end Social-discursive Evaluation Beyond Grades

Jane Robbins wrote about elite students in Inside Higher Ed that ‘They want the more complex, nuanced, individual (or small group), creative work. And while they can do a great deal in interaction with each other, they need and want, the guidance of experts with depth and breadth in the field at hand. They want and need feedback because they don’t yet have experience in solving those kinds of problems. Neither are they satisfied just to get their A- for many top students, A’s are easy, but the A in and of itself does nothing to motivate them, or do other than present a false sense of complete mastery; you can get an A and still need to advance to the next level of thinking. So for elite students, the teacher is a mentor, coach, prodder, supervisor who provides his or her guidance through feedback.’

At the highest level of learning, students are evaluated on a wide range of abilities. These include empathy, social cooperation and teamwork, the ability to take on different social roles and responsibilities within a team, the ability to conduct research in meaningful projects on authentic problems (or phenomena), and originality and creativity. Elite learning (a) is socially scaffolded and discursive, (b) embraces critical discussion and the development of mastery in learning, and (c) rests on motivation that is entirely intrinsic rather than extrinsic.

Why a Hierarchy of Evaluation Systems is Counterproductive

In many elite universities, grading has become redundant, and equal opportunity is mediated by including all students in research projects. Robbins’s apt description raises the question of why only elite students should be worthy of mentors and coaches. Isn’t achieving mastery in learning even more relevant for weaker students, especially at an early age?

It seems odd and illogical that a lucky few students are rewarded with high-level social-discursive evaluation (once they have made it through the maze of social-Darwinist and meritocratic-liberal systems), while this privilege is withheld from ordinary students at the beginning of their development, when they need such scaffolding the most.

Looking at the learning environments that foster highly successful students, our journey through grading systems comes full circle. Simple methods of evaluation are subject to social bias and confirmation bias and can therefore only yield distorted, inadequate conclusions about the true complexity of students’ learning and potential. Especially at a younger age, pupils deserve to develop the full range of social, emotional and cognitive capabilities, which supports more differentiated cognitive and metacognitive schemata some years later. Tell me your assessment system, and I will tell you how qualified you are as a teacher.


Hayek, A.-S., Toma, C., Oberlé, D., & Butera, F. (2014). The effect of grades on the preference effect: Grading reduces consideration of disconfirming evidence. Basic and Applied Social Psychology, 36(6), 544-552.

Pulfrey, C., Darnon, C., & Butera, F. (2013). Autonomy and task performance: Explaining the impact of grades on intrinsic motivation. Journal of Educational Psychology, 105(1), 39-57.

Strengths and Limitations of Behaviorism for Human Learning

The Evidence from Research on Behavioral Theories

Pavlov’s work on classical conditioning (Pavlov, 1927) and Skinner’s concept of operant conditioning (Skinner, 1953) provided the blueprints for evidence-based applications of behaviorism. Behaviorism has since proven effective in several areas:
- diagnosing patients with mental disorders by operationalizing the acquisition of new behavior (Barrett & Lindsley, 1962)
- improving item recall for dementia patients (Dixon et al., 2011)
- conditioning students in military and technical education (Gökmenoğlu et al., 2010)

In combination with cognitive therapy, behavior modification helps autistic children acquire life skills (Virues-Ortega et al., 2013). Behaviorism has proven its efficacy in contexts that require the performance of convergent, highly context-dependent tasks. (Photograph: B. F. Skinner; rat in a Skinner box.)

Strengths and Weaknesses

A central strength of behaviorism is that its results can be reliably reproduced experimentally, for example in a Skinner box or a similar apparatus. Yet this advantage rests on reducing behavior to observable responses under controlled conditions, and that reduction gives rise to several distinct counter-arguments. Firstly, behaviorism does not acknowledge active human agency, that is, conscious self-awareness (Chalmers, 1996), which is typically mediated by language. Key properties of human agency are intentionality, forethought and self-reactiveness (Bandura, 2006, pp. 164-165), none of which plays a role in behaviorism.

Secondly, a behaviorist perspective cannot explain how people make procedural decisions or weigh different types of potential rewards and goals. Most human behavior does not consist of conditioned, convergent reflexes to a single task. Instead, it is linked to preceding mental processes that are divergent and collaborative in nature (Funke, 2014; Eseryel et al., 2013; Hung, 2013). Divergent thinking has also been found to increase interpersonal trust (Sellaro et al., 2014). The Theory of Planned Behavior (Ajzen, 1991, 2002) could be regarded as the antithesis of behaviorism. It postulates attitudes, norms, a person’s perceived behavioral control and intentions as the precursors of behavior, rather than specific environmental stimuli.

Since reflexes are strictly defined as physiological interactions, behaviorism cannot explain individual differences in human learning, variations in learning styles or the influence of personality on learning (Rosander, 2013; Kamarulzaman, 2012). The neurological functionality of reflexes is constrained by given brain organization (Goffaux et al., 2014) and neurotransmitter processes (Striepens et al., 2014), and it excludes the higher brain functions involved in mental processes (Degen, 2014). Behavioral studies and therapies in clinical settings also raise ethical problems concerning how to obtain legal consent for behavior modification. This applies, for instance, to patients with mental disorders or neurological impairments (Digdon et al., 2014).

On the Validity of Animal Studies

One of the distinguishing differences between humans and animals is the use of language. Using an Information Theory approach, Reznikova (2007, p. 9) concludes that animals produce no syntax and provide little evidence for the learning and modification of signals. Seyfarth and Cheney (2010) share this view, stating that learned, flexible vocal production is relatively rare. A predictable communicative system is absent in the animal world (Seyfarth & Cheney, 2010, pp. 97-98), and signals are limited to specific contexts such as greetings, infant distress or predator alarms (Marler & Tenaza, 1977; Snowdon, 1986).

Human language development, by contrast, is tied to the development of Theory of Mind (ToM) skills (Miller, 2006). Language acquisition by reinforcement (Skinner, 1957) can account neither for the semantic and pragmatic dimensions of coordinated human speech nor for the meta-contextual quality of speech acts (Chomsky, 1983; Searle, 1969). By ignoring cognitive development (Skinner, 1950), behaviorism cannot fully grasp the role of behaviors such as children’s joint attention, imitation and play. These behaviors are not only precursors to language but also contribute to the parallel development of mental abilities (Charman et al., 2000).

Animal studies can be compromised when animals are exposed to uncontrolled pain and stress variables (Rollin, 2006, p. 293; see also Watanabe, 2007). Anthropomorphic interpretation can also lead to biased reporting, and Martin Seligman’s behavioral experiments with dogs (Seligman, 1975) are an example. Seligman found that animals that had been given electric shocks they could not prevent, and had subsequently given up in apathy, tended to remain similarly inactive in situations where they could have avoided the shocks. He concluded that the same was true of humans suffering from depression, in the form of ‘learned helplessness’. However, the experiment could equally be interpreted as showing that the animals had simply been conditioned to accept new thresholds for enduring pain, that they had been traumatized, or both. Moreover, drawing inferences from animal reactions about human mental states and motivation seems far-fetched and impossible to prove. Some years later, Seligman and colleagues critiqued and reformulated the original account of learned helplessness (Abramson et al., 1978).


Behaviorism has valid but limited applications. In clinical psychology, behaviorist theory is typically complemented by cognitive theory to produce more effective results (Feltham & Horton, 2006). In modern military education, issues such as professional ethics and mindfulness require cognitive skills and training (Major, 2014; Starr-Glass, 2013), and the same applies to training in sports (Samson, 2014; Huntley & Kentzer, 2013). Behaviorism remains highly relevant in animal conditioning. However, with the advent of neurological imaging technology and the scientific measurement of cognitive processes (DeSouza et al., 2012; Kühn et al., 2014), it has ceased to be a leading theory of learning. Few people know that Pavlov experimented not only on dogs but also on children. Fewer still know that Skinner envisioned operant conditioning on a societal scale. Both approaches have become unacceptable under contemporary research ethics. Behaviorism does have its applications, but they must be seen in the context of human agency.


Abramson, L. Y., Seligman, M. E. P., & Teasdale, J. D. (1978). Learned helplessness in humans: Critique and reformulation. Journal of Abnormal Psychology, 87, 49-74.

Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179-211.

Ajzen, I. (2002). Perceived Behavioral Control, Self-Efficacy, Locus of Control, and the Theory of Planned Behavior. Journal of Applied Social Psychology, 32, 665-683.

Barrett, B. H., & Lindsley, O. R. (1962). Deficits in acquisition of operant discrimination and differentiation shown by institutionalized retarded children. American Journal of Mental Deficiency, 67, 424-435.

Chalmers, D. J. (1996). The conscious mind: In search of a fundamental theory. New York: Oxford University Press.

Charman, T., Baron-Cohen, S., Swettenham, J., Baird, G., Cox, A., & Drew, A. (2000). Testing joint attention, imitation, and play as infancy precursors to language and theory of mind. Cognitive Development, 15, 481-498.

Chomsky, N. (1983). A review of B. F. Skinner’s Verbal Behavior. In N. Block (Ed.), Readings in philosophy of psychology (Vol. 1). Cambridge, MA: Harvard University Press.

Degen, R. (2014). Brain-Based Learning: The Neurological Findings About the Human Brain that Every Teacher should Know to be Effective. Amity Global Business Review, 9, 15-23.

DeSouza, J., Ovaysikia, S., & Pynn, L. (2012). Correlating behavioral responses to fMRI signals from human prefrontal cortex: Examining cognitive processes using task analysis. Journal of Visualized Experiments, (64), 1. doi:10.3791/3237

Digdon, N., Powell, R. A., & Harris, B. (2014). Little Albert’s alleged neurological impairment: Watson, Rayner, and historical revision. History of Psychology. doi:10.1037/a0037325

Dixon, M., Baker, J. C., & Sadowski, K. (2011). Applying Skinner’s Analysis of Verbal Behavior to Persons with Dementia. Behavior Therapy, 42(1), 120-126.

Eseryel, D., Ifenthaler, D., & Ge, X. (2013). Towards innovation in complex problem solving research: an introduction to the special issue. Educational Technology Research & Development, 61(3), 359-363. doi:10.1007/s11423-013-9299-0

Fedurek, P., & Slocombe, K. E. (2011). Primate vocal communication: A useful tool for understanding human speech and language evolution? Human Biology, 83(2), 153-173.

Feltham, C., & Horton, I. (2006). The SAGE handbook of counselling and psychotherapy. London: SAGE.

Funke, J. (2014). Analysis of minimal complex systems and complex problem solving require different forms of causal cognition. Frontiers in Psychology, 5, 1-8. doi:10.3389/fpsyg.2014.00739

Geiser, R. L. (1978). Review of ‘Behaviorism and ethics’. American Journal of Orthopsychiatry, 48(4), 736-738.

Goffaux, P., Girard-Tremblay, L., Marchand, S., Daigle, K., & Whittingstall, K. (2014). Individual differences in pain sensitivity vary as a function of precuneus reactivity. Brain Topography, 27(3), 366-374. doi:10.1007/s10548-013-0291-0

Gökmenoğlu, T., Eret, E., & Kiraz, E. (2010). Crises, Reforms, and Scientific Improvements: Behaviorism in the Last Two Centuries. Ilkogretim Online, 9(1), 292-299.

Hung, W. (2013). Team-based complex problem solving: a collective cognition perspective. Educational Technology Research & Development, 61(3), 365-384. doi:10.1007/s11423-013-9296-3

Huntley, E., & Kentzer, N. (2013). Group-based reflective practice in sport psychology: Experiences of two trainee sport and exercise scientists. Sport & Exercise Psychology Review, 9(2), 57-67.

Kamarulzaman, W. (2012). Critical Review on Affect of Personality on Learning Styles. Online submission. Available from ERIC.

Kühn, S. M., Müller, B. N., van Baaren, R. B., Brass, M., & Dijksterhuis, A. (2014). The importance of the default mode network in creativity: A structural MRI study. The Journal of Creative Behavior, 48(2), 152-163. doi:10.1002/jocb.45

Major, A. (2014). Ethics Education of Military Leaders. Military Review, 94(2), 55-60.

Miller, C. A. (2006). Developmental Relationships Between Language and Theory of Mind. American Journal of Speech-Language Pathology, 15(2), 142-154. doi:10.1044/1058-0360(2006/014)

Pavlov, I. P. (1927). Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex. Translated and Edited by G. V. Anrep. London: Oxford University Press.

Reznikova, Z. (2007). Dialog with black box: Using Information Theory to study animal language behaviour. Acta Ethologica, 10(1), 1-12.

Rollin, B. (2006). The regulation of animal research and the emergence of animal ethics: A conceptual history. Theoretical Medicine and Bioethics, 27(4), 285-304.

Rosander, P. (2013). The importance of personality, IQ and learning approaches: Predicting academic performance.

Samson, A. (2014). Sources of Self-Efficacy During Marathon Training: A Qualitative, Longitudinal Investigation. Sport Psychologist, 28(2), 164-175.

Sellaro, R., Hommel, B., de Kwaadsteniet, E. W., van de Groep, S., Colzato, L. S., Tops, M., & Hecht, D. (2014). Increasing interpersonal trust through divergent thinking. Frontiers in Psychology, 5, 1-4. doi:10.3389/fpsyg.2014.00561

Searle, J. R. (1969). Speech Acts. Cambridge: Cambridge University Press.

Seligman, M. E. P. (1975). Helplessness: On depression, development, and death. San Francisco: W. H. Freeman.

Seyfarth, R. M., & Cheney, D. L. (2010). Production, usage, and comprehension in animal vocalizations. Brain & Language, 115(1), 92-100.

Starr-Glass, D. (2013). Experiences with Military Online Learners: Toward a Mindful Practice. Journal of Online Learning & Teaching, 9(3), 353-364.

Striepens, N., Matusch, A., Kendrick, K., Mihov, Y., Elmenhorst, D., Becker, B., & … Bauer, A. (2014). Oxytocin enhances attractiveness of unfamiliar female faces independent of the dopamine reward system. Psychoneuroendocrinology, 39, 74-87.

Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57, 193-216.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1957). Verbal Behavior. Acton, MA: Copley Publishing Group.

Virues-Ortega, J., Rodríguez, V., & Yua, C. T. (2013). Prediction of treatment outcomes and longitudinal analysis in children with autism undergoing intensive behavioral intervention. International Journal of Clinical Health & Psychology, 13(2), 91-100.

Watanabe, S. (2007). How animal psychology contributes to animal welfare. Applied Animal Behaviour Science, 106(4), 193-202.