University of Oxford Department of Educational Studies

Data about the informal assessment practices of thirty UK mathematics teachers were collected through unstructured interviews. A theoretical description of such practices revealed that they were complex and integrated with other aspects of teaching and pupil-teacher interaction. Analysis within descriptive categories showed many contradictions within individuals' practices, or between teachers. Some of these could, theoretically, lead to different assessment decisions being made by different teachers about similar situations. In this paper the author describes many of these potential differences, and concludes that in the UK situation, where differential grouping is the norm and teacher assessment is included in high-stakes assessment, more attention needs to be paid to evaluating such decisions.


Teacher assessment of mathematics is a component of statutory assessment in the UK at ages 7, 11, 14 and 16. Results of statutory assessment can be used to determine access to particular mathematics classes, schools, examination tracks and progress to further education or employment. Hence teacher assessment is high-stakes assessment. In addition, decisions about access to the curriculum through differentiated teaching groups are made throughout a child's school career, by grouping within classes in primary schools and by tracking in secondary schools. Teachers' informal assessments contribute to these decisions.

During the last ten years all teachers have been trained in the use of assessment techniques against National Curriculum statements, and record-keeping, pupil-monitoring and reporting have become important parts of their work.

Underlying these formal and paperwork aspects are the human interactions and judgements made in the ordinary day-to-day practice of the classroom. Interactions are subject to interpretation; mathematics is heavily dependent on interpretation since it is communicated through a variety of forms, not all transparent. The teacher's response depends on her interpretation of the children's attempts to communicate their mathematical understanding. It was to learn about these processes that the research study was set up.


Methods and analysis

A necessarily brief report of the methods and analysis follows, so that most of the paper can be devoted to the findings.

Thirty teachers of 10, 11 and 12 year-olds were selected from a range of schools and interviewed about their practices. Interviews were unstructured and based around core questions about assessment methods and descriptions of the mathematical understanding of pupils chosen by the researcher after classroom observation. Pupils were chosen for their apparently unexceptional behaviour and response in class. Interview transcripts were analysed for content to create a detailed story of the range of practices and issues accumulated by all the teachers; the aim was to provide descriptions of what might be going on in teaching mathematics in general, not portraits of individual teachers. Interviewing and analysis were undertaken concurrently; a liberal approach to coding was taken, transcripts and data were re-read frequently, and several theoretical frames were tried in order to produce a credible view of the field. These techniques to develop grounded theory were broadly comparable to those suggested by Glaser and Strauss [1967] and are described elsewhere [Watson,1998].

A description of practices was developed, parts of which have been published elsewhere [Watson,1995 & 1996]. So far, the work had been largely descriptive, but during analysis it was found that a more critical stance could be taken towards teachers' methods. The critique stemmed from a range of literature suggesting that teachers' expectations might be based on social and behavioural aspects of children's classroom performance, rather than mathematical achievements [McIntyre et al 1966; Lorenz, 1982; Ruthven, 1987]. Also important were examinations of fairness in assessment [Gipps & Murphy, 1994]. Apart from a few cursory suggestions that teachers should be wary of bias, and that assessments should be moderated somehow in schools, this literature appears to have been largely ignored by the bodies generating training materials to support teacher assessment [SEAC,1991]. Emphasis in the official advice was on recording and reporting rather than examining judgements. Further research showed that even aware and careful teachers who were being observed could conceivably make biased judgements [Watson,1997]. The power of first impressions to influence future interpretations was also a factor in making judgements [Nisbett & Ross,1980].

It was decided to analyse the interviews and produce a full list of possible sources of contradiction and inequity. The original descriptive categories were grouped into a network which reflected the relative positions of teacher and pupil in a hierarchy of power which was seen to operate in schools in the field of assessment. Teachers-as-assessors were seen to be answerable to governors, headteacher, inspectors and appraisal systems as well as to a view of mathematics represented by the National Curriculum (NC). They also had their own doubts, beliefs and attitudes. These factors combined to influence their actions and decisions as assessors. Other raw material for assessment decisions included, of course, the pupils' predispositions and actions; teachers have powerful positions as assessors of pupils. Further elaboration can be found in Watson [1998]. These power vectors are summarised in this diagram: 

With this power structure in mind the raw data were re-read and re-sorted. It was found that differences and contradictions in practice were revealed at every level, and similar issues showed up in several places. Returning to the raw data after structuring according to power proved to be very fruitful in terms of revealing sources of inequity. As a result of this stage of the research a small-scale study of in-house moderation meetings was conducted, the outcomes of which are referred to below.

The rest of this paper will report the potential sources of inequity so found.

Problems made explicit by teachers

Problems and issues of informal mathematics assessment mentioned explicitly by teachers included a broad concern with not having enough time to do assessment properly, according to what they believed to be required or important:

time to talk and listen with pupils enough;

time to assess fully each individual pupil;

time to find out what a pupil is thinking when they appear to have made an error.

Also prevalent were comments about the inaccuracy of statutory tests and the difficulty of having to make summative decisions:

tests do not give a true picture of achievement;

NC criteria are not always clear;

there is not enough time to prepare pupils for tests and teach for understanding;

what a pupil can do today they may not be able to do tomorrow;

there is a gap between being able to "do" and being able to "write" maths.

A further area of disturbance was a recognition that pupils appear differently to different teachers:

previous teachers "under-" or "over-" assess (sic);

different pupils respond to different teachers differently.

Apart from the last area, to which I shall return below, the other areas of doubt are all about existing systems, rather than the teacher's own role within the system. Any of these problems could lead to different decisions being made by different teachers, particularly in contrasting teaching situations, and hence to potential inequity.


Implicit problems in teachers' descriptions of practice

This section reports on problems which were not mentioned explicitly by teachers but were apparent in the analysis.

Although oral interactions were far and away considered the most important way to assess individual understanding, and the skills of reading and writing were seen as barriers to understanding and to communication of understanding, written outputs were valued more highly as evidence of achievement, and the value increased for higher stakes assessment. In particular, extended written demonstrations of understanding were valued throughout primary and secondary schools, although some of the subject-specific features of mathematics are brevity, essence, symbolism and compactness of argument.

Most teachers expressed their distrust of tests as providers of anything other than a flawed snapshot. There was a feeling that understanding was contextually and temporally specific, and that tests would therefore not show the whole picture. Nevertheless, there was widespread use of tests of various kinds for various purposes in school; reasons included parents' needs, pupils' enjoyment of getting ticks, the provision of "firm" evidence even if it is not complete, and the monitoring of what pupils are learning. No one suggested that the level of understanding they sought through their own tests might itself be time and context specific, although this criticism was frequently applied to other kinds of assessment including SATs; hence teacher-devised tests were assumed to reveal an ultimate level of understanding.

Different views of learning led to radically different teaching approaches. Some teachers gave practical work first, followed by skills and abstract work, while others taught the other way round. Since most teachers agreed that use was the ultimate evidence of understanding, these differences will lead to different assessment decisions. However, the adherence to tests suggests that teachers will take performance as evidence if opportunities for use cannot be assured. Very few examples of use of mathematics were given as illustrations; very few were observed to be deliberately offered in classrooms.

Teachers' descriptions of an individual pupil's mathematics were largely about behaviour and attitudes to learning. Little use was made of descriptions of mathematical thinking in Ma1, although some teachers clearly valued some aspects of mathematical thinking. Although all said they valued "understanding" above "right answers", the extent to which they carried this into their teaching and assessment varied hugely. There was also variation in what kinds of approaches to work were valued; for instance, some teachers might insist on step-by-step formal workings of a conventional type being given where others might accept intuitive, imaginative and short-cut arguments. An exploration of the characteristics teachers said were important for pupils to be good at mathematics revealed little beyond good work habits. Some teachers were specific about the kinds of thinking they considered necessary to be successful at mathematics, but only a few mentioned anything that was special to mathematics. Many teachers regarded as essential certain traits which are useful for organisational purposes (speed, accuracy) but which have not been found to be essential in advanced mathematics [Krutetskii,1976].

In nearly all the schools visited pupils were grouped and offered different curricula for at least part of the time according to teachers' judgements and test results. The degree of flexibility in grouping varied a little; some teachers regarded different aspects of mathematics as bringing out different strengths, others that ability in mathematics was largely homogeneous across number, shape, reasoning and data-handling. There were differences in what was regarded as innate and what was regarded as changeable; for instance, "ability" was usually talked about as if it is innate, yet one or two teachers did not believe this and chose instead to talk about gaps in knowledge, lack of suitable mental images, a need for more confidence and so on as if ability could change given appropriate teaching. In general, however, grouping and setting were fixed for most pupils, and I took these to be high-stakes decisions because access to different curricula affects pupils' futures.

Different decisions could be made by different teachers based on their own judgements, their own views of what is valuable in mathematics and their own views of what is changeable in their pupils.

Teachers' views of fairness in their judgements varied: at one extreme some teachers believed that the only fair assessment would be the same test for all pupils of the same age; at the other, some teachers believed that the only fair assessment was their own judgement made on the strength of their knowledge and observations. Holders of each view felt the other would give skewed pictures of pupils. When asked, all teachers were convinced their judgements were fair, and many described how any statement of assessment was only a snapshot, and that pupils' knowledge and achievement were not permanently measurable qualities. Others talked of how they used as much evidence as they could to ensure fairness, but it was noticeable that these checks were always from other aspects of their own judgements rather than against evidence from elsewhere.

Differences in the teachers' own levels of mathematical knowledge did not seem to affect their assessment practices, or views of mathematics. Similar variations in practice were found among primary and secondary teachers. None of the above remarks apply more to one phase than another apart from a very slight increase in direct reference to mathematical thinking encountered among secondary teachers, and a decrease in the use of the word "confidence". More important was the similarity of problems about implementing the NC described by all teachers [Askew et al,1993]. More worrying than this agreement were the differences, such as those above, which could directly affect pupils' futures and which occurred in all phases.

These findings show that SEAC's view [1991], that teacher assessment should be a "combination of professional judgement and common sense in the use of available time" [p.19], is over-optimistic about the operation of both "professional judgement" and "common sense".

Further potential sources of inequity

The latter stages of the research revealed more causes for concern, and potential sources of inequity. It was shown that even committed, aware teachers were capable of making hasty judgements based on partial information and then viewing the pupil's subsequent behaviour in the light of their first impression, sometimes being reluctant to change even when there was a lack of evidence to support the view. The normal standards of inter-personal judgement which people use in their daily interactions appeared to be used also in the classroom. The presence of a researcher inevitably made the teachers more aware of the conclusions they drew, but even then their views of most of the focus pupils could be challenged, if not contradicted, by more detailed evidence. I commented above that teachers appeal to the breadth of their evidence to support their judgements, and of course this may be adequate if a teacher is prepared to change her mind. Tests were sometimes seen as a safeguard, yet often a pupil has already been offered a differentiated curriculum before the test so that an unfair decision may already have been made. Also it was found that a teacher could dismiss or excuse a test result which did not accord with her own judgement.

School moderation meetings could provide a safeguard of professional discussion about assessment decisions, but teachers' views and prior decisions about pupils were used as extra evidence in such meetings rather than as the focus of examination. A kind of circular self-justification was used to explain assessments so that "he's one of my brighter ones" (an assessment decision) was used to justify "so his work is an A grade" (another assessment decision). It is fair to add that most of the time moderation meetings were about agreeing the meaning of, and evidence for, certain criteria, yet arguments such as the one above were noticeable because they were the only times that teachers' informal judgements were mentioned. Only a few such meetings were visited, so I cannot generalise from the results, but the data support the raising of the issue of whether teachers' informal judgements, which carry so much weight for a pupil, are ever professionally examined, challenged and systematically justified in terms of mathematical achievement.

Can assessment of mathematics be fair?

This research shows that teachers, in good faith, can fail to act justly for individuals because of the nature of their informal judgements and, ironically, their emphasis on individual treatment of pupils. Perhaps attention to pupils as a group, rather than as individuals, might result in fairer decisions. Or perhaps "fairness" should be in terms of the community as a whole rather than individuals.

So if teachers' socially acceptable attention to individuals can result in injustice, how can this be ameliorated?

In the research it was found that teachers rarely showed awareness of the potential for flaws in their judgements. One step forward could be to become aware of these processes, so that judgements are doubted and teachers become self-critics, doubting themselves as well as the systems and requirements with which they work. Instead of this being a private activity, it could be part of professional life. Another step could be for teams of teachers to demand more justification of each other's decisions, and to be less accepting of commonplace phrases which may mask judgements. A further step would be to review all decisions frequently, avoiding circular self-justification which uses previous decisions to justify current ones, and possibly testing other possibilities to ensure that pupils have not been needlessly limited by what is offered to them.

This last suggestion indicates that accepted hierarchies of mathematical knowledge might be questioned as part of professional life. There are many examples in history, and in the data, and in Krutetskii's work [op cit.] to show that uniform progress in all areas is not necessarily a pre-requisite for mathematical success. Nevertheless school mathematics is usually organised as if it were.

Of course flawed judgements do not matter if they do not affect pupils' futures. Another area for change would be to avoid the incorporation of unexamined or interpersonal judgements, or judgements dependent on local circumstances, into high-stakes assessments. By this I mean avoidance of semi-permanent grouping or setting decisions which result in different curricula and hence different opportunities for progress, as well as summative assessments at the end of courses or in career selection. Increased setting in schools [Ofsted,1994], often partly dependent on teacher assessment decisions, is a manifestation of the frequency of high-stakes decision-making in mathematics.

It is recognised, by teachers in the case of external measures and through this research in the case of teachers' own measures, that all measures of mathematical achievement are flawed and hence potentially unjust if used to make decisions about futures.


Since teachers' judgements influence pupils' mathematical progress on many levels, in particular through the comparatively early selection and differentiation of treatment peculiar to mathematics teaching, it is crucial that these judgements are regarded as high-stakes decisions.

It is therefore recommended that:

It is my belief that such developments would enhance the professional life of mathematics teachers and improve mathematics teaching and learning.