Xenia Delieza
MONITORING AND EVALUATING THE KPG SPEAKING TEST
 

Introduction

Oral assessment of language proficiency is a complex and largely subjective process in which many variables or facets have been found to affect the quality and quantity of candidate language output and the rating of their performance. Ultimately, this threatens the validity, reliability and fairness of the oral test procedure. The role and linguistic behaviour of the interlocutor during the oral exam has been highlighted by many researchers as a major variable which can potentially affect candidate output and examiner rating. With this in mind, the KPG English team has introduced both systematic examiner training and examiner-conduct quality assessment, through the English Speaking Test Observation Project (ESTOP).

This project was launched in November 2005, aiming to identify whether and to what extent examiners follow oral test conduct rules, adhere to the test guidelines and carry out the oral test as instructed. In other words, by having specially trained professionals observe examiners while testing candidates, with the help of specially constructed observation tools, the English team wanted to obtain information about the efficiency of the oral test administration, about examiner conduct, about the applicability of the oral assessment criteria and about the usefulness of the marking grids. The information obtained has been essential for the development and refinement of the oral test and for the training and evaluation of oral examiners. The results from this first phase (November 2005) informed Observation Phase 2 in May 2006, which in turn led to four more observation phases: May 2007, November 2007, May 2008 and November 2008.

To date, six observation phases have been carried out, and for each one a new, refined observation form has been produced, based on the findings of the previous phase.

As one can see in the table below, during these six observation phases 1,948 oral examiners were observed examining 6,755 candidates.

PHASE 1: November 2005 (Levels B2 & C1), 25 observers
    B2      138 examiners    470 candidates
    C1       98 examiners    288 candidates

PHASE 2: May 2006 (Levels B2 & C1), 33 observers
    B2      155 examiners    540 candidates
    C1      118 examiners    418 candidates

PHASE 3: May 2007 (Levels B1, B2 & C1), 32 observers
    B1       35 examiners    132 candidates
    B2      156 examiners    588 candidates
    C1      105 examiners    342 candidates

PHASE 4: November 2007 (Levels B1, B2 & C1), 42 observers
    B1       50 examiners    201 candidates
    B2      177 examiners    753 candidates
    C1      100 examiners    339 candidates

PHASE 5: May 2008 (Levels A1-2, B1, B2 & C1), 48 observers
    A1-2     45 examiners    184 candidates
    B1       60 examiners    193 candidates
    B2      182 examiners    612 candidates
    C1      136 examiners    440 candidates

PHASE 6: November 2008 (Levels A1-2, B1, B2 & C1), 41 observers
    A1-2     51 examiners    113 candidates
    B1       55 examiners    154 candidates
    B2      187 examiners    659 candidates
    C1      100 examiners    329 candidates

TOTAL: 1,948 examiners, 6,755 candidates

Table 1: The KPG observation project in numbers
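The totals quoted above are plain sums over Table 1 and can be cross-checked with a few lines of Python; the figures in the snippet below are transcribed directly from the table.

```python
# Per-phase (level, examiners, candidates) figures transcribed from Table 1.
phases = {
    "Phase 1 (Nov 2005)": [("B2", 138, 470), ("C1", 98, 288)],
    "Phase 2 (May 2006)": [("B2", 155, 540), ("C1", 118, 418)],
    "Phase 3 (May 2007)": [("B1", 35, 132), ("B2", 156, 588), ("C1", 105, 342)],
    "Phase 4 (Nov 2007)": [("B1", 50, 201), ("B2", 177, 753), ("C1", 100, 339)],
    "Phase 5 (May 2008)": [("A1-2", 45, 184), ("B1", 60, 193),
                           ("B2", 182, 612), ("C1", 136, 440)],
    "Phase 6 (Nov 2008)": [("A1-2", 51, 113), ("B1", 55, 154),
                           ("B2", 187, 659), ("C1", 100, 329)],
}

# Sum examiners and candidates across all phases and levels.
examiners = sum(e for rows in phases.values() for _, e, _ in rows)
candidates = sum(c for rows in phases.values() for _, _, c in rows)
print(examiners, candidates)  # prints: 1948 6755
```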


How is observation conducted?

During observation, selected trained professionals are assigned to different examination centres to monitor the oral test, without interfering with the procedure in any way. Watching as third parties, observers fill in their forms before, during and after the oral test. The observation forms are designed so that each one is used for only one test session, and observers are instructed to monitor each examiner twice, i.e. with two pairs of candidates.

The project has so far been conducted in a random selection of examination centres around Greece,[1] where observers move from one examination room to another for as long as the examination sessions last, from morning to afternoon.

When their observation work is completed, observers send their completed forms to the English Team, which processes the information and analyses the data. Qualitative and quantitative results are included in a report which is taken into account by the speaking test development team, by those responsible for designing the next phase of observation and by those responsible for the examiner training programme.

The observation forms

The tools prepared for this project, i.e. the observation forms, are structured as checklists with specific categories and subcategories. Respondents circle YES/NO or tick each item, and there is also space for open-ended remarks next to certain items. The content of these forms helps the English Team elicit information about the candidates (age, sex, literacy level and how well they did on which tasks). More importantly, however, it is designed to elicit information about the examiners and their conduct: their choice of tasks, whether or not they used time effectively, how they applied the marking criteria, etc. Finally, the forms elicit information about examiners' language use and whether or not they alter task rubrics and thus interfere with candidates' language output.
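Purely by way of illustration, a completed checklist of this kind could be represented for analysis as one record per test session; in the sketch below every field name is hypothetical, merely echoing the categories described above rather than the actual items of the KPG forms.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch only: the field names are assumptions based on the
# categories described in the text, not the actual KPG form items.

@dataclass
class ObservationRecord:
    """One completed observation form, covering a single test session."""
    phase: str                                  # e.g. "Phase 6: November 2008"
    level: str                                  # "A1-2", "B1", "B2" or "C1"
    # Candidate information gathered by the form
    candidate_age: Optional[int] = None
    candidate_sex: Optional[str] = None
    # Examiner-conduct checklist: YES/NO items recorded as True/False
    followed_task_rubric: Optional[bool] = None
    used_time_effectively: Optional[bool] = None
    applied_marking_criteria: Optional[bool] = None
    # Open-ended remarks allowed next to certain items, keyed by item name
    remarks: dict[str, str] = field(default_factory=dict)
```

Modelling each form as a single record mirrors the rule that one form is used for only one test session, which keeps later aggregation by phase or level straightforward.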

A summary of some of the results[2]

The findings of the observation project proved valuable in many respects. Firstly, they verified what the English Team suspected regarding the frequency of examiner interventions and their potential effects on the validity, reliability and fairness of the test as a whole. Secondly, they highlighted the need to introduce changes in the examiner training programme so as to limit examiner intervention. Thirdly, the findings revealed that there is no clear divide between examiners who systematically intervene and those who do not; rather, interference depends on a number of factors, such as the candidates' level of competence and quality of performance, the stage of the test, etc. More specifically, the findings reveal that examiners most frequently change task rubrics (by using an introductory question, adding a question of their own or expanding the original question with added information) in the first activity of the lower level exams. The interpretation is that examiners tend to do this to reduce candidates' anxiety and to facilitate language output. In the other two activities of the B1 and B2 level exams, examiners also tamper with task rubrics, but less frequently than at the lower levels. These interventions mainly take the form of expanding the original task rubric or simplifying it through the use of examples, in order to help candidates understand task requirements and to ensure that they respond to the demands of the task. A general conclusion is that the higher the level of the oral test, the lower the intensity of examiner interference; during the C1 level speaking test, intervention is only sporadic.

The importance of the observation project for the KPG oral test

The information elicited through the ESTOP has proven extremely useful to the KPG test developers in many ways, especially because the results have contributed to the improvement of test content. In other words, the speaking tasks take into consideration, among other things, the results of the observation project. Furthermore, the guidelines for conducting the speaking test have been revised in light of the observation results. One important outcome was the introduction of an Interlocutor Frame to tackle the problem of examiner performance variation.

The ESTOP has been constructive on a variety of other levels too. For one thing, it has allowed the English Team to evaluate examiners' performance, which is very important since the ultimate aim of the system is to establish and maintain a certified body of trained examiners. For another, insights from the project have been crucial for the preparation of examiner training material.

For all the reasons above, and for others that will be discussed in future publications, structured observation has proven to be a highly functional and efficient way of monitoring and assessing both the speaking test and its examiners.

References

Karavas, E., & Delieza, X. (2009). On site observation of KPG oral examiners: Implications for oral examiner training and evaluation. Apples (Journal of Applied Language Studies), 3(1), 51-77.

[1] The choice of examination centres was based on the number of examiners and candidates assigned to each centre. The aim was to cover centres all over Greece, but those with more examiners and candidates were preferred, as they were expected to yield more data.

[2] For a more detailed presentation of results see Karavas & Delieza (2009).
