Kia Karavas


This paper focuses on the issues of fairness and ethics in language testing – issues that have recently become priorities in the language testing agenda – and how the KPG exams deal with issues of fairness and ethics in the process of their development and implementation. After discussing the origins of ethical and critical language testing and their implications for the design and development of language tests, a rationale for the development of codes of ethics and codes of practice is provided. Two examples of internationally known codes of ethics and practice (ILTA and ALTE) are presented. It is argued that there are inherent limitations with the applicability and local relevance of internationally developed codes of ethics or practice. These limitations mainly relate to the nature of what is considered ethical, and how absolute ‘ethicality’ can be, and to the enforceability of codes. Given the nature of these limitations, the need for local examination systems to develop their own code of ethics based on universalist testing principles relating to test quality, validity and reliability, public accountability and responsible professionalism is highlighted. Finally, a detailed presentation of the code of ethics adopted by the KPG exam system and how the KPG exams adhere to principles of accountability, professionalism and transparency is provided. [1]

Keywords: ethical language testing, critical language testing, Codes of Ethics, Codes of Practice, universalist testing principles.

1. Ethical issues in language testing: When did it all begin?

Concern with fairness and ethics in testing is not a recent development in the sphere of educational assessment. In North America, concern with the development of testing standards began in the 1960s, because of litigation cases showing the differential treatment of African-American children as a result of their test scores. In fact, Fulcher and Bamford (1996) discuss examples of court cases arguing test bias against various ethnic groups. In the field of language testing, the issue of ethics began to slowly surface in the early 1980s with the works of Spolsky (1981) and Stevenson (1981), who highlighted the political purposes and consequences of tests, and with Messick’s (1989) influential expanded theory of validity, which explicitly linked ethics and validity. Messick’s framework of validity, which has become a reference for discussions, research, and practice in educational measurement, incorporates the issue of ethics into the concept of construct validity. Messick distinguished between ‘the consequential basis for test interpretation’ (i.e. what is commonly understood as the washback effect) and ‘the consequential basis for test use,’ which he described as the impact of the use (and misuse) of tests that had significant consequences for test takers and society. According to his expanded framework of validity, the construct validity of score interpretations forms the basis upon which considerations of values, uses and consequences of tests are based (see McNamara, 2001; Bachman, 2000 for further discussion).

However, it was only in the 1990s that the issues of fairness, ethical language testing, and accountability became priorities in the agendas of professional testers and testing bodies. The 1990s represented a ‘watershed’ in language testing (Douglas, 1995) since it was during this decade that the language testing field witnessed both: a) a rapid growth of discussion and writings amongst language testers and b) increased sensitivity and awareness by testing bodies about the role, impact and consequences of tests on candidates, on education, and on society (Hamp-Lyons, 2000). The surge of interest in fairness and ethics in assessment has also been spurred by our increased awareness of language use as a socially purposeful activity, and of testing as an institutional practice which serves various policy agendas and can act as a lever for social and educational reforms.

Interest in the social character of educational assessment and its consequences has also been accompanied by an expanded concern for professionalism in the field. This concern reflects the need to define professional conduct in language testing and, according to McNamara and Roever (2006: 137) ‘ … is also a sign of language testing coming into its own as a profession rather than being a subordinate component of language teaching or general psychometrics.’

Professionalism is closely linked to ethics since, in essence, being professional entails abiding by the principles of ethical conduct as laid down by your profession. As House claims (1990: 91 cited in Boyd and Davies 2002: 302), ‘ethics are the rules or standards of right conduct or practice, especially the standards of a profession.’ In many professions (such as law, medicine and psychology), codifications of ethical and professional conduct are published as ‘codes of ethics,’ ‘codes of practice’ or ‘standards;’ these codes (phrased as statements or guidelines) tend to elaborate the moral responsibilities of members of the profession. Admission to the profession and practice depends on adherence to an ethical code which, if violated, results in sanctions for the members of the profession (see McNamara and Roever, 2006).

Shohamy (2001b) extends the concept of professionalism in language testing and argues for the need for responsible professionalism, which entails both: a) sharing authority among, collaborating with, and involving different stakeholders (including test takers) in the test development process, and b) striving to meet the various criteria for validity. One way of sharing authority, according to Boyd and Davies (2002), is to agree and publish a code of ethics, a practice which many language testing organizations have increasingly followed in recent years. But why is a code of ethics necessary in language testing?

2. The power of tests and the need for codes of ethics and practices

The power that tests have is discussed eloquently by Shohamy (2001a: 374). In her words, ‘Tests play a major role as their results have far-reaching implications for individuals and educational systems; tests can create winners and losers, successes and failures, the rejected and the accepted. Yet, it is often performance on a single test, on a single occasion, at one point in time that leads to the irreversible and far-reaching high stakes decisions.’

Bachman (2000) likewise agrees that tests are not value-free, culturally neutral tools developed in psychometric test tube environments. They are, in fact, powerful tools with far-reaching and sometimes detrimental consequences for individuals and society at large. Test scores, for one thing, can be used for admission to universities or to a profession, as well as for employment decisions and grants of citizenship. Bourdieu (1991 cited in Shohamy, 2001a) highlights the symbolic power of tests, arguing that they can be used as rites of passage, as a means of socializing the public, and as a way of creating a dependency of test takers on tests as a main criterion of worth. Tests represent ‘cultural capital,’ while the knowledge contained in them represents, in the minds of test takers and users, what counts as legitimate, worthwhile knowledge; thus, tests can provide the means for controlling knowledge.

Furthermore, as Shohamy (2000, 2001a, b) maintains, tests can be introduced in unethical ways for disciplinary purposes and for carrying out various policy agendas, perpetuating the domination of those in power and excluding unwanted groups. In her 2000 article, she provides various examples of the unethical and undemocratic uses of tests for excluding groups from various cultural and linguistic backgrounds. Shohamy, in fact, is one of the ardent proponents and instigators of critical language testing. In her 1997 plenary address to the American Association of Applied Linguistics, she essentially defined the critical perspective within the field of language testing, while in her 1999 article, she identified 15 democratic principles that should underlie language testing. What Shohamy had defined as critical language testing has actually grown into a powerful movement in the USA that critiques tests and testing practices (see www.fairtest.org).

Critical language testing has certainly made an important contribution in terms of both: a) alerting test developers and administrators to the impact and consequences of the decisions they make on the basis of test scores, and b) raising the awareness of test developers about the power they hold (Hamp-Lyons, 2000). Critical language testing has given rise to constant questioning of the normative assumptions of language testing and applied linguistics in general, and has called attention to the political uses and potential abuses of language tests (Lynch, 2001; Bachman, 2000). It has clearly articulated and highlighted the need for language testers to become aware of the principles of and strictures on the use of the power that they hold, as well as the need for testers and candidates to critically analyse the ways in which tests are used, and to ensure that these uses are fair and ethical (McNamara, 1999 in Hamp-Lyons, 2000).

However, it should be pointed out that proponents of critical language testing engage in constant questioning in an attempt to address issues of political oppression and social domination as regards the use of high stakes exams, which may result in excluding groups of individuals from society on the basis of their ethnic or cultural background. Much of the critique of critical language testers relating to the unethical and unintended uses of tests has also been made with reference to international language testing organizations which attempt to develop tests of the ‘one-size-fits-all’ type: that is, tests which purport to be relevant, valid and applicable to all contexts and for all test users regardless of their educational, socioeconomic and cultural background. The very act of developing one-size-fits-all tests and of adopting international exams in situations where the stakes are high, or doing the exact opposite, is based on political decisions and is, therefore, a political act (cf. Shohamy 1993, cited in Bachman, 2000).

Educational assessment and language testing in particular is certainly a ‘socially embedded activity’ (Stobart, 2005) which should be analysed and critiqued with reference to the cultural and political context in which it is developed and within which it operates. It is undoubtedly the responsibility of every test developer to become aware of the potential misuses of tests, their intended and unintended consequences on the lives of the candidates, and to ensure that the procedures followed in the development of tests reflect ethical practice and lead to fair competition.

Fairness, though, is not integral to the notion of validation of test instruments, as was once believed. A valid test does not necessarily guarantee that the test is fair. Fairness does not only relate to the technical aspects of test development, but to the wider social context in which test development, implementation and administration are undertaken. As Kunnan (1999) states, this type of fairness is left untouched by concerns of validation and relates to ‘how instruments and their score interpretations work in society’ (ibid: 238).

Certainly, one way of addressing and overcoming the potential misuses of tests and ensuring that tests are fair not only on the test level but also in the broader context of their use is through the development of Codes of Ethics. Codes or Standards (a term mainly employed in the US educational context) form the basis for monitoring and evaluating testing practices. According to Alderson, Clapham and Wall (1995: 236), codes are in essence ‘an agreed set of guidelines which should be consulted and, as far as possible, heeded to in the construction and evaluation of a test.’

Codes of Ethics are sets of principles or guidelines that ‘draw on moral philosophy and are intended to guide good professional conduct’ (Boyd and Davies, 2002: 304). They are based on concepts of what constitutes moral behaviour, and represent a one-sided contract between the profession and all other interested stakeholders, i.e. a statement of the profession’s own moral position.  This statement is intended to assure the public that the profession subscribes to high ethical standards, and that the tests it develops adhere to clearly articulated principles rendering them valid and fair.

Codes also aspire to guide professionals in their decision-making, and ‘serve a self-policing function [since] members who violate them are sanctioned’ (McNamara and Roever, 2006: 138). Codes of ethics, as Boyd and Davies (2002) remind us, are regulative ideals, not prescriptions or instructions on how to act in every circumstance. They are neither a statute nor a regulation, but offer ‘a benchmark of satisfactory ethical behaviour for all those in the profession’ (ibid: 306).

In language testing, as in other professions, a common practice is to develop two codes: a code of ethics, and a code of practice. The former focuses on the morals and ideals of the profession, while the latter instantiates the principles of the code of ethics and provides more specific guidelines and requirements for practice which clarify the behaviours that lead to professional misconduct or unprofessional conduct. Some organizations combine general ethical principles and guidelines for practice in one code (e.g. the American Psychological Association Guidelines).

3. Examples of codes of ethics and practice in language testing

In the area of educational measurement, various large-scale testing bodies and organizations have developed codes of ethics and practice (or combined codes) in an effort to uphold standards in the profession and to ensure that tests are fair, valid and ethically used. Some of the most well-established and internationally known codes include:

  • The Standards for Educational and Psychological Testing (1985), also known as the ‘APA Standards’. These were developed in the US by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME).

  • The Code of Fair Testing Practices in Education (1988). A set of standards based on the APA Standards, focused on refining many of the APA principles in order to improve testing practices. This code was developed by the Joint Committee on Testing Practices, comprised of members of the AERA, APA, NCME, the Canadian Psychological Association, and 23 test publishers.

  • The ETS Standards for Quality and Fairness (1987). These were developed by the Educational Testing Service in the US and evaluated by the APA Standards Evaluation Committee.

  • Standards for Educational Testing Methods (1986). These were adapted from the Standards for Evaluation of Educational Programs, Projects and Materials (1981) by Nevo and Shohamy (1986).

  • The SEAC’s Mandatory Code of Practice (1993). This was developed by the Schools Examination and Assessment Council (SEAC) in the UK in order to monitor procedures in examination development in the context of the National Curriculum in England and Wales.

The nature and principles of all the above standards and codes, developed within the wider context of educational testing in the US and the UK, are not of immediate concern in this paper,[2] which focuses on language testing in particular. However, it is useful to briefly present and discuss the main principles of two of the most well-known internationally used codes of ethics and practice for language testing, the ILTA Code and the ALTE Code.

As will become evident from the descriptions below, although both codes are addressed to language testing organizations and individual language testers, there are significant differences in the nature and content of the Codes and the level of detail in which the principles are explained. Thus, the ILTA Code is essentially a code of ethics, while ALTE has opted for a code of practice. The former is addressed to international language testing organizations and individual language testers, while the latter is intended for European language testing bodies.

ILTA’s Code stipulates nine ethical principles which draw on moral philosophy and which reflect the ideals of the profession. These principles focus on the language testers’ moral obligations towards the candidates, the profession, and the wider society in which tests are used. Principles are phrased as guidelines, and each is broken down into a series of statements which elucidate the test developers’ moral obligations. The ALTE Code identifies the responsibilities of test developers and test users (i.e. bodies which select/commission examinations and make decisions on the basis of test results); these responsibilities, phrased as imperatives, define exactly the actions test developers/users must take in order to ensure good practice and to uphold the standards of the profession.

3.1 The ILTA Codes of Ethics and Practice

This Code was developed by the International Language Testing Association and consists of a Code of Ethics (2000), adopted at the annual meeting of ILTA in Vancouver, March 2000, and a draft Code of Practice (2005). The ILTA Code is addressed to individual language testers and testing institutions, and offers a benchmark of satisfactory ethical behaviour for all language testers. According to the document, it ‘is based on a blend of the principles of beneficence, non-maleficence, justice, a respect for autonomy and for civil society.’ The ILTA Code identifies nine fundamental Principles, each elaborated and explained through Annotations. The Annotations also elaborate on the Code’s sanctions, highlighting the fact that failure to uphold the Code may result in penalties (i.e. withdrawal of ILTA membership).

The first three principles of the Code focus on the protection of the candidate and the relationship between tester and candidate i.e., a) testers must not discriminate against candidates or abuse the power they hold over them, b) testers must respect the confidentiality of candidates’ personal information, and c) testers must abide by ethical principles when conducting test-related research that involves candidates.

The following six principles relate to the language tester’s responsibilities towards the profession and society at large. Test designers: a) must maintain and update their skills, b) must train others in principles of good practice, c) must uphold the integrity of the language testing profession, d) must improve the quality of the language testing profession and make its services widely available to others, and e) may withhold their professional services on grounds of conscience when tests have negative effects or repercussions on stakeholders. In order to illustrate the nature of the ethical principles contained in the ILTA Code of Ethics, an extract from the Code is included in Appendix 1.

The ILTA Code of Practice deals mainly with test design, administration and score interpretation. The code also lists ten rights of test takers, which include their right to be informed about the test and its consequences and the right to be treated fairly, as well as ten responsibilities of test takers, which mainly relate to their responsibility to actively seek information about the test and to be cooperative during test administration.[3]

3.2 The ALTE Code of Practice

The ALTE Code of Practice was developed by the Association of Language Testers in Europe, an Association formed in 1990 by representatives of eight institutions. ALTE now comprises 31 institutions representing 26 European languages.

This Code is closely modelled on the JCTP Code mentioned above, and is divided into two parts: Part 1 lists the responsibilities of test developers, who are ALTE members, while Part 2 lists the responsibilities of test users or, more precisely, score users (i.e. those who make decisions which affect the educational possibilities and careers of others on the basis of examination results). The responsibilities of ALTE members relate to the development of examinations, the interpretation of exam results, the need to strive for fairness, and the need to inform candidates (see Appendix 2 for the list of responsibilities of test developers identified in the ALTE Code of Practice, Part 1). Responsibilities of score users include guidelines for the selection of appropriate examinations, the correct interpretation of scores, striving for fairness, and informing candidates. In a short document of only three printed pages, the responsibilities are expressed as brief statements about what each group is expected to do in relation to the categories mentioned. This Code has since been revised and is now part of a larger quality management process.

ALTE has also published a brief list of Minimum Standards for establishing quality profiles in ALTE examinations, which comprises 17 minimum standards grouped under: test construction, administration and logistics, marking and grading, test analysis, and communication with stakeholders (see Appendix 3 for the Minimum Standards)[4].

4. The dangers and limitations of codes

There seem to be two main limitations with the development and implementation of codes of ethics and practice. One relates to the question of what counts as ethical, and the second to the enforceability of codes of ethics.

Certainly, codes of ethics reflect the profession’s concern with the greater good and its desire to be moral and just. However, what is considered ‘ethical’ or ‘moral’ in one context is not necessarily considered so in another.  Different cultures clearly have different conceptions of morality and acceptable behaviour. Morality can never be absolute, and ethical principles cannot be universally applicable.

Members of the ILTA organization faced great difficulty in reaching agreement on a common set of ethical principles and practices applicable across a range of different political and cultural contexts (Boyd and Davies, 2006). As a result, the principles are phrased in broad and general terms and are susceptible to various interpretations, while the annotations that accompany them leave no room for negotiation.

In essence, codes of ethics reflect a top-down approach to ‘quality assurance’ which cannot by nature take into account and cater for all contexts, local differences and exigencies, and which cannot accommodate all possible circumstances and situations worldwide. An example will make this limitation of codes of ethics clear.

The fourth bullet of the Annotation to Principle 7 of the ILTA Code of Ethics states: ‘Language testers shall be prepared to work with advisory, statutory, voluntary and commercial bodies that have a role in the provision of language testing services.’ Taking the KPG test developers as an example: The director of the test development team, who is a public servant (professorial staff of a state university), is appointed by the Ministry of Education; thus, s/he must abide by the rules and regulations concerning the professional conduct of public servants in the Greek context. By law, members of the test development team are not allowed to cooperate and work with any commercial body involved in education or language testing for commercial or symbolic profit. In this case, the local code of ethics which determines the team’s professional status in Greece clashes with a principle of the code of ethics for language testers worldwide.

A second example: The third bullet of the Annotation to Principle 2 (see Appendix 1) states:  ‘Similarly, in appropriate cases, the language tester’s professional colleagues also have the right to access data of candidates other than their own in order to improve the service the profession offers. In such cases, those given access to data should agree to maintain confidentiality.’ Taking the KPG test development team (and the ethical framework within which they are obliged to operate in Greece) as an example once again, it should be pointed out that the project team is not allowed to use, access or share data regarding test takers since this data is strictly confidential and is protected by the Ministry of Education. Thus, if another testing body requests access to KPG test taker data, in abiding by the local ethical code, it will not be possible to make such data available. Naturally, this does not mean that the KPG project team is ‘unethical’ or that its members do not wish to contribute and ‘to improve the service the profession offers.’

Indeed, codes of ethics, developed almost in their entirety in Western contexts, can be accused of promoting and imposing hegemonic ethics. Some authors (Boyd and Davies, ibid.; McNamara and Roever, 2006; Bachman, 2000; Bachman and Palmer, 1996) have highlighted the need for the development of local codes of ethics (i.e. codes developed by local testing professionals that adopt universalist testing principles with a local gloss) in order to overcome the problem of cultural relativism. This solution, though, is fraught with other difficulties. In many local contexts, there is no critical mass of professional language testers, which is needed to develop an examination body that could issue such codes. A number of important questions are therefore raised: If local codes are developed, who will oversee and evaluate the validity and usefulness of these codes? How will enforceability of these codes be guaranteed? Who will issue sanctions in cases where local codes are not followed?

Another problem with codes of ethics relates to their implementation and enforceability. Boyd and Davies (2002: 305) make an eloquent point about the potential inherent hypocrisy of codes by stating: ‘Codes can be viewed in two opposing ways: positively they represent a noble attempt by a profession or institution to declare publicly what it stands for; negatively, they can be seen as a sophisticated pretence, a vacuous statement of pretend ideals behind which the profession feels even safer than before, now that it can lay claim to a set of unimpeachable principles. And if accountability remains in house, then there is no check on adherence to the principles.’

Unlike other professions (e.g. medicine and law), in ‘weak’ professions like language testing, there is no external organization that grants permission and the right to practice, and there are no serious sanctions against individuals or institutions that violate or do not follow codes of ethics (McNamara and Roever, 2006). Indeed, there are many well-known and respected testing bodies and organizations which do not follow codes of ethics and practice (see examples of such cases with known UK examination boards in Fulcher and Bamford, 1996). Alderson, Clapham and Wall (1995) conducted a survey of 12 examination boards in the UK in order to illustrate current practice in language testing, and they found that testing practices varied considerably in areas such as:

  • item pretesting (the researchers found a widespread absence of pre-testing in some cases but not in others),
  • test validation (some boards lacked any evidence whatsoever of the empirical validation of their tests, while others did not),
  • procedures ensuring the equivalence of different versions of exams,
  • training and monitoring of administrators,
  • double marking of all scripts (only in rare cases did the researchers find that this took place),
  • availability of data on the reliability of the exams and of marking (reliability amongst most boards was asserted rather than measured).

The results of the survey led the researchers to conclude that information about the exams is not readily and publicly available. They reported that it took a great deal of effort and time on their part to get as far as they did. ‘This should not be necessary,’ they said. ‘If evidence is available to support claims of test quality, that evidence should be made publicly available,’ and they continue ‘ … it appears that the different boards involved in EFL testing do different things, with differing degrees of rigour, to monitor the quality of their examinations’ (Alderson, Clapham and Wall ibid: 258).

Given the limitations of codes of ethics discussed above, the way forward, I believe, is for professional language testers and language testing bodies, especially those working in local contexts, to adopt an approach which centres on the following three principles, principles which are common to all codes of ethics:

a)   adherence to universalist principles relating to test quality, validity and reliability

b)   public accountability which entails transparency and openness about design, constructs, procedures, and scoring to all stakeholders

c)   responsible professionalism, as defined by Shohamy (2001b), which requires shared authority, collaboration and involvement of stakeholders (including candidates).

In addition, I believe that any large-scale test is susceptible to practical, financial and political factors and local contingencies; compromises between these factors and the test design process are often inevitable but, as Alderson, Clapham and Wall (1995) argue, these compromises need to be as principled as possible. True ‘fairness’ in language testing can never be achieved, since fairness is a relative, qualitative concept reflecting a value judgment. What language testers should strive for is ‘equitability,’ and I fully agree with Spaan (2000: 35), who argues that ‘this equitability implies joint responsibility of the test developer, test user and the examinee in a sort of social contract in which the developers promise to maximize validity, reliability and practicality of application. It is the developer’s responsibility to educate the users by providing readable and understandable interpretations and guidelines for use of their products, including the provision of norms and scoring criteria. The developer must be able to justify any claims about the test and furthermore the developer must also solicit feedback from users and examinees.’

5. Accountability, professionalism and equitability in the KPG exams

The KPG examination system does not follow a code of ethics or code of practice developed by an external international organization, but it has seriously considered international codes of ethics and practice in order to develop its own ‘glocal’ code of ethics and practice. Since 2003, when the exams were first administered in four languages, the language project teams have had to follow the principles of accountability, professionalism and equitability developed by the Central Examination Board. These principles are realized through the exam specifications and the Ministry-stipulated regulations concerning the test development process and exam administration, which are publicly available and common to all languages. This code of ethics offers a local gloss on global language testing principles (hence ‘glocal’ in nature) relating to quality and ethical language testing. It places emphasis on establishing and ensuring the validity and reliability of the exams, assessing their quality, and involving stakeholders in their development (principles not elaborated upon in the ALTE Code).

How the KPG exams in English follow and realize these principles is described in Section 5.1 below. Where appropriate, reference will be made to KPG’s adherence to the guidelines for good practice presented in the Minimum Standards for establishing quality profiles in ALTE examinations (Appendix 3). As will become evident in the presentation below, the KPG exams meet all the minimum standards, which are denoted by a number after each item (e.g. MS:1, MS:5; see Appendix 3), and fulfil many more not included in the ALTE profile. As will be demonstrated, through the established and commonly agreed upon procedures for the development of the KPG exams, the principles of accountability, professionalism and equitability have become cornerstones of the KPG examination system.

5.1 The KPG test development process

5.1.1 Test purpose and specifications

The National Foreign Language Exam System (Kratiko Pistopiitiko Glossomathias – KPG) aims at measuring levels of competence or proficiency in English, French, German, Italian, Spanish and Turkish. The overall aim of the KPG exams at all proficiency levels is to test the candidates’ ability to make purposeful use of the target language at home and abroad. The specific abilities tested depend on the level of proficiency of each exam.

Exam specifications have been developed by the Central Examination Board, in collaboration with testing professionals at Greek universities, and are common to all languages assessed through the KPG exams. The Central Examination Board (henceforth CEB) is responsible for approving the test paper content before it is disseminated to the state-selected exam centres, and for the specifications regarding the exam format, structure and scoring regulations. Additionally, the CEB functions in an expert consulting capacity, advising the Ministry on matters regarding the development and growth of the system, exam policies, law amendments, and new and revised regulations.

The specifications developed by the CEB for each exam level include information on:

  • the aims and nature of the exams (i.e. the theory of language and principles permeating the tasks, selection of materials, criteria for scoring of all test papers of the KPG exam). (MS:1)

  • candidates’ profile (i.e., characteristics of candidates to whom the different level exams are addressed). (MS:2)

  • the structure of the exam and the uses of language that each test paper (module) aims to assess.

  • descriptors of communicative performance expressed as can-do statements based on the CEFR.  (MS:5)

  • degree of difficulty of the level of the exam.

  • distribution of items per test paper.

  • duration of each test paper and length of texts therein.

  • weighting of marks for each test paper.

  • text types for comprehension and production.

  • a typology of activity types per test paper.

Exam specifications for each level are accompanied by sample tests and answer keys. These specifications are made public and are easily accessible (see Section 5.3 below) through the Ministry of Education and Religious Affairs website (http://www.minedu.gov.gr/eksetaseis-main/kpg-main/) and the RCeL website (http://www.rcel.enl.uoa.gr/) (MS:16).

Moreover, rating scales and criteria for the assessment of oral and written language production (with detailed descriptions) for all levels of the exams have been developed. Rating scales are made public to candidates and teachers (MS:16). The detailed descriptions of assessment criteria are made available to multipliers (i.e. the trainers of oral examiners), oral examiners, raters, and coordinators of the marking centre.

5.1.2 Test design, item writing and pre-testing

A preliminary needs analysis survey involving employers, students and teachers throughout Greece was carried out, the results of which informed the overall content of the exams. In the design of tasks and the selection of materials (e.g. visuals and multi-modal texts), the following specific characteristics of candidates, which are identified in the exam specifications, are seriously taken into account: linguistic background, language-learning background, level of literacy, age, and socio-cultural background. These are systematically investigated at every administration with questionnaires distributed to candidates. (MS:2)

Clearly laid-out guidelines for the design of tasks and the selection of materials for each test paper and level of the exam have been published for item writers. These guidelines are accompanied by samples of appropriate tasks and materials which have been calibrated to CEFR level descriptions. (MS:3, MS:5) Apart from the item writers, there are also two Test Developers (who can be University Professors, School Advisors or highly qualified teachers) per language, appointed by Ministerial decree for two years. The foreign language departments of the University of Athens and the University of Thessaloniki and their specialist language groups are responsible for the development of test tasks in the languages that the KPG exam battery offers: the University of Athens is responsible for English, German, Spanish and Turkish, and the University of Thessaloniki for French and Italian. Each department also selects an Academic Director, who is responsible for a) the rating process and b) the training of the raters and the examiners.

The item writers and Test Developers are experienced language teaching professionals who receive training and feedback from the specialist language groups. Moreover, there are systematic procedures for review, revision and editing of items and tasks to ensure that they match test specifications and comply with item writer guidelines. (MS:3) All items, tasks and task instructions designed by item writers are evaluated systematically (in terms of their adherence to the illustrative descriptors and can-do statements, the test specifications, and their clarity) by specialist language groups comprising assessment specialists. Items and tasks are returned with feedback from the teams to the item writers for revision, and the process continues until all items and tasks are judged by the specialist teams as satisfactory. They are then included in the electronic item bank. Before each exam administration, the test items for all test papers are also evaluated by the Central Examination Board. (MS:3) More specifically, for the development of each test paper of the exam, the following procedure is followed:

       1.      Initial version of test paper: screening by inspector(s) for approval
       2.      Test trial run by test development team
       3.      Test revised
       4.      Revised test paper piloting
       5.      Test re-revised to final version
       6.      Evaluation of test paper by (3) team experts
       7.      Approval of test paper by the Central Examination Board

5.1.3 Validity and reliability

Exams are assessed a priori and a posteriori for reliability and validity through item analysis of both piloted materials and actual exam materials. The results are discussed by all project team members, the test development team and the members of the Central Examination Board. After each exam administration, content and form features are compared from level to level, test takers’ scores on different test papers are compared, the difficulty of each exam paper is investigated, and test takers’ scores on each test paper are related to the final result. Alongside classical item analysis, test paper validity is also investigated using the Rasch model. (MS:4)
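As a concrete illustration of the Rasch model mentioned above (the parameter values below are invented for illustration and are not taken from KPG data), the model expresses the probability of a correct response as a function of the gap between candidate ability and item difficulty:

```python
import math

def rasch_probability(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is the candidate's ability and b the item's difficulty,
    both on the same logit scale."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

# A candidate whose ability exactly matches the item's difficulty
# has a 50% chance of answering correctly.
print(rasch_probability(0.0, 0.0))            # 0.5
# A stronger candidate facing the same item has better odds.
print(round(rasch_probability(1.5, 0.0), 2))
```

Fitting the model to live response data estimates these ability and difficulty parameters jointly, which is what allows test papers to be compared across administrations.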

More specifically, systematic item analysis for all test papers is carried out by calculating item difficulty and item discrimination. Moreover, systematic research involving candidates’ perceptions of item and task difficulty on all test papers of the exam is carried out during each exam administration through specially designed questionnaires completed by candidates after the exam.[5] (MS:13 and MS:14)
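The classical item statistics referred to above (item difficulty and item discrimination) can be sketched as follows; the data and function names are illustrative and do not come from the KPG system:

```python
# Classical item analysis for dichotomously scored items:
# responses are coded 1 (correct) / 0 (incorrect).

def item_difficulty(responses):
    """Facility value: the proportion of candidates who answered
    the item correctly (0.0 to 1.0)."""
    return sum(responses) / len(responses)

def item_discrimination(responses, totals):
    """Point-biserial correlation between the item score and each
    candidate's total test score: high values mean stronger candidates
    tend to get the item right, i.e. the item discriminates well."""
    n = len(responses)
    mean_r = sum(responses) / n
    mean_t = sum(totals) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(responses, totals))
    var_r = sum((r - mean_r) ** 2 for r in responses)
    var_t = sum((t - mean_t) ** 2 for t in totals)
    return cov / (var_r ** 0.5 * var_t ** 0.5)

# Five hypothetical candidates: their answers to one item, and their totals.
item = [1, 1, 0, 1, 0]
totals = [48, 45, 20, 40, 25]
print(item_difficulty(item))
print(round(item_discrimination(item, totals), 2))
```

Items with extreme difficulty values or low (or negative) discrimination are the ones flagged for revision in review cycles of the kind described in Section 5.1.2.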

Moreover, reliability of marking is monitored and assessed through systematic inter-rater reliability checks during and after each exam administration. For the speaking test, inter-rater reliability is assessed through the data collected by observers, who note the mark awarded to each candidate by each observed examiner. On the observation form, the observers (who are multipliers, i.e. experienced and trained trainers of oral examiners) also note their own mark of candidates’ performance. These data are then collected by the RCeL, where inter-rater reliability estimates are calculated. For the writing tests, coordinators of the marking centre systematically monitor markers’ application of the assessment criteria and the marks assigned to scripts. All scripts are routinely double marked. Cases of discrepancy between markers of the same scripts are noted, and markers are alerted to the discrepancies (also see Section 5.1.4 below). After the marking process has been completed, both marks on all candidate scripts are collected, and inter-rater reliability estimates are calculated.[6]
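Inter-rater reliability estimates of the kind described above can be computed in several ways; the sketch below (with hypothetical marks, not KPG data) shows two simple estimates for double-marked scripts, an exact-agreement rate and a Pearson correlation:

```python
# Two simple inter-rater reliability estimates for double-marked scripts.

def exact_agreement(marks_a, marks_b):
    """Proportion of scripts on which the two raters assigned
    exactly the same mark."""
    agree = sum(1 for a, b in zip(marks_a, marks_b) if a == b)
    return agree / len(marks_a)

def pearson(marks_a, marks_b):
    """Pearson correlation between the two raters' marks: do the
    raters rank candidates consistently, even when their absolute
    marks differ slightly?"""
    n = len(marks_a)
    ma = sum(marks_a) / n
    mb = sum(marks_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(marks_a, marks_b))
    va = sum((a - ma) ** 2 for a in marks_a)
    vb = sum((b - mb) ** 2 for b in marks_b)
    return cov / (va ** 0.5 * vb ** 0.5)

# Hypothetical marks from two raters on the same six scripts.
rater1 = [12, 15, 9, 18, 11, 14]
rater2 = [13, 15, 8, 18, 10, 14]
print(exact_agreement(rater1, rater2))   # flags discrepant pairs for follow-up
print(round(pearson(rater1, rater2), 2))
```

A high correlation combined with a modest exact-agreement rate, as in this toy example, would suggest the raters rank scripts consistently but apply the scale slightly differently, which is exactly the pattern that coordinator feedback sessions are meant to address.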

Reliability of marking and adherence to exam guidelines is also ensured through the systematic training of multipliers, oral examiners, marking centre coordinators and markers.[7] (MS:11 and MS:12)

5.1.4 Test administration and monitoring

The KPG exams are administered exclusively in Greece and are governed by the Ministry of Education and Religious Affairs, which, through its Department of Foreign Language Knowledge Certification, is responsible for administering the exams and issuing the respective certificates. The Ministry uses the infrastructure, technical and administrative support of the university national entrance exams. State schools throughout the country serve as the official examination centres for the KPG exams. The same procedures followed for the administration of the national university exams are followed for the administration of the KPG exams.

Exam centres operate throughout Greece in every area where there are 10 or more candidates. On average, around 200 exam centres are used for each exam administration. There are also two specially equipped centres for candidates with learning disabilities and special needs, one in Athens and one in Thessaloniki. These two centres are staffed with specially trained personnel and trained and experienced oral examiners conduct the speaking test. (MS:6, MS:7 and MS:10)

Each exam centre has an Exam Centre Committee. The Ministry of Education, in collaboration with the language project teams, has developed and published guides for KPG exam centre committees, detailing the rules and regulations for the administration of the exams. Committees are in direct contact with the Ministry of Education during the exam administration, reporting on any problems or difficulties faced. (MS:6 and MS:8)

After candidate scripts have been rated, the results of the rating process are collected by the Ministry. The results are then forwarded to the Specialist Language Groups of every language, the Central Examination Board and the Department of Foreign Language Knowledge Certification. The final results are announced by the Ministry of Education and Religious Affairs and appear in a table validated by the Minister of Education, which includes the names of all successful candidates. Candidates can be informed of their final scores by the Department of Secondary Education in which they enrolled for the exams, or via the Internet (the KPG website) using their individual access code. Candidates who have passed the exams can receive their certificates from the Departments of Secondary Education in which they registered for the exams. (MS:9, MS:15)

Monitoring of the speaking test is carried out through the KPG Observation Project. The Observation Project was launched in November 2005, and has been on-going ever since, in an attempt to identify whether and to what extent examiners adhere to exam guidelines and the suggested oral exam procedure. The initial overall goals of the Observation Project were to gain information about the efficiency of the oral exam administration, the efficiency of oral examiner conduct, applicability of the oral assessment criteria and inter-rater reliability. (MS:11 and MS:12)

The observation forms, used by trained observers, elicit information on the seating arrangements in the examination room, procedure followed, the candidates’ age and sex, the choice of questions and tasks made by the examiner/interlocutor, ratings by the two examiners and the observer, duration of the oral exam, time allocation to different activities and to the two candidates, and overall assessment of the examiner’s oral performance as ‘Excellent-Very good’, ‘Good’ or ‘Mediocre-Poor.’[8] (MS:12)

Additionally, monitoring of the marking process for the writing test paper is carried out by trained coordinators at the two rating centres (one in Athens and one in Thessaloniki), both of which are chaired by the CEB. In the Rating Centres, coordinators of the script raters monitor the rating process and guide the script raters. There is also an administrative committee and a secretariat comprising employees of the Ministry’s Department of Certification. The Rating Centres gather the candidates’ answer sheets for all the languages for which they are responsible.

The trained coordinators at each marking centre are responsible for:

  • monitoring script raters’ individual performance during the rating of a certain number of scripts (i.e. at least three scripts in every packet of 25). This procedure is followed each time raters are obliged to move to the next level of the exam.

  • monitoring the script raters’ application of the assessment criteria in each of these scripts, and keeping records of raters’ performance by filling in two different statistical sheets. The first one includes general comments, while the second is more detailed and asks for the coordinators’ justified evaluation of the individual script raters.

  • monitoring the performance of the raters through randomly chosen scripts already marked, in which the raters are asked to justify their assigned marks. The coordinators discuss the application of the marking grid, and they keep records of the whole procedure. These records will be analyzed after the rating period has ended, and details of the raters’ actual performance will be recorded and analyzed for further reference and evaluation of the individuals and the process itself. (MS:12)

5.1.5 Post-examination review

As mentioned above, item analysis and inter-rater reliability are calculated after each exam administration, and the results of each analysis are presented in the form of a report. Moreover, the results of the observation forms are analysed and presented in the form of a report, as are the results of the monitoring procedures of the marking process. Additionally, the Ministry of National Education issues statistics regarding the number of candidates who took part in the exam at each level for each language. (MS:13 and MS:14)

5.2 Involvement of stakeholders

Relevant stakeholders have been involved, directly and indirectly, in the development of the KPG examination system in various ways:

  • Through the initial needs analysis survey which involved parents, students, employers and teachers, and the results of which have informed the design of the KPG exams.

  • Through candidate surveys carried out after each exam administration. Specially designed questionnaires in Greek are completed by candidates at the end of the exam to elicit their feedback on perceived difficulty of tasks and items, their familiarity with topics and text types, and their opinion of the test as a whole.  Different questionnaires are designed for different level candidates.

  • Through feedback forms completed by oral examiners and script raters after each exam administration. These forms aim to elicit feedback concerning the potential problems with test items, their usefulness, appropriateness and practicality.

  • Through feedback forms completed by invigilators and staff of the rating centre committees.

  • Through examination preparation classes offered to University of Athens students by the RCeL staff.

  • Through a special exam preparation programme launched by the RCeL in 2008. This programme involves a sample of primary and secondary school teachers developing, with the cooperation and help of the staff at RCeL, test preparation materials and teaching test preparation courses in schools around the country. These courses are offered free of charge to students in after-school classes, through the Support-Teaching State School Programme.

5.3 Accountability and transparency

Transparency of procedures and openness to stakeholders has been a defining characteristic of the KPG examination system. Conscious efforts have been made throughout the years to disseminate information about the nature and requirements of the KPG exams, and to inform and educate all relevant stakeholders. KPG test specifications are written in Greek to ensure comprehension by prospective candidates, teachers and parents. These have been made publicly available and easily accessible through the Ministry of Education and RCeL websites. Moreover, on the Ministry of National Education website (http://rcel.enl.uoa.gr/kpg/), relevant stakeholders and interested individuals can find information on:

  • issues related to test management and administration

  • the general framework and principal features of the KPG examination system (the published version of which has been disseminated to relevant stakeholders)

  • the specifications for the exams, which are common for all languages assessed

  • sample exams and answer keys for all levels and languages

Moreover, the Ministry of Education has published leaflets to inform the general public and relevant stakeholders on the main features, aims, nature and requirements of the KPG exams, and has published and disseminated further informational material free of charge. (MS:16 and MS:17)

In addition, as mentioned above, candidates and teachers also have access to the rating grids for the speaking and writing tests.

The RCeL has published a comprehensive Handbook for the Speaking Test (Karavas, 2009) and a Guide for KPG raters. Both are addressed to KPG examiners and raters. (MS:16)

The RCeL also launched a KPG information dissemination programme in order to inform the public and relevant stakeholders about the characteristics and requirements of the KPG exams. More specifically, multipliers and school advisors, after being trained, were given specially written materials (e.g. PowerPoint presentations, leaflets, etc.) for different audiences, and were responsible for organizing ‘KPG information seminars’ for teachers, parents, the general public and language institute owners. These seminars also included question and answer sessions in order to clarify any queries or misconceptions the public had about the KPG exams. (MS:16)

A series of conferences have also taken place in Athens and Thessaloniki to inform professionals in the area of applied linguistics and testing, from Greece and abroad, about research and developments relating to the KPG exams. (MS:16)

Finally, professionalism and transparency are also reflected in the systematic attempts made by the KPG project team members to present KPG research at international and local conferences, as well as to publish books, articles and commentaries about the KPG exams. These publications appear in Greek for teachers and language testing professionals in Greece, and also in English and other languages for the international community (see http://rcel.enl.uoa.gr/kpg/).

The emphasis on transparency, accountability and professionalism which characterises the KPG exam system is also evident in the development of the KPG school, a funded project whose purpose is to link the KPG language exams with the state compulsory education system and to ensure the sustainability of the KPG examination system. The KPG school is an e-school aiming at preparing, tutoring and guiding the following groups of “stakeholders” for the KPG exams:

  • Possible candidates of the KPG exams

  • Teachers who offer support to students preparing to sit for the exams

  • Teachers in the role of KPG assessors (i.e., script raters and oral examiners)

  • Parents of candidates of the KPG exams.

The KPG e-school will mainly include digital material aiming at developing the test-taking strategies that KPG candidates need in all four tests of each exam battery for each language. More specifically, the KPG e-school will provide candidates with tasks focusing on these strategies, accompanied by teaching instructions and guidance so that they can be used both by teachers and by students of different ages and literacy backgrounds.

As regards the administration of this on-line educational material, an e-directory will be developed and e-learning tools will be used with hyperlinks, to help the KPG candidates navigate easily, depending on the type of task they want to do or on the information they want to find.

Additionally, the KPG e-school will include teaching advice as well as tips for dealing with the various tasks, with guidance and instructions in every step. Access to the KPG e-school website will be open, as its aim is to be used by the KPG candidate outside the regular school hours either autonomously or with the support/guidance of the teacher.

References
Alderson, J., Clapham, C., & Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.

Bachman, L.F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1-42.

Bachman, L.F., & Palmer, A.S. (1996). Language Testing in Practice. Oxford: Oxford University Press.

Boyd, K., & Davies, A. (2002). Doctors’ order for language testers: the origin and purpose of ethical codes. Language Testing, 19(3), 296-322.

Dendrinos, B. (2004). KPG – a new suite of national language examinations for Greece: philosophy and goals. Lecture at the ICC AGM Annual Conference 19/4/2004. Athens: http://www.icc-europe.com/AGM 2004/documentation/Bessie plenary.rtf .

Douglas, D. (1995). Developments in Language Testing. In W. Grabe et al. (Eds.), Annual Review of Applied Linguistics, 15: Survey of Applied Linguistics (pp. 167-187). New York: Cambridge University Press.

Fulcher, G., & Bamford, R. (1996). I didn’t get the grade I need, where’s my solicitor? System, 24(4), 437-448.

Hamp-Lyons, L. (2000). Social, professional and individual responsibility in language testing. System, 28, 579-591.

Karavas E. (Ed). (2008). The KPG Speaking Test in English: A Handbook. National and Kapodistrian University of Athens, Faculty of English Studies: RCeL Publications, Series 2 (RCeL Publication Series Editors: Bessie Dendrinos & Kia Karavas).

Karavas E. (2009a). The KPG exams: Training the examiners. ELT News, No. 236, p. 8.

Karavas E. (2009b). Training script raters for the KPG exams in English. ELT News, No. 239, p. 16.

Kunnan, A.J. (1999). Recent developments in language testing. Annual Review of Applied Linguistics, 19, 235-253.

Lynch, B.K. (2001). Rethinking assessment from a critical perspective. Language Testing, 18(4), 351-372.

McNamara, T. (2001). Language assessment as social practice: Challenges for research. Language Testing, 18(4), 333-349.

McNamara, T., & Roever, T. (2006). Language Testing: The Social Dimension. Oxford: Blackwell.

Messick, S. (1989). Validity. In R.L., Linn (Ed.), Educational Measurement (pp. 13-103), (3rd ed.). New York: American Council on Education/MacMillan.

Mitsikopoulou, B. (Ed.). (2009). The KPG Writing Test in English: A Handbook. Athens: RCeL Publication Series 3, Faculty of English Studies, National and Kapodistrian University of Athens (RCeL Publication Series editors: Bessie Dendrinos & Kia Karavas).

Nevo, D., & Shohamy. E. (1986). Evaluation standards for the assessment of alternative testing methods: An application. Studies in Educational Evaluation, 12, 149-158.

Saville, N. (2002). Quality and fairness: The ALTE code of practice and quality management systems. Sprogforum, 23, Vol 8: 45-50.

Shohamy, E. (1993). The Exercise of Power and Control in the Rhetorics of Testing. In A. Huhta, K. Sajavaara, & S. Takala. (Eds.), Language Testing: New openings (pp. 23-38). Jyvaskyla: University of Jyvaskyla.

Shohamy, E. (1999). Critical language testing: use and consequences of tests, responsibilities of testers and rights of test takers. Paper presented at the 21st Annual Language Testing Research Colloquium, Tsukuba, Japan.

Shohamy, E. (2000). Fairness in Language Testing. In A.J. Kunnan (Ed.), Fairness and Validation in Language Assessment. Selected papers from the 19th Language Testing Research Colloquium, Orlando Florida. Studies in Language Testing, 9, 15-20.

Shohamy, E. (2001a). Democratic assessment as an alternative. Language Testing 18(4), 373-391.

Shohamy, E. (2001b). The Power of tests: A Critical Perspective on the Use of Language Tests. Harlow: Longman.

Spaan, M. (2000). What, if any, are the limits of our responsibility for fairness in language testing. In A.J. Kunnan (Ed.), Fairness and validation in Language Assessment. Selected papers from the 19th Language Testing Research Colloquium, Orlando Florida. Studies in Language Testing, 9, 35-39.

Spolsky, B. (1981). Some Ethical Questions about Language Testing. In C. Klein-Braley & D.K. Stevenson (Eds.), Practice and Problems in Language Testing. Frankfurt: Peter Lang.

Stevenson, D.K. (1985). Authenticity, validity and a tea party. Language Testing, 2(1), 41-47.

Stobart, G. (2005). Fairness in multicultural assessment systems. Assessment in Education, 12(3), 275-287.


[1] For the procedures followed in test development, test administration and test score interpretation, see Fulcher & Bamford 1996.

[2] The interested reader may refer to Alderson, Clapham and Wall, 1995 or McNamara and Roever, 2006, or the respective websites of these organizations, for further details and discussion.

[4] For more details on the ALTE Code of Practice see www.alte.org,  Fulcher and Bamford 1996 and Saville 2002.

[5] See Liontou  (this issue) and Section 5.2 below for an example of such research.

[6] See Hartzoulakis, this volume.

[7] For descriptions of the respective training programmes (aims, principal features, nature, content, and frequency), see Karavas 2008, Karavas 2009a, 2009b, and Mitsikopoulou 2009. For a brief description of the oral examiner training programme, see Delieza, this volume.

[8] For more information on the nature and main results of the observation project see Delieza this volume.



Appendix 1

Extract from the ILTA Code of Ethics

The following extract presents all the principles included in the Code and the Annotations for the first two principles.

Principle 1

Language testers shall have respect for the humanity and dignity of each of their test takers. They shall provide them with the best possible professional consideration and shall respect all persons’ needs, values and cultures in the provision of their language testing service.


  • Language testers shall not discriminate against nor exploit their test takers on the grounds of age, gender, race, ethnicity, sexual orientation, language background, creed, political affiliations or religion, nor knowingly impose their own values (for example social, spiritual, political and ideological) to the extent that they are aware of them.

  • Language testers shall never exploit their clients nor try to influence them in ways that are not related to the aims of the service they are providing or the investigation they are mounting.

  • Sexual relations between language testers and their test takers are always unethical.

  • Teaching and researching language testing involving the use of test takers (including students) requires their consent; it also requires respect for their dignity and privacy. Those involved should be informed that their refusal to participate will not affect the quality of the language tester’s service (in teaching, in research, in development, in administration). The use of all forms of media (paper, electronic, video, audio) involving test takers requires informed consent before being used for secondary purposes.

  • Language testers shall communicate the information they produce to all relevant stakeholders in as meaningful a way as possible.

  • Where possible, test takers should be consulted on all matters concerning their interests.

Principle 2

Language testers shall hold all information obtained in their professional capacity about their test takers in confidence and they shall use professional judgment in sharing such information.


  • In the face of widespread use of photocopied materials and facsimile, computerized test records and data banks, the increased demand for accountability from various sources and the personal nature of the information obtained from test takers, language testers are obliged to respect test taker’s right to confidentiality and to safeguard all information associated with the tester-test taker relationship.

  • Confidentiality cannot be absolute, especially where the records concern students who may be competing for admissions and appointments. A careful balance must be maintained between preserving confidentiality as a fundamental aspect of the language tester’s professional duty and the wider responsibility the tester has to society.

  • Similarly, in appropriate cases, the language tester’s professional colleagues also have the right to access data of test takers other than their own in order to improve the service the profession offers. In such cases, those given access to data should agree to maintain confidentiality.

  • Test taker data collected from sources other than the test taker directly (for example from teachers of students under test) are subject to the same principles of confidentiality.

  • There may be statutory requirements on disclosure, for example where the language tester is called as an expert witness in a law court or tribunal. In such circumstances the language tester is released from his/her professional duty to confidentiality.

Principle 3

Language testers should adhere to all relevant principles embodied in national and international guidelines when undertaking any trial, experiment, treatment or other research activity.

Principle 4

Language testers shall not allow the misuse of their professional knowledge or skills, in so far as they are able.

Principle 5

Language testers shall continue to develop their professional knowledge, sharing this knowledge with colleagues and other language professionals.

Principle 6

Language testers shall share the responsibility of upholding the integrity of the language teaching profession.

Principle 7

Language testers in their societal roles shall strive to improve the quality of language testing, assessment and teaching services, promote the just allocation of those services and contribute to the education of society regarding language learning and language proficiency.

Principle 8

Language testers shall be mindful of their obligations to the society within which they work, while recognizing that those obligations may on occasion conflict with their responsibilities to their test takers and to other stakeholders.

Principle 9

Language testers shall regularly consider the potential effects, both short and long term on all stakeholders of their projects, reserving the right to withhold their professional services on the grounds of conscience.


Appendix 2

The ALTE Code of Practice: Part 1 – Responsibilities of ALTE Members

The following is an extract from the ALTE Code of Practice. The responsibilities of examination developers are presented.


Developing examinations

Members of ALTE undertake to provide the information that examination users and takers need in order to select appropriate examinations.

In practice, this means that members of ALTE will guarantee to do the following, for the examinations described in this book:

  1. Define what each examination assesses and what it should be used for. Describe the population(s) for which it is appropriate.

  2. Explain relevant measurement concepts as necessary for clarity at the level of detail that is appropriate for the intended audience(s).

  3. Describe the process of examination development.

  4. Explain how the content and skills to be tested are selected.

  5. Provide either representative samples or complete copies of examination tasks, instructions, examination sheets, manuals and reports of results to users.

  6. Describe the procedures used to ensure the appropriateness of each examination for the groups of different racial, ethnic, or linguistic backgrounds who are likely to be tested.

  7. Identify and publish the conditions and skills needed to administer each examination.


Interpreting examination results

Members of ALTE undertake to help examination users and takers interpret results correctly.

In practice, this means that members of ALTE will guarantee to do the following:

  1. Provide prompt and easily understood reports of examination results that describe candidate performance clearly and accurately.

  2. Describe the procedures used to establish pass marks and/or grades.

  3. If no pass mark is set, then provide information that will help users follow reasonable procedures for setting pass marks when it is appropriate to do so.


  4. Warn users to avoid specific, reasonably anticipated misuses of examination results.


Striving for fairness

Members of ALTE undertake to make their examinations as fair as possible for candidates of different backgrounds (e.g. race, gender, ethnic origin, handicapping conditions, etc.).

In practice, this means that members of ALTE will guarantee to do the following:

  1. Review and revise examination tasks and related materials to avoid potential insensitive content or language.

  2. Enact procedures that help to ensure that differences in performance are related primarily to the skills under assessment rather than to irrelevant factors such as race, gender and ethnic origin.

  3. When feasible, make appropriately modified forms of examinations or administration procedures available for candidates with handicapping conditions.

Informing examination takers

Members of ALTE undertake to provide examination users and takers with the information described below.

In practice, this means that members of ALTE will guarantee to do the following:

  1. Provide examination users and takers with information to help them judge whether a particular examination should be taken, or if an available examination at a higher or lower level should be used.

  2. Provide candidates with the information they need in order to be familiar with the coverage of the examination, the types of task formats, the rubrics and other instructions and appropriate examination taking strategies. Strive to make such information equally available to all candidates.

  3. Provide information about the rights which candidates may or may not have to obtain copies of papers and completed answer sheets, to re-take papers, have papers re-marked or results checked.

  4. Provide information about how long results will be kept on file and indicate to whom and under what circumstances results will or will not be released.

Appendix 3

Minimum standards for establishing quality profiles in ALTE examinations

The complete list of the ALTE Minimum Standards for establishing quality profiles is presented below.


Test construction

  1. The examination is based on a theoretical construct, e.g. on a model of communicative competence.

  2. You can describe the purpose and context of use of the examination, and the population for which the examination is appropriate.

  3. You provide criteria for selection and training of test constructors and expert judgment involved both in test construction and in the review and revision of the examinations.

  4. Parallel examinations are comparable across different administrations in terms of content, stability, consistency and grade boundaries.

  5. If you make a claim that the examination is linked to an external reference system (e.g. Common European Framework), then you can provide evidence of alignment to this system.


Administration and logistics

  1. All centres are selected to administer your examination according to clear, transparent, established procedures, and have access to regulations about how to do so.

  2. Examination papers are delivered in excellent condition and by secure means of transport to the authorized examination centres, your examination administration system provides for secure and traceable handling of all examination documents, and confidentiality of all system procedures can be guaranteed.

  3. The examination administration system has appropriate support systems (e.g. phone hotline, web services).

  4. You adequately protect the security and confidentiality of results and certificates, and data relating to them, in line with current data protection legislation, and candidates are informed of their rights of access to this data.

  5. The examination system provides support for candidates with special needs.


Marking and grading

  1. Marking is sufficiently accurate and reliable for purpose and type of examination.

  2. You can document and explain how marking is carried out and reliability estimated, and how data regarding achievement of raters of writing and speaking performances is collected and analysed.


Test analysis

  1. You collect and analyse data on an adequate and representative sample of candidates and can be confident that their achievement is a result of the skills measured in the examination and not influenced by factors like L1, country of origin, gender, age and ethnic origin.

  2. Item level data (e.g. for computing the difficulty, discrimination, reliability and standard errors of measurement of the examination) is collected from an adequate sample of candidates and analysed.

Communication with stakeholders

  1. The examination administration system communicates the results of the examinations to candidates and to examination centres (e.g. schools) promptly and clearly.

  2. You provide information to stakeholders on the appropriate context, purpose and use of the examination, on its content and on the overall reliability of the results of the examination.

  3. You provide suitable information to stakeholders to help them interpret results and use them appropriately.




©2009-2011  Τμήμα Αγγλικής Γλώσσας και Φιλολογίας - ΕΚΠΑ  /  Faculty of English Language and Literature - UOA
