DIRECTIONS e-journal

Jenny Liontou
FACTORS AFFECTING THE COMPREHENSION DIFFICULTY OF TEXTS: THE KPG TEST-TAKERS’ BELIEFS AND READING EXPERIENCES[1]

Abstract

This paper[2] examines test-takers’ attitudes to the texts and tasks of the reading comprehension module of the KPG English language exams. As such, it constitutes part of ongoing doctoral research on the effect that specific text and reader variables, such as text organisation, genre, content, readers’ background knowledge, topic preference and familiarity, sex, age, exam preparation and place of residence, have on text comprehensibility. The paper presents findings from a longitudinal survey conducted at the RCeL[3] on a national scale in the form of questionnaires administered to EFL learners sitting for the KPG English language exams. It discusses text and task difficulty from the test-takers’ perspective, i.e., their attitudes towards and opinions about the lexical complexity of the texts, the difficulty of the tasks and their familiarity with or preference for specific text topics.

Keywords: reading comprehension, reading attitudes, text difficulty, readability

1. Introduction

Over the last twenty years there has been an increasing interest in exploring and better defining the interaction between the reader and the text, rather than text features alone, acknowledging that reading is a multifaceted phenomenon, reflecting properties of both the text and the reader and the interaction between them (Keshavarz et al., 2007: 20; Rupp et al., 2006: 445; Drucker, 2003: 25; Bachman, 2002: 464; Parker et al., 2001: 308; Alderson, 2000: 33). Especially in the field of language testing, a number of researchers have highlighted the significant effect of reader factors on comprehension and further supported the view that a satisfactory understanding of the reading process depends on clearly describing the impact readers’ personal characteristics such as sex, age and years of instruction in a foreign language as well as prior knowledge of and interest in the topic, can have on overall exam performance (Pomplun & Omar, 2001: 171; Stricker et al., 2001: 205; Sullivan, 2000: 373; Wightman, 1998: 255; Sunderland, 1993: 47). As Sullivan has stressed (2000: 373-4) it is necessary for test-designers to better define any learner variables that may affect exam performance in order to be able to detect and eliminate any test features that can be sources of measurement error and thus distort their attempts to achieve an accurate assessment of learners’ language proficiency. To this end, a number of researchers have called for more research in order to enhance our knowledge about the factors contributing to text difficulty while at the same time examining the interaction between text and reader variables in order to determine how, when and why one mediates the other under high-stakes reading test conditions (c.f. Shokrpour, 2004: 15-16; Phakiti, 2003: 651; Ungerleider, 2003: 122; Bailin & Grafstein, 2001: 298; Chalhoub-Deville & Turner, 2000: 528; Saito et al., 1999: 216; 212; Purpura, 1997: 290; Hock, 1990: 215; Nevo, 1989). 
Given this need, the current research has been designed to add to our present state of knowledge on reading comprehension by exploring text difficulty from the test-takers’ point of view and examining the effect a number of reader variables can have on perceived task and reading module difficulty.

2. Literature review

The extensive research conducted in the field of second language acquisition with specific reference to ways of reading and text-processing strategies has shown that several reader and text variables can have an impact on either the reading process or reading product and, as such, need to be taken into account during test design and validation (c.f. Carr, 2006: 282; Ben-Anath, 2005: 1; Brantmeier, 2005: 37; Mesmer, 2005: 2; Phakiti, 2003: 651; Leong et al., 2002: 130; Lin et al., 2002: 187; Read, 2000: 191). In fact, as pointed out by Alderson (2000: 81), “If the reading process or product varies according to such influences, and if such influences occur in the test or assessment procedures, then this is a risk to the validity of test scores, to the generalizability of results or to the interpretation of performances”. While differences in the test performance of different groups of test-takers have been a long-standing interest among language testers, the past decade has seen increased research in this area, both in terms of test-takers’ personal characteristics such as sex, age, native language and years of instruction in a foreign language and in terms of psychological factors such as feelings of anxiety under testing conditions.

Especially in relation to sex, as Brown & McNamara (2004: 525) acknowledged, the issue of sex-related effects in educational tests has been a concern of psychometrics in order for test developers to ensure that test items are not biased against particular subsets of the testing population due to their sex, ethnicity or age inter alia. Sunderland (1995: 32) and Brown & McNamara (2004: 524) further emphasized that, since test results can affect people’s lives, detecting sex bias is not only a professional issue, but also an ethical, even political one. Despite the importance of detecting sex-based bias in foreign language reading performance, the relevant literature is rather limited and its findings far from consistent (Phakiti, 2003: 652; Coles & Hall, 2002: 97; Garner & Engelhard, 1999: 29; Moreno & Mayer, 1999: 355; Pomplun & Sundbye, 1999: 95). For example, Wen and Johnson (1997: 27) reported that females outperformed males on a standardized national proficiency test. On the other hand, Boyle (1987: 273) found that although female Chinese university students of English as a foreign language scored higher overall in a language proficiency test, males scored higher than females on listening vocabulary, which might be related to reading comprehension. Scarcella and Zimmerman (1998: 27) also found that males performed significantly better than females on specific types of vocabulary tests.

Although a lot of research has focused exclusively on the role of prior knowledge in reading comprehension, Bügel and Buunk (1996: 15) investigated the impact of both reading interests and prior knowledge on the performance of Dutch secondary students on English reading tests. Their data analysis revealed significant differences in test results, which, according to the researchers, appeared to be largely due to sex-based differences in reading habits that consequently had an impact on students’ prior knowledge and topic interests (ibid: 16). In a more recent study, Brantmeier (2003: 11) also investigated the effect of sex-based differences and topic familiarity on the second language reading comprehension performance of intermediate learners of Spanish and, like Bügel and Buunk, reported significant interactions between readers' sex and passage content with reading performance.

Much earlier, Allen et al. (1988: 164) investigated personal factors by asking learners of French, German and Spanish to read four passages and report on their thoughts while reading. Statistical analysis of the protocols showed that years of instruction had a significant effect on subjects’ text comprehension (ibid: 169). In contrast with Bernhardt (1986: 26), who argued that comprehension is text-dependent, Allen et al. found that text-based factors such as text genre did little to explain the reading abilities of second language learners (ibid: 170). In other words, when the statistical analysis in this study was applied exclusively to the texts, no effects were traced, whereas, when readers’ characteristics were evaluated in conjunction with the texts, significant effects did emerge (ibid: 170). In another study, Ilyin et al. (1987: 158) investigated the effect of two specific learner variables, i.e., age and high-school graduation, on the test performance of adult learners of English, and age was found to have a significant impact on their scores in the reading and listening modules of the exam.

The effect of topic familiarity on test-takers’ performance has always been considered a fundamental issue in validating reading comprehension test scores (c.f. Nassaji, 2007: 83-85; Alptekin, 2006: 494; Calvo et al., 2006: 68; Ketchum, 2006: 22; Krekeler, 2006: 99; Pulido, 2004: 503; Wylie & McGuinness, 2004: 497; Ellis, 2001: 34; Barry & Lazarte, 1998: 190-1; Bensoussan, 1998: 213). In 2004, Pulido examined the effect of topic familiarity on second language reading comprehension and retention of new lexical items from printed texts. On average, the participants recalled more semantic propositions from the more familiar stories in comparison to the less familiar ones, which, according to Pulido, shows that their text comprehension increased due to topic familiarity (ibid: 489). However, in another study carried out by Jennings et al. (1999) regarding the Canadian Academic English Language test battery, it was found that factors such as test-takers’ interest in the topic, prior knowledge or perceived relevance of the topic had no significant effect on their overall performance, and as a result were not a potential threat to the validity of the test (ibid: 448). Despite these findings, according to the authors, the fact that test-takers expressed strong concerns about the topic of the test indicated that it was a salient factor to them (ibid: 449). Finally, Hong (1999: 431) investigated the relationship between students’ perception of test difficulty and actual performance by analyzing questionnaire and test data collected from 208 undergraduate students and found that although perceived test difficulty did not have a direct effect on students’ performance, it had an indirect effect on the arousal of anxiety i.e., students who perceived the test as more difficult showed higher levels of worry and anxiety, which in turn influenced their performance as measured by their final scores (ibid: 432). 
Hong concluded by pointing out the importance of understanding students’ perception of test difficulty because its impact on the arousal of anxiety can further lead to impaired performance (ibid: 432).

Undoubtedly, performance in language tests can be influenced by a wide range of factors, which can interact in an unpredictable way with the characteristics of individual test-takers (Lumley & O’Sullivan, 2005: 415; Woodfield et al., 2005: 45; Horiba, 2000: 223). This does not, however, mean that test developers should remain passive observers of such interactions; rather, they should try to eliminate their effect and control for any sources of bias and error in their tests. In fact, as Brown & McNamara (2004: 524) and Shohamy (2001: 383-4) artfully pointed out, a responsible professional practice in language testing involves care for the rights and interests of particular social groups who may be at risk from biased language assessments. Test designers’ knowledge of the variables that can influence the reading process and product, which is the main objective of our ongoing research project, is thus in many respects linked to the validity of the reading tests, i.e., test designers need to know whether and to what extent the instruments they use to measure reading ability are, amongst other things, affected by the readability[4] of the texts, the difficulty of the tasks, or the readers’ background knowledge.

3. Aim and methodology

The aim of the research presented in this paper has been twofold: a) to provide a description of KPG test-takers’ personal characteristics for all interested parties such as test-developers and material writers to have a clearer picture of their intended audience and b) to examine whether and to what extent specific reader variables such as sex, age, years of instruction, background knowledge and topic preference influence KPG test-takers’ impressions of text and task difficulty. Taking into consideration test-takers’ feedback and in a way treating them as “judges” of reading texts is a crucial part of the present research, since they were the actual readers who had to interact with the texts in order to perform a set of tasks and demonstrate a successful performance in the specific exams. So, their feelings and opinions about the texts are believed to be of great importance since they can give us more in-depth information about text difficulty from the reader’s point of view. At this point it should also be mentioned that most studies pertinent to EFL comprehension processes and reading performance involved a small number of EFL learners taking part not in real, high-stakes exams but in experiments designed to explore a limited number of reader or text variables in isolation and which had in many cases produced rather contradictory results. With particular reference to the KPG English language exams, no empirical evidence was available on what features make a text easier or more difficult for Greek learners of English to comprehend or which topics are more familiar to them and could better suit their interests and reading preferences. 
Furthermore, the idea of examining the effect not only of text characteristics but also of personal characteristics on test-takers’ perceptions of reading module difficulty calls into question the rather common belief that, since nothing can be done to control these factors, there is no need to look into their impact; on the contrary, it is hoped that such an investigation will make us aware of the impact less explored factors may have on overall exam performance. In order to explore these issues, the following research questions were formed:

What is the KPG test-takers’ profile in terms of personal characteristics such as age, sex, years of learning English and place of residence? Are there any significant differences between B2 and C1 test-takers with regard to these characteristics?
To what extent do specific reader variables contribute to KPG test-takers’ perceptions of reading text and task difficulty?
Is there a relationship between text characteristics and test-takers’ perceived level of text difficulty?

In order to collect valuable information on the KPG test-takers’ profile and reading preferences as well as their perceptions of text and task difficulty, a national survey was conducted by the RCeL with the support of the Greek Ministry of Education & Religious Affairs in the form of questionnaires administered to all KPG test-takers sitting for the B2 and C1 English language exams in the May and November 2006 and 2007 and May 2008 examination periods. The questionnaire was administered on the day of the exam, in exam centres all over Greece. More specifically, once test-takers had finished their exam and before leaving the examination room, they were kindly requested to provide some personal information regarding their age, sex, place of residence, years of learning English and preparation for the KPG exams. They also rated on a Likert scale of 1-5 or 1-3 a variety of statements about module, task, text and instructions difficulty, topic and genre familiarity, topic preference, text lexical complexity and text processing strategies (Appendix A). The design of the questionnaire was rather time-consuming and required in-depth research on relevant literature, careful piloting and refining for its final version to contain a comprehensive range of questions. However, this effort proved its worth, since the collected data has yielded valuable information from the test-takers’ point of view that could not otherwise have been obtained. The actual process of designing the questionnaire involved the clarification of the research purpose and the formulation of specific questions that corresponded to this purpose. More specifically, since the aim of the study was to investigate the effect specific text and reader variables have on test-takers’ perceptions of reading comprehension difficulty, the questions were designed to elicit information related to these variables, i.e., topic familiarity, topic preference, text lexical complexity and employed reading strategies.
Bearing in mind the fact that the larger the sample size (n), the more reliable the findings, the present questionnaire was administered to a rather large sample (i.e., all test-takers sitting for the B2 and C1 English language exams), and thus had to include structured, closed questions which test-takers could answer by selecting a numerical value. These kinds of questions are, according to Cohen et al. (2000: 251), very useful in large-scale surveys since they enable collection of data amenable to thorough statistical analysis and allow comparisons to be made across groups, facilitating the observation of patterns occurring in the same sample. To avoid dichotomous questions, a Likert scale was employed, since such a scale can provide a range of responses with a degree of sensitivity while still generating numerical values (ibid: 251). As can be seen in Table 1, to date 7,250 questionnaires from five examination periods have been analyzed, 4,750 in relation to the B2 level and 2,500 in relation to the C1 level.

Examination period	B2 Number	B2 Percent	B2 Cronbach's Alpha	C1 Number	C1 Percent	C1 Cronbach's Alpha
May 2006	1750	17%	0.794	500	13%	0.771
November 2006	500	13%	0.799	500	31%	0.822
May 2007	1000	13%	0.741	500	16%	0.801
November 2007	750	23%	0.829	500	47%	0.774
May 2008	750	15%	0.791	500	20%	0.773
Table 1: Test-takers’ questionnaires
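
For readers unfamiliar with the reliability coefficient reported in Table 1, the following is a minimal sketch of how Cronbach's alpha can be computed from Likert-scale responses. The function name and the sample answers are hypothetical and serve only as an illustration; they are not the survey data, and the sketch is not the procedure actually used in the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a response matrix.

    responses: one row per respondent, one column per questionnaire item.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical 1-5 Likert answers from five respondents to four items.
sample = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
print(round(cronbach_alpha(sample), 3))
```

Alpha approaches 1 when respondents answer the items consistently; the values between roughly 0.74 and 0.83 in Table 1 fall within the range usually described as acceptable to good internal consistency.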

Moreover, in order to get a general idea of the readability level of reading texts and to check whether specific readability formulas could be of any practical usefulness within the context of the KPG exams, the readability level of all reading texts used in past KPG exams was electronically estimated using four readability formulas[5], namely the Dale-Chall Readability Formula, the Flesch Reading Ease Index, the Gunning's Fog Index and the Flesch-Kincaid Grade Level Index, and two computer programs, i.e., the AceReader Pro Deluxe and the TextAnalyser Version 2 software (for more information on readability formulas see Lane, 2004; Leong et al., 2002; Bailin & Grafstein, 2001; Dreher & Singer, 2001; Fulcher, 1997; Oakland & Meade & Smith, 1991; Pichert & Elam, 1985; Fry, 1989; Klare, 1984). To date, 24 B2 level texts and 24 C1 level texts have been processed, resulting in a total of 192 analyses. Along with readability indices, additional text features that may contribute to text difficulty at word and sentence level have been measured, such as word length, number of sentences, average characters per word, average words per sentence, average number of syllables per 100 words, average number of sentences per 100 words, and lexical density. The latter, as defined by Halliday (1996: 347), refers to the proportion of content (lexical) words – basically nouns, full verbs, adjectives, and adverbs derived from adjectives – over the total number of words in a text.

Most importantly, an attempt was made to investigate whether text lexical diversity (D) could provide a consistent measurement tool of text difficulty. Lexical diversity refers to ‘the range of vocabulary displayed in the discourses,’ as roughly defined by Durán et al. in 2004, and has been extensively mentioned in literature related to learners’ written or spoken production. Actually, as Jarvis (2002: 57) explains, it is considered an essential indicator of learners’ quality of output, but has not yet been extensively applied to reading texts. Measuring lexical diversity is a rather complex task that has troubled researchers across several disciplines and has triggered the development of different formulas, amongst which the most widely used has been the type-token ratio (TTR), i.e., the ratio of different words to total words used in a text. However, the traditional TTR has been considered sensitive to sample size and, according to Richards (1987: 203), is incapable of assessing lexical diversity in a satisfactory way if the sample sizes vary a great deal. More recently, another transformation of the traditional TTR was developed by Malvern and Richards (2002: 85) and incorporated into the vocd programme, which is included in the CLAN suite of programmes in the Child Language Data Exchange System (CHILDES). A minimum sample size of 50 word tokens is required to compute a valid D. In this process, data first has to be transcribed in a standard format (CHAT), following a specific coding system (which in fact makes the whole procedure rather time-consuming), but once the text has been checked and the coding accepted, the software calculates the average of Ds three times and reports a final optimum D value for a text.
According to its designers, D presents methodological advantages over other measures of lexical diversity since it is independent of sample size and makes it possible to perform valid comparisons between varying quantities of linguistic data (Malvern & Richards, 2002: 85).
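
The difference between the traditional TTR and a curve-based D value can be illustrated with a short sketch. It assumes the curve model usually associated with D, TTR = (D/N)[(1 + 2N/D)^(1/2) - 1], fitted to the mean TTR of random sub-samples. The sub-sample sizes (35 to 50 tokens), the 100 draws per size, the coarse grid search and all function names are assumptions of this sketch and not details taken from the vocd software, which additionally requires CHAT-formatted input.

```python
import math
import random


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)


def model_ttr(d, n):
    """TTR predicted by the D curve for a sample of n tokens."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def estimate_d(tokens, seed=0, draws=100, sizes=range(35, 51)):
    """Least-squares fit of D to mean TTRs of random sub-samples (coarse grid)."""
    if len(tokens) < 50:
        raise ValueError("at least 50 tokens are required")
    rng = random.Random(seed)
    observed = []
    for n in sizes:
        # Mean TTR over repeated random sub-samples of exactly n tokens.
        mean_ttr = sum(
            type_token_ratio(rng.sample(tokens, n)) for _ in range(draws)
        ) / draws
        observed.append((n, mean_ttr))

    def squared_error(d):
        return sum((ttr - model_ttr(d, n)) ** 2 for n, ttr in observed)

    candidates = [step / 10 for step in range(10, 3001)]  # D from 1.0 to 300.0
    return min(candidates, key=squared_error)


# Toy illustration of the sample-size problem: the same sentence, cut short or whole.
sentence = "the cat sat on the mat and the dog sat on the rug".split()
print(type_token_ratio(sentence[:6]), type_token_ratio(sentence))
```

The last line shows why raw TTR is hard to compare across texts of different length: the ratio drops as more of the same sentence is included, simply because function words start to repeat. Fitting a single parameter to a whole curve of sub-sample TTRs is what allows D to be compared across texts of unequal length.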

As previously stated, up until now measurement of lexical diversity has been mainly confined to spoken data produced by learners. However, given the fact that lexical diversity relates to vocabulary and that vocabulary itself is an important element of reading texts, the idea of investigating the relation between lexical diversity (D) and text complexity could not remain unexplored. After personal communication with Professor Malvern (on 19/03/2008), such an application was considered to be viable, so in the following section some preliminary findings of one such investigation are presented.

4. Results and discussion

4.1 KPG test-takers’ profile

Descriptive statistics have shown that the KPG B2 level test-takers’ age in all five exam periods ranges from 12 to 62 years. Although age values are scattered, indicating lack of normal distribution, most values are found within the range of 15-20 years, with 15 being the most frequent age for test-takers sitting for the B2 exam. As far as their sex is concerned, women outnumber men in all five exam periods. Regarding the number of years of learning English, the majority of test-takers have stated that they had been learning English for about six to eight years before sitting for the exam, and also that they have been especially prepared for it (which means that they were at least partly familiar with the format and type of activities included in the exam). With regard to the KPG C1 level test-takers’ profile, age values in all five exam periods range from 12 to 48 years. Although age values follow no normal distribution, most of them occur within the range of 16-24 years, as was the case for B2 test-takers, with 16 being the most frequent figure. As far as sex is concerned, once again women taking part in the C1 exam outnumber men in all five exam periods. In relation to years of learning English, the majority of C1 test-takers seem to have been learning English for about seven to nine years before sitting for the exam, a bit longer than the B2 ones. However, as opposed to the B2 findings, a lower percentage of test-takers sitting for the C1 exam have stated that they have undergone some kind of special preparation before sitting for the specific exam.

4.2 KPG test-takers’ personal characteristics & reading module difficulty

Statistical analysis has shown that age as a predictor variable can have an effect on test-takers’ perception of reading module difficulty, topic familiarity, topic preference, and text lexical complexity. More specifically, ordinal and logistic regression analyses revealed that at a 0.05 significance level, the older the test-takers, the higher the probability of finding the reading module more difficult and the text language more complex to comprehend (Table 2-Appendix B). However, at the same time, older test-takers seem to be more familiar with and to enjoy reading about specific topics, such as environmental issues, than their younger counterparts.

Years of learning English as an independent variable can also be used to a limited extent to predict test-takers’ responses. Regression analysis has shown that as the years of learning English increase, test-takers are more likely to be familiar with certain topics and to find the reading module less difficult and the text language less complex to comprehend. Surprisingly enough, a significant effect of sex on test-takers’ perception of module difficulty has also been observed since in most B2 and C1 examination periods, it has been found that women are more likely than men to find the vocabulary and the rubrics less difficult to comprehend and the items less difficult to answer, but also to be less familiar with certain topics i.e., environmental issues like climate change and recycling (Table 3-Appendix B).

Although no significant effect of place of residence on overall module difficulty has been observed, it should be pointed out that test-takers from urban areas seem to have been more familiar with and liked reading about topics of rather special social interest (i.e., multilingualism in the United States, Parkour, Common History Books in Europe) than those living in rural regions. In May 2007, a question regarding preparation for the KPG exams was added to the questionnaire to test the hypothesis that test-takers’ familiarization with the format and content of the specific exam has a positive impact on their perception of module difficulty. Data analysis of three consecutive examination periods showed no overall significant effect of exam preparation on test-takers’ perception of text and task difficulty. In May 2007 alone, however, a significant effect of exam preparation on test-takers’ perception of text difficulty was observed, i.e., test-takers specially prepared for the KPG exam found the text language less complex to comprehend and the items less difficult to answer.

4.3 KPG test-takers’ perspective on module difficulty

Data analysis of KPG B2 test-takers’ perspective on overall difficulty of the reading comprehension module has shown that in all five exam periods the majority of responses ranged from ‘slightly’ to ‘fairly difficult,’ and only a small percentage of test-takers found the module ‘very’ or ‘extremely difficult’ (12% in May 2006, 9% in November 2006, 16% in May 2007, 8.9% in November 2007 and 10.7% in May 2008). Task instructions were mainly treated as ‘not difficult’ or ‘slightly difficult.’ However, as regards expected level of text and item difficulty, a noticeable difference has been traced between May and November 2007, since in May 2007 a much lower percentage of test-takers found the texts and the items as difficult as expected, whereas in November 2007 the majority of participants (almost 80%) stated that texts and tasks were as difficult as expected. Such a discrepancy calls for further investigation in order for specific patterns of difference to be identified.

As far as C1 test-takers are concerned, their perceptions of reading module and instructions difficulty are quite similar to those of the B2 level test-takers, i.e., in all five exam periods, the majority of participants found the reading module ‘slightly’ to ‘fairly difficult,’ and the task instructions ‘not difficult’ to ‘slightly difficult.’ However, when looking at the expected level of text and item difficulty, a discrepancy can once again be observed between exam periods: in November 2006, a much lower percentage of participants found the texts and the items as difficult as expected, in comparison with May 2006, November 2007 and May 2008.

4.4 KPG test-takers’ perspective versus textual features

The comparative examination of test-takers’ questionnaires and textual features has provided preliminary insight into text comprehension constraints on the part of the reader. In Table 4, data regarding B2 test-takers’ perspectives on text lexical complexity for the B2 level May and November 2006 exams are presented (Column 1) and cross-related to the scores of four readability formulas (Columns 2-5) and to the lexical diversity (D) of each text (Column 6). All four readability formulas seem to fail to agree with test-takers’ perspective on text lexical complexity, since texts are mainly rated as ‘very difficult’ by readability formulas and appropriate for advanced readers (i.e., above average readers). On the other hand, if we compare data in the first and last columns, there seems to be a better correspondence between participants’ responses and lexical diversity (D), since, in most cases, as the percentage of participants who found the text ‘very’ to ‘extremely difficult’ increases, so does the lexical diversity of the text.

	Test-takers’ impression of text lexical complexity [‘very’ to ‘extremely’ difficult]	Flesch Reading Index [0=difficult 100=easy]	Dale-Chall Index	Flesch-Kincaid Index	Gunning’s Fog Index	Lexical Diversity (D)
B2 Level, May 2006
Text A	12.7%	57.3	9-10	9.6	10.40	80.75
Text B	13.8%	52.6	11-12	9.9	10.80	81.32
Text C	14.7%	52.7	11-12	12.9	12.40	81.16
B2 Level, November 2006
Text A	9.1%	53.8	9-10	11.3	13.20	96.58
Text B	7.0%	74.0	7-8	6.8	8.00	60.89
Text C	16.9%	60.0	9-10	10.4	11.60	111.84
Table 4: B2 test-takers’ perspective vs. textual features
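
The correspondence described above can be examined informally with a rank correlation over the six rows of Table 4. The sketch below uses Spearman's rho as an assumed choice of coefficient; the paper does not report one for this comparison, and with only six texts the figures are purely illustrative and say nothing about statistical significance. The simple ranking helper assumes no tied values, which holds for these columns.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equally long, tie-free sequences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Columns copied from Table 4 (May 2006 Texts A-C, then November 2006 Texts A-C).
pct_very_difficult = [12.7, 13.8, 14.7, 9.1, 7.0, 16.9]
lexical_diversity = [80.75, 81.32, 81.16, 96.58, 60.89, 111.84]
flesch_reading_ease = [57.3, 52.6, 52.7, 53.8, 74.0, 60.0]

print(round(spearman_rho(pct_very_difficult, lexical_diversity), 2))
print(round(spearman_rho(pct_very_difficult, flesch_reading_ease), 2))
```

On these six texts the rank correlation with lexical diversity comes out at about 0.6, whereas the one with Flesch Reading Ease is about -0.31, which is in line with the observation that D tracks test-takers' impressions more closely than the readability score does.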

An additional comparative exploration of textual features in texts that test-takers have rated very differently in terms of lexical complexity has been rather revealing in terms of text comprehension difficulty. In November 2007, Text B (which was about climate change) was rated as ‘very’ to ‘extremely difficult’ by a high percentage of test-takers, whereas in November 2006, Text C (which involved film reviews) was treated as the least difficult in all five examination periods. As can be seen in Table 5, Text B is shorter and consists of more sentences (20 as opposed to 19), and its other surface features (characters per word, words per sentence, Dale-Chall and Flesch-Kincaid grade levels) are lower than those of Text C, while its Flesch Reading Ease score is higher (73.4 as opposed to 60.0); all of these point to an easier text. However, its lexical density and, more importantly, its lexical diversity are of a much higher level, a factor that might have affected participants’ perception of overall text difficulty.

KPG READING COMPREHENSION MODULE
B2 Level
	November 2006 Text C (Film Reviews)	November 2007 Text B (Climate Change)
Words	421	254
Characters	2029	1128
Sentences	19	20
Average characters per word	4.7	4.3
Average words per sentence	22.2	12.7
Average number of syllables per 100 words	147.03	142.512
Dale-Chall Grade Level	9-10	7-8
Flesch Reading Ease	60.0	73.4
Flesch-Kincaid Grade Level	10.4	6.2
Lexical density	.59	.65
Lexical diversity	111.84	159.53
Table 5: B2 quantitative text measures
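
The two Flesch indices in Table 5 can be recomputed from the rows above them. The sketch below applies the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas to the average words per sentence and syllables per 100 words given in the table; the function and variable names are illustrative, and the results agree with the tabulated values to within about 0.1, the difference being due to rounding of the inputs.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Flesch-Kincaid Grade Level: higher values indicate harder text."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# (average words per sentence, syllables per word) taken from Table 5.
texts = {
    "November 2006 Text C": (22.2, 147.03 / 100),
    "November 2007 Text B": (12.7, 142.512 / 100),
}
for name, (wps, spw) in texts.items():
    ease = flesch_reading_ease(wps, spw)
    grade = flesch_kincaid_grade(wps, spw)
    print(f"{name}: reading ease {ease:.1f}, grade level {grade:.1f}")
```

Both formulas depend only on sentence length and syllable counts, which explains why they rate the climate change text as the easier of the two even though test-takers found its vocabulary harder.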

Similarly, in relation to C1 texts from all five examination periods, Text B of the November 2007 exam (which was about general quiz rules) was found ‘very’ to ‘extremely difficult’ by the highest percentage of test-takers, whereas Text B of the May 2006 exam (which was about going to the dentist) was treated as the least difficult. As we can see in Table 6, the two texts are of almost equal length (231 and 230 words), have the same level of lexical density, and consist of exactly the same number of sentences (16). Text B of November 2007 has been rated as ‘very difficult’ according to the Dale-Chall Readability Formula but ‘fairly easy’ according to the Flesch Index, so no clear conclusion can be drawn from these indices. However, a difference can once again be noticed in relation to lexical diversity, which is of a much higher level in the November 2007 text and might partly account for such a discrepancy.

KPG READING COMPREHENSION MODULE
C1 Level
	May 2006 Text B (Going to the dentist)	November 2007 Text B (Quiz Rules)
Words	231	230
Characters	1,187	1,137
Sentences	16	16
Average characters per word	5.0	4.8
Average words per sentence	14.4	14.4
Average number of syllables per 100 words	158.01	147.62
Dale-Chall Grade Level	7-8	9-10
Flesch Reading Ease	58.5	67.3
Flesch-Kincaid Grade Level	8.7	7.5
Lexical density	.62	.62
Lexical diversity	86.74	108.46
Table 6: C1 quantitative text measures

4.5 Additional effects

Analysis of the questionnaires has yielded valuable information regarding the effect that additional variables, such as perceived text lexical complexity and task difficulty, may have on test-takers’ impression of overall reading module difficulty. More specifically, in all five examination periods (2006-2008) of the KPG B2 and C1 English language exams, perceived text lexical complexity was consistently found to correlate significantly with perceived reading module difficulty, i.e., the more difficult test-takers found the vocabulary of the text, the more difficult the whole module seemed to them (Table 7, Appendix B). Perceived item difficulty and perceived difficulty of task instructions were also found to correlate significantly with module difficulty, which indicates that as the items became more difficult to answer and the task instructions more complex to comprehend, test-takers found it more difficult to cope with the whole reading module. Moreover, data analysis showed that text preference correlated negatively with text lexical complexity. In other words, the more test-takers liked the topic of a text, the less difficult they found its vocabulary. Finally, topic familiarity was found to correlate significantly with text preference, which indicates that the more familiar test-takers were with certain topics, the more they liked reading about them.

5. Conclusions

An important aim of this paper has been to show that a reading comprehension text is not difficult or easy by itself. It is not difficult or easy merely because it has a more or less complicated organization, sentence and paragraph structure, lexical density, variation or dearth. How difficult or easy a text is also depends on the reader, who, in the case of a test, is actually obliged to show that s/he understands it. As this paper has attempted to show, text comprehensibility depends on a variety of reader-related factors, including the reader’s perceptions of and attitudes towards the text, as well as his or her reading habits, i.e., the reading strategies s/he has developed and learnt to use for a successful exam performance. Moreover, as this paper has made clear, there is a strong relationship between these factors and the reader’s age (and therefore his/her literacy level), as well as his/her sex and years of learning the foreign language.

The question that logically follows is why it is useful to know how different readers, who are in this case in the role of test-takers, respond to each reading comprehension text. An even more crucial question is for whom this information is useful: ‘Who needs to know how easy or difficult different readers find specific reading comprehension texts?’ Our response to this second question, which indirectly also answers the first one, is that this information is perhaps most valuable for the team of experts responsible for the exams, a team which usually includes test developers, results analysts and scholars who prepare the exams and interpret the results for different audiences, as in the case of the KPG exam battery. Although there is no doubt that test designers cannot change test-takers’ age, sex or years of learning English, it is believed that they can take care not to bias their tests towards any of these factors, they can check for such possible biases in their analyses of results, and they can inform future test-takers of the potential implications of these factors for their performance. In other words, the availability of valid data related to actual readers, i.e., the test-takers who actually sit for the exams, provides the test design team with opportunities to take full advantage of the empirical evidence and make a successful match between readers and texts, without threatening the validity and reliability of the reading test. Undoubtedly, such information could be useful not only for test developers and researchers worldwide, but also for test-takers and EFL teachers who may be preparing their students for the specific test, or who may wish to use this information to guide their choice of reading material for the English class. For example, by taking advantage of the present study and trying to link research findings with practice, publishing companies and curriculum designers may find it easier to choose texts appropriate for their intended readers’ level of language ability, age and experiences.

Finally, from a methodological point of view, this study hopes to make a contribution in that, instead of controlling one specific variable (a method often employed by researchers), it investigated and cross-related the effect of various variables so that their interaction can be better defined and predicted within the context of the KPG exams. Using simultaneous analyses, it also demonstrated a way to detect measurement invariance as a critical prerequisite for the validity of inferences based on specific measures.

With specific reference to the KPG English language exams, it should be mentioned that the finding that texts used in the exams seem to be more appealing to older test-takers and to better match their experiences and preferences has been carefully considered by KPG test designers, so that a wider range of topics appealing to younger test-takers can be included in the exam. Regarding text appropriateness and level of difficulty, continuous effort is made to select reading texts carefully, based not only on readability indices and text features such as word length, number of sentences, average characters per word, average words per sentence, etc., but also on lexical density and diversity. In the context of the KPG exams, the findings of the present research could be of practical use to the KPG exams that are currently administered in other languages, i.e., French, Italian, Spanish and German, and beneficial to the design of reading comprehension tests of other levels to be introduced in the years to come. More importantly, the present research provides empirical evidence that further supports the rationale and ideology of the KPG exam system, which, “despite its international orientation and its European character, focuses on the Greek user of the target language and aims to cover local needs by responding to the demands of education and the labour market in Greece” (Dendrinos, 2009).

All in all, although it should once again be admitted that reader variables such as those mentioned above are essentially beyond the control of test designers, efforts should be made to alleviate their effects and improve test-takers’ chances of a successful performance, not only by acknowledging their existence but also by paying careful attention to the selection of texts, especially when using authentic materials, so that foreign language learners encounter materials that are at an appropriate level of difficulty. By following such an approach, test designers will be able to provide evidence that the methods they employ to elicit data are appropriate for the intended audience, that the procedures used provide stable and consistent data and, consequently, that the interpretations they make of the results are justified since they are based on a valid and reliable exam system.

References

Alderson, C. (2000). Assessing Reading. Cambridge: Cambridge University Press.

Allen, D., Bernhardt, B., Berry, T., & Demel, M. (1988). Comprehension and text genre: an analysis of secondary school foreign language readers. The Modern Language Journal, 72, 163-172.

Alptekin, C. (2006). Cultural familiarity in inferential and literal comprehension in L2 reading. System, 34, 494-508.

Bachman, L. (2002). Some reflections on task-based language performance assessment. Language Testing, 19(4), 453-476.

Bailin, A., & Grafstein, A. (2001). The linguistic assumptions underlying readability formulae: a critique. Language & Communication, 21, 285-301.

Barry, S., & Lazarte, A. (1998). Evidence for Mental Models: How Do Prior Knowledge, Syntactic Complexity, and Reading Topic Affect Inference Generation in a Recall Task for Nonnative Readers of Spanish? The Modern Language Journal, 82, 176-193.

Ben-Anath, D. (2005). The role of connectives in text comprehension. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics, 5(2), 1-27.

Bensoussan, M. (1998). Schema effects in EFL reading comprehension. Journal of Research in Reading, 21(3), 213-227.

Bernhardt, E. (1986). Proficient Texts or Proficient Readers? ADFL Bulletin, 18(1), 25-28.

Boyle, J. (1987). Sex differences in listening vocabulary. Language Learning, 37(2), 273-284.

Brantmeier, C. (2003). Does gender make a difference? Passage content and comprehension in second language reading. Reading in a Foreign Language, 15(1), 1-27.

Brantmeier, C. (2005). Effects of Reader’s Knowledge, Text Type and Test Type on L1 and L2 Reading Comprehension in Spanish. The Modern Language Journal, 89, 37-53.

Brown, A. & McNamara, T. (2004). “The Devil is in the Detail”: Researching Gender Issues in Language Assessment. TESOL Quarterly, 38(3), 524-538.

Bügel, K., & Buunk, B. (1996). Sex differences in foreign language text comprehension: the role of interests and prior knowledge. The Modern Language Journal, 80, 15-31.

Calvo, M., Castillo, D., & Schmalhofer, F. (2006). Strategic influence on the time course of predictive inferences in reading. Memory & Cognition, 34(1), 68-77.

Carr, N. (2006). The factor structure of test task characteristics and examinee performance. Language Testing, 23(3), 269-289.

Chalhoub-Deville, M., & Turner, C. (2000). What to look for in ESL admission tests: Cambridge certificate exams, IELTS and TOEFL. System, 28(4), 523-539.

Cohen, L., Manion, L., & Morrison, K. (2000). Research Methods in Education. London: Routledge Falmer.

Coles, M., & Hall, C. (2002). Gendered readings: learning from children’s reading choices. Journal of Research in Reading, 25(1), 96-108.

Dale, E., & Chall, J. (1995). Readability Revisited: The New Dale-Chall Readability Formula. Cambridge, Massachusetts: Brookline Books Inc.

Dendrinos, B. (2009). Rationale and ideology of the KPG exams. ELT News, September 2009. Available online at: http://rcel.enl.uoa.gr/kpg/gr_kpgcorner_sep2009.htm .

Dreher, M., & Singer, H. (2001). Friendly Texts and Text-Friendly Teachers. Theory into Practice, 28(2), 98-104.

Drucker, M. (2003). What reading teachers should know about ESL learners. International Reading Association, 22-29.

Durán, P., Malvern, D., Richards, B., & Chipere, N. (2004). Developmental Trends in Lexical Diversity. Applied Linguistics, 25(2), 220-242.

Ellis, N. (2001). Memory for language. In P. Robinson (Ed.), Cognition and second language instruction (pp. 33-68). Cambridge, UK: Cambridge University Press.

Fry, E. (1989). Readability Formulas: maligned but valid. Journal of Reading, 32, 292-297.

Fulcher, G. (1997). Text difficulty and accessibility: Reading Formulae and expert judgment. System, 25(4), 497-513.

Garner, M., & Engelhard, G. (1999). Gender Differences in Performance on Multiple-Choice and Constructed Response Mathematics Items. Applied Measurement in Education, 12(1), 29-51.

Halliday, M. A. K. (1996). Literacy and Linguistics: a Functional Perspective. In R. Hasan & G. Williams (Eds.), Literacy in Society (pp. 339-376). London: Longman.

Hock, T. (1990). The Role of Prior Knowledge and Language Proficiency as Predictors of Reading Comprehension among Undergraduates. Multilingual Matters, 59, 214-224.

Hong, E. (1999). Test anxiety, perceived test difficulty, and test performance: temporal patterns of their effects. Learning & Individual Differences, 11(4), 431-447.

Horiba, Y. (2000). Reader Control in Reading: Effects of Language Competence, Text Type, and Task. Discourse Processes, 29(3), 223-267.

Ilyin, D., Spurling, S., & Seymour, S. (1987). Do learner variables affect cloze correlations? System, 15(2), 149-160.

Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity. Language Testing, 19(1), 57-84.

Jennings, M., Fox, J., Graves, B., & Shohamy, E. (1999). The test takers’ choice: an investigation of the effect of topic on language-test performance. Language Testing, 16(4), 426-456.

Keshavarz, M., Atai, M., & Ahmadi, H. (2007). Content schemata, linguistic simplification and EFL readers’ comprehension and recall. Reading in a Foreign Language, 19(1), 19-33.

Ketchum, E. (2006). The cultural baggage of second language reading: An approach to understanding. Foreign Language Annals, 39, 22-42.

Klare, G. (1984). Readability. In D. Pearson (Ed.), Handbook of Reading Research, Vol. 1 (pp. 681-744). New York: Longman.

Krekeler, C. (2006). Language for special academic purposes (LSAP) testing: the effect of background knowledge revisited. Language Testing, 23(1), 99-130.

Leong, E., Ewing, M., & Pitt, L. (2002). E-comprehension: Evaluating B2B websites using readability formulae. Industrial Marketing Management, 31, 125-131.

Lin, L., Zabrucky, K., & Moore, D. (2002). Effects of text difficulty and adults’ age on relative calibration of comprehension. American Journal of Psychology, 115(2), 187-198.

Lumley, T., & O’Sullivan, B. (2005). The effect of test-taker gender, audience and topic on task performance in tape-mediated assessment of speaking. Language Testing, 22(4), 415-437.

Malvern, D., & Richards, B. (2002). Investigating accommodation in language proficiency interviews using a new measure of lexical diversity. Language Testing, 19(1), 85-104.

Meade, C., & Smith, C. (1991). Readability Formulas: Cautions and Criteria. Patient Education and Counseling, 17, 153-158.

Mesmer, H. (2005). Introduction: Text Accessibility and the Struggling Reader. Reading & Writing Quarterly, 21, 1-5.

Moreno, R. & Mayer, R. (1999). Gender differences in responding to open-ended problem-solving questions. Learning & Individual Differences, 11(4), 355-364.

Nassaji, H. (2002). Schema theory and knowledge-based processes in second language reading comprehension: A need for alternative perspectives. Language Learning, 52(2), 439-481.

Nevo, N. (1989). Test-taking strategies on a multiple-choice test of reading comprehension. Language Testing, 6(2), 199-215.

Oakland, T., & Lane, H. B. (2004). Language, Reading, and Readability Formulas: Implications for Developing and Adapting Tests. International Journal of Testing, 4(3), 239-252.

Parker, R., Hasbrouck, J., & Weaver, L. (2001). Spanish Readability Formulas for Elementary-Level Texts: A Validation Study. Reading & Writing Quarterly, 17, 307-322.

Phakiti, A. (2003). A Closer Look at Gender and Strategy Use in L2 Reading. Language Learning, 53(4), 649-702.

Pichert, J., & Elam, P. (1985). Readability formulas may mislead you. Patient Education and Counseling, 7, 181-191.

Pomplun, M. & Omar, M. (2001). Do Reading Passages About War Provide Factorially Invariant Scores for Men and Women? Applied Measurement in Education, 14(2), 171-189.

Pomplun, M. & Sundbye, N. (1999). Gender Differences in Constructed Response Reading Items. Applied Measurement in Education, 12(1), 95-109.

Pulido, D. (2004). The relationship between text comprehension and second language incidental vocabulary acquisition: A matter of topic familiarity? Language Learning, 54(3), 469-523.

Purpura, J. (1997). An analysis of the relationships between test takers’ cognitive and metacognitive strategy use and second language test performance. Language Learning, 47(2), 289-325.

Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.

Richards, B.J. (1987). Type/token ratios: what do they really tell us? Journal of Child Language, 14, 201-209.

Rupp, A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: a cognitive processing perspective. Language Testing, 23(4), 441-474.

Saito, Y., Horwitz, E., & Garza, T. (1999). Foreign Language Reading Anxiety. The Modern Language Journal, 83, 202-218.

Scarcella, R., & Zimmerman, C. (1998). Academic words and gender: ESL student performance on a test of academic lexicon. Studies in Second Language Acquisition, 20(1), 27-49.

Shohamy, E. (2001). Democratic assessment as an alternative. Language Testing, 18(4), 373-391.

Shokrpour, N. (2004). Systemic Functional Grammar as a Basis for Assessing Text Difficulty. Indian Journal of Applied Linguistics, 30(2), 5-26.

Stricker, L., Rock, D., & Bennett, R. (2001). Sex and Ethnic-Group Differences on Accomplishments Measures. Applied Measurement in Education, 14(3), 205-218.

Sullivan, B. (2000). Exploring gender and oral proficiency interview performance. System, 28, 373-386.

Sunderland, J. (1993). Learner Gender and Language Testing: Any Connections? Language Testing Update, 13, 46-56.

____________ (1995). Gender and Language Testing. Language Testing Update, 17, 24-35.

Ungerleider, C. (2003). Large-Scale Student Assessment: Guidelines for Policymakers. International Journal of Testing, 3(2), 119-128.

Wen, Q., & Johnson, R. (1997). L2 learner variables and English achievements: A study of tertiary-level English majors in China. Applied Linguistics, 18(1), 27-48.

Wightman, L. (1998). An Examination of Sex Differences in LSAT Scores from the Perspective of Social Consequences. Applied Measurement in Education, 11(3), 255-277.

Woodfield, R., Earl-Novell, S., & Solomon, L. (2005). Gender and mode of assessment at university: should we assume female students are better suited to coursework and males to unseen examinations? Assessment & Evaluation in Higher Education, 30(1), 35-50.

Wylie, J., & McGuinness, C. (2004). The interactive effects of prior knowledge and text structure on memory for cognitive psychology texts. British Journal of Educational Psychology, 74, 497-514.

Endnotes

[1] A shorter version of this paper has been included in the Proceedings of the 4th Annual South-East European Doctoral Student Conference, while findings of the survey have also been presented at the ALTE 3rd International Conference, the EALTA 5th International Conference and the Harvard Graduate School of Education Research Conference.

[2] Sincere thanks to Prof. Bessie Dendrinos, supervisor of my thesis, for her insightful comments and valuable suggestions regarding this article. I am also indebted to her for all the patience and care with which she nurtured the emergence and continuation of this project. This study was supported in part by the RCeL, which has generously offered me the research data I have been investigating. However, the views expressed in this paper do not reflect the official policy of the Center, and the responsibility for the way the data has been interpreted rests entirely with me.

[3] This part of the research was carried out while I was a research assistant at the RCeL. In this capacity, I coordinated the surveys conducted with KPG test-takers in the 2006, 2007 and 2008 examination periods. For the data analysis I present in this paper, I was able to use the RCeL facilities and to consult with the RCeL in-house statistics specialist. These surveys are part of a larger ongoing research project, directed by Prof. B. Dendrinos, investigating the profile of the KPG test-takers, their attitudes towards the KPG English exams and the strategies they use while taking the exam.

[4] Dale and Chall (1995: 80) proposed a definition of readability as ‘the sum total -including the interaction- of all those elements within a given piece of printed material that affect the success a group of readers have with it. The success is the extent to which they understand it, read it at an optimal speed and find it interesting’. Readability is, thus, an umbrella term that broadly refers to the ease or difficulty of a text.

[5] Readability formulas roughly indicate the approximate reading grade level required to understand a text.

Appendix A

Appendix B
