Jenny Liontou
FACTORS AFFECTING THE
COMPREHENSION DIFFICULTY OF TEXTS: THE KPG
TEST-TAKERS’ BELIEFS AND READING EXPERIENCES[1]
Abstract
This paper[2]
examines
test-takers’ attitudes to the texts
and tasks of the reading comprehension
module of the KPG English language exams.
As such, it constitutes part of ongoing
doctoral research on the effect that specific
text and reader variables, such as text
organisation, genre, content, readers'
background knowledge, topic preference and
familiarity, sex, age, exam preparation and
place of residence, have on text
comprehensibility. The paper presents findings
from a longitudinal, national-scale survey
conducted at the
RCeL[3]
through questionnaires administered to EFL
learners sitting for the KPG English language
exams, and discusses text and task difficulty
from the test-takers' perspective, i.e. their
attitudes towards and opinions about the
lexical complexity of the texts, the
difficulty of the tasks and their
familiarity with or preference for specific
text topics.
Keywords:
reading comprehension,
reading attitudes, text
difficulty, readability
1. Introduction
Over the last twenty years
there has been an increasing interest in
exploring and better defining the
interaction between the reader and the text,
rather than text features alone,
acknowledging that reading is a multifaceted
phenomenon, reflecting properties of both
the text and the reader and the interaction
between them (Keshavarz et al., 2007:
20; Rupp et al., 2006: 445; Drucker,
2003: 25; Bachman, 2002: 464; Parker et
al., 2001: 308; Alderson, 2000: 33).
Especially in the field of language testing,
a number of researchers have highlighted the
significant effect of reader factors on
comprehension and further supported the view
that a satisfactory understanding of the
reading process depends on clearly
describing the impact that readers' personal
characteristics, such as sex, age and years
of instruction in a foreign language, as well
as prior knowledge of and interest in the
topic, can have on overall exam performance
(Pomplun & Omar, 2001: 171; Stricker et
al., 2001: 205; Sullivan, 2000: 373;
Wightman, 1998: 255; Sunderland, 1993: 47).
As Sullivan (2000: 373-4) has stressed, it is
necessary for test designers to better
define any learner variables that may affect
exam performance in order to be able to
detect and eliminate any test features that
can be sources of measurement error and thus
distort their attempts to achieve an
accurate assessment of learners’ language
proficiency. To this end, a number of
researchers have called for more research in
order to enhance our knowledge about the
factors contributing to text difficulty
while at the same time examining the
interaction between text and reader
variables in order to determine how, when
and why one mediates the other under
high-stakes reading test conditions (cf.
Shokrpour, 2004: 15-16; Phakiti, 2003: 651;
Ungerleider, 2003: 122; Bailin & Grafstein,
2001: 298; Chalhoub-Deville & Turner, 2000:
528; Saito et al., 1999: 212, 216;
Purpura, 1997: 290; Hock, 1990: 215; Nevo,
1989). Given this need, the current research
has been designed to add to our present
state of knowledge on reading comprehension
by exploring text difficulty from the
test-takers’ point of view and examining the
effect a number of reader variables can have
on perceived task and reading module
difficulty.
2. Literature review
The extensive research
conducted in the field of second language
acquisition with specific reference to ways
of reading and text-processing strategies
has shown that several reader and text
variables can have an impact on either the
reading process or reading product and, as
such, need to be taken into account during
test design and validation (cf. Carr, 2006:
282; Ben-Anath, 2005: 1; Brantmeier, 2005:
37; Mesmer, 2005: 2; Phakiti, 2003: 651;
Leong et al., 2002: 130; Lin et al.,
2002: 187; Read, 2000: 191). In fact, as
pointed out by Alderson (2000: 81) “If the
reading process or product varies according
to such influences, and if such influences
occur in the test or assessment procedures,
then this is a risk to the validity of test
scores, to the generalizability of results
or to the interpretation of performances”.
While differences in the test performance of
different groups of test-takers have been a
long-standing interest among language
testers, the past decade has seen increased
research in this area, both in terms of
test-takers' personal characteristics, such
as sex, age, native language and years of
instruction in a foreign language, and in
terms of psychological factors, such as
feelings of anxiety under testing conditions.
Especially in relation to sex, as Brown &
McNamara (2004: 525) acknowledged,
sex-related effects in educational tests have
long been a concern of psychometrics, since
test developers need to ensure that test
items are not biased against particular
subsets of the testing population because of,
inter alia, their sex, ethnicity or age.
Sunderland (1995: 32) and Brown & McNamara
(2004: 524) further emphasized that, since
test results can affect people's lives,
detecting sex bias is not only a professional
issue but also an ethical, even political,
one. Despite the importance
of detecting sex-based bias in foreign
language reading performance, the relevant
literature is rather limited and its
findings far from consistent (Phakiti, 2003:
652;
Coles & Hall, 2002: 97;
Garner & Engelhard, 1999: 29; Moreno &
Mayer, 1999: 355; Pomplun & Sundbye, 1999:
95). For example, Wen and Johnson (1997: 27)
reported that females outperformed males on
a standardized national proficiency test. On
the other hand, Boyle (1987: 273) found that
although female Chinese university students
of English as a foreign language scored
higher overall in a language proficiency
test, males scored higher than females on
listening vocabulary, which might be related
to reading comprehension. Scarcella and
Zimmerman (1998: 27) also found that males
performed significantly better than females
on specific types of vocabulary tests.
Although much research has focused exclusively on
the role of prior knowledge in reading
comprehension, Bügel and Buunk (1996:
15) investigated the impact of both reading
interests and prior knowledge on the
performance of Dutch secondary students on
English reading tests. Their data analysis
revealed significant differences in test
results, which, according to the
researchers, appeared to be largely due to
sex-based differences in reading habits that
consequently had an impact on students’
prior knowledge and topic interests (ibid:
16). In a more recent study, Brantmeier
(2003: 11) also investigated the effect of
sex-based differences and topic familiarity
on second language reading comprehension
performance of intermediate learners of
Spanish and, in line with Bügel and Buunk,
reported significant interactions between
readers' sex and passage content with respect
to reading performance.
Much earlier, Allen et al.
(1988: 164) investigated personal factors by
asking learners of French, German and
Spanish to read four passages and report on
their thoughts while reading. Statistical
analysis of the protocols showed that years
of instruction had a significant effect on
subjects’ text comprehension (ibid: 169). In
contrast with Bernhardt (1986: 26) who
argued that comprehension is text-dependent,
Allen et al. found that text-based
factors such as text genre did little to
explain the reading abilities of second
language learners (ibid: 170). In other
words, when statistical analysis of this
study was applied exclusively to the texts
no effects were traced, whereas, when
readers’ characteristics were evaluated in
conjunction with the texts, significant
effects did emerge (ibid: 170). In another
study, Ilyin et al. (1987: 158)
investigated the effect of two specific
learner variables, i.e. age and high-school
graduation, on the test performance of adult
learners of English; age was found to
have a significant impact on their scores in
the reading and listening modules of the
exam.
The effect of topic
familiarity on test-takers’ performance has
always been considered a fundamental issue
in validating reading comprehension test
scores (cf. Nassaji, 2007: 83-85; Alptekin,
2006: 494; Calvo et al., 2006: 68;
Ketchum, 2006: 22; Krekeler, 2006: 99;
Pulido, 2004: 503; Wylie & McGuinness, 2004:
497; Ellis, 2001: 34; Barry & Lazarte, 1998:
190-1; Bensoussan, 1998: 213). In 2004,
Pulido examined the effect of topic
familiarity on second language reading
comprehension and retention of new lexical
items from printed texts. On average, the
participants recalled more semantic
propositions from the more familiar stories
in comparison to the less familiar ones,
which, according to Pulido, shows that their
text comprehension increased due to topic
familiarity (ibid: 489). However, in another
study carried out by Jennings et al.
(1999) regarding the Canadian Academic
English Language test battery, it was found
that factors such as test-takers’ interest
in the topic, prior knowledge or perceived
relevance of the topic had no significant
effect on their overall performance, and as
a result were not a potential threat to the
validity of the test (ibid: 448). Despite
these findings, according to the authors,
the fact that test-takers expressed strong
concerns about the topic of the test
indicated that it was a salient factor to
them (ibid: 449). Finally, Hong (1999: 431)
investigated the relationship between
students’ perception of test difficulty and
actual performance by analyzing
questionnaire and test data collected from
208 undergraduate students and found that
although perceived test difficulty did not
have a direct effect on students’
performance, it had an indirect effect on
the arousal of anxiety i.e., students who
perceived the test as more difficult showed
higher levels of worry and anxiety, which in
turn influenced their performance as
measured by their final scores (ibid: 432).
Hong concluded by pointing out the
importance of understanding students’
perception of test difficulty because its
impact on the arousal of anxiety can further
lead to impaired performance (ibid: 432).
Undoubtedly,
performance in language tests can be
influenced by a wide range of factors, which
can interact in an unpredictable way with
the characteristics of individual
test-takers (Lumley & O’Sullivan, 2005: 415,
Woodfield et al., 2005: 45; Horiba,
2000: 223). This does not mean, however,
that test developers should remain passive
observers of such interactions; rather, they
should try to eliminate their effect and
control for any sources of bias and error in
their tests. In fact, as Brown & McNamara
(2004: 524) and Shohamy (2001: 383-4) aptly
pointed out, a
responsible professional practice in
language testing involves care for the
rights and interests of particular social
groups who may be at risk from biased
language assessments. Test designers’
knowledge of the variables that can
influence the reading process and product,
which is the main objective of our ongoing
research project, is, thus, in many respects
linked to the validity of the reading tests
i.e., test designers need to know whether
and to what extent the instruments they use
to measure reading ability are, amongst
other things, affected by the readability[4]
of the texts, the difficulty
of the tasks, or the readers’ background
knowledge.
3. Aim and methodology
The aim of the research
presented in this paper has been twofold: a)
to provide a description of KPG test-takers’
personal characteristics for all interested
parties such as test-developers and material
writers to have a clearer picture of their
intended audience and b) to examine whether
and to what extent specific reader variables
such as
sex, age, years of
instruction, background knowledge and topic
preference influence KPG test-takers’
impressions of text and task difficulty.
Taking into consideration test-takers’
feedback and in a way treating them as
“judges” of reading texts is a crucial part
of the present research, since they were the
actual readers who had to interact with the
texts in order to perform a set of tasks and
demonstrate a successful performance in the
specific exams. Their feelings and opinions
about the texts are therefore believed to be
of great importance, since they can give us
more in-depth information about text
difficulty from the reader’s point of view.
At this point it should also be mentioned
that most studies pertinent to EFL
comprehension processes and reading
performance involved a small number of EFL
learners taking part not in real,
high-stakes exams but in experiments
designed to explore a limited number of
reader or text variables in isolation,
experiments which in many cases produced
rather contradictory results. With particular
reference to the KPG English language exams,
no empirical evidence was available on what
features make a text easier or more
difficult for Greek learners of English to
comprehend or which topics are more familiar
to them and could better suit their
interests and reading preferences.
Furthermore, the idea of examining the
effect not only of text characteristics but
also of personal characteristics on
test-takers’ perceptions of reading module
difficulty challenges a rather common belief
that, since nothing can be done to control
these factors, there is no need to look into
their impact; on the contrary, it
is hoped that such an investigation will
make us aware of the impact less explored
factors may have on overall exam
performance. In order to explore these
issues, the following research questions
were formed:
-
Which is KPG test-takers’
profile in terms of personal characteristics
such as age, sex, years of learning English
and place of residence? Are there any
significant differences between B2 and C1
test-takers with regard to these
characteristics?
-
To what extent do specific
reader variables contribute to KPG
test-takers' perceptions of reading text and
task difficulty?
-
Is there a relationship
between text characteristics and
test-takers’ perceived level of text
difficulty?
In order to collect valuable
information on the KPG test-takers’ profile
and reading preferences as well as their
perceptions of text and task difficulty, a
national survey was conducted by the RCeL
with the support of the Greek Ministry of
Education & Religious Affairs in the form of
questionnaires administered to all KPG
test-takers sitting for the B2 and C1
English language exams in the May and
November 2006 and 2007 and May 2008
examination periods. The questionnaire was
administered on the day of the exam, in exam
centres all over Greece. More specifically,
once test-takers had finished their exam and
before leaving the examination room, they
were kindly requested to provide some
personal information regarding their age,
sex, place of residence, years of learning
English and preparation for the KPG exams.
They also rated on a Likert scale of 1-5 or
1-3 a variety of statements about module,
task, text and instructions difficulty,
topic and genre familiarity, topic
preference, text lexical complexity and text
processing strategies (Appendix A). The
design of the questionnaire was rather
time-consuming and required in-depth research
into the relevant literature, as well as
careful piloting and refining, for its final
version to contain a comprehensive range of
questions. However, this effort proved its
worth, since the collected data has yielded
valuable information from the test-takers'
point of view that it would not otherwise
have been possible to obtain. The actual process of
designing the questionnaire involved the
clarification of the research purpose and
the formulation of specific questions that
corresponded to this purpose. More
specifically, since the aim of the study was
to investigate the effect specific text and
reader variables have on test-takers’
perceptions of reading comprehension
difficulty, the questions were designed to
elicit information related to these
variables, i.e., topic familiarity, topic
preference, text lexical complexity and
employed reading strategies. Bearing in mind
the fact that the larger the sample size
(n), the more reliable the findings, the
present questionnaire was administered to a
rather large sample (i.e., all test-takers
sitting for the B2 and C1 English language
exams), and thus had to include structured,
closed questions which test-takers could
answer by selecting a numerical value. These
kinds of questions are, according to Cohen
et al. (2000: 251), very useful in
large-scale surveys since they enable
collection of data amenable to thorough
statistical analysis and allow comparisons
to be made across groups, facilitating the
observation of patterns occurring in the
same sample. To avoid dichotomous questions,
a Likert scale was employed, since such a
scaling can provide a range of responses
with a degree of sensitivity while still
generating numerical values (ibid: 251). As
can be seen in Table 1, to date 7,250
questionnaires from five examination periods
have been analyzed, 4,750 in relation to the
B2 level and 2,500 in relation to the C1
level.
| Examination period | B2: Number | B2: Percent | B2: Cronbach's Alpha | C1: Number | C1: Percent | C1: Cronbach's Alpha |
| May 2006      | 1750 | 17% | 0.794 | 500 | 13% | 0.771 |
| November 2006 | 500  | 13% | 0.799 | 500 | 31% | 0.822 |
| May 2007      | 1000 | 13% | 0.741 | 500 | 16% | 0.801 |
| November 2007 | 750  | 23% | 0.829 | 500 | 47% | 0.774 |
| May 2008      | 750  | 15% | 0.791 | 500 | 20% | 0.773 |
Table 1: Test-takers' questionnaires
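The Cronbach's Alpha values reported in Table 1 can, in principle, be reproduced from the raw Likert responses with a few lines of code. The sketch below is illustrative only: it assumes a hypothetical CSV file (responses.csv) in which each row is one test-taker and each column one Likert item, and it is not the RCeL's actual analysis pipeline.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (rows = respondents, columns = items)."""
    items = items.dropna()
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical file: one row per test-taker, one column per Likert item.
responses = pd.read_csv("responses.csv")
print(f"Cronbach's alpha: {cronbach_alpha(responses):.3f}")
```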
Moreover, in order to
get a general idea of the readability level
of reading texts and to check whether
specific readability formulas could be of
any practical usefulness within the context
of the KPG exams, the readability level of
all reading texts used in past KPG exams was
electronically estimated using four
readability formulas[5],
namely the Dale-Chall Readability Formula,
the Flesch Reading Ease Index, Gunning's
Fog Index and the Flesch-Kincaid Grade Level
Index, and two computer programs, i.e. the
AceReader Pro Deluxe and the TextAnalyser
Version 2 software (for more information on
readability formulas see Oakland & Lane,
2004; Leong et al., 2002; Bailin &
Grafstein, 2001; Dreher & Singer, 2001;
Fulcher, 1997; Meade & Smith, 1991; Pichert &
Elam, 1985; Fry, 1989; Klare, 1984). To
date, 24 B2 level texts and 24 C1 level
texts have been processed, resulting in a
total of 192 analyses. Along with
readability indices, additional text
features that may contribute to text
difficulty at word and sentence level have
been measured, such as word length, number
of sentences, average characters per word,
average words per sentence, average number
of syllables per 100 words, average number
of sentences per 100 words, and lexical
density. The latter, as defined by
Halliday (1996: 347), refers to the
proportion of content (lexical) words –
basically nouns, full verbs, adjectives, and
adverbs derived from adjectives – over the
total number of words in a text.
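As an illustration of how such surface measures are obtained, the sketch below computes the Flesch Reading Ease and Flesch-Kincaid Grade Level scores together with a rough lexical density estimate. It uses the standard published formulas with a crude syllable counter and a simplified content-word heuristic based on NLTK part-of-speech tags, so its output will only approximate the values produced by the commercial tools used in the study.

```python
import re
import nltk  # assumes nltk plus its 'punkt' and tagger resources are installed

def count_syllables(word: str) -> int:
    """Very rough syllable count: runs of vowels, minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def surface_measures(text: str) -> dict:
    sentences = nltk.sent_tokenize(text)
    words = [w for w in nltk.word_tokenize(text) if w.isalpha()]
    syllables = sum(count_syllables(w) for w in words)
    W, S = len(words), max(1, len(sentences))

    # Standard published formulas (Flesch, 1948; Kincaid et al., 1975).
    flesch_reading_ease = 206.835 - 1.015 * (W / S) - 84.6 * (syllables / W)
    flesch_kincaid_grade = 0.39 * (W / S) + 11.8 * (syllables / W) - 15.59

    # Lexical density: content words over total words; this tag-based
    # heuristic only approximates Halliday's definition.
    content_tags = ("NN", "VB", "JJ", "RB")
    content = sum(1 for _, tag in nltk.pos_tag(words) if tag.startswith(content_tags))
    return {
        "words": W,
        "sentences": S,
        "flesch_reading_ease": round(flesch_reading_ease, 1),
        "flesch_kincaid_grade": round(flesch_kincaid_grade, 1),
        "lexical_density": round(content / W, 2),
    }

print(surface_measures("Paste the full text of a past exam passage here to score it."))
```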
Most
importantly, an attempt was made to
investigate whether text lexical diversity
(D) could provide a consistent measurement
tool of text difficulty. Lexical diversity
refers to ‘the range of vocabulary displayed
in the discourses,’ as roughly defined by
Durán et al. (2004), and has been
extensively discussed in the literature on
learners' written or spoken production.
Indeed, as Jarvis (2002: 57) explains, it
is considered an essential indicator of the
quality of learners' output, but it has not
yet been extensively applied to reading texts.
Measuring lexical diversity is a rather
complex task that has troubled researchers
across several disciplines and has triggered
the development of different formulas,
amongst which the most widely used has been
the type-token ratio (TTR) i.e., the ratio
of different words to total words used in a
text. However, the traditional TTR has been
considered sensitive to sample size and,
according to Richards (1987: 203), it is
incapable of assessing lexical diversity in a
satisfactory way if sample sizes vary a
great deal. More
recently, another transformation of the
traditional TTR was developed by Malvern and
Richards (2002: 85) and incorporated into
the vocd
programme,
which is included in the CLAN suite of
programmes
in the Child Language Data Exchange System
(CHILDES). A minimum sample size of 50
word tokens is required to compute a valid
D. In this process, data first has to be
transcribed in a standard format (CHAT),
following a specific coding system (which in
fact makes the whole procedure rather time
consuming), but once the text has been
checked and the coding accepted, the
software calculates the average of Ds three
times and reports a final optimum D value
for a text. According to its designers, D
presents methodological advantages over
other measures of lexical diversity since it
is independent of sample size and makes it
possible to perform valid comparisons
between varying quantities of linguistic
data (Malvern & Richards, 2002: 85).
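To make the D measure concrete: vocd repeatedly draws random samples of 35-50 tokens from the text, computes the mean TTR at each sample size, and finds the value of D for which the curve TTR(N) = (D/N)[(1 + 2N/D)^(1/2) - 1] best fits those points. The following sketch reproduces that idea in simplified form; it is not the CLAN implementation and will not give identical values, but it shows how a size-independent D can be estimated from raw tokens.

```python
import random
import numpy as np
from scipy.optimize import curve_fit

def ttr_model(n, d):
    """Malvern & Richards' model of TTR as a function of sample size n and parameter D."""
    return (d / n) * (np.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens, trials=100, sizes=range(35, 51), seed=0):
    """Simplified vocd-style estimate of lexical diversity D (needs at least 50 tokens)."""
    rng = random.Random(seed)
    sizes = [n for n in sizes if n <= len(tokens)]
    mean_ttrs = []
    for n in sizes:
        ttrs = []
        for _ in range(trials):
            sample = rng.sample(tokens, n)          # random sample without replacement
            ttrs.append(len(set(sample)) / n)       # type-token ratio of the sample
        mean_ttrs.append(sum(ttrs) / trials)
    # Fit the theoretical TTR curve to the empirical means to obtain D.
    popt, _ = curve_fit(ttr_model, np.array(sizes, dtype=float),
                        np.array(mean_ttrs), p0=[50.0], bounds=(1, 1000))
    return popt[0]

text = "paste a tokenised exam passage here instead of this repeated filler " * 20
tokens = text.lower().split()
print(f"Estimated D: {estimate_d(tokens):.2f}")
```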
As previously
stated, up until now measurement of lexical
diversity has been mainly confined to spoken
data produced by learners. However, given
the fact that lexical diversity relates to
vocabulary and that vocabulary itself is an
important element of reading texts, the idea
of investigating the relation between
lexical diversity (D) and text complexity
could not remain unexplored. After personal
communication with Professor Malvern (on
19/03/2008), such an application was
considered to be viable, so some preliminary
findings of one such investigation are
presented in the following section.
4. Results and discussion
4.1
KPG test-takers’ profile
Descriptive statistics have
shown that the KPG B2 level test-takers’ age
in all five exam periods ranges from 12 to
62 years. Although age values are scattered,
indicating lack of normal distribution, most
values are found within the range of 15-20
years, with 15 being the most frequent age
for test-takers sitting for the B2 exam. As
far as their sex is concerned, women
outnumber men in all five exam periods.
Regarding the number of years of learning
English, the majority of test-takers have
stated that they had been learning English
for about six to eight years before sitting
for the exam, and also that they had been
specially prepared for it (which means that
they were at least partly familiar with the
format and type of activities included in
the exam). With regard to the KPG C1 level
test-takers’ profile, age values in all five
exam periods range from 12 to 48 years.
Although age values follow no normal
distribution, most of them occur within the
range of 16-24 years, as was the case for B2
test-takers, with 16 being the most frequent
figure. As far as sex is concerned, once
again women taking part in the C1 exam
outnumber men in all five exam periods. In
relation to years of learning English, the
majority of C1 test-takers seem to have been
learning English for about seven to nine
years before sitting for the exam, slightly
longer than their B2 counterparts. However,
in contrast to the B2 findings, a lower
percentage of test-takers sitting for the C1
exam stated that they had undergone some kind
of special preparation for the specific exam.
4.2 KPG test-takers’ personal
characteristics & reading module difficulty
Statistical analysis has
shown that age as a predictor variable can
have an effect on test-takers’ perception of
reading module difficulty, topic
familiarity, topic preference, and text
lexical complexity. More specifically,
ordinal and logistic regression analyses
revealed that at a 0.05 significance level,
the older the test-takers, the higher the
probability of finding the reading module
more difficult and the text language more
complex to comprehend (Table 2-Appendix B).
However, at the same time, older test-takers
seem to be more familiar with and to enjoy
reading about specific topics, such as
environmental issues, than their younger
counterparts.
Years of learning English as
an independent variable can also be used to
a limited extent to predict test-takers’
responses. Regression analysis has shown
that as the years of learning English
increase, test-takers are more likely to be
familiar with certain topics and to find the
reading module less difficult and the text
language less complex to comprehend.
Surprisingly enough, a significant effect of
sex on test-takers’ perception of module
difficulty has also been observed since in
most B2 and C1 examination periods, it has
been found that women are more likely than
men to find the vocabulary and the rubrics
less difficult to comprehend and the items
less difficult to answer, but also to be
less familiar with certain topics i.e.,
environmental issues like climate change and
recycling (Table 3-Appendix B).
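Since the original regression output is not reproduced here, the sketch below shows one plausible way to model such data with statsmodels; the variable names (age, sex, years_english, perceived_difficulty) and the CSV file are hypothetical stand-ins for the actual RCeL dataset, and the binary recoding of the 1-5 difficulty ratings is a simplification of the ordinal models reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per test-taker, with Likert ratings and background variables.
df = pd.read_csv("kpg_b2_questionnaires.csv")

# Simplification: recode the 1-5 rating as "found the module very or extremely difficult".
df["very_difficult"] = (df["perceived_difficulty"] >= 4).astype(int)
df["female"] = (df["sex"] == "F").astype(int)

X = sm.add_constant(df[["age", "female", "years_english"]])
model = sm.Logit(df["very_difficult"], X).fit()
print(model.summary())          # coefficients and p-values
print(np.exp(model.params))     # odds ratios, e.g. per additional year of age
```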
Although no significant
effect of place of residence on overall
module difficulty has been observed, it
should be pointed out that test-takers from
urban areas seem to have been more familiar
with and liked reading about topics of
rather special social interest (i.e.,
multilingualism in the United States,
Parkour, Common History Books in Europe)
than those living in rural
regions.
In May 2007, a question regarding
preparation for the KPG exams was added to
the questionnaire to test the hypothesis
that test-takers' familiarization with the
format and content of the specific exam has
a positive impact on their perception of
module difficulty. Data analysis of the three
consecutive examination periods showed no
significant overall effect of exam
preparation on test-takers' perception of
text and task difficulty. In May 2007,
however, a significant effect of exam
preparation on test-takers' perception of
text difficulty was observed, i.e.
test-takers specially prepared for the KPG
exam found the text language less complex to
comprehend and the items less difficult to
answer.
4.3 KPG test-takers’
perspective on module difficulty
Data analysis of KPG B2
test-takers’ perspective on overall
difficulty of the reading comprehension
module has shown that in all five exam
periods the majority of responses ranged
from ‘slightly’ to ‘fairly difficult,’ and
only a small percentage of test-takers found
the module ‘very’ or ‘extremely difficult’
(12% in May 2006, 9% in November 2006, 16%
in May 2007, 8.9% in November 2007 and 10.7%
in May 2008). Task instructions were mainly
treated as 'not difficult' or 'slightly
difficult.' However, as regards the expected
level of text and item difficulty, a
noticeable difference has been traced between
May and November 2007, since in May 2007 a
much lower percentage of test-takers found
the texts and the items as difficult as
expected, whereas in November 2007 the
majority of participants (almost 80%) stated
that texts and tasks were as difficult as
expected. Such a discrepancy calls for
further investigation in order for specific
patterns of difference to be identified.
As far as C1 test-takers are
concerned, their perceptions of reading
module and instructions difficulty are quite
similar to those of the B2 level
test-takers, i.e., in all five exam periods,
the majority of participants found the
reading module ‘slightly’ to ‘fairly
difficult,’ and the task instructions ‘not
difficult’ to ‘slightly difficult.’ However,
when looking at the expected level of text
and item difficulty, a discrepancy can once
again be observed between exam periods: in
November 2006, a much lower percentage of
participants found the texts and the items
as difficult as expected, in comparison with
May 2006, November 2007 and May 2008.
4.4
KPG
test-takers’ perspective versus textual
features
The comparative examination
of test-takers’ questionnaires and textual
features has provided preliminary insight
into text comprehension constraints on the
part of the reader. In Table 4, data
regarding B2 test-takers' perspectives on
text lexical complexity for the B2 level May
and November 2006 exams are presented
(Column 1) and cross-related to four
readability formulas (Columns 2-5) and to the
lexical diversity (D) of each text
(Column 6). All four readability
formulas seem to fail to agree with
test-takers’ perspective on text lexical
complexity, since texts are mainly rated as
‘very difficult’ by readability formulas and
appropriate for advanced readers (i.e.,
above average readers). On the other hand,
if we compare data in the first and last
columns, there seems to be a better
correspondence between participants’
responses and lexical diversity (D), since,
in most cases, as the percentage of
participants who found the text ‘very’ to
‘extremely difficult’ increases, so does the
lexical diversity of the text.
B2 Level, May 2006
| Text | Test-takers' impression of text lexical complexity ['very' to 'extremely' difficult] | Flesch Reading Index [0=difficult, 100=easy] | Dale-Chall Index | Flesch-Kincaid Index | Gunning's Fog Index | Lexical Diversity (D) |
| Text A | 12.7% | 57.3 | 9-10  | 9.6  | 10.40 | 80.75 |
| Text B | 13.8% | 52.6 | 11-12 | 9.9  | 10.80 | 81.32 |
| Text C | 14.7% | 52.7 | 11-12 | 12.9 | 12.40 | 81.16 |
B2 Level, November 2006
| Text A | 9.1%  | 53.8 | 9-10  | 11.3 | 13.20 | 96.58 |
| Text B | 7.0%  | 74.0 | 7-8   | 6.8  | 8.00  | 60.89 |
| Text C | 16.9% | 60.0 | 9-10  | 10.4 | 11.60 | 111.84 |
Table 4: B2 test-takers' perspective vs. textual features
An additional comparative exploration of
textual features in texts that test-takers
rated very differently in terms of lexical
complexity has been rather revealing with
regard to text comprehension difficulty. In
November 2007, Text B (which was about
climate change) was rated as 'very' to
'extremely difficult' by a high percentage of
test-takers, whereas in November 2006, Text C
(which involved film reviews) was treated as
the least difficult in all five examination
periods. As can be seen in Table 5, Text B is
shorter and consists of more sentences (20 as
opposed to 19), and its other surface
features (characters per word, words per
sentence, Dale-Chall and Flesch-Kincaid grade
levels) are lower than those of Text C, while
its Flesch Reading Ease score is higher, all
of which would normally indicate an easier
text. However, its lexical density and, more
importantly, its lexical diversity are of a
much higher level, a factor that might have
affected participants' perception of overall
text difficulty.
KPG READING COMPREHENSION MODULE, B2 Level
| Measure | November 2006, Text C (Film Reviews) | November 2007, Text B (Climate Change) |
| Words | 421 | 254 |
| Characters | 2029 | 1128 |
| Sentences | 19 | 20 |
| Average characters per word | 4.7 | 4.3 |
| Average words per sentence | 22.2 | 12.7 |
| Average number of syllables per 100 words | 147.03 | 142.512 |
| Dale-Chall Grade Level | 9-10 | 7-8 |
| Flesch Reading Ease | 60.0 | 73.4 |
| Flesch-Kincaid Grade Level | 10.4 | 6.2 |
| Lexical density | .59 | .65 |
| Lexical diversity | 111.84 | 159.53 |
Table 5: B2 quantitative text measures
Similarly, in relation to C1 texts from all
five examination periods, Text B of the
November 2007 exam (which was about general
quiz rules) was found 'very' to 'extremely
difficult' by the highest percentage of
test-takers, whereas Text B of the May 2006
exam (which was about going to the dentist)
was treated as the least difficult. As we can
see in Table 6, both texts are of almost
equal word length and level of lexical
density, and consist of exactly the same
number of sentences (16). Text B of November
2007 has been rated as 'very difficult'
according to the Dale-Chall Readability
Formula but 'fairly easy' according to the
Flesch Index, so no clear conclusion can be
drawn from these indices. However, a
difference can once again be noticed in
relation to lexical diversity, which is of a
much higher level in the November 2007 text
and might partly account for such a
discrepancy.
KPG READING COMPREHENSION MODULE, C1 Level
| Measure | May 2006, Text B (Going to the dentist) | November 2007, Text B (Quiz Rules) |
| Words | 231 | 230 |
| Characters | 1,187 | 1,137 |
| Sentences | 16 | 16 |
| Average characters per word | 5.0 | 4.8 |
| Average words per sentence | 14.4 | 14.4 |
| Average number of syllables per 100 words | 158.01 | 147.62 |
| Dale-Chall Grade Level | 7-8 | 9-10 |
| Flesch Reading Ease | 58.5 | 67.3 |
| Flesch-Kincaid Grade Level | 8.7 | 7.5 |
| Lexical density | .62 | .62 |
| Lexical diversity | 86.74 | 108.46 |
Table 6: C1 quantitative text measures
4.5
Additional
effects
Analysis of the
questionnaires has yielded valuable
information regarding the effect additional
variables such as perceived text lexical
complexity and task difficulty may have on
test-takers’ impression of overall reading
module difficulty. More specifically, in all
five examination periods (2006-2008) of the
KPG B2 and C1 English language exams,
perceived text lexical complexity was
consistently found to correlate
significantly with perceived reading module
difficulty, i.e., the more difficult the
test-takers find the vocabulary of the text,
the more difficult the whole module seems to
become for them (Table 7-Appendix B).
Perceived item and task instructions
difficulty were also found to correlate
significantly with module difficulty, which
indicates that as the items become more
difficult to answer and the task
instructions more complex to comprehend,
test-takers found it more difficult to cope
with the whole reading module. Moreover,
data analysis showed that text preference
negatively correlated with text lexical
complexity. In other
words, the more test-takers liked the topic
of a text, the less difficult they found its
vocabulary. Finally, topic familiarity was
found to correlate significantly with text
preference, which indicates that the more
familiar test-takers were with certain
topics, the more they liked reading about them.
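The correlations described in this section are between ordinal Likert ratings, for which a rank-based coefficient such as Spearman's rho is the usual choice. The sketch below shows how such pairwise correlations could be computed; the column names and the CSV file are hypothetical placeholders, not the actual RCeL variables.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical questionnaire export: one row per test-taker, Likert-scaled columns.
df = pd.read_csv("kpg_questionnaires.csv")

pairs = [
    ("text_lexical_complexity", "module_difficulty"),
    ("item_difficulty", "module_difficulty"),
    ("topic_preference", "text_lexical_complexity"),
    ("topic_familiarity", "topic_preference"),
]
for a, b in pairs:
    rho, p = spearmanr(df[a], df[b], nan_policy="omit")
    print(f"{a} vs {b}: rho = {rho:.2f}, p = {p:.3f}")
```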
5. Conclusions
An important aim of this
paper has been to show that a reading
comprehension text is not difficult or easy
by itself. It is not difficult or easy
merely because it has a more or less
complicated organization, sentence and
paragraph structure, lexical density,
variation or dearth. How difficult or easy a
text is also depends on the reader – the
reader who, in the case of a test, is
actually obliged to show that s/he
understands it. As a matter of fact, as this
paper has attempted to show, text
comprehensibility depends on a variety of
reader-related factors, including the
reader’s perceptions and attitudes towards
the text, as well as his or her reading
habits i.e., the reading strategies s/he has
developed and learnt to use for a successful
exam performance. Moreover, as this paper
has made clear, there is a strong
relationship between these factors and the
reader’s age (and therefore his/her literacy
level), as well as his/her sex and years of
learning the foreign language.
The question that logically follows is why it
is useful to know how different readers, who
are in this case in the role of test-takers,
respond to each reading comprehension text.
An even more crucial question is for whom
this information is useful: 'Who needs to
know how easy or difficult different readers
find specific reading comprehension texts?'
Our
response to this second question, which
indirectly also answers the first one, is
that this information is perhaps most
valuable for the team of experts responsible
for the exams – a team which usually
includes test developers, results analysts
and scholars who prepare the exams and
interpret the results for different
audiences, as in the case of the KPG exam
battery. Although there is no doubt that
test designers cannot change test-takers'
age, sex or years of learning English, it is
believed that they can take care not to bias
their tests towards any of these factors,
they can check for such possible biases in
their analyses of results, and they can
inform future test-takers of the potential
implications of these factors for their
performance.
In other words, the availability of valid
data related to actual readers i.e., the
test-takers who actually sit for the exams,
provides the test design team with
opportunities to take full advantage of the
empirical evidence and make a successful
match between readers and texts, without
threatening the validity and reliability of
the reading test. Undoubtedly, such
information could be useful not only for
test developers and researchers worldwide,
but also for test-takers and/or EFL teachers
who may be preparing their students for the
specific test, or those who are interested
in this information so as to steer their
choice of reading material for the English class.
For example, by taking advantage of the
present study and trying to link research
findings with practice, publishing companies
and curriculum designers may find it easier
to choose texts appropriate for their
intended readers’ level of language ability,
age and experiences.
Finally, from a methodological point of view,
this study hopes to make a contribution given
that, instead of controlling one specific
variable (a method often employed by
researchers), it investigated and
cross-related the effects of various
variables so that their interaction could be
better defined and predicted within the
context of the KPG exams. Using simultaneous
analyses, it also demonstrated a way to
detect measurement invariance as a critical
prerequisite for achieving valid inferences
based on specific measures.
With specific reference to the KPG English
language exams, it should be mentioned that
the finding that texts used in the exams seem
to be more appealing to older test-takers and
to better match their experiences and
preferences has been carefully considered by
KPG test designers, so that a wider range of
topics appealing to younger test-takers can
be included in the exam. Regarding text
appropriateness and level of difficulty,
continuous effort is made to carefully select
reading texts based not only on readability
indices and text features such as word
length, number of sentences, average
characters per word, average words per
sentence, etc., but also on lexical density
and diversity. In the context of the KPG
exams, the findings of the present research
could be of practical use to the KPG exams
that are currently administered in other
languages, i.e., French, Italian, Spanish
and German, and beneficial to the design of
reading comprehension tests of other levels
to be introduced in the years to come. More
importantly, the present research provides
empirical evidence that further supports the
rationale and ideology of the KPG exams,
according to which the exam system, "despite
its international orientation and its
European character, focuses on the Greek user
of the target language and aims to cover
local needs by responding to the demands of
education and the labour market in Greece"
(Dendrinos, 2009).
All in all, although it should once again be
admitted that reader variables such as those
mentioned above are essentially beyond the
control of test designers, efforts should be
made to alleviate their effects and to
improve test-takers' chances of a successful
performance, not only by acknowledging their
existence but also by paying careful
attention to the selection of texts,
especially when authentic materials are used,
so that foreign language learners encounter
materials at an appropriate level of
difficulty. By following such an approach,
test designers will be able to provide
evidence that the methods they employ to
elicit data are appropriate for the intended
audience, that the procedures used provide
stable and consistent data and, consequently,
that the interpretations they make of the
results are justified, since they are based
on a valid and reliable exam system.
References
Alderson, C. (2000).
Assessing
Reading. Cambridge: Cambridge University
Press.
Allen, D.,
Bernhardt, B., Berry,
T., & Demel, M. (1988). Comprehension and text
genre: an analysis of secondary school foreign
language readers. The Modern Language Journal,
72, 163-172.
Alptekin, C.
(2006). Cultural
familiarity in inferential and literal
comprehension in L2 reading. System, 34, 494-508.
Bachman, L. (2002). Some
reflections on task-based language performance
assessment.
Language Testing, 19(4),
453-476.
Bailin,
A., & Grafstein, A.
(2001). The linguistic assumptions underlying
readability formulae: a critique. Language &
Communication, 21, 285-301.
Barry, S.,
& Lazarte, A. (1998).
Evidence for Mental Models: How Do Prior
Knowledge, Syntactic Complexity, and Reading
Topic Affect Inference Generation in a Recall
Task for Nonnative Readers of Spanish? The
Modern Language Journal, 82, 176-193.
Ben-Anath, D. (2005). The role of
connectives in text comprehension. Teachers
College, Columbia University Working Papers in
TESOL & Applied Linguistics, 5(2), 1-27.
Bensoussan, M. (1998). Schema
effects in EFL reading comprehension. Journal
of Research in Reading, 21(3),
213-227.
Bernhardt,
E. (1986). Proficient
Texts or Proficient Readers? ADFL Bulletin,
18(1), 25-28.
Boyle, J. (1987). Sex differences
in listening vocabulary. Language Learning,
37(2), 273-284.
Brantmeier, C. (2003). Does
gender make a difference? Passage content and
comprehension in second language reading. Reading in a Foreign Language,
15(1),
1-27.
Brantmeier, C. (2005). Effects of
Reader’s Knowledge, Text Type and Test Type on
L1 and L2 Reading Comprehension in Spanish. The
Modern Language Journal, 89,
37-53.
Brown, A. &
McNamara, T. (2004).
“The Devil is in the Detail”: Researching Gender
Issues in Language Assessment. TESOL
Quarterly, 38(3), 524-538.
Bügel, K., &
Buunk, B. (1996).
Sex differences in foreign language text
comprehension: the role of interests and prior
knowledge. The Modern Language Journal,
80, 15-31.
Calvo, M.,
Castillo, D., & Schmalhofer, F. (2006). Strategic influence on
the time course of predictive inferences in
reading. Memory & Cognition, 34(1),
68-77.
Carr, N. (2006).
The
factor
structure of test task characteristics and
examinee performance. Language Testing,
23(3), 269-289.
Chalhoub-Deville,
M., & Turner,
C. (2000). What to look for in ESL admission
tests: Cambridge certificate exams, IELTS and
TOEFL. System, 28(4), 523-539.
Cohen, L.,
Manion, L., &
Morrison, K. (2000). Research Methods in
Education. London: Routledge Falmer.
Coles, M., &
Hall, C. (2002).
Gendered readings: learning from children’s
reading choices. Journal of Research in
Reading, 25(1), 96-108.
Dale, E., &
Chall, J. (1995).
Readability Revisited: The New Dale-Chall
Readability Formula. Cambridge,
Massachusetts: Brookline Books Inc.
Dendrinos, B.
(2009). Rationale
and ideology of the KPG exams.
ELT News,
September 2009. Available online at:
http://rcel.enl.uoa.gr/kpg/gr_kpgcorner_sep2009.htm
.
Dreher, M., &
Singer, H. (2001).
Friendly Texts and Text-Friendly Teachers.
Theory into Practice, 28(2), 98-104.
Drucker, M. (2003). What reading
teachers should know about ESL learners. International Reading Association,
22-29.
Durán, P.,
Malvern, D., Richards,
B., & Chipere, N. (2004). Developmental Trends
in Lexical Diversity.
Applied Linguistics,
25(2), 220-242.
Ellis, N. (2001).
Memory
for
language. In P. Robinson (Ed.), Cognition and
second language instruction (pp. 33–68).
Cambridge, UK: Cambridge University Press.
Fry, E. (1989).
Readability
Formulas: maligned but valid. Journal of
Reading, 32, 292-297.
Fulcher, G. (1997). Text
difficulty and accessibility: Reading Formulae
and expert judgment. System, 25(4),
497-513.
Garner, M., &
Engelhard, G.
(1999). Gender Differences in Performance on
Multiple-Choice and Constructed Response
Mathematics Items. Applied Measurement in
Education, 12(1), 29-51.
Halliday, M.
A. K. (1996).
Literacy and Linguistics: a Functional
Perspective. In R. Hasan & G. Williams (Eds.)
Literacy in Society (pp. 339-376.) London:
Longman.
Hock, T. (1990). The Role of
Prior Knowledge and Language Proficiency as
Predictors of Reading Comprehension among
Undergraduates. Multilingual Matters, 59, 214-224.
Hong, E. (1999). Test anxiety,
perceived test difficulty, and test performance:
temporal
patterns
of their effects. Learning
& Individual Differences, 11(4),
431-447.
Horiba, Y.
(2000). Reader Control
in Reading: Effects of Language Competence, Text
Type, and Task. Discourse Processes, 29(3), 223-267.
Ilyin, D.,
Spurling, S., &
Seymour, S. (1987). Do learner variables affect
cloze correlations? System, 15(2),
149-160.
Jarvis, S.
(2002).
Short texts, best-fitting curves
and new measures of lexical diversity.
Language Testing,
19(1),
57-84.
Jennings, M., Fox, J., Graves,
B., & Shohamy, E. (1999). The test takers'
choice: an investigation of the effect
of topic on language-test performance.
Language Testing,
16(4),
426-456.
Keshavarz,
M., Atai, M., &
Ahmadi, H. (2007). Content schemata, linguistic
simplification and EFL readers’ comprehension
and recall. Reading in a Foreign Language,
19(1), 19-33.
Ketchum,
E. (2006). The cultural
baggage of second language reading: An approach
to understanding. Foreign Language Annals,
39, 22–42.
Klare, G. (1984).
Readability. In
D. Pearson (ed.) Handbook of Reading Research
Vol.1 (pp. 681-744). New York: Longman.
Krekeler, C. (2006).
Language for
special academic purposes (LSAP) testing: the
effect of background knowledge revisited. Language Testing,
23(1), 99-130.
Leong, E., Ewing,
M., & Pitt, L.
(2002). E-comprehension: Evaluating B2B websites
using readability formulae. Industrial
Marketing Management, 31, 125-131.
Lin, L., Zabrucky,
K., & Moore,
D. (2002). Effects of text difficulty and
adults’ age on relative calibration of
comprehension. American Journal of Psychology,
115(2), 187-198.
Lumley, T., & O’Sullivan, B.
(2005). The effect of test-taker gender,
audience and topic on task performance in
tape-mediated assessment of speaking. Language Testing,
22(4), 415-437.
Malvern, D., &
Richards, B.
(2002). Investigating accommodation in language
proficiency interviews using a new measure of
lexical diversity. Language Testing, 19(1), 85-104.
Meade, C., &
Smith, C. (1991).
Readability Formulas: Cautions and Criteria.
Patient Education and Counseling, 17,
153-158.
Mesmer,
H. (2005).
Introduction: Text Accessibility and the
Struggling Reader.
Reading & Writing Quarterly,
21,
1-5.
Moreno, R. &
Mayer, R. (1999).
Gender differences in responding to open-ended
problem-solving questions. Learning &
Individual Differences, 11(4),
355-364.
Nassaji, H. (2002).
Schema theory
and knowledge-based processes in second language
reading comprehension: A need for alternative
perspectives. Language Learning, 52(2),
439-481.
Nevo, N. (1989).
Test-taking
strategies on a multiple-choice test of reading
comprehension. Language Testing, 6(2),
199-215.
Oakland, T., &
Lane, H.B. (2004).
Language, Reading, and Readability Formulas:
Implications for Developing and Adapting Tests.
International Journal
Of Testing, 4(3), 239-252.
Parker, R.,
Hasbrouck, J., &
Weaver, L. (2001). Spanish Readability Formulas
for Elementary-Level Texts: A Validation Study.
Reading & Writing Quarterly, 17,
307-322.
Phakiti, A. (2003).
A Closer Look
at Gender and Strategy Use in L2 Reading.
Language Learning, 53(4), 649-702.
Pichert, J., & Elam,
P. (1985).
Readability formulas may mislead you. Patient Education and Counseling,
7,
181-191.
Pomplun, M. &
Omar, M. (2001). Do
Reading Passages About War Provide Factorially
Invariant Scores for Men and Women? Applied
Measurement in Education, 14(2),
171-189.
Pomplun, M.
& Sundbye, N. (1999).
Gender Differences in Constructed Response
Reading Items. Applied Measurement in
Education, 12(1), 95-109.
Pulido,
D. (2004).
The relationship between text
comprehension and second language incidental
vocabulary acquisition: A matter of topic
familiarity? Language Learning, 54(3),
469–523.
Purpura, J. (1997). An analysis
of the relationships between test takers’
cognitive and metacognitive strategy use and
second language test performance. Language
Learning, 47(2), 289-325.
Read, J. (2000).
Assessing
Vocabulary. Cambridge: Cambridge University
Press.
Richards, B.J.
(1987). Type/token
ratios: what do they really tell us? Journal
of Child Language, 14, 201-209.
Rupp, A.,
Ferne, T. & Choi H.
(2006). How assessing reading comprehension with
multiple-choice questions shapes the construct:
a cognitive processing perspective. Language
Testing, 23(4), 441-474.
Saito, Y.,
Horwitz, E., & Garza,
T. (1999). Foreign Language Reading Anxiety. The Modern Language Journal,
83,
202-218.
Scarcella, R.,
& Zimmerman, C.
(1998). Academic words and gender: ESL student
performance on a test of academic lexicon. Studies in Second Language Acquisition,
20(1), 27-49.
Shohamy, E. (2001). Democratic
assessment as an alternative. Language
Testing, 18(4), 373-391.
Shokrpour, N. (2004). Systemic
Functional Grammar as a Basis for Assessing Text
Difficulty. Indian Journal of Applied
Linguistics, 30(2), 5-26.
Stricker,
L., Rock, D., &
Bennett, R. (2001). Sex and Ethnic-Group
Differences on Accomplishments Measures. Applied Measurement in Education,
14(3),
205-218.
Sullivan,
B. (2000). Exploring
gender and oral proficiency interview
performance. System, 28, 373-386.
Sunderland, J. (1993). Learner
Gender and Language Testing: Any Connections? Language Testing Update,
13, 46-56.
Sunderland, J. (1995). Gender and
Language Testing. Language Testing Update,
17, 24-35.
Ungerleider,
C. (2003).
Large-Scale Student Assessment: Guidelines for
Policymakers. International Journal of
Testing, 3(2), 119-128.
Wen, Q.,
&
Johnson, R. (1997). L2
learner variables and English achievements: A
study of tertiary-level English majors in China.
Applied Linguistics, 18(1), 27-48.
Wightman,
L. (1998). An
Examination of Sex Differences in LSAT Scores
from the Perspective of Social Consequences. Applied Measurement in Education,
11(3),
255-277.
Woodfield,
R., Earl-Novell, S., &
Solomon, L. (2005). Gender and mode of
assessment at university: should we assume
female students are better suited to coursework
and males to unseen examinations? Assessment
& Evaluation in Higher Education, 30(1),
35-50.
Wylie, J., &
McGuinness C. (2004). The interactive effects of
prior knowledge and text structure on memory for
cognitive psychology texts. British Journal
of Educational Psychology, 74,
497-514.
Endnotes
[1]
A shorter version of this paper has been
included in the Proceedings of the 4th
Annual South-East European Doctoral
Student Conference, while findings
of the survey have also been presented
at the ALTE 3rd International
Conference, the EALTA 5th
International Conference and the Harvard
Graduate School of Education Research
Conference.
[2]
Sincere thanks to Prof. Bessie Dendrinos,
supervisor of my thesis, for her
insightful comments and valuable
suggestions regarding this article. I am
also indebted to her for all the
patience and care with which she
nurtured the emergence and continuation
of this project. This study was
supported in part by the RCeL which has
generously offered me the research data
I have been investigating. However, the
views expressed in this paper do not
reflect the official policy of the
Center, and the responsibility for the
way the data has been interpreted rests
entirely with me.
[3]
This part of the research
was carried out while I was a research
assistant at the RCeL. In this capacity,
I coordinated the surveys conducted with
KPG test-takers during the 2006, 2007 and
2008 examination periods. For the data
analysis I present in this paper, I was
able to use the RCeL facilities and to
consult with the RCeL in-house
statistics specialist. These surveys are
part of a larger ongoing research
project, directed by Prof. B. Dendrinos,
investigating the profile of the KPG
test-takers, their attitudes towards the
KPG English exams and the strategies
they use while taking the exam.
[4]
Dale and Chall (1995: 80) proposed a
definition of readability as ‘the sum
total -including the interaction- of all
those elements within a given piece of
printed material that affect the success
a group of readers have with it. The
success is the extent to which they
understand it, read it at an optimal
speed and find it interesting’.
Readability is, thus, an umbrella term
that broadly refers to the ease or
difficulty of a text.
[5]
Readability formulas roughly indicate
the approximate reading grade level
required to understand a text.
Appendix A
Appendix B