Researchers have not reached a consensus on the best scoring and interpretation procedure for the verbal fluency task. This review summarizes the systems of qualitative scoring that investigators have employed to gain a deeper understanding of verbal fluency performance. These qualitative measures include clustering and switching analysis, hierarchical exploration analysis, error production analysis and time course analysis.
Clustering and switching analysis
One qualitative method gaining wide popularity in verbal fluency research examines the mechanisms underlying optimal word generation, that is, the characteristics and pattern of the words produced. This approach clarifies not only how well an examinee performs the task but also how the examinee goes about performing it. Lesion and brain imaging studies have also provided evidence for the utility of this qualitative measurement in understanding the precise nature of deficient performance (Abwender et al., 2001; Troyer et al., 1997).
The earliest analyses of the word generation process revealed that words produced during a verbal fluency task occur in spurts of meaningfully related words rather than evenly over time. However, not all the words within a nested subset are recoverable, and individuals tend to shift to the next nested subset in order to retrieve more words (Gruenewald & Lockhead, 1980; Wixted & Rohrer, 1994). To be successful in word retrieval, an individual needs to search for subcategories of the specified letter or category, retrieve words from a subcategory, and then move to the next subcategory once the words have been exhausted.
In consonance with Chertkow and Bub’s (1990) predictions, Troyer et al. (1997) proposed that the qualitative aspects of verbal fluency can be described in terms of two components, namely clustering and switching. This proposal rested on emerging evidence of the multifactorial nature of the verbal fluency task and on the limitation that the quantitative measure of total correct words cannot capture all the important aspects of an individual’s performance (Troyer et al., 1997). Troyer et al. (1997) used the terms clustering and switching to better operationalize the “store” and “search” processes involved in VF tasks, respectively. Effective performance on verbal fluency requires an intact ability to produce words that are related semantically or phonemically (clustering) and an ability to shift efficiently to a new strategy once a category or subcategory is exhausted (switching). The output quantity is therefore determined by the semantic storage and the search component of verbal fluency.
The process of organizing words into semantically or phonemically related subcategories involves the production of clusters. Clusters produced during a semantic task consist of two or more consecutive words that are related in meaning (e.g., apple, orange in the fruits category; car, bus in the vehicle category). In the same way, clusters produced during a phonemic fluency task consist of two or more consecutive words that are related in their phonemic characteristics (e.g., words beginning with the same letters [chair, church]; words differing only by a vowel [pan, pen]). Task-congruent clustering, in which semantically related words are generated during a semantic fluency task and phonemically related words during a phonemic fluency task, has been reported (Raskin et al., 1992; Troyer et al., 1997). Task-discrepant clustering (Abwender et al., 2000; Tallberg et al., 2011), considered a measure of intentional strategy use, may also occur, wherein participants retrieve phonemic clusters during a semantic task or semantic clusters during a phonemic fluency task. Another qualitative measure estimated during word generation is switching, which is calculated as the number of shifts between subcategories. The clusters produced are separated by gaps, with the interval between words within a cluster shorter than the interval between clusters (Gruenewald & Lockhead, 1980; Wixted & Rohrer, 1994). Troyer et al. (1997) postulated that clustering and switching are two dissociable components of verbal fluency, with clustering and switching contributing equally to semantic fluency and switching contributing specifically to phonemic fluency. On the animal fluency task, healthy adults were found to produce clusters of about two words and to switch on average 10.6 (SD = 3.5) times between subcategories.
Older adults tend to switch less than young adults, on average 8.5 (SD = 2.3) times between subcategories, with a cluster size similar to that of younger adults (Troyer et al., 1997).
Each of the strategies used to maximize word production involves separate mechanisms involving specific brain areas. Performance on semantic memory processes of clustering (organizing words related to a subgroup) reflects the role of temporal lobe processes such as lexical or verbal memory storage. The executive processes of switching (engaging in strategic search processes) require mediation of frontal and frontal-subcortical area processes such as initiation, cognitive flexibility, and mental shifting. The evidence for these predictions is documented in clinical populations with predominant brain damage involving frontal or temporal lesions. Poor performance on clustering has been reported among patients with Alzheimer’s disease (Troyer et al., 1998b), Mild Cognitive Impairment (Price et al., 2012), temporal lobe lesions (Henry & Crawford, 2004a; Troyer et al., 1998a) and temporal lobe epilepsy (Giovannetti et al., 2003).
Similarly, switching was reported to be impaired among patients with frontal lobe lesions (Troyer et al., 1998a), Parkinson’s disease (Troster et al., 1998; Troyer et al., 1998b), Huntington’s disease (Ho et al., 2002; Rich, Troyer, Bylsma, & Brandt, 1999), HIV associated dementia (Woods et al., 2004), multiple sclerosis (Henry & Beatty, 2006), depression (Henry & Crawford, 2005a), acquired immunodeficiency syndrome (Iudicello et al., 2007) and schizophrenia (Robert et al., 1998), and under conditions of divided attention (Troyer et al., 1997). In conditions involving diffuse brain lesions, some investigators have reported deficits in both clustering and switching, with a predominant effect on one of them. For instance, Troyer et al. (1998b) reported that in dementia of the Alzheimer’s type both clustering and switching are impaired, although the impairment is more severe for clustering than for switching.
Recently there has been a shift in focus towards the development of clustering and switching during verbal fluency in children (Kave et al., 2008; Koren et al., 2005; Hurks et al., 2010; Sauzeon et al., 2004). Strategic processing during verbal fluency has been examined in clinical paediatric populations including children with PKU (Banerjee et al., 2011), Specific Language Impairment (Henry, Messer & Nash, 2012), blindness (Wakefield, Homewood, & Taylor, 2006), Turner syndrome (Temple, 2002) and Down syndrome (Nash & Snowling, 2008). As expected, both clustering and switching are positively correlated with the total number of words produced (Kave et al., 2008; Koren et al., 2005; Robert et al., 1997; Troyer et al., 1997; Troster, Fields, Testa, et al., 1998).
Despite its clinical utility and good psychometric properties, this qualitative measure is not without controversy. From the theoretical perspective, the most widely used Troyer protocol has been criticized by many researchers (Abwender et al., 2001; Demakis et al., 2003; Mayr, 2002; Ross et al., 2007). Abwender et al. (2000) criticized Troyer’s protocol on the grounds that there is no adequate evidence that clustering inevitably leads to the production of more words. They also stated that the interpretation of switching is not adequately explained: it is unclear whether switching is a product of strategic searching and mental flexibility or of a lack of ability to cluster. Treating single words as having a cluster size of zero was also reported not to account for the failure to generate clusters. Epker, Lacritz, and Cullum (1999) observed that the qualitative measures did not provide any additional information beyond the total number of correct words in differentiating individuals with AD, individuals with PD, and healthy older adults. Similarly, Mayr (2002) criticized the ambiguous nature of the switching score in Troyer’s scoring system, supporting the view that it is difficult to determine whether the number of switches reflects difficulties in accessing new clusters or difficulties in accessing new words within clusters. The more time an individual spends in one cluster, the less time remains to access other clusters. A reduction in the number of switches may therefore be related to a general reduction in processing speed or to a selective switching deficit. Contrary to Troyer’s view of clustering as an automatic process and switching as an effortful strategic process, Mayr and Kliegl (2000) reported involvement of the strategic component in both processes. Demakis et al. (2003) considered the switching component of verbal fluency performance to tap general cognitive ability rather than specific executive processing.
Koren et al. (2005), however, considered the number of clusters rather than the number of switches as a measure of the executive component. Similarly, Ross et al. (2007) questioned whether clustering and switching are overt strategies of verbal fluency, suggesting instead that they may be an artefact of the test itself (especially clustering during letter fluency).
Hierarchical Exploration Analysis
Hierarchical exploration analysis has been employed by many researchers as an outcome measure of semantic retrieval (Beatty et al., 1989; Raboutet et al., 2010; Troster et al., 1995; Sauzeon et al., 2004). The task involved is the supermarket fluency task (generating as many words as possible for items that can be purchased in a supermarket). The analysis uses a semantic categorical system comprising 10 categories, with two hierarchical levels of items (category labels and category exemplars) for each category. Category labels correspond to the super- and subcategory nouns produced, and category exemplars refer to the nouns of category specific items. For example, in the category of fruits, the category labels include canned / frozen fruits and the category exemplars include lemons, grapes, peaches etc. Based on the word output, a hierarchical ratio score is estimated by dividing the number of category labels (or exemplars) produced by the total number of words generated. Troster et al. (1995) proposed scoring criteria for the classification of category labels and category exemplars.
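The hierarchical ratio score described above can be illustrated with a minimal Python sketch. It assumes that a rater has already classified each response as a category label or a category exemplar; the function name `hierarchical_ratios`, the level tags and the sample responses are hypothetical and are not taken from any published protocol.

```python
def hierarchical_ratios(responses):
    """Return label and exemplar ratios for one supermarket fluency protocol.

    responses: list of (word, level) pairs, where level is "label" or
    "exemplar" as judged by a rater (an assumption of this sketch).
    """
    total = len(responses)
    if total == 0:
        # No words produced: both ratios are reported as zero.
        return {"label_ratio": 0.0, "exemplar_ratio": 0.0}
    labels = sum(1 for _, level in responses if level == "label")
    exemplars = sum(1 for _, level in responses if level == "exemplar")
    # Each ratio divides the count at that level by the total words generated.
    return {"label_ratio": labels / total, "exemplar_ratio": exemplars / total}


# Illustrative data only: one category label followed by three exemplars.
sample = [
    ("frozen fruits", "label"),
    ("lemons", "exemplar"),
    ("grapes", "exemplar"),
    ("peaches", "exemplar"),
]
ratios = hierarchical_ratios(sample)
print(ratios)
```

The classification step itself is the rater's judgment and is not automated here.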
Studies in clinical populations have supported the use of hierarchical exploration analysis. For example, a selective decrease in the production of category exemplars has been shown in pathologies with temporal-lobe lesions, such as epilepsy or Alzheimer’s disease (Martin & Fedio, 1983; N’Kaoua, Lespinet, Barsse, Rougier, & Claverie, 2001; Troster, Salmon, McCullough, & Butters, 1989; Troster et al., 1995). Similarly, in French speaking children aged 7 to 16 years, Sauzeon et al. (2004) provided scores for categories sampled, label and exemplar ratios, the ratio of words per category sampled and the ratio of category shifts per category sampled.
Error production analysis
During word generation for letter or category fluency, individuals tend to produce erroneous responses such as repeating words (perseveration errors) or producing words that do not start with the specified letter or do not belong to the specified category (rule break errors). From the cognitive perspective, the presence of errors is associated with a less effective control system and reduced executive capacities (Rosen & Engle, 1997); errors have also been viewed as a strategic means employed by participants to generate new clusters (Troyer et al., 1997). While the Troyer et al. protocol requires the inclusion of perseverations and errors in clustering-switching analysis, Haugrud, Lanting, and Crossley (2010) reported that their inclusion inflates the cluster size scores.
Error scores are often calculated as the number of errors of each type or as a combined score across all error types. Raboutet et al. (2010) calculated the error score as the number of intrusions and perseverations produced. Roberts and Le Dorze (1997) scored the following five error types: (1) repetition errors, repeating an item previously given as a correct response within the current test; (2) outside category errors, a word that did not belong to the category currently being tested, for example, saying “veal” in animals or “pills” in foods; (3) nonword or unintelligible errors; (4) wrong language errors, a word belonging to the appropriate category but not in the language currently being tested; and (5) other, any error not meeting one of the above definitions. Error analysis has likewise been part of many research studies (Hurks et al., 2004, 2006; Martin & Fedio, 1983; Martin et al., 1994; N’Kaoua et al., 2001; Raboutet et al., 2010; Raskin et al., 1992; Rosen & Engle, 1997; Troster et al., 1995).
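As a rough illustration of error scoring, the following Python sketch separates correct responses, repetitions (perseverations) and rule break errors on a letter fluency trial. It is a simplified assumption rather than any author's protocol: it covers only two error types, treats any word that does not begin with the target letter as a rule break, and uses the hypothetical function name `score_letter_fluency_errors`.

```python
def score_letter_fluency_errors(responses, letter):
    """Split letter fluency responses into correct words and two error types."""
    seen = set()
    correct, repetitions, rule_breaks = [], [], []
    target = letter.lower()
    for raw in responses:
        word = raw.strip().lower()
        if not word.startswith(target):
            # Word does not begin with the specified letter.
            rule_breaks.append(word)
        elif word in seen:
            # Word was already produced earlier in the same trial.
            repetitions.append(word)
        else:
            correct.append(word)
        seen.add(word)
    return {
        "correct": correct,
        "repetitions": repetitions,
        "rule_breaks": rule_breaks,
        "total_errors": len(repetitions) + len(rule_breaks),
    }


# Illustrative trial for the letter F, with one repeated word and one rule break.
result = score_letter_fluency_errors(["fan", "fried", "friend", "fan", "table"], "f")
print(result)
```

A combined error score, as some studies report, is simply `total_errors`; the per-type lists support separate counts.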
Researchers have attempted to understand the error response pattern in disordered populations, in both adults and children, which provides important information for clinical practice. Errors such as perseverations can be produced even by healthy controls, but they are mostly described in individuals with neurological or cognitive deficits. The number of perseverations that adult subjects produce in their responses can have diagnostic value (Troster et al., 1995). Compared to healthy adults, perseveration errors are frequent in individuals with Alzheimer’s disease, aphasia, frontal lobe damage, Parkinson’s disease, Huntington’s disease, and traumatic brain injury (Azuma, 2004). Pekkala, Albert, Spiro, and Erkinjuntti (2008) reported the presence of recurring perseverations (fan, fried, friend, fan), continuous perseverations (fan, fan, fan) and stuck-in-set perseverations (continuing to name f words after a new letter has been presented) in Alzheimer’s disease (AD).
Error analysis has also gained popularity in childhood verbal fluency research. A greater number of intrusion errors has been reported in children with ADHD (Mahone, Koth, Cutting, Singer, & Denckla, 2001). Similarly, preschool children with early treated Phenylketonuria were reported to produce more perseverative errors than a control group (Welsh, Pennington, Ozonoff, Rouse, & McCabe, 1990). Error analysis has also gained importance in typically developing children (Charchat-Fichman et al., 2011; Hurks et al., 2004; Hurks et al., 2006; Tallberg et al., 2011). In a study of 130 typically developing Swedish speaking children aged 6 to 15 years, Tallberg et al. (2011) reported that errors were predominantly perseverations, especially on letter fluency (FAS) as compared to the animal fluency task. In a study of Brazilian children (7-10 years), Charchat-Fichman et al. (2011) showed that the total number of errors on the semantic fluency task correlated with that on the phonemic fluency task; errors were relatively few, and no significant correlation with age was noted.
Time course analysis
The organization of words during verbal fluency production is also examined as a function of time (Crowe, 1998; Hurks et al., 2010; Raboutet et al., 2010). Crowe (1996) proposed a model of lexical organization that emphasizes analyzing verbal fluency performance as a function of time and focuses on two types of store for the retrieval of words. During the initial time frame (first 15-20 seconds) of a verbal fluency task, participants generate words from a long term store called the topicon, which contains easily accessible common words. Once the topicon is exhausted, an effortful search occurs in the more extensive store, the lexicon. Bousfield (1953) and Gruenewald and Lockhead (1980) showed that the time interval required to access new subcategories is long and increases during production, whereas the time required to produce items within semantic clusters is short and tends to remain constant. Contrary to this, Mayr and Kliegl (2000) reported an equal contribution of the executive and semantic components during word retrieval.
With respect to the pattern of productivity over time, it is widely accepted that the number of words produced follows a negatively accelerated curve over the time frame (Crowe, 1998; Hurks et al., 2004, 2006; Venegas & Mansur, 2011). More words, and more high frequency words produced with speed and accuracy, are generated during the first time interval than during the last intervals. This pattern has been reported irrespective of the population studied (e.g., children, young adults, older adults; patients with schizophrenia, aphasia, depression, dementia) (Crowe, 1998; Ober, Dronkers, Koss, Delis, & Friedland, 1986; Hurks et al., 2004, 2006; McDowd et al., 2011; Rosen, 1980). In children, studies (Hurks et al., 2006; Raboutet et al., 2010) have shown that word frequency and the number of words produced decrease with time, with higher scores during 0-30 seconds than during 31-60 seconds. The efficiency in avoiding perseveration errors also decreased with time, along with a decrease in the intercategorical process of hard switching. An increase was noted in the ratio of the number of clusters produced, the number of category exemplars and mean cluster size. The varied performance across time was attributed to a higher attentional load and a more effortful and extensive semantic search during the last time frame.
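Counting words per time interval, as in the analyses described above, can be sketched in Python as follows. The sketch assumes that the onset time of each word (in seconds from the start of the trial) is available from a recording; the function name `words_per_interval`, the 15-second bin width and the sample onsets are illustrative assumptions, not values taken from the cited studies.

```python
import math


def words_per_interval(onsets, task_length=60.0, bin_width=15.0):
    """Count the words whose onset falls in each consecutive time bin."""
    n_bins = math.ceil(task_length / bin_width)
    counts = [0] * n_bins
    for t in onsets:
        # Ignore onsets outside the task window.
        if 0 <= t < task_length:
            counts[int(t // bin_width)] += 1
    return counts


# Hypothetical word onsets (seconds) for a 60-second trial.
onsets = [1.0, 2.5, 4.0, 7.0, 11.0, 16.0, 21.0, 29.0, 35.0, 44.0, 52.0]
counts = words_per_interval(onsets)
print(counts)  # [5, 3, 2, 1]

# The 0-30 s versus 31-60 s comparison uses two 30-second bins.
halves = words_per_interval(onsets, bin_width=30.0)
print(halves)  # [8, 3]
```

With these made-up onsets the counts fall across bins, which is the shape a negatively accelerated production curve would give.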
Automated scoring approaches that use clustering algorithms consistent with the Troyer and Mayr accounts of verbal fluency have also been reported by many researchers. These techniques focus on co-occurrence frequencies and the amount of competition between exemplars (items generated) for a given category. Computational methods include latent semantic analysis, correspondence analysis, hierarchical clustering and network theory (Goni et al., 2011; Schwartz, Baldo, Graves, & Brugger, 2003; Snyder & Munakata, 2010).
Standard Scoring Protocols
Various scoring protocols for verbal fluency performance have been described in the literature. The protocols vary in the measures employed for analysis. It should be kept in mind, however, that considerable disparity and disagreement exist among researchers regarding the interpretation and utility of the protocols employed.
One of the most common and widely used protocols was given by Troyer et al. (1997). Troyer et al. focussed on the number of words generated excluding errors and repetitions, clustering (number of clusters; cluster size; mean cluster size) and switching (number of switches). A cluster was operationally defined as a group of successively generated words belonging to the same subcategory (either a phonemic or a semantic subcategory). For the sequence cat-dog-lion-elephant-zebra, pet animals and African animals were considered the two clusters produced by the individual. The cluster size, which is the number of words in a cluster, was counted beginning with the second word in each cluster (e.g., a 2-word cluster was counted as having a cluster size of 1). Single or nonclustered words were designated as having a cluster size of 0. The mean cluster size was calculated by adding up the size of each cluster and dividing by the number of clusters produced. For example, the sequence “lemons, chicken, meat, fruit, banana, apple, corn flakes, salt, pepper, cheese, milk, yogurt” contains 6 clusters, with respective cluster sizes of 0, 1, 2, 0, 1, 2 and a mean cluster size of 1.
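The cluster size arithmetic in the worked example above can be reproduced with a short Python sketch. It assumes that a rater has assigned exactly one subcategory label to each word, so that a cluster is a run of consecutive words sharing a label; published scoring rules involve further rater judgment that is not modelled here. The function name `troyer_scores` and the labels are hypothetical.

```python
from itertools import groupby


def troyer_scores(subcategories):
    """Compute cluster sizes, mean cluster size and switches from word labels."""
    # Length of each run of consecutive identical labels.
    run_lengths = [len(list(group)) for _, group in groupby(subcategories)]
    # Cluster size is counted from the second word, so a single word scores 0.
    cluster_sizes = [n - 1 for n in run_lengths]
    mean_size = sum(cluster_sizes) / len(cluster_sizes) if cluster_sizes else 0.0
    # A switch is a transition between adjacent clusters, single words included.
    switches = max(len(run_lengths) - 1, 0)
    return {
        "cluster_sizes": cluster_sizes,
        "mean_cluster_size": mean_size,
        "switches": switches,
    }


# Sequence: lemons, chicken, meat, fruit, banana, apple, corn flakes,
# salt, pepper, cheese, milk, yogurt (labels below are assumed for illustration).
labels = [
    "fruit", "meat", "meat", "fruit", "fruit", "fruit",
    "cereal", "seasoning", "seasoning", "dairy", "dairy", "dairy",
]
scores = troyer_scores(labels)
print(scores["cluster_sizes"])      # [0, 1, 2, 0, 1, 2]
print(scores["mean_cluster_size"])  # 1.0
```

Under the same assumptions the sketch counts five switches for this sequence, one per transition between adjacent clusters.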
With respect to the definition of clusters, other researchers have used definitions that differ from Troyer and colleagues’ protocol. Raskin et al. (1992) defined a cluster as a pair of words belonging to the same subcategory, without consideration of longer clusters or cluster size. On this basis, the authors emphasized the number of clusters rather than the number of switches as a measure of cognitive flexibility. Contrary to the Troyer protocol, the ratio of total words to number of clusters was considered rather than single word productions. Robert et al. (1998) considered three consecutive associated words in the semantic fluency task and two consecutive associated words in the phonemic fluency task as a cluster. In another protocol, developed by Abwender et al. (2000), clusters were defined as two or more associated words. The authors did not consider single words as clusters and hypothesized that single words suggest a failure to retrieve other words from that particular category. In the Kosmidis et al. (2004a) scoring protocol, three or more consecutive words were grouped as a cluster for both semantic and phonemic fluency.
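The effect of these differing definitions can be illustrated with a small Python sketch that counts clusters under a configurable minimum number of words. It assumes one rater-assigned subcategory label per word; the function name `count_clusters` and the sample labels are hypothetical.

```python
from itertools import groupby


def count_clusters(subcategories, min_words):
    """Count runs of consecutive same-label words with at least min_words items."""
    return sum(
        1 for _, group in groupby(subcategories) if len(list(group)) >= min_words
    )


# Assumed labels for a 12-word food sequence.
labels = [
    "fruit", "meat", "meat", "fruit", "fruit", "fruit",
    "cereal", "seasoning", "seasoning", "dairy", "dairy", "dairy",
]
two_or_more = count_clusters(labels, min_words=2)    # two-or-more-word criterion
three_or_more = count_clusters(labels, min_words=3)  # three-or-more-word criterion
print(two_or_more, three_or_more)  # 4 2
```

The same response sequence therefore yields different cluster counts depending on the criterion, which is one reason scores are hard to compare across protocols.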
With respect to switching, Troyer et al. defined the number of switches as the number of transitions between clusters, including single words. Abwender et al. (2000) described two types of switches, cluster switches and hard switches, reflecting the speeded nature of the task. A cluster switch is a transition from one cluster to the next cluster, and a hard switch is a transition between two single words (banana, cheese) or between a cluster and a single word (fruit, banana, cheese). Abwender et al. provided an example of clustering-switching for word generation on food fluency. For the sequence “banana, orange, milk, cheese”, the number of clusters was considered to be two (fruits; dairy products), with one cluster switch and no hard switch.
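The distinction between cluster switches and hard switches can be sketched in Python as follows. The sketch assumes one rater-assigned subcategory label per word and treats a run of two or more same-label words as a cluster; the function name `abwender_switches` and the labels are hypothetical and simplify the published scoring rules.

```python
from itertools import groupby


def abwender_switches(subcategories):
    """Count clusters, cluster switches and hard switches from word labels."""
    run_lengths = [len(list(group)) for _, group in groupby(subcategories)]
    clusters = sum(1 for n in run_lengths if n >= 2)
    cluster_switches = 0
    hard_switches = 0
    for left, right in zip(run_lengths, run_lengths[1:]):
        if left >= 2 and right >= 2:
            # Transition from one multi-word cluster to the next.
            cluster_switches += 1
        else:
            # At least one side of the transition is a single word.
            hard_switches += 1
    return {
        "clusters": clusters,
        "cluster_switches": cluster_switches,
        "hard_switches": hard_switches,
    }


# "banana, orange, milk, cheese": two clusters joined by one cluster switch.
food = abwender_switches(["fruit", "fruit", "dairy", "dairy"])
print(food)

# "fruit, banana, cheese": a cluster followed by a single word.
mixed = abwender_switches(["fruit", "fruit", "dairy"])
print(mixed)
```

The first call reproduces the food fluency example (two clusters, one cluster switch, no hard switch); the second gives one hard switch.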
Other researchers have attempted to extend the protocol. In their study of individuals with AD, March and Pattison (2006) supplemented the Troyer system with scores for the raw number of subcategories produced and the number of errors (repetitions and categorical error types). The number of semantic subcategories was also reported as an indicator of semantic memory organization by da Silva et al. (2004) in their study on the impact of literacy and education on semantic fluency. Sauzeon et al. calculated the ratios of the total number of switches and of mean cluster size to the total number of words produced. In the Koren et al. (2005) study, only the number of clusters was analyzed instead of the number of switches. Similarly, Tucha et al. (2005), alongside the Troyer scoring system, counted the number of labels produced as well as clusters within clusters in individuals with ADHD. Lanting, Haugrud, and Crossley (2009) explored the number of novel and repeated clusters and the percentage of clustered words in healthy and older adults. Kosmidis et al. (2004a) also provided a specific scoring protocol for animals, fruits and objects based on data from the Greek population.
Roberts and Le Dorze (1997) reported a scoring protocol consisting of total correct words (subcategory labels such as ‘fruits’ and category members such as ‘apple’) and number of errors (repetition errors, outside category errors such as “pills” in foods, nonword or unintelligible errors, wrong language errors, and others). The analysis also counted the number of comments (such as swearing, self-talk such as “that’s all I know”, and questions about the task such as “can I say that?”). The number of semantic associations (three or more words belonging to the same category), the mean length of semantic associations and the percentage of words in semantic associations (SA) were also scored as part of the protocol. In French/English balanced healthy bilinguals, Roberts and Le Dorze (1997) reported similar semantic organization for animal and food fluency in both languages: the mean number of SAs was 4.47 in French and 4.84 in English, the mean length of SAs was 4.78 in French and 4.54 in English, and the mean percentage of words in SAs was 62.6 in French and 64.8 in English. Similarly, Reverberi et al. (2006) used an index of semantic relatedness in their study of participants with frontal lesions.
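The three semantic association measures can be sketched in Python under the assumption that each word has been given one category label by a rater and that a semantic association is a run of three or more consecutive words with the same label. The function name `semantic_association_scores` and the sample labels are hypothetical and do not reproduce the data reported above.

```python
from itertools import groupby


def semantic_association_scores(categories, min_words=3):
    """Return the number, mean length and word percentage of semantic associations."""
    run_lengths = [len(list(group)) for _, group in groupby(categories)]
    # Keep only runs long enough to count as a semantic association (SA).
    sa_lengths = [n for n in run_lengths if n >= min_words]
    total_words = len(categories)
    words_in_sa = sum(sa_lengths)
    return {
        "n_sa": len(sa_lengths),
        "mean_sa_length": words_in_sa / len(sa_lengths) if sa_lengths else 0.0,
        "percent_words_in_sa": 100 * words_in_sa / total_words if total_words else 0.0,
    }


# Assumed labels for a 10-word animal fluency sequence.
animal_labels = [
    "pets", "pets", "pets", "farm",
    "african", "african", "african", "african",
    "birds", "birds",
]
sa = semantic_association_scores(animal_labels)
print(sa)
```

For this made-up sequence there are two associations (lengths 3 and 4), so the mean length is 3.5 and 70 percent of the words fall inside an association.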
Carneiro et al. (2008), in their study on Portuguese category norms for children, reported the scoring of various measures including the number of responses and exemplars, idiosyncratic and inappropriate responses, and commonality and diversity indexes for the categories tested. Recently, Raboutet et al. (2010) provided a scoring system for the supermarket fluency task that evaluates five scores (general scores, intercategorical or switch scores, intracategorical or cluster scores, semantic hierarchical exploration scores and error scores) for each time interval.
Normal and abnormal aspects of verbal fluency – adults & children
The utility of verbal fluency has been researched in various populations, including healthy individuals as well as disordered populations. The research findings summarized below illustrate the extent to which verbal fluency tasks are used as a screening, diagnostic and treatment measure in various domains in both adult and childhood populations.
Summary of research findings on verbal fluency (VF) among adult population
Chan & Poon (1999); Loonstra, Tarlow, & Sellers (2001); Tombaugh et al. (1999); Troyer et al. (1997)
Influence of demographic variables evidenced, with norm-based data available in different languages and cultures
Differential performance on phonemic and semantic based verbal fluency tasks
Peak performance in 19-30 years of age with subsequent decline
Kemper & McDowd (2008); McDowd et al. (2011); Tombaugh et al. (1999); Troyer et al. (1997); Troyer (2000)
Aging related declines in verbal fluency (qualitative and quantitative measures)
Factors influencing verbal fluency performance documented (viz. age, level of education, gender, verbal intelligence, income, motor response, language effects, reading-writing speed, level of mental & physical activity, functional status)
Used as a test to differentiate normal aging from dementia, since healthy older adults show superior performance
Focal cortical lesions (Frontal & Temporal lobe)
Antonucci, Beeson, Labiner, & Rapcsak (2008); Baldo et al. (2006); Baldo, Schwartz, Wilkins, & Dronkers (2010); Henry & Crawford (2004a); Stuss et al. (1998); Troyer et al. (1998a)
Frontal lobe lesions:
Frontal lobe involvement predominantly for letter fluency rather than category fluency
Inconclusive findings on the specific regions of the frontal lobe involved
Selective deficits in switching
In nonfluent aphasia, impaired phonemic clustering with preserved semantic clustering
Temporal lobe lesions:
Category fluency sensitive to temporal lobe damage depending on the extent of damage
Lesser deficits on phonemic fluency as compared to category fluency with selective deficits in clustering
Word generation greater for living than nonliving things
Impaired semantic clustering with preserved phonemic clustering in Wernicke aphasia
Alzheimer’s disease (AD)
Butters et al. (1987); Cosentino, Scarmeas, Albert, & Stern (2006); Martin & Fedio (1983); McDowd et al. (2011); Monsch et al. (1992); Henry, Crawford, & Phillips (2004); Rohrer, Salmon, Wixted, & Paulsen (1999); Troyer et al. (1998b)
VF deficits seen early in the disease (impaired even in mild AD type) with rapid decline with time
Category fluency more affected than letter fluency
Deterioration in the structure, content and organization of semantic memory with fewer subordinate exemplars
Large proportion of response noted during the earlier part of recall time as a result of quick exhaustion of limited semantic store
No common consensus on whether the deficits reflect semantic store degradation or retrieval deficits related to executive control processes
Smaller clusters on both tasks and switched less often on semantic fluency than controls
Increased erroneous responses (perseverations and rule breaks)
VF task useful in diagnosis, predicting mortality, differential diagnosis (AD and elderly; cortical and subcortical dementia) & monitoring rate of decline
Huntington’s disease (HD)
Henry, Crawford, & Phillips (2005); Ho et al. (2002); Monsch et al. (1992); Rich, Troyer, Bylsma, & Brandt (1999); Rosser & Hodges (1994); Troster et al. (1989)
Decline in performance over time with disease progression
No consensus on whether impairment is similar for both tasks or greater for letter fluency (LF) than semantic fluency (SF)
Among the qualitative tasks, selective impairment in phonemic switching over time but not semantic switching with fewer semantic clusters
A larger proportion of recall noted during the last phase of recall time which results in a slower pattern of retrieval
Dementia with Lewy bodies (DLB)
Ralph et al. (2001); Salmon et al. (1996)
Letter and category fluency equally reduced
Vascular dementia (VaD)
Carew et al. (1997); Jones, Laukka, & Backman (2006)
Both letter and category fluency impaired; better performance than AD on the semantic fluency (SF) task
HIV associated dementia
Woods et al. (2004)
Fewer words and switches with more intrusion errors
Frontotemporal lobar degeneration (FTLD) subgroups
Kramer et al. (2003); Libon et al. (2007); Libon et al. (2009)
Differential pattern of impairment and neural activation reported across FTLD subtypes
VF employed as a task to differentiate between AD and FTLD subtypes
FTLD with behavioral/dysexecutive disorder:
Reduced performance on both tasks related to executive and semantic deficits
Letter fluency deficits related to bilateral frontal atrophy and semantic fluency deficits to left frontal/temporal atrophy
Disproportionate impairment in SF related to the anterior and inferior left temporal lobe atrophy
Impaired lexical/semantic access more than mental search limitations
Progressive nonfluent aphasia:
Equally impaired on both fluency tests as a result of impaired lexical access
Deficits in semantic fluency related to right frontal lobe atrophy and in letter fluency related to left temporal atrophy
Progressive supranuclear palsy (PSP)
Rosser & Hodges (1994)
More impaired on letter fluency than semantic fluency
Deficits in initiation and retrieval mechanisms