purpose of the first intelligence test was to identify mentally deficient children in French public schools (Franklin 2007). Henry Goddard followed: he translated the intelligence test in 1910 and administered it to four hundred children whom he believed to be lacking in intellectual ability. The following year the number of children tested increased to two thousand white children considered “normal” (Franklin 2007). In 1911 Goddard produced a range of scores to be compared across urban, native-born, foreign-born, and other groups. He suggested that children cannot learn beyond the scope of their grade of intelligence, and he was the first to suggest that intelligence tests measured how much a child was capable of learning at a given chronological age (Franklin 2007). Since these tests, an array of intelligence tests has been administered in attempts to measure a person’s cognitive ability.
Assessing students’ abilities through intelligence testing has raised questions about the appropriateness of psychometric measures with regard to reliability and validity. Reliability rests on consistency. For a test to show consistency, error variance must be minimized. Test construction, test administration, and test scoring and interpretation are the three main sources of error variance that can affect reliability (Cohen and Swerdlik 2010). It was argued that the Binet-Simon test needed to be revised in 1937 because the test originally administered in 1917 showed ‘tasks not always as well chosen as those for ages from six to twelve’ and because, upon retesting black students, social scientists had shown that black students received higher intelligence scores (Franklin 2007).
Regarding validity, in the early 1920s black social scientists debated what intelligence testing was actually measuring. A test must be psychometrically sound with respect to validity in order to avoid test bias. Howard Long’s research showed that the Army’s Alpha and Beta tests yielded greater variation among individuals and groups within races than between races (Franklin 2007). Bond, a director of education at Langston University in Oklahoma, agreed, explaining that the Army intelligence scores reflected social and environmental influences rather than the innate intellectual abilities of those tested (Franklin 2007). Brisbane Catholic Education concentrated on refugees who appeared to be misrepresented as having an intellectual disability. The argument is that standardized cognitive tests are not valid measures of intelligence for people from a different background, and that psychometric instruments had not yet been developed in the refugees’ own countries (Fraine and McDade). Because the tests were standardized on particular backgrounds, factors such as schooling experience, level and quality of education, conceptions of behavior, and test-taking experience would affect the scores of certain groups taking the intelligence test (Fraine and McDade).
There have also been problems with the psychometric properties of non-linguistic intelligence tests. In one study, although Raven’s Progressive Matrices is supposed to be a culture fair IQ test, people from English speaking backgrounds did much better, scoring 96.71, than a Zimbabwean sample, which scored 72.36 (Shuttleworth-Edwards, Kemp, Rust, Muirhead, Hartman, and Radloff 2004).
Fairness is also a concern when a person takes an intelligence test. Whereas establishing the reliability and validity of an intelligence test shows that it is a sound measurement, fairness concerns how the test is explained and whether it is used in a justifiable way. African refugees have been described as intellectually disabled on entering Australian schools. Brisbane Catholic Education in Australia believes that refugees from African backgrounds are labeled intellectually disabled after completing intelligence tests and other assessments, and that considerations such as language barriers, cultural differences, acculturation, trauma, and previous experiences make it harder for psychologists to assess whether an African refugee is genuinely intellectually disabled (Fraine and McDade). Similar findings showed that social conditions and low test scores were highly correlated when measuring mental ability (Franklin 2007). On non-linguistic tests most children scored in the normal range; however, these children were black and had been placed in a school that separated white children from black children (Franklin 2007).
The aim of this study is to check for sound psychometric measurement and cultural bias when comparing different groups on the intelligence tests taken. It is hypothesized that the PSYGAT will show good internal consistency and validity when tested against the Queendom Verbal IQ test. It is also hypothesized that the PSYGAT will show low validity when tested against the culture fair test. The Queendom tests are a verbal and a culture fair IQ test that are believed to have sound psychometric properties. The PSYGAT is a verbal intelligence test that was created by third-year undergraduate psychology students. The PSYGAT correlated well with the ACER AL, a test designed by the Australian Council for Educational Research for testing verbal abilities.
The sample consisted of three hundred and thirty-seven undergraduate psychology students recruited from Monash University. Participants were recruited from campuses located in Singapore, Malaysia, South Africa, Clayton, and Caulfield. Participants were split into English speaking and non-English speaking backgrounds. There were two hundred and sixty females and sixty-eight males. Two hundred and forty-four participants were from an English speaking background and ninety-three participants were from a non-English speaking background.
Participants completed three intelligence tests. The tests were taken on a computer, in the participants’ own time and without supervision. Just before the PSYGAT intelligence test began, a questionnaire asked participants their sex, their age, and whether they were from an English or non-English speaking background. All participants’ results were to be analyzed at a later stage for item analysis.
Undergraduate Monash psychology students were asked to complete three intelligence tests administered via computer. First, every participant found access to a computer in order to take the first intelligence test, the Queendom Verbal IQ test. Participants attempted the three tests in their own time and were told that the test should take approximately 30-35 minutes to complete. After finishing the test, participants were asked to keep track of their score, as they would need the results later. The second test was the Queendom Culture Fair IQ test. The test consisted of 20 items, and participants were told that there was no time limit. After completing this test, participants again kept a record of their score. The third and last test was the PSYGAT, a verbal test constructed by third-year undergraduate students. Before beginning the PSYGAT verbal IQ test, participants were asked to enter their scores on the Queendom Verbal and Queendom Culture Fair IQ tests. The test also asked participants their age, their gender, and whether they were from an English or a non-English speaking background.
Pearson’s r was used to investigate the relationship between the PSYGAT, the Queendom Verbal IQ test, and the Queendom Culture Fair IQ test for the English and non-English speaking background groups. There was a significant relationship between the PSYGAT and the Queendom Verbal in both the English speaking background group, r = .433, n = 244, p < 0.05, and the non-English speaking background group, r = .567, n = 93, p < 0.05. In addition, Pearson’s r showed a non-significant relationship between the PSYGAT and the Culture Fair IQ for the English speaking background group, r = .067, n = 244, p > 0.05, but a small significant relationship between the PSYGAT and the Queendom Culture Fair for the non-English speaking background group, r = .238, n = 93, p < 0.05. A z test was used to compare reliability between the English speaking and non-English speaking groups. The difference in internal reliability between the two groups was not significant, z = 0.29, p > 0.05. Z scores were also used to determine whether there was a significant difference in validity coefficients between the two groups for the PSYGAT and Queendom Verbal and for the PSYGAT and Culture Fair IQ. Pearson’s r values were transformed into z scores to test the statistical significance of the difference in correlations between groups; the transformation was also needed because the groups differed in size. There was no statistically significant difference between the English and non-English speaking background groups in the validity coefficient for the PSYGAT and the Queendom Verbal IQ, z = -1.39, p > 0.05. However, a large statistically significant difference was shown in the correlation between the PSYGAT and the Culture Fair IQ, z = 3.73, p < 0.05.
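The comparison of validity coefficients described above can be sketched in code. The following is a minimal illustration, assuming the standard Fisher r-to-z test for two independent correlations; the exact procedure used in the study is not stated, so the function name `compare_independent_correlations` is hypothetical and the z value it produces need not match the reported figures exactly.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent
    Pearson correlations. Returns (z, two-sided p).

    Assumption: this is the standard textbook procedure, not necessarily
    the exact calculation used in the study.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 participants")
    # Fisher transformation: z' = atanh(r) is roughly normal with
    # variance 1 / (n - 3), which accounts for unequal group sizes.
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # PSYGAT vs. Queendom Verbal: English speaking (r = .433, n = 244)
    # against non-English speaking (r = .567, n = 93), as reported above.
    z, p = compare_independent_correlations(0.433, 244, 0.567, 93)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the standard error depends on each group’s n, the same difference in r is harder to detect when one group is small, which is relevant to the unequal group sizes in this sample.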
The aim of this study was to check for sound psychometric measurement and cultural bias when comparing different groups on the intelligence tests taken. It was hypothesized that the PSYGAT and the Queendom Verbal would show no statistically significant difference in validity coefficients between the English speaking and non-English speaking groups. It was also hypothesized that there would be a statistically significant difference in validity coefficients between the PSYGAT and the Queendom Culture Fair. The study showed that there was weak validity regarding the PSYGAT and Culture Fair IQ tests. Although there was
The results of the study undertaken reveal that intelligence tests can show bias toward different groups. In this study these biases may be related to a lack of understanding of cultural backgrounds and of relevant psychometric measures, both of which can be linked to low validity in intelligence tests.
Bond’s argument that social and environmental differences between groups can affect test scores may apply to this study, given the social setting and environment of the participants who took the tests. The Intelligence and School Achievement of Negro Children also showed that social conditions were highly correlated with low test scores (Franklin 2007). The results of this study show a statistically significant difference between groups for the PSYGAT and culture fair test, suggesting that social and environmental differences at the time of testing could have caused bias. In addition, Brisbane Catholic Education argues that psychologists assessing refugees’ intellectual abilities may show bias in their assessment because of a lack of proper interpretation (Fraine and McDade). That is, schooling background, level and quality of education, conceptions of behavior, and test-taking experience can reduce test scores for certain groups (Fraine and McDade). Such factors could have affected one group in this study more than another.
Psychometric instruments that can measure intelligence for different types of groups should also be developed in both countries to reduce bias arising from cultural differences. In this case, intelligence tests need psychometric properties that are valid for analyzing scores across different groups (Cohen and Swerdlik 2010). Similar findings emerged when people from English speaking and non-English speaking backgrounds were tested on their IQ. Although Raven’s Progressive Matrices is supposed to be a culture fair IQ test, people from English speaking backgrounds did much better, scoring 96.71, than a Zimbabwean sample, which scored 72.36 (Shuttleworth-Edwards, Kemp, Rust, Muirhead, Hartman, and Radloff 2004). Zindi also suggests that a lack of westernized test sophistication may be the cause of the lower scores (Shuttleworth-Edwards, Kemp, Rust, Muirhead, Hartman, and Radloff 2004). Some cultural bias in the scores of this study may have arisen because one of the tests may not have met psychometric standards for content validity.
The limitations of this study could have affected the results. First, the way the three tests were administered could have yielded different intelligence scores for both groups. Supervising the test takers, to make sure that the tests were taken without the aid of books or other people, could have changed the results and therefore the interpretation of the validity of the tests. In addition, although there was good internal reliability in both groups, the large difference in the number of participants could have shifted Cronbach’s alpha to either a higher or a lower value. The difference in participant numbers could also have affected the statistical difference between the groups when assessing the PSYGAT and culture fair IQ component of the study. Item analyses should also be carried out to check whether the items being tested are fair to all participants taking the intelligence test; that is, one question could carry more than one meaning for different participants. Cole suggests that item analysis is a difficult task because one must ensure that each item used is relevant to the construct, which in turn affects bias (Cole 1989).
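To make the internal reliability statistic concrete, the sketch below computes Cronbach’s alpha from item-level scores using the usual formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The item-level PSYGAT data are not reported here, so the function name `cronbach_alpha` and the small score matrix are hypothetical and serve only as an illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of participants, where each participant
    is a list of scores on the same k items (sample variances are used)."""
    n_participants = len(item_scores)
    k = len(item_scores[0])
    if k < 2 or n_participants < 2:
        raise ValueError("need at least 2 items and 2 participants")
    # Variance of each item across participants, summed over items.
    columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(column) for column in columns)
    # Variance of each participant's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


if __name__ == "__main__":
    # Hypothetical data: 5 participants, 3 items scored 0 (wrong) or 1 (right).
    example_group = [
        [1, 1, 1],
        [1, 0, 1],
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(example_group):.2f}")
```

Running the function separately on each language background group would give the two alpha values whose difference is then tested; with few participants in a group, each alpha estimate is less stable.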