Great Careers in Testing

Marié De Beer, D.Litt.

Marié De Beer is a Research Psychologist registered with the Health Professions Council of South Africa (HPCSA). She obtained her PhD (D.Litt et Phil) in Psychology from the University of South Africa in 2000; for it, she developed and validated a dynamic, computerised adaptive test for the measurement of learning potential (LPCAT).

De Beer's experience with testing comes from seven years as a researcher at the Human Sciences Research Council, between 1987 and 1993, where she was involved in the standardisation and development of numerous psychological tests. During this time she also obtained her Master's degree in Psychology and registered as a Research Psychologist with the Health Professions Council of South Africa.

Since January 1994, De Beer has taught psychological assessment and research methods at the University of South Africa (UNISA), where over the past 18 years she has supervised numerous Master's and Doctoral students in fields ranging from cognitive psychology, through testing and assessment, to organisational psychology.

In August 2011, De Beer received the "Women in Research" award of the College of Economic and Management Sciences.

Dr. De Beer's Career

During your study years, you obtained two degrees. Before your Bachelor of Arts degree in Psychology, you obtained a Bachelor of Arts degree in Mathematics. What made you opt for such a large career change?
I started out studying computer science, but after my first year changed to combine Psychology and Mathematics as my major subjects for my undergraduate BA degree. Following my graduation, I was a Mathematics teacher at a local high school for one year. I then obtained a job in industry based on my Mathematics major and some computer science background, which involved Fortran programming in the physical sciences. I completed an honours degree in Mathematics, followed by an honours degree in Psychology, both obtained part-time while working as a Fortran programmer. During this time, an opening at the Human Sciences Research Council (HSRC), the government institution for social science research, specifically aimed at developing computerised adaptive psychological (cognitive ability) tests, led to a change in career towards psychometrics.

Would you say that it is your background in mathematics that sparked your interest in psychometrics? What prompted you to take up psychometrics as your career focus?
The opportunity at the HSRC combined my background in Mathematics, computer programming and Psychology within the field of psychometrics. I worked with Dr. Nicolaas Claassen, who was at the time responsible for large-scale national projects for the development of group cognitive ability tests for the HSRC. My background in computers, Mathematics and Psychology offered an opportunity to explore the possible use of computerised adaptive testing within the HSRC development projects. At the time we obtained and used the MicroCAT system of Assessment Systems Corporation, which was the only commercially available product for developing CATs. The first CAT project was to combine the items from two parallel forms of a standardised group cognitive ability test, the General Scholastic Aptitude Test (GSAT forms A and B), in a single item bank to create a parallel CAT version of the GSAT. This was also the project that I used for my Master's degree.

Much of your work has been done on computerized adaptive testing, and making pencil and paper tests adaptive. Could you please elaborate on the difference between computerized adaptive testing and pencil and paper tests? What are some of the caveats of constructing a computerized version of a test from a pencil and paper one? How is computerized adaptive testing both useful and important?
There is a natural progression from paper-and-pencil assessment to CBT (computer-based testing) to CAT (computerised adaptive testing). Moving from paper-and-pencil to CBT means that the items normally administered in paper-and-pencil format are presented in the same sequence (and mostly the same format), but administered by computer. Technology therefore merely acts as a different medium for administering the items and capturing the responses of the person being tested. The scoring and score interpretation remain the same as for the paper-and-pencil version. CAT goes a step further, in that the technology is more fully utilised to optimise the assessment in various ways. With CAT, items are interactively selected from the available item bank based on the individual's level of performance at that point in the test, so fewer items may be administered without necessarily forfeiting measurement accuracy. A CAT algorithm requires a bank of items for which accurate item parameters are available, a selection rule for choosing each next item and a termination rule for ending the test. The important thing to remember is that one cannot simply assume measurement equivalence when items administered in a paper-and-pencil format are computerised and then administered as a CBT, or perhaps as a CAT. Empirical research on the equivalence of different administration formats is required to provide evidence that different forms of tests with the same or similar items are equivalent.
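The three CAT ingredients named above (an item bank with known parameters, a selection rule and a termination rule) can be made concrete with a small Python sketch. It is a generic illustration under assumptions not taken from the interview: items follow the Rasch (one-parameter logistic) model, the next item is the unused one with maximum Fisher information at the current ability estimate, ability is estimated by expected a posteriori (EAP) with a standard normal prior, and testing stops when the posterior standard deviation drops below a chosen threshold or a maximum test length is reached. The function names and default values are hypothetical, and this is not a description of how MicroCAT, the GSAT CAT or the LPCAT were implemented.

```python
import math


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses):
    """EAP ability estimate with a standard normal prior.

    responses: list of (difficulty, score) pairs, score 1 = correct, 0 = incorrect.
    Returns (theta_hat, posterior_sd), computed on a grid from -4 to 4.
    """
    grid = [-4.0 + 0.05 * k for k in range(161)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for b, score in responses:
            p = p_correct(theta, b)
            w *= p if score == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, se_target=0.4, max_items=20):
    """Administer a CAT from `bank` (dict: item_id -> Rasch difficulty).

    answer(item_id) must return 1 or 0. Stops when the posterior SD falls
    below se_target, when max_items is reached, or when the bank runs out.
    """
    remaining = dict(bank)
    responses, administered = [], []
    theta, se = 0.0, float("inf")
    while remaining and len(administered) < max_items:
        # Selection rule: unused item with maximum information at current theta.
        item = max(remaining, key=lambda i: item_information(theta, remaining[i]))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        administered.append(item)
        theta, se = eap_estimate(responses)
        # Termination rule: stop once the estimate is precise enough.
        if se < se_target:
            break
    return theta, se, administered
```

For example, with a bank whose difficulties are spread from -3 to 3 and a simulated examinee who answers every item easier than difficulty 1.0 correctly, `run_cat` should settle on a positive ability estimate within `max_items` items. EAP is used here rather than maximum likelihood because it keeps the estimate finite when the first responses are all correct or all incorrect.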

You have spent many years working on and developing the General Scholastic Aptitude Test (GSAT). Could you please explain some of the relevance the GSAT carries in South Africa and some caveats you have encountered whilst standardizing it?
The project leader for the GSAT was Dr. Nicolaas Claassen, and he developed various forms of the GSAT for different age and school year levels. I was fortunate to work with Dr. Claassen, who was my immediate supervisor during my time at the HSRC. The CAT version of the GSAT was my responsibility, and while good results were found, it was unfortunately discarded when the HSRC abandoned psychological test development as one of its core functions in the early 2000s. The paper-and-pencil versions and forms of the GSAT are currently available from a local test distributor (Mindmuziek), which is contracted to maintain and distribute the psychological tests developed by the HSRC since the HSRC gave up responsibility for psychological test development and distribution. In the South African (SA) context, discrepancies in socio-economic status, educational standards and the multicultural and multilingual composition of society (four race groups and eleven language groups) add complexity to any test development project. This is all the more so when cognitive ability is the focus of psychological test development. SA is a constantly changing context, which makes it very difficult to set fixed rules and to expect assessment results to remain similar for long periods of time.

Your Doctoral Thesis focused on developing a computerized adaptive test that could measure learning potential, a rather impressive feat. Could you please elaborate how exactly the test measures learning potential and how it adapts to the participant's learning ability?
If one considers Vygotsky's concept of the Zone of Proximal Development (ZPD), it implies an Actual (current) level of achievement as well as a Proximal (potential) level of achievement. The core value of Vygotsky's work is that it incorporates experiences of learning/teaching as part of the assessment process in order to assess the ZPD. In many cases this has been translated into the "dynamic" test-train-retest approach followed by a number of instruments that aim to measure learning potential. The pre-test yields a measure similar to that obtained from any standard assessment. However, that measure is not taken as the full "performance" level: it is followed by a learning experience, after which another assessment is meant to measure the "as yet unattained" learning potential. Vygotsky's view was that one would not be able to fully understand an individual's current and potential future levels of performance by focusing only on the level attained without any help or assistance. He believed that by providing some form of learning or assistance with the performance task, one would get an indication of the (possibly higher) levels of performance an individual may be capable of if given relevant learning and training opportunities. This is the basis of the test-train-retest (dynamic) assessment strategy followed by most dynamic/learning potential assessment instruments. In fact, if there is no opportunity for learning during assessment, one is measuring "proficiency" and not "potential". The single defining characteristic of tests aimed at measuring "learning potential" is the opportunity for, and activity of, learning during assessment.

There are many variations of learning potential tests that use the core test-train-retest strategy. Some administer the same standardised measure as both the "pre-test" and the "post-test", with some form of learning aimed at the core task in between. Such assessments need to contend with memory: because the exact same items are administered in the pre-test and the post-test, it is unclear to what extent improvement reflects "pure learning/improvement" and to what extent memory plays a role in the second performance. Other researchers use different methods, for example counting the number of hints needed to reach a predetermined level of performance. Some of the criticisms against dynamic assessment (DA) measures have been the time-consuming test administration, measurement issues and the lack of available psychometric information (reliability and validity). These criticisms, together with my exposure to CAT at the HSRC, led to the development of the LPCAT (Learning Potential Computerised Adaptive Test), aimed at measuring fluid reasoning performance and potential. It combines non-verbal figural measurement of fluid ability (considered fairer in the SA multicultural, multilingual context) with item response theory (IRT) and CAT, which could address both the measurement issues and the complaints about the long administration time of DA measures. Measurement is similar to that of a standard cognitive test of fluid ability, but adding a learning opportunity to the test administration, followed by another assessment opportunity (with a unique new pool of items), is what made it a dynamic measure, one deemed to offer improved fairness while addressing measurement accuracy and the time needed for test administration.

Perhaps it is also important to highlight that for many DA measures, the target populations are low-performing individuals, and for such groups the difference score is often taken as the measure of "potential" because the initial levels of those being compared are similarly low. With the LPCAT, the focus was on measuring learning potential over a much wider spectrum of ability levels. Because initial pre-test performance can vary considerably, one cannot interpret the improvement score (difference score) on its own as a measure of "learning potential"; one has to consider the pre-test level as well as the improvement at that particular level.
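Why the difference score alone can mislead, and how pre-test and post-test levels might instead be read against a target level, can be sketched in a few lines of Python. This is a hypothetical illustration, not the LPCAT's actual scoring or reporting: it assumes pre-test and post-test scores on one common scale (for example, IRT ability estimates), and the function name, labels and the `margin` that defines "close to" the target are invented for the example.

```python
def interpret_against_target(pre, post, target, margin=0.25):
    """Read pre-test (current) and post-test (projected) levels against a target.

    Hypothetical rule: all three values are on one common scale, and `margin`
    is an assumed tolerance for "close to" the target.
    Returns (difference_score, label).
    """
    gain = post - pre  # the difference score: reported, but not decisive on its own
    if pre >= target - margin:
        return gain, "current level meets or is close to target"
    if post >= target - margin:
        return gain, "projected level meets or is close to target"
    return gain, "below target even after training"


# Three hypothetical candidates and one target; A and B have the same gain.
for name, pre, post in [("A", -1.0, -0.5), ("B", 0.4, 0.9), ("C", 0.9, 1.1)]:
    gain, label = interpret_against_target(pre, post, target=1.0)
    print(name, round(gain, 2), label)
```

Candidates A and B show the same difference score of 0.5, yet only B is projected to approach the target after training, which is the point about considering the pre-test level together with the improvement.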

Therefore, our recommendation for interpreting LPCAT results is to start at the "target level", that is, the reason for the assessment and the level at which a training or other opportunity is aimed. Working backwards from there, one can compare the individual's "current" (pre-test) level of performance with the "target level" to determine whether the individual currently meets or comes close to it. If the current (pre-test) performance does not yet reach or come close to the "target" level, one can then consider the "projected future" (post-test) level to see whether, after training, the individual is able to reach or come close to the required "target" level of performance. This approach has meant that learning potential can be measured in groups ranging from illiterate adults to postgraduate students, and the results meaningfully incorporated into the particular decisions that need to be taken, obviously with the required integration of other relevant information for the particular decision-making process.

You have been teaching and supervising students at the University of South Africa for eighteen years. What would you say is the most rewarding aspect of teaching and/or supervising? What skills and competencies do you teach in your classroom that you hope the students retain and utilize in their future?
I have been a lecturer at the University of South Africa (UNISA) for 18 years. As UNISA is a distance learning institution (with more than 350,000 registered students!), we do not see our students as regularly as lecturers on residential campuses do. Our students are mostly working adults who work full-time or part-time and study towards their degrees over a number of years, taking one to three subjects per year! We tend to see the Master's and doctoral students more often, as they frequently attend workshops and lectures.
For me the rewarding part of teaching is encountering students who are really keen to master the subject material, not merely to qualify or pass an examination. I get excited by younger generations of subject experts who can take the baton that we have carried and go further and do better than we have (yet) been able to. A skill I believe is of critical importance is being able to do the boring as well as the exciting work equally well. When you work with data gathering and data capturing during a research project, meticulous accuracy in all that you do determines the quality of the data you work with, the quality of the research and the possible contribution you can make to the subject field. Being able to bear the occasional monotony of research so that its overall aims can be achieved is, I think, a very important characteristic of future researchers. Retaining a sensitivity to the context within which you work is another important characteristic. In the SA context, the extreme diversity in socio-economic, educational and other factors will affect our society for many decades to come. Being sensitive to the fact that other individuals' life-worlds are often vastly different from your own, and not excluding any groups from the work that you envisage and do, will in the long run also help build a society and a country that cares for all its citizens and creates developmental opportunities for all.

What career accomplishments are you most proud of and where would you like to take your career in the future?
The work that I did for my doctoral research, the development of the LPCAT, which took 7 years, remains my proudest achievement. I am quite practically inclined, so I am proud that the LPCAT is used in a large number of countries in Africa, in the East and, more recently, also in Europe. Updates and revisions of the LPCAT to improve its measurement properties, the use of modern technology (such as possibly iPads) and extending access to it by making translations of the instructions available in more languages are elements that I foresee will keep me busy and involved for the next couple of years.

If you had only a 20-word sentence to convey what you know and do best, what would it be?
Being able to practically apply my subject-knowledge to add value in people's everyday lives.

Testing Careers and Education

For individuals considering a career in psychometrics, what are some of the key qualities, skills or talents you believe they should possess in order to be successful in the field of psychometrics?
A sound knowledge of the basics of measurement and psychometrics -- with inclusion of the Classical Test Theory principles and measuring indices as well as the modern measurement techniques and applications (Rasch and IRT methods). Sound statistical knowledge and an interest in solving practical problems based on real data -- not merely providing theoretical possibilities based on simulated data.
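As a small illustration of the classical test theory "measuring indices" mentioned above, the Python sketch below computes three common CTT item and test statistics from a matrix of dichotomously scored (0/1) responses: item difficulty (proportion correct), the corrected item-total correlation and Cronbach's alpha. The choice of these three indices and the function names are illustrative assumptions; the interview does not specify which indices are meant.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ctt_item_stats(scores):
    """scores: one row per examinee, each row a list of 0/1 item scores.

    Returns (p_values, corrected_item_total, alpha).
    """
    k = len(scores[0])
    items = [[row[j] for row in scores] for j in range(k)]
    totals = [sum(row) for row in scores]
    # Item difficulty: proportion of examinees answering the item correctly.
    p_values = [mean(col) for col in items]
    # Corrected item-total correlation: item vs. total score with that item removed.
    r_it = [pearson(col, [t - s for t, s in zip(totals, col)]) for col in items]
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).
    item_var = sum(pvariance(col) for col in items)
    alpha = k / (k - 1) * (1 - item_var / pvariance(totals))
    return p_values, r_it, alpha


rows = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(ctt_item_stats(rows))
```

A real item analysis would also need to handle items that every examinee answers the same way: their zero variance makes the correlation undefined, and this sketch would raise a `ZeroDivisionError`.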

In your opinion, is it an advantage to have a background in mathematics if working in the field of psychometrics?
It is possible to work in psychometrics without an in-depth understanding of the intricacies of all the formulas that are applied. A basic knowledge and understanding of measurement is important, and so is understanding the purpose and value of specific techniques, but not everyone working in psychometrics needs to be able to contribute from a mathematical or statistical perspective. The latter would be for the front-runners in those fields. The important characteristic that I would recommend is to master what is necessary to practise in the particular domain in which you choose to apply your knowledge, and to be prepared to master whatever else is needed to do so.

How important is expertise in creating tests compared with the expertise of the subject matter in question?
I think that the basic steps in creating tests are very important, and the core principles and reasons for doing things in a particular way will remain important. On the other hand, one should not necessarily be bound (only) by the traditional way in which things have been done, and confidence in your subject matter domain may also allow you to explore wider and further than others before you have done. As long as the scientific rigour of the work is non-negotiable and empirical evidence in support of new explorations is gathered and presented, there are wonderful opportunities for being creative within the field of psychometrics, and for experiencing both the science and the art of research and practical application within this domain.

Would you say that extensive knowledge of Item Response Theory is more important for students to acquire nowadays than extensive knowledge of Classical Test Theory in order to pursue a successful career in psychometrics?
I believe that students should be knowledgeable in both CTT and IRT and be comfortable using each for the particular value it brings to various aspects of psychometrics. Both have value to add, and future psychometricians should continue to work knowledgeably with both approaches.

The Past, Present and Future of Testing

What percentage of high-stakes tests are computerized adaptive tests, in your estimation? What percentage do you estimate it will be in 5 to 10 years?
At present I would put the percentage (a guess, with no empirical evidence) at around 5% to 10%, but I anticipate that, with new freeware programs becoming available and more researchers and other groups becoming better informed about the different possibilities, this percentage will probably increase to around 25% in the next 5 to 10 years, if not more. The development of technology, improved security and a better understanding of the value (time, accuracy, item security etc.) that these methods offer are bound to expand their footprint in the assessment industry within the next few years.

What do you find are the advantages and disadvantages of computerized adaptive tests versus paper and pencil tests?
CAT offers mostly advantages in terms of efficiency, time saving, measurement accuracy and test security. However, technical issues such as broadband access, power problems and the availability of computers in remote regions may continue to pose challenges. Other technologies under development, such as cell phones and iPads, are also bound to have an impact on assessment in the near future. I think the field of psychometrics should embrace new technology, while ensuring the meticulous research and empirical trials that make our products scientific measurement tools and not merely gimmicks of new technology. The scientific rigour and empirical evidence can never be compromised.

Do you think Item Response Theory is the future of testing? Given your appearance and participation at numerous conferences, would you say that many International Conferences are starting to focus on IRT?
Yes, I agree. IRT and its applications are bound to become more prominent, not only at measurement-focused conferences but at all conferences, including those concerned with practical applications and with evidence of how these methods, approaches and techniques can add value in most domains where measurement is of concern.

Do you think an open source online platform for creating computerized adaptive tests would benefit the psychometric community?
I think this would add a lot of value, because on the one hand it would ensure that correct and appropriate formulas are applied and the required input data are specified, while on the other hand not requiring the user to have in-depth statistical knowledge to use such a facility. Just as many researchers can do quantitative data analysis in SPSS without necessarily knowing the intricacies of the statistical formulae and calculations involved, such a platform would, I believe, help to expand research on, and implementation of, CAT applications.

What do you believe to be the next step when it comes to standardized tests? How have these tests evolved from the past to the present?
I am not sure that there will necessarily be a dramatic expansion of the dimensions or domains that we are currently interested in for psychological assessment, but technological development may bring about new ways of approaching such assessments. That said, newer is not always better, and critical evaluation and scientific comparison of new approaches and methods as they become available should continue to grow the knowledge base of psychometrics and, where merited, add new technology, approaches and methods to the existing repertoire of those working, doing research and practising in the field.

What do you believe is the primary value of testing in our society and why do you think it is so important to have standardized tests in place today?
Testing allows for standardised and objective measurement of dimensions and characteristics that are important in everyday decision making. On the one hand, it can be used from a developmental perspective, helping individuals to get to know themselves better, plan their personal development, set goals and steer their careers and futures. On the other hand, assessment is often used comparatively where individuals may be competing for scarce resources, and in such situations the onus remains to ensure that the measurement information used in decision-making processes is of the highest possible scientific quality and that ongoing research always supports the way in which assessment results are used.

Do you find that standardized tests are all that is needed for evaluating an individual's skills and competencies?
Standardised tests are but one element in a broader assessment grid, in which personal information, application form data, background checks, reference checking, practical assessment etc. can all play a valuable role in obtaining a fuller understanding of a particular individual or group within a particular context, allowing for more responsible decision making. Layering information from different sources allows the full profile of the individual to become more 'visible', so that within the particular context, and for the particular opportunity being considered, the 'best' (most well thought through and carefully considered) decision can be made. We don't work in a scientific field where absolute exactness is possible, so at best each layer of measurement will hopefully contribute to better accuracy in decision making, but there will always be measurement error, elements not considered, and both "false positives" and "false negatives" to counter the best and most carefully considered decisions that we can make! Nevertheless, the field of psychometrics and standardised assessment has added and will continue to add value to decision-making processes, and will remain an exciting field to be part of.