Critics have long argued that the tests have cultural or class biases. What’s gone unnoticed is that, stinging from these allegations, all the testing companies have modified their tests to minimize these effects—to the point where they no longer really consider bias an issue. (While that would seem a hard pill to swallow, consider that versions of these tests are used around the world.) On top of those changes, schools with the strictest IQ requirements often make exceptions for children who come from disadvantaged backgrounds or speak something other than English as their first language.
Meanwhile, shaping the debate along the bias fault line has led us to completely ignore the larger question: how often do the tests accurately identify any bright young kids—even the mainstream children?
We asked admissions directors and school superintendents, and they all had the impression that the tests were accurate predictors. The tests come with manuals, and the first chapters of these manuals are dense with research conveying an aura of authentication. However, the statistics reported are generally not about how accurately the tests predict future performance. Instead, the statistics are about the accuracy of the tests at predicting current performance, and how well these tests stack up against competitors’ tests.
Dr. Lawrence Weiss is the Vice President of Clinical Product Development at Pearson/Harcourt Assessment, which owns the WPPSI test. When I asked him how well his test predicts school performance just two or three years later, he explained that it’s not his company’s policy to assemble that data. “We don’t track them down the road. We don’t track predictive validity over time.”
We were shocked by this, because decisions made on the basis of intelligence test results have enormous consequences. A score above 120 puts a child in the 90th percentile or higher, the conventional cutoff line for being called “gifted,” and may qualify her for special classes. A score above 130 puts a child in the 98th percentile, at which point she may be placed in a separate school for the advanced.
Note, these kids aren’t prodigies—a prodigy is far rarer, more like a one in a half-million phenomenon. To be classified as gifted by a school district indicates a child is bright, but not necessarily extraordinary. Half of all college graduates have an IQ of 120 or above; 130 is the average of adults with a Ph.D.
However, earning this classification when young is nothing less than a golden ticket, academically. The rarefied learning environment, filled with quick peers, allows teachers to speed up the curriculum. This can make a huge difference in how much a child learns. In California, according to a state government study, children in Gifted and Talented programs make 36.7% more progress every year than the norm. And in many districts, such as New York City and Chicago, students are not retested and remain in the program until they graduate from their school. Those admitted at kindergarten to private schools will stay through eighth grade.

While the publishers of the tests aren’t trying to determine how well early intelligence tests predict later achievement, the academic researchers are.
In 2003, Dr. Hoi Suen, Professor of Educational Psychology at Pennsylvania State University, published a meta-analysis of 44 studies, each of which looked at how well tests given in pre-K or in kindergarten predicted achievement test scores two years later. Most of the underlying 44 studies had been published in the mid-1970s to mid-1990s, and most looked at a single school or school district. Analyzing them together, Suen found that intelligence test scores before children start school, on average, had only a 40% correlation with later achievement test results.
This 40% correlation includes all children, at every ability level. When Suen narrowed his focus down to the studies of gifted or private schools, the correlations were no better.
For example, one team of scholars at the University of North Carolina at Charlotte analyzed three years of scores of an upper-middle class private independent school in Charlotte. The school required all applicants take the WPPSI test prior to being admitted to kindergarten. They were identified as smart kids—the average IQ was 116. In third grade, the students took the Comprehensive Testing Battery III, a test developed to fit the advanced curricula of private schools. As a group, the students did well, averaging scores in the 90th percentile.
But did the WPPSI results forecast which students did well? Not really. The correlation between WPPSI scores and the achievement scores was only 40%.
For students at the very high end, the correlations appear to be even lower. Dr. William Tsushima looked at two exclusive private schools in Hawaii—at one school, the kids had an average IQ of 130, while at the other, just over 126. But their reading scores in second grade had only a 26% correlation with WPPSI results. Their math scores had an even poorer correlation.
The relevant question, therefore, is just how many children are miscategorized by such early testing?
As I mentioned before, using tests with that 40% correlation, if a school wanted the top tenth of students in its third-grade gifted program, 72.4% of them wouldn’t have been identified by their IQ test score before kindergarten. And it’s not as if these children would have just missed it by a hair. Many wouldn’t have even come close. Fully one-third of the brightest incoming third graders would have scored “below average” prior to kindergarten.
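For readers who want to see where a figure like this comes from, here is a minimal simulation sketch. It assumes that the “40% correlation” means a correlation coefficient of 0.40, and that early and later scores follow a bivariate normal distribution; neither assumption is spelled out in the studies cited here, and the function name and parameters are purely illustrative. Under those assumptions the miss rate should land in the same neighborhood as the figure above, though the exact value depends on the method used.

```python
import random


def simulate_miss_rate(r=0.40, n=200_000, top_fraction=0.10, seed=0):
    """Estimate the share of later top scorers an early test would not have flagged.

    Assumes standardized, bivariate-normal scores with correlation r
    (an illustrative assumption, not a claim about any real test).
    """
    rng = random.Random(seed)
    noise_scale = (1.0 - r * r) ** 0.5
    pairs = []
    for _ in range(n):
        early = rng.gauss(0.0, 1.0)
        # Build a later score whose correlation with the early score is r.
        later = r * early + noise_scale * rng.gauss(0.0, 1.0)
        pairs.append((early, later))

    # Cutoffs that admit the top fraction on each test.
    k = int(n * top_fraction)
    early_cut = sorted(p[0] for p in pairs)[n - k]
    later_cut = sorted(p[1] for p in pairs)[n - k]

    # Among children in the later top group, count those below the early cutoff.
    later_top = [p for p in pairs if p[1] >= later_cut]
    missed = sum(1 for early, _ in later_top if early < early_cut)
    return missed / len(later_top)


miss_rate = simulate_miss_rate()
print(f"Later top-tenth students missed by the early test: {miss_rate:.1%}")
```

Raising r in the call shows how quickly the miss rate falls as the early test becomes a better predictor.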
The number of false positives and false negatives is worrisome to experts such as Dr. Donald Rock, Senior Research Scientist with Educational Testing Service.
“The identification of very bright kids in kindergarten or first grade is not on too thick of ice,” Rock said. “The IQ measures aren’t very accurate at all. Third grade, yeah, second grade, maybe. Testing younger than that, you’re getting kids with good backgrounds, essentially.”
Rock did add that most kids won’t fall too far. “The top one percent will certainly be in the top ten percent five years later. It is true that a kid who blows the top off that test is a bright kid, no question—but kids who do quite well might not be in that position by third grade.”
According to Rock, third grade is when the public school curriculum gets much harder. Children are expected to reason through math, rather than just memorize sums, and the emphasis is shifted to reading for comprehension, rather than just reading sentences aloud using phonics. This step up in difficulty separates children.
“You see growth leveling off in a lot of kids.” As a result, Rock believes third grade is when testing becomes meaningful. “Kids’ rank ordering in third grade is very meaningful. If we measure reading in third grade, it can predict performance much later, in a lot of areas.”
The issue isn’t some innate flaw with intelligence tests. The problem is testing kids too young, with any kind of test.
“I would be concerned if high-stakes judgments such as entry to separate selective schools were based on such test results,” said Dr. Steven Strand of the University of Warwick in Coventry, England. “Such structural decisions tend to be inflexible, and so kids can be locked in on the basis of an early result, while others can be locked out. It’s all about having sufficient flexibility to alter provisions and decisions at a later date.”
In contrast to testing children in preschool or early elementary school, Strand found that IQ tests given in middle schools are actually very good predictors of academic success in high school.
In a recent study published in the journal Intelligence, Strand looked at scores for 70,000 British children. He compared their results on an intelligence test at age eleven with their scores on the GCSE exam at age sixteen. Those correlated very highly. If early childhood IQ tests could predict as well as those taken at age eleven, they’d identify the gifted students about twice as accurately.
Every single scholar we spoke to warned against classifying young children on the basis of a single early test result, and all of them advised secondary testing. And this caution didn’t come from those who are just morally against the idea of any intelligence testing. This admonition came most strongly from those actually writing the tests, including University of Iowa professor Dr. David Lohman, one of the authors of the Cognitive Abilities Test; Dr. Steven Pfeiffer, author of the Gifted Rating Scales; and Dr. Cecil Reynolds, author of the RIAS (the Reynolds Intellectual Assessment Scales).
Despite the unanimity of this view, because of the cost and time involved, kids are routinely awarded—or denied—entrance on the basis of a single test, and in many schools are never retested.
“Firm number cutoffs are ridiculous,” said Reynolds. “If we were doing the same for identifying special-ed students, it would be against federal law.”