The Hayward Nonstandard Test: an interesting failure

In recent years, I published what I considered then, and consider now, an interesting test. It was meant to look for indirect signs of profound giftedness. I wrote it in the hope that it would circumvent the ceiling of standard tests, and I wouldn’t have been surprised if it showed a floor above some other tests’ ceilings. Let me quote the questions before continuing:

  1. Describe who you are, how you see the world, and what your inner world is like.
  2. Describe your most impressive and distinctive achievements.
  3. Describe your most impressive and distinctive failures.
  4. Describe what you hope/wish/want/intend to accomplish with your life. What do you believe you will accomplish?
  5. What is your educational background? Include out of classroom learning you consider appropriate.
  6. What is (are) your domain(s) of desired excellence? What is your work there? What have you achieved? What failures have you experienced?
  7. Have you ever had management problems or been fired? If so, describe each time.
  8. Describe any unusual or distinctive characteristics of your childhood physiology and physique.
  9. What mental health diagnoses and misdiagnoses have been considered for you (that you are comfortable divulging)? Elaborate if desired; if there is information you’d prefer to omit, please say so.
  10. What are your interests?
  11. On a scale of -1.0 to 1.0, rate yourself on the dimensions of the Myers-Briggs test: E(-1) to I(1), S(-1) to N(1), T(-1) to F(1), P(-1) to J(1). Elaborate if desired. There are a few ways to take the Myers-Briggs test, one of the cheapest of which is to check out, e.g., Keirsey’s Please Understand Me II from the library; the Keirsey website has assorted information online.
  12. What is one of your favorite books? Why? Elaborate.
  13. Provide a sample of your best writing.
  14. What is one of your most cherished creations? Explain. If feasible, include a copy; if not, describe it.
  15. As a child or youth, what was one inconsistency you observed in the adult world that was painful?
  16. Describe, with examples, your sense of humor.
  17. Do you fit in (yes/no/question does not admit a yes or no answer for you)? Explain.
  18. Provide, and answer, one question that you believe will provide me with deep insight into your intelligence.
  19. Write your own short intelligence test.
  20. What else can you say to provide me with evidence of your intelligence?

Richard Feynman’s Cargo Cult Science commencement address talks about the need to publicize failed experiments as well as successes. I am publishing these results not to claim a new success but because the test may be interesting in its failure. Someone else may find a refinement of the idea that works, or other lessons may be drawn from its failure. This seems to be an interesting failure.

I received responses from four men, whom I will call Adam, Brandon, Charles, and David. I opened and read them at the same time to limit bias. Adam seemed gifted, around the top of the range of “optimum intelligence”, where you have a definite advantage over others but aren’t so different that it starts to really hurt. Brandon seemed just over the edge; I hesitated in comparing them and finally placed Brandon slightly above Adam. Charles showed signs of real giftedness; earlier in life he had effectively solved a problem that it originally took Euler to solve. Charles struck me as profoundly gifted. Finally, if Charles showed brilliant complexity, David showed a simplicity on the other side of complexity. (“I wouldn’t give a fig for the simplicity on this side of complexity, but I’d give my life for the simplicity on the other side of complexity.”) In my notes, I compared his communication to how Richard Feynman closed the O-ring debate: “Feynman, after people enquiring into the Challenger disaster had spent days arguing whether it was too cold for the O-rings, took an O-ring, swirled it around in his ice water, and pinched it, snapping it.” David struck me as not only profoundly gifted but at a higher plateau than Charles’s dazzling performance. Trying to describe the spread, I said that if the lowest score were a 1 and the highest were an 8, then I would give Adam 1, Brandon 2, Charles 6, and David 8. (I guessed scores of 150, 155, 165, and 185; I intentionally did not reconcile these two sets of numbers.) Then I opened their prior test scores.

Charles had scores of 140-151, which I regarded as ceiling scores that provided no useful information beyond being ceiling scores. Adam, Brandon, and David had highest prior scores of 168, 172, and 174 respectively. (I am inclined to lend more credence to the higher scores, as it is more plausible that someone properly rated around 170 hit his head on the ceiling and scored around 130 than that someone properly rated at 130 accidentally obtained a score around 170. I acknowledge that this could inflate my estimates.) After an hour or so of trying to convince myself that I could interpret their scores so that they would say my test worked, I realised that my test had found a significant difference where none was independently verified. Adam, Brandon, and David had highest scores well within measurement error of each other. Furthermore, Adam had consistently high scores: his lowest score was 156, while no one else had two scores above 155. Compared with the previous data, my rankings showed no positive correlation with prior test scores, and the person who looked best on previous scores was the person I had ranked lowest.

This does not necessarily mean my test is invalid. Four responses, three of which were within measurement error of each other, do not a norming make. Given that responses had appeared at a rate of about one per year, it’s not clear how long it would take to obtain a basis for a solid anchor norming, or whether I would still be alive when enough responses had been completed. I opened the responses more on intuition than anything else, and what I have is not a norming but an understanding of why it might not have been helpful to wait for enough responses for a norming. Furthermore, the fact that previous test data does not distinguish between them does not mean that they are at the same level. All four normees are bright enough to get ceiling scores on standardized tests. That leaves open the possibility of significant differences between them, including the possibility that Charles and David are appreciably brighter than Adam and Brandon. However, I am speaking about what is possible, not making claims that my results support. My results say nothing positive about my ability to discriminate between responses. If there is anything interesting to be obtained from my test, it lies not in differences between responses but in the fact that people responded at all. My website, CJS Hayward, averages between 500 and 1000 unique visitors per day, with an average of two people reading the test per day. Only four people responded in three years, and all of them were brilliant. That seems significant, and I’m not sure what it all means. Apart from that, no ability to discriminate usefully between scores has been established in the usual fashion.

Summary of Responses

I would like to briefly describe the responses I received, both to provide an overall picture and to describe what I would single out in my evaluation. Here and elsewhere in the evaluation, I am intentionally using vague and generic descriptions rather than ones that are detailed and specific. This impoverishes the writing and gives a less valuable analysis, but I want to be cautious about confidentiality, and I expect that some of the people reading this will be quite good at connecting dots.


Adam’s response was three pages long, seemed candid (as did the others), and included achievements at the state level. His responses answered the questions, but did not have the florid, ornate, wheels-within-wheels quality I associate with someone brilliant who is speaking on a topic he finds interesting. The content of his responses strikes me as reflecting more intelligence than the writing style: it was well-written, but did not reflect the “mental overflow” I was looking for. His list of interests was relatively short (twelve items), and included a few that do not specifically reflect intelligence. Several of his choices suggest noteworthy social maturity; this, combined with my losing track of how he opened his responses, led me to assume that he was gifted rather than profoundly gifted.


Brandon’s response was also three pages long, and showed the pain of the social disconnect that many profoundly gifted people experience. His list of interests was also short, but the activities themselves more distinctively suggest high intelligence. His general approach, in particular to society and authority, shows many of the signature traits David Keirsey (Please Understand Me II: Temperament, Character, Intelligence, Buffalo: Prometheus, 1998) describes in profiling the NT “rational” temperament. (Three of the four normees were NTs, and all of them were strongly intuitive.) He also has an uncanny knack for guessing certain kinds of information, an anomaly that I’m not sure what to do with. The examples, however, did not leave me wanting to attack the anomaly by pointing him to Thomas Gilovich’s How We Know What Isn’t So (New York: Free Press reprint, 1993). He showed a desire to use his mind to transform society, which seems to be common among very bright people.


Charles’s response was twenty-seven pages of wheels within wheels. From the first page I was met with nuance that let me know I hadn’t taken everything in on the first reading, despite its being well-written. He claimed not to have any distinctive achievements. This modest remark was followed by no fewer than eight pages of dense summaries of some of his theories. These theories were subtle. They had a logical and scientific character and a spark of something interesting that stretches outside the bounds of science. He used a nonstandard format that made their logical structure clearer, successfully modifying a familiar format into an unfamiliar one that works better, which is difficult. In the pages of his response I met an edifice of thought which impressed me and which I knew I didn’t understand. (I say this as someone who has put a lot of effort into understanding other people’s belief systems.) His response to that question reminds me of a passage in my current novel:

The woman looked at me briefly. “What languages do you know?”

If anything, I sank further back into my chair. I wished the question would go away. When she continued to listen, I waited for sluggish thoughts to congeal. “I… Fish, Shroud, Inscription, and Shadow are all spoken around my island, and I speak all of them well. I speak Starlight badly, despite the fact that they trade with our village frequently. I do not speak Stream well at all, even though it is known to many races of voyagers. I once translated a book from Boulder to Pedestal, although that is hardly to be reckoned: it was obscure and technical, and it has nothing of the invisible subtlety of ‘common’ conversation. You know how—”

The man said, “Yes; something highly technical in a matter you understand is always easier to translate than children’s talk. Go on.”

“And—I created a special purpose language,” I said, “to try to help a child who couldn’t speak. I did my best, but it didn’t work. I still don’t understand why not. And I—” I tried to think, to remember if there were any languages I had omitted. Nothing returned to my mind.

I looked down and closed my eyes. “I’m sorry. I’m not very good with languages.”

Charles listed approximately fifty different interests. This is less significant than it sounds, as he broke his interests down in more detail than the other normees, but the detailed breakdown strikes me as significant independent of its content. He was the one normee who answered the Myers-Briggs question in the mathematical format requested. That does not mean he is the only normee who could do that task, but it may suggest that he was the one person who didn’t take a shortcut by “just using adjectives”. I wrote the test to listen for a certain accent in how people respond, and his sense of humor showed that accent loud and strong.

He wrote a complete test which seemed to have a low ceiling, but was polished enough that I wouldn’t be surprised to see something similar on the web, and he showed self-criticism in writing the test, acknowledging that it was culture-biased. The completeness and level of polish for that answer caught me off guard.

I was looking to be surprised in a certain way, and for reasons discussed above Charles gave me the kind of surprises I was looking for.


David’s response was twenty pages. He provided an extended writing sample, and (to my surprise) a complete transcript of grades from childhood. His answers were by far the most polished; they give the impression of finding, out of a large space of things that could be said, a microcosmic gem that encapsulates the whole space. Most of his responses were short; the twenty pages stem from the length of his answers to a small number of questions.

Question 11, requesting Myers-Briggs personality type, contained a hidden question. I was interested in Myers-Briggs type, but most interested in whether the normee would question the test or talk about not fitting in the frame the Myers-Briggs test provides. David gave his type en route to a dismissive remark about the test. In other words, he was the one respondent who questioned the test. The most cherished creation he described was one that showed a certain kind of mental fireworks, reminiscent of the dialogues in Douglas Hofstadter’s Gödel, Escher, Bach: An Eternal Golden Braid (New York: Basic Books reprint, 1999).

David also surprised me, and I heard an accent of brilliance.

Interesting Features

What are the distinctive features of my test? I would like to describe them below.

Emphasis on Tacit Knowing

The way Western culture is shaped means that psychology tries to know its subject matter with the same kind of knowing that physics has of its subject matter: I-It rather than I-Thou knowing, which is depersonalised and banishes tacit knowing as far as possible. (Banishing anthropomorphism is appropriate when you’re studying rocks. It’s more debatable when trying to understand people.) When I was thinking about how to write up the experiment, before I looked at prior scores, one of the things I intended to compare was writing samples. Brandon offered a clever placeholder instead of a “real” composition. Adam provided some poetry that reminded me of fifth-grade English reading; I objectively recognized quality but felt no subjective emotional response. Charles provided poetry that I wasn’t sure I understood, but I nonetheless felt that something powerful was washing over me, and I was sorry when it ended. David sent a fiction excerpt that filled me with despair. The tone of the writing was not despairing; I felt the despair of being shown writing so perfect that I despaired of ever attaining that standard.

Why am I talking about my subjective emotional reactions instead of an objective assessment? That is precisely why I chose this example, rather than examples of thought that could be better justified within a framework that understands knowledge in depersonalized and objective terms. I chose it because I paid attention to subjective emotional reactions. I believe that they are tied to tacit and personal ways of knowing: I experienced different subjective emotional reactions because I was responding to pieces of writing that were not of the same quality. Subjective emotional response is one of several cues worth listening to.

(I am intentionally keeping the philosophy brief; the philosophical dimension involved in this topic is one that admits very long discussion.)

Listening for an Accent

In most tests, there is a suite of questions meant to map out where a person’s intelligence breaks down, and the score is the total number of points earned. In this test, the questions do not represent a direct attempt to present difficulty in answering. The intent is rather to obtain a composite picture, and shed indirect light on how bright a person is. The assumption is that different levels of giftedness will leave a definite mark on a person, and that that definite mark is discernible through understanding the person. For one example, above a certain level, a person is so different from the majority of people that there is a social disconnect; children above IQ 170 tend to feel that they don’t fit in anywhere. That kind of social disconnect was clearly discernible in all but one of the responses; Brandon clearly articulated it.

To some extent, that is corroborated by the data. I identified all of the normees as significantly gifted, which I had no reason to anticipate. In the first norming of the Mega Test, fewer than 10% of normees successfully answered any of the questions. (People who are emotionally insecure often attempt difficult tests to get a score that may make them feel special; as the number of emotionally insecure people vastly outweighs the number of people at that level of giftedness, highly gifted respondents “should” have been a small minority.) So I was able to recognize giftedness in all of the normees when I was not expecting it. That said, the evidence does not warrant the conclusion that my test usefully discriminates among the normees.

Problems with the Norming and Test

As this test, or at least this norming, has been a failure, it’s worth paying attention to what went wrong.

Pool of Normees

I have not done any real statistical analysis because there is no basis for analysis, and the statistics would only give a more precise quantification to the statement, “The measurement error exceeds the difference measured.” Even if the four normees represented an optimal 120-140-160-180 spread, four points would be questionable. As is, the only conclusion I can confidently claim from prior test data is that all of the normees are at or above standardized test ceilings. In other words, data from previous tests do not provide a basis to claim that my test discriminates (and what correlation exists is negative).
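To put a rough number on why four points would be questionable even with an ideal spread, here is a minimal Python sketch. It assumes a two-tailed test of the Pearson correlation at the conventional 5% level, and the scores in it are hypothetical, made up only for illustration; they are not the normees’ data. With four points there are two degrees of freedom, and for that case Student’s t distribution has a closed-form tail, so the p-value reduces exactly to 1 - |r|: a correlation has to reach 0.95 before it clears the 5% bar.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Usual t statistic for H0: rho = 0 (requires |r| < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def two_tailed_p_df2(t):
    """P(|T| > |t|) for Student's t with 2 degrees of freedom (closed form).

    For n = 4 points this reduces algebraically to p = 1 - |r|.
    """
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)

# Hypothetical illustration: an "optimal" 120-140-160-180 spread of true
# levels, paired with made-up scores from some new test.
true_levels = [120, 140, 160, 180]
new_test_scores = [130, 125, 170, 165]

r = pearson_r(true_levels, new_test_scores)
p = two_tailed_p_df2(t_statistic(r, len(true_levels)))
print(f"r = {r:.3f}, two-tailed p = {p:.3f}")  # r = 0.832, two-tailed p = 0.168
```

A correlation of about 0.83 looks impressive, yet with only four points it would arise by chance roughly one time in six.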

Two Dimensions Flattened Into One

Giftedness affects personality, but it is inadequate to simply say, “Giftedness is personality.” There is diversity at each stratum of giftedness, and the normee pool did not permit the kind of two-dimensional analysis that would be needed to properly interpret responses (if there is a proper interpretation to be had).

An Invasive Test

This test is invasive. It’s painful and offensive. There is probably a way to attempt a similar operation much more gently and delicately. My guess is that this, more than anything else, is why I had only four responses in three years. If this principle were put to serious use, it would have to be rethought so that it pursued its aims with a far defter touch. (Or perhaps it would be enough to remove certain questions.)

One thing I wonder is whether this offensiveness, which is partly an unedited form of giftedness, was the main reason why only brilliant men responded. The test’s form may have been a powerful selector: it would have put most people off. But that is not the whole story. Keep in mind that “reading”, on a conscious or unconscious level, is a two-way street, and the test reveals something significant about me as well as requesting revelation from the normee. A few very bright people might be bothered by the invasiveness and yet recognize and respond to a voice that feels like home. It connects. That, at least, is speculation which seems plausible, but which I don’t see how to support without writing a gentler test.

Not Personal Enough

In one sense, this test was personal, too personal: it probed bluntly into things that are not polite to ask. In another sense, though, it related to the normees as objects to be studied, trying to dissect them as people but still dissecting them. It moves partway from I-It to I-Thou, but I believe a fuller I-Thou knowing is possible, although I don’t know what a fully I-Thou approach would be like. It could be argued that the questions are offensive because the test was not personal enough. In other words, the test reflected an attempt to understand people, but not in a personal way. Furthermore, some of the philosophical merits of a personal approach might bear fruit in a more genuinely personal approach.

Lack of Checks

The attempt to be objective tries to strip out everything subjective as a means to strip out subjective bias. Ideally one would want to allow subjective strengths while using another form of rigor to mitigate subjective bias, but I am not sure what that other and more difficult rigor would be; I have not solved that problem.

I requested responses to questions and personal information separately, so I wouldn’t know whose material I was working with until after I had ranked the results. There was one normee for whom this attempted anonymization failed: David, whom I know and hold in awe. I’d like to say that I didn’t let this influence my estimation, but that’s not true. As it stands, Adam’s responses struck me as simple because what he was saying didn’t seem very big, and David’s responses struck me as simplicity on the other side of complexity: something big in an elegant nutshell. Charles’s responses struck me as complex, in other words as simply being big. I’d like to say that I was unbiased, that I didn’t think, “David answered, and I’m terribly impressed with him, so I’ll put him highest,” but simply followed the argument where it led. I’d like to say that, but I can’t. Maybe I should have ranked Charles highest. I’m vulnerable to an accusation of bias, at least here. And this kind of bias may be present in any attempt to understand another person; recognition is a risk.

Book Knowledge that Didn’t Pan Out

There’s a reason why I asked about people’s worst failures, and it’s not because I like making people squirm.

Howard Gardner’s Extraordinary Minds (New York: Basic Books reprint, 1998) is a multiple-intelligence treatment of genius. One of the points he discusses is failure: experiencing failures and being spurred on by them (120-123). Because of this, I was hoping to see discussion of trying and failing and trying and failing and trying and failing, like Edison’s numerous failures en route to inventing a working light bulb. I believed that geniuses and those approaching genius are not only not immune to failure, but fail more often and more significantly than the vast majority of human beings.

This is a nice theory, and it may well be true, but the question based on it did not obtain informative answers for this purpose. I expected to see, in normees at this level, varying degrees of failure in courageous projects (and in less glorious matters); I would not want to divulge what the normees shared, but if they did experience this pattern of life, I did not discern it in the replies. (This question should probably be removed in derivative work; the offensive questions seem less informative than I had expected.)

Another question was related to Leta Hollingworth’s Children Above 180 IQ Stanford-Binet: Origin and Development (New York: Arno Press, 1975), in which Hollingworth claims that the children she studied were significantly above average in size and weight for their age. I thought that the brighter respondents would share this distinctive physique. Only Brandon mentioned something along these lines, which means it might be useful as one piece of a large puzzle, but it was not the predictor I’d hoped for. (There were other questions motivated by similar concerns.)

A Successful Failure?

This test is a failure, or at the very least my attempt to norm this test is a failure. Out of an estimated two thousand people who were aware of the test, only four responded, and the result is a statistically insignificant and negative correlation. I underestimated Adam in particular; if there is a lesson to be drawn from him, it is that it is possible to be brilliant while showing relatively few of the indirect traits this test sought to identify.

I was not looking forward to writing delicate responses to a majority of normees who were insecure and of normal intelligence, and who approach difficult tests hoping for a big number that will make them feel OK about being human. That this did not happen touches on two reasons why I consider this an interesting failure:

  1. Only brilliant normees responded. Therefore, while any demonstrated ability to discriminate between answers is nonexistent, the fact of responding to the test is highly significant. There is an implicit hidden question: not “What traits will distinguish your response?” but “Will you respond at all?”
  2. I correctly identified all the respondents as significantly gifted. The lowest estimate I gave was a three sigma score. In other words, I correctly identified all respondents as being at or above the 99.9th percentile, even though this was contrary to my expectations.
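The percentile claim in the second point is simple arithmetic on the normal curve. The sketch below assumes the common IQ convention of mean 100 and standard deviation 15 (an assumption; some scales use 16 or 24) and uses the standard library’s statistics.NormalDist. On that convention, three sigma corresponds to roughly the 99.87th percentile, and the lowest guessed score of 150 sits about 3.3 standard deviations above the mean.

```python
from statistics import NormalDist

MEAN, SD = 100.0, 15.0  # assumed IQ convention; other scales use SD 16 or 24

def iq_to_z(iq, mean=MEAN, sd=SD):
    """Standard score: how many standard deviations above the mean."""
    return (iq - mean) / sd

def percentile(z):
    """Percentage of a normal population falling below z."""
    return 100.0 * NormalDist().cdf(z)

print(f"three sigma: {percentile(3.0):.3f}th percentile")  # three sigma: 99.865th percentile

# The guessed scores from the write-up, converted under the assumed convention.
guesses = {"Adam": 150, "Brandon": 155, "Charles": 165, "David": 185}
for name, iq in guesses.items():
    z = iq_to_z(iq)
    print(f"{name}: IQ {iq} -> z = {z:.2f}, percentile {percentile(z):.4f}")
```

Under an SD of 16, the same score of 150 is about 3.1 sigma, so the 99.9th-percentile conclusion holds under either of those two conventions.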

This is also an interesting failure in that it attempts an inquiry based on a different principle. If it were not for confidentiality concerns, I would likely publish the responses so that specific questions could be analyzed. It may be possible to make a hybrid test that combines traditional high-ceiling tests with this basic approach. The two approaches could be complementary.

Given that this is a first try, it may be better to label this approach as “Hasn’t succeeded yet” than “Has failed.” It would be surprising if this kind of distinctive approach succeeded on the first try. Furthermore, the way this norming failed suggests there’s something in the approach.

There are several philosophical questions which admit interesting discussion. One of the more interesting is what ways of dealing with subjective bias exist besides trying to exclude all subjective elements (officially, at least: I suspect that good “objective” judgment has drawn on subjective strengths all along). Most of the philosophical aspects mentioned merit further inquiry.

I believe that Charles and David are at a higher plateau than Adam and Brandon. Data from other tests does not discriminate between them, but I have privileged external information that would place David above Adam. If the normees were to contact a third party who could corroborate that Adam and Brandon are at one high plateau and Charles and David at a higher one, that would be reason to take a second look at the results.

I believe that the responses give a much richer picture of the person than a standard test. Instead of asking, “Does this compete with traditional tests?” someone might ask, “What interesting data does this give that traditional tests don’t?”

So this test is a failure, but an interesting failure, and perhaps even a successful failure.

