The rise of AI Discourse means that I have discovered that I have Opinions on the Turing Test. I've gone through them a couple of times on social media, but I thought I'd record them here for posterity.
Most popular depictions and analyses of the Turing Test are either a simplification of, or a literal interpretation of Turing's original paper. They have weaknesses that stem from failing to understand Turing, and his strengths and weaknesses.
To start with, it's a fascinating paper. It introduces a way to cut the Gordian knot of "What is intelligence?", it spends a lot of text approaching various objections to artificial intelligence that still go on, and it makes predictions of the complexity required to produce an AI and very roughly when we might expect it, and did all this little more than a year after EDSAC was running!
Indeed the questions and replies in the text are remarkably prescient of where we are with LLMs. And, reading the original paper, Turing did not see the Imitation Game as a pure thought experiment, but something that could be done one day. So why is the conversation around the current state of the Turing Test so messy?
Most people don't see the computer science context of Turing's work. In theoretical computer science, a well-known concept is Turing-completeness. This is the idea that a programming language or system is sufficiently powerful to solve the same problems as any computer system.
The canonical argument used by Turing to show that a system X is Turing-complete is that it is able to emulate other systems known to be Turing-comlete. If it can emulate the other system, it can solve all the problems that system can solve, and must be at least as powerful as it.
It's not explicitly stated, but this is exactly the same arguemnt Turing is using for the Turing Test: If a computer system is able to emulate a human, it must be as powerful (intelligent) as a human.
There are a couple of subtleties here. One is that it defines a domain of emulation: It's suggesting all of practical intelligence can be expressed through conversation. To pass as a human for the purpose of testing intelligence, the system needn't perform physical tasks, draw, listen, etc. Turing suggests that the essence of intelligence can be evaluated through a text stream alone. I think this is reasonable, but it's also a little under-discussed.
Another subtlety is what's necessary vs. sufficient to demonstrate intelligence. A criticism of the Turing Test is that it takes a human-centric definition of intelligence. I think this misses the point. If a system can emulate another intelligent system, it is intelligent. If it cannot, that tells us nothing. That doesn't mean it's not intelligent. And, so far, humans are the only example of "intelligence" we have to hand. A hyperintelligent system that thinks nothing like a human, but can emulate one if it wants, will still pass the Turing test.
"Failing the test tells us nothing" is slightly interesting to compare to the theoretical computer science case of Turing-completeness. If we can show that system Y is not able to emulate a Turing-complete system, we know it is strictly less powerful. On the other hand, if I am unable to do a perfect emulation of Einstein, it doesn't mean I'm not intellligent. It doesn't even mean I'm less intelligent than Einstein (although I am). I can't do a perfect imitation of Trump, either, but I'm pretty sure I can beat him on a bunch of intelligence metrics.
The final subtlety I want to talk about with the Turing Test, compared to Turing-completeness in theoretical computer science, is that Turing-completeness can be formally proven. We can show how to emulate one system with another. On the other hand, the Turing Test is an experiment.
Turing was a fantastic theorist, but not particularly practical. This is a man who hid his savings during World War II by burying silver bars and subsequently lost them, and committed suicide with a cyanide-laced apple. I understand he did not get on well with the more practical computer builders at Cambridge. While he engineered things, I don't see evidence of a scientific mindset, and thus he did not look at The Imitation Game as a proper scientific experiment.
This oversight has plagued the Turing Test to this day.
What would the Turing Test look like as a scientific experiment? To be fair, Turing at least has an experiment and a control, by testing both a machine and a human. The hypothesis is that some machine is intelligent. The aim is to disprove the hypothesis. If we fail to disproce the hypothesis, we haven't shown it to be true, but we have gathered evidence to improve our confidence that it might be. We can construct increasingly elaborate experiments to stress the hypothesis further and further.
All this means that passing bar for the test should not be "a human thinks it's human", but that a set of experts, constructing increasingly elaborate sets of questions, including the feedback from previous rounds of interrogation, cannot put together evidence the machine is not human.
The naive Turing Test is clearly passed now, but it was passed decades ago with Eliza, too. By setting the bar so low, it encourages people to dismiss the actual progress over the years. It leads to conversations about how easily humans are fooled and all kinds of other distractions and confused arguments.
The scientific Turing Test has not been passed. People are able to find ways of making LLM models give distinctly non-human answers. In other words, the scientific Turing Test reflects reality. The iteration of finding increasingly complicated questions with which to distinguish humans and machines makes our progress clear - for an expert questioner, the gap between Eliza and ChatGPT is glaring, and the quality differences across generations of GPT pretty obvious.
Phrasing the Turing Test in terms of a scientific experiment has its own dangers. Focusing too heavily on the falsifiability aspect lets people claim that we can never prove that a machine is intelligent... but really that's just the same argument that you can never tell if any human you meet is intelligent. However, taken in moderation, it gives us a practical and thoughtful approach to assessing machine intelligence.