Air Street Capital and RAAIS founder Nathan Benaich and AI angel investor and UCL IIPP visiting professor Ian Hogarth are back for more, after publishing what may well have been the most detailed study of the state of AI in 2019.
Benaich and Hogarth have outdone themselves with the State of AI Report 2020, published today. Although the report's structure and themes remain largely unchanged, its size has grown by almost 30 percent. That is a lot, especially considering that their 2019 report was already a 136-slide-long journey through all things AI.
The State of AI Report 2020 weighs in at 177 slides, covering technology breakthroughs and their capabilities; the supply, demand, and concentration of talent in the industry; large platforms, funding, and application areas for AI-driven innovation today and tomorrow; special sections on AI strategy; and AI predictions.
AI democratization and industrialization: open code and MLOps
We started out by asking about the motivation for such a major undertaking, which Benaich and Hogarth admitted takes up a considerable amount of their time. They explained that their collective experience and current positions in business, science, investment, and policy give them a unique vantage point. Producing this report is their way of connecting the dots and offering something of value back to the AI ecosystem at large.
Coincidentally, Gartner's 2020 Hype Cycle for AI was also announced a couple of days ago. Gartner identifies what it calls two megatrends that dominate the AI landscape in 2020: democratization and industrialization. Some of Benaich and Hogarth's findings, by contrast, concern the huge cost of training AI models and the limited availability of research code. This seems to contradict Gartner's stance, or at least to suggest a different notion of democratization.
Benaich acknowledged that there are many ways of looking at democratization. One of them is the degree to which AI research is open and reproducible. As the duo's findings show, it is not: only 15% of AI research papers publish their code, and that has not changed much since 2016.
Hogarth added that while AI has historically had an open ethos as an academic field, its ongoing industrial adoption is changing that. Companies are hiring more and more researchers (another trend the report covers), and as businesses seek to protect their intellectual property, a clash of cultures emerges. OpenAI and DeepMind are prominent among the organizations criticized for not publishing code:
“There's only so closed you can be without a kind of big backlash. But at the same time, I think the evidence clearly shows that they're still finding ways to be closed when it's convenient,” Hogarth said.

As far as industrialization goes, Benaich and Hogarth pointed to their observations on MLOps. MLOps, short for machine learning operations, is the equivalent of DevOps for ML models: taking them from development to production and managing their lifecycle in terms of updates, fixes, redeployments, and so on.
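To make the MLOps idea a bit more concrete, here is a minimal sketch of experiment tracking using MLflow, one popular open-source tool chosen here purely for illustration (the report does not single out any specific tool): hyperparameters, metrics, and the trained model are logged so the run can later be compared, reproduced, and redeployed.

```python
# Minimal MLOps-style experiment tracking sketch (illustrative only).
# Assumes mlflow and scikit-learn are installed; MLflow is just one of
# many MLOps tools and is used here purely as an example.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)

    # Log hyperparameters, metrics and the model artifact so this run can
    # later be compared, reproduced and redeployed: the core of the MLOps loop.
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")
```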
The duo pointed out that some of the most successful and fastest-growing GitHub projects in 2020 are linked to MLOps. Hogarth also added that, in terms of tool availability and infrastructure maturity, it is arguably easier for startup founders to get started with AI today than it was a few years ago. But when it comes to training models like GPT-3, there is a difference:
“The bar is probably higher in terms of compute requirements if you wanted to start a sort of AGI research company today, especially if you believe in the scale hypothesis, the idea of taking approaches like GPT-3 and continuing to scale them up. Without large amounts of capital, that will become more and more expensive and less and less accessible to new entrants. The other thing that organizations with very large amounts of capital can do is run lots of experiments and iterate on large experiments without worrying too much about the cost of training. So there is a degree to which the more capital you have, the more experimental you can be with these large models. Obviously, that biases you slightly towards this almost brute-force approach of just applying more scale, capital, and data to the problem. But I believe that if you buy the scaling hypothesis, then that's a fertile area of progress that should not be rejected just because there is no deep intellectual insight at the heart of it.”
How to compete in AI
This is another key finding of the report: the hottest area of AI today, natural language processing (NLP), is dominated by huge models, large firms, and massive training costs. Based on variables released by Google et al., research estimates the cost of training NLP models at about $1 per 1,000 parameters.
That implies it could have cost tens of millions of dollars to train a model such as OpenAI's GPT-3, which has been hailed as the latest and greatest achievement in AI; experts suggest the likely budget was around $10 million. That clearly shows that not everyone can aspire to produce something like GPT-3. The question is: is there a different way?
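As a rough sanity check of that figure, the back-of-the-envelope calculation below applies the $1 per 1,000 parameters rule of thumb to two well-known, publicly documented model sizes. The rule is only a crude approximation; real training costs depend heavily on hardware, data, and the number of experiments run.

```python
# Back-of-the-envelope training-cost estimate using the ~$1 per 1,000
# parameters rule of thumb quoted above. Purely illustrative: actual costs
# vary widely with hardware, dataset size and how many runs are needed.

COST_PER_PARAM_USD = 1.0 / 1_000  # roughly $1 per 1,000 parameters

# Publicly reported parameter counts for two well-known language models.
models = {
    "BERT-large": 340_000_000,
    "GPT-2": 1_500_000_000,
}

for name, n_params in models.items():
    estimate = n_params * COST_PER_PARAM_USD
    print(f"{name}: ~${estimate:,.0f} to train (rule-of-thumb estimate)")
```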
Benaich and Hogarth think so, and they have an example to showcase. PolyAI is a voice assistant company based in London. It released an open-source conversational AI model (technically, a pre-trained, transformer-based contextual re-ranker) that outperforms Google's BERT model in conversational applications. Not only does the PolyAI model perform much better than Google's, it required a fraction of the parameters to train, which also means a fraction of the cost.

The obvious question is how PolyAI did it, since this might serve as inspiration for others. Benaich noted that the task of identifying intent, i.e. understanding what someone calling on the phone is trying to accomplish, is solved much better by treating it as what is called a contextual re-ranking problem:
“That is, given a kind of menu of potential options that a caller may be trying to achieve, based on our understanding of that domain, we can design a more suitable model that learns customer intent from data better than just taking a general-purpose model, in this case BERT. BERT can do OK in various conversational applications, but it simply doesn't have the kind of engineering guardrails or engineering nuances that can make it robust in a real-world domain. You actually have to do more engineering than research to get models to work in production. And engineering, almost by definition, is not of interest to most researchers.”
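To illustrate the idea, here is a minimal, hypothetical sketch of intent detection framed as contextual re-ranking: a pre-trained encoder scores the caller's utterance against a fixed, domain-specific menu of intents and returns a ranked list, rather than asking a general-purpose generator what the caller wants. The toy encoder and intent menu below are placeholders, not PolyAI's actual model.

```python
import numpy as np

def encode(text: str) -> np.ndarray:
    """Placeholder for a pre-trained contextual encoder (in practice, a
    transformer trained on conversational data). Words are hashed into a
    fixed-size vector just so the example runs end to end."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

# Domain-specific "menu" of things the caller might be trying to achieve.
INTENTS = {
    "check_balance": "check my account balance",
    "book_table": "book a table at the restaurant",
    "cancel_booking": "cancel my existing booking",
    "opening_hours": "ask about opening hours",
}

def rank_intents(utterance: str):
    """Score the utterance against every candidate intent and return the
    candidates ranked by similarity: re-ranking a fixed menu, not open-ended
    text generation."""
    query = encode(utterance)
    scores = {name: float(encode(desc) @ query) for name, desc in INTENTS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_intents("hi, I'd like to cancel the table I booked for tonight"))
```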
Long story short: you know your domain better than anyone else. If you can document that knowledge, make use of it, and apply the necessary engineering rigor, you can do more with less. This again points to the topic of using domain knowledge in AI, which is precisely what critics of the brute-force approach, also known as the “scaling hypothesis,” point to.
Put simply, proponents of the scaling hypothesis seem to believe that intelligence is an emergent phenomenon related to scale. By extension, if models such as GPT-3 become large enough and complex enough at some point, the holy grail of AI, and perhaps of science and engineering as a whole, artificial general intelligence (AGI), can be achieved.
On the way to general AI?
How to make progress in AI, and the topic of AGI, is at least as much about philosophy as it is about science and engineering. Benaich and Hogarth approach it in a holistic manner, encouraged by criticism of models such as GPT-3. Gary Marcus is the most prominent critic of approaches like GPT-3, and he has been consistent in his critique since models predating GPT-3, as the “brute force” approach does not seem to change regardless of scale.
Benaich summed up Marcus' critique. GPT-3 is an amazing language model that can take a prompt and produce a sequence of text that is legible, coherent, and in many instances relevant to the prompt. Moreover, we should add, GPT-3 can even be applied to other domains, such as writing software code, which is a subject in its own right.
There are numerous instances, however, where GPT-3 misfires, either in a way that exhibits bias or by simply producing irrelevant results. An interesting point is how we can measure the performance of models such as GPT-3. In their report, Benaich and Hogarth note that existing NLP benchmarks, such as GLUE and SuperGLUE, are rapidly being surpassed by language models.
These benchmarks are meant to compare the performance of AI language models against humans on a range of tasks covering logic, common-sense understanding, and lexical semantics. A year ago, the human baseline in GLUE was beaten by one point. Today, GLUE is reliably beaten, and its more challenging sibling SuperGLUE is nearly beaten too.
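For a sense of what these benchmarks look like in practice, the sketch below loads one GLUE task (SST-2 sentiment classification) using the Hugging Face datasets library and scores a trivial majority-class baseline on it. The library choice and the baseline are ours, purely for illustration; real leaderboard submissions evaluate much stronger models across all tasks.

```python
# Peek at one GLUE task and score a trivial baseline against it.
# Assumes the Hugging Face `datasets` library is installed; this only
# illustrates what the benchmark data looks like, not how leaderboard
# submissions are evaluated.
from collections import Counter
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # {'sentence': ..., 'label': 0 or 1, 'idx': ...}

# Majority-class baseline: always predict the most common training label.
majority_label = Counter(sst2["train"]["label"]).most_common(1)[0][0]
val = sst2["validation"]
accuracy = sum(int(label == majority_label) for label in val["label"]) / len(val)
print(f"Majority-class accuracy on SST-2 validation: {accuracy:.2%}")
```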

This can be interpreted in a number of ways. One way would be to say that AI language models are now just as good as humans. The kind of shortcomings Marcus points out, however, show that this is not the case. What it perhaps means is that we need a new benchmark. Researchers from Berkeley have published a new benchmark that tries to capture some of these problems across different tasks.
Benaich noted that the discussion around PolyAI relates to an interesting possible extension of what GPT-3 could do: injecting some kind of toggles into the model to provide guardrails, or at least to tune what kind of output it can generate from a given input. There are various ways this might be done, he added.
The use of knowledge bases and knowledge graphs has been discussed previously. Benaich also mentioned some kind of learned intent variable that could be used to inject this kind of control over a more general-purpose sequence generator. He believes the critical view is certainly valid to some degree, and that it points to what models such as GPT-3 would need in order to be useful in production environments.
Causality, the next frontier in AI
For his part, Hogarth noted that Marcus is “almost an expert critic of organizations such as DeepMind and OpenAI.” While having those critical perspectives is very healthy when there is a reckless hype cycle around some of this work, he added, OpenAI does have one of the more thoughtful approaches to policy around this.
Hogarth stressed the underlying philosophical difference between proponents and critics of the scaling hypothesis. If the critics are wrong, however, he went on to add, we may end up with a very smart but not very well-adjusted AGI on our hands, as some of the early instances of bias that appear as these models are scaled already suggest:
“So I think it's up to organizations like OpenAI, if they're going to pursue this strategy, to tell us all how they're going to do it safely, because it's not yet apparent from their research agenda how you marry AI safety with this approach of throwing more data and compute at the problem and expecting AGI to emerge.”
This discussion touched upon another part of the State of AI Report 2020. Benaich and Hogarth noted that some researchers feel progress in mature areas of machine learning is stagnant. Others call for advancing causal reasoning, arguing that adding this element to machine learning approaches could overcome obstacles.
Causality, Hogarth said, is arguably at the heart of much of human progress. From an epistemological perspective, causal reasoning has given us the scientific method, and it is at the heart of all of our best world models. So the work that people like Judea Pearl have pioneered to bring causality to machine learning is exciting. It feels like the greatest potential disruption to the general trend of larger and larger correlation-driven models:
“I think if you can crack causality, you can start building a fairly powerful scaffolding of knowledge upon knowledge, and have machines begin to really contribute to our own knowledge bases and scientific processes. So I think it's very exciting. There's a reason why some of the smartest people in machine learning spend their weekends and evenings working on it. But I think it's still in its infancy as an area of attention for the commercial community. In fact, in our report this year, we only found one or two examples of it being used in the wild, one by Faculty, a machine learning company based in London, and one by BenevolentAI.”
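As a simple illustration of the difference between correlation-driven learning and causal reasoning, the toy simulation below (our own example, not taken from the report) generates data with a hidden confounder: a purely correlational model would conclude that X strongly predicts Y, yet intervening on X, in the spirit of Pearl's do-operator, shows that changing X has no effect on Y at all.

```python
# Toy confounding example: correlation without causation.
# Z is a hidden confounder driving both X and Y; X has no causal effect on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational data: Z -> X and Z -> Y, but no arrow X -> Y.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = 3.0 * z + rng.normal(size=n)
print("Observational correlation(X, Y):", round(np.corrcoef(x, y)[0, 1], 3))

# Intervention do(X = x0): we set X ourselves, cutting the Z -> X arrow.
# Y is generated exactly as before, so it does not depend on the chosen X.
for x0 in (-2.0, 0.0, 2.0):
    y_do = 3.0 * rng.normal(size=n) + rng.normal(size=n)
    print(f"E[Y | do(X={x0:+.0f})] is approximately {y_do.mean():+.3f}")
```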
If you thought that was enough cutting-edge AI research and applications for one report, you'd be wrong. The State of AI Report 2020 is a treasure trove of references, and we will revisit it soon with more insights from Benaich and Hogarth.