Artificial intelligence (AI) is becoming increasingly versatile. Last year saw progress towards a landscape in which enterprises can automate a wider variety of functions more easily, and at potentially lower cost.
The robotics sector is a good example of where versatile AI could pay dividends. In December 2020, a research paper in the academic journal Science Robotics demonstrated how a four-legged robot was able to exploit multiple precursor AI systems to effectively improvise, adapting to changes in its workload on the fly.
The approach, dubbed multi-expert learning architecture, is the result of collaborative research between the School of Informatics at the University of Edinburgh in the UK and the Institute of Cyber-Systems and Control at Zhejiang University in China.
Robots using multi-expert learning would be trained over multiple stages. First, a computer simulation shows two distinct neural networks how to perform the robot’s basic mobility functions: how to move across the ground, and how to recover after falling over.
Another eight neural networks then learn how to execute specialised motor skills, such as rolling over or turning left or right, with an AI supervisor, known as a gating network, deployed to synthesise combinations of the neural nets to maximum effect, according to Singularity Hub.
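For a concrete, if simplified, picture of the idea, the sketch below shows in PyTorch how a gating network can blend the outputs of several expert networks. The expert count, layer sizes and dimensions are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class GatedMultiExpertPolicy(nn.Module):
    """Blends the outputs of several 'expert' policies via a gating network.

    Expert count, layer sizes and dimensions are illustrative only,
    not taken from the Science Robotics paper.
    """
    def __init__(self, num_experts: int, obs_dim: int, action_dim: int):
        super().__init__()
        # Each expert maps the robot's observation to motor commands.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
            for _ in range(num_experts)
        ])
        # The gating network assigns a weight to each expert for the current observation.
        self.gate = nn.Sequential(nn.Linear(obs_dim, num_experts), nn.Softmax(dim=-1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        weights = self.gate(obs)                                      # (batch, num_experts)
        actions = torch.stack([e(obs) for e in self.experts], dim=1)  # (batch, num_experts, action_dim)
        # Blend the experts' motor commands, re-weighted for every new observation.
        return (weights.unsqueeze(-1) * actions).sum(dim=1)

policy = GatedMultiExpertPolicy(num_experts=8, obs_dim=32, action_dim=12)
action = policy(torch.randn(1, 32))  # one simulated observation -> blended motor command
```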
Michael Rovatsos, professor of AI at University of Edinburgh’s School of Informatics and director of its data science and AI-focused Bayes Centre, told GCV: “The need to engineer and train bespoke AI systems to solve specific, narrow problems is still a major cost factor for businesses.
“This research demonstrates we are making tangible progress in terms of recombining and repurposing these components by developing more generic solutions. I anticipate increased versatility will be the focus of much AI research over the next few years and will help remove significant roadblocks for AI adoption.”
In another landmark for versatility in AI, DeepMind, the Google-owned advanced automation lab situated in London, completed a software program that is able to play strategy games such as chess against humans without being shown the rules.
Earlier variants of DeepMind’s technology could learn and master the strategy board game Go. However, the recent iteration – MuZero – can figure out the rules for itself, meaning the algorithm could be repurposed to explore other environments without first being fed their dynamics.
The developments foreshadow a generation of AI poised to overcome the biggest barrier facing current technologies: the difficulty of mixing and adapting skills at levels approaching human cognition.
It has been three and a half years since Jensen Huang, Nvidia’s president and chief executive, first predicted AI would experience a “Cambrian Explosion”, referencing the evolutionary period in which predecessors to most major groups of animals began to appear in the fossil record.
And versatility has certainly brought AI closer to its own Cambrian threshold, despite GCV Analytics data that suggest CVC-backed deals in the space tailed off slightly last year.
Total deals fell to 316 in 2020 from 334 the previous year, with dollar amounts dropping by approximately one quarter to $15.5bn from $20.5bn.
To realise versatile AI’s full potential, more efficient computer chip technologies are needed both in the cloud and on end-devices, matching the progress of ingenious new algorithms and data techniques.
After a period of sustained research and development (R&D) and corporate venture-driven innovation, and with a multiplicity of accelerated chipsets for data centres to choose from, it is up to the market to determine which products are best. The semiconductor industry, it seems, has responded in a bid to put its silicon behind the hype.
Nvidia lays down a marker
Nvidia pulled a rabbit out of the hat in May 2020 with the announcement of its new marquee AI-accelerated graphical processing unit (GPU) for the data centre – the A100 – offering respite in a challenging year that saw the industry battling stock shortages caused by the coronavirus and spikes in demand.
Deprived of his usual stage at the San Jose Convention Center, an exultant Huang nevertheless gave a “kitchen keynote” in a teleconferenced edition of Nvidia’s flagship GTC event to give the press an in-depth look at the new product.
The A100 processor board contains up to eight connected GPUs. It weighs in at around 50 pounds – lifting it takes roughly the same effort as lifting a small mattress – and builds on Nvidia’s latest Ampere processor design.
Data centres can buy the A100 with GPU systems each containing either 40GB or 80GB of random-access memory (RAM) – the volatile component that feeds data to the computer while it is in operation, but which wipes all information if it is switched off.
The 80GB variant also uses the second-generation high-bandwidth memory standard (HBM2) to enable faster data transfers. This is particularly crucial for shuttling deep learning parameters from memory storage into the processor.
Both variants of the A100 also use 7-nanometre chip architectures offered by one of Nvidia’s foundry partners, Taiwan Semiconductor Manufacturing Company (TSMC). Smaller chips generally enable transistors to operate more efficiently because their electric signals have less distance to travel.
A100 processing units sport an astronomical 54 billion transistors, while Nvidia has broken with its peers in the industry with Ampere by attempting to address data centre and consumer-focused applications from a single architecture design.
When employed in the deep learning-only variant of A100 – the DGX-A100, described by Huang as “the ultimate instrument for advancing AI” – the result is an advertised five petaflops of compute performance. (“Flops” are floating point operations per second, a measure of performance based on calculations which use approximate “floating-point” arithmetic, as a trade-off between range and precision.)
The five petaflops figure is based on algorithms that employ 16 bits, as opposed to 32 bits for standard deep learning models – a technique known as FP16. These models sacrifice precision as a trade-off for squeezing more out of available memory and bandwidth.
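For a sense of what FP16 looks like in practice, the snippet below sketches a single mixed-precision training step using PyTorch’s automatic mixed precision tools; the model, data and sizes are placeholder assumptions, not anything Nvidia ships.

```python
import torch
import torch.nn as nn

# One mixed-precision training step (model and data are placeholders).
# autocast runs eligible operations in 16-bit floats to save memory and
# bandwidth; GradScaler guards against gradient underflow at lower precision.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 10).to(device)
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(64, 1024, device=device)
targets = torch.randint(0, 10, (64,), device=device)

optimiser.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimiser)
scaler.update()
```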
Nvidia will feel the A100 launch is a solid platform from which to cement its position as the biggest provider of accelerated GPUs for cloud-based servers. In its earnings update for the quarter ending October 2020, it said data centre revenues had risen 162% year-on-year, hitting a new high of $1.9bn.
According to Forbes, the big four cloud services – Amazon Web Services (AWS), Google Cloud, Alibaba Cloud and Microsoft Azure – used Nvidia GPUs for all but a handful of virtual high-performance computing machines accelerated for fundamental workloads – known as infrastructure-as-a-service (IaaS) instances – in May 2019.
Of the 2,000 cloud IaaS services that required dedicated accelerators, Forbes said Nvidia GPUs processed 97.4%, compared with just 1.0% through GPUs from its rival AMD, and 1.6% on chipsets from both Xilinx and Intel combined.
The economic case
The launch of Ampere and the A100 is not just about delivering raw power for AI development. It also states Nvidia’s intent to push the economic case for its products to public clouds and AI developers.
A100 can be split into up to seven separate instances to allow public clouds to ration excess capacity to separate clients, which can also have processing requirements adapted to their specific workloads.
Nvidia has pitched the upgrades as capable both of teaching AI to learn new tasks – the process known as AI training – and of applying trained models to specific applications, a sector called AI inference.
With some competitors looking to introduce AI inference-specific chips, Nvidia has declared the A100 can compute seven times as many sequences per second as its predecessor, the V100, on the popular natural language processing model Bert.
Hypothetically, the A100 could be used for Bert-driven inference at the equivalent of seven separate V100 processors, when rationed across different clients.
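A rough back-of-the-envelope reading of that claim, using a deliberately invented baseline throughput:

```python
# Back-of-the-envelope reading of the claim (baseline figure invented for illustration).
v100_bert_seqs_per_sec = 1_000                        # assumed V100 inference throughput
a100_bert_seqs_per_sec = 7 * v100_bert_seqs_per_sec   # Nvidia's stated 7x uplift over the V100

mig_instances = 7                                     # A100 split into seven separate instances
per_instance = a100_bert_seqs_per_sec / mig_instances
print(per_instance >= v100_bert_seqs_per_sec)         # each slice roughly matches a whole V100
```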
The focus on inference is a shrewd move as the market is growing, driven by more demand for complex AI models and the emergence of highly-reactive use-cases, such as automotive and healthcare. In an earnings call early last year, Nvidia revealed AI inference spending on data centres surpassed AI training in 2019, according to industry paper EETimes.
Cem Dilmegani, founder of AI-industry analyst centre AIMultiple, explained: “Inference workloads are an exciting area for the AI chip companies, because there will be a massive switch from current loads using [cheaper] CPUs as the typical AI model gets more complex and whole new models are produced.
“Companies selling AI inference chips can easily make the case for savings by switching to inference-focused workloads. If you are a large company, and you utilise a more complex machine learning workload, then it will be more efficient to use inference chips.”
Challenging the gold standard
GPUs produced by Nvidia and its competitors offer distinct advantages for AI because they can execute a horde of computational sums simultaneously, using a technique known as parallelism.
These sums initially helped render 3D computer graphics for applications such as video games. They have, however, had their capacity boosted massively in order to facilitate AI modelling.
Beyond AI-accelerated GPU units, Nvidia held a 56% market share in the overall GPU space in 2019 thanks also to its gaming products, according to industry research consultancy T4, followed by AMD with 26% and Intel with 18%.
Dilmegani said: “The reason GPUs are better than standard computer processing units is due to parallelism. In a standard computer, you run a relatively limited number of applications – and only one of them at a time would be used intensively.
“In true parallel processing, it is not like that – you need multiple processing units all going at top speeds to solve different parts of the problem. Then, you need to combine the results to solve the problem.”
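The toy example below makes that concrete: a large dot product is split into independent slices, the slices are computed concurrently and the partial results are combined. A GPU does the same thing with thousands of lightweight cores rather than four CPU processes; the sizes here are arbitrary.

```python
import numpy as np
from multiprocessing import Pool

def partial_dot(args):
    """Solve one independent slice of the overall dot product."""
    a_chunk, b_chunk = args
    return float(np.dot(a_chunk, b_chunk))

if __name__ == "__main__":
    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)
    # Split the work into four independent slices (stride-4 partition).
    chunks = [(a[i::4], b[i::4]) for i in range(4)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_dot, chunks)   # slices solved in parallel
    combined = sum(partials)                       # combine the results to solve the problem
    print(np.isclose(combined, np.dot(a, b)))      # matches the sequential answer
```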
Supported by its corporate venturing unit, Nvidia GPU Ventures, the company will be looking to extend its reach to more emerging developers by offering synergies with its ecosystem, including through cloud credits – allowances that determine how much cloud resource AI workloads can consume.
It may, however, have to work to help public clouds upsell accelerated services to smaller clients who prefer using virtual workstations containing only non-accelerated computer processing units, often to execute less complicated machine learning workloads.
Also, the size of top-drawer deep learning models is increasing exponentially. Looking at AIMultiple’s analysis of each iteration of OpenAI’s GPT text generation system, the latest model – GPT-3 – has a whopping 175 billion parameters, up from 110 million in the initial release back in mid-2018.
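The scale of that jump is easy to check:

```python
# Growth in OpenAI GPT parameter counts cited by AIMultiple.
gpt1_params = 110_000_000        # initial GPT release, mid-2018
gpt3_params = 175_000_000_000    # GPT-3
print(f"GPT-3 is roughly {gpt3_params / gpt1_params:,.0f}x larger")  # ~1,591x in about two years
```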
This side of the equation gives Nvidia’s challengers scope to strike back with products that aim to match the firm on performance, at least to any noticeable degree, and possibly at a lower overall price point.
Intel will certainly regard it as a chance to prove it is adapting following the loss of its contract to supply chips to consumer technology provider Apple’s Macintosh computers last year, a setback that impacted its share price and could affect as much as 2-3% of its revenue once the transition is completed, according to the Wall Street Journal.
GCV Analytics data show Intel backed 101 deals for AI-related semiconductor technology from 2015-20, making it by far the most prolific corporate investor, ahead of Alphabet (25) and Samsung (23).
A success for Intel’s strategy came in December 2020 when AWS commissioned its Habana Labs AI training stack – powered by up to eight Gaudi tensor processing cores – to accelerate certain software programming instances on its Elastic Compute Cloud (EC2), a service targeted at scaling out cloud-hosted software applications.
The news came almost a year to the day after Intel obtained Gaudi through the $2bn acquisition of Israel-based Habana Labs, which had earlier received equity investment from its corporate venturing unit, Intel Capital.
Stacking up Gaudi against Nvidia’s latest product, the chipset has just 32 gigabytes of high-bandwidth memory – again using the HBM2 standard – against 40GB or 80GB units in the A100. It also uses standard ethernet connectivity, whereas Nvidia has leveraged proprietary standards NVLink and NVSwitch to join its processing units.
Habana Labs would counter that the product performs well against Nvidia’s earlier V100 model. An internal white paper suggests Gaudi was 3.8x faster than the V100 in applying the benchmark ResNet-50 image recognition training model when scaled up to around 650 parallel processors, running through over 800,000 images per second, versus 218,300 for a comparable V100-based cloud system.
AWS has said its own internal audit also put Gaudi ahead of GPUs on a price-performance basis for executing EC2 tasks. Although the company did not confirm which GPU model was tested, it suggests the cloud provider is courting rivals to Nvidia’s hegemony in AI-accelerated workloads.
Gaudi’s emphasis on price-performance becomes clearer when you consider the processor is built on a 16-nanometre process, larger than the 7-nanometre node used by Nvidia’s A100 and some emerging competitors.
Intel is still unable to support 7-nanometre production – it has announced its manufacturing facilities will not be ready until 2022 at the earliest – however, it would argue Gaudi already possesses enough firepower in the rest of its architecture.
Intel manufactures the product in-house, unlike Nvidia and other rivals that outsource to fabrication partners, and that has arguably put it at a disadvantage in bringing forward new AI chip technologies internally. Habana Labs also offers its own chip focused on AI inference tasks, branded Goya, based on similar technology to Gaudi.
Graphcore leads the rest
Although AI chips are a highly innovative sector, there are significant barriers to entry for challengers that do not ally with a well-resourced industry partner.
With Habana Labs now making some inroads – and, remember, Intel is now on its second bite at the cherry, having reportedly retired AI-accelerated chips from an earlier Intel Capital bet, Nervana, in February 2020 – there is less market scope for emerging VC-backed challengers.
One of the favourites is Graphcore, a UK-based, AI-accelerated chip maker whose corporate investors include vehicles for computer software publisher Microsoft and computing equipment maker Dell.
Cloud services owned by the two firms both offer the first generation of Graphcore’s intelligent processing unit (IPU) for certain workloads, and this has provided leverage for the launch of its successor – the Colossus Mk 2 IPU.
Graphcore’s new chip is fabricated by TSMC on a 7-nanometre process and possesses 900 megabytes of “in-processor” memory, which in practice means more of the memory it accesses is hosted on board the processing cores.
The design principles for the Colossus Mk 2 favoured performance over cost reduction, an approach Graphcore says makes sense because AI training models are getting larger and more complex, with price increases less noticeable when deployed at scale in the cloud.
Graphcore has again implemented its Poplar software stack for the second-generation IPU, aiming to complement the tricks in its hardware with developer-oriented features such as sparsity optimisations, which it says can help AI models produce more-effective parameters. The UK-based outfit has been rewarded by continued strategic and financial investment, including $222m of series E funding banked at a $2.8bn valuation in December, making it one of the hardware industry’s unicorns.
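Sparsity, in its simplest form, means zeroing out weights that contribute little, so hardware and software able to skip zeros have less work to do. The sketch below shows generic magnitude pruning in PyTorch as an illustration; it is not Graphcore’s Poplar implementation.

```python
import torch

# Generic magnitude pruning: zero out the smallest 90% of weights so that
# hardware and libraries able to exploit sparsity can skip them.
# (Illustrative only; not how Poplar implements sparsity optimisations.)
w = torch.randn(512, 512)
threshold = w.abs().flatten().quantile(0.9)   # keep only the largest 10% by magnitude
sparse_w = torch.where(w.abs() >= threshold, w, torch.zeros_like(w))
print(f"{(sparse_w == 0).float().mean().item():.0%} of weights are now zero")
```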
Pierre Socha, an early-stage investment partner at venture capital firm Amadeus Capital Partners, told GCV: “In terms of pricing, Graphcore’s IPU-M2000 and IPU-Pod64 systems offer far more compute power per dollar than GPU-based systems.
“While price is an important consideration, customers also recognise the need to invest in a technology that will enable the continued development of AI and use of emerging models.
“They are acutely aware of the architectural limitations of GPUs and how those risk limiting their ambitions.”
Room at the inn
Investors must judge whether Graphcore has sealed off the challenge to Nvidia, at least as far as accelerated AI training chips are concerned.
Cyril Vančura, investment partner at Imec.xpand, the venture fund linked to Imec, a Belgium-based research institute focused on nanoelectronics and hardware technologies, is among those who believe the opportunity for early-stage AI chip companies to join Graphcore might be closing.
Vančura was previously on the investment team at one of Graphcore’s corporate backers, Robert Bosch Venture Capital, the corporate venturing arm of Germany-based industrial equipment maker Robert Bosch.
He said: “It has clearly been the AI sector that has driven the investor cycle in the overall semiconductor space over the past five years. Most of the money that went into semiconductor startups went into AI and accelerators – so the first wave, companies like Graphcore, were looking to challenge architectures for AI application areas such as data centres, going head-to-head with companies like Nvidia.
“I think that has now changed. We are seeing different architectures that target the edge, both for inference and for learning. These have completely different requirements, especially for power. And I think this is now where the money is largely going.”
AIMultiple’s Dilmegani agreed consolidation was likely. He said: “Nvidia has the leadership [at present], but it could be that one of these smaller companies becomes successful. It might be that Nvidia buys one of its smaller competitors once it gains enough market share.
“One thing is for certain, and that is eventually there will be consolidation. We do not expect 10 chip vendors to be sharing the market.”
There are perhaps a few more things to consider.
For one, East Asian foundries have built up a rich chip manufacturing ecosystem and are now looking to make a move up the value chain with AI-specialised designs.
TSMC’s R&D is crucial, for a start. The firm launched its 7-nanometre production process in April 2018, according to Yicai Global, which reported it had produced 1 billion of the chips as of August 2020. According to market research firm Counterpoint Research, TSMC held a 28% share of the global semiconductor production market last year, and its capex on smaller chips contrasts with postponed spends at some of its peers.
The firm, however, faces a threat from China-based foundries, which have reportedly lured more than 100 engineers and managers from the company, supporting Beijing’s ambitions to achieve self-sufficiency in chip manufacturing.
China-based AI chip business Enflame Technology is one company targeting growth, having recently racked up $279m of series C funding from investors including internet group Tencent.
Tencent currently owns a 23.2% stake in Enflame, whose series C round valued the business at about $735m, according to local media.
Enflame’s deep-thinking unit (DTU), a deep learning-focused acceleration chip unveiled in December 2019, sports 32 processor cores underpinned by a low-voltage framework. The chip is produced in partnership with US-based foundry GlobalFoundries, and Enflame believes it has hit on a design that will enable it to continue using 12 nanometre-sized chips “long into the future”.
Tencent is working alongside Enflame on unspecified business cases, perhaps strengthening its own cloud service, Tencent Cloud, to compete with the larger operation at peer Alibaba. Alibaba Cloud (Aliyun) became profitable in the quarter ending December 2020 and has worked on its own AI chips through its subsidiary, Pingtouge Semiconductor. Alibaba unveiled its first AI chip, Hanguang 800, in September 2019 and provided an update a year later on its 8-core NPU and 96-core vCPU performance, through which Aliyun could outperform GPUs for specific applications.
Earlier-stage innovation
As Vančura suggested, VC appetite may increasingly pivot to specific use-cases, such as deploying AI inference on the edge through end-devices like sensors or intermediary networking gateways or to products positioned to alter the way chips are manufactured, targeting the wider production ecosystem.
GCV data showed the number of CVC-backed deals for AI and machine learning-related semiconductor businesses remained static year-on-year in 2020, at 347. However, the equity funding deployed in those rounds fell to $17.7bn from $21.4bn, with the Asia-Pacific region experiencing a heavier fall in dollar amounts than North America.
Imec.xpand portfolio company Ferroelectric Memory Company (FMC) is one business looking to push the frontiers of chip design, aiming to bring to market non-volatile storage that could eventually allow memory cache capabilities to be installed within the core of the computer processor – the microprocessor itself. Unlike RAM, non-volatile storage retains its data even once it has been switched off.
FMC’s product leverages a dielectric compound called hafnium oxide that is heated to produce crystallised films conducive to memory storage. As hafnium oxide is already used in certain semiconductor components, as well as in dynamic RAM units, FMC’s material should play well with existing technologies.
AI-related applications could include arithmetic and logical operations. This would potentially embed the basis for AI closer to the nexus of execution. FMC, however, stressed it was targeting a wide range of use-cases, rather than solely focusing on automation.
Ali Pourkeramati, chief executive at FMC, which is also backed by Robert Bosch Venture Capital, computer memory chip producer SK Hynix, and corporate venturing units for pharmaceutical firm Merck Group and electronics and semiconductor firm TEL, said: “If you look at [existing microprocessor memory product] E-flash, it adds eight to 18 layers or masks [the sets of materials used to make processors], which becomes detrimental to the logic process when you want to go [smaller] than 28 nanometre-sized chips.”
Meanwhile, IBM is working on “in-memory chips” that have algorithmic weights embedded into their non-volatile memory storage to accelerate calculations. Algorithmic weights are the variables learned when training deep neural networks, determining how much influence each input carries in the network’s output.
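For readers unfamiliar with the term, the snippet below shows what those weights are in code: the learned matrix of one small, purely illustrative layer. In a conventional accelerator that matrix is fetched from off-chip memory for every multiplication; in-memory designs aim to perform the multiply-accumulate where the weights are already stored.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # the "algorithmic weights" of one small layer (illustrative)
x = rng.standard_normal(8)        # input activations arriving at the layer
y = np.tanh(W @ x)                # each output is a weighted blend of the inputs
print(y)
```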
Abu Sebastian, a distinguished research staff member and manager at IBM, concluded: “Energy efficiency and latency are big concerns for the cloud. The reason why many workloads are currently running on the cloud as opposed to the edge is because the edge may not provide the compute resource, but we believe in-memory computing can make AI at the edge practical.”