Tachyum Cuts Cost of Large Language Models Up to 100x, Bringing Them to Mainstream
November 14, 2023 - 9:55AM
Business Wire
Tachyum® today announced the release of a white paper detailing
how to use the 4-bit Tachyum AI (TAI) and effective 2-bit-per-weight
(TAI2) formats for Large Language Model (LLM) quantization without
accuracy degradation. Tachyum hardware also enables workable LLMs
at 1 bit per weight, albeit with higher degradation than TAI2, and its
AI scientists continue to improve performance and reduce that
degradation as Tachyum works to bring LLMs into the mainstream.
Tachyum addresses massive LLMs, whose sizes have grown more than a
thousandfold over the past few years. Examples include the GPT-3.5
LLM behind ChatGPT with 175 billion parameters, the PaLM LLM with
530 billion dense parameters, and the Switch Transformer with 1.6
trillion sparse parameters.
For example, a 1.6-trillion-parameter Switch Transformer would
require 52 NVIDIA H100 80GB GPUs at $41,789 each plus seven
Supermicro GPU servers at $25,000 each, for a total of $2,348,028.
In contrast, a $23,000 single-socket Prodigy system with 2TB of DDR5
DRAM could fit and run such big models, bringing them into the
mainstream for generative AI applications.
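The arithmetic behind this comparison can be checked with a short script using only the figures quoted in the release:

```python
# Cost and capacity check for the GPU-cluster vs. Prodigy comparison above.
# All prices and sizes are taken directly from the press release.

H100_PRICE = 41_789            # per NVIDIA H100 80GB GPU, as quoted
SERVER_PRICE = 25_000          # per Supermicro GPU server, as quoted
PRODIGY_SYSTEM_PRICE = 23_000  # single-socket Prodigy with 2TB DDR5, as quoted

gpu_cluster_cost = 52 * H100_PRICE + 7 * SERVER_PRICE
print(gpu_cluster_cost)                          # 2348028, matching the release
print(gpu_cluster_cost / PRODIGY_SYSTEM_PRICE)   # roughly 102x, the "up to 100x" claim

# A 1.6-trillion-parameter model at 2 bits per weight (TAI2) needs
# 1.6e12 * 2 / 8 bytes = 0.4 TB of weights, which fits in 2TB of DRAM.
weight_bytes = 1.6e12 * 2 / 8
print(weight_bytes / 1e12)                       # 0.4 (TB)
```

The ratio is where the headline's "up to 100x" figure comes from; the 0.4 TB weight footprint leaves room in the 2TB system for activations and other state.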
AI systems built on Prodigy universal chips with 256PB of DDR5 DRAM
(Dynamic Random Access Memory), using the FP8 (8-bit floating point)
and 4-bit Tachyum AI (TAI) data formats, can fit models of up to 100
quadrillion parameters. Such systems can serve the equivalent of
more than 150,000 ChatGPT models or 610,000 PaLM 2 models, opening
huge possibilities for LLMs as a mainstream technology across
industries from retail and e-commerce, marketing, finance,
cybersecurity, and the military to healthcare, including faster drug
development and the practical implementation of personalized
medicine in hospitals.
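A rough capacity check of these figures, assuming weights dominate memory use (the 100-quadrillion-parameter figure presumably leaves headroom for activations and other runtime state):

```python
# Back-of-the-envelope model-capacity check for 256 PB of DRAM.
DRAM_BYTES = 256e15  # 256 PB, as quoted in the release

def max_params(bits_per_weight, dram_bytes=DRAM_BYTES):
    """Largest model (in parameters) whose weights alone fit in DRAM."""
    return dram_bytes * 8 / bits_per_weight

print(max_params(8) / 1e15)   # FP8 weights: 256 quadrillion parameters
print(max_params(4) / 1e15)   # 4-bit TAI weights: 512 quadrillion

# A 100-quadrillion-parameter model at 4 bits per weight needs 50 PB,
# comfortably under the 256 PB quoted:
print(100e15 * 4 / 8 / 1e15)  # 50.0 (PB)
```

The raw weight-storage ceiling is higher than 100 quadrillion parameters, which is consistent with the release's "up to" phrasing once working memory is accounted for.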
Effective deployment of LLMs requires low-bit quantization to
minimize model size and inference cost. Low-bit integer formats
such as INT8 and INT4 have been the conventional choice; however,
emerging low-bit exponential formats offer a compelling alternative.
At reasonable cost, LLMs could be deployed by enterprises of all
sizes across a variety of industries. LLMs could become an integral
part of an organization’s web presence, providing an interactive
experience such as letting visitors ask questions naturally rather
than entering search terms.
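To illustrate the conventional integer approach the release contrasts against, here is a minimal sketch of symmetric INT4 quantization, a generic textbook technique (the TAI formats themselves are Tachyum-specific and not shown here):

```python
import numpy as np

# Illustrative symmetric per-tensor INT4 quantization of a weight matrix.
# This is a generic technique, not Tachyum's TAI format.

def quantize_int4(w):
    """Map float weights to integers in [-7, 7] with a per-tensor scale."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in weights
q, scale = quantize_int4(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean abs error: {err:.4f}")
# Storage: 4 bits/weight vs. 32 bits for FP32 -- an 8x size reduction.
```

Exponential formats differ in that they spend their few bits on an exponent rather than a linear integer grid, which is what makes the multiplication-free arithmetic described below the quote possible.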
“By combining TAI 4-bit and effective 2-bit weights with FP8 per
activation, we are able to quantize LLMs without much accuracy
degradation,” said Dr. Radoslav Danilak, founder and CEO of
Tachyum. “Our techniques avoid expensive multiplication while
simultaneously reducing the size of the model by 4x to 8x, enabling
generative AI models that can be applied in use cases ranging from
complex language modeling tasks, text generation, drug and chip
design, and few-shot learning and reasoning to protein sequence
modeling. Whole new avenues of computation can be opened with
Tachyum AI.”
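The quote mentions avoiding expensive multiplication. One way exponential (power-of-two) weight formats achieve this in general: if every weight is ±2^e, each multiply in a dot product becomes a bit shift plus a sign flip. The sketch below is a generic illustration under that assumption, not Tachyum's actual TAI implementation, whose details are not public:

```python
# Generic sketch: dot product with power-of-two weights, no multiplies.
# Each weight is stored as (sign, exponent), representing sign * 2**exponent.
# Activations are assumed to be small integers (e.g., fixed-point values).

def dot_power_of_two(activations, weights):
    """Accumulate a dot product using shifts and adds instead of multiplies."""
    acc = 0
    for a, (sign, exp) in zip(activations, weights):
        term = a << exp if exp >= 0 else a >> -exp  # shift replaces multiply
        acc += term if sign > 0 else -term
    return acc

acts = [3, -2, 5, 1]
wts = [(+1, 2), (-1, 0), (+1, 1), (-1, 3)]  # weights: 4, -1, 2, -8
print(dot_power_of_two(acts, wts))          # 12 + 2 + 10 - 8 = 16
```

Because shifters are far cheaper in silicon than multipliers, this style of arithmetic is one plausible route to the cost and power savings the release describes.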
As a Universal Processor offering industry-leading performance
across all workloads, Prodigy-powered data center servers can
seamlessly and dynamically switch between computational domains
(such as AI/ML, HPC, and cloud) within a single homogeneous
architecture. By eliminating the need for expensive dedicated AI
hardware and dramatically increasing server utilization, Prodigy
significantly reduces CAPEX and OPEX while delivering unprecedented
data center performance, power, and economics. Prodigy integrates
192 high-performance custom-designed 64-bit compute cores to
deliver up to 4.5x the performance of the highest-performing x86
processors for cloud workloads, up to 3x that of the
highest-performing GPU for HPC, and 6x for AI applications.
Those interested in reading “Mainstreaming Large Language Models
With 2-bit TAI Weights” can download a copy at
https://www.tachyum.com/resources/whitepapers/2023/11/14/mainstreaming-large-language-models-with-2-bit-tai-weights/.
Follow Tachyum
https://twitter.com/tachyum
https://www.linkedin.com/company/tachyum
https://www.facebook.com/Tachyum/
About Tachyum
Tachyum is transforming the economics of AI, HPC, public and
private cloud workloads with Prodigy, the world’s first Universal
Processor. Prodigy unifies the functionality of a CPU, a GPU, and a
TPU in a single processor to deliver industry-leading performance,
cost and power efficiency for both specialty and general-purpose
computing. As global data center emissions continue to contribute
to a changing climate, with projections of their consuming 10
percent of the world’s electricity by 2030, the ultra-low power
Prodigy is positioned to help balance the world’s appetite for
computing at a lower environmental cost. Tachyum recently received
a major purchase order from a US company to build a large-scale
system that can deliver more than 50 exaflops of performance, far
exceeding the computational capabilities of the fastest inference
or generative AI supercomputers available anywhere in the world
today. When complete in 2025, the Prodigy-powered system will
deliver a 25x multiplier vs. the world’s fastest conventional
supercomputer, built just this year, and will achieve AI
capabilities 25,000x greater than models for ChatGPT-4. Tachyum has
offices in the United States and Slovakia.
For more information, visit https://www.tachyum.com/.
View source
version on businesswire.com: https://www.businesswire.com/news/home/20231114190013/en/
Mark Smith
JPR Communications
818-398-1424
marks@jprcom.com