Math Processor

Abacus Semiconductor has developed a math processor to accelerate the execution of mathematical functions typically used in traditional HPC as well as in Artificial Intelligence (AI, including Generative AI) and Machine Learning (ML). Unlike existing math accelerators such as GPGPUs, we support all relevant data types natively (integer and floating-point across a variety of word widths), and we have also added RISC-V cores to assist in exception handling and in translation between high-level languages and native instructions. We have also added tags to results to bring floating-point math precision closer to Prof. John Gustafson's work on posits. While AI data is usually computed in more limited word widths than in traditional HPC, the underlying math is exactly the same.

The Math Processor is a massively parallel processor that accelerates all OpenCL and OpenACC applications. It connects to the rest of the system via our UHI ports to implement our Heterogeneous Accelerated Compute paradigm, and internally all cores follow our beyond-von-Neumann and beyond-Harvard core architecture. Like all of our other products, it uses the HRAM Smart Multi-Homed Memory subsystem to share data, and it supports large-scale memory systems with configurable coherency domains.

These processors are part of our Heterogeneous Accelerated Compute initiative. In this novel system architecture for AI and HPC, all processor and accelerator cores are connected directly to each other, even across packages and from processor to accelerator and vice versa, to provide lower-latency communication across all cores, extending our Kloth Architecture to beyond-von-Neumann and beyond-Harvard CPU scalability with seamless integration. More information about this novel architecture is available at the USPTO under patent US 2025/0036589 A1. The patent also covers the integration of our Smart Multi-Homed Memory, enabling both shared memory and the Message Passing Interface (MPI).

Using existing math APIs helps replace handwritten C or assembly code, offering easier portability and drastically simplified maintenance while delivering better performance and scalability, along with higher precision and accuracy.

Our Math Processors can be used in traditional HPC applications such as FEA/FEM, n-body problems, Computational Fluid Dynamics (CFD), mechanical, thermal, and electrical simulations (including signal-integrity and power-integrity analysis), modeling, and a variety of other workloads, as well as in more modern HPC fields such as AI training and inference. Transforms, vector and matrix math, and tensor math are implemented in a specifically optimized manner.

We paid special attention to Fused Multiply-Accumulate (FMA) performance across a wide range of lengths of sequential FMAs over indexed data, which is a prerequisite for matrix and tensor operations and for transforms such as the Fourier Transform and its discrete counterpart, the Discrete Fourier Transform (DFT). While others focus on the FMA unit alone, we made sure to avoid collisions and contention when accessing the elements that must be fetched and written back.

These comprehensive subsystems include the Math Processors, firmware, software, and APIs as well as SDK plugins. With regard to APIs, we support our own native data and instruction formats as well as OpenCL and OpenACC. For applications that rely on CUDA, we support those function calls and translate them into our native data and instruction formats. The ASC-MP is also referred to as the ASC29400 family.