DCT
7:24-cv-00221
Neural AI, LLC v. NVIDIA Corporation
I. Executive Summary and Procedural Information
- Parties & Counsel:
  - Plaintiff: Neural AI, LLC (Texas)
  - Defendant: NVIDIA Corporation (Delaware)
  - Plaintiff’s Counsel: Cherry Johnson Siegmund James PLLC; King & Spalding LLP
 
- Case Identification: 7:24-cv-00221, W.D. Tex., 12/12/2024
- Venue Allegations: Plaintiff alleges venue is proper in the Western District of Texas because Defendant NVIDIA has a regular and established place of business in the district, including a large office in Austin, employs hundreds of individuals there in engineering and sales roles, and has committed acts of infringement within the district.
- Core Dispute: Plaintiff alleges that Defendant’s GPU accelerators, superchips, and associated software platforms infringe patents related to methods for efficiently coordinating computations and data transfers between a central processing unit (CPU) and a graphics processing unit (GPU).
- Technical Context: The technology concerns GPU-accelerated computing, a foundational element of modern artificial intelligence, machine learning, and high-performance computing that leverages the parallel processing power of GPUs to handle complex computational tasks.
- Key Procedural History: The complaint alleges that Defendant NVIDIA has been aware of the patented technology since at least 2007 through direct discussions with the inventors. It further alleges that NVIDIA cited the application for the lead patent during prosecution of its own patents in 2010 and engaged in investment or acquisition discussions with the inventors' company, Neurala, Inc., in 2016 and 2017, during which the patents were discussed. The '438 and '461 patents are reissues of the '867 patent.
Case Timeline
| Date | Event | 
|---|---|
| 2006-09-25 | Priority Date for ’867, ’438, and ’461 Patents | 
| 2007-01-01 | Approximate date of first discussions between inventors and NVIDIA CTO | 
| 2010-06-28 | NVIDIA files patent application that cites the '867 Patent's application | 
| 2014-02-11 | ’867 Patent Issues | 
| 2016-09-06 | Approximate date of investment/acquisition discussions between inventors and NVIDIA | 
| 2017-06-26 | Approximate date NVIDIA received materials identifying the '867 patent family | 
| 2019-09-25 | NVIDIA features inventors’ company in an “Inception Spotlight” article | 
| 2021-02-16 | ’438 Patent (Reissue) Issues | 
| 2023-03-14 | ’461 Patent (Reissue) Issues | 
| 2024-12-12 | Complaint Filed | 
II. Technology and Patent(s)-in-Suit Analysis
U.S. Patent No. 8,648,867 - "Graphic Processor Based Accelerator System and Method" (issued Feb. 11, 2014)
The Invention Explained
- Problem Addressed: Conventional computing architectures created computational bottlenecks and overhead when exchanging intermediate data between a CPU and a GPU during complex simulations (Compl. ¶3). This process could also lead to "race conditions," where different programmatic threads attempt to access or modify the same shared data simultaneously, corrupting results ('867 Patent, col. 5:60-6:31).
- The Patented Solution: The patent discloses a system comprising a CPU, a GPU-based accelerator, and a specialized "accelerator controller" ('867 Patent, col. 4:38-41). This controller manages the flow of data, allowing intermediate results from a "step" of a simulation to be transferred from the GPU to the CPU for review or correction, and then transferred back to the GPU as input for the next "step," all within a single computational cycle (Compl. ¶36). This controller-driven data exchange is designed to free the CPU from managing the GPU's primitive operations ('867 Patent, Abstract).
- Technical Importance: This approach enables dynamic, on-the-fly error correction and modification of large-scale simulations, which is critical for modern AI models where restarting a computation from scratch is infeasible (Compl. ¶38).
Key Claims at a Glance
- The complaint asserts independent claim 16 (Compl. ¶63).
- Claim 16 requires a method for performing a numerical simulation, comprising the steps of:
  - Receiving, by an accelerator, first input data from a CPU.
  - Transferring, by an accelerator controller, the first input data into a first partition of an accelerator memory, referenced by a first pointer.
  - Performing, by at least one GPU, a calculation on the input data to generate first output data.
  - Storing, by the accelerator controller, the first output data into a second partition of the accelerator memory, referenced by a second pointer.
  - Swapping the first pointer with the second pointer, such that the first output data becomes an input for a second computational cycle (see the illustrative sketch following this list).
 
- The complaint does not explicitly reserve the right to assert dependent claims for this patent.
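For technical orientation, the sequence recited in claim 16 resembles the double-buffered ("ping-pong") idiom common in general GPU programming: copy input to one device buffer, compute into a second buffer, then exchange the two device pointers so that one cycle's output feeds the next cycle. The sketch below is a minimal, hypothetical CUDA illustration of that generic idiom; the kernel, sizes, and names (step_kernel, N, STEPS) are assumptions for illustration and are not drawn from the patent's embodiments, the complaint, or NVIDIA's code.

```cuda
// Minimal, hypothetical sketch of a double-buffered ("ping-pong") CUDA loop.
// Not NVIDIA's implementation; names and the kernel body are illustrative only.
#include <cuda_runtime.h>

__global__ void step_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.5f * in[i] + 1.0f;    // placeholder per-element update
}

int main() {
    const int N = 1 << 20;                      // illustrative problem size
    const int STEPS = 4;                        // illustrative number of cycles
    float* h_data = new float[N]();             // "first input data" on the CPU

    float *d_in, *d_out;                        // two partitions of device memory
    cudaMalloc(&d_in,  N * sizeof(float));      // first partition, referenced by d_in
    cudaMalloc(&d_out, N * sizeof(float));      // second partition, referenced by d_out

    // host-to-device transfer of the first input data
    cudaMemcpy(d_in, h_data, N * sizeof(float), cudaMemcpyHostToDevice);

    for (int step = 0; step < STEPS; ++step) {
        // calculation for this computational cycle
        step_kernel<<<(N + 255) / 256, 256>>>(d_in, d_out, N);
        cudaDeviceSynchronize();

        // swap the pointers so this cycle's output becomes the next cycle's input
        float* tmp = d_in;
        d_in = d_out;
        d_out = tmp;
    }

    cudaMemcpy(h_data, d_in, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    delete[] h_data;
    return 0;
}
```

Whether any comparable pointer exchange in the accused products is performed by an "accelerator controller," as opposed to application-level code, is the kind of question addressed in Sections IV and V below.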
U.S. Patent No. RE48,438 - "Graphic Processor Based Accelerator System and Method" (issued Feb. 16, 2021)
The Invention Explained
- Problem Addressed: As a reissue of the ’867 Patent, the ’438 Patent addresses the same fundamental problem of inefficient CPU-GPU interplay.
- The Patented Solution: The ’438 patent is specifically directed to applying the disclosed hardware and firmware configurations to the task of processing the layers of an artificial neural network (ANN) (Compl. ¶39). It claims a method where a controller initializes "textures and shaders" in GPU memory, and the GPU performs computations based on these to generate outputs representing neurons in an ANN layer, enabling the system to process subsequent inputs in real-time while a computation is already underway (’438 Patent, col. 17:15-45).
- Technical Importance: This method provides a framework for using GPU hardware, originally designed for graphics, to efficiently execute the layered, sequential computations inherent to artificial neural networks (Compl. ¶¶39, 97).
Key Claims at a Glance
- The complaint asserts independent claim 21 (Compl. ¶96).
- Claim 21 requires a method of performing a sequence of computations for an ANN, comprising:
  - Receiving, at a CPU, first input data from an external system in real time.
  - Initializing, by a controller, "textures and shaders" in a memory operably coupled to a GPU.
  - Transferring the first input data to the GPU-coupled memory.
  - Performing, by the GPU, a first computation on the input data based on the textures and shaders to generate first output data, representing a first neuron's output in a first ANN layer.
  - Storing the first input and output data in the GPU-coupled memory.
  - Transferring second input data into the GPU-coupled memory after the first computation has started but before a second computation begins (see the illustrative overlap sketch following this list).
 
- The complaint does not explicitly reserve the right to assert dependent claims for this patent.
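For technical orientation, the final timing limitation resembles the copy/compute overlap that CUDA streams make possible: a second input can be copied to device memory on one stream while a kernel launched on another stream is still running. The sketch below is a minimal, hypothetical illustration of that generic pattern, assuming pinned host buffers and illustrative names (layer_kernel, h_in1, h_in2); it is not offered as NVIDIA's implementation or as the patent's embodiment.

```cuda
// Minimal, hypothetical sketch of overlapping a host-to-device copy of "second
// input data" with an in-flight GPU computation, using two CUDA streams.
// Not NVIDIA's implementation; names and the kernel body are illustrative only.
#include <cuda_runtime.h>

__global__ void layer_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tanhf(in[i]);           // placeholder "neuron" activation
}

int main() {
    const int N = 1 << 20;                      // illustrative problem size
    float *h_in1, *h_in2, *d_in1, *d_in2, *d_out1;
    cudaMallocHost(&h_in1, N * sizeof(float));  // pinned host buffers so async
    cudaMallocHost(&h_in2, N * sizeof(float));  // copies can overlap compute
    cudaMalloc(&d_in1,  N * sizeof(float));
    cudaMalloc(&d_in2,  N * sizeof(float));
    cudaMalloc(&d_out1, N * sizeof(float));

    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);

    // transfer the first input and start the first computation on one stream
    cudaMemcpyAsync(d_in1, h_in1, N * sizeof(float), cudaMemcpyHostToDevice, compute);
    layer_kernel<<<(N + 255) / 256, 256, 0, compute>>>(d_in1, d_out1, N);

    // transfer the second input on a separate stream while the first computation
    // may still be running, and before any second computation is launched
    cudaMemcpyAsync(d_in2, h_in2, N * sizeof(float), cudaMemcpyHostToDevice, copy);

    cudaStreamSynchronize(compute);
    cudaStreamSynchronize(copy);

    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaFreeHost(h_in1);
    cudaFreeHost(h_in2);
    cudaFree(d_in1);
    cudaFree(d_in2);
    cudaFree(d_out1);
    return 0;
}
```

Whether the accused Grace Hopper platform actually performs such a transfer within the claimed window, rather than merely being capable of it, is flagged as a technical question in Section IV below.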
U.S. Patent No. RE49,461 - "Graphic Processor Based Accelerator System and Method" (issued Mar. 14, 2023)
Technology Synopsis
- The ’461 Patent, also a reissue of the ’867 Patent, claims a method for managing the interplay between a CPU and GPU by separating their operations into two distinct "streams" (Compl. ¶39). A "user interaction stream" runs on the CPU to handle inputs and interruptions, while a "computational stream" runs on the GPU to execute the ANN layers. The invention centers on the system's ability to shift control of the data exchange between these two streams, allowing user commands to be queued and executed at appropriate times without corrupting the ongoing computation ('461 Patent, col. 15:21-16:67).
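For technical orientation, the two-stream arrangement described above can be pictured as a host-side command queue serviced between GPU launches. The sketch below is a minimal, hypothetical CUDA illustration under assumed names (ann_layer, UserCmd, userQueue); it is not drawn from the '461 patent's claims or from any accused NVIDIA product.

```cuda
// Minimal, hypothetical sketch: a CPU-side "user interaction" loop queues
// commands while a GPU-side "computational stream" runs the heavy work.
// Not the patent's embodiment or NVIDIA's implementation; names are illustrative.
#include <cuda_runtime.h>
#include <queue>

enum class UserCmd { PAUSE, ADJUST_PARAM };

__global__ void ann_layer(float* data, int n, float gain) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = tanhf(gain * data[i]); // placeholder layer computation
}

int main() {
    const int N = 1 << 20;                      // illustrative problem size
    float gain = 1.0f;
    float* d_data;
    cudaMalloc(&d_data, N * sizeof(float));
    cudaMemset(d_data, 0, N * sizeof(float));

    cudaStream_t computational;                 // GPU-side computational stream
    cudaStreamCreate(&computational);
    std::queue<UserCmd> userQueue;              // CPU-side queue of user commands

    for (int layer = 0; layer < 8; ++layer) {
        ann_layer<<<(N + 255) / 256, 256, 0, computational>>>(d_data, N, gain);
        // while the GPU works, the CPU-side loop could accept and queue commands,
        // e.g., userQueue.push(UserCmd::ADJUST_PARAM);
        cudaStreamSynchronize(computational);   // control returns to the CPU
        while (!userQueue.empty()) {            // apply queued commands between
            if (userQueue.front() == UserCmd::ADJUST_PARAM) gain *= 0.9f;
            userQueue.pop();                    // layers, without disturbing the
        }                                       // computation already completed
    }

    cudaStreamDestroy(computational);
    cudaFree(d_data);
    return 0;
}
```

The complaint maps this general division of labor between CPU and GPU workloads onto the Grace Hopper Superchip and the CUDA platform, as summarized under "Accused Features" below.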
Asserted Claims
- The complaint asserts independent claim 21 (Compl. ¶140).
Accused Features
- The complaint alleges that NVIDIA's Grace Hopper Superchip (GH200) and its associated software, including the CUDA platform, infringe by implementing a heterogeneous programming model that separates CPU and GPU workloads and manages their interaction (Compl. ¶¶141-143).
III. The Accused Instrumentality
Product Identification
- The complaint names a wide range of NVIDIA products, including its GPU accelerators (Hopper, Ada Lovelace, Ampere, Turing, Volta, Pascal, and Maxwell architectures), superchips, servers, and software platforms (Compl. ¶¶42-60). The infringement allegations focus primarily on the Grace Hopper Superchip (GH200) as a representative example, used in conjunction with NVIDIA's CUDA platform and cuDNN library (Compl. ¶¶62, 95, 139).
Functionality and Market Context
- The Grace Hopper Superchip is described as a "heterogeneous accelerated platform" that integrates an NVIDIA Hopper GPU with an NVIDIA Grace CPU on a single chip (Compl. ¶¶64-65). This architecture is designed for high-performance computing (HPC) and AI workloads (Compl. ¶65). The complaint alleges that NVIDIA's proprietary CUDA platform is a "parallel computing platform and programming model" that enables the use of these GPUs for general-purpose computing tasks, including deep neural networks via the cuDNN library (Compl. ¶66). The complaint includes a photograph of the accused Grace Hopper Superchip, highlighting its integrated CPU and GPU components (Compl. p. 22).
IV. Analysis of Infringement Allegations
’867 Patent Infringement Allegations
| Claim Element (from Independent Claim 16) | Alleged Infringing Functionality | Complaint Citation | Patent Citation | 
|---|---|---|---|
| receiving, by an accelerator, first input data from the central processing unit; | The CUDA programming model's first step is to copy input data from host (CPU) memory to device (GPU) memory, referred to as a "host-to-device transfer." | ¶68 | col. 7:38-41 | 
| transferring, by an accelerator controller, the first input data into a first partition, referenced by first pointer, of an accelerator memory...; | The Hopper GPU architecture's HBM3 memory controllers manage the transfer of data into GPU memory. The CUDA platform uses pointers to reference memory addresses for data transfer functions. | ¶¶69-70 | col. 4:60-63 | 
| performing, by at least one graphics processing unit during the first computational cycle, at least one calculation on the first portion of the input data as to generate first output data; | CUDA uses "streams" to execute a sequence of commands, including copying input data to the GPU, processing it via a kernel function (e.g., "MyKernel()"), and copying the result. | ¶71 | col. 12:8-15 | 
| storing, by the accelerator controller, the first output data into a second partition, referenced by a second pointer, of the accelerator memory; | CUDA code allocates memory on the device (GPU) and writes the output of a matrix multiplication ("Csub") to that device memory. The cuDNN library uses a "reserve-space buffer" for transferring intermediate results. | ¶¶72-73 | col. 13:34-39 | 
| and swapping the first pointer with the second pointer at the end of the first computational cycle, such that the first output data becomes an input for a second computational cycle... | The cuDNN library implements operations that take tensors as input and produce tensors as output. NVIDIA documentation confirms that CUDA implements pointer swapping for device (GPU) pointers. | ¶¶74-75 | col. 6:15-19 | 
- Identified Points of Contention:
  - Scope Questions: A central question may be whether the claimed "accelerator controller" reads on the combination of hardware (HBM3 memory controllers) and software (CUDA platform) that NVIDIA allegedly uses to manage data. The defense may argue the patent requires a single, specialized hardware controller distinct from a standard memory controller, whereas the complaint's theory appears to distribute the claimed functions across hardware and software layers.
  - Technical Questions: What evidence demonstrates that the accused system's memory controllers perform the specific claimed step of "swapping the first pointer with the second pointer"? The complaint cites a developer forum post stating "A DEVICE pointer in cuda is just a C pointer" and that swapping works, which may be presented as evidence of capability but raises the question of whether the accused products perform this specific step as part of an infringing method (Compl. p. 32).
 
’438 Patent Infringement Allegations
| Claim Element (from Independent Claim 21) | Alleged Infringing Functionality | Complaint Citation | Patent Citation | 
|---|---|---|---|
| receiving, at a central processing unit (CPU), first input data acquired from an external system in real time; | The Grace Hopper Superchip's Grace CPU has up to 72 cores and receives input data via a high-speed I/O interface. | ¶101 | col. 17:18-20 | 
| initializing, by a controller operably coupled to a graphics processing unit (GPU), textures and shaders in a memory operably coupled to the GPU; | The Hopper GPU architecture includes texture processing clusters and HBM3 memory controllers. The complaint alleges the CUDA NPP library, used for image processing, passes image data using pointers analogous to textures. | ¶¶105-107 | col. 17:21-24 | 
| transferring the first input data received by the CPU to the memory operably coupled to the GPU; | The CUDA programming model implements a "host-to-device transfer" to copy input data from the CPU's host memory to the GPU's device memory. | ¶110 | col. 17:25-27 | 
| performing, by the graphics processing unit (GPU), a first computation...representing an output of a first neuron in a first layer in the artificial neural network; | The CUDA platform loads and executes a GPU program, caching data on-chip. The cuDNN library provides primitives for deep neural networks, where operations take tensors as input and produce tensors as output, representing ANN computations. | ¶¶111-113 | col. 17:28-37 | 
| storing, in the memory operably coupled to the GPU, the first input data and the first output data; | The Grace Hopper Superchip includes up to 144GB of high-bandwidth GPU memory (HBM3e). The CUDA programming model stores the results of GPU computations in this "device memory." | ¶¶114-115 | col. 17:38-40 | 
| and transferring second input data acquired from the external system in real time into the memory operably coupled to the GPU after the GPU starts the first computation and before the GPU starts a second computation... | The Grace Hopper Superchip architecture depicts the CPU receiving I/O data via PCIe-5 while being in communication with the GPU via NVLink-C2C, allegedly allowing for real-time data transfer during computation. | ¶116 | col. 17:41-49 | 
- Identified Points of Contention:
  - Scope Questions: The dispute may center on whether the terms "textures and shaders," which are rooted in graphics processing, can be construed to cover the tensor-based operations and computational primitives (e.g., in cuDNN) used for modern ANN calculations in the accused products. The complaint points to an architectural diagram of the Grace Hopper Superchip to illustrate its features (Compl. p. 46).
  - Technical Questions: Does the accused system transfer "second input data... after the GPU starts the first computation and before the GPU starts a second computation" as the claim requires? This timing-specific limitation will require evidence of the precise operational sequence within the accused heterogeneous computing platform.
 
V. Key Claim Terms for Construction
For the ’867 Patent
- The Term: "accelerator controller"
- Context and Importance: This term is the linchpin of claim 16, as it is recited as performing the key data management steps of "transferring," "storing," and "swapping." Its construction will determine whether the accused combination of NVIDIA's hardware memory controllers and CUDA software meets the requirements of a single claimed element.
- Intrinsic Evidence for Interpretation:
  - Evidence for a Broader Interpretation: The specification states the controller "handles most of the primitive operations needed to set up and control GPU computation," which could be argued to encompass functions performed by a combination of hardware and driver-level software ('867 Patent, Abstract).
  - Evidence for a Narrower Interpretation: The patent describes the controller as a "specialized controller" implemented on an expansion card, potentially as an FPGA or ASIC, that "frees the CPU" from control tasks, which may suggest a more discrete hardware component than what is alleged to infringe ('867 Patent, Abstract; col. 4:60-65).
 
For the ’438 Patent
- The Term: "textures and shaders"
- Context and Importance: This term is central to claim 21's description of how the GPU computation is set up and performed. The case may turn on whether these graphics-specific terms, originating from a 2006 priority date, can describe the AI-centric computational methods used in NVIDIA's modern architectures.
- Intrinsic Evidence for Interpretation:
  - Evidence for a Broader Interpretation: The specification of the parent '867 patent explicitly defines these terms broadly for the purpose of the invention: "'texture' in this document refers to a data array unless specified otherwise" and "'shader' in this document refers to a GPU program unless specified otherwise" ('867 Patent, col. 5:1-9). This language may support reading the claims on general data arrays and GPU programs used in AI.
  - Evidence for a Narrower Interpretation: The detailed description frequently discusses textures in the context of pixels with color components and packing data into them, which is characteristic of graphics processing ('867 Patent, col. 6:33-51). This could support an argument that the terms should be limited to their traditional graphics-pipeline meaning, which may not apply to modern tensor core operations.
 
VI. Other Allegations
- Indirect Infringement: The complaint alleges that NVIDIA induces infringement by providing the CUDA platform, developer toolkits, technical documentation, and user support, which allegedly instruct and encourage customers to use the accused hardware in a manner that performs the claimed methods (Compl. ¶¶83-86, 127-130).
- Willful Infringement: The complaint makes detailed allegations of willful infringement based on NVIDIA's alleged pre-suit knowledge. It asserts NVIDIA knew of the technology from direct discussions with the inventors starting in 2007, from citing the '867 patent application in its own patent prosecution in 2010, and from subsequent investment and partnership discussions where the patent family was explicitly identified (Compl. ¶¶92-93, 136-137, 176-177).
VII. Analyst’s Conclusion: Key Questions for the Case
- Technology Evolution and Claim Scope: A central issue will be whether claim terms rooted in the GPGPU architecture of the mid-2000s (e.g., "accelerator controller," "textures and shaders") can be construed to cover the more specialized hardware and software of modern AI accelerators. The dispute will likely focus on whether the patent's explicit re-definition of these terms is broad enough to encompass technologies that have evolved beyond their original graphics-centric context.
- Locus of Functionality: The case may turn on whether the functions of the claimed "accelerator controller"—transferring, storing, and pointer swapping—are performed by a single cognizable component in the accused system as required by the claims. A key evidentiary question will be whether NVIDIA's hardware (e.g., memory controllers) performs these specific recited functions, or if the plaintiff’s infringement theory improperly combines discrete hardware operations with functions performed by higher-level software like the CUDA platform.
- Willfulness and Pre-Suit Knowledge: Given the extensive and specific allegations of NVIDIA's long-standing awareness of the inventors and their patents, a significant focus of the litigation, should infringement be found, will be on the question of willfulness. The evidence of prior meetings, patent citations, and partnership discussions will be central to determining whether any infringement was "knowing and willful."