DCT

1:25-cv-00586

Onesta IP LLC v. NVIDIA Corp

I. Executive Summary and Procedural Information

  • Parties & Counsel: Onesta IP LLC (Plaintiff) v. NVIDIA Corporation (Defendant)
  • Case Identification: 1:25-cv-00586, W.D. Tex., 04/17/2025
  • Venue Allegations: Plaintiff alleges venue is proper because NVIDIA has regular and established places of business within the Western District of Texas, including an office in Austin, and has committed acts of infringement in the district.
  • Core Dispute: Plaintiff alleges that Defendant’s graphics processing units (GPUs) and "Superchip" products infringe five patents related to asynchronous task dispatch, priority-based command execution, unified memory management, and GPU chiplet architecture.
  • Technical Context: The technologies at issue relate to the architecture of high-performance GPUs, which are central to the markets for artificial intelligence, data center computing, gaming, and professional visualization.
  • Key Procedural History: The complaint does not allege any prior litigation, Inter Partes Review (IPR) proceedings, or licensing history concerning the Asserted Patents.

Case Timeline

Date Event
2009-09-03 Priority Date for U.S. Patent No. 8,854,381
2010-12-07 Priority Date for U.S. Patent No. 9,519,943
2012-03-29 Priority Date for U.S. Patent Nos. 11,741,019 and 9,116,809
2014-10-07 U.S. Patent No. 8,854,381 Issues
2015-08-25 U.S. Patent No. 9,116,809 Issues
2016-12-13 U.S. Patent No. 9,519,943 Issues
2019-06-28 Priority Date for U.S. Patent No. 11,841,803
2023-08-29 U.S. Patent No. 11,741,019 Issues
2023-12-12 U.S. Patent No. 11,841,803 Issues
2025-04-17 Complaint Filed

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 8,854,381 - "Processing Unit That Enables Asynchronous Task Dispatch"

  • Issued: October 7, 2014

The Invention Explained

  • Problem Addressed: The patent’s background section describes the inefficiency of conventional GPUs that must perform a full, time-consuming "context switch" to interrupt a running task (e.g., a standard graphics task) to execute a more urgent, low-latency task. This process requires substantial time and additional on-chip memory, limiting overall performance (’381 Patent, col. 2:13-58).
  • The Patented Solution: The invention proposes a processing unit with multiple "virtual engines" that receive tasks in parallel and load their associated state data. A single, shared "shader core" can then execute tasks from different engines concurrently, based on their respective state data, without requiring a traditional context switch (’381 Patent, Abstract; col. 3:1-12). This architecture is depicted in the patent’s Figure 4, which shows multiple command buffers feeding virtual engines that are managed by a scheduling module to access a shared shader core (’381 Patent, Fig. 4).
  • Technical Importance: This approach was designed to allow high-priority compute tasks to execute alongside standard graphics workloads on a GPU with minimal performance degradation, improving the GPU's suitability for mixed-workload environments (’381 Patent, col. 2:59-63).
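
The claimed arrangement can be illustrated with a minimal Python sketch (a conceptual model only, not the patent's or NVIDIA's actual implementation; all class, task, and state names are hypothetical): engines load per-task state, and a shared core executes a graphics task and a general-computing task concurrently by consulting each task's own state rather than a single global context.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    kind: str    # "graphics" or "general-compute"
    state: dict  # state data loaded by the task's engine

class SharedShaderCore:
    """Conceptual model: one core runs tasks from multiple engines
    concurrently, each against its own state, with no context switch."""

    def execute(self, task: Task) -> str:
        # Execution reads the task's own state rather than a single
        # global context, so tasks of different kinds can overlap.
        return f"{task.name}:{task.kind}"

    def run_concurrently(self, tasks: list[Task]) -> list[str]:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.execute, tasks))

core = SharedShaderCore()
results = core.run_concurrently([
    Task("draw_frame", "graphics", {"pipeline": "raster"}),
    Task("nbody_step", "general-compute", {"kernel": "nbody"}),
])
# results == ["draw_frame:graphics", "nbody_step:general-compute"]
```

Because each task carries its own state, neither task must be suspended and its context saved before the other can run, which is the inefficiency the background section describes.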

Key Claims at a Glance

  • The complaint asserts independent claim 5 (Compl. ¶34).
  • Essential elements of claim 5 include:
    • An apparatus comprising: a plurality of engines associated with a first processing unit and configured to receive a plurality of tasks from a scheduling module and load state data for each task.
    • A shader core associated with the first processing unit, configured to receive the tasks from the engines.
    • The shader core is further configured to execute a first task from the plurality while executing a second task from the plurality, based on their respective state data.
    • The first task is a graphics-processing task, and the second task is a general-computing task.
  • The complaint does not explicitly reserve the right to assert dependent claims for this patent.

U.S. Patent No. 9,519,943 - "Priority-Based Command Execution"

  • Issued: December 13, 2016

The Invention Explained

  • Problem Addressed: The patent background explains that conventional processing systems often cannot execute different types of commands in a desired order, causing high-priority computational commands to wait behind lower-priority rendering commands, resulting in unacceptable latency for time-sensitive operations (’943 Patent, col. 1:21-44).
  • The Patented Solution: The invention discloses a processing device with a set of command queues, including at least one designated "high priority queue." A command processor is specifically configured to retrieve and execute commands from the high priority queue before retrieving commands from any of the other, lower-priority queues (’943 Patent, Abstract; col. 1:59-68). This priority-based retrieval is illustrated in Figure 3, which shows multiple ring buffers feeding a command processor controlled by a run list controller (’943 Patent, Fig. 3).
  • Technical Importance: This architecture provides a hardware-level mechanism for enforcing quality of service (QoS) and ensuring that urgent computational tasks are not delayed by less time-sensitive graphics workloads (’943 Patent, col. 1:39-44).
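
The claimed retrieval order can be sketched in a few lines of Python (an illustrative model only; queue and command names are hypothetical, and the accused hardware's actual mechanism is in dispute): the command processor unconditionally drains the high-priority queue before touching any other queue.

```python
from collections import deque

class CommandProcessor:
    """Toy model of strict-priority retrieval: the high priority queue
    is always serviced before any lower-priority queue."""

    def __init__(self):
        self.high = deque()  # the claimed "high priority queue"
        self.low = deque()   # all other queues, collapsed to one here

    def submit(self, cmd: str, high_priority: bool = False) -> None:
        (self.high if high_priority else self.low).append(cmd)

    def retrieve(self):
        # High-priority commands are retrieved first, unconditionally.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None

cp = CommandProcessor()
cp.submit("render_frame")                       # arrives first, low priority
cp.submit("urgent_compute", high_priority=True)  # arrives second
order = [cp.retrieve(), cp.retrieve()]
# order == ["urgent_compute", "render_frame"]
```

The point of the sketch is the inversion of arrival order: even though the rendering command was submitted first, the high-priority command is retrieved first.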

Key Claims at a Glance

  • The complaint asserts independent claim 11 (Compl. ¶49).
  • Essential elements of claim 11 include:
    • A processing device comprising: a set of queues configured to hold commands received from a CPU.
    • A command processor configured to retrieve commands from the queues.
    • The set of queues includes a high priority queue that holds high priority commands.
    • The command processor is configured to retrieve a high priority command from the high priority queue before retrieving commands held in other queues.
    • A processing core configured to execute the received command.
  • The complaint does not explicitly reserve the right to assert dependent claims for this patent.

U.S. Patent No. 11,741,019 - "Memory Pools in a Memory Model for a Unified Computing System"

  • Issued: August 29, 2023
  • Technology Synopsis: The patent describes a memory architecture for a unified computing system containing multiple processors (e.g., a CPU and a GPU). The system uses a "mapper" to receive a memory operation, map it to one of several "virtual memory pools," and provide the resulting mapping to the processor, thereby abstracting the physical location of different memory resources like system RAM and GPU VRAM (Compl. ¶64; ’019 Patent, Abstract).
  • Asserted Claims: Independent claim 11 (Compl. ¶63).
  • Accused Features: The complaint accuses the NVIDIA Blackwell GB200 Superchip, which incorporates a Grace CPU and Blackwell GPUs (Compl. ¶65). The accused features include the Superchip's unified memory architecture, memory controllers, and memory management units, which are alleged to function as the claimed "mapper" that manages memory operations across its shared memory space (Compl. ¶¶67-70).
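
The claimed "mapper" can be sketched conceptually in Python (illustrative only; the pool names and address ranges are hypothetical, not drawn from the patent or the accused product): a memory operation is routed to one of several virtual memory pools, abstracting whether it physically lands in system RAM or GPU VRAM.

```python
# Hypothetical virtual memory pools keyed by address range.
VIRTUAL_POOLS = {
    "system_ram": range(0x0000, 0x8000),
    "gpu_vram": range(0x8000, 0x10000),
}

def map_operation(address: int) -> str:
    """Toy 'mapper': map a memory operation to a virtual memory pool
    based on its address, hiding the physical memory's location."""
    for pool, addrs in VIRTUAL_POOLS.items():
        if address in addrs:
            return pool
    raise ValueError(f"address {address:#x} is not in any pool")
```

A processor issuing an operation at, say, `0x9000` would receive the mapping result `"gpu_vram"` without itself knowing where that memory physically resides.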

U.S. Patent No. 11,841,803 - "GPU Chiplets Using High Bandwidth Crosslinks"

  • Issued: December 12, 2023
  • Technology Synopsis: The patent discloses a system for creating a large, high-performance GPU from smaller semiconductor "chiplets." It describes a CPU communicably coupled to a first GPU chiplet, which is in turn coupled to a second GPU chiplet via a dedicated, high-bandwidth "passive crosslink." This architecture allows the multiple chiplets to function and appear to software as a single, monolithic GPU (’803 Patent, Abstract; Compl. ¶83).
  • Asserted Claims: Independent claim 1 (Compl. ¶78).
  • Accused Features: The NVIDIA Blackwell GB200 Superchip is the accused product (Compl. ¶80). The infringement theory alleges that the two Blackwell B200 "dies" function as the claimed GPU chiplets and that the "NVIDIA High-Bandwidth Interface (NV-HBI)" that connects them at 10 TB/s is the claimed "passive crosslink" (Compl. ¶83).
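
The chiplet topology can be modeled with a short Python sketch (purely illustrative; the per-die compute-unit count is hypothetical, while the 10 TB/s figure comes from the complaint's description of NV-HBI): two dies joined by a crosslink are exposed to software as one aggregate device.

```python
class Chiplet:
    """One GPU die in a multi-chiplet package."""
    def __init__(self, name: str, compute_units: int):
        self.name = name
        self.compute_units = compute_units

class ChipletGPU:
    """Toy model: chiplets joined by a high-bandwidth crosslink,
    presented to software as a single monolithic GPU."""
    def __init__(self, chiplets: list, crosslink_tb_s: float):
        self.chiplets = chiplets
        self.crosslink_tb_s = crosslink_tb_s  # inter-die link bandwidth

    @property
    def total_compute_units(self) -> int:
        # Software sees one aggregate total, not individual dies.
        return sum(c.compute_units for c in self.chiplets)

gpu = ChipletGPU([Chiplet("die0", 72), Chiplet("die1", 72)],
                 crosslink_tb_s=10)
```

The single `total_compute_units` property stands in for the claim's requirement that the chiplets appear to software as one GPU.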

U.S. Patent No. 9,116,809 - "Memory Heaps in a Memory Model for a Unified Computing System"

  • Issued: August 25, 2015
  • Technology Synopsis: Related to the ’019 Patent, this invention describes a method for allocating memory in a unified computing system. The method involves receiving a memory operation that references a shared memory address (SMA), mapping that operation to one of a plurality of "memory heaps" based on the SMA, and providing the mapping result to the processor (’809 Patent, Abstract; Compl. ¶93).
  • Asserted Claims: Independent claim 1 (Compl. ¶92).
  • Accused Features: The complaint accuses the NVIDIA Blackwell GB200 Superchip (Compl. ¶94). The accused functionality is the Superchip's performance of the claimed method through its unified memory system, where the system allegedly maps memory operations from the CPU or GPU to different memory heaps based on the address, such as migrating data between system memory and GPU memory (Compl. ¶¶97-99).
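
The claimed method, including the alleged migration between heaps, can be sketched in Python (a conceptual model only; heap names, the example address, and the migration trigger are hypothetical): a shared memory address (SMA) is mapped to one of several heaps, and data moves between heaps when accessed from the other side.

```python
class UnifiedMemory:
    """Toy model of the claimed method: an SMA maps to one of several
    memory heaps, and data may migrate between heaps on access."""

    def __init__(self):
        self.heaps = {"system": {}, "gpu": {}}
        self.location = {}  # SMA -> name of the heap currently holding it

    def write(self, sma: int, value, heap: str = "system") -> None:
        self.heaps[heap][sma] = value
        self.location[sma] = heap

    def access(self, sma: int, accessor: str):
        # Map the operation to a heap based on where the SMA lives;
        # migrate the data if the accessor sits on the other side.
        src = self.location[sma]
        dst = "gpu" if accessor == "gpu" else "system"
        if src != dst:
            self.heaps[dst][sma] = self.heaps[src].pop(sma)
            self.location[sma] = dst
        return self.heaps[dst][sma]

mem = UnifiedMemory()
mem.write(0x1000, "tensor")   # initially resident in system memory
mem.access(0x1000, "gpu")     # GPU access migrates it to the GPU heap
```

After the GPU access, the data resides in the GPU heap and no longer in system memory, mirroring the migration behavior the complaint alleges.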

III. The Accused Instrumentality

Product Identification

  • The complaint identifies two main categories of accused products:
    1. The NVIDIA GeForce RTX 4060 Ti graphics card, its incorporated GPU, and associated software drivers (accused of infringing the ’381 and ’943 Patents) (Compl. ¶¶36, 51).
    2. The NVIDIA Blackwell GB200 Superchip, which incorporates the Blackwell B200 GPU, the NVIDIA Grace CPU, and associated software (accused of infringing the ’019, ’803, and ’809 Patents) (Compl. ¶¶65, 80, 94).

Functionality and Market Context

  • The GeForce RTX 4060 Ti is a graphics card for the consumer market. The complaint alleges its architecture includes "asynchronous engines" and Streaming Multiprocessors that can "concurrently run compute and 3D" tasks (Compl. ¶¶38-39). It also allegedly contains a "Task Management Unit" that queues tasks based on priority (Compl. ¶52). A diagram from an NVIDIA whitepaper depicts the architecture of the GPU, showing multiple Graphics Processing Clusters (GPCs) surrounding a central L2 Cache (Compl. p. 10).
  • The Blackwell GB200 Superchip is a high-performance computing product for data centers, particularly for generative AI workloads (Compl. ¶¶66, 81). The complaint alleges it features a unified memory architecture where the CPU and GPUs access a shared memory pool (Compl. ¶¶68-69). It is also alleged to be constructed from two separate GPU dies connected by a high-bandwidth interface (Compl. ¶83). The complaint includes a photograph of the GB200 Superchip, showing two large Blackwell GPUs alongside a smaller Grace CPU on a single board (Compl. p. 30).

IV. Analysis of Infringement Allegations

U.S. Patent No. 8,854,381 - Infringement Allegations

Claim Chart (Independent Claim 5)

  • Claim Element: "an apparatus comprising: a plurality of engines associated with a first processing unit and configured to receive... a plurality of tasks and to load a state data associated with each of the plurality of tasks"
    • Alleged Infringing Functionality: The GeForce RTX 4060 Ti GPU includes the "GigaThread Engine," "GPU front end (command processor)," graphics engine, compute engine, and/or "asynchronous engines," which are configured to receive tasks from the host CPU's graphics driver.
    • Citations: Compl. ¶38; ’381 Patent, col. 6:3-7
  • Claim Element: "a shader core associated with the first processing unit and configured to receive the plurality of tasks from at least one of the plurality of engines and to execute a first task from the plurality while executing a second task from the plurality of tasks based on respective state data associated with each of the first and second tasks"
    • Alleged Infringing Functionality: The GPU includes "Graphics Processing Clusters (GPCs)," "Texture Processing Clusters (TPCs)," and "Streaming Multiprocessors (SMs)" that can "concurrently run compute and 3D" tasks, as shown in documentation regarding "Asynchronous Compute."
    • Citations: Compl. ¶¶39-40; ’381 Patent, col. 6:8-15
  • Claim Element: "and, wherein the first task comprises a graphics-processing task and the second task comprises a general-computing task"
    • Alleged Infringing Functionality: The GPU is alleged to perform graphics-processing tasks concurrently with general-compute tasks, citing NVIDIA documentation on its CUDA architecture and SIMT (Single-Instruction, Multiple-Thread) processing model.
    • Citations: Compl. ¶41; ’381 Patent, col. 14:13-15

Identified Points of Contention

  • Scope Questions: A central question may be whether NVIDIA's collection of functionally distinct units (e.g., "GigaThread Engine," "command processor," "compute engine") collectively constitutes the "plurality of engines" as recited in a single claim element.
  • Technical Questions: The analysis may focus on whether the concurrent operation within NVIDIA's Streaming Multiprocessors constitutes executing one task "while executing" a second task, as claimed, or whether it represents a different form of parallel execution (e.g., fine-grained time-slicing) that is technically distinct from the patented invention.

U.S. Patent No. 9,519,943 - Infringement Allegations

Claim Chart (Independent Claim 11)

  • Claim Element: "a processing device, comprising: a set of queues, each queue of the set of queues being configured to hold commands received from a central processing unit (CPU)..."
    • Alleged Infringing Functionality: The GeForce RTX 4060 Ti GPU allegedly includes a "Grid Management Unit," "Compute Front End," "Task Management Unit," and/or "Work Distribution Unit," which include a plurality of queues to hold commands.
    • Citations: Compl. ¶52; ’943 Patent, col. 4:1-3
  • Claim Element: "a command processor configured to retrieve the received commands from the set of queues, wherein the set of queues includes a high priority queue that holds high priority commands, wherein the command processor is configured to retrieve a high priority command held in the high priority queue before retrieving commands held in other queues of the set of queues..."
    • Alleged Infringing Functionality: The "Task Management Unit" and "Work Distribution Unit" are alleged to function as the command processor. The TMU is alleged to queue tasks by priority and remove the head of the "highest-priority non-empty list" first. The complaint includes a diagram illustrating this priority-based queue structure (Compl. p. 22, Fig. 7).
    • Citations: Compl. ¶53; ’943 Patent, col. 4:4-10
  • Claim Element: "and a processing core configured to execute the received command..."
    • Alleged Infringing Functionality: The GPU's "Graphics Processing Clusters (GPCs)," "Texture Processing Clusters (TPCs)," and "Streaming Multiprocessors (SMs)" are alleged to be the processing core that executes the commands.
    • Citations: Compl. ¶54; ’943 Patent, col. 4:11-15

Identified Points of Contention

  • Scope Questions: A dispute may arise over whether the combination of NVIDIA's "Grid Management Unit," "Task Management Unit," and "Work Distribution Unit" can be mapped onto the singular claim term "command processor."
  • Evidentiary Questions: The complaint heavily relies on a third-party technical paper to describe the operation of the "Task Management Unit" (Compl. ¶¶52-53). A key question will be whether Plaintiff can provide evidence that the accused commercial product actually implements the specific priority-retrieval mechanism described in that paper.

V. Key Claim Terms for Construction

U.S. Patent No. 8,854,381

  • The Term: "while executing"
  • Context and Importance: This term is critical to determining the nature of the required concurrency. NVIDIA's architecture uses concepts like "Asynchronous Compute," and the case may turn on whether this implementation meets the temporal requirement of one task running "while" another is also running, as opposed to, for example, being rapidly interleaved or paused.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The specification repeatedly uses the phrase "substantially in parallel" to describe the execution of tasks, which may support a construction that is not strictly simultaneous but includes various forms of concurrent processing (’381 Patent, col. 3:7-9).
    • Evidence for a Narrower Interpretation: The detailed description discusses partitioning the shader core resources "in space and/or time," which could be argued to point toward specific, disclosed methods of concurrency rather than encompassing all possible methods (’381 Patent, col. 5:17-25).

U.S. Patent No. 9,519,943

  • The Term: "before retrieving"
  • Context and Importance: This term defines the core of the priority mechanism. The infringement theory depends on showing that NVIDIA's hardware retrieves a complete command from a high-priority queue prior to retrieving any command from a lower-priority queue, as opposed to using a weighted scheduling algorithm where lower-priority tasks might still be retrieved, albeit less frequently.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The patent's summary states the invention relates to providing for "priority-based execution of commands" generally, which could support a construction covering various priority-based scheduling schemes (’943 Patent, col. 2:48-50).
    • Evidence for a Narrower Interpretation: The claim language recites a specific sequence: retrieving from the high priority queue "before retrieving commands held in other queues." This explicit sequence may support a strict, non-preemptible priority construction where lower-priority queues are not serviced until the high-priority queue is empty.
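
The construction dispute can be made concrete with two toy Python schedulers (illustrative only; the weight value and command names are hypothetical): under a strict reading of "before retrieving," the high-priority queue is exhausted before any other queue is serviced, while under a weighted scheme lower-priority commands are still retrieved periodically.

```python
def strict_priority(high: list, low: list) -> list:
    """Drain the high-priority queue entirely before touching the
    low-priority queue (the narrow construction)."""
    return list(high) + list(low)

def weighted_round_robin(high: list, low: list, weight: int = 3) -> list:
    """Interleave: retrieve 'weight' high-priority commands, then one
    low-priority command (a scheme a broader construction might reach)."""
    out, hi, lo = [], list(high), list(low)
    while hi or lo:
        out.extend(hi[:weight])
        hi = hi[weight:]
        if lo:
            out.append(lo.pop(0))
    return out

high = ["h1", "h2", "h3", "h4"]
low = ["l1", "l2"]
# strict_priority(high, low)      -> ["h1", "h2", "h3", "h4", "l1", "l2"]
# weighted_round_robin(high, low) -> ["h1", "h2", "h3", "l1", "h4", "l2"]
```

If the accused hardware behaves like the second scheduler, a low-priority command is sometimes retrieved while high-priority commands remain queued, which a strict construction of "before retrieving" would arguably exclude.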

VI. Other Allegations

Indirect Infringement

  • For all asserted patents, the complaint alleges that NVIDIA induces infringement by actively encouraging subsidiaries, affiliates, and customers to infringe through the sale of the accused products along with instructions, marketing materials, and technical documentation that allegedly direct users to operate them in an infringing manner (e.g., Compl. ¶¶42, 56, 71, 85, 100). The complaint also alleges contributory infringement, asserting that the accused products are a material part of the inventions and have no substantial non-infringing uses (e.g., Compl. ¶¶44, 58, 73, 87, 102).

Willful Infringement

  • The willfulness allegations are based on knowledge of the patents as of the filing date of the complaint. The complaint alleges that "Notwithstanding NVIDIA’s knowledge of the Asserted Patents since at least as early as the filing of the present Complaint, NVIDIA has and continues to willfully infringe" (Compl. ¶31).

VII. Analyst’s Conclusion: Key Questions for the Case

  • Architectural Mapping: A central issue spanning multiple patents will be one of definitional scope: can the functional blocks described in NVIDIA's public-facing technical documents (e.g., "GigaThread Engine," "Task Management Unit," "NV-HBI") be mapped onto specific, unitary claim terms like "plurality of engines," "command processor," and "passive crosslink," or will Defendant successfully argue these are improper mappings of distinct hardware components onto singular claim elements?
  • Operational Equivalence: For the patents related to memory management (’019 and ’809) and asynchronous execution (’381), a key technical question will be whether NVIDIA's implementation of unified memory and concurrent computing achieves a similar result through a fundamentally different technical operation than what is claimed, raising questions of both literal infringement and infringement under the doctrine of equivalents.
  • Evidentiary Foundation: The infringement theories, particularly for the ’943 patent, rely on technical descriptions from sources outside of NVIDIA's own product manuals, such as academic papers. A threshold question for the court will be whether the operational details described in these sources are proven to be present and practiced in the accused commercial products.