7:25-cv-00547
Array Cache Technologies LLC v. NVIDIA Corporation
I. Executive Summary and Procedural Information
- Parties & Counsel:
- Plaintiff: Array Cache Technologies LLC (Texas)
- Defendant: Nvidia Corporation (Delaware)
- Plaintiff’s Counsel: Fabricant LLP; Davis Firm PC
- Case Identification: 7:25-cv-00547, W.D. Tex., 11/25/2025
- Venue Allegations: Plaintiff alleges venue is proper in the Western District of Texas based on Defendant’s regular and established place of business in the district, including a facility in Austin, Texas, as well as its business transactions and alleged acts of infringement within the district.
- Core Dispute: Plaintiff alleges that Defendant’s graphics processing units (GPUs), systems-on-a-chip (SoCs), and processors, including the Grace Hopper GH200 Superchip, infringe five patents related to methods for managing cache coherence in multiprocessor systems.
- Technical Context: The patents address cache coherence, a fundamental technology for ensuring data consistency across multiple processing cores, which is critical to the performance and scalability of modern high-performance computing systems like those designed by Defendant.
- Key Procedural History: The complaint does not reference any prior litigation, Inter Partes Review (IPR) proceedings, or specific licensing history concerning the patents-in-suit.
Case Timeline
| Date | Event |
|---|---|
| 2012-03-20 | ’960 Patent Priority Date |
| 2015-02-05 | ’410 and ’861 Patents Priority Date |
| 2016-03-01 | ’960 Patent Issue Date |
| 2016-12-27 | ’464 and ’471 Patents Priority Date |
| 2019-06-18 | ’861 Patent Issue Date |
| 2019-11-01 | Alleged Infringement Begins |
| 2020-01-07 | ’471 Patent Issue Date |
| 2021-07-20 | ’410 Patent Issue Date |
| 2021-11-30 | ’464 Patent Issue Date |
| 2025-11-25 | Complaint Filing Date |
II. Technology and Patent(s)-in-Suit Analysis
U.S. Patent No. 9,274,960 - "System and Method for Simplifying Cache Coherence Using Multiple Write Policies," issued March 1, 2016
The Invention Explained
- Problem Addressed: The patent background describes the complexity, cost, and verification challenges of conventional cache coherence protocols, such as directory-based systems, which require extensive hardware to track data sharers and readers across multiple processor cores (’960 Patent, col. 1:33-2:2).
- The Patented Solution: The invention proposes simplifying coherence by classifying cache lines as either "private" (used by one core) or "shared" (used by multiple cores). Different write policies are applied based on this classification: an efficient "write-back" policy for private data and a "write-through" policy for shared data to maintain consistency. The system introduces a "transient dirty state" for shared lines, allowing a core to perform multiple writes locally before a single, "self-initiated" write-through to the global cache, thus reducing traffic and complexity (’960 Patent, Abstract; col. 4:26-47).
- Technical Importance: This approach aimed to eliminate the need for complex directory structures, invalidation messages, and snooping broadcasts, thereby reducing hardware overhead and simplifying the design of multi-core systems (’960 Patent, col. 3:18-27).
Key Claims at a Glance
- The complaint asserts at least independent Claim 1 (Compl. ¶19).
- Claim 1 of the ’960 patent breaks down into these essential elements (a schematic code model of the recited state machine follows the list):
- A computer system with multiple processor cores, main memory, local cache memory per core, and a global cache memory.
- Each cache line is classified as either shared or private.
- When a core writes a cache line, it performs a write-back to local cache if the line is private, and a write-through to the global cache if the line is shared.
- Shared cache lines can exist in a valid, invalid, or transient dirty state.
- A shared cache line transitions to the transient dirty state when written by its associated core.
- A shared cache line in the transient dirty state transitions to the valid state via a self-initiated write-through to the global cache.
- The complaint does not explicitly reserve the right to assert dependent claims.
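To make the recited state machine concrete, the following schematic model renders the claim elements in code. It illustrates the claim language only; every name (CacheLine, GlobalCache, coreWrite, selfWriteThrough) is hypothetical, and nothing here describes the GH200's actual hardware.

```cuda
#include <cassert>

// Schematic model of the state machine recited in claim 1 of the '960
// patent. All names are hypothetical and track the claim language only.
enum class Classification { Private, Shared };
enum class State { Invalid, Valid, TransientDirty };

struct GlobalCache {
    void acceptWriteThrough(long blockAddr) { /* update the global copy */ }
};

struct CacheLine {
    long blockAddr;
    Classification cls;
    State state;       // meaningful for shared lines
    bool dirtyLocal;   // private lines: modified, awaiting write-back
};

// A core writing a line: write-back policy for private lines, deferred
// write-through (via the transient dirty state) for shared lines.
void coreWrite(CacheLine& line) {
    if (line.cls == Classification::Private) {
        line.dirtyLocal = true;           // stays local until eviction
    } else {
        // Valid or Invalid -> TransientDirty: several local writes may
        // accumulate before a single write-through is sent.
        line.state = State::TransientDirty;
    }
}

// Self-initiated write-through: TransientDirty -> Valid, pushing the
// accumulated modifications to the global cache in one message.
void selfWriteThrough(CacheLine& line, GlobalCache& global) {
    assert(line.cls == Classification::Shared);
    assert(line.state == State::TransientDirty);
    global.acceptWriteThrough(line.blockAddr);
    line.state = State::Valid;
}

int main() {
    GlobalCache global;
    CacheLine shared{0x80, Classification::Shared, State::Valid, false};
    coreWrite(shared);                 // Valid -> TransientDirty
    coreWrite(shared);                 // further writes stay local
    selfWriteThrough(shared, global);  // TransientDirty -> Valid
    return 0;
}
```

The model captures the two transitions likely to be contested: entry into the transient dirty state upon a write to a shared line, and exit to the valid state only through a self-initiated write-through.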
U.S. Patent No. 11,068,410 - "Multi-Core Computer Systems With Private/Shared Cache Line Indicators," issued July 20, 2021
The Invention Explained
- Problem Addressed: The patent addresses the challenge of maintaining coherence in hierarchical and clustered cache architectures, where the complexity and number of states required for coherence protocols can become prohibitive, especially at intermediate cache levels (’410 Patent, col. 2:15-25).
- The Patented Solution: The invention introduces the concept of a "common shared level" (CSL), defined as the level in the cache hierarchy where a data block is shared within a specific cluster of cores but remains private from an outside perspective. By identifying this CSL, coherence operations can be localized and simplified, avoiding complex, recursive operations that span the entire hierarchy. A bit indicator on each cache line signifies whether it is private or shared within that context (’410 Patent, Abstract; col. 4:37-54). (A sketch of how a CSL can be derived appears after this list.)
- Technical Importance: This method encapsulates the complexity of hierarchical coherence, enabling the use of simpler, more efficient local coherence mechanisms (like self-invalidation) and improving the scalability of clustered multi-core systems (’410 Patent, col. 1:8-14).
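As an illustration of how a CSL can be derived, the sketch below models the approach the patent family describes of comparing the accessing cores' IDs to find the lowest cache level they share (’861 Patent, col. 6:46-54). The topology, a balanced hierarchy with a fixed fan-out, is hypothetical and does not represent the GH200.

```cuda
#include <cstdio>

// Illustrative derivation of a "common shared level" (CSL) in a balanced
// hierarchy where each next cache level is shared by kFanOut clusters of
// the level below. Hypothetical topology for illustration only.
constexpr int kFanOut = 4;

// Level 0 = the core's private L1; ascend until both core IDs fall in the
// same cluster. That level is where the block stops being private.
int commonSharedLevel(int coreA, int coreB) {
    int level = 0;
    while (coreA != coreB) {
        coreA /= kFanOut;
        coreB /= kFanOut;
        ++level;
    }
    return level;
}

int main() {
    printf("CSL(0, 1) = %d\n", commonSharedLevel(0, 1));  // 1: same L2 cluster
    printf("CSL(0, 5) = %d\n", commonSharedLevel(0, 5));  // 2: meet one level up
    return 0;
}
```

In this model, a coherence operation for a block can then be confined to the cluster rooted at its CSL, which is the localization benefit the specification describes.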
Key Claims at a Glance
- The complaint asserts at least independent Claim 1 (Compl. ¶34).
- Claim 1 of the ’410 patent breaks down into these essential elements:
- A computer system with multiple processor cores, local cache per core, at least one intermediary cache coupled to a subset of cores, and at least one shared memory coupled to all cores.
- Each cache line has a bit signifying whether it is private or shared in the shared memory.
- A "common shared level" is identified based on which intermediary or shared memory is shared between two or more cores.
- This common shared level is a level where a memory block's status changes from private to shared.
- A cache coherence operation is selected from a plurality of operations based on the identified common shared level.
- The selected operation is performed upon a coherence event.
- The complaint does not explicitly reserve the right to assert dependent claims.
U.S. Patent No. 10,324,861 - "Systems and Methods for Coherence in Clustered Cache Hierarchies," issued June 18, 2019
Technology Synopsis
This patent describes a method for managing coherence in a clustered cache hierarchy by storing a "Common Shared Level" (CSL) value for each data block. When a data block is written, a coherence mechanism updates caches within the cluster indicated by the CSL, while treating the block as private to caches outside that cluster. A translation look-aside buffer (TLB) miss is used to detect when a new core accesses the data, triggering a re-determination of the CSL value (’861 Patent, Abstract).
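A minimal sketch of this per-block bookkeeping, assuming a simple balanced hierarchy and treating the TLB miss as the trigger for re-determination, might look as follows; all structures (BlockMeta, blockTable, onTlbMiss) are hypothetical.

```cuda
#include <unordered_map>
#include <algorithm>

// Hypothetical model of the '861 abstract: each data block carries a CSL
// value; a TLB miss reveals a new core touching the block and triggers
// re-determination of the CSL. Illustrative only.
constexpr int kFanOut = 4;

int commonSharedLevel(int a, int b) {
    int level = 0;
    while (a != b) { a /= kFanOut; b /= kFanOut; ++level; }
    return level;
}

struct BlockMeta {
    int firstCore = -1;   // block starts private to its first accessor
    int csl = 0;          // 0 = private (L1 level)
};

std::unordered_map<long, BlockMeta> blockTable;   // keyed by block address

// Called on a TLB miss by `core` for `blockAddr`.
void onTlbMiss(long blockAddr, int core) {
    BlockMeta& m = blockTable[blockAddr];
    if (m.firstCore < 0) { m.firstCore = core; return; }   // still private
    m.csl = std::max(m.csl, commonSharedLevel(m.firstCore, core));
    // On a write, a coherence mechanism would now update only caches within
    // the cluster indicated by m.csl, treating the block as private outside.
}

int main() {
    onTlbMiss(0x1000, 0);   // first access: block remains private
    onTlbMiss(0x1000, 5);   // second core: CSL re-determined
    return 0;
}
```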
Asserted Claims
At least independent Claim 1 is asserted (Compl. ¶51).
Accused Features
The complaint alleges that the GH200's implementation of thread clusters, where a common shared level (e.g., shared or global memory) is identified for a data block, infringes this patent (Compl. ¶¶52-55).
U.S. Patent No. 10,528,471 - "System and Method for Self-Invalidation, Self-Downgrade Cache Coherence Protocols," issued January 7, 2020
Technology Synopsis
This patent details a system where a core, upon a local cache miss, detects a prior store operation from another core to the same memory block. This detection enforces the program order of loads (completing older loads and re-executing younger loads) and causes the core to self-invalidate one or more of its own local cache lines, thereby maintaining coherence without requiring explicit invalidation commands from a central directory (’471 Patent, Abstract).
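Rendered schematically, the abstract's sequence looks like the following; the Core structure and its methods are hypothetical placeholders, not a description of the accused products.

```cuda
#include <unordered_set>

// Schematic rendering of the '471 abstract's sequence: on a local cache
// miss, detection of a prior store by another core (1) enforces load
// program order and (2) self-invalidates local cache lines, with no
// directory-issued invalidation commands. All names are hypothetical.
struct Core {
    std::unordered_set<long> localLines;   // block addresses held locally

    void completeOlderLoads()    { /* drain loads older than the missing load */ }
    void reexecuteYoungerLoads() { /* squash and replay loads younger than it */ }

    void onLocalMiss(long blockAddr, bool priorStoreByAnotherCore) {
        if (!priorStoreByAnotherCore) return;   // no conflicting store: proceed
        completeOlderLoads();                   // enforce program order of loads
        reexecuteYoungerLoads();
        localLines.clear();   // self-invalidate (here, conservatively, all lines)
    }
};

int main() {
    Core c;
    c.localLines = {0x40, 0x80};
    c.onLocalMiss(0xC0, /*priorStoreByAnotherCore=*/true);
    return 0;
}
```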
Asserted Claims
At least independent Claim 13 is asserted (Compl. ¶63).
Accused Features
The infringement allegations target the GH200's cache coherence memory model and synchronization operations, which are alleged to detect prior stores and enforce program order upon a cache miss (Compl. ¶¶67-69).
U.S. Patent No. 11,188,464 - "System and Method for Self-Invalidation, Self-Downgrade Cache Coherence Protocols," issued November 30, 2021
Technology Synopsis
This patent, related to the ’471 patent, describes a method for self-invalidating cache lines. Upon a cache miss, a core checks a "read-after-write detection structure" to see if a race condition exists for the requested memory block. If a race condition (i.e., a prior store) is detected, the system enforces program order between older and younger loads and causes the local cache memory to self-invalidate relevant cache lines (’464 Patent, Abstract).
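A schematic model of such a detection structure, with all names hypothetical, might look like this:

```cuda
#include <unordered_map>

// Schematic model of the '464 "read-after-write detection structure": a
// table of recent stores consulted on a cache miss to decide whether a
// race exists for the requested block. Entirely illustrative.
struct RawDetector {
    std::unordered_map<long, int> recentStores;   // block address -> storing core

    void recordStore(long blockAddr, int core) { recentStores[blockAddr] = core; }

    // True if another core stored to this block before the requester's miss.
    bool raceDetected(long blockAddr, int requestingCore) const {
        auto it = recentStores.find(blockAddr);
        return it != recentStores.end() && it->second != requestingCore;
    }
};

int main() {
    RawDetector raw;
    raw.recordStore(0x100, /*core=*/2);
    // Core 0 misses on the same block: a race is detected, so the system
    // would enforce program order and self-invalidate, as in the '471 model.
    bool race = raw.raceDetected(0x100, /*requestingCore=*/0);   // true
    (void)race;
    return 0;
}
```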
Asserted Claims
At least independent Claim 1 is asserted (Compl. ¶77).
Accused Features
The complaint alleges that the GH200 determines whether a race condition exists using a read-after-write detection structure and, if so, enforces program order and self-invalidates local cache lines (Compl. ¶¶78-80).
III. The Accused Instrumentality
Product Identification
The complaint names a range of Nvidia graphics processing units, SoCs, and processors, including DGX Systems, Grace Systems, and GeForce Graphics Cards (Compl. ¶15). The allegations focus specifically on the Nvidia Grace Hopper GH200 Superchip as an exemplary accused product (Compl. ¶¶18, 19).
Functionality and Market Context
The GH200 is described as a high-performance computing architecture that integrates a Grace CPU and a Hopper GPU on a single superchip (Compl. ¶20, Fig. 1). It combines 72 CPU cores and 132 GPU streaming multiprocessors, multiple levels of cache memory (L1, L2, L3), and high-bandwidth main memory (LPDDR5X and HBM3) (Compl. ¶¶20, 22). The complaint highlights the NVLink-C2C interconnect, which provides a 900 GB/s connection between the CPU and GPU and enables hardware coherency, allowing the CPU and GPU to access each other's memory at cache-line granularity (Compl. ¶¶21, 23; Compl. Fig. 8). This architecture is positioned for large-scale AI and high-performance computing applications.
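For context, the unified CPU-GPU access that the complaint's coherency allegations concern is the kind exposed to programmers through standard CUDA managed memory, as in the minimal sketch below (offered as background illustration; the complaint does not include source code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // GPU writes memory the CPU also touches
}

int main() {
    const int n = 1024;
    int *data = nullptr;
    // Managed memory is visible to both CPU and GPU; on GH200 the
    // NVLink-C2C interconnect supplies the underlying hardware coherency
    // at cache-line granularity, per the complaint's description.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;          // CPU writes
    increment<<<(n + 255) / 256, 256>>>(data, n);     // GPU reads and writes
    cudaDeviceSynchronize();
    printf("data[0] = %d\n", data[0]);                // CPU reads: prints 1
    cudaFree(data);
    return 0;
}
```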
IV. Analysis of Infringement Allegations
’960 Patent Infringement Allegations
| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| a computer system comprising: multiple processor cores; a main memory; at least one local cache memory associated with...each core...; and a global cache memory | The GH200 comprises 72 CPU cores and 132 GPU multiprocessors, LPDDR5X/HBM3 main memory, L1/L2 local caches, and L2/L3 global cache memory. | ¶¶20-23 | col. 9:11-24 |
| each of the cache lines being classified as either a shared cache line or a private cache line | The GH200's local L1 and/or L2 caches store cache lines classified as either shared or private. | ¶22 | col. 9:25-27 |
| when a core writes a cache line, the core performs a write-back to the associated local cache memory if the cache line is a private cache line and a write-through to the global cache memory if the cache line is a shared cache line | The GH200 supports cache operators for memory store instructions that include both write-back for all coherent levels (private) and write-through to global L2/L3 memory (shared). | ¶24 | col. 9:31-36 |
| wherein at least one of the shared cache lines are in a valid state, an invalid state, or a transient dirty state | The GH200's shared cache lines can be in an invalid, valid, or transient dirty state, as allegedly demonstrated by the ARM CHI protocol and the CUDA GPU memory consistency model. | ¶25 | col. 9:37-39 |
| wherein a shared cache line in a local cache memory transitions to the transient dirty state from the valid state or the invalid state when the cache line is written by the associated core | In the GH200, a shared cache line in L1 or L2 local cache transitions to a transient dirty state from a valid or invalid state when written by an associated GPU or CPU core. | ¶26 | col. 9:40-44 |
| and wherein a shared cache line in the transient dirty state transitions to the valid state with a self-initiated write-through to the global cache memory. | A shared cache line in a transient dirty state transitions to valid with a self-initiated write-through to global L2/L3 cache memory, as allegedly demonstrated by CUDA memory store instructions and the relaxed memory model. | ¶26 | col. 9:45-48 |
- Identified Points of Contention:
- Scope Questions: A central question may be whether the cache operator instructions (".wb", ".wt") documented in Nvidia's CUDA programming guide (Compl. ¶24) constitute the specific write-back and write-through policies the claim requires. The defense may argue that offering optional instructions does not equate to the system inherently performing these distinct policies based on a private/shared classification. (A usage sketch of these operators appears after this list.)
- Technical Questions: The analysis will likely focus on whether the accused products implement a specific "transient dirty state" that matches the claim's functional requirements: transitioning to it upon a write and transitioning from it via a "self-initiated write-through." The complaint points to general memory consistency models (Compl. ¶26), and the factual question will be whether those models map directly to the claimed state machine.
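For reference, the cited cache operators surface to programmers as CUDA store intrinsics such as __stwb() and __stwt(). The minimal sketch below shows only their programmer-selectable use; it does not show hardware selecting a policy from a private/shared classification, which is the contested point. The pointer names are illustrative.

```cuda
#include <cuda_runtime.h>

// Minimal use of the CUDA store intrinsics that emit the ".wb" and ".wt"
// PTX cache operators the complaint cites (Compl. ¶24). Pointer names are
// illustrative; nothing here evidences a hardware classifier.
__global__ void storeWithHints(int *privateLike, int *sharedLike, int v) {
    __stwb(privateLike, v);   // ".wb": write back through coherent cache levels
    __stwt(sharedLike, v);    // ".wt": write through to the memory system
}

int main() {
    int *a = nullptr, *b = nullptr;
    cudaMalloc(&a, sizeof(int));
    cudaMalloc(&b, sizeof(int));
    storeWithHints<<<1, 1>>>(a, b, 42);
    cudaDeviceSynchronize();
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Note that in this sketch it is the programmer, not a hardware classifier, who chooses between __stwb() and __stwt(), which is precisely the defense framing described above.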
’410 Patent Infringement Allegations
| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| a computer system comprising: multiple processor cores; at least one local cache memory...; at least one intermediary cache memory...; and at least one shared memory | The GH200 comprises CPU/GPU cores, private L1 local caches, L2/L3 intermediary caches coupled to subsets of cores, and shared/global memory coupled to all cores. A diagram of the GH200 Superchip illustrates this architecture (Compl. Fig. 1). | ¶¶35-38 | col. 13:17-24 |
| wherein each cache line has a bit that signifies whether this cache line is private or shared in said shared memory | Each cache line of the GH200 includes a bit or corresponding value that identifies the line as private (e.g., unique) or shared. | ¶39 | col. 13:28-30 |
| wherein a common shared level is identified...based on which...is shared between two of the multiple processor cores | The GH200 implements thread clusters where a common shared level (e.g., shared or global memory) is identified among intermediary cache and shared memory based on which is shared among cores of a given cluster. This memory hierarchy is depicted in a complaint figure (Compl. p. 26, Fig. 6). | ¶40 | col. 13:31-37 |
| wherein the common shared level is a level within computer system memory where a bit's value for a memory block becomes shared from being private in levels closer to local cache memory | In GH200, the common shared level is a level (e.g., a cache or global memory level) where a bit value for a memory block transitions from private to shared, per a shared/relaxed memory model. | ¶41 | col. 13:38-42 |
| wherein a cache coherence operation is selected among a plurality of cache coherence operations based on said common shared level being identified | The GH200 selects among cache coherence operations (e.g., load, store, invalidate) based on the identified common shared level (e.g., a cluster level shared with respect to a data block). | ¶42 | col. 13:43-46 |
| and wherein said cache coherence operation is performed upon the occurrence of a coherence event. | The GH200 performs coherence operations (e.g., on a read/write miss, atomic operation) in its thread clusters upon the occurrence of a coherence event. | ¶43 | col. 13:47-49 |
- Identified Points of Contention:
- Scope Questions: The dispute may turn on the definition of "common shared level." The patent describes this as a hardware-level concept, whereas the complaint's allegations rely on mapping it to software-centric constructs from the CUDA programming model, such as "thread block clusters" and "shared memory" (Compl. ¶40). A key legal question will be whether these software abstractions read on the claimed hardware structure.
- Technical Questions: The allegations also raise an evidentiary question: how does the GH200 actually "identify" this level? Does the hardware perform a comparison or use a specific indicator as taught in the patent, or is the "level" merely an emergent property of the software model? The complaint's evidence is drawn from documentation rather than from a direct analysis of the hardware's operation. (A source-level sketch of the software constructs at issue follows.)
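For context, the software constructs the complaint relies on, thread block clusters and distributed shared memory, appear in source form as follows (CUDA 11.8+ on a compute capability 9.0 GPU such as the GH200's Hopper GPU, compiled with -arch=sm_90). The sketch is illustrative only, since whether these constructs evidence the claimed hardware "common shared level" is the disputed mapping.

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// A thread block cluster whose blocks can read each other's shared memory
// (distributed shared memory): data private to one block's SM becomes
// shared at the cluster level.
__global__ void __cluster_dims__(2, 1, 1) clusterKernel(int *out) {
    __shared__ int smem[1];
    cg::cluster_group cluster = cg::this_cluster();

    if (threadIdx.x == 0)
        smem[0] = (int)cluster.block_rank();   // write block-local shared memory
    cluster.sync();                            // cluster-wide barrier

    // Read the peer block's shared memory via distributed shared memory.
    unsigned peer = cluster.block_rank() ^ 1;
    int *peerSmem = cluster.map_shared_rank(smem, peer);
    if (threadIdx.x == 0)
        out[cluster.block_rank()] = peerSmem[0];
    cluster.sync();   // keep peers resident until remote accesses complete
}

int main() {
    int *out = nullptr;
    cudaMalloc(&out, 2 * sizeof(int));
    clusterKernel<<<2, 32>>>(out);   // grid of 2 blocks = one cluster
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```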
V. Key Claim Terms for Construction
For the ’960 Patent
- The Term: "transient dirty state"
- Context and Importance: This term defines a specific, temporary state for a shared cache line that is central to the patent's method of simplifying coherence. Infringement hinges on whether the accused GH200 architecture utilizes a state with the exact transitional properties required by the claim (entry upon write to a shared line, exit upon a self-initiated write-through), as opposed to more conventional "dirty" states.
- Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification describes the state functionally as one that allows a write-through to be delayed, which could support an argument that any mechanism delaying a write-through from a modified-shared state meets the limitation (’960 Patent, col. 10:25-30).
- Evidence for a Narrower Interpretation: The claim language recites a specific lifecycle: "transitions to the transient dirty state...when...written" and "transitions to the valid state with a self-initiated write-through." This sequence may support a narrower construction tied to this exact state machine, distinguishing it from general MESI protocol states (’960 Patent, Claim 1).
For the ’410 Patent
- The Term: "common shared level"
- Context and Importance: This term is the linchpin of the invention, defining the boundary for localized coherence operations. The Plaintiff's case depends on construing this term to encompass the "shared memory" level within Nvidia's "thread clusters." Practitioners may focus on this term because its definition will determine whether a software-level abstraction can meet a claim limitation that appears to describe a hardware hierarchy level.
- Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The claim defines the term functionally as "a level...where a bit's value for a memory block becomes shared from being private," which could be argued to apply to any boundary, whether defined in hardware or software, where data transitions from single-core to multi-core accessibility (’410 Patent, Claim 1).
- Evidence for a Narrower Interpretation: The patent family specification explains that the CSL can be derived by comparing the IDs of the accessing cores to find the "lowest common cache level" they share (’861 Patent, col. 6:46-54). This suggests a specific hardware-based calculation rather than a logical software construct, potentially supporting a narrower definition.
VI. Other Allegations
Indirect Infringement
For each patent-in-suit, the complaint alleges both induced and contributory infringement. The inducement claims are based on allegations that Nvidia provides its products with instructions, documentation, technical support, and marketing that encourage and instruct customers and end-users to operate the products in an infringing manner (e.g., Compl. ¶¶28, 45). The contributory infringement claims allege that the accused components are material to the inventions, are not staple articles of commerce, and are especially made or adapted for infringement (e.g., Compl. ¶¶29, 46).
Willful Infringement
The complaint alleges that "Nvidia has further had knowledge and notice of the Asserted Patents, and its infringement thereof, at least as of the filing of this Complaint" (Compl. ¶14). While the prayer for relief requests a finding of willfulness (Compl. p. 60, ¶b), the factual allegations do not assert pre-suit knowledge, which may limit the willfulness claim to post-filing conduct.
VII. Analyst’s Conclusion: Key Questions for the Case
- A core issue will be one of evidentiary mapping: can Plaintiff demonstrate that the high-level descriptions of functionality in Nvidia's public technical documentation and programming guides (e.g., CUDA cache operators, memory consistency models) directly correspond to the specific, low-level hardware mechanisms and state transitions required by the patent claims?
- A central legal question will be one of definitional scope: can patent terms that appear grounded in specific hardware implementations, such as a "common shared level" or a "transient dirty state," be construed broadly enough to read on the more abstract, software-defined architectural concepts (e.g., "shared memory" in a "thread cluster") that form the basis of the complaint's infringement theory?
- A key question for damages and indirect infringement will be knowledge and intent: given that the complaint only alleges knowledge as of its filing, the case may turn on what evidence emerges during discovery to establish pre-suit knowledge or willful blindness by Nvidia regarding the patents-in-suit and the alleged infringement.