1:21-cv-12110
Singular Computing LLC v. Google LLC
I. Executive Summary and Procedural Information
- Parties & Counsel:
  - Plaintiff: Singular Computing LLC (Delaware)
  - Defendant: Google LLC (Delaware)
  - Plaintiff’s Counsel: Prince Lobel Tye LLP
 
- Case Identification: 1:21-cv-12110, D. Mass., 03/10/2022
- Venue Allegations: Plaintiff alleges venue is proper because Google maintains regular and established places of business in the District of Massachusetts and has committed acts of patent infringement within the district.
- Core Dispute: Plaintiff alleges that Defendant’s Tensor Processing Units (TPUs) infringe patents related to a novel computer architecture that utilizes a large number of low-precision, high-dynamic-range processing elements to improve computational efficiency, particularly for artificial intelligence applications.
- Technical Context: The technology concerns specialized processors (ASICs) designed to accelerate machine learning workloads by trading unnecessary arithmetic precision for a massive increase in parallel processing capabilities, a critical factor in the performance of large-scale data centers.
- Key Procedural History: The complaint alleges a multi-year history of meetings and technology disclosures between Singular’s founder, Dr. Joseph Bates, and key Google AI personnel beginning in 2010. It further alleges that Google monitored Singular’s patent portfolio and was aware of the pending patent applications through its inter partes review (IPR) activities prior to the patents issuing.
Case Timeline
| Date | Event | 
|---|---|
| 2009-06-19 | Priority date for the '616 and '775 patents | 
| 2010-11-03 | Dr. Bates allegedly forwards technology document to Google | 
| 2013-09-17 | Dr. Bates allegedly meets with Google's Jeffrey Dean and Quoc Le | 
| 2017-01-01 | Approximate date Google housed accused TPUs in at least eight U.S. data centers | 
| 2017-02-02 | Dr. Bates allegedly presents and demonstrates patented technology at Google | 
| 2020-08-25 | U.S. Patent No. 10,754,616 issues | 
| 2020-10-30 | Google allegedly identifies the application for the '775 patent in an IPR petition | 
| 2021-11-09 | U.S. Patent No. 11,169,775 issues | 
| 2022-03-10 | Amended Complaint filed | 
II. Technology and Patent(s)-in-Suit Analysis
U.S. Patent No. 11,169,775 - "PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT"
- Patent Identification: U.S. Patent No. 11,169,775, titled “PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT,” issued November 9, 2021.
The Invention Explained
- Problem Addressed: The patent’s background section describes the inefficiency of conventional computer processors, which, despite containing billions of transistors, enable software to perform only a few high-precision operations per clock cycle, failing to utilize the hardware’s full theoretical computing power (Compl. ¶¶8-9; ’775 Patent, col. 1:31-44).
- The Patented Solution: The invention proposes a computer architecture featuring a massively parallel processor composed of numerous processing elements designed to perform arithmetic on numerical values of "low precision but high dynamic range" ('775 Patent, col. 2:5-7). By reducing the precision of the arithmetic, the processing elements can be made significantly smaller (i.e., using fewer transistors), allowing a much larger number of them to be integrated onto a single chip and operated in parallel, thereby dramatically increasing the total number of operations per second (Compl. ¶¶10-11).
- Technical Importance: The complaint presents this architectural approach as providing a revolutionary increase in computing efficiency, particularly for emerging applications like artificial intelligence that require massive computational power but can tolerate a degree of arithmetic imprecision (Compl. ¶¶10, 37).
Key Claims at a Glance
- Independent Claim 1 is asserted in the complaint (Compl. ¶26).
- Essential elements of Claim 1 include:
  - A computing system comprising a host computer and a computing chip.
  - The chip comprises a processing element array with distinct "edge" and "interior" processing elements, connections between them, an input-output unit, and local memory for each element.
  - The system includes four arithmetic units with multiplier circuits adapted for low-precision floating-point values (mantissa no more than 11 bits, exponent at least 6 bits).
  - The system also includes a fifth arithmetic unit with a multiplier circuit adapted for high-precision floating-point values (at least 32 bits wide).
  - A structural requirement that the multiplier circuit for the high-precision unit comprises more transistors than each of the low-precision multiplier circuits.
 
- The complaint does not explicitly reserve the right to assert dependent claims for the ’775 patent.
U.S. Patent No. 10,754,616 - "PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT"
- Patent Identification: U.S. Patent No. 10,754,616, titled “PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT,” issued August 25, 2020.
The Invention Explained
- Problem Addressed: The patent addresses the same problem of inefficient transistor utilization in conventional computer architectures as the ’775 Patent, with which it shares a specification (Compl. ¶¶8-9; ’616 Patent, col. 1:31-44).
- The Patented Solution: The solution is a processor architecture that leverages a large number of compact, low-precision, high-dynamic-range (LPHDR) arithmetic elements to achieve a higher density of computation per unit of resource (e.g., silicon area) compared to traditional designs (’616 Patent, col. 2:1-15).
- Technical Importance: The architecture is presented as a way to unlock performance gains that are unattainable with conventional CPUs, especially for computationally intensive tasks like AI and machine learning (Compl. ¶¶10, 37).
Key Claims at a Glance
- Claim 10 (rewritten in independent form based on claims 7 and 8) is asserted in the complaint (Compl. ¶25, n.1).
- Essential elements of Claim 10 include:
  - A computing system comprising a host computer and a computing chip.
  - The chip comprises a processing element array with at least 5000 "first processing elements," local memory units, and connections.
  - The arithmetic units within these first processing elements have multiplier circuits adapted for low-precision floating-point values (mantissa no more than 11 bits, exponent at least 6 bits).
  - The chip also comprises a plurality of "second processing elements" with multiplier circuits adapted for high-precision floating-point values (at least 32 bits wide).
  - A numerical requirement that the number of first processing elements is greater by at least 100 than the number of second processing elements.
  - A functional requirement that the host computer is programmed to use the array to perform an image identification operation.
 
- The complaint does not explicitly reserve the right to assert dependent claims for the ’616 Patent.
III. The Accused Instrumentality
Product Identification
- The accused products are Google’s Tensor Processing Units (“TPU”) versions v2, v3, and v4, also marketed as Cloud TPU (Compl. ¶16).
Functionality and Market Context
- The accused TPUs are described as "custom-designed machine learning ASIC[s]" that are deployed in Google’s data centers to power and accelerate AI-driven services such as Google Translate, Photos, Search, Assistant, and Gmail (Compl. ¶¶16, 41). The complaint alleges that a TPU system consists of a host computer (a Host VM CPU) connected to a "TPU board" containing multiple TPU chips, which in turn contain multiple "TPU cores" (Compl. ¶46). This architecture is depicted in a Google diagram showing the relationship between a user's virtual machine, the host, and the TPU board (Compl. p. 18). Each TPU core is alleged to contain a Matrix Multiply Unit (MXU) for performing low-precision matrix operations and a Vector Processing Unit (VPU) for other computations, including higher-precision operations (Compl. ¶¶48, 57).
- The complaint alleges that Google recognized that conventional computer architectures were a "scary and daunting" limitation for its AI services and that it incorporated the patented technology into its TPUs to increase computational efficiency and avoid having to double its number of data centers (Compl. ¶¶15-17).
IV. Analysis of Infringement Allegations
'775 Patent Infringement Allegations
| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation | 
|---|---|---|---|
| a processing element array comprising a first edge processing element positioned at a first edge of the processing element array... a first interior processing element positioned at a first location in the interior... | Each TPU chip comprises a processing element array, allegedly with left edge and interior processing elements, as depicted in figures from Google's own U.S. Patent No. 10,621,269, which is alleged to reflect the TPU architecture. | ¶49 | col. 29:3-13 | 
| wherein the first... arithmetic units each comprises a corresponding multiplier circuit adapted to receive... a first floating point value having a first binary mantissa of width no more than 11 bits and a first binary exponent of width at least 6 bits... | The MXU multiplier circuits in the TPU cores are adapted to receive inputs in the "bfloat16" format, which has a 7-bit mantissa and an 8-bit exponent, meeting the claim's precision requirements. A Google document illustrates this bfloat16 format (Compl. p. 29). | ¶¶54-56 | col. 5:6-9 | 
| wherein the fifth arithmetic unit comprises a corresponding multiplier circuit adapted to receive as inputs... two floating point values each of width at least 32 bits; | The Vector Processing Unit (VPU) in each TPU chip comprises a multiplier circuit adapted to receive 32-bit floating point values ("float32"). | ¶57 | col. 2:13-15 | 
| wherein the fifth plurality of transistors exceeds in number each of the first plurality of transistors... | The bfloat16 multiplier circuits (in the MXUs) require "so much less circuitry" and fewer transistors than the 32-bit floating point multiplier circuits (in the VPUs), as allegedly admitted by a Google engineer. | ¶58 | col. 6:50-54 | 
- Identified Points of Contention:
  - Scope Questions: A central question may be whether the "systolic array" architecture allegedly used in Google's TPUs (Compl. ¶¶49, 68) constitutes the claimed "processing element array" with distinct "edge" and "interior" elements. The analysis will depend on how these structural terms are construed in light of the patent's specification.
  - Technical Questions: The infringement theory maps the low-precision multiplier requirement to the TPU's MXU and the high-precision requirement to the VPU. A key question will be whether the complaint provides sufficient evidence that the VPU and its multiplier circuit function as the claimed "fifth arithmetic unit" and whether the relative transistor counts of the MXU and VPU multipliers satisfy the claim limitation.
 
'616 Patent Infringement Allegations
| Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation | 
|---|---|---|---|
| a processing element array comprising a plurality of first processing elements, wherein the plurality of first processing elements is no less than 5000 in number... | Each accused TPU processing element array comprises a 128x128 array of processing elements, for a total of 16,384 elements per array, which exceeds the 5000 required. | ¶68 | col. 28:36-39 | 
| wherein the plurality of arithmetic units each comprises a first corresponding multiplier circuit adapted to receive... a first floating point value having a first binary mantissa of width no more than 11 bits and a first binary exponent of width at least 6 bits... | The MXU arithmetic units in the "first processing elements" use the bfloat16 format, which employs a 7-bit mantissa and an 8-bit exponent, falling within the claimed parameters. | ¶¶75-77 | col. 5:6-9 | 
| wherein the computing chip further comprises a plurality of second processing elements, wherein the plurality of second processing elements each comprises a second corresponding multiplier circuit adapted to receive... two floating point values each of width at least 32 bits; | The accused TPU chips include VPUs, which are alleged to be the "second processing elements" and are adapted to handle 32-bit floating point computations. | ¶78 | col. 2:13-15 | 
| wherein the plurality of first processing elements is greater in number, by at least 100, than the plurality of second processing elements; | In each TPU chip, there are allegedly 16,384 MXU multiplier circuits (low-precision) for every 1,024 VPU multiplier circuits (high-precision), a ratio that exceeds the "at least 100 more" requirement. | ¶79 | col. 28:19-24 | 
| wherein said host computer is programmed to provide instructions... to perform an operation whose output is used to identify at least one image... that is similar to at least one input image. | The TPU host provides instructions to the TPU chip to perform image similarity analysis for services like Google Photos, Google Lens, and Google Images. | ¶81 | col. 17:1-11 | 
- Identified Points of Contention:
  - Scope Questions: As with the ’775 patent, a key issue will be mapping the components of Google's TPU architecture to the claimed elements. The dispute may focus on whether a TPU core's MXU constitutes a "first processing element" and its VPU constitutes a "second processing element" as those terms are understood in the patent.
  - Technical Questions: What evidence does the complaint provide to support the specific numerical counts of "first" and "second" processing elements? The complaint relies on Google's own publications and patents for these figures, but the precise definition and demarcation of these elements within the accused chip will be a central technical question.
 
V. Key Claim Terms for Construction
- The Term: "processing element"
  - Context and Importance: This term is the fundamental building block of the claimed invention. The claims require specific counts, types (e.g., "first" and "second"), and locations (e.g., "edge" and "interior") of these elements. The infringement case depends entirely on mapping sub-components of Google's TPU cores (such as the MXU and VPU) to this term. Practitioners may focus on this term because its construction will determine whether Google's architecture can even be compared to the claim structure.
  - Intrinsic Evidence for Interpretation:
    - Evidence for a Broader Interpretation: The specification suggests a processing element is a generic execution unit, stating that references to "processing elements" should be understood "more generally as any kind of execution unit" (’616 Patent, col. 8:8-10). This could support an argument that different functional units within a TPU core can be considered distinct "processing elements."
    - Evidence for a Narrower Interpretation: The detailed description shows a specific embodiment of a processing element (PE 400) containing registers, a logic unit, and an LPHDR arithmetic unit as a single, integrated component (’616 Patent, FIG. 4; col. 10:37-39). This could support a narrower definition, potentially requiring that the MXU and VPU be part of a single, unified "processing element" rather than being separate elements.
 
 
- The Term: "low precision"
  - Context and Importance: This term defines the core technical tradeoff of the invention. The claims provide specific numerical bounds (mantissa ≤ 11 bits). Google's accused "bfloat16" format appears to meet this literal definition. However, the patent also provides a broader, functional definition that could become a point of contention.
  - Intrinsic Evidence for Interpretation:
    - Evidence for a Broader Interpretation: The summary of the invention defines "low precision" functionally as performing "arithmetic operations which produce results that frequently differ from exact results by at least 0.1%" (’616 Patent, col. 2:16-19). Plaintiff may argue this functional definition is controlling, and that the specific bit-widths are merely exemplary.
    - Evidence for a Narrower Interpretation: A defendant could argue that the meaning is constrained by the more specific language in the claims themselves (e.g., "mantissa of width no more than 11 bits") and the primary embodiment described, which uses a logarithmic number system with a 6-bit fraction (’616 Patent, col. 12:7-9).
 
 
VI. Other Allegations
- Indirect Infringement: The complaint alleges Google induces infringement by offering its reverse image technology, performed by the accused TPUs, to third parties such as The New York Times and Box, with instructions on how to use the technology (Compl. ¶81).
- Willful Infringement: The complaint makes extensive allegations of willful infringement. It claims that Google had pre-suit knowledge of the technology through a long series of confidential meetings and disclosures with the inventor, Dr. Bates, starting in 2010 and involving key Google AI personnel like Jeffrey Dean (Compl. ¶¶83-103). The complaint includes a visual comparison of presentation slides allegedly shown by Dr. Bates to Google with slides from a later Google presentation by Jeffrey Dean, suggesting Google adopted Singular's concepts (Compl. pp. 56-58). It is also alleged that Google knew of the specific patent applications through its IPR monitoring activities before the patents issued (Compl. ¶¶60, 105).
VII. Analyst’s Conclusion: Key Questions for the Case
- A core issue will be one of architectural mapping: Can the distinct functional units within Google's TPU cores—specifically the Matrix Multiply Unit (MXU) and the Vector Processing Unit (VPU)—be mapped directly onto the patent's claimed "first processing elements" (low-precision) and "second processing elements" (high-precision), or is there a fundamental structural mismatch between the accused product and the claimed invention?
- The case will likely involve a significant question of claim construction: How broadly will the term "processing element" be defined? Will it be construed as a general-purpose execution unit, allowing different parts of a TPU to be counted as separate elements, or as a more specific, integrated structure as depicted in the patent’s embodiments, potentially complicating the infringement read?
- A key factual dispute for the jury will be one of copying versus independent invention: Given the detailed allegations of extensive pre-infringement disclosures by the inventor to key Google AI leaders, what evidence will determine whether Google's adoption of a low-precision, massively parallel architecture for its TPUs was the result of independent development or was derived from the technology disclosed by Singular?