DCT

1:23-cv-00816

FriendliAI Inc. v. Hugging Face, Inc.

I. Executive Summary and Procedural Information

  • Parties: Plaintiff FriendliAI Inc.; Defendant Hugging Face, Inc.
  • Case Identification: 1:23-cv-00816, D. Del., 07/28/2023
  • Venue Allegations: Venue is based on Defendant being a Delaware corporation, which establishes residency in the district for patent venue purposes.
  • Core Dispute: Plaintiff alleges that Defendant’s Text Generation Inference (TGI) service, which is used for serving large language models, infringes a patent related to dynamic or continuous batching of user requests to improve efficiency.
  • Technical Context: The technology addresses performance bottlenecks in serving large-scale generative AI models, where efficiently managing and scheduling simultaneous user requests is critical for reducing latency and increasing throughput.
  • Key Procedural History: The complaint alleges that the technology was first publicly disclosed in a July 2022 academic paper ("Orca" paper) co-authored by the patent's inventors. Plaintiff also alleges it sent a notice letter to Defendant regarding the infringement on July 21, 2023, one week before filing the complaint.

Case Timeline

| Date | Event |
| --- | --- |
| 2021-12-03 | ’775 Patent priority date |
| 2022-07-XX | "Orca" paper describing the patented technology published |
| 2022-09-13 | ’775 Patent issued |
| 2023-02-XX | Accused product (Text Generation Inference) launched |
| 2023-07-21 | Plaintiff’s counsel sent notice letter to Defendant |
| 2023-07-28 | Complaint filed |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 11,442,775 - "Dynamic Batching for Inference System for Transformer-Based Generation Tasks", issued September 13, 2022

The Invention Explained

  • Problem Addressed: The patent describes inefficiencies in prior art AI inference systems that use "static batching." In such systems, multiple user requests are grouped into a fixed batch for processing. This method suffers from high latency because a request that finishes processing early must wait for the slowest request in the batch to complete before its result can be returned. Furthermore, a new request arriving while a batch is in progress must wait until the entire current batch is finished, leading to under-utilization of computational resources (Compl. ¶12; ’775 Patent, col. 1:40-54).
  • The Patented Solution: The patent proposes a system of "iteration-level scheduling," also referred to as dynamic or continuous batching. The core idea is to allow the system to flexibly modify a batch during its execution cycle. The system can add new incoming requests to a running batch or remove requests that have already completed (Compl. ¶14; ’775 Patent, col. 2:62-65). This is accomplished by scheduling a new batch that includes ongoing requests and a new request, responsive to determining that the execution engine has available memory, thereby allowing for more efficient use of hardware accelerators and faster response times for users (’775 Patent, col. 3:2-24). A schematic sketch of this scheduling loop follows this list.
  • Technical Importance: This approach allows for a significant increase in throughput and decrease in latency when serving large generative AI models, which are computationally intensive and sensitive to processing delays (Compl. ¶14).
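To make the claimed mechanism concrete, the following minimal Python sketch illustrates iteration-level scheduling: the batch is revised between decode iterations rather than between whole batches. It is an illustration only, not code from the ’775 Patent or from TGI; the Request class and the engine.step, engine.is_eos, and max_batch names are hypothetical stand-ins for an execution engine's decode loop.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: list[int]                      # input sequence; lengths may differ per request
    output_tokens: list[int] = field(default_factory=list)
    done: bool = False

def iteration_level_schedule(pending: deque, engine, max_batch: int) -> None:
    """Revise the batch between decode iterations instead of between whole batches."""
    batch: list[Request] = []
    while pending or batch:
        # Completed requests leave immediately; they never wait on slower peers.
        batch = [r for r in batch if not r.done]
        # New arrivals join mid-stream while capacity remains.
        while pending and len(batch) < max_batch:
            batch.append(pending.popleft())
        # One iteration: engine.step() yields one new token per active request
        # (engine.step and engine.is_eos are hypothetical stand-ins).
        for req, token in zip(batch, engine.step(batch)):
            req.output_tokens.append(token)
            req.done = engine.is_eos(token)
```

Contrast this with static batching, where the filter-and-admit steps inside the loop do not exist: the batch is fixed at launch and every member waits for the slowest request to finish.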

Key Claims at a Glance

  • The complaint asserts infringement of one or more claims, with a focus on independent claim 10 (Compl. ¶42).
  • The essential elements of independent claim 10 include the following (a step-by-step sketch follows this list):
    • Receiving one or more initial requests for execution by a serving system.
    • Scheduling a first batch of these requests on an execution engine.
    • Generating a first set of output tokens for this batch.
    • Receiving a new request from a client device.
    • Scheduling a second batch that includes the new request, where this scheduling is "responsive to determining that the execution engine has memory available."
    • The new request has an input token sequence with a length different from at least one other request in the batch.
    • Generating a second set of output tokens for the second batch.
  • The complaint reserves the right to assert other claims of the ’775 Patent (Compl. ¶41).
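Read as control flow, the elements describe an ordered sequence. The sketch below maps them onto hypothetical Python under stated assumptions: scheduler.make_batch, engine.generate, engine.memory_available_for, and the done attribute are invented names for illustration, not the patent's or TGI's actual API.

```python
def serve_claim_steps(engine, scheduler, initial_requests, new_request):
    """Illustrative walk-through of claim 10's ordered steps (hypothetical API)."""
    # Receive initial requests and schedule a first batch on the execution engine.
    first_batch = scheduler.make_batch(initial_requests)
    # Generate a first set of output tokens for that batch.
    first_outputs = engine.generate(first_batch)
    # A new request arrives from a client; admit it only on a memory determination.
    if engine.memory_available_for(new_request):
        ongoing = [r for r in first_batch if not r.done]
        # The second batch includes the new request, whose input token sequence
        # may differ in length from the others' (the claim's "wherein" clause).
        second_batch = scheduler.make_batch(ongoing + [new_request])
        return engine.generate(second_batch)  # second set of output tokens
    return first_outputs
```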

III. The Accused Instrumentality

Product Identification

  • The primary accused instrumentality is Defendant’s "Text Generation Inference" ("TGI") software, which functions as an inference server for Large Language Models ("LLMs") (Compl. ¶26). The complaint also names services that incorporate TGI, including "Spaces, Inference Endpoints, Enterprise Hub... HuggingChat, OpenAssistant, and Docker Hub containers" (Compl. ¶41).

Functionality and Market Context

  • TGI is alleged to possess an "important feature" described by Defendant as "continuous batching" or "dynamic batching" of incoming requests (Compl. ¶28). This feature is marketed as enabling "increased total throughput" and providing an optimal balance between "exploiting the hardware and perceived latency" (Compl. ¶30). A screenshot from Defendant's documentation lists "Continuous batching of incoming requests for increased total throughput" as a key feature of TGI (Compl. ¶42, Ex. 10). Another visual from Defendant's blog describes its "Inference Endpoints" as "Optimized for LLMs, enabling high throughput... and low latency... power[ed] by Text Generation Inference" (Compl. ¶31, Ex. 11).
  • The complaint alleges TGI is used in production at Hugging Face to power its services and is also made available for customers to download and use on their own systems via Docker Hub (Compl. ¶13, ¶44).

IV. Analysis of Infringement Allegations

’775 Patent Infringement Allegations

| Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
| --- | --- | --- | --- |
| A non-transitory computer-readable storage medium storing computer program instructions... | Defendant provides the TGI software for download, including via Docker Hub containers. | ¶¶41, 44 | col. 29:1-5 |
| receiving, by a serving system, one or more requests for execution... | TGI is an inference server that receives requests to run on LLMs. | ¶¶26, 42 | col. 25:11-13 |
| scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine; | TGI's "continuous batching" feature schedules a batch of requests for execution. | ¶¶28-29, 42 | col. 25:57-61 |
| generating, by the execution engine, a first set of output tokens... | TGI generates output tokens by applying the transformer model to the batch. | ¶42 | col. 26:1-4 |
| receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens; | The "continuous batching" feature is specifically designed to handle "incoming requests" during processing. | ¶¶28, 42 | col. 26:33-35 |
| scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available... | TGI is alleged to add new requests to a running batch, which is said to constitute scheduling a second, modified batch. | ¶¶29, 42 | col. 26:36-44 |
| wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; | The complaint alleges that TGI handles real-world requests, which inherently vary in length, satisfying this limitation. | ¶¶12, 42 | col. 26:29-32 |
| generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch. | TGI generates a second set of output tokens from the modified batch. | ¶42 | col. 26:39-44 |

Identified Points of Contention

  • Technical Question: The complaint alleges that TGI schedules a second batch, but it provides limited detail on the mechanism by which this occurs. A key question will be whether TGI’s process for adding a new request aligns with the claim requirement of doing so "responsive to determining that the execution engine has memory available." The evidence required to prove this specific conditional logic will be a central point of discovery. A sketch of one common form such a check can take follows this list.
  • Scope Questions: Claim 10 is directed to a "non-transitory computer-readable storage medium." The complaint alleges infringement by Defendant's use of TGI on its own servers and by its distribution of TGI via Docker. These different activities may implicate different infringement theories (direct infringement vs. inducement) and require different forms of proof.
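For orientation on what such a determination could look like, LLM inference servers commonly account for key-value (KV) cache capacity before admitting a request. The sketch below is an assumption for illustration, not a representation of TGI's internals; the model dimensions and function names are hypothetical.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    # Both keys and values are cached at every layer, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# Illustrative numbers for a 7B-class model in fp16 (assumed for this example):
PER_TOKEN = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32,
                                     head_dim=128, dtype_bytes=2)  # 524,288 B/token

def admits(free_bytes: int, prompt_len: int, max_new_tokens: int) -> bool:
    """Admit a new request only if its worst-case KV footprint fits in free memory."""
    return free_bytes >= PER_TOKEN * (prompt_len + max_new_tokens)
```

Whether TGI performs a check of this kind, some other resource test, or none at all is precisely the factual question flagged above.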

V. Key Claim Terms for Construction

The Term: "scheduling... a second batch of requests additionally including the new request"

  • Context and Importance: This phrase is central to the concept of "dynamic" batching. The dispute will likely focus on whether this requires the creation of a new, discrete data structure for the "second batch" or whether it can be read more functionally to cover any process of injecting a new request into an ongoing computational stream. The two readings are contrasted in the sketch following this list.
  • Intrinsic Evidence for a Broader Interpretation: The patent specification describes the invention in flexible terms, stating that "at one or more iterations, the inference system can modify the batch being executed on the execution engine by adding new incoming requests to the batch" (’775 Patent, col. 2:62-64). This language may support a more functional interpretation focused on modification rather than discrete creation.
  • Intrinsic Evidence for a Narrower Interpretation: The claim's sequential language—"scheduling, by the scheduler, a batch" followed by "scheduling, by the scheduler, a second batch"—could be argued to imply two distinct and separate scheduling events and batch objects. The flowchart in Figure 7 depicts "Schedule a batch of requests" (712) and "Schedule a second batch of requests" (718) as separate steps, which may support a narrower interpretation (’775 Patent, Fig. 7).
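The competing readings can be stated compactly in code. In the hypothetical Python below, Variant A creates a distinct second batch object through a separate scheduling event (the narrower reading), while Variant B mutates the in-flight batch so that no discrete second object ever exists (the broader reading). Neither variant is drawn from the record; scheduler.new_batch is an invented name.

```python
# Variant A (narrower reading): a separate scheduling event produces a
# distinct second batch object.
def schedule_discrete(scheduler, running_batch, new_request):
    return scheduler.new_batch(list(running_batch) + [new_request])

# Variant B (broader reading): the running batch is modified in place;
# no new batch object is ever created.
def schedule_in_place(running_batch, new_request):
    running_batch.append(new_request)
    return running_batch
```

If the court's construction requires a discrete second batch, evidence that the accused system behaves like Variant B could support a non-infringement position.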

The Term: "responsive to determining that the execution engine has memory available"

  • Context and Importance: This term provides the condition precedent for adding a new request to a batch. Its construction will determine what type of check or determination Plaintiff must prove Defendant's system performs. Practitioners may focus on this term because it links the scheduling action to a specific system state.
  • Intrinsic Evidence for a Broader Interpretation: The claim language is not explicitly limited to a specific type of memory (e.g., GPU memory, cache memory, RAM). This could support an argument that any check for sufficient computational resources to handle a new request meets the limitation.
  • Intrinsic Evidence for a Narrower Interpretation: The specification provides a more specific context, describing how an execution engine can "free the allocated cache memory for the completed requests, such that the freed memory can be used for other requests" (’775 Patent, col. 20:23-25). A defendant may argue that the term should be limited to this specific context of freeing and reallocating internal state cache memory, as opposed to a more general system memory check. This narrower reading is illustrated in the sketch below.
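A toy sketch of that narrower reading, in which cache blocks freed by completed requests are what make room for new ones; the BlockPool class and its methods are hypothetical and illustrate the specification's free-and-reuse description, not TGI's allocator.

```python
class BlockPool:
    """Toy free-list allocator: blocks released by finished requests are reused."""
    def __init__(self, n_blocks: int):
        self.free = list(range(n_blocks))
        self.owned: dict[int, list[int]] = {}  # request id -> held block ids

    def alloc(self, req_id: int, n_blocks: int) -> bool:
        if len(self.free) < n_blocks:
            return False  # no memory available: the new request must wait
        self.owned[req_id] = [self.free.pop() for _ in range(n_blocks)]
        return True

    def release(self, req_id: int) -> None:
        # Freeing a completed request's cache makes its blocks available
        # to newly arriving requests (cf. ’775 Patent, col. 20:23-25).
        self.free.extend(self.owned.pop(req_id))
```

Under this narrower reading, "memory available" would mean capacity in such a pool of reusable cache memory; under the broader reading, any sufficiency check on computational resources might satisfy the limitation.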

VI. Other Allegations

  • Indirect Infringement: The complaint alleges induced infringement under 35 U.S.C. § 271(b). The factual basis is Defendant's alleged provision of the accused TGI software (e.g., via Docker Hub) along with "supporting materials, documentation, instructions, code and/or technical information" that allegedly instruct and encourage customers to use the software in an infringing manner (Compl. ¶¶45-46).
  • Willful Infringement: The complaint alleges willful infringement based on both pre- and post-suit knowledge. The pre-suit allegation is grounded in Defendant's alleged awareness of, and intent to copy, the technology described in the inventors' July 2022 "Orca" paper, which allegedly discloses the patented invention (Compl. ¶¶35, 49). The post-suit allegation is based on Defendant's continued infringement after receiving a notice letter from Plaintiff's counsel on July 21, 2023 (Compl. ¶¶38, 49).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A key evidentiary question will be one of operational correspondence: Can Plaintiff demonstrate that Defendant's "continuous batching" in TGI performs the specific steps recited in Claim 10, particularly the act of scheduling a new request responsive to determining that the execution engine has memory available? The case will likely require a deep technical dive into the source code and operational logic of TGI.
  • A central dispute will be one of knowledge and copying: The willfulness claim hinges on Plaintiff's ability to prove its allegation that Defendant "copied the technology described in the Orca paper." This transforms the case from a standard infringement dispute into one with allegations of deliberate misconduct, raising the stakes on damages through the possibility of enhancement under 35 U.S.C. § 284.
  • The outcome may also turn on a question of claim construction: How the court construes the term "scheduling... a second batch" will be critical. A narrow, structural definition could create a path for non-infringement, whereas a broader, functional definition that covers any modification of an in-flight batch would likely favor the Plaintiff.