4:23-cv-00520

Language Tech Inc v. Microsoft Corp

Amended Complaint

I. Executive Summary and Procedural Information

Parties & Counsel:
- Plaintiff: Language Technologies, Inc. ("LTI") (Delaware)
- Defendant: Microsoft Corporation (Washington)
- Plaintiff’s Counsel: McKool Smith, P.C.; Farhang & Medcoff
Case Identification: 4:23-cv-00520, D. Ariz., 04/26/2024
Venue Allegations: Venue is alleged based on Microsoft having regular and established places of business in the District of Arizona, including its "West US 3" datacenter region, and committing acts of infringement within the district by offering and providing accused products and services.
Core Dispute: Plaintiff alleges that Defendant’s Bling FIRE tokenizer, used in Bing search and other natural language processing services, infringes patents related to computerized methods for identifying phrase boundaries in text.
Technical Context: The technology concerns automated text analysis, specifically methods for parsing text into linguistic units like phrases to improve readability, comprehension, and the performance of applications such as search engines and closed-captioning systems.
Key Procedural History: This First Amended Complaint was filed following a court order on a motion to dismiss the original complaint. Plaintiff dedicates significant argument to patent eligibility under 35 U.S.C. § 101, leveraging the prosecution histories of the patents-in-suit to argue that the claimed methods represent a specific, non-abstract improvement over prior art and are not merely directed to the abstract idea of analyzing text.

Case Timeline

Date	Event
1999-07-16	Common Priority Date for Patents-in-Suit
2006-06-27	U.S. Patent No. 7,069,508 Issues
2008-03-18	U.S. Patent No. 7,346,489 Issues
2010-10-28	Plaintiff alleges initial contact with Microsoft researchers regarding the technology
2018-05-01	Plaintiff alleges providing Microsoft with information on all its patents, including the Patents-in-Suit
2019-04-25	Microsoft announces open-source release of the accused Bling FIRE tokenizer
2024-04-26	First Amended Complaint Filing Date

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 7,069,508 - "System and Method for Formatting Text According to Linguistic, Visual and Psychological Variables," issued June 27, 2006

The Invention Explained

Problem Addressed: The patent’s background section identifies the challenge of formatting text for optimal reading speed and comprehension. It notes that while linguistic research shows the "phrase" is a key unit in comprehension, conventional computerized text presentation, such as in closed-captioning, often presents words "without being grouped in a manner which would assist their comprehension" ('508 Patent, col. 1:27-36, col. 7:35-37).
The Patented Solution: The invention proposes a computerized system that analyzes text to predict phrase boundaries and then formats the text based on those predictions. It uses a "library" containing data on "key words and punctuation" that signal the beginning or end of a phrase. A "readability engine," described as a neural network, uses this library to analyze sequences of words and assign values indicating the likelihood of a phrase break, which then guides the final text formatting ('508 Patent, Abstract; col. 2:12-24; col. 3:14-24).
Technical Importance: The method aims to improve the functionality of computerized text-display systems by presenting text in a way that aligns with natural linguistic and psychological processing, thereby enhancing readability ('508 Patent, col. 2:10-11).

Key Claims at a Glance

The complaint asserts at least independent claim 23 (Compl. ¶117).
Claim 23 Essential Elements:
- A computer-implemented method for formatting text, comprising the steps of:
- providing text input;
- providing a library of key words and punctuation definitions that identify the beginning or end of a phrase;
- using said key words and punctuation definitions to determine characteristics that predict boundary punctuation;
- examining a plurality of words of said text input;
- using said determined characteristics to predict phrase boundaries within said plurality of words;
- repeating the examining and predicting steps for subsequent pluralities of words; and
- formatting the text input according to the predicted phrase boundaries.
The complaint reserves the right to assert additional claims (Compl. ¶117).
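For illustration only, the sequence of steps recited in claim 23 can be sketched in Python. This is a minimal sketch under stated assumptions, not a construction of the claim, not the patentee's implementation, and not a description of Bling FIRE. The word list, the punctuation set, the pairwise examination window, and the function names `predict_boundaries` and `format_text` are all hypothetical.

```python
import string

# Hypothetical "library": words assumed to begin a phrase, and punctuation
# assumed to end one. The entries are illustrative, not taken from the patent.
PHRASE_STARTERS = {"the", "a", "an", "of", "in", "on", "and", "but", "that", "which"}
BOUNDARY_PUNCTUATION = {",", ";", ":", ".", "?", "!"}


def predict_boundaries(words):
    """Return each index i where a phrase break is predicted after words[i]."""
    breaks = []
    # Examine one pair of adjacent words at a time, then repeat for the next pair.
    for i in range(len(words) - 1):
        current, following = words[i], words[i + 1]
        ends_with_punct = current[-1] in BOUNDARY_PUNCTUATION
        current_is_starter = current.lower() in PHRASE_STARTERS
        next_is_starter = following.lower().strip(string.punctuation) in PHRASE_STARTERS
        # Assumed rule: break after boundary punctuation, or before a starter
        # word unless the current word is itself a starter ("on the ...").
        if ends_with_punct or (next_is_starter and not current_is_starter):
            breaks.append(i)
    return breaks


def format_text(text):
    """Format the input by placing each predicted phrase on its own line."""
    words = text.split()
    breaks = set(predict_boundaries(words))
    lines, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i in breaks:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_text("The cat sat on the mat, and the dog barked."))
```

Under these assumed rules, the sample sentence is split after "sat" and after "mat," so that each predicted phrase occupies its own line. Whether any real system performs comparable steps is the factual question the complaint raises, and the sketch takes no position on it.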

U.S. Patent No. 7,346,489 - "System and Method of Determining Phrasing in Text," issued March 18, 2008

The Invention Explained

Problem Addressed: As a continuation of the '508 Patent, the '489 Patent addresses the same problem of automatically processing text to account for its linguistic phrase structure, which impacts reading comprehension and the performance of text-based technologies ('489 Patent, col. 1:33-40).
The Patented Solution: The invention describes a method for determining phrasing by using a library of "key words and punctuation definitions" to identify characteristics that predict phrase or sentence boundaries. The system examines pluralities of words and uses the determined characteristics to predict where phrase boundaries occur in the text ('489 Patent, Abstract). The complaint highlights that this technology has direct application to "tokenization" in Natural Language Processing (NLP) for technologies like internet search engines (Compl. ¶¶103-104). The complaint references a diagram from the CTA-708 closed captioning standard to provide an example of the type of computerized text processing system the invention improves (Compl. ¶55, p. 15).
Technical Importance: This automated phrase prediction is presented as a technological improvement that enables more accurate text parsing, which can enhance search result relevance by analyzing text at the phrase or sentence level rather than just by individual word frequency ('489 Patent, col. 2:20-24; Compl. ¶104).

Key Claims at a Glance

The complaint asserts at least independent claim 1 and dependent claim 3 (Compl. ¶¶113, 125).
Claim 1 Essential Elements:
- A method for determining phrasing in text, comprising the steps of:
- providing text input;
- providing a library of key words and punctuation definitions that identify the beginning or end of a phrase;
- using said key words and punctuation definitions to determine characteristics that predict phrase or sentence boundaries;
- examining a plurality of words of said text input;
- using said determined characteristics to predict phrase boundaries within said plurality of words; and
- repeating the examining and predicting steps until phrase boundaries are predicted for each between-word space.
The complaint reserves the right to assert additional claims (Compl. ¶125).
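The final limitation of claim 1, which requires a prediction for each between-word space, can be illustrated with a short Python sketch. It is offered only as an aid to reading the claim language and rests on several assumptions: the numeric weights, the threshold, and the names `score_gaps` and `predict_each_gap` are hypothetical, and the use of scores loosely borrows from the summary of the '508 Patent's likelihood values rather than from anything claim 1 itself requires.

```python
# Hypothetical library: function words assumed to begin a phrase and
# punctuation marks assumed to end one. All weights are illustrative only.
FUNCTION_WORDS = {"the": 0.6, "a": 0.6, "of": 0.7, "in": 0.7, "and": 0.8, "that": 0.8}
PUNCTUATION = {",": 0.9, ";": 1.0, ".": 1.0}
THRESHOLD = 0.5


def score_gaps(words):
    """Return one (left_word, right_word, score) tuple per between-word space."""
    results = []
    for left, right in zip(words, words[1:]):
        # Punctuation at the end of the left word suggests a phrase ending.
        score = PUNCTUATION.get(left[-1], 0.0)
        # A function word on the right suggests a phrase beginning, unless the
        # left word is itself a function word (assumed rule).
        if left.lower() not in FUNCTION_WORDS:
            right_key = right.lower().rstrip(",;.")
            score = max(score, FUNCTION_WORDS.get(right_key, 0.0))
        results.append((left, right, score))
    return results


def predict_each_gap(words):
    """Return a boolean boundary prediction for every between-word space."""
    return [score >= THRESHOLD for _, _, score in score_gaps(words)]


if __name__ == "__main__":
    sample = "She said that the report, of course, was late.".split()
    for (left, right, score), is_break in zip(score_gaps(sample), predict_each_gap(sample)):
        print(f"{left!r} | {right!r}: score={score:.1f} break={is_break}")
```

The point of the sketch is structural: a sentence of n words has n - 1 between-word spaces, and the loop ends only when every one of them has received a prediction, whether positive or negative.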

III. The Accused Instrumentality

Product Identification: The accused instrumentality is Microsoft’s "Bling FIRE tokenizer," which the complaint alleges is used internally by the Bing search engine and is incorporated into other Microsoft NLP products and services, including Azure Cognitive Services (Search, Dictate, AI Language, AI Speech), Language Understanding (LUIS), Cortana, and Translate (Compl. ¶¶111, 114).
Functionality and Market Context: The complaint alleges that Bling FIRE is a tokenizer, defined as a system for "splitting text into constituent elements, such as sentences, phrases, and words" (Compl. ¶111). This functionality is described as foundational for Microsoft’s "Deep Learning based projects" (Compl. ¶111, n.17). The complaint alleges Microsoft began using the tokenizer in Bing search before its public, open-source release announcement on April 25, 2019, and that its use extends to the Bing Chat feature in the Microsoft Edge browser (Compl. ¶¶111, 114).

IV. Analysis of Infringement Allegations

The complaint alleges that the Bling FIRE tokenizer performs the steps of the asserted claims but refers to external exhibits (Exhibits 7 and 8), which are not included in the provided document, for a limitation-by-limitation explanation (Compl. ¶¶112-113). The following analysis is based on the narrative allegations in the complaint body.

'508 Patent Infringement Allegations

Claim Element (from Independent Claim 23)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
A computer-implemented method for formatting text	The complaint alleges Bling FIRE is a "computerized text processing method" implemented in software.	¶111, ¶114	col. 7:48-52
providing a library of key words and punctuation definitions that identify the beginning or end of a phrase	The complaint alleges Bling FIRE is a "tokenizer" that splits text into phrases, which suggests the use of some form of linguistic rules or models, consistent with a library.	¶111, ¶114	col. 3:14-19
using said key words and punctuation definitions to determine characteristics that predict boundary punctuation	The complaint alleges Bling FIRE performs tokenization to identify phrases, which necessarily involves determining characteristics that predict boundaries.	¶111	col. 9:39-42
examining a plurality of words of said text input	As a text processing tokenizer, Bling FIRE necessarily examines input text.	¶111	col. 9:43-44
using said determined characteristics to predict phrase boundaries within said plurality of words	This is the core alleged function of the Bling FIRE tokenizer: splitting text into phrases based on its analysis.	¶111	col. 9:45-47
formatting said text input according to the predicted phrase boundaries	The complaint alleges the tokenization is used for applications like search, where the output (ranked results) is formatted based on the phrasal analysis.	¶104	col. 9:50-52

'489 Patent Infringement Allegations

Claim Element (from Independent Claim 1)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
A method for determining phrasing in text	The complaint alleges the Bling FIRE tokenizer performs the function of "determining phrasing in text" through its tokenization process.	¶111	col. 7:15-16
providing a library of key words and punctuation definitions that identify the beginning or end of a phrase	The complaint alleges Bling FIRE's function is to split text into phrases, implying the use of linguistic definitions consistent with the claimed library.	¶111	col. 7:22-25
using said key words and punctuation definitions to determine characteristics that predict phrase or sentence boundaries	The complaint alleges Bling FIRE's purpose is to identify constituent elements like phrases and sentences, which requires determining characteristics that predict their boundaries.	¶111	col. 8:26-29
examining a plurality of words of said text input	The Bling FIRE tokenizer is alleged to process input text.	¶111	col. 8:30-31
using said determined characteristics to predict phrase boundaries within said plurality of words	This is the core alleged function of the Bling FIRE tokenizer.	¶111	col. 8:32-34

Identified Points of Contention:
- Technical Questions: A central question will be what technical method the Bling FIRE tokenizer actually uses. Does its process for "splitting text" map onto the specific steps of the asserted claims, particularly the use of a "library of key words and punctuation definitions" to "determine characteristics" that then "predict phrase boundaries"? The complaint argues this on a functional level but does not provide specific evidence of Bling FIRE's internal architecture.
- Scope Questions: The dispute will likely involve whether the general function of "tokenization" as performed by Bling FIRE is the same as the specific method claimed in the patents. The complaint heavily relies on prosecution history to argue the claims are narrow and specific, citing the applicant's distinction over the Walker prior art, which used "folding rules" and word "attributes" rather than a library of "function words" (Compl. ¶¶83-84, 101). The complaint includes a figure from Walker to illustrate its different "cascading text" method (Compl. ¶79, p. 21).

V. Key Claim Terms for Construction

The complaint focuses extensively on the proper construction of "key words" as a critical issue distinguishing the patented invention from both the prior art and abstract ideas.

The Term: "key words"
Context and Importance: The definition of this term appears central to both the infringement and validity (specifically, patent eligibility) analyses. The complaint notes that in a prior order, the Court adopted a plain meaning of "well-established vocabulary" for "key words" (Compl. ¶32). LTI argues this is incorrect and that the term has a specific technical meaning within the patent. Practitioners may focus on this term because its construction could determine whether the claims are directed to a specific, concrete technical method (as LTI argues) or a more general, abstract concept of analyzing text using vocabulary (as Microsoft appears to argue).
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The complaint itself does not present evidence for a broad interpretation but notes the Court previously construed the term to mean "well-established vocabulary in the English language" in the absence of a more specific definition (Compl. ¶32). A party arguing for a broader view might point to the term's general usage and the lack of an explicit definition in a dedicated lexicography section of the patent.
- Evidence for a Narrower Interpretation: The complaint provides substantial argument for a narrower construction, equating "key words" with "function words." It cites the specification's description of the "library 25" as having an "installed vocabulary of function words and punctuation data" ('508 Patent, col. 4:5-7; Compl. ¶34). It also points to the description of the "Clauseau engine" logic, which, in the absence of punctuation, "looks for an article or stored function word indicating the beginning or end of a phrase" ('508 Patent, col. 4:29-33; Compl. ¶35).

VI. Other Allegations

Indirect Infringement: The complaint alleges induced infringement under 35 U.S.C. § 271(b), stating that Microsoft actively encourages use of the infringing tokenizer by providing "functionality, instructions, training modules, and other assistance" to its users and customers (Compl. ¶¶119, 127). It also alleges contributory infringement under § 271(c), asserting the Bling FIRE tokenizer is not a staple article of commerce suitable for substantial non-infringing use (Compl. ¶¶120, 128).
Willful Infringement: Willfulness is alleged based on pre-suit knowledge of the patents. The complaint claims Microsoft had knowledge of LTI's technology since at least 2010 through communications with researchers and specific knowledge of the patents-in-suit since at least May 2018, when a consulting firm allegedly provided a slide deck with patent numbers to Microsoft executives (Compl. ¶¶107-110, 118, 126).

VII. Analyst’s Conclusion: Key Questions for the Case

Claim Construction & Eligibility: The case appears to hinge on a foundational question of definitional scope and its impact on patent eligibility: Will the term "key words" be construed narrowly, as LTI argues, to mean the specific class of "function words" taught in the specification for identifying phrase boundaries? A narrow construction may support LTI's argument that the claims recite a specific, concrete solution to a technical problem (improving computerized tokenization), thereby surviving an eligibility challenge under 35 U.S.C. § 101. A broader construction may expose the claims to arguments that they are directed to an abstract idea.
Prosecution History Estoppel & Inventive Concept: A second key issue will be the role of the prosecution history. LTI heavily relies on its arguments distinguishing the claims from the Walker prior art to assert an "inventive concept." The court will need to determine if the claimed combination (using a library of function words and punctuation to determine characteristics and predict boundaries) is a non-routine, unconventional improvement over prior art text processing methods, or if these are merely conventional steps applied in a predictable way.
Evidentiary Proof of Infringement: Finally, a central evidentiary question will be one of technical mechanism: Assuming a favorable claim construction for LTI, what evidence can be produced to show that Microsoft's Bling FIRE tokenizer operates according to the specific sequence of steps recited in the claims, as opposed to other known methods of NLP tokenization that may achieve a similar result through a different process?