DCT

1:19-cv-11438

Nuance Communications Inc v. Omilia Natural Language Solutions Ltd

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 1:19-cv-11438, D. Mass., 06/08/2020
  • Venue Allegations: Venue is alleged to be proper because Defendant is subject to personal jurisdiction in Massachusetts, where it maintains an office and customers, and because a substantial part of the alleged infringing acts occurred in the district, where Plaintiff resides.
  • Core Dispute: Plaintiff alleges that Defendant’s conversational artificial intelligence and automatic speech recognition platform infringes eight patents related to speech recognition, interactive voice response systems, and multi-lingual processing.
  • Technical Context: The technology at issue is automated speech recognition (ASR) and natural language understanding (NLU) used in enterprise-grade conversational interactive voice response (IVR) platforms, a key technology for automating customer service interactions.
  • Key Procedural History: The complaint alleges a complex history between the parties, beginning with a reseller agreement in 2002 that Plaintiff terminated effective January 2014. Plaintiff alleges that Defendant, both before and after the termination, illicitly recorded live customer calls and downloaded Plaintiff’s proprietary software to develop its own competing ASR engine. The complaint notes ongoing litigation between the parties in other jurisdictions and states that Plaintiff provided Defendant with notice of certain asserted patents as early as October 2018.

Case Timeline

Date Event
1999-05-13 ’905 Patent Priority Date
2000-11-14 ’925 Patent Priority Date
2002-06-12 Nuance predecessor and Omilia enter into Value Added Reseller Agreement
2002-11-04 ’688 Patent Priority Date
2006-02-14 ’925 Patent Issue Date
2006-04-27 ’993 Patent Priority Date
2006-12-12 ’688 Patent Issue Date
2006-12-19 ’839 Patent Priority Date
2007-03-23 ’532 Patent Priority Date
2009-01-07 ’804 Patent Priority Date
2009-03-17 ’905 Patent Issue Date
2009-06-24 ’534 Patent Priority Date
2011-09-27 ’839 Patent Issue Date
2013-02-19 ’804 Patent Issue Date
2013-08-27 ’534 Patent Issue Date
2013-09-10 ’993 Patent Issue Date
2013-10-30 Nuance provides notice of termination of the 2011 Partner Agreement
2014-01-31 Termination of 2011 Partner Agreement becomes effective
2014-12-09 ’532 Patent Issue Date
2015-01-01 Omilia enters the North American market
2016-01-01 Omilia deploys Conversational Virtual Agent solution for Royal Bank of Canada
2018-10-09 Nuance sends letter to Omilia identifying the ’905, ’993, and ’804 Patents
2020-06-08 Complaint Filing Date

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 7,505,905 - "In-the-field adaptation of a large vocabulary automatic speech recognizer (ASR)"

The Invention Explained

  • Problem Addressed: The patent addresses the difficulty, expense, and time required to adapt a general-purpose, speaker-independent ASR engine to a specific environment or application, a process that traditionally required extensive, human-supervised tuning (Compl. ¶48; ’905 Patent, col. 1:10-20).
  • The Patented Solution: The invention provides a method for automated, "in-the-field" adaptation of an ASR engine without human supervision. The system receives live speech data in its deployed environment and uses the recognizer's own (imperfect) output to select and apply adaptation algorithms, which retune the engine's models to improve accuracy for application-specific features like dialects or channel noise. The improved engine is then redeployed in the target environment (’905 Patent, Abstract; col. 2:5-21).
  • Technical Importance: This automated adaptation technique was designed to make it more practical and economical to customize ASR systems for diverse real-world applications, a critical step for commercial viability (’905 Patent, col. 2:10-14).
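
For technical context only, the following minimal Python sketch illustrates the general technique summarized above: a recognizer whose own confidence-scored output drives the unsupervised selection and application of an adaptation algorithm to live data, after which the retuned engine is used again in the target environment. All class names, algorithm names, and numeric values are hypothetical and are not drawn from the ’905 patent's embodiments or from either party's software.

```python
# Toy illustration of unsupervised "in-the-field" adaptation (hypothetical names/values).
import random

ADAPTATION_ALGORITHMS = ["mllr_accent_shift", "channel_noise_compensation"]

class ToyRecognizer:
    def __init__(self):
        self.bias = 0.6  # stand-in for acoustic/language model parameters

    def recognize(self, audio_frame: float) -> tuple[str, float]:
        # Return a hypothesis and a confidence score (the recognizer's "imperfect output").
        confidence = max(0.0, min(1.0, 1.0 - abs(audio_frame - self.bias)))
        return ("hypothesized text", confidence)

    def adapt(self, algorithm: str, frames: list[float]) -> None:
        # Retune the model toward the statistics of the live data; no human transcripts are used.
        center = sum(frames) / len(frames)
        if algorithm == "mllr_accent_shift":
            self.bias = center
        elif algorithm == "channel_noise_compensation":
            self.bias = 0.5 * (self.bias + center)

def select_algorithm(frames: list[float]) -> str:
    # "Without supervision": choose from the plurality of algorithms using only
    # statistics of the live data itself.
    spread = max(frames) - min(frames)
    return ADAPTATION_ALGORITHMS[1] if spread > 0.3 else ADAPTATION_ALGORITHMS[0]

def field_adaptation_cycle(recognizer: ToyRecognizer, live_frames: list[float]) -> ToyRecognizer:
    # Filter the adaptation data using the recognizer's own confidence scores.
    usable = [f for f in live_frames if recognizer.recognize(f)[1] > 0.2]
    algorithm = select_algorithm(usable)
    recognizer.adapt(algorithm, usable)
    return recognizer  # the adapted engine would then be "redeployed" in the target environment

if __name__ == "__main__":
    engine = ToyRecognizer()
    live_data = [random.uniform(0.2, 0.9) for _ in range(50)]
    engine = field_adaptation_cycle(engine, live_data)
    print("adapted model parameter:", round(engine.bias, 3))
```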

Key Claims at a Glance

  • The complaint asserts at least independent Claim 1, a method claim (Compl. ¶71).
  • The essential elements of Claim 1 include:
    • deploying the speech recognizer in an environment to receive live input data;
    • receiving the live input data and an original speech signal;
    • without supervision, selecting at least one adaptation algorithm from a plurality of adaptation algorithms;
    • applying the algorithm to the live input data to improve at least one application-specific feature for recognition accuracy; and
    • redeploying the adapted speech recognizer in the target environment.
  • The complaint reserves the right to assert additional claims (Compl. ¶70).

U.S. Patent No. 8,532,993 - "Speech recognition based on pronunciation modeling"

The Invention Explained

  • Problem Addressed: Conventional ASR systems maintain separate pronunciation dictionaries and language models, which makes it difficult to model pronunciation variations that occur across word boundaries in natural, continuous speech (Compl. ¶49; ’993 Patent, col. 3:28-41).
  • The Patented Solution: The invention proposes moving pronunciation probabilities from the dictionary directly into the language model. By creating unique lexical identifiers for different pronunciations of a word (e.g., "tomato" vs. "tomahto") and incorporating their probabilities into the language model, the system can use broader sentence context to more accurately recognize the spoken utterance (’993 Patent, Abstract; col. 6:1-15).
  • Technical Importance: This integration enables the language model, which predicts word sequences, to also account for pronunciation variability, a key factor in improving the accuracy of conversational speech recognition (’993 Patent, col. 4:36-44).
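
For illustration only, the sketch below is a minimal toy model of the mechanism described above: each distinct pronunciation of a word receives its own lexical identifier, and the pronunciation probability is folded into the language model score used during recognition. The tokens, probabilities, and "special status" marker are hypothetical and are not taken from the ’993 patent or the accused product.

```python
# Toy language model with pronunciation-specific lexical items (hypothetical values).
import math

# "tomato#1" and "tomato#2" are unique labels for two pronunciations of the same word;
# their log-probabilities combine the word probability with a pronunciation probability.
UNIGRAM_LOGPROB = {
    "i": math.log(0.30),
    "say": math.log(0.20),
    "tomato#1": math.log(0.20) + math.log(0.7),
    "tomato#2": math.log(0.20) + math.log(0.3),
}
SPECIAL_STATUS = {"i"}  # toy marker giving the most frequent word a special status

def score(sequence: list[str]) -> float:
    """Score a candidate word/pronunciation sequence with the combined model."""
    return sum(UNIGRAM_LOGPROB[token] for token in sequence)

def recognize(candidates: list[list[str]]) -> list[str]:
    """Pick the candidate whose pronunciation-aware language model score is highest."""
    return max(candidates, key=score)

if __name__ == "__main__":
    hypotheses = [
        ["i", "say", "tomato#1"],
        ["i", "say", "tomato#2"],
    ]
    best = recognize(hypotheses)
    print("best hypothesis:", best, "| special-status tokens:", SPECIAL_STATUS & set(best))
```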

Key Claims at a Glance

  • The complaint asserts at least independent Claim 17, a claim for a computer-readable storage device (Compl. ¶88).
  • The essential operations of Claim 17 include:
    • approximating transcribed speech using a phonemic transcription dataset to yield a language model;
    • incorporating into the language model pronunciation probabilities associated with unique labels for each different pronunciation of a word;
    • wherein the unique label for a most frequent word indicates a special status in the language model; and
    • recognizing an utterance using the language model after incorporating the probabilities.
  • The complaint reserves the right to assert additional claims (Compl. ¶87).

U.S. Patent No. 8,027,839 - "Using an automated speech application environment to automatically provide text exchange services"

  • Technology Synopsis: The patent describes a system where a "Chatbot server" acts as an intermediary between a text-based client (e.g., SMS, chat) and a voice-based automated speech response application. This server dynamically converts messages between the two channels, allowing a user to interact with a voice-centric system via text (’839 Patent, Abstract).
  • Asserted Claims: At least independent Claim 17 (a system claim) (Compl. ¶103).
  • Accused Features: The complaint alleges that Omilia’s DiaManT® platform, with its "omChat" and "omMobile" plug-ins, infringes by providing an integrated system that allows users to interact via both speech and text-based channels (like SMS, Web-Chat, and Facebook Messenger) and dynamically converts messages between them (Compl. ¶¶104, 107-108).
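
For context, the following is a minimal toy sketch of the intermediary architecture described in the synopsis: a server that converts text-channel messages into input for a voice-centric application and converts the application's prompts back into chat-friendly replies. The class and method names are hypothetical and do not reflect the DiaManT® platform's actual interfaces or the patent's code.

```python
# Toy "chatbot server" bridging a text channel and an automated speech application.
class SpeechApplication:
    """Stand-in for a voice-centric IVR application that expects spoken-style input."""
    def handle_utterance(self, utterance: str) -> str:
        if "balance" in utterance.lower():
            return "Your balance is one hundred dollars. Anything else?"
        return "Sorry, I did not understand. Please repeat."

class ChatbotServer:
    """Intermediary that dynamically converts messages between text and voice channels."""
    def __init__(self, speech_app: SpeechApplication):
        self.speech_app = speech_app

    def text_to_speech_input(self, text_message: str) -> str:
        # In a real system this could involve normalization or synthesis; here it is a pass-through.
        return text_message.strip()

    def speech_output_to_text(self, prompt: str) -> str:
        # Convert a voice prompt into a chat-friendly reply.
        return prompt.replace("Anything else?", "Reply with another request or 'done'.")

    def relay(self, text_message: str) -> str:
        prompt = self.speech_app.handle_utterance(self.text_to_speech_input(text_message))
        return self.speech_output_to_text(prompt)

if __name__ == "__main__":
    server = ChatbotServer(SpeechApplication())
    print(server.relay("What is my account balance?"))
```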

U.S. Patent No. 8,521,534 - "Dynamically extending the speech prompts of a multimodal application"

  • Technology Synopsis: The patent describes a system that dynamically enhances a multimodal application by retrieving speech prompts from a metadata container within a media file. This allows the application to present context-relevant voice prompts related to the media content being played (’534 Patent, Abstract).
  • Asserted Claims: At least independent Claim 13 (a computer program product claim) (Compl. ¶117).
  • Accused Features: The complaint alleges that the Accused IVR Platform infringes by recording and analyzing call data (a "media file having a metadata container"), retrieving speech prompts based on the content of the call (e.g., caller intent), and modifying the application to include those prompts (Compl. ¶¶120-122).
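
For context, the sketch below is a minimal toy illustration of the mechanism in the synopsis: an application that retrieves speech prompts from a media file's metadata container and adds them to its prompt set at run time. The field names and data layout are hypothetical, not taken from the ’534 patent or the accused platform.

```python
# Toy multimodal application extended with speech prompts drawn from media metadata.
from dataclasses import dataclass, field

@dataclass
class MediaFile:
    audio: bytes
    metadata: dict = field(default_factory=dict)   # the "metadata container"

@dataclass
class MultimodalApp:
    prompts: list = field(default_factory=lambda: ["How can I help you today?"])

    def extend_prompts_from(self, media: MediaFile) -> None:
        # Retrieve any speech prompts stored in the metadata and add them to the application.
        for prompt in media.metadata.get("speech_prompts", []):
            if prompt not in self.prompts:
                self.prompts.append(prompt)

if __name__ == "__main__":
    call_recording = MediaFile(
        audio=b"",
        metadata={"speech_prompts": ["It sounds like you want to pay a bill. Is that right?"]},
    )
    app = MultimodalApp()
    app.extend_prompts_from(call_recording)
    print(app.prompts)
```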

U.S. Patent No. 8,379,804 - "Using a complex events processor (CEP) to direct the handling of individual call sessions by an interactive voice response (IVR) system"

  • Technology Synopsis: The patent discloses an IVR system where a Complex Events Processor (CEP) analyzes event data messages from individual call sessions, each identified by a unique ID. Based on this analysis, the CEP can dynamically modify the execution of interaction files for that specific call session (’804 Patent, Abstract).
  • Asserted Claims: At least independent Claim 1 (a system claim) (Compl. ¶131).
  • Accused Features: The complaint alleges that the Accused IVR Platform functions as the claimed system by converting speech to textual event data messages associated with a unique ID, analyzing this data to understand conversation history and user intent, and using a CEP to dynamically modify the interaction files for that specific call session (Compl. ¶¶134-135).
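
For context, the following is a minimal toy sketch of the arrangement summarized above: a complex-events processor that ingests event messages keyed by a unique call-session ID and, based on its analysis, modifies the interaction flow for that session alone. The rule, event types, and data structures are hypothetical stand-ins, not the patented embodiment or any party's implementation.

```python
# Toy CEP that directs the handling of individual call sessions by session ID.
from collections import defaultdict

DEFAULT_FLOW = ["greeting", "main_menu", "goodbye"]

class ToyCEP:
    def __init__(self):
        self.events_by_session = defaultdict(list)
        self.flows_by_session = {}

    def ingest(self, session_id: str, event: dict) -> None:
        """Receive an event data message tagged with its unique session ID."""
        self.events_by_session[session_id].append(event)
        self._reassess(session_id)

    def _reassess(self, session_id: str) -> None:
        flow = self.flows_by_session.setdefault(session_id, list(DEFAULT_FLOW))
        # Example rule: repeated "no_match" events for this session trigger an agent transfer.
        misses = sum(1 for e in self.events_by_session[session_id] if e.get("type") == "no_match")
        if misses >= 2 and "agent_transfer" not in flow:
            flow.insert(flow.index("goodbye"), "agent_transfer")

if __name__ == "__main__":
    cep = ToyCEP()
    cep.ingest("call-0001", {"type": "no_match", "text": "uh, the thing"})
    cep.ingest("call-0001", {"type": "no_match", "text": "you know, the thing"})
    cep.ingest("call-0002", {"type": "intent", "text": "pay my bill"})
    print(cep.flows_by_session["call-0001"])   # modified for this session only
    print(cep.flows_by_session["call-0002"])   # default flow unchanged
```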

U.S. Patent No. 8,909,532 - "Supporting multi-lingual user interaction with a multimodal application"

  • Technology Synopsis: The patent describes a method for multi-lingual speech recognition where multiple speech engines, each corresponding to a different language and grammar, operate in parallel. The system evaluates the confidence levels of the recognition results from each engine to determine the most likely language spoken by the user and select it for subsequent interaction (’532 Patent, Abstract).
  • Asserted Claims: At least independent Claim 1 (a method claim) (Compl. ¶146).
  • Accused Features: The complaint alleges the Accused IVR Platform infringes by recognizing speech in multiple languages, using different grammars for each, assigning confidence levels to recognition results, and selecting the language with the highest confidence level to continue the user interaction (Compl. ¶¶147-150).
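
For context, the sketch below is a minimal toy illustration of the parallel, confidence-based language selection described above: one recognizer per language and grammar, with the highest-confidence result determining the language used for the remainder of the interaction. The grammars and the confidence heuristic are hypothetical.

```python
# Toy multi-lingual selection: run one recognizer per language, keep the highest-confidence result.
GRAMMARS = {
    "en-US": {"balance", "payment", "agent"},
    "fr-CA": {"solde", "paiement", "agent"},
}

def recognize_with_grammar(words: list[str], grammar: set) -> tuple[str, float]:
    """Return (result, confidence); confidence here is the fraction of in-grammar words."""
    hits = [w for w in words if w in grammar]
    confidence = len(hits) / len(words) if words else 0.0
    return (" ".join(hits), confidence)

def select_language(utterance: str) -> tuple[str, str, float]:
    words = utterance.lower().split()
    results = {
        lang: recognize_with_grammar(words, grammar)   # engines evaluated per language/grammar
        for lang, grammar in GRAMMARS.items()
    }
    best_lang = max(results, key=lambda lang: results[lang][1])
    text, confidence = results[best_lang]
    return best_lang, text, confidence

if __name__ == "__main__":
    print(select_language("je veux mon solde"))    # expected to select fr-CA
    print(select_language("check my balance"))     # expected to select en-US
```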

U.S. Patent No. 7,149,688 - "Multi-lingual speech recognition with cross-language context modeling"

  • Technology Synopsis: The patent discloses a method for recognizing speech that contains words from multiple languages. The system determines a common set of features between the subword units (e.g., phonemes) of the different languages, enabling it to select appropriate context-dependent acoustic models even at the boundary between words from different languages (’688 Patent, Abstract).
  • Asserted Claims: At least independent Claim 1 (a method claim) (Compl. ¶160).
  • Accused Features: The complaint alleges that the Accused IVR Platform, which is marketed as recognizing speech including "slang & mixed languages," infringes by determining common features between subword units of different languages to select context-dependent units in a context-aware manner (Compl. ¶¶163-167).
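
For context, the following is a minimal toy sketch of the cross-language context modeling described above: language-specific phonemes are mapped onto a shared feature set so that a context-dependent acoustic model can be chosen even at a boundary between words of different languages. All phoneme feature assignments are invented for illustration.

```python
# Toy cross-language context modeling via a common feature set between subword units.
EN_PHONE_FEATURES = {"t": {"plosive", "voiceless"}, "ah": {"vowel", "open"}}
EL_PHONE_FEATURES = {"t": {"plosive", "voiceless", "dental"}, "a": {"vowel", "open"}}

def common_features(phone_a: str, feats_a: dict, phone_b: str, feats_b: dict) -> set:
    """Determine the set of features shared by subword units of two languages."""
    return feats_a[phone_a] & feats_b[phone_b]

def select_context_model(prev_phone: str, prev_feats: dict,
                         next_phone: str, next_feats: dict) -> str:
    shared = common_features(prev_phone, prev_feats, next_phone, next_feats)
    # Key the context-dependent acoustic model on the shared features, so a
    # cross-language word boundary still receives a context-aware unit.
    if shared:
        return "cd_model[" + "+".join(sorted(shared)) + "]"
    return "cd_model[context_independent]"

if __name__ == "__main__":
    # English word ending in /t/ followed by a Greek word beginning with /t/:
    print(select_context_model("t", EN_PHONE_FEATURES, "t", EL_PHONE_FEATURES))
    # English /ah/ followed by Greek /a/ share the open-vowel features:
    print(select_context_model("ah", EN_PHONE_FEATURES, "a", EL_PHONE_FEATURES))
```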

U.S. Patent No. 6,999,925 - "Method and apparatus for phonetic context adaptation for improved speech recognition"

  • Technology Synopsis: The patent describes a method for adapting a general-purpose speech recognizer to a specific domain (e.g., a new language or task). It uses a first recognizer's acoustic model and decision network as a starting point and re-estimates them using a small amount of domain-specific training data to generate a new, specialized recognizer (’925 Patent, Abstract).
  • Asserted Claims: At least independent Claim 27 (a computerized method claim) (Compl. ¶176).
  • Accused Features: The complaint alleges the Accused IVR Platform infringes by using "Deep Learning" and a "proprietary method of training and tuning" to adapt its models for different languages and domains, which allegedly involves generating a second, multi-lingual recognizer from a first recognizer using domain-specific training data (Compl. ¶¶177-178, 181).
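
For context, the sketch below is a minimal toy illustration of the adaptation approach described above: a specialized recognizer is generated by re-estimating a base recognizer's model parameters with a small amount of domain-specific data. The single-parameter "acoustic model" is a hypothetical stand-in for the acoustic models and decision networks discussed in the patent, and the interpolation weight is invented.

```python
# Toy domain adaptation: derive a second recognizer from a first using in-domain data.
from statistics import mean

class ToyRecognizer:
    def __init__(self, acoustic_means: dict):
        self.acoustic_means = dict(acoustic_means)   # stand-in for acoustic model / decision network

def adapt_to_domain(base: ToyRecognizer, domain_samples: dict, weight: float = 0.3) -> ToyRecognizer:
    """Re-estimate the base models with domain-specific data, yielding a specialized recognizer."""
    new_means = {}
    for phone, base_mean in base.acoustic_means.items():
        samples = domain_samples.get(phone)
        new_means[phone] = (1 - weight) * base_mean + weight * mean(samples) if samples else base_mean
    return ToyRecognizer(new_means)

if __name__ == "__main__":
    general = ToyRecognizer({"aa": 0.50, "iy": 0.80})
    in_domain = {"aa": [0.62, 0.58]}                 # small domain-specific training set
    specialized = adapt_to_domain(general, in_domain)
    print(specialized.acoustic_means)
```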

III. The Accused Instrumentality

  • Product Identification: Defendant Omilia’s software platform, identified as the "Accused IVR Platform," which includes its "deepASR®" automatic speech recognition engine, "deepNLU®" natural language understanding engine, and "DiaManT®" Omni-Channel Conversational Platform (Compl. ¶¶59, 61).
  • Functionality and Market Context: The Accused IVR Platform is a conversational AI system marketed to enterprises in sectors like banking and telecommunications for automating customer service (Compl. ¶62). The complaint alleges the platform provides an "omni-channel" experience by integrating voice with text-based channels such as SMS, email, and Facebook Messenger (Compl. ¶63). A central allegation is that the platform's performance, which Omilia markets as achieving "human-level performance," was enabled by improperly using Plaintiff's technology and by training its AI models on live customer call audio recorded from systems running Plaintiff's ASR software (Compl. ¶¶5, 37, 72). A diagram from a 2013 Omilia presentation shows a "Call Recording System" connected to an IVR system that uses "ASR - Nuance" (Compl. p. 14, ¶37).

IV. Analysis of Infringement Allegations

'905 Patent Infringement Allegations

Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation
A method of improving the recognition accuracy of a speech recognizer comprising the steps of: deploying the speech recognizer in an environment to receive live input data; | Omilia deploys its Accused IVR Platform to customers, where it is used to service live customer calls. | ¶73 | col. 2:10-12
receiving the live input data and an original speech signal; | The platform receives live input data in the form of speech from customer calls. | ¶74 | col. 2:13-14
without supervision, selecting at least one adaptation algorithm from a plurality of adaptation algorithms, and | The platform is alleged to dynamically adapt to different languages and accents, which involves selecting adaptation algorithms without direct human supervision. | ¶75 | col. 2:15-17
applying the selected adaptation algorithm to the received live input data... to improve at least one application-specific feature... | The platform applies algorithms to adapt to user accents or languages, thereby improving the recognizer's accuracy for that application-specific feature. This is supported by Omilia’s claim to train models with "real world call center audio to optimize" them. | ¶¶76, 77 | col. 2:18-21
redeploying the adapted speech recognizer in the target environment. | After adaptation for a new accent or language, the ASR is redeployed to achieve improved recognition. | ¶78 | col. 2:22-23
  • Identified Points of Contention:
    • Scope Questions: The case may turn on whether Omilia’s alleged practice of using "real world call center audio" for large-scale, offline model training constitutes the claimed method of "in-the-field adaptation," which the patent describes as an automated process occurring after deployment. A central question will be whether the term "redeploying" requires a discrete, post-adaptation software update, or if it can read on a system that continuously learns and updates.
    • Technical Questions: A key evidentiary question is what specific "adaptation algorithms" the Accused IVR Platform actually uses. The complaint relies on marketing statements about adapting to accents; the case will require evidence of how this adaptation technically functions and whether it operates "without supervision" as claimed. The screenshot stating Omilia is proud to have the "most accurate on-premise ASR engine ... touching human-level performance" will be used by Plaintiff to argue the platform practices a method of improving accuracy (Compl. p. 33, ¶72).

'993 Patent Infringement Allegations

Claim Element (from Independent Claim 17) | Alleged Infringing Functionality | Complaint Citation | Patent Citation
A computer-readable storage device having instructions stored which... cause the processor to perform operations comprising: approximating transcribed speech using a phonemic transcription dataset associated with a speaker, to yield a language model... | Omilia develops its ASR models using recorded speech from real calls, which allegedly involves using phonemic transcription datasets based on pronunciation models to create language models. | ¶90 | col. 9:10-18
incorporating, into the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, | The platform's ability to recognize different accents with reduced error rates is alleged to demonstrate that it assigns pronunciation probabilities to different pronunciations of words and incorporates them into its language model. | ¶91 | col. 9:19-22
wherein the respective unique label for a most frequent word indicates a special status in the language model; and | The platform is alleged to use a statistical language model (SLM) that employs a probabilistic assessment to give special status to frequently used words, a feature Plaintiff links to Omilia's "omAnalytics" tool for discovering "frequent terms." | ¶92 | col. 9:23-25
after incorporating the pronunciation probabilities into the language model, recognizing an utterance using the language model. | The platform is alleged to recognize utterances by using its SLM, which incorporates the aforementioned pronunciation probabilities. | ¶93 | col. 9:26-28
  • Identified Points of Contention:
    • Scope Questions: The central dispute may be the interpretation of "incorporating, into the language model, pronunciation probabilities." A question for the court will be whether this requires the probabilities to be structurally integrated into a single language model data file, or if it can cover a system where a separate pronunciation module provides probabilities that are dynamically combined with the language model's outputs during recognition.
    • Technical Questions: The complaint alleges on "information and belief" that Omilia's use of deep learning and "millions of samples of real speech" equates to the claimed method of generating and using a language model with incorporated pronunciation probabilities. A technical question will be what evidence demonstrates that Omilia’s neural network-based architecture meets these specific claim limitations, particularly the "special status" for frequent words. The complaint's visual evidence showing Omilia's "omChat" and "omMobile" plug-ins will be used to demonstrate the system's broad technical capabilities (Compl. p. 45).

V. Key Claim Terms for Construction

’905 Patent (Claim 1)

  • The Term: "without supervision"
  • Context and Importance: This term is central to the patent's novelty, distinguishing it from methods requiring human transcription. The infringement analysis will depend on whether Omilia’s alleged training methods—which use "real world call center audio"—qualify as unsupervised adaptation or as a form of supervised offline development.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The specification contrasts the invention with techniques requiring "human listeners" and "human intervention," suggesting "without supervision" primarily means freedom from manual human transcription of the live data used for adaptation (’905 Patent, col. 1:36-42).
    • Evidence for a Narrower Interpretation: The patent states that "the imperfect output of the recognizer itself is preferably the only information used to supervise the transcription," which could be argued to limit the claim to a specific closed-loop system where no other corrective data or offline human-guided analysis is used (’905 Patent, col. 2:5-9).

’993 Patent (Claim 17)

  • The Term: "incorporating, into the language model, pronunciation probabilities"
  • Context and Importance: This term defines the core technical mechanism of the invention. Practitioners may focus on whether this requires a specific data architecture where pronunciation data is structurally part of the language model, versus a system where separate pronunciation and language models interact.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The Abstract and Summary describe the invention as "moving a pronunciation model from a dictionary to the language model." This could support a functional interpretation where any system architecture that causes the language model's word-sequence predictions to be directly weighted by pronunciation probabilities infringes (’993 Patent, Abstract).
    • Evidence for a Narrower Interpretation: The detailed description explains that pronunciation dependencies are modeled by creating "new lexical items" that are given entries in the language model. This suggests a structural requirement: the language model itself must contain distinct entries for different pronunciations, rather than merely querying an external pronunciation module during operation (’993 Patent, col. 7:42-51).

VI. Other Allegations

  • Indirect Infringement: For each asserted patent, the complaint alleges inducement on the basis that Omilia configures the Accused IVR Platform for its customers and encourages infringing use. It also alleges contributory infringement on the basis that the platform is a material part of the claimed inventions, is not a staple article of commerce, and is specifically adapted for infringement (Compl. ¶¶79, 81, 94, 96).
  • Willful Infringement: Willfulness is alleged for the ’905, ’993, and ’804 Patents based on pre-suit knowledge stemming from a notice letter dated October 9, 2018 (Compl. ¶65). For the remaining five patents, willfulness is alleged based on knowledge from at least the filing date of the instant complaint (Compl. ¶66). The complaint asserts this conduct was "deliberate, willful, and knowing, with conscious disregard of Nuance's rights" (Compl. ¶67).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A central factual issue will be one of technological provenance: What evidence will establish that Defendant’s accused ASR engine is a product of the alleged misappropriation of Plaintiff’s software and illegally recorded training data, as opposed to the result of independent development using modern machine learning techniques?
  • A key issue of claim scope will be one of technological translation: Can patent claims drafted for rule-based and statistical ASR systems from the late 1990s and 2000s be construed to cover a modern ASR engine allegedly built on "Deep Learning"? For example, does Defendant’s large-scale, offline model training using "real world call center audio" meet the ’905 patent’s requirement for "in-the-field adaptation... without supervision"?
  • A critical claim construction question will be one of architectural equivalence: For the ’993 patent, does Defendant’s system "incorporate" pronunciation probabilities "into the language model" in the specific structural manner claimed, or does it utilize a distinct neural network architecture where pronunciation and language modeling are functionally separate, thereby falling outside the scope of the claim?