1:23-cv-00581

Dialect LLC v. Amazon.com Inc

Key Events

Amended Complaint

I. Executive Summary and Procedural Information

Parties & Counsel:
- Plaintiff: Dialect, LLC (Texas)
- Defendants: Amazon.com, Inc. and Amazon Web Services, Inc. (Delaware)
- Plaintiff’s Counsel: Hausfeld LLP; Blue Peak Law Group LLP
Case Identification: 1:23-cv-00581, E.D. Va., 07/31/2023
Venue Allegations: Plaintiff alleges venue is proper in the Eastern District of Virginia based on Amazon establishing its second headquarters (HQ2) in Arlington, Virginia, within the district, and actively hiring for its Alexa teams at that location.
Core Dispute: Plaintiff alleges that Defendants’ Alexa virtual assistant products and associated services infringe seven patents related to natural language understanding, contextual voice processing, and noise filtering technologies.
Technical Context: The lawsuit concerns foundational technologies for natural language understanding (NLU), which enable voice-activated virtual assistants to interpret and respond to conversational human speech.
Key Procedural History: The complaint alleges that VoiceBox Technologies, the original owner of the Asserted Patents, held a series of meetings with Amazon in 2011 to explore a business relationship, during which VoiceBox’s patented technology was disclosed. The complaint further alleges that Amazon launched its similar Alexa and Echo products in 2014 without licensing the technology and later hired key technical personnel from VoiceBox.

Case Timeline

Date	Event
2002-06-03	Priority Date for ’006, ’327, ’039 Patents
2002-07-15	Priority Date for ’720, ’845 Patents
2003-07-29	Priority Date for ’468, ’957 Patents
2010-04-06	U.S. Patent No. 7,693,720 Issues
2011-09-06	U.S. Patent No. 8,015,006 Issues
2011-10-07	VoiceBox and Amazon hold teleconference
2011-10-19	VoiceBox meets with Amazon personnel at Amazon's offices
2011-10-26	Amazon personnel meet with VoiceBox at VoiceBox's office
2012-03-20	U.S. Patent No. 8,140,327 Issues
2012-06-05	U.S. Patent No. 8,195,468 Issues
2014-01-01	Amazon announces the launch of Alexa and first-generation Echo (date approximate)
2015-05-12	U.S. Patent No. 9,031,845 Issues
2016-02-16	U.S. Patent No. 9,263,039 Issues
2016-11-15	U.S. Patent No. 9,495,957 Issues
2023-07-31	Complaint Filing Date

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 7,693,720 - "Mobile Systems And Methods For Responding To Natural Language Speech Utterance"

Patent Identification: U.S. Patent No. 7,693,720, "Mobile Systems And Methods For Responding To Natural Language Speech Utterance," issued April 6, 2010 (Compl. ¶40).

The Invention Explained

Problem Addressed: The patent’s background section describes the difficulty of creating a natural language speech interface suitable for a "vehicular environment" (Compl. ¶43; ’720 Patent, col. 1:34-36). It notes that conventional systems required "highly structured" queries that were "not inherently natural to the human user" and struggled with ambiguous or incomplete commands that required context to be understood (’720 Patent, col. 1:49-54, 2:1-8).
The Patented Solution: The invention is a mobile system that processes natural language by using a speech recognition engine with a dictionary that is "dynamically updated based on at least a history of a current dialog" (Compl. ¶45). A parser then determines the context of the user's utterance and selects an appropriate "domain agent" to handle the request. This architecture, depicted in the patent’s Figure 5, aims to create a more flexible and context-aware conversational experience (’720 Patent, Abstract; col. 21:58-67).
Technical Importance: The described approach sought to move beyond simple voice command-and-control systems toward more fluid, human-like conversational interfaces in the challenging environment of a vehicle (Compl. ¶43).

Key Claims at a Glance

The complaint asserts independent Claim 1 (Compl. ¶104).
Essential elements of Claim 1 include:
- A mobile system responsive to a user's natural language speech utterance.
- A speech unit connected to a computer device on a vehicle that converts the utterance into an electronic signal.
- A natural language speech processing system that includes:
  - A speech recognition engine that uses data including dictionary and phrase entries that are dynamically updated based on a history of a current dialog and prior dialogs.
  - A parser that determines a context for the utterance, selects a domain agent based on that context, and transforms the recognized words into a command formulated in a grammar used by the selected agent.
  - An agent architecture that communicatively couples the system components, allowing the selected domain agent to create and format a response for the user.
The complaint does not explicitly reserve the right to assert dependent claims for the ’720 Patent.
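
For illustration only, the two Claim 1 elements most likely to be disputed (a recognition dictionary updated from dialog history, and a parser that selects a domain agent from context) can be sketched in Python. This is a simplified sketch under stated assumptions, not the implementation disclosed in the ’720 Patent and not a description of any Amazon system. The class `DialogDictionary`, the `AGENT_KEYWORDS` table, and the count-based weighting rule are all hypothetical.

```python
from collections import Counter


class DialogDictionary:
    """Hypothetical recognition vocabulary re-weighted by dialog history."""

    def __init__(self, base_entries):
        # Every base entry starts with the same weight of 1.
        self.weights = Counter({entry: 1 for entry in base_entries})

    def update_from_dialog(self, utterance_words):
        # Words heard in the current dialog become stronger candidates
        # when a later recognition result is ambiguous.
        for word in utterance_words:
            self.weights[word] += 1

    def best_match(self, candidates):
        # Highest weight wins; ties go to the first candidate listed.
        return max(candidates, key=lambda word: self.weights[word])


# Hypothetical mapping from domain agents to context keywords.
AGENT_KEYWORDS = {
    "navigation": {"route", "directions", "traffic"},
    "music": {"play", "song", "album"},
}


def select_agent(words):
    """Pick the agent whose keywords overlap the utterance most, or None."""
    scores = {agent: len(keywords & set(words))
              for agent, keywords in AGENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

In this sketch, after `update_from_dialog(["directions", "to", "boston"])`, an ambiguous choice between "austin" and "boston" resolves to "boston" because the dialog history raised its weight. Whether an accused system's contextual tracking operates on "dictionary and phrase entries" in this sense is the kind of question claim construction would have to resolve.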

U.S. Patent No. 8,015,006 - "Systems And Methods For Processing Natural Language Speech Utterances With Context-Specific Domain Agents"

Patent Identification: U.S. Patent No. 8,015,006, "Systems And Methods For Processing Natural Language Speech Utterances With Context-Specific Domain Agents," issued September 6, 2011 (Compl. ¶49).

The Invention Explained

Problem Addressed: The patent addresses the "difficult problem" of a machine's ability to communicate naturally with humans, noting that most natural language queries are "incomplete in their definition" or are "ambiguous or subjective," making them hard to translate into a machine-processable form (’006 Patent, col. 1:38-41, 9:16-21).
The Patented Solution: The invention is a method for processing speech that makes "significant use of context, prior information, domain knowledge, and user specific profile data" (’006 Patent, Abstract). A key part of the solution involves the parser not only extracting explicit criteria from a user's speech but also "inferring one or more further criteria and... parameters associated with the request using a dynamic set of prior probabilities or fuzzy possibilities" to resolve ambiguity before formulating a request for a domain agent (’006 Patent, col. 36:1-36; Compl. ¶54).
Technical Importance: The patented method describes a move toward more sophisticated, probabilistic methods for understanding user intent, rather than relying on rigid, predefined command structures (Compl. ¶53).

Key Claims at a Glance

The complaint asserts independent Claim 5 (Compl. ¶137).
Essential elements of Claim 5 include:
- Receiving a natural language speech utterance containing a request.
- Recognizing words or phrases in the utterance using dictionary/phrase tables.
- Parsing the utterance to determine a meaning and a context.
- Formulating the request in accordance with a grammar used by a domain agent, which includes:
  - Determining required and optional values.
  - Extracting criteria and parameters from keywords.
  - Inferring further criteria and parameters using a dynamic set of prior probabilities or fuzzy possibilities.
  - Transforming the extracted and inferred criteria/parameters into tokens compatible with the agent's grammar.
- Processing the formulated request with the domain agent to generate a response.
- Presenting the response via the speech unit.
The complaint does not explicitly reserve the right to assert dependent claims for the ’006 Patent.
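
The "inferring" sub-step of Claim 5 can be made concrete with a short Python sketch. It assumes, purely for illustration, that "a dynamic set of prior probabilities" means per-parameter value frequencies that change as the dialog proceeds. The `PriorTable` class, the `formulate_request` function, and the 0.5 threshold are hypothetical and are not drawn from the ’006 Patent or the complaint.

```python
class PriorTable:
    """Hypothetical per-parameter priors that shift as values are observed."""

    def __init__(self):
        self.counts = {}  # parameter name -> {value: observation count}

    def observe(self, slot, value):
        slot_counts = self.counts.setdefault(slot, {})
        slot_counts[value] = slot_counts.get(value, 0) + 1

    def most_probable(self, slot):
        # Return the most frequent value and its relative frequency.
        slot_counts = self.counts.get(slot)
        if not slot_counts:
            return None, 0.0
        total = sum(slot_counts.values())
        value = max(slot_counts, key=slot_counts.get)
        return value, slot_counts[value] / total


def formulate_request(explicit, required, priors, threshold=0.5):
    """Merge explicit criteria with inferred ones; list what stays unresolved."""
    request = dict(explicit)
    missing = []
    for slot in required:
        if slot in request:
            continue  # the user stated this parameter explicitly
        value, probability = priors.most_probable(slot)
        if value is not None and probability >= threshold:
            request[slot] = value  # inferred from the current priors
        else:
            missing.append(slot)
    return request, missing
```

If "Seattle" has been observed twice and "Portland" once for a `city` parameter, a request that states only a topic is completed with `city = "Seattle"`, while a parameter with no history is reported as missing. The sketch shows one narrow reading of the limitation; a broader reading could cover other probabilistic mechanisms.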

Multi-Patent Capsules

Patent Identification: U.S. Patent No. 8,140,327, "System And Method For Filtering And Eliminating Noise From Natural Language Utterances To Improve Speech Recognition And Parsing," issued March 20, 2012 (Compl. ¶58).
Technology Synopsis: The patent describes a system for improving speech recognition in noisy environments by using a microphone array to create directional "nulls" that filter out point noise sources (Compl. ¶61). The system also uses an adaptive filter for echo cancellation and a speech coder that employs adaptive lossy audio compression to preserve essential speech components for the recognition engine (’327 Patent, col. 39:14-40:15).
Asserted Claims: Independent Claim 14 (Compl. ¶178).
Accused Features: The complaint accuses Alexa devices, such as those using the Amazon Alexa Premium Voice Far-Field Development Kit, which employ microphone arrays, beamforming, and noise and echo cancellation technologies (Compl. ¶¶180, 182).
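As a rough illustration of a directional "null," the Python sketch below uses a fixed two-microphone delay-and-subtract arrangement. This is a textbook stand-in chosen for brevity, not the adaptive microphone-array processing the ’327 Patent describes, and it assumes the noise source's inter-microphone delay is known and is a whole number of samples. The function names are hypothetical.

```python
def delay(signal, samples):
    """Delay a sampled signal by a whole number of samples, zero-padding the start."""
    if samples < 0:
        raise ValueError("samples must be non-negative")
    samples = min(samples, len(signal))
    return [0.0] * samples + list(signal[:len(signal) - samples])


def steer_null(mic_a, mic_b, noise_delay):
    """Null a point source that reaches mic_a `noise_delay` samples before mic_b.

    Delaying mic_a by the same amount aligns the two copies of the noise,
    so subtracting one channel from the other cancels that source.
    """
    delayed_a = delay(mic_a, noise_delay)
    return [b - a for a, b in zip(delayed_a, mic_b)]
```

A source arriving with the assumed delay is cancelled, while a talker who reaches both microphones at the same time survives the subtraction in filtered form (each output sample is the speech sample minus the speech sample `noise_delay` samples earlier).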
Patent Identification: U.S. Patent No. 8,195,468, "Mobile Systems And Methods Of Supporting Natural Language Human-Machine Interactions," issued June 5, 2012 (Compl. ¶65).
Technology Synopsis: The patent claims a method for processing multi-modal natural language inputs, which include both a speech utterance and a non-speech input (e.g., a screen touch) (Compl. ¶69). The method involves creating separate transcriptions for the speech and non-speech inputs, merging them, and then using a "semantic knowledge-based model"—comprising personalized, general, and environmental models—to determine the most likely context and select an appropriate domain agent (’468 Patent, col. 44:1-44).
Asserted Claims: Independent Claim 19 (Compl. ¶213).
Accused Features: The "Accused Multi-modal Products and Services," such as the Amazon Echo Show, which process both voice commands and on-screen user interactions (Compl. ¶¶212, 215).
Patent Identification: U.S. Patent No. 9,031,845, "Mobile Systems And Methods For Responding To Natural Language Speech Utterance," issued May 12, 2015 (Compl. ¶73).
Technology Synopsis: This patent describes a mobile system in a vehicle that intelligently decides whether to process a voice command "on-board" the vehicle or "off-board" by invoking a wireless network device (Compl. ¶79). After receiving and interpreting an utterance, the system determines the command's domain and context and then determines the appropriate execution location, allowing for a hybrid of local and remote processing (’845 Patent, col. 48:1-36).
Asserted Claims: Independent Claim 1 (Compl. ¶251).
Accused Features: In-vehicle Alexa products like Echo Auto and systems using the Alexa Auto SDK, which feature a "Local Voice Control" extension for on-board command execution in addition to their standard cloud-based processing (Compl. ¶¶264-265).
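The on-board versus off-board decision can be sketched as a simple routing rule. The Python example below is a hypothetical simplification: the `Interpretation` type, the `ONBOARD_DOMAINS` set, and the fallback behavior are assumptions made for illustration and are not taken from the ’845 Patent, the complaint, or the Alexa Auto SDK.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """Result of interpreting an utterance: its domain and context."""
    domain: str
    context: str


# Hypothetical set of domains the vehicle can handle without a network.
ONBOARD_DOMAINS = {"climate", "media", "phone"}


def choose_execution_site(interpretation, network_available):
    """Decide where a command runs once its domain and context are known."""
    if interpretation.domain in ONBOARD_DOMAINS:
        return "on-board"
    if network_available:
        return "off-board"  # invoke a device over the wireless network
    return "unavailable"
```

Under these assumptions a climate command is always handled locally, while a shopping request runs remotely only when a network connection exists.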
Patent Identification: U.S. Patent No. 9,263,039, "Systems And Methods For Responding To Natural Language Speech Utterance," issued February 16, 2016 (Compl. ¶83).
Technology Synopsis: The technology is a method for processing both speech and non-speech communications by merging them into a textual query (Compl. ¶87). The system then searches this query for text combinations, compares them to entries in a "context description grammar," generates a "relevance score" from the comparison, and selects one or more domain agents based on that score (’039 Patent, col. 53:13-53).
Asserted Claims: Independent Claim 13 (Compl. ¶290).
Accused Features: Alexa's skill selection architecture, which allegedly compares user utterances to skill invocation phrases and uses systems like "HypRank" to perform "intent-slot semantic analysis" and generate relevance scores to select the most pertinent skill (Compl. ¶¶302, 305, 311).
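The scoring step summarized above can be illustrated with a minimal Python sketch. It assumes the speech and non-speech inputs have already been merged into one textual query, and it treats the "relevance score" as the fraction of an agent's grammar entries found among the query's word combinations. The `CONTEXT_GRAMMAR` contents, the scoring formula, and the 0.3 threshold are hypothetical; neither the ’039 Patent nor the complaint is quoted for them.

```python
from itertools import combinations

# Hypothetical context description grammar: agent -> word combinations.
CONTEXT_GRAMMAR = {
    "weather": [{"weather"}, {"forecast"}, {"rain", "tomorrow"}],
    "traffic": [{"traffic"}, {"route", "home"}],
}


def text_combinations(query, max_size=2):
    """List every combination of up to `max_size` distinct words in the query."""
    words = sorted(set(query.lower().split()))
    combos = []
    for size in range(1, max_size + 1):
        combos.extend(frozenset(c) for c in combinations(words, size))
    return combos


def relevance_scores(query):
    """Score each agent by the share of its grammar entries the query matches."""
    combos = set(text_combinations(query))
    return {
        agent: sum(1 for entry in entries if frozenset(entry) in combos) / len(entries)
        for agent, entries in CONTEXT_GRAMMAR.items()
    }


def select_agents(query, threshold=0.3):
    """Return agents at or above the threshold, highest score first."""
    ranked = sorted(relevance_scores(query).items(),
                    key=lambda item: item[1], reverse=True)
    return [agent for agent, score in ranked if score >= threshold]
```

With this grammar, "will it rain tomorrow" matches one of three weather entries and selects only the weather agent. A learned ranking model could produce comparable scores by very different means, which is where the infringement dispute is likely to focus.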
Patent Identification: U.S. Patent No. 9,495,957, "Mobile Systems And Methods Of Supporting Natural Language Human-Machine Interactions," issued November 15, 2016 (Compl. ¶91).
Technology Synopsis: This patent discloses a system that processes a natural language utterance by using a "context stack" generated from prior utterances (Compl. ¶95). After performing speech recognition on a new utterance, the system compares the resulting words to entries in the context stack, generates "rank scores" for those entries, and identifies the most relevant context entry to determine the user's command or request (’957 Patent, col. 57:1-58:6).
Asserted Claims: Independent Claim 1 (Compl. ¶331).
Accused Features: Alexa's conversational capabilities, which use records of past interactions and a shortlisting-reranking approach to interpret follow-up requests by maintaining context from prior utterances (Compl. ¶¶334-335).
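
A "context stack" with "rank scores" can be sketched as follows. The Python example is illustrative only: the `ContextStack` class and its scoring rule (word overlap plus a recency bonus) are hypothetical assumptions, not the scoring method of the ’957 Patent or of any accused product.

```python
class ContextStack:
    """Hypothetical stack of contexts built from prior utterances."""

    def __init__(self):
        self.entries = []  # oldest first, most recent last

    def push(self, context, words):
        self.entries.append((context, set(words)))

    def rank(self, recognized_words):
        """Return (rank score, context) pairs, best first."""
        recognized = set(recognized_words)
        total = len(self.entries)
        scores = []
        for position, (context, words) in enumerate(self.entries):
            overlap = len(recognized & words)   # shared words with the entry
            recency = (position + 1) / total    # newest entry scores 1.0
            scores.append((overlap + recency, context))
        return sorted(scores, reverse=True)

    def best_context(self, recognized_words):
        ranked = self.rank(recognized_words)
        return ranked[0][1] if ranked else None
```

If a weather exchange is followed by a music exchange, a follow-up containing "seattle" (a word from the weather entry) is matched to the weather context, while a follow-up sharing no words with either entry defaults to the most recent one.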

III. The Accused Instrumentality

Product Identification

The complaint names the "Alexa Products" as the accused instrumentalities (Compl. ¶98). This is a broad category encompassing Amazon’s Alexa virtual assistant; the Echo hardware line (e.g., Echo, Echo Dot, Echo Auto, Echo Show); Alexa-related mobile applications; and the supporting cloud infrastructure and services, such as the Alexa Voice Service (AVS) (Compl. ¶98, fn. 2).

Functionality and Market Context

The accused products constitute a voice-controlled ecosystem centered on the Alexa assistant. A user speaks a "wake word" followed by an utterance to an Alexa-enabled device (Compl. ¶110). The device sends the utterance to the Alexa cloud, where Amazon's NLU services process the speech to determine the user's intent (Compl. ¶110, p.65). Based on this intent, the system invokes a "skill"—the equivalent of an application or the patents' "domain agent"—to fulfill the request (Compl. ¶110). Skills can be first-party (e.g., playing music from Amazon Music) or third-party (e.g., ordering a car from Uber). The complaint alleges this system is commercially significant, with Amazon's total net sales exceeding $513 billion in fiscal year 2022 (Compl. ¶97). The "How an Alexa Skill Works" diagram illustrates the flow from a user request to the Alexa service, which performs speech recognition and NLU, and then sends the request to the skill's application logic for processing (Compl. p.78).

IV. Analysis of Infringement Allegations

’720 Patent Infringement Allegations

Claim Element (from Independent Claim 1)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
A mobile system responsive to a user generated natural language speech utterance	The Accused Automotive Products and Services (e.g., Echo Auto, Alexa Auto SDK) are mobile systems that respond to hands-free voice commands.	¶105	col. 2:65-3:2
a speech unit connected to a computer device on a vehicle, wherein the speech unit receives a natural language speech utterance... and converts the received... utterance into an electronic signal	Alexa-enabled vehicles include an in-cabin microphone and speakers connected to a head unit, which converts the user's speech into an electronic signal for processing.	¶107-108	col. 16:1-5
a natural language speech processing system... that... receives, processes, and responds to the electronic signal using data received from a plurality of domain agents	The Alexa service acts as a processing system that receives the electronic signal and uses data from "skills" (domain agents) to fulfill the user's request.	¶109-110	col. 20:10-15
a speech recognition engine that recognizes... words or phrases... wherein the data used... includes a plurality of dictionary and phrase entries that are dynamically updated based on at least a history of a current dialog...	Alexa's speech recognition allegedly uses information from different intents and tracks "previously provided information" from the current dialog to improve recognition and understanding. A diagram showing Alexa assigning data from a conversation to different slots illustrates this contextual tracking (Compl. p.66).	¶111-112	col. 23:20-30
a parser that... determin[es] a context for the natural language speech utterance; selecting at least one of the plurality of domain agents based on the determined context; and transforming the... words or phrases into... a command... formulated in a grammar that the selected domain agent uses	The Alexa service parses the utterance to determine context and intent, selects the most relevant skill (domain agent), and formulates a command based on the grammar of the selected skill's intents.	¶113-114	col. 27:40-50
an agent architecture that communicatively couples services of... an agent manager, a system agent, the plurality of domain agents, and an agent library... wherein the selected domain agent uses the... services to create a response	The Alexa platform is alleged to be an agent architecture providing infrastructure and services (e.g., ASK APIs) that couple the user, the Alexa service, and the various skills (domain agents) to generate a response.	¶115-116	col. 21:58-67

Identified Points of Contention:
- Scope Questions: A central question may be whether a system where primary processing occurs in the cloud, accessed via a smartphone connected to a vehicle, meets the claim limitation of "a natural language speech processing system connected to the computer device on the vehicle." The defense may argue the patent envisioned a more self-contained, vehicle-centric system.
- Technical Questions: What specific evidence shows that Alexa's speech recognition engine uses "dictionary and phrase entries that are dynamically updated" based on a current dialog? The complaint alleges Alexa tracks conversation history, but the court may need to determine if this learning mechanism is technically equivalent to the specific update process described in the patent.

’006 Patent Infringement Allegations

Claim Element (from Independent Claim 5)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
receiving, at a speech unit coupled to a processing device, a natural language speech utterance that contains a request	An Alexa device (e.g., Echo Dot) acts as a speech unit coupled to a processing device, receiving a user's spoken request. A diagram showing the request flow from a user to an Echo device illustrates this step (Compl. p.78).	¶140-141	col. 18:27-31
recognizing... one or more words or phrases... using information in one or more dictionary and phrase tables	The Alexa service performs speech recognition to identify words and phrases, using a "wide range of sentences, phrases, and words that users are likely to say."	¶142-143	col. 18:60-65
parsing... information... to determine a meaning... and a context associated with the request	The Alexa service parses the recognized words using intents and slots to determine the meaning and context of the user's request.	¶144-145	col. 19:1-5
formulating, at the parser, the request... in accordance with a grammar used by a domain agent	Alexa formulates the request according to the "intents" of a skill (domain agent), and the complaint alleges that grammar has been a core tool in Alexa's NLU toolkit.	¶146-147	col. 19:6-10
extracting one or more criteria and one or more parameters from one or more keywords... using procedures sensitive to the determined context	Alexa identifies keywords in the utterance and assigns them to different "slots" (parameters) based on the determined intent (context).	¶150-151	col. 19:16-22
inferring one or more further criteria... using a dynamic set of prior probabilities or fuzzy possibilities	Alexa infers parameters for a request by referring back to slots mentioned in the context of the current conversation and makes decisions about "slot values mentioned in context" based on "the probability that any given carryover decision is the correct one."	¶152-153	col. 19:23-27
transforming the... criteria... and... parameters into one or more tokens having a format compatible with the grammar used by the domain agent	The combination of data assigned to slots based on the user's intent is formulated into a request compatible with the skill's logic.	¶154-155	col. 19:28-36
processing the formulated request with the domain agent... to generate a response	The selected skill (domain agent) processes the request using its skill interaction model and application logic to produce a response.	¶156-157	col. 19:37-41
presenting the generated response to the utterance via the speech unit	The Alexa service provides text-to-speech services to generate an audible response to the user's request.	¶158-159	col. 19:42-44

Identified Points of Contention:
- Technical Questions: A key technical question will be whether Alexa's method for handling conversational context meets the specific limitation of "inferring... criteria... using a dynamic set of prior probabilities or fuzzy possibilities." The complaint alleges this occurs through contextual slot carryover, but the analysis will require determining if the underlying mechanism in Alexa's ML models functions as claimed.
- Scope Questions: Does the concept of "domain agent" as described in the patent, which organizes "domain specific behavior and information" (’006 Patent, col. 2:53-55), directly read on an Amazon "skill," which is a third-party or first-party application that conforms to an API provided by Amazon?

V. Key Claim Terms for Construction

The Term: "dynamically updated based on at least a history of a current dialog" (’720 Patent, Claim 1)
Context and Importance: This term is critical for distinguishing the invention from systems with static recognition capabilities. The dispute will likely center on whether Alexa's general machine learning and personalization, which improve over time, are the same as the specific, dialog-based update of "dictionary and phrase entries" recited in the claim. Practitioners may focus on this term because it defines the interactive, learning nature of the claimed system.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification describes using "the user's history of interests and preferences" to interpret questions, suggesting a broad, user-profile-based approach to learning (’720 Patent, col. 2:6-8).
- Evidence for a Narrower Interpretation: The claim explicitly links the "dynamically updated" data to the "dictionary and phrase entries" used by the "speech recognition engine," which could be construed to mean the core recognition models themselves, not just a higher-level user profile (’720 Patent, col. 32:7-12).
The Term: "inferring one or more further criteria... using a dynamic set of prior probabilities or fuzzy possibilities" (’006 Patent, Claim 5)
Context and Importance: This term specifies the technical mechanism for resolving ambiguity. Infringement will depend on whether Plaintiff can show that Alexa's complex neural networks for contextual understanding operate using a method that is technically equivalent to inferring parameters based on "prior probabilities or fuzzy possibilities."
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The abstract states the invention "makes significant use of context, prior information, domain knowledge, and user specific profile data," and the detailed description notes that "robustness to partial failure is achieved through the use of probabilistic and fuzzy reasoning at several stages," suggesting a general architectural principle (’006 Patent, Abstract; col. 2:61-64).
- Evidence for a Narrower Interpretation: The claim places this "inferring" step within the broader step of "formulating the request," potentially limiting it to a specific type of logic for filling in missing information rather than any general-purpose machine learning model that considers context (’006 Patent, col. 36:23-27).

VI. Other Allegations

Indirect Infringement: The complaint alleges both induced and contributory infringement for all asserted patents. Inducement is primarily based on Amazon providing the Alexa platform, SDKs, and documentation that allegedly instruct and encourage third-party developers and end-users to build and use the infringing functionalities (Compl. ¶¶122-123, 165-166, 198-199).
Willful Infringement: The complaint alleges willful infringement for all asserted patents, based on Amazon's alleged pre-suit knowledge stemming from a series of 2011 meetings where VoiceBox, the original patent owner, allegedly presented its technology to Amazon personnel (Compl. ¶¶28-31, 118, 161). The subsequent hiring of VoiceBox's Chief Scientist is also cited as evidence of knowledge (Compl. ¶35).

VII. Analyst’s Conclusion: Key Questions for the Case

A core issue will be one of technical translation: can patent claims drafted in the early 2000s, describing systems based on explicit "domain agents," "grammars," and "probabilistic tables," be proven to read on a modern, cloud-based architecture that uses opaque, neural network-based machine learning models to achieve similar functional outcomes, such as contextual understanding and skill selection?
A key evidentiary question will be one of knowledge and intent: can Plaintiff produce sufficient evidence from the 2011 meetings and subsequent events to demonstrate that Amazon had pre-suit knowledge of the patented technology and willfully incorporated it into the Alexa platform, or will Amazon successfully argue that Alexa was the product of independent development in a crowded field?
A central question of claim scope will be whether the architectural elements described in the patents, such as a system on a "vehicle" or a method for processing "multi-modal" inputs, can be construed broadly enough to cover the diverse and distributed nature of the accused Alexa ecosystem, which spans from in-car devices to smart speakers and screen-based displays.