DCT

1:19-cv-01410

VB Assets LLC v. Amazon.com Services LLC

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 1:19-cv-01410, D. Del., 10/09/2019
  • Venue Allegations: Plaintiff alleges venue is proper in the District of Delaware because each Defendant entity was incorporated or formed under the laws of Delaware and therefore resides within the district.
  • Core Dispute: Plaintiff alleges that Defendant’s Alexa virtual assistant, Echo smart speakers, and associated products and services infringe six patents related to conversational voice user interfaces, natural language understanding, and voice-enabled commerce.
  • Technical Context: The technology at issue falls within the domain of conversational artificial intelligence and natural language understanding, which underpins the rapidly growing market for voice-controlled digital assistants and smart home devices.
  • Key Procedural History: The complaint alleges extensive pre-suit interaction, beginning in October 2011, during which Plaintiff's predecessor, VoiceBox Technologies, presented its patented technology to Amazon personnel, including individuals who allegedly became leaders of the Alexa and Echo teams. The complaint further alleges that Amazon later hired VoiceBox's Chief Scientist and recruited dozens of its engineers. After issuance, U.S. Patent 8,073,681 underwent an inter partes review (IPR2020-01367); the resulting certificate, issued December 21, 2022, confirmed the patentability of claims 1-36 and recorded the disclaimer of claims 37-42. U.S. Patent 9,015,049 also underwent an inter partes review (IPR2020-01346); the resulting certificate, issued November 17, 2022, cancelled all of its claims (claims 1-20). The complaint also details arguments made during the prosecution of several asserted patents to distinguish them from prior art, which may inform the scope of the claims.

Case Timeline

| Date | Event |
|---|---|
| 2006-10-16 | Priority Date for ’681 and ’049 Patents |
| 2007-02-06 | Priority Date for ’176, ’536, and ’097 Patents |
| 2010-10-19 | ’176 Patent Issued |
| 2011-10-07 | VoiceBox presents technology to Amazon personnel |
| 2011-10-26 | VoiceBox presents details of ’176 Patent to Amazon personnel |
| 2011-12-06 | ’681 Patent Issued |
| 2014-01-01 | Amazon announces launch of Alexa and first-generation Echo |
| 2014-09-16 | Priority Date for ’703 Patent |
| 2014-11-11 | ’536 Patent Issued |
| 2015-04-21 | ’049 Patent Issued |
| 2016-02-23 | ’097 Patent Issued |
| 2017-02-02 | VoiceBox presents portfolio, including ’681, ’049, ’176, ’536, and ’097 patents, to Amazon Alexa team |
| 2017-04-18 | ’703 Patent Issued |
| 2019-10-09 | Complaint Filed |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 8,073,681 - “System and Method for a Cooperative Conversational Voice User Interface,” issued December 6, 2011

The Invention Explained

  • Problem Addressed: The patent’s background section describes the limitations of prior art voice recognition systems, which were often rigid "Command and Control" interfaces that forced users to memorize specific phrases and navigate cumbersome verbal menus, inhibiting widespread adoption (Compl. ¶¶18, 51-52).
  • The Patented Solution: The invention proposes a "cooperative conversational voice user interface" that employs a "conversational speech engine" to overcome these limitations. This engine accumulates and utilizes both "short-term shared knowledge" (context from the current conversation) and "long-term shared knowledge" (historical data from past conversations with the user) to identify a user's intent, infer missing information, and generate an appropriate, adaptive response (’681 Patent, Abstract; col. 8:17-25).
  • Technical Importance: This approach enabled a more natural, free-form conversational interaction between humans and machines, moving beyond the rigid, command-based systems that previously dominated the field (Compl. ¶¶55, 64).

Key Claims at a Glance

  • The complaint quotes language corresponding to system claim 25 and reserves the right to assert additional claims (Compl. ¶¶46, 123).
  • The essential elements of independent claim 25 include:
    • a voice input device configured to receive an utterance during a current conversation;
    • a conversational speech engine with one or more processors configured to:
      • accumulate short-term shared knowledge about the current conversation;
      • accumulate long-term shared knowledge about the user from past conversations;
      • identify a context from both short-term and long-term knowledge;
      • infer additional information if the utterance is insufficient to complete a request;
      • establish an intended meaning for the utterance based on the inferred information; and
      • generate a response based on the established intended meaning.
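
For orientation only, the sequence of functions recited in claim 25 can be pictured with a toy sketch. Nothing below is drawn from the patent specification, the complaint, or any Amazon system: the names `SharedKnowledge`, `handle_utterance`, `find_city`, and `KNOWN_CITIES`, the single "weather" domain, and the `home_city` entry are all hypothetical choices made for illustration. The sketch shows one possible reading of "accumulate," "identify a context," "infer," "establish an intended meaning," and "generate a response."

```python
from dataclasses import dataclass, field

KNOWN_CITIES = {"seattle", "boston", "denver"}  # hypothetical vocabulary


@dataclass
class SharedKnowledge:
    # Utterances from the current conversation (illustrative "short-term" store).
    short_term: list = field(default_factory=list)
    # Facts retained from past conversations (illustrative "long-term" store).
    long_term: dict = field(default_factory=dict)


def find_city(text):
    """Return the first known city mentioned in text, or None."""
    for word in text.lower().replace("?", " ").replace(".", " ").split():
        if word in KNOWN_CITIES:
            return word
    return None


def handle_utterance(utterance, knowledge):
    # Accumulate short-term knowledge about the current conversation.
    knowledge.short_term.append(utterance)

    # Identify a context (this toy example knows only a "weather" domain).
    context = "weather" if "weather" in utterance.lower() else "unknown"
    if context == "unknown":
        return "Sorry, I did not understand."

    # If the utterance lacks a city, infer one: first from earlier turns
    # in this conversation (most recent first), then from long-term knowledge.
    city = find_city(utterance)
    if city is None:
        for earlier in reversed(knowledge.short_term[:-1]):
            city = find_city(earlier)
            if city:
                break
        if city is None:
            city = knowledge.long_term.get("home_city")
    if city is None:
        return "Which city do you mean?"

    # Establish an intended meaning, then generate a response from it.
    meaning = {"intent": "get_weather", "city": city}
    return f"Fetching weather for {meaning['city'].title()}."
```

In this sketch, a bare "What is the weather?" falls back on the long-term `home_city` entry, while "How is the weather there?" asked after the user mentions Seattle resolves from the current conversation instead. Whether an accused system's handling of incomplete requests corresponds to the claimed "inferring" is the kind of mapping question the infringement analysis will have to resolve.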

U.S. Patent No. 9,015,049 - “System and Method for a Cooperative Conversational Voice User Interface,” issued April 21, 2015

The Invention Explained

  • Problem Addressed: Stemming from the same original application as the ’681 patent, the ’049 patent addresses the same problem of rigid "Command and Control" systems that hinder natural human-machine interaction (Compl. ¶¶51-52, 60).
  • The Patented Solution: This patent claims a system that facilitates conversation-based responses by creating and using a "model" of the conversation. The system receives a natural language utterance and identifies a "first model that includes short-term knowledge about the conversation" derived from prior utterances in that same session. This model is then used to identify context, determine an interpretation of the new utterance, and generate a response (’049 Patent, Abstract; Compl. ¶49).
  • Technical Importance: The claimed invention's use of a "model" to track conversational state, rather than simpler text-string matching, represented a departure from conventional approaches and aimed to improve the accuracy of contextual understanding in voice systems (Compl. ¶73).

Key Claims at a Glance

  • The complaint quotes language corresponding to system claim 11 and reserves the right to assert additional claims (Compl. ¶¶49, 131).
  • The essential elements of independent claim 11 include:
    • one or more physical processors programmed to:
      • receive a natural language utterance during a conversation;
      • identify a first model that includes short-term knowledge about the conversation, based on prior utterances;
      • identify, based on the short-term knowledge, context information for the utterance;
      • determine, based on the context information, an interpretation of the utterance; and
      • generate a response based on that interpretation.

U.S. Patent No. 9,626,703 - “Voice Commerce,” issued April 18, 2017

  • Technology Synopsis: The patent addresses the difficulty of online shopping on mobile devices with small screens and keyboards (Compl. ¶79). The claimed solution is a system that allows a user to complete a purchase with a single natural language utterance by automatically determining the product, retrieving stored payment information, and retrieving stored shipping information to complete the transaction "without further user input" from the user identifying these details (Compl. ¶¶76, 84).
  • Asserted Claims: The complaint quotes language corresponding to independent claim 1 (Compl. ¶76).
  • Accused Features: The complaint accuses Amazon's voice commerce system, which allows users of Alexa Products to purchase items via voice commands (Compl. ¶139).

U.S. Patent No. 7,818,176 - “System and Method for Selecting and Presenting Advertisements Based on Natural Language Processing of Voice-based Input,” issued October 19, 2010

  • Technology Synopsis: The patent addresses the failure of existing voice interfaces to effectively deliver marketing and advertising (Compl. ¶103). The invention is a system that receives a voice request, uses a speech recognition engine and a "conversational language processor" to establish the context of the request, and then selects and presents a relevant advertisement to the user. A disclosed technical aspect involves mapping a stream of phonemes to syllables represented in an acoustic grammar to generate a preliminary interpretation (Compl. ¶¶93, 105).
  • Asserted Claims: The complaint quotes language corresponding to independent claim 1 (Compl. ¶93).
  • Accused Features: The complaint accuses Amazon's systems for presenting advertisements and other promotional content to users in response to voice queries made to Alexa Products (Compl. ¶147).

U.S. Patent No. 8,886,536 - “System and Method for Delivering Targeted Advertisements and Tracking Advertisement Interactions in Voice Recognition Contexts,” issued November 11, 2014

  • Technology Synopsis: Sharing a common specification with the ’176 patent, this patent addresses similar problems in voice-based advertising (Compl. ¶101). The claimed method involves receiving a first utterance, providing a response, and then receiving a second, related utterance. The system identifies requests within the second utterance that are to be processed by different devices associated with the user and then determines and presents relevant promotional content (Compl. ¶¶96, 110).
  • Asserted Claims: The complaint quotes language corresponding to independent claim 1 (Compl. ¶96).
  • Accused Features: The complaint accuses Alexa Products of infringing by processing conversational utterances and presenting promotional content, potentially across multiple devices associated with a single user (Compl. ¶155).

U.S. Patent No. 9,269,097 - “System and Method for Delivering Targeted Advertisements and/or Providing Natural Language Processing Based on Advertisements,” issued February 23, 2016

  • Technology Synopsis: Also from the common application family, this patent focuses on interpreting a user's utterance based on a previously presented advertisement (Compl. ¶¶101, 116). Specifically, the claimed method determines whether a pronoun in a user's follow-up utterance (e.g., "buy it") refers to the product or service featured in the advertisement, thereby resolving the pronoun's ambiguous reference by using the advertisement as context (Compl. ¶¶99, 118).
  • Asserted Claims: The complaint quotes language corresponding to independent claim 1 (Compl. ¶99).
  • Accused Features: The complaint accuses Alexa Products of infringing by understanding and processing follow-up voice commands that refer to a product or service in a previously presented advertisement or promotion (Compl. ¶163).

III. The Accused Instrumentality

Product Identification

The accused instrumentalities are collectively termed the "Alexa Products" (Compl. ¶2, fn. 2). This is a broad category that includes Amazon's Alexa virtual assistant software, the Echo line of smart speakers and devices (e.g., Echo, Echo Dot, Echo Show), Amazon's Alexa-enabled applications on mobile devices, the Alexa Voice Services cloud platform, and the associated software, hardware, and cloud infrastructure that enables their operation (Compl. ¶2, fn. 2).

Functionality and Market Context

The accused products provide a voice-controlled, conversational interface that allows users to perform tasks such as playing music, getting information, controlling smart home devices, and purchasing goods and services through voice commands (Compl. ¶¶1, 19). The complaint alleges that VoiceBox Technologies developed an early prototype of a similar voice-controlled speaker, called "Cybermind," long before Amazon's entry into the market (Compl. ¶19). An image of this prototype is included in the complaint as Figure 1 (Compl. ¶19). Plaintiff alleges that Amazon's introduction of the Alexa Products "crushed" its business opportunities (Compl. ¶3).

IV. Analysis of Infringement Allegations

’681 Patent Infringement Allegations

| Claim Element (from Independent Claim 25) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| a voice input device configured to receive an utterance during a current conversation with a user | Alexa Products include devices with microphones (e.g., Echo smart speakers) that receive user utterances in the course of a conversation. | ¶2, fn. 2; ¶123 | col. 8:26-30 |
| a conversational speech engine...configured to: accumulate short-term shared knowledge about the current conversation... | The Alexa system's conversational speech engine allegedly accumulates knowledge about a user's utterance during a current conversation to maintain context. | ¶46; ¶54 | col. 8:17-21 |
| accumulate long-term shared knowledge about the user, wherein the long-term shared knowledge includes knowledge about one or more past conversations with the user | The Alexa system allegedly accumulates knowledge from a user's past conversations to inform current and future interactions. | ¶46; ¶56 | col. 8:21-25 |
| identify a context associated with the utterance from the short-term shared knowledge and the long-term shared knowledge | The Alexa system allegedly uses both short-term conversational data and long-term user history to identify the context of a given utterance. | ¶46; ¶57 | col. 8:58-62 |
| infer additional information about the utterance...in response to determining that the utterance contains insufficient information to complete a request | When a user's request is ambiguous or incomplete, the Alexa system allegedly infers additional necessary information based on established context. | ¶46; ¶57 | col. 9:1-5 |
| establish an intended meaning for the utterance within the identified...context based on the additional information inferred about the utterance | The Alexa system allegedly establishes an intended meaning for a user's utterance by combining the recognized words with the inferred contextual information. | ¶46; ¶57 | col. 9:5-9 |
| and generate a response to the utterance based on the intended meaning established within the identified context. | The Alexa system allegedly generates a verbal or other response that is based on its established understanding of the user's intended meaning. | ¶46; ¶57 | col. 9:10-13 |

’049 Patent Infringement Allegations

| Claim Element (from Independent Claim 11) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| one or more physical processors programmed with one or more computer program instructions such that, when executed, the one or more computer program instructions cause the one or more physical processors to: receive a natural language utterance during a conversation between a user and the system | The Alexa Products' processors execute instructions to receive user speech via microphones. | ¶2, fn. 2; ¶131 | col. 2:1-4 |
| identify a first model that includes short-term knowledge about the conversation, wherein the short-term knowledge is based on one or more prior natural language utterances received during the conversation | The Alexa system allegedly identifies or creates a model containing short-term knowledge based on the history of the current conversation. | ¶49; ¶¶60-61 | col. 8:17-21 |
| identify, based on the short-term knowledge, context information for the natural language utterance | The Alexa system allegedly uses the conversational model to identify context for a user's new utterance. | ¶49; ¶60 | col. 8:58-62 |
| determine, based on the context information, an interpretation of the natural language utterance | Based on the identified context, the Alexa system allegedly determines an interpretation of what the user means. | ¶49; ¶60 | col. 9:5-9 |
| and generate, based on the interpretation of the natural language utterance, a response to the natural language utterance. | The Alexa system allegedly provides a response to the user based on its interpretation of the user's intent. | ¶49; ¶60 | col. 9:10-13 |

Identified Points of Contention

  • Scope Questions: A central dispute may arise over whether the architecture of Amazon's Alexa platform, which involves complex cloud-based AI and machine learning systems, maps onto the patent's descriptions of a "conversational speech engine" and its method of "accumulating" knowledge. The analysis may question whether "short-term knowledge" as claimed is coextensive with temporary session data used by modern web services, and whether "long-term knowledge" is coextensive with a standard user profile or interaction history.
  • Technical Questions: The complaint alleges that Amazon's system performs functions like "inferring additional information" and establishing an "intended meaning." A technical question will be whether the specific algorithms and processes used by Alexa actually perform these functions as contemplated by the patents. For instance, what evidence does the complaint provide that Alexa's process for handling an ambiguous query is the same as the claimed method of "inferring" information based on "short-term and long-term shared knowledge"?

V. Key Claim Terms for Construction

The Term: "conversational speech engine" (’681 Patent, cl. 25)

  • Context and Importance: This term appears to define the core accused component. The outcome of the case may depend on whether this term is construed broadly to cover any modern natural language processing system that maintains context, or narrowly to cover only the specific architecture and modules described in the patent. Practitioners may focus on this term because it is not a standard industry term and appears to be defined by its function within the claims.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The patent describes the engine in functional terms, stating it is configured to perform actions like "accumulate...knowledge," "identify a context," and "infer additional information" (’681 Patent, cl. 25), which could support a construction covering any system that performs these functions.
    • Evidence for a Narrower Interpretation: The detailed description links the "conversational speech engine" to a specific architecture including a "free form voice search module," "context determination process," and "context domain agents" (’681 Patent, col. 8:31-38; Fig. 2). This could support a narrower construction limited to systems embodying this structure.

The Term: "short-term shared knowledge" (’681 Patent, cl. 25) / "short-term knowledge" (’049 Patent, cl. 11)

  • Context and Importance: The distinction between "short-term" and "long-term" knowledge was a key argument used to overcome prior art during prosecution (Compl. ¶¶58-59). The definition of this term will be critical for determining both infringement and validity.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The claims define the term by its source: "knowledge about the utterance received during the current conversation" (’681 Patent, cl. 25), suggesting it could broadly cover any data from the current user session.
    • Evidence for a Narrower Interpretation: The specification suggests that short-term knowledge is managed in a "short-term context stack" and may be expired "after a psychologically appropriate amount of time" (’681 Patent, col. 9:60-65). This could support a construction requiring a specific data structure or temporal limitation beyond simple session data.
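
To make the narrower reading concrete, the following is a minimal sketch of what a stack-structured, time-limited store might look like, as opposed to undifferentiated session data. It is an illustration only and is not taken from the ’681 Patent or from any accused product: the class name `ShortTermContextStack`, the 120-second default lifetime, and the injectable `clock` parameter are all assumptions introduced here. The patent's phrase "a psychologically appropriate amount of time" is not quantified in the material summarized above, so the lifetime value in particular is arbitrary.

```python
import time


class ShortTermContextStack:
    """Hypothetical context stack whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds=120.0, clock=time.monotonic):
        self._ttl = ttl_seconds   # assumed lifetime; not a value from the patent
        self._clock = clock       # injectable so expiry can be tested without waiting
        self._entries = []        # list of (timestamp, context), newest last

    def push(self, context):
        # Record the context together with the time it was observed.
        self._entries.append((self._clock(), context))

    def _expire(self):
        # Drop every entry older than the lifetime.
        cutoff = self._clock() - self._ttl
        self._entries = [(t, c) for t, c in self._entries if t >= cutoff]

    def current(self):
        # Return the most recent unexpired context, or None if all have lapsed.
        self._expire()
        return self._entries[-1][1] if self._entries else None
```

Under a construction requiring this kind of structure, a system that simply retains all data for the length of a session, with no ordering by recency and no expiry, might fall outside the term; under the broader construction, it might not.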

VI. Other Allegations

  • Indirect Infringement: The complaint alleges both induced and contributory infringement for all asserted patents. For inducement, it alleges Amazon instructs users to use the Alexa Products in an infringing manner through user manuals, its website, and the virtual assistant itself (e.g., Compl. ¶124). For contributory infringement, it alleges the Alexa Products are especially made to practice the inventions and are not staple articles of commerce suitable for substantial non-infringing use (e.g., Compl. ¶125).
  • Willful Infringement: The complaint alleges willful infringement based on Amazon's alleged pre-suit knowledge of the patents. It asserts that Amazon was made aware of the patented technology and specific patents (including the ’176 and ’681 patents) in a series of meetings beginning in October 2011 (Compl. ¶¶25-30, 126, 150). The complaint includes a slide from a 2011 presentation to Amazon, Figure 6, which explicitly features the front page of the ’176 patent (Compl. ¶30). The complaint also alleges Amazon gained knowledge during subsequent meetings in 2017 (Compl. ¶41) and from citations to the patents during the prosecution of its own patent applications (Compl. ¶126).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of technical mapping: can the specific functions claimed in the patents—such as a "conversational speech engine" that "accumulates" distinct "short-term" and "long-term" knowledge to "infer" information—be demonstrably mapped onto the complex, machine-learning-based architecture of Amazon's Alexa cloud platform? Or will Amazon be able to show a fundamental mismatch in technical operation?
  • A second central issue will be one of pre-suit knowledge and intent: given the detailed allegations and visual evidence of meetings where Plaintiff's patented technology was allegedly disclosed to Amazon years before the suit, a key question for the court will be what Amazon knew, when it knew it, and how that knowledge influenced its development of the Alexa platform. This will be critical to the claim of willfulness.
  • A dispositive threshold question will be patent validity: the post-grant cancellation of all claims of the ’049 patent will likely remove it from the case. The confirmation of claims 1-36 of the ’681 patent in IPR strengthens Plaintiff's position on that patent's validity, but the validity of the remaining four patents will be a significant battleground, likely focusing on whether the claimed concepts were truly inventive over the state of the art in the rapidly evolving field of natural language processing at the time of their filing.