DCT
1:24-cv-01279
VB Assets LLC v. SoundHound Ai Inc
I. Executive Summary and Procedural Information
- Parties & Counsel:
- Plaintiff: VB Assets, LLC (Delaware)
- Defendant: SoundHound AI, Inc. (Delaware)
- Plaintiff’s Counsel: Farnan LLP
- Case Identification: 1:24-cv-01279, D. Del. (First Amended Complaint filed 01/30/2025)
- Venue Allegations: Venue is alleged to be proper because Defendant SoundHound AI, Inc. is a Delaware corporation and therefore resides in the district.
- Core Dispute: Plaintiff alleges that Defendant’s voice recognition and natural language understanding platforms, including its "Voice Commerce Ecosystem," infringe nine U.S. patents related to conversational voice interfaces, voice-based targeted advertising, and voice-enabled commerce.
- Technical Context: The technology concerns conversational artificial intelligence and voice assistants, a commercially significant field integral to modern automotive systems, consumer electronics, and enterprise customer service solutions.
- Key Procedural History: The complaint alleges that Plaintiff provided Defendant with notice of infringement on November 13, 2024, before filing the original complaint on November 21, 2024. During prosecution of U.S. Patent No. 11,087,385, the applicant distinguished the prior art by emphasizing that the invention selects a product based on a single user input. Additionally, U.S. Patent No. 8,073,681 has been the subject of an Inter Partes Review (IPR), which resulted in the disclaimer of claims 37-42.
Case Timeline
| Date | Event |
|---|---|
| 2001-01-01 | VoiceBox Technologies founded |
| 2006-10-16 | Earliest Priority Date for ’681, ’626, ’699, and ’249 Patents |
| 2007-02-06 | Earliest Priority Date for ’536, ’097, and ’176 Patents |
| 2010-10-19 | U.S. Patent No. 7,818,176 Issues |
| 2010-11-10 | Earliest Priority Date for ’025 Patent |
| 2011-12-06 | U.S. Patent No. 8,073,681 Issues |
| 2012-01-09 | VoiceBox and Toyota announce strategic relationship |
| 2013-01-01 | VoiceBox Technologies ranked by IEEE for patent power |
| 2014-09-16 | Earliest Priority Date for ’385 Patent |
| 2014-11-11 | U.S. Patent No. 8,886,536 Issues |
| 2016-02-23 | U.S. Patent No. 9,269,097 Issues |
| 2016-11-22 | U.S. Patent No. 9,502,025 Issues |
| 2018-01-01 | VoiceBox Technologies sold to Nuance Communications |
| 2019-05-21 | U.S. Patent No. 10,297,249 Issues |
| 2020-08-25 | U.S. Patent No. 10,755,699 Issues |
| 2021-08-10 | U.S. Patent No. 11,087,385 Issues |
| 2022-01-11 | U.S. Patent No. 11,222,626 Issues |
| 2024-11-13 | Plaintiff sends pre-suit notice of infringement to Defendant |
| 2024-11-21 | Plaintiff files original complaint |
| 2025-01-07 | SoundHound demonstrates in-vehicle voice commerce at CES |
| 2025-01-30 | First Amended Complaint filed |
II. Technology and Patent(s)-in-Suit Analysis
U.S. Patent No. 8,073,681 - “SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE” (Issued December 6, 2011)
The Invention Explained
- Problem Addressed: The patent addresses the limitations of prior "Command and Control" voice recognition systems, which forced users to memorize rigid commands and navigate inflexible menus, failing to provide a natural, conversational experience (Compl. ¶32-33; ’249 Patent, col. 1:49-62).
- The Patented Solution: The invention proposes a "conversational speech engine" that interprets user utterances by leveraging both "short-term shared knowledge" (context from the current conversation) and "long-term shared knowledge" (context from past conversations with the user). This allows the system to identify the correct context, disambiguate words with multiple meanings, and generate a grammatically appropriate response, thereby emulating a more cooperative, human-like dialogue (Compl. ¶25, 34-35; ’249 Patent, col. 4:1-6).
- Technical Importance: This technology represented a step toward more intuitive human-machine interaction by allowing users to "converse naturally" with a voice system instead of adapting their requests to a machine's limited instruction set (Compl. ¶36).
Key Claims at a Glance
- The complaint asserts at least independent Claim 25 (Compl. ¶25).
- The essential elements of Claim 25 include:
- A voice input device configured to receive an utterance containing words with different meanings in different contexts.
- A conversational speech engine with processors configured to:
- Accumulate short-term shared knowledge about the current conversation.
- Accumulate long-term shared knowledge about the user from past conversations.
- Identify a context for the utterance using both the short-term and long-term knowledge.
- Establish an intended meaning for the utterance within that context to disambiguate the user's intent.
- Generate a grammatically or syntactically adapted response based on the intended meaning.
- The complaint reserves the right to assert additional claims (Compl. ¶81).
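To make the two-part knowledge limitation concrete, the following minimal Python sketch models short-term shared knowledge as context counts for the current conversation and long-term shared knowledge as counts folded in from past conversations, then uses both to disambiguate a word with different meanings in different contexts. It is an illustrative assumption only, not the patent's disclosed implementation or SoundHound's code; the sense inventory `SENSES`, the class name, and the 2x weighting of short-term knowledge are all hypothetical, and the response-generation step is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical sense inventory: an ambiguous word mapped to its meaning in each context.
SENSES = {
    "traffic": {"music": "songs by the band Traffic", "navigation": "current road congestion"},
}


@dataclass
class ConversationalSpeechEngine:
    # Contexts observed in the current conversation ("short-term shared knowledge").
    short_term: Counter = field(default_factory=Counter)
    # Contexts observed across past conversations ("long-term shared knowledge").
    long_term: Counter = field(default_factory=Counter)

    def accumulate(self, context: str) -> None:
        self.short_term[context] += 1

    def end_conversation(self) -> None:
        # Fold this session's knowledge into the long-term profile, then reset it.
        self.long_term.update(self.short_term)
        self.short_term.clear()

    def identify_context(self, candidates: list[str], short_weight: float = 2.0) -> str:
        # Score each candidate context using BOTH knowledge stores; the 2x weight
        # on short-term knowledge is an arbitrary illustrative choice.
        return max(candidates, key=lambda c: short_weight * self.short_term[c] + self.long_term[c])

    def establish_meaning(self, word: str) -> tuple[str, str]:
        # Pick the context first, then read off the word's meaning in that context.
        senses = SENSES[word]
        context = self.identify_context(sorted(senses))
        return context, senses[context]
```

With only past music sessions on record, "traffic" resolves to the band; once the current conversation turns to navigation, the short-term knowledge outweighs the long-term history and the same word resolves to road congestion.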
U.S. Patent No. 11,222,626 - “SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE” (Issued January 11, 2022)
The Invention Explained
- Problem Addressed: Like the ’681 Patent, this invention seeks to overcome the rigidity of "Command and Control" systems that lack the ability to follow a multi-turn, evolving conversation (Compl. ¶32-33).
- The Patented Solution: The invention describes a system that facilitates multi-turn conversations by tracking the contexts of sequential user utterances. It generates a "context stack" that organizes these contexts in reverse chronological order. When a new utterance is received, the system compares it against the contexts in the stack to determine if the user is referring to a recent topic, allowing it to correctly interpret the input based on the preceding dialogue (Compl. ¶27; ’249 Patent, col. 12:7-14).
- Technical Importance: The use of a context stack allows a voice system to maintain conversational continuity, enabling users to make follow-up requests or refer to earlier topics without having to restate the entire context (Compl. ¶35-36).
Key Claims at a Glance
- The complaint asserts at least independent Claim 10 (Compl. ¶27).
- The essential elements of Claim 10 include:
- One or more processors configured to:
- Track a series of contexts identified from a series of utterances during a conversation.
- Generate a context stack based on the tracked contexts in reverse chronological order.
- Receive a third utterance.
- Determine if the third utterance corresponds to one or more contexts in the stack by comparing it to the contexts in the order they are listed.
- Interpret the third utterance using the corresponding context(s) if a correspondence is found.
- The complaint reserves the right to assert additional claims (Compl. ¶87).
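The Claim 10 steps can be sketched in Python as a tracker that records contexts in chronological order, exposes them most-recent-first as a "context stack," and checks a new utterance against each context in that order, stopping at the first correspondence. This is a hedged illustration of the claim language, not the patent's disclosed implementation or the accused system; the keyword sets in `CONTEXT_KEYWORDS` are a hypothetical stand-in for whatever correspondence test a real system would use.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword sets used as a stand-in correspondence test.
CONTEXT_KEYWORDS = {
    "restaurants": {"thai", "pizza", "open", "reservation"},
    "weather": {"rain", "tomorrow", "forecast"},
}


@dataclass
class ContextTracker:
    tracked: list[str] = field(default_factory=list)  # chronological order

    def track(self, context: str) -> None:
        self.tracked.append(context)

    def context_stack(self) -> list[str]:
        # Most recent context first, i.e. reverse chronological order.
        return self.tracked[::-1]

    def interpret(self, utterance: str) -> Optional[str]:
        words = set(utterance.lower().rstrip("?.!").split())
        # Compare against contexts in the order they are listed in the stack
        # and interpret using the first one that corresponds.
        for context in self.context_stack():
            if words & CONTEXT_KEYWORDS.get(context, set()):
                return context
        return None  # no correspondence found in the stack
```

When a follow-up such as "Is it open tomorrow?" matches both tracked topics, the most recent one ("weather") wins because it sits on top of the stack, which is the ordering behavior the narrower construction of "reverse chronological order" emphasizes.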
U.S. Patent No. 10,755,699 (Issued August 25, 2020)
- Technology Synopsis: The patent describes a system for adapting natural language responses based on a user's specific "manner of speaking." The system identifies the manner in which an utterance was spoken (e.g., tone, pace) based on accumulated short-term and long-term knowledge, and then generates a response tailored to that manner (Compl. ¶29).
- Asserted Claims: Independent Claim 12 (Compl. ¶29).
- Accused Features: The SoundHound Voice AI Systems are accused of infringing by providing adaptive responses that account for user-specific speech patterns (Compl. ¶93).
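As a rough illustration of identifying a manner of speaking from both knowledge types, the Python sketch below compares the speaking pace in the current conversation (short-term) against the user's historical pace (long-term) and adjusts response verbosity accordingly. Pace is only one of the example attributes the synopsis mentions; the thresholds, the fallback baseline, and the response templates are hypothetical assumptions, not details drawn from the patent or the accused systems.

```python
from statistics import mean

DEFAULT_BASELINE_WPS = 2.5  # hypothetical fallback pace (words per second) for a new user


def identify_manner(short_term_rates: list[float], long_term_rates: list[float]) -> str:
    # short_term_rates must hold at least one measurement from the current conversation.
    current = mean(short_term_rates)
    # The long-term baseline comes from past conversations, if any are on record.
    baseline = mean(long_term_rates) if long_term_rates else DEFAULT_BASELINE_WPS
    ratio = current / baseline
    if ratio > 1.25:  # hypothetical threshold
        return "hurried"
    if ratio < 0.75:  # hypothetical threshold
        return "relaxed"
    return "neutral"


def adapt_response(answer: str, manner: str) -> str:
    # Hurried speakers get a terse answer; relaxed speakers get a chattier one.
    if manner == "hurried":
        return answer
    if manner == "relaxed":
        return f"Sure thing. {answer} Anything else?"
    return f"Okay. {answer}"
```

The point of the sketch is that the same raw pace can be classified differently for different users, because the classification is relative to each user's long-term history.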
U.S. Patent No. 10,297,249 (Issued May 21, 2019)
- Technology Synopsis: This patent relates to systems that use short-term knowledge generated from multi-modal device interactions. It describes receiving both a voice input and a non-voice input (e.g., a screen touch), generating short-term knowledge based on the combination of these inputs, and using that knowledge to determine the context and interpretation of the utterance (Compl. ¶31).
- Asserted Claims: Independent Claim 16 (Compl. ¶31).
- Accused Features: The SoundHound Voice AI Systems, which are implemented on devices with both voice and non-voice input capabilities (e.g., car infotainment systems), are accused of infringing (Compl. ¶70, 99).
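A minimal Python sketch of the multi-modal idea follows: short-term knowledge is built from the combination of a voice input and a non-voice input (here, the item the user last touched on screen), and a pointing word such as "there" or "that" is resolved against the touched item. The deictic word list, the action vocabulary, and the function names are hypothetical simplifications for illustration, not the claimed method or any accused implementation.

```python
from typing import Optional

# Hypothetical set of words that point at something rather than naming it.
DEICTIC_WORDS = {"this", "that", "there", "here", "it"}


def build_short_term_knowledge(utterance: str, touched_item: Optional[str]) -> dict:
    # Short-term knowledge derived from BOTH the voice input and the non-voice input.
    return {
        "words": set(utterance.lower().rstrip("?.!").split()),
        "touched_item": touched_item,
    }


def interpret_multimodal(utterance: str, touched_item: Optional[str]) -> dict:
    knowledge = build_short_term_knowledge(utterance, touched_item)
    words = knowledge["words"]
    action = "navigate" if words & {"navigate", "go", "drive"} else "query"
    # Resolve a pointing word against whatever the user touched, if anything.
    target = knowledge["touched_item"] if words & DEICTIC_WORDS else None
    return {"action": action, "target": target}
```

Neither input alone is enough here: "Navigate there" has no destination without the touch, and the touch has no action without the utterance.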
U.S. Patent No. 8,886,536 (Issued November 11, 2014)
- Technology Synopsis: The patent discloses a system for delivering targeted advertisements within a voice recognition context. It describes using multiple "domain agents" (e.g., a "music" agent and a "navigation" agent) to interpret a user's utterance, determining the correct interpretation, and then selecting and presenting promotional content based on that interpretation (Compl. ¶39).
- Asserted Claims: Independent Claim 32 (Compl. ¶39).
- Accused Features: SoundHound's voice commerce and advertising platforms, which are alleged to interpret user requests to provide targeted responses, are accused of infringement (Compl. ¶71, 105).
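The synopsis describes a three-step flow: several domain agents each interpret the utterance, one interpretation is chosen, and promotional content is selected from that interpretation. The Python sketch below illustrates that flow under simplifying assumptions. The two keyword-scoring agents, their confidence values, and the `PROMOTIONS` inventory are hypothetical and are not taken from the patent or from SoundHound's platforms.

```python
from typing import Callable, NamedTuple, Optional


class Interpretation(NamedTuple):
    domain: str
    intent: str
    score: float


# Hypothetical domain agents: each proposes an interpretation with a confidence score.
def music_agent(utterance: str) -> Interpretation:
    score = 0.9 if "play" in utterance.lower() else 0.1
    return Interpretation("music", "play_media", score)


def navigation_agent(utterance: str) -> Interpretation:
    text = utterance.lower()
    score = 0.9 if "directions" in text or "nearest" in text else 0.1
    return Interpretation("navigation", "find_route", score)


# Hypothetical promotional inventory keyed by the winning domain.
PROMOTIONS = {
    "music": "Try the premium streaming tier free for a month.",
    "navigation": "Coffee shop 0.5 miles ahead: 10% off for drivers.",
}


def select_promotion(
    utterance: str, agents: list[Callable[[str], Interpretation]]
) -> tuple[Interpretation, Optional[str]]:
    # Let every agent interpret the utterance, keep the most confident reading,
    # then choose promotional content based on that reading.
    candidates = [agent(utterance) for agent in agents]
    best = max(candidates, key=lambda interp: interp.score)
    return best, PROMOTIONS.get(best.domain)
```

Because the promotion is keyed to the winning interpretation rather than to raw keywords, the same advertising inventory produces different content depending on which agent's reading prevails.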
U.S. Patent No. 9,269,097 (Issued February 23, 2016)
- Technology Synopsis: This invention covers a system for providing natural language processing based on a previously presented advertisement. Specifically, it describes interpreting a user utterance containing a pronoun (e.g., "call them") by determining whether the pronoun refers to the product, service, or provider mentioned in the advertisement (Compl. ¶41).
- Asserted Claims: Independent Claim 23 (Compl. ¶41).
- Accused Features: SoundHound's Voice AI Systems that deliver promotional content are accused of infringing by enabling users to make follow-up voice commands related to that content (Compl. ¶111).
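The pronoun-resolution step can be illustrated with a small Python sketch that decides whether a pronoun in a follow-up utterance refers to the advertised product, service, or provider. The resolution rules (plural pronouns map to the provider; "book" or "schedule" plus "it" maps to the service; otherwise "it" maps to the product) are hypothetical heuristics chosen for illustration, not rules disclosed in the patent or used by the accused systems.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Advertisement:
    product: str
    service: str
    provider: str


def resolve_pronoun(utterance: str, ad: Advertisement) -> Optional[str]:
    words = set(utterance.lower().rstrip("?.!").split())
    # Hypothetical rule: plural pronouns refer to the provider (the business itself).
    if words & {"them", "they"}:
        return ad.provider
    if "it" in words:
        # Hypothetical rule: the verb decides between the service and the product.
        if words & {"book", "schedule"}:
            return ad.service
        return ad.product
    return None  # no pronoun tied to the previously presented advertisement
```

For example, after an ad from a pizzeria, "Call them" resolves to the pizzeria itself, while "Order it" resolves to the advertised pizza.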
U.S. Patent No. 7,818,176 (Issued October 19, 2010)
- Technology Synopsis: The patent describes a system for selecting and presenting advertisements by first establishing a context for a natural language utterance. A conversational language processor interprets recognized words to establish this context and then selects an appropriate advertisement within that context for presentation (Compl. ¶43).
- Asserted Claims: Independent Claim 27 (Compl. ¶43).
- Accused Features: SoundHound's advertising platform is accused of infringing by allegedly establishing context from voice requests to select and present relevant ads (Compl. ¶117).
U.S. Patent No. 9,502,025 (Issued November 22, 2016)
- Technology Synopsis: The patent details a "natural language content dedication service." It describes a system where a user can, via a first voice utterance, identify content to dedicate to a recipient and then, via a second utterance, provide a personal message to be associated with that dedicated content (Compl. ¶50).
- Asserted Claims: Independent Claim 1 (Compl. ¶50).
- Accused Features: SoundHound's voice commerce ecosystem, which includes functionalities like ordering food for others, is accused of practicing this dedication service (Compl. ¶78, 123).
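The two-utterance dedication flow can be sketched in Python as a small session object: the first utterance identifies the content and the recipient, and the second utterance is attached as the personal message. The regular expression for the first utterance, the prompts, and all class and field names are hypothetical assumptions for illustration, not the claimed implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the first utterance, e.g. "Dedicate Yesterday to Mom".
FIRST_UTTERANCE = re.compile(r"dedicate (?P<content>.+) to (?P<recipient>.+)", re.IGNORECASE)


@dataclass
class Dedication:
    content: str
    recipient: str
    message: Optional[str] = None


class DedicationSession:
    def __init__(self) -> None:
        self.pending: Optional[Dedication] = None
        self.completed: list[Dedication] = []

    def handle(self, utterance: str) -> str:
        text = utterance.strip()
        if self.pending is None:
            # First utterance: identify the content and the recipient.
            match = FIRST_UTTERANCE.fullmatch(text)
            if match is None:
                return "What would you like to dedicate, and to whom?"
            self.pending = Dedication(match["content"], match["recipient"])
            return f"What message should go with {self.pending.content}?"
        # Second utterance: attach it as the personal message and finish.
        self.pending.message = text
        self.completed.append(self.pending)
        self.pending = None
        return "Dedication sent."
```

Keeping the pending dedication in session state is what lets the second utterance be associated with content identified in the first one.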
U.S. Patent No. 11,087,385 (Issued August 10, 2021)
- Technology Synopsis: This patent covers a "voice commerce" system designed to streamline online shopping. It claims a method in which a product is selected based on a single first user input, without further input, and the purchase is completed after receiving only a second user input confirming it, again without any further input identifying payment or shipping information (Compl. ¶57).
- Asserted Claims: Independent Claim 16 (Compl. ¶57).
- Accused Features: SoundHound's in-car voice food ordering system and broader "Voice Commerce Ecosystem" are accused of infringing this streamlined purchasing process (Compl. ¶78, 129).
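To show how narrow the two-input sequence is, the following Python sketch selects a product from one utterance without clarifying questions and completes the purchase on a bare confirmation, drawing payment and shipping details from a stored profile rather than from further user input. The catalog, SKUs, profile fields, and confirmation words are hypothetical assumptions, not details from the patent or from SoundHound's ordering system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredProfile:
    payment_token: str
    shipping_address: str


# Hypothetical catalog mapping spoken product names to SKUs.
CATALOG = {"burrito bowl": "SKU-1001", "chips and guacamole": "SKU-2002"}


class VoicePurchase:
    def __init__(self, profile: StoredProfile) -> None:
        self.profile = profile
        self.selected_sku: Optional[str] = None

    def first_input(self, utterance: str) -> str:
        # Select a product from this single input, with no clarifying questions.
        text = utterance.lower()
        self.selected_sku = next((sku for name, sku in CATALOG.items() if name in text), None)
        if self.selected_sku is None:
            return "No matching product."
        return "Say 'yes' to confirm your order."

    def second_input(self, utterance: str) -> Optional[dict]:
        # A bare confirmation completes the purchase; payment and shipping come
        # from the stored profile rather than from further user input.
        if self.selected_sku is None or utterance.lower().strip(" .!") not in {"yes", "confirm"}:
            return None
        order = {
            "sku": self.selected_sku,
            "payment": self.profile.payment_token,
            "ship_to": self.profile.shipping_address,
        }
        self.selected_sku = None
        return order
```

A multi-turn ordering dialogue that asks follow-up questions (size, toppings, pickup location) would not fit this two-step shape, which is why the mapping question raised later in this analysis matters.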
III. The Accused Instrumentality
Product Identification
- The accused products are the "SoundHound Voice AI Systems," which encompass the SoundHound Houndify and Voice AI platform (Compl. ¶71). These systems are implemented across a range of products, including SoundHound's own applications (Chat AI, Music), enterprise solutions for industries like Automotive and Hospitality, and an in-vehicle voice assistant and "Voice Commerce Ecosystem" (Compl. ¶71).
Functionality and Market Context
- The accused instrumentalities provide voice recognition and natural language understanding technology to a wide array of customers, including Hyundai, Mercedes-Benz, and Chipotle (Compl. ¶69; p. 26). The technology is designed to be flexible, operating either fully on a device (Edge), entirely in the cloud, or in a hybrid "Edge+Cloud" configuration to meet different product needs (Compl. ¶72). A diagram in the complaint illustrates this flexible architecture, showing how user utterances can be processed on-device, in the cloud, or through a combination of both (Compl. p. 28). The complaint places particular emphasis on the "Voice Commerce Ecosystem," which enables functionalities such as in-car food ordering, demonstrated by SoundHound at industry events like CES (Compl. ¶78). A screenshot from a SoundHound video explicitly references "building a new voice Commerce" ecosystem (Compl. p. 27).
IV. Analysis of Infringement Allegations
U.S. Patent No. 8,073,681 Infringement Allegations
| Claim Element (from Independent Claim 25) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| a voice input device configured to receive an utterance during a current conversation with a user, wherein the utterance includes one or more words that have different meanings in different contexts | The SoundHound systems are "Voice-Enabled Devices" that include "voice input" functionality for conversational interactions. | ¶70 | ’249 Patent, col. 2:23-28 |
| a conversational speech engine... configured to: accumulate short-term shared knowledge about the current conversation... | SoundHound’s conversational AI platform allegedly processes utterances within the context of an ongoing dialogue, which requires accumulating knowledge from the current session. | ¶71 | ’249 Patent, col. 15:5-22 |
| accumulate long-term shared knowledge about the user... from... one or more past conversations with the user | The SoundHound platform provides personalized, natural language interactions, which allegedly requires the accumulation and use of user-specific knowledge from prior interactions. | ¶69, 71 | ’249 Patent, col. 15:23-41 |
| identify a context associated with the utterance from the short-term shared knowledge and the long-term shared knowledge | SoundHound's system allegedly identifies context from user queries (e.g., interpreting "I'm hungry" as a request for restaurants) by using conversational knowledge. | ¶73; p. 28 | ’249 Patent, col. 14:59-67 |
| establish an intended meaning for the utterance within the identified context to disambiguate an intent... | The core NLU functionality of the accused systems is alleged to determine user intent from natural language utterances. | ¶69 | ’249 Patent, col. 6:5-14 |
| generate a grammatically or syntactically adapted response to the utterance based on the intended meaning... | The accused systems provide voice and other output functions that respond to the user's interpreted request. | ¶70 | ’249 Patent, col. 6:14-22 |
- Identified Points of Contention:
- Architectural Equivalence: A primary question will be whether the complaint provides a sufficient factual basis to allege that SoundHound's AI architecture specifically uses distinct "short-term" and "long-term" knowledge stores as contemplated by the patent. The dispute may focus on whether SoundHound's method of storing and accessing user and session data, whatever its architecture, meets these claim limitations.
- Scope Questions: The case may raise the question of whether simply providing a modern, personalized conversational assistant necessarily requires the specific two-part knowledge accumulation method claimed in the ’681 Patent.
U.S. Patent No. 11,222,626 Infringement Allegations
| Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| track a series of contexts respectively identified for a series of natural language utterances received... during a current conversation... | SoundHound's conversational interface allegedly tracks the context of multi-turn dialogues, such as a user first stating "I'm hungry" and then asking a follow-up question about a specific restaurant type. | ¶73; p. 28 | ’249 Patent, col. 12:7-14 |
| generate a context stack based on the tracked contexts comprising the series of contexts in reverse chronological order... | The complaint alleges that by facilitating multi-turn conversations where users can refer to previous topics, the accused system necessarily generates an ordered record of contexts functionally equivalent to the claimed stack. | ¶27, 87 | ’249 Patent, col. 12:10-14 |
| receive a third natural language utterance... | The accused systems receive sequential voice commands as a core function of their conversational interface. | ¶70 | ’249 Patent, col. 7:36-44 |
| determine whether the third natural language utterance corresponds to one or more of the series of contexts in the generated context stack by comparing... | The system allegedly interprets follow-up questions by referencing the context of prior statements in the same conversation, which corresponds to the claimed function of checking the context stack. | ¶27, 71 | ’249 Patent, col. 12:10-22 |
| responsive to a determination... interpret the third natural language utterance using the corresponding one or more contexts. | The system allegedly provides a relevant answer to a follow-up query by using the established conversational context. | ¶73; p. 28 | ’249 Patent, col. 7:45-51 |
- Identified Points of Contention:
- Technical Questions: A central dispute will likely be whether the complaint offers evidence that SoundHound's platform uses the specific data structure of a "context stack" searched in "reverse chronological order." The defense may argue its system manages conversation history through a different, non-infringing technical method.
- Scope Questions: The infringement analysis will likely depend on whether the term "context stack" is construed narrowly to its specific computer science definition or more broadly to encompass any system that stores and recalls a sequence of conversational topics.
V. Key Claim Terms for Construction
For the ’681 Patent:
- The Term: "short-term shared knowledge" / "long-term shared knowledge"
- Context and Importance: The patentability of the claim and the infringement analysis depend on the distinction between knowledge derived from the "current conversation" (short-term) and from "past conversations" (long-term). Practitioners may focus on whether these terms require distinct data stores or merely refer to data of different ages or types within a unified user profile system.
- Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification describes these concepts in relation to human conversation, where memory is not strictly partitioned, suggesting the terms could refer to the source and timing of knowledge rather than its storage location (’249 Patent, col. 1:22-49).
- Evidence for a Narrower Interpretation: The specification separately defines "short-term knowledge" as accumulating "during a single conversation" and "long-term shared knowledge" as "generally... user-centric, rather than session-based," which suggests a functional and potentially structural separation between the two types of knowledge (’249 Patent, col. 15:5-30).
For the ’626 Patent:
- The Term: "context stack... in reverse chronological order"
- Context and Importance: This term describes a specific data structure (a stack, implying last-in, first-out logic) and a specific search methodology (reverse chronological). The literal infringement case hinges on whether the accused system employs this exact implementation.
- Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: A party might argue that "stack" is used colloquially to mean any ordered list of contexts that preserves the conversational history.
- Evidence for a Narrower Interpretation: The claim language is highly specific. The specification describes a process that "may track conversation topics and attempt to fit a current utterance into a most-recent context, next-most-recent topic, etc., traversing the context stack until a most likely intent can be established," which strongly supports a specific, last-in-first-out search methodology (’249 Patent, col. 12:7-14).
VI. Other Allegations
- Indirect Infringement: The complaint alleges both induced and contributory infringement. Inducement is based on allegations that SoundHound markets its Voice AI Systems, provides instructions and support to its customers (e.g., automotive companies), and designs the systems to be used in an infringing manner (Compl. ¶82, 88). Contributory infringement is based on allegations that the accused systems are especially made for practicing the inventions and are not staple articles of commerce (Compl. ¶83, 89).
- Willful Infringement: Willfulness is alleged based on Defendant’s purported knowledge of the patents. The complaint claims this knowledge arises from at least a pre-suit notice letter sent on November 13, 2024, as well as from the filing of the original complaint and the amended complaint, after which Defendant allegedly continued its infringing activities (Compl. ¶79).
VII. Analyst’s Conclusion: Key Questions for the Case
- A core issue will be one of architectural proof: The complaint alleges infringement based on the observed high-level functionality of a sophisticated conversational AI. A key evidentiary question is whether discovery will show that SoundHound's underlying software architecture actually implements the specific structures and methods recited in the claims, such as the formal separation of "short-term" and "long-term" knowledge (’681 Patent) or the use of a "context stack" searched in reverse chronological order (’626 Patent).
- A second central issue will be one of claim construction and scope: The viability of the infringement claims may depend on whether terms like "context stack" are interpreted narrowly according to their specific technical meaning or broadly to cover any functional equivalent for managing conversation history. This will be critical in determining whether there is a fundamental mismatch between the patented inventions and the accused system's mode of operation.
- A final key question will involve the voice commerce patents: For patents like the ’385 Patent, which claim a highly streamlined purchase process initiated by a "single first user input," the analysis will focus on whether SoundHound’s multi-turn, interactive food ordering system can be mapped onto these specific and limited claim steps, particularly in light of arguments made during prosecution to distinguish the invention from more conversational prior art.