DCT

1:25-cv-10341

Dialect LLC v. Comcast Corporation

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 1:25-cv-10341, N.D. Ill., 08/29/2025
  • Venue Allegations: Venue is based on Defendant maintaining regular and established places of business in the Northern District of Illinois and committing alleged acts of patent infringement within the district.
  • Core Dispute: Plaintiff alleges that Defendant’s Xfinity Voice Remote infringes five patents related to natural language understanding and voice recognition technology.
  • Technical Context: The technology at issue involves conversational voice interfaces that allow users to control electronic devices through natural speech, a central feature in the modern smart device market.
  • Key Procedural History: The complaint alleges a substantive history between Defendant and VoiceBox Technologies, the original patent owner, from approximately 2012 through 2015, including technology demonstrations, a draft license agreement, and an alpha test of voice remote technology involving Comcast employees. This history forms the basis for Plaintiff’s willful infringement allegations. Additionally, the complaint notes that Google filed petitions for inter partes review against four of the five asserted patents in April 2024, and that the Patent Trial and Appeal Board denied institution of those reviews in October 2024.

Case Timeline

Date        Event
2002-06-03  Earliest Priority Date ('209 and '825 Patents)
2005-08-29  Earliest Priority Date ('607, '039, and '549 Patents)
2008-07-08  U.S. Patent No. 7,398,209 Issues
2009-11-17  U.S. Patent No. 7,620,549 Issues
c. 2012     VoiceBox and Comcast begin discussions regarding voice technology
2012-10-19  VoiceBox provides draft license to Comcast
2013-05-21  U.S. Patent No. 8,447,607 Issues
2015        Comcast first releases Xfinity Voice Remote
2016-02-16  U.S. Patent No. 9,263,039 Issues
2017-08-15  U.S. Patent No. 9,734,825 Issues
2024-04     Google files petitions for inter partes review of the '209, '039, '825, and '549 Patents
2024-10     PTAB denies institution of Google's inter partes review petitions
2025-08-29  Complaint Filed

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 7,398,209 - "Systems And Methods For Responding To Natural Language Speech Utterance," issued July 8, 2008

The Invention Explained

  • Problem Addressed: The patent’s background section states that prior art human-machine communication was flawed because "human questions and machine processing of queries may be fundamentally incompatible" (Compl. ¶33; ’209 Patent, col. 1:27-29). Humans rely on context and domain knowledge, whereas machines historically required highly structured, unnatural queries (Compl. ¶33; ’209 Patent, col. 1:32-35).
  • The Patented Solution: The invention proposes a system that uses specialized software modules called "domain agents" to manage specific topics or tasks (Compl. ¶33; ’209 Patent, col. 2:48-51). When a user speaks a natural language query, the system parses it to determine the correct context and domain, selects the appropriate agent, formulates a machine-readable request for that agent, and processes the results for presentation to the user (Compl. ¶33; ’209 Patent, col. 3:53-54). Figure 6 of the patent illustrates this process of parsing a query, selecting an agent, and formatting a command for that agent (Compl. ¶35).
  • Technical Importance: This agent-based architecture was designed to enable more natural and effective human-computer voice interactions, moving beyond the rigid "Command and Control" systems that limited early voice recognition technology (Compl. ¶21).

Key Claims at a Glance

The complaint asserts at least Claim 1, an independent method claim (Compl. ¶70). Its key elements include:

  • Receiving a user generated natural language speech utterance containing a request.
  • Maintaining a dynamic set of prior probabilities or fuzzy possibilities.
  • Recognizing words and phrases using dictionary and phrase tables.
  • Parsing the words to determine a meaning and a context for the request.
  • Selecting at least one "domain agent" based on the determined meaning, where the agent is an "autonomous executable" that responds to requests for the determined context.
  • Formulating the request in accordance with a grammar used by the selected agent.
  • Invoking the agent to process the request and presenting the results.
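
For illustration only, the sequence of steps recited above can be sketched in code. The sketch below is a hypothetical simplification and is neither a construction of Claim 1 nor a description of the accused system. Every name in it (DomainAgent, recognize, select_agent, respond, the keyword-overlap scoring, and the 0.1 prior update) is an assumption introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class DomainAgent:
    """Hypothetical stand-in for a 'domain agent' (not the patent's definition)."""
    name: str
    keywords: Set[str]                      # words suggesting this agent's context
    formulate: Callable[[List[str]], str]   # builds a request in the agent's own grammar
    handle: Callable[[str], str]            # processes the formulated request


def recognize(utterance: str, dictionary: Set[str]) -> List[str]:
    """Keep only the words found in the dictionary (a crude recognition step)."""
    return [w for w in utterance.lower().split() if w in dictionary]


def select_agent(words: List[str], agents: List[DomainAgent],
                 priors: Dict[str, float]) -> DomainAgent:
    """Pick the agent with the highest prior-weighted keyword overlap."""
    def score(agent: DomainAgent) -> float:
        return priors.get(agent.name, 0.0) * len(agent.keywords & set(words))
    return max(agents, key=score)


def respond(utterance: str, agents: List[DomainAgent],
            priors: Dict[str, float], dictionary: Set[str]) -> Tuple[str, str, str]:
    words = recognize(utterance, dictionary)
    if not words:
        raise ValueError("no recognized words")
    agent = select_agent(words, agents, priors)
    request = agent.formulate(words)        # request expressed in the agent's grammar
    result = agent.handle(request)          # invoke the agent
    # Assumed update rule: nudge the prior of the agent that was just used.
    priors[agent.name] = priors.get(agent.name, 0.0) + 0.1
    return agent.name, request, result


DICTIONARY = {"watch", "nbc", "find", "kids", "movies"}
AGENTS = [
    DomainAgent("tv", {"watch"},
                lambda ws: "TUNE " + ws[-1].upper(),
                lambda r: "ok: " + r),
    DomainAgent("movies", {"find", "movies"},
                lambda ws: "SEARCH " + " ".join(ws[1:]),
                lambda r: "results: " + r),
]
PRIORS = {"tv": 0.5, "movies": 0.5}
```

Under these assumptions, calling respond("Watch NBC", AGENTS, PRIORS, DICTIONARY) routes the utterance to the hypothetical "tv" agent and produces the request "TUNE NBC". Whether any real system is organized this way is the factual question the complaint leaves open.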

U.S. Patent No. 8,447,607 - "Mobile Systems And Methods Of Supporting Natural Language Human-Machine Interactions," issued May 21, 2013

The Invention Explained

  • Problem Addressed: The patent identifies the difficulty of creating a natural language interface for mobile environments, which involves challenges like background noise, diverse user needs, and retrieving information from various local and remote sources ('607 Patent, col. 1:39-54).
  • The Patented Solution: The invention describes a system for processing "multi-modal" inputs, which include both a natural language utterance and a non-speech input ('607 Patent, Claim 12). It generates a speech transcription based on a "cognitive model" associated with the user, which incorporates information from prior user interactions to improve accuracy and context determination (Compl. ¶41; ’607 Patent, Abstract). The system then uses a "context stack" to identify the current context and selects a "domain agent" to generate a response (Compl. ¶41; ’607 Patent, Claim 12).
  • Technical Importance: The technology aims to create more robust and personalized voice interactions on mobile devices by integrating multiple input types and learning from a user's specific history to better understand their intent (Compl. ¶40).

Key Claims at a Glance

The complaint asserts at least Claim 12, an independent method claim (Compl. ¶89). Its key elements include:

  • Receiving a multi-modal natural language input including a natural language utterance and a non-speech input.
  • Generating a speech-based transcription based on a "cognitive model" associated with the user that includes information on prior interactions.
  • Generating a merged transcription from the speech and non-speech inputs.
  • Identifying an entry in a "context stack" that matches information in the merged transcription.
  • Identifying a "domain agent" associated with the entry in the context stack.
  • Determining and communicating a request to the domain agent to generate a response.
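
As a rough illustration of how the elements above might fit together, the following sketch models a user-biased transcription, a merged transcription, and a lookup against a context stack. It is a hypothetical simplification, not a construction of Claim 12 and not a description of the accused system. The function names, the use of word counts as a "cognitive model," and the tuple layout of the stack are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Each stack entry: (context name, terms associated with it, agent name).
StackEntry = Tuple[str, Set[str], str]


def transcribe(candidates: List[List[str]], history: Dict[str, int]) -> List[str]:
    """For each spoken word, choose among acoustically similar candidates the
    one this user has used most often (assumed stand-in for a cognitive model)."""
    return [max(options, key=lambda w: history.get(w, 0)) for options in candidates]


def merge(speech_words: List[str], non_speech: str) -> List[str]:
    """Combine the speech transcription with a typed (non-speech) input."""
    return speech_words + non_speech.lower().split()


def match_context(stack: List[StackEntry],
                  merged: List[str]) -> Optional[Tuple[str, str]]:
    """Search the stack from most recent to oldest for an entry sharing a term
    with the merged transcription; return (context, agent) or None."""
    merged_set = set(merged)
    for context, terms, agent in reversed(stack):
        if terms & merged_set:
            return context, agent
    return None


# Hypothetical data: the user has said "jets" more often than "jazz".
history = {"jets": 3, "jazz": 1}
speech = transcribe([["play"], ["jazz", "jets"]], history)
merged = merge(speech, "highlights")
stack = [
    ("music", {"jazz", "album"}, "music_agent"),
    ("sports", {"jets", "score"}, "sports_agent"),
]
match = match_context(stack, merged)
```

In this toy example the user's history tips the ambiguous word toward "jets," so the merged transcription matches the "sports" entry. The point of contention noted later in this report is whether the accused system's transcription step is influenced by user-specific history in any comparable way.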

Multi-Patent Capsule: U.S. Patent No. 9,263,039 - "Systems And Methods For Responding To Natural Language Speech Utterance," issued February 16, 2016

  • Technology Synopsis: This patent describes a method for processing combined speech and non-speech communications by merging them into a query, comparing text from the query to a "context description grammar," generating a "relevance score" from that comparison, and selecting one or more domain agents based on that score to obtain content and generate a response (Compl. ¶44). The patent's Figure 1 shows a diagrammatic view of the system's architecture (Compl. ¶45).
  • Asserted Claims: At least Claim 13 (independent method) (Compl. ¶108).
  • Accused Features: The complaint alleges that the Accused Products' functionality for processing speech and non-speech communications infringes (Compl. ¶111).
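
The relevance-scoring step described in the Technology Synopsis can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the patent's method: the overlap-ratio formula, the 0.5 threshold, and the names relevance and select_agents are hypothetical, and a "context description grammar" is modeled here as nothing more than a set of terms.

```python
from typing import Dict, List, Set, Tuple


def relevance(query_words: List[str], grammar_terms: Set[str]) -> float:
    """Fraction of distinct query words that appear in the grammar's terms."""
    distinct = set(query_words)
    if not distinct:
        return 0.0
    return len(distinct & grammar_terms) / len(distinct)


def select_agents(speech: str, typed: str, grammars: Dict[str, Set[str]],
                  threshold: float = 0.5) -> Tuple[List[str], Dict[str, float]]:
    """Merge speech and non-speech text into one query, score every agent's
    grammar against it, and return agents at or above the threshold,
    highest score first, together with all scores."""
    words = (speech + " " + typed).lower().split()
    scores = {agent: relevance(words, terms) for agent, terms in grammars.items()}
    selected = sorted((a for a, s in scores.items() if s >= threshold),
                      key=lambda a: -scores[a])
    return selected, scores


# Hypothetical grammars for two agents.
GRAMMARS = {
    "movies": {"find", "kids", "movies", "film"},
    "sports": {"find", "score", "game"},
}
selected, scores = select_agents("find kids", "movies", GRAMMARS)
```

Here all three query words appear in the "movies" grammar and only one appears in the "sports" grammar, so only the "movies" agent clears the assumed threshold.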

Multi-Patent Capsule: U.S. Patent No. 9,734,825 - "Methods and Apparatus for Determining a Domain Based on the Content and Context of a Natural Language Utterance," issued August 15, 2017

  • Technology Synopsis: This patent discloses a method for determining the correct domain for a user's utterance by receiving "prior probabilities or fuzzy possibilities" from a system or domain agent, using those probabilities to score at least two possible contexts for the utterance, and selecting a domain agent based on the determined domain to create and send queries to information sources (Compl. ¶51).
  • Asserted Claims: At least Claim 5 (independent method) (Compl. ¶127).
  • Accused Features: The complaint alleges the Accused Products' functionality for responding to natural language speech infringes (Compl. ¶130).
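
One way to picture scoring candidate contexts with prior probabilities is a simple Bayes-style calculation, sketched below. This is an illustrative assumption, not the method claimed in the ’825 Patent: the multiplication of priors by per-word likelihoods, the 0.01 smoothing value, and the name score_contexts are all hypothetical.

```python
from typing import Dict, List


def score_contexts(words: List[str], priors: Dict[str, float],
                   likelihoods: Dict[str, Dict[str, float]],
                   unseen: float = 0.01) -> Dict[str, float]:
    """Return a normalized score per context: prior times the product of
    per-word likelihoods, with a small default for words a context has not seen."""
    raw = {}
    for context, prior in priors.items():
        score = prior
        for word in words:
            score *= likelihoods.get(context, {}).get(word, unseen)
        raw[context] = score
    total = sum(raw.values())
    if total == 0:
        return {context: 0.0 for context in raw}
    return {context: value / total for context, value in raw.items()}


# Hypothetical numbers: "music" starts with the higher prior, but the word
# "score" is far more likely in the "sports" context.
PRIORS = {"sports": 0.3, "music": 0.7}
LIKELIHOODS = {"sports": {"score": 0.5}, "music": {"score": 0.1}}
posterior = score_contexts(["score"], PRIORS, LIKELIHOODS)
best_context = max(posterior, key=posterior.get)
```

With these numbers the raw scores are 0.15 for "sports" and 0.07 for "music," so "sports" is selected despite its lower prior. Whether the accused system maintains or uses any such probabilities is not apparent from the complaint.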

Multi-Patent Capsule: U.S. Patent No. 7,620,549 - "System and method of supporting adaptive misrecognition in conversational speech," issued November 17, 2009

  • Technology Synopsis: This patent describes a method for processing utterances on a multimodal device where a "follow-up multimodal input" presented "proximate in time" to a prior utterance is used to determine if the initial interpretation was incorrect. The system uses an "adaptive misrecognition engine" to monitor actions associated with the domain agent to make this determination (Compl. ¶56).
  • Asserted Claims: At least Claim 4 (independent method) (Compl. ¶146).
  • Accused Features: The complaint alleges the Accused Products' functionality for processing natural language utterances infringes (Compl. ¶149).
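
The idea of using a follow-up input that is "proximate in time" to flag a possible misrecognition can be illustrated with a short heuristic. The sketch is hypothetical and deliberately crude: the 10-second window, the correction cues, the "undo" action label, and the names Interaction and likely_misrecognized are assumptions, not terms drawn from the ’549 Patent or from the accused system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Interaction:
    timestamp: float    # seconds since some reference point
    text: str           # what the user said or typed
    agent_action: str   # action the handling agent took, e.g. "tune" or "undo"


def likely_misrecognized(first: Interaction, follow_up: Interaction,
                         window_s: float = 10.0,
                         cues: Tuple[str, ...] = ("no", "cancel", "i said")) -> bool:
    """Flag the first interpretation as suspect when a follow-up arrives within
    the time window AND either begins with a correction cue or undoes the
    agent's action. Prefix matching is crude and would also match, for
    example, a follow-up beginning with "notify"."""
    gap = follow_up.timestamp - first.timestamp
    close_in_time = 0 <= gap <= window_s
    has_cue = follow_up.text.lower().startswith(cues)
    undone = follow_up.agent_action == "undo"
    return close_in_time and (has_cue or undone)


first = Interaction(0.0, "watch NBC", "tune")
quick_fix = Interaction(4.0, "No, watch NBC Sports", "tune")
much_later = Interaction(60.0, "No, watch NBC Sports", "tune")
```

Under these assumptions the follow-up at four seconds flags the first interpretation, while the same words a minute later do not.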

III. The Accused Instrumentality

Product Identification

  • The accused products are Defendant's Xfinity Voice Remotes, including models XR11, XR15, XR16, XR100, and XRA, and the associated voice functionality services (Compl. ¶¶ 8, 65).

Functionality and Market Context

  • The Xfinity Voice Remote allows users to control their television service using natural language speech commands (Compl. ¶66). A screenshot from Defendant's support website provides examples of commands such as "Watch NBC," "Find kids movies," and "Show me what's on tonight at seven" (Compl. p. 26). The same page, as reproduced in the complaint, states that when a user speaks into the remote, the "voice commands are sent to us and our contracted service provider for processing" (Compl. p. 26). Defendant's website also indicates that users can type commands instead of speaking them via the "Xfinity Web Remote" (Compl. ¶66, p. 25). The complaint alleges that by 2018, Defendant had delivered over 18 million of the accused remotes (Compl. ¶65).

IV. Analysis of Infringement Allegations

The complaint does not provide claim charts or a detailed narrative theory mapping specific product features to claim elements for any of the asserted patents. Instead, it recites an independent claim from each patent and alleges, upon information and belief, that the Accused Products practice the claimed methods through their general operation (Compl. ¶¶ 73, 92, 111, 130, 149). The infringement theory appears to be based on the overall functionality of the Xfinity Voice Remote system, which accepts natural language voice commands to search for and control media content. A screenshot in the complaint shows Defendant advertising the remote's support for "natural language speech recognition" (Compl. p. 26). Another screenshot shows Defendant providing "Voice command tips" to users, which allegedly instruct on how to use the infringing functionality (Compl. p. 25).

Identified Points of Contention

  • Scope Questions: A central question for the ’209 and related patents may be whether the architecture of Defendant's back-end processing system, which uses a "contracted service provider" (Compl. p. 26), meets the definition of a system using multiple, "autonomous executable domain agents" as recited in the claims. A dispute may arise over whether the term "multi-modal natural language input" in the ’607 Patent can read on a system where a user can either speak into a physical remote or type into a separate web-based remote (Compl. ¶66, p. 25).
  • Technical Questions: The infringement analysis for the ’607 Patent will raise the question of whether the accused system generates transcriptions "based on a cognitive model associated with the user" that includes "prior interactions." This suggests a system that personalizes its speech recognition for each user, a technical detail that will require discovery into the actual operation of Defendant's processing algorithms. Similarly, analysis of the ’209 Patent will require evidence of whether the system maintains a "dynamic set of prior probabilities," another specific technical implementation not apparent from the product's external functionality.

V. Key Claim Terms for Construction

  • Term: "domain agent" (from Claim 1 of the ’209 Patent)

    • Context and Importance: This term is foundational to the architecture claimed in the ’209, ’825, and other asserted patents. The infringement case may depend on whether the software components in Defendant's back-end system qualify as "domain agents." Practitioners may focus on this term because its definition could distinguish between a generic, monolithic processing system and the specific, modular architecture described in the patent.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The specification describes data managers (agents) as "autonomous executables that receive, process and respond to user questions, queries and commands" and are "directed to a specific domain of applications" (’607 Patent, col. 5:5-12). This could be argued to encompass any software module that handles a specific function, such as searching for movies or changing channels.
      • Evidence for a Narrower Interpretation: Claim 1 of the ’209 Patent requires the agent to be an "autonomous executable." The specification further describes agents as "complete, convenient and re-distributable packages or modules of functionality" (’607 Patent, col. 5:8-12). This language may support a narrower construction requiring distinct, modular, and potentially interchangeable software components, which may differ from the architecture of the accused system.
  • Term: "cognitive model associated with the user" (from Claim 12 of the ’607 Patent)

    • Context and Importance: This term is critical because it requires the system to be personalized to a specific user's history, not just a generic speech recognition system. The dispute may turn on the degree of personalization implemented in the accused system.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The patent’s abstract states the invention uses "user specific profile data to achieve a natural environment for users," which could support a construction that covers any system that considers user data, such as viewing history or saved preferences, in its processing.
      • Evidence for a Narrower Interpretation: Claim 12 requires generating a speech-based transcription based on the cognitive model, which includes "prior interactions between the user and the device." This language suggests that the model directly influences the initial voice-to-text conversion itself, not just the subsequent interpretation of the text. The patent’s Figure 8 explicitly distinguishes between a "General Cognitive Model" (806) and a "Personalized Cognitive model" (810), which could be cited to argue that the claim requires a model that is technically distinct and adapted for each specific user.

VI. Other Allegations

  • Indirect Infringement: The complaint alleges inducement of infringement under 35 U.S.C. § 271(b), asserting that Defendant knowingly encourages consumers to use the Accused Products in an infringing manner by providing detailed instructions and advertising (Compl. ¶¶ 76-78). The complaint includes a screenshot of Defendant's "Voice command tips" webpage as evidence of these instructions (Compl. p. 25).
  • Willful Infringement: The complaint alleges willful infringement based on Defendant's alleged pre-suit knowledge of the patented technology dating back to 2012 (Compl. ¶¶ 57-63, 81). The allegations describe a multi-year business relationship between Defendant and VoiceBox, the original patent owner, which included meetings, a live demo, a draft license agreement, and an alpha test of voice remote technology with 200 Comcast employees (Compl. ¶¶ 58-61). A presentation slide from these meetings, included as evidence, explicitly mentions "Patented Contextual Voice and NLU technology" (Compl. ¶60, p. 23).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A central technical issue will be one of architectural correspondence: Does the server-side processing system used by Comcast and its "contracted service provider" employ discrete, "autonomous executable" components that map onto the claimed "domain agent" structure, or does it utilize a different, more integrated architecture for natural language processing?
  • A key evidentiary question will be one of user-specific adaptation: Does the accused system generate speech transcriptions "based on a cognitive model" that is particularized to an individual user's "prior interactions," as required by the ’607 Patent, or does it rely on a general, non-personalized model for voice-to-text conversion?
  • Given the detailed allegations of a prior business relationship and the denial of inter partes review institution for four of the five asserted patents, a significant focus of the case will likely be on knowledge and intent: What was the extent of Comcast's exposure to the patented technology during its discussions with VoiceBox, and does this history, coupled with the PTAB's refusal to institute review on Google's petitions, support a finding of willful infringement?