PTAB

IPR2025-01229

Microsoft Corp v. Dialect LLC

Key Events
Petition

1. Case Identification

2. Patent Overview

  • Title: Dynamic Speech Sharpening
  • Brief Description: The ’409 patent relates to speech interpretation methods that involve recognizing basic sub-word units corresponding to distinct speech sounds (“phonemes”) and applying an “acoustic grammar” to map the recognized phonemes to syllables, thereby generating interpretations of a user’s speech.

3. Grounds for Unpatentability

Ground 1: Claims 1-3 and 6 are obvious over Bazzi.

  • Prior Art Relied Upon: Bazzi (I. Bazzi & J. Glass, Heterogeneous Lexical Units for Automatic Speech Recognition, 2000 IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing).
  • Core Argument for this Ground:
    • Prior Art Mapping: Petitioner argued that Bazzi, which was not before the Examiner but was used to reject a corresponding European application, discloses every element of the challenged claims. Bazzi teaches a two-stage speech recognizer that first generates a “scored phonetic graph” (the claimed “stream of phonemes”) from a user’s utterance. This graph is then composed with a syllable lexicon and a syllable grammar (collectively, the claimed “acoustic grammar”) to produce a syllable graph. This process constitutes mapping recognized phonemes to syllables to generate an interpretation. For claim 6, Petitioner asserted Bazzi’s disclosure of finding the “best path(s)” in its framework inherently generates a plurality of candidate interpretations, which are scored and selected to find the most likely hypothesis.
    • Motivation to Combine: Not applicable (single reference ground).
    • Expectation of Success: Not applicable (single reference ground).
    • Key Aspects: Petitioner emphasized that the European Patent Office (EPO) found nearly identical claims unpatentable over Bazzi during prosecution of a counterpart application, which the applicant ultimately abandoned.
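
For illustration only, the phoneme-to-syllable mapping described in this ground can be sketched as a small search over a syllable lexicon and a syllable grammar. This is a simplified sketch, not Bazzi's implementation: it segments a single phone sequence rather than composing a scored phonetic graph, and the lexicon entries, bigram probabilities, and the names `SYLLABLE_LEXICON`, `SYLLABLE_BIGRAM`, `UNSEEN`, and `best_syllable_path` are all hypothetical.

```python
import math

# Hypothetical syllable lexicon: phone sequence -> syllable label.
SYLLABLE_LEXICON = {
    ("b", "aa"): "ba",
    ("b", "aa", "z"): "baz",
    ("z", "iy"): "zi",
    ("iy",): "i",
}

# Hypothetical syllable grammar: bigram log-probabilities over syllables.
SYLLABLE_BIGRAM = {
    ("<s>", "ba"): math.log(0.6),
    ("<s>", "baz"): math.log(0.4),
    ("ba", "zi"): math.log(0.9),
    ("baz", "i"): math.log(0.2),
}
UNSEEN = math.log(1e-4)  # assumed penalty for bigrams absent from the grammar


def best_syllable_path(phones):
    """Segment phones into lexicon syllables, maximizing the grammar score."""
    n = len(phones)
    # best[i] maps the last syllable -> (score, path) for phones[:i]
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for prev, (score, path) in best[i].items():
            for j in range(i + 1, n + 1):
                syl = SYLLABLE_LEXICON.get(tuple(phones[i:j]))
                if syl is None:
                    continue  # this span is not a syllable in the lexicon
                new_score = score + SYLLABLE_BIGRAM.get((prev, syl), UNSEEN)
                if syl not in best[j] or new_score > best[j][syl][0]:
                    best[j][syl] = (new_score, path + [syl])
    if not best[n]:
        return None  # no segmentation covers the whole phone sequence
    return max(best[n].values(), key=lambda entry: entry[0])[1]
```

With these toy values, the phone sequence `["b", "aa", "z", "iy"]` has two segmentations (`ba` + `zi` and `baz` + `i`), and the grammar prefers the first.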

Ground 2: Claims 2 and 3 are obvious over Bazzi in further view of Sabourin.

  • Prior Art Relied Upon: Bazzi and Sabourin (Patent No. 6,108,627).
  • Core Argument for this Ground:
    • Prior Art Mapping: This ground supplements Bazzi to the extent it is found not to explicitly teach the limitations of dependent claims 2 and 3. Claim 2 requires each syllable to be represented by acoustic elements for an onset, nucleus, and coda. Bazzi calls for an “automatic syllabification procedure” but does not detail it. Sabourin explicitly teaches such a procedure, describing how to partition phonemic transcriptions into syllables by assigning consonants to the onset and coda, and vowels to the nuclei. Claim 3 requires transitions between acoustic elements to be constrained by phonotactic rules. Sabourin teaches a “phonotactic post-processing” step to verify and prune illegal phoneme sequences, thereby providing the claimed constraining rules.
    • Motivation to Combine: A person of ordinary skill in the art (POSITA) implementing Bazzi’s system, which requires an automatic syllabification procedure, would have been motivated to use the well-documented method taught by Sabourin to perform this function. Sabourin’s method provided the known benefits of improving accuracy and efficiency by ensuring that only valid phonemic transcriptions are used.
    • Expectation of Success: A POSITA would have had a high expectation of success, as Sabourin provides a direct, known solution for a component explicitly required but not detailed by Bazzi.
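
For illustration only, the onset/nucleus/coda partitioning and the phonotactic check discussed in this ground can be sketched as follows. This is not Sabourin's procedure: it assumes a maximal-legal-onset heuristic, and the phoneme inventory, the onset table, and the names `VOWELS`, `LEGAL_ONSETS`, `syllabify`, and `is_legal` are hypothetical.

```python
# Hypothetical, deliberately tiny phoneme inventory and onset table.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw"}
LEGAL_ONSETS = {
    (), ("b",), ("k",), ("t",), ("s",), ("r",),
    ("s", "t"), ("t", "r"), ("s", "t", "r"),
}


def syllabify(phonemes):
    """Split phonemes into (onset, nucleus, coda) triples.

    Vowels become nuclei. Consonants between two nuclei are divided so the
    following syllable receives the longest onset listed in LEGAL_ONSETS.
    """
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        return []
    syllables = []
    start = 0  # index where the current syllable's onset begins
    for k, nuc in enumerate(nuclei):
        onset = tuple(phonemes[start:nuc])
        if k + 1 < len(nuclei):
            cluster = phonemes[nuc + 1:nuclei[k + 1]]
            split = 0
            # Terminates because the empty onset () is always legal.
            while tuple(cluster[split:]) not in LEGAL_ONSETS:
                split += 1
            coda = tuple(cluster[:split])
            start = nuc + 1 + split
        else:
            coda = tuple(phonemes[nuc + 1:])  # trailing consonants
            start = len(phonemes)
        syllables.append((onset, phonemes[nuc], coda))
    return syllables


def is_legal(syllables):
    """Phonotactic check: reject transcriptions with an unlisted onset."""
    return bool(syllables) and all(o in LEGAL_ONSETS for o, _, _ in syllables)
```

Under these assumptions, `["b", "ae", "k", "t", "r", "ae", "k"]` splits into two syllables with the cluster divided as coda `k` and onset `t r`, while a transcription beginning with `t k` fails the check and would be pruned.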

Ground 3: Claim 6 is obvious over Bazzi in further view of Epstein.

  • Prior Art Relied Upon: Bazzi and Epstein (Application Publication No. 2005/0055209).
  • Core Argument for this Ground:
    • Prior Art Mapping: This ground addresses claim 6, which adds generating and scoring a plurality of candidate word interpretations. While Bazzi teaches finding the “best path(s),” Epstein explicitly details a method for improving speech recognition accuracy by first generating an N-best list of likely hypotheses and then re-scoring them using more advanced semantic language models. This combination teaches generating multiple candidates (Bazzi), assigning scores (Epstein), and selecting the best interpretation (Epstein).
    • Motivation to Combine: A POSITA would have been motivated to combine these references to overcome the known limitations of the n-gram language models used in Bazzi. Epstein provides a direct motivation by teaching that its semantic re-scoring techniques “improve speech recognition accuracy.” This two-stage paradigm, in which an initial efficient search is followed by a more sophisticated re-scoring of an N-best list, was a well-known technique for improving performance without excessive computational cost.
    • Expectation of Success: Success was predictable because Epstein’s method is agnostic to the initial hypothesis-generation technique, and Bazzi provides just such a technique. The combination represents a standard, well-understood approach in the art for enhancing speech recognition systems.
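
For illustration only, the two-stage paradigm described in this ground (an efficient first pass that emits an N-best list, followed by re-scoring with a second model) can be sketched as below. This is not Epstein's method: the linear interpolation of log-scores, the toy scoring function, and the names `rescore_nbest` and `toy_semantic_score` are assumptions made for the example.

```python
def rescore_nbest(nbest, rescorer, weight=0.5):
    """Re-rank an N-best list.

    nbest:    list of (hypothesis, first_pass_log_score) pairs
    rescorer: callable returning a second-pass log-score for a hypothesis
    weight:   interpolation weight given to the second-pass score (assumed)
    """
    rescored = [
        (hyp, (1 - weight) * first + weight * rescorer(hyp))
        for hyp, first in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_semantic_score(hypothesis):
    """Hypothetical stand-in for a semantic language model."""
    return -1.0 if "speech" in hypothesis else -6.0


# The first pass slightly prefers the wrong hypothesis.
first_pass = [("wreck a nice beach", -3.5), ("recognize speech", -4.0)]
ranked = rescore_nbest(first_pass, toy_semantic_score)
best_hypothesis = ranked[0][0]
```

The design point is that the second pass only ever sees the short list produced by the first, so a costlier model can be applied without searching the full hypothesis space.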

4. Key Claim Construction Positions

  • Claim 1 Preamble: Petitioner argued that the preamble reciting “providing out-of-vocabulary interpretation capabilities and for tolerating noise” is non-limiting. It was asserted to be a mere statement of intended purpose, as the steps in the claim body stand on their own and do not depend on the preamble for meaning or completeness.
  • “acoustic grammar”: Petitioner contended the term should be given its plain and ordinary meaning. However, Petitioner also argued that its invalidity grounds hold under two alternative constructions adopted in prior litigation involving a related patent: (1) “grammar of phonotactic rules of the English language that maps phonemes to syllables” (from district court litigation) and (2) “collection of the phonemes...linked together to form syllables, which are linked together to form the words of the language” (from a prior IPR).

5. Key Technical Contentions (Beyond Claim Construction)

  • Interchangeability of “Phone” and “Phoneme”: A central technical argument was that a POSITA would have understood the terms “phone” (used in Bazzi) and “phoneme” (used in the claims) to be interchangeable for the purposes of the invention. Petitioner supported this by citing prior art literature stating the terms are often used as synonyms in the field of speech recognition and argued that recognizing a phone (an acoustic realization) inherently results in recognition of its corresponding phoneme.

6. Relief Requested

  • Petitioner requested institution of an inter partes review and cancellation of claims 1-3 and 6 of the ’409 patent as unpatentable under 35 U.S.C. § 103.