PTAB

IPR2025-00459

Samsung Electronics Co Ltd v. Cerence Operating Co

1. Case Identification

2. Patent Overview

  • Title: Generating Synthetic Speech with Contrastive Stress
  • Brief Description: The ’486 patent discloses techniques for improving text-to-speech (TTS) systems by generating synthetic speech with contrastive stress. The system analyzes text input to identify portions that differ between text strings, applies stress (e.g., increased pitch, amplitude, or duration) to those differing portions, and synthesizes an audio output reflecting this stress to enhance listener comprehension.

3. Grounds for Unpatentability

Ground 1: Claims 1, 7-8, and 14-15 are obvious over Walker in view of Matsumoto.

  • Prior Art Relied Upon: Walker (Application # 2001/0049602) and Matsumoto (Application # 2007/0233492).
  • Core Argument for this Ground:
    • Prior Art Mapping: Petitioner argued that Walker discloses a text-to-speech system that improves the naturalness of synthesized speech by using context, specifically mentioning its application for weather reports provided via a voice application. However, Walker lacks detail on its TTS engine. Matsumoto remedies this by teaching a speech synthesizer that improves naturalness for similar content (e.g., weather forecasts) by comparing successive sentences (text strings) to identify different words or phrases. Matsumoto then modifies the prosody (pitch, speed, volume) of these differing portions to make them stand out. The combination thus taught identifying differing portions of text strings (Matsumoto) within a speech-enabled application (Walker) and assigning stress to those portions to generate a synthesized output.
    • Motivation to Combine: A person of ordinary skill in the art (POSITA) seeking to implement Walker’s system would have been motivated to find a suitable TTS engine. A POSITA would have readily found Matsumoto, which addresses the same problem of improving speech naturalness in a similar context (weather reports), and would have incorporated its teachings to provide a more detailed and effective TTS engine for Walker’s system.
    • Expectation of Success: A POSITA would have had a reasonable expectation of success in combining the references. The combination involved incorporating a known TTS technique (Matsumoto) into a known system ready for such an improvement (Walker) to achieve the predictable result of more natural-sounding synthesized speech for weather reports.
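
The comparison step described in the mapping above can be illustrated with a minimal sketch. It is not taken from the petition, the ’486 patent, or either reference. The function names `mark_contrast` and `render`, the word-level comparison via Python's `difflib`, and the use of SSML-style `<emphasis>` tags to stand in for a prosody change are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def mark_contrast(previous: str, current: str) -> list[tuple[str, bool]]:
    """Pair each word of `current` with a flag: True if it differs from `previous`."""
    prev_words = previous.split()
    cur_words = current.split()
    matcher = SequenceMatcher(a=prev_words, b=cur_words, autojunk=False)

    # Assume every word differs, then clear the flag for words in matching runs.
    stressed = [True] * len(cur_words)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            stressed[j] = False
    return list(zip(cur_words, stressed))


def render(marked: list[tuple[str, bool]]) -> str:
    """Wrap only the differing words in emphasis tags; leave the rest unmarked."""
    return " ".join(
        f"<emphasis>{word}</emphasis>" if is_stressed else word
        for word, is_stressed in marked
    )


if __name__ == "__main__":
    first = "Tomorrow in Boston it will be sunny"
    second = "Tomorrow in Chicago it will be rainy"
    print(render(mark_contrast(first, second)))
```

In this sketch the shared words pass through unmarked, which loosely mirrors the idea of assigning stress only to the differing portions of the second text string.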

Ground 2: Claims 2-5, 9-12, and 16-19 are obvious over Walker in view of Matsumoto and Malsheen.

  • Prior Art Relied Upon: Walker (Application # 2001/0049602), Matsumoto (Application # 2007/0233492), and Malsheen (Patent 5,634,084).
  • Core Argument for this Ground:
    • Prior Art Mapping: This ground builds on the Walker and Matsumoto combination by adding Malsheen to teach the limitation of identifying differing text based on "normalized orthography," particularly for numerical fields as recited in the dependent claims. Petitioner asserted that a weather report in the Walker/Matsumoto system could contain temperatures expressed as numerals (e.g., "70 degrees" vs. "80 degrees"). Malsheen teaches a text classifier that expands numerals into full word strings (e.g., "70" to "seventy") before conversion to speech to ensure proper pronunciation. Applying Malsheen’s text normalization to the Walker/Matsumoto combination would allow the system to identify that "eighty" differs from "seventy," thereby teaching the normalized orthography limitation.
    • Motivation to Combine: Walker does not detail how its text cleaner handles numerals. A POSITA implementing the Walker/Matsumoto combination for weather reports containing numbers would be motivated to incorporate Malsheen's teaching on numeral-to-word expansion to ensure proper pronunciation by the speech synthesizer.
    • Expectation of Success: A POSITA would expect success, as this involved applying a known technique for text normalization (Malsheen) to improve a known TTS system (Walker/Matsumoto) to yield the predictable result of correctly pronouncing numbers in synthesized speech.
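
The numeral-expansion step in this ground can be sketched as follows. This is an illustrative assumption, not Malsheen's classifier: the helper names `number_to_words`, `normalize`, and `differing_words` are hypothetical, and the sketch only expands one- and two-digit numerals, which is enough for the temperature example in the petition.

```python
import re
from difflib import SequenceMatcher

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 (the only range this sketch handles)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Replace standalone one- or two-digit numerals with their word form."""
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


def differing_words(previous: str, current: str) -> list[str]:
    """Return the words of the normalized `current` that are absent from matching runs."""
    prev_words = normalize(previous).split()
    cur_words = normalize(current).split()
    matcher = SequenceMatcher(a=prev_words, b=cur_words, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.b, block.b + block.size))
    return [w for i, w in enumerate(cur_words) if i not in matched]


if __name__ == "__main__":
    print(differing_words("High of 70 degrees", "High of 80 degrees"))
```

Comparing the strings after normalization means the difference is detected on the spoken word forms ("seventy" versus "eighty") rather than on the raw digits.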

Ground 3: Claims 6, 13, and 20 are obvious over Walker in view of Matsumoto and Marple.

  • Prior Art Relied Upon: Walker (Application # 2001/0049602), Matsumoto (Application # 2007/0233492), and Marple (Application # 2008/0195391).
  • Core Argument for this Ground:
    • Prior Art Mapping: This ground addresses claims requiring the speech synthesis output to comprise "identification of a plurality of audio recordings." Petitioner argued that while Matsumoto teaches generating a phonogram string with modified prosody, it does not detail how these symbolic phonemes are realized as audio. Marple provides this missing detail, teaching a speech synthesizer that uses a database of pre-recorded phonemes (audio recordings). It assembles or concatenates these recordings to generate the final synthetic speech. The combination of Marple with Walker/Matsumoto thus taught generating speech by selecting and concatenating pre-recorded audio units corresponding to the phonemes with contrastive stress.
    • Motivation to Combine: A POSITA implementing the Walker/Matsumoto system would be motivated to find a method for converting the symbolic phoneme strings into actual audio waveforms. Marple, which discloses using a database of pre-recorded phonemes to create more human-like speech, provides a well-known and resource-efficient solution for this task.
    • Expectation of Success: This combination involved using a known method for audio generation (Marple's concatenative synthesis) to implement a known TTS system, which would predictably result in an improved, more natural-sounding audio output.
  • Additional Grounds: Petitioner asserted additional obviousness challenges based on combinations including Bellegarda (Patent 7,313,523) as an alternative to address arguments that deemphasizing words constitutes applying "contrastive stress."
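
For the Ground 3 mapping, the idea of an output that identifies a plurality of audio recordings can be sketched as a lookup from phoneme units to recording identifiers. This is a hypothetical illustration, not Marple's implementation: the `RECORDINGS` table, the file names, the phoneme labels, and the fallback to an unstressed variant are all assumptions.

```python
# Hypothetical database: (phoneme, stressed?) -> identifier of a pre-recorded unit.
RECORDINGS = {
    ("R", False): "r_plain.wav",
    ("EY", False): "ey_plain.wav",
    ("EY", True): "ey_stressed.wav",
    ("N", False): "n_plain.wav",
    ("IY", False): "iy_plain.wav",
}


def select_recordings(units: list[tuple[str, bool]]) -> list[str]:
    """Map each (phoneme, stressed) unit to a recording identifier, in order.

    If no stressed variant exists, fall back to the plain recording.
    A phoneme with no recording at all raises KeyError.
    """
    selected = []
    for phoneme, stressed in units:
        key = (phoneme, stressed)
        if key not in RECORDINGS:
            key = (phoneme, False)
        selected.append(RECORDINGS[key])
    return selected


if __name__ == "__main__":
    # A rough phoneme sequence for "rainy", marked as carrying stress on the first syllable.
    rainy = [("R", True), ("EY", True), ("N", False), ("IY", False)]
    print(select_recordings(rainy))
```

The returned list is only an ordered set of identifiers; a real concatenative synthesizer would then load and join the corresponding waveforms.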

4. Key Technical Contentions (Beyond Claim Construction)

  • Definition of "Contrastive Stress": A central technical contention was that "contrastive stress," as used in the ’486 patent and understood by a POSITA, requires increasing one or more prosodic parameters (e.g., pitch, amplitude, or duration) to emphasize a word. Petitioner argued that Matsumoto's method of deemphasizing words that are the same between two text strings (e.g., by decreasing pitch or volume) therefore does not constitute applying contrastive stress. This interpretation was crucial to the argument that the prior art meets the negative limitation in claims 1, 8, and 15, which requires "not assigning contrastive stress" to the non-differing portions of the text strings.

5. Arguments Regarding Discretionary Denial

  • Petitioner argued that discretionary denial under §314(a) based on Fintiv factors would be inappropriate. The petition was filed early in the parallel district court litigation, before significant investment or substantive orders. The scheduled trial date was over a year away, and Petitioner argued there was incomplete overlap between the claims challenged in the IPR and those asserted in the litigation, weighing in favor of institution.
  • Petitioner also contended that denial under §325(d) was unwarranted because none of the prior art references relied upon in the petition were previously considered by the USPTO during prosecution.

6. Relief Requested

  • Petitioner requests institution of an inter partes review and cancellation of claims 1-20 of the ’486 patent as unpatentable.