2:23-cv-01161
Jawbone Innovations LLC v. Sony Electronics Inc
I. Executive Summary and Procedural Information
- Parties & Counsel:
  - Plaintiff: Jawbone Innovations, LLC (Texas)
  - Defendant: Sony Electronics Inc. (California)
  - Plaintiff’s Counsel: Fabricant LLP
- Case Identification: 2:23-cv-01161, D.N.J., 06/12/2023
- Venue Allegations: Plaintiff alleges venue is proper in the District of New Jersey because Defendant is subject to personal jurisdiction, has committed acts of patent infringement, and has a regular and established place of business in the district.
- Core Dispute: Plaintiff alleges that Defendant’s audio products, including earbuds, headphones, and smart speakers, infringe nine U.S. patents related to acoustic noise suppression, voice activity detection, and dynamic audio enhancement.
- Technical Context: The technology at issue involves advanced methods for isolating a user's voice from ambient noise in personal audio devices, a critical feature for clear communication in varied environments.
- Key Procedural History: The complaint alleges that the technology was originally developed by AliphCom (dba Jawbone), which had a research contract with the Defense Advanced Research Projects Agency (DARPA). The complaint further alleges that, following the 2017 liquidation of Jawbone, Inc., Sony was notified of the patent portfolio and of its alleged infringement of that portfolio.
Case Timeline
| Date | Event | 
|---|---|
| 2001-05-30 | Earliest Priority Date ('058 Patent) | 
| 2004-09-01 | AliphCom launches "Jawbone" mobile headset | 
| 2007-07-17 | U.S. Patent No. 7,246,058 Issues | 
| 2008-06-13 | Earliest Priority Date ('080, '357, '691, '543, '213, '611 Patents) | 
| 2010-01-12 | Earliest Priority Date ('091 Patent) | 
| 2011-01-10 | Earliest Priority Date ('327 Patent) | 
| 2011-09-13 | U.S. Patent No. 8,019,091 Issues | 
| 2012-11-27 | U.S. Patent No. 8,321,213 Issues | 
| 2012-12-04 | U.S. Patent No. 8,326,611 Issues | 
| 2013-06-18 | U.S. Patent No. 8,467,543 Issues | 
| 2013-08-06 | U.S. Patent No. 8,503,691 Issues | 
| 2017-01-01 | Jawbone, Inc. liquidates; Sony allegedly notified of patents and infringement | 
| 2019-02-26 | U.S. Patent No. 10,218,327 Issues | 
| 2020-09-15 | U.S. Patent No. 10,779,080 Issues | 
| 2021-09-14 | U.S. Patent No. 11,122,357 Issues | 
| 2023-06-12 | Complaint Filed | 
II. Technology and Patent(s)-in-Suit Analysis
U.S. Patent No. 8,019,091 - “Voice Activity Detector (VAD)-Based Multiple-Microphone Acoustic Noise Suppression,” issued September 13, 2011
The Invention Explained
- Problem Addressed: The patent describes the challenge of suppressing acoustic noise, particularly in situations with multiple noise sources, which can degrade the quality of speech detection, transmission, and recording. (Compl. ¶21).
- The Patented Solution: The invention proposes a system that uses a highly accurate voice activity detector (VAD), which may incorporate a sensor that detects physiological information like tissue vibration, to determine when a user is speaking. (’091 Patent, col. 9:11-24). Based on this VAD input, the system generates different mathematical "transfer functions." A first transfer function, calculated when only noise is present, represents the ratio of energies between microphones and is used to remove noise. A second transfer function is generated when speech is present, and a combination of the two is used to produce a denoised audio stream. (Compl. ¶21; ’091 Patent, col. 10:5-15).
- Technical Importance: This VAD-gated, dual-transfer-function approach was designed to provide more precise noise removal than systems that do not have a reliable, independent confirmation of voicing activity. (Compl. ¶21).
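For readers less familiar with the signal processing, the VAD-gated approach summarized above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's equations and not a description of any Sony product: each transfer function is reduced to a single scalar gain, the signal model (`mic1 = speech + h1 * noise`, `mic2 = noise + h2 * speech`) is assumed, and the function names and the pre-calibrated `h2` parameter are hypothetical.

```python
def least_squares_gain(target, reference):
    """Scalar g that minimizes sum((target - g * reference) ** 2)."""
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return 0.0
    return sum(t * r for t, r in zip(target, reference)) / energy


def vad_gated_denoise(mic1_frames, mic2_frames, voiced_flags, h2=0.0):
    """Return (h1, cleaned_frames) for framed two-microphone input.

    h1 (noise path, mic2 -> mic1) is re-estimated only in frames the VAD
    marks as unvoiced. h2 (speech leakage, mic1 -> mic2) is a hypothetical,
    pre-calibrated constant in this sketch.
    """
    h1 = 0.0
    cleaned_frames = []
    for m1, m2, voiced in zip(mic1_frames, mic2_frames, voiced_flags):
        if not voiced:
            # "First transfer function": updated while voicing is absent.
            h1 = least_squares_gain(m1, m2)
        # Combination of the first and second gains; under the assumed
        # model, (m1 - h1 * m2) equals speech * (1 - h1 * h2).
        scale = 1.0 - h1 * h2
        if abs(scale) < 1e-9:
            scale = 1.0  # avoid dividing by ~0 in a degenerate case
        cleaned_frames.append([(a - h1 * b) / scale for a, b in zip(m1, m2)])
    return h1, cleaned_frames
```

The point of the sketch is the gating: the noise-path estimate is only trusted when an independent voicing signal reports silence, which is the feature the complaint emphasizes. Whether any accused product works this way is a question for discovery.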
Key Claims at a Glance
- The complaint asserts at least independent claim 11. (Compl. ¶38).
- Essential elements of Claim 11:
  - A receiver that receives at least two acoustic signals from at least two microphones.
  - At least one sensor that receives human tissue vibration information associated with user voicing activity.
  - A processor that generates a plurality of transfer functions, including:
    - A first transfer function (representative of a ratio of energy of acoustic signals) generated when voicing activity is absent.
    - A second transfer function generated when voicing activity is present.
  - The processor removes acoustic noise using the first transfer function and at least one combination of the first and second transfer functions.
U.S. Patent No. 7,246,058 - “Detecting Voiced and Unvoiced Speech Using Both Acoustic and Nonacoustic Sensors,” issued July 17, 2007
The Invention Explained
- Problem Addressed: The patent addresses the difficulty of distinguishing between three types of audio signals in noisy environments: voiced speech (e.g., vowels), unvoiced speech (e.g., "s" sounds), and background noise. (’058 Patent, col. 1:26-34).
- The Patented Solution: The invention uses a hybrid system with both acoustic microphones and at least one "voicing sensor" that detects physiological information (e.g., tissue vibration). The system's processor generates "cross correlation data" between the physiological sensor's signal and an acoustic signal to identify voiced speech. (’058 Patent, col. 4:1-12). To distinguish unvoiced speech from noise, it generates "difference parameters" based on the relative signal gain between two microphones and compares them to a gain threshold. (Compl. ¶55; ’058 Patent, col. 1:57-67).
- Technical Importance: This approach provided a multi-faceted method for classifying different components of human speech, allowing for more nuanced noise suppression than a simple speech/no-speech determination. (Compl. ¶20-21).
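The three-way classification described above (voiced speech, unvoiced speech, noise) can be sketched in a few lines of Python. This is a simplified illustration, not the patent's method or any accused implementation: it assumes zero-lag normalized cross-correlation, expresses the "difference parameter" as an energy ratio in decibels, and uses threshold values (`0.6`, `3.0` dB) and function names that are purely hypothetical.

```python
import math


def normalized_xcorr(x, y):
    """Zero-lag normalized cross-correlation, in the range [-1, 1]."""
    ex = sum(a * a for a in x)
    ey = sum(b * b for b in y)
    if ex == 0.0 or ey == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(ex * ey)


def gain_difference_db(mic1, mic2):
    """Relative signal gain of mic1 over mic2, in decibels."""
    e1 = sum(a * a for a in mic1)
    e2 = sum(b * b for b in mic2)
    if e1 == 0.0 or e2 == 0.0:
        return 0.0
    return 10.0 * math.log10(e1 / e2)


def classify_frame(sensor, mic1, mic2, corr_threshold=0.6, gain_threshold_db=3.0):
    """Label one frame as 'voiced', 'unvoiced', or 'noise'."""
    # Step 1: the voicing sensor tracks the microphone -> voiced speech.
    # abs() tolerates an inverted sensor polarity (an assumption here).
    if abs(normalized_xcorr(sensor, mic1)) > corr_threshold:
        return "voiced"
    # Step 2: no sensor correlation, but mic1 is markedly louder than
    # mic2, suggesting a nearby source -> unvoiced speech.
    if gain_difference_db(mic1, mic2) > gain_threshold_db:
        return "unvoiced"
    # Step 3: similar level at both microphones -> background noise.
    return "noise"
```

The sketch makes the structure of the claim visible: two different measurements, each compared against its own threshold, applied in sequence. The infringement dispute is likely to center on whether the accused processing has that same two-measurement, two-threshold structure.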
Key Claims at a Glance
- The complaint asserts at least independent claim 1. (Compl. ¶55).
- Essential elements of Claim 1:
  - At least two microphones receiving acoustic signals.
  - At least one voicing sensor receiving physiological information.
  - At least one processor that:
    - Generates cross correlation data between the physiological information and an acoustic signal.
    - Identifies voiced speech when the cross correlation data exceeds a correlation threshold.
    - Generates difference parameters representing relative signal gain between the two microphones.
    - Identifies unvoiced speech when the difference parameters exceed a gain threshold.
    - Identifies noise when the difference parameters are less than the gain threshold.
U.S. Patent No. 10,779,080 - “Dual Omnidirectional Microphone Array (DOMA),” issued September 15, 2020
- Technology Synopsis: This patent describes using an array of omnidirectional microphones to create "virtual microphones." The outputs of the physical microphones are processed to form two distinct virtual microphones that have similar responses to background noise but dissimilar responses to the user's speech. This differential response allows an adaptive filter to significantly reduce noise without distorting the desired speech signal. (Compl. ¶23).
- Asserted Claims: At least independent claim 1. (Compl. ¶73).
- Accused Features: The complaint alleges that Sony’s WF-1000XM4 earbuds, which comprise two physical omnidirectional microphones and a processing component, form two "beamformed virtual microphones" that exhibit the claimed similar noise response and dissimilar speech response. (Compl. ¶24, 75, 77).
U.S. Patent No. 11,122,357 - “Forming Virtual Microphone Arrays Using Dual Omnidirectional Microphone Array (DOMA),” issued September 14, 2021
- Technology Synopsis: Similar to the ’080 Patent, this patent describes noise suppression using physical microphones to form virtual directional microphones. It specifies combining physical microphone signals by filtering and summing them in the time domain to apply a varying linear transfer function, thereby suppressing noise in the final output signal. (Compl. ¶25).
- Asserted Claims: At least independent claim 1. (Compl. ¶88).
- Accused Features: The complaint alleges that the Sony WF-1000XM4 earbuds, which comprise arrays of physical microphones, combine the microphone outputs to create beamformed microphones that reduce noise in the signal. (Compl. ¶26, 89-92).
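The "filtering and summing in the time domain" described in the Technology Synopsis is a standard beamforming structure, which the following Python sketch illustrates. It is a generic filter-and-sum example, not the patent's specific transfer function: the filter taps are fixed and chosen by hand here, whereas the patent is described as applying a varying transfer function, and all function names are hypothetical.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(mic_signals, mic_taps):
    """Filter each physical microphone signal, then sum sample by sample."""
    filtered = [fir_filter(s, t) for s, t in zip(mic_signals, mic_taps)]
    return [sum(samples) for samples in zip(*filtered)]
```

As a usage example, if a sound reaches the second microphone one sample after the first, then delaying the first microphone by one sample (taps `[0.0, 1.0]`) and subtracting the second (taps `[-1.0]`) cancels that sound, which places a null in its direction of arrival.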
U.S. Patent No. 8,467,543 - “Microphone and Voice Activity Detection (VAD) Configurations For Use with Communications Systems,” issued June 18, 2013
- Technology Synopsis: This patent describes a system with a voice detection subsystem and a denoising subsystem. It details specific microphone configurations, such as a first microphone oriented toward a talker's mouth and a second oriented away from it. When the voice detection subsystem indicates voicing is occurring, the denoising system can generate a noise waveform estimate (e.g., from the outward-facing microphone) and subtract it from the signal that contains both speech and noise. (Compl. ¶27).
- Asserted Claims: At least independent claim 1. (Compl. ¶101).
- Accused Features: The complaint alleges the Sony WF-1000XM4 earbuds have microphones oriented toward and away from the user's mouth and use a speech-detecting accelerometer to trigger a denoising subsystem that subtracts estimated noise. (Compl. ¶28, 102-104, 107).
U.S. Patent No. 8,503,691 - “Virtual Microphone Arrays Using Dual Omnidirectional Microphone Array (DOMA),” issued August 6, 2013
- Technology Synopsis: This patent, related to the ’357 and ’080 Patents, describes creating virtual microphones from physical omnidirectional microphones. It specifies creating a first virtual microphone with a broad response to speech and a second virtual microphone that has a "single null oriented in a direction toward a source of the speech." This allows the system to differentiate between speech and noise based on their differing responses in the two virtual channels. (Compl. ¶25, 118).
- Asserted Claims: At least independent claim 23. (Compl. ¶118).
- Accused Features: The complaint alleges the Sony WF-1000XM4 earbuds create beamformed microphones, one of which has a broad response to the user's speech and another of which has a null oriented toward the user's mouth. (Compl. ¶26, 119-122).
U.S. Patent No. 8,321,213 - “Acoustic Voice Activity Detection (AVAD) for Electronic Systems,” issued November 27, 2012
- Technology Synopsis: This patent describes an acoustic voice activity detection system that uses a ratio of energies between virtual microphones. The system forms a first virtual microphone by combining signals from two physical microphones. It then forms a special "filter" and applies it to create a second virtual microphone. The system determines that speech is present when the energy ratio between these two virtual microphones exceeds a threshold. (Compl. ¶29, 134).
- Asserted Claims: At least independent claim 1. (Compl. ¶134).
- Accused Features: The complaint alleges that the Sony WF-1000XM4 and LF-S50G products form an array of virtual microphones and detect user speech (e.g., a wake word) by comparing a ratio of energies of the beamformed microphones to a threshold. (Compl. ¶30, 138).
U.S. Patent No. 8,326,611 - “Acoustic Voice Activity Detection (AVAD) for Electronic Systems,” issued December 4, 2012
- Technology Synopsis: This patent is related to the ’213 Patent and describes a method for acoustic voice activity detection. The method involves forming a first virtual microphone by combining signals, forming a filter that describes the relationship for speech between the physical microphones, forming a second virtual microphone by applying that filter, and then detecting speech when the energy ratio between the two virtual microphones exceeds a threshold. (Compl. ¶29, 148).
- Asserted Claims: At least independent claim 1. (Compl. ¶148).
- Accused Features: The complaint alleges that the Sony WF-1000XM4 and LF-S50G products perform the claimed method by forming virtual microphones and detecting speech via an energy ratio comparison against a threshold, for example in wake word detection. (Compl. ¶30, 152).
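The energy-ratio detection described for the ’213 and ’611 Patents can be illustrated with a short Python sketch. It rests on loud simplifications that are not drawn from the patents: the "filter" relating the two physical microphones for speech is reduced to one scalar `beta`, no inter-microphone delay is modeled, the way the first virtual microphone is formed is one plausible choice among many, and the 10 dB threshold and all function names are hypothetical.

```python
import math


def energy(samples):
    """Sum of squared samples."""
    return sum(v * v for v in samples)


def estimate_speech_filter(o1, o2):
    """Scalar beta such that o2 is approximately beta * o1 for speech."""
    e1 = energy(o1)
    if e1 == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(o1, o2)) / e1


def speech_detected(o1, o2, beta, threshold_db=10.0):
    """Compare the energy ratio of two virtual microphones to a threshold."""
    # First virtual microphone: a combination that keeps speech.
    v1 = [a - beta * b for a, b in zip(o1, o2)]
    # Second virtual microphone: applying the speech filter cancels
    # speech, leaving mostly noise.
    v2 = [b - beta * a for a, b in zip(o1, o2)]
    floor = 1e-12  # keeps the ratio finite when a channel is silent
    ratio_db = 10.0 * math.log10((energy(v1) + floor) / (energy(v2) + floor))
    return ratio_db > threshold_db
```

Under these assumptions, distant noise that reaches both microphones at the same level produces equal energy in the two virtual channels (a ratio near 0 dB), while nearby speech is cancelled in the second channel and drives the ratio far above the threshold.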
U.S. Patent No. 10,218,327 - “Dynamic Enhancement of Audio (DAE) In Headset Systems,” issued February 26, 2019
- Technology Synopsis: This patent describes dynamically enhancing the intelligibility of received audio. The system includes a noise level estimator (NLE) with both "stationary" and "non-stationary" noise detectors. Based on the type and level of noise detected, the system modifies the volume and equalization of an incoming audio stream to enhance its clarity for the listener. (Compl. ¶31).
- Asserted Claims: At least independent claim 1. (Compl. ¶162).
- Accused Features: The complaint alleges that the Sony WF-1000XM4 and LF-S50G products implement "adaptive sound control" and "automatic volume control," which include dynamic volume and equalization adjustments based on the detection of both stationary and non-stationary ambient noise levels. (Compl. ¶32, 164-171).
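The noise-dependent playback adjustment described in the Technology Synopsis can be sketched as follows. This is a rough illustration only: it treats "stationary" noise as a slowly tracked level floor and "non-stationary" noise as a sudden jump above that floor, it adjusts volume but omits equalization, and every parameter value and function name is a hypothetical choice rather than anything taken from the patent or from Sony's products.

```python
def estimate_noise(frame_levels_db, slow=0.05, burst_margin_db=10.0):
    """Return a list of (stationary_floor_db, is_burst) per frame."""
    floor = frame_levels_db[0]
    results = []
    for level in frame_levels_db:
        # Non-stationary detector: a jump well above the tracked floor.
        is_burst = level - floor > burst_margin_db
        if not is_burst:
            # Stationary detector: slow exponential tracking of the level.
            floor += slow * (level - floor)
        results.append((floor, is_burst))
    return results


def playback_gain_db(floor_db, is_burst, quiet_db=40.0, slope=0.5,
                     burst_boost_db=3.0, max_boost_db=12.0):
    """Volume boost for incoming audio, given the estimated ambient noise."""
    boost = max(0.0, (floor_db - quiet_db) * slope)
    if is_burst:
        boost += burst_boost_db
    return min(boost, max_boost_db)
```

For example, with a steady 50 dB ambient level the sketch applies a 5 dB boost, and a sudden 70 dB event adds a further 3 dB without moving the stationary floor.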
III. The Accused Instrumentality
Product Identification
The complaint identifies the "Accused Products" as Sony earbuds (e.g., WF and LinkBuds series, specifically the WF-1000XM4), headphones (e.g., WH series), beamformer microphones (e.g., MAS-A100), and smart speakers (e.g., LF and RA series, specifically the LF-S50G). (Compl. ¶19, 34).
Functionality and Market Context
The complaint focuses on the Sony WF-1000XM4 earbuds, alleging they incorporate infringing noise suppression and voice detection technologies. (Compl. ¶22). Key accused functionalities include Sony’s "Precise Voice Pickup Technology," which allegedly combines beamforming microphones with a bone-conduction sensor and/or accelerometer to pick up the user's voice clearly. (Compl. ¶22, 41). The complaint alleges these components comprise a voice activity detector. (Compl. ¶22). The complaint provides a diagram from a third-party analysis showing the WF-1000XM4's internal components, including a "Processor" (Sony Integrated Processor V1), "Back Microphone," "Forward Microphone," and "3-Axis Accelerometer." (Compl. p. 17). The complaint also highlights marketing material stating that beamforming microphones are calibrated to pick up sound from the user's mouth, while a bone-conduction sensor picks up vibrations from the user's voice to enhance clarity. (Compl. p. 7).
IV. Analysis of Infringement Allegations
’091 Patent Infringement Allegations
| Claim Element (from Independent Claim 11) | Alleged Infringing Functionality | Complaint Citation | Patent Citation | 
|---|---|---|---|
| a system for removing acoustic noise from the acoustic signals, comprising: a receiver that receives at least two acoustic signals via at least two acoustic microphones positioned in a plurality of locations; | The Sony WF-1000XM4 comprises a receiver and a microphone array with at least two microphones (e.g., forward and back microphones) positioned in different locations to receive acoustic signals. | ¶40 | col. 4:20-24 | 
| at least one sensor that receives human tissue vibration information associated with human voicing activity of a user; | The Sony WF-1000XM4 comprises at least one accelerometer and a bone conduction sensor that allegedly receive human tissue vibration associated with voicing activity. | ¶41 | col. 9:11-16 | 
| a processor coupled among the receiver and the at least one sensor that generates a plurality of transfer functions, wherein the plurality of transfer functions includes a first transfer function representative of a ratio of energy of acoustic signals received using at least two different acoustic microphones... | The Sony Integrated Processor V1 is coupled to the receiver and sensors and allegedly utilizes the microphone array to generate a plurality of transfer functions, including a first transfer function representing a ratio of energy received at the different microphones. | ¶42 | col. 10:5-10 | 
| ...wherein the first transfer function is generated in response to a determination that voicing activity is absent from the acoustic signals for a period of time, | The Sony WF-1000XM4 allegedly generates the first transfer function when the voice pickup unit (accelerometer and/or bone conduction sensor) indicates that voicing activity is absent. | ¶43 | col. 10:10-12 | 
| wherein the plurality of transfer functions includes a second transfer function representative of the acoustic signals, wherein the second transfer function is generated in response to a determination that voicing activity is present... | The Sony WF-1000XM4 allegedly generates a second transfer function when the voice pickup unit detects human tissue vibrations, indicating voicing activity is present. | ¶44 | col. 10:12-15 | 
| wherein acoustic noise is removed from the acoustic signals using the first transfer function and at least one combination of the first transfer function and the second transfer function to produce the denoised acoustic data stream. | The Sony WF-1000XM4 allegedly removes noise by applying the first transfer function (generated when voicing is absent) and a combined transfer function (when voicing is detected). | ¶45 | col. 10:15-20 | 
Identified Points of Contention
- Scope Questions: A central question may be whether the signal processing performed by Sony's Integrated Processor V1 constitutes the generation of a "plurality of transfer functions" as that term is used in the patent. The analysis will likely focus on whether Sony's beamforming and noise cancellation algorithms operate by creating distinct first and second transfer functions based on a binary VAD state (voicing vs. non-voicing).
- Technical Questions: The complaint alleges on "information and belief" that a first transfer function is generated when voicing is absent and a second when it is present. A key factual question for discovery will be the precise mechanism by which the accused products' processor adapts its noise cancellation based on input from the bone conduction and accelerometer sensors.
’058 Patent Infringement Allegations
| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation | 
|---|---|---|---|
| a system for detecting voiced and unvoiced speech...comprising: at least two microphones that receive the acoustic signals; | Each earbud of the Sony WF-1000XM4 comprises at least two MEMS microphones that receive acoustic signals. | ¶56 | col. 2:61-63 | 
| at least one voicing sensor that receives physiological information associated with human voicing activity; | The Sony WF-1000XM4 comprises an accelerometer which allegedly receives human tissue vibration associated with voicing activity. | ¶57 | col. 1:44-46 | 
| and at least one processor coupled among the microphones and the voicing sensor, wherein the at least one processor; generates cross correlation data between the physiological information and an acoustic signal received at one of the two microphones; | The Sony Integrated Processor V1 is coupled to the microphones and accelerometer and allegedly generates cross correlation data between the tissue vibration information and an acoustic signal. | ¶58-59 | col. 4:1-5 | 
| identifies information of the acoustic signals as voiced speech when the cross correlation data...exceeds a correlation threshold; | The Sony Integrated Processor V1 allegedly identifies acoustic signals as speech when the cross correlation data exceeds a threshold based on vibration and/or acoustic signals. | ¶60 | col. 4:5-8 | 
| generates difference parameters between the acoustic signals received at each of the two receivers, wherein the difference parameters are representative of the relative difference in signal gain...; | The Sony Integrated Processor V1 allegedly generates difference parameters between the signals from each microphone, representing the relative difference in signal gain. | ¶61 | col. 1:57-62 | 
| identifies information of the acoustic signals as unvoiced speech when the difference parameters exceed a gain threshold; | The processor allegedly identifies signals as unvoiced speech when the difference parameter exceeds a gain threshold. | ¶62 | col. 1:62-64 | 
| and identifies information of the acoustic signals as noise when the difference parameters are less than the gain threshold. | The processor allegedly identifies signals as noise when the difference parameters are less than the gain threshold. | ¶63 | col. 1:64-67 | 
Identified Points of Contention
- Scope Questions: The dispute may turn on whether Sony's signal processing method can be characterized as generating "cross correlation data" and "difference parameters" and applying distinct "correlation" and "gain" thresholds as required by the claim. The definition of these terms and the specific sequence of operations will be critical.
- Technical Questions: A key evidentiary question will be whether the Sony Integrated Processor V1 actually performs, as the complaint alleges, the specific three-part classification scheme (voiced vs. unvoiced vs. noise) using the distinct methodologies (cross-correlation vs. difference parameters) and separate thresholds recited in the claim, or whether it instead uses a different, more integrated algorithm to achieve a similar outcome.
V. Key Claim Terms for Construction
For the ’091 Patent
- The Term: "transfer function"
- Context and Importance: This term is the core of the claimed invention. Its construction will determine whether the accused product's noise cancellation algorithm, which the complaint alleges is a form of beamforming, falls within the scope of the claim. Practitioners may focus on whether this term requires the specific mathematical relationship between microphone inputs described in the patent or if it can be read more broadly to cover other adaptive filtering techniques.
- Intrinsic Evidence for Interpretation:
  - Evidence for a Broader Interpretation: The claim defines the first transfer function broadly as "representative of a ratio of energy of acoustic signals received using at least two different acoustic microphones." (’091 Patent, cl. 11).
  - Evidence for a Narrower Interpretation: The specification describes specific equations and methods for calculating the transfer function H1(z), which is derived when speech is absent. (’091 Patent, col. 22:15-25, Eq. 2). A defendant may argue the term should be limited to this disclosed methodology.
For the ’058 Patent
- The Term: "cross correlation data"
- Context and Importance: The claim requires the processor to generate "cross correlation data" between physiological and acoustic signals specifically to identify "voiced speech." The case may depend on whether the processing performed by the Sony V1 chip meets this definition. Practitioners may focus on the technical evidence required to prove that Sony's processor performs a mathematical cross-correlation, as opposed to a more general comparison or fusion of sensor data.
- Intrinsic Evidence for Interpretation:
  - Evidence for a Broader Interpretation: The patent does not limit the term to a single mathematical formula in the claims, referring more generally to a processor that "generates cross correlation data." (’058 Patent, cl. 1).
  - Evidence for a Narrower Interpretation: The detailed description discusses using a Normalized Least Mean Square (NLMS) adaptive filter to determine the correlation between microphone signals, which a defendant may argue limits the scope of "cross correlation" to such specific adaptive filtering techniques. (’058 Patent, col. 5:48-52).
VI. Other Allegations
Indirect Infringement
For all asserted patents, the complaint alleges induced infringement. The basis for inducement is Sony's alleged provision of instruction manuals, websites, and promotional materials that demonstrate to customers how to use the Accused Products in an infringing manner. (Compl. ¶47, 65, 81).
Willful Infringement
For all asserted patents, the complaint alleges willful infringement. The basis for willfulness is alleged pre-suit knowledge, starting in 2017 after the liquidation of Jawbone, Inc., when Sony was allegedly "marketed to" and notified of Jawbone's patent portfolio and its ongoing infringement. (Compl. ¶19, 49, 67, 128).
VII. Analyst’s Conclusion: Key Questions for the Case
- A core issue will be one of technical implementation: Do the algorithms within Sony’s Integrated Processor V1 perform the specific, multi-step logical operations recited in the asserted claims (e.g., generating distinct VAD-gated transfer functions per the ’091 patent; or generating cross-correlation data and separate difference parameters per the ’058 patent), or do they achieve noise cancellation and voice detection through a fundamentally different, non-infringing method?
- A central question will be one of claim scope and evolution: Can claim terms rooted in the context of signal processing from the early-to-mid 2000s (e.g., "transfer function," "cross correlation data") be construed to cover the potentially more sophisticated and integrated audio processing techniques used in modern consumer electronics like the Sony WF-1000XM4?
- A key evidentiary question will be one of knowledge and intent: What evidence will emerge in discovery to substantiate the allegation that Sony was notified of its specific infringement of these nine patents in or around 2017, and what actions, if any, did it take in response? This will be central to the claims for willful and induced infringement.