PTAB

IPR2025-01063

Perplexity AI, Inc. v. Comet ML, Inc.

Key Events
Petition

1. Case Identification

  • Case No.: IPR2025-01063
  • Petitioner: Perplexity AI, Inc.
  • Patent Owner: Comet ML, Inc.

2. Patent Overview

  • Title: Systems and Methods for Training a Neural Network
  • Brief Description: The ’968 patent discloses a method for optimizing the training of a neural network (NN) by using a second machine learning model. This model predicts when to stop a training process to conserve computational resources by calculating a "probability of improvement" in the NN's loss and comparing it to a threshold.

3. Grounds for Unpatentability

Ground 1: Obviousness over Baker and Lorenz - Claims 1 and 6-8 are obvious over Baker in view of Lorenz.

  • Prior Art Relied Upon: Baker (International Publication No. WO 2018/175098) and Lorenz (a Nov. 2015 publication).
  • Core Argument for this Ground:
    • Prior Art Mapping: Petitioner argued that Baker discloses a “coach” machine learning (ML) system that trains and optimizes a “student” ML system (which can be an NN). Baker’s coach adjusts hyperparameters to minimize a cost of errors (i.e., loss) and continues the training process until a "stopping criterion is reached." However, Baker did not specify the criterion. Petitioner asserted that Lorenz remedies this by describing well-known stopping criteria for ML optimization, including using a "probability of improvement" (PI) metric. Lorenz teaches terminating an optimization algorithm when the calculated PI for a new set of parameters falls below a predetermined threshold. Petitioner contended that implementing Lorenz’s specific PI-based stopping criterion in Baker’s general training framework renders the core method of claim 1 obvious. Dependent claim 6, which recites determining a mean and variance, was allegedly taught by Lorenz’s PI function, which uses these statistical measures. Dependent claim 7, reciting hyperparameters, was allegedly taught by Baker’s coach, which is designed to optimize them.
    • Motivation to Combine: A Person of Ordinary Skill in the Art (POSITA) would combine these references to improve Baker’s system. Since Baker explicitly requires a "stopping criterion" without detailing one, a POSITA would have looked to known, efficient criteria like those in Lorenz to conserve computational resources and prevent non-productive training runs, which was a known problem in the field.
    • Expectation of Success: A POSITA would have a reasonable expectation of success, as Lorenz’s stopping criteria were established for ML optimization processes, and applying them to Baker’s ML training system would be a straightforward implementation of a known technique to solve a known problem.
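To make the asserted technique concrete, the PI-based stopping criterion the petition attributes to Lorenz can be sketched as follows. This is an illustrative reconstruction, not the claimed implementation: it assumes a Gaussian predictive model of the loss, and the function names and the 0.05 threshold are hypothetical.

```python
import math

def probability_of_improvement(mean, variance, best_loss):
    """PI = Phi((best_loss - mean) / sigma), where Phi is the standard
    normal CDF. For loss minimization, "improvement" means the predicted
    loss falls below the best loss observed so far."""
    sigma = math.sqrt(variance)
    if sigma == 0.0:
        return 0.0
    z = (best_loss - mean) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def should_stop(mean, variance, best_loss, threshold=0.05):
    """Terminate training when the calculated probability of improvement
    falls below a predetermined threshold."""
    return probability_of_improvement(mean, variance, best_loss) < threshold
```

On this sketch, a predicted loss well above the best observed loss yields a PI near zero and triggers termination, which matches the claim mapping for claims 1 and 6 (mean and variance feed the PI calculation).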

Ground 2: Obviousness over Baker, Lorenz, and Shridhar - Claims 2-4 and 9-11 are obvious over Baker in view of Lorenz and Shridhar.

  • Prior Art Relied Upon: Baker (WO 2018/175098), Lorenz (a Nov. 2015 publication), and Shridhar (a Jan. 2019 publication).
  • Core Argument for this Ground:
    • Prior Art Mapping: This ground builds on the Baker and Lorenz combination by adding Shridhar to address limitations related to a "wait value." Petitioner argued that Shridhar teaches another well-known early stopping technique based on a "wait threshold." Specifically, Shridhar stops training if a performance metric (e.g., validation accuracy) remains unchanged for a set number of epochs (e.g., 5 epochs). This period of unchanged performance corresponds to the claimed "wait value," and the set number of epochs corresponds to the "wait threshold." Petitioner asserted that this technique directly maps to the limitations in claims 2-4 and their system-based counterparts in claims 9-11, which involve continuing training if a wait value is above a threshold, incrementing the wait value when loss does not improve, and resetting it to zero when loss does improve.
    • Motivation to Combine: A POSITA would combine Shridhar’s teachings with the Baker/Lorenz framework to create a more robust, multi-faceted stopping condition. Baker’s coach aims to optimize both error cost and computation cost. Lorenz’s PI criterion addresses error cost, while Shridhar’s wait threshold addresses computation cost by terminating training that has stalled or ceased to improve. Using both would provide a more comprehensive solution to optimize the overall training process.
    • Expectation of Success: Success would be expected because implementing multiple, complementary stopping criteria was a known strategy in ML. Combining a performance-based criterion (Lorenz) with a stagnation-based criterion (Shridhar) was a logical and predictable way to enhance the efficiency of the training system disclosed by Baker.

Ground 3: Obviousness over Baker, Lorenz, and Jenatton - Claims 5 and 12 are obvious over Baker in view of Lorenz and Jenatton.

  • Prior Art Relied Upon: Baker (WO 2018/175098), Lorenz (a Nov. 2015 publication), and Jenatton (an Aug. 2017 publication).
  • Core Argument for this Ground:
    • Prior Art Mapping: This ground adds Jenatton to the Baker/Lorenz combination to address claims reciting a "tree data structure." Petitioner argued that Jenatton discloses using a tree-structured model to make hyperparameter selection more efficient. In Jenatton, hierarchical relationships between hyperparameters are organized into a tree, with specific parameter sets populating the leaf nodes. An acquisition function (such as the PI function from Lorenz) is used to identify the "most promising leaf node" to explore, thereby avoiding a brute-force search of the entire parameter space. Petitioner contended this directly teaches the limitation of "obtaining from a leaf of at least one tree data structure data relevant to the NN," as recited in claims 5 and 12.
    • Motivation to Combine: A POSITA would have been motivated to implement Jenatton's tree structure in Baker’s learning coach to improve its efficiency. Baker’s coach identifies optimal hyperparameters, and Jenatton provides a known, computationally superior method for that exact task compared to an exhaustive grid search. This combination represents the use of a known data structure to improve the performance of a known system.
    • Expectation of Success: A POSITA would have a high expectation of success because Jenatton explicitly presents its tree-based approach as a method for speeding up hyperparameter optimization in ML models, which is the precise function of Baker's learning coach. The integration would be a predictable application of a known efficiency-enhancing technique.
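The tree-structured selection described for claims 5 and 12 can be sketched as follows: hyperparameter sets populate the leaf nodes, and an acquisition score (such as a PI value) identifies the most promising leaf without a brute-force search. This is an illustrative reconstruction of the Jenatton-style structure the petition describes; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    params: Optional[dict] = None        # populated only at leaf nodes
    score: float = 0.0                   # acquisition value (e.g., PI) at a leaf
    children: List["Node"] = field(default_factory=list)

def leaves(node):
    """Yield every leaf of the hyperparameter tree."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)

def most_promising_leaf(root):
    """Select the leaf with the highest acquisition score, obtaining the
    parameter set from that leaf rather than searching the full grid."""
    return max(leaves(root), key=lambda leaf: leaf.score)
```

The data relevant to the NN is then read from the chosen leaf's parameter set, mirroring the "obtaining from a leaf of at least one tree data structure" limitation.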

4. Relief Requested

  • Petitioner requests institution of an inter partes review and cancellation of claims 1-12 of the ’968 patent as unpatentable.