DCT

5:23-cv-04166

Take2 Tech Ltd v. Pacific Biosciences Of California Inc

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 5:23-cv-04166, N.D. Cal., 12/14/2022
  • Venue Allegations: The complaint was filed in the District of Delaware, alleging venue is proper because Defendant is a Delaware corporation that transacts business in the district. The case has since been transferred to the Northern District of California.
  • Core Dispute: Plaintiff alleges that Defendant’s DNA sequencing platforms, software, and services infringe a patent related to methods for detecting epigenetic modifications by analyzing kinetic data from the sequencing process.
  • Technical Context: The technology lies in the field of epigenetics, where chemical modifications to DNA, such as methylation, are analyzed to understand gene expression and disease, a critical area for diagnostics and biomedical research.
  • Key Procedural History: The complaint alleges that the inventors published their "holistic kinetic" model in a PNAS article in January 2021 and shared it with Defendant PacBio. It further alleges that PacBio publicly praised the inventors' work and, in April 2022, released a new software version with features that allegedly incorporate the patented method.

Case Timeline

| Date | Event |
| --- | --- |
| 2019-08-16 | ’794 Patent Priority Date |
| 2021-01-25 | Inventors' PNAS article on "holistic kinetic" model published |
| 2021-02-10 | PacBio CEO allegedly credits inventors' work in an earnings call |
| 2021-08-17 | U.S. Patent No. 11,091,794 Issued |
| 2021-10-18 | PacBio hosts ASHG workshop allegedly highlighting inventors' methodology |
| 2022-04-20 | PacBio releases SMRT Link v11.0 with accused "5mC CpG Detection" feature |
| 2022-12-14 | Complaint Filed |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 11,091,794 - "Determination of Base Modifications of Nucleic Acids"

  • Patent Identification: U.S. Patent No. 11,091,794 (“’794 Patent”), issued August 17, 2021.

The Invention Explained

  • Problem Addressed: The complaint asserts that prior techniques for measuring DNA methylation, a key epigenetic modification, "had failed to deliver sufficiently accurate results" (Compl. ¶13). The patent elaborates that while single-molecule, real-time (SMRT) sequencing could detect kinetic effects from modifications, previous analysis methods were not sufficiently robust or accurate, particularly for detecting 5-methylcytosine (5mC) (’794 Patent, col. 18:1-22).
  • The Patented Solution: The invention claims a method using a machine learning model to improve the accuracy of detecting nucleotide modifications. Instead of relying on a single data point, the method creates an "input data structure" from a "window" of nucleotides around a target base. This structure includes kinetic data (pulse width and interpulse duration) and sequence information for nucleotides within the window. A model is trained on data from DNA with known modification states to learn the kinetic signatures of those modifications, and this trained model is then used to predict modifications in unknown samples (’794 Patent, Abstract; FIG. 9; col. 25:54-67).
  • Technical Importance: According to the complaint, this approach enables direct and highly accurate detection of epigenetic modifications from sequencing data, without separate chemical pre-treatments such as bisulfite conversion that can damage DNA (Compl. ¶65).
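
As a rough illustration of the "input data structure" described above, the following Python sketch builds one row per nucleotide in a window centered on a target base, recording relative position, identity, pulse width, and interpulse duration. The names (`Nucleotide`, `build_window`, `half_width`), the row layout, and the sample values are assumptions made for illustration only; they are not taken from the ’794 Patent or from any PacBio software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Nucleotide:
    base: str           # identity of the nucleotide: "A", "C", "G", or "T"
    pulse_width: float  # width of the optical pulse (arbitrary units)
    ipd: float          # interpulse duration (arbitrary units)


# One row of the window: (offset from target, base, pulse width, IPD)
Row = Tuple[int, str, float, float]


def build_window(read: List[Nucleotide], target: int, half_width: int) -> List[Row]:
    """Collect the per-nucleotide properties around a target position.

    Positions that fall outside the read are skipped rather than padded;
    that is a simplification chosen for this sketch.
    """
    rows: List[Row] = []
    for offset in range(-half_width, half_width + 1):
        index = target + offset
        if 0 <= index < len(read):
            n = read[index]
            rows.append((offset, n.base, n.pulse_width, n.ipd))
    return rows


# Hypothetical five-base read with made-up kinetic values.
read = [
    Nucleotide("A", 1.0, 0.5),
    Nucleotide("C", 1.2, 0.9),
    Nucleotide("G", 0.8, 0.4),
    Nucleotide("T", 1.1, 0.6),
    Nucleotide("C", 0.9, 1.3),
]

window = build_window(read, target=2, half_width=1)
for row in window:
    print(row)
```

Each row carries the four per-nucleotide properties the claim recites, with position expressed relative to the target base. Whether a real implementation stores this as a 2-D matrix or a flat vector is a separate design choice that the sketch does not address.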

Key Claims at a Glance

  • The complaint asserts independent claim 1 (Compl. ¶36).
  • The essential elements of claim 1 are:
    • Receiving optical pulse data from sequencing and obtaining values for each nucleotide's identity, position, pulse width, and interpulse duration.
    • Creating an "input data structure" that comprises a "window" of nucleotides and includes the identity, relative position, pulse width, and interpulse duration for each nucleotide within that window.
    • Inputting this data structure into a model that has been trained by receiving similar data structures from molecules with a known modification state, storing them as labeled training samples, and optimizing the model's parameters based on its ability to match the known labels.
    • Using the trained model to determine if a modification is present in the sample nucleic acid.
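
The last two elements describe ordinary supervised learning: fit a model to labeled samples, then apply it to an unlabeled one. A minimal sketch follows, assuming a simple logistic-regression classifier as a stand-in for whatever model the patent or the accused product actually uses. The two-value feature vectors (loosely, a pulse width and an interpulse duration), the labels, and the function names are all hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, known label: 1 = modified)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, features: List[float]) -> float:
    """Return the model's estimated probability that the base is modified."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples: List[Sample], epochs: int = 2000, lr: float = 0.1):
    """Optimize parameters so model outputs move toward the known labels."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            # Gradient of the log loss for one sample: (output - label).
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Made-up training samples with a known modification state.
training = [
    ([1.0, 0.5], 0), ([1.1, 0.6], 0), ([0.9, 0.4], 0),
    ([1.6, 1.4], 1), ([1.5, 1.3], 1), ([1.7, 1.5], 1),
]

weights, bias = train(training)

# Determination step: apply the trained model to samples of unknown state.
p_modified = predict(weights, bias, [1.55, 1.35])
p_unmodified = predict(weights, bias, [0.95, 0.45])
print(p_modified > 0.5, p_unmodified > 0.5)
```

The sketch shows only the general shape of "optimizing parameters based on outputs matching or not matching labels"; it says nothing about how any real model in this case was trained.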

III. The Accused Instrumentality

Product Identification

  • The "PacBio Products," which include the Sequel® II, Sequel IIe, and Revio™ sequencing systems when equipped with SMRT® Link software v11.0 or later (Compl. ¶35). The specific accused functionality is the "5mC CpG Detection" feature introduced in SMRT Link v11.0 (Compl. ¶27).

Functionality and Market Context

  • The accused products provide "HiFi sequencing," which the complaint alleges can be used to "explore DNA modifications" and "measure 5mC methylation" (Compl. ¶37). A screenshot from PacBio's website is provided to support this allegation (Compl. ¶37; Fig. 2).
  • The complaint alleges that the "5mC CpG Detection" feature uses a convolutional neural network (CNN) model, referred to as "primrose," to analyze polymerase kinetics within a defined window of nucleotides (Compl. ¶¶56, 62). This analysis allegedly predicts the methylation status of CpG sites.
  • A key visual from the complaint is a screenshot from a PacBio presentation titled "Neural network to call CpG methylation in HiFi reads," which depicts a "16x17 feature vector" being fed into a TensorFlow CNN to produce a "Probability of methylation" (Compl. ¶50; Fig. 5).

IV. Analysis of Infringement Allegations

'794 Patent Infringement Allegations

| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
| --- | --- | --- | --- |
| (a) receiving data acquired by measuring pulses in an optical signal corresponding to nucleotides sequenced in a sample nucleic acid molecule and obtaining, from the data, values for the following properties: for each nucleotide: an identity of the nucleotide, a position of the nucleotide within the sample nucleic acid molecule, a width of the pulse corresponding to the nucleotide, and an interpulse duration representing a time between the pulse corresponding to the nucleotide and a pulse corresponding to a neighboring nucleotide | PacBio's systems observe a polymerase incorporating fluorescently labeled nucleotides, yielding two channels of information: fluorescence (identifying the base, A/C/G/T) and kinetics (how fast bases are incorporated). This process allegedly provides the nucleotide identity, position, pulse width, and interpulse duration. A provided screenshot illustrates the concept of measuring "pulse" and "nucleotide incorporation kinetics" in real time (Compl. ¶41; Fig. 4). | ¶¶40-46 | col. 21:28-36 |
| (b) creating an input data structure, the input data structure comprising a window of the nucleotides sequenced in the sample nucleic acid molecule, wherein the input data structure includes, for each nucleotide within the window, the properties: the identity of the nucleotide, a position of the nucleotide with respect to a target position within the window, the width of the pulse corresponding to the nucleotide, and the interpulse duration | PacBio's software allegedly creates "feature vectors" from HiFi reads in a BAM file. These vectors are alleged to be an "input data structure" comprising a "window" (e.g., 16 or 22 base pairs) that includes kinetic and sequence data for nucleotides surrounding a target CpG site. A screenshot from a PacBio BAM file specification is used to show how kinetic information like interpulse duration (fi/ri tags) and pulse width (fp/rp tags) is encoded (Compl. ¶52; Fig. 6). | ¶¶47-54 | col. 23:13-22 |
| (c) inputting the input data structure into a model, the model trained by: receiving a first plurality of first data structures... wherein the modification has a known first state... storing a plurality of first training samples... and optimizing... parameters of the model... | The "feature vectors" are allegedly input into a Convolutional Neural Network (CNN) model. This model is allegedly trained using control samples of fully methylated and fully unmethylated human DNA, which serve as the training data with a "known first state." The complaint asserts the model's parameters are optimized using this data. | ¶¶55-60 | col. 25:54-67 |
| (d) determining, using the model, whether the modification is present in a nucleotide at the target position within the window in the input data structure. | The output of the accused CNN model is alleged to be a "probability scale measure of whether the CpG is symmetrically 5mC-modified," which determines the methylation status. This determination is then allegedly written to a BAM file. A PacBio marketing diagram is provided showing that a "convolutional neural network model processes polymerase kinetics to determine the methylation status" (Compl. ¶62; Fig. 8). | ¶¶61-63 | col. 26:1-8 |
  • Identified Points of Contention:
    • Scope Questions: A central dispute may arise over the term "input data structure comprising a window." The defense may argue that its "feature vector," derived from a standardized BAM file, is technically distinct from the specific "2-D matrix" embodiments described in the patent (’794 Patent, FIG. 4), raising the question of whether the accused data structure falls within the scope of the claim.
    • Technical Questions: The complaint's allegations regarding the training of PacBio's CNN model (claim 1(c)) rely on PacBio's public-facing technical documents and presentations (Compl. ¶¶57, 59). A key question for the court will be whether the specific, proprietary training process for the "primrose" tool actually performs the claimed steps of optimizing parameters based on matching outputs to labels from known training samples, an issue that will likely require expert testimony and discovery into the software's source code and development.
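
The allegations for element (b) turn in part on how kinetic values are encoded in per-read tags (fi/ri for interpulse duration, fp/rp for pulse width). The Python sketch below shows one way such tags could be pulled out of a single SAM-format text record. It assumes the tags appear as integer arrays in the standard SAM optional-field form `TAG:B:C,v1,v2,...`, and it treats the values as raw codes without attempting to decode them into frame counts. The example record and the function name `parse_kinetic_tags` are hypothetical and are not drawn from the complaint or from PacBio's tools.

```python
from typing import Dict, List

# Tag names as described in the complaint's discussion of the BAM specification.
KINETIC_TAGS = {
    "fi": "forward-strand interpulse duration",
    "ri": "reverse-strand interpulse duration",
    "fp": "forward-strand pulse width",
    "rp": "reverse-strand pulse width",
}


def parse_kinetic_tags(sam_line: str) -> Dict[str, List[int]]:
    """Extract kinetic array tags from one tab-separated SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    kinetics: Dict[str, List[int]] = {}
    # A SAM record has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        if tag in KINETIC_TAGS and type_code == "B":
            # Array values look like "C,12,30,7": an element subtype, then numbers.
            _subtype, *numbers = value.split(",")
            kinetics[tag] = [int(x) for x in numbers]
    return kinetics


# Hypothetical unaligned four-base read with made-up kinetic codes.
example_record = "\t".join([
    "read1", "4", "*", "0", "255", "*", "*", "0", "0", "ACGT", "!!!!",
    "fi:B:C,12,30,7,9",
    "fp:B:C,5,6,4,8",
])

tags = parse_kinetic_tags(example_record)
for name, values in tags.items():
    print(name, KINETIC_TAGS[name], values)
```

A production pipeline would normally read binary BAM files through a dedicated library rather than parse text records by hand; the text form is used here only to keep the sketch self-contained.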

V. Key Claim Terms for Construction

  • The Term: "a window of the nucleotides"

  • Context and Importance: This term defines the "holistic" data-gathering step that is central to the claimed invention's novelty. The infringement case hinges on whether PacBio's method of creating a "feature vector" for its neural network constitutes creating the claimed "window." Practitioners may focus on this term because its construction will determine whether a specific data formatting choice by PacBio can avoid infringement.

  • Intrinsic Evidence for Interpretation:

    • Evidence for a Broader Interpretation: The specification states the "measurement window" can be of "any suitable length" and provides a non-limiting list of sizes ranging from 2 to 10,000 nucleotides, suggesting the term is not confined to a single size or the specific examples shown (’794 Patent, col. 25:41-53).
    • Evidence for a Narrower Interpretation: The patent's primary embodiments and figures depict the "window" as a highly structured "2-D matrix" containing specific kinetic features (IPD, PW) for nucleotides at defined positions relative to a central target (’794 Patent, FIG. 4; col. 23:13-46). A party could argue the term requires this specific, structured data arrangement, not merely any collection of data from a segment of DNA.
  • The Term: "model trained by... optimizing... parameters... based on outputs of the model matching or not matching corresponding labels"

  • Context and Importance: This phrase describes the specific type of supervised machine learning required by the claim. Proving that PacBio's "primrose" model is trained this way is essential for the plaintiff. The ambiguity surrounding proprietary AI training methods makes this a likely point of contention.

  • Intrinsic Evidence for Interpretation:

    • Evidence for a Broader Interpretation: The patent discloses a wide array of potential "machine learning models," including the accused CNN type, and describes the training process in general terms common to supervised learning, suggesting broad coverage of such techniques (’794 Patent, col. 25:36-40; col. 25:54-67).
    • Evidence for a Narrower Interpretation: The patent describes the training process in the context of specific flowcharts (e.g., FIG. 9) and examples using "reference patterns" from known modified and unmodified samples. A defendant might argue that its training regimen involves different or more complex steps (e.g., transfer learning, different optimization objectives) that fall outside the literal scope of "optimizing... based on... matching... labels" as envisioned by the patent.

VI. Other Allegations

  • Indirect Infringement: The complaint alleges that PacBio induces infringement by providing customers with instructions on how to use the accused products to detect 5mC modifications. These instructions are allegedly contained in user guides, reference guides for the SMRT Link software and its "primrose" tool, and other documentation (Compl. ¶66).
  • Willful Infringement: The complaint alleges that PacBio had pre-suit knowledge of the invention through direct communications with the inventors, who shared a PNAS article describing the technology in January 2021 (Compl. ¶¶14-15). It further alleges that PacBio's CEO and marketing personnel publicly acknowledged the inventors' work, and that PacBio had knowledge of the patent itself since its issue date of August 17, 2021 (Compl. ¶¶17, 23, 67).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of definitional scope: can the claim term "input data structure comprising a window," which is described in the patent's embodiments as a specific matrix of kinetic features, be construed to read on the "feature vector" that PacBio's software allegedly generates from a standard BAM file?
  • A key evidentiary question will be one of technical proof: can the plaintiff demonstrate through discovery that the proprietary training process for PacBio's "primrose" neural network follows the specific supervised learning steps recited in Claim 1, or will PacBio be able to show a material difference in its methodology?
  • A central question for damages will be willfulness: do the alleged pre-suit interactions between the parties, including PacBio's alleged receipt and praise of the inventors' research paper, rise to the level of egregious conduct required to support a finding of willful infringement, especially given the complex and evolving nature of the technology?