DCT

2:23-cv-00459

Trend Micro Inc California Corp v. Open Text Inc

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 2:23-cv-00459, E.D. Tex., 10/02/2023
  • Venue Allegations: Plaintiff asserts venue is proper based on Defendants maintaining regular and established places of business in the district, including offices and data centers in Allen and Plano, Texas, and soliciting customers within the district. The complaint also notes a prior case in which Defendant OTI allegedly admitted the district was a proper forum for venue.
  • Core Dispute: Plaintiff alleges that Defendants’ cybersecurity, information management, and data capture products infringe three patents related to machine-learning-based malware detection, malicious URL identification, and advanced optical character recognition (OCR).
  • Technical Context: The patents address cybersecurity threats, specifically the use of machine learning classifiers to identify novel malware and malicious web links, and to overcome anti-OCR techniques used in spam images.
  • Key Procedural History: The complaint alleges Defendants have had actual knowledge of the asserted patents since at least September 16, 2022, based on a prior lawsuit filed by Trend Micro against Defendants in the Eastern District of Virginia. The complaint also references prosecution history for two of the asserted patents to argue for their technical merit and to distinguish the claimed inventions from prior art.

Case Timeline

| Date | Event |
| --- | --- |
| 2005-08-15 | U.S. Patent No. 8,161,548 Priority Date |
| 2006-12-04 | U.S. Patent No. 8,045,808 Priority Date |
| 2010-01-13 | U.S. Patent No. 8,505,094 Priority Date |
| 2010-12-16 | '808 Patent Prosecution Event (Applicant Arguments) |
| 2011-07-13 | Defendant OTC acquires Global 360 Holding Corp. |
| 2011-09-30 | '548 Patent Prosecution Event (Applicant Arguments) |
| 2011-10-25 | U.S. Patent No. 8,045,808 Issued |
| 2012-04-17 | U.S. Patent No. 8,161,548 Issued |
| 2013-08-06 | U.S. Patent No. 8,505,094 Issued |
| 2015-11-23 | Defendant OTC acquires Daegis Inc. |
| 2022-09-16 | Alleged Date of Actual Knowledge of Patents-in-Suit |
| 2023-10-02 | Complaint Filing Date |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 8,161,548 - Malware Detection Using Pattern Classification

The Invention Explained

  • Problem Addressed: The patent describes conventional malware detection techniques, which relied on predefined pattern databases or manually written rules, as unable to detect new, unknown malware effectively while also achieving a high detection rate and a low false-positive rate (Compl. ¶27; ’548 Patent, col. 1:34-46).
  • The Patented Solution: The invention proposes a method for training a single machine learning classifier using a feature set that is combined from two or more different types of malware. By training a single model on a unified set of features (such as DLL names, function names, and common alphanumeric strings), the system can be tuned to detect multiple distinct malware types, improving efficiency and the ability to detect unknown threats (Compl. ¶¶29-30; ’548 Patent, col. 5:42-51).
  • Technical Importance: This approach provided a more flexible and efficient way to detect a wide variety of malware compared to systems that required separate classifiers for each malware type or relied on time-consuming manual rule creation (Compl. ¶¶28, 31).

Key Claims at a Glance

  • The complaint asserts independent claim 1 (Compl. ¶72).
  • The essential elements of independent claim 1 include:
    • Determining classification labels for a first and a second type of malware.
    • Creating a feature definition file where first features (for the first malware type) and second features (for the second malware type) are "combined into one feature set."
    • The combined features include characteristics of the malware type, DLL and function names, and alphanumeric strings.
    • Selecting training data including both malware and benign software.
    • Executing a training application with the feature file and training data.
    • Outputting a training model arranged to identify both types of malware.
  • The complaint does not explicitly reserve the right to assert dependent claims.
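
For illustration only, the claim elements listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's or any accused product's implementation: the feature names, labels, and training samples are hypothetical, and a toy nearest-centroid classifier stands in for whatever training application the patent contemplates. The point it shows is the step the claim emphasizes, namely that features defined for two malware types are merged into one feature set before a single model is trained.

```python
# Hypothetical per-type feature definitions (illustrative names only).
WORM_FEATURES = ["dll:ws2_32.dll", "func:connect", "str:smtp"]
SPYWARE_FEATURES = ["dll:user32.dll", "func:GetAsyncKeyState", "str:keylog"]


def combine_feature_sets(*feature_lists):
    """Merge per-type feature lists into one ordered, de-duplicated set."""
    combined = []
    for features in feature_lists:
        for feature in features:
            if feature not in combined:
                combined.append(feature)
    return combined


def vectorize(sample_tokens, feature_set):
    """Encode a sample as 0/1 flags, one per feature in the combined set."""
    return [1 if feature in sample_tokens else 0 for feature in feature_set]


def train_centroids(training_data, feature_set):
    """Toy stand-in for a training application: one mean vector per label."""
    sums, counts = {}, {}
    for tokens, label in training_data:
        vec = vectorize(tokens, feature_set)
        acc = sums.setdefault(label, [0.0] * len(feature_set))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(tokens, model, feature_set):
    """Return the label whose centroid is closest to the sample vector."""
    vec = vectorize(tokens, feature_set)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: both malware types plus benign software.
TRAINING_DATA = [
    ({"dll:ws2_32.dll", "func:connect", "str:smtp"}, "worm"),
    ({"dll:ws2_32.dll", "func:connect"}, "worm"),
    ({"dll:user32.dll", "func:GetAsyncKeyState", "str:keylog"}, "spyware"),
    ({"func:GetAsyncKeyState", "str:keylog"}, "spyware"),
    ({"dll:user32.dll"}, "benign"),
    (set(), "benign"),
]

FEATURE_SET = combine_feature_sets(WORM_FEATURES, SPYWARE_FEATURES)
MODEL = train_centroids(TRAINING_DATA, FEATURE_SET)  # one model, both types
```

In this sketch a single model labels both hypothetical malware types because both feature groups live in one `FEATURE_SET`. Whether an accused system that keeps feature groups in separate structures would still be "combined into one feature set" is the construction question discussed in Section V.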

U.S. Patent No. 8,505,094 - Detection of Malicious URLs in a Webpage

The Invention Explained

  • Problem Addressed: Malicious actors were increasingly injecting malicious URLs into legitimate websites, but conventional detection methods like blacklisting or simple pattern matching were ineffective against new or rapidly changing threats (Compl. ¶¶36-37; ’094 Patent, col. 2:7-16).
  • The Patented Solution: The patent discloses a system that uses a machine learning classifier to detect malicious URLs by analyzing features beyond the URL string itself. The invention focuses on vectors related to a URL’s layout (e.g., its location in the header or footer), referring behavior (e.g., a page rank disparity between the host page and the linked page), and content relevancy (e.g., a subject matter mismatch between the two pages) (Compl. ¶¶40-42; ’094 Patent, col. 3:54-59).
  • Technical Importance: This method provided a way to identify malicious links based on the statistical properties of how they are embedded in host pages, a technique designed to catch threats that could evade systems focused only on content or reputation (Compl. ¶43).

Key Claims at a Glance

  • The complaint asserts independent claim 1 (Compl. ¶95).
  • The essential elements of independent claim 1 include:
    • Retrieving HTML code for a Web page.
    • Scanning the code to identify an embedded URL.
    • Identifying "layout features" of the URL, with one feature indicating the URL is in the "header or the footer" of the HTML code.
    • Producing a "numerical layout vector" from these features.
    • Processing the vector with a classifier algorithm.
    • Outputting a score indicating the likelihood the URL is malicious.
  • The complaint does not explicitly reserve the right to assert dependent claims but references claims 7 and 14 as examples (Compl. ¶¶40, 42).
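
The claimed sequence (retrieve HTML, find embedded URLs, derive layout features, build a numerical vector, score it) can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the mapping of "header" to `<head>`/`<header>` elements and "footer" to `<footer>` elements is one possible reading and is itself a likely claim-construction dispute, the third feature (whether the URL sits in an `iframe`) is invented, and the logistic weights are hypothetical rather than learned from data.

```python
import math
from html.parser import HTMLParser

# Assumption: which elements count as "header" or "footer" is contested;
# these sets reflect only one possible reading.
HEADER_TAGS = {"head", "header"}
FOOTER_TAGS = {"footer"}
URL_ATTRS = {"href", "src"}


class LayoutScanner(HTMLParser):
    """Collects embedded URLs with a 0/1 layout vector for each."""

    def __init__(self):
        super().__init__()
        self.header_depth = 0
        self.footer_depth = 0
        self.found = []  # list of (url, [in_header, in_footer, in_iframe])

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self.header_depth += 1
        elif tag in FOOTER_TAGS:
            self.footer_depth += 1
        for name, value in attrs:
            if name in URL_ATTRS and value:
                vector = [
                    int(self.header_depth > 0),
                    int(self.footer_depth > 0),
                    int(tag == "iframe"),
                ]
                self.found.append((value, vector))

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS and self.header_depth:
            self.header_depth -= 1
        elif tag in FOOTER_TAGS and self.footer_depth:
            self.footer_depth -= 1


def scan_page(html):
    """Return (url, layout_vector) pairs for every embedded URL in the page."""
    scanner = LayoutScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found


# Hypothetical weights; a real classifier would learn these from labeled pages.
WEIGHTS = [0.5, 1.5, 2.0]
BIAS = -2.0


def score(vector):
    """Logistic score in (0, 1); higher means more likely malicious."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, vector))
    return 1.0 / (1.0 + math.exp(-z))


PAGE = """<html><head><script src="https://cdn.example/app.js"></script></head>
<body><a href="https://example.com/about">About</a>
<footer><iframe src="http://unknown.example/x"></iframe></footer></body></html>"""

RESULTS = [(url, vec, score(vec)) for url, vec in scan_page(PAGE)]
```

Under these assumed weights, the `iframe` URL in the footer scores higher than the ordinary body link. The sketch also shows why the "header or the footer" limitation is concrete: the first two vector positions exist only if the system explicitly tracks where in the markup each URL appears.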

U.S. Patent No. 8,045,808 - Pure Adversarial Approach for Identifying Text Content in Images

Technology Synopsis

The patent addresses the problem of spammers embedding text in images with "anti-OCR" features (e.g., irregular backgrounds, distorted fonts) to evade filters (Compl. ¶48; ’808 Patent, col. 2:1-3). The invention proposes an "adversarial" OCR method that, instead of attempting to transcribe all text, receives a specific search term (e.g., a known spam keyword) and searches the image for a candidate sequence of character blocks that is probabilistically likely to match that term, making it more robust to visual obfuscation (Compl. ¶¶49-50; ’808 Patent, col. 4:19-22).
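
The difference between the term-driven approach described above and a transcribe-then-search approach can be made concrete with a small Python sketch. It is illustrative only and assumes a hypothetical upstream step has already segmented the image into character blocks and assigned each block a probability per candidate character; the probabilities, the `floor` value for unseen characters, and the product-of-probabilities scoring are assumptions, not the patent's disclosed computation.

```python
import math


def match_term(term, block_probs, floor=1e-6):
    """Find the run of consecutive blocks most likely to spell `term`.

    Returns (start_index, log_probability). If there are fewer blocks than
    characters in the term, returns (None, -inf).
    """
    best_start, best_logp = None, -math.inf
    for start in range(len(block_probs) - len(term) + 1):
        logp = sum(
            math.log(block_probs[start + i].get(ch, floor))
            for i, ch in enumerate(term)
        )
        if logp > best_logp:
            best_start, best_logp = start, logp
    return best_start, best_logp


def transcribe(block_probs):
    """Conventional approach: keep only the most likely character per block."""
    return "".join(max(probs, key=probs.get) for probs in block_probs)


# Hypothetical per-block character probabilities for an obfuscated image.
BLOCKS = [
    {"x": 0.9, "k": 0.1},
    {"e": 0.5, "c": 0.4, "o": 0.1},
    {"a": 0.8, "o": 0.2},
    {"5": 0.55, "s": 0.45},
    {"h": 0.7, "n": 0.3},
]

TRANSCRIPT = transcribe(BLOCKS)                # "xea5h"
START, LOGP = match_term("cash", BLOCKS)       # best run begins at block 1
MATCH_PROBABILITY = math.exp(LOGP)             # 0.4 * 0.8 * 0.45 * 0.7
```

With this made-up data the transcript is `xea5h`, so a keyword search over transcribed text would miss "cash", while the term-driven search still finds a candidate sequence with probability of about 0.10. Which of these two modes the accused capture products actually use is the technical question flagged in Section VII.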

Asserted Claims

The complaint asserts independent claim 1 (Compl. ¶115).

Accused Features

The complaint accuses Open Text’s data capture products, including Open Text Intelligent Capture, Capture Center, and Capture Document Reader, which allegedly use OCR to extract data from scanned images and documents (Compl. ¶¶62-67).

III. The Accused Instrumentality

Product Identification

The complaint names two primary groups of accused products. For the ’548 and ’094 Patents, the accused products are the "Webroot Products" (e.g., Webroot SecureAnywhere Business Endpoint Protection, BrightCloud Threat Intelligence Services) and other "Open Text Products" that allegedly integrate this security technology (Compl. ¶¶54-55). For the ’808 Patent, the accused products are Open Text’s data and document capture solutions, such as Open Text Intelligent Capture and Open Text Capture Center (Compl. ¶62).

Functionality and Market Context

The Webroot and BrightCloud products are described as security platforms that use machine learning to detect and classify malware and malicious URLs (Compl. ¶¶73, 96). The complaint alleges that this security functionality is integrated into or packaged with other Open Text offerings, such as the OpenText Security & Protection Cloud (Compl. ¶¶56-57). A screenshot from a marketing video shows that the "Endpoint and Network Security" portion of the OpenText Security & Protection Cloud includes Webroot products (Compl. ¶56). The data capture products are described as end-to-end solutions that use OCR and other recognition technologies to automatically classify documents and extract data (Compl. ¶¶63-64).

IV. Analysis of Infringement Allegations

'548 Patent Infringement Allegations

| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
| --- | --- | --- | --- |
| determining a classification label that represents a type of malware... [and] a second type of malware | BrightCloud Threat Intelligence's algorithm allegedly classifies malware into different groups, such as "Windows Exploits" and "Botnets," and determines a classification label for each type. | ¶¶74-75 | col. 5:42-51 |
| creating a feature definition file... wherein said first and second features are combined into one feature set in said feature definition file | BrightCloud allegedly captures up to 10 million features for different types of malware, which, on information and belief, are combined into a feature definition file. | ¶76 | col. 9:42-49 |
| wherein said features include characteristics of said type of malware, DLL names and function names executed by said type of malware, and alphanumeric strings used by said type of malware | The complaint cites a separate Webroot patent (U.S. 10,599,844) and a blog post to allege that the features extracted by the accused products include DLL functions, function name anomalies, and alphanumeric strings. | ¶77 | col. 5:13-19 |
| selecting software training data including software of the same type as said type of malware and software that is benign | BrightCloud Threat Intelligence allegedly selects training data that includes samples of each type of malware as well as benign software. | ¶78 | col. 9:57-60 |
| executing a training application on a computer... and inputting said feature definition file and said software training data | Webroot Endpoint Protection allegedly inputs a feature definition file and training samples into a training application on a computer. | ¶79 | col. 10:10-14 |
| outputting a training model... arranged to assist in the identification of said type of malware and said second type of malware | BrightCloud Intelligence allegedly trains and outputs a machine learning model that classifies at least two or more types of malware. | ¶80 | col. 10:20-24 |

A diagram of the BrightCloud IP Reputation Service is provided as evidence of classifying different "Key Threat Types," such as "Windows Exploits" and "Botnets" (Compl. ¶74, p. 32).

  • Identified Points of Contention:
    • Technical Questions: A primary question is whether the complaint provides sufficient evidence that the accused products use the specific types of features recited in the claim (DLL names, function names, alphanumeric strings). The complaint relies heavily on disclosures from a different Webroot patent (the '844 Patent) to support this element, raising the question of whether those disclosures accurately reflect the accused products' operation (Compl. ¶77).
    • Scope Questions: The claim requires that features from two different types of malware be "combined into one feature set." The infringement analysis may turn on how "combined" is defined and whether the accused system's architecture meets this limitation, a point the patentee emphasized during prosecution (Compl. ¶30).

'094 Patent Infringement Allegations

| Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
| --- | --- | --- | --- |
| retrieving HTML code representing a Web page | BrightCloud Threat Intelligence allegedly uses "sophisticated internet crawlers" to catalog URLs and retrieve associated data, including HTML code. | ¶97 | col. 5:10-12 |
| scanning said HTML code and identifying at least one embedded URL of said HTML code | The accused products allegedly crawl uncategorized sites dynamically, which includes scanning the HTML to identify embedded URLs and links. | ¶98 | col. 5:20-21 |
| identifying layout features... one of said layout features indicating that said embedded URL is located within the header or the footer of said HTML code | BrightCloud allegedly collects characteristics of a web page, including "how the HTML is constructed beneath the page." The complaint alleges on "information and belief" that this includes determining if a URL is in the header or footer. | ¶¶99-100 | col. 4:15-20 |
| producing a numerical layout vector that indicates the presence of said layout features | The accused products allegedly create "high dimensional input vectors" by encoding the collected characteristics of a web page. | ¶100 | col. 5:36-38 |
| processing said numerical layout vector using a classifier algorithm | BrightCloud Threat Intelligence allegedly processes the input vectors using classifier algorithms including deep neural nets with 40 million nodes. | ¶97 (p. 47) | col. 5:58-60 |
| outputting a score from said classifier algorithm indicating the likelihood that said embedded URL... is a malicious URL | BrightCloud allegedly assigns every internet object, including URLs, a "reputation score ranging from one to one hundred" to indicate if it is malicious. | ¶101 | col. 6:1-4 |

  • Identified Points of Contention:
    • Evidentiary Questions: The claim requires identifying a layout feature that specifically indicates a URL is in the "header or the footer." The complaint supports this allegation "on information and belief" (Compl. ¶100), which raises an evidentiary question of whether the plaintiff can prove the accused system performs this specific function, as opposed to a more general layout analysis.
    • Scope Questions: A legal dispute may arise over whether the accused system's alleged function of analyzing "how the HTML is constructed" (Compl. ¶99) is sufficient to meet the specific "header or the footer" limitation, or whether a narrower interpretation applies, one requiring explicit identification of <head> or <footer> tags.

V. Key Claim Terms for Construction

'548 Patent

  • The Term: "combined into one feature set" (Claim 1)
  • Context and Importance: This term is central to the patent’s asserted novelty, distinguishing it from prior art that might use separate models for different malware types. The infringement analysis depends on whether the accused system's architecture involves a "combination" of disparate feature types into a single set before training, as the patentee argued was advantageous during prosecution (Compl. ¶30).
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The detailed description may discuss the benefits of using a "large feature set" in general terms, potentially supporting a construction that does not require a specific method of combination, as long as a single classifier is trained on features from multiple malware families.
    • Evidence for a Narrower Interpretation: The claim uses the specific phrase "one feature set in said feature definition file." The prosecution history cited in the complaint explicitly states that "the first and second groups of features are combined into one feature set in the feature definition file," which may support a narrower construction requiring a literal combination within a single file or data structure (Compl. ¶30).

'094 Patent

  • The Term: "layout features... indicating that said embedded URL is located within the header or the footer" (Claim 1)
  • Context and Importance: The infringement case for this patent may turn on this limitation, as it is a specific, measurable feature. Practitioners may focus on this term because the complaint's allegation that the accused products meet it is based "on information and belief," suggesting it may be a point of factual dispute (Compl. ¶100).
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The specification discusses the concept of malicious links being found in "isolated sections of a Web page" (Compl. ¶41; ’094 Patent, col. 4:30-32). Plaintiff may argue that "header or the footer" are merely exemplary, non-limiting examples of such isolated sections.
    • Evidence for a Narrower Interpretation: The plain language of the claim explicitly recites "header or the footer." The specification may define these terms with reference to specific HTML structures (e.g., <head>, <footer> tags), which would support a narrower construction under which the accused product must specifically identify those structures.

VI. Other Allegations

  • Indirect Infringement: The complaint alleges both induced and contributory infringement. Inducement is alleged based on Defendants providing software along with user manuals, marketing, and instructions that allegedly direct customers to operate the products in an infringing manner (Compl. ¶¶84, 105, 125). Contributory infringement is alleged on the basis that the accused software components are specially designed to practice the patents and have no substantial non-infringing uses (Compl. ¶¶86, 107, 127).
  • Willful Infringement: Willfulness is alleged based on Defendants having "actual knowledge" of the asserted patents since at least September 16, 2022, the date Trend Micro filed a separate patent infringement complaint against them in another district (Compl. ¶¶90, 111, 131).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A central issue will be one of evidentiary sufficiency: For the ’548 Patent, can Plaintiff prove that the accused products perform the claimed method using the specific feature types recited (e.g., DLL names, function names), given that the complaint relies on disclosures from a separate patent, not direct evidence from the accused products themselves?
  • A key question will be one of claim scope and factual proof: For the ’094 Patent, does the accused system's general analysis of "how the HTML is constructed" meet the explicit claim limitation of identifying a URL's location in the "header or the footer," and what factual evidence will emerge to support this allegation beyond the complaint's "information and belief" pleading?
  • A third question will relate to technical operation: For the ’808 Patent, does the accused OCR technology perform the claimed "adversarial" method of starting with a known search term and calculating sequence probabilities, or does it employ a more conventional approach of transcribing text first and then searching the output, a method the patent sought to improve upon?