1:22-cv-01063

Trend Micro Inc v. Open Text Inc

I. Executive Summary and Procedural Information

Parties & Counsel:
- Plaintiff: Trend Micro Incorporated (California)
- Defendants: Open Text, Inc. (Delaware), Open Text Corp. (Ontario, Canada), Open Text Public Sector Solutions, Inc. (Virginia), and Webroot, Inc. (Delaware)
- Plaintiff’s Counsel: Paul Hastings LLP
Case Identification: 1:22-cv-01063, E.D. Va., 09/16/2022
Venue Allegations: Plaintiff alleges venue is proper in the Eastern District of Virginia because Defendant Open Text Public Sector Solutions, Inc. is incorporated in Virginia and maintains a regular and established place of business in Arlington. Venue over the other domestic defendants rests on allegations that they maintain regular and established places of business in the district, including local offices and employees. For Open Text Corp., the Canadian entity, venue rests on the provision that a foreign defendant may be sued in any judicial district.
Core Dispute: Plaintiff alleges that Defendants’ cybersecurity and data capture products infringe three patents related to machine-learning-based malware detection, malicious URL identification, and optical character recognition (OCR).
Technical Context: The case concerns cybersecurity technologies that use machine learning to identify threats, such as malware in files or malicious links in webpages, and data processing technology that uses OCR to extract text from images, which is critical in anti-spam and document management systems.
Key Procedural History: The complaint references Applicant arguments made during the prosecution of the '548 and '808 patents to distinguish prior art. For the '548 patent, the Applicant emphasized the novelty of combining multiple feature sets into a single larger set to train one model for detecting different malware types. For the '808 patent, the Applicant distinguished its "adversarial" approach of searching an image for a specific term from prior art that first extracted all text and then analyzed it. These arguments may inform the court’s interpretation of the claim scope.

Case Timeline

Date	Event
2005-08-15	U.S. Patent No. 8,161,548 Priority Date
2006-12-04	U.S. Patent No. 8,045,808 Priority Date
2010-01-13	U.S. Patent No. 8,505,094 Priority Date
2011-10-25	U.S. Patent No. 8,045,808 Issued
2012-04-17	U.S. Patent No. 8,161,548 Issued
2013-08-06	U.S. Patent No. 8,505,094 Issued
2019-01-01	OpenText acquires Carbonite (parent company of Webroot) (approx. date based on Compl. ¶29)
2021-02-04	OpenText launches BrightCloud Cloud Service Intelligence (Compl. ¶30)
2022-09-16	Complaint Filing Date

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 8,161,548 - "Malware Detection Using Pattern Classification"

Issued: April 17, 2012

The Invention Explained

Problem Addressed: Prior art malware detection techniques were unable to effectively handle new, "unknown" malware. Predefined pattern databases could only detect known threats, while rule-based or heuristic systems were difficult to maintain, time-consuming, and could not achieve both a high detection rate and a low false-positive rate (Compl. ¶¶33-34; ’548 Patent, col. 1:34-47).
The Patented Solution: The invention proposes a method for training a malware classifier using a machine learning algorithm. The method involves creating a single, combined "feature set" from features relevant to at least two different types of malware. This allows a single trained model to identify multiple malware types, which the patent asserts is more efficient and effective than conventional approaches that might require separate models for each malware type (Compl. ¶¶36-38; ’548 Patent, col. 5:42-51). The features include software characteristics, DLL and function names, and common alphanumeric strings associated with malware (’548 Patent, Abstract).
Technical Importance: This approach represented a shift toward using more sophisticated, consolidated machine learning models in cybersecurity, aiming to create more flexible and scalable defenses against a rapidly diversifying threat landscape (Compl. ¶35; ’548 Patent, col. 2:65-4:19).

Key Claims at a Glance

The complaint asserts at least Claim 1, an independent method claim (Compl. ¶78).
Essential elements of Claim 1 include:
- Determining classification labels for a first type of malware and a second type of malware.
- Creating a feature definition file that includes first features for the first malware type and second features for the second malware type.
- Combining the first and second features into "one feature set" in the feature definition file.
- The features include software characteristics, DLL names, function names, and alphanumeric strings.
- Executing a training application with the feature definition file and training data (including malware and benign software).
- Outputting a training model arranged to assist in identifying both types of malware.
The complaint does not explicitly reserve the right to assert dependent claims but makes general allegations of infringement of "one or more claims" (Compl. ¶76).
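
To make the "one feature set" limitation concrete, the following is a minimal sketch of the claimed sequence: merge the features for two malware types into a single set, encode training samples against it, and output one model that covers both types. Every feature name, sample, and label here is hypothetical, and the nearest-centroid classifier is a deliberately simple stand-in. It is not the algorithm described in the ’548 Patent or the one allegedly used by the accused products.

```python
# Hypothetical feature lists; none of these names come from the patent or the complaint.
WORM_FEATURES = ["imports_ws2_32.dll", "calls_send", "str_smtp"]
SPYWARE_FEATURES = ["imports_user32.dll", "calls_GetAsyncKeyState", "str_smtp"]


def combine_feature_sets(*feature_sets):
    """Merge per-type feature lists into one ordered set without duplicates."""
    combined = []
    for features in feature_sets:
        for name in features:
            if name not in combined:
                combined.append(name)
    return combined


def vectorize(sample_traits, feature_set):
    """Encode a sample as 0/1 values, one per feature in the combined set."""
    return [1 if name in sample_traits else 0 for name in feature_set]


def train_centroids(samples, feature_set):
    """Toy 'training application': average the vectors seen for each label."""
    sums, counts = {}, {}
    for traits, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feature_set))
        for i, value in enumerate(vectorize(traits, feature_set)):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(traits, model, feature_set):
    """Return the label whose centroid is closest to the sample's vector."""
    vec = vectorize(traits, feature_set)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# One combined feature set, one model, two malware labels plus benign.
FEATURE_SET = combine_feature_sets(WORM_FEATURES, SPYWARE_FEATURES)
TRAINING_DATA = [
    ({"imports_ws2_32.dll", "calls_send", "str_smtp"}, "worm"),
    ({"imports_user32.dll", "calls_GetAsyncKeyState"}, "spyware"),
    (set(), "benign"),
]
MODEL = train_centroids(TRAINING_DATA, FEATURE_SET)
```

The contrast the Applicant drew during prosecution would correspond to calling `train_centroids` twice, once per malware type with its own feature list, which yields two separate models rather than one.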

U.S. Patent No. 8,505,094 - "Detection of Malicious URLs in a Web Page"

Issued: August 6, 2013

The Invention Explained

Problem Addressed: Legitimate websites were being compromised by hackers who injected malicious URLs, but detecting these URLs was difficult. Methods like crawling the entire web were not scalable, and blacklists could not keep up with new, rapidly changing malicious domains (Compl. ¶¶41-43; ’094 Patent, col. 1:65-2:6).
The Patented Solution: The invention claims a method for detecting malicious URLs by analyzing specific categories of features using a machine learning classifier. It constructs "numerical vectors" based on: (1) layout features (e.g., if the URL is hidden or in the header/footer of the HTML), (2) referring features (e.g., the relative page rank of the parent page versus the linked child page), and (3) content relevancy features (e.g., whether the linked content is relevant to the host page) (Compl. ¶¶46-49; ’094 Patent, col. 3:19-59). The classifier processes these vectors to score the likelihood that a URL is malicious.
Technical Importance: This technology provided a multi-faceted, contextual approach to URL analysis, moving beyond simple blacklisting to a more nuanced, behavior-based detection model capable of identifying previously unknown threats (Compl. ¶¶50-51; ’094 Patent, col. 2:17-27).

Key Claims at a Glance

The complaint asserts at least Claim 1, an independent method claim (Compl. ¶101).
Essential elements of Claim 1 include:
- Retrieving HTML code for a Web page.
- Scanning the code to identify an embedded URL.
- Identifying layout features of the URL, specifically noting if the URL is "located within the header or the footer of said HTML code."
- Producing a "numerical layout vector" indicating the presence of these layout features.
- Processing the vector with a classifier algorithm.
- Outputting a score indicating the likelihood the URL is malicious.
The complaint also references concepts from claims 7 (referring vector) and 14 (relevancy vector) in its narrative (Compl. ¶¶47, 49).
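
The claim steps above can be illustrated with a short sketch: scan HTML for embedded URLs, record layout features as a 0/1 vector, and pass the vector to a classifier that outputs a score. Several assumptions are built in and are not drawn from the ’094 Patent or the complaint: the `<head>`, `<header>`, and `<footer>` tags stand in for "the header or the footer of said HTML code," an inline-style check stands in for a hidden URL, and the logistic weights are invented for illustration.

```python
import math
from html.parser import HTMLParser

# Assumption: these tags approximate "header or footer" placement.
HEADER_FOOTER_TAGS = {"head", "header", "footer"}
# Void elements never get an end tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkLayoutParser(HTMLParser):
    """Collects each embedded URL with a [in_header_or_footer, is_hidden] vector."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.links = []      # list of (url, layout_vector)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = attrs.get("href") or attrs.get("src")
        if url and url.startswith(("http://", "https://")):
            in_header_footer = any(
                t in HEADER_FOOTER_TAGS for t in self.open_tags + [tag]
            )
            style = (attrs.get("style") or "").replace(" ", "").lower()
            hidden = "display:none" in style or "visibility:hidden" in style
            self.links.append((url, [int(in_header_footer), int(hidden)]))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def layout_vectors(html):
    """Return (url, layout_vector) pairs for every absolute URL in the HTML."""
    parser = LinkLayoutParser()
    parser.feed(html)
    return parser.links


# Hypothetical weights; a real classifier would learn these from labeled data.
WEIGHTS = [1.5, 2.0]
BIAS = -2.0


def score(vector):
    """Logistic score in (0, 1); higher means more likely malicious."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, vector))
    return 1.0 / (1.0 + math.exp(-z))
```

Under this sketch, a hidden link inside a `<footer>` yields the vector `[1, 1]` and scores higher than a visible link in the page body, which yields `[0, 0]`. Whether the accused system performs an equivalent header-or-footer check is the question the complaint's allegations leave open.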

U.S. Patent No. 8,045,808 - "Pure Adversarial Approach for Identifying Text Content in Images"

Issued: October 25, 2011

The Invention Explained

The patent addresses the problem of spammers using images with anti-OCR features (e.g., irregular backgrounds, confusing fonts) to evade detection (Compl. ¶55; ’808 Patent, col. 2:1-3). The invention proposes an "adversarial" OCR that, instead of trying to recognize every character, receives a specific search term and searches the image directly for a candidate sequence of character blocks that is likely to match that term, making it more robust against obfuscation (Compl. ¶¶56-57; ’808 Patent, col. 4:19-22, 9:41-47).
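
The distinction between the two approaches can be sketched in a few lines. The example below assumes, hypothetically, that each character block in an image has already been reduced to a set of candidate characters with confidence values. The data, the threshold, and the averaging rule are illustrative assumptions and are not taken from the ’808 Patent.

```python
def extract_all_text(blocks):
    """Conventional approach: keep only the top-ranked character per block."""
    return "".join(max(block, key=block.get) for block in blocks)


def search_for_term(blocks, term, min_conf=0.3):
    """Term-first approach: slide the search term across the block sequence.

    Returns (score, start_index) for the best window in which every character
    of the term is a plausible candidate, or (0.0, -1) if no window qualifies.
    """
    best = (0.0, -1)
    for start in range(len(blocks) - len(term) + 1):
        confs = [blocks[start + i].get(ch, 0.0) for i, ch in enumerate(term)]
        if min(confs) < min_conf:
            continue  # at least one character is not a plausible candidate
        window_score = sum(confs) / len(term)
        if window_score > best[0]:
            best = (window_score, start)
    return best


# Hypothetical candidates for an obfuscated image reading roughly "x l0an".
BLOCKS = [
    {"x": 0.9},
    {"l": 0.6, "1": 0.4},
    {"0": 0.6, "o": 0.5},
    {"a": 0.8},
    {"n": 0.7, "h": 0.4},
]
```

With this data, `extract_all_text(BLOCKS)` produces "xl0an", so a later keyword check for "loan" fails, while `search_for_term(BLOCKS, "loan")` still finds a candidate sequence starting at the second block. That difference mirrors the Applicant's reported distinction between extracting all text first and searching the image for a specific term.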

Key Claims at a Glance

The complaint asserts at least Claim 1 (Compl. ¶121).

Accused Features

The accused products are Defendants' document and character recognition products, including Open Text Intelligent Capture, Capture Center, and others, which use OCR to extract data from scanned images and documents (Compl. ¶¶68-73).

III. The Accused Instrumentality

Product Identification

The complaint names two main categories of accused products.

- For the ’548 and ’094 patents: "Webroot Products" and "Open Text Products," including BrightCloud Threat Intelligence Services, Webroot SecureAnywhere, Webroot DNS Protection, and the Open Text Security & Protection Cloud (Compl. ¶¶33, 61-62).
- For the ’808 patent: "’808 Patent Accused Products," including Open Text Intelligent Capture (formerly Captiva), Open Text Capture Center, and other data capture solutions (Compl. ¶68).

Functionality and Market Context

The accused cybersecurity products (Webroot/BrightCloud) allegedly use machine learning to detect malware and malicious URLs. The complaint alleges these services form the core of an "SMB Powerhouse" for Defendants, supporting over 818,000 businesses (Compl. ¶¶65, 79, 102; Compl. p. 20). The complaint includes a screenshot from an investor presentation showing that the "Carbonite (including Webroot & Brightcloud)" business unit supports 23,000 MSPs (Compl. p. 20).
The accused data capture products (Intelligent Capture) allegedly use OCR and machine learning to automatically classify documents and extract data, such as keywords, from scanned images (Compl. ¶¶69, 122). A marketing screenshot describes this as an "end-to-end capture solution" that includes "automated data extraction/OCR" (Compl. p. 37).

IV. Analysis of Infringement Allegations

’548 Patent Infringement Allegations

Claim Element (from Independent Claim 1)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
A method of training a malware classifier...	Defendants' BrightCloud Threat Intelligence Services perform a method of training a malware classifier using machine learning.	¶79	col. 10:1-20
determining a classification label that represents a type of malware... [and] a second type of malware	The BrightCloud algorithm allegedly classifies malware into different groups, determining classification labels for at least two types of malware that are not benign.	¶¶80-81	col. 9:40-44
creating a feature definition file that includes first features... and that includes second features..., wherein said first and second features are combined into one feature set in said feature definition file...	BrightCloud allegedly captures up to 10 million features to classify different malware types, and it is alleged these features are combined into a feature definition file.	¶82	col. 5:42-51
wherein said features include characteristics of said type of malware, DLL names and function names executed by said type of malware, and alphanumeric strings used by said type of malware	The accused products allegedly collect these specific feature types, citing a related patent ('844 Patent) and marketing materials as evidence. A marketing blog post is shown identifying a Trojan by its use of random alphanumeric characters (Compl. p. 44).	¶83	col. 5:52-6:47
selecting software training data including software of the same type as said type of malware and software that is benign	BrightCloud allegedly selects training data including each type of malware and benign software, using millions of data points to train its models.	¶84	col. 9:56-62
executing a training application on a computer... and inputting said feature definition file and said software training data into said training application	Webroot Endpoint Protection allegedly inputs a feature definition file and 10 million training samples into a training application on a computer, leveraging Amazon Web Services and the San Diego Supercomputer Center.	¶85	col. 10:9-15
outputting a training model... arranged to assist in the identification of said type of malware and said second type of malware	BrightCloud allegedly trains and publishes new models daily that classify at least two or more types of malware. A diagram from a datasheet shows a "Capture -> Analyze -> Classify -> Publish" workflow, culminating in a reputation service (Compl. p. 40).	¶86	col. 10:15-20

Identified Points of Contention

Scope Questions: A central question will be whether Defendants' machine learning process meets the specific limitation of combining features for "a type of malware" and a "second type of malware" into "one feature set." Defendants may argue their system is more dynamic or uses separate models, while Plaintiff will likely point to its prosecution history arguments that this "one feature set" approach is a key, unconventional aspect of the invention (Compl. ¶37).
Technical Questions: What evidence demonstrates that the accused system trains a single model to identify at least two distinct malware types, as opposed to training multiple specialized models in parallel? The complaint alleges over 500 classifiers operate in parallel (Compl. ¶80), which may raise the question of whether this corresponds to one unified model or many separate ones.

’094 Patent Infringement Allegations

Claim Element (from Independent Claim 1)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
retrieving HTML code representing a Web page	BrightCloud Threat Intelligence allegedly uses "sophisticated internet crawlers that catalog all URLs" to retrieve HTML code. A screenshot highlights this capability (Compl. p. 50).	¶103	col. 6:1-4
scanning said HTML code and identifying at least one embedded URL of said HTML code	BrightCloud allegedly scans HTML to identify embedded URLs, particularly when a user visits an "uncategorized site." A datasheet screenshot states, "Malicious URLs often hide in otherwise benign domains" (Compl. p. 51).	¶104	col. 5:1-14
identifying layout features of said embedded URL... one of said layout features indicating that said embedded URL is located within the header or the footer of said HTML code	The accused system allegedly captures "all of the characteristics on a web page," including "how the HTML is constructed beneath the page," which the complaint alleges includes identifying whether a URL is in the header or footer.	¶105	col. 3:19-34
producing a numerical layout vector that indicates the presence of said layout features	BrightCloud allegedly creates "high dimensional input vectors" by encoding the characteristics of a web page into a form suitable for machine learning.	¶106	col. 6:28-50
processing said numerical layout vector using a classifier algorithm	The accused system allegedly processes these input vectors using classifier algorithms such as Bayesian classifiers, Support Vector Machines, and Deep Learning neural nets.	¶97 (by reference from ¶106)	col. 5:6-9
outputting a score... indicating the likelihood that said embedded URL... is a malicious URL	BrightCloud allegedly assigns every internet object, including URLs, a "reputation score ranging from one to one hundred," with scores between one and twenty considered malicious.	¶107	col. 9:21-25

Identified Points of Contention

Scope Questions: Does the accused system's general analysis of "how the HTML is constructed" (Compl. ¶105) meet the specific claim requirement of identifying if a URL is "located within the header or the footer"? Defendants may argue their system uses a holistic layout analysis that does not specifically check for this binary location, creating a potential mismatch with the claim language.
Technical Questions: What evidence will show that the "10 million characteristics" allegedly captured by the accused system (Compl. ¶105) are used to create a "numerical layout vector" that specifically "indicates the presence of said layout features" as claimed, rather than being part of a more general feature set?

V. Key Claim Terms for Construction

Patent: ’548 Patent

The Term: "one feature set"
Context and Importance: This term is critical because the patent's alleged inventive concept, as argued during prosecution, is the combination of features from at least two different malware types into a single, unified set to train a single model (Compl. ¶37). The infringement analysis will turn on whether Defendants' system, which allegedly uses "over 500 classifiers operating in parallel" (Compl. ¶80), can be said to use "one feature set" in the manner claimed.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification describes combining feature sets as creating a "large feature set" (’548 Patent, col. 5:45-46), which could be argued to encompass any aggregation of features, even if processed in a distributed or parallel manner.
- Evidence for a Narrower Interpretation: The Applicant's arguments during prosecution explicitly state that the advantage of "combining these sets of features into a single larger feature set, is that a training model can be produced to identify at least two different kinds of malware" and that if not combined, "one must either use two training models, or use two different classification algorithms" (Compl. ¶37). This suggests "one feature set" is inextricably linked to using a single model and algorithm, potentially narrowing its scope.

Patent: ’094 Patent

The Term: "layout features... located within the header or the footer of said HTML code"
Context and Importance: This term specifies a very particular type of feature to be identified. The infringement dispute will likely focus on whether the accused products, which are alleged to analyze "how the HTML is constructed beneath the page" (Compl. ¶105), perform this specific check. Practitioners may focus on this term because it appears to be a clear, potentially limiting element that requires specific evidence to prove infringement.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification broadly discusses identifying URLs in "isolated sections," which are then defined as the header or footer, and also notes that "the injected URL is not visible (or hidden)" (’094 Patent, col. 3:20-25). A party might argue that any analysis that identifies hidden or non-prominently-placed URLs meets the spirit of this limitation.
- Evidence for a Narrower Interpretation: The claim language is explicit: "one of said layout features indicating that said embedded URL is located within the header or the footer." The specification provides a concrete example where the feature "in_tail" is assigned a binary value based on the URL's location (’094 Patent, col. 6:35-61). This supports a narrow reading requiring a direct check for location in the document's header or footer sections.

VI. Other Allegations

Indirect Infringement

The complaint alleges Defendants induce infringement by providing customers with user manuals, instructions, marketing, and support that encourage use of the accused products in an infringing manner (Compl. ¶¶91, 112, 132). It further alleges contributory infringement, stating the products are "specially designed" to infringe and have "no substantial non-infringing uses" (Compl. ¶¶92, 113, 133).

Willful Infringement

Willfulness is alleged based on Defendants' knowledge of the patents "since at least the filing of this Complaint" (Compl. ¶¶96, 117, 137). Because no pre-suit knowledge is alleged, any basis for enhanced damages would be limited to infringement occurring after Defendants received notice through the complaint.

VII. Analyst’s Conclusion: Key Questions for the Case

A central issue will be one of technical implementation versus claim scope: For the '548 patent, does the accused system's use of numerous parallel classifiers constitute the claimed method of using "one feature set" to train a single model for multiple malware types, or is this a fundamentally different architecture? The resolution will depend on how the court construes "one feature set" in light of the patent's specification and prosecution history.
A key evidentiary question will be one of specificity: For the '094 patent, can Plaintiff provide evidence that Defendants' general analysis of webpage "characteristics" and "how the HTML is constructed" includes the specific, claimed step of identifying whether a URL is "located within the header or the footer"? The case may turn on whether Plaintiff can prove this specific feature is identified, rather than just being part of a larger, more abstract layout analysis.
A foundational legal question will be patent eligibility: The complaint proactively argues that the patents are directed to specific, unconventional technological solutions, not abstract ideas (Compl. ¶¶38-39, 50-52). A key battleground will be whether these claims, which are implemented in software and use machine learning, represent a patent-eligible improvement to computer functionality or are merely an abstract process of data analysis performed by a generic computer.