DCT

1:19-cv-02069

Recursive Web Tech LLC v. Netbase Solutions Inc

Key Events
Complaint

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 1:19-cv-02069, D. Del., 10/30/2019
  • Venue Allegations: Venue is alleged to be proper in the District of Delaware because the Defendant is a Delaware corporation and is therefore deemed to be a resident of the District.
  • Core Dispute: Plaintiff alleges that Defendant’s social media monitoring and analytics platforms infringe three patents related to the automated extraction of structured facts, such as names, timestamps, and key numbers, from unstructured and semi-structured documents.
  • Technical Context: The technology at issue falls within the domain of natural language processing (NLP) and is central to the modern business intelligence and social media analytics industries, which rely on transforming vast amounts of unstructured online text into structured, queryable data.
  • Key Procedural History: The complaint does not reference any prior litigation, inter partes review proceedings, or licensing history concerning the patents-in-suit. The three asserted patents share a common specification and claim priority to the same 2004 provisional application.

Case Timeline

Date Event
2004-06-18 Earliest Priority Date for ’807, ’661, and ’848 Patents
2010-07-13 U.S. Patent No. 7,756,807 Issues
2012-08-14 U.S. Patent No. 8,244,661 Issues
2013-12-31 U.S. Patent No. 8,620,848 Issues
2016-02-19 Date of blog post provided as evidence of accused product functionality
2019-10-30 Complaint Filed

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 7,756,807 - "System and Method for Facts Extraction and Domain Knowledge Repository Creation From Unstructured and Semi-Structured Documents"

The Invention Explained

  • Problem Addressed: The patent’s background section describes the limitations of conventional keyword-based search engines, which cannot aggregate facts or answer complex questions that require understanding relationships within text (e.g., “How many CRM companies hired a chief privacy officer in the last two years?”) (’807 Patent, col. 2:1-7). Extracting such information automatically is described as a "formidable task" due to the non-standard and complex structure of web documents (’807 Patent, col. 1:22-30).
  • The Patented Solution: The invention proposes a system that automatically extracts structured "facts" from unstructured documents to build a searchable knowledge repository, or "oracle" (’807 Patent, Abstract). To do so, it applies specialized grammatical parsing techniques, notably "island grammar," to identify and extract specific kinds of information, for example a person's name, job title, employer, and a direct quote attributed to that person (a "PPCQ" quadruple) (’807 Patent, col. 27:35-41; FIG. 11).
  • Technical Importance: This technology represents a method to automate the creation of structured business intelligence databases from the vast amount of unstructured text available on the internet.
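To make the two recited steps concrete, the Python sketch below builds a list of paragraphs and then applies a single rule that matches only an attributed-quote "island," skipping the surrounding text. It is a hypothetical illustration, not the patent's implementation. The rule, its separators, the object types (TEXT, NAME, PHRASE), and all function names are assumptions. The rule's shape is loosely modeled on the separator/typed-object layout described in the specification (’807 Patent, col. 20:2-8). The claim reaches quadruples containing at least one of the four fields, but this toy rule fires only when all four are present.

```python
import re
from typing import NamedTuple


class Quadruple(NamedTuple):
    person: str
    position: str
    company: str
    quote: str


# Hypothetical rule: each entry is (separator, object type, object role).
# The separators and types are invented for illustration.
RULE = [
    ('"', "TEXT", "quote"),
    ('," said ', "NAME", "person"),
    (", ", "PHRASE", "position"),
    (" at ", "NAME", "company"),
]

# Hypothetical surface pattern for each object type.
TYPE_PATTERNS = {
    "TEXT": r'[^"]+',
    "NAME": r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)+",
    "PHRASE": r"[^,]+?",
}


def compile_rule(rule):
    # Literal separators alternate with a named group for each typed object.
    return re.compile("".join(
        re.escape(sep) + f"(?P<{role}>{TYPE_PATTERNS[kind]})"
        for sep, kind, role in rule
    ))


def build_paragraph_list(article: str) -> list[str]:
    # Step 1: split on blank lines and drop empty chunks.
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]


def extract_quadruples(paragraphs: list[str]) -> list[Quadruple]:
    # Step 2: only text matching the rule (the "island") is parsed;
    # everything around it (the "water") is skipped, not analyzed.
    pattern = compile_rule(RULE)
    return [
        Quadruple(m["person"], m["position"], m["company"], m["quote"])
        for para in paragraphs
        for m in pattern.finditer(para)
    ]


article = ('Acme shipped a new release today.\n\n'
           '"We are thrilled with the results," said Jane Doe, '
           'chief privacy officer at Acme Corp. Sales rose.')
for quad in extract_quadruples(build_paragraph_list(article)):
    print(quad)
```

Because the rule is anchored on literal separators, a paraphrased attribution such as "according to Jane Doe" would be missed unless a further rule were added. A practical rule-based extractor would therefore need many rules of this kind.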

Key Claims at a Glance

  • The complaint asserts at least independent claim 15 (Compl. ¶13).
  • The essential elements of Claim 15 are:
    • A method for extraction of people names, positions they have with companies, companies names and their quotes from an article comprising:
    • building a list of paragraphs; and
    • applying island grammar to extract quadruples that include at last one of, person, position, company and quote.
  • The complaint does not explicitly reserve the right to assert dependent claims.

U.S. Patent No. 8,244,661 - "System and Method for Facts Extraction and Domain Knowledge Repository Creation From Unstructured and Semi-Structured Documents"

The Invention Explained

  • Problem Addressed: The patent identifies the temporal relevance of facts as a critical but difficult aspect of information extraction. It notes the challenge of extracting timestamps from web pages that lack uniform placement rules for dates and contain "false clues" that can confuse automated systems (’661 Patent, col. 3:7-14, 3:42-45).
  • The Patented Solution: The invention provides a method for automated timestamp extraction and verification. The process involves parsing a page into paragraphs and then searching within each paragraph for "valid triads"—sequences of three words representing a year, month, and day (’661 Patent, Abstract; col. 19:1-8). The system then applies rules to resolve ambiguity and correctly associate the extracted timestamp with the relevant portions of the text (’661 Patent, FIG. 8).
  • Technical Importance: This method allows an automated system to determine when an extracted fact was valid, a crucial capability for building accurate, historically aware knowledge repositories.
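The claim 1 steps summarized above can be sketched in Python under several assumptions. The page is assumed to separate paragraphs with blank lines. A "valid triad" is taken to mean three consecutive words that form a real calendar date in year-month-day, month-day-year, or day-month-year order. The word orders, the four-digit-year rule, and all function names are assumptions for illustration, and the ’661 specification's own validity rules are not reproduced. A window such as "06 01 2015" yields two candidates. That result illustrates the kind of ambiguity the patent's later resolution rules are described as addressing.

```python
import re
from datetime import date
from typing import Optional

MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november", "december"]


def as_year(tok: str) -> Optional[int]:
    # Assumption: only four-digit years in a plausible range count.
    return int(tok) if tok.isdigit() and len(tok) == 4 and 1900 <= int(tok) <= 2100 else None


def as_month(tok: str) -> Optional[int]:
    t = tok.lower()
    if t.isdigit():
        return int(t) if 1 <= int(t) <= 12 else None
    for number, name in enumerate(MONTH_NAMES, start=1):
        if t in (name, name[:3]):  # full name or three-letter abbreviation
            return number
    return None


def as_day(tok: str) -> Optional[int]:
    return int(tok) if tok.isdigit() and 1 <= int(tok) <= 31 else None


def split_paragraphs(page: str) -> list[str]:
    return [p.strip() for p in re.split(r"\n\s*\n", page) if p.strip()]


def triad_candidates(paragraph: str) -> list[date]:
    # Slide a three-word window over the paragraph and keep every window
    # that reads as a real calendar date in one of three assumed orders.
    words = re.findall(r"[A-Za-z]+|\d+", paragraph)
    found: list[date] = []
    for a, b, c in zip(words, words[1:], words[2:]):
        for y, m, d in ((as_year(a), as_month(b), as_day(c)),   # 2016 02 19
                        (as_year(c), as_month(a), as_day(b)),   # June 1 2015
                        (as_year(c), as_month(b), as_day(a))):  # 1 June 2015
            if None in (y, m, d):
                continue
            try:
                candidate = date(y, m, d)
            except ValueError:  # e.g. February 30
                continue
            if candidate not in found:
                found.append(candidate)
    return found


def candidates_per_paragraph(page: str) -> list[list[date]]:
    # One candidate list per paragraph, mirroring the claim's per-paragraph step.
    return [triad_candidates(p) for p in split_paragraphs(page)]


page = ("Posted June 1, 2015 by staff.\n\nNo date here.\n\n"
        "Updated 2016-02-19; filed 06 01 2015.")
for candidates in candidates_per_paragraph(page):
    print(candidates)
```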

Key Claims at a Glance

  • The complaint asserts at least independent claim 1 (Compl. ¶26).
  • The essential elements of Claim 1 are:
    • A method of time stamp extraction and verification, comprising:
    • parsing a page and representing it as a sequence of paragraphs; and
    • building list of candidates for time stamp for each paragraph by extraction of valid triads representing year, month and day.
  • The complaint does not explicitly reserve the right to assert dependent claims.

U.S. Patent No. 8,620,848 - "System and Method for Facts Extraction and Domain Knowledge Repository Creation From Unstructured and Semi-Structured Documents"

Technology Synopsis

  • This patent discloses a method for identifying and extracting "Very Important Numbers (VINs)" and their associated objects from text (’848 Patent, Abstract). The invention aims to solve the problem of automatically identifying key quantitative data within a document (e.g., revenue, number of employees, market share) and correctly linking that number to the entity it describes, thereby capturing high-value business intelligence facts (’848 Patent, col. 5:46-6:1).
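The hypothetical Python sketch below illustrates the general idea of flagging a number as "important" and linking it to the entity it describes. Several parts are assumptions: the cue phrases, the rule that the entity is the run of capitalized words opening the sentence, and the scaling of "million" and "billion." The ’848 patent's actual VIN criteria are not reproduced here.

```python
import re
from typing import NamedTuple


class NumberFact(NamedTuple):
    entity: str
    metric: str
    value: float


# Hypothetical cue phrases marking a number as important; not the patent's criteria.
METRIC_CUES = ("revenue", "employees", "market share")
NUMBER = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(million|billion|%)?", re.IGNORECASE)
# Naive entity rule: the capitalized words that open the sentence.
LEADING_NAME = re.compile(r"[A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*")
SCALE = {"million": 1e6, "billion": 1e9}


def extract_number_facts(text: str) -> list[NumberFact]:
    facts = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        metric = next((cue for cue in METRIC_CUES if cue in lowered), None)
        subject = LEADING_NAME.match(sentence)
        number = NUMBER.search(sentence)
        if metric and subject and number:
            base = float(number[1].replace(",", ""))
            unit = (number[2] or "").lower()
            facts.append(NumberFact(subject[0], metric, base * SCALE.get(unit, 1)))
    return facts


text = ("Acme Corp reported revenue of 3 billion dollars. "
        "Acme Corp has 4,500 employees. The weather was nice.")
for fact in extract_number_facts(text):
    print(fact)
```

The sentence-subject heuristic breaks easily. A sentence opening with "In 2018, ..." would attach its number to "In." This illustrates why linking a number to its object, and not just detecting the number, is part of the problem the patent describes.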

Asserted Claims

  • At least independent claim 2 is asserted (Compl. ¶39).

Accused Features

  • The complaint alleges that the accused product's NLP engine infringes by extracting numerical metrics such as "Mentions," "Reach," "Net Sentiments," and "BPI (Brand Passion Index) rank" from social media posts and associating them with a brand (Compl. ¶41).

III. The Accused Instrumentality

Product Identification

  • NetBase Live Pulse, NetBase Pro, NetBase Enterprise, and similar services, collectively referred to as the "Product" (Compl. ¶14).

Functionality and Market Context

  • The "Product" is a social media analytics and monitoring platform that provides "real-time insights for strategic and timely business decisions" (Compl. p. 4). The complaint alleges that the platform uses a Natural Language Processing (NLP) engine with "deep parsing" capabilities to analyze unstructured data from sources like social media websites (Compl. ¶17, ¶41). A screenshot in the complaint describes NetBase's NLP as a "patented" engine that "surfaces and analyzes sentiment for every subject in the sentence" (Compl. p. 23).
  • Alleged functions include extracting people's names, positions, company names, and quotes; parsing documents into paragraphs; extracting timestamps; and extracting key numerical metrics associated with brands, such as the number of mentions or net sentiment score (Compl. ¶15, ¶16, ¶28, ¶30, ¶41). The complaint includes a screenshot of the product's alert functionality, which displays an "EVENT SUMMARY" with a timestamp and the number of mentions within a recent period (Compl. p. 4).

IV. Analysis of Infringement Allegations

'807 Patent Infringement Allegations

Claim Element (from Independent Claim 15) | Alleged Infringing Functionality | Complaint Citation | Patent Citation
A method for extraction of people names, positions they have with companies, companies names and their quotes from an article comprising: | The Product performs a method for extracting this information from social media websites to provide media monitoring services. | ¶15 | col. 27:35-41
building a list of paragraphs; and | The Product builds a list of paragraphs from source material, such as social media posts, and displays it to its users. | ¶16 | col. 27:35-36
applying island grammar to extract quadruples that include at last one of, person, position, company and quote. | The Product applies "deep parsing" to extract quadruples that include at least one of person, position, company, and quote. | ¶17 | col. 19:22-26

Identified Points of Contention

  • Scope Questions: A central question may be whether the term "island grammar," a specific parsing methodology described in the patent, can be construed to read on the "deep parsing" and "Natural Language Processing" that the complaint alleges the accused product performs (Compl. ¶17). The complaint provides a screenshot describing NetBase's NLP capabilities, including "Deep-parsing texts in 10+ languages," which will be central to this inquiry (Compl. p. 8).
  • Technical Questions: The complaint alleges that the product "builds a list of paragraphs" (Compl. ¶16). This raises the question of what technical evidence supports this specific step, as opposed to more general processing of a text stream that never creates the discrete "list" the claim contemplates.
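A short Python sketch can show the technical distinction this question turns on. Both hypothetical functions below produce the same paragraphs. The first builds a discrete list holding all of them. The second emits each paragraph as soon as a blank line closes it and never holds all paragraphs at once. Splitting on blank lines is an assumption, and whether either pattern falls within the claim language is a construction question the sketch does not resolve.

```python
import re
from collections.abc import Iterable, Iterator


def paragraphs_as_list(text: str) -> list[str]:
    # Builds a discrete list holding every paragraph before any is used.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def paragraphs_as_stream(lines: Iterable[str]) -> Iterator[str]:
    # Emits each paragraph as soon as a blank line closes it. Only the lines
    # of the current paragraph are buffered, and no list of all paragraphs
    # is ever created.
    buffer: list[str] = []
    for line in lines:
        if line.strip():
            buffer.append(line.rstrip("\n"))
        elif buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        yield "\n".join(buffer)


text = "First post.\n\nSecond post,\ncontinued.\n"
print(paragraphs_as_list(text))
for paragraph in paragraphs_as_stream(text.splitlines(keepends=True)):
    print(paragraph)
```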

'661 Patent Infringement Allegations

Claim Element (from Independent Claim 1) | Alleged Infringing Functionality | Complaint Citation | Patent Citation
A method of time stamp extraction and verification, comprising: | The Product performs a method of time stamp extraction and verification to provide real-time social data to users. | ¶28 | col. 18:26-34
parsing a page and representing it as a sequence of paragraphs; and | The Product parses data from social media websites (e.g., Facebook, Twitter) and represents it as a sequence of paragraphs. | ¶29 | col. 28:1-2
building list of candidates for time stamp for each paragraph by extraction of valid triads representing year, month and day. | The Product provides real-time notifications to users with a timestamp that contains the year, month, and day ("triad"). | ¶30 | col. 19:1-8

Identified Points of Contention

  • Scope Questions: The claim requires "extraction of valid triads" from a "paragraph." This raises the question of whether this limitation reads on a process that obtains timestamp information from structured metadata associated with a social media post (e.g., data from a platform's API), which may not involve "extraction" from the text of the "paragraph" itself.
  • Technical Questions: A key factual question is the source of the timestamps in the accused product. The complaint provides a screenshot showing an "EVENT SUMMARY" with the timestamp "STARTED 01 Jun 15," but does not specify how this timestamp was derived (Compl. p. 4). The analysis will depend on whether this date was extracted from unstructured text or read from a structured data field.
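This evidentiary distinction can be illustrated with a hypothetical post record that carries both a structured date field and free text. The field names ("published," "text"), the ISO 8601 format, and the "Month D, YYYY" text pattern are assumptions and do not describe any particular platform's API. The two functions can return different dates for the same post, which is why the source of the accused product's timestamps matters.

```python
import re
from datetime import date, datetime
from typing import Optional

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
TEXT_DATE = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2}),\s+(\d{4})\b")


def timestamp_from_metadata(record: dict) -> date:
    # Reads a structured field; nothing is extracted from the prose.
    return datetime.fromisoformat(record["published"]).date()


def timestamp_from_text(text: str) -> Optional[date]:
    # Scans the prose itself for a "Month D, YYYY" pattern.
    match = TEXT_DATE.search(text)
    if match is None:
        return None
    try:
        return date(int(match[3]), MONTH_NAMES.index(match[1]) + 1, int(match[2]))
    except ValueError:  # e.g. "June 31, 2015"
        return None


# Hypothetical post record with both a metadata field and free text.
post = {
    "published": "2015-06-01T09:30:00",
    "text": "Our June 3, 2015 launch event is sold out!",
}
print(timestamp_from_metadata(post))      # 2015-06-01
print(timestamp_from_text(post["text"]))  # 2015-06-03
```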

V. Key Claim Terms for Construction

  • The Term: "applying island grammar" (’807 Patent, Claim 15)

  • Context and Importance: This term defines the core technical mechanism for fact extraction in the asserted claim. The outcome of the infringement analysis for the ’807 patent will likely depend on whether the Defendant's NLP and "deep parsing" technology falls within the legal construction of this term. Practitioners may focus on this term because the complaint does not allege the use of "island grammar" explicitly, instead using the more general term "deep parsing" (Compl. ¶17).

  • Intrinsic Evidence for Interpretation:

    • Evidence for a Broader Interpretation: The patent presents island grammar as a process for sentence parsing in general, illustrated by a high-level flowchart (e.g., "Build list of Context Grammar rules," "Build list of Local Grammar rules") (’807 Patent, FIG. 9). This could support an interpretation covering a range of rule-based parsing systems.
    • Evidence for a Narrower Interpretation: The specification provides a highly specific structure for the grammar rules, defining them as a sequence of separators and typed objects (e.g., (Separator0, Object1_Type, Object1_Role, ...)). This detailed embodiment could be used to argue for a narrower construction limited to this specific implementation (’807 Patent, col. 20:2-8).

  • The Term: "extraction of valid triads representing year, month and day" (’661 Patent, Claim 1)

  • Context and Importance: This term is critical because infringement of the ’661 patent hinges on the specific method used to obtain a timestamp. If the accused product primarily reads structured timestamp metadata from an API, it may not perform the "extraction" from a paragraph as claimed.

  • Intrinsic Evidence for Interpretation:

    • Evidence for a Broader Interpretation: The claim language itself does not impose limitations beyond identifying a year, month, and day from a paragraph, which could support a construction covering any method that achieves this result.
    • Evidence for a Narrower Interpretation: The specification describes a detailed algorithm for identifying triads, including checking for specific separators and discarding candidates that match the current date to avoid confusing a publication date with a date appearing on the webpage for convenience (’661 Patent, col. 19:1-21). This explicit algorithm may support a narrower construction limited to text-based pattern matching within the body of a paragraph.
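The Python sketch below illustrates the kind of text-based pattern matching described for the narrower reading. It rests on several assumptions and is not the algorithm of the ’661 specification. It accepts only numeric, year-first triads and treats "-", "/", and "." as the allowed separators. A regex backreference requires the same separator on both sides of the month. Any candidate equal to a caller-supplied `today` date is dropped, on the theory that such a date is more likely a page banner than a publication date.

```python
import re
from datetime import date

# Assumed rule: year-first numeric triads whose two separators match and come
# from the set "-", "/", "."; the backreference \2 enforces the match.
TRIAD = re.compile(r"\b(\d{4})([-/.])(\d{1,2})\2(\d{1,2})\b")


def filtered_candidates(paragraph: str, today: date) -> list[date]:
    found = []
    for m in TRIAD.finditer(paragraph):
        try:
            candidate = date(int(m[1]), int(m[3]), int(m[4]))
        except ValueError:  # e.g. month 13: not a valid triad
            continue
        if candidate == today:  # likely a "today's date" banner, not a publication date
            continue
        found.append(candidate)
    return found


text = "Today: 2019-10-30. Posted 2016/02/19; mixed 2016-02/19; bad 2016-13-01."
print(filtered_candidates(text, today=date(2019, 10, 30)))  # [datetime.date(2016, 2, 19)]
```

Here the mixed-separator string "2016-02/19" is rejected by the backreference, "2016-13-01" fails calendar validation, and the date matching `today` is discarded. Only the 2016-02-19 candidate survives.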

VI. Other Allegations

  • Indirect Infringement: The complaint alleges that Defendant "induces others to perform" the patented methods (Compl. ¶15, ¶28). The factual basis for this allegation is that Defendant provides its media monitoring platform as a service to its customers, who in turn use the platform's functionalities to perform the allegedly infringing steps of data extraction and analysis (Compl. ¶15).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of technical equivalence: can the patent-specific term "island grammar," as defined in the ’807 patent, be construed to cover the accused product's more generally described "deep parsing" and "Natural Language Processing" engine? The resolution will likely depend on detailed evidence of how the accused system's parsing algorithms operate.
  • A key evidentiary question will be one of operational mechanics: does the accused product perform the claimed "extraction of valid triads" from the unstructured text of a social media post as required by the ’661 patent, or does it primarily rely on reading structured timestamp metadata provided by the social media platforms themselves? This distinction may be dispositive for infringement of that patent.
  • A central question of claim scope will be how claims with a 2004 priority date, which describe extracting facts from traditional web articles, apply to the modern, often more structured, environment of social media feeds. The defense may argue that these feeds present data in a manner fundamentally different from the unstructured documents contemplated by the patents.