1:17-cv-01849

Spider Search Analytics LLC v. Importio Corp

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 1:17-cv-01849, D. Del., 12/22/2017
  • Venue Allegations: Venue is alleged to be proper in the District of Delaware because the Defendant is a Delaware corporation.
  • Core Dispute: Plaintiff alleges that Defendant’s web data extraction platform infringes patents related to automated methods for crawling websites, including the dynamic "deep web," and extracting structured facts from unstructured documents.
  • Technical Context: The technology at issue addresses the automated extraction of structured data from websites, a foundational process for modern data analytics, business intelligence, and competitive market analysis.
  • Key Procedural History: The complaint notes that U.S. Patent No. 8,620,848 is a divisional of an application in the same family as U.S. Patent No. 7,454,430. The resulting shared technical disclosure may be relevant to claim construction. No other prior litigation or administrative proceedings are mentioned in the complaint.

Case Timeline

Date | Event
2004-06-18 | Priority Date for ’430 & ’848 Patents
2005-06-13 | ’430 Patent Application Filed
2008-11-18 | ’430 Patent Issued
2013-03-13 | ’848 Patent Application Filed
2013-12-31 | ’848 Patent Issued
2017-12-22 | Complaint Filed

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 7,454,430 - "System and Method for Facts Extraction and Domain Knowledge Repository Creation from Unstructured and Semi-Structured Documents," issued November 18, 2008 (’430 Patent)

The Invention Explained

  • Problem Addressed: The patent describes the challenge of automatically extracting structured information from the vast number of unstructured or semi-structured documents on the internet, particularly from the “Deep or Dynamic Web” (’430 Patent, col. 3:55-65). This part of the web contains information stored in databases that is only accessible by submitting queries through web forms (e.g., searching for products on an e-commerce site), which traditional crawlers cannot easily navigate (’430 Patent, col. 4:56-65).
  • The Patented Solution: The invention discloses a multi-component system for navigating the deep web to extract facts (’430 Patent, Abstract). As depicted in Figure 5, the process involves a “scout” that probes web forms to collect sample dynamic pages, an “analyzer” that studies these pages to determine the underlying query structure, and a “harvester” that uses this understanding to systematically query the forms and collect all available pages for subsequent fact extraction (’430 Patent, col. 13:46-59).
  • Technical Importance: This automated approach to deep web crawling and data extraction was aimed at moving beyond simple keyword search to create structured, queryable knowledge repositories from previously inaccessible online data sources (’430 Patent, col. 2:8-12).
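
For readers less familiar with the scout, analyzer, and harvester stages summarized above, the following Python sketch illustrates the general idea against an in-memory stand-in for a form-backed site. It is an illustration under stated assumptions only: it is not the implementation disclosed in the ’430 Patent and says nothing about how the accused platform works. All names (`CATALOG`, `submit_form`, `scout`, `analyze`, `build_instructions`, `harvest`) and the single-field form are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-in for a database-backed site that is reachable only
# through a query form. Nothing here comes from the patent or the complaint.
CATALOG: Dict[str, str] = {
    "books": "<page>Books: 120 items</page>",
    "music": "<page>Music: 45 items</page>",
    "tools": "<page>Tools: 300 items</page>",
}
EMPTY_PAGE = "<page>No results</page>"


def submit_form(field: str, value: str) -> str:
    """Simulate a form submission; only the 'category' field returns data."""
    if field != "category":
        return EMPTY_PAGE
    return CATALOG.get(value, EMPTY_PAGE)


def scout(fields: List[str], values: List[str], probes: int,
          rng: random.Random) -> List[Tuple[str, str]]:
    """Probe each field with a few randomly chosen values; return (field, page) samples."""
    samples = []
    for field in fields:
        for value in rng.sample(values, probes):
            samples.append((field, submit_form(field, value)))
    return samples


def analyze(samples: List[Tuple[str, str]]) -> List[str]:
    """Infer which fields drive the query: those that yielded a non-empty page."""
    return sorted({field for field, page in samples if page != EMPTY_PAGE})


def build_instructions(productive_fields: List[str],
                       values: List[str]) -> List[Tuple[str, str]]:
    """Turn the inferred query structure into an exhaustive list of requests."""
    return [(field, value) for field in productive_fields for value in values]


def harvest(instructions: List[Tuple[str, str]]) -> List[str]:
    """Issue every request and keep the pages that contain data."""
    pages = [submit_form(field, value) for field, value in instructions]
    return [page for page in pages if page != EMPTY_PAGE]
```

In this toy setup, scouting the hypothetical fields `category` and `color`, passing the samples through `analyze`, and feeding the output of `build_instructions` to `harvest` collects the three catalog pages. The sketch separates sampling, structure inference, and exhaustive collection into distinct functions only to mirror the three stages named in the summary.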

Key Claims at a Glance

  • The complaint asserts infringement of claims 1 through 27 (Compl. ¶29). Independent method claim 10 is representative of the deep web crawling technology.
  • Claim 10 essential elements:
    • Utilizing scout crawling rules to collect dynamic pages.
    • Utilizing an analyzer and extractor to determine underlying structure of queries.
    • Generating instructions for a harvester.
    • The harvester provides requests to a server and collects available pages from the server.

U.S. Patent No. 8,620,848 - "System and Method for Facts Extraction and Domain Knowledge Repository Creation from Unstructured and Semi-Structured Documents," issued December 31, 2013 (’848 Patent)

The Invention Explained

  • Problem Addressed: The patent addresses the need to extract specific, valuable quantitative information from documents, which is often presented in varied formats like paragraphs or tables (’848 Patent, col. 5:44-54). Identifying not just the numbers but also the objects they describe is a significant challenge for automated systems (’848 Patent, col. 21:5-10).
  • The Patented Solution: The invention describes methods for identifying and extracting what it terms “Very Important Numbers (VINs)” and their associated objects from documents (’848 Patent, Fig. 12). The process involves first determining the areas within a document that contain numbers (such as paragraphs and tables), extracting the numbers themselves, and then analyzing the immediate context (e.g., a sentence or table structure) to determine the object to which each number refers (’848 Patent, col. 21:11-24).
  • Technical Importance: By structuring quantitative data found in unstructured text, this method enables the aggregation and analysis of key metrics that are critical for business decisions but are often locked in prose or simple tables (’848 Patent, col. 6:56-61).
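
The three steps summarized above (locating number-bearing areas, extracting the numbers, and associating each number with an object from its immediate context) can be illustrated with a short Python sketch. This is a deliberately naive illustration, not the method disclosed in the ’848 Patent: the function names are hypothetical, only paragraphs (not tables) are handled, and treating the word that follows a number as its "object" is an assumption made purely for the example.

```python
import re
from typing import List, Tuple

# Matches integers and simple decimals such as "250" or "3.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def paragraphs_with_numbers(page: str) -> List[str]:
    """Split a page on blank lines and keep paragraphs that contain a number."""
    paragraphs = [p.strip() for p in page.split("\n\n")]
    return [p for p in paragraphs if NUMBER.search(p)]


def extract_numbers(paragraph: str) -> List[Tuple[str, str]]:
    """Pair each number with the next word as a naive guess at its object."""
    pairs = []
    for match in NUMBER.finditer(paragraph):
        following = paragraph[match.end():].split()
        obj = following[0].strip(".,;:") if following else ""
        pairs.append((match.group(), obj))
    return pairs
```

Applied to a page whose second paragraph reads "It now has 250 employees and 12 offices.", `paragraphs_with_numbers` keeps only that paragraph and `extract_numbers` pairs `250` with `employees` and `12` with `offices`. A realistic system would need far more context analysis than this next-word heuristic.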

Key Claims at a Glance

  • The complaint asserts infringement of claims 1 through 7 (Compl. ¶29). Independent method claim 2 is representative of the numerical data extraction technology.
  • Claim 2 essential elements:
    • A method for extraction of very important numbers and objects that are associated with from a page comprising: determining paragraphs and tables containing numbers.
    • Extracting the numbers.

III. The Accused Instrumentality

Product Identification

  • The accused instrumentality is the Defendant’s Import.io Platform, which includes its Web Crawlers and Web Data Extractors (Compl. ¶19, ¶26).

Functionality and Market Context

  • The complaint alleges the Import.io Platform is a software tool used to build web crawlers capable of navigating and collecting data from the "deep web" (Compl. ¶20). Users can define crawler controls, such as "Crawl depth" and "Crawl URL templates," to target specific dynamic pages for data collection (Compl. ¶21). The platform is alleged to analyze the structure of web pages to extract information and store it in a structured format, such as a spreadsheet (XLSX or CSV) or via a JSON API (Compl. ¶22, ¶23). A screenshot provided in the complaint shows a user interface for handling paginated websites, a common feature of dynamic content (Compl. ¶21, "Pagination Extract from difficult sites" visual). Another visual demonstrates the platform’s "Point & Click" interface for identifying data to be extracted (Compl. ¶23).

IV. Analysis of Infringement Allegations

’430 Patent Infringement Allegations

Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation
A method for building a deep web crawler, comprising: utilizing scout crawling rules to collect dynamic pages; | The complaint alleges the Import.io Platform utilizes "scout crawling rules to collect dynamic pages" by providing users with controls to define how a crawler should operate, such as setting crawl depth and URL templates. | ¶20, ¶21 | col. 28:5-7
utilizing an analyzer and extractor to determine underlying structure of queries; | The complaint alleges the platform includes software tools that are "trained to recognize information structures on web pages" and "determine the underlying structure of queries and extract the information from the web pages." | ¶20, ¶22, ¶29 | col. 28:8-9
generating instructions for a harvester, wherein the harvester provides requests to a server and collects available pages from the server. | The complaint alleges that after determining the query structure, the platform "then generates instructions for a harvester to provide requests to a server and collect available pages from the server." | ¶22, ¶29 | col. 28:10-13
  • Identified Points of Contention:
    • Scope Questions: A central question may be whether the multi-stage process described in the patent (scout -> analyzer -> harvester) is practiced by the accused platform. The complaint's description suggests a user-configured tool, which raises the question of whether the terms "scout", "analyzer", and "harvester", as defined in the patent's specification, read on the functionality of the Import.io Platform.
    • Technical Questions: The patent describes the "scout" as a component that "randomly 'pings' the forms to collect dynamic pages" (’430 Patent, col. 13:49-50). An open question is what evidence the complaint offers that the accused platform's user-defined "crawler controls" (Compl. ¶21) perform this specific "scouting" function, as the claim requires when read in light of the specification.

’848 Patent Infringement Allegations

Claim Element (from Independent Claim 2) | Alleged Infringing Functionality | Complaint Citation | Patent Citation
A method for extraction of very important numbers and objects that are associated with from a page comprising: determining paragraphs and tables containing numbers; | The complaint alleges the Import.io Platform is used to extract "both numbers and objects associated on web pages, which are often in the form of paragraphs and tables." A provided screenshot explains the platform can "Extract data without learning to program" from web pages. | ¶17, ¶22, ¶23 | col. 28:14-18
and extracting the numbers. | The complaint alleges the platform performs extraction and stores the relevant data in a structured format. A visual in the complaint shows extracted data can be output to formats like "XLSX or CSV," "Google Sheets," and "Tableau." | ¶22, ¶23 | col. 28:19-20
  • Identified Points of Contention:
    • Scope Questions: The term "very important numbers" appears in the claim preamble but not in the body, which simply requires "extracting the numbers." A key issue will be whether the scope of "the numbers" is limited by the preamble and the specification's extensive discussion of "Very Important Numbers (VINs)" (’848 Patent, Fig. 12; col. 21:5-10), or whether it covers the extraction of any numerical data, as the complaint appears to allege.
    • Technical Questions: What evidence does the complaint provide that the accused platform performs the claimed step of "determining paragraphs and tables containing numbers" as a distinct step, rather than simply allowing a user to point and click on any desired data element on a page, which may or may not be a number within a paragraph or table?

V. Key Claim Terms for Construction

  • For the ’430 Patent:

    • The Term: "scout crawling rules" (from claim 10)
    • Context and Importance: This term appears central to the patent's method for navigating the deep web. Its construction will be critical to determining whether the user-configurable settings in the accused platform (Compl. ¶21) meet this limitation, or whether the term requires a more specific, automated process as described in the patent.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The claim language itself is functional ("utilizing scout crawling rules to collect dynamic pages"), which may support an interpretation that covers any set of rules that achieve that function.
      • Evidence for a Narrower Interpretation: The specification describes a specific process where the "scout randomly 'pings' the forms" and applies rules to "a valid form that a current crawled page contains" based on positive and negative keywords (’430 Patent, col. 13:49-50, col. 14:30-44). This may support a narrower construction tied to this automated, keyword-driven approach.
  • For the ’848 Patent:

    • The Term: "extracting the numbers" (from claim 2)
    • Context and Importance: This element appears exceptionally broad on its face. The entire infringement case for this patent may turn on whether this term is given its plain and ordinary meaning (i.e., extracting any number) or is narrowed by the claim's preamble ("very important numbers") and the specification's detailed disclosure related to "VINs."
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The body of the claim recites simply "extracting the numbers," without the "very important" modifier. A party might argue that preamble language is not limiting and the claim body should control.
      • Evidence for a Narrower Interpretation: The patent repeatedly frames the invention around "Very Important Numbers (VINs)" and their associated objects, as shown in the title of Figure 12 and the detailed description (’848 Patent, col. 21:5-10). A party could argue the inventors acted as their own lexicographer and defined the invention as being limited to these specific types of numbers.

VI. Other Allegations

  • Indirect Infringement: The complaint alleges both induced and contributory infringement (Compl. ¶34, ¶45, ¶53). The factual basis includes allegations that Defendant provides its platform to customers to "create their own web extraction agents" (Compl. ¶35) and provides instructions and support services that cause infringement (Compl. p. 14, first para.). The complaint also alleges the platform has "no substantial non-infringing use" (Compl. ¶41).
  • Willful Infringement: Willfulness is alleged based on knowledge of the patents "at least as of the service of this complaint" (Compl. ¶40). This suggests the primary basis for willfulness is post-suit conduct. The complaint also includes broader allegations that Defendant acted "knowingly or with willful blindness" pre-suit (Compl. ¶49, ¶57).

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of technical and functional scope: Does the accused Import.io Platform, a toolset that appears to be configured and operated by its users, perform the specific, automated, multi-stage functions of a "scout", "analyzer", and "harvester" as described and claimed in the ’430 Patent, or is there a fundamental operational mismatch?
  • A second central issue will be one of definitional breadth: Can the broad claim language "extracting the numbers" in the ’848 Patent be read to cover the general extraction of any numerical data, or will its scope be narrowed by the patent's consistent framing of the invention around "Very Important Numbers" and their specific business context?
  • A key evidentiary question will relate to divided infringement: Given that customers configure the crawlers and direct the data extraction, the case may turn on whether Plaintiff can prove that Defendant directly performs every step of the asserted method claims, or alternatively, directs or controls its customers' actions in a manner sufficient to establish liability for joint infringement.