DCT

1:18-cv-00224

PersonalWeb Tech LLC v. Ziff Davis LLC


I. Executive Summary and Procedural Information

  • Parties & Counsel: Plaintiff PersonalWeb Tech LLC; Defendant Ziff Davis LLC.
  • Case Identification: 1:18-cv-00224, S.D.N.Y., 01/10/2018
  • Venue Allegations: Plaintiff alleges venue is proper because Defendant maintains a regular and established place of business in the Southern District of New York, conducts business in the district, and has committed the alleged acts of infringement there.
  • Core Dispute: Plaintiff alleges that Defendant’s website, pcmag.com, infringes five patents related to using content-based identifiers to manage, locate, and control access to data in distributed computer networks.
  • Technical Context: The patents address foundational technologies for content-addressable storage, where data is identified by a unique value derived from its content, a concept central to modern cloud computing, content delivery networks, and data deduplication.
  • Key Procedural History: The complaint notes that the last of the patents-in-suit has expired, so the allegations are directed to damages for infringement that occurred before expiration. It also states that Plaintiff has previously enforced these patents, resulting in settlements and non-exclusive licenses. Public records indicate the patents have been subject to multiple ex parte reexaminations and inter partes reviews, with some claims being canceled, though key asserted claims appear to have survived these challenges.

Case Timeline

Date Event
1995-04-11 Earliest Priority Date for U.S. Patent Nos. 5,978,791; 6,928,442; 7,802,310; 7,945,544; and 8,099,420
1999-11-02 U.S. Patent No. 5,978,791 Issues
2005-08-09 U.S. Patent No. 6,928,442 Issues
2010-09-21 U.S. Patent No. 7,802,310 Issues
2011-05-17 U.S. Patent No. 7,945,544 Issues
2012-01-17 U.S. Patent No. 8,099,420 Issues
2018-01-10 Complaint Filed

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 5,978,791 - "Data Processing System Using Substantially Unique Identifiers to Identify Data Items, Whereby Identical Data Items Have the Same Identifiers"

The Invention Explained

  • Problem Addressed: In distributed computer networks, traditional methods of identifying data (e.g., by file name and path) are context-dependent, which can lead to data duplication, difficulty in verifying data integrity, and challenges in managing data across different locations (Compl. ¶11-12; ’791 Patent, col. 1:49-2:11).
  • The Patented Solution: The invention proposes a system where each "data item" (a sequence of bits, like a file) is given a "substantially unique identifier" that depends only on the content of the data item itself (’791 Patent, col. 3:26-34). This identifier, referred to as a "True Name," is generated using a function like a cryptographic hash, making it independent of the data's name or location and allowing for efficient detection of duplicates and verification of content in a distributed system (Compl. ¶13-14; ’791 Patent, Abstract).
  • Technical Importance: This approach of content-addressable storage is a foundational element of modern data deduplication technologies, distributed file systems, and content delivery networks that require efficient management of data copies (Compl. ¶10).
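A minimal sketch of the idea, assuming SHA-1 as the content function (the specification names MD5 and SHA as examples). The identifier is computed from the bytes alone, so two files with different names or paths but identical contents receive the same identifier, and duplicate detection reduces to a dictionary lookup. The helper `true_name` and the sample paths are hypothetical.

```python
import hashlib


def true_name(data: bytes) -> str:
    # Hypothetical helper: the identifier depends only on the bytes,
    # never on the file's name or location (SHA-1 is an assumed choice).
    return hashlib.sha1(data).hexdigest()


# Hypothetical files: two paths hold identical contents.
files = {
    "/home/alice/report.txt": b"quarterly numbers",
    "/mnt/backup/copy_of_report.txt": b"quarterly numbers",
    "/home/alice/notes.txt": b"meeting notes",
}

# Group paths by identifier; identical content collapses to one key.
by_name: dict[str, list[str]] = {}
for path, content in files.items():
    by_name.setdefault(true_name(content), []).append(path)

duplicates = {k: v for k, v in by_name.items() if len(v) > 1}
print(len(by_name), "distinct items;", len(duplicates), "duplicated")
```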

Key Claims at a Glance

  • The complaint asserts independent claim 38 (Compl. ¶36).
  • The essential elements of claim 38, a method claim, include:
    • (A) Determining a "substantially unique identifier" for a data item based on all of, and only, the data in that item.
    • (B) Requesting the data item by sending its identifier from a "requester location" to one or more "provider locations."
    • (C) At a provider location: (a) determining and maintaining a set of identifiers for its available data items; (b) using the set of identifiers to determine if the requested data item is present; and (c) if present, "notifying the requestor that the provider has a copy."
  • The complaint also asserts dependent claim 42 (Compl. ¶36).
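The steps of claim 38, as summarized above, can be modeled abstractly as follows. This is an illustrative sketch only: the `Provider` class, the `identifier` and `request_item` helpers, SHA-256 as the hash, and the string notification are hypothetical choices, and the code describes neither party's system.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def identifier(data: bytes) -> str:
    # (A) Identifier computed from all of, and only, the item's bytes.
    return hashlib.sha256(data).hexdigest()


@dataclass
class Provider:
    """Hypothetical provider location holding some data items."""
    name: str
    items: list[bytes]
    ids: set[str] = field(init=False)

    def __post_init__(self) -> None:
        # (C)(a) Determine and maintain the set of identifiers for local items.
        self.ids = {identifier(item) for item in self.items}

    def handle_request(self, requested_id: str) -> Optional[str]:
        # (C)(b) Presence is a membership test on the identifier set.
        if requested_id in self.ids:
            # (C)(c) Notify the requester that this provider has a copy.
            return f"{self.name} has a copy"
        return None


def request_item(data_id: str, providers: list[Provider]) -> list[str]:
    # (B) Send the identifier to each provider and collect the notifications.
    replies = (p.handle_request(data_id) for p in providers)
    return [r for r in replies if r is not None]


providers = [
    Provider("cache-east", [b"logo bytes"]),
    Provider("origin", [b"logo bytes", b"stylesheet bytes"]),
    Provider("cache-west", [b"stylesheet bytes"]),
]
print(request_item(identifier(b"logo bytes"), providers))
# ['cache-east has a copy', 'origin has a copy']
```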

U.S. Patent No. 6,928,442 - "Enforcement and Policing of Licensed Content Using Content-Based Identifiers"

The Invention Explained

  • Problem Addressed: The patent extends the "True Name" concept to the problem of managing and policing licensed or authorized content in a distributed network where multiple copies of files may exist (’442 Patent, Abstract).
  • The Patented Solution: The invention describes a method where the content-based identifier of a data file is used not only to locate it but also to determine if a copy of that file on a given computer is an "unauthorized copy or an unlicensed copy" (’442 Patent, Abstract; col. 2:4-11). This allows for content policing based on the data itself rather than on unreliable file names or locations.
  • Technical Importance: This method provides a framework for implementing digital rights management (DRM) and content access control in distributed systems, such as content delivery networks, where verifying the status of content is critical (Compl. ¶47, 50).

Key Claims at a Glance

  • The complaint asserts independent claim 10 (Compl. ¶46).
  • The essential elements of claim 10, a method claim, include:
    • Obtaining a "name for a data file" that is based on a function of the data comprising the file's contents.
    • Determining, using that name, whether a copy of the data file is present on at least one computer in a plurality of computers.
    • Determining whether a present copy is an "unauthorized copy or an unlicensed copy of the data file."
  • The complaint also asserts dependent claim 11 (Compl. ¶46).
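The three steps of claim 10 can be sketched as separate functions. This is a hypothetical model, not a description of the accused system: MD5 as the naming function, the host names, and in particular the license registry consulted in the last step are assumptions. The sketch keeps presence and authorization as two distinct lookups, mirroring the claim's separate "determining" steps.

```python
import hashlib


def content_name(contents: bytes) -> str:
    # Step 1: a name based on a function (MD5, an assumption) of the contents.
    return hashlib.md5(contents).hexdigest()


def hosts_with_copy(name: str, computers: dict[str, list[bytes]]) -> list[str]:
    # Step 2: use the name to find which computers hold a copy of the file.
    return [host for host, files in computers.items()
            if any(content_name(f) == name for f in files)]


def is_unauthorized(host: str, name: str, licenses: dict[str, set[str]]) -> bool:
    # Step 3: a separate check against a hypothetical license registry that
    # maps each host to the content names it is licensed to hold.
    return name not in licenses.get(host, set())


track = b"licensed audio bytes"
name = content_name(track)
computers = {"pc-1": [track], "pc-2": [track, b"other file"], "pc-3": [b"other file"]}
licenses = {"pc-1": {name}}

present = hosts_with_copy(name, computers)  # ['pc-1', 'pc-2']
flagged = [h for h in present if is_unauthorized(h, name, licenses)]  # ['pc-2']
```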

U.S. Patent No. 7,802,310 - "Controlling Access to Data in a Data Processing System"

  • Technology Synopsis: The ’310 Patent describes a system for controlling data access in a network. A first computer receives a request from a second computer that includes a content-dependent name for a data item; in response, the first computer compares that name to a plurality of stored values to determine if access is authorized and, based on that determination, allows or denies the second computer access to the data item (Compl. ¶53, 56-57).
  • Asserted Claims: Claims 20, 69, and 71 are asserted, with claim 69 being independent (Compl. ¶54).
  • Accused Features: The accused features include the use of E-Tags as content-dependent names in HTTP CONDITIONAL GET requests, where upstream cache or origin servers compare the received E-Tag to a database of values to determine if a browser is authorized to use its cached content (Compl. ¶56-57).

U.S. Patent No. 7,945,544 - "Similarity-Based Access Control of Data in a Data Processing System"

  • Technology Synopsis: The ’544 Patent claims a method where a "digital key" for a file (e.g., a webpage) is determined by applying functions to its constituent parts (e.g., asset files). This key is added to a database, and a "search key" from a subsequent request is matched against the database to determine if content has changed and to control access (Compl. ¶60, 63-64, 66-67).
  • Asserted Claims: Claims 46, 48, 49, 52, 55, and 56 are asserted, with claim 46 being independent (Compl. ¶61).
  • Accused Features: The complaint alleges that E-Tags for webpages function as "digital keys," generated by hashing the contents of an index file that lists the URIs of the webpage's asset files. An E-Tag in a later HTTP request serves as a "search key" to be matched against a database of E-Tags on a server to control content delivery (Compl. ¶63-68).
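A sketch of the complaint's theory, under stated assumptions: each asset's content fingerprint is embedded in its URI, the index file lists those URIs one per line, and the page-level key is a hash of the index. Because every asset's fingerprint flows into the index, editing any single asset changes the page key, so a stored key can act as a change detector. The URI format, the index layout, and the use of SHA-256 and MD5 are assumptions; all function names are hypothetical.

```python
import hashlib
import os


def fingerprint(content: bytes) -> str:
    # Per-asset content hash (SHA-256 is an assumption).
    return hashlib.sha256(content).hexdigest()


def asset_uri(path: str, content: bytes) -> str:
    # Hypothetical fingerprinted URI, e.g. "app.css" -> "/assets/app-<digest>.css".
    root, ext = os.path.splitext(path)
    return f"/assets/{root}-{fingerprint(content)}{ext}"


def index_file(assets: dict[str, bytes]) -> bytes:
    # Hypothetical index: one fingerprinted asset URI per line, in sorted order.
    return "\n".join(asset_uri(p, c) for p, c in sorted(assets.items())).encode()


def digital_key(assets: dict[str, bytes]) -> str:
    # Page-level key: a hash (MD5, an assumption) of the index file, so it
    # depends on every asset's content through the fingerprinted URIs.
    return hashlib.md5(index_file(assets)).hexdigest()


key_db: set[str] = set()
v1 = {"app.css": b"body{color:black}", "app.js": b"init();"}
key_db.add(digital_key(v1))

v2 = {**v1, "app.css": b"body{color:navy}"}  # one asset edited

print(digital_key(v1) in key_db)  # True: search key matches, page unchanged
print(digital_key(v2) in key_db)  # False: no match, content has changed
```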

U.S. Patent No. 8,099,420 - "Accessing Data in a Data Processing System"

  • Technology Synopsis: The ’420 Patent describes a system that determines one or more content-dependent digital identifiers for a data item. The system then selectively permits the data item to be made available for access based on whether at least one of its identifiers corresponds to an entry in one or more databases (Compl. ¶72, 76-77).
  • Asserted Claims: Claims 25, 26, 27, 29, 30, 32-36, and 166 are asserted, with claim 166 being independent (Compl. ¶73).
  • Accused Features: The accused system generates fingerprints and E-Tags for webpage files. Webpage servers that maintain databases of these E-Tag values respond to CONDITIONAL GET requests by selectively determining whether a requesting computer may use its cached content or must receive newly authorized content (Compl. ¶76-78).

III. The Accused Instrumentality

Product Identification

  • The accused instrumentality is the system and method used to operate the website pcmag.com (Compl. ¶19).

Functionality and Market Context

  • The complaint alleges that the pcmag.com website uses a distributed architecture to deliver content efficiently to users. The system allegedly uses a Ruby on Rails architecture to generate "fingerprints" based on the content of webpage files (e.g., asset files, index files) and to append these fingerprints to the files' URLs (Compl. ¶21). These files are uploaded as objects to the Amazon S3 hosting system, which in turn generates a content-based E-Tag value for each object using a hash function (Compl. ¶22-23). The system is designed to control a network of origin servers, intermediate cache servers, and end-user browser caches, and it allegedly forces these components to use the HTTP "IF-NONE-MATCH" conditional-request header, in which a client sends a cached E-Tag to a server to verify whether the content is still current and authorized for use (Compl. ¶26). Depending on whether that E-Tag matches the server's current version, the server responds with either an HTTP 304 (Not Modified) message, authorizing use of the cached content, or an HTTP 200 (OK) message carrying new content (Compl. ¶27-28). This process allegedly reduces bandwidth and efficiently manages content updates across the network (Compl. ¶20).
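The request/response cycle described above can be sketched as a toy origin server. Assumptions, labeled: the ETag is modeled on Amazon S3's behavior for simple single-part uploads without KMS encryption, where the ETag is the quoted MD5 hex digest of the object bytes; `Cache-Control: no-cache` is shown as one standard way to require caches to revalidate before reuse, since the allegation that the system "forces" revalidation does not, as summarized here, identify a mechanism. The `Origin` class and `matches` helper are hypothetical.

```python
import hashlib
import re


def matches(if_none_match: str, etag: str) -> bool:
    # If-None-Match uses weak comparison, so any "W/" prefix is ignored.
    if if_none_match.strip() == "*":
        return True  # "*" matches any current representation
    tags = re.findall(r'(?:W/)?("[^"]*")', if_none_match)
    return etag.removeprefix("W/") in tags


class Origin:
    """Hypothetical origin/object store. The ETag is modeled on S3 simple
    uploads: the quoted MD5 hex digest of the object bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, tuple[bytes, str]] = {}

    def put(self, uri: str, body: bytes) -> str:
        etag = '"' + hashlib.md5(body).hexdigest() + '"'
        self.objects[uri] = (body, etag)
        return etag

    def get(self, uri: str, headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
        body, etag = self.objects[uri]
        # "no-cache" permits storing the response but requires revalidation
        # with the server before each reuse (an assumed mechanism).
        resp_headers = {"ETag": etag, "Cache-Control": "no-cache"}
        inm = headers.get("If-None-Match")
        if inm is not None and matches(inm, etag):
            return 304, resp_headers, b""  # Not Modified: reuse the cached copy
        return 200, resp_headers, body  # OK: send the current content


origin = Origin()
origin.put("/index.html", b"<html>v1</html>")

status, hdrs, _ = origin.get("/index.html", {})  # first visit: 200
cached_etag = hdrs["ETag"]  # the browser stores the body and its ETag

status, _, _ = origin.get("/index.html", {"If-None-Match": cached_etag})
print(status)  # 304

origin.put("/index.html", b"<html>v2</html>")  # content is updated
status, _, body = origin.get("/index.html", {"If-None-Match": cached_etag})
print(status, body)  # 200 b'<html>v2</html>'
```

Under the HTTP specification, a 304 tells the client that its stored representation can still be used; whether that amounts to "notifying the requestor that the provider has a copy" is a claim-construction question, not something the protocol itself answers.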

No probative visual evidence is provided in the complaint.

IV. Analysis of Infringement Allegations

U.S. Patent No. 5,978,791 Infringement Allegations

Claim chart for independent claim 38 (claim element; alleged infringing functionality; complaint and patent citations):

  • (A) determining a substantially unique identifier for the data item, the identifier depending on and being determined using all of the data in the data item and only the data in the data item...
    • Alleged infringing functionality: Defendant’s system calculates hash fingerprints and E-Tags based entirely on the contents of its webpage files (e.g., asset files).
    • Citations: Compl. ¶38; ’791 Patent, col. 12:55-65
  • (B) requesting the particular data item by sending the data identifier of the data item from the requester location to at least one location of a plurality of provider locations in the system.
    • Alleged infringing functionality: Defendant’s system forces requester locations (browsers, downstream caches) to send CONDITIONAL GET requests containing E-Tags (the data identifier) to provider locations (intermediate caches, origin servers).
    • Citations: Compl. ¶39; ’791 Patent, col. 6:22-30
  • (C)(a) for each data item... (i) determining a substantially unique identifier for the data item... and (ii) making and maintaining a set of identifiers of data items.
    • Alleged infringing functionality: Defendant's origination servers and intermediate cache servers create and maintain databases or tables that map file URIs to their corresponding E-Tags (the set of identifiers).
    • Citations: Compl. ¶40; ’791 Patent, col. 7:20-30
  • (C)(b) determining, based on the set of identifiers, whether the data item corresponding to the requested data identifier is present at the provider location.
    • Alleged infringing functionality: A responding server determines if the requested content is present by comparing the E-Tag received in an "IF-NONE-MATCH" request against its stored set of E-Tag values to find a match.
    • Citations: Compl. ¶41; ’791 Patent, col. 14:55-59
  • (C)(c) based on the determining... notifying the requestor that the provider has a copy of the given data item.
    • Alleged infringing functionality: When a match is found, the responding server sends an HTTP 304 message, which notifies the requesting location that the provider has the same file content and that the requester is re-authorized to use its cached copy.
    • Citations: Compl. ¶42; ’791 Patent, col. 16:15-28
  • Identified Points of Contention:
    • Scope Questions: A primary question may be whether standard features of the HTTP protocol, such as E-Tags and CONDITIONAL GET requests, fall within the scope of the patent's "substantially unique identifier" and associated system. The defense could argue these are conventional tools for cache validation, not the specific "True Name" data management system described in the patent. The complaint, however, maps these standard features directly onto the claim elements (Compl. ¶38-42).
    • Technical Questions: The analysis may turn on whether an HTTP 304 "Not Modified" response constitutes "notifying the requestor that the provider has a copy" as required by the claim (Compl. ¶42). A court may need to determine if this status code, which indicates content freshness, performs the same function as the notification of possession described in the patent.

U.S. Patent No. 6,928,442 Infringement Allegations

Claim chart for independent claim 10 (claim element; alleged infringing functionality; complaint and patent citations):

  • obtaining a name for a data file, the name being based at least in part on a given function of the data, wherein the data used by the function comprises the contents of the particular file.
    • Alleged infringing functionality: Defendant’s system obtains E-Tags and content fingerprints for its index and asset files by applying a hash function to the contents of those files.
    • Citations: Compl. ¶48; ’442 Patent, col. 2:4-11
  • determining, using at least the name, whether a copy of the data file is present on at least one of said computers.
    • Alleged infringing functionality: An upstream server (origination or intermediate cache) receives a CONDITIONAL GET request with an E-Tag and compares that E-Tag to its stored list of E-Tags to determine if it has a copy of the corresponding file content.
    • Citations: Compl. ¶49; ’442 Patent, col. 2:12-14
  • determining whether a copy of the data file that is present on at least one of said computers is an unauthorized copy or an unlicensed copy of the data file.
    • Alleged infringing functionality: If the server finds a matching E-Tag, it determines the copy at the downstream location is "authorized or licensed." If there is no match, it determines the copy is "unauthorized or unlicensed" and sends new content.
    • Citations: Compl. ¶50; ’442 Patent, col. 2:15-18
  • Identified Points of Contention:
    • Scope Questions: The central dispute may focus on the interpretation of "unauthorized copy or an unlicensed copy." The complaint equates a cache freshness check (a non-matching E-Tag) with a determination of unauthorized status (Compl. ¶50). A defendant may argue that this conflates two distinct technical functions: checking if content is stale versus checking if it is unlicensed.
    • Technical Questions: The case may require evidence on whether the accused system's E-Tag comparison functions as a substantive authorization or licensing check beyond merely indicating that a newer version of a file is available.

V. Key Claim Terms for Construction

  • The Term: "substantially unique identifier" (’791 Patent, Claim 38)

    • Context and Importance: This term is the foundation of the asserted patents. Its construction is critical because Plaintiff’s infringement theory depends on mapping this term to standard web technologies like E-Tags and content-based "fingerprints." Practitioners may focus on this term because its scope will determine whether the patent covers a specific, novel architecture or broadly applies to common content delivery practices.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The specification states the identifier is generated by a function that "reduces a data block B of arbitrary length to a relatively small, fixed size identifier" and provides MD5 and SHA as examples (’791 Patent, col. 12:55-65, col. 13:12-14). This language could support an interpretation covering any content-based hash value, such as an E-Tag.
      • Evidence for a Narrower Interpretation: The patent consistently refers to this identifier as a "True Name" and embeds it within a detailed system architecture including a "True File registry (TFR)" and "Local Directory Extensions (LDE)" (’791 Patent, Fig. 1(b); col. 7:27-44). The defense may argue that the term must be construed in this specific context and is not a generic content hash.
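To make "relatively small, fixed size" and "substantially unique" concrete, the sketch below checks digest sizes for inputs of very different lengths and estimates collision probability with the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n identifiers of b bits. The estimate assumes the hash behaves like a random function and says nothing about deliberately constructed collisions, to which MD5 and SHA-1 are both known to be vulnerable. The `collision_probability` helper and the trillion-item count are hypothetical.

```python
import hashlib
import math

# Digest size is fixed regardless of input length.
for data in (b"", b"x", b"x" * 1_000_000):
    md5_bits = hashlib.md5(data).digest_size * 8
    sha1_bits = hashlib.sha1(data).digest_size * 8
    print(len(data), md5_bits, sha1_bits)  # e.g. "1000000 128 160"


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound estimate that any two of n_items random identifiers
    of the given bit length coincide: p ~= 1 - exp(-n(n-1) / 2**(bits + 1))."""
    return -math.expm1(-n_items * (n_items - 1) / 2 ** (bits + 1))


print(collision_probability(10**12, 128))  # roughly 1.5e-15
print(collision_probability(10**12, 160))  # far smaller still
```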
  • The Term: "unauthorized copy or an unlicensed copy" (’442 Patent, Claim 10)

    • Context and Importance: This term is central to the ’442 patent's infringement allegation. Plaintiff alleges that a non-matching E-Tag signifies that cached content is "unauthorized" (Compl. ¶50). The case may turn on whether a technical check for content staleness is legally equivalent to a check for license or authorization status.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The patent is titled "Enforcement and Policing of Licensed Content" and discusses distributing files only to "licensed (or authorized) parties" (’442 Patent, Title; Abstract). Plaintiff may argue that in the context of the accused system, which controls content distribution, an outdated file is functionally an "unauthorized" file because the user is no longer authorized to access that specific version.
      • Evidence for a Narrower Interpretation: The claim language distinguishes between determining "whether a copy... is present" and determining if that copy is "unauthorized or unlicensed." A defendant could argue this implies two separate determinations, and that a simple freshness check (presence) does not satisfy the distinct requirement of a license or authorization check.

VI. Other Allegations

  • Indirect Infringement: The complaint alleges that Defendant "forced" and "controlled" the behavior of third-party intermediate cache servers and end-user browsers to perform the steps of the claimed methods (Compl. ¶26, 39, 40). These allegations may support a claim for induced infringement by suggesting that Defendant designed and operated its website system with the knowledge and intent that it would cause other components in the network to perform the infringing acts.

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of technical and definitional scope: Can standard web caching protocols, such as HTTP E-Tags and CONDITIONAL GET requests, be construed to meet the specific claim limitations of a "substantially unique identifier" used to "notify" a requester of possession and determine if content is "unauthorized," or does this represent an attempt to apply the patent claims to technology outside the context of the invention?
  • A key question of liability will be one of control: To what extent can the operator of a website be held liable for the automated and standardized actions of third-party components in the content delivery chain, such as intermediary caches and end-user browsers, under theories of direct or indirect infringement? The plaintiff's allegations that the defendant "forced" these actions will be a central point of contention.