DCT

1:18-cv-08373

PersonalWeb Tech LLC v. BuzzFeed Inc


I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 1:18-cv-08373, S.D.N.Y., 09/13/2018
  • Venue Allegations: Venue is alleged to be proper as Defendant has a regular and established place of business in the Southern District of New York and has committed acts of infringement in the district.
  • Core Dispute: Plaintiff alleges that Defendant’s website content delivery system infringes patents related to using content-based identifiers (hashes) to manage, distribute, and control access to data in a computer network.
  • Technical Context: The technology relates to fundamental methods for identifying data based on its content rather than its name or location, a core concept in modern cloud computing, content delivery networks (CDNs), and distributed data storage.
  • Key Procedural History: The complaint notes that the patents-in-suit have been successfully enforced against third parties, resulting in settlements and non-exclusive licenses. It also states that the last of the patents-in-suit has expired and the infringement allegations are directed to the time period before expiration.

Case Timeline

| Date | Event |
| --- | --- |
| 1995-04-11 | Priority Date for ’442, ’310, and ’420 Patents |
| 2005-08-09 | U.S. Patent No. 6,928,442 Issued |
| 2010-09-21 | U.S. Patent No. 7,802,310 Issued |
| 2012-01-17 | U.S. Patent No. 8,099,420 Issued |
| 2018-09-13 | Complaint Filed |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 6,928,442 - “Enforcement and Policing of Licensed Content Using Content-Based Identifiers”

The Invention Explained

  • Problem Addressed: In expanding computer networks, conventional methods of identifying data by name and location are inefficient and unreliable. This can lead to data duplication, difficulty in locating files, and an inability to verify that a received file is the correct one, as different files can have identical names and identical files can have different names (Compl. ¶¶13-14; ’442 Patent, col. 2:13-25).
  • The Patented Solution: The invention replaces conventional naming with "substantially unique, content-based identifiers," which the patent dubs "True Names" (Compl. ¶¶14, 16). A "True Name" is generated by applying a cryptographic hash function (such as MD5 or SHA) to the entire sequence of bits comprising a data item (’442 Patent, col. 13:2-15). Because the identifier is derived from the content itself, any two identical data items will have the same "True Name," allowing for efficient management, deduplication, and verification in a distributed system (Compl. ¶17; ’442 Patent, col. 5:8-11).
  • Technical Importance: This content-centric approach to data identification provides a foundational method for reducing bandwidth and storage requirements while increasing the efficiency of data retrieval in large-scale networks (Compl. ¶11).
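
The content-based naming idea described above can be illustrated with a minimal Python sketch. It is not taken from the patent or the complaint: the function name `true_name`, the sample file names, and the choice of SHA-1 as the default algorithm are assumptions made for illustration only.

```python
import hashlib


def true_name(data, algorithm="sha1"):
    """Return a hex digest of the full byte sequence as a content-based identifier.

    Hypothetical helper: the hash algorithm is an illustrative choice.
    """
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical files: two share identical bytes under different conventional names.
files = {
    "report_final.txt": b"quarterly numbers",
    "copy_of_report.txt": b"quarterly numbers",
    "notes.txt": b"something else",
}

# Deduplicate: keep each distinct content once, keyed by its identifier,
# and remember which conventional names point at it.
store = {}
names_by_id = {}
for filename, data in files.items():
    ident = true_name(data)
    store.setdefault(ident, data)
    names_by_id.setdefault(ident, []).append(filename)

for ident, names in names_by_id.items():
    print(ident[:12], names)
```

Identical content collapses to one stored entry regardless of file name, which is the deduplication and verification property the summary attributes to the "True Name."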

Key Claims at a Glance

  • The complaint asserts independent claim 10 and dependent claim 11 (’442 Patent, col. 41:12-42:12; Compl. ¶48).
  • Essential elements of independent claim 10 include:
    • A method in a system with a plurality of files distributed across a plurality of computers.
    • obtaining a name for a data file, the name being based at least in part on a given function of the data, wherein the data used by the function comprises the contents of the particular file.
    • determining, using at least the name, whether a copy of the data file is present on at least one of said computers.
    • determining whether a copy of the data file that is present on at least one of said computers is an unauthorized copy or an unlicensed copy of the data file.
  • The complaint does not explicitly reserve the right to assert other dependent claims for this patent.

U.S. Patent No. 7,802,310 - “Controlling Access to Data in a Data Processing System”

The Invention Explained

  • Problem Addressed: The patent addresses the challenge of managing and controlling access to data items distributed across multiple computers in a network, where traditional access control methods are inefficient (’310 Patent, col. 1:13-19).
  • The Patented Solution: The invention provides a method for controlling content distribution by using content-dependent names. A first computer (e.g., an origin server) receives a request from a second computer (e.g., a browser) that includes a content-dependent name for a particular data item. Based on this content-dependent name, the first computer determines whether the content is authorized or licensed and permits or denies access to it accordingly (’310 Patent, Abstract).
  • Technical Importance: This method allows for decentralized and efficient access control decisions in a distributed system, as authorization can be determined simply by comparing content-based identifiers without needing to inspect the data itself (Compl. ¶¶58-59).
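
As a rough illustration of the access-control flow summarized above, the following Python sketch models a first computer that decides whether to release a data item by looking only at the content-dependent name in the request. The class `FirstComputer`, its method names, the licensed-name set, and the use of SHA-256 are all hypothetical; none of them comes from the ’310 Patent or the complaint.

```python
import hashlib


def content_name(data):
    """Hypothetical content-dependent name: a SHA-256 hex digest of the item."""
    return hashlib.sha256(data).hexdigest()


class FirstComputer:
    """Illustrative stand-in for the computer that receives requests."""

    def __init__(self, items, licensed_names):
        # Index the held items by their content-dependent names.
        self._by_name = {content_name(d): d for d in items}
        # Names whose content is treated as authorized or licensed (assumed input).
        self._licensed = set(licensed_names)

    def handle_request(self, name):
        """Permit or deny based only on the name carried in the request."""
        if name not in self._licensed:
            return None  # not permitted
        return self._by_name.get(name)  # permitted if the item is held


song = b"licensed track bytes"
leak = b"unlicensed track bytes"
server = FirstComputer([song, leak], licensed_names=[content_name(song)])

print(server.handle_request(content_name(song)) is not None)  # permitted
print(server.handle_request(content_name(leak)) is None)      # denied
```

The decision is made by comparing identifiers, without inspecting the data item itself.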

Key Claims at a Glance

  • The complaint asserts independent claim 20 (’310 Patent, col. 39:8-30:30; Compl. ¶56).
  • Essential elements of independent claim 20 include:
    • A computer-implemented method for controlling distribution of content from a first computer to at least one other computer.
    • The control is in response to a request from a second device that includes a content-dependent name of a data item.
    • The content-dependent name is based on a message digest or hash function of the data item's content.
    • Based on the content-dependent name, the first device either permits or does not permit the content to be provided to or accessed by the other computer, depending on whether the content is determined to be unauthorized or unlicensed.
  • The complaint does not explicitly reserve the right to assert other dependent claims for this patent.

U.S. Patent No. 8,099,420 - “Accessing Data in a Data Processing System”

Technology Synopsis

This patent addresses accessing data in a distributed processing system. It discloses a system that determines one or more content-dependent digital identifiers for a data item and then selectively permits access to that data item based on whether its identifier corresponds to an entry in one or more databases (’420 Patent, Abstract).

Asserted Claims

The complaint asserts claims 25, 26, 27, 29, 30, 32, 34–36, and independent claim 166 (Compl. ¶63).

Accused Features

The complaint alleges that Defendant’s system for distributing its website content infringes the ’420 patent. Specifically, Defendant’s servers allegedly apply hash functions to determine content-based ETags for webpage files (the "content-dependent digital identifiers") and then compare these ETags against a database of ETag values to selectively permit or deny access to cached content (Compl. ¶¶66-68).

III. The Accused Instrumentality

Product Identification

The accused instrumentality is Defendant’s website, buzzfeed.com, and the associated system and method for providing and controlling the distribution of its webpage content (Compl. ¶¶29, 49).

Functionality and Market Context

  • The complaint alleges that Defendant’s system distributes webpage files (e.g., HTML base files and asset files like images and scripts) across a plurality of computers, including origin servers, intermediate cache servers, and endpoint browser caches (Compl. ¶¶34, 49).
  • The core accused functionality involves the use of the standard HTTP caching mechanism. Specifically, the system is alleged to generate content-based "ETag" values by applying a hash function to the contents of webpage and asset files (Compl. ¶35). When a browser or intermediate cache requests a file it already has cached, it sends a "conditional" GET request containing the ETag in an "If-None-Match" header (Compl. ¶31).
  • An upstream server receives this request and compares the ETag from the request with the ETag it currently has for that file. If the ETags match, the server sends an HTTP 304 (Not Modified) response, authorizing the cache to use its existing copy. If they do not match, the server sends an HTTP 200 (OK) response with the new file content and the new ETag (Compl. ¶¶41-42). This process is alleged to reduce bandwidth and server load by serving files only when their content has changed (Compl. ¶33).
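
The conditional GET exchange described above can be sketched in a few lines of Python. This is a simplified model, not Defendant's implementation: the function names are hypothetical, the ETag is assumed to be a quoted MD5 hex digest of the body (as the complaint alleges on information and belief), and details of real HTTP such as weak validators and multi-value `If-None-Match` headers are omitted.

```python
import hashlib


def make_etag(body):
    # Assumption: the ETag is a quoted hex hash of the file contents.
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(current_body, if_none_match=None):
    """Return (status, headers, body) for a GET of a single resource."""
    etag = make_etag(current_body)
    if if_none_match is not None and if_none_match == etag:
        # ETags match: the cached copy is still current, so no body is sent.
        return 304, {"ETag": etag}, b""
    # No ETag supplied, or a mismatch: send the current content and its ETag.
    return 200, {"ETag": etag}, current_body


# First visit: the cache holds nothing, so the request is unconditional.
status1, headers1, body1 = respond(b"<html>v1</html>")
cached_etag = headers1["ETag"]

# Revisit with unchanged content: conditional GET carrying the cached ETag.
status2, _, body2 = respond(b"<html>v1</html>", if_none_match=cached_etag)

# Revisit after the file changed on the server.
status3, headers3, body3 = respond(b"<html>v2</html>", if_none_match=cached_etag)

print(status1, status2, status3)
```

In this model the 304 path transfers no body, which corresponds to the bandwidth and server-load reduction the complaint describes.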

IV. Analysis of Infringement Allegations

No probative visual evidence provided in complaint.

’442 Patent Infringement Allegations

| Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
| --- | --- | --- | --- |
| a method, in a system in which a plurality of files are distributed across a plurality of computers | Defendant’s system distributes webpage files across production servers, origin servers, intermediate cache servers, and endpoint browser caches. | ¶49 | col. 41:13-15 |
| obtaining a name for a data file, the name being based at least in part on a given function of the data, wherein the data used by the function comprises the contents of the particular file | Defendant’s system generates or obtains ETags for its webpage and asset files by using a hash function based on the contents of those files. | ¶50 | col. 13:2-15 |
| determining, using at least the name, whether a copy of the data file is present on at least one of said computers | Defendant’s servers, in response to a conditional GET request containing a URI and an ETag, determine whether a file matching the URI is present. | ¶51 | col. 15:58-62 |
| determining whether a copy of the data file that is present on at least one of said computers is an unauthorized copy or an unlicensed copy of the data file | If the ETag in the request matches the server's ETag, the downstream copy is determined to be authorized; if there is no match, it is determined to be unauthorized. | ¶52 | col. 41:22-26 |

’310 Patent Infringement Allegations

| Claim Element (from Independent Claim 20) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
| --- | --- | --- | --- |
| controlling distribution of content from a first computer to at least one other computer, in response to a request obtained by a first device in the system from a second device in the system... | An upstream cache or origin server (first computer) controls content distribution in response to conditional GET requests from a downstream browser or cache (second device). | ¶58 | col. 39:11-16 |
| the request including at least a content-dependent name of a particular data item, the content-dependent name being based at least in part on a function of at least some of the data... wherein the function comprises a message digest function or a hash function... | The conditional GET request includes an ETag, which is alleged to be a content-dependent name based on hashing the data item's contents. | ¶58 | col. 39:16-24 |
| based at least in part on said content-dependent name... (A) permitting the content to be provided to or accessed... if it is not determined that the content is unauthorized or unlicensed, otherwise, (B)... not permitting the content to be provided to or accessed... | The server compares the ETag in the request to its stored ETag. A match results in an HTTP 304 response (permitting access to the cached copy); a mismatch results in an HTTP 200 response (not permitting access to the old content). | ¶59 | col. 39:25-30:30 |

Identified Points of Contention

  • Scope Questions: A central dispute may arise over the meaning of terms like "unauthorized copy or an unlicensed copy." The complaint equates a technically outdated file (indicated by an ETag mismatch) with an "unauthorized" copy. The defense may argue that the patent’s language, particularly its title ("Enforcement and Policing of Licensed Content"), implies a legal status (e.g., copyright infringement or license violation), not mere cache invalidation. A similar question arises as to whether a standard ETag, used alongside a URI, functions as a "name" in the manner contemplated by the patents, which describe "True Names" as a replacement for conventional file naming systems.
  • Technical Questions: The complaint alleges "on information and belief" that Defendant's ETags are generated by applying a hash function to the file contents. A key factual question will be whether the accused system actually operates this way. The specific algorithm used to generate ETags could be a point of contention, as some systems use non-content-based values like version numbers or timestamps, which may not meet the claim limitations of being "based at least in part on... the contents of the particular file."
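
The factual question raised above, whether an ETag is derived from file contents or from other values, can be made concrete with a short Python sketch. Both functions are hypothetical illustrations and do not describe the accused system: one hashes the body, the other combines a modification time and a size in hexadecimal, similar in spirit to metadata-based schemes that some web servers use.

```python
import hashlib


def etag_from_content(body):
    """Content-based validator: changes only when the bytes change."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def etag_from_metadata(mtime, size):
    """Metadata-based validator (illustrative): hex mtime and hex size."""
    return '"{:x}-{:x}"'.format(mtime, size)


body = b"<html>same page</html>"

# The same bytes redeployed at a later time (hypothetical timestamps).
first_deploy = 1536796800
second_deploy = 1536883200

content_a = etag_from_content(body)
content_b = etag_from_content(body)
meta_a = etag_from_metadata(first_deploy, len(body))
meta_b = etag_from_metadata(second_deploy, len(body))

print(content_a == content_b)  # identical content gives an identical ETag
print(meta_a == meta_b)        # a new timestamp gives a different ETag
```

Under the metadata scheme the validator changes even though the content does not, and it is not computed from the file's contents at all. That distinction is why the generation algorithm matters to the "based at least in part on... the contents of the particular file" limitation.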

V. Key Claim Terms for Construction

  • The Term: "unauthorized copy or an unlicensed copy of the data file" (’442 Patent, Claim 10)

  • Context and Importance: The viability of the infringement theory for the ’442 patent hinges on construing this term to encompass a technically outdated or non-current version of a file in a cache. Practitioners may focus on this term because if it is limited to a legal status (e.g., a pirated copy), then the accused ETag-based caching system, which manages content versions rather than legal rights, may not infringe.

  • Intrinsic Evidence for Interpretation:

    • Evidence for a Broader Interpretation: The complaint frames the concept as being "reauthorized to serve/use" content, suggesting a technical permission rather than a legal one (Compl. ¶38). The patent specification discusses a system that tracks data items "independent of the name, date, or other properties of the data item," which may support an interpretation based on the technical state of the data rather than its legal status (’442 Patent, col. 4:33-36).
    • Evidence for a Narrower Interpretation: The patent’s title, "Enforcement and Policing of Licensed Content," and abstract, which states "A copy of a requested file is only provided to licensed (or authorized) parties," strongly suggest a legal context related to licensing and copyright. The specification also describes tracking files for "licensing purposes" (’442 Patent, col. 8:52-56).
  • The Term: "content-dependent name" (’310 Patent, Claim 20)

  • Context and Importance: Plaintiff's allegation is that an HTTP ETag is a "content-dependent name." The dispute will likely center on whether an ETag, which functions as a validation token alongside a primary identifier (the URI), qualifies as a "name" as taught by the patent family, which describes "True Names" as foundational identifiers for data items.

  • Intrinsic Evidence for Interpretation:

    • Evidence for a Broader Interpretation: The patent defines the name as being "based at least in part on a function of at least some of the data comprising the particular data item, wherein the function comprises a message digest function or a hash function" (’310 Patent, Claim 20). The complaint alleges ETags are generated in precisely this manner, supporting a functional interpretation (Compl. ¶35).
    • Evidence for a Narrower Interpretation: The patent family specification describes "True Names" as a system to elevate data processing "over conventional file-naming systems" (Compl. ¶18), suggesting a replacement for, rather than a supplement to, conventional names like URIs. An ETag's role as a secondary validation token, rather than a primary identifier, may support a narrower construction.

VI. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of definitional scope: can the patent term "unauthorized copy," which appears rooted in the context of legal content licensing, be construed broadly enough to cover the technically outdated file status managed by a standard HTTP ETag-based caching system? The outcome of this claim construction dispute may be dispositive.
  • A key evidentiary question will be one of technical operation: does the complaint’s "information and belief" allegation hold true that BuzzFeed's ETags are generated by a hash function based on the file's contents, as required by the claims? Or are they generated using other metrics like timestamps or version numbers, which could place the accused system outside the scope of the asserted claims?