1:18-cv-00164

PersonalWeb Tech LLC v. Atlas Obscura Inc

I. Executive Summary and Procedural Information

  • Parties & Counsel: PersonalWeb Tech LLC (Plaintiff) v. Atlas Obscura Inc (Defendant)
  • Case Identification: 1:18-cv-00164, E.D.N.Y., 01/10/2018
  • Venue Allegations: Plaintiff alleges that venue is proper because Defendant has a regular and established place of business in the Eastern District of New York and has committed the alleged acts of infringement in the District.
  • Core Dispute: Plaintiff alleges that Defendant’s website content delivery and caching system infringes five patents related to identifying, locating, and managing data using content-based unique identifiers.
  • Technical Context: The patents relate to foundational concepts in distributed data storage, where data items are identified by a cryptographic hash of their content, a technique now central to cloud computing and content delivery networks for ensuring data integrity and efficiency.
  • Key Procedural History: The complaint states that Plaintiff has previously enforced the patents-in-suit, resulting in settlements and the grant of non-exclusive licenses. The complaint also notes that the last of the patents-in-suit has expired and that the allegations are directed to infringement that occurred during the life of the patents.

Case Timeline

| Date | Event |
|---|---|
| 1995-04-11 | Priority Date for Patents-in-Suit |
| 1999-11-02 | U.S. Patent No. 5,978,791 Issued |
| 2005-08-09 | U.S. Patent No. 6,928,442 Issued |
| 2010-09-21 | U.S. Patent No. 7,802,310 Issued |
| 2011-05-17 | U.S. Patent No. 7,945,544 Issued |
| 2012-01-17 | U.S. Patent No. 8,099,420 Issued |
| 2018-01-10 | Complaint Filed |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 5,978,791 - "Data Processing System Using Substantially Unique Identifiers to Identify Data Items, Whereby Identical Data Items Have the Same Identifiers"

The Invention Explained

  • Problem Addressed: In expanding computer networks, conventional methods for naming and locating data items were context-dependent (e.g., reliant on file paths and server locations), leading to data duplication, difficulty in verifying data integrity, and inefficiencies in locating and accessing stored data (Compl. ¶12; ’791 Patent, col. 1:11-2:17).
  • The Patented Solution: The invention proposes replacing conventional naming with content-based identifiers, which it terms "True Names" (Compl. ¶15). A True Name is a "substantially unique identifier" generated by applying a cryptographic function (like a hash) to the entire sequence of bits of a data item. Because the identifier depends only on the content itself, two identical data items will have the same True Name, regardless of their location, filename, or other contextual metadata, thereby allowing for efficient, verifiable, and location-independent data management (’791 Patent, col. 3:28-32, Abstract).
  • Technical Importance: This approach provided a method for decentralized, content-addressable storage, a key concept for reducing redundancy and improving integrity in large-scale distributed systems like the early internet and modern cloud storage (Compl. ¶10).
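
The content-based naming idea summarized above can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `true_name`, the choice of SHA-256, and the file paths are assumptions made for illustration only. The point shown is that the identifier depends only on the bytes of the item, so identical content stored under different names yields the same identifier.

```python
import hashlib


def true_name(data: bytes) -> str:
    """Return a content-derived identifier (hypothetical helper).

    SHA-256 is an assumption chosen for illustration; any digest computed
    over all of the item's bytes would demonstrate the same property.
    """
    return hashlib.sha256(data).hexdigest()


# Two identical byte sequences stored under different (hypothetical) paths,
# plus a third item whose content differs by one byte.
copy_a = {"path": "/srv/site/logo.png", "data": b"example image bytes"}
copy_b = {"path": "/backup/2018/img_001.png", "data": b"example image bytes"}
changed = {"path": "/srv/site/logo.png", "data": b"example image bytes!"}

# The path plays no role: only the content determines the identifier.
same = true_name(copy_a["data"]) == true_name(copy_b["data"])
different = true_name(copy_a["data"]) != true_name(changed["data"])
print(same, different)
```

Under these assumptions, the two copies compare equal by identifier despite their different paths, while the one-byte change produces a different identifier.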

Key Claims at a Glance

  • The complaint asserts independent method claim 38 (Compl. ¶36).
  • Claim 38 requires, in essential part:
    • (A) Determining a substantially unique identifier for a data item, where the identifier depends on and is determined using all of and only the data in the data item.
    • (B) Requesting the data item by sending its identifier from a requester location to a provider location.
    • (C) At the provider location: (a) maintaining a set of identifiers for a plurality of data items, (b) determining if the requested data item is present by comparing its identifier to the set, and (c) if present, notifying the requester that the provider has a copy.
  • The complaint also asserts claim 42 and reserves the right to assert other claims (Compl. ¶36).
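
The three elements of claim 38 summarized above can be sketched as a small request/response flow. This is a simplified illustration, not the claimed method itself: the `Provider` class, the `request` function, the provider names, and the use of SHA-256 are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def identifier(data: bytes) -> str:
    # Element (A): identifier computed from all of, and only, the item's data.
    # SHA-256 is an assumption for illustration.
    return hashlib.sha256(data).hexdigest()


class Provider:
    """Hypothetical provider location holding a collection of data items."""

    def __init__(self, items: Iterable[bytes]) -> None:
        # Element (C)(a): maintain a set of identifiers for the stored items.
        self._by_id: Dict[str, bytes] = {identifier(d): d for d in items}

    def has_copy(self, requested_id: str) -> bool:
        # Elements (C)(b) and (C)(c): compare the requested identifier with
        # the maintained set and report whether a copy is present.
        return requested_id in self._by_id


def request(providers: Dict[str, Provider], requested_id: str) -> List[str]:
    # Element (B): the requester sends only the identifier to each provider
    # and collects the names of those that report having a copy.
    return [name for name, p in providers.items() if p.has_copy(requested_id)]


providers = {
    "east": Provider([b"alpha", b"beta"]),
    "west": Provider([b"beta", b"gamma"]),
}
print(request(providers, identifier(b"beta")))
```

In this sketch the requester never transmits the data item, only its identifier, which is the feature the complaint maps onto a browser sending an E-Tag.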

U.S. Patent No. 6,928,442 - "Enforcement and Policing of Licensed Content Using Content-Based Identifiers"

The Invention Explained

  • Problem Addressed: This patent builds on the "True Name" system to address the challenge of managing and controlling access to files distributed across multiple computers in a network, particularly in the context of licensed content (’442 Patent, Abstract).
  • The Patented Solution: The invention describes a method where a "name" for a data file is generated based on a function of its content. This content-based name is then used to determine not only if a copy of the file is present on a computer, but also whether that copy is "an unauthorized copy or an unlicensed copy," enabling a system to police the distribution of content (’442 Patent, col. 1:47-58).
  • Technical Importance: This technology relates to digital rights management (DRM) and content control in distributed networks, allowing for verification of content authorization based on the intrinsic properties of the data itself rather than external metadata (Compl. ¶47).

Key Claims at a Glance

  • The complaint asserts independent method claim 10 (Compl. ¶46).
  • Claim 10 requires, in essential part:
    • Obtaining a name for a data file that is based at least in part on a function of the data comprising the file's contents.
    • Using at least that name to determine whether a copy of the data file is present on at least one computer in a plurality of computers.
    • Determining whether the copy that is present is an "unauthorized copy or an unlicensed copy of the data file."
  • The complaint also asserts claim 11 and reserves the right to assert other claims (Compl. ¶46).
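
The steps of claim 10 can be sketched as follows, to show how a presence check differs from an authorization check. This is an illustrative assumption, not the patent's disclosed design: the `holdings` and `licenses` structures, the computer names, and the use of SHA-256 are hypothetical.

```python
import hashlib
from enum import Enum
from typing import Dict, Set


class CopyStatus(Enum):
    ABSENT = "absent"
    LICENSED = "licensed"
    UNLICENSED = "unlicensed"


def content_name(data: bytes) -> str:
    # Step 1: a name based on a function of the file's contents.
    # SHA-256 is an assumption for illustration.
    return hashlib.sha256(data).hexdigest()


def check_copy(
    name: str,
    computer: str,
    holdings: Dict[str, Set[str]],
    licenses: Dict[str, Set[str]],
) -> CopyStatus:
    # Step 2: use the name to determine whether a copy is present.
    if name not in holdings.get(computer, set()):
        return CopyStatus.ABSENT
    # Step 3: determine whether the copy that is present is licensed.
    if computer in licenses.get(name, set()):
        return CopyStatus.LICENSED
    return CopyStatus.UNLICENSED


film = content_name(b"licensed film bytes")
holdings = {"pc-1": {film}, "pc-2": {film}, "pc-3": set()}  # names held per computer
licenses = {film: {"pc-1"}}                                  # licensees per name

results = {pc: check_copy(film, pc, holdings, licenses) for pc in holdings}
print({pc: status.value for pc, status in results.items()})
```

Presence and licensing are separate lookups in this sketch; whether a single E-Tag comparison can satisfy both determinations is a matter of claim construction that the sketch does not resolve.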

U.S. Patent No. 7,802,310 - "Controlling Access to Data in a Data Processing System"

  • Technology Synopsis: The patent describes a system for controlling data access where a first computer receives a request from a second computer that includes a content-dependent name (e.g., a hash). The first computer compares this name to a plurality of values to determine if access is authorized and, based on that determination, allows or denies access to the data item.
  • Asserted Claims: Independent claims 20, 69, and 71 are asserted (Compl. ¶54).
  • Accused Features: The accused functionality involves upstream cache or origination servers ("first computer") receiving CONDITIONAL GET requests with E-Tags ("content-dependent name") from user browsers ("second computer") and comparing the E-Tags to a stored list to determine if the cached content is still authorized for use (Compl. ¶¶56-57).

U.S. Patent No. 7,945,544 - "Similarity-Based Access Control of Data in a Data Processing System"

  • Technology Synopsis: The patent discloses a computer-implemented method where a "digital key" is determined for a file by applying a function to its contents or parts. This key is added to a database. A "search key," generated from search criteria, is then used to match against digital keys in the database to retrieve information about the corresponding file.
  • Asserted Claims: Independent claim 46 and dependent claims 48, 49, 52, 55, and 56 are asserted (Compl. ¶61).
  • Accused Features: The accused method includes generating an E-Tag ("digital key") for a webpage's index file, mapping this E-Tag to file information in a database/table, and using a received E-Tag from a browser's CONDITIONAL GET request as a "search key" to match against the database to determine if the webpage content has changed (Compl. ¶¶63-68).
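
The digital-key and search-key lookup described above can be sketched as a simple keyed table. The `KeyDatabase` class, the stored record fields, and the use of SHA-256 are hypothetical, and the sketch simplifies the step of deriving a search key from search criteria by accepting a key directly.

```python
import hashlib
from typing import Dict, Optional


def digital_key(contents: bytes) -> str:
    # Key determined by applying a function to the file's contents.
    # SHA-256 is an assumption for illustration.
    return hashlib.sha256(contents).hexdigest()


class KeyDatabase:
    """Hypothetical table mapping digital keys to file information."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, object]] = {}

    def add_file(self, path: str, contents: bytes) -> str:
        # Compute the digital key and store information about the file under it.
        key = digital_key(contents)
        self._records[key] = {"path": path, "size": len(contents)}
        return key

    def find(self, search_key: str) -> Optional[Dict[str, object]]:
        # Match a search key against the stored digital keys.
        return self._records.get(search_key)


db = KeyDatabase()
index_key = db.add_file("index.html", b"<html>home</html>")

hit = db.find(index_key)                                # current content: match
miss = db.find(digital_key(b"<html>old home</html>"))   # changed content: no match
print(hit, miss)
```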

U.S. Patent No. 8,099,420 - "Accessing Data in a Data Processing System"

  • Technology Synopsis: The patent describes a system that determines one or more content-dependent digital identifiers for a data item. It then uses these identifiers and a database of corresponding identifiers to selectively permit the data item to be made available for access across a network, ensuring it is not provided without authorization.
  • Asserted Claims: Independent claim 166 and dependent claims 25, 26, 27, 29, 30, and 32-36 are asserted (Compl. ¶72).
  • Accused Features: The accused system allegedly applies hash functions to webpage files to determine fingerprints and E-Tags ("content-dependent digital identifiers"). These identifiers are stored in databases on webpage servers and are used in a system of CONDITIONAL GET requests to selectively determine whether a requesting computer can access cached content or must receive new authorized content (Compl. ¶¶75-77).

III. The Accused Instrumentality

Product Identification

  • The accused instrumentality is the website located at "atlasobscura.com", including its underlying architecture and method for distributing webpage content to users (Compl. ¶19).

Functionality and Market Context

  • The complaint alleges the website uses a Ruby on Rails architecture to compile webpage files and generate a "fingerprint" from the content of each file using a hash function (Compl. ¶21). These files are uploaded as objects to an Amazon S3 host system, which generates an associated "E-Tag" value by applying a hash function to the object's content (Compl. ¶¶22-23).
  • To manage cached content efficiently, the system allegedly uses the HTTP "CONDITIONAL GET" protocol with "IF-NONE-MATCH" headers. A user's browser sends the E-Tag of its cached content; if it matches the server's current E-Tag, the server responds with an HTTP 304 message, authorizing the use of the cached version. If it does not match, the server sends an HTTP 200 message with the new content and a new E-Tag (Compl. ¶¶26-28).
  • The complaint alleges that by appending content fingerprints to URLs and using E-Tags, Defendant controls the behavior of intermediate and end-point caches to reduce bandwidth and efficiently update content (Compl. ¶¶20, 32-33).
  • The complaint provides no probative visual evidence.
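
The conditional-request exchange alleged in the complaint can be sketched as a single server-side function. This is a simplified model, not Defendant's or Amazon S3's actual implementation: `etag_for` and `handle_get` are hypothetical names, SHA-256 is an assumed hash, and the sketch handles only one exact E-Tag value rather than the full If-None-Match syntax.

```python
import hashlib
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    # Assumption: the E-Tag is a quoted hex digest of the response body.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(
    current_body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    current_etag = etag_for(current_body)
    if if_none_match == current_etag:
        # Cached copy matches the current content: 304 with no body.
        return 304, {"ETag": current_etag}, b""
    # No cached copy, or it differs: 200 with the content and its E-Tag.
    return 200, {"ETag": current_etag}, current_body


page_v1 = b"<html>version 1</html>"
page_v2 = b"<html>version 2</html>"

# First visit: no cached copy, so the browser receives the page and its E-Tag.
status, headers, body = handle_get(page_v1)
cached_etag = headers["ETag"]

# Revalidation while the content is unchanged.
revalidate_status, _, _ = handle_get(page_v1, cached_etag)

# Revalidation after the content has changed on the server.
stale_status, new_headers, new_body = handle_get(page_v2, cached_etag)

print(status, revalidate_status, stale_status)  # 200 304 200
```

The sketch shows only the cache-validation mechanics; whether a 304 response amounts to "authorizing" use of a copy in the sense of the asserted claims is the disputed question addressed in Section IV.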

IV. Analysis of Infringement Allegations

’791 Patent Infringement Allegations

| Claim Element (from Independent Claim 38) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| (A) determining a substantially unique identifier for the data item, the identifier depending on and being determined using all of the data in the data item and only the data in the data item, whereby two identical data items in the system will have the same identifier | Defendant’s website determines a content-based identifier for data items (e.g., asset files) by calculating a hash "fingerprint" and/or an E-Tag based solely on the file's contents (a sequence of bits). | ¶38 | col. 12:55-13:1 |
| (B) requesting the particular data item by sending the data identifier of the data item from the requester location to at least one location of a plurality of provider locations in the system | A user's browser (requester location) sends a "CONDITIONAL GET" request containing the E-Tag (data identifier) to upstream servers (provider locations) to request verification of the corresponding webpage file. | ¶39 | col. 15:10-14 |
| (C) on at least some of the provider locations, (a) for each data item... (i) determining a substantially unique identifier... and (ii) making and maintaining a set of identifiers of data items | Defendant’s origination and intermediate cache servers (provider locations) determine and maintain a set of identifiers by storing E-Tags and URIs (which include content fingerprints) mapped to the corresponding content. | ¶40 | col. 7:25-34 |
| (b) determining, based on the set of identifiers, whether the data item corresponding to the requested data identifier is present at the provider location | A responding server determines if the content corresponding to the E-Tag in the "CONDITIONAL GET" request is present by comparing the received E-Tag to the E-Tag values it has stored in its database/table. | ¶41 | col. 7:25-34 |
| (c) based on the determining, when the provider location determines that the particular data item is present at the provider location, notifying the requestor that the provider has a copy of the given data item | When a match is found between the received E-Tag and the stored E-Tag, the responding server issues an HTTP 304 message, notifying the requesting browser that the server has the same (and still authorized) version of the file content. | ¶42 | col. 15:10-24 |

’442 Patent Infringement Allegations

| Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| obtaining a name for a data file, the name being based at least in part on a given function of the data, wherein the data... comprises the contents of the particular file | Defendant obtains E-Tags and fingerprints ("name") for its index and asset files, where the name is generated using a hash function ("given function") based on the contents of the file. | ¶48 | col. 11:54-61 |
| determining, using at least the name, whether a copy of the data file is present on at least one of said computers | Origination and intermediate cache servers, upon receiving a "CONDITIONAL GET" request, use the E-Tag ("the name") to determine if a copy of the content is present by comparing it to the E-Tags of files stored on that server. | ¶49 | col. 1:49-54 |
| determining whether a copy of the data file that is present... is an unauthorized copy or an unlicensed copy of the data file | If the E-Tag comparison results in a match, the server determines the cached copy is "authorized or licensed." If there is no match, it is determined that the cached copy is "unauthorized or unlicensed." | ¶50 | col. 1:55-58 |

  • Identified Points of Contention:
    • Scope Questions: A central question may be whether standard web caching identifiers like "E-Tags" and URL "fingerprints" fall within the scope of the patents' terms "substantially unique identifier" (’791 Patent) and "name for a data file" (’442 Patent). The analysis may depend on whether these web technologies provide the level of uniqueness and content-dependency contemplated by the patents.
    • Technical Questions: The complaint equates a standard cache validation mechanism with a system for determining if a copy is "unauthorized or unlicensed" (’442 Patent). A point of contention may be whether an HTTP 304/200 response, which signals cache validity, performs the claimed function of policing licensed content, or if it is a fundamentally different technical operation aimed at efficiency. Further, questions may arise regarding the level of control Defendant exercises over the entire distributed system, including third-party intermediate caches, to establish direct infringement of the system and method claims (Compl. ¶22).

V. Key Claim Terms for Construction

  • For the ’791 Patent:

    • The Term: "substantially unique identifier"
    • Context and Importance: This term is the core of the invention. The infringement case hinges on whether the accused E-Tags and fingerprints meet this definition. Practitioners may focus on this term because its construction will determine whether standard, widely used web technologies read on the claims.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The specification describes the identifier as one that is "virtually guaranteed to represent the data block B and only data block B," suggesting a functional, probabilistic standard rather than a requirement for a specific algorithm (’791 Patent, col. 12:52-54).
      • Evidence for a Narrower Interpretation: The specification repeatedly uses cryptographic message digest functions like MD4, MD5, and SHA as the primary examples of functions that can create such an identifier, suggesting the term may be limited to identifiers with strong cryptographic properties (’791 Patent, col. 13:13-19).
  • For the ’442 Patent:

    • The Term: "unauthorized copy or an unlicensed copy"
    • Context and Importance: The complaint's theory of infringement maps this term to a cache-miss scenario (i.e., when a server's E-Tag does not match a browser's E-Tag). The viability of this theory depends on construing "unauthorized" to include content that is merely out-of-date in a cache.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The patent does not explicitly define the term, which may allow for a construction that includes any copy the system does not authorize for use in a given transaction, such as an outdated cached file. The claims focus on the act of "determining" based on the content-based name, which is what the complaint alleges the server does.
      • Evidence for a Narrower Interpretation: The patent's title is "Enforcement and Policing of Licensed Content", and the specification discusses a "license table" (’442 Patent, col. 7:51-56). This context suggests the term refers to copies that violate a license agreement (e.g., pirated content), not merely stale data in a browser cache that needs updating for technical reasons.

VI. Other Allegations

The complaint does not provide sufficient detail for analysis of indirect or willful infringement.

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of definitional scope: can standard internet protocols developed for caching efficiency, such as HTTP "E-Tags" and "IF-NONE-MATCH" headers, be construed to meet the ’791 Patent's requirement for a "substantially unique identifier" and the ’442 Patent's requirement of determining whether a file is an "unauthorized copy or an unlicensed copy"?
  • A key question will be one of functional equivalence: does the accused system's technical process for validating a browser cache (responding with new content when an E-Tag does not match) perform the specific function of "policing licensed content" as described in the patents, or is there a fundamental mismatch in purpose and operation?
  • An evidentiary challenge will be establishing direct infringement across a distributed network: what evidence will be required to demonstrate that Defendant "controls" or "directs" the actions of third-party intermediate cache servers and end-user browsers to the extent required to prove it performs every step of the asserted method claims?