5:18-cv-04626

PersonalWeb Tech LLC v. Shopify Inc

I. Executive Summary and Procedural Information

  • Parties & Counsel:
  • Case Identification: 5:18-cv-04626, N.D. Cal., 08/01/2018
  • Venue Allegations: Venue is alleged to be proper because Defendant Shopify Inc. is not a U.S. resident, and Defendant Shopify (USA) Inc. has a regular and established place of business in the district and has allegedly committed acts of infringement there.
  • Core Dispute: Plaintiffs allege that Defendant’s website and content delivery system infringe four patents related to using content-based identifiers to manage, locate, and control access to data in distributed computer networks.
  • Technical Context: The technology relates to fundamental aspects of cloud computing and content delivery networks, specifically methods for uniquely identifying data based on its content (e.g., through hashing) to improve efficiency, reduce data duplication, and manage caching.
  • Key Procedural History: The complaint notes that Plaintiffs have successfully enforced the patents-in-suit against other parties, resulting in settlements and non-exclusive licenses. It also states that the last of the patents-in-suit has expired and that the infringement allegations are directed to the time period before expiration.

Case Timeline

| Date       | Event                                                  |
|------------|--------------------------------------------------------|
| 1995-04-11 | Priority Date for ’442, ’310, ’544, and ’420 Patents   |
| 2005-08-09 | U.S. Patent No. 6,928,442 Issued                       |
| 2010-09-21 | U.S. Patent No. 7,802,310 Issued                       |
| 2011-05-17 | U.S. Patent No. 7,945,544 Issued                       |
| 2012-01-17 | U.S. Patent No. 8,099,420 Issued                       |
| 2018-08-01 | Complaint Filed                                        |

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 6,928,442 - "Enforcement and Policing of Licensed Content Using Content-Based Identifiers," issued August 9, 2005 (’442 Patent)

The Invention Explained

  • Problem Addressed: The patent’s background section describes conventional data naming systems, which rely on file names and storage locations, as unable to function efficiently in large, distributed networks. Such location-based naming is said to lead to data duplication, difficulty in locating content, and an inability to verify that a named file corresponds to the correct data (Compl. ¶¶14-15; ’442 Patent, col. 1:11-2:46).
  • The Patented Solution: The invention proposes replacing conventional naming with "substantially unique identifiers" called "True Names," which are generated by applying a cryptographic hash function (such as MD5 or SHA) to the content of a data item. This allows any data item—such as a file, a part of a file, or a digital image—to be identified, managed, and located based solely on its content, independent of its name or location (Compl. ¶¶16-19; ’442 Patent, col. 3:30-36, Abstract). This process is illustrated in a flowchart in FIG. 10(a) of the patent (’442 Patent, col. 13:5-15).
  • Technical Importance: This content-centric approach was intended to reduce bandwidth and storage requirements in expanding networks by making it simple to identify and eliminate duplicate data (Compl. ¶13).
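
For illustration only, the content-based naming concept described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's implementation or anything alleged in the complaint: the function name `content_identifier`, the in-memory `store`, and the choice of SHA-256 (the patent is described as naming MD5 or SHA as examples) are all hypothetical.

```python
import hashlib
from typing import Dict

def content_identifier(data: bytes, algorithm: str = "sha256") -> str:
    """Return a hex digest computed from the bytes alone (no name, no location)."""
    return hashlib.new(algorithm, data).hexdigest()

# Hypothetical store keyed by identifier, so identical content is kept only once.
store: Dict[str, bytes] = {}

def put(data: bytes) -> str:
    """Store data under its content identifier and return that identifier."""
    ident = content_identifier(data)
    store.setdefault(ident, data)  # a duplicate maps to the existing entry
    return ident

a = put(b"hello world")
b = put(b"hello world")   # same bytes, as if saved under another file name
c = put(b"hello world!")  # different bytes, so a different identifier
```

Here `a` and `b` are equal and `store` holds two entries rather than three, which is the deduplication effect the complaint attributes to content-based identifiers.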

Key Claims at a Glance

  • The complaint asserts independent claim 10 and dependent claim 11 (’442 Patent, Compl. ¶52).
  • Essential elements of independent claim 10 include:
    • A method in a system with files distributed across multiple computers.
    • obtaining a name for a data file based at least in part on a function of the data within the file.
    • determining, using that name, whether a copy of the data file is present on one of the computers.
    • determining whether a present copy of the data file is an "unauthorized copy or an unlicensed copy."

U.S. Patent No. 7,802,310 - "Controlling Access to Data in a Data Processing System," issued September 21, 2010 (’310 Patent)

The Invention Explained

  • Problem Addressed: The patent addresses the same problems as the ’442 Patent regarding the management of data in distributed networks using conventional, location-based naming schemes (’310 Patent, col. 1:11-2:46).
  • The Patented Solution: The invention describes a system for controlling access to data using content-dependent names. A "first computer" (e.g., a server) receives a request from a "second computer" (e.g., a client) that includes a content-dependent name for a data item. The first computer compares this name to a database of values to determine if access is authorized. If authorized, the system allows the data item to be provided to or accessed by the second computer (’310 Patent, Abstract). The process is depicted in flowcharts such as FIG. 28, which shows a lookup process for a "True Name" and logic for forwarding the request (’310 Patent, col. 46:3-9).
  • Technical Importance: The solution provides a method for implementing access control in a system where data is identified by its content rather than its location, a key function for secure content delivery.
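
The request-and-compare flow summarized above can be illustrated with a short Python sketch. It assumes a single in-memory "first computer" and a set of permitted names; the class `FirstComputer`, its methods, and the use of SHA-256 are hypothetical choices made for illustration and are not drawn from the patent or the complaint.

```python
import hashlib
from typing import Dict, Optional, Set

def content_name(data: bytes) -> str:
    """Content-dependent name: a hash of the data item's bytes."""
    return hashlib.sha256(data).hexdigest()

class FirstComputer:
    """Hypothetical server holding data items and a list of permitted names."""

    def __init__(self) -> None:
        self.items: Dict[str, bytes] = {}   # content-dependent name -> data
        self.authorized: Set[str] = set()   # values the requested name is compared to

    def add(self, data: bytes, authorized: bool = True) -> str:
        name = content_name(data)
        self.items[name] = data
        if authorized:
            self.authorized.add(name)
        return name

    def handle_request(self, name: str) -> Optional[bytes]:
        """Compare the requested name to the stored values; serve only on a match."""
        if name not in self.authorized:
            return None  # no corresponding value, so access is refused
        return self.items.get(name)

server = FirstComputer()
n1 = server.add(b"public page")
n2 = server.add(b"withdrawn page", authorized=False)
```

In this sketch a "second computer" would call `server.handle_request(n1)` and receive the data, while a request for `n2` or for an unknown name returns nothing.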

Key Claims at a Glance

  • The complaint asserts independent claim 69 (’310 Patent, Compl. ¶60).
  • Essential elements of independent claim 69 include:
    • A system in a network of computers with hardware and software.
    • The system is configured to receive a request for a data item at a first computer from a second computer, where the request includes a "content-dependent name" generated by a hash or message digest function.
    • In response, the system compares the content-dependent name to a plurality of values.
    • The system determines if access is authorized based on whether the name corresponds to one of the values.
    • The system allows access if it is not determined to be unauthorized.

U.S. Patent No. 7,945,544 - "Similarity-Based Access Control of Data in a Data Processing System," issued May 17, 2011 (’544 Patent)

  • Technology Synopsis: This patent describes a method for creating a "digital key" for a file by first applying a hash function to individual parts of the file to create "part values" (or fingerprints), and then applying a second hash function to those part values to create the final key for the entire file (Compl. ¶70). This hierarchical key is then added to a database and can be used as a search key to find matching files (Compl. ¶¶71-73).
  • Asserted Claims: Claims 46, 48, 52, and 55 (with 46 being independent) (Compl. ¶67).
  • Accused Features: The complaint alleges that Shopify’s system infringes by generating fingerprints for individual asset files (the "first function" on "parts"), including those fingerprints in the asset URIs within a webpage file, and then generating an ETag for the entire webpage file using a hash function on its contents, which includes the asset URIs (the "second function") (Compl. ¶¶69-70).
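
The two-level hashing described in the Technology Synopsis can be sketched as follows. This is an illustrative sketch only: the fixed-size split into parts, the function names, the `index` dictionary standing in for the database, and the use of SHA-256 for both functions are assumptions, not details taken from the ’544 Patent or the complaint.

```python
import hashlib
from typing import Dict, List

def part_values(data: bytes, part_size: int = 4) -> List[str]:
    """First function: hash each fixed-size part of the file separately."""
    return [
        hashlib.sha256(data[i:i + part_size]).hexdigest()
        for i in range(0, len(data), part_size)
    ]

def digital_key(data: bytes, part_size: int = 4) -> str:
    """Second function: hash the concatenated part values into one key."""
    joined = "".join(part_values(data, part_size)).encode("ascii")
    return hashlib.sha256(joined).hexdigest()

# Hypothetical database mapping each digital key to the files that produced it.
index: Dict[str, List[str]] = {}

def register(name: str, data: bytes) -> None:
    index.setdefault(digital_key(data), []).append(name)

def find(data: bytes) -> List[str]:
    """Use the digital key as a search key to find matching files."""
    return index.get(digital_key(data), [])

register("a.bin", b"abcdefgh")
```

A change to any single part changes that part's value and therefore the key for the whole file, so `find` returns a match only for byte-identical content.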

U.S. Patent No. 8,099,420 - "Accessing Data in a Data Processing System," issued January 17, 2012 (’420 Patent)

  • Technology Synopsis: This patent claims a system that uses content-dependent digital identifiers to manage data access. The system determines identifiers for data items using a given function and uses a database of these identifiers to selectively permit or deny access to the data item within a network of computers (Compl. ¶¶81-82).
  • Asserted Claims: Claims 25, 26, 27, 29, 30, 32, 34–36, and 166 (with 166 being independent) (Compl. ¶78).
  • Accused Features: The complaint accuses Shopify's system of using hash functions to create content-dependent identifiers (ETags) for webpage files and maintaining databases of these ETags on its servers (Compl. ¶¶81, 83). The system allegedly uses these identifiers in conditional GET requests to selectively determine whether a requesting computer can access its cached content or must receive new content (Compl. ¶83).

III. The Accused Instrumentality

Product Identification

  • The accused instrumentality is the "shopify.com" website and its associated system and method for distributing webpage content to users (Compl. ¶¶31, 33).

Functionality and Market Context

  • The complaint alleges that Shopify's system operates a content delivery architecture that uses content-based identifiers to manage caching and reduce bandwidth. The system is alleged to use two primary types of identifiers:
    1. Asset File "Fingerprints": For assets like scripts and stylesheets, a hash function is applied to the file's content to generate a "fingerprint," which is then inserted into the asset's filename and its corresponding URI (Compl. ¶¶34, 38). When an asset's content changes, a new fingerprint and URI are generated (Compl. ¶39).
    2. Webpage File "ETags": For the main HTML webpage files, an "ETag" value is generated by applying a hash function to the sequence of bits comprising the file's content (Compl. ¶41).
  • This system allegedly uses these identifiers to control caching via "conditional" HTTP GET requests containing an "If-None-Match" header. A browser or intermediate cache sends the ETag of its stored version to the server. If the server's version has the same ETag, it responds with a "304 Not Modified" message, saving bandwidth. If the ETags do not match, the server sends the new file along with its new ETag (Compl. ¶¶43, 45-48). This process is alleged to allow for the efficient update of cached information and reduce server overhead (Compl. ¶36).
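
The fingerprinting and ETag validation behavior alleged above can be sketched generically in Python. This is a simplified illustration of standard HTTP cache validation, not Shopify's actual code: the function names, the 16-character truncated SHA-256 fingerprint, and the single-value comparison (a real "If-None-Match" header may list several ETags or weak validators) are assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple

def fingerprinted_name(filename: str, content: bytes) -> str:
    """Insert a content hash into an asset filename, e.g. app-<hash>.js."""
    stem, dot, ext = filename.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:16]
    if not dot:  # no extension present
        return f"{filename}-{digest}"
    return f"{stem}-{digest}.{ext}"

def make_etag(content: bytes) -> str:
    """ETag derived from the bytes of the file (quoted, as in HTTP headers)."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'

def conditional_get(
    current: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with an optional If-None-Match."""
    etag = make_etag(current)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""   # cached copy is current; no body sent
    return 200, {"ETag": etag}, current   # send the new file with its new ETag

page_v1 = b"<html>v1</html>"
page_v2 = b"<html>v2</html>"
cached_etag = make_etag(page_v1)
```

With these values, `conditional_get(page_v1, cached_etag)` yields a 304 with an empty body, while `conditional_get(page_v2, cached_etag)` yields a 200 carrying the new content and a new ETag.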

No probative visual evidence provided in complaint.

IV. Analysis of Infringement Allegations

’442 Patent Infringement Allegations

| Claim Element (from Independent Claim 10) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| a method, in a system in which a plurality of files are distributed across a plurality of computers | Defendant's system distributes webpage files across production servers, origin servers, intermediate cache servers, and endpoint browser caches. | ¶53 | col. 1:47-54 |
| obtaining a name for a data file, the name being based at least in part on a given function of the data, wherein the data used by the function comprises the contents of the particular file | Defendant generates or obtains ETags for its webpage files using a hash function, where the ETags are based on the contents of those files. | ¶54 | col. 13:5-15 |
| determining, using at least the name, whether a copy of the data file is present on at least one of said computers | Defendant's origin servers and intermediate cache servers, upon receiving a conditional GET request with an ETag, compare the ETag to determine if a copy of the content with that ETag is present. | ¶55 | col. 3:36-40 |
| determining whether a copy of the data file that is present on a at least one of said computers is an unauthorized copy or an unlicensed copy of the data file | If the ETag received in a request matches the server's stored ETag, the downstream copy is determined to be authorized; if there is no match, it is determined to be unauthorized. | ¶56 | col. 40:24-30 |

’310 Patent Infringement Allegations

| Claim Element (from Independent Claim 69) | Alleged Infringing Functionality | Complaint Citation | Patent Citation |
|---|---|---|---|
| A system operable in a network of computers, the system comprising hardware including at least a processor, and software, in combination with said hardware | Defendant's system includes a network of computers (production servers, origin servers, caches) comprising hardware with processors and utilizing software such as a web development framework and HTTP protocol software. | ¶61 | col. 5:5-9 |
| (a) to receive at a first computer, from a second computer, a request regarding a data item, said request including at least a content-dependent name for the data item, the content-dependent name being based at least in part on a function of the data ... wherein the function that was used is a message digest function or a hash function... | An upstream cache or origin server (first computer) receives conditional GET requests from a downstream browser or cache (second computer). The requests include URIs containing content-based fingerprints generated by hashing the contents of the asset files. | ¶62 | col. 37:25-35 |
| (b) in response to said request: (i) to cause the content-dependent name of the data item to be compared to a plurality of values; and (ii) to determine if access to the data item is authorized or unauthorized based on whether or not the content-dependent name corresponds to at least one of said plurality of values... | The upstream server maintains a plurality of URI values and compares the URI value received in the request to its stored values. This comparison determines whether the content at the downstream computer is still authorized for use. | ¶63 | col. 38:4-10 |

  • Identified Points of Contention:
    • Scope Questions: A central dispute may concern whether the patent family’s terminology, which describes a comprehensive, content-based file system (e.g., "True Name," "file"), can be read to cover the specific technical components of a web caching system (e.g., "ETag," "fingerprinted URI"). Further, the ’442 Patent is titled "Enforcement and Policing of Licensed Content," which raises the question of whether the claimed method of determining if a file is "unauthorized or unlicensed" is limited to a rights-management context or if it can encompass a technical check for stale cache content, as alleged.
    • Technical Questions: What evidence does the complaint provide that an ETag mismatch, which technically signifies that a cached file is out-of-date, performs the specific function of determining that a copy is "unauthorized or unlicensed" as required by claim 10 of the ’442 Patent? The infringement theory appears to equate a technical state (staleness) with a legal or permissive status (unauthorized).

V. Key Claim Terms for Construction

  • The Term: "unauthorized copy or an unlicensed copy" (’442 Patent, cl. 10)

    • Context and Importance: The viability of the infringement allegation for the ’442 Patent may hinge on the construction of this term. The complaint alleges that an ETag mismatch is a determination that a cached file is "unauthorized" (Compl. ¶56). Practitioners may focus on this term because a defendant could argue that a cache validation check is a purely technical function to ensure content freshness and is distinct from a determination of authorization or licensing status.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The specification states that a goal of the system is to "verify that data retrieved from another location is the desired or requested data" (’442 Patent, col. 4:24-27). This language could be interpreted broadly to include verifying that data is the most current, authorized version.
      • Evidence for a Narrower Interpretation: The patent’s title specifically refers to "Licensed Content," and the specification describes tracking files for "licensing purposes" (’442 Patent, Title; col. 7:8-9). Claim 10 itself recites "unlicensed copy," reinforcing a rights-management context. This evidence may support a narrower construction limited to determinations of legal rights or permissions, not merely content staleness.
  • The Term: "obtaining a name for a data file" (’442 Patent, cl. 10)

    • Context and Importance: The infringement theory depends on construing the generation of an ETag as "obtaining a name." A defendant may argue that an ETag is a cache validator or metadata tag, not a "name" in the sense of a file identifier as contemplated by the patent.
    • Intrinsic Evidence for Interpretation:
      • Evidence for a Broader Interpretation: The patent defines its "True Name" as a "substantially unique data identifier" based on content (’442 Patent, col. 6:8-10). An ETag, being a content-based hash, may fit this functional definition of an identifier.
      • Evidence for a Narrower Interpretation: The specification consistently discusses "True Names" as an alternative to conventional "pathname or contextual name" schemes within a file system hierarchy (’442 Patent, col. 8:25-28, FIG. 2). This context may support a narrower interpretation where the "name" must serve as the primary identifier for a file within a file system, a role an ETag does not typically perform.

VI. Other Allegations

  • Indirect Infringement: The complaint alleges that Defendant "caused" its servers and downstream caches to perform the claimed steps (Compl. ¶¶54-55, 62-63). These allegations suggest a theory of induced infringement based on Defendant’s control over the architecture and operation of its content delivery system, which instructs browsers and caches on how to request and validate content.

VII. Analyst’s Conclusion: Key Questions for the Case

  • A core issue will be one of definitional scope: can the term "unauthorized copy," which in the context of the ’442 Patent’s title and specification suggests a rights-management or licensing framework, be construed to cover a technically stale data file identified by an ETag mismatch in a web caching system?
  • A key related question will be one of functional mapping: does the generation and comparison of ETags and fingerprinted URIs for the purpose of ensuring cache coherency perform the same function as the patents’ claimed system of "obtaining a name" and using it to "control access" or determine "authorization" in a distributed file system?