ownCloud
Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Back to homepage

24. Messagepack metadata

Context and Problem Statement

File metadata management is an important aspect for oCIS as a data platform. While using extended attributes to store metadata allows attaching the metadata to the actual file it causes a significant amount of syscalls that outweigh the benefits. Furthermore, filesystems are subject to different limitations in the number of extended attributes or the value size that is available.

Decision Drivers

Performance of reading extended attributes suffers from the syscall overhead when listing and reading all attributes. Getting rid of limitations imposed by the filesystem used to store decomposedfs metadata.

Considered Options

Going back to the original ADR-0016 Storage for Files Metadata we decided to use a dedicated file for metadata storage next to the decomposedfs file representing the node. Several options for the data format were considered:

  • Use JSON files to store metadata
  • Use INI files to store metadata
  • Use msgpack files to store metadata
  • Use protobuf messages to store metadata

Decision Outcome

Chosen option: “msgpack files”, because we want to stay with a self describing binary format. This is a performance tradeoff that is faster and more efficient than text based formats and more flexible but less efficient than protobuf.

Note: directory listings are still read from the storage and remain uncached.

Positive Consequences:

  • Way less syscalls
  • Node metadata can easily be cached, avoiding all trips to the storage until a file changes.

Negative Consequences:

  • We need to migrate existing metadata
  • We need to build tooling that allows manipulating metadata similar to setfattr and getfattr.

Pros and Cons of the Options

Ini files

  • Good, human readable
  • Good, self describing
  • Good, widely used and well understood
  • Good, suited for key value like content - exactly what we need for extended attributes
  • Bad, slower and less efficient than binary formats

JSON files

  • Good, human readable
  • Good, self describing
  • Good, widely used and well understood
  • Good, could be used for more than just key value
  • Bad, slower and less efficient than binary formats

Msgpack files

  • Good, self describing
  • Good, efficient because it is binary encoded
  • Good, could be used for more than just key value
  • Bad, not human readable - requires tooling to manipulate safely

protobuf files

  • Good, very efficient because it is binary encoded
  • Good, could be used for more than just key value
  • Bad, not human readable
  • Bad, not self describing - requires tooling to evolve the messages