24. Messagepack metadata
- Status: accepted
- Deciders: @butonic, @aduffeck, @micbar, @dragotin
- Date: 2023-03-15
File metadata management is an important aspect of oCIS as a data platform. Storing metadata in extended attributes keeps it attached to the actual file, but it causes a significant number of syscalls whose cost outweighs that benefit. Furthermore, filesystems impose different limits on the number of extended attributes and on the size of each value.
Two drivers motivate this decision. First, the performance of reading extended attributes suffers from the syscall overhead of listing and then reading every attribute. Second, we want to get rid of the limitations imposed by the filesystem used to store decomposedfs metadata.
Going back to the original ADR-0016 Storage for Files Metadata, we decided to use a dedicated metadata file stored next to the decomposedfs file that represents the node. Several options for the data format were considered:
- Use JSON files to store metadata
- Use INI files to store metadata
- Use msgpack files to store metadata
- Use protobuf messages to store metadata
Chosen option: “msgpack files”, because we want to stay with a self describing binary format. This is a performance tradeoff: msgpack is faster and more efficient than text based formats, and it is more flexible than protobuf, although less efficient.
Note: directory listings are still read from the storage and remain uncached.
- Far fewer syscalls
- Node metadata can easily be cached, avoiding all trips to the storage until a file changes.
- We need to migrate existing metadata
- We need to build tooling that allows manipulating metadata similar to `setfattr` and `getfattr`.
Use INI files to store metadata:
- Good, human readable
- Good, self describing
- Good, widely used and well understood
- Good, suited for key value like content - exactly what we need for extended attributes
- Bad, slower and less efficient than binary formats
Use JSON files to store metadata:
- Good, human readable
- Good, self describing
- Good, widely used and well understood
- Good, could be used for more than just key value
- Bad, slower and less efficient than binary formats
Use msgpack files to store metadata:
- Good, self describing
- Good, efficient because it is binary encoded
- Good, could be used for more than just key value
- Bad, not human readable - requires tooling to manipulate safely
Use protobuf messages to store metadata:
- Good, very efficient because it is binary encoded
- Good, could be used for more than just key value
- Bad, not human readable
- Bad, not self describing - requires tooling to evolve the messages