GPTar
Written by Reynir Björnsson, Feb 17, 2024
At Robur we developed a piece of software for mirroring or exposing an opam repository. We have it deployed at opam.robur.coop, and you can use it as an alternative to opam.ocaml.org. It is usually more up-to-date with the git opam-repository than opam.ocaml.org although in the past it suffered from occasional availability issues. I can recommend reading Hannes' post about opam-mirror. This article is about adding a partition table to the disk as used by opam-mirror. For background I can recommend reading the previously linked subsection of the opam-mirror article.
The opam-mirror persistent storage scheme
Opam-mirror uses a single block device for its persistent storage. On the block device it stores cached source code archives from the opam repository. These are stored in a tar archive consisting of files whose file name is the sha256 checksum of the file contents. Furthermore, at the end of the block device some space is allocated for dumping the cloned git state of the upstream (git) opam repository as well as caches storing maps from md5 and sha512 checksums respectively to the sha256 checksums. The partitioning scheme is entirely decided by command line arguments. In other words, there is no partition table on the disk image.
This scheme has the nice property that the contents of the tar archive can be inspected by regular tar utilities in the host system.
Due to the append-only nature of tar and in the presence of concurrent downloads a file written to the archive may be partial or corrupt.
Opam-mirror handles this by prepending a pending/ directory to partial downloads and a to-delete/ directory for corrupt downloads.
If there are no files after the failed download in the tar archive the file can be removed without any issues.
Otherwise a delete would involve moving all subsequent files further back in the archive - which is too error prone to do robustly.
So using the tar utilities in the host we can inspect how much garbage has accumulated in the tar file system.
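As a rough illustration of that kind of inspection, here is a minimal Python sketch using the standard tarfile module. It assumes only the pending/ and to-delete/ prefixes described above; the names garbage_report and demo_archive, and the file names in the demo, are made up for illustration and are not part of opam-mirror.

```python
import io
import tarfile

GARBAGE_PREFIXES = ("pending/", "to-delete/")

def garbage_report(archive):
    """Sum member sizes per category for a tar archive read from a file object."""
    totals = {"live": 0, "pending/": 0, "to-delete/": 0}
    with tarfile.open(fileobj=archive, mode="r:") as tar:
        for member in tar:
            # Anything without a garbage prefix counts as a live cached archive.
            category = next(
                (p for p in GARBAGE_PREFIXES if member.name.startswith(p)), "live"
            )
            totals[category] += member.size
    return totals

def demo_archive():
    """Build a small in-memory archive (hypothetical names, not real checksums)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [
            ("aaaa", b"complete download"),
            ("pending/bbbb", b"partial"),
            ("to-delete/cccc", b"corrupt!"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

if __name__ == "__main__":
    print(garbage_report(demo_archive()))
```

On a real disk image the same function could be given the image opened in binary mode, since the archive starts at offset 0.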
The big downside to this scheme is that since the disk partitioning is not stored on the disk the contents can easily become corrupt if the wrong offsets are passed on the command line. Therefore I have for a long time been wanting to use an on-disk partition table. The problem is both MBR and GPT (GUID Partition Table) store the table at the beginning of the disk. If we write a partition table at the beginning it is suddenly not a valid tar archive anymore. Of course, in Mirage we can just write and read the table at the end if we please, but then we lose the ability to inspect the partition table in the host system.
GPT header as tar file name
My first approach, which turned out to be a dead end, started when I realized that a GPT header consists of 92 bytes at the beginning followed by reserved space for the remainder of the LBA. The reserved space should be all zeroes, but it seems no one bothers to enforce this. What's more, a tar header starts with the file name in the first 100 bytes. This got me thinking we could embed a GPT header inside a tar header by using the GPT header as the tar header file name!
I started working on implementing this, but I quickly realized that 1) the tar header has a checksum, and 2) the GPT header has a checksum as well. Having two checksums that cover each other is tricky: updating one checksum affects the other. So I started reading a paper written by Martin Stigge et al. about reversing CRC, as the GPT header uses a CRC32 checksum. I ended up writing something that I knew was incorrect.
Next, I realized the GPT header's checksum only covers the first 92 bytes - that is, the reserved space is not checksummed!
I find this odd about GPT, as I do the fact that the reserved space should be all zeroes but no one checks.
This simplified things a lot as we don't have to reverse any checksums!
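To make this concrete, here is a small Python sketch (not the OCaml code from the project) that computes the header CRC32 and shows that bytes outside the first 92 do not influence it. The field layout follows the usual GPT header layout; the concrete values in example_header_sector are invented for a tiny 1024-sector disk, and the helper names are hypothetical.

```python
import struct
import zlib

GPT_HEADER_FMT = "<8sIIIIQQQQ16sQIII"  # 92 bytes in total
SECTOR = 512

def gpt_header_crc32(sector: bytes) -> int:
    """CRC32 of the GPT header with its own CRC field (bytes 16..19) zeroed."""
    (header_size,) = struct.unpack_from("<I", sector, 12)
    header = bytearray(sector[:header_size])
    header[16:20] = bytes(4)
    return zlib.crc32(header) & 0xFFFFFFFF

def example_header_sector() -> bytearray:
    """A made-up header sector; every value here is illustrative."""
    header = struct.pack(
        GPT_HEADER_FMT,
        b"EFI PART",   # signature
        0x00010000,    # revision 1.0
        92,            # header size
        0,             # header CRC32, filled in below
        0,             # reserved
        1,             # current LBA
        1023,          # backup LBA
        34,            # first usable LBA
        990,           # last usable LBA
        bytes(16),     # disk GUID (zeroed for the example)
        2,             # LBA of the partition entries
        128,           # number of partition entries
        128,           # size of one partition entry
        0,             # CRC32 of the partition entries (not computed here)
    )
    sector = bytearray(SECTOR)
    sector[:92] = header
    struct.pack_into("<I", sector, 16, gpt_header_crc32(bytes(sector)))
    return sector

if __name__ == "__main__":
    sector = example_header_sector()
    before = gpt_header_crc32(bytes(sector))
    # Bytes 148..155 are where a tar header keeps its checksum; they sit in
    # the GPT reserved space, so writing there leaves the CRC32 untouched.
    sector[148:156] = b"0012345\0"
    assert gpt_header_crc32(bytes(sector)) == before
```

The practical consequence is an ordering: compute the GPT CRC32 first, then the tar checksum, and neither needs to be revisited.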
Then I implemented a test binary that produces a half megabyte disk image with a hybrid GPT and tar header followed by a tar archive with a file test.txt whose content is "Hello, World!".
I had used the byte G as the link indicator.
In POSIX.1-1988 the link indicators A-Z are reserved for vendor specific extensions, and it seemed G was unused.
A mistake I made was to not update the tar header checksum - the ocaml-tar library doesn't support this link indicator value so I had manually updated the byte value in the serialized header but forgot to update the checksum.
This was easily remediated as the checksum is a simple sum of the bytes in the header.
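For reference, the checksum rule can be sketched in a few lines of Python. The sum runs over all 512 bytes of the header block, with the 8-byte checksum field at offset 148 counted as if it were filled with spaces. The helper names are hypothetical, and Python's tarfile module is used here only to produce and re-parse a header; it stands in for ocaml-tar.

```python
import tarfile

def tar_checksum(header: bytes) -> int:
    """Sum of all 512 header bytes, counting the checksum field as 8 spaces."""
    assert len(header) == 512
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:])

def set_link_indicator(header: bytes, indicator: bytes) -> bytes:
    """Patch the link indicator (byte 156) and rewrite the checksum field."""
    patched = bytearray(header)
    patched[156:157] = indicator
    # Six octal digits, a NUL and a space is the traditional encoding.
    digits = ("%06o" % tar_checksum(bytes(patched))).encode("ascii")
    patched[148:156] = digits + b"\0 "
    return bytes(patched)

if __name__ == "__main__":
    header = tarfile.TarInfo("GPTAR").tobuf(format=tarfile.USTAR_FORMAT)
    patched = set_link_indicator(header, b"G")
    # frombuf verifies the checksum and raises if it does not match.
    parsed = tarfile.TarInfo.frombuf(patched, "utf-8", "surrogateescape")
    print(parsed.name, parsed.type)
```

Patching byte 156 without rewriting the checksum field is exactly the mistake described above: the stored sum no longer matches and readers reject the header.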
The changes made are viewable on GitHub.
I also had to work around a bug in ocaml-tar.
GNU tar was successfully able to list the archive.
A quirk is that the archive will start with a dummy file GPTAR which consists of any remaining space in the first LBA (if the sector size is greater than 512 bytes) followed by the GPT header and the partition table.
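The size of that dummy file is simple arithmetic. The sketch below assumes that it spans the rest of LBA 0 after the 512-byte tar header, the GPT header in LBA 1, and a partition table of 128 entries of 128 bytes each; the table dimensions are an assumption, chosen because they reproduce the 16896 bytes that tar reports for GPTAR in the listing further down.

```python
TAR_BLOCK = 512     # a tar header always occupies 512 bytes
ENTRY_COUNT = 128   # assumed number of partition entries
ENTRY_SIZE = 128    # assumed size of one partition entry in bytes

def dummy_file_size(sector_size: int) -> int:
    """Bytes between the tar header in LBA 0 and the first real tar header."""
    rest_of_lba0 = sector_size - TAR_BLOCK
    gpt_header = sector_size  # the header occupies all of LBA 1
    table_bytes = ENTRY_COUNT * ENTRY_SIZE
    table_sectors = -(-table_bytes // sector_size)  # round up to whole sectors
    return rest_of_lba0 + gpt_header + table_sectors * sector_size

if __name__ == "__main__":
    for sector_size in (512, 4096):
        print(sector_size, dummy_file_size(sector_size))
```

With 512-byte sectors the first partition then starts at 512 + 16896 = 17408 bytes, which parted displays as 17.4kB.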
Protective MBR
Unfortunately, neither fdisk nor parted recognized the GPT partition table. I was able to successfully read the partition table using ocaml-gpt however. This puzzled me. Then I got a hunch: I had read about protective MBRs on the Wikipedia page on GPT. I had always thought it was optional and not needed in a new system such as Mirage that doesn't have to care too much about legacy code and operating systems.
So I started comparing the layout of MBR and tar.
The V7 tar format only uses the first 257 bytes of the 512 byte block.
The V7 format is distinguished from the UStar, POSIX/pax and old GNU tar formats by not having the string ustar at byte offset 257[1].
The master boot record format starts with the bootstrap code area.
In the classic format it is the first 446 bytes.
In the modern standard MBR format the first 446 bytes are mostly bootstrap code too, with the exception of a handful of bytes around offset 218 which are used for a timestamp.
This section overlaps with the tar V7 linked file name field.
In both formats these bytes can be zero without any issues, thankfully.
This is great!
This means we can put a tar header in the bootstrap code area of the MBR and have it be a valid tar header and MBR record at the same time.
The protective MBR has one partition of type 0xEE whose LBA starts at sector 1 and the number of LBAs should cover the whole disk, or be 0xFFFFFFFF (maximum representable number in unsigned 32 bit).
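In practice this means we can get away with only touching byte offsets 446-461 (the first 16-byte partition entry, which holds the type, starting LBA and LBA count) and 510-511 (the boot signature) for the protective MBR.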
The MBR does not have a checksum which also makes things easier.
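A minimal Python sketch of such a combined block follows. It is not the gptar implementation: the function name hybrid_lba0 is hypothetical, the CHS values are the conventional protective-MBR ones as far as I know, and the mode, name and size are taken from the tar listing shown later in this post. Note that the tar checksum covers all 512 bytes, including the MBR fields, so it has to be computed last.

```python
import struct
import tarfile

SECTOR = 512

def octal(value: int, width: int) -> bytes:
    """Zero-padded octal digits terminated by NUL, filling `width` bytes."""
    return f"{value:0{width - 1}o}".encode("ascii") + b"\0"

def hybrid_lba0(dummy_size: int, total_sectors: int) -> bytes:
    """One 512-byte block that is both a V7 tar header and a protective MBR."""
    block = bytearray(SECTOR)
    # V7 tar header, bytes 0..256.
    block[0:5] = b"GPTAR"                     # file name
    block[100:108] = octal(0o400, 8)          # mode
    block[108:116] = octal(0, 8)              # uid
    block[116:124] = octal(0, 8)              # gid
    block[124:136] = octal(dummy_size, 12)    # size of the dummy file
    block[136:148] = octal(0, 12)             # mtime
    block[156:157] = b"G"                     # link indicator
    # Protective MBR: first partition entry at 446..461.
    lba_count = min(total_sectors - 1, 0xFFFFFFFF)
    block[446:462] = struct.pack(
        "<B3sB3sII",
        0x00,              # boot indicator
        b"\x00\x02\x00",   # starting CHS (assumed conventional value)
        0xEE,              # partition type: GPT protective
        b"\xff\xff\xff",   # ending CHS (assumed conventional value)
        1,                 # starting LBA
        lba_count,         # number of LBAs
    )
    block[510:512] = b"\x55\xaa"              # boot signature
    # Tar checksum last: it sums every byte of the block, MBR fields included.
    block[148:156] = b" " * 8
    block[148:156] = octal(sum(block), 7) + b" "
    return bytes(block)

if __name__ == "__main__":
    lba0 = hybrid_lba0(dummy_size=16896, total_sectors=1024)
    info = tarfile.TarInfo.frombuf(lba0, "utf-8", "surrogateescape")
    print(info.name, info.size, info.type)
```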
Using this I could create a disk image that parted and fdisk recognized as a GPT partitioned disk!
With the caveat that they both reported that the backup GPT header was corrupt.
I had just copied the primary GPT header to the end of the disk.
It turns out that the alternate, or backup, GPT header should have the current LBA and backup LBA fields swapped (and the header crc32 recomputed).
I updated the ocaml-gpt code so that it can marshal alternate GPT headers.
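The transformation can be sketched as follows in Python; this is an illustration of the idea, not the ocaml-gpt code, and the function names and example values are invented. It does only what is described above: swap the two LBA fields (at byte offsets 24 and 32) and recompute the header CRC32. A real backup header would normally also point its partition entry LBA at the backup copy of the table, which is left out here.

```python
import struct
import zlib

def header_crc32(header: bytes) -> int:
    """CRC32 over the 92-byte header with the CRC field itself zeroed."""
    scratch = bytearray(header[:92])
    scratch[16:20] = bytes(4)
    return zlib.crc32(scratch) & 0xFFFFFFFF

def alternate_header(primary: bytes) -> bytes:
    """Derive the backup header from a 92-byte primary header."""
    current_lba, backup_lba = struct.unpack_from("<QQ", primary, 24)
    alt = bytearray(primary[:92])
    struct.pack_into("<QQ", alt, 24, backup_lba, current_lba)  # swap the fields
    struct.pack_into("<I", alt, 16, header_crc32(bytes(alt)))  # fix the CRC32
    return bytes(alt)

def example_primary() -> bytes:
    """A made-up primary header for a 1024-sector disk."""
    header = bytearray(struct.pack(
        "<8sIIIIQQQQ16sQIII",
        b"EFI PART", 0x00010000, 92, 0, 0,
        1, 1023,        # current LBA, backup LBA
        34, 990,        # first and last usable LBA
        bytes(16),      # disk GUID (zeroed for the example)
        2, 128, 128, 0, # entries LBA, entry count, entry size, entries CRC32
    ))
    struct.pack_into("<I", header, 16, header_crc32(bytes(header)))
    return bytes(header)

if __name__ == "__main__":
    alt = alternate_header(example_primary())
    print(struct.unpack_from("<QQ", alt, 24))
```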
Finally we can produce GPT partitioned disks that can be inspected with tar utilities!
$ /usr/sbin/parted disk.img print
WARNING: You are not superuser. Watch out for permissions.
Model: (file)
Disk /home/reynir/workspace/gptar/disk.img: 524kB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot
Number  Start   End     Size   File system  Name              Flags
 1      17.4kB  19.5kB  2048B               Real tar archive  hidden
$ tar -tvf disk.img
?r-------- 0/0 16896 1970-01-01 01:00 GPTAR unknown file type ‘G’
-r-------- 0/0 14 1970-01-01 01:00 test.txt
The code is freely available on GitHub.
Future work
One thing that bothers me a bit is the dummy file GPTAR.
By using the G link indicator GNU tar will print a warning about the unknown file type G, but it will still extract the dummy file when extracting the archive.
I have been thinking about what tar header I could put in the MBR so tar utilities will skip the partition table but not try to extract the dummy file.
Ideas I've had are to:
- Pretend it is a directory with a non-zero file size which is nonsense. I'm unsure what tar utilities would do in that case. I fear not all implementations will skip to the next header correctly as a non-zero directory is nonsense. I may give it a try and check how GNU tar, FreeBSD tar and ocaml-tar react.
- Say it is a PAX extended header and use a nonsense tag or attribute whose value covers the GPT header and partition table. The problem is the PAX extended header content format is <length> <tag>=<value>\n where <length> is the decimal string encoding of the length of the whole record, <length> itself and the separating space included. In other words it must start with the correct length. For sector size 512 this is a problem because the PAX extended header content would start with the GPT header which starts with the string EFI PART. If the sector size is greater than 512 we can use the remaining space in LBA 0 to write a length, dummy tag and some padding. I may try this for a sector size of 4096, but I'm not happy that it doesn't work with sector size 512 which solo5 will default to.
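To illustrate the constraint in the second idea, here is a small Python sketch of how a PAX extended header record is encoded. As I read the pax specification, the leading length counts the entire record including its own digits, so it has to be found by iteration. The function name pax_record and the tag used in the demo are placeholders, not something the project uses.

```python
def pax_record(tag: str, value: bytes) -> bytes:
    """Encode one '<length> <tag>=<value>\\n' record with a self-inclusive length."""
    body = b" " + tag.encode("utf-8") + b"=" + value + b"\n"
    length = len(body)
    while True:
        # The length field counts its own decimal digits, so iterate until stable.
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body

if __name__ == "__main__":
    record = pax_record("comment", b"x" * 90)
    print(record[:12], len(record))
```

Every record begins with ASCII decimal digits. With 512-byte sectors the extended header content would begin at LBA 1, whose first bytes are EFI PART, so no valid record can start there.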
If you have other ideas what I can do please reach out!
[1] This is somewhat simplified. There are some more nuances between the different formats, but for this purpose they don't matter much.