Commit https://github.com/QubesOS/qubes-linux-utils/commit/c1d42f1 --
"qfile-unpacker: do not call fdatasync() at each file" fixing
QubesOS/qubes-issues#1257 -- increased the chance of data loss with
qvm-move-to-vm: Say it nominally succeeds, and *deletes* the files from
the source VM. Soon after, the destination VM or the system could crash,
or an external drive hosting ~/QubesIncoming/srcVM could get unplugged
by accident, all before the data had really been persisted to disk.
But reverting the commit (ignoring the performance issue) wouldn't
completely solve this:
"Calling fsync() does not necessarily ensure that the entry in the
directory containing the file has also reached disk. For that an
explicit fsync() on a file descriptor for the directory is also
needed." - fsync(2)
It gets even worse for "slow symlinks" (whose target is too long to be
stored directly in the inode metadata): apparently they can't be synced
individually at all.
So instead, just call syncfs() once after everything has been unpacked:
+ Should prevent all data loss (if fs and disk are well behaved)
+ Allows caching and reordering -> no slowdown with many small files
- Blocks until any unrelated writes on the filesystem finish :\
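The single-sync approach can be sketched as below. This is a minimal
illustration, not the actual qfile-unpacker code: sync_filesystem_of() is a
hypothetical helper name, and it assumes Linux with glibc, where syncfs()
is available under _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* syncfs() is a Linux-specific glibc extension */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush the whole filesystem containing dir_path.
 * Meant to be called once, after every file has been unpacked.
 * Returns 0 on success, -1 on error with errno set. */
int sync_filesystem_of(const char *dir_path)
{
    assert(dir_path != NULL);

    int fd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    /* Any fd on the target filesystem will do; syncfs() flushes all of
     * its dirty data and metadata, including directory entries. */
    int ret = syncfs(fd);

    int saved_errno = errno;   /* keep syncfs()'s errno across close() */
    close(fd);
    errno = saved_errno;
    return ret;
}
```

The caller would report success to the source VM only after this returns 0,
so that the source files are not deleted before the data is on disk.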
The filesystem hosting ~/QubesIncoming/srcVM/ needs to support O_TMPFILE
too, in addition to the kernel. If it doesn't, take the use_tmpfile = 0
fallback.
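A minimal sketch of such a fallback, assuming the error codes documented in
open(2): EOPNOTSUPP when the filesystem lacks O_TMPFILE support, EISDIR
when an older kernel does. open_unnamed_in() is a hypothetical helper; only
the use_tmpfile flag name comes from the code being discussed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* O_TMPFILE is Linux-specific */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to create an unnamed file inside dir_path.
 * Returns the fd on success. Returns -1 on failure; if the failure means
 * that O_TMPFILE is unsupported here, *use_tmpfile is cleared so that
 * later files go straight to the ordinary open-by-name path. */
int open_unnamed_in(const char *dir_path, int *use_tmpfile)
{
    assert(dir_path != NULL && use_tmpfile != NULL);

    if (!*use_tmpfile)
        return -1;   /* already known not to work, don't retry */

    int fd = open(dir_path, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0 && (errno == EOPNOTSUPP || errno == EISDIR))
        *use_tmpfile = 0;   /* filesystem or kernel lacks O_TMPFILE */
    return fd;
}
```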
POSIX requires that a read(2) which can be proved to occur after a
write() has returned will return the new data.
Here we only need other processes in the same VM to see the file
either fully written or not at all. So ensuring that linkat(2) is
called after the write has completed should be enough.
Fixes QubesOS/qubes-issues#1257
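The write-then-link ordering can be sketched as follows. This is an
illustration under assumptions, not the unpacker's real code:
publish_file() is a hypothetical name, and the sketch reaches the fd
through the /proc/self/fd path documented in open(2) rather than through
a pre-opened procfs descriptor.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* O_TMPFILE is Linux-specific */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write buf to an unnamed file in dir, then give it
 * the name dir/name. Other processes cannot open the file by name until
 * linkat() runs, which happens only after all data has been written.
 * Returns 0 on success, -1 on error. */
int publish_file(const char *dir, const char *name,
                 const void *buf, size_t len)
{
    assert(dir != NULL && name != NULL && buf != NULL);

    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    char src[64], dst[PATH_MAX];
    snprintf(src, sizeof src, "/proc/self/fd/%d", fd);
    if (snprintf(dst, sizeof dst, "%s/%s", dir, name) >= (int)sizeof dst) {
        close(fd);
        return -1;
    }

    /* Make the finished file visible under its final name. */
    int ret = linkat(AT_FDCWD, src, AT_FDCWD, dst, AT_SYMLINK_FOLLOW);
    close(fd);
    return ret;
}
```

If the process dies before linkat(), the unnamed file simply disappears, so
a partially written file is never left behind under the final name.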
When a file is opened with O_TMPFILE but use_tmpfile==0, it will not be
linked into the directory (see the code at the end of
process_one_file_reg). Additionally, it is a waste of time to try
O_TMPFILE when it is already known that it shouldn't be used.
Also, use_tmpfile==0 can mean we don't have access to /proc
(set_procfs_fd wasn't called), so even if linking the file into its
directory were attempted, it would fail. This is the case for the
dom0-updates copy.
Otherwise the source domain can modify (append to) the file while the
user is already accessing it. While incoming files should be treated as
untrusted, this problem could allow the file to be modified after the
user has made some sanity checks.
By passing an empty file with a declared negative size,
a hostile VM can decrease the total bytes counter without
having to supply a huge amount of data, thus disabling
the byte size check and potentially filling the target
filesystem.
Also, do not rely on unpack being called just once if we don't
have to, and initialize the counts.
Since we don't know directory size before populating with files,
we just accumulate the size on the second pass, but do not actually
check for the limit being reached. If there's any file after that,
that'll trip the check.