Use seccomp policy to avoid unnecessary sync operations #44
talex5 merged 2 commits into ocurrent:master
Good plan! I hadn't realised you could force syscalls to succeed using seccomp too.
Hmm, looks like the version of |
Sync operations are really slow on btrfs. They're also pointless, since
if the computer crashes while we're doing a build then we'll just throw
it away and start again anyway.
This commit provides a seccomp policy that causes all sync operations to
"fail", with errno 0 ("success").
On my machine, this reduces the time to `apt-get install -y shared-mime-info`
from 18.5s to 4.7s.
Based on https://bblank.thinkmo.de/using-seccomp-to-filter-sync-operations.html
Use `--fast-sync` to enable the new behaviour (requires the latest runc).
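As a rough illustration of the policy described above (not the code from this PR), the rule can be sketched as a single entry in the `syscalls` list of an OCI seccomp section: the action is `SCMP_ACT_ERRNO` and `errnoRet` is 0, so the call "fails" with success. The exact list of syscall names below is an assumption based on the usual sync family, and `seccomp_syscalls`, `quote` and `json_list` are hypothetical helper names. The sketch builds the JSON by hand to stay free of dependencies.

```ocaml
(* Sketch only: the syscall list is an assumption, not taken from the PR. *)
let sync_syscalls =
  [ "fsync"; "fdatasync"; "msync"; "sync"; "syncfs"; "sync_file_range" ]

(* Wrap a plain ASCII name in double quotes for JSON. *)
let quote s = "\"" ^ s ^ "\""

(* Render a list of already-encoded JSON values as a JSON array. *)
let json_list items = "[" ^ String.concat ", " items ^ "]"

(* Returns the extra seccomp rules as JSON objects (one string per rule).
   With [fast_sync] off, no extra rules are added. *)
let seccomp_syscalls ~fast_sync =
  if not fast_sync then []
  else
    let names = json_list (List.map quote sync_syscalls) in
    [ "{\"names\": " ^ names
      ^ ", \"action\": \"SCMP_ACT_ERRNO\", \"errnoRet\": 0}" ]

let () = List.iter print_endline (seccomp_syscalls ~fast_sync:true)
```

Returning an empty list when the option is off keeps the default policy untouched, which matches the opt-in nature of `--fast-sync`.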
This should allow `linux32` to work.
    match get_machine () with
    | "x86_64" -> ["SCMP_ARCH_X86_64"; "SCMP_ARCH_X86"; "SCMP_ARCH_X32"]
    | "aarch64" -> ["SCMP_ARCH_AARCH64"; "SCMP_ARCH_ARM"]
    | _ -> []
could we enumerate this somehow so that it'll fail on an unknown arch? Otherwise we'll run into this when adding riscv-32 in the future. (or could just make a note to remember to update this somewhere when we get around to riscv32)
there is logic in https://github.com/avsm/osrelease/blob/master/lib/osrelease.ml that i could release that does all the arch detection (based on opam's), if that helps
Could do. But when we add a new multi-arch platform then we'll test it and discover the problem immediately anyway.
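For reference, the reviewer's suggestion of failing on an unknown architecture could look roughly like the sketch below. This is not what was merged: `seccomp_archs` is a hypothetical name, and it takes the machine string as an argument instead of calling `get_machine ()` so that it can be run on its own.

```ocaml
(* Hypothetical variant of the mapping under review: an unknown machine is
   reported as an error instead of silently yielding an empty list. *)
let seccomp_archs machine =
  match machine with
  | "x86_64" -> Ok ["SCMP_ARCH_X86_64"; "SCMP_ARCH_X86"; "SCMP_ARCH_X32"]
  | "aarch64" -> Ok ["SCMP_ARCH_AARCH64"; "SCMP_ARCH_ARM"]
  | other ->
    Error (Printf.sprintf "No seccomp architecture list for machine %S" other)

let () =
  match seccomp_archs "riscv32" with
  | Ok archs -> print_endline (String.concat ", " archs)
  | Error msg -> prerr_endline msg
```

The trade-off raised in the thread still applies: a hard failure surfaces a missing entry earlier, while the merged fallback relies on testing each new multi-arch platform.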
Merging now to fix cluster performance problems. Can be improved later if needed.
Might help with problems such as this:

```
[11030132.006555] INFO: task ocluster-worker:602217 blocked for more than 120 seconds.
[11030132.015596] Not tainted 5.4.0-40-generic #44-Ubuntu
[11030132.022547] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11030132.032061] ocluster-worker D 0 602217 1 0x00004000
[11030132.032069] Call Trace:
[11030132.032092] __schedule+0x2e3/0x740
[11030132.032106] ? __switch_to_asm+0x40/0x70
[11030132.032116] ? __switch_to_asm+0x34/0x70
[11030132.032126] schedule+0x42/0xb0
[11030132.032130] schedule_preempt_disabled+0xe/0x10
[11030132.032132] __mutex_lock.isra.0+0x182/0x4f0
[11030132.032142] ? try_to_del_timer_sync+0x54/0x80
[11030132.032145] __mutex_lock_slowpath+0x13/0x20
[11030132.032148] mutex_lock+0x2e/0x40
[11030132.032199] btrfs_start_delalloc_roots+0x60/0x280 [btrfs]
[11030132.032238] flush_space+0x5dd/0x740 [btrfs]
[11030132.032281] ? lock_extent_buffer_for_io+0x370/0x370 [btrfs]
[11030132.032325] ? __clear_extent_bit+0x201/0x4a0 [btrfs]
[11030132.032372] priority_reclaim_metadata_space.isra.0+0x18b/0x220 [btrfs]
[11030132.032429] ? can_overcommit.part.0+0x5f/0xc0 [btrfs]
[11030132.032466] btrfs_reserve_metadata_bytes+0x578/0x950 [btrfs]
[11030132.032501] ? btrfs_truncate_inode_items+0x35e/0xdb0 [btrfs]
[11030132.032505] ? __mutex_lock.isra.0+0x429/0x4f0
[11030132.032557] ? __btrfs_block_rsv_release+0x1c1/0x300 [btrfs]
[11030132.032595] btrfs_block_rsv_refill+0x7d/0xa0 [btrfs]
[11030132.032628] evict_refill_and_join+0x39/0xd0 [btrfs]
[11030132.032670] btrfs_evict_inode+0x417/0x4c0 [btrfs]
[11030132.032689] evict+0xd2/0x1b0
[11030132.032698] iput+0x148/0x210
[11030132.032708] dentry_unlink_inode+0xc6/0x110
[11030132.032720] d_delete+0x76/0x80
[11030132.032727] vfs_rmdir+0x179/0x1a0
[11030132.032732] do_rmdir+0x18c/0x1c0
[11030132.032736] __x64_sys_rmdir+0x17/0x20
[11030132.032744] do_syscall_64+0x57/0x190
[11030132.032747] entry_SYSCALL_64_after_hwframe+0x44/0xa9
```
CHANGES:

- Add support for nested / multi-stage builds (@talex5 ocurrent/obuilder#48 ocurrent/obuilder#49).
  This allows you to use a large build environment to create a binary and then copy that into a smaller runtime environment. It's also useful to get better caching if two things can change independently (e.g. you want to build your software and also a linting tool, and be able to update either without rebuilding the other).
- Add healthcheck feature (@talex5 ocurrent/obuilder#52).
  - Checks that Docker is running.
  - Does a test build using busybox.
- Clean up left-over runc containers on restart (@talex5 ocurrent/obuilder#53).
  If btrfs crashes and makes the filesystem read-only then after rebooting there will be stale runc directories. New jobs with the same IDs would then fail.
- Remove dependency on dockerfile (@talex5 ocurrent/obuilder#51).
  This also allows us more control over the formatting (e.g. putting a blank line between stages in multi-stage builds).
- Record log output from docker pull (@talex5 ocurrent/obuilder#46).
  Otherwise, it's not obvious why we've stopped at a pull step, or what is happening.
- Improve formatting of OBuilder specs (@talex5 ocurrent/obuilder#45).
- Use seccomp policy to avoid unnecessary sync operations (@talex5 ocurrent/obuilder#44).
  Sync operations are really slow on btrfs. They're also pointless, since if the computer crashes while we're doing a build then we'll just throw it away and start again anyway. Use a seccomp policy that causes all sync operations to "fail", with errno 0 ("success"). On my machine, this reduces the time to `apt-get install -y shared-mime-info` from 18.5s to 4.7s. Use `--fast-sync` to enable the new behaviour (it requires runc 1.0.0-rc92).
- Use a mutex to avoid concurrent btrfs operations (@talex5 ocurrent/obuilder#43).
  Btrfs deadlocks enough as it is. Don't stress it further by trying to do two things at once.

Internal changes:

- Improve handling of file redirections (@talex5 ocurrent/obuilder#46).
  Instead of making the caller do all the work of closing the file descriptors safely, add an `FD_move_safely` mode.
- Travis tests: ensure apt cache is up-to-date (@talex5 ocurrent/obuilder#50).