On Fri, May 02, 2014 at 01:11:03PM -0400, Dave Jones wrote:
> On Fri, May 02, 2014 at 09:44:42AM -0700, Josh Triplett wrote:
> > Topics:
> > - Kconfig, and avoiding excessive configurability in the pursuit of tiny
> > - Optimizing a kernel for its exact target userspace.
> > - Examples of shrinking the kernel
>
> Something that's partially related here: Making stuff optional
> reduces attack surface the kernel presents. We're starting to grow
> more and more CONFIG options to disable syscalls. I'd like to hear
> people's reactions on introducing even more optionality in this area.

I'd certainly like to see just about every syscall made optional, for
userspace that doesn't need it. For specialized systems, that would
decrease attack surface.

However, a seccomp filter can decrease attack surface by the same amount,
and for all but those specialized systems it would make more sense,
because the set of available syscalls can then change with a simple
policy change rather than a new kernel. Neither approach frees us from
the obligation to make all new APIs secure against hostile userspace.

> I had a patch to make this particular syscall a cond_syscall, but then
> XFS ate my homework and I haven't had chance to revisit this.
>
> So, my questions are:
> - are there other obvious syscalls we could make optional without userspace
>   freaking out when they suddenly start getting ENOSYS ?

I've attached a complete list of the syscalls from include/linux/syscalls.h
that do not appear in kernel/sys_ni.c, and thus always exist.
(syscalls.h notably does not include all the arch-specific syscalls, some
of which might make sense to leave out as well.)

Of those, a few classes of syscalls seem like obvious candidates for
various kinds of specialized or legacy-free systems:

- For any syscall that has gained a foo2, foo3, etc., a single config
  option to leave out all the older versions would make sense, to go with
  userspace that never calls the older versions.
- Likewise, the non-64 file calls.
- Likewise, sys_old*
- splice/vmsplice/tee
- sys_*sync*
- sys_clock_* and any other time functions
- sys_sched_*
- All signal-related syscalls
- rlimit syscalls
- sys_*xattr*
- sys_nice
- sys_cap{get,set}
- fadvise, fallocate, readahead, etc.
- uid/gid functions
- ioperm/iopl
- ptrace
- sendfile
- times
- utimes and company

> - how much configurability here is too much ?
>
> r_f_p was an obvious candidate because it's.. well, nasty. Some of the
> more straightforward syscalls may not be such a big deal, but then we
> have CONFIG's for kcmp and other 'simple' syscalls already..

We need a more systematic mechanism, I think. A CONFIG_SYSCALL_FOO option
for every possible FOO seems like too much, even if each FOO covers a
whole class of syscalls. Ideally, we could feed in a table of syscalls
collected by some analysis of the target userspace, and the kernel would
then include exactly those syscalls.

- Josh Triplett
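A minimal sketch of the seccomp alternative discussed above, assuming Linux on x86_64 or aarch64 with kernel headers new enough to define SECCOMP_RET_KILL_PROCESS (4.14+). It installs a classic-BPF filter that makes one chosen syscall fail with ENOSYS, which is what userspace would see if that syscall had been configured out, and allows everything else. The helper deny_syscall_enosys() and the SKETCH_AUDIT_ARCH macro are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper macro: the audit arch this sketch expects. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Hypothetical helper: make syscall `nr` return ENOSYS for this process
 * and its descendants, as if the kernel had been built without it.
 * Returns 0 on success, -1 with errno set on failure. */
static int deny_syscall_enosys(unsigned int nr)
{
    struct sock_filter filter[] = {
        /* Syscall numbers are only meaningful per ABI, so kill the
         * process if the call comes from an unexpected architecture. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
        /* x32 calls share AUDIT_ARCH_X86_64 but set bit 30; refuse them
         * with ENOSYS so they cannot bypass the number check below. */
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 1, 0),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        /* Denied: fail with ENOSYS instead of entering the kernel code. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        /* Everything else runs normally. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}
```

For example, deny_syscall_enosys(SYS_remap_file_pages) would approximate a kernel built without remap_file_pages, without rebuilding anything. Seccomp filters can only be added, never removed, and are inherited across fork() and execve(), so the restriction holds for the whole process tree below the point of installation. A real policy covering many syscalls would more likely be generated from a list (for instance with libseccomp) than written out by hand like this.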