From: Andy Lutomirski
Date: Sun, 31 May 2020 14:03:48 -0700
Subject: Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas
To: Andy Lutomirski
Cc: Paul Gofman, Gabriel Krisman Bertazi, Linux-MM, LKML, kernel@collabora.com, Thomas Gleixner, Kees Cook, Will Drewry, "H. Peter Anvin", Zebediah Figura

On Sun, May 31, 2020 at 11:57 AM Andy Lutomirski wrote:
>
> What if there was a special filter type that ran a BPF program on each
> syscall, and the program was allowed to access user memory to make its
> decisions, e.g. to look at some list of memory addresses. But this
> would explicitly *not* be a security feature -- execve() would remove
> the filter, and the filter's outcome would be one of redirecting
> execution or allowing the syscall. If the "allow" outcome occurs,
> then regular seccomp filters run. Obviously the exact semantics here
> would need some care.

Let me try to flesh this out a little.

A task could install a syscall emulation filter (maybe using the
seccomp() syscall, maybe using something else). There would be at most
one such filter per process. Upon doing a syscall, the kernel will
first do initial syscall fixups (e.g. SYSENTER/SYSCALL32 magic argument
translation) and would then invoke the filter.

The filter is an eBPF program (sorry Kees) and, as input, it gets
access to the task's register state and to an indication of which type
of syscall entry this was. This will inherently be rather architecture
specific -- x86 choices could be int80, int80(translated), and
syscall64. (We could expose SYSCALL32 separately, I suppose, but
SYSENTER is such a mess that I'm not sure this would be productive.)

The program can access user memory, and it returns one of two results:
allow the syscall or send SIGSYS. If the program tries to access user
memory and faults, the result is SIGSYS.

(I would love to do this with cBPF, but I'm not sure how to pull this off.
Accessing user memory is handy for making the lookup flexible enough to
detect Windows vs Linux. It would be *really* nice to finally settle
the unprivileged eBPF subset discussion so that we can figure out how
to make eBPF work here.)

execve() clears the filter. clone() copies the filter.

Does this seem reasonable? Is the implementation complexity small
enough? Is the eBPF thing going to be a showstopper?

Using a signal instead of a bespoke thunk simplifies a lot of thorny
details, but it is also slower, by enough that catching all syscalls
might be a performance problem.
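
To make the proposed semantics concrete, here is a rough user-space toy
model in Python. It is not kernel code and not an eBPF program. Every
name in it (Task, Regs, UserMemory, range_filter, RANGE_ADDR) is made up
for illustration, and the example filter's policy (allow only syscalls
issued from one address range that it reads from user memory) is just
one assumed use. What the model does take from the description above:
the filter sees register state plus the entry type, it may read user
memory, a fault while reading yields SIGSYS, execve() clears the filter,
and clone() copies it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Entry(Enum):
    # The x86 entry types suggested in the mail.
    INT80 = "int80"
    INT80_TRANSLATED = "int80(translated)"
    SYSCALL64 = "syscall64"


class Verdict(Enum):
    ALLOW = "allow"    # regular seccomp filters would run after this
    SIGSYS = "sigsys"  # the task gets SIGSYS instead of the syscall


class UserFault(Exception):
    """Models a fault while the filter reads user memory."""


@dataclass
class Regs:
    ip: int  # instruction pointer at syscall entry
    nr: int  # syscall number


class UserMemory:
    """Sparse model of user memory: address -> 64-bit word."""

    def __init__(self, words: Dict[int, int]):
        self._words = dict(words)

    def read_u64(self, addr: int) -> int:
        if addr not in self._words:
            raise UserFault(hex(addr))
        return self._words[addr]


Filter = Callable[[Regs, Entry, UserMemory], Verdict]


class Task:
    def __init__(self, mem: UserMemory, emu_filter: Optional[Filter] = None):
        self.mem = mem
        self.emu_filter = emu_filter  # at most one filter per task

    def syscall_entry(self, regs: Regs, entry: Entry) -> Verdict:
        if self.emu_filter is None:
            return Verdict.ALLOW
        try:
            return self.emu_filter(regs, entry, self.mem)
        except UserFault:
            # A faulting user-memory access results in SIGSYS.
            return Verdict.SIGSYS

    def clone(self) -> "Task":
        # clone() copies the filter (memory is shared here for simplicity).
        return Task(self.mem, self.emu_filter)

    def execve(self) -> None:
        # execve() clears the filter.
        self.emu_filter = None


RANGE_ADDR = 0x1000  # hypothetical location of a [start, end) pair


def range_filter(regs: Regs, entry: Entry, mem: UserMemory) -> Verdict:
    # Assumed policy: allow syscalls issued from inside [start, end).
    start = mem.read_u64(RANGE_ADDR)
    end = mem.read_u64(RANGE_ADDR + 8)
    return Verdict.ALLOW if start <= regs.ip < end else Verdict.SIGSYS
```

With a Task whose memory holds a range at RANGE_ADDR, a syscall from
inside the range is allowed and one from outside gets SIGSYS. A Task
whose memory lacks those words gets SIGSYS on every syscall. The model
says nothing about what happens when a second filter is installed, since
that is left open above.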