From: Jeff Xu
Date: Mon, 16 Dec 2024 10:35:06 -0800
Subject: Re: [PATCH v4 1/1] exec: seal system mappings
To: Andrei Vagin
Cc: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com,
 torvalds@linux-foundation.org, adhemerval.zanella@linaro.org,
 oleg@redhat.com, linux-kernel@vger.kernel.org,
 linux-hardening@vger.kernel.org, linux-mm@kvack.org,
 jorgelo@chromium.org, sroettger@google.com, ojeda@kernel.org,
 adobriyan@gmail.com, anna-maria@linutronix.de, mark.rutland@arm.com,
 linus.walleij@linaro.org, Jason@zx2c4.com, deller@gmx.de,
 rdunlap@infradead.org, davem@davemloft.net, hch@lst.de,
 peterx@redhat.com, hca@linux.ibm.com, f.fainelli@gmail.com,
 gerg@kernel.org, dave.hansen@linux.intel.com, mingo@kernel.org,
 ardb@kernel.org, Liam.Howlett@oracle.com, mhocko@suse.com,
 42.hyeyoo@gmail.com, peterz@infradead.org, ardb@google.com,
 enh@google.com, rientjes@google.com, groeck@chromium.org,
 mpe@ellerman.id.au, Dmitry Safonov <0x7f454c46@gmail.com>,
 Mike Rapoport, Alexander Mikhalitsyn, Andrei Vagin
References: <20241125202021.3684919-1-jeffxu@google.com>
 <20241125202021.3684919-2-jeffxu@google.com>

Hi Andrei

On Thu, Dec 12, 2024 at 10:33 PM Andrei Vagin wrote:
>
> On Wed, Dec 11, 2024 at 2:47 PM Jeff Xu wrote:
> >
> > Hi Andrei
> >
> > Thanks for your email.
> > I was hoping to get some feedback from CRIU devs, and happy to see you
> > reaching out..
> >
> ...
> > I have been thinking of other alternatives, but those would require
> > more understanding on CRIU use cases.
> > One of my questions is: Would CRIU target an individual process? or
> > entire systems?
>
> It targets individual processes that have been forked from the main
> CRIU process.
>
> >
> > If it is an individual process, we could use prctl to opt-in/opt-out
> > certain processes. There could be two alternatives.
> > 1> Opt-in solution: process must set prctl.seal_criu_mapping, this
> > needs to be set before execve() because sealing is applied at execve()
> > call.
> > 2> opt-out solution: The system will by default seal all of the system
> > mappings, but individual processes can opt-out by setting
> > prctl.not_seal_criu_mappings. This also needs to be set before
> > execve() call.
>
> I like the idea and I think the opt-out solution should work for CRIU.
> CRIU will be able to call this prctl and re-execute itself.
>
Great! Let's iterate on the opt-out solution then.

> Let me give you a bit of context on how CRIU works. When CRIU restores
> processes, it recreates a process tree by forking itself. Afterwards, it
> restores all mappings in each process but doesn't put them to proper
> addresses. After that, each process unmaps CRIU mappings from its address
> space and remaps its restored mappings to the proper addresses. So CRIU should
> be able to move system mappings and seal them if they have been sealed before
> dump.

Thanks for the context.

> BTW, It isn't just about CRIU. gVisor and maybe some other sandbox solutions
> will be affected by this change too. gVisor uses stub-processes to represent
> guest address spaces. In a stub process, it unmaps all system mappings.
>
> >
> > For both cases, we will want to identify what type of mapping CRIU
> > cares about, i.e. maybe CRIU doesn't care about uprobe and vsyscall ?
> > and only care about vdso/vvar/sigpage ?
>
> As for now, it handles only vdso/vvar/sigpage mappings. It doesn't care
> about vsyscall because it is always mapped to the fixed address.
>
Given this understanding that CRIU intends to replace the current
process's vdso/vvar with that of the restored process, and therefore
doesn't want the parent CRIU process to seal the vdso/vvar, a prctl
opt-out for vdso/vvar is a reasonable path going forward.

The sigpage mapping should also be included in this opt-out, for the
same reason as vdso/vvar: it is created by the
arch_setup_additional_pages() call during execve().

However, the uprobe mapping shouldn't be included in this opt-out, as
it is not created by arch_setup_additional_pages() during execve().
CRIU should simply restore it from the restored process, if present.

vsyscall, which is created when the system boots and maps to a fixed
virtual address and page, shouldn't be included in this opt-out either.

So I'm proposing to opt out vdso/vvar/sigpage with a new prctl:
disable_mseal_criu_system_mappings = true/false

What do you think?

> gVisor should be able to unmap all system mappings from a process
> address space.
>
Do you think this opt-out solution will work for gVisor too?

Thanks
-Jeff

> Thanks,
> Andrei
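
For reference, the opt-out policy proposed in this message can be written
down as a small model. This is only a sketch of the proposal as worded in
the thread, not kernel code: the function name sealed_system_mappings and
the OPT_OUT_COVERS table are hypothetical, and the prctl itself
(disable_mseal_criu_system_mappings) is a proposal here, not an existing
interface. It assumes that, without the opt-out, all five system mappings
named in the thread are sealed.

```python
# Illustrative model of the proposed opt-out; hypothetical names throughout.
# Keys are the system mappings named in the thread. The value says whether
# the proposed prctl opt-out would cover (i.e. leave unsealed) that mapping.
OPT_OUT_COVERS = {
    "vdso": True,       # created by arch_setup_additional_pages() at execve()
    "vvar": True,       # same reasoning as vdso
    "sigpage": True,    # same reasoning as vdso
    "uprobe": False,    # not created at execve(); CRIU restores it if present
    "vsyscall": False,  # created at boot, fixed virtual address and page
}


def sealed_system_mappings(opt_out):
    """Return the set of system mappings that remain sealed for a process.

    opt_out=False models the default (everything sealed); opt_out=True
    models a process that set the proposed prctl before execve().
    """
    if not opt_out:
        return set(OPT_OUT_COVERS)
    return {name for name, covered in OPT_OUT_COVERS.items() if not covered}


if __name__ == "__main__":
    print("default:", sorted(sealed_system_mappings(False)))
    print("opt-out:", sorted(sealed_system_mappings(True)))
```

In this model a CRIU process that opted out before re-executing itself
would still see uprobe and vsyscall sealed, while vdso/vvar/sigpage stay
movable so they can be replaced with those of the restored process.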