From: Jeff Xu
Date: Wed, 11 Dec 2024 14:46:46 -0800
Subject: Re: [PATCH v4 1/1] exec: seal system mappings
To: Andrei Vagin
Cc: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com,
    torvalds@linux-foundation.org, adhemerval.zanella@linaro.org,
    oleg@redhat.com, linux-kernel@vger.kernel.org,
    linux-hardening@vger.kernel.org, linux-mm@kvack.org,
    jorgelo@chromium.org, sroettger@google.com, ojeda@kernel.org,
    adobriyan@gmail.com, anna-maria@linutronix.de, mark.rutland@arm.com,
    linus.walleij@linaro.org, Jason@zx2c4.com, deller@gmx.de,
    rdunlap@infradead.org, davem@davemloft.net, hch@lst.de,
    peterx@redhat.com, hca@linux.ibm.com, f.fainelli@gmail.com,
    gerg@kernel.org, dave.hansen@linux.intel.com, mingo@kernel.org,
    ardb@kernel.org, Liam.Howlett@oracle.com, mhocko@suse.com,
    42.hyeyoo@gmail.com, peterz@infradead.org, ardb@google.com,
    enh@google.com, rientjes@google.com, groeck@chromium.org,
    mpe@ellerman.id.au, Dmitry Safonov <0x7f454c46@gmail.com>,
    Mike Rapoport, Alexander Mikhalitsyn, Andrei Vagin
References: <20241125202021.3684919-1-jeffxu@google.com>
    <20241125202021.3684919-2-jeffxu@google.com>

Hi Andrei,

Thanks for your email. I was hoping to get some feedback from CRIU devs,
and happy to see you reaching out.

On Mon, Dec 9, 2024 at 8:12 PM Andrei Vagin wrote:
>
> On Mon, Nov 25, 2024 at 12:49 PM wrote:
> >
> > From: Jeff Xu
> >
> > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> >
> > Those mappings are readonly or executable only, sealing can protect
> > them from ever changing or unmapped during the life time of the process.
> > For complete descriptions of memory sealing, please see mseal.rst [1].
> >
> > System mappings such as vdso, vvar, and sigpage (for arm) are
> > generated by the kernel during program initialization, and are
> > sealed after creation.
> >
> > Unlike the aforementioned mappings, the uprobe mapping is not
> > established during program startup. However, its lifetime is the same
> > as the process's lifetime [2]. It is sealed from creation.
> >
> > The vdso, vvar, sigpage, and uprobe mappings all invoke the
> > _install_special_mapping() function. As no other mappings utilize this
> > function, it is logical to incorporate sealing logic within
> > _install_special_mapping(). This approach avoids the necessity of
> > modifying code across various architecture-specific implementations.
> >
> > The vsyscall mapping, which has its own initialization function, is
> > sealed in the XONLY case, it seems to be the most common and secure
> > case of using vsyscall.
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the mapping of vdso, vvar, and sigpage during restore
> > operations. Consequently, this feature cannot be universally enabled
> > across all systems.
> >
> ...
> >
> > +config SEAL_SYSTEM_MAPPINGS
> > +       bool "seal system mappings"
> > +       default n
> > +       depends on 64BIT
> > +       depends on ARCH_HAS_SEAL_SYSTEM_MAPPINGS
> > +       depends on !CHECKPOINT_RESTORE
>
> Hi Jeff,
>
> I like the idea of this patchset, but I don't like the idea of
> forcing users to choose between this security feature and
> checkpoint/restore functionality.
> We need to explore ways to make this
> feature work with checkpoint/restore. Relying on CAP_CHECKPOINT_RESTORE
> is the obvious approach.
>
I agree that forcing users to choose isn't ideal. I'd prefer a solution
where both features can be used in some way, depending on the situation
and the distribution. Hopefully, with input from CRIU developers, this
can be achieved.

However, it makes sense to unconditionally seal vdso/vvar for systems
like ChromeOS and Android that don't currently use CRIU, so we will
need a Kconfig option for that.

> CRIU just needs to move these mappings, and it doesn't need to change
> their properties or modify their contents. With that in mind, here are

That is an important detail to know, thanks for bringing it up.

> two options:
> * Allow moving sealed mappings for processes with CAP_CHECKPOINT_RESTORE.

We could try to propose this under a new Kconfig option, e.g.
CONFIG_SEAL_SYSTEM_MAPPING_WITH_CAP_CHECK.

IIUC, you propose allowing userspace to mremap the vdso if the process
has CAP_CHECKPOINT_RESTORE. However, I believe this approach raises
security concerns. During the RFC for mseal, I initially suggested
sealing mmap, mremap, and munmap individually, but Linus rejected this
proposal for a good reason: mremap could leave an empty hole in the
address space, allowing an attacker to fill it with
attacker-controlled content. Furthermore, CAP_SYS_ADMIN allows setting
any capability, so this would become a bypass for sealing.

> * Allow temporarily "unsealing" mappings for processes with
>   CAP_CHECKPOINT_RESTORE. CRIU could unseal mappings, move them, and
>   then seal them back.
>
We could also try to propose this under
CONFIG_SEAL_SYSTEM_MAPPING_FOR_CRIU.

It's important to note that temporarily unsealing a mapping from
userspace is not permitted. If a mapping can be unsealed, it
fundamentally does not provide the sealing property.

Perhaps the intention was for these steps to be carried out within the
kernel? For example,
userspace could instruct the kernel to relocate the vdso mapping.
Since the kernel can ensure the vdso contents are not manipulated by
an attacker, this approach could offer a viable solution.

I have been thinking of other alternatives, but those would require a
better understanding of CRIU use cases. One of my questions is: would
CRIU target an individual process, or entire systems?

If it is an individual process, we could use prctl to opt certain
processes in or out. There could be two alternatives:

1> Opt-in solution: the process must set prctl.seal_criu_mapping. This
needs to be set before execve() because sealing is applied at the
execve() call.

2> Opt-out solution: the system seals all of the system mappings by
default, but individual processes can opt out by setting
prctl.not_seal_criu_mappings. This also needs to be set before the
execve() call.

For both cases, we will want to identify which types of mappings CRIU
cares about, i.e. maybe CRIU doesn't care about uprobe and vsyscall,
and only cares about vdso/vvar/sigpage?

> Another approach might be to make this feature configurable on a
> per-process basis (e.g., via prctl). Once enabled for a process, it
> would be inherited by all its children. It can't be disabled unless a
> process has CAP_CHECKPOINT_RESTORE.
>
> I've added Mike, Dima, and Alex to the thread. They might have
> other ideas.
>
Thanks. Please feel free to chime in. I will also add Mike, Dima and
Alex to the new version of this series.

Thanks!
-Jeff

> Thanks,
> Andrei
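
As a rough illustration of the two prctl alternatives discussed above,
the per-process decision made at execve() time could be modeled in plain
C along the following lines. This is only a sketch under stated
assumptions: sys_mapping, CRIU_MOVABLE_MASK, seal_policy,
task_seal_flags and should_seal() are hypothetical names that exist
neither in the kernel nor in the patch, and the guess that CRIU only
needs to move vdso/vvar/sigpage is exactly the open question raised in
the mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in the kernel. */

/* The system mappings covered by the patch, one bit each. */
enum sys_mapping {
    SYSMAP_VDSO     = 1 << 0,
    SYSMAP_VVAR     = 1 << 1,
    SYSMAP_SIGPAGE  = 1 << 2,
    SYSMAP_UPROBE   = 1 << 3,
    SYSMAP_VSYSCALL = 1 << 4,
};

/* Assumption: CRIU only needs to move these three mappings. */
#define CRIU_MOVABLE_MASK (SYSMAP_VDSO | SYSMAP_VVAR | SYSMAP_SIGPAGE)

enum seal_policy {
    SEAL_OPT_IN,  /* seal CRIU-relevant mappings only if the task asked */
    SEAL_OPT_OUT, /* seal by default unless the task opted out */
};

/* Flags a task would have set via prctl() before calling execve(). */
struct task_seal_flags {
    bool opt_in;  /* stands in for prctl.seal_criu_mapping */
    bool opt_out; /* stands in for prctl.not_seal_criu_mappings */
};

/* Decide, at execve() time, whether a single system mapping is sealed. */
bool should_seal(enum seal_policy policy,
                 const struct task_seal_flags *flags,
                 enum sys_mapping mapping)
{
    assert(flags);

    /* Mappings outside the CRIU mask are sealed regardless of flags. */
    if (!(mapping & CRIU_MOVABLE_MASK))
        return true;

    if (policy == SEAL_OPT_IN)
        return flags->opt_in;

    return !flags->opt_out;
}
```

In this model, SEAL_OPT_OUT with no flags set seals every mapping,
which corresponds to the ChromeOS/Android case. A task that is meant to
be checkpointed would set the opt-out flag before execve() and keep
vdso/vvar/sigpage movable, while uprobe and vsyscall stay sealed.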