From mboxrd@z Thu Jan 1 00:00:00 1970
References: <278393de-2729-4ed0-822c-87f33c7ce27e@redhat.com>
In-Reply-To: <278393de-2729-4ed0-822c-87f33c7ce27e@redhat.com>
From: Aleksandr Mikhalitsyn
Date: Wed, 19 Mar 2025 16:08:51 +0100
Subject: Re: [PATCH v4 0/5] implement lightweight guard pages
To: David Hildenbrand
Cc: Lorenzo Stoakes, James.Bottomley@hansenpartnership.com,
 Liam.Howlett@oracle.com, akpm@linux-foundation.org, arnd@kernel.org,
 brauner@kernel.org, chris@zankel.net, deller@gmx.de, hch@infradead.org,
 jannh@google.com, jcmvbkbc@gmail.com, jeffxu@chromium.org,
 jhubbard@nvidia.com, linux-api@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, mattst88@gmail.com,
 muchun.song@linux.dev, paulmck@kernel.org, richard.henderson@linaro.org,
 shuah@kernel.org, sidhartha.kumar@oracle.com, surenb@google.com,
 tsbogend@alpha.franken.de, vbabka@suse.cz, willy@infradead.org,
 criu@lists.linux.dev, Andrei Vagin, Pavel Tikhomirov
Content-Type: text/plain; charset="UTF-8"

On Wed, Mar 19, 2025 at 3:53 PM David Hildenbrand wrote:
>
> On 19.03.25 15:50, Alexander Mikhalitsyn wrote:
> > On Mon, Oct 28, 2024 at 02:13:26PM +0000, Lorenzo Stoakes wrote:
> >> Userland library functions such as allocators and threading implementations
> >> often require regions of memory to act as 'guard pages' - mappings which,
> >> when accessed, result in a fatal signal being sent to the accessing
> >> process.
> >>
> >> The current means by which these are implemented is via a PROT_NONE mmap()
> >> mapping, which provides the required semantics however incur an overhead of
> >> a VMA for each such region.
> >>
> >> With a great many processes and threads, this can rapidly add up and incur
> >> a significant memory penalty. It also has the added problem of preventing
> >> merges that might otherwise be permitted.
> >>
> >> This series takes a different approach - an idea suggested by Vlasimil
> >> Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
> >> provenance becomes a little tricky to ascertain after this - please forgive
> >> any omissions!) - rather than locating the guard pages at the VMA layer,
> >> instead placing them in page tables mapping the required ranges.
> >>
> >> Early testing of the prototype version of this code suggests a 5 times
> >> speed up in memory mapping invocations (in conjunction with use of
> >> process_madvise()) and a 13% reduction in VMAs on an entirely idle android
> >> system and unoptimised code.
> >>
> >> We expect with optimisation and a loaded system with a larger number of
> >> guard pages this could significantly increase, but in any case these
> >> numbers are encouraging.
> >>
> >> This way, rather than having separate VMAs specifying which parts of a
> >> range are guard pages, instead we have a VMA spanning the entire range of
> >> memory a user is permitted to access and including ranges which are to be
> >> 'guarded'.
> >>
> >> After mapping this, a user can specify which parts of the range should
> >> result in a fatal signal when accessed.
> >>
> >> By restricting the ability to specify guard pages to memory mapped by
> >> existing VMAs, we can rely on the mappings being torn down when the
> >> mappings are ultimately unmapped and everything works simply as if the
> >> memory were not faulted in, from the point of view of the containing VMAs.
> >>
> >> This mechanism in effect poisons memory ranges similar to hardware memory
> >> poisoning, only it is an entirely software-controlled form of poisoning.
> >>
> >> The mechanism is implemented via madvise() behaviour - MADV_GUARD_INSTALL
> >> which installs page table-level guard page markers - and
> >> MADV_GUARD_REMOVE - which clears them.
> >>
> >> Guard markers can be installed across multiple VMAs and any existing
> >> mappings will be cleared, that is zapped, before installing the guard page
> >> markers in the page tables.
> >>
> >> There is no concept of 'nested' guard markers, multiple attempts to install
> >> guard markers in a range will, after the first attempt, have no effect.
> >>
> >> Importantly, removing guard markers over a range that contains both guard
> >> markers and ordinary backed memory has no effect on anything but the guard
> >> markers (including leaving huge pages un-split), so a user can safely
> >> remove guard markers over a range of memory leaving the rest intact.
> >>
> >> The actual mechanism by which the page table entries are specified makes
> >> use of existing logic - PTE markers, which are used for the userfaultfd
> >> UFFDIO_POISON mechanism.
> >>
> >> Unfortunately PTE_MARKER_POISONED is not suited for the guard page
> >> mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
> >> handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
> >> logic to handle it.
> >>
> >> We also extend the generic page walk mechanism to allow for installation of
> >> PTEs (carefully restricted to memory management logic only to prevent
> >> unwanted abuse).
> >>
> >> We ensure that zapping performed by MADV_DONTNEED and MADV_FREE do not
> >> remove guard markers, nor does forking (except when VM_WIPEONFORK is
> >> specified for a VMA which implies a total removal of memory
> >> characteristics).
> >>
> >> It's important to note that the guard page implementation is emphatically
> >> NOT a security feature, so a user can remove the markers if they wish. We
> >> simply implement it in such a way as to provide the least surprising
> >> behaviour.
> >>
> >> An extensive set of self-tests are provided which ensure behaviour is as
> >> expected and additionally self-documents expected behaviour of guard
> >> ranges.
> >
> > Dear Lorenzo,
> > Dear colleagues,
> >
> > sorry about raising an old thread.
> >
> > It looks like this feature is now used in glibc [1]. And we noticed failures in CRIU [2]
> > CI on Fedora Rawhide userspace. Now a question is how we can properly detect such
> > "guarded" pages from user space. As I can see from MADV_GUARD_INSTALL implementation,
> > it does not modify VMA flags anyhow, but only page tables. It means that /proc/<pid>/maps
> > and /proc/<pid>/smaps interfaces are useless in this case. (Please, correct me if I'm missing
> > anything here.)
> >
> > I wonder if you have any ideas / suggestions regarding Checkpoint/Restore here. We (CRIU devs) are happy
> > to develop some patches to bring some uAPI to expose MADV_GUARDs, but before going into this we decided
> > to raise this question in LKML.
>
>
> See [1] and [2]

Hi David,

Huge thanks for such a fast and helpful reply ;)

>
> [1]
> https://lkml.kernel.org/r/cover.1740139449.git.lorenzo.stoakes@oracle.com
> [2] https://lwn.net/Articles/1011366/
>
>
> --
> Cheers,
>
> David / dhildenb

Kind regards,
Alex

>