Date: Wed, 1 Feb 2023 17:51:35 -0500
From: Peter Xu
To: Muhammad Usama Anjum
Cc: David Hildenbrand, Andrew Morton, Michał Mirosław, Andrei Vagin,
    Danylo Mocherniuk, Paul Gofman, Cyrill Gorcunov, Alexander Viro,
    Shuah Khan, Christian Brauner, Yang Shi, Vlastimil Babka,
    "Liam R . Howlett", Yun Zhou, Suren Baghdasaryan, Alex Sierra,
    Matthew Wilcox, Pasha Tatashin, Mike Rapoport, Nadav Amit,
    Axel Rasmussen, "Gustavo A . R . Silva", Dan Williams,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH,
    kernel@collabora.com
Subject: Re: [PATCH v9 1/3] userfaultfd: Add UFFD WP Async support
References: <20230131083257.3302830-1-usama.anjum@collabora.com>
    <20230131083257.3302830-2-usama.anjum@collabora.com>
In-Reply-To: <20230131083257.3302830-2-usama.anjum@collabora.com>

On Tue, Jan 31, 2023 at 01:32:55PM +0500, Muhammad Usama Anjum wrote:
> Add new WP Async mode (UFFD_FEATURE_WP_ASYNC) which resolves the page
> faults on its own. It can be used to track that which pages have been
> written-to from the time the pages were write-protected. It is very
> efficient way to track the changes as uffd is by nature pte/pmd based.
>
> UFFD synchronous WP sends the page faults to the userspace where the
> pages which have been written-to can be tracked. But it is not efficient.
> This is why this asynchronous version is being added. After setting the
> WP Async, the pages which have been written to can be found in the pagemap
> file or information can be obtained from the PAGEMAP_IOCTL.
>
> Suggested-by: Peter Xu
> Signed-off-by: Muhammad Usama Anjum
> ---
> Changes in v9:
> - Correct the fault resolution with code contributed by Peter
>
> Changes in v7:
> - Remove UFFDIO_WRITEPROTECT_MODE_ASYNC_WP and add UFFD_FEATURE_WP_ASYNC
> - Handle automatic page fault resolution in better way (thanks to Peter)
>
> update to wp async
> ---
>  fs/userfaultfd.c                 | 11 +++++++++++
>  include/linux/userfaultfd_k.h    |  6 ++++++
>  include/uapi/linux/userfaultfd.h |  8 +++++++-
>  mm/memory.c                      | 23 ++++++++++++++++++++---
>  4 files changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 15a5bf765d43..c17835a0e842 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1867,6 +1867,10 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
>  	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
>  	mode_dontwake = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
>
> +	/* The unprotection is not supported if in async WP mode */
> +	if (!mode_wp && (ctx->features & UFFD_FEATURE_WP_ASYNC))
> +		return -EINVAL;
> +
>  	if (mode_wp && mode_dontwake)
>  		return -EINVAL;
>
> @@ -1950,6 +1954,13 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
>  	return ret;
>  }
>
> +int userfaultfd_wp_async(struct vm_area_struct *vma)
> +{
> +	struct userfaultfd_ctx *ctx = vma->vm_userfaultfd_ctx.ctx;
> +
> +	return (ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC));
> +}
> +
>  static inline unsigned int uffd_ctx_features(__u64 user_features)
>  {
>  	/*
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 9df0b9a762cc..94dcb4dc1b4a 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -179,6 +179,7 @@ extern int userfaultfd_unmap_prep(struct mm_struct *mm, unsigned long start,
>  				  unsigned long end, struct list_head *uf);
>  extern void userfaultfd_unmap_complete(struct mm_struct *mm,
>  				       struct list_head *uf);
> +extern int userfaultfd_wp_async(struct vm_area_struct *vma);
>
>  #else /* CONFIG_USERFAULTFD */
>
> @@ -274,6 +275,11 @@ static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
>  	return false;
>  }
>
> +static inline int userfaultfd_wp_async(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +
>  #endif /* CONFIG_USERFAULTFD */
>
>  static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> index 005e5e306266..f4252ef40071 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -38,7 +38,8 @@
>  			   UFFD_FEATURE_MINOR_HUGETLBFS | \
>  			   UFFD_FEATURE_MINOR_SHMEM | \
>  			   UFFD_FEATURE_EXACT_ADDRESS | \
> -			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
> +			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM | \
> +			   UFFD_FEATURE_WP_ASYNC)
>  #define UFFD_API_IOCTLS \
>  	((__u64)1 << _UFFDIO_REGISTER | \
>  	 (__u64)1 << _UFFDIO_UNREGISTER | \
> @@ -203,6 +204,10 @@ struct uffdio_api {
>  	 *
>  	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
>  	 * write-protection mode is supported on both shmem and hugetlbfs.
> +	 *
> +	 * UFFD_FEATURE_WP_ASYNC indicates that userfaultfd write-protection
> +	 * asynchronous mode is supported in which the write fault is automatically
> +	 * resolved and write-protection is un-set.

Please mention a few other things:

  - It only supports anon and shmem (so hugetlb is not supported)

  - It will only take effect when any vma is registered with wr-protection
    mode.  Otherwise the flag will be ignored.

In userfaultfd_register(), we need to fail the ioctl if anyone tries to
register any hugetlb vma with this new flag set.
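
To make the intended rule concrete, below is a rough, untested userspace
model of the register-time check.  It is only a sketch, not kernel code:
struct reg_request and wp_async_register_check() are made-up names for
illustration, the bit value is simply the one proposed in this patch (the
final uapi numbering may differ), and in the kernel the hugetlb test would
presumably be is_vm_hugetlb_page(vma) on each vma that
userfaultfd_register() walks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value as proposed in this patch; an assumption, may change on merge. */
#define UFFD_FEATURE_WP_ASYNC	(1ULL << 13)

/* Hypothetical stand-in for the state consulted at UFFDIO_REGISTER time. */
struct reg_request {
	uint64_t ctx_features;	/* features negotiated through UFFDIO_API */
	bool mode_wp;		/* UFFDIO_REGISTER_MODE_WP was requested */
	bool vma_is_hugetlb;	/* models is_vm_hugetlb_page(vma) */
};

/* Returns 0 if the registration may proceed, -EINVAL otherwise. */
static int wp_async_register_check(const struct reg_request *req)
{
	bool async = (req->ctx_features & UFFD_FEATURE_WP_ASYNC) != 0;

	/* The flag is ignored unless the vma is registered in wr-protect mode. */
	if (!async || !req->mode_wp)
		return 0;

	/* Async WP covers anon and shmem only, so refuse hugetlb vmas. */
	if (req->vma_is_hugetlb)
		return -EINVAL;

	return 0;
}
```

With that shape, anon/shmem registrations keep working as before, and a
hugetlb vma is rejected only when both the feature flag and the wr-protect
mode are set.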

>  	 */
>  #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
>  #define UFFD_FEATURE_EVENT_FORK (1<<1)
> @@ -217,6 +222,7 @@ struct uffdio_api {
>  #define UFFD_FEATURE_MINOR_SHMEM (1<<10)
>  #define UFFD_FEATURE_EXACT_ADDRESS (1<<11)
>  #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12)
> +#define UFFD_FEATURE_WP_ASYNC (1<<13)
>  	__u64 features;
>
>  	__u64 ioctls;
> diff --git a/mm/memory.c b/mm/memory.c
> index 4000e9f017e0..04843e35550e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,8 +3351,21 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>
>  	if (likely(!unshare)) {
>  		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
> -			pte_unmap_unlock(vmf->pte, vmf->ptl);
> -			return handle_userfault(vmf, VM_UFFD_WP);
> +			if (userfaultfd_wp_async(vma)) {
> +				/*
> +				 * Nothing needed (cache flush, TLB invalidations,
> +				 * etc.) because we're only removing the uffd-wp bit,
> +				 * which is completely invisible to the user.
> +				 */
> +				pte_t pte = pte_clear_uffd_wp(*vmf->pte);
> +
> +				set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
> +				/* Update this to be prepared for following up CoW handling */
> +				vmf->orig_pte = pte;
> +			} else {
> +				pte_unmap_unlock(vmf->pte, vmf->ptl);
> +				return handle_userfault(vmf, VM_UFFD_WP);
> +			}
>  		}
>
>  		/*
> @@ -4812,8 +4825,11 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>
>  	if (vma_is_anonymous(vmf->vma)) {
>  		if (likely(!unshare) &&
> -		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
> +		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd)) {
> +			if (userfaultfd_wp_async(vmf->vma))
> +				goto split_and_return;
>  			return handle_userfault(vmf, VM_UFFD_WP);
> +		}
>  		return do_huge_pmd_wp_page(vmf);
>  	}
>
> @@ -4825,6 +4841,7 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>  		}
>  	}
>
> +split_and_return:

The "and_return" is superfluous, IMHO.  Just make it "split"?

>  	/* COW or write-notify handled on pte level: split pmd. */
>  	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);

Would you also update Documentation/admin-guide/mm/userfaultfd.rst in the
same patch?

Thanks,

-- 
Peter Xu