Date: Fri, 20 Jan 2023 09:53:05 -0500
From: Peter Xu <peterx@redhat.com>
To: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: David Hildenbrand, Andrew Morton, Michał Mirosław, Andrei Vagin,
 Danylo Mocherniuk, Paul Gofman, Cyrill Gorcunov, Alexander Viro,
 Shuah Khan, Christian Brauner, Yang Shi, Vlastimil Babka,
 Liam R. Howlett, Yun Zhou, Suren Baghdasaryan, Alex Sierra,
 Matthew Wilcox, Pasha Tatashin, Mike Rapoport, Nadav Amit,
 Axel Rasmussen, Gustavo A. R. Silva, Dan Williams,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH,
 kernel@collabora.com
Subject: Re: [PATCH v7 1/4] userfaultfd: Add UFFD WP Async support
References: <20230109064519.3555250-1-usama.anjum@collabora.com>
 <20230109064519.3555250-2-usama.anjum@collabora.com>
 <0bed5911-48b9-0cc2-dfcf-d3bc3b0e8388@collabora.com>
Content-Type: text/plain; charset=utf-8
On Thu, Jan 19, 2023 at 11:35:39AM -0500, Peter Xu wrote:
> On Thu, Jan 19, 2023 at 08:09:52PM +0500, Muhammad Usama Anjum wrote:
> > [...]
> >
> > >> @@ -497,80 +498,93 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> > >>
> > >>  	/* take the reference before dropping the mmap_lock */
> > >>  	userfaultfd_ctx_get(ctx);
> > >> +	if (ctx->async) {
> > >
> > > Firstly, please consider not touching the existing code/indent as much
> > > as what this patch did.  Hopefully we can keep the major part of sync
> > > uffd be there with its git log, it also helps reviewing your code.  You
> > > can add the async block before that, handle the fault and return just
> > > earlier.
> >
> > This is possible. Will do in next revision.
> >
> > > And, I think this is a bit too late because we're going to return with
> > > VM_FAULT_RETRY here, while maybe we don't need to retry at all here
> > > because we're going to resolve the page fault immediately.
> > >
> > > I assume you added this because you wanted userfaultfd_ctx_get() to
> > > make sure the uffd context will not go away from under us, but it's not
> > > needed if we're still holding the mmap read lock.  I'd expect for async
> > > mode we don't really need to release it at all.
> >
> > I'll have to check what should be returned here. We should return
> > something which shows that the fault has been resolved.
> VM_FAULT_NOPAGE may be the best to describe it, but I guess it shouldn't
> have a difference here if to just return zero.  And, I guess you don't
> even need to worry on the retval here because I think you can leverage
> do_wp_page.  More below.
>
> > >> +		// Resolve page fault of this page
> > >
> > > Please use "/* ... */" as that's the common pattern of commenting in
> > > the Linux kernel, at least what I see in mm/.
> >
> > Will do.
> >
> > >> +		unsigned long addr = (ctx->features & UFFD_FEATURE_EXACT_ADDRESS) ?
> > >> +					vmf->real_address : vmf->address;
> > >> +		struct vm_area_struct *dst_vma = find_vma(ctx->mm, addr);
> > >> +		size_t s = PAGE_SIZE;
> > >
> > > This is weird - if we want async uffd-wp, let's consider huge page from
> > > the 1st day.
> > >
> > >> +
> > >> +		if (dst_vma->vm_flags & VM_HUGEPAGE) {
> > >
> > > VM_HUGEPAGE is only a hint.  It doesn't mean this page is always a huge
> > > page.  For anon, we can have thp wr-protected as a whole, not happening
> > > for !anon because we'll split already.
> > >
> > > For anon, if a write happens to a thp being uffd-wp-ed, we'll keep that
> > > pmd wr-protected and report the uffd message.  The pmd split happens
> > > when the user invokes UFFDIO_WRITEPROTECT on the small page.  I think
> > > it'll stop working for async uffd-wp because we're going to resolve the
> > > page faults right away.
> > >
> > > So for async uffd-wp (note: this will be different from hugetlb), you
> > > may want to consider having a pre-requisite patch to change
> > > wp_huge_pmd() behavior: rather than calling handle_userfault(), IIUC
> > > you can also just fallback to the split path right below
> > > (__split_huge_pmd) so the thp will split now even before the uffd
> > > message is generated.
> >
> > I'll make the changes and make this. I wasn't aware that the thp is
> > being broken in the UFFD WP. At this time, I'm not sure if thp will be
> > handled by handle_userfault() in one go.
> > Probably it will as the length is stored in the vmf.
>
> Yes I think THP can actually be handled in one go with uffd-wp anon (even
> if vmf doesn't store any length, because page fault is about address only,
> not length, afaict).  E.g. thp firstly get wr-protected in thp size, then
> when unprotect the user app sends UFFDIO_WRITEPROTECT(wp=false) with a
> range covering the whole thp.
>
> But AFAIU that should be quite rare because most uffd-wp scenarios are
> latency sensitive, resolving page faults in large chunk definitely
> enlarges that.  It could happen though when it's not resolving an
> immediate page fault, so it could happen in the background.
>
> So after a second thought, a safer approach is we only go to the split
> path if async is enabled, in wp_huge_pmd().  Then it doesn't need to be a
> pre-requisite patch too, it can be part of the major patch to implement
> the uffd-wp async mode.
>
> > > I think it should be transparent to the user and it'll start working
> > > for you with async uffd-wp here, because it means when reaching
> > > handle_userfault, it should not be possible to have thp at all since
> > > they should have all split up.
> > >
> > >> +			s = HPAGE_SIZE;
> > >> +			addr &= HPAGE_MASK;
> > >> +		}
> > >>
> > >> -	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
> > >> -	uwq.wq.private = current;
> > >> -	uwq.msg = userfault_msg(vmf->address, vmf->real_address, vmf->flags,
> > >> -				reason, ctx->features);
> > >> -	uwq.ctx = ctx;
> > >> -	uwq.waken = false;
> > >> -
> > >> -	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
> > >> +	ret = mwriteprotect_range(ctx->mm, addr, s, false, &ctx->mmap_changing);
> > >
> > > This is an overkill - we're pretty sure it's a single page, no need to
> > > call a range function here.
> >
> > Probably change_pte_range() should be used here to directly remove the
> > WP here?
>
> Here we can pursue the best performance, or we can also pursue the
> easiest way to implement.
> I think the best we can have is we don't release either the mmap read
> lock _and_ the pgtable lock, so we resolve the page fault completely
> here.  But that requires more code changes.
>
> So far a probably intermediate (and very easy to implement) solution is:
>
> (1) Remap the pte (vmf->pte) and retake the lock (vmf->ptl).  Note: you
>     need to move the chunk to be before mmap read lock released first,
>     because we'll need that to make sure pgtable lock and the pgtable
>     page being still exist at the first place.
>
> (2) If *vmf->pte != vmf->orig_pte, it means the pgtable changed, retry
>     (with VM_FAULT_NOPAGE).  We must have orig_pte set btw in this path.
>
> (3) Remove the uffd-wp bit if it's set (and it must be set, because we
>     checked again on orig_pte with pgtable lock held).
>
> (4) Invoke do_wp_page() again with the same vmf.
>
> This will focus the resolution on the single page and resolve CoW in one
> shot if needed.  We may need to redo the map/lock of pte* but I suppose
> it won't hurt a lot if we just modified the fields anyway, so we can
> leave that for later.

I just noticed it's actually quite straightforward to just not fall into
handle_userfault at all.  It can be as simple as:

---8<---
diff --git a/mm/memory.c b/mm/memory.c
index 4000e9f017e0..09aab434654c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3351,8 +3351,20 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 
 	if (likely(!unshare)) {
 		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			return handle_userfault(vmf, VM_UFFD_WP);
+			if (userfaultfd_uffd_wp_async(vma)) {
+				/*
+				 * Nothing needed (cache flush, TLB
+				 * invalidations, etc.) because we're only
+				 * removing the uffd-wp bit, which is
+				 * completely invisible to the user.
+				 * This falls through to possible CoW.
+				 */
+				set_pte_at(vma->vm_mm, vmf->address, vmf->pte,
+					   pte_clear_uffd_wp(*vmf->pte));
+			} else {
+				pte_unmap_unlock(vmf->pte, vmf->ptl);
+				return handle_userfault(vmf, VM_UFFD_WP);
+			}
 		}
---8<---

Similar thing will be needed for hugetlb if that'll be supported.

One thing worth mentioning is, I think for async wp it doesn't need to be
restricted by UFFD_USER_MODE_ONLY, because comparing to the sync messages
it has no risk of being utilized for malicious purposes.

> > [...]
> > > Then when the app wants to wr-protect in async mode, it simply goes
> > > ahead with UFFDIO_WRITEPROTECT(wp=true), it'll happen exactly the same
> > > as when it was sync mode.  It's only the pf handling procedure that's
> > > different (along with how the fault is reported - rather than as a
> > > message but it'll be consolidated into the soft-dirty bit).
> >
> > PF handling will resolve the fault after un-setting the _PAGE_*_UFFD_WP
> > on the page. I'm not changing the soft-dirty bit. It is too delicate
> > (if you get the joke).
>
> It's unfortunate that the old soft-dirty solution didn't go through
> easily.  Soft-dirty still covers something that uffd-wp cannot do right
> now, e.g. on tracking mostly any type of pte mappings.  Uffd-wp can so
> far only track fully ram backed pages like shmem or hugetlb for files
> but not any random page cache.  Hopefully it still works at least for
> your use case, or it's time to rethink otherwise.
> > >>
> > >>  	if (mode_wp && mode_dontwake)
> > >>  		return -EINVAL;
> > >> @@ -2126,6 +2143,7 @@ static int new_userfaultfd(int flags)
> > >>  	ctx->flags = flags;
> > >>  	ctx->features = 0;
> > >>  	ctx->released = false;
> > >> +	ctx->async = false;
> > >>  	atomic_set(&ctx->mmap_changing, 0);
> > >>  	ctx->mm = current->mm;
> > >>  	/* prevent the mm struct to be freed */
> > >> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> > >> index 005e5e306266..b89665653861 100644
> > >> --- a/include/uapi/linux/userfaultfd.h
> > >> +++ b/include/uapi/linux/userfaultfd.h
> > >> @@ -284,6 +284,11 @@ struct uffdio_writeprotect {
> > >>   * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
> > >>   * any wait thread after the operation succeeds.
> > >>   *
> > >> + * UFFDIO_WRITEPROTECT_MODE_ASYNC_WP: set the flag to write protect a
> > >> + * range, the flag is unset automatically when the page is written.
> > >> + * This is used to track which pages have been written to from the
> > >> + * time the memory was write protected.
> > >> + *
> > >>   * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
> > >>   * therefore DONTWAKE flag is meaningless with WP=1.  Removing write
> > >>   * protection (WP=0) in response to a page fault wakes the faulting
> > >> @@ -291,6 +296,7 @@ struct uffdio_writeprotect {
> > >>   */
> > >> #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
> > >> #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
> > >> +#define UFFDIO_WRITEPROTECT_MODE_ASYNC_WP	((__u64)1<<2)
> > >>  	__u64 mode;
> > >>
> > >> --
> > >> 2.30.2
> > >>
> >
> > I should have added Suggested-by: Peter Xy to this patch. I'll add in
> > the next revision if you don't object.
>
> I'm fine with it.  If so, please do s/Xy/Xu/.
>
> > I've started working on next revision. I'll reply to other highly
> > valuable review emails a bit later.
>
> Thanks,
>
> -- 
> Peter Xu

-- 
Peter Xu