linux-mm.kvack.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: david@redhat.com, Andrew Morton <akpm@linux-foundation.org>,
	kernel@collabora.com, Paul Gofman <pgofman@codeweavers.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm/userfaultfd: Support WP on multiple VMAs
Date: Mon, 13 Feb 2023 16:11:11 -0500
Message-ID: <Y+qnb/Ix8P5J3Kl4@x1n>
In-Reply-To: <9f0278d7-54f1-960e-ffdf-eeb2572ff6d1@collabora.com>

On Mon, Feb 13, 2023 at 10:50:39PM +0500, Muhammad Usama Anjum wrote:
> On 2/13/23 9:54 PM, Peter Xu wrote:
> > On Mon, Feb 13, 2023 at 09:31:23PM +0500, Muhammad Usama Anjum wrote:
> >> mwriteprotect_range() errors out if [start, end) doesn't fall in one
> >> VMA. We are facing a use case where multiple VMAs are present in one
> >> range of interest. For example, the following pseudocode reproduces the
> >> error which we are trying to fix:
> >>
> >> - Allocate memory of size 16 pages with PROT_NONE with mmap
> >> - Register userfaultfd
> >> - Change protection of the first half (1 to 8 pages) of memory to
> >>   PROT_READ | PROT_WRITE. This breaks the memory area in two VMAs.
> >> - Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
> >>   out.
> >>
> >> This is a simple use case where user may or may not know if the memory
> >> area has been divided into multiple VMAs.
> >>
> >> Reported-by: Paul Gofman <pgofman@codeweavers.com>
> >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> >> ---
> >> Changes since v1:
> >> - Correct the start and ending values passed to uffd_wp_range()
> >> ---
> >>  mm/userfaultfd.c | 38 ++++++++++++++++++++++----------------
> >>  1 file changed, 22 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> >> index 65ad172add27..bccea08005a8 100644
> >> --- a/mm/userfaultfd.c
> >> +++ b/mm/userfaultfd.c
> >> @@ -738,9 +738,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> >>  			unsigned long len, bool enable_wp,
> >>  			atomic_t *mmap_changing)
> >>  {
> >> +	unsigned long end = start + len;
> >> +	unsigned long _start, _end;
> >>  	struct vm_area_struct *dst_vma;
> >>  	unsigned long page_mask;
> >>  	int err;
> > 
> > I think this needs to be initialized or it can return anything when range
> > not mapped.
> It is being initialized to -EAGAIN already. It is not visible in this patch.

I see, though -EAGAIN doesn't look suitable at all.  The old retcode for
!vma case is -ENOENT, so I think we'd better keep using it if we want to
have this patch.

> 
> > 
> >> +	VMA_ITERATOR(vmi, dst_mm, start);
> >>  
> >>  	/*
> >>  	 * Sanitize the command parameters:
> >> @@ -762,26 +765,29 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> >>  	if (mmap_changing && atomic_read(mmap_changing))
> >>  		goto out_unlock;
> >>  
> >> -	err = -ENOENT;
> >> -	dst_vma = find_dst_vma(dst_mm, start, len);
> >> +	for_each_vma_range(vmi, dst_vma, end) {
> >> +		err = -ENOENT;
> >>  
> >> -	if (!dst_vma)
> >> -		goto out_unlock;
> >> -	if (!userfaultfd_wp(dst_vma))
> >> -		goto out_unlock;
> >> -	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
> >> -		goto out_unlock;
> >> +		if (!dst_vma->vm_userfaultfd_ctx.ctx)
> >> +			break;
> >> +		if (!userfaultfd_wp(dst_vma))
> >> +			break;
> >> +		if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
> >> +			break;
> >>  
> >> -	if (is_vm_hugetlb_page(dst_vma)) {
> >> -		err = -EINVAL;
> >> -		page_mask = vma_kernel_pagesize(dst_vma) - 1;
> >> -		if ((start & page_mask) || (len & page_mask))
> >> -			goto out_unlock;
> >> -	}
> >> +		if (is_vm_hugetlb_page(dst_vma)) {
> >> +			err = -EINVAL;
> >> +			page_mask = vma_kernel_pagesize(dst_vma) - 1;
> >> +			if ((start & page_mask) || (len & page_mask))
> >> +				break;
> >> +		}
> >>  
> >> -	uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
> >> +		_start = (dst_vma->vm_start > start) ? dst_vma->vm_start : start;
> >> +		_end = (dst_vma->vm_end < end) ? dst_vma->vm_end : end;
> >>  
> >> -	err = 0;
> >> +		uffd_wp_range(dst_mm, dst_vma, _start, _end - _start, enable_wp);
> >> +		err = 0;
> >> +	}
> >>  out_unlock:
> >>  	mmap_read_unlock(dst_mm);
> >>  	return err;
> > 
> > This whole patch also changes the abi, so I'm worried whether there can be
> > app that relies on the existing behavior.
> Even if a app is dependent on it, this change would just don't return error
> if there are multiple VMAs under the hood and handle them correctly. Most
> apps wouldn't care about VMAs anyways. I don't know if there would be any
> drastic behavior change, other than the behavior becoming nicer.

This logic has existed since the initial version of uffd-wp.  Its good side
is that it checks everything strictly, which makes sense because uffd-wp is
a per-vma attribute.  In short, the old code fails clearly.

The new proposal does not: on -ENOENT we have no idea what happened; some
ranges may already be wr-protected, but we don't know where things started
to go wrong.
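
To make the concern concrete, here is a userspace-only model of the proposed
loop.  It is not kernel code: struct vma_model, wp_range_model() and the
wp_applied field are names made up for illustration, and the model marks a
whole VMA as touched instead of clamping the range as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: [start, end) plus two flags. */
struct vma_model {
	unsigned long start, end;
	bool uffd_wp;		/* registered in uffd-wp mode? */
	bool wp_applied;	/* set once the model wr-protects part of it */
};

/*
 * Model of the proposed loop: visit every VMA overlapping
 * [start, start + len) and stop at the first one that is not uffd-wp
 * registered.  VMAs visited before the failing one stay wr-protected.
 */
static int wp_range_model(struct vma_model *vmas, size_t n,
			  unsigned long start, unsigned long len)
{
	unsigned long end = start + len;
	int err = -ENOENT;	/* returned when nothing overlaps the range */
	size_t i;

	assert(len > 0 && end > start);

	for (i = 0; i < n; i++) {
		struct vma_model *vma = &vmas[i];

		if (vma->end <= start || vma->start >= end)
			continue;	/* no overlap with the request */
		if (!vma->uffd_wp)
			return -ENOENT;	/* earlier VMAs were already changed */
		vma->wp_applied = true;
		err = 0;
	}
	return err;
}
```

With two adjacent VMAs where only the first is uffd-wp registered, the model
returns -ENOENT with the first VMA already modified.  A range with no
mapping at all returns the same -ENOENT, so the caller cannot tell the two
cases apart from the return value alone.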

Now I'm looking at the original problem..

 - Allocate memory of size 16 pages with PROT_NONE with mmap
 - Register userfaultfd
 - Change protection of the first half (1 to 8 pages) of memory to
   PROT_READ | PROT_WRITE. This breaks the memory area in two VMAs.
 - Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
   out.

Why should the user app wr-protect all 16 pages at all?

If it does, uffd_wp_range() will be run on a PROT_NONE range, which doesn't
make sense whether or not the user is aware of the vma concept...  it's
destined to be a vain effort.

So IMHO it's the user app that needs fixing here, not the interface?  I
think it's a matter of whether the monitor is aware of mprotect() being
invoked.

In short, I hope we're working on things that help at least someone, and
we should avoid working on things that do not have a clear benefit yet.
With the new WP_ENGAGE interface being proposed, I don't see any benefit in
changing the current interface, especially if the change brings
uncertainties of its own (e.g., should we fail upon !uffd-wp vmas, or
should we skip them?).

> 
> > 
> > Is this for the new pagemap effort?  Can this just be done in the new
> > interface rather than changing the old?
> We found this bug while working on pagemap patches. It is already being
> handled in the new interface. We just thought that this use case can happen
> pretty easily and unknowingly. So the support should be added.

Thanks.  My understanding is that it would have been reported if it
affected any existing uffd-wp user.

> 
> Also mwriteprotect_range() gives a pretty straight forward way to WP or
> un-WP a range. Async WP can be used in coordination with pagemap file
> (PM_UFFD_WP flag in PTE) as well. There may be use cases for it. On another
> note, I don't see any use cases of WP async and PM_UFFD_WP flag as
> !PM_UFFD_WP flag doesn't give direct information if the page is written for
> !present pages.

Currently we do maintain PM_UFFD_WP even for swap entries, so if it was
written then I think we'll know even if the page was swapped out:

	} else if (is_swap_pte(pte)) {
		if (pte_swp_uffd_wp(pte))
			flags |= PM_UFFD_WP;
		if (pte_marker_entry_uffd_wp(entry))
			flags |= PM_UFFD_WP;

So it's working?
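
From the userspace side, a monitor would see this as bit 57 of the 64-bit
pagemap entry, which can be set together with the swap bit.  Below is a
minimal sketch of decoding an entry that has already been read from
/proc/<pid>/pagemap.  The bit positions are taken from
Documentation/admin-guide/mm/pagemap.rst; the PME_* macros and pme_*()
helpers are names made up here, not kernel or libc API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per Documentation/admin-guide/mm/pagemap.rst. */
#define PME_UFFD_WP	(UINT64_C(1) << 57)	/* pte is uffd-wp wr-protected */
#define PME_SWAPPED	(UINT64_C(1) << 62)	/* page is swapped */
#define PME_PRESENT	(UINT64_C(1) << 63)	/* page is present in RAM */

/* True if the entry carries the uffd-wp bit, present or not. */
static bool pme_uffd_wp(uint64_t pme)
{
	return (pme & PME_UFFD_WP) != 0;
}

/* True if the entry is a swap entry that still carries the uffd-wp bit. */
static bool pme_swapped_and_wp(uint64_t pme)
{
	return (pme & PME_SWAPPED) && (pme & PME_UFFD_WP);
}
```

The helpers only decode bits; how a monitor interprets a cleared uffd-wp bit
is left to the caller.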

> 
> > 
> > Side note: in your other pagemap series, you can optimize "WP_ENGAGE &&
> > !GET" to not do generic pgtable walk at all, but use what it does in this
> > patch for the initial round or wr-protect.
> Yeah, it is implemented with some optimizations.

IIUC your latest public version is not optimized, but I can check the new
version when it comes.

Thanks,

-- 
Peter Xu


