Re: [PATCH] mm/userfaultfd: provide unmasked address on page-fault

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>, Jan Kara <jack@suse.cz>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm/userfaultfd: provide unmasked address on page-fault
Date: Sat, 9 Oct 2021 09:59:35 +0200	[thread overview]
Message-ID: <f5ea62e1-cf21-6bd8-37f8-1a0f9637402c@redhat.com> (raw)
In-Reply-To: <E2ADE3F0-74B1-4D1D-80AE-0BBC49D932E6@gmail.com>

On 09.10.21 00:02, Nadav Amit wrote:
> 
> 
>> On Oct 8, 2021, at 1:05 AM, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.10.21 01:50, Nadav Amit wrote:
>>> From: Nadav Amit <namit@vmware.com>
>>> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
>>> the faulting access back to userspace. However, that is not the case for
>>> quite some time.
>>> Even running "userfaultfd_demo" from the userfaultfd man page provides
>>> the wrong output (and contradicts the man page). Notice that
>>> "UFFD_EVENT_PAGEFAULT event" shows the masked address.
>>> 	Address returned by mmap() = 0x7fc5e30b3000
>>> 	fault_handler_thread():
>>> 	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
>>> 	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
>>> 		(uffdio_copy.copy returned 4096)
>>> 	Read address 0x7fc5e30b300f in main(): A
>>> 	Read address 0x7fc5e30b340f in main(): A
>>> 	Read address 0x7fc5e30b380f in main(): A
>>> 	Read address 0x7fc5e30b3c0f in main(): A
>>> Add a new "real_address" field to vmf to hold the unmasked address. It
>>> is possible to keep the unmasked address in the existing address field
>>> (and mask whenever necessary) instead, but this is likely to cause
>>> backporting problems of this patch.
>>
>> Can we be sure that no existing users will rely on this behavior that has been the case since end of 2016 IIRC, one year after UFFD was upstreamed?
> 
> Let me to blow off your mind: how do you be sure that the current behavior does not make applications to misbehave? It might cause performance issues as it did for me or hidden correctness issues.
> 

Fair point, but now we can speculate what's more likely:

Having an app rely on >4 year old kernel behavior just after the feature 
was released or having and app rely on kernel behavior that was the case 
for the last 4 years?

<offtopic>
Someone once told me about the unwritten way to remove things from the 
kernel. 1) Silently break it upstream 2) Wait 2 kernel releases 3) 
Propose removal of the feature because it's broken and nobody complained.
<\offtopic>

You might ask "why does David even care?", here is why:

For the records, I *do* have a prototype from last year that breaks with 
this new behavior as far as I can tell: using uffd in the context of 
virtio-balloon in QEMU. I just pushed the latest state to a !private 
github tree:
   https://github.com/davidhildenbrand/qemu/tree/virtio-balloon-uffd

In that code, I made sure that I'm only dealing with 4k pages (because 
that's the only thing virtio-balloon really can deal with), and during 
the debugging I figured that the kernel always returns 4k aligned page 
fault addresses, so I didn't care about masking. I'll reuse the 
unmodified fault address for UFFDIO_ZEROPAGE()/UFFDIO_COPY()/... which 
should then fail because:

"
EINVAL The start or the len field of the ufdio_range structure
               was not a multiple of the system page size; or len was
               zero; or the specified range was otherwise invalid.
"

If I'm too lazy to read all documentation, I'm quite sure that there are 
other people that don't. I don't care to much if this patch breaks that 
prototype, it's just a prototype after all, but I am concerned that we 
might break other users in a similar way.

>> I do wonder what the official ABI nowadays is, because man pages aren't necessarily the source of truth.
> 
> Documentation/admin-guide/mm/userfaultfd.rst says: "You get the address of the access that triggered the missing page
> event”.
> 
> So it is a bug.

The least thing I would expect in the patch description is a better 
motivation ("who cares and why" -- I know you have a better motivation 
that making the doc correct :) ) and a discussion on the chances of this 
actually breaking other apps (see my example).

I'd sleep better if we'd glue the changed behavior to a new feature 
flag, but that's just my 2 cents.

-- 
Thanks,

David / dhildenb

next prev parent reply	other threads:[~2021-10-09  7:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-07 23:50 Nadav Amit
2021-10-08  8:05 ` David Hildenbrand
2021-10-08 22:02   ` Nadav Amit
2021-10-09  7:59     ` David Hildenbrand [this message]
2021-10-10  5:29   ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f5ea62e1-cf21-6bd8-37f8-1a0f9637402c@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox