linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH] perf: map pages in advance
Date: Fri, 29 Nov 2024 14:24:24 +0100	[thread overview]
Message-ID: <6795cc9a-6230-431f-b089-7909f7bc4f30@redhat.com> (raw)
In-Reply-To: <94dabe57-232b-4a21-b2cf-bcfbda75c881@lucifer.local>

On 29.11.24 14:19, Lorenzo Stoakes wrote:
> On Fri, Nov 29, 2024 at 02:12:23PM +0100, David Hildenbrand wrote:
>> On 29.11.24 14:02, Lorenzo Stoakes wrote:
>>> On Fri, Nov 29, 2024 at 01:59:01PM +0100, David Hildenbrand wrote:
>>>> On 29.11.24 13:55, Lorenzo Stoakes wrote:
>>>>> On Fri, Nov 29, 2024 at 01:45:42PM +0100, David Hildenbrand wrote:
>>>>>> On 29.11.24 13:26, Peter Zijlstra wrote:
>>>>>>> On Fri, Nov 29, 2024 at 01:12:57PM +0100, David Hildenbrand wrote:
>>>>>>>
>>>>>>>> Well, I think we simply will want vm_insert_pages_prot() that stops treating
>>>>>>>> these things like folios :) . *likely*  we'd want a distinct memdesc/type.
>>>>>>>>
>>>>>>>> We could start that work right now by making some user (iouring,
>>>>>>>> ring_buffer) set a new page->_type, and checking that in
>>>>>>>> vm_insert_pages_prot() + vm_normal_page(). If set, don't touch the refcount
>>>>>>>> and the mapcount.
>>>>>>>>
>>>>>>>> Because then, we can just make all the relevant drivers set the type, refuse
>>>>>>>> in vm_insert_pages_prot() anything that doesn't have the type set, and
>>>>>>>> refuse in vm_normal_page() any pages with this memdesc.
>>>>>>>>
>>>>>>>> Maybe we'd have to teach CoW to copy from such pages, maybe not. GUP of
>>>>>>>> these things will stop working, I hope that is not a problem.
>>>>>>>
>>>>>>> Well... perf-tool likes to call write() upon these pages in order to
>>>>>>> write out the data from the mmap() into a file.
>>>>>
>>>>> I'm confused about what you mean, write() using the fd should work fine, how
>>>>> would they interact with the mmap? I mean be making a silly mistake here
>>>>
>>>> write() to file from the mmap()'ed address range to *some* file.
>>>>
>>>
>>> Yeah sorry my brain melted down briefly, for some reason was thinking of read()
>>> writing into the buffer...
>>>
>>>> This will GUP the pages you inserted.
>>>>
>>>> GUP does not work on PFNMAP.
>>>
>>> Well it _does_ if struct page **pages is set to NULL :)
>>
>> Hm? :)
>>
>> check_vma_flags() unconditionally refuses VM_PFNMAP.
> 
> Ha, funny with my name all over git blame there... ok yup missed this, the
> vm_normal_page() == NULL stuff must but for mixed map (and those other weird
> cases I think you can get0...
> 
> Well good. Where is write() invoking GUP? I'm kind of surprised it's not just
> using uaccess?
> 
> One thing to note is I did run all the perf tests with no issues whatsoever. You
> would _think_ this would have come up...
> 
> I'm editing some test code to explicitly write() from the buffer anyway to see.
> 
> If we can't do pfnmap, and we definitely can't do mixedmap (because it's
> basically entirely equivalent in every way to just faulting in the pages as
> before and requires the same hacks) then I will have to go back to the drawing
> board or somehow change the faulting code.
> 
> This really sucks.
> 
> I'm not quite sure I even understand why we don't allow GUP used _just for
> pinning_ on VM_PFNMAP when it is -in effect- already pinned on assumption
> whatever mapped it will maintain the lifetime.
> 
> What a mess...

Because VM_PFNMAP is dangerous as hell. To get a feeling for that (and also why I
raised my refcounting comment earlier) just read recent:

commit 79a61cc3fc0466ad2b7b89618a6157785f0293b3
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Sep 11 17:11:23 2024 -0700

     mm: avoid leaving partial pfn mappings around in error case
     
     As Jann points out, PFN mappings are special, because unlike normal
     memory mappings, there is no lifetime information associated with the
     mapping - it is just a raw mapping of PFNs with no reference counting of
     a 'struct page'.


GUP relies on the refcount. In a PFNMAP you don't have any way to make sure the
driver won't go down, free the page, to have it used by someone else while IO is still
happening to that page.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-11-29 13:24 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-28 11:37 Lorenzo Stoakes
2024-11-28 13:08 ` David Hildenbrand
2024-11-28 13:20   ` Lorenzo Stoakes
2024-11-28 13:37     ` David Hildenbrand
2024-11-28 14:23       ` Lorenzo Stoakes
2024-11-29 12:12         ` David Hildenbrand
2024-11-29 12:26           ` Peter Zijlstra
2024-11-29 12:45             ` David Hildenbrand
2024-11-29 12:55               ` Lorenzo Stoakes
2024-11-29 12:59                 ` David Hildenbrand
2024-11-29 13:02                   ` Lorenzo Stoakes
2024-11-29 13:12                     ` David Hildenbrand
2024-11-29 13:19                       ` Lorenzo Stoakes
2024-11-29 13:24                         ` David Hildenbrand [this message]
2024-11-29 13:38                           ` Lorenzo Stoakes
2024-11-29 13:47                             ` David Hildenbrand
2024-11-29 13:59                               ` Lorenzo Stoakes
2024-11-29 14:09                                 ` David Hildenbrand
2024-11-29 14:35                                   ` Lorenzo Stoakes
2024-11-29 14:48                                     ` David Hildenbrand
2024-11-29 20:42                                       ` Matthew Wilcox
2024-11-29 12:48           ` Lorenzo Stoakes
2024-11-29 13:11             ` David Hildenbrand
2024-11-28 20:47 ` Lorenzo Stoakes
2024-11-29  8:52   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6795cc9a-6230-431f-b089-7909f7bc4f30@redhat.com \
    --to=david@redhat.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox