From: Lorenzo Stoakes <lstoakes@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Jason Gunthorpe <jgg@ziepe.ca>, Jens Axboe <axboe@kernel.dk>,
Matthew Wilcox <willy@infradead.org>,
Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
Leon Romanovsky <leon@kernel.org>,
Christian Benvenuti <benve@cisco.com>,
Nelson Escobar <neescoba@cisco.com>,
Bernard Metzler <bmt@zurich.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Bjorn Topel <bjorn@kernel.org>,
Magnus Karlsson <magnus.karlsson@intel.com>,
Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
Jonathan Lemon <jonathan.lemon@gmail.com>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Christian Brauner <brauner@kernel.org>,
Richard Cochran <richardcochran@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org,
netdev@vger.kernel.org, bpf@vger.kernel.org,
Oleg Nesterov <oleg@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>, Jan Kara <jack@suse.cz>,
"Kirill A . Shutemov" <kirill@shutemov.name>,
Pavel Begunkov <asml.silence@gmail.com>,
Mika Penttila <mpenttil@redhat.com>,
Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
Peter Xu <peterx@redhat.com>,
Matthew Rosato <mjrosato@linux.ibm.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>
Subject: Re: [PATCH v7 1/3] mm/mmap: separate writenotify and dirty tracking logic
Date: Tue, 2 May 2023 18:09:14 +0100 [thread overview]
Message-ID: <bf04a98a-9de6-4532-a36c-59572d22dd7c@lucifer.local> (raw)
In-Reply-To: <f777a151-edfc-4882-8aca-9a926179c5bb@lucifer.local>
On Tue, May 02, 2023 at 05:53:46PM +0100, Lorenzo Stoakes wrote:
> On Tue, May 02, 2023 at 06:38:53PM +0200, David Hildenbrand wrote:
> > On 02.05.23 18:34, Lorenzo Stoakes wrote:
> > > vma_wants_writenotify() is specifically intended for setting PTE page table
> > > flags, accounting for existing PTE flag state and whether that might
> > > already be read-only while mixing this check with a check whether the
> > > filesystem performs dirty tracking.
> > >
> > > Separate out the notions of dirty tracking and a PTE write notify checking
> > > in order that we can invoke the dirty tracking check from elsewhere.
> > >
> > > Note that this change introduces a very small duplicate check of the
> > > separated out vm_ops_needs_writenotify(). This is necessary to avoid making
> > > vma_needs_dirty_tracking() needlessly complicated (e.g. passing a
> > > check_writenotify flag or having it assume this check was already
> > > performed). This is such a small check that it doesn't seem too egregious
> > > to do this.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> > > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > > Reviewed-by: Mika Penttilä <mpenttil@redhat.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > ---
> > > include/linux/mm.h | 1 +
> > > mm/mmap.c | 36 +++++++++++++++++++++++++++---------
> > > 2 files changed, 28 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 27ce77080c79..7b1d4e7393ef 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2422,6 +2422,7 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
> > > #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
> > > MM_CP_UFFD_WP_RESOLVE)
> > > +bool vma_needs_dirty_tracking(struct vm_area_struct *vma);
> > > int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);
> > > static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma)
> > > {
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 5522130ae606..295c5f2e9bd9 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1475,6 +1475,31 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
> > > }
> > > #endif /* __ARCH_WANT_SYS_OLD_MMAP */
> > > +/* Do VMA operations imply write notify is required? */
> > > +static bool vm_ops_needs_writenotify(const struct vm_operations_struct *vm_ops)
> > > +{
> > > + return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite);
> > > +}
> > > +
> > > +/*
> > > + * Does this VMA require the underlying folios to have their dirty state
> > > + * tracked?
> > > + */
> > > +bool vma_needs_dirty_tracking(struct vm_area_struct *vma)
> > > +{
> >
> > Sorry for not noticing this earlier, but ...
>
> pints_owed++
>
> >
> > what about MAP_PRIVATE mappings? When we write, we populate an anon page,
> > which will work as expected ... because we don't have to notify the fs?
> >
> > I think you really also want the "If it was private or non-writable, the
> > write bit is already clear */" part as well and remove "false" in that case.
> >
>
> Not sure a 'write bit is already clear' case is relevant to checking
> whether a filesystem dirty tracks? That seems specific entirely to the page
> table bits.
>
> That's why I didn't include it,
>
> A !VM_WRITE shouldn't be GUP-writable except for FOLL_FORCE, and that
> surely could be problematic if VM_MAYWRITE later?
>
> Thinking about it though a !VM_SHARE should probably can be safely assumed
> to not be dirty-trackable, so we probably do need to add a check for
> !VM_SHARED -> !vma_needs_dirty_tracking
>
On second thoughts, we explicitly check FOLL_FORCE && !is_cow_mapping() in
check_vma_flags() so that case cannot occur.
So actually yes we should probably include this on the basis of that and
the fact that a FOLL_WRITE operation will CoW the MAP_PRIVATE mapping.
This was an (over)abundance of caution.
Will fix on respin.
> > Or was there a good reason to disallow private mappings as well?
> >
>
> Until the page is CoW'd walking the page tables will get you to the page
> cache page right? This was the reason I (perhaps rather too quickly) felt
> MAP_PRIVATE should be excluded.
>
> However a FOLL_WRITE would trigger CoW... and then we'd be trivially OK.
>
> So yeah, ok perhaps I dismissed that a little too soon. I was concerned
> about some sort of egregious FOLL_FORCE case where somehow we'd end up with
> the page cache folio. But actually, that probably can't happen...
>
> > --
> > Thanks,
> >
> > David / dhildenb
> >
next prev parent reply other threads:[~2023-05-02 17:09 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-02 16:34 [PATCH v7 0/3] mm/gup: disallow GUP writing to file-backed mappings by default Lorenzo Stoakes
2023-05-02 16:34 ` [PATCH v7 1/3] mm/mmap: separate writenotify and dirty tracking logic Lorenzo Stoakes
2023-05-02 16:38 ` David Hildenbrand
2023-05-02 16:53 ` Lorenzo Stoakes
2023-05-02 17:09 ` Lorenzo Stoakes [this message]
2023-05-02 17:16 ` David Hildenbrand
2023-05-02 16:34 ` [PATCH v7 2/3] mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings Lorenzo Stoakes
2023-05-02 16:42 ` David Hildenbrand
2023-05-02 16:56 ` Lorenzo Stoakes
2023-05-02 16:34 ` [PATCH v7 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast " Lorenzo Stoakes
2023-05-02 17:13 ` David Hildenbrand
2023-05-02 17:22 ` Peter Zijlstra
2023-05-02 17:34 ` David Hildenbrand
2023-05-02 18:17 ` Lorenzo Stoakes
2023-05-02 19:07 ` David Hildenbrand
2023-05-02 19:07 ` Jason Gunthorpe
2023-05-02 19:25 ` Lorenzo Stoakes
2023-05-02 19:33 ` David Hildenbrand
2023-05-02 19:37 ` Lorenzo Stoakes
2023-05-02 23:51 ` Jason Gunthorpe
2023-05-03 0:22 ` Jason Gunthorpe
2023-05-02 18:59 ` Peter Zijlstra
2023-05-02 19:05 ` David Hildenbrand
2023-05-02 17:34 ` Lorenzo Stoakes
2023-05-02 17:31 ` Lorenzo Stoakes
2023-05-02 17:38 ` David Hildenbrand
2023-05-02 17:45 ` Lorenzo Stoakes
2023-05-02 19:17 ` David Hildenbrand
2023-05-02 19:45 ` Lorenzo Stoakes
2023-05-02 18:45 ` [PATCH v7 0/3] mm/gup: disallow GUP writing to file-backed mappings by default Matthew Rosato
2023-05-02 18:53 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bf04a98a-9de6-4532-a36c-59572d22dd7c@lucifer.local \
--to=lstoakes@gmail.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=asml.silence@gmail.com \
--cc=ast@kernel.org \
--cc=axboe@kernel.dk \
--cc=benve@cisco.com \
--cc=bjorn@kernel.org \
--cc=bmt@zurich.ibm.com \
--cc=borntraeger@linux.ibm.com \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=david@fromorbit.com \
--cc=david@redhat.com \
--cc=dennis.dalessandro@cornelisnetworks.com \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=irogers@google.com \
--cc=jack@suse.cz \
--cc=jgg@nvidia.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=jonathan.lemon@gmail.com \
--cc=kirill@shutemov.name \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=mjrosato@linux.ibm.com \
--cc=mpenttil@redhat.com \
--cc=namhyung@kernel.org \
--cc=neescoba@cisco.com \
--cc=netdev@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=pabeni@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=richardcochran@gmail.com \
--cc=tytso@mit.edu \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox