linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	SeongJae Park <sj@kernel.org>,
	"Liam R. Howlett" <howlett@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Vlastimil Babka <vbabka@suse.cz>,
	kernel-team@meta.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
Date: Wed, 5 Mar 2025 19:26:23 +0000
Message-ID: <8c7714ca-a766-4a07-bfdc-f1d91aee379d@lucifer.local>
In-Reply-To: <46960f37-0d12-4cfd-a214-1ddae2495665@redhat.com>

On Wed, Mar 05, 2025 at 08:19:41PM +0100, David Hildenbrand wrote:
> On 05.03.25 19:56, Matthew Wilcox wrote:
> > On Wed, Mar 05, 2025 at 10:15:55AM -0800, SeongJae Park wrote:
> > > For MADV_DONTNEED[_LOCKED] or MADV_FREE madvise requests, tlb flushes
> > > can happen for each vma of the given address ranges.  Because such tlb
> > > flushes are for address ranges of same process, doing those in a batch
> > > is more efficient while still being safe.  Modify madvise() and
> > > process_madvise() entry level code path to do such batched tlb flushes,
> > > while the internal unmap logics do only gathering of the tlb entries to
> > > flush.
> >
> > Do real applications actually do madvise requests that span multiple
> > VMAs?  It just seems weird to me.  Like, each vma comes from a separate
> > call to mmap [1], so why would it make sense for an application to
> > call madvise() across a VMA boundary?
>
> I had the same question. If this happens in an app, I would assume that a
> single MADV_DONTNEED call would usually not span multiple VMAs, and if it
> does, not that many (and that often) that we would really care about it.
>
> OTOH, optimizing tlb flushing when using a vectored MADV_DONTNEED version
> would make more sense to me. I don't recall if process_madvise() allows for
> that already, and if it does, is this series primarily tackling optimizing
> that?

Yeah it's weird, but people can get caught out by unexpected failures to merge
if they do fun stuff with mremap().

Then again mremap() itself _mandates_ that you only span a single VMA (or part
of one) :)
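
For illustration, a minimal userspace sketch of that mremap() restriction.
The helper name mremap_across_vmas() is made up for this example, and the
mprotect() call is just one assumed way of getting two adjacent VMAs that
cannot merge. It only covers the growing case; other mremap() modes are not
exercised here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: build two adjacent VMAs by giving the two halves
 * of one mapping different protections (so they cannot merge), then try
 * to grow the whole range with a single mremap().
 *
 * Returns the errno reported by mremap(), 0 if the call unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
static int mremap_across_vmas(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	unsigned char *base;
	void *moved;
	int err;

	assert(page > 0);
	len = 2 * (size_t)page;

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Different protections split the mapping into two VMAs. */
	if (mprotect(base + page, (size_t)page, PROT_READ) != 0) {
		munmap(base, len);
		return -1;
	}

	/* Old range covers both VMAs; ask to grow it. */
	errno = 0;
	moved = mremap(base, len, 2 * len, MREMAP_MAYMOVE);
	err = errno;

	if (moved != MAP_FAILED) {
		munmap(moved, 2 * len);
		return 0;
	}

	munmap(base, len);
	return err;
}
```

On the kernels I am aware of, the call fails and the helper returns EFAULT,
which is the error mremap(2) documents for a range that is not covered by a
single suitable mapping.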

Can we talk about the _true_ horror show, though: you can span multiple VMAs
_with gaps_ and madvise() will allow it, only it'll return -ENOMEM at the end?

In madvise_walk_vmas():

	for (;;) {
		...

		if (start < vma->vm_start) {
			unmapped_error = -ENOMEM;
			start = vma->vm_start;
			...
		}

		...

		error = visit(vma, &prev, start, tmp, arg);
		if (error)
			return error;

		...
	}

	return unmapped_error;

So, you have no idea if that -ENOMEM is due to a gap, or due to the
operation itself returning -ENOMEM?

I mean, can we just drop this? Does anybody in their right mind rely on
this? Or is it intentional, to somehow deal with a racing unmap?

But, no, we hold an mmap lock so that's not it.

Yeah OK so can we drop this madness? :) or am I missing some very important
detail about why we allow this?

I guess spanning multiple VMAs we _have_ to leave in because plausibly
there are users of that out there?

>
> --
> Cheers,
>
> David / dhildenb
>


