From: SeongJae Park <sj@kernel.org>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: SeongJae Park <sj@kernel.org>,
"Liam R. Howlett" <howlett@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
Date: Wed, 5 Mar 2025 14:58:02 -0800
Message-ID: <20250305225803.60171-1-sj@kernel.org>
In-Reply-To: <ro2wtggwxbmwk6lhvcixwrefo44x7ggeumevv7lyupvudwxjsg@onh2e46eqzcy>
On Wed, 5 Mar 2025 12:22:25 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> On Wed, Mar 05, 2025 at 10:15:55AM -0800, SeongJae Park wrote:
> > For MADV_DONTNEED[_LOCKED] or MADV_FREE madvise requests, tlb flushes
> > can happen for each vma of the given address ranges. Because such tlb
> > flushes are for address ranges of the same process, doing them in a
> > batch is more efficient while still being safe. Modify the madvise()
> > and process_madvise() entry-level code paths to do such batched tlb
> > flushes, while the internal unmap logic only gathers the tlb entries
> > to flush.
> >
> > In more detail, modify the entry functions to initialize an mmu_gather
> > object and pass it to the internal logic. Also modify the internal
> > logic to only gather the tlb entries to flush into the received
> > mmu_gather object. After all internal function calls are done, the
> > entry functions finish the mmu_gather object to flush the gathered
> > tlb entries in one batch.
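(For readers who want a concrete picture, the intended entry-level flow is
roughly as below.  This is only a minimal sketch; the names other than
tlb_gather_mmu() and tlb_finish_mmu() are illustrative placeholders, not the
exact code of the patches.)

    struct mmu_gather tlb;
    int error;

    /* 'mm', 'start', 'end' and 'behavior' come from the madvise entry point. */
    tlb_gather_mmu(&tlb, mm);
    /*
     * The internal MADV_DONTNEED[_LOCKED]/MADV_FREE logic only gathers the
     * tlb entries to flush into 'tlb'; no per-vma flush happens here.
     * madvise_do_behavior_batched() is a placeholder name, not a real symbol.
     */
    error = madvise_do_behavior_batched(mm, start, end, behavior, &tlb);
    /* Flush everything that was gathered, in one batch. */
    tlb_finish_mmu(&tlb);
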
> >
> > Patch Sequence
> > ==============
> >
> > The first four patches are minor cleanups of madvise.c for readability.
> >
> > The following four patches (patches 5-8) define a new data structure
> > for managing the information required for batched tlb flushing
> > (mmu_gather and behavior), and update the internal MADV_DONTNEED[_LOCKED]
> > and MADV_FREE handling code paths to receive it.
> >
> > Three patches (patches 9-11) for making internal MADV_DONTNEED[_LOCKED]
> > and MADV_FREE handling logic ready for batched tlb flushing follow.
>
> I think you forgot to complete the above sentence, or the 'follow' at the
> end seems weird.
Thank you for catching this. I just wanted to say these three patches come
after the previous ones. I will wordsmith this part in the next version.
>
> > The
> > patches keep supporting the unbatched tlb flushes use case, for
> > fine-grained and safe transitions.
> >
> > The next three patches (patches 12-14) update the madvise() and
> > process_madvise() code to do the batched tlb flushes, utilizing the
> > changes introduced by the previous patches.
> >
> > The final two patches (patches 15-16) clean up the internal logic's
> > support code for the unbatched tlb flushes use case, which is no longer
> > used.
> >
> > Test Results
> > ============
> >
> > I measured the time to apply MADV_DONTNEED advice to 256 MiB of memory
> > using multiple process_madvise() calls. I apply the advice at 4 KiB
> > region granularity, with the batch size (vlen) varying from 1 to
> > 1024. The source code for the measurement is available at GitHub[1].
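(In case it helps reviewers, the core of the measurement loop is roughly as
below.  This is a simplified sketch rather than the exact code in the linked
repository: the function name and the assumption that the 4 KiB regions are
contiguous in one 256 MiB buffer are mine, SYS_process_madvise needs reasonably
recent kernel headers, and error handling and timing are omitted.)

    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Apply MADV_DONTNEED to 'buf' in 4 KiB regions, 'vlen' regions per call. */
    static void dontneed_in_batches(int pidfd, char *buf, size_t total,
                                    size_t vlen)
    {
            struct iovec iov[1024];     /* vlen is at most UIO_MAXIOV (1024) */
            size_t nr_regions = total / 4096;
            size_t i, j;

            for (i = 0; i < nr_regions; i += vlen) {
                    size_t batch = vlen < nr_regions - i ? vlen : nr_regions - i;

                    for (j = 0; j < batch; j++) {
                            iov[j].iov_base = buf + (i + j) * 4096;
                            iov[j].iov_len = 4096;
                    }
                    /* pidfd refers to the target (here, the calling) process. */
                    syscall(SYS_process_madvise, pidfd, iov, batch,
                            MADV_DONTNEED, 0);
            }
    }
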
> >
> > The measurement results are as below. The 'sz_batches' column shows the
> > batch size of the process_madvise() calls. The 'before' and 'after'
> > columns are the measured time to apply MADV_DONTNEED to the 256 MiB
> > memory buffer in nanoseconds, on kernels built without and with the
> > MADV_DONTNEED tlb flushes batching patch of this series, respectively.
> > For the baseline, the mm-unstable tree of 2025-03-04[2] has been used.
> > The 'after/before' column is the ratio of 'after' to 'before'. An
> > 'after/before' value lower than 1.0 means this patch increased
> > efficiency over the baseline, and a lower value means better efficiency.
>
> I would recommend replacing the after/before column with a percentage,
> i.e. percentage improvement or degradation.
Thank you for the nice suggestion. I will do so in the next version.
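For reference, the conversion I have in mind is simply

    improvement_pct = (1 - after / before) * 100

so, for example, the batch size 4 row (after/before of about 0.877) would be
reported as roughly a 12-13 percent improvement, and values above 1.0 as a
degradation.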
>
> >
> > sz_batches  before     after      after/before
> > 1           102842895  106507398  1.03563204828102
> > 2           73364942   74529223   1.01586971880929
> > 4           58823633   51608504   0.877343022998937
> > 8           47532390   44820223   0.942940655834895
> > 16          43591587   36727177   0.842529018271347
> > 32          44207282   33946975   0.767904595446515
> > 64          41832437   26738286   0.639175910310939
> > 128         40278193   23262940   0.577556694263817
> > 256         41568533   22355103   0.537789077136785
> > 512         41626638   22822516   0.54826709762148
> > 1024        44440870   22676017   0.510251419470411
> >
> > For batch sizes <= 2, tlb flushes batching shows no big difference,
> > only a slight overhead. I think that's within the error range of this
> > simple micro-benchmark, and can therefore be ignored.
>
> I would recommend running the experiment multiple times and reporting
> averages and standard deviations, which will support your error range
> claim.
Again, good suggestion. I will do so.
>
> > Starting from batch size
> > 4, however, tlb flushes batching shows a clear efficiency gain. The
> > gain tends to be proportional to the batch size, as expected, ranging
> > from about 13 percent with batch size 4 up to 49 percent with batch
> > size 1,024.
> >
> > Please note that this is a very simple microbenchmark, so the
> > efficiency gain on real workloads could be very different.
> >
>
> I think you are running a single-threaded benchmark on an idle machine.
> I expect this series to be much more beneficial on a loaded machine and
> for multi-threaded applications.
Your understanding of my test setup is correct, and I agree with your expectation.
> No need to test that scenario, but if you
> have already done that then it would be good to report it.
I don't have such test results, or concrete plans with a specific timeline to
produce them, for now. I will of course share them if I get a chance.
Thanks,
SJ