linux-mm.kvack.org archive mirror
From: Byungchul Park <byungchul@sk.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Dave Hansen <dave.hansen@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Byungchul Park <lkml.byungchul.park@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, akpm@linux-foundation.org,
	ying.huang@intel.com, vernhao@tencent.com,
	mgorman@techsingularity.net, hughd@google.com,
	peterz@infradead.org, luto@kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	rjgolo@gmail.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
Date: Fri, 14 Jun 2024 11:45:18 +0900
Message-ID: <20240614024518.GB47085@system.software.com>
In-Reply-To: <Zmg7GXK1SGFJNdge@tiehlicka>

On Tue, Jun 11, 2024 at 01:55:05PM +0200, Michal Hocko wrote:
> On Tue 11-06-24 09:55:23, Byungchul Park wrote:
> > On Mon, Jun 10, 2024 at 03:23:49PM +0200, Michal Hocko wrote:
> > > On Tue 04-06-24 09:34:48, Byungchul Park wrote:
> > > > On Mon, Jun 03, 2024 at 06:01:05PM +0100, Matthew Wilcox wrote:
> > > > > On Mon, Jun 03, 2024 at 09:37:46AM -0700, Dave Hansen wrote:
> > > > > > Yeah, we'd need some equivalent of a PTE marker, but for the page cache.
> > > > > >  Presumably some xa_value() that means a reader has to go do a
> > > > > > luf_flush() before going any farther.
> > > > > 
> > > > > I can allocate one for that.  We've got something like 1000 currently
> > > > > unused values which can't be mistaken for anything else.
> > > > > 
> > > > > > That would actually have a chance at fixing two issues:  One where a new
> > > > > > page cache insertion is attempted.  The other where someone goes to look
> > > > > > in the page cache and takes some action _because_ it is empty (I think
> > > > > > NFS is doing some of this for file locks).
> > > > > > 
> > > > > > LUF is also pretty fundamentally built on the idea that files can't
> > > > > > change without LUF being aware.  That model seems to work decently for
> > > > > > normal old filesystems on normal old local block devices.  I'm worried
> > > > > > about NFS, and I don't know how seriously folks take FUSE, but it
> > > > > > obviously can't work well for FUSE.
> > > > > 
> > > > > I'm more concerned with:
> > > > > 
> > > > >  - page goes back to buddy
> > > > >  - page is allocated to slab
> > > > 
> > > > At this point, the needed tlb flush will be performed in prep_new_page().
> > > 
> > > But that does mean that an unaware caller would get an additional
> > > overhead of the flushing, right? I think it would be just a matter of
> > 
> > pcp for locality is already a better source of side channel attacks.  FYI,
> > the tlb flush is performed rarely, only when a pending tlb flush exists.
> 
> Right but rare and hard to predict latencies are much worse than
> consistent ones.

No doubt it would be best to keep latencies as consistent as possible.
What matters is how consistent *we require* them to be.  Let me know the
criteria for that, if there are any.  I will check it.

> > > time before somebody can turn that into a side channel attack, not to
> > > mention unexpected latencies introduced.
> > 
> > Nope.  The pending tlb flush performed in prep_new_page() is one that
> > would've been done already with the vanilla kernel.  These are not
> > additional tlb flushes but a subset of all the skipped ones.
> 
> But those skipped ones could have happened in a completely different
> context (e.g. a different process or even a different security domain),
> right?

Right.

> > It's worth noting all the existing mm reclaim mechanisms have already
> > introduced worse unexpected latencies.
> 
> Right, but reclaim, especially direct reclaim, is expected to be
> slow. It is much different to see latency spikes on a system with a lot
> of memory.

Are you talking about rt systems?  IMHO, an rt system should prevent its
memory from being reclaimed in the first place, since reclaim adds
unexpected latencies.

Reclaim and migration already introduce unexpected latencies themselves.
Why do only the latencies introduced by luf matter?  I'm asking to
understand what you mean, so that I can fix luf if there is anything to fix.

   vanilla
   -------
   alloc_page() {
      ...
      preempted by kswapd or direct reclaim {
         ...
         reclaim
            unmap file pages
            tlb shootdown
         ...
         migration
            unmap pages
            tlb shootdown
         ...
      }
      ...
      interrupted by tlb shootdown from other CPUs {
         ...
      }
      ...
      prep_new_page() {
         ...
      }
   }
   
   with luf
   --------
   alloc_page() {
      ...
      preempted by kswapd or direct reclaim {
         ...
         reclaim
            unmap file pages
            (skip tlb shootdown)
         ...
         migration
            unmap pages
            (skip tlb shootdown)
         ...
      }
      ...
      interrupted by tlb shootdown from other CPUs {
         ...
      }
      ...
      prep_new_page() {
         ...
         /*
          * This may be a tlb shootdown that was skipped in this context or in another one.
          */
         tlb shootdown with much smaller cpumask
         ...
      }
   }
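
To make the "much smaller cpumask" step above concrete, here is a toy
userspace model in C.  It is only a sketch and not the code from the patch:
every name in it (toy_page, toy_unmap_deferred(), toy_prep_new_page() and
so on) is hypothetical, and it assumes a simple scheme in which a deferred
unmap stamps the page with a generation number and each CPU remembers the
generation of its last flush.  The real series may track this differently.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Toy userspace model. Every name below is hypothetical, not the patch's API. */
struct toy_page {
	uint64_t ugen;       /* generation of the deferred unmap, 0 if none pending */
	uint32_t stale_cpus; /* CPUs that may still cache the old translation */
};

static uint64_t toy_global_gen;                   /* bumped on every deferred unmap */
static uint64_t toy_cpu_flushed_gen[TOY_NR_CPUS]; /* generation of each CPU's last flush */

/* A CPU flushes its own TLB for an unrelated reason. */
void toy_local_flush(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_cpu_flushed_gen[cpu] = toy_global_gen;
}

/* Unmap without a shootdown: only record who might hold a stale entry. */
void toy_unmap_deferred(struct toy_page *page, uint32_t mapped_cpus)
{
	page->ugen = ++toy_global_gen;
	page->stale_cpus = mapped_cpus;
}

/*
 * Allocation-time step: "flush" only the CPUs that have not flushed since
 * the deferred unmap, and return that mask. Returns 0 if nothing is pending.
 */
uint32_t toy_prep_new_page(struct toy_page *page)
{
	uint32_t need = 0;

	if (!page->ugen)
		return 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		uint32_t bit = UINT32_C(1) << cpu;

		if ((page->stale_cpus & bit) && toy_cpu_flushed_gen[cpu] < page->ugen) {
			need |= bit;
			toy_cpu_flushed_gen[cpu] = toy_global_gen; /* stands in for the IPI */
		}
	}

	page->ugen = 0;
	page->stale_cpus = 0;
	return need;
}
```

In this model, if a page was mapped on CPUs 0-3, its unmap is deferred, and
CPUs 1 and 3 then flush for their own reasons, toy_prep_new_page() only has
to target CPUs 0 and 2.  A page with nothing pending costs one comparison.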

I really want to understand why only the latencies introduced by luf
matter.  Why don't the latencies already present in vanilla matter?

	Byungchul

> -- 
> Michal Hocko
> SUSE Labs

