linux-mm.kvack.org archive mirror
From: Byungchul Park <byungchul@sk.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>,
	Byungchul Park <lkml.byungchul.park@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, akpm@linux-foundation.org,
	ying.huang@intel.com, vernhao@tencent.com,
	mgorman@techsingularity.net, hughd@google.com,
	willy@infradead.org, peterz@infradead.org, luto@kernel.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
Date: Tue, 4 Jun 2024 10:53:48 +0900	[thread overview]
Message-ID: <20240604015348.GB26609@system.software.com> (raw)
In-Reply-To: <d650c29b-129f-4fac-9a9d-ea1fbdae2c3a@intel.com>

On Mon, Jun 03, 2024 at 06:23:46AM -0700, Dave Hansen wrote:
> On 6/3/24 02:35, Byungchul Park wrote:
> ...
> > In luf's point of view, the points where the deferred flush should be
> > performed are simply:
> > 
> > 	1. when changing the vma maps, that might be luf'ed.
> > 	2. when updating data of the pages, that might be luf'ed.
> 
> It's simple, but the devil is in the details as always.

Agree with that.

> > All we need to do is to identify the points:
> > 
> > 	1. when changing the vma maps, that might be luf'ed.
> > 
> > 	   a) mmap and munmap, i.e. fault handler or unmap_region().
> > 	   b) permission to writable, i.e. mprotect or fault handler.
> > 	   c) what I'm missing.
> 
> I'd say it even more generally: anything that installs a PTE which is
> inconsistent with the original PTE.  That, of course, includes writes.
> But it also includes crazy things that we do like uprobes.  Take a look
> at __replace_page().
> 
> I think the page_vma_mapped_walk() checks plus the ptl keep LUF at bay
> there.  But it needs some really thorough review.
> 
> But the bigger concern is that, if there was a problem, I can't think of
> a systematic way to find it.
> 
> > 	2. when updating data of the pages, that might be luf'ed.
> > 
> > 	   a) updating files through vfs e.g. file_end_write().
> > 	   b) updating files through writable maps, i.e. 1-a) or 1-b).
> > 	   c) what I'm missing.
> 
> Filesystems or block devices that change content without a "write" from
> the local system.  Network filesystems and block devices come to mind.

AFAIK, every network filesystem eventually "updates" its connected local
filesystem.  It could still be handled at the point where the local
filesystem is updated.

> I honestly don't know what all the rules are around these, but they
> could certainly be troublesome.
> 
> There appear to be some interactions for NFS between file locking and
> page cache flushing.
> 
> But, stepping back ...
> 
> I'd honestly be a lot more comfortable if there was even a debugging LUF

I'd better provide a method for better debugging.  Let me know what we
need.

> mode that enforced a rule that said:

Why "debugging mode"?  The following rules should always be enforced.

>   1. A LUF'd PTE can't be rewritten until after a luf_flush() occurs

"luf_flush() should be followed when.." is more correct, because
"luf_flush() -> another luf -> the pte gets rewritten" can happen.  So
it should be "the pte gets rewritten -> another luf by any chance ->
luf_flush()", which is still safe.

>   2. A LUF'd page's position in the page cache can't be replaced until
>      after a luf_flush()

"luf_flush() should be followed when.." is more correct too.

These two rules are exactly the same as what I described, but more
specific.  I like the way you describe the rules.

	Byungchul

> or *some* other independent set of rules that can tell us when something
> goes wrong.  That uprobes code, for instance, seems like it will work.
> But I can also imagine writing it ten other ways where it would break
> when combined with LUF.
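
To make rule 1 above concrete, here is a minimal user-space sketch in C of
what such a debugging check might look like.  It is a toy model, not kernel
code and not part of this patch set: struct toy_pte, the generation counters,
and every toy_*() function are hypothetical names invented for illustration,
and toy_luf_flush() only stands in for luf_flush().  Each LUF'd unmap records
a generation number, and a rewrite is reported as a violation while that
generation has not been flushed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of a PTE; not the kernel's pte_t. */
struct toy_pte {
	unsigned long val;	/* pretend PTE contents, 0 means unmapped */
	unsigned long luf_gen;	/* generation of the deferred unmap, 0 if none */
};

static unsigned long luf_cur_gen = 1;	/* generation for the next LUF'd unmap */
static unsigned long luf_flushed_gen;	/* generations <= this have been flushed */

/* Unmap with the TLB flush deferred: clear the PTE, remember the generation. */
static void toy_luf_unmap(struct toy_pte *pte)
{
	pte->val = 0;
	pte->luf_gen = luf_cur_gen++;
}

/* Stand-in for luf_flush(): every deferred flush issued so far is now done. */
static void toy_luf_flush(void)
{
	luf_flushed_gen = luf_cur_gen - 1;
}

/* True while this PTE's deferred flush has not been performed. */
static bool toy_luf_pending(const struct toy_pte *pte)
{
	return pte->luf_gen > luf_flushed_gen;
}

/*
 * Rule 1 check: refuse to rewrite a LUF'd PTE whose flush is still pending.
 * Returns false and leaves the PTE untouched on a violation.
 */
static bool toy_set_pte(struct toy_pte *pte, unsigned long val)
{
	if (toy_luf_pending(pte)) {
		fprintf(stderr, "LUF debug: PTE rewritten before flush (gen %lu)\n",
			pte->luf_gen);
		return false;
	}
	pte->val = val;
	pte->luf_gen = 0;
	return true;
}

/* Usage walk-through: a rewrite is rejected before the flush, allowed after. */
static void toy_luf_demo(void)
{
	struct toy_pte pte = { .val = 0x1000, .luf_gen = 0 };

	toy_luf_unmap(&pte);
	assert(!toy_set_pte(&pte, 0x2000));	/* flush still pending */
	toy_luf_flush();
	assert(toy_set_pte(&pte, 0x2000));	/* safe after the flush */
}
```

Because the check compares per-PTE generations rather than asking whether
any flush has ever happened, a flush followed by another LUF'd unmap of a
different PTE does not make the newly unmapped PTE look safe to rewrite.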

