From: Dave Hansen <dave.hansen@intel.com>
To: Byungchul Park <byungchul@sk.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org,
ying.huang@intel.com, vernhao@tencent.com,
mgorman@techsingularity.net, hughd@google.com,
willy@infradead.org, david@redhat.com, peterz@infradead.org,
luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
Date: Fri, 31 May 2024 09:12:42 -0700
Message-ID: <fab1dd64-c652-4160-93b4-7b483a8874da@intel.com>
In-Reply-To: <20240531092001.30428-10-byungchul@sk.com>
On 5/31/24 02:19, Byungchul Park wrote:
...
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0283cf366c2a..03683bf66031 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2872,6 +2872,12 @@ static inline void file_end_write(struct file *file)
>  	if (!S_ISREG(file_inode(file)->i_mode))
>  		return;
>  	sb_end_write(file_inode(file)->i_sb);
> +
> +	/*
> +	 * XXX: If needed, can be optimized by avoiding luf_flush() if
> +	 * the address space of the file has never been involved by luf.
> +	 */
> +	luf_flush();
>  }
...
> +void luf_flush(void)
> +{
> +	unsigned long flags;
> +	unsigned short int ugen;
> +
> +	/*
> +	 * Obtain the latest ugen number.
> +	 */
> +	spin_lock_irqsave(&luf_lock, flags);
> +	ugen = luf_gen;
> +	spin_unlock_irqrestore(&luf_lock, flags);
> +
> +	check_luf_flush(ugen);
> +}

Am I reading this right? There's now an unconditional global spinlock
acquired in the sys_write() path? How can this possibly scale?

So, yeah, I think an optimization is absolutely needed. But, on a more
fundamental level, I just don't believe these patches are being tested.
Even a simple microbenchmark should show a pretty nasty regression on
any decently large system:

> https://github.com/antonblanchard/will-it-scale/blob/master/tests/write1.c

Second, I was just pointing out sys_write() as an example of how the
page cache could change. Couldn't a separate, read/write mmap() of the
file do the same thing and *not* go through sb_end_write()?

So:

	fd = open("foo");
	ptr1 = mmap(fd, PROT_READ);
	ptr2 = mmap(fd, PROT_READ|PROT_WRITE);

	foo = *ptr1; // populate the page cache
	... page cache page is reclaimed and LUF'd
	*ptr2 = bar; // new page cache page is allocated and written to
	printk("*ptr1: %d\n", *ptr1);

Doesn't the printk() see stale data?

I think tglx would call all of this "tinkering". The approach to this
series is to "fix" narrow, specific cases that reviewers point out, make
it compile, then send it out again, hoping someone will apply it.

So, for me, until the approach to this series changes: NAK, for x86.
Andrew, please don't take this series. Or, if you do, please drop the
patch enabling it on x86.

I also have the feeling our VFS friends won't take kindly to having
random luf_foo() hooks in their hot paths, optimized or not. I don't
see any of them on cc.