From: Hillf Danton <hdanton@sina.com>
To: Byungchul Park <byungchul@sk.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v12 00/26] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Date: Thu, 20 Feb 2025 18:32:22 +0800 [thread overview]
Message-ID: <20250220103223.2360-1-hdanton@sina.com> (raw)
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
On Thu, 20 Feb 2025 14:20:01 +0900 Byungchul Park <byungchul@sk.com>
> To check luf's stability, I ran a heavy LLM inference workload consuming
> 210GiB over 7 days on a machine with 140GiB memory, and decided it's
> stable enough.
>
> I'm posting the latest version so that anyone can try luf mechanism if
> wanted by any chance. However, I tagged RFC again because there are
> still issues that should be resolved to merge to mainline:
>
> 1. Even though system wide total cpu time for TLB shootdown is
> reduced over 95%, page allocation paths should take additional cpu
> time shifted from page reclaim to perform TLB shootdown.
>
> 2. We need luf debug feature to detect when luf goes wrong by any
> chance. I implemented just a draft version that checks the sanity
> on mkwrite(), kmap(), and so on. I need to gather better ideas
> to improve the debug feature.
>
> ---
>
> Hi everyone,
>
> While I'm working with a tiered memory system e.g. CXL memory, I have
> been facing migration overhead esp. tlb shootdown on promotion or
> demotion between different tiers. Yeah.. most tlb shootdowns on
> migration through hinting fault can be avoided thanks to Huang Ying's
> work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
> is inaccessible").
>
> However, it's only for migration through hinting fault. I thought it'd
> be much better if we have a general mechanism to reduce all the tlb
> numbers that we can apply to any unmap code, that we normally believe
> tlb flush should be followed.
>
> I'm suggesting a new mechanism, LUF(Lazy Unmap Flush), that defers tlb
> flush until folios that have been unmapped and freed, eventually get
> allocated again. It's safe for folios that had been mapped read-only
> and were unmapped, as long as the contents of the folios don't change
> while staying in pcp or buddy so we can still read the data through the
> stale tlb entries.
>
Given pcp or buddy, you are opening window for use after free which makes
no sense in 99% cases.
next prev parent reply other threads:[~2025-02-20 10:32 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-20 5:20 Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 01/26] x86/tlb: add APIs manipulating tlb batch's arch data Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 02/26] arm64/tlbflush: " Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 03/26] riscv/tlb: " Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 04/26] x86/tlb, riscv/tlb, mm/rmap: separate arch_tlbbatch_clear() out of arch_tlbbatch_flush() Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 05/26] mm/buddy: make room for a new variable, luf_key, in struct page Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 06/26] mm: move should_skip_kasan_poison() to mm/internal.h Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 07/26] mm: introduce luf_ugen to be used as a global timestamp Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 08/26] mm: introduce luf_batch to be used as hash table to store luf meta data Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 09/26] mm: introduce API to perform tlb shootdown on exit from page allocator Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 10/26] mm: introduce APIs to check if the page allocation is tlb shootdownable Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 11/26] mm: deliver luf_key to pcp or buddy on free after unmapping Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 12/26] mm: delimit critical sections to take off pages from pcp or buddy alloctor Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 13/26] mm: introduce pend_list in struct free_area to track luf'd pages Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 14/26] mm/rmap: recognize read-only tlb entries during batched tlb flush Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 15/26] fs, filemap: refactor to gather the scattered ->write_{begin,end}() calls Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 16/26] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 17/26] x86/tlb, riscv/tlb, arm64/tlbflush, mm: remove cpus from tlb shootdown that already have been done Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 18/26] mm/page_alloc: retry 3 times to take pcp pages on luf check failure Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 19/26] mm: skip luf tlb flush for luf'd mm that already has been done Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 20/26] mm, fs: skip tlb flushes for luf'd filemap " Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 21/26] mm: perform luf tlb shootdown per zone in batched manner Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 22/26] mm/page_alloc: not allow to tlb shootdown if !preemptable() && non_luf_pages_ok() Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 23/26] mm: separate move/undo parts from migrate_pages_batch() Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 24/26] mm/migrate: apply luf mechanism to unmapping during migration Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 25/26] mm/vmscan: apply luf mechanism to unmapping during folio reclaim Byungchul Park
2025-02-20 5:20 ` [RFC PATCH v12 26/26] mm/luf: implement luf debug feature Byungchul Park
2025-02-20 10:32 ` Hillf Danton [this message]
2025-02-20 10:51 ` [RFC PATCH v12 00/26] LUF(Lazy Unmap Flush) reducing tlb numbers over 90% Byungchul Park
2025-02-20 11:09 ` Byungchul Park
2025-02-20 11:49 ` Hillf Danton
2025-02-20 12:20 ` Byungchul Park
2025-02-20 12:40 ` Byungchul Park
2025-02-20 13:54 ` Matthew Wilcox
2025-02-20 15:09 ` Steven Rostedt
2025-02-20 22:53 ` Kent Overstreet
2025-02-20 23:05 ` Steven Rostedt
2025-02-20 23:21 ` Kent Overstreet
2025-02-20 23:25 ` Hillf Danton
2025-02-20 23:44 ` Steven Rostedt
[not found] ` <20250221230556.2479-1-hdanton@sina.com>
2025-02-22 7:16 ` Greg KH
[not found] ` <20250222101100.2531-1-hdanton@sina.com>
2025-02-22 13:57 ` Greg KH
2025-03-10 23:24 ` Dan Williams
2025-03-10 23:53 ` Barry Song
[not found] ` <20250619134922.1219-1-hdanton@sina.com>
2025-06-20 17:00 ` Dan Williams
2025-02-20 15:15 ` Dave Hansen
2025-02-20 15:29 ` Vlastimil Babka
2025-02-20 23:37 ` Byungchul Park
2025-02-26 11:30 ` RFC v12 rebased on v6.14-rc4 Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 01/25] x86/tlb: add APIs manipulating tlb batch's arch data Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 02/25] arm64/tlbflush: " Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 03/25] riscv/tlb: " Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 04/25] x86/tlb, riscv/tlb, mm/rmap: separate arch_tlbbatch_clear() out of arch_tlbbatch_flush() Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 05/25] mm/buddy: make room for a new variable, luf_key, in struct page Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 06/25] mm: move should_skip_kasan_poison() to mm/internal.h Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 07/25] mm: introduce luf_ugen to be used as a global timestamp Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 08/25] mm: introduce luf_batch to be used as hash table to store luf meta data Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 09/25] mm: introduce API to perform tlb shootdown on exit from page allocator Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 10/25] mm: introduce APIs to check if the page allocation is tlb shootdownable Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 11/25] mm: deliver luf_key to pcp or buddy on free after unmapping Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 12/25] mm: delimit critical sections to take off pages from pcp or buddy alloctor Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 13/25] mm: introduce pend_list in struct free_area to track luf'd pages Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 14/25] mm/rmap: recognize read-only tlb entries during batched tlb flush Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 15/25] fs, filemap: refactor to gather the scattered ->write_{begin,end}() calls Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 16/25] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 17/25] x86/tlb, riscv/tlb, arm64/tlbflush, mm: remove cpus from tlb shootdown that already have been done Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 18/25] mm/page_alloc: retry 3 times to take pcp pages on luf check failure Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 19/25] mm: skip luf tlb flush for luf'd mm that already has been done Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 20/25] mm, fs: skip tlb flushes for luf'd filemap " Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 21/25] mm: perform luf tlb shootdown per zone in batched manner Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 22/25] mm/page_alloc: not allow to tlb shootdown if !preemptable() && non_luf_pages_ok() Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 23/25] mm/migrate: apply luf mechanism to unmapping during migration Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 24/25] mm/vmscan: apply luf mechanism to unmapping during folio reclaim Byungchul Park
2025-02-26 12:03 ` [RFC PATCH v12 based on v6.14-rc4 25/25] mm/luf: implement luf debug feature Byungchul Park
2025-02-26 11:33 ` RFC v12 rebased on mm-unstable as of Feb 21, 2025 Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 01/25] x86/tlb: add APIs manipulating tlb batch's arch data Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 02/25] arm64/tlbflush: " Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 03/25] riscv/tlb: " Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 04/25] x86/tlb, riscv/tlb, mm/rmap: separate arch_tlbbatch_clear() out of arch_tlbbatch_flush() Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 05/25] mm/buddy: make room for a new variable, luf_key, in struct page Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 06/25] mm: move should_skip_kasan_poison() to mm/internal.h Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 07/25] mm: introduce luf_ugen to be used as a global timestamp Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 08/25] mm: introduce luf_batch to be used as hash table to store luf meta data Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 09/25] mm: introduce API to perform tlb shootdown on exit from page allocator Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 10/25] mm: introduce APIs to check if the page allocation is tlb shootdownable Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 11/25] mm: deliver luf_key to pcp or buddy on free after unmapping Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 12/25] mm: delimit critical sections to take off pages from pcp or buddy alloctor Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 13/25] mm: introduce pend_list in struct free_area to track luf'd pages Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 14/25] mm/rmap: recognize read-only tlb entries during batched tlb flush Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 15/25] fs, filemap: refactor to gather the scattered ->write_{begin,end}() calls Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 16/25] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 17/25] x86/tlb, riscv/tlb, arm64/tlbflush, mm: remove cpus from tlb shootdown that already have been done Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 18/25] mm/page_alloc: retry 3 times to take pcp pages on luf check failure Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 19/25] mm: skip luf tlb flush for luf'd mm that already has been done Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 20/25] mm, fs: skip tlb flushes for luf'd filemap " Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 21/25] mm: perform luf tlb shootdown per zone in batched manner Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 22/25] mm/page_alloc: not allow to tlb shootdown if !preemptable() && non_luf_pages_ok() Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 23/25] mm/migrate: apply luf mechanism to unmapping during migration Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 24/25] mm/vmscan: apply luf mechanism to unmapping during folio reclaim Byungchul Park
2025-02-26 12:01 ` [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 25/25] mm/luf: implement luf debug feature Byungchul Park
2025-02-22 1:14 ` [RFC PATCH v12 00/26] LUF(Lazy Unmap Flush) reducing tlb numbers over 90% Shakeel Butt
2025-02-20 23:23 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250220103223.2360-1-hdanton@sina.com \
--to=hdanton@sina.com \
--cc=byungchul@sk.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox