linux-mm.kvack.org archive mirror
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: akpm@linux-foundation.org
Cc: david@redhat.com, jannh@google.com, hughd@google.com,
	willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
	peterx@redhat.com, mgorman@suse.de, catalin.marinas@arm.com,
	will@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com,
	zokeefe@google.com, rientjes@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 00/11] synchronously scan and reclaim empty user PTE pages
Date: Tue, 10 Dec 2024 16:57:04 +0800
Message-ID: <53fb3b26-4a28-48a2-8403-a9b8d2fe6c24@bytedance.com>
In-Reply-To: <cover.1733305182.git.zhengqi.arch@bytedance.com>

Hi Andrew,

I have sent patches [1][2][3] to fix the recently reported issues:

[1]. https://lore.kernel.org/lkml/20241210084156.89877-1-zhengqi.arch@bytedance.com/
(Fixes a warning; needs to be folded into [PATCH v4 02/11])

[2]. https://lore.kernel.org/lkml/20241206112348.51570-1-zhengqi.arch@bytedance.com/
(Fixes an uninitialized symbol; needs to be folded into [PATCH v4 09/11])

[3]. https://lore.kernel.org/lkml/20241210084431.91414-1-zhengqi.arch@bytedance.com/
(Fixes a UAF; needs to be placed before [PATCH v4 11/11])

If you need me to re-post a complete v5, please let me know.

Thanks,
Qi


On 2024/12/4 19:09, Qi Zheng wrote:
> Changes in v4:
>   - update the process_addrs.rst in [PATCH v4 01/11]
>     (suggested by Lorenzo Stoakes)
>   - fix [PATCH v3 4/9] and move it after [PATCH v3 5/9]
>     (pointed out by David Hildenbrand)
>   - change to use any_skipped instead of rechecking pte_none() to detect empty
>     user PTE pages (suggested by David Hildenbrand)
>   - rebase onto the next-20241203
> 
> Changes in v3:
>   - recheck pmd state instead of pmd_same() in retract_page_tables()
>     (suggested by Jann Horn)
>   - recheck dst_pmd entry in move_pages_pte() (pointed out by Jann Horn)
>   - introduce new skip_none_ptes() (suggested by David Hildenbrand)
>   - minor changes in [PATCH v2 5/7]
>   - remove tlb_remove_table_sync_one() if CONFIG_PT_RECLAIM is enabled.
>   - use put_page() instead of free_page_and_swap_cache() in
>     __tlb_remove_table_one_rcu() (pointed out by Jann Horn)
>   - collect the Reviewed-bys and Acked-bys
>   - rebase onto the next-20241112
> 
> Changes in v2:
>   - fix [PATCH v1 1/7] (Jann Horn)
>   - reset force_flush and force_break to false in [PATCH v1 2/7] (Jann Horn)
>   - introduce zap_nonpresent_ptes() and do_zap_pte_range()
>   - check pte_none() instead of can_reclaim_pt after the processing of PTEs
>     (remove [PATCH v1 3/7] and [PATCH v1 4/7])
>   - reorder patches
>   - rebase onto the next-20241031
> 
> Changes in v1:
>   - replace [RFC PATCH 1/7] with a separate series (already merged into mm-unstable):
>     https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
>     (suggested by David Hildenbrand)
>   - squash [RFC PATCH 2/7] into [RFC PATCH 4/7]
>     (suggested by David Hildenbrand)
>   - change to scan and reclaim empty user PTE pages in zap_pte_range()
>     (suggested by David Hildenbrand)
>   - send a separate RFC patch to track the tlb flushing issue, and remove
>     that part from this series ([RFC PATCH 3/7] and [RFC PATCH 6/7]).
>     link: https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
>   - add [PATCH v1 1/7] into this series
>   - drop RFC tag
>   - rebase onto the next-20241011
> 
> Changes in RFC v2:
>   - fix compilation errors in [RFC PATCH 5/7] and [RFC PATCH 7/7] reported by
>     kernel test robot
>   - use pte_offset_map_nolock() + pmd_same() instead of check_pmd_still_valid()
>     in retract_page_tables() (in [RFC PATCH 4/7])
>   - rebase onto the next-20240805
> 
> Hi all,
> 
> Previously, we tried to use a completely asynchronous method to reclaim empty
> user PTE pages [1]. After discussing with David Hildenbrand, we decided to
> implement synchronous reclamation in the case of madvise(MADV_DONTNEED) as the
> first step.
> 
> So this series aims to synchronously free the empty PTE pages in the
> madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in
> zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases other than
> madvise(MADV_DONTNEED).
> 
> In zap_pte_range(), mmu_gather is used to perform batch tlb flushing and page
> freeing operations. Therefore, if we want to free the empty PTE page in this
> path, the most natural way is to add it to mmu_gather as well. Now, if
> CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free page table
> pages by semi RCU:
> 
>   - batch table freeing: asynchronous free by RCU
>   - single table freeing: IPI + synchronous free
> 
> But this is not enough to free the empty PTE page table pages in paths other
> than the munmap and exit_mmap paths, because IPI cannot be synchronized with
> rcu_read_lock() in pte_offset_map{_lock}(). So single tables should also be
> freed by RCU, like batch table freeing.
> 
> As a first step, we support this feature on x86_64 and select the newly
> introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM.
> 
> For other cases such as madvise(MADV_FREE), we may consider scanning and
> freeing empty PTE pages asynchronously in the future.
> 
> This series is based on next-20241112 (which contains the series [2]).
> 
> Note: issues related to TLB flushing are not new to this series and are tracked
>        in the separate RFC patch [3]. For more context, please refer to
>        thread [4].
> 
> Comments and suggestions are welcome!
> 
> Thanks,
> Qi
> 
> [1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
> [2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
> [3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
> [4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/
> 
> Qi Zheng (11):
>    mm: khugepaged: recheck pmd state in retract_page_tables()
>    mm: userfaultfd: recheck dst_pmd entry in move_pages_pte()
>    mm: introduce zap_nonpresent_ptes()
>    mm: introduce do_zap_pte_range()
>    mm: skip over all consecutive none ptes in do_zap_pte_range()
>    mm: zap_install_uffd_wp_if_needed: return whether uffd-wp pte has been
>      re-installed
>    mm: do_zap_pte_range: return any_skipped information to the caller
>    mm: make zap_pte_range() handle full within-PMD range
>    mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
>    x86: mm: free page table pages by RCU instead of semi RCU
>    x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64
> 
>   Documentation/mm/process_addrs.rst |   4 +
>   arch/x86/Kconfig                   |   1 +
>   arch/x86/include/asm/tlb.h         |  20 +++
>   arch/x86/kernel/paravirt.c         |   7 +
>   arch/x86/mm/pgtable.c              |  10 +-
>   include/linux/mm.h                 |   1 +
>   include/linux/mm_inline.h          |  11 +-
>   include/linux/mm_types.h           |   4 +-
>   mm/Kconfig                         |  15 ++
>   mm/Makefile                        |   1 +
>   mm/internal.h                      |  19 +++
>   mm/khugepaged.c                    |  45 +++--
>   mm/madvise.c                       |   7 +-
>   mm/memory.c                        | 253 ++++++++++++++++++-----------
>   mm/mmu_gather.c                    |   9 +-
>   mm/pt_reclaim.c                    |  71 ++++++++
>   mm/userfaultfd.c                   |  51 ++++--
>   17 files changed, 397 insertions(+), 132 deletions(-)
>   create mode 100644 mm/pt_reclaim.c
> 
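The madvise(MADV_DONTNEED) case described in the cover letter can be observed
from userspace. The following is a minimal sketch, not part of the series: it
assumes Linux, Python 3.8+ (for mmap.madvise()), and the VmPTE field of
/proc/self/status as the measure of page-table memory. The helper names
(parse_vm_pte_kb, read_vm_pte_kb, demo) and the 64 MiB region size are
hypothetical choices made for illustration.

```python
import mmap
import os

PAGE = mmap.PAGESIZE
REGION = 64 * 1024 * 1024  # arbitrary size, chosen only for illustration


def parse_vm_pte_kb(status_text):
    """Return the VmPTE value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmPTE:"):
            return int(line.split()[1])
    return None


def read_vm_pte_kb(path="/proc/self/status"):
    """Read this process's page-table footprint; None if unavailable."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_vm_pte_kb(f.read())


def demo():
    """Populate PTE pages, zap the range with MADV_DONTNEED, report VmPTE."""
    region = mmap.mmap(-1, REGION, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    # Request 4 KiB mappings so the range is backed by PTE pages rather than
    # PMD-mapped huge pages. The constant or the call may be unavailable.
    if hasattr(mmap, "MADV_NOHUGEPAGE"):
        try:
            region.madvise(mmap.MADV_NOHUGEPAGE)
        except OSError:
            pass
    before = read_vm_pte_kb()
    for off in range(0, REGION, PAGE):
        region[off] = 1  # fault in every page of the region
    populated = read_vm_pte_kb()
    if hasattr(mmap, "MADV_DONTNEED"):
        region.madvise(mmap.MADV_DONTNEED)  # zap the whole range
    after = read_vm_pte_kb()
    region.close()
    return before, populated, after


if __name__ == "__main__":
    before, populated, after = demo()
    print(f"VmPTE kB: before={before} populated={populated} after={after}")
```

On a kernel that carries this series with CONFIG_PT_RECLAIM enabled, the
"after" value would be expected to fall back toward "before"; on other kernels
it would typically stay close to "populated", since the emptied PTE pages are
kept until munmap or exit. The exact numbers depend on the kernel
configuration and on the interpreter's own allocations.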

