From: Kairui Song <ryncsn@gmail.com>
To: Jeongjun Park <aha310510@gmail.com>, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
Subject: Re: [PATCH v2] mm: swap: prevent possible data-race in __try_to_reclaim_swap
Date: Fri, 11 Oct 2024 17:18:37 +0800 [thread overview]
Message-ID: <CAMgjq7BdaUe6YweW03MHEOmunXm7tJz=nMOCWTvq_jDoVGAL+Q@mail.gmail.com> (raw)
In-Reply-To: <20241007070623.23340-1-aha310510@gmail.com>
On Mon, Oct 7, 2024 at 3:06 PM Jeongjun Park <aha310510@gmail.com> wrote:
>
> A report [1] was uploaded from syzbot.
>
> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> slot cache"), the __try_to_reclaim_swap() function reads offset and folio->entry
> from folio without folio_lock protection.
>
> In the currently reported KCSAN log, it is assumed that the actual data-race
> will not occur because the calltrace that does WRITE already obtains the
> folio_lock and then writes.
>
> However, the existing __try_to_reclaim_swap() function was already implemented
> to perform reads under folio_lock protection [1], and there is a risk of a
> data-race occurring through a function other than the one shown in the KCSAN
> log.
>
> Therefore, I think it is appropriate to change read operations for
> folio to be performed under folio_lock.
>
> [1]
>
> ==================================================================
> BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
>
> write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
> __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
> delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
> folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
> free_swap_cache mm/swap_state.c:293 [inline]
> free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
> __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
> tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
> tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
> tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
> zap_pte_range mm/memory.c:1700 [inline]
> zap_pmd_range mm/memory.c:1739 [inline]
> zap_pud_range mm/memory.c:1768 [inline]
> zap_p4d_range mm/memory.c:1789 [inline]
> unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
> exit_mmap+0x18a/0x690 mm/mmap.c:1864
> __mmput+0x28/0x1b0 kernel/fork.c:1347
> mmput+0x4c/0x60 kernel/fork.c:1369
> exit_mm+0xe4/0x190 kernel/exit.c:571
> do_exit+0x55e/0x17f0 kernel/exit.c:926
> do_group_exit+0x102/0x150 kernel/exit.c:1088
> get_signal+0xf2a/0x1070 kernel/signal.c:2917
> arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
> exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
> __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
> free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
> zap_pte_range mm/memory.c:1656 [inline]
> zap_pmd_range mm/memory.c:1739 [inline]
> zap_pud_range mm/memory.c:1768 [inline]
> zap_p4d_range mm/memory.c:1789 [inline]
> unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
> exit_mmap+0x18a/0x690 mm/mmap.c:1864
> __mmput+0x28/0x1b0 kernel/fork.c:1347
> mmput+0x4c/0x60 kernel/fork.c:1369
> exit_mm+0xe4/0x190 kernel/exit.c:571
> do_exit+0x55e/0x17f0 kernel/exit.c:926
> __do_sys_exit kernel/exit.c:1055 [inline]
> __se_sys_exit kernel/exit.c:1053 [inline]
> __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
> x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> value changed: 0x0000000000000242 -> 0x0000000000000000
>
> Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
> Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> ---
> mm/swapfile.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0cded32414a1..eb782fcd5627 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -194,9 +194,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> if (IS_ERR(folio))
> return 0;
>
> - /* offset could point to the middle of a large folio */
> - entry = folio->swap;
> - offset = swp_offset(entry);
> nr_pages = folio_nr_pages(folio);
> ret = -nr_pages;
>
> @@ -210,6 +207,10 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> if (!folio_trylock(folio))
> goto out;
>
> + /* offset could point to the middle of a large folio */
> + entry = folio->swap;
> + offset = swp_offset(entry);
> +
> need_reclaim = ((flags & TTRS_ANYWAY) ||
> ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> --
Reviewed-by: Kairui Song <kasong@tencent.com>
Hi Andrew,
Will this be added to stable and 6.12? 862590ac3708 is already in 6.12
and this fixes a potential issue of it.
next prev parent reply other threads:[~2024-10-11 9:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-07 7:06 Jeongjun Park
2024-10-07 22:41 ` Chris Li
2024-10-11 9:18 ` Kairui Song [this message]
2024-10-14 2:17 ` Jeongjun Park
2024-10-14 2:28 ` Kairui Song
2024-10-14 4:08 ` Jeongjun Park
2024-10-14 20:13 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAMgjq7BdaUe6YweW03MHEOmunXm7tJz=nMOCWTvq_jDoVGAL+Q@mail.gmail.com' \
--to=ryncsn@gmail.com \
--cc=aha310510@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox