From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7711ACFB43F for ; Sun, 6 Oct 2024 20:15:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 838E06B0170; Sun, 6 Oct 2024 16:15:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7E72A6B0171; Sun, 6 Oct 2024 16:15:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6AF3E6B0172; Sun, 6 Oct 2024 16:15:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 4B98A6B0170 for ; Sun, 6 Oct 2024 16:15:58 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id C0FD74032F for ; Sun, 6 Oct 2024 20:15:57 +0000 (UTC) X-FDA: 82644283554.25.2FCDC29 Received: from mail-lj1-f177.google.com (mail-lj1-f177.google.com [209.85.208.177]) by imf05.hostedemail.com (Postfix) with ESMTP id E7832100005 for ; Sun, 6 Oct 2024 20:15:55 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=b6t7gGrG; spf=pass (imf05.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.177 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728245579; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lQ41D6YpHw3mMfa3ZeJYQeO4Vo/r+hrFXgDDtjGd1J4=; b=c/+qy8l1fo5FEJe5F59We12v145nt0A4BPSXzwjCjCYU6fP3HA7GXv2J+oMMAfWEhlNJWc 5YdOhyT6F+gA3w/h9rIU/aN2K8J+wFyzBn0+5DCfT7c7cZ29UII9iJ9nh3bZxodgLxr6oF gDNp+qiz640s3FZQWn/+faMxonFqHEM= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=b6t7gGrG; spf=pass (imf05.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.177 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728245579; a=rsa-sha256; cv=none; b=cc8v4sgaGICbwDx5GtOPfQAaj5bUSDCIGN9Cxd+yopUnLY4S4zVg5moDUROQpBdXvwO+HF 0wDCXJ1qMOPCYZIYGjxaTIyVC38KFxigeEKjvxRBG32KaiEJPHcYtDPhQhxdgRCEwcDGWL +ZVTpmAo0Ji9sJGAZi5P1rQD76kmr0M= Received: by mail-lj1-f177.google.com with SMTP id 38308e7fff4ca-2fac9eaeafcso38467641fa.3 for ; Sun, 06 Oct 2024 13:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1728245754; x=1728850554; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lQ41D6YpHw3mMfa3ZeJYQeO4Vo/r+hrFXgDDtjGd1J4=; b=b6t7gGrGvk1521Uuxw0CMuCpCg7rGIYlgPuUeRuqWgu/b1rjx64XtZXwPIFxSYPO2D NGiNbnsUL2GbFKqhOh7wDehujWlp8oAsRYgu0AMFY7ohOSdJABnI3MWuMesVqRVC8AhA vbc6Np0vcH+eCqXrG8OLoGvc0r+vJ7T4MME8tYc73k8+qDSvR4ziLjElJ/SKz4EMCedy /RngKBmQpemVwifnw9D+8vJmrFj9xnLrGZNkSENf3Pzt0ikStlLPYxT63OnXuYxf9xR6 sgXzC5zbkadG+2eeQ2LPJfNskgc9yiuAduAhy5KAcTLEZOOkC0vpqnq+ZTaSHhx5OJPN 3aQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728245754; x=1728850554; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lQ41D6YpHw3mMfa3ZeJYQeO4Vo/r+hrFXgDDtjGd1J4=; b=jImzGvV/1IdCHgTnSCNpnB9UbSeQMmMA2fGaOBUFj25XWfcQ6ZdF92/Dqfx+nX5EX8 F2hKAdCmAZuiU1C9STr6/0749FCmVH3qkaZOmplZOhJ99DR7lT2MO40A2t+KtWPDJ+ng H9bcPbuTPqevxhhCRK+AI95wYlAHtB0lJ0e57kHoMkDyAn2nm6UXdaea+V2IGXlmfa0D PS+bK4XJsYxzMaJ2ghdFBZ4jpuCKrxwYQgZUg8McLaxOgue4gdc+1Yulm49cxpbZU1H7 CvRrvSRK3WlvgeojW6TXYz4czcr7GZWnv+KT6xgh6fzAX8pIGPXs+0+TKrIHaDNEil2K yycA== X-Forwarded-Encrypted: i=1; AJvYcCXeGgQOl7zMqjKWpWMOA01/X805ERvYmEPYYPHpZhcMwsE8yxHqlAxxGLybtVWcD72l+lVoZa3d0A==@kvack.org X-Gm-Message-State: AOJu0YxuRvUGEhiyKMV+LZUSTlnCXEqrSTDRupbWahlLniPrSPcZwQFn tW3oEdFKrdjIA8PThz9f6m6bF5/flFmbqyMMlC1sfLA/huC1kz/w75Wfwui24I9t3Wh9jtx4/wm +IO5VH6Ft3dMeu+F5Q1u6goAHdOQ= X-Google-Smtp-Source: AGHT+IG9R8rYIz8urcx8DXoB6WpkYs7kTrkH0lAgse4E1WzTnVT5mQBqoMEDNEQWj+dF3rkBsrnefTBwl1ynCBfxAf8= X-Received: by 2002:a2e:be84:0:b0:2ef:2b38:879c with SMTP id 38308e7fff4ca-2faf3c0169cmr43054081fa.3.1728245753618; Sun, 06 Oct 2024 13:15:53 -0700 (PDT) MIME-Version: 1.0 References: <20241004142504.4379-1-aha310510@gmail.com> In-Reply-To: <20241004142504.4379-1-aha310510@gmail.com> From: Kairui Song Date: Mon, 7 Oct 2024 04:15:37 +0800 Message-ID: Subject: Re: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap To: Jeongjun Park Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: E7832100005 X-Stat-Signature: oypizp37ek6bpht7o3oaxrs5ayaubizd X-Rspam-User: X-HE-Tag: 1728245755-146637 X-HE-Meta: U2FsdGVkX18othesCNTZYIdqe3cQrjkY0f+VawIS2uGq4IeWG9E2M8z2EY210NIxp1x8WNCXQSkUk5DRR1oh69aiaIZwt8iliWBr5eLda/kOxHJGadeB7Z5i5xFgQEeIqD2mqFBpwGMpZNeAo4kueCoW/3t4U9cNP02vZ2PS3tYhPgfoFxpJw2EFme7tf/6fzZ5P9zvtzq4TvaEcMfQPVPB1MCAlQyS18kt05rxboINFGJOcbNExVtVBkfy6SIo0haH16loXVj6T2XuPT1FBzv1EZ5mD53VtvV4Tsllx5naTXB4MZea4jepGidkfZTlBwEZV0iqULtHP33Q8eB8WnIqPF4LPNJwfna5XlAIsEIJ1iicAbdwwKGGb7dG3x3BxWAXu6/siHqx02AtLfh0DVXIMzXxoJi83F1jzg3fmbUEzRxThLDF+crGIZ8olzP+UUbbq27agXscggrOoOlbL0T4qjoyhrQsk+t9ZGctaa9+3oxNryRdX/rxKttbDw91tx4hRirWRIRe8p6bP7H/TXMxrmZ/00PRPY4GnL+NPFWWEKQzHkaPIizo8KG9j1uY2K3GyqWEDZntwi/g9n05U9uojV6KSYZ5kzUWnsmUcrVhxv8jm+7oQ8BDVvGd9lonobb3WhGqY3KjDz+cWXWTDwt7uRbvl44CjjZsltadnxiEs7GTmv9HjQoq7cYDFhAh1NMO2VkrnlrWedrQ08bYhXsVY4GkhF0fNfprXcgHEmIkLCbQoGgPT6EsCcgesM763k6NLFu/1+Rjx9yyxr+r43ioNXghmCqYwJ0VuFsX2wwVqgp2uX57GzCrmcHfhnDJT19HXChZejdzXkafXnyGKN76vVTnnoy2aoYFkLu5FgxDCmgcWmy5InCSCpbyqDaSwdsxWD3s6S1iU/KdvmRnVifDUWhXABVVO+RgyfR229Ls0yClYVksaPm5uKl/hYDx9VPpkzK3sZiH37y9eTnD 3UcnrG9o igh2aLNp4Ryr/VS285us3TkgdMUBdk2BVIDNYPKNmmSN89YwJy97GK8CbVp1+lHFMY3AFn+zwubEYBuewbiiGD4hpV/j3Z/49YEfKWpqH74M6kIws48zSHcw2qTSITAN1iSDEv+iY0OL72FZj20hhVgH8eSrfiP+orB6NxvIhJU/kvKIXXkk6AVbGux4PDIi2UcsqMoqYLwT+bzENzL3s0jvMTiof1u7CQSdhRcIDDE0eDX/yUFDhm6tvge5oyCof3X8N+5UMp8Ajkxt1nzJVnhjwHYhwcA4XoQKlKCx7WZLV5AVhgVxqy2LsNxE5hRREZAVcD6HUwe668vMM6QNs7x0d2mtb8AqXoLwBMJ/mzCr6jX35yv071vlyU+Aud1BYHmPxqRktxkW1zt/hNAMpkVWt3JVQzfqBbVoVjRa1R+rLWSbajmZdy4C9YIN/t1389db0nNkywEmOJLmrkDoQBL/symwLow+Pdzljfh6BG7oiSm2eh6j248mwYVIUXIPLfwHsXgvQDPamOYRkJD05Z9MO0Vqf0qnfwLaQ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Oct 4, 2024 at 10:26=E2=80=AFPM Jeongjun Park = wrote: > > A report [1] was uploaded from syzbot. > > In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to sk= ip > slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pa= ges > from folio without folio_lock protection. > > In the currently reported KCSAN log, it is assumed that the actual data-r= ace > will not occur because the calltrace that does WRITE already obtains the > folio_lock and then writes. > > However, the existing __try_to_reclaim_swap() function was already implem= ented > to perform reads under folio_lock protection [1], and there is a risk of = a > data-race occurring through a function other than the one shown in the KC= SAN > log. > > Therefore, I think it is appropriate to change all read operations for > folio to be performed under folio_lock. > > [1] > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap > > write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0: > __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163 > delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243 > folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850 > free_swap_cache mm/swap_state.c:293 [inline] > free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325 > __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline] > tlb_batch_pages_flush mm/mmu_gather.c:149 [inline] > tlb_flush_mmu_free mm/mmu_gather.c:366 [inline] > tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373 > zap_pte_range mm/memory.c:1700 [inline] > zap_pmd_range mm/memory.c:1739 [inline] > zap_pud_range mm/memory.c:1768 [inline] > zap_p4d_range mm/memory.c:1789 [inline] > unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810 > unmap_single_vma+0x142/0x1d0 mm/memory.c:1856 > unmap_vmas+0x18d/0x2b0 mm/memory.c:1900 > exit_mmap+0x18a/0x690 mm/mmap.c:1864 > __mmput+0x28/0x1b0 kernel/fork.c:1347 > mmput+0x4c/0x60 kernel/fork.c:1369 > exit_mm+0xe4/0x190 kernel/exit.c:571 > do_exit+0x55e/0x17f0 kernel/exit.c:926 > do_group_exit+0x102/0x150 kernel/exit.c:1088 > get_signal+0xf2a/0x1070 kernel/signal.c:2917 > arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337 > exit_to_user_mode_loop kernel/entry/common.c:111 [inline] > exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] > __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218 > do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1: > __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198 > free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915 > zap_pte_range mm/memory.c:1656 [inline] > zap_pmd_range mm/memory.c:1739 [inline] > zap_pud_range mm/memory.c:1768 [inline] > zap_p4d_range mm/memory.c:1789 [inline] > unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810 > unmap_single_vma+0x142/0x1d0 mm/memory.c:1856 > unmap_vmas+0x18d/0x2b0 mm/memory.c:1900 > exit_mmap+0x18a/0x690 mm/mmap.c:1864 > __mmput+0x28/0x1b0 kernel/fork.c:1347 > mmput+0x4c/0x60 kernel/fork.c:1369 > exit_mm+0xe4/0x190 kernel/exit.c:571 > do_exit+0x55e/0x17f0 kernel/exit.c:926 > __do_sys_exit kernel/exit.c:1055 [inline] > __se_sys_exit kernel/exit.c:1053 [inline] > __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053 > x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:= 61 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > value changed: 0x0000000000000242 -> 0x0000000000000000 > > Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com > Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache") > Signed-off-by: Jeongjun Park > --- > mm/swapfile.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 0cded32414a1..904c21256fc2 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -193,13 +193,6 @@ static int __try_to_reclaim_swap(struct swap_info_st= ruct *si, > folio =3D filemap_get_folio(address_space, swap_cache_index(entry= )); > if (IS_ERR(folio)) > return 0; > - > - /* offset could point to the middle of a large folio */ > - entry =3D folio->swap; > - offset =3D swp_offset(entry); > - nr_pages =3D folio_nr_pages(folio); > - ret =3D -nr_pages; > - > /* > * When this function is called from scan_swap_map_slots() and it= 's > * called by vmscan.c at reclaiming folios. So we hold a folio lo= ck > @@ -210,6 +203,12 @@ static int __try_to_reclaim_swap(struct swap_info_st= ruct *si, > if (!folio_trylock(folio)) > goto out; > > + /* offset could point to the middle of a large folio */ > + entry =3D folio->swap; > + offset =3D swp_offset(entry); > + nr_pages =3D folio_nr_pages(folio); > + ret =3D -nr_pages; > + > need_reclaim =3D ((flags & TTRS_ANYWAY) || > ((flags & TTRS_UNMAPPED) && !folio_mapped(folio))= || > ((flags & TTRS_FULL) && mem_cgroup_swap_full(foli= o))); > -- > Thanks for catching this! This could lead to real problems, holding reference is not enough for protecting folio->swap. There are several BUG_ONs later that will be triggered if it changed. But you still have to keep `nr_pages ` and `ret` before the `folio_trylock `, or `ret` will be uninitialized if folio_trylock fails, this function should always return the page number even if the try lock failed. And as WIlly said, `folio_nr_pages` doesn't require folio lock.