From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
Date: Mon, 24 Feb 2025 02:22:03 +0800
Subject: Re: Hang when swapping huge=within_size tmpfs from zram
To: Baolin Wang, "Alex Xu (Hello71)"
Cc: Lance Yang, linux-mm@kvack.org, Daniel Gomez, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Matthew Wilcox,
	Ryan Roberts, linux-kernel@vger.kernel.org, Andrew Morton,
	ziy@nvidia.com
References: <1738717785.im3r5g2vxc.none.ref@localhost>
	<1738717785.im3r5g2vxc.none@localhost>
	<25e2d5e4-8214-40de-99d3-2b657181a9fd@linux.alibaba.com>
	<5dd39b03-c40e-4f34-bf89-b3e5a12753dc@linux.alibaba.com>
On Mon, Feb 24, 2025 at 1:53 AM Kairui Song wrote:
>
> On Fri, Feb 7, 2025 at 3:24 PM Baolin Wang wrote:
> >
> > On 2025/2/5 22:39, Lance Yang wrote:
> > > On Wed, Feb 5, 2025 at 2:38 PM Baolin Wang wrote:
> > >> On 2025/2/5 09:55, Baolin Wang wrote:
> > >>> Hi Alex,
> > >>>
> > >>> On 2025/2/5 09:23, Alex Xu (Hello71) wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> On 6.14-rc1, I found that creating a lot of files in tmpfs then deleting
> > >>>> them reliably hangs when tmpfs is mounted with huge=within_size, and it
> > >>>> is swapped out to zram (zstd/zsmalloc/no backing dev). I bisected this
> > >>>> to acd7ccb284b "mm: shmem: add large folio support for tmpfs".
> > >>>>
> > >>>> When the issue occurs, rm uses 100% CPU, cannot be killed, and has no
> > >>>> output in /proc/pid/stack or wchan. Eventually, an RCU stall is
> > >>>> detected:
> > >>>
> > >>> Thanks for your report. Let me try to reproduce the issue locally and
> > >>> see what happens.
> > >>>
> > >>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > >>>> rcu:   Tasks blocked on level-0 rcu_node (CPUs 0-11): P25160
> > >>>> rcu:   (detected by 10, t=2102 jiffies, g=532677, q=4997 ncpus=12)
> > >>>> task:rm  state:R  running task  stack:0  pid:25160
> > >>>> tgid:25160 ppid:24309 task_flags:0x400000 flags:0x00004004
> > >>>> Call Trace:
> > >>>>
> > >>>>  ? __schedule+0x388/0x1000
> > >>>>  ? kmem_cache_free.part.0+0x23d/0x280
> > >>>>  ? sysvec_apic_timer_interrupt+0xa/0x80
> > >>>>  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > >>>>  ? xas_load+0x12/0xc0
> > >>>>  ? xas_load+0x8/0xc0
> > >>>>  ? xas_find+0x144/0x190
> > >>>>  ? find_lock_entries+0x75/0x260
> > >>>>  ? shmem_undo_range+0xe6/0x5f0
> > >>>>  ? shmem_evict_inode+0xe4/0x230
> > >>>>  ? mtree_erase+0x7e/0xe0
> > >>>>  ? inode_set_ctime_current+0x2e/0x1f0
> > >>>>  ? evict+0xe9/0x260
> > >>>>  ? _atomic_dec_and_lock+0x31/0x50
> > >>>>  ? do_unlinkat+0x270/0x2b0
> > >>>>  ? __x64_sys_unlinkat+0x30/0x50
> > >>>>  ? do_syscall_64+0x37/0xe0
> > >>>>  ? entry_SYSCALL_64_after_hwframe+0x50/0x58
> > >>>>
> > >>>> Let me know what information is needed to further troubleshoot this
> > >>>> issue.
> > >>
> > >> Sorry, I can't reproduce this issue, and my testing process is as follows:
> > >> 1. Mount tmpfs with huge=within_size
> > >> 2. Create and write a tmpfs file
> > >> 3. Swap out the large folios of the tmpfs file to zram
> > >> 4. Execute 'rm' command to remove the tmpfs file
> > >
> > > I'm unable to reproduce the issue either, following steps similar
> > > to Baolin's process:
> > >
> > > 1) Mount tmpfs with the huge=within_size option and enable swap (using
> > > zstd/zsmalloc without a backing device).
> > > 2) Create and write over 10,000 files in the tmpfs.
> > > 3) Swap out the large folios of these tmpfs files to zram.
> > > 4) Use the rm command to delete all the files from the tmpfs.
> > >
> > > Testing with both 2MiB and 64KiB large folio sizes, and with
> > > shmem_enabled=within_size, everything works as expected.
> >
> > Thanks Lance for confirming again.
> >
> > Alex, could you give more hints on how to reproduce this issue?
>
> Hi Baolin,
>
> I can reproduce this issue very easily with a Linux kernel build
> test, and the failure rate is very high. I'm not entirely sure it is
> the same bug, but very likely. My test steps:
>
> 1. Create a 10G ZRAM device and set up swap on it.
> 2. Create a 1G memcg and spawn a shell in it.
> 3. Mount tmpfs with huge=within_size, then untar the Linux kernel
> source code into it.
> 4. Build with make -j32 (a higher or lower job count may also work);
> the build always fails within about 10 seconds due to file corruption.
>
> After some debugging, the cause is in shmem_swapin_folio: on a swap
> cache hit, `folio = swap_cache_get_folio(swap, NULL, 0);` sets folio
> to an order-0 folio, and the following shmem_add_to_page_cache then
> inserts that order-0 folio over a high-order entry in shmem's xarray,
> so data is lost (a toy model of this overwrite is sketched at the end
> of this mail). A swap cache hit can happen for many reasons; in this
> case it is readahead.
>
> One quick fix is to always split the entry upon a shmem fault of an
> order-0 folio, like this:
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4ea6109a8043..c8e5c419c675 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2341,6 +2341,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>                 }
>         }
>
> +       /* Swapin of an order-0 folio must always ensure the entries are split */
> +       if (!folio_order(folio))
> +               shmem_split_large_entry(inode, index, swap, gfp);
> +
>  alloced:
>         /* We have to do this with folio locked to prevent races */
>         folio_lock(folio);
>
> And hi Alex, can you help confirm whether the above patch fixes the
> bug you reported?
>
> If we are OK with this, it should be merged into 6.14 I think. For
> the long term, it might be a good idea to share logic similar to (or
> just reuse) __filemap_add_folio for shmem: __filemap_add_folio splits
> the entry on insert, and the code would be much cleaner.

Some extra comments on the above patch:

If it races with another split, or the entry used for the swap cache
lookup is wrongly aligned because of a large entry, the
shmem_add_to_page_cache below will fail with -EEXIST and the swapin
will be retried, so this seems to work well in my test.
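
To make the overwrite concrete, here is a toy userspace model of it.
This is NOT kernel code: slots[], struct entry, store() and
split_large_entry() below are made-up stand-ins for the multi-index
xarray entries shmem uses to track swapped-out large folios. The one
behaviour it models is that all sibling slots of a multi-index entry
resolve to a single value, so storing an order-0 folio over a large
swap entry makes the whole range resolve to it and the per-page swap
offsets of the sibling pages are lost, while splitting first preserves
them:

#include <stdio.h>

#define NR 8                         /* slots covered by one order-3 entry */

struct entry {
	int  order;                  /* folio order this entry covers */
	long swap_off;               /* swap offset of first covered page */
};

static struct entry *slots[NR];      /* stand-in for the shmem mapping */

/*
 * Stand-in for a multi-index store: the whole range covered by the
 * larger of the old and new entries resolves to the new value, which
 * is what makes an unsplit order-0 insert destructive.
 */
static void store(long index, struct entry *e)
{
	struct entry *old = slots[index];
	int order = (old && old->order > e->order) ? old->order : e->order;
	long first = index & ~((1L << order) - 1);

	for (long i = first; i < first + (1L << order); i++)
		slots[i] = e;
}

/*
 * Stand-in for shmem_split_large_entry(): replace one large swap entry
 * with per-page order-0 swap entries so each page keeps its own offset.
 */
static struct entry split_entries[NR];

static void split_large_entry(long index)
{
	struct entry *old = slots[index];
	long first = index & ~((1L << old->order) - 1);

	for (long i = first; i < first + (1L << old->order); i++) {
		split_entries[i].order = 0;
		split_entries[i].swap_off = old->swap_off + (i - first);
		slots[i] = &split_entries[i];
	}
}

int main(void)
{
	struct entry large  = { .order = 3, .swap_off = 100 };
	struct entry folio0 = { .order = 0, .swap_off = -1 }; /* swapped-in page */

	/* Buggy path: order-0 swapin stored straight over the large entry. */
	store(0, &large);
	store(0, &folio0);
	printf("no split:   index 3 swap_off = %ld (lost)\n", slots[3]->swap_off);

	/* Fixed path: split first, then the store only touches index 0. */
	store(0, &large);
	split_large_entry(0);
	store(0, &folio0);
	printf("with split: index 3 swap_off = %ld (kept)\n", slots[3]->swap_off);
	return 0;
}

Built with a plain cc, the first printf reports a bogus offset (-1)
for index 3, i.e. the sibling page's swap entry is gone, while the
second reports offset 103, mirroring the corruption versus the patched
behaviour.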