From: Kairui Song
Date: Mon, 24 Feb 2025 01:53:07 +0800
Subject: Re: Hang when swapping huge=within_size tmpfs from zram
To: Baolin Wang, "Alex Xu (Hello71)"
Cc: Lance Yang, linux-mm@kvack.org, Daniel Gomez, Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang, Matthew Wilcox, Ryan Roberts, linux-kernel@vger.kernel.org, Andrew Morton, "ziy@nvidia.com"

On Fri, Feb 7, 2025 at 3:24 PM Baolin Wang wrote:
>
> On 2025/2/5 22:39, Lance Yang wrote:
> > On Wed, Feb 5, 2025 at 2:38 PM Baolin Wang
> > wrote:
> >> On 2025/2/5 09:55, Baolin Wang wrote:
> >>> Hi Alex,
> >>>
> >>> On 2025/2/5 09:23, Alex Xu (Hello71) wrote:
> >>>> Hi all,
> >>>>
> >>>> On 6.14-rc1, I found that creating a lot of files in tmpfs then deleting
> >>>> them reliably hangs when tmpfs is mounted with huge=within_size, and it
> >>>> is swapped out to zram (zstd/zsmalloc/no backing dev). I bisected this
> >>>> to acd7ccb284b "mm: shmem: add large folio support for tmpfs".
> >>>>
> >>>> When the issue occurs, rm uses 100% CPU, cannot be killed, and has no
> >>>> output in /proc/pid/stack or wchan. Eventually, an RCU stall is
> >>>> detected:
> >>>
> >>> Thanks for your report. Let me try to reproduce the issue locally and
> >>> see what happens.
> >>>
> >>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>>> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-11): P25160
> >>>> rcu: (detected by 10, t=2102 jiffies, g=532677, q=4997 ncpus=12)
> >>>> task:rm state:R running task stack:0 pid:25160
> >>>> tgid:25160 ppid:24309 task_flags:0x400000 flags:0x00004004
> >>>> Call Trace:
> >>>>
> >>>> ? __schedule+0x388/0x1000
> >>>> ? kmem_cache_free.part.0+0x23d/0x280
> >>>> ? sysvec_apic_timer_interrupt+0xa/0x80
> >>>> ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> >>>> ? xas_load+0x12/0xc0
> >>>> ? xas_load+0x8/0xc0
> >>>> ? xas_find+0x144/0x190
> >>>> ? find_lock_entries+0x75/0x260
> >>>> ? shmem_undo_range+0xe6/0x5f0
> >>>> ? shmem_evict_inode+0xe4/0x230
> >>>> ? mtree_erase+0x7e/0xe0
> >>>> ? inode_set_ctime_current+0x2e/0x1f0
> >>>> ? evict+0xe9/0x260
> >>>> ? _atomic_dec_and_lock+0x31/0x50
> >>>> ? do_unlinkat+0x270/0x2b0
> >>>> ? __x64_sys_unlinkat+0x30/0x50
> >>>> ? do_syscall_64+0x37/0xe0
> >>>> ? entry_SYSCALL_64_after_hwframe+0x50/0x58
> >>>>
> >>>>
> >>>> Let me know what information is needed to further troubleshoot this
> >>>> issue.
> >>
> >> Sorry, I can't reproduce this issue, and my testing process is as follows:
> >> 1. Mount tmpfs with huge=within_size
> >> 2. Create and write a tmpfs file
> >> 3. Swap out the large folios of the tmpfs file to zram
> >> 4. Execute 'rm' command to remove the tmpfs file
> >
> > I'm unable to reproduce the issue as well, and am following steps similar
> > to Baolin's process:
> >
> > 1) Mount tmpfs with the huge=within_size option and enable swap (using
> > zstd/zsmalloc without a backing device).
> > 2) Create and write over 10,000 files in the tmpfs.
> > 3) Swap out the large folios of these tmpfs files to zram.
> > 4) Use the rm command to delete all the files from the tmpfs.
> >
> > Testing with both 2MiB and 64KiB large folio sizes, and with
> > shmem_enabled=within_size, but everything works as expected.
>
> Thanks Lance for confirming again.
>
> Alex, could you give more hints on how to reproduce this issue?
>

Hi Baolin,

I can reproduce this issue very easily with a Linux kernel build test,
and the failure rate is very high. I'm not sure this is exactly the
same bug, but it is very likely. My test steps:

1. Create a 10G ZRAM device and set up swap on it.
2. Create a 1G memcg, and spawn a shell in it.
3. Mount tmpfs with huge=within_size, then untar the Linux kernel
   source code into it.
4. Build with make -j32 (a higher or lower job count may also work).
   The build always fails within 10s due to file corruption.

After some debugging, the cause is in shmem_swapin_folio: when the
swap cache is hit, `folio = swap_cache_get_folio(swap, NULL, 0);` sets
folio to an order-0 folio. The following shmem_add_to_page_cache then
inserts that order-0 folio over a high-order entry in shmem's xarray,
so data is lost. A swap cache hit can happen for many reasons; in this
case it's the readahead.
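
To make the failure mode above easier to picture, here is a toy
userspace model in C. It is only a sketch, not kernel code: the
toy_* names, the slot layout, and the rule that the whole large entry
is dropped on overwrite are assumptions made for illustration, and the
real xarray behaviour is more involved. It only shows why putting an
order-0 folio over a high-order swap entry, without splitting first,
leaves the other indices of that entry without their swap data.

```c
/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>

#define TOY_NSLOTS 16

enum toy_kind { TOY_EMPTY = 0, TOY_SWAP, TOY_FOLIO };

struct toy_slot {
	enum toy_kind kind;
	unsigned int order;	/* order of the entry covering this index */
	unsigned long base;	/* first index covered by that entry */
};

struct toy_mapping {
	struct toy_slot slots[TOY_NSLOTS];
};

/* Store one swap entry of the given order over its aligned index range. */
static void toy_store_swap(struct toy_mapping *m, unsigned long index,
			   unsigned int order)
{
	unsigned long nr = 1UL << order;
	unsigned long base = index & ~(nr - 1);

	assert(base + nr <= TOY_NSLOTS);
	for (unsigned long i = base; i < base + nr; i++)
		m->slots[i] = (struct toy_slot){ TOY_SWAP, order, base };
}

/*
 * Model of the problematic insert (an assumption of this sketch): the
 * order-0 folio replaces the large entry as a whole, so the other
 * indices no longer resolve to any swap data.
 */
static void toy_insert_order0_no_split(struct toy_mapping *m,
				       unsigned long index)
{
	struct toy_slot old;

	assert(index < TOY_NSLOTS);
	old = m->slots[index];
	if (old.kind == TOY_SWAP && old.order > 0) {
		unsigned long nr = 1UL << old.order;

		for (unsigned long i = old.base; i < old.base + nr; i++)
			m->slots[i] = (struct toy_slot){ TOY_EMPTY, 0, i };
	}
	m->slots[index] = (struct toy_slot){ TOY_FOLIO, 0, index };
}

/* Count the slots of a given kind. */
static unsigned int toy_count(const struct toy_mapping *m, enum toy_kind kind)
{
	unsigned int n = 0;

	for (unsigned long i = 0; i < TOY_NSLOTS; i++)
		n += (m->slots[i].kind == kind);
	return n;
}
```

In this model, storing an order-2 swap entry at index 0 and then
calling toy_insert_order0_no_split() at index 1 leaves one folio slot
and zero swap slots: the swap information for the other three pages is
gone, which corresponds to the corruption seen in the build test.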

One quick fix is to always split the entry when a shmem fault swaps in
an order-0 folio, like this:

diff --git a/mm/shmem.c b/mm/shmem.c
index 4ea6109a8043..c8e5c419c675 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2341,6 +2341,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		}
 	}
 
+	/* Swapin of 0 order folio must always ensure the entries are split */
+	if (!folio_order(folio))
+		shmem_split_large_entry(inode, index, swap, gfp);
+
 alloced:
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);

Hi Alex, can you help confirm whether the above patch fixes the bug
you reported?

If we are OK with this, I think it should be merged into 6.14. For the
long term, though, it might be a good idea to share similar logic with
(or simply reuse) __filemap_add_folio for shmem: __filemap_add_folio
splits the entry on insert, and the code would be much cleaner.
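
As a rough illustration of the split-before-insert idea, here is a toy
userspace model in C. It is a sketch under stated assumptions, not
kernel code: the toy_* names and the slot layout are hypothetical, and
giving each split-off page the offset "head offset + i" is a
simplification chosen for the example. It does not try to reproduce
what shmem_split_large_entry or __filemap_add_folio really do
internally.

```c
/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>

#define TOY_NSLOTS 8

struct toy_slot {
	int is_folio;		/* 1 if an order-0 folio sits at this index */
	int is_swap;		/* 1 if a swap entry covers this index */
	unsigned int order;	/* order of the covering swap entry */
	unsigned long base;	/* first index covered by that entry */
	unsigned long swp_off;	/* swap offset of the entry's first page */
};

/* Store one large swap entry over an aligned range of 2^order indices. */
static void toy_store_large_swap(struct toy_slot *slots, unsigned long base,
				 unsigned int order, unsigned long swp_off)
{
	unsigned long nr = 1UL << order;

	assert((base & (nr - 1)) == 0 && base + nr <= TOY_NSLOTS);
	for (unsigned long i = base; i < base + nr; i++)
		slots[i] = (struct toy_slot){ 0, 1, order, base, swp_off };
}

/* Split the large swap entry covering index into order-0 swap entries. */
static void toy_split_large_entry(struct toy_slot *slots, unsigned long index)
{
	struct toy_slot old = slots[index];

	if (!old.is_swap || old.order == 0)
		return;	/* nothing to split */
	for (unsigned long i = 0; i < (1UL << old.order); i++)
		slots[old.base + i] = (struct toy_slot){
			0, 1, 0, old.base + i, old.swp_off + i
		};
}

/* Split first, then replace only the faulting index with the folio. */
static void toy_add_order0_folio(struct toy_slot *slots, unsigned long index)
{
	assert(index < TOY_NSLOTS);
	toy_split_large_entry(slots, index);
	slots[index] = (struct toy_slot){ 1, 0, 0, index, 0 };
}
```

With an order-2 swap entry at indices 4..7 and head offset 100,
calling toy_add_order0_folio() at index 5 leaves order-0 swap entries
at indices 4, 6 and 7 (offsets 100, 102 and 103 in this model), and
only index 5 is replaced by the folio.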