From: Ryan Roberts
Date: Mon, 17 Jun 2024 11:33:16 +0100
Subject: Re: [PATCH] nfs: fix nfs_swap_rw for large-folio swap
To: Barry Song <21cnbao@gmail.com>, Christoph Hellwig
Cc: Andrew Morton, Trond Myklebust, Anna Schumaker, Steve French, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, linux-mm@kvack.org, Barry Song

On 17/06/2024 10:40, Barry Song wrote:
> On Mon, Jun 17, 2024 at 8:03 PM Ryan Roberts wrote:
>>
>> On 16/06/2024 11:23, Barry Song wrote:
>>> On Sun, Jun 16, 2024 at 4:54 PM Christoph Hellwig wrote:
>>>>
>>>> On Sun, Jun 16, 2024 at 12:16:10PM +1200, Barry Song wrote:
>>>>> As I understand it, this isn't happening because we don't support
>>>>> mTHP swapping out to a swapfile, whether it's on NFS or any
>>>>> other filesystem.
>>>>
>>>> It does happen. The reason why I sent this patch is because I observed
>>>> the BUG_ON trigger on a trivial swap generation workload (usemem.c from
>>>> xfstests).
>>>
>>> This is quite unusual. Could you share your setup and backtrace? I'd
>>> like to reproduce the issue, as the mm code only supports mTHP
>>> swapout on block devices. What is your swap device or swap file?
>>> Additionally, on what kind of filesystem is the executable file built
>>> from usemem.c located?
>>
>> Yes, I'm also confused by this, since as Barry says, the swap-out changes to
>> support mTHP are only intended to be activated when the swap device is a
>> non-rotating block device - swap files on file systems are explicitly not
>> supported and all swapping should be done page-by-page in that case. This
>> constraint is exactly the same as for the pre-existing PMD-size THP swap-out
>> support. So if you are seeing large folios being written after the mTHP swap-out
>> change, you should also be seeing large folios before this change.
>>
>> Hopefully the stack trace will tell us what's going on here.
>
> Hi Ryan, Christoph,
>
> I am able to reproduce the issue now. I am debugging and will update
> the root cause with you this week.

Ahh great; for some reason I'm not receiving Christoph's mails, so I didn't see
the stack trace and instructions until you replied to it. I had a go at
repro'ing too, but am failing to even get the systemd nfs service to start.
I'll leave it to you.

>
> Initial investigation shows the issue might *not* be related to THP_SWPOUT.
>
> I am even able to reproduce it after disabling thp and mthp, entirely by
> small folios:
>
> [ 215.925069] folio_alloc_swap folio nr:1 anon:1 swapbacked:1
> [ 215.926383] vmscan: shrink_folio_list folio nr:1 anon:1 swapbacked:1
> [ 215.927008] folio_alloc_swap folio nr:1 anon:1 swapbacked:1
> [ 215.929368] ------------[ cut here ]------------
> [ 215.929824] kernel BUG at fs/nfs/direct.c:144!
> [ 215.930403] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> [ 215.931264] Modules linked in:
> [ 215.932328] CPU: 3 PID: 214 Comm: mthp_swpout_tes Not tainted 6.10.0-rc3-ga12328d9fb85-dirty #292
> [ 215.932953] Hardware name: linux,dummy-virt (DT)
> [ 215.933461] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [ 215.934030] pc : nfs_swap_rw+0x60/0x70
> [ 215.935079] lr : swap_write_unplug+0x64/0xb0
> [ 215.935559] sp : ffff800087363280
> [ 215.935958] x29: ffff800087363280 x28: ffff0000c3241800 x27: fffffdffc323a4c0
> [ 215.937012] x26: fffffdffc323a4c8 x25: ffff0001b4a51500 x24: ffff80008250f670
> [ 215.937893] x23: 0000000000000001 x22: ffff0000c0b2da00 x21: 0000000000020000
> [ 215.938734] x20: ffff0000c46a8bd8 x19: ffff0000c154f800 x18: ffffffffffffffff
> [ 215.939594] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800107363097
> [ 215.940591] x14: 0000000000000000 x13: 313a64656b636162 x12: 7061777320313a6e
> [ 215.941621] x11: 6f6e6120313a726e x10: ffff800083e86318 x9 : ffff8000803e9ad4
> [ 215.942673] x8 : ffff800087363168 x7 : 0000000000000000 x6 : ffff0001adbfa4c6
> [ 215.943674] x5 : 0000000000000002 x4 : 0000000000020000 x3 : 0000000000020000
> [ 215.944673] x2 : ffff8000806015e8 x1 : ffff8000873632a0 x0 : ffff0000c154f800
> [ 215.945568] Call trace:
> [ 215.945906] nfs_swap_rw+0x60/0x70
> [ 215.946351] __swap_writepage+0x2e8/0x328
> [ 215.946775] swap_writepage+0x68/0xd0
> [ 215.947184] pageout+0xe4/0x430
> [ 215.947587] shrink_folio_list+0x9bc/0xf60
> [ 215.947992] reclaim_folio_list+0x8c/0x168
> [ 215.948454] reclaim_pages+0xfc/0x178
> [ 215.948843] madvise_cold_or_pageout_pte_range+0x8d8/0xf28
> [ 215.949285] walk_pgd_range+0x390/0x808
> [ 215.949660] __walk_page_range+0x1e0/0x1f0
> [ 215.950040] walk_page_range+0x1f0/0x2c8
> [ 215.950458] madvise_pageout+0xf8/0x280
> [ 215.950905] madvise_vma_behavior+0x314/0xa20
> [ 215.951361] madvise_walk_vmas+0xc0/0x128
> [ 215.951807] do_madvise.part.0+0x110/0x558
> [ 215.952298] __arm64_sys_madvise+0x68/0x88
> [ 215.952723] invoke_syscall+0x50/0x128
> [ 215.953148] el0_svc_common.constprop.0+0x48/0xf8
> [ 215.953592] do_el0_svc+0x28/0x40
> [ 215.954036] el0_svc+0x50/0x150
> [ 215.954610] el0t_64_sync_handler+0x13c/0x158
> [ 215.955070] el0t_64_sync+0x1a4/0x1a8
> [ 215.955685] Code: a8c17bfd d50323bf 9a9fd000 d65f03c0 (d4210000)
> [ 215.956510] ---[ end trace 0000000000000000 ]---
>
>
>>
>> (Sorry for my slow responses/lack of engagement over the last month; it's been a
>> combination of paternity leave/lack of sleep/working on other things. I'm hoping
>> to get properly back into this stuff within the next couple of weeks).
>>
>> Thanks,
>> Ryan
>>
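The swap-out constraint described in the thread (a large folio is only swapped out whole to a non-rotating block device; for swap files on a filesystem, NFS included, everything goes out page-by-page, exactly as with the older PMD-size THP swap-out) can be summarised as a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct swap_target`, `can_swap_out_whole()` and `swapout_granularity_pages()` are hypothetical names, and the kernel's actual logic in mm/ is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of where swap goes; not a kernel structure. */
struct swap_target {
	bool is_block_device; /* swap partition/device, as opposed to a swap file */
	bool is_rotational;   /* spinning disk rather than SSD-like storage */
};

/*
 * Model of the rule discussed in the thread: a large folio may only be
 * swapped out whole when the target is a non-rotating block device.
 */
bool can_swap_out_whole(const struct swap_target *t)
{
	return t->is_block_device && !t->is_rotational;
}

/*
 * Size, in pages, of each folio handed to swap-out, starting from a folio
 * of nr_pages pages: unchanged if it can go out whole, otherwise it is
 * split and swapped out one page at a time.
 */
size_t swapout_granularity_pages(const struct swap_target *t, size_t nr_pages)
{
	assert(nr_pages > 0);
	return can_swap_out_whole(t) ? nr_pages : 1;
}
```

For example, a 16-page folio keeps all 16 pages per swap-out on an SSD partition but drops to 1 page for a rotating disk or a swap file on NFS, which is why large folios reaching a swap file on NFS were unexpected in the first place.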
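The call trace shows the write-out being driven from madvise(MADV_PAGEOUT) (madvise_pageout -> reclaim_pages -> shrink_folio_list -> pageout -> swap_writepage -> nfs_swap_rw). A minimal sketch of that trigger from user space follows, assuming Linux 5.4 or later and an already active swap target (for this bug, a swap file on NFS). `pageout_anon_region()` is a hypothetical helper, not the test program used in the thread, and on its own it only exercises the reclaim path; whether it trips the BUG depends on the swap setup.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* generic Linux UAPI value; older libc headers may lack it */
#endif

/*
 * Hypothetical helper: map len bytes of private anonymous memory, write to
 * every page so each one is populated, then ask the kernel to reclaim the
 * range right away with madvise(MADV_PAGEOUT).
 * Returns 0 on success or an errno value on failure.
 */
int pageout_anon_region(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return errno; /* e.g. EINVAL for len == 0 */

	/* Dirty every page so it is backed by real anonymous memory. */
	memset(buf, 0x5a, len);

	int ret = 0;
	if (madvise(buf, len, MADV_PAGEOUT) != 0)
		ret = errno; /* e.g. EINVAL on kernels older than 5.4 */

	munmap(buf, len);
	return ret;
}
```

With an NFS swap file active, calling pageout_anon_region(16 * 4096) should send those pages down the pageout -> swap_writepage path seen in the trace; with no swap configured the call still succeeds, but the anonymous pages simply stay resident.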