From: Marc Dionne
Date: Fri, 24 Jan 2025 15:07:26 -0400
Subject: Re: [PATCH v5 27/32] netfs: Change the read result collector to only use one work item
To: Ihor Solodrai
Cc: David Howells, Christian Brauner, Steve French, Matthew Wilcox, Jeff Layton, Gao Xiang, Dominique Martinet, Paulo Alcantara, Shyam Prasad N, Tom Talpey, Eric Van Hensbergen, Ilya Dryomov, netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs@lists.linux.dev, linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf
References: <20241216204124.3752367-1-dhowells@redhat.com> <20241216204124.3752367-28-dhowells@redhat.com>

On Fri, Jan 24, 2025 at 2:46 PM Ihor Solodrai wrote:
>
> On Friday, January 24th, 2025 at 10:21 AM, Marc Dionne wrote:
> >
> > > [...]
> >
> > > A log snippet:
> > >
> > > 2025-01-24T02:15:03.9009694Z [ 246.932163] INFO: task ip:1055 blocked for more than 122 seconds.
> > > 2025-01-24T02:15:03.9013633Z [ 246.932709] Tainted: G OE 6.13.0-g2bcb9cf535b8-dirty #149
> > > 2025-01-24T02:15:03.9018791Z [ 246.933249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > 2025-01-24T02:15:03.9025896Z [ 246.933802] task:ip state:D stack:0 pid:1055 tgid:1055 ppid:1054 flags:0x00004002
> > > 2025-01-24T02:15:03.9028228Z [ 246.934564] Call Trace:
> > > 2025-01-24T02:15:03.9029758Z [ 246.934764]
> > > 2025-01-24T02:15:03.9032572Z [ 246.934937] __schedule+0xa91/0xe80
> > > 2025-01-24T02:15:03.9035126Z [ 246.935224] schedule+0x41/0xb0
> > > 2025-01-24T02:15:03.9037992Z [ 246.935459] v9fs_evict_inode+0xfe/0x170
> > > 2025-01-24T02:15:03.9041469Z [ 246.935748] ? __pfx_var_wake_function+0x10/0x10
> > > 2025-01-24T02:15:03.9043837Z [ 246.936101] evict+0x1ef/0x360
> > > 2025-01-24T02:15:03.9046624Z [ 246.936340] __dentry_kill+0xb0/0x220
> > > 2025-01-24T02:15:03.9048855Z [ 246.936610] ? dput+0x3a/0x1d0
> > > 2025-01-24T02:15:03.9051128Z [ 246.936838] dput+0x114/0x1d0
> > > 2025-01-24T02:15:03.9053548Z [ 246.937069] __fput+0x136/0x2b0
> > > 2025-01-24T02:15:03.9056154Z [ 246.937305] task_work_run+0x89/0xc0
> > > 2025-01-24T02:15:03.9058593Z [ 246.937571] do_exit+0x2c6/0x9c0
> > > 2025-01-24T02:15:03.9061349Z [ 246.937816] do_group_exit+0xa4/0xb0
> > > 2025-01-24T02:15:03.9064401Z [ 246.938090] __x64_sys_exit_group+0x17/0x20
> > > 2025-01-24T02:15:03.9067235Z [ 246.938390] x64_sys_call+0x21a0/0x21a0
> > > 2025-01-24T02:15:03.9069924Z [ 246.938672] do_syscall_64+0x79/0x120
> > > 2025-01-24T02:15:03.9072746Z [ 246.938941] ? clear_bhb_loop+0x25/0x80
> > > 2025-01-24T02:15:03.9075581Z [ 246.939230] ? clear_bhb_loop+0x25/0x80
> > > 2025-01-24T02:15:03.9079275Z [ 246.939510] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > 2025-01-24T02:15:03.9081976Z [ 246.939875] RIP: 0033:0x7fb86f66f21d
> > > 2025-01-24T02:15:03.9087533Z [ 246.940153] RSP: 002b:00007ffdb3cf93f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
> > > 2025-01-24T02:15:03.9092590Z [ 246.940689] RAX: ffffffffffffffda RBX: 00007fb86f785fa8 RCX: 00007fb86f66f21d
> > > 2025-01-24T02:15:03.9097722Z [ 246.941201] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
> > > 2025-01-24T02:15:03.9102762Z [ 246.941705] RBP: 00007ffdb3cf9450 R08: 00007ffdb3cf93a0 R09: 0000000000000000
> > > 2025-01-24T02:15:03.9107940Z [ 246.942215] R10: 00007ffdb3cf92ff R11: 0000000000000202 R12: 0000000000000001
> > > 2025-01-24T02:15:03.9113002Z [ 246.942723] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fb86f785fc0
> > > 2025-01-24T02:15:03.9114614Z [ 246.943244]
> >
> > That looks very similar to something I saw in afs testing, with a
> > similar stack but in afs_evict_inode where it hung waiting in
> > netfs_wait_for_outstanding_io.
> >
> > David pointed to this bit where there's a double get in
> > netfs_retry_read_subrequests, since netfs_reissue_read already takes
> > care of getting a ref on the subrequest:
> >
> > diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
> > index 2290af0d51ac..53d62e31a4cc 100644
> > --- a/fs/netfs/read_retry.c
> > +++ b/fs/netfs/read_retry.c
> > @@ -152,7 +152,6 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> >                 __clear_bit(NETFS_SREQ_BOUNDARY, &subreq->flags);
> >         }
> >
> > -       netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> >         netfs_reissue_read(rreq, subreq);
> >         if (subreq == to)
> >                 break;
> >
> > That seems to help for my afs test case, I suspect it might help in
> > your case as well.
>
> Hi Marc. Thank you for the suggestion.
>
> I've just tried this diff on top of bpf-next (d0d106a2bd21):
>
> diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
> index 2290af0d51ac..53d62e31a4cc 100644
> --- a/fs/netfs/read_retry.c
> +++ b/fs/netfs/read_retry.c
> @@ -152,7 +152,6 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
>                 __clear_bit(NETFS_SREQ_BOUNDARY, &subreq->flags);
>         }
>
> -       netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
>         netfs_reissue_read(rreq, subreq);
>         if (subreq == to)
>                 break;
>
>
> and I'm getting a hung task with the same stack
>
> [ 184.362292] INFO: task modprobe:2527 blocked for more than 20 seconds.
> [ 184.363173] Tainted: G OE 6.13.0-gd0d106a2bd21-dirty #1
> [ 184.363651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 184.364142] task:modprobe state:D stack:0 pid:2527 tgid:2527 ppid:2134 flags:0x00000002
> [ 184.364743] Call Trace:
> [ 184.364907]
> [ 184.365057] __schedule+0xa91/0xe80
> [ 184.365311] schedule+0x41/0xb0
> [ 184.365525] v9fs_evict_inode+0xfe/0x170
> [ 184.365782] ? __pfx_var_wake_function+0x10/0x10
> [ 184.366082] evict+0x1ef/0x360
> [ 184.366312] __dentry_kill+0xb0/0x220
> [ 184.366561] ? dput+0x3a/0x1d0
> [ 184.366765] dput+0x114/0x1d0
> [ 184.366962] __fput+0x136/0x2b0
> [ 184.367172] __x64_sys_close+0x9e/0xd0
> [ 184.367443] do_syscall_64+0x79/0x120
> [ 184.367685] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 184.368005] RIP: 0033:0x7f4c6fc7f60b
> [ 184.368249] RSP: 002b:00007ffc7582beb8 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
> [ 184.368733] RAX: ffffffffffffffda RBX: 0000555e18cff7a0 RCX: 00007f4c6fc7f60b
> [ 184.369176] RDX: 00007f4c6fd64ee0 RSI: 0000000000000001 RDI: 0000000000000000
> [ 184.369634] RBP: 00007ffc7582bee0 R08: 0000000000000000 R09: 0000000000000007
> [ 184.370078] R10: 0000555e18cff980 R11: 0000000000000297 R12: 0000000000000000
> [ 184.370544] R13: 00007f4c6fd65030 R14: 0000555e18cff980 R15: 0000555e18d7b750
> [ 184.371004]
> [ 184.371151]
> [ 184.371151] Showing all locks held in the system:
> [ 184.371560] 1 lock held by khungtaskd/32:
> [ 184.371816] #0: ffffffff83195d90 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180
> [ 184.372397] 2 locks held by kworker/u8:21/2134:
> [ 184.372695] #0: ffff9a5300104d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x23a/0x600
> [ 184.373376] #1: ffff9e9882187e20 ((work_completion)(&sub_info->work)){+.+.}-{0:0}, at: process_scheduled_works+0x25a/0x600
> [ 184.374075]
> [ 184.374182] =============================================
>
> So this appears to be something different.
>
> >
> > Marc

Looks like there may be a similar issue with the netfs_get_subrequest()
at line 196, which also precedes a call to netfs_reissue_read. Might be
worth trying with that removed as well.

Marc
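
To make the refcount reasoning in this thread concrete, here is a toy model in plain userspace C. It is only a sketch under stated assumptions, not kernel code: every `toy_*` name is made up for illustration, and the only behaviour borrowed from the thread is that the reissue helper takes its own reference on the subrequest. Under that assumption, an extra get in the retry path leaves one reference that nothing ever drops, which is presumably why a task waiting for outstanding I/O to drain can block indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted subrequest (not the kernel struct). */
struct toy_subreq {
	int ref;	/* references currently held */
};

static void toy_get(struct toy_subreq *s)
{
	s->ref++;
}

/* Drop one reference; returns true if it was the last one. */
static bool toy_put(struct toy_subreq *s)
{
	assert(s->ref > 0);
	return --s->ref == 0;
}

/* Assumption from the thread: the reissue helper takes its own reference. */
static void toy_reissue(struct toy_subreq *s)
{
	toy_get(s);
}

/* I/O completion drops the reference taken at reissue time. */
static bool toy_complete(struct toy_subreq *s)
{
	return toy_put(s);
}

/*
 * Model one retry. extra_get selects the pre-patch behaviour with the
 * redundant get. Returns the references left once the I/O has completed
 * and the original owner has dropped its own reference.
 */
static int toy_retry(bool extra_get)
{
	struct toy_subreq s = { .ref = 1 };	/* owner's reference */

	if (extra_get)
		toy_get(&s);	/* the get that the diff removes */
	toy_reissue(&s);
	toy_complete(&s);
	toy_put(&s);		/* owner is done with the subrequest */
	return s.ref;
}
```

In this model `toy_retry(true)` returns 1 (one leaked reference) and `toy_retry(false)` returns 0. The same imbalance would apply at any other call site that does a get immediately before calling the reissue helper.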