linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ihor Solodrai <ihor.solodrai@pm.me>
To: David Howells <dhowells@redhat.com>
Cc: Christian Brauner <christian@brauner.io>,
	Steve French <smfrench@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Marc Dionne <marc.dionne@auristor.com>,
	Paulo Alcantara <pc@manguebit.com>,
	Shyam Prasad N <sprasad@microsoft.com>,
	Tom Talpey <tom@talpey.com>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	netfs@lists.linux.dev, linux-afs@lists.infradead.org,
	linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	ceph-devel@vger.kernel.org, v9fs@lists.linux.dev,
	linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH v5 27/32] netfs: Change the read result collector to only use one work item
Date: Fri, 24 Jan 2025 17:59:39 +0000	[thread overview]
Message-ID: <a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me> (raw)
In-Reply-To: <20241216204124.3752367-28-dhowells@redhat.com>

On Monday, December 16th, 2024 at 12:41 PM, David Howells <dhowells@redhat.com> wrote:

> Change the way netfslib collects read results to do all the collection for
> a particular read request using a single work item that walks along the
> subrequest queue as subrequests make progress or complete, unlocking folios
> progressively rather than doing the unlock in parallel as parallel requests
> come in.
> 
> The code is remodelled to be more like the write-side code, though only
> using a single stream. This makes it more directly comparable and thus
> easier to duplicate fixes between the two sides.
> 
> This has a number of advantages:
> 
> (1) It's simpler. There doesn't need to be a complex donation mechanism
> to handle mismatches between the size and alignment of subrequests and
> folios. The collector unlocks folios as the subrequests covering each
> complete.
> 
> (2) It should cause less scheduler overhead as there's a single work item
> in play unlocking pages in parallel when a read gets split up into a
> lot of subrequests instead of one per subrequest.
> 
> Whilst the parallellism is nice in theory, in practice, the vast
> majority of loads are sequential reads of the whole file, so
> committing a bunch of threads to unlocking folios out of order doesn't
> help in those cases.
> 
> (3) It should make it easier to implement content decryption. A folio
> cannot be decrypted until all the requests that contribute to it have
> completed - and, again, most loads are sequential and so, most of the
> time, we want to begin decryption sequentially (though it's great if
> the decryption can happen in parallel).
> 
> There is a disadvantage in that we're losing the ability to decrypt and
> unlock things on an as-things-arrive basis which may affect some
> applications.
> 
> Signed-off-by: David Howells dhowells@redhat.com
> 
> cc: Jeff Layton jlayton@kernel.org
> 
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/9p/vfs_addr.c | 3 +-
> fs/afs/dir.c | 8 +-
> fs/ceph/addr.c | 9 +-
> fs/netfs/buffered_read.c | 160 ++++----
> fs/netfs/direct_read.c | 60 +--
> fs/netfs/internal.h | 21 +-
> fs/netfs/main.c | 2 +-
> fs/netfs/objects.c | 34 +-
> fs/netfs/read_collect.c | 716 ++++++++++++++++++++---------------
> fs/netfs/read_pgpriv2.c | 203 ++++------
> fs/netfs/read_retry.c | 207 +++++-----
> fs/netfs/read_single.c | 37 +-
> fs/netfs/write_collect.c | 4 +-
> fs/netfs/write_issue.c | 2 +-
> fs/netfs/write_retry.c | 14 +-
> fs/smb/client/cifssmb.c | 2 +
> fs/smb/client/smb2pdu.c | 5 +-
> include/linux/netfs.h | 16 +-
> include/trace/events/netfs.h | 79 +---
> 19 files changed, 819 insertions(+), 763 deletions(-)

Hello David.

After recent merge from upstream BPF CI started consistently failing
with a task hanging in v9fs_evict_inode. I bisected the failure to
commit e2d46f2ec332, pointing to this patch.

Reverting the patch seems to have helped:
https://github.com/kernel-patches/vmtest/actions/runs/12952856569

Could you please investigate?

Examples of failed jobs:
  * https://github.com/kernel-patches/bpf/actions/runs/12941732247
  * https://github.com/kernel-patches/bpf/actions/runs/12933849075

A log snippet:

    2025-01-24T02:15:03.9009694Z [  246.932163] INFO: task ip:1055 blocked for more than 122 seconds.
    2025-01-24T02:15:03.9013633Z [  246.932709]       Tainted: G           OE      6.13.0-g2bcb9cf535b8-dirty #149
    2025-01-24T02:15:03.9018791Z [  246.933249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    2025-01-24T02:15:03.9025896Z [  246.933802] task:ip              state:D stack:0     pid:1055  tgid:1055  ppid:1054   flags:0x00004002
    2025-01-24T02:15:03.9028228Z [  246.934564] Call Trace:
    2025-01-24T02:15:03.9029758Z [  246.934764]  <TASK>
    2025-01-24T02:15:03.9032572Z [  246.934937]  __schedule+0xa91/0xe80
    2025-01-24T02:15:03.9035126Z [  246.935224]  schedule+0x41/0xb0
    2025-01-24T02:15:03.9037992Z [  246.935459]  v9fs_evict_inode+0xfe/0x170
    2025-01-24T02:15:03.9041469Z [  246.935748]  ? __pfx_var_wake_function+0x10/0x10
    2025-01-24T02:15:03.9043837Z [  246.936101]  evict+0x1ef/0x360
    2025-01-24T02:15:03.9046624Z [  246.936340]  __dentry_kill+0xb0/0x220
    2025-01-24T02:15:03.9048855Z [  246.936610]  ? dput+0x3a/0x1d0
    2025-01-24T02:15:03.9051128Z [  246.936838]  dput+0x114/0x1d0
    2025-01-24T02:15:03.9053548Z [  246.937069]  __fput+0x136/0x2b0
    2025-01-24T02:15:03.9056154Z [  246.937305]  task_work_run+0x89/0xc0
    2025-01-24T02:15:03.9058593Z [  246.937571]  do_exit+0x2c6/0x9c0
    2025-01-24T02:15:03.9061349Z [  246.937816]  do_group_exit+0xa4/0xb0
    2025-01-24T02:15:03.9064401Z [  246.938090]  __x64_sys_exit_group+0x17/0x20
    2025-01-24T02:15:03.9067235Z [  246.938390]  x64_sys_call+0x21a0/0x21a0
    2025-01-24T02:15:03.9069924Z [  246.938672]  do_syscall_64+0x79/0x120
    2025-01-24T02:15:03.9072746Z [  246.938941]  ? clear_bhb_loop+0x25/0x80
    2025-01-24T02:15:03.9075581Z [  246.939230]  ? clear_bhb_loop+0x25/0x80
    2025-01-24T02:15:03.9079275Z [  246.939510]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    2025-01-24T02:15:03.9081976Z [  246.939875] RIP: 0033:0x7fb86f66f21d
    2025-01-24T02:15:03.9087533Z [  246.940153] RSP: 002b:00007ffdb3cf93f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
    2025-01-24T02:15:03.9092590Z [  246.940689] RAX: ffffffffffffffda RBX: 00007fb86f785fa8 RCX: 00007fb86f66f21d
    2025-01-24T02:15:03.9097722Z [  246.941201] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
    2025-01-24T02:15:03.9102762Z [  246.941705] RBP: 00007ffdb3cf9450 R08: 00007ffdb3cf93a0 R09: 0000000000000000
    2025-01-24T02:15:03.9107940Z [  246.942215] R10: 00007ffdb3cf92ff R11: 0000000000000202 R12: 0000000000000001
    2025-01-24T02:15:03.9113002Z [  246.942723] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fb86f785fc0
    2025-01-24T02:15:03.9114614Z [  246.943244]  </TASK>
    2025-01-24T02:15:03.9115895Z [  246.943415]
    2025-01-24T02:15:03.9119326Z [  246.943415] Showing all locks held in the system:
    2025-01-24T02:15:03.9122278Z [  246.943865] 1 lock held by khungtaskd/32:
    2025-01-24T02:15:03.9128640Z [  246.944162]  #0: ffffffffa9195d90 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180
    2025-01-24T02:15:03.9131426Z [  246.944792] 2 locks held by kworker/0:2/86:
    2025-01-24T02:15:03.9132752Z [  246.945102]
    2025-01-24T02:15:03.9136561Z [  246.945222] =============================================

It's worth noting that that the hanging does not happen on *every*
test run, but often enough to fail the CI pipeline.

You may try reproducing with a container I used for bisection:

    docker pull ghcr.io/theihor/bpf:v9fs_evict_inode-repro
    docker run -d --privileged --device=/dev/kvm --cap-add ALL -v /path/to/your/kernel/source:/ci/workspace ghcr.io/theihor/bpf:v9fs_evict_inode-repro
    docker exec -it <container_id_or_name> /bin/bash
    /ci/run.sh # in the container shell

Note that inside the container it's an "ubuntu" user, and you might
have to run `chown -R ubuntu:ubuntu /ci/workspace` first, or switch to
root.

> [...]



  reply	other threads:[~2025-01-24 17:59 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-16 20:40 [PATCH v5 00/32] netfs: Read performance improvements and "single-blob" support David Howells
2024-12-16 20:40 ` [PATCH v5 01/32] netfs: Clean up some whitespace in trace header David Howells
2024-12-16 20:40 ` [PATCH v5 02/32] cachefiles: " David Howells
2024-12-16 20:40 ` [PATCH v5 03/32] netfs: Use a folio_queue allocation and free functions David Howells
2024-12-16 20:40 ` [PATCH v5 04/32] netfs: Add a tracepoint to log the lifespan of folio_queue structs David Howells
2024-12-16 20:40 ` [PATCH v5 05/32] netfs: Abstract out a rolling folio buffer implementation David Howells
2024-12-16 20:40 ` [PATCH v5 06/32] netfs: Make netfs_advance_write() return size_t David Howells
2024-12-16 20:40 ` [PATCH v5 07/32] netfs: Split retry code out of fs/netfs/write_collect.c David Howells
2024-12-16 20:40 ` [PATCH v5 08/32] netfs: Drop the error arg from netfs_read_subreq_terminated() David Howells
2024-12-16 20:40 ` [PATCH v5 09/32] netfs: Drop the was_async " David Howells
2024-12-16 20:41 ` [PATCH v5 10/32] netfs: Don't use bh spinlock David Howells
2024-12-16 20:41 ` [PATCH v5 11/32] afs: Don't use mutex for I/O operation lock David Howells
2024-12-16 20:41 ` [PATCH v5 12/32] afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY David Howells
2024-12-16 20:41 ` [PATCH v5 13/32] afs: Fix directory format encoding struct David Howells
2024-12-16 20:41 ` [PATCH v5 14/32] netfs: Remove some extraneous directory invalidations David Howells
2024-12-16 20:41 ` [PATCH v5 15/32] cachefiles: Add some subrequest tracepoints David Howells
2024-12-16 20:41 ` [PATCH v5 16/32] cachefiles: Add auxiliary data trace David Howells
2024-12-16 20:41 ` [PATCH v5 17/32] afs: Add more tracepoints to do with tracking validity David Howells
2024-12-16 20:41 ` [PATCH v5 18/32] netfs: Add functions to build/clean a buffer in a folio_queue David Howells
2024-12-16 20:41 ` [PATCH v5 19/32] netfs: Add support for caching single monolithic objects such as AFS dirs David Howells
2024-12-16 20:41 ` [PATCH v5 20/32] afs: Make afs_init_request() get a key if not given a file David Howells
2024-12-16 20:41 ` [PATCH v5 21/32] afs: Use netfslib for directories David Howells
2024-12-16 20:41 ` [PATCH v5 22/32] afs: Use netfslib for symlinks, allowing them to be cached David Howells
2024-12-16 20:41 ` [PATCH v5 23/32] afs: Eliminate afs_read David Howells
2024-12-16 20:41 ` [PATCH v5 24/32] afs: Fix cleanup of immediately failed async calls David Howells
2024-12-16 20:41 ` [PATCH v5 25/32] afs: Make {Y,}FS.FetchData an asynchronous operation David Howells
2024-12-16 20:41 ` [PATCH v5 26/32] Display waited-on page index after 1min of waiting David Howells
2024-12-16 20:41 ` [PATCH v5 27/32] netfs: Change the read result collector to only use one work item David Howells
2025-01-24 17:59   ` Ihor Solodrai [this message]
2025-01-24 18:21     ` Marc Dionne
2025-01-24 18:46       ` Ihor Solodrai
2025-01-24 19:07         ` Marc Dionne
2025-01-24 19:54           ` Ihor Solodrai
2024-12-16 20:41 ` [PATCH v5 28/32] afs: Make afs_mkdir() locally initialise a new directory's content David Howells
2024-12-16 20:41 ` [PATCH v5 29/32] afs: Use the contained hashtable to search a directory David Howells
2024-12-16 20:41 ` [PATCH v5 30/32] afs: Locally initialise the contents of a new symlink on creation David Howells
2024-12-16 20:41 ` [PATCH v5 31/32] afs: Add a tracepoint for afs_read_receive() David Howells
2024-12-16 20:41 ` [PATCH v5 32/32] netfs: Report on NULL folioq in netfs_writeback_unlock_folios() David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me' \
    --to=ihor.solodrai@pm.me \
    --cc=asmadeus@codewreck.org \
    --cc=bpf@vger.kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=christian@brauner.io \
    --cc=dhowells@redhat.com \
    --cc=ericvh@kernel.org \
    --cc=hsiangkao@linux.alibaba.com \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=marc.dionne@auristor.com \
    --cc=netdev@vger.kernel.org \
    --cc=netfs@lists.linux.dev \
    --cc=pc@manguebit.com \
    --cc=smfrench@gmail.com \
    --cc=sprasad@microsoft.com \
    --cc=tom@talpey.com \
    --cc=v9fs@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox