From: Chuck Lever III <chuck.lever@oracle.com>
To: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
Matthew Wilcox <willy@infradead.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>, Jeffrey Layton <jlayton@kernel.org>,
Christian Brauner <brauner@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Anna Schumaker <anna@kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [RFC PATCH 27/28] sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage
Date: Fri, 17 Mar 2023 15:29:40 +0000
Message-ID: <18CE11D1-74E4-4FAF-86B5-5CE834935726@oracle.com>
In-Reply-To: <818650.1679001712@warthog.procyon.org.uk>
> On Mar 16, 2023, at 5:21 PM, David Howells <dhowells@redhat.com> wrote:
>
> Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>> Therefore, this kind of change needs to be accompanied by both
>> benchmark results and some field testing to convince me it won't
>> cause harm.
>
> Btw, what do you use to benchmark NFS performance?
It depends on what I'm trying to observe. I have only a small
handful of systems in my lab, which is why I was not able to
detect the effects of the zero-copy change immediately. Daire
has a large client cohort on a fast network, so he is able to
see the impact of that kind of change quite readily.
A perhaps more interesting question is what kind of tooling
I would use to measure the performance of the proposed change.
The bottom line is whether applications on clients can see a
change. NFS client implementations can hide server and network
latency improvements from applications, and RPC-on-TCP adds
palpable latency of its own that reduces the efficacy of
server-side performance optimizations.
For that I might use a multi-threaded fio workload with fixed
record sizes (2KB, 8KB, 128KB, 1MB) and then look at the
throughput numbers and latency distribution for each size.
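For example, something along these lines, repeated once per
record size (the mount point, job count, sizes, and runtime
here are only placeholders; adjust to taste):

  fio --name=nfs-read --directory=/mnt/nfs --ioengine=libaio \
      --direct=1 --rw=randread --bs=8k --size=1g --numjobs=16 \
      --iodepth=16 --time_based --runtime=60 --group_reporting

fio's per-run summary gives aggregate bandwidth plus the
completion latency percentiles, which is usually enough to
tell whether a transport change moved the needle.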
In a single thread qd=1 test, iozone can show changes in
READ latency pretty clearly, though most folks believe qd=1
tests are junk.
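Roughly like this, if you want to try it (file path and sizes
are illustrative):

  iozone -i 0 -i 1 -r 8k -s 1g -I -f /mnt/nfs/iozone.tmp

where -i 0 -i 1 selects the write and read tests, -r is the
record size, -s the file size, and -I requests O_DIRECT so
the client page cache doesn't mask the server.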
I generally run such tests on 100GbE with a tmpfs or NVMe
export to take filesystem latencies out of the equation,
although that matters more for WRITE latency when you can
keep your READ workload completely in server memory.
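Setting up a tmpfs export for that takes only a minute;
roughly (the size and export options are just one plausible
choice):

  # on the server
  mkdir -p /export/scratch
  mount -t tmpfs -o size=16g tmpfs /export/scratch
  echo '/export/scratch *(rw,no_root_squash,insecure)' >> /etc/exports
  exportfs -ra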
To measure server-side behavior without the effects of the
network or client, NFSD has a built-in tracepoint,
nfsd:svc_stats_latency, that records the latency in
microseconds of each RPC. Run the above workloads while
recording this tracepoint (perhaps with a capture filter
that keeps only the latency of READ operations).
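With trace-cmd that looks roughly like the following; the
exact event group and field names are worth confirming
against the tracepoint's format file first:

  # confirm the event name and its fields
  cat /sys/kernel/tracing/events/*/svc_stats_latency/format

  # record the tracepoint while the workload runs
  trace-cmd record -e svc_stats_latency sleep 60
  trace-cmd report > svc_latency.txt

A filter (trace-cmd record -f '...') can narrow the capture
to READ once you know which field carries the procedure.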
Then you can post-process the raw latencies to get an average
latency and deviation, or even look at the latency distribution
to see whether the shape of the outlier curve has changed. I
use awk for this.
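A minimal sketch of that post-processing, assuming the latency
values have already been pulled out into a single column of
microseconds:

  awk '{ n++; sum += $1; sumsq += $1 * $1 }
       END {
           mean = sum / n
           printf "samples=%d  mean=%.1f us  sd=%.1f us\n",
                  n, mean, sqrt(sumsq / n - mean * mean)
       }' svc_latency_us.txt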
[ Sidebar: you can use this tracepoint to track latency
outliers too, but that's another topic. ]
Second, I might try a flame graph study to measure changes in
instruction path length, and also capture an average cycles-
per-byte-read value. Looking at CPU cache misses can often be
a rathole, but flame graphs can surface changes there too.
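The usual recipe for that, using perf and Brendan Gregg's
FlameGraph scripts (paths are illustrative):

  # sample the whole server while the READ workload runs
  perf record -F 99 -a -g -- sleep 60
  perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > nfsd-read.svg

  # a rough cycles-per-byte figure: total cycles over the run
  # divided by the bytes fio reports as transferred
  perf stat -e cycles -a -- sleep 60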
And lastly, you might want to visit lock_stat to see if
there is any significant change in lock contention. An
unexpected increase in lock contention can often cancel
out gains made in other areas.
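With CONFIG_LOCK_STAT enabled, the sequence is roughly:

  echo 1 > /proc/sys/kernel/lock_stat   # turn collection on
  echo 0 > /proc/lock_stat              # clear old statistics
  # ... run the READ workload ...
  echo 0 > /proc/sys/kernel/lock_stat   # turn collection off
  less /proc/lock_stat                  # per-lock-class contention and wait times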
My guess is that for the RQ_SPLICE_OK case, the difference
would amount to the elimination of the kernel_sendpage
calls, which are indirect but not terribly expensive. Those
calls become a significant cost only for large I/O, so this
might not add up to much relative to the other costs in the
READ path.
So the real purpose here would have to be refactoring to
use bvecs instead of the bespoke xdr_buf structure, and I
would like to see support for bvecs in all of our transports
(looking at you, RDMA) to make this truly worthwhile. I had
started this a while back, but lack of a bvec-based RDMA API
made it less interesting to me. It isn't clear to me yet
whether bvecs or folios should be the replacement for
xdr_buf's head/pages/tail, but I'm a paid-in-full member of
the uneducated rabble.
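For anyone who hasn't stared at sunrpc, the structure we'd be
replacing looks roughly like this (abridged and simplified;
see include/linux/sunrpc/xdr.h for the real definition):

  struct xdr_buf {
          struct kvec      head[1];    /* RPC header and other non-page data */
          struct kvec      tail[1];    /* padding appended after the page data */
          struct page    **pages;      /* the payload, e.g. READ data */
          unsigned int     page_base;  /* offset of data in the first page */
          unsigned int     page_len;   /* length of the page data */
          unsigned int     buflen;     /* total size of the buffer */
          unsigned int     len;        /* length of the encoded XDR message */
  };

A bvec-based replacement would collapse head/pages/tail into
a single array of (page, offset, length) segments that
sendmsg(MSG_SPLICE_PAGES) could consume directly.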
This might sound like a lot of pushback, but actually I am
open to discussing clean-ups in this area, including the one
you proposed. I'm just getting a little more careful about
this kind of change as time goes on. And it sounds like you
were already aware of the most recent previous attempt at
this kind of improvement.
--
Chuck Lever