From: Keith Busch <kbusch@kernel.org>
To: Jeff Layton <jlayton@kernel.org>
Cc: Mike Snitzer <snitzer@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
linux-nfs@vger.kernel.org, linus-fsdevel@vger.kernel.org,
linux-mm@kvack.org, hch@infradead.org
Subject: Re: [RFC PATCH v2 4/8] lib/iov_iter: remove piecewise bvec length checking in iov_iter_aligned_bvec
Date: Thu, 10 Jul 2025 08:48:04 -0600
Message-ID: <aG_SpLuUv4EH7fAb@kbusch-mbp>
In-Reply-To: <5819d6c5bb194613a14d2dcf05605e701683ba49.camel@kernel.org>
On Thu, Jul 10, 2025 at 09:52:53AM -0400, Jeff Layton wrote:
> On Tue, 2025-07-08 at 12:06 -0400, Mike Snitzer wrote:
> > iov_iter_aligned_bvec() is strictly checking alignment of each element
> > of the bvec to arrive at whether the bvec is aligned relative to
> > dma_alignment and on-disk alignment. Checking each element
> > individually results in disallowing a bvec that in aggregate is
> > perfectly aligned relative to the provided @len_mask.
> >
> > Relax the on-disk alignment checking such that it is done on the full
> > extent described by the bvec but still do piecewise checking of the
> > dma_alignment for each bvec's bv_offset.
> >
> > This allows for NFS's WRITE payload to be issued using O_DIRECT as
> > long as the bvec created with xdr_buf_to_bvec() is composed of pages
> > that respect the underlying device's dma_alignment (@addr_mask) and
> > the overall contiguous on-disk extent is aligned relative to the
> > logical_block_size (@len_mask).
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > lib/iov_iter.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index bdb37d572e97..b2ae482b8a1d 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -819,13 +819,14 @@ static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
> >  	unsigned skip = i->iov_offset;
> >  	size_t size = i->count;
> >
> > +	if (size & len_mask)
> > +		return false;
> > +
> >  	do {
> >  		size_t len = bvec->bv_len;
> >
> >  		if (len > size)
> >  			len = size;
> > -		if (len & len_mask)
> > -			return false;
> >  		if ((unsigned long)(bvec->bv_offset + skip) & addr_mask)
> >  			return false;
> >
>
> cc'ing Keith too since he wrote this helper originally.

Thanks.

There's a comment in __bio_iov_iter_get_pages that says it expects each
vector to be a multiple of the block size. That makes it easier to
split when needed, and this patch would allow vectors that break the
current assumption when calculating the "trim" value.

But for nvme, you couldn't split such a bvec into a usable command
anyway. I think you'd have to introduce a different queue limit to check
against when validating iter alignment if you don't want to use the
logical block size.
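
To make the difference concrete, here is a small userspace sketch (not
kernel code) of the two checks discussed above. The names struct seg,
aligned_piecewise(), aligned_aggregate() and trim_bytes() are
hypothetical and exist only for this illustration. The model assumes
i->iov_offset is 0 and assumes the "trim" value is simply the gathered
byte count masked with the logical block size mask.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct bio_vec: only the two fields the
 * alignment check reads are modelled here.
 */
struct seg {
	unsigned int len;	/* models bv_len */
	unsigned int offset;	/* models bv_offset */
};

/* Pre-patch rule: every segment length must satisfy len_mask. */
static bool aligned_piecewise(const struct seg *v, size_t n, size_t size,
			      unsigned addr_mask, unsigned len_mask)
{
	for (size_t k = 0; k < n && size; k++) {
		size_t len = v[k].len < size ? v[k].len : size;

		if (len & len_mask)
			return false;
		if (v[k].offset & addr_mask)
			return false;
		size -= len;
	}
	return true;
}

/*
 * Rule proposed in the patch: only the total size is checked against
 * len_mask; offsets are still checked per segment against addr_mask.
 */
static bool aligned_aggregate(const struct seg *v, size_t n, size_t size,
			      unsigned addr_mask, unsigned len_mask)
{
	if (size & len_mask)
		return false;

	for (size_t k = 0; k < n && size; k++) {
		size_t len = v[k].len < size ? v[k].len : size;

		if (v[k].offset & addr_mask)
			return false;
		size -= len;
	}
	return true;
}

/*
 * Assumed model of the "trim" calculation: the bytes past the last
 * logical block boundary in what was gathered so far.
 */
static size_t trim_bytes(size_t gathered, unsigned len_mask)
{
	return gathered & len_mask;
}
```

As a worked case, take three segments of 3996, 4096 and 100 bytes at
offsets 100, 0 and 0, with len_mask = 511 and addr_mask = 3. The total
is 8192 bytes, so the aggregate check passes, while the piecewise check
fails on the first segment (3996 & 511 = 412). If only the first two
segments are gathered, trim_bytes(8092, 511) is 412, so the block
boundary falls inside the second segment. With block-multiple segments,
the total at any segment boundary is itself a block multiple.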

Thread overview: 33+ messages
2025-07-08 16:06 [RFC PATCH v2 0/8] NFSD: support DIO Mike Snitzer
2025-07-08 16:06 ` [RFC PATCH v2 1/8] NFSD: Relocate the fh_want_write() and fh_drop_write() helpers Mike Snitzer
2025-07-10 13:59 ` Jeff Layton
2025-07-08 16:06 ` [RFC PATCH v2 2/8] NFSD: Move the fh_getattr() helper Mike Snitzer
2025-07-10 13:59 ` Jeff Layton
2025-07-08 16:06 ` [RFC PATCH v2 3/8] NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support Mike Snitzer
2025-07-10 7:45 ` Christoph Hellwig
2025-07-14 17:46 ` Mike Snitzer
2025-07-08 16:06 ` [RFC PATCH v2 4/8] lib/iov_iter: remove piecewise bvec length checking in iov_iter_aligned_bvec Mike Snitzer
2025-07-10 7:24 ` Christoph Hellwig
2025-07-10 7:32 ` Mike Snitzer
2025-07-10 7:44 ` Christoph Hellwig
2025-07-10 13:52 ` Jeff Layton
2025-07-10 14:48 ` Keith Busch [this message]
2025-07-10 16:12 ` Mike Snitzer
2025-07-10 16:29 ` Keith Busch
2025-07-10 17:22 ` Mike Snitzer
2025-07-10 19:51 ` Keith Busch
2025-07-10 19:57 ` Keith Busch
2025-08-01 15:23 ` Keith Busch
2025-08-01 16:10 ` Mike Snitzer
2025-07-08 16:06 ` [RFC PATCH v2 5/8] NFSD: pass nfsd_file to nfsd_iter_read() Mike Snitzer
2025-07-08 16:06 ` [RFC PATCH v2 6/8] NFSD: add io_cache_read controls to debugfs interface Mike Snitzer
2025-07-10 7:47 ` Christoph Hellwig
2025-07-14 17:33 ` Mike Snitzer
2025-07-10 14:06 ` Jeff Layton
2025-07-10 22:46 ` Chuck Lever
2025-07-14 16:47 ` Mike Snitzer
2025-07-15 11:57 ` Jeff Layton
2025-07-08 16:06 ` [RFC PATCH v2 7/8] NFSD: add io_cache_write " Mike Snitzer
2025-07-08 16:06 ` [RFC PATCH v2 8/8] NFSD: issue READs using O_DIRECT even if IO is misaligned Mike Snitzer
2025-07-08 21:22 ` Mike Snitzer
2025-07-10 7:51 ` Christoph Hellwig