From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88395D42BBF for ; Tue, 12 Nov 2024 18:43:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0273D8D0005; Tue, 12 Nov 2024 13:43:06 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EF1238D0001; Tue, 12 Nov 2024 13:43:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CCF368D0005; Tue, 12 Nov 2024 13:43:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 537B08D0001 for ; Tue, 12 Nov 2024 13:43:05 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id EC0FA1A0531 for ; Tue, 12 Nov 2024 18:43:04 +0000 (UTC) X-FDA: 82778313492.06.92C7CFF Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf21.hostedemail.com (Postfix) with ESMTP id 591611C000D for ; Tue, 12 Nov 2024 18:41:43 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eMxeHhyj; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf21.hostedemail.com: domain of bfoster@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=bfoster@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731436895; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tPoALtQLssYylid2Eh8irr2ayHO3s2n1Pnm5KZqNhDA=; b=73XnUj7qHI5GE0nnEp7Edi15/Zzcnyy5VsjZj/Mt+8wntLjvw8R46TNnSFA86C65C8hX4H v+QPNymP0oD94KK+b+hYfFf57b6XtMD0+jVyccJXA2Mx4X3wn/WqjU9G9YcxvcvXqJ3EI9 HO/qlIUnSfEDGmuaVwGKOewftHUz1aQ= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eMxeHhyj; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf21.hostedemail.com: domain of bfoster@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=bfoster@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731436895; a=rsa-sha256; cv=none; b=hzOBSpudJFBSjfNBBPSX7HlEAyb1sJ9r5yI+Ru+gTugMGrqN/UhA9ktdbTX/ve1c0CAPKi QsnY+Z8g2z4JsA/pFGaJ/poAVSEa9eO//4Tyze3dy32o1ISoi8iLNii3Duoo5h59KTR1t8 fFnsHxutAVDk0DMpNahpAIJIoPOnReQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731436982; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=tPoALtQLssYylid2Eh8irr2ayHO3s2n1Pnm5KZqNhDA=; b=eMxeHhyjtJkStwUjEterl/PyJRO31xKYMJZyeH9VB5dp+9OHGEBY1w5e8mqT/qPdI7FQMf XqpaenBJ6kWDWbKpn/UclDo7Frpld1twdxA252W+X6VWGFB2jAT2QR/cPCSbWFOFv85FSE 7lmWNUKw8LIEvbZbIukfaFOyd6N1fZs= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-189-KBKMB-AKMtCdhLFGSI1cJg-1; Tue, 12 Nov 2024 13:42:58 -0500 X-MC-Unique: KBKMB-AKMtCdhLFGSI1cJg-1 X-Mimecast-MFC-AGG-ID: KBKMB-AKMtCdhLFGSI1cJg Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 68AFC1955EE8; Tue, 12 Nov 2024 18:42:56 +0000 (UTC) Received: from bfoster (unknown [10.22.80.120]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4402A1956086; Tue, 12 Nov 2024 18:42:54 +0000 (UTC) Date: Tue, 12 Nov 2024 13:44:27 -0500 From: Brian Foster To: Jens Axboe Cc: Christoph Hellwig , "Kirill A. Shutemov" , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org Subject: Re: [PATCH 08/15] mm/filemap: add read support for RWF_UNCACHED Message-ID: References: <221590fa-b230-426a-a8ec-7f18b74044b8@kernel.dk> <04fd04b3-c19e-4192-b386-0487ab090417@kernel.dk> <31db6462-83d1-48b6-99b9-da38c399c767@kernel.dk> <3da73668-a954-47b9-b66d-bb2e719f5590@kernel.dk> <7a4ef71f-905e-4f2a-b3d2-8fd939c5a865@kernel.dk> <3f378e51-87e7-499e-a9fb-4810ca760d2b@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3f378e51-87e7-499e-a9fb-4810ca760d2b@kernel.dk> X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 591611C000D X-Stat-Signature: uu5ym5buzh1mjtymzm6zteak7pa4dfwc X-HE-Tag: 1731436903-899185 X-HE-Meta: U2FsdGVkX188ettWb5DAGomuw0e94hayz2L7QYvnnzeVNi2dzSYHG5CyvfwrXpoHlVgjpfDPdvs7EOnk/EcNtgoCZW/O2d9I36Vzzp4Axo0sSLmejVT4Hx8kVtlyoXfa/POM4xKmioQpkih+xr6WFb5f1wcpyDjRZnQMJlEQNDHCNGtGKF2H9kXZnTXwSwat9Hbht3W4cG37MfXb2FuOnOfOTzVs7yaD+9w00S+6ad7bNMKg9EdwJ4dbEY92Ad1U1vaLGyi22C1NwWN+0Vgz1LEVh1B0GkIMM9OXZ3MQLR+PhIXsI+8Hv95WTe2aNeG+snxaemNFbTRtGIk/PpGWBZ/V7c3fRpAyDnzSNNUdQH6IyZYH4p6FhJ5NCXte+CHaCfNYcDJs3aMptTRHCpBKtt0YeYW00r87Zf9yuet0TR1oE9ezUoPMqFTdb5EenHWRwKwloFWujDEw4CRe8n78MSyCByWVY54RGinaicfIn1BABTGmvO4EUpPtCLaS3YxDpuQeqZOHO8BIeChU02mP6v8tewoTdiw03OI4yoVZuseoC6JHkr44qVBrTG7WPLvrpyCKRPn2I/kmIylnVCeIcG9JNSsIMTEyvNVWjQZFnrfDrLQFsqgOzzhhpdec1uJK9zKWJjME5wm6ro9gVeQnannNAaa+WSrCPJsPkCuRQyeQP6c43e9g7TMm3HoEsNrPapQBQwKCvWf57fdF7mIzXG4brgRj4+ey8vate9wlyxkarozk7WgOiIy4Frs2kHTxDtMTsog+tH3nHEs/dQ0Sj+BF5ykuNOyRUTpaF8Ox7t5yI9QD4IN8olgWCcnU0ry8hCGoyIuB6UOYhohWQJ3RCAYi1G300SvUb5q3oXAM8Wu/oZp67tS/GrIlEra27AJC13dRzV6tXUvR5yb59mdoCKJ4M1YfprFJbzwi2284ug60IimALHx3+e4wJNyLY2L1+Oihy3TGN8IvNvrInCR 9mEB69Zn ceYhSTozf0WkksUCypq0VEpwpx91ZZBjidGwtdJrvkB6CHOgqARFyYY+VT/L+ElgCMBDJ5OSdVohBBm5s05vsoaQkovdFYbV+r19LZIIuo/T39O+brOlZ4IFff7y2eCcvR/Mwd7wPKkOojH/q7/Iy+R0xgDE62Iw3SfYF08h0OYkdHoN1AZ8GbJoM/HeMtYzh6eVC84EcaG9SZ32qi2AdzYeRkOQKJ+zlTPyPiF6dGbz7gDdVuhBCIu197tQtZd0QTod12IZGgvhV7UvmNkF4XoDJbWHcgoL0fvdEHPM2Jl4YVVMsB2stkLdLM9+3KtljYXzNg7LtWhDo3ndWRIm89FHs1uusbWbaZB+xv1UcgUBIrmiYAK7gUjlNnnDHE8Ra8C0NTCfdxvE8TIAi4sQC5WH2+g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Nov 12, 2024 at 10:19:02AM -0700, Jens Axboe wrote: > On 11/12/24 10:06 AM, Jens Axboe wrote: > > On 11/12/24 9:39 AM, Brian Foster wrote: > >> On Tue, Nov 12, 2024 at 08:14:28AM -0700, Jens Axboe wrote: > >>> On 11/11/24 10:13 PM, Christoph Hellwig wrote: > >>>> On Mon, Nov 11, 2024 at 04:42:25PM -0700, Jens Axboe wrote: > >>>>> Here's the slightly cleaned up version, this is the one I ran testing > >>>>> with. > >>>> > >>>> Looks reasonable to me, but you probably get better reviews on the > >>>> fstests lists. > >>> > >>> I'll send it out once this patchset is a bit closer to integration, > >>> there's the usual chicken and egg situation with it. For now, it's quite > >>> handy for my testing, found a few issues with this version. So thanks > >>> for the suggestion, sure beats writing more of your own test cases :-) > >>> > >> > >> fsx support is probably a good idea as well. It's similar in idea to > >> fsstress, but bashes the same file with mixed operations and includes > >> data integrity validation checks as well. It's pretty useful for > >> uncovering subtle corner case issues or bad interactions.. > > > > Indeed, I did that too. Re-running xfstests right now with that too. > > Here's what I'm running right now, fwiw. It adds RWF_UNCACHED support > for both the sync read/write and io_uring paths. > Nice, thanks. Looks reasonable to me at first glance. A few randomish comments inlined below. BTW, I should have also mentioned that fsx is also useful for longer soak testing. I.e., fstests will provide a decent amount of coverage as is via the various preexisting tests, but I'll occasionally run fsx directly and let it run overnight or something to get the op count at least up in the 100 millions or so to have a little more confidence there isn't some rare/subtle bug lurking. That might be helpful with something like this. JFYI. > > diff --git a/ltp/fsx.c b/ltp/fsx.c > index 41933354..104910ff 100644 > --- a/ltp/fsx.c > +++ b/ltp/fsx.c > @@ -43,6 +43,10 @@ > # define MAP_FILE 0 > #endif > > +#ifndef RWF_UNCACHED > +#define RWF_UNCACHED 0x80 > +#endif > + > #define NUMPRINTCOLUMNS 32 /* # columns of data to print on each line */ > > /* Operation flags (bitmask) */ > @@ -101,7 +105,9 @@ int logcount = 0; /* total ops */ > enum { > /* common operations */ > OP_READ = 0, > + OP_READ_UNCACHED, > OP_WRITE, > + OP_WRITE_UNCACHED, > OP_MAPREAD, > OP_MAPWRITE, > OP_MAX_LITE, > @@ -190,15 +196,16 @@ int o_direct; /* -Z */ > int aio = 0; > int uring = 0; > int mark_nr = 0; > +int rwf_uncached = 1; > > int page_size; > int page_mask; > int mmap_mask; > -int fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset); > +int fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags); > #define READ 0 > #define WRITE 1 > -#define fsxread(a,b,c,d) fsx_rw(READ, a,b,c,d) > -#define fsxwrite(a,b,c,d) fsx_rw(WRITE, a,b,c,d) > +#define fsxread(a,b,c,d,f) fsx_rw(READ, a,b,c,d,f) > +#define fsxwrite(a,b,c,d,f) fsx_rw(WRITE, a,b,c,d,f) > My pattern recognition brain wants to see an 'e' here. ;) > struct timespec deadline; > > @@ -266,7 +273,9 @@ prterr(const char *prefix) > > static const char *op_names[] = { > [OP_READ] = "read", > + [OP_READ_UNCACHED] = "read_uncached", > [OP_WRITE] = "write", > + [OP_WRITE_UNCACHED] = "write_uncached", > [OP_MAPREAD] = "mapread", > [OP_MAPWRITE] = "mapwrite", > [OP_TRUNCATE] = "truncate", > @@ -393,12 +402,14 @@ logdump(void) > prt("\t******WWWW"); > break; > case OP_READ: > + case OP_READ_UNCACHED: > prt("READ 0x%x thru 0x%x\t(0x%x bytes)", > lp->args[0], lp->args[0] + lp->args[1] - 1, > lp->args[1]); > if (overlap) > prt("\t***RRRR***"); > break; > + case OP_WRITE_UNCACHED: > case OP_WRITE: > prt("WRITE 0x%x thru 0x%x\t(0x%x bytes)", > lp->args[0], lp->args[0] + lp->args[1] - 1, > @@ -784,9 +795,8 @@ doflush(unsigned offset, unsigned size) > } > > void > -doread(unsigned offset, unsigned size) > +__doread(unsigned offset, unsigned size, int flags) > { > - off_t ret; > unsigned iret; > > offset -= offset % readbdy; > @@ -818,23 +828,39 @@ doread(unsigned offset, unsigned size) > (monitorend == -1 || offset <= monitorend)))))) > prt("%lld read\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, > offset, offset + size - 1, size); > - ret = lseek(fd, (off_t)offset, SEEK_SET); > - if (ret == (off_t)-1) { > - prterr("doread: lseek"); > - report_failure(140); > - } > - iret = fsxread(fd, temp_buf, size, offset); > + iret = fsxread(fd, temp_buf, size, offset, flags); > if (iret != size) { > - if (iret == -1) > - prterr("doread: read"); > - else > + if (iret == -1) { > + if (errno == EOPNOTSUPP && flags & RWF_UNCACHED) { > + rwf_uncached = 1; I assume you meant rwf_uncached = 0 here? If so, check out test_fallocate() and friends to see how various operations are tested for support before the test starts. Following that might clean things up a bit. Also it's useful to have a CLI option to enable/disable individual features. That tends to be helpful to narrow things down when it does happen to explode and you want to narrow down the cause. Brian > + return; > + } > + prterr("dowrite: read"); > + } else { > prt("short read: 0x%x bytes instead of 0x%x\n", > iret, size); > + } > report_failure(141); > } > check_buffers(temp_buf, offset, size); > } > +void > +doread(unsigned offset, unsigned size) > +{ > + __doread(offset, size, 0); > +} > > +void > +doread_uncached(unsigned offset, unsigned size) > +{ > + if (rwf_uncached) { > + __doread(offset, size, RWF_UNCACHED); > + if (rwf_uncached) > + return; > + } > + __doread(offset, size, 0); > +} > + > void > check_eofpage(char *s, unsigned offset, char *p, int size) > { > @@ -870,7 +896,6 @@ check_contents(void) > unsigned map_offset; > unsigned map_size; > char *p; > - off_t ret; > unsigned iret; > > if (!check_buf) { > @@ -885,13 +910,7 @@ check_contents(void) > if (size == 0) > return; > > - ret = lseek(fd, (off_t)offset, SEEK_SET); > - if (ret == (off_t)-1) { > - prterr("doread: lseek"); > - report_failure(140); > - } > - > - iret = fsxread(fd, check_buf, size, offset); > + iret = fsxread(fd, check_buf, size, offset, 0); > if (iret != size) { > if (iret == -1) > prterr("check_contents: read"); > @@ -1064,9 +1083,8 @@ update_file_size(unsigned offset, unsigned size) > } > > void > -dowrite(unsigned offset, unsigned size) > +__dowrite(unsigned offset, unsigned size, int flags) > { > - off_t ret; > unsigned iret; > > offset -= offset % writebdy; > @@ -1101,18 +1119,18 @@ dowrite(unsigned offset, unsigned size) > (monitorend == -1 || offset <= monitorend)))))) > prt("%lld write\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, > offset, offset + size - 1, size); > - ret = lseek(fd, (off_t)offset, SEEK_SET); > - if (ret == (off_t)-1) { > - prterr("dowrite: lseek"); > - report_failure(150); > - } > - iret = fsxwrite(fd, good_buf + offset, size, offset); > + iret = fsxwrite(fd, good_buf + offset, size, offset, flags); > if (iret != size) { > - if (iret == -1) > + if (iret == -1) { > + if (errno == EOPNOTSUPP && flags & RWF_UNCACHED) { > + rwf_uncached = 0; > + return; > + } > prterr("dowrite: write"); > - else > + } else { > prt("short write: 0x%x bytes instead of 0x%x\n", > iret, size); > + } > report_failure(151); > } > if (do_fsync) { > @@ -1126,6 +1144,22 @@ dowrite(unsigned offset, unsigned size) > } > } > > +void > +dowrite(unsigned offset, unsigned size) > +{ > + __dowrite(offset, size, 0); > +} > + > +void > +dowrite_uncached(unsigned offset, unsigned size) > +{ > + if (rwf_uncached) { > + __dowrite(offset, size, RWF_UNCACHED); > + if (rwf_uncached) > + return; > + } > + __dowrite(offset, size, 0); > +} > > void > domapwrite(unsigned offset, unsigned size) > @@ -2340,11 +2374,21 @@ have_op: > doread(offset, size); > break; > > + case OP_READ_UNCACHED: > + TRIM_OFF_LEN(offset, size, file_size); > + doread_uncached(offset, size); > + break; > + > case OP_WRITE: > TRIM_OFF_LEN(offset, size, maxfilelen); > dowrite(offset, size); > break; > > + case OP_WRITE_UNCACHED: > + TRIM_OFF_LEN(offset, size, maxfilelen); > + dowrite_uncached(offset, size); > + break; > + > case OP_MAPREAD: > TRIM_OFF_LEN(offset, size, file_size); > domapread(offset, size); > @@ -2702,7 +2746,7 @@ uring_setup() > } > > int > -uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > +uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) > { > struct io_uring_sqe *sqe; > struct io_uring_cqe *cqe; > @@ -2733,6 +2777,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > } else { > io_uring_prep_writev(sqe, fd, &iovec, 1, o); > } > + sqe->rw_flags = flags; > > ret = io_uring_submit_and_wait(&ring, 1); > if (ret != 1) { > @@ -2781,7 +2826,7 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > } > #else > int > -uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > +uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) > { > fprintf(stderr, "io_rw: need IO_URING support!\n"); > exit(111); > @@ -2789,19 +2834,21 @@ uring_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > #endif > > int > -fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset) > +fsx_rw(int rw, int fd, char *buf, unsigned len, unsigned offset, int flags) > { > int ret; > > if (aio) { > ret = aio_rw(rw, fd, buf, len, offset); > } else if (uring) { > - ret = uring_rw(rw, fd, buf, len, offset); > + ret = uring_rw(rw, fd, buf, len, offset, flags); > } else { > + struct iovec iov = { .iov_base = buf, .iov_len = len }; > + > if (rw == READ) > - ret = read(fd, buf, len); > + ret = preadv2(fd, &iov, 1, offset, flags); > else > - ret = write(fd, buf, len); > + ret = pwritev2(fd, &iov, 1, offset, flags); > } > return ret; > } > > > -- > Jens Axboe >