From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF5DFC636D6 for ; Thu, 9 Feb 2023 16:15:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 02E236B0071; Thu, 9 Feb 2023 11:15:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id F20A96B0072; Thu, 9 Feb 2023 11:15:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC12A6B0074; Thu, 9 Feb 2023 11:15:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id C7E046B0071 for ; Thu, 9 Feb 2023 11:15:49 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 82E8980A42 for ; Thu, 9 Feb 2023 16:15:49 +0000 (UTC) X-FDA: 80448254418.12.EFF2318 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf14.hostedemail.com (Postfix) with ESMTP id D32F610002C for ; Thu, 9 Feb 2023 16:15:45 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=at0NNN30; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf14.hostedemail.com: domain of dhowells@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=dhowells@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675959346; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=bNUizXtmcdnkOYN/PK7pVzEd0gUwRHUo/YtJnnQtf7s=; b=8kCf9xdCvhmGUdPlkJMbWHBfdDy3GOMVIUr30yqCDndswK9RjKUXxgeUGuE90bRBSN1pIg RWzamSAi2NCoGLQb6wlH4oURMwOhmYRjnr4TCrhfF54U8vGCvG4zXOaQiYFnb+tWXApNhx GbfRW782PFwCquy0GYwhcXJSNytlq1s= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=at0NNN30; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf14.hostedemail.com: domain of dhowells@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=dhowells@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675959346; a=rsa-sha256; cv=none; b=67YZupiJ8glgUJdZ03XAj6kaBp5k+kPDW+NA6e5YKfdJ7MTn3oohtr8tAD++ovTltgS1Fa Ro+XG+DyYZ0PkHeJBpgiDhU5lv+K5p0y420Vh6kyTUx+jTFQHIAgs0S1KQwAFe4ULjyFWM 0p3GosniCqmIS1ZVEaeCAiKx1g8SNuw= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675959345; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bNUizXtmcdnkOYN/PK7pVzEd0gUwRHUo/YtJnnQtf7s=; b=at0NNN30Csvjn8Qp+HwHyRccsyoZxVM84RYQzABX3pdf5dt4miJmkhQplzIFP4TOErsfHh L5vF13pN3pjxpzgtV1rOLJjRLVeoJFrJXheh4I0PI4VLTr3wRen8vZ4y1pWCVpZSm0uW6n 4KABdVDnsbOsf2JklNIff+bJZGQnQHs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-449-yy_9a8k6Mr-6k9U0h0Tuow-1; Thu, 09 Feb 2023 11:15:41 -0500 X-MC-Unique: yy_9a8k6Mr-6k9U0h0Tuow-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DC46485C70E; Thu, 9 Feb 2023 16:15:40 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8B0041121314; Thu, 9 Feb 2023 16:15:38 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells In-Reply-To: References: <20230209102954.528942-1-dhowells@redhat.com> <20230209102954.528942-2-dhowells@redhat.com> To: Matthew Wilcox Cc: dhowells@redhat.com, Jens Axboe , Al Viro , Christoph Hellwig , Jan Kara , Jeff Layton , David Hildenbrand , Jason Gunthorpe , Logan Gunthorpe , Hillf Danton , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com, Christoph Hellwig , John Hubbard Subject: [PATCH v14 01/12] splice: Fix O_DIRECT file read splice to avoid reversion of ITER_PIPE MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <909201.1675959337.1@warthog.procyon.org.uk> Content-Transfer-Encoding: quoted-printable Date: Thu, 09 Feb 2023 16:15:37 +0000 Message-ID: <909202.1675959337@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: D32F610002C X-Stat-Signature: uwcu3bj9rrmus6yukmppa1g1dgafdj83 X-HE-Tag: 1675959345-238764 X-HE-Meta: U2FsdGVkX18oY5aCe4Ter5k1PP3sf6t8NBlXhi40x3BjlPFc+bS5pEFRtGUR7BsiHUP1RWh8Oyw8q4gNm8e7JAfnEr9Tn5OhYsdvrRexgnXzB5jhdbkxqDzvjlvzeaDiT8KyfNPz5zntwU3/0a46ssd3fKIxxhrvJZpiqLkcRgUE88hM649bcteN7mObZlkyIlC78Bej3um3bD8IVSiwOb2WBg/UXZJPdrMgNAgeXaJYIqhEdpi2I129U/iLra4exE7t6rtcJqQ/nPBrpgyj84h8PBgFFqfw/dvZFR9i5s9EYNniQjEjjVq4o4zbYWa9kP1KQyhVcSAzsryK/r6E4+OzguAe75qZYuvMFK8xz4E9nQCCa2v1eZq+znZKKedsTCa30jQ48vi2ZZnxIg59lupaLIqTBcDZP9oLNihRBXmd+q2cyuewwdWieocR04HIREd8/B1V8vBA5lQ03oN8gM0djFImjU52IJhBiocZDZCdUyghlsIPF7JhR+cZfhdCLUuPmYlYlLEY5j4QVSEkIFm3Uda4OpNCFDANZ8az7jIIQb2ZlxTBKGWEAOXTdMN4Q+IGXvLPTA/RMBK5FpAg8dZt3eno6/h5QNvZPVvP/I0Rhm4t8VLgj8z119cwpewAFXYbU3oXjy2GDb8eqUdr6+22uY7u5FOUlezy6OGnrDTZFKnt7osykF8ljY7TN0nfKChexWhR0igml7qK0NAgoY9H1YBTpiQb5aOStqevc0rMfAYf5OLuRGBMexU4Ea2+46z9TN6eUD0kB9WWGZOStRt1pSBLKldCiES8K2TW15RKGwHhqilxaeRbM4VAknyudtx4nk+3g3I0+yFumcvxOit/+yMmM5CsXNHwr/kSWOXA+OSibbevZZLLqIa9jzbQy9YKR2M/FMgZ+doJQbxtHoJQoa5RdRAXof4odoikV+el269FwjkmUutXnsZMgSoDGj8tt7O3CEg4SlKGuNf YDhYGBdN hkzOIVez78M1a5b5ZW+WDdc2ifmW9qCoQOc/7rBBEGbO1/kClW4FPxHvZPVVY96InRkpLB0neaLCmMEyAWnqzPuv1/Fi+t5V8tHcZ844xK/ju1MlZikECg4IJI4pnqC0dqP5tUvijx4/5rJYTuJRYDav9Ank28HUi5bm533ayrnUcY0cM2a/5r2e5BobV8DYZjKq18zL+A5lRSS29b9tcaUPiCWAeVlztUo4cv7KumE6Cc4qBLbwBa8/8LUu0js26ajgKTBuUtpDgu9YNhxhH+5irCQj95us8CQ5j6gVel+ve3FU3nBpxmog8JbefCLClqqaEKDCJRYTMI6W5cphI7WgX5p0JLP1yz3DXqOzGfEy0Xx3+5CAbBkgt47fdBYTtabxJrY+Te4Bjy9rsAHUTlCZmx5q5NgrkWYOAEqYEx8D4BPJ9+W2oRZwdfbOJuFNKaIN1qrMAKHkx75Bf/kkhzDzLsgE1uXQVCiwIGJQFokzP9idtzy8MEMa6y4J+540+A2yc+y+ZIkZZbdKnaE+qSE0epkyMiCfX9RB4ZcOZCSWxBTuqtmboqoE4p/sszXbgacyn2puisBRCQ1+q0HFnRX4Cg/RDwhwfC+aS/pWLXooYRJEZzkXISchexRab57r1ub2+xDUYIp2h0m1cz+568IJQcPthcpAbrZbE+F75e4dl7ZKqj+rYPGJo+6tea5YSWL4e3uwJMZQa17l+3x3Ltjkfsva4riRL7nUJrtU89Xw7haE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Matthew Wilcox wrote: > Please don't use alloc_pages_bulk_list(). ... > If you have that array, you can then use release_pages() ... Done. See attached replacement patch. David --- splice: Fix O_DIRECT file read splice to avoid reversion of ITER_PIPE With the upcoming iov_iter_extract_pages() function, pages extracted from = a non-user-backed iterator such as ITER_PIPE aren't pinned. __iomap_dio_rw(), however, calls iov_iter_revert() to shorten the iterator to just the bufferage it is going to use - which has the side-effect of freeing the excess pipe buffers, even though they're attached to a bio and may get written to by DMA (thanks to Hillf Danton for spotting this[1]). This then causes memory corruption that is particularly noticable when the syzbot test[2] is run. The test boils down to: out =3D creat(argv[1], 0666); ftruncate(out, 0x800); lseek(out, 0x200, SEEK_SET); in =3D open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW); sendfile(out, in, NULL, 0x1dd00); run repeatedly in parallel. What I think is happening is that ftruncate() occasionally shortens the DIO read that's about to be made by sendfile's splice core by reducing i_size. Fix this by splitting the handling of a splice from an O_DIRECT file fd of= f from that of non-DIO and in this case, replacing the use of an ITER_PIPE iterator with an ITER_BVEC iterator for which reversion won't free the buffers. The DIO-specific code bulk allocates all the buffers it thinks i= t is going to use in advance, does the read synchronously and only then trim= s the buffer down. The pages we did use get pushed into the pipe. This should be more efficient for DIO read by virtue of doing a bulk page allocation, but slightly less efficient by ignoring any partial page in th= e pipe. Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_ite= r_extract_pages") Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com Signed-off-by: David Howells cc: Jens Axboe cc: Christoph Hellwig cc: Al Viro cc: David Hildenbrand cc: John Hubbard cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1= ] Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [= 2] --- Notes: ver #14) - Use alloc_pages_bulk_array() rather than alloc_pages_bulk_list(). - Use release_pages() rather than a loop calling __free_page(). = ver #13) - Don't completely replace generic_file_splice_read(), but rather onl= y use this if we're doing a splicing from an O_DIRECT file fd. fs/splice.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++= ++++++ 1 file changed, 98 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index 5969b7a1d353..91244270b36e 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -282,6 +282,101 @@ void splice_shrink_spd(struct splice_pipe_desc *spd) kfree(spd->partial); } = +/* + * Splice data from an O_DIRECT file into pages and then add them to the = output + * pipe. + */ +static ssize_t generic_file_direct_splice_read(struct file *in, loff_t *p= pos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags) +{ + struct iov_iter to; + struct bio_vec *bv; + struct kiocb kiocb; + struct page **pages; + unsigned int head; + ssize_t ret; + size_t used, npages, chunk, remain, reclaim; + int i; + + /* Work out how much data we can actually add into the pipe */ + used =3D pipe_occupancy(pipe->head, pipe->tail); + npages =3D max_t(ssize_t, pipe->max_usage - used, 0); + len =3D min_t(size_t, len, npages * PAGE_SIZE); + npages =3D DIV_ROUND_UP(len, PAGE_SIZE); + + bv =3D kzalloc(array_size(npages, sizeof(bv[0])) + + array_size(npages, sizeof(struct page *)), GFP_KERNEL); + if (!bv) + return -ENOMEM; + + pages =3D (void *)(bv + npages); + npages =3D alloc_pages_bulk_array(GFP_USER, npages, pages); + if (!npages) { + kfree(bv); + return -ENOMEM; + } + + remain =3D len =3D min_t(size_t, len, npages * PAGE_SIZE); + + for (i =3D 0; i < npages; i++) { + chunk =3D min_t(size_t, PAGE_SIZE, remain); + bv[i].bv_page =3D pages[i]; + bv[i].bv_offset =3D 0; + bv[i].bv_len =3D chunk; + remain -=3D chunk; + } + + /* Do the I/O */ + iov_iter_bvec(&to, ITER_DEST, bv, npages, len); + init_sync_kiocb(&kiocb, in); + kiocb.ki_pos =3D *ppos; + ret =3D call_read_iter(in, &kiocb, &to); + + reclaim =3D npages * PAGE_SIZE; + remain =3D 0; + if (ret > 0) { + reclaim -=3D ret; + remain =3D ret; + *ppos =3D kiocb.ki_pos; + file_accessed(in); + } else if (ret < 0) { + /* + * callers of ->splice_read() expect -EAGAIN on + * "can't put anything in there", rather than -EFAULT. + */ + if (ret =3D=3D -EFAULT) + ret =3D -EAGAIN; + } + + /* Free any pages that didn't get touched at all. */ + reclaim /=3D PAGE_SIZE; + if (reclaim) { + npages -=3D reclaim; + release_pages(pages + npages, reclaim); + } + + /* Push the remaining pages into the pipe. */ + head =3D pipe->head; + for (i =3D 0; i < npages; i++) { + struct pipe_buffer *buf =3D &pipe->bufs[head & (pipe->ring_size - 1)]; + + chunk =3D min_t(size_t, remain, PAGE_SIZE); + *buf =3D (struct pipe_buffer) { + .ops =3D &default_pipe_buf_ops, + .page =3D bv[i].bv_page, + .offset =3D 0, + .len =3D chunk, + }; + head++; + remain -=3D chunk; + } + pipe->head =3D head; + + kfree(bv); + return ret; +} + /** * generic_file_splice_read - splice data from file to a pipe * @in: file to splice from @@ -303,6 +398,9 @@ ssize_t generic_file_splice_read(struct file *in, loff= _t *ppos, struct kiocb kiocb; int ret; = + if (in->f_flags & O_DIRECT) + return generic_file_direct_splice_read(in, ppos, pipe, len, flags); + iov_iter_pipe(&to, ITER_DEST, pipe, len); init_sync_kiocb(&kiocb, in); kiocb.ki_pos =3D *ppos;