From: David Howells
To: Jens Axboe, Christoph Hellwig
cc: dhowells@redhat.com, David Hildenbrand, John Hubbard,
    linux-mm@kvack.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Need help tracking down a bug in the bio-FOLL_PIN patches
Date: Mon, 06 Feb 2023 23:02:52 +0000
Message-ID: <2811508.1675724572@warthog.procyon.org.uk>

Hi Jens, Christoph,

I need some help tracking down a bug in the patches that make bios use page
pinning or no pinning via iov_iter_extract_pages(). The bug causes seemingly
random memory corruption once the "block: Convert bio_iov_iter_get_pages to
use iov_iter_extract_pages" patch is applied.

The bug was detected by a syzbot special:

    https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/

The basic test body creates/opens a file, truncates it, opens it again with
O_DIRECT and then uses sendfile() to copy from the file to itself, causing the
file to extend as it goes. I've added a reduced testcase below. Note that the
problem only seems to occur if several instances of the test are run in
parallel. After a few iterations, random memory corruption starts showing up
and I see things like:

    syz-direct-send[6095]: segfault at 0 ip 0000000000000000 sp 00007ffc81488b28 error 14 in syz-direct-sendfile[400000+1000] likely on CPU 0 (core 0, socket 0)
    Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    BUG: Bad rss-counter state mm:00000000d5d20a39 type:MM_FILEPAGES val:2
    BUG: Bad rss-counter state mm:00000000d5d20a39 type:MM_ANONPAGES val:2
    BUG: non-zero pgtables_bytes on freeing mm: 8192

The bug goes away if the file is not truncated, O_DIRECT is not used or two
different files are used.

I've investigated the splice and iov_iter code and looked at what sendfile()
is doing in this case:

 (1) sendfile creates buffer pages and adds them into a pipe, does (in this
     case) a DIO read into those pages, then calls the fs write_iter to write
     the data to the file.

 (2) iov_iter_extract_pages() does not get refs/pins on the pages extracted
     from an ITER_PIPE iterator - but it shouldn't need to as the pipe holds
     the refs. These pages are passed to DIO read - this op is synchronous,
     so any bios associated with it should be complete.

 (3) I enabled the page_ref tracepoints and added a page flag to limit them
     to pages allocated by append_pipe(). This shows the buffer pipe pages
     being added and I made it dump the list of them in __bio_release_pages()
     (which I made non-optional in bio_release_pages()).

 (4) I added some extra page_ref_set tracepoints with weird "val" parameters
     to add markers into the log.

 (5) I added a tracepoint to trace the lifetime of a bio struct and a flag to
     turn on the tracing, set when the page flag added in (3) is seen. Most
     of the time I can see the bio being destroyed in the correct order with
     regard to the splice code, though occasionally there's a bit missing.

 (6) Substituting a fixed preallocated page for the page coming out of the
     pipe in iter_file_splice_write() doesn't get rid of the bug:

    -		array[n].bv_page = buf->page;
    +		array[n].bv_page = splice_tmp;

 (7) Getting an extra ref on the buffer pipe page and deliberately leaking it
     gets rid of the problem.

 (8) Substituting a fixed preallocated page for the page sent to the DIO read
     in iov_iter_extract_pipe_pages() gets rid of the problem. The pages
     going through the pipe seem to be passed to write_iter with no issues.

 (9) I've tried instrumenting kmap() and co. to catch debug-marked pages
     being accessed after they've been released, but didn't see anything.
     This might not catch it if DMA is doing the corrupting.

(10) On the notion that DMA might do the corrupting, I've tried adding a
     permanent ref on the pages, adding them to a list and scanning them
     occasionally - but that doesn't catch anything.

(11) KASAN doesn't spot anything interesting - which might also suggest
     DMA-based corruption. But since we're dealing with the contents of
     pages, not the page structs themselves (I think), I'm not sure KASAN
     would spot anything.

I'm wondering if the apparent interaction with sendfile/splice is actually a
red herring and whether it is the page turnover this induces that is having
an effect.

One thing I don't see is how commenting out ftruncate() should cause the
problem to go away if it's something to do with the splice buffer pipe -
though I guess ftruncate() would release a bunch of pages.

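One way to read steps (2) and (7) together is as a plain reference-counting
hazard: if the pipe buffer holds the only ref and drops it while a read is
still outstanding, the page is freed under the I/O, and a leaked extra ref
would hide that. The user-space toy model below only illustrates that
reasoning. It is a sketch under stated assumptions, not kernel code: struct
toy_page and every toy_*() function are hypothetical names, and nothing here
shows that this is what actually happens in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; this is NOT the kernel's struct page. */
struct toy_page {
	int refcount;		/* references currently held */
	bool io_pending;	/* a simulated DIO read still targets the page */
	bool freed;		/* page has gone back to the allocator */
};

/* Take one reference. */
void toy_get_page(struct toy_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

/*
 * Drop one reference, freeing the page when the count reaches zero.
 * Returns true if the page was freed while I/O was still pending.
 */
bool toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount > 0)
		return false;
	p->freed = true;
	return p->io_pending;
}

/*
 * Assumed scenario: the pipe buffer holds the only reference, the simulated
 * bio holds none, and the pipe buffer is released before the I/O completes.
 * With leak_extra_ref set, an extra reference is taken and never dropped,
 * mirroring the experiment in step (7).
 */
bool toy_freed_under_io(bool leak_extra_ref)
{
	struct toy_page page = { .refcount = 1, .io_pending = true };

	if (leak_extra_ref)
		toy_get_page(&page);
	return toy_put_page(&page);	/* pipe buffer released early */
}
```

In this model toy_freed_under_io(false) reports the hazard and
toy_freed_under_io(true) does not, which is at least consistent with (7).
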
Here's an excerpt from a trace of something I'd expect to see:

    page_ref_set: pfn=0x10e38c flags=debug_mark count=1 mapcount=0 mapping=0 mt=0 val=777
    page_ref_set: pfn=0x10e38c flags=debug_mark count=1 mapcount=0 mapping=0 mt=0 val=666
    bio: bio=00038d84 ADD-PG I=10e38c
    bio: bio=00038d84 END-IO I=0
    page_ref_set: pfn=0x10e38c flags=debug_mark count=1 mapcount=0 mapping=0 mt=0 val=623
    bio: bio=00038d84 UNINIT I=0
    bio_endio: bio=00038d84 iomap_dio_bio_end_io+0x0/0xec
    bio: bio=00038d84 REL-PG I=0
    page_ref_set: pfn=0x10e38c flags=debug_mark count=1 mapcount=0 mapping=0 mt=0 val=980
    bio: bio=00038d84 FREE I=0
    bio: bio=00038d84 UNINIT I=0
    page_ref_set: pfn=0x10e38c flags=debug_mark count=1 mapcount=0 mapping=0 mt=0 val=888
    page_ref_mod_and_test: pfn=0x10e38c flags=debug_mark count=0 mapcount=0 mapping=0 mt=0 val=-1 ret=1

The weird val=N codes on page_ref_set lines are:

    777 - The page iov_iter_extract_pipe_pages() got from append_pipe()
    666 - __bio_add_page() adding a page
    623 - bio_endio() logging a page
    98n - __bio_release_pages() logging the nth page
    888 - iter_file_splice_write() adding page to array[]

But occasionally I'm seeing something like:

    page_ref_set: pfn=0x1102df flags=debug_mark count=1 mapcount=0 mapping=0000000000000000 mt=0 val=777
    page_ref_set: pfn=0x1102df flags=debug_mark count=1 mapcount=0 mapping=0000000000000000 mt=0 val=666
    bio: bio=0000e514 ADD-PG I=1102df
    page_ref_set: pfn=0x1102df flags=debug_mark count=1 mapcount=0 mapping=0000000000000000 mt=0 val=888
    page_ref_mod_and_test: pfn=0x1102df flags=debug_mark count=0 mapcount=0 mapping=0000000000000000 mt=0 val=-1 ret=1

though I'm not sure why. Could it be an attempt to read beyond the EOF?
I don't see the bio being torn down, but the page is passed to
iter_file_splice_write() and released, despite (for all I know) still having
outstanding I/O pending. Another possibility is that the bio flag got
cleared.

David
---
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/wait.h>

#define file_size	0x800
#define send_size	0x1dd00
#define repeat_count	1000

static char buffer[send_size];

int main(int argc, char *argv[])
{
	int in, out, i, wt;

	if (argc != 2 || !argv[1][0]) {
		fprintf(stderr, "Usage: %s <file>\n", argv[0]);
		exit(2);
	}

	for (i = 0; i < repeat_count; i++) {
		switch (fork()) {
		case -1:
			perror("fork");
			exit(1);
		case 0:
			/* Child: create/truncate the file, then size it. */
			out = creat(argv[1], 0666);
			if (out < 0) {
				perror(argv[1]);
				exit(1);
			}

			if (ftruncate(out, file_size) < 0) {
				perror("ftruncate");
				exit(1);
			}

			if (lseek(out, 0x200, SEEK_SET) < 0) {
				perror("lseek");
				exit(1);
			}

			/* Reopen the same file for O_DIRECT reads. */
			in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
			if (in < 0) {
				perror("open");
				exit(1);
			}

			/* Copy the file onto itself, extending it as we go. */
			if (sendfile(out, in, NULL, send_size) < 0) {
				perror("sendfile");
				exit(1);
			}
			exit(0);
		default:
			/* Parent: wait for this iteration's child. */
			if (wait(&wt) < 0) {
				perror("wait");
				exit(1);
			}
			break;
		}
	}

	exit(0);
}
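
The testcase above runs its iterations one after another, while the problem
only seems to occur when several instances run in parallel. A small launcher
along the following lines could start them together. This is only a sketch
and not part of the original testcase: run_parallel() is a hypothetical
helper, and the number of instances and the command to run are left to the
caller. The report does not say whether the instances should share a file.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start n copies of the command in argv (NULL-terminated) at the same time
 * and wait for all of them.  Returns the number of copies that did not exit
 * with status 0, or -1 if a fork() failed.
 */
int run_parallel(int n, char *argv[])
{
	int started = 0, failed = 0, wt, i;

	assert(n > 0 && argv && argv[0]);

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			break;
		}
		if (pid == 0) {
			execvp(argv[0], argv);
			perror(argv[0]);	/* only reached if exec failed */
			_exit(127);
		}
		started++;
	}

	/* Reap every child that was started. */
	for (i = 0; i < started; i++) {
		if (wait(&wt) < 0) {
			perror("wait");
			failed++;
		} else if (!WIFEXITED(wt) || WEXITSTATUS(wt) != 0) {
			failed++;
		}
	}

	return started == n ? failed : -1;
}
```

A caller would pass the built testcase and its file argument, for example
run_parallel(8, (char *[]){ "./syz-direct-sendfile", "testfile", NULL }),
where the instance count, binary path and file name are all placeholders.
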