From: Zhengyuan Liu
Date: Tue, 19 Oct 2021 11:39:38 +0800
Subject: Re: Problem with direct IO
To: Andrew Morton
Cc: viro@zeniv.linux.org.uk, tytso@mit.edu, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, mysql@lists.mysql.com, linux-ext4@vger.kernel.org,
    刘云, Zhengyuan Liu
In-Reply-To: <20211018114349.b80a27af9bfa7f16162b0ec4@linux-foundation.org>

On Tue, Oct 19, 2021 at 2:43 AM Andrew Morton wrote:
>
> On Mon, 18 Oct 2021 09:09:06 +0800 Zhengyuan Liu wrote:
>
> > Ping.
> >
> > I think this problem is serious and someone may also encounter it in
> > the future.
> >
> >
> > On Wed, Oct 13, 2021 at 9:46 AM Zhengyuan Liu
> > wrote:
> > >
> > > Hi, all
> > >
> > > we are encountering the following MySQL crash while importing tables:
> > >
> > > 2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > > fsync() returned EIO, aborting.
> > > 2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > > Assertion failure: ut0ut.cc:555 thread 281472996733168
> > >
> > > At the same time, we found the following message in dmesg:
> > >
> > > [ 4328.838972] Page cache invalidation failure on direct I/O.
> > > Possible data corruption due to collision with buffered I/O!
> > > [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > > 625 Comm: kworker/42:1
> > >
> > > At first we suspected that MySQL was interleaving direct IO and
> > > buffered IO on the file, but after some checking we found it only
> > > did direct IO using aio. The problem comes from the direct IO
> > > interface (__generic_file_write_iter) itself.
> > >
> > > ssize_t __generic_file_write_iter()
> > > {
> > > ...
> > >         if (iocb->ki_flags & IOCB_DIRECT) {
> > >                 loff_t pos, endbyte;
> > >
> > >                 written = generic_file_direct_write(iocb, from);
> > >                 /*
> > >                  * If the write stopped short of completing, fall back to
> > >                  * buffered writes.
> > >                  * Some filesystems do this for writes to
> > >                  * holes, for example.  For DAX files, a buffered write will
> > >                  * not succeed (even if it did, DAX does not handle dirty
> > >                  * page-cache pages correctly).
> > >                  */
> > >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> > >                         goto out;
> > >
> > >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> > > ...
> > > }
> > >
> > > From the above code snippet we can see that direct IO can fall back
> > > to buffered IO under certain conditions, so even though MySQL only
> > > did direct IO, it could interleave with buffered IO when the
> > > fallback occurred. I have no idea why the FS (ext3) currently fails
> > > the direct IO, but it is strange that __generic_file_write_iter
> > > makes direct IO fall back to buffered IO; it seems to break the
> > > semantics of direct IO.
>
> That makes sense.
>
> > > The reproduction environment is:
> > > Platform: Kunpeng 920 (arm64)
> > > Kernel: V5.15-rc
> > > PAGESIZE: 64K
> > > MySQL: V8.0
> > > Innodb_page_size: default(16K)
>
> This is all fairly mature code, I think.  Do you know if earlier
> kernels were OK, and if so which versions?

We have tested v4.18 and v4.19 and the problem is still there. Earlier
versions, such as those before v4.12, don't support Arm64 well, so we
can't test them.

I think this problem has something to do with page size: if we change
the kernel page size from 64K to 4K, or just set Innodb_page_size to
64K, we cannot reproduce it. Typically we use 4K as both the kernel
page size and the FS block size. If the database uses more than 4K as
its IO unit, then separate IOs never overlap in the kernel page cache,
because each one occupies one or more whole pages. That means it is
hard to trigger this problem on x86 or other platforms using a 4K page
size. But things change on Arm64 with a 64K page size: if the database
uses a smaller IO unit, in our MySQL case 16K DIO, then two IOs can
share one page-cache page, and if one of them falls back to buffered IO
it can trigger the problem.
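To make the page-sharing arithmetic concrete, here is a small sketch in
C. It is only an illustration under the assumptions stated above (16K
IO unit, 4K vs 64K pages); page_index() and ios_share_page() are
hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the page-cache page that covers byte offset `pos`. */
static uint64_t page_index(uint64_t pos, unsigned page_shift)
{
    return pos >> page_shift;
}

/*
 * Return 1 if two IOs of `len` bytes starting at pos_a and pos_b touch
 * at least one common page-cache page, 0 otherwise.
 * page_shift is 12 for 4K pages and 16 for 64K pages.
 * (Hypothetical helper for illustration only.)
 */
static int ios_share_page(uint64_t pos_a, uint64_t pos_b,
                          uint64_t len, unsigned page_shift)
{
    assert(len > 0); /* an empty IO covers no page */

    uint64_t a_first = page_index(pos_a, page_shift);
    uint64_t a_last  = page_index(pos_a + len - 1, page_shift);
    uint64_t b_first = page_index(pos_b, page_shift);
    uint64_t b_last  = page_index(pos_b + len - 1, page_shift);

    /* The two page ranges overlap iff each one starts before the other ends. */
    return a_first <= b_last && b_first <= a_last;
}
```

With two adjacent 16K writes at offsets 0 and 16384,
ios_share_page(0, 16384, 16384, 16) is 1 (both land in page 0 of a 64K
page cache), while ios_share_page(0, 16384, 16384, 12) is 0 (with 4K
pages the first write covers pages 0-3 and the second pages 4-7).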
For example, aio gets two direct IOs that write to the same page-cache
page. It dispatches the first one to storage and begins processing the
second one before the first has completed. If the second one falls back
to buffered IO, its data is copied into the page cache and the page is
marked dirty. When the first one then completes, it checks and
invalidates its page cache range; if the page is dirty, the problem
occurs.

If my analysis isn't correct please point it out, thanks.
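The ordering described above can be replayed with a toy model in C.
This is a sketch of my understanding only, not kernel code: struct
toy_page and the toy_* functions are made-up names, and the model only
captures "a dirty page cannot be invalidated when the first direct
write completes".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one shared page-cache page; not a kernel structure. */
struct toy_page {
    bool cached;
    bool dirty;
};

/* Buffered fallback: data is copied into the page, which becomes dirty. */
static void toy_buffered_fallback(struct toy_page *pg)
{
    assert(pg);
    pg->cached = true;
    pg->dirty = true;
}

/*
 * Completion of a direct write: try to drop the cached page covering
 * the written range. Returns 0 on success, -1 if the page is dirty and
 * therefore cannot be dropped (the invalidation failure case).
 */
static int toy_dio_complete(struct toy_page *pg)
{
    assert(pg);
    if (pg->cached && pg->dirty)
        return -1;
    pg->cached = false;
    return 0;
}

/* Replay the sequence; returns the result of the first write's completion. */
static int toy_replay(bool second_falls_back)
{
    struct toy_page pg = { false, false };

    /* 1. The first direct write is dispatched and is still in flight. */
    /* 2. The second write to the same page is processed meanwhile. */
    if (second_falls_back)
        toy_buffered_fallback(&pg);
    /* 3. The first direct write completes and invalidates its range. */
    return toy_dio_complete(&pg);
}
```

In this model toy_replay(true) returns -1, which corresponds to the
invalidation failure we see in dmesg, and toy_replay(false) returns 0.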