From: Zhengyuan Liu
Date: Thu, 21 Oct 2021 20:11:43 +0800
Subject: Re: Problem with direct IO
To: Jan Kara
Cc: viro@zeniv.linux.org.uk, Andrew Morton, tytso@mit.edu,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-ext4@vger.kernel.org, 刘云, Zhengyuan Liu
References: <20211020173729.GF16460@quack2.suse.cz> <20211021080304.GB5784@quack2.suse.cz>
In-Reply-To: <20211021080304.GB5784@quack2.suse.cz>

On Thu, Oct 21, 2021 at 4:03 PM Jan Kara wrote:
>
> On Thu 21-10-21 10:21:55, Zhengyuan Liu wrote:
> > On Thu, Oct 21, 2021 at 1:37 AM Jan Kara wrote:
> > > On Wed 13-10-21 09:46:46, Zhengyuan Liu wrote:
> > > > we are encounting following Mysql crash problem while importing tables :
> > > >
> > > > 2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > > > fsync() returned EIO, aborting.
> > > > 2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > > > Assertion failure: ut0ut.cc:555 thread 281472996733168
> > > >
> > > > At the same time , we found dmesg had following message:
> > > >
> > > > [ 4328.838972] Page cache invalidation failure on direct I/O.
> > > > Possible data corruption due to collision with buffered I/O!
> > > > [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > > > 625 Comm: kworker/42:1
> > > >
> > > > Firstly, we doubled Mysql has operating the file with direct IO and
> > > > buffered IO interlaced, but after some checking we found it did only
> > > > do direct IO using aio. The problem is exactly from direct-io
> > > > interface (__generic_file_write_iter) itself.
> > > >
> > > > ssize_t __generic_file_write_iter()
> > > > {
> > > > ...
> > > >         if (iocb->ki_flags & IOCB_DIRECT) {
> > > >                 loff_t pos, endbyte;
> > > >
> > > >                 written = generic_file_direct_write(iocb, from);
> > > >                 /*
> > > >                  * If the write stopped short of completing, fall back to
> > > >                  * buffered writes. Some filesystems do this for writes to
> > > >                  * holes, for example.
> > > >                  * For DAX files, a buffered write will
> > > >                  * not succeed (even if it did, DAX does not handle dirty
> > > >                  * page-cache pages correctly).
> > > >                  */
> > > >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> > > >                         goto out;
> > > >
> > > >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> > > > ...
> > > > }
> > > >
> > > > From above code snippet we can see that direct io could fall back to
> > > > buffered IO under certain conditions, so even Mysql only did direct IO
> > > > it could interleave with buffered IO when fall back occurred. I have
> > > > no idea why FS(ext3) failed the direct IO currently, but it is strange
> > > > __generic_file_write_iter make direct IO fall back to buffered IO, it
> > > > seems breaking the semantics of direct IO.
> > > >
> > > > The reproduced environment is:
> > > > Platform: Kunpeng 920 (arm64)
> > > > Kernel: V5.15-rc
> > > > PAGESIZE: 64K
> > > > Mysql: V8.0
> > > > Innodb_page_size: default(16K)
> > >
> > > Thanks for report. I agree this should not happen. How hard is this to
> > > reproduce? Any idea whether the fallback to buffered IO happens because
> > > iomap_dio_rw() returns -ENOTBLK or because it returns short write?
> >
> > It is easy to reproduce in my test environment, as I said in the previous
> > email replied to Andrew this problem is related to kernel page size.
>
> Ok, can you share a reproducer?

I don't have a simple test case that reproduces this; the whole procedure,
shown below, is somewhat complex.

1. Prepare the MySQL installation environment
   a. Prepare an SSD partition (at least 100G) as the MySQL data partition,
      format it as ext3 and mount it at /data
      # mkfs.ext3 /dev/sdb1
      # mount /dev/sdb1 /data
   b. Create the MySQL user and group
      # groupadd mysql
      # useradd -g mysql mysql
   c. Create the MySQL directories
      # mkdir -p /data/mysql
      # cd /data/mysql
      # mkdir data tmp run log

2. Install MySQL
   a. Download mysql-8.0.25-1.el8.aarch64.rpm-bundle.tar from
      https://downloads.mysql.com/archives/community/
   b. Install MySQL
      # tar -xvf mysql-8.0.25-1.el8.aarch64.rpm-bundle.tar
      # yum install openssl openssl-devel
      # rpm -ivh mysql-community-common-8.0.25-1.el8.aarch64.rpm mysql-community-client-plugins-8.0.25-1.el8.aarch64.rpm \
            mysql-community-libs-8.0.25-1.el8.aarch64.rpm mysql-community-client-8.0.25-1.el8.aarch64.rpm \
            mysql-community-server-8.0.25-1.el8.aarch64.rpm mysql-community-devel-8.0.25-1.el8.aarch64.rpm

3. Configure MySQL
   a. # chown mysql:mysql /etc/my.cnf
   b. # vim /etc/my.cnf
      innodb_flush_method = O_DIRECT
      default-storage-engine=INNODB
      datadir=/data/mysql/data
      socket=/data/mysql/run/mysql.sock
      tmpdir=/data/mysql/tmp
      log-error=/data/mysql/log/mysqld.log
      pid-file=/data/mysql/run/mysqld.pid
      port=3306
      user=mysql
   c. Initialize MySQL (the problem may already reproduce at this stage)
      # mysqld --defaults-file=/etc/my.cnf --initialize
   d. Start MySQL
      # mysqld --defaults-file=/etc/my.cnf &
   e. Log in to MySQL
      # mysql -uroot -p -S /data/mysql/run/mysql.sock
      You can get the temporary password from step 3.c
   f. Configure access
      mysql> alter user 'root'@'localhost' identified by "123456";
      mysql> create user 'root'@'%' identified by '123456';
      mysql> grant all privileges on *.* to 'root'@'%'; flush privileges;
      mysql> create database sysbench;

4. Use sysbench to test MySQL
   a. Install sysbench from
      https://github.com/akopytov/sysbench/archive/master.zip
   b. Use the following script to reproduce the problem (it may take dozens
      of minutes)
      while true ; do
          sysbench /usr/local/share/sysbench/oltp_write_only.lua --table-size=1000000 --tables=100 \
              --threads=32 --db-driver=mysql --mysql-db=sysbench --mysql-host=127.0.0.1 --mysql-port=3306 \
              --mysql-user=root --mysql-password=123456 --mysql-socket=/var/lib/mysql/mysql.sock prepare
          sleep 5
          sysbench /usr/local/share/sysbench/oltp_write_only.lua --table-size=1000000 --tables=100 \
              --threads=32 --db-driver=mysql --mysql-db=sysbench --mysql-host=127.0.0.1 --mysql-port=3306 \
              --mysql-user=root --mysql-password=123456 --mysql-socket=/var/lib/mysql/mysql.sock cleanup
          sleep 5
      done

If you can't reproduce it, we could provide a remote environment for you, or
connect to your machine and set up a reproducing environment there.

> > > Can you post output of "dumpe2fs -h " for the filesystem where the
> > > problem happens? Thanks!
> >
> > Sure, the output is:
> >
> > # dumpe2fs -h /dev/sda3
> > dumpe2fs 1.45.3 (14-Jul-2019)
> > Filesystem volume name:
> > Last mounted on:          /data
> > Filesystem UUID:          09a51146-b325-48bb-be63-c9df539a90a1
> > Filesystem magic number:  0xEF53
> > Filesystem revision #:    1 (dynamic)
> > Filesystem features:      has_journal ext_attr resize_inode dir_index
> > filetype needs_recovery sparse_super large_file
>
> Thanks for the data. OK, a filesystem without extents. Does your test by
> any chance try to do direct IO to a hole in a file? Because that is not
> (and never was) supported without extents. Also the fact that you don't see
> the problem with ext4 (which means extents support) would be pointing in
> that direction.

I am not sure whether it tries to do direct IO to a hole. Is there any way
to check? If you have a simple test to reproduce this, please let me know;
we would be glad to try it.

Thanks,
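
On the question of checking for holes: one possible approach from user space, given here only as a rough sketch and not something proposed in this thread, is to walk a file with lseek() using SEEK_HOLE and SEEK_DATA and list the ranges the filesystem reports as unallocated. The helper name find_holes is hypothetical, and Python is used only to keep it short. It assumes Linux and a filesystem that implements hole reporting; where it is not implemented, the whole file is reported as data and the list comes back empty. It also only shows holes that exist at the moment it runs, so a hole that was already filled by an earlier write will not show up.

```python
import errno
import os


def find_holes(path):
    """Return (start, end) byte ranges that the filesystem reports as holes.

    The implicit hole at end-of-file is not included.
    """
    holes = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            # Next hole at or after pos; returns the file size if none is left.
            hole = os.lseek(fd, pos, os.SEEK_HOLE)
            if hole >= size:
                break
            try:
                # First data byte after the start of that hole.
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
                # ENXIO: no data after this offset, the hole runs to EOF.
                data = size
            holes.append((hole, data))
            pos = data
    finally:
        os.close(fd)
    return holes
```

Calling find_holes("/data/mysql/data/sysbench/sbtest53.ibd") while sysbench is running would show whether that tablespace file contains unallocated ranges at that point in time.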