From: Zhengyuan Liu
Date: Thu, 21 Oct 2021 10:21:55 +0800
Subject: Re: Problem with direct IO
To: Jan Kara
Cc: viro@zeniv.linux.org.uk, Andrew Morton, tytso@mit.edu,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-ext4@vger.kernel.org, 刘云, Zhengyuan Liu
References: <20211020173729.GF16460@quack2.suse.cz>
In-Reply-To: <20211020173729.GF16460@quack2.suse.cz>

On Thu, Oct 21, 2021 at 1:37 AM Jan Kara wrote:
>
> On Wed 13-10-21 09:46:46, Zhengyuan Liu wrote:
> > Hi, all
> >
> > we are encountering the following MySQL crash while importing tables:
> >
> > 2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > fsync() returned EIO, aborting.
> > 2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > Assertion failure: ut0ut.cc:555 thread 281472996733168
> >
> > At the same time, we found the following message in dmesg:
> >
> > [ 4328.838972] Page cache invalidation failure on direct I/O.
> > Possible data corruption due to collision with buffered I/O!
> > [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > 625 Comm: kworker/42:1
> >
> > At first we suspected that MySQL was interleaving direct IO and
> > buffered IO on the file, but after some checking we found that it only
> > does direct IO, using aio. The problem comes from the direct-IO
> > interface (__generic_file_write_iter) itself.
> >
> > ssize_t __generic_file_write_iter()
> > {
> > ...
> >         if (iocb->ki_flags & IOCB_DIRECT) {
> >                 loff_t pos, endbyte;
> >
> >                 written = generic_file_direct_write(iocb, from);
> >                 /*
> >                  * If the write stopped short of completing, fall back to
> >                  * buffered writes.  Some filesystems do this for writes to
> >                  * holes, for example.  For DAX files, a buffered write will
> >                  * not succeed (even if it did, DAX does not handle dirty
> >                  * page-cache pages correctly).
> >                  */
> >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> >                         goto out;
> >
> >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> > ...
> > }
> >
> > From the above code snippet we can see that direct IO can fall back to
> > buffered IO under certain conditions, so even though MySQL only did
> > direct IO, it could interleave with buffered IO when the fallback
> > occurred. I have no idea yet why the FS (ext3) failed the direct IO,
> > but it is strange that __generic_file_write_iter makes direct IO fall
> > back to buffered IO; it seems to break the semantics of direct IO.
> >
> > The reproduction environment is:
> > Platform: Kunpeng 920 (arm64)
> > Kernel: V5.15-rc
> > PAGESIZE: 64K
> > Mysql: V8.0
> > Innodb_page_size: default(16K)
>
> Thanks for report. I agree this should not happen. How hard is this to
> reproduce? Any idea whether the fallback to buffered IO happens because
> iomap_dio_rw() returns -ENOTBLK or because it returns short write?

It is easy to reproduce in my test environment. As I said in my previous
email replying to Andrew, this problem is related to the kernel page size.

> Can you post output of "dumpe2fs -h " for the filesystem where the
> problem happens? Thanks!

Sure, the output is:

# dumpe2fs -h /dev/sda3
dumpe2fs 1.45.3 (14-Jul-2019)
Filesystem volume name:
Last mounted on:          /data
Filesystem UUID:          09a51146-b325-48bb-be63-c9df539a90a1
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         unsigned_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              11034624
Block count:              44138240
Reserved block count:     2206912
Free blocks:              43168100
Free inodes:              11034613
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1013
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Oct 21 09:42:03 2021
Last mount time:          Thu Oct 21 09:43:36 2021
Last write time:          Thu Oct 21 09:43:36 2021
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Oct 21 09:42:03 2021
Check interval:           0 ()
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      a7b04e61-1209-496d-ab9d-a51009b51ddb
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             1024M
Journal length:           262144
Journal sequence:         0x00000002
Journal start:            1

BTW, we have also tested Ext4 and XFS and didn't see direct write fallback.

Thanks,