Date: Mon, 4 Dec 2023 13:11:20 +0100
From: Jan Kara
To: Baokun Li
Cc: linux-mm@kvack.org, linux-ext4@vger.kernel.org, tytso@mit.edu,
    adilger.kernel@dilger.ca, jack@suse.cz, willy@infradead.org,
    akpm@linux-foundation.org, ritesh.list@gmail.com,
    linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com,
    yukuai3@huawei.com
Subject: Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending
    DIO write race with buffered read
Message-ID: <20231204121120.mpxntey47rluhcfi@quack3>
References: <20231202091432.8349-1-libaokun1@huawei.com>
In-Reply-To: <20231202091432.8349-1-libaokun1@huawei.com>

Hello!

On Sat 02-12-23 17:14:30, Baokun Li wrote:
> Recently, while running some pressure tests on MYSQL, noticed that
> occasionally a "corrupted data in log event" error would be reported.
> After analyzing the error, I found that extending DIO write and buffered
> read were competing, resulting in some zero-filled page end being read.
> Since ext4 buffered read doesn't hold an inode lock, and there is no
> field in the page to indicate the valid data size, it seems to me that
> it is impossible to solve this problem perfectly without changing these
> two things.

Yes, combining buffered reads with direct IO writes is a recipe for
problems and pretty much in the "don't do it" territory. So honestly I'd
consider this a MYSQL bug. Were you able to identify why MYSQL uses
buffered reads in this case? Is it just something specific to the test
you're doing?

> In this series, the first patch reads the inode size twice, and takes the
> smaller of the two values as the copyout limit to avoid copying data that
> was not actually read (0-padding) into the user buffer and causing data
> corruption. This greatly reduces the probability of problems under 4k
> page. However, the problem is still easily triggered under 64k page.
>
> The second patch waits for the existing dio write to complete and
> invalidate the stale page cache before performing a new buffered read
> in ext4, avoiding data corruption by copying the stale page cache to
> the user buffer. This makes it much less likely that the problem will
> be triggered in a 64k page.
>
> Do we have a plan to add a lock to the ext4 buffered read or a field in
> the page that indicates the size of the valid data in the page? Or does
> anyone have a better idea?

No, there are no plans to address this AFAIK, because such locking would
slow down all the well-behaved applications to fix a corner case for an
application doing unsupported things. Sure, we must not crash the kernel,
corrupt the filesystem, or leak sensitive (e.g. uninitialized) data if an
app combines buffered and direct IO, but returning zeros instead of valid
data is, in my opinion, fully within the range of acceptable behavior for
such a case.

								Honza
-- 
Jan Kara
SUSE Labs, CR
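
A side note on the first patch quoted above: its idea of sampling the
inode size twice and copying out no more than the smaller value can be
sketched in plain userspace C. This is only an illustrative model, not
the code from the series. copyout_limit() and its parameters are
hypothetical names, and the two size arguments stand in for the inode
size as read before and after the page cache lookup.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the "read the size twice, use the smaller" idea.
 * pos:         file offset the reader starts at
 * want:        number of bytes the reader asked for
 * size_before: file size sampled before looking up the page (assumption)
 * size_after:  file size sampled after looking up the page (assumption)
 * Returns how many bytes may be copied to the user buffer.
 */
static size_t copyout_limit(long long pos, size_t want,
                            long long size_before, long long size_after)
{
    assert(pos >= 0 && size_before >= 0 && size_after >= 0);

    /* Trust only the smaller of the two samples. */
    long long isize = size_before < size_after ? size_before : size_after;

    /* Nothing known-valid at or beyond the trusted size. */
    if (pos >= isize)
        return 0;

    /* avail is positive here, so the unsigned comparison is safe. */
    long long avail = isize - pos;
    if ((unsigned long long)avail < want)
        return (size_t)avail;
    return want;
}
```

For example, a reader at offset 0 asking for 4096 bytes, with the size
sampled as 1024 before and 4096 after, would copy out only 1024 bytes.
As the quoted cover letter itself notes, this only narrows the race
window and does not close it.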