Date: Mon, 4 Dec 2023 23:17:55 -0500
From: "Theodore Ts'o"
To: Baokun Li
Cc: Jan Kara, linux-mm@kvack.org, linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca, willy@infradead.org, akpm@linux-foundation.org, ritesh.list@gmail.com, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read
Message-ID: <20231205041755.GG509422@mit.edu>
References: <20231202091432.8349-1-libaokun1@huawei.com> <20231204121120.mpxntey47rluhcfi@quack3>

On Mon, Dec 04, 2023 at 09:50:18PM +0800, Baokun Li wrote:
> The problem is with a one-master-twoslave MYSQL database with three
> physical machines, and using sysbench pressure testing on each of the
> three machines, the problem occurs about once every two to three hours.
>
> The problem is with the relay log file, and when the problem occurs,
> the middle dozens of bytes of the file are read as all zeros, while
> the data on disk is not. This is a journal-like file where a write
> process gets the data from the master node and writes it locally,
> and another replay process reads the file and performs the replay
> operation accordingly (some SQL statements). The problem is that
> when replaying, it finds that the data read is corrupted, not valid
> SQL data, while the data on disk is normal.

You mentioned "scripts" --- are these locally developed scripts by
any chance? The procedures suggested in the few places that I looked
up don't involve reading the relay log. For example, from [1]:

On the master server:

root@repl-master:~# mysql -uroot -p;
mysql> CREATE USER 'slave'@'12.34.56.222' IDENTIFIED BY 'SLAVE_PASSWORD';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'12.34.56.222';
mysql> FLUSH PRIVILEGES;
mysql> FLUSH TABLES WITH READ LOCK;

This will make the master server read-only, with all pending writes
flushed out (so you don't need to worry about the relay log), and
then you move the data from the master to the slave:

root@repl-master:~# mysqldump -u root -p --all-databases --master-data > data.sql
root@repl-master:~# scp data.sql root@12.34.56.222:

Then on the slave:

root@repl-slave:~# mysql -uroot -p < data.sql
root@repl-slave:~# mysql -uroot -p;
mysql> STOP SLAVE;

... and then on the master:

root@repl-master:~# mysql -uroot -p;
mysql> UNLOCK TABLES;

... and back on the slave:

root@repl-slave:~# mysql -uroot -p;
mysql> START SLAVE;

[1] https://hevodata.com/learn/mysql-master-slave-replication/

... or you could buy the product advertised at [1], which is easier
for the database administrators, but results in $$$ flowing to the
Hevo company. :-)

In any case, I'm pretty sure that the officially documented way of
setting up a failover replication setup doesn't involve buffered
reads of the relay log file. It is certainly the case that mysqldump
uses buffered reads, but that's why you have to temporarily make the
database read-only using "FLUSH TABLES WITH READ LOCK" before taking
a database snapshot, and then re-enable database updates with the
"UNLOCK TABLES" SQL command.

Cheers,

- Ted