Date: Tue, 5 Dec 2023 21:19:03 +0800
From: Baokun Li
To: Theodore Ts'o
CC: Jan Kara, Baokun Li
Subject: Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read
References: <20231202091432.8349-1-libaokun1@huawei.com> <20231204121120.mpxntey47rluhcfi@quack3> <20231205041755.GG509422@mit.edu>
In-Reply-To: <20231205041755.GG509422@mit.edu>

On 2023/12/5 12:17, Theodore Ts'o wrote:
> On Mon, Dec 04, 2023 at 09:50:18PM +0800, Baokun Li wrote:
>> The problem is with a one-master-two-slave MySQL database with three
>> physical machines, and using sysbench pressure testing on each of the
>> three machines, the problem occurs about once every two to three hours.
>>
>> The problem is with the relay log file, and when the problem occurs,
>> a few dozen bytes in the middle of the file are read as all zeros,
>> while the data on disk is not. This is a journal-like file where a write
>> process gets the data from the master node and writes it locally,
>> and another replay process reads the file and performs the replay
>> operation accordingly (some SQL statements). The problem is that
>> when replaying, it finds that the data read is corrupted, not valid
>> SQL data, while the data on disk is normal.
> You mentioned "scripts" --- are these locally developed scripts by
> any chance?

This refers to the SQL commands to be replayed in the relay log file.
I don't know much about this file, but you can read the official
documentation:
https://dev.mysql.com/doc/refman/8.0/en/replica-logs-relaylog.html

> The procedure suggested in a few places that I looked up
> doesn't involve needing to read the replay log. For example, from [1]:
>
> On the master server:
>
>    root@repl-master:~# mysql -uroot -p;
>    mysql> CREATE USER 'slave'@'12.34.56.789' IDENTIFIED BY 'SLAVE_PASSWORD';
>    mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'12.34.56.222';
>    mysql> FLUSH PRIVILEGES;
>    mysql> FLUSH TABLES WITH READ LOCK;
>
> This will make the master server read-only, with all pending writes
> flushed out (so you don't need to worry about the replay log), and
> then you move the data from the master to slave:
>
>    root@repl-master:~# mysqldump -u root -p --all-databases --master-data > data.sql
>    root@repl-master:~# scp data.sql root@12.34.56.222
>
> Then on the slave:
>
>    root@repl-slave:~# mysql -uroot -p < data.sql
>    root@repl-slave:~# mysql -uroot -p;
>    mysql> STOP SLAVE;
>
> ... and then on the master:
>
>    root@repl-master:~# mysql -uroot -p;
>    mysql> UNLOCK TABLES;
>
> ... and back on the slave:
>
>    root@repl-slave:~# mysql -uroot -p;
>    mysql> START SLAVE;
>
> [1] https://hevodata.com/learn/mysql-master-slave-replication/
>
> ... or you could buy the product advertised at [1] which is easier for
> the database administrators, but results in $$$ flowing to the Hevo
> company. :-)
>
> In any case, I'm pretty sure that the official documented way of
> setting up a failover replication setup doesn't involve buffered reads
> of the replay file.
>
> It is certainly the case that mysqldump uses buffered reads, but
> that's why you have to temporarily make the database read-only using
> "FLUSH TABLES WITH READ LOCK" before taking a database snapshot, and
> then re-enable database updates with the "UNLOCK TABLES" SQL command.
>
> Cheers,
>
> - Ted

Thank you very much for your detailed explanation!

But the downstream users do use buffered reads to read the relay log
file, as I confirmed with bpftrace. Here's an introduction to turning on
relay logging, but I'm not sure if you can access this link:
https://blog.csdn.net/javaanddonet/article/details/112596148

Thanks!
-- 
With Best Regards,
Baokun Li
.