Subject: Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
To: Michal Koutný, Lance Yang
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, cgroups@vger.kernel.org, josef@toxicpanda.com,
 tj@kernel.org, fujita.tomonori@lab.ntt.co.jp, boqun.feng@gmail.com,
 a.hindborg@samsung.com, paolo.valente@unimore.it, axboe@kernel.dk,
 vbabka@kernel.org, david@redhat.com, 21cnbao@gmail.com,
 baolin.wang@linux.alibaba.com, libang.li@antgroup.com, "yukuai (C)"
References: <20240812150049.8252-1-ioworker0@gmail.com>
From: Yu Kuai
Message-ID: <9ede36af-fca4-ed41-6b7e-cef157c640bb@huaweicloud.com>
Date: Tue, 13 Aug 2024 09:37:03 +0800

Hi,

On 2024/08/12 23:43, Michal Koutný wrote:
> +Cc Kuai
>
> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang wrote:
>> Hi all,
>>
>> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
>> I/O operations when I set both wbps and wiops for a device. However, if I only
>> set wbps, then everything works as expected.
>>
>> To reproduce the problem, we can follow these command-based steps:
>>
>> 1. **System Information:**
>>    - Kernel Version and OS Release:
>>      ```
>>      $ uname -r
>>      6.10.0-rc5+
>>
>>      $ cat /etc/os-release
>>      PRETTY_NAME="Ubuntu 24.04 LTS"
>>      NAME="Ubuntu"
>>      VERSION_ID="24.04"
>>      VERSION="24.04 LTS (Noble Numbat)"
>>      VERSION_CODENAME=noble
>>      ID=ubuntu
>>      ID_LIKE=debian
>>      HOME_URL="https://www.ubuntu.com/"
>>      SUPPORT_URL="https://help.ubuntu.com/"
>>      BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>      PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>>      UBUNTU_CODENAME=noble
>>      LOGO=ubuntu-logo
>>      ```
>>
>> 2. **Device Information and Settings:**
>>    - List Block Devices and Scheduler:
>>      ```
>>      $ lsblk
>>      NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
>>      sda      8:0    0  4.4T  0 disk
>>      └─sda1   8:1    0  4.4T  0 part /data
>>      ...
>>
>>      $ cat /sys/block/sda/queue/scheduler
>>      none [mq-deadline] kyber bfq
>>
>>      $ cat /sys/block/sda/queue/rotational
>>      1
>>      ```
>>
>> 3. **Reproducing the problem:**
>>    - Navigate to the cgroup v2 filesystem and configure I/O settings:
>>      ```
>>      $ cd /sys/fs/cgroup/
>>      $ stat -fc %T /sys/fs/cgroup
>>      cgroup2fs
>>      $ mkdir test
>>      $ echo "8:0 wbps=10485760 wiops=100000" > io.max
>>      ```
>>      In this setup:
>>      wbps=10485760 sets the write bytes per second limit to 10 MB/s.
>>      wiops=100000 sets the write I/O operations per second limit to 100,000.
>>
>>    - Add process to the cgroup and verify:
>>      ```
>>      $ echo $$ > cgroup.procs
>>      $ cat cgroup.procs
>>      3826771
>>      3828513
>>      $ ps -ef|grep 3826771
>>      root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
>>      root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
>>      root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
>>      ```
>>
>>    - Observe I/O performance using `dd` commands and `iostat`:
>>      ```
>>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &

You're testing buffered IO here, and I don't see that cgroup writeback is
enabled. Is this test intentional? Why not test direct IO?

>>      ```
>>      ```
>>      $ iostat -d 1 -h -y -p sda
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>          7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
>>          7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>          5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>>          5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>         21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
>>         21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>          5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>>          5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>          5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>>          5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>       1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
>>       1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1

It looks like all the dirty buffers got flushed to disk in the last second,
when the file was closed; this is expected.

>>      ```
>>      Initially, the write speed is slow (<2MB/s) then suddenly bursts to several
>>      hundreds of MB/s.
>
> What it would be on average?
> IOW how long would the whole operation in throttled cgroup take?
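
As a rough reference point for the question above, here is a back-of-the-envelope
sketch of how long one of the quoted dd runs would take if the wbps limit were
applied to every byte it writes. `throttled_secs` is a hypothetical helper made
up for illustration; the size and limit are the ones from the report.

```shell
#!/bin/sh
# throttled_secs BYTES LIMIT_BPS
# Seconds needed to push BYTES through a LIMIT_BPS throttle, rounded up.
# Hypothetical helper for illustration only, not part of any tool.
throttled_secs() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# One dd from the report: bs=512M count=1, throttled at wbps=10485760.
throttled_secs $((512 * 1024 * 1024)) 10485760    # prints 52
```

So a fully throttled 512M write would need roughly 51 to 52 seconds, whereas the
last iostat sample above shows 448.1M written within a single second.
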
>
>>
>>    - Testing with wiops set to max:
>>      ```
>>      echo "8:0 wbps=10485760 wiops=max" > io.max
>>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>      ```
>>      ```
>>      $ iostat -d 1 -h -y -p sda
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>         48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>         48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>         40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>         40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>         41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>         41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>         46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>         46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>           tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>         55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
>>         55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1

And I don't think wiops=max is the reason. What needs explaining is why the
dirty buffers got flushed to disk synchronously before dd finished and closed
the file.

>>      ```
>>      The iostat output shows the write operations as stabilizing at around 10 MB/s,
>>      which aligns with the defined limit of 10 MB/s. After setting wiops to max, the
>>      I/O limits appear to work as expected.

Can you give direct IO a test? And also enable cgroup writeback for buffered IO.

Thanks,
Kuai

>>
>>
>> Thanks,
>> Lance
>
> Thanks for the report Lance. Is this something you started seeing after
> a kernel update or switch to cgroup v2? (Or you simply noticed with this
> setup only?)
>
>
> Michal
>
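
The two retests requested above could look roughly like the following. This is
only a sketch under assumptions: the file path and total size are copied from
the report, `run`, `retest` and `DRY_RUN` are hypothetical helpers so that the
script just prints the commands by default, and cgroup writeback additionally
needs a kernel built with CONFIG_CGROUP_WRITEBACK and a filesystem that
supports it.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run() and retest() are hypothetical helpers;
# the path /data/file1 and the 512M size come from the report.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

retest() {
    # 1. Direct IO: bypass the page cache, so writes reach the block
    #    layer (and the io.max throttle) as dd issues them.
    run dd if=/dev/zero of=/data/file1 bs=1M count=512 oflag=direct

    # 2. Buffered IO with cgroup writeback: both the memory and io
    #    controllers have to be enabled for the child cgroups.
    run sh -c 'echo "+memory +io" > /sys/fs/cgroup/cgroup.subtree_control'
    run dd if=/dev/zero of=/data/file1 bs=1M count=512 conv=fsync
}

retest
```

Running it with DRY_RUN=0 as root, from a shell that is already in the
throttled cgroup, executes the commands instead of printing them.
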