Subject: Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
From: Yu Kuai
To: Lance Yang, Yu Kuai
Cc: Michal Koutný, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, cgroups@vger.kernel.org, josef@toxicpanda.com, tj@kernel.org, fujita.tomonori@lab.ntt.co.jp, boqun.feng@gmail.com, a.hindborg@samsung.com, paolo.valente@unimore.it, axboe@kernel.dk, vbabka@kernel.org, david@redhat.com, 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, libang.li@antgroup.com, "yukuai (C)"
References: <20240812150049.8252-1-ioworker0@gmail.com> <9ede36af-fca4-ed41-6b7e-cef157c640bb@huaweicloud.com>
Date: Tue, 13 Aug 2024 14:39:32 +0800

Hi,

On 2024/08/13 13:00, Lance Yang wrote:
> Hi Kuai,
>
> Thanks a lot for jumping in!
>
> On Tue, Aug 13, 2024 at 9:37 AM Yu Kuai wrote:
>>
>> Hi,
>>
>> On 2024/08/12 23:43, Michal Koutný wrote:
>>> +Cc Kuai
>>>
>>> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang wrote:
>>>> Hi all,
>>>>
>>>> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
>>>> I/O operations when I set both wbps and wiops for a device. However, if I only
>>>> set wbps, then everything works as expected.
>>>>
>>>> To reproduce the problem, we can follow these command-based steps:
>>>>
>>>> 1. **System Information:**
>>>> - Kernel Version and OS Release:
>>>> ```
>>>> $ uname -r
>>>> 6.10.0-rc5+
>>>>
>>>> $ cat /etc/os-release
>>>> PRETTY_NAME="Ubuntu 24.04 LTS"
>>>> NAME="Ubuntu"
>>>> VERSION_ID="24.04"
>>>> VERSION="24.04 LTS (Noble Numbat)"
>>>> VERSION_CODENAME=noble
>>>> ID=ubuntu
>>>> ID_LIKE=debian
>>>> HOME_URL="https://www.ubuntu.com/"
>>>> SUPPORT_URL="https://help.ubuntu.com/"
>>>> BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>>> PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>>>> UBUNTU_CODENAME=noble
>>>> LOGO=ubuntu-logo
>>>> ```
>>>>
>>>> 2. **Device Information and Settings:**
>>>> - List Block Devices and Scheduler:
>>>> ```
>>>> $ lsblk
>>>> NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
>>>> sda      8:0    0  4.4T  0 disk
>>>> └─sda1   8:1    0  4.4T  0 part /data
>>>> ...
>>>>
>>>> $ cat /sys/block/sda/queue/scheduler
>>>> none [mq-deadline] kyber bfq
>>>>
>>>> $ cat /sys/block/sda/queue/rotational
>>>> 1
>>>> ```
>>>>
>>>> 3. **Reproducing the problem:**
>>>> - Navigate to the cgroup v2 filesystem and configure I/O settings:
>>>> ```
>>>> $ cd /sys/fs/cgroup/
>>>> $ stat -fc %T /sys/fs/cgroup
>>>> cgroup2fs
>>>> $ mkdir test
>>>> $ echo "8:0 wbps=10485760 wiops=100000" > io.max
>>>> ```
>>>> In this setup:
>>>> wbps=10485760 sets the write bytes per second limit to 10 MB/s.
>>>> wiops=100000 sets the write I/O operations per second limit to 100,000.
>>>>
>>>> - Add process to the cgroup and verify:
>>>> ```
>>>> $ echo $$ > cgroup.procs
>>>> $ cat cgroup.procs
>>>> 3826771
>>>> 3828513
>>>> $ ps -ef|grep 3826771
>>>> root 3826771 3826768 0 22:04 pts/1 00:00:00 -bash
>>>> root 3828761 3826771 0 22:06 pts/1 00:00:00 ps -ef
>>>> root 3828762 3826771 0 22:06 pts/1 00:00:00 grep --color=auto 3826771
>>>> ```
>>>>
>>>> - Observe I/O performance using `dd` commands and `iostat`:
>>>> ```
>>>> $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>>> $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>
>> You're testing buffer IO here, and I don't see that write back cgroup is
>> enabled. Is this test intentional? Why not test direct IO?
>
> Yes, I was testing buffered I/O and can confirm that CONFIG_CGROUP_WRITEBACK
> was enabled.
>
> $ cat /boot/config-6.10.0-rc5+ |grep CONFIG_CGROUP_WRITEBACK
> CONFIG_CGROUP_WRITEBACK=y
>
> We intend to configure both wbps (write bytes per second) and wiops
> (write I/O operations per second) for the containers. IIUC, this setup
> will effectively restrict both their block device I/Os and buffered I/Os.
>
>> Why not test direct IO?
>
> I was testing direct IO as well. However it did not work as expected with
> `echo "8:0 wbps=10485760 wiops=100000" > io.max`.
>
> $ time dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct

So you're issuing a single huge 512M IO.

> 1+0 records in
> 1+0 records out
> 536870912 bytes (537 MB, 512 MiB) copied, 51.5962 s, 10.4 MB/s

And this result looks correct. Please note that blk-throtl acts before
the IO is submitted, while iostat reports IO that has completed. A huge
IO can be throttled for a long time.

>
> real 0m51.637s
> user 0m0.000s
> sys 0m0.313s
>
> $ iostat -d 1 -h -y -p sda
>  tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd Device
> 9.00      0.0k      1.3M      0.0k    0.0k    1.3M    0.0k sda
> 9.00      0.0k      1.3M      0.0k    0.0k    1.3M    0.0k sda1

What I don't understand yet is why there is so little IO during the wait.
Can you test on a raw disk to bypass the filesystem?
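
As a rough sanity check on why the dd result looks correct, the expected
duration can be worked out from the io.max values quoted above. This is
only a back-of-the-envelope sketch: it assumes each limit acts as a plain
rate cap and that the tighter one dominates, and it ignores how the kernel
splits the request. The function name and the IO count of 1 are
illustrative assumptions, not anything taken from blk-throtl itself.

```python
def min_throttled_seconds(total_bytes, n_ios, bps_limit, iops_limit):
    """Lower bound on transfer time if both limits are simple rate caps.

    Hypothetical helper for illustration; not a model of blk-throtl internals.
    """
    return max(total_bytes / bps_limit, n_ios / iops_limit)


WBPS = 10485760            # bytes/s, from the io.max line in the report
WIOPS = 100000             # IOs/s, from the io.max line in the report
SIZE = 512 * 1024 * 1024   # dd bs=512M count=1

# Assumption: dd issues one write; the iops term is negligible either way.
expected = min_throttled_seconds(SIZE, 1, WBPS, WIOPS)
print(f"expected: {expected:.1f} s")   # expected: 51.2 s

# Throughput implied by the measured 51.5962 s, in dd's decimal MB/s.
measured = 51.5962
print(f"measured: {SIZE / measured / 1e6:.1f} MB/s")   # measured: 10.4 MB/s
```

Under these assumptions the byte limit alone accounts for about 51.2 s,
which is close to the 51.6 s that dd reported.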

Thanks,
Kuai