linux-mm.kvack.org archive mirror
* [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
@ 2024-08-12 15:00 Lance Yang
  2024-08-12 15:43 ` Michal Koutný
  0 siblings, 1 reply; 13+ messages in thread
From: Lance Yang @ 2024-08-12 15:00 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-block, cgroups
  Cc: josef, tj, fujita.tomonori, boqun.feng, a.hindborg,
	paolo.valente, axboe, vbabka, mkoutny, david, 21cnbao,
	baolin.wang, libang.li, Lance Yang

Hi all,

I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
I/O operations when I set both wbps and wiops for a device. However, if I only
set wbps, then everything works as expected.

To reproduce the problem, we can follow these command-based steps:

1. **System Information:**
   - Kernel Version and OS Release:
     ```
     $ uname -r
     6.10.0-rc5+

     $ cat /etc/os-release
     PRETTY_NAME="Ubuntu 24.04 LTS"
     NAME="Ubuntu"
     VERSION_ID="24.04"
     VERSION="24.04 LTS (Noble Numbat)"
     VERSION_CODENAME=noble
     ID=ubuntu
     ID_LIKE=debian
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     UBUNTU_CODENAME=noble
     LOGO=ubuntu-logo
     ```

2. **Device Information and Settings:**
   - List Block Devices and Scheduler:
     ```
     $ lsblk
     NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
     sda     8:0    0   4.4T  0 disk
     └─sda1  8:1    0   4.4T  0 part /data
     ...

     $ cat /sys/block/sda/queue/scheduler
     none [mq-deadline] kyber bfq

     $ cat /sys/block/sda/queue/rotational
     1
     ```

3. **Reproducing the problem:**
   - Navigate to the cgroup v2 filesystem and configure I/O settings:
     ```
     $ cd /sys/fs/cgroup/
     $ stat -fc %T /sys/fs/cgroup
     cgroup2fs
     $ mkdir test
     $ echo "8:0 wbps=10485760 wiops=100000" > io.max
     ```
     In this setup:
     wbps=10485760 sets the write bytes per second limit to 10 MB/s.
     wiops=100000 sets the write I/O operations per second limit to 100,000.

   - Add process to the cgroup and verify:
     ```
     $ echo $$ > cgroup.procs
     $ cat cgroup.procs
     3826771
     3828513
     $ ps -ef|grep 3826771
     root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
     root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
     root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
     ```

   - Observe I/O performance using `dd` commands and `iostat`:
     ```
     $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
     $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
     ```
     ```
     $ iostat -d 1 -h -y -p sda
     
	   tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
  1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
  1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1
     ```
Initially, the write speed is slow (<2 MB/s), then it suddenly bursts to several
hundred MB/s.

   - Testing with wiops set to max:
     ```
     echo "8:0 wbps=10485760 wiops=max" > io.max
     $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
     $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
     ```
     ```
     $ iostat -d 1 -h -y -p sda

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
    48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
    40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
    41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
    46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
    55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1
     ```
The iostat output shows writes stabilizing at around 10 MB/s, matching the
configured limit. With wiops set to max, the I/O limit works as expected.
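
As a cross-check (a minimal sketch, assuming the limits were applied to the
`test` cgroup created above; paths are illustrative), the throttled cgroup's
own counters can be read back and sampled while the dd runs:
```
$ cat /sys/fs/cgroup/test/io.max    # read back the configured limits
$ # sample the cgroup's write counters once per second during the run
$ for i in $(seq 5); do grep ^8:0 /sys/fs/cgroup/test/io.stat; sleep 1; done
```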


Thanks,
Lance



* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-12 15:00 [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops Lance Yang
@ 2024-08-12 15:43 ` Michal Koutný
  2024-08-13  1:37   ` Yu Kuai
  2024-08-13  5:11   ` Lance Yang
  0 siblings, 2 replies; 13+ messages in thread
From: Michal Koutný @ 2024-08-12 15:43 UTC (permalink / raw)
  To: Lance Yang
  Cc: linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, Yu Kuai


+Cc Kuai

On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> Hi all,
> 
> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
> I/O operations when I set both wbps and wiops for a device. However, if I only
> set wbps, then everything works as expected.
> 
> To reproduce the problem, we can follow these command-based steps:
> 
> 1. **System Information:**
>    - Kernel Version and OS Release:
>      ```
>      $ uname -r
>      6.10.0-rc5+
> 
>      $ cat /etc/os-release
>      PRETTY_NAME="Ubuntu 24.04 LTS"
>      NAME="Ubuntu"
>      VERSION_ID="24.04"
>      VERSION="24.04 LTS (Noble Numbat)"
>      VERSION_CODENAME=noble
>      ID=ubuntu
>      ID_LIKE=debian
>      HOME_URL="https://www.ubuntu.com/"
>      SUPPORT_URL="https://help.ubuntu.com/"
>      BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>      PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>      UBUNTU_CODENAME=noble
>      LOGO=ubuntu-logo
>      ```
> 
> 2. **Device Information and Settings:**
>    - List Block Devices and Scheduler:
>      ```
>      $ lsblk
>      NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
>      sda     8:0    0   4.4T  0 disk
>      └─sda1  8:1    0   4.4T  0 part /data
>      ...
> 
>      $ cat /sys/block/sda/queue/scheduler
>      none [mq-deadline] kyber bfq
> 
>      $ cat /sys/block/sda/queue/rotational
>      1
>      ```
> 
> 3. **Reproducing the problem:**
>    - Navigate to the cgroup v2 filesystem and configure I/O settings:
>      ```
>      $ cd /sys/fs/cgroup/
>      $ stat -fc %T /sys/fs/cgroup
>      cgroup2fs
>      $ mkdir test
>      $ echo "8:0 wbps=10485760 wiops=100000" > io.max
>      ```
>      In this setup:
>      wbps=10485760 sets the write bytes per second limit to 10 MB/s.
>      wiops=100000 sets the write I/O operations per second limit to 100,000.
> 
>    - Add process to the cgroup and verify:
>      ```
>      $ echo $$ > cgroup.procs
>      $ cat cgroup.procs
>      3826771
>      3828513
>      $ ps -ef|grep 3826771
>      root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
>      root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
>      root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
>      ```
> 
>    - Observe I/O performance using `dd` commands and `iostat`:
>      ```
>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>      ```
>      ```
>      $ iostat -d 1 -h -y -p sda
>      
> 	   tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>     21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
>     21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>   1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
>   1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1
>      ```
> Initially, the write speed is slow (<2MB/s) then suddenly bursts to several
> hundreds of MB/s.

What would it be on average?
IOW, how long would the whole operation take in the throttled cgroup?
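
A rough way to put a number on that (just a sketch, assuming the same throttled
shell and test file as in the report) is to make dd itself wait for the
writeback, so that its reported time covers the whole operation:

$ # conv=fsync makes dd flush the file data before exiting,
$ # so `real` includes the throttled writeback as well
$ time dd if=/dev/zero of=/data/file1 bs=512M count=1 conv=fsync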

> 
>    - Testing with wiops set to max:
>      ```
>      echo "8:0 wbps=10485760 wiops=max" > io.max
>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>      ```
>      ```
>      $ iostat -d 1 -h -y -p sda
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>     48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>     48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>     40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>     40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>     41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>     41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>     46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>     46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> 
> 
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>     55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
>     55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1
>      ```
> The iostat output shows the write operations as stabilizing at around 10 MB/s,
> which aligns with the defined limit of 10 MB/s. After setting wiops to max, the
> I/O limits appear to work as expected. 
> 
> 
> Thanks,
> Lance

Thanks for the report, Lance. Is this something you started seeing after
a kernel update or a switch to cgroup v2? (Or did you only notice it with this
particular setup?)


Michal



* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-12 15:43 ` Michal Koutný
@ 2024-08-13  1:37   ` Yu Kuai
  2024-08-13  5:00     ` Lance Yang
  2024-08-13  5:11   ` Lance Yang
  1 sibling, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2024-08-13  1:37 UTC (permalink / raw)
  To: Michal Koutný, Lance Yang
  Cc: linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, yukuai (C)

Hi,

在 2024/08/12 23:43, Michal Koutný 写道:
> +Cc Kuai
> 
> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
>> Hi all,
>>
>> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
>> I/O operations when I set both wbps and wiops for a device. However, if I only
>> set wbps, then everything works as expected.
>>
>> To reproduce the problem, we can follow these command-based steps:
>>
>> 1. **System Information:**
>>     - Kernel Version and OS Release:
>>       ```
>>       $ uname -r
>>       6.10.0-rc5+
>>
>>       $ cat /etc/os-release
>>       PRETTY_NAME="Ubuntu 24.04 LTS"
>>       NAME="Ubuntu"
>>       VERSION_ID="24.04"
>>       VERSION="24.04 LTS (Noble Numbat)"
>>       VERSION_CODENAME=noble
>>       ID=ubuntu
>>       ID_LIKE=debian
>>       HOME_URL="https://www.ubuntu.com/"
>>       SUPPORT_URL="https://help.ubuntu.com/"
>>       BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>       PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>>       UBUNTU_CODENAME=noble
>>       LOGO=ubuntu-logo
>>       ```
>>
>> 2. **Device Information and Settings:**
>>     - List Block Devices and Scheduler:
>>       ```
>>       $ lsblk
>>       NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
>>       sda     8:0    0   4.4T  0 disk
>>       └─sda1  8:1    0   4.4T  0 part /data
>>       ...
>>
>>       $ cat /sys/block/sda/queue/scheduler
>>       none [mq-deadline] kyber bfq
>>
>>       $ cat /sys/block/sda/queue/rotational
>>       1
>>       ```
>>
>> 3. **Reproducing the problem:**
>>     - Navigate to the cgroup v2 filesystem and configure I/O settings:
>>       ```
>>       $ cd /sys/fs/cgroup/
>>       $ stat -fc %T /sys/fs/cgroup
>>       cgroup2fs
>>       $ mkdir test
>>       $ echo "8:0 wbps=10485760 wiops=100000" > io.max
>>       ```
>>       In this setup:
>>       wbps=10485760 sets the write bytes per second limit to 10 MB/s.
>>       wiops=100000 sets the write I/O operations per second limit to 100,000.
>>
>>     - Add process to the cgroup and verify:
>>       ```
>>       $ echo $$ > cgroup.procs
>>       $ cat cgroup.procs
>>       3826771
>>       3828513
>>       $ ps -ef|grep 3826771
>>       root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
>>       root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
>>       root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
>>       ```
>>
>>     - Observe I/O performance using `dd` commands and `iostat`:
>>       ```
>>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &

You're testing buffered IO here, and I don't see that the writeback cgroup is
enabled. Is this test intentional? Why not test direct IO?
>>       ```
>>       ```
>>       $ iostat -d 1 -h -y -p sda
>>       
>> 	   tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>       7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
>>       7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>      21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
>>      21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
>>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>    1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
>>    1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1

Looks like all the dirty buffers got flushed to disk in the last second as the
file was closed; this is expected.
>>       ```
>> Initially, the write speed is slow (<2MB/s) then suddenly bursts to several
>> hundreds of MB/s.
> 
> What it would be on average?
> IOW how long would the whole operation in throttled cgroup take?
> 
>>
>>     - Testing with wiops set to max:
>>       ```
>>       echo "8:0 wbps=10485760 wiops=max" > io.max
>>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>       ```
>>       ```
>>       $ iostat -d 1 -h -y -p sda
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>      48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>      48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>      40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>      40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>      41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>      41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>      46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
>>      46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
>>
>>
>>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>>      55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
>>      55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1

And I don't think wiops=max is the reason. What needs to be explained is why
the dirty buffers got flushed to disk synchronously before dd finished and
closed the file.

>>       ```
>> The iostat output shows the write operations as stabilizing at around 10 MB/s,
>> which aligns with the defined limit of 10 MB/s. After setting wiops to max, the
>> I/O limits appear to work as expected.

Can you give direct IO a test? And also enable the writeback cgroup for
buffered IO.
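
As a rough sketch of the checks meant here (the paths assume the test cgroup
from the report and are only illustrative):

$ # per-cgroup writeback is attributed only when both the memory and io
$ # controllers are enabled for the cgroup
$ cat /sys/fs/cgroup/cgroup.subtree_control
$ cat /sys/fs/cgroup/test/cgroup.controllers
$ # direct IO variant of the same write, bypassing the page cache
$ dd if=/dev/zero of=/data/file1 bs=512M count=1 oflag=direct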

Thanks,
Kuai

>>
>>
>> Thanks,
>> Lance
> 
> Thanks for the report Lance. Is this something you started seeing after
> a kernel update or switch to cgroup v2? (Or you simply noticed with this
> setup only?)
> 
> 
> Michal
> 




* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  1:37   ` Yu Kuai
@ 2024-08-13  5:00     ` Lance Yang
  2024-08-13  6:17       ` Lance Yang
  2024-08-13  6:39       ` Yu Kuai
  0 siblings, 2 replies; 13+ messages in thread
From: Lance Yang @ 2024-08-13  5:00 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Michal Koutný,
	linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, yukuai (C)

Hi Kuai,

Thanks a lot for jumping in!

On Tue, Aug 13, 2024 at 9:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/08/12 23:43, Michal Koutný 写道:
> > +Cc Kuai
> >
> > On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> >> Hi all,
> >>
> >> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
> >> I/O operations when I set both wbps and wiops for a device. However, if I only
> >> set wbps, then everything works as expected.
> >>
> >> To reproduce the problem, we can follow these command-based steps:
> >>
> >> 1. **System Information:**
> >>     - Kernel Version and OS Release:
> >>       ```
> >>       $ uname -r
> >>       6.10.0-rc5+
> >>
> >>       $ cat /etc/os-release
> >>       PRETTY_NAME="Ubuntu 24.04 LTS"
> >>       NAME="Ubuntu"
> >>       VERSION_ID="24.04"
> >>       VERSION="24.04 LTS (Noble Numbat)"
> >>       VERSION_CODENAME=noble
> >>       ID=ubuntu
> >>       ID_LIKE=debian
> >>       HOME_URL="https://www.ubuntu.com/"
> >>       SUPPORT_URL="https://help.ubuntu.com/"
> >>       BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> >>       PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> >>       UBUNTU_CODENAME=noble
> >>       LOGO=ubuntu-logo
> >>       ```
> >>
> >> 2. **Device Information and Settings:**
> >>     - List Block Devices and Scheduler:
> >>       ```
> >>       $ lsblk
> >>       NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> >>       sda     8:0    0   4.4T  0 disk
> >>       └─sda1  8:1    0   4.4T  0 part /data
> >>       ...
> >>
> >>       $ cat /sys/block/sda/queue/scheduler
> >>       none [mq-deadline] kyber bfq
> >>
> >>       $ cat /sys/block/sda/queue/rotational
> >>       1
> >>       ```
> >>
> >> 3. **Reproducing the problem:**
> >>     - Navigate to the cgroup v2 filesystem and configure I/O settings:
> >>       ```
> >>       $ cd /sys/fs/cgroup/
> >>       $ stat -fc %T /sys/fs/cgroup
> >>       cgroup2fs
> >>       $ mkdir test
> >>       $ echo "8:0 wbps=10485760 wiops=100000" > io.max
> >>       ```
> >>       In this setup:
> >>       wbps=10485760 sets the write bytes per second limit to 10 MB/s.
> >>       wiops=100000 sets the write I/O operations per second limit to 100,000.
> >>
> >>     - Add process to the cgroup and verify:
> >>       ```
> >>       $ echo $$ > cgroup.procs
> >>       $ cat cgroup.procs
> >>       3826771
> >>       3828513
> >>       $ ps -ef|grep 3826771
> >>       root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
> >>       root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
> >>       root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
> >>       ```
> >>
> >>     - Observe I/O performance using `dd` commands and `iostat`:
> >>       ```
> >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>
> You're testing buffer IO here, and I don't see that write back cgroup is
> enabled. Is this test intentional? Why not test direct IO?

Yes, I was testing buffered I/O and can confirm that CONFIG_CGROUP_WRITEBACK
was enabled.

$ cat /boot/config-6.10.0-rc5+ |grep CONFIG_CGROUP_WRITEBACK
CONFIG_CGROUP_WRITEBACK=y

We intend to configure both wbps (write bytes per second) and wiops (write I/O
operations per second) for the containers. IIUC, this setup will effectively
restrict both their block device I/Os and buffered I/Os.

> Why not test direct IO?

I was testing direct IO as well. However, it did not work as expected with
`echo "8:0 wbps=10485760 wiops=100000" > io.max`.

$ time dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct
1+0 records in
1+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 51.5962 s, 10.4 MB/s

real 0m51.637s
user 0m0.000s
sys 0m0.313s

$ iostat -d 1 -h -y -p sda
      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    11.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    55.00         0.0k         1.8M         0.0k       0.0k       1.8M       0.0k sda
    55.00         0.0k         1.8M         0.0k       0.0k       1.8M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    14.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    14.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    14.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    14.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    13.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    13.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    13.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    13.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    18.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    18.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    12.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
  1804.00         0.0k       445.8M         0.0k       0.0k     445.8M       0.0k sda
  1804.00         0.0k       445.8M         0.0k       0.0k     445.8M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     4.00         0.0k        24.0k         0.0k       0.0k      24.0k       0.0k sda
     4.00         0.0k        24.0k         0.0k       0.0k      24.0k       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k sda
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k sda1

There are two things that confuse me. First, initially neither the wbps nor the
wiops limit was reached. Second, in the last second, the wbps far exceeded the
limit.

But if I only set wbps, then everything works as expected with
`echo "8:0 wbps=10485760 wiops=max" > io.max`

> >>       ```
> >>       ```
> >>       $ iostat -d 1 -h -y -p sda
> >>
> >>         tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>       7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
> >>       7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>      21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
> >>      21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>    1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
> >>    1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1
>
> Looks like all dirty buffer got flushed to disk at the last second while
> the file is closed, this is expected.

The dd command completed in less than a second, but flushing all the dirty
buffers to disk took much longer. By the time the flush completed, the file had
already been closed, IIUC (see the sketch after the iostat output below).

$ time dd if=/dev/zero of=/data/file5 bs=512M count=1
1+0 records in
1+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.531944 s, 1.0 GB/s

real 0m0.578s
user 0m0.000s
sys 0m0.576s

$ iostat -d 1 -h -y -p sda
      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k sda
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    74.00         0.0k       664.0k         0.0k       0.0k     664.0k       0.0k sda
    74.00         0.0k       664.0k         0.0k       0.0k     664.0k       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    15.00         0.0k         1.1M         0.0k       0.0k       1.1M       0.0k sda
    15.00         0.0k         1.1M         0.0k       0.0k       1.1M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    13.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    13.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    12.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    46.00         0.0k         1.7M         0.0k       0.0k       1.7M       0.0k sda
    46.00         0.0k         1.7M         0.0k       0.0k       1.7M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    11.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
     5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     6.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
    11.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    15.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    15.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
     8.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    15.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    15.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
    10.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    49.00         0.0k         1.6M         0.0k       0.0k       1.6M       0.0k sda
    49.00         0.0k         1.6M         0.0k       0.0k       1.6M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    53.00         0.0k         1.6M         0.0k       0.0k       1.6M       0.0k sda
    53.00         0.0k         1.6M         0.0k       0.0k       1.6M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
  1805.00         0.0k       448.4M         0.0k       0.0k     448.4M       0.0k sda
  1805.00         0.0k       448.4M         0.0k       0.0k     448.4M       0.0k sda1


      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k sda
     0.00         0.0k         0.0k         0.0k       0.0k       0.0k       0.0k sda1
> >>       ```
> >> Initially, the write speed is slow (<2MB/s) then suddenly bursts to several
> >> hundreds of MB/s.
> >
> > What it would be on average?
> > IOW how long would the whole operation in throttled cgroup take?
> >
> >>
> >>     - Testing with wiops set to max:
> >>       ```
> >>       echo "8:0 wbps=10485760 wiops=max" > io.max
> >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >>       ```
> >>       ```
> >>       $ iostat -d 1 -h -y -p sda
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>      48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >>      48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>      40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >>      40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>      41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >>      41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>      46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >>      46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >>
> >>
> >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >>      55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
> >>      55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1
>
> And I don't think wiops=max is the reason; what needs to be explained is
> why the dirty buffers got flushed to disk synchronously before dd finished
> and closed the file?

The dd command runs in the background, and it seems that the dirty buffers
begin to flush only after the command has completed.
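
One rough way to double-check that (just a sketch, not from the runs above) is to
watch the page-cache counters while dd runs, or to make dd wait for the flush itself:
```
# In another shell: Dirty pages pile up while dd runs, and Writeback only
# ramps up once the flusher kicks in.
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

# Or make dd wait for the flush, so its runtime includes the writeback:
time dd if=/dev/zero of=/data/file1 bs=512M count=1 conv=fsync
```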

>
> >>       ```
> >> The iostat output shows the write operations as stabilizing at around 10 MB/s,
> >> which aligns with the defined limit of 10 MB/s. After setting wiops to max, the
> >> I/O limits appear to work as expected.
>
> Can you give the direct IO a test? And also enable write back cgroup for
> buffer IO.
>
> Thanks,
> Kuai


Thanks a lot again for your time!
Lance

>
> >>
> >>
> >> Thanks,
> >> Lance
> >
> > Thanks for the report Lance. Is this something you started seeing after
> > a kernel update or switch to cgroup v2? (Or you simply noticed with this
> > setup only?)
> >
> >
> > Michal
> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-12 15:43 ` Michal Koutný
  2024-08-13  1:37   ` Yu Kuai
@ 2024-08-13  5:11   ` Lance Yang
  1 sibling, 0 replies; 13+ messages in thread
From: Lance Yang @ 2024-08-13  5:11 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, Yu Kuai

Hi Michal,

Thanks a lot for jumping in!

On Mon, Aug 12, 2024 at 11:43 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> +Cc Kuai
>
> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> > Hi all,
> >
> > I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
> > I/O operations when I set both wbps and wiops for a device. However, if I only
> > set wbps, then everything works as expected.
> >
> > To reproduce the problem, we can follow these command-based steps:
> >
> > 1. **System Information:**
> >    - Kernel Version and OS Release:
> >      ```
> >      $ uname -r
> >      6.10.0-rc5+
> >
> >      $ cat /etc/os-release
> >      PRETTY_NAME="Ubuntu 24.04 LTS"
> >      NAME="Ubuntu"
> >      VERSION_ID="24.04"
> >      VERSION="24.04 LTS (Noble Numbat)"
> >      VERSION_CODENAME=noble
> >      ID=ubuntu
> >      ID_LIKE=debian
> >      HOME_URL="https://www.ubuntu.com/"
> >      SUPPORT_URL="https://help.ubuntu.com/"
> >      BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> >      PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> >      UBUNTU_CODENAME=noble
> >      LOGO=ubuntu-logo
> >      ```
> >
> > 2. **Device Information and Settings:**
> >    - List Block Devices and Scheduler:
> >      ```
> >      $ lsblk
> >      NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> >      sda     8:0    0   4.4T  0 disk
> >      └─sda1  8:1    0   4.4T  0 part /data
> >      ...
> >
> >      $ cat /sys/block/sda/queue/scheduler
> >      none [mq-deadline] kyber bfq
> >
> >      $ cat /sys/block/sda/queue/rotational
> >      1
> >      ```
> >
> > 3. **Reproducing the problem:**
> >    - Navigate to the cgroup v2 filesystem and configure I/O settings:
> >      ```
> >      $ cd /sys/fs/cgroup/
> >      $ stat -fc %T /sys/fs/cgroup
> >      cgroup2fs
> >      $ mkdir test
> >      $ echo "8:0 wbps=10485760 wiops=100000" > io.max
> >      ```
> >      In this setup:
> >      wbps=10485760 sets the write bytes per second limit to 10 MB/s.
> >      wiops=100000 sets the write I/O operations per second limit to 100,000.
> >
> >    - Add process to the cgroup and verify:
> >      ```
> >      $ echo $$ > cgroup.procs
> >      $ cat cgroup.procs
> >      3826771
> >      3828513
> >      $ ps -ef|grep 3826771
> >      root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
> >      root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
> >      root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
> >      ```
> >
> >    - Observe I/O performance using `dd` commands and `iostat`:
> >      ```
> >      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >      ```
> >      ```
> >      $ iostat -d 1 -h -y -p sda
> >
> >          tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >      7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
> >      7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> >      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >     21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
> >     21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> >      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> >      5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >   1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
> >   1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1
> >      ```
> > Initially, the write speed is slow (<2MB/s) then suddenly bursts to several
> > hundreds of MB/s.
>
> What it would be on average?
> IOW how long would the whole operation in throttled cgroup take?
>
> >
> >    - Testing with wiops set to max:
> >      ```
> >      echo "8:0 wbps=10485760 wiops=max" > io.max
> >      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >      $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >      ```
> >      ```
> >      $ iostat -d 1 -h -y -p sda
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >     48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >     48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >     40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >     40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >     41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >     41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >     46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> >     46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> >
> >
> >       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> >     55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
> >     55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1
> >      ```
> > The iostat output shows the write operations as stabilizing at around 10 MB/s,
> > which aligns with the defined limit of 10 MB/s. After setting wiops to max, the
> > I/O limits appear to work as expected.
> >
> >
> > Thanks,
> > Lance
>
> Thanks for the report Lance. Is this something you started seeing after
> a kernel update or switch to cgroup v2? (Or you simply noticed with this
> setup only?)

I just switched to cgroup v2 to begin testing, as we intend to run our containers
under cgroup v2. I'm testing on both the 5.14.0 and mainline kernels ;)

Thanks again for your time!
Lance

>
>
> Michal


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  5:00     ` Lance Yang
@ 2024-08-13  6:17       ` Lance Yang
  2024-08-13  6:39       ` Yu Kuai
  1 sibling, 0 replies; 13+ messages in thread
From: Lance Yang @ 2024-08-13  6:17 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Michal Koutný,
	linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, yukuai (C)

I just realized that the bursts appear to depend on the configured wbps and wiops
limits when wiops is not set to `max`?
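
Something like this could be used to compare the limits (just a sketch; it assumes
the shell is already in the test cgroup's cgroup.procs, /data sits on the throttled
disk, and the output file name is a placeholder):
```
cd /sys/fs/cgroup/test
for wiops in 100 1000 10000 100000 max; do
    echo "8:0 wbps=10485760 wiops=$wiops" > io.max
    sync; echo 3 > /proc/sys/vm/drop_caches     # start each run with a clean page cache
    /usr/bin/time -f "wiops=$wiops took %e s" \
        dd if=/dev/zero of=/data/wiops-test bs=512M count=1 conv=fsync
done
```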

Thanks,
Lance

On Tue, Aug 13, 2024 at 1:00 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> Hi Kuai,
>
> Thanks a lot for jumping in!
>
> On Tue, Aug 13, 2024 at 9:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >
> > Hi,
> >
> > On 2024/08/12 23:43, Michal Koutný wrote:
> > > +Cc Kuai
> > >
> > > On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
> > >> Hi all,
> > >>
> > >> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
> > >> I/O operations when I set both wbps and wiops for a device. However, if I only
> > >> set wbps, then everything works as expected.
> > >>
> > >> To reproduce the problem, we can follow these command-based steps:
> > >>
> > >> 1. **System Information:**
> > >>     - Kernel Version and OS Release:
> > >>       ```
> > >>       $ uname -r
> > >>       6.10.0-rc5+
> > >>
> > >>       $ cat /etc/os-release
> > >>       PRETTY_NAME="Ubuntu 24.04 LTS"
> > >>       NAME="Ubuntu"
> > >>       VERSION_ID="24.04"
> > >>       VERSION="24.04 LTS (Noble Numbat)"
> > >>       VERSION_CODENAME=noble
> > >>       ID=ubuntu
> > >>       ID_LIKE=debian
> > >>       HOME_URL="https://www.ubuntu.com/"
> > >>       SUPPORT_URL="https://help.ubuntu.com/"
> > >>       BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> > >>       PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> > >>       UBUNTU_CODENAME=noble
> > >>       LOGO=ubuntu-logo
> > >>       ```
> > >>
> > >> 2. **Device Information and Settings:**
> > >>     - List Block Devices and Scheduler:
> > >>       ```
> > >>       $ lsblk
> > >>       NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> > >>       sda     8:0    0   4.4T  0 disk
> > >>       └─sda1  8:1    0   4.4T  0 part /data
> > >>       ...
> > >>
> > >>       $ cat /sys/block/sda/queue/scheduler
> > >>       none [mq-deadline] kyber bfq
> > >>
> > >>       $ cat /sys/block/sda/queue/rotational
> > >>       1
> > >>       ```
> > >>
> > >> 3. **Reproducing the problem:**
> > >>     - Navigate to the cgroup v2 filesystem and configure I/O settings:
> > >>       ```
> > >>       $ cd /sys/fs/cgroup/
> > >>       $ stat -fc %T /sys/fs/cgroup
> > >>       cgroup2fs
> > >>       $ mkdir test
> > >>       $ echo "8:0 wbps=10485760 wiops=100000" > io.max
> > >>       ```
> > >>       In this setup:
> > >>       wbps=10485760 sets the write bytes per second limit to 10 MB/s.
> > >>       wiops=100000 sets the write I/O operations per second limit to 100,000.
> > >>
> > >>     - Add process to the cgroup and verify:
> > >>       ```
> > >>       $ echo $$ > cgroup.procs
> > >>       $ cat cgroup.procs
> > >>       3826771
> > >>       3828513
> > >>       $ ps -ef|grep 3826771
> > >>       root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
> > >>       root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
> > >>       root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
> > >>       ```
> > >>
> > >>     - Observe I/O performance using `dd` commands and `iostat`:
> > >>       ```
> > >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> > >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >
> > You're testing buffer IO here, and I don't see that write back cgroup is
> > enabled. Is this test intentional? Why not test direct IO?
>
> Yes, I was testing buffered I/O and can confirm that CONFIG_CGROUP_WRITEBACK
> was enabled.
>
> $ cat /boot/config-6.10.0-rc5+ |grep CONFIG_CGROUP_WRITEBACK
> CONFIG_CGROUP_WRITEBACK=y
>
> We intend to configure both wbps (write bytes per second) and wiops
> (write I/O operations
> per second) for the containers. IIUC, this setup will effectively
> restrict both their block device
> I/Os and buffered I/Os.
>
> > Why not test direct IO?
>
> I was testing direct IO as well. However it did not work as expected with
> `echo "8:0 wbps=10485760 wiops=100000" > io.max`.
>
> $ time dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 536870912 bytes (537 MB, 512 MiB) copied, 51.5962 s, 10.4 MB/s
>
> real 0m51.637s
> user 0m0.000s
> sys 0m0.313s
>
> $ iostat -d 1 -h -y -p sda
>  tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
> kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda
>     11.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     55.00         0.0k         1.8M         0.0k       0.0k       1.8M
>       0.0k sda
>     55.00         0.0k         1.8M         0.0k       0.0k       1.8M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     14.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     14.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     14.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     14.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     13.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda
>     13.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     13.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda
>     13.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     18.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda
>     18.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda
>     12.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>   1804.00         0.0k       445.8M         0.0k       0.0k     445.8M
>       0.0k sda
>   1804.00         0.0k       445.8M         0.0k       0.0k     445.8M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      4.00         0.0k        24.0k         0.0k       0.0k      24.0k
>       0.0k sda
>      4.00         0.0k        24.0k         0.0k       0.0k      24.0k
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      0.00         0.0k         0.0k         0.0k       0.0k       0.0k
>       0.0k sda
>      0.00         0.0k         0.0k         0.0k       0.0k       0.0k
>       0.0k sda1
>
> There are two things that confuse me. First, initially, neither the
> wbps nor the wiops
> reached their limits. Second, in the last second, the wbps far
> exceeded the limit.
>
> But if I only set wbps, then everything works as expected with
> `echo "8:0 wbps=10485760 wiops=max" > io.max`
>
> > >>       ```
> > >>       ```
> > >>       $ iostat -d 1 -h -y -p sda
> > >>
> > >>         tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>       7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
> > >>       7.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> > >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>      21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda
> > >>      21.00         0.0k         1.4M         0.0k       0.0k       1.4M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> > >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda
> > >>       5.00         0.0k         1.2M         0.0k       0.0k       1.2M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>    1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda
> > >>    1848.00         0.0k       448.1M         0.0k       0.0k     448.1M       0.0k sda1
> >
> > Looks like all dirty buffer got flushed to disk at the last second while
> > the file is closed, this is expected.
>
> The dd command completed in less than a second, but flushing all the
> dirty buffers to
> disk took a much longer time. By the time the flushing was completed,
> the file had
> already been closed, IIUC.
>
> $ time dd if=/dev/zero of=/data/file5 bs=512M count=1
> 1+0 records in
> 1+0 records out
> 536870912 bytes (537 MB, 512 MiB) copied, 0.531944 s, 1.0 GB/s
>
> real 0m0.578s
> user 0m0.000s
> sys 0m0.576s
>
> $ iostat -d 1 -h -y -p sda
>    tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
> kB_dscd Device
>      0.00         0.0k         0.0k         0.0k       0.0k       0.0k
>       0.0k sda
>      0.00         0.0k         0.0k         0.0k       0.0k       0.0k
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     74.00         0.0k       664.0k         0.0k       0.0k     664.0k
>       0.0k sda
>     74.00         0.0k       664.0k         0.0k       0.0k     664.0k
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     15.00         0.0k         1.1M         0.0k       0.0k       1.1M
>       0.0k sda
>     15.00         0.0k         1.1M         0.0k       0.0k       1.1M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     13.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     13.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     12.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     46.00         0.0k         1.7M         0.0k       0.0k       1.7M
>       0.0k sda
>     46.00         0.0k         1.7M         0.0k       0.0k       1.7M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      7.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     11.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda
>      5.00         0.0k         1.2M         0.0k       0.0k       1.2M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      6.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     11.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda
>     11.00         0.0k         1.4M         0.0k       0.0k       1.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     15.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     15.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>      8.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     15.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     15.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda
>     10.00         0.0k         1.3M         0.0k       0.0k       1.3M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     49.00         0.0k         1.6M         0.0k       0.0k       1.6M
>       0.0k sda
>     49.00         0.0k         1.6M         0.0k       0.0k       1.6M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>     53.00         0.0k         1.6M         0.0k       0.0k       1.6M
>       0.0k sda
>     53.00         0.0k         1.6M         0.0k       0.0k       1.6M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>   1805.00         0.0k       448.4M         0.0k       0.0k     448.4M
>       0.0k sda
>   1805.00         0.0k       448.4M         0.0k       0.0k     448.4M
>       0.0k sda1
>
>
>       tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>    kB_dscd Device
>      0.00         0.0k         0.0k         0.0k       0.0k       0.0k
>       0.0k sda
>      0.00         0.0k         0.0k         0.0k       0.0k       0.0k
>       0.0k sda1
>
> > >>       ```
> > >> Initially, the write speed is slow (<2MB/s) then suddenly bursts to several
> > >> hundreds of MB/s.
> > >
> > > What it would be on average?
> > > IOW how long would the whole operation in throttled cgroup take?
> > >
> > >>
> > >>     - Testing with wiops set to max:
> > >>       ```
> > >>       echo "8:0 wbps=10485760 wiops=max" > io.max
> > >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> > >>       $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> > >>       ```
> > >>       ```
> > >>       $ iostat -d 1 -h -y -p sda
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>      48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> > >>      48.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>      40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> > >>      40.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>      41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> > >>      41.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>      46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda
> > >>      46.00         0.0k        10.0M         0.0k       0.0k      10.0M       0.0k sda1
> > >>
> > >>
> > >>        tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
> > >>      55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda
> > >>      55.00         0.0k        10.2M         0.0k       0.0k      10.2M       0.0k sda1
> >
> > And I don't think wiops=max is the reason; what needs to be explained is
> > why the dirty buffers got flushed to disk synchronously before dd finished
> > and closed the file?
>
> The dd command operates in the background, and it seems that the dirty
> buffers begin
> to flush after the command has completed.
>
> >
> > >>       ```
> > >> The iostat output shows the write operations as stabilizing at around 10 MB/s,
> > >> which aligns with the defined limit of 10 MB/s. After setting wiops to max, the
> > >> I/O limits appear to work as expected.
> >
> > Can you give the direct IO a test? And also enable write back cgroup for
> > buffer IO.
> >
> > Thanks,
> > Kuai
>
>
> Thanks a lot again for your time!
> Lance
>
> >
> > >>
> > >>
> > >> Thanks,
> > >> Lance
> > >
> > > Thanks for the report Lance. Is this something you started seeing after
> > > a kernel update or switch to cgroup v2? (Or you simply noticed with this
> > > setup only?)
> > >
> > >
> > > Michal
> > >
> >


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  5:00     ` Lance Yang
  2024-08-13  6:17       ` Lance Yang
@ 2024-08-13  6:39       ` Yu Kuai
  2024-08-13  7:19         ` Yu Kuai
  1 sibling, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2024-08-13  6:39 UTC (permalink / raw)
  To: Lance Yang, Yu Kuai
  Cc: Michal Koutný,
	linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, yukuai (C)

Hi,

On 2024/08/13 13:00, Lance Yang wrote:
> Hi Kuai,
> 
> Thanks a lot for jumping in!
> 
> On Tue, Aug 13, 2024 at 9:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2024/08/12 23:43, Michal Koutný wrote:
>>> +Cc Kuai
>>>
>>> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang <ioworker0@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I've run into a problem with Cgroup v2 where it doesn't seem to correctly limit
>>>> I/O operations when I set both wbps and wiops for a device. However, if I only
>>>> set wbps, then everything works as expected.
>>>>
>>>> To reproduce the problem, we can follow these command-based steps:
>>>>
>>>> 1. **System Information:**
>>>>      - Kernel Version and OS Release:
>>>>        ```
>>>>        $ uname -r
>>>>        6.10.0-rc5+
>>>>
>>>>        $ cat /etc/os-release
>>>>        PRETTY_NAME="Ubuntu 24.04 LTS"
>>>>        NAME="Ubuntu"
>>>>        VERSION_ID="24.04"
>>>>        VERSION="24.04 LTS (Noble Numbat)"
>>>>        VERSION_CODENAME=noble
>>>>        ID=ubuntu
>>>>        ID_LIKE=debian
>>>>        HOME_URL="https://www.ubuntu.com/"
>>>>        SUPPORT_URL="https://help.ubuntu.com/"
>>>>        BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>>>        PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
>>>>        UBUNTU_CODENAME=noble
>>>>        LOGO=ubuntu-logo
>>>>        ```
>>>>
>>>> 2. **Device Information and Settings:**
>>>>      - List Block Devices and Scheduler:
>>>>        ```
>>>>        $ lsblk
>>>>        NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
>>>>        sda     8:0    0   4.4T  0 disk
>>>>        └─sda1  8:1    0   4.4T  0 part /data
>>>>        ...
>>>>
>>>>        $ cat /sys/block/sda/queue/scheduler
>>>>        none [mq-deadline] kyber bfq
>>>>
>>>>        $ cat /sys/block/sda/queue/rotational
>>>>        1
>>>>        ```
>>>>
>>>> 3. **Reproducing the problem:**
>>>>      - Navigate to the cgroup v2 filesystem and configure I/O settings:
>>>>        ```
>>>>        $ cd /sys/fs/cgroup/
>>>>        $ stat -fc %T /sys/fs/cgroup
>>>>        cgroup2fs
>>>>        $ mkdir test
>>>>        $ echo "8:0 wbps=10485760 wiops=100000" > io.max
>>>>        ```
>>>>        In this setup:
>>>>        wbps=10485760 sets the write bytes per second limit to 10 MB/s.
>>>>        wiops=100000 sets the write I/O operations per second limit to 100,000.
>>>>
>>>>      - Add process to the cgroup and verify:
>>>>        ```
>>>>        $ echo $$ > cgroup.procs
>>>>        $ cat cgroup.procs
>>>>        3826771
>>>>        3828513
>>>>        $ ps -ef|grep 3826771
>>>>        root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
>>>>        root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
>>>>        root     3828762 3826771  0 22:06 pts/1    00:00:00 grep --color=auto 3826771
>>>>        ```
>>>>
>>>>      - Observe I/O performance using `dd` commands and `iostat`:
>>>>        ```
>>>>        $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>>>        $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>
>> You're testing buffer IO here, and I don't see that write back cgroup is
>> enabled. Is this test intentional? Why not test direct IO?
> 
> Yes, I was testing buffered I/O and can confirm that CONFIG_CGROUP_WRITEBACK
> was enabled.
> 
> $ cat /boot/config-6.10.0-rc5+ |grep CONFIG_CGROUP_WRITEBACK
> CONFIG_CGROUP_WRITEBACK=y
> 
> We intend to configure both wbps (write bytes per second) and wiops
> (write I/O operations
> per second) for the containers. IIUC, this setup will effectively
> restrict both their block device
> I/Os and buffered I/Os.
> 
>> Why not test direct IO?
> 
> I was testing direct IO as well. However it did not work as expected with
> `echo "8:0 wbps=10485760 wiops=100000" > io.max`.
> 
> $ time dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct

So you're issuing one huge 512M I/O.
> 1+0 records in
> 1+0 records out
> 536870912 bytes (537 MB, 512 MiB) copied, 51.5962 s, 10.4 MB/s

And this result looks correct. Please note that blk-throtl throttles I/O before
it is submitted, while iostat reports I/O that has completed. A huge I/O can be
throttled for a long time before it is ever dispatched.
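
For example (just a sketch, with the same limits and file as above), splitting the
write into 1 MiB direct I/Os lets each request fit within the per-second budget, so
iostat should show a steady ~10 MB/s instead of one huge request completing at the end:
```
# Same 512 MiB total, but as 512 x 1 MiB direct writes.
dd if=/dev/zero of=/data/file7 bs=1M count=512 oflag=direct
```
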
> 
> real 0m51.637s
> user 0m0.000s
> sys 0m0.313s
> 
> $ iostat -d 1 -h -y -p sda
>   tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
>       9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda
>       9.00         0.0k         1.3M         0.0k       0.0k       1.3M       0.0k sda1

What I don't understand yet is why there is so little I/O during the wait. Can you
test against a raw disk to bypass the filesystem?
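
Something along these lines would do, where /dev/sdX is a placeholder for a spare
scratch disk with nothing valuable on it:
```
# WARNING: this writes directly to the block device and destroys its contents.
# Use only a scratch disk, and point io.max at that disk's major:minor
# (see ls -l /dev/sdX) instead of 8:0.
dd if=/dev/zero of=/dev/sdX bs=1M count=512 oflag=direct
```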

Thanks,
Kuai



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  6:39       ` Yu Kuai
@ 2024-08-13  7:19         ` Yu Kuai
  2024-08-15  1:59           ` Lance Yang
                             ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Yu Kuai @ 2024-08-13  7:19 UTC (permalink / raw)
  To: Yu Kuai, Lance Yang
  Cc: Michal Koutný,
	linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, yukuai (C),
	yukuai (C)

Hi,

On 2024/08/13 14:39, Yu Kuai wrote:
> Hi,
> 
> On 2024/08/13 13:00, Lance Yang wrote:
>> Hi Kuai,
>>
>> Thanks a lot for jumping in!
>>
>> On Tue, Aug 13, 2024 at 9:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> Hi,
>>>
>>> 在 2024/08/12 23:43, Michal Koutný 写道:
>>>> +Cc Kuai
>>>>
>>>> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang 
>>>> <ioworker0@gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I've run into a problem with Cgroup v2 where it doesn't seem to 
>>>>> correctly limit
>>>>> I/O operations when I set both wbps and wiops for a device. 
>>>>> However, if I only
>>>>> set wbps, then everything works as expected.
>>>>>
>>>>> To reproduce the problem, we can follow these command-based steps:
>>>>>
>>>>> 1. **System Information:**
>>>>>      - Kernel Version and OS Release:
>>>>>        ```
>>>>>        $ uname -r
>>>>>        6.10.0-rc5+
>>>>>
>>>>>        $ cat /etc/os-release
>>>>>        PRETTY_NAME="Ubuntu 24.04 LTS"
>>>>>        NAME="Ubuntu"
>>>>>        VERSION_ID="24.04"
>>>>>        VERSION="24.04 LTS (Noble Numbat)"
>>>>>        VERSION_CODENAME=noble
>>>>>        ID=ubuntu
>>>>>        ID_LIKE=debian
>>>>>        HOME_URL="https://www.ubuntu.com/"
>>>>>        SUPPORT_URL="https://help.ubuntu.com/"
>>>>>        BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
>>>>>        
>>>>> PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 
>>>>>
>>>>>        UBUNTU_CODENAME=noble
>>>>>        LOGO=ubuntu-logo
>>>>>        ```
>>>>>
>>>>> 2. **Device Information and Settings:**
>>>>>      - List Block Devices and Scheduler:
>>>>>        ```
>>>>>        $ lsblk
>>>>>        NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
>>>>>        sda     8:0    0   4.4T  0 disk
>>>>>        └─sda1  8:1    0   4.4T  0 part /data
>>>>>        ...
>>>>>
>>>>>        $ cat /sys/block/sda/queue/scheduler
>>>>>        none [mq-deadline] kyber bfq
>>>>>
>>>>>        $ cat /sys/block/sda/queue/rotational
>>>>>        1
>>>>>        ```
>>>>>
>>>>> 3. **Reproducing the problem:**
>>>>>      - Navigate to the cgroup v2 filesystem and configure I/O 
>>>>> settings:
>>>>>        ```
>>>>>        $ cd /sys/fs/cgroup/
>>>>>        $ stat -fc %T /sys/fs/cgroup
>>>>>        cgroup2fs
>>>>>        $ mkdir test
>>>>>        $ echo "8:0 wbps=10485760 wiops=100000" > io.max
>>>>>        ```
>>>>>        In this setup:
>>>>>        wbps=10485760 sets the write bytes per second limit to 10 MB/s.
>>>>>        wiops=100000 sets the write I/O operations per second limit 
>>>>> to 100,000.
>>>>>
>>>>>      - Add process to the cgroup and verify:
>>>>>        ```
>>>>>        $ echo $$ > cgroup.procs
>>>>>        $ cat cgroup.procs
>>>>>        3826771
>>>>>        3828513
>>>>>        $ ps -ef|grep 3826771
>>>>>        root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
>>>>>        root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
>>>>>        root     3828762 3826771  0 22:06 pts/1    00:00:00 grep 
>>>>> --color=auto 3826771
>>>>>        ```
>>>>>
>>>>>      - Observe I/O performance using `dd` commands and `iostat`:
>>>>>        ```
>>>>>        $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>>>>        $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
>>>
>>> You're testing buffer IO here, and I don't see that write back cgroup is
>>> enabled. Is this test intentional? Why not test direct IO?
>>
>> Yes, I was testing buffered I/O and can confirm that 
>> CONFIG_CGROUP_WRITEBACK
>> was enabled.
>>
>> $ cat /boot/config-6.10.0-rc5+ |grep CONFIG_CGROUP_WRITEBACK
>> CONFIG_CGROUP_WRITEBACK=y
>>
>> We intend to configure both wbps (write bytes per second) and wiops
>> (write I/O operations
>> per second) for the containers. IIUC, this setup will effectively
>> restrict both their block device
>> I/Os and buffered I/Os.
>>
>>> Why not test direct IO?
>>
>> I was testing direct IO as well. However it did not work as expected with
>> `echo "8:0 wbps=10485760 wiops=100000" > io.max`.
>>
>> $ time dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct
> 
> So, you're issuing one huge IO, with 512M.
>> 1+0 records in
>> 1+0 records out
>> 536870912 bytes (537 MB, 512 MiB) copied, 51.5962 s, 10.4 MB/s
> 
> And this result looks correct. Please noted that blk-throtl works before
> IO submit, while iostat reports IO that are done. A huge IO can be
> throttled for a long time.
>>
>> real 0m51.637s
>> user 0m0.000s
>> sys 0m0.313s
>>
>> $ iostat -d 1 -h -y -p sda
>>   tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
>> kB_dscd Device
>>       9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>>        0.0k sda
>>       9.00         0.0k         1.3M         0.0k       0.0k       1.3M
>>        0.0k sda1
> 
> I don't understand yet is why there are few IO during the wait. Can you
> test for a raw disk to bypass filesystem?

As an update, I added a debug patch for this:

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index dc6140fa3de0..3b2648c17079 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1119,8 +1119,10 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)

         if (!bio_list_empty(&bio_list_on_stack)) {
                 blk_start_plug(&plug);
-               while ((bio = bio_list_pop(&bio_list_on_stack)))
+               while ((bio = bio_list_pop(&bio_list_on_stack))) {
+                       printk("%s: bio done %lu %px\n", __func__, bio_sectors(bio), bio);
                         submit_bio_noacct_nocheck(bio);
+               }
                 blk_finish_plug(&plug);
         }
  }
@@ -1606,6 +1608,8 @@ bool __blk_throtl_bio(struct bio *bio)
         bool throttled = false;
         struct throtl_data *td = tg->td;

+       printk("%s: bio start %lu %px\n", __func__, bio_sectors(bio), bio);
+
         rcu_read_lock();
         spin_lock_irq(&q->queue_lock);
         sq = &tg->service_queue;
@@ -1649,6 +1653,7 @@ bool __blk_throtl_bio(struct bio *bio)
                 tg = sq_to_tg(sq);
                 if (!tg) {
                         bio_set_flag(bio, BIO_BPS_THROTTLED);
+                       printk("%s: bio done %lu %px\n", __func__, bio_sectors(bio), bio);
                         goto out_unlock;
                 }
         }

For direct IO with a raw disk:

with or without wiops, the result is the same:

[  469.736098] __blk_throtl_bio: bio start 2128 ffff8881014c08c0
[  469.736903] __blk_throtl_bio: bio start 2144 ffff88817852ec80
[  469.737585] __blk_throtl_bio: bio start 2096 ffff88817852f080
[  469.738392] __blk_throtl_bio: bio start 2096 ffff88817852f480
[  469.739358] __blk_throtl_bio: bio start 2064 ffff88817852e880
[  469.740330] __blk_throtl_bio: bio start 2112 ffff88817852fa80
[  469.741262] __blk_throtl_bio: bio start 2080 ffff88817852e280
[  469.742280] __blk_throtl_bio: bio start 2096 ffff88817852e080
[  469.743281] __blk_throtl_bio: bio start 2104 ffff88817852f880
[  469.744309] __blk_throtl_bio: bio start 2240 ffff88817852e680
[  469.745050] __blk_throtl_bio: bio start 2184 ffff88817852e480
[  469.745857] __blk_throtl_bio: bio start 2120 ffff88817852f680
[  469.746779] __blk_throtl_bio: bio start 2512 ffff88817852fe80
[  469.747611] __blk_throtl_bio: bio start 2488 ffff88817852f280
[  469.748242] __blk_throtl_bio: bio start 2120 ffff88817852ee80
[  469.749159] __blk_throtl_bio: bio start 2256 ffff88817852fc80
[  469.750087] __blk_throtl_bio: bio start 2576 ffff88817852ea80
[  469.750802] __blk_throtl_bio: bio start 2112 ffff8881014a3a80
[  469.751586] __blk_throtl_bio: bio start 2240 ffff8881014a2880
[  469.752383] __blk_throtl_bio: bio start 2160 ffff8881014a2e80
[  469.753289] __blk_throtl_bio: bio start 2248 ffff8881014a3c80
[  469.754024] __blk_throtl_bio: bio start 2536 ffff8881014a2680
[  469.754913] __blk_throtl_bio: bio start 2088 ffff8881014a3080
[  469.766036] __blk_throtl_bio: bio start 211344 ffff8881014a3280
[  469.842366] blk_throtl_dispatch_work_fn: bio done 2128 ffff8881014c08c0
[  469.952627] blk_throtl_dispatch_work_fn: bio done 2144 ffff88817852ec80
[  470.048729] blk_throtl_dispatch_work_fn: bio done 2096 ffff88817852f080
[  470.152642] blk_throtl_dispatch_work_fn: bio done 2096 ffff88817852f480
[  470.256661] blk_throtl_dispatch_work_fn: bio done 2064 ffff88817852e880
[  470.360662] blk_throtl_dispatch_work_fn: bio done 2112 ffff88817852fa80
[  470.464626] blk_throtl_dispatch_work_fn: bio done 2080 ffff88817852e280
[  470.568652] blk_throtl_dispatch_work_fn: bio done 2096 ffff88817852e080
[  470.672623] blk_throtl_dispatch_work_fn: bio done 2104 ffff88817852f880
[  470.776620] blk_throtl_dispatch_work_fn: bio done 2240 ffff88817852e680
[  470.889801] blk_throtl_dispatch_work_fn: bio done 2184 ffff88817852e480
[  470.992686] blk_throtl_dispatch_work_fn: bio done 2120 ffff88817852f680
[  471.112633] blk_throtl_dispatch_work_fn: bio done 2512 ffff88817852fe80
[  471.232680] blk_throtl_dispatch_work_fn: bio done 2488 ffff88817852f280
[  471.336695] blk_throtl_dispatch_work_fn: bio done 2120 ffff88817852ee80
[  471.448645] blk_throtl_dispatch_work_fn: bio done 2256 ffff88817852fc80
[  471.576632] blk_throtl_dispatch_work_fn: bio done 2576 ffff88817852ea80
[  471.680709] blk_throtl_dispatch_work_fn: bio done 2112 ffff8881014a3a80
[  471.792680] blk_throtl_dispatch_work_fn: bio done 2240 ffff8881014a2880
[  471.896682] blk_throtl_dispatch_work_fn: bio done 2160 ffff8881014a2e80
[  472.008698] blk_throtl_dispatch_work_fn: bio done 2248 ffff8881014a3c80
[  472.136630] blk_throtl_dispatch_work_fn: bio done 2536 ffff8881014a2680
[  472.240678] blk_throtl_dispatch_work_fn: bio done 2088 ffff8881014a3080
[  482.560633] blk_throtl_dispatch_work_fn: bio done 211344 ffff8881014a3280

Hence the upper layer issues some small IOs first, followed by a 100+MB IO,
and the wait time looks correct.
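
As a rough cross-check of the timings: counting from the log above, the bios
add up to 262144 sectors (128 MiB), and the final bio alone is 211344 sectors
(~103 MiB), so with wbps=10485760 the expected drain times are about

$ echo $((211344 * 512 / 10485760))   # the ~103 MiB bio by itself, in seconds
10
$ echo $((262144 * 512 / 10485760))   # all bios in the trace combined, in seconds
12

which lines up with the ~12.8 s span between the first "bio start" at 469.736
and the last "bio done" at 482.560.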

Then, I retested with xfs; the results are still the same with or without wiops:

[ 1175.907019] __blk_throtl_bio: bio start 8192 ffff88816daf8480
[ 1175.908224] __blk_throtl_bio: bio start 8192 ffff88816daf8e80
[ 1175.910618] __blk_throtl_bio: bio start 8192 ffff88816daf9280
[ 1175.911991] __blk_throtl_bio: bio start 8192 ffff88816daf8280
[ 1175.913187] __blk_throtl_bio: bio start 8192 ffff88816daf9080
[ 1175.914904] __blk_throtl_bio: bio start 8192 ffff88816daf9680
[ 1175.916099] __blk_throtl_bio: bio start 8192 ffff88816daf8880
[ 1175.917844] __blk_throtl_bio: bio start 8192 ffff88816daf8c80
[ 1175.919025] __blk_throtl_bio: bio start 8192 ffff88816daf8a80
[ 1175.920868] __blk_throtl_bio: bio start 8192 ffff888178a84080
[ 1175.922068] __blk_throtl_bio: bio start 8192 ffff888178a84280
[ 1175.923819] __blk_throtl_bio: bio start 8192 ffff888178a84480
[ 1175.925017] __blk_throtl_bio: bio start 8192 ffff888178a84680
[ 1175.926851] __blk_throtl_bio: bio start 8192 ffff888178a84880
[ 1175.928025] __blk_throtl_bio: bio start 8192 ffff888178a84a80
[ 1175.929806] __blk_throtl_bio: bio start 8192 ffff888178a84c80
[ 1175.931007] __blk_throtl_bio: bio start 8192 ffff888178a84e80
[ 1175.932852] __blk_throtl_bio: bio start 8192 ffff888178a85080
[ 1175.934041] __blk_throtl_bio: bio start 8192 ffff888178a85280
[ 1175.935892] __blk_throtl_bio: bio start 8192 ffff888178a85480
[ 1175.937074] __blk_throtl_bio: bio start 8192 ffff888178a85680
[ 1175.938860] __blk_throtl_bio: bio start 8192 ffff888178a85880
[ 1175.940053] __blk_throtl_bio: bio start 8192 ffff888178a85a80
[ 1175.941824] __blk_throtl_bio: bio start 8192 ffff888178a85c80
[ 1175.943040] __blk_throtl_bio: bio start 8192 ffff888178a85e80
[ 1175.944945] __blk_throtl_bio: bio start 8192 ffff88816b046080
[ 1175.946156] __blk_throtl_bio: bio start 8192 ffff88816b046280
[ 1175.948261] __blk_throtl_bio: bio start 8192 ffff88816b046480
[ 1175.949521] __blk_throtl_bio: bio start 8192 ffff88816b046680
[ 1175.950877] __blk_throtl_bio: bio start 8192 ffff88816b046880
[ 1175.952051] __blk_throtl_bio: bio start 8192 ffff88816b046a80
[ 1175.954313] __blk_throtl_bio: bio start 8192 ffff88816b046c80
[ 1175.955530] __blk_throtl_bio: bio start 8192 ffff88816b046e80
[ 1175.957370] __blk_throtl_bio: bio start 8192 ffff88816b047080
[ 1175.958818] __blk_throtl_bio: bio start 8192 ffff88816b047280
[ 1175.960093] __blk_throtl_bio: bio start 8192 ffff88816b047480
[ 1175.961900] __blk_throtl_bio: bio start 8192 ffff88816b047680
[ 1175.963070] __blk_throtl_bio: bio start 8192 ffff88816b047880
[ 1175.965262] __blk_throtl_bio: bio start 8192 ffff88816b047a80
[ 1175.966527] __blk_throtl_bio: bio start 8192 ffff88816b047c80
[ 1175.967928] __blk_throtl_bio: bio start 8192 ffff88816b047e80
[ 1175.969124] __blk_throtl_bio: bio start 8192 ffff888170e84080
[ 1175.971369] __blk_throtl_bio: bio start 8192 ffff888170e84280


Hence xfs is always issuing 4MB IOs, which is why a stable wbps can be
observed by iostat. The main difference is that a 100+MB IO is issued
in the last test and throttled for about 10+ seconds.

Then for your case, you might want to confirm what kind of IOs are
submitted from the upper layer.
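
If it helps, one rough way to check that from userspace, without the debug
patch, could be to sample the cgroup's io.stat counters around the test (using
the /sys/fs/cgroup/test cgroup from your report; io.stat may account bios at a
different point than __blk_throtl_bio, e.g. after splitting, so treat the
derived average as an approximation only):

$ cat /sys/fs/cgroup/test/io.stat    # note wbytes= and wios= for the device
$ dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct
$ cat /sys/fs/cgroup/test/io.stat    # delta(wbytes) / delta(wios) ~= average write IO size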

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
> 
> .
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  7:19         ` Yu Kuai
@ 2024-08-15  1:59           ` Lance Yang
  2024-08-23 12:05           ` Lance Yang
  2024-08-23 12:19           ` Lance Yang
  2 siblings, 0 replies; 13+ messages in thread
From: Lance Yang @ 2024-08-15  1:59 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Michal Koutný,
	linux-mm, linux-kernel, linux-block, cgroups, josef, tj,
	fujita.tomonori, boqun.feng, a.hindborg, paolo.valente, axboe,
	vbabka, david, 21cnbao, baolin.wang, libang.li, yukuai (C)

On Tue, Aug 13, 2024 at 3:19 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/08/13 14:39, Yu Kuai 写道:
> > Hi,
> >
> > 在 2024/08/13 13:00, Lance Yang 写道:
> >> Hi Kuai,
> >>
> >> Thanks a lot for jumping in!
> >>
> >> On Tue, Aug 13, 2024 at 9:37 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> 在 2024/08/12 23:43, Michal Koutný 写道:
> >>>> +Cc Kuai
> >>>>
> >>>> On Mon, Aug 12, 2024 at 11:00:30PM GMT, Lance Yang
> >>>> <ioworker0@gmail.com> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I've run into a problem with Cgroup v2 where it doesn't seem to
> >>>>> correctly limit
> >>>>> I/O operations when I set both wbps and wiops for a device.
> >>>>> However, if I only
> >>>>> set wbps, then everything works as expected.
> >>>>>
> >>>>> To reproduce the problem, we can follow these command-based steps:
> >>>>>
> >>>>> 1. **System Information:**
> >>>>>      - Kernel Version and OS Release:
> >>>>>        ```
> >>>>>        $ uname -r
> >>>>>        6.10.0-rc5+
> >>>>>
> >>>>>        $ cat /etc/os-release
> >>>>>        PRETTY_NAME="Ubuntu 24.04 LTS"
> >>>>>        NAME="Ubuntu"
> >>>>>        VERSION_ID="24.04"
> >>>>>        VERSION="24.04 LTS (Noble Numbat)"
> >>>>>        VERSION_CODENAME=noble
> >>>>>        ID=ubuntu
> >>>>>        ID_LIKE=debian
> >>>>>        HOME_URL="https://www.ubuntu.com/"
> >>>>>        SUPPORT_URL="https://help.ubuntu.com/"
> >>>>>        BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> >>>>>
> >>>>> PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> >>>>>
> >>>>>        UBUNTU_CODENAME=noble
> >>>>>        LOGO=ubuntu-logo
> >>>>>        ```
> >>>>>
> >>>>> 2. **Device Information and Settings:**
> >>>>>      - List Block Devices and Scheduler:
> >>>>>        ```
> >>>>>        $ lsblk
> >>>>>        NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> >>>>>        sda     8:0    0   4.4T  0 disk
> >>>>>        └─sda1  8:1    0   4.4T  0 part /data
> >>>>>        ...
> >>>>>
> >>>>>        $ cat /sys/block/sda/queue/scheduler
> >>>>>        none [mq-deadline] kyber bfq
> >>>>>
> >>>>>        $ cat /sys/block/sda/queue/rotational
> >>>>>        1
> >>>>>        ```
> >>>>>
> >>>>> 3. **Reproducing the problem:**
> >>>>>      - Navigate to the cgroup v2 filesystem and configure I/O
> >>>>> settings:
> >>>>>        ```
> >>>>>        $ cd /sys/fs/cgroup/
> >>>>>        $ stat -fc %T /sys/fs/cgroup
> >>>>>        cgroup2fs
> >>>>>        $ mkdir test
> >>>>>        $ echo "8:0 wbps=10485760 wiops=100000" > io.max
> >>>>>        ```
> >>>>>        In this setup:
> >>>>>        wbps=10485760 sets the write bytes per second limit to 10 MB/s.
> >>>>>        wiops=100000 sets the write I/O operations per second limit
> >>>>> to 100,000.
> >>>>>
> >>>>>      - Add process to the cgroup and verify:
> >>>>>        ```
> >>>>>        $ echo $$ > cgroup.procs
> >>>>>        $ cat cgroup.procs
> >>>>>        3826771
> >>>>>        3828513
> >>>>>        $ ps -ef|grep 3826771
> >>>>>        root     3826771 3826768  0 22:04 pts/1    00:00:00 -bash
> >>>>>        root     3828761 3826771  0 22:06 pts/1    00:00:00 ps -ef
> >>>>>        root     3828762 3826771  0 22:06 pts/1    00:00:00 grep
> >>>>> --color=auto 3826771
> >>>>>        ```
> >>>>>
> >>>>>      - Observe I/O performance using `dd` commands and `iostat`:
> >>>>>        ```
> >>>>>        $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >>>>>        $ dd if=/dev/zero of=/data/file1 bs=512M count=1 &
> >>>
> >>> You're testing buffer IO here, and I don't see that write back cgroup is
> >>> enabled. Is this test intentional? Why not test direct IO?
> >>
> >> Yes, I was testing buffered I/O and can confirm that
> >> CONFIG_CGROUP_WRITEBACK
> >> was enabled.
> >>
> >> $ cat /boot/config-6.10.0-rc5+ |grep CONFIG_CGROUP_WRITEBACK
> >> CONFIG_CGROUP_WRITEBACK=y
> >>
> >> We intend to configure both wbps (write bytes per second) and wiops
> >> (write I/O operations
> >> per second) for the containers. IIUC, this setup will effectively
> >> restrict both their block device
> >> I/Os and buffered I/Os.
> >>
> >>> Why not test direct IO?
> >>
> >> I was testing direct IO as well. However it did not work as expected with
> >> `echo "8:0 wbps=10485760 wiops=100000" > io.max`.
> >>
> >> $ time dd if=/dev/zero of=/data/file7 bs=512M count=1 oflag=direct
> >
> > So, you're issuing one huge IO, with 512M.
> >> 1+0 records in
> >> 1+0 records out
> >> 536870912 bytes (537 MB, 512 MiB) copied, 51.5962 s, 10.4 MB/s
> >
> > And this result looks correct. Please noted that blk-throtl works before
> > IO submit, while iostat reports IO that are done. A huge IO can be
> > throttled for a long time.
> >>
> >> real 0m51.637s
> >> user 0m0.000s
> >> sys 0m0.313s
> >>
> >> $ iostat -d 1 -h -y -p sda
> >>   tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn
> >> kB_dscd Device
> >>       9.00         0.0k         1.3M         0.0k       0.0k       1.3M
> >>        0.0k sda
> >>       9.00         0.0k         1.3M         0.0k       0.0k       1.3M
> >>        0.0k sda1
> >
> > I don't understand yet is why there are few IO during the wait. Can you
> > test for a raw disk to bypass filesystem?
>
> To be updated, I add a debug patch for this:

Kuai, sorry for the delayed response ;(

I'll give this debug patch a try, run some tests on a raw disk to bypass
the file system as well, and get back to you ASAP.

Thanks a lot for reaching out!
Lance

>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index dc6140fa3de0..3b2648c17079 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1119,8 +1119,10 @@ static void blk_throtl_dispatch_work_fn(struct
> work_struct *work)
>
>          if (!bio_list_empty(&bio_list_on_stack)) {
>                  blk_start_plug(&plug);
> -               while ((bio = bio_list_pop(&bio_list_on_stack)))
> +               while ((bio = bio_list_pop(&bio_list_on_stack))) {
> +                       printk("%s: bio done %lu %px\n", __func__,
> bio_sectors(bio), bio);
>                          submit_bio_noacct_nocheck(bio);
> +               }
>                  blk_finish_plug(&plug);
>          }
>   }
> @@ -1606,6 +1608,8 @@ bool __blk_throtl_bio(struct bio *bio)
>          bool throttled = false;
>          struct throtl_data *td = tg->td;
>
> +       printk("%s: bio start %lu %px\n", __func__, bio_sectors(bio), bio);
> +
>          rcu_read_lock();
>          spin_lock_irq(&q->queue_lock);
>          sq = &tg->service_queue;
> @@ -1649,6 +1653,7 @@ bool __blk_throtl_bio(struct bio *bio)
>                  tg = sq_to_tg(sq);
>                  if (!tg) {
>                          bio_set_flag(bio, BIO_BPS_THROTTLED);
> +                       printk("%s: bio done %lu %px\n", __func__,
> bio_sectors(bio), bio);
>                          goto out_unlock;
>                  }
>          }
>
> For dirct IO with raw disk:
>
> with or without wiops, the result is the same:
>
> [  469.736098] __blk_throtl_bio: bio start 2128 ffff8881014c08c0
> [  469.736903] __blk_throtl_bio: bio start 2144 ffff88817852ec80
> [  469.737585] __blk_throtl_bio: bio start 2096 ffff88817852f080
> [  469.738392] __blk_throtl_bio: bio start 2096 ffff88817852f480
> [  469.739358] __blk_throtl_bio: bio start 2064 ffff88817852e880
> [  469.740330] __blk_throtl_bio: bio start 2112 ffff88817852fa80
> [  469.741262] __blk_throtl_bio: bio start 2080 ffff88817852e280
> [  469.742280] __blk_throtl_bio: bio start 2096 ffff88817852e080
> [  469.743281] __blk_throtl_bio: bio start 2104 ffff88817852f880
> [  469.744309] __blk_throtl_bio: bio start 2240 ffff88817852e680
> [  469.745050] __blk_throtl_bio: bio start 2184 ffff88817852e480
> [  469.745857] __blk_throtl_bio: bio start 2120 ffff88817852f680
> [  469.746779] __blk_throtl_bio: bio start 2512 ffff88817852fe80
> [  469.747611] __blk_throtl_bio: bio start 2488 ffff88817852f280
> [  469.748242] __blk_throtl_bio: bio start 2120 ffff88817852ee80
> [  469.749159] __blk_throtl_bio: bio start 2256 ffff88817852fc80
> [  469.750087] __blk_throtl_bio: bio start 2576 ffff88817852ea80
> [  469.750802] __blk_throtl_bio: bio start 2112 ffff8881014a3a80
> [  469.751586] __blk_throtl_bio: bio start 2240 ffff8881014a2880
> [  469.752383] __blk_throtl_bio: bio start 2160 ffff8881014a2e80
> [  469.753289] __blk_throtl_bio: bio start 2248 ffff8881014a3c80
> [  469.754024] __blk_throtl_bio: bio start 2536 ffff8881014a2680
> [  469.754913] __blk_throtl_bio: bio start 2088 ffff8881014a3080
> [  469.766036] __blk_throtl_bio: bio start 211344 ffff8881014a3280
> [  469.842366] blk_throtl_dispatch_work_fn: bio done 2128 ffff8881014c08c0
> [  469.952627] blk_throtl_dispatch_work_fn: bio done 2144 ffff88817852ec80
> [  470.048729] blk_throtl_dispatch_work_fn: bio done 2096 ffff88817852f080
> [  470.152642] blk_throtl_dispatch_work_fn: bio done 2096 ffff88817852f480
> [  470.256661] blk_throtl_dispatch_work_fn: bio done 2064 ffff88817852e880
> [  470.360662] blk_throtl_dispatch_work_fn: bio done 2112 ffff88817852fa80
> [  470.464626] blk_throtl_dispatch_work_fn: bio done 2080 ffff88817852e280
> [  470.568652] blk_throtl_dispatch_work_fn: bio done 2096 ffff88817852e080
> [  470.672623] blk_throtl_dispatch_work_fn: bio done 2104 ffff88817852f880
> [  470.776620] blk_throtl_dispatch_work_fn: bio done 2240 ffff88817852e680
> [  470.889801] blk_throtl_dispatch_work_fn: bio done 2184 ffff88817852e480
> [  470.992686] blk_throtl_dispatch_work_fn: bio done 2120 ffff88817852f680
> [  471.112633] blk_throtl_dispatch_work_fn: bio done 2512 ffff88817852fe80
> [  471.232680] blk_throtl_dispatch_work_fn: bio done 2488 ffff88817852f280
> [  471.336695] blk_throtl_dispatch_work_fn: bio done 2120 ffff88817852ee80
> [  471.448645] blk_throtl_dispatch_work_fn: bio done 2256 ffff88817852fc80
> [  471.576632] blk_throtl_dispatch_work_fn: bio done 2576 ffff88817852ea80
> [  471.680709] blk_throtl_dispatch_work_fn: bio done 2112 ffff8881014a3a80
> [  471.792680] blk_throtl_dispatch_work_fn: bio done 2240 ffff8881014a2880
> [  471.896682] blk_throtl_dispatch_work_fn: bio done 2160 ffff8881014a2e80
> [  472.008698] blk_throtl_dispatch_work_fn: bio done 2248 ffff8881014a3c80
> [  472.136630] blk_throtl_dispatch_work_fn: bio done 2536 ffff8881014a2680
> [  472.240678] blk_throtl_dispatch_work_fn: bio done 2088 ffff8881014a3080
> [  482.560633] blk_throtl_dispatch_work_fn: bio done 211344 ffff8881014a3280
>
> Hence the upper layer issue some small IO first, then with a 100+MB IO,
> and wait time looks correct.
>
> Then, I retest for xfs, result are still the same with or without wiops:
>
> [ 1175.907019] __blk_throtl_bio: bio start 8192 ffff88816daf8480
> [ 1175.908224] __blk_throtl_bio: bio start 8192 ffff88816daf8e80
> [ 1175.910618] __blk_throtl_bio: bio start 8192 ffff88816daf9280
> [ 1175.911991] __blk_throtl_bio: bio start 8192 ffff88816daf8280
> [ 1175.913187] __blk_throtl_bio: bio start 8192 ffff88816daf9080
> [ 1175.914904] __blk_throtl_bio: bio start 8192 ffff88816daf9680
> [ 1175.916099] __blk_throtl_bio: bio start 8192 ffff88816daf8880
> [ 1175.917844] __blk_throtl_bio: bio start 8192 ffff88816daf8c80
> [ 1175.919025] __blk_throtl_bio: bio start 8192 ffff88816daf8a80
> [ 1175.920868] __blk_throtl_bio: bio start 8192 ffff888178a84080
> [ 1175.922068] __blk_throtl_bio: bio start 8192 ffff888178a84280
> [ 1175.923819] __blk_throtl_bio: bio start 8192 ffff888178a84480
> [ 1175.925017] __blk_throtl_bio: bio start 8192 ffff888178a84680
> [ 1175.926851] __blk_throtl_bio: bio start 8192 ffff888178a84880
> [ 1175.928025] __blk_throtl_bio: bio start 8192 ffff888178a84a80
> [ 1175.929806] __blk_throtl_bio: bio start 8192 ffff888178a84c80
> [ 1175.931007] __blk_throtl_bio: bio start 8192 ffff888178a84e80
> [ 1175.932852] __blk_throtl_bio: bio start 8192 ffff888178a85080
> [ 1175.934041] __blk_throtl_bio: bio start 8192 ffff888178a85280
> [ 1175.935892] __blk_throtl_bio: bio start 8192 ffff888178a85480
> [ 1175.937074] __blk_throtl_bio: bio start 8192 ffff888178a85680
> [ 1175.938860] __blk_throtl_bio: bio start 8192 ffff888178a85880
> [ 1175.940053] __blk_throtl_bio: bio start 8192 ffff888178a85a80
> [ 1175.941824] __blk_throtl_bio: bio start 8192 ffff888178a85c80
> [ 1175.943040] __blk_throtl_bio: bio start 8192 ffff888178a85e80
> [ 1175.944945] __blk_throtl_bio: bio start 8192 ffff88816b046080
> [ 1175.946156] __blk_throtl_bio: bio start 8192 ffff88816b046280
> [ 1175.948261] __blk_throtl_bio: bio start 8192 ffff88816b046480
> [ 1175.949521] __blk_throtl_bio: bio start 8192 ffff88816b046680
> [ 1175.950877] __blk_throtl_bio: bio start 8192 ffff88816b046880
> [ 1175.952051] __blk_throtl_bio: bio start 8192 ffff88816b046a80
> [ 1175.954313] __blk_throtl_bio: bio start 8192 ffff88816b046c80
> [ 1175.955530] __blk_throtl_bio: bio start 8192 ffff88816b046e80
> [ 1175.957370] __blk_throtl_bio: bio start 8192 ffff88816b047080
> [ 1175.958818] __blk_throtl_bio: bio start 8192 ffff88816b047280
> [ 1175.960093] __blk_throtl_bio: bio start 8192 ffff88816b047480
> [ 1175.961900] __blk_throtl_bio: bio start 8192 ffff88816b047680
> [ 1175.963070] __blk_throtl_bio: bio start 8192 ffff88816b047880
> [ 1175.965262] __blk_throtl_bio: bio start 8192 ffff88816b047a80
> [ 1175.966527] __blk_throtl_bio: bio start 8192 ffff88816b047c80
> [ 1175.967928] __blk_throtl_bio: bio start 8192 ffff88816b047e80
> [ 1175.969124] __blk_throtl_bio: bio start 8192 ffff888170e84080
> [ 1175.971369] __blk_throtl_bio: bio start 8192 ffff888170e84280
>
>
> Hence xfs is always issuing 4MB IO, that's whay stable wbps can be
> observed by iostat. The main difference is that a 100+MB IO is issued
> from the last test and throttle for about 10+s.
>
> Then for your case, you might want to comfirm what kind of IO are
> submitted from upper layer.
>
> Thanks,
> Kuai
> >
> > Thanks,
> > Kuai
> >
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  7:19         ` Yu Kuai
  2024-08-15  1:59           ` Lance Yang
@ 2024-08-23 12:05           ` Lance Yang
  2024-08-26  1:31             ` Yu Kuai
  2024-08-23 12:19           ` Lance Yang
  2 siblings, 1 reply; 13+ messages in thread
From: Lance Yang @ 2024-08-23 12:05 UTC (permalink / raw)
  To: yukuai1
  Cc: 21cnbao, a.hindborg, axboe, baolin.wang, boqun.feng, cgroups,
	david, fujita.tomonori, ioworker0, josef, libang.li, linux-block,
	linux-kernel, linux-mm, mkoutny, paolo.valente, tj, vbabka,
	yukuai3

My bad, I got tied up with some stuff :(

Hmm... I tried your debug patch today, but my test results are different from
yours. So let's take a look at direct IO on a raw disk first.

```
$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   90G  0 disk
├─sda1   8:1    0    1G  0 part /boot/efi
└─sda2   8:2    0 88.9G  0 part /
sdb      8:16   0   10G  0 disk

$ cat  /sys/block/sda/queue/scheduler
none [mq-deadline]

$ cat  /sys/block/sda/queue/rotational
0

$ cat  /sys/block/sdb/queue/rotational
0

$ cat  /sys/block/sdb/queue/scheduler
none [mq-deadline]

$ cat /boot/config-6.11.0-rc3+ |grep CONFIG_CGROUP_
# CONFIG_CGROUP_FAVOR_DYNMODS is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
CONFIG_CGROUP_MISC=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y

$ cd /sys/fs/cgroup/test/ && cat cgroup.controllers
cpu io memory pids

$ cat io.weight
default 100

$ cat io.prio.class
no-change
```

With wiops, the result is as follows:

```
$ echo "8:16 wbps=10485760 wiops=100000" > io.max

$ dd if=/dev/zero of=/dev/sdb bs=50M count=1 oflag=direct
1+0 records in
1+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 5.05893 s, 10.4 MB/s

$ dmesg -T
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 2984 ffff0000fb3a8f00
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 6176 ffff0000fb3a97c0
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 7224 ffff0000fb3a9180
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8640
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9400
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8c80
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9040
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a92c0
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4096 ffff0000fb3a8000
[Fri Aug 23 11:04:08 2024] blk_throtl_dispatch_work_fn: bio done 2984 ffff0000fb3a8f00
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 1400 ffff0000fb3a8f00
[Fri Aug 23 11:04:08 2024] blk_throtl_dispatch_work_fn: bio done 6176 ffff0000fb3a97c0
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 3616 ffff0000fb3a97c0
[Fri Aug 23 11:04:08 2024] blk_throtl_dispatch_work_fn: bio done 7224 ffff0000fb3a9180
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4664 ffff0000fb3a9180
[Fri Aug 23 11:04:09 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a8640
[Fri Aug 23 11:04:09 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a8640
[Fri Aug 23 11:04:10 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a9400
[Fri Aug 23 11:04:10 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a9400
[Fri Aug 23 11:04:11 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a8c80
[Fri Aug 23 11:04:11 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a8c80
[Fri Aug 23 11:04:12 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a9040
[Fri Aug 23 11:04:12 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a9040
[Fri Aug 23 11:04:12 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a92c0
[Fri Aug 23 11:04:12 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 4096 ffff0000fb3a8000
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1536 ffff0000fb3a8000
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1536 ffff0000fb3a8000
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 1400 ffff0000fb3a8f00
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 3616 ffff0000fb3a97c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1056 ffff0000fb3a97c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1056 ffff0000fb3a97c0
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 4664 ffff0000fb3a9180
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 2104 ffff0000fb3a9180
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 2104 ffff0000fb3a9180
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a92c0
```

And without wiops, the result is quite different:

```
$ echo "8:16 wbps=10485760 wiops=max" > io.max

$ dd if=/dev/zero of=/dev/sdb bs=50M count=1 oflag=direct
1+0 records in
1+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 5.08187 s, 10.3 MB/s

$ dmesg -T
[Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 2880 ffff0000c74659c0
[Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 6992 ffff00014f621b80
[Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 92528 ffff00014f620dc0
[Fri Aug 23 10:59:10 2024] blk_throtl_dispatch_work_fn: bio done 2880 ffff0000c74659c0
[Fri Aug 23 10:59:11 2024] blk_throtl_dispatch_work_fn: bio done 6992 ffff00014f621b80
[Fri Aug 23 10:59:15 2024] blk_throtl_dispatch_work_fn: bio done 92528 ffff00014f620dc0
```

Then, I retested on ext4, as you did with xfs.

```
$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   90G  0 disk
├─sda1   8:1    0    1G  0 part /boot/efi
└─sda2   8:2    0 88.9G  0 part /
sdb      8:16   0   10G  0 disk

$ df -T /data
Filesystem     Type 1K-blocks     Used Available Use% Mounted on
/dev/sda2      ext4  91222760 54648704  31894224  64% /
```

With wiops, the result is as follows:

```
$ echo "8:0 wbps=10485760 wiops=100000" > io.max

$ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1 oflag=direct
1+0 records in
1+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 5.06227 s, 10.4 MB/s

$ dmesg -T
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 2984 ffff0000fb3a8f00
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 6176 ffff0000fb3a97c0
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 7224 ffff0000fb3a9180
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8640
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9400
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8c80
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9040
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a92c0
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4096 ffff0000fb3a8000
[Fri Aug 23 11:04:08 2024] blk_throtl_dispatch_work_fn: bio done 2984 ffff0000fb3a8f00
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 1400 ffff0000fb3a8f00
[Fri Aug 23 11:04:08 2024] blk_throtl_dispatch_work_fn: bio done 6176 ffff0000fb3a97c0
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 3616 ffff0000fb3a97c0
[Fri Aug 23 11:04:08 2024] blk_throtl_dispatch_work_fn: bio done 7224 ffff0000fb3a9180
[Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4664 ffff0000fb3a9180
[Fri Aug 23 11:04:09 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a8640
[Fri Aug 23 11:04:09 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a8640
[Fri Aug 23 11:04:10 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a9400
[Fri Aug 23 11:04:10 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a9400
[Fri Aug 23 11:04:11 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a8c80
[Fri Aug 23 11:04:11 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a8c80
[Fri Aug 23 11:04:12 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a9040
[Fri Aug 23 11:04:12 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a9040
[Fri Aug 23 11:04:12 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000fb3a92c0
[Fri Aug 23 11:04:12 2024] __blk_throtl_bio: bio start 13824 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 4096 ffff0000fb3a8000
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1536 ffff0000fb3a8000
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1536 ffff0000fb3a8000
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 1400 ffff0000fb3a8f00
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 3616 ffff0000fb3a97c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1056 ffff0000fb3a97c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1056 ffff0000fb3a97c0
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 4664 ffff0000fb3a9180
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 2104 ffff0000fb3a9180
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 2104 ffff0000fb3a9180
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a8640
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a9400
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a8c80
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a9040
[Fri Aug 23 11:04:13 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 11264 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 11264 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 8704 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 8704 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 6144 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 6144 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 3584 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 3584 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio start 1024 ffff0000fb3a92c0
[Fri Aug 23 11:04:13 2024] __blk_throtl_bio: bio done 1024 ffff0000fb3a92c0
```

And without wiops, the result is also quite different:

```
$ echo "8:0 wbps=10485760 wiops=max" > io.max

$ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1 oflag=direct
1+0 records in
1+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 5.03759 s, 10.4 MB/s

$ dmesg -T
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 2904 ffff0000c4e9f2c0
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 5984 ffff0000c4e9e000
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 7496 ffff0000c4e9e3c0
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9eb40
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9f540
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9e780
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9ea00
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9f900
[Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 4096 ffff0000c4e9e8c0
[Fri Aug 23 11:05:07 2024] blk_throtl_dispatch_work_fn: bio done 2904 ffff0000c4e9f2c0
[Fri Aug 23 11:05:07 2024] blk_throtl_dispatch_work_fn: bio done 5984 ffff0000c4e9e000
[Fri Aug 23 11:05:08 2024] blk_throtl_dispatch_work_fn: bio done 7496 ffff0000c4e9e3c0
[Fri Aug 23 11:05:09 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000c4e9eb40
[Fri Aug 23 11:05:09 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000c4e9f540
[Fri Aug 23 11:05:10 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000c4e9e780
[Fri Aug 23 11:05:11 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000c4e9ea00
[Fri Aug 23 11:05:12 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000c4e9f900
[Fri Aug 23 11:05:12 2024] blk_throtl_dispatch_work_fn: bio done 4096 ffff0000c4e9e8c0
```

Hmm... I still have two questions here:
1. Is wbps an average value?
2. What's the difference between setting 'max' and setting a very high value for 'wiops'?

Thanks a lot again for your time!
Lance


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-13  7:19         ` Yu Kuai
  2024-08-15  1:59           ` Lance Yang
  2024-08-23 12:05           ` Lance Yang
@ 2024-08-23 12:19           ` Lance Yang
  2 siblings, 0 replies; 13+ messages in thread
From: Lance Yang @ 2024-08-23 12:19 UTC (permalink / raw)
  To: yukuai1
  Cc: 21cnbao, a.hindborg, axboe, baolin.wang, boqun.feng, cgroups,
	david, fujita.tomonori, ioworker0, josef, libang.li, linux-block,
	linux-kernel, linux-mm, mkoutny, paolo.valente, tj, vbabka,
	yukuai3

Forgot to add the test results for buffered IO:

With wiops, the result is as follows:

```
$ echo "8:0 wbps=10485760 wiops=100000" > io.max

$ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1
1+0 records in
1+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 0.062217 s, 843 MB/s

$ dmesg -T
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 16384 ffff0000ce5ac500
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 16384 ffff0000ce5adb80
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 16384 ffff0000ce5ac140
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 16384 ffff0000ce5acdc0
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 16384 ffff0000ce5ac280
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 16384 ffff0000ce5ada40
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 4096 ffff0000ce5adcc0
[Fri Aug 23 12:09:10 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000ce5ac500
[Fri Aug 23 12:09:10 2024] __blk_throtl_bio: bio start 13824 ffff0000ce5ac500
[Fri Aug 23 12:09:11 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000ce5adb80
[Fri Aug 23 12:09:11 2024] __blk_throtl_bio: bio start 13824 ffff0000ce5adb80
[Fri Aug 23 12:09:12 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000ce5ac140
[Fri Aug 23 12:09:12 2024] __blk_throtl_bio: bio start 13824 ffff0000ce5ac140
[Fri Aug 23 12:09:13 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000ce5acdc0
[Fri Aug 23 12:09:13 2024] __blk_throtl_bio: bio start 13824 ffff0000ce5acdc0
[Fri Aug 23 12:09:14 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000ce5ac280
[Fri Aug 23 12:09:14 2024] __blk_throtl_bio: bio start 13824 ffff0000ce5ac280
[Fri Aug 23 12:09:14 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000ce5ada40
[Fri Aug 23 12:09:14 2024] __blk_throtl_bio: bio start 13824 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 4096 ffff0000ce5adcc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1536 ffff0000ce5adcc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1536 ffff0000ce5adcc0
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 11264 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 11264 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 8704 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 8704 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 6144 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 6144 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 3584 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 3584 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1024 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1024 ffff0000ce5ac500
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 11264 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 11264 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 8704 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 8704 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 6144 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 6144 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 3584 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 3584 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1024 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1024 ffff0000ce5adb80
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 11264 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 11264 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 8704 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 8704 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 6144 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 6144 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 3584 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 3584 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1024 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1024 ffff0000ce5ac140
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 11264 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 11264 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 8704 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 8704 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 6144 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 6144 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 3584 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 3584 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1024 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1024 ffff0000ce5acdc0
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 11264 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 11264 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 8704 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 8704 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 6144 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 6144 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 3584 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 3584 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1024 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1024 ffff0000ce5ac280
[Fri Aug 23 12:09:15 2024] blk_throtl_dispatch_work_fn: bio done 13824 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 11264 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 11264 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 8704 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 8704 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 6144 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 6144 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 3584 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 3584 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio start 1024 ffff0000ce5ada40
[Fri Aug 23 12:09:15 2024] __blk_throtl_bio: bio done 1024 ffff0000ce5ada40
```

And without wiops, the result is quite different as well:

```
$ echo "8:0 wbps=10485760 wiops=max" > io.max

$ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1
1+0 records in
1+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 0.0791369 s, 663 MB/s

$ dmesg -T
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 16384 ffff0000f87ca3c0
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 16384 ffff0000f87ca000
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 16384 ffff0000f87cb2c0
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 16384 ffff0000f87cb040
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 16384 ffff0000f87cac80
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 16384 ffff0000f87cb400
[Fri Aug 23 12:16:50 2024] __blk_throtl_bio: bio start 4096 ffff0000f87ca640
[Fri Aug 23 12:16:51 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000f87ca3c0
[Fri Aug 23 12:16:52 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000f87ca000
[Fri Aug 23 12:16:53 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000f87cb2c0
[Fri Aug 23 12:16:54 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000f87cb040
[Fri Aug 23 12:16:54 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000f87cac80
[Fri Aug 23 12:16:55 2024] blk_throtl_dispatch_work_fn: bio done 16384 ffff0000f87cb400
[Fri Aug 23 12:16:55 2024] blk_throtl_dispatch_work_fn: bio done 4096 ffff0000f87ca640
```

Thanks,
Lance


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-23 12:05           ` Lance Yang
@ 2024-08-26  1:31             ` Yu Kuai
  2024-08-26  2:15               ` Lance Yang
  0 siblings, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2024-08-26  1:31 UTC (permalink / raw)
  To: Lance Yang, yukuai1
  Cc: 21cnbao, a.hindborg, axboe, baolin.wang, boqun.feng, cgroups,
	david, fujita.tomonori, josef, libang.li, linux-block,
	linux-kernel, linux-mm, mkoutny, paolo.valente, tj, vbabka,
	yukuai (C)

Hi,

On 2024/08/23 20:05, Lance Yang wrote:
> My bad, I got tied up with some stuff :(
> 
> Hmm... tried your debug patch today, but my test results are different from
> yours. So let's take a look at direct IO with raw disk first.
> 
> ```
> $ lsblk
> NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> sda      8:0    0   90G  0 disk
> ├─sda1   8:1    0    1G  0 part /boot/efi
> └─sda2   8:2    0 88.9G  0 part /
> sdb      8:16   0   10G  0 disk
> 
> $ cat  /sys/block/sda/queue/scheduler
> none [mq-deadline]
> 
> $ cat  /sys/block/sda/queue/rotational
> 0
> 
> $ cat  /sys/block/sdb/queue/rotational
> 0
> 
> $ cat  /sys/block/sdb/queue/scheduler
> none [mq-deadline]
> 
> $ cat /boot/config-6.11.0-rc3+ |grep CONFIG_CGROUP_
> # CONFIG_CGROUP_FAVOR_DYNMODS is not set
> CONFIG_CGROUP_WRITEBACK=y
> CONFIG_CGROUP_SCHED=y
> CONFIG_CGROUP_PIDS=y
> CONFIG_CGROUP_RDMA=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_HUGETLB=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CGROUP_CPUACCT=y
> CONFIG_CGROUP_PERF=y
> CONFIG_CGROUP_BPF=y
> CONFIG_CGROUP_MISC=y
> # CONFIG_CGROUP_DEBUG is not set
> CONFIG_CGROUP_NET_PRIO=y
> CONFIG_CGROUP_NET_CLASSID=y
> 
> $ cd /sys/fs/cgroup/test/ && cat cgroup.controllers
> cpu io memory pids
> 
> $ cat io.weight
> default 100
> 
> $ cat io.prio.class
> no-change
> ```
> 
> With wiops, the result is as follows:
> 
> ```
> $ echo "8:16 wbps=10485760 wiops=100000" > io.max
> 
> $ dd if=/dev/zero of=/dev/sdb bs=50M count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 52428800 bytes (52 MB, 50 MiB) copied, 5.05893 s, 10.4 MB/s
> 
> $ dmesg -T
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 2984 ffff0000fb3a8f00
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 6176 ffff0000fb3a97c0
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 7224 ffff0000fb3a9180
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8640
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9400
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8c80
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9040
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a92c0
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4096 ffff0000fb3a8000

> 
> And without wiops, the result is quite different:
> 
> ```
> $ echo "8:16 wbps=10485760 wiops=max" > io.max
> 
> $ dd if=/dev/zero of=/dev/sdb bs=50M count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 52428800 bytes (52 MB, 50 MiB) copied, 5.08187 s, 10.3 MB/s
> 
> $ dmesg -T
> [Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 2880 ffff0000c74659c0
> [Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 6992 ffff00014f621b80
> [Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 92528 ffff00014f620dc0

I don't know why the IO sizes coming from the fs layer are different in this case.

> ```
> 
> Then, I retested for ext4 as you did.
> 
> ```
> $ lsblk
> NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> sda      8:0    0   90G  0 disk
> ├─sda1   8:1    0    1G  0 part /boot/efi
> └─sda2   8:2    0 88.9G  0 part /
> sdb      8:16   0   10G  0 disk
> 
> $ df -T /data
> Filesystem     Type 1K-blocks     Used Available Use% Mounted on
> /dev/sda2      ext4  91222760 54648704  31894224  64% /
> ```
> 
> With wiops, the result is as follows:
> 
> ```
> $ echo "8:0 wbps=10485760 wiops=100000" > io.max
> 
> $ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 52428800 bytes (52 MB, 50 MiB) copied, 5.06227 s, 10.4 MB/s
> 
> $ dmesg -T
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 2984 ffff0000fb3a8f00
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 6176 ffff0000fb3a97c0
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 7224 ffff0000fb3a9180
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8640
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9400
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8c80
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9040
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a92c0
> [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4096 ffff0000fb3a8000

> 
> And without wiops, the result is also quite different:
> 
> ```
> $ echo "8:0 wbps=10485760 wiops=max" > io.max
> 
> $ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 52428800 bytes (52 MB, 50 MiB) copied, 5.03759 s, 10.4 MB/s
> 
> $ dmesg -T
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 2904 ffff0000c4e9f2c0
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 5984 ffff0000c4e9e000
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 7496 ffff0000c4e9e3c0
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9eb40
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9f540
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9e780
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9ea00
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9f900
> [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 4096 ffff0000c4e9e8c0

While for ext4 the IO sizes are the same, so I wouldn't say the results are different here.
> [
> ```
> 
> Hmm... I still have two questions here:
> 1. Is wbps an average value?

Yes.
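
(As a rough sanity check of that, using illustrative shell arithmetic only: the 52428800 bytes written above, at wbps=10485760, average out to about 5 s, which matches the ~5.0 s that dd reported in both the raw-disk and ext4 runs.)

```
$ echo $(( 52428800 / 10485760 ))   # 50 MiB at 10 MiB/s -> ~5 s
5
```
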
> 2. What's the difference between setting 'max' and setting a very high value for 'wiops'?

The only difference is that:

- If there is no iops limit, split IOs are dispatched directly;
- If there is an iops limit, split IOs are throttled again. Even if the
iops value is very high, blk-throtl is FIFO, so a split IO has to wait
for the original request to finish being throttled by bps before its
own iops limit is even checked.
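
As an illustration only (reusing the bs=50M buffered write to /data/file1 from earlier in this thread, with a made-up "very high" wiops value), any finite wiops, however large, should reproduce the per-split start/done pairs in the debug output, while wiops=max should not:

```
$ echo "8:0 wbps=10485760 wiops=1000000000" > io.max   # very high, but still a limit
$ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1
$ dmesg -T | tail     # expect split bios to be throttled again (start/done pairs)

$ echo "8:0 wbps=10485760 wiops=max" > io.max          # no iops limit
$ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1
$ dmesg -T | tail     # expect split bios to be dispatched directly
```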

Thanks,
Kuai

> 
> Thanks a lot again for your time!
> Lance
> .
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops
  2024-08-26  1:31             ` Yu Kuai
@ 2024-08-26  2:15               ` Lance Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Lance Yang @ 2024-08-26  2:15 UTC (permalink / raw)
  To: Yu Kuai
  Cc: 21cnbao, a.hindborg, axboe, baolin.wang, boqun.feng, cgroups,
	david, fujita.tomonori, josef, libang.li, linux-block,
	linux-kernel, linux-mm, mkoutny, paolo.valente, tj, vbabka,
	yukuai (C)

Hi Kuai,

Thanks a lot for following up on this!

On Mon, Aug 26, 2024 at 9:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/08/23 20:05, Lance Yang wrote:
> > My bad, I got tied up with some stuff :(
> >
> > Hmm... tried your debug patch today, but my test results are different from
> > yours. So let's take a look at direct IO with raw disk first.
> >
> > ```
> > $ lsblk
> > NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> > sda      8:0    0   90G  0 disk
> > ├─sda1   8:1    0    1G  0 part /boot/efi
> > └─sda2   8:2    0 88.9G  0 part /
> > sdb      8:16   0   10G  0 disk
> >
> > $ cat  /sys/block/sda/queue/scheduler
> > none [mq-deadline]
> >
> > $ cat  /sys/block/sda/queue/rotational
> > 0
> >
> > $ cat  /sys/block/sdb/queue/rotational
> > 0
> >
> > $ cat  /sys/block/sdb/queue/scheduler
> > none [mq-deadline]
> >
> > $ cat /boot/config-6.11.0-rc3+ |grep CONFIG_CGROUP_
> > # CONFIG_CGROUP_FAVOR_DYNMODS is not set
> > CONFIG_CGROUP_WRITEBACK=y
> > CONFIG_CGROUP_SCHED=y
> > CONFIG_CGROUP_PIDS=y
> > CONFIG_CGROUP_RDMA=y
> > CONFIG_CGROUP_FREEZER=y
> > CONFIG_CGROUP_HUGETLB=y
> > CONFIG_CGROUP_DEVICE=y
> > CONFIG_CGROUP_CPUACCT=y
> > CONFIG_CGROUP_PERF=y
> > CONFIG_CGROUP_BPF=y
> > CONFIG_CGROUP_MISC=y
> > # CONFIG_CGROUP_DEBUG is not set
> > CONFIG_CGROUP_NET_PRIO=y
> > CONFIG_CGROUP_NET_CLASSID=y
> >
> > $ cd /sys/fs/cgroup/test/ && cat cgroup.controllers
> > cpu io memory pids
> >
> > $ cat io.weight
> > default 100
> >
> > $ cat io.prio.class
> > no-change
> > ```
> >
> > With wiops, the result is as follows:
> >
> > ```
> > $ echo "8:16 wbps=10485760 wiops=100000" > io.max
> >
> > $ dd if=/dev/zero of=/dev/sdb bs=50M count=1 oflag=direct
> > 1+0 records in
> > 1+0 records out
> > 52428800 bytes (52 MB, 50 MiB) copied, 5.05893 s, 10.4 MB/s
> >
> > $ dmesg -T
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 2984 ffff0000fb3a8f00
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 6176 ffff0000fb3a97c0
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 7224 ffff0000fb3a9180
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8640
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9400
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8c80
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9040
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a92c0
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4096 ffff0000fb3a8000
>
> >
> > And without wiops, the result is quite different:
> >
> > ```
> > $ echo "8:16 wbps=10485760 wiops=max" > io.max
> >
> > $ dd if=/dev/zero of=/dev/sdb bs=50M count=1 oflag=direct
> > 1+0 records in
> > 1+0 records out
> > 52428800 bytes (52 MB, 50 MiB) copied, 5.08187 s, 10.3 MB/s
> >
> > $ dmesg -T
> > [Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 2880 ffff0000c74659c0
> > [Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 6992 ffff00014f621b80
> > [Fri Aug 23 10:59:10 2024] __blk_throtl_bio: bio start 92528 ffff00014f620dc0
>
> I don't know why the IO sizes coming from the fs layer are different in this case.

Me neither...

>
> > ```
> >
> > Then, I retested for ext4 as you did.
> >
> > ```
> > $ lsblk
> > NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
> > sda      8:0    0   90G  0 disk
> > ├─sda1   8:1    0    1G  0 part /boot/efi
> > └─sda2   8:2    0 88.9G  0 part /
> > sdb      8:16   0   10G  0 disk
> >
> > $ df -T /data
> > Filesystem     Type 1K-blocks     Used Available Use% Mounted on
> > /dev/sda2      ext4  91222760 54648704  31894224  64% /
> > ```
> >
> > With wiops, the result is as follows:
> >
> > ```
> > $ echo "8:0 wbps=10485760 wiops=100000" > io.max
> >
> > $ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1 oflag=direct
> > 1+0 records in
> > 1+0 records out
> > 52428800 bytes (52 MB, 50 MiB) copied, 5.06227 s, 10.4 MB/s
> >
> > $ dmesg -T
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 2984 ffff0000fb3a8f00
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 6176 ffff0000fb3a97c0
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 7224 ffff0000fb3a9180
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8640
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9400
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a8c80
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a9040
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 16384 ffff0000fb3a92c0
> > [Fri Aug 23 11:04:08 2024] __blk_throtl_bio: bio start 4096 ffff0000fb3a8000
>
> >
> > And without wiops, the result is also quite different:
> >
> > ```
> > $ echo "8:0 wbps=10485760 wiops=max" > io.max
> >
> > $ rm -rf /data/file1 && dd if=/dev/zero of=/data/file1 bs=50M count=1 oflag=direct
> > 1+0 records in
> > 1+0 records out
> > 52428800 bytes (52 MB, 50 MiB) copied, 5.03759 s, 10.4 MB/s
> >
> > $ dmesg -T
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 2904 ffff0000c4e9f2c0
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 5984 ffff0000c4e9e000
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 7496 ffff0000c4e9e3c0
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9eb40
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9f540
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9e780
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9ea00
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 16384 ffff0000c4e9f900
> > [Fri Aug 23 11:05:07 2024] __blk_throtl_bio: bio start 4096 ffff0000c4e9e8c0
>
> While for ext4 the IO sizes are the same, so I wouldn't say the results are different here.

Perhaps there is something else subtle at play, since ext4 behaves the same?

> > [
> > ```
> >
> > Hmm... I still have two questions here:
> > 1. Is wbps an average value?
>
> Yes.
> > 2. What's the difference between setting 'max' and setting a very high value for 'wiops'?
>
> The only difference is that:
>
> - If there is no iops limit, split IOs are dispatched directly;
> - If there is an iops limit, split IOs are throttled again. Even if the
> iops value is very high, blk-throtl is FIFO, so a split IO has to wait
> for the original request to finish being throttled by bps before its
> own iops limit is even checked.

Thanks a lot again for the lesson!
Lance

>
> Thanks,
> Kuai
>
> >
> > Thanks a lot again for your time!
> > Lance
> > .
> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-08-26  2:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-12 15:00 [BUG] cgroupv2/blk: inconsistent I/O behavior in Cgroup v2 with set device wbps and wiops Lance Yang
2024-08-12 15:43 ` Michal Koutný
2024-08-13  1:37   ` Yu Kuai
2024-08-13  5:00     ` Lance Yang
2024-08-13  6:17       ` Lance Yang
2024-08-13  6:39       ` Yu Kuai
2024-08-13  7:19         ` Yu Kuai
2024-08-15  1:59           ` Lance Yang
2024-08-23 12:05           ` Lance Yang
2024-08-26  1:31             ` Yu Kuai
2024-08-26  2:15               ` Lance Yang
2024-08-23 12:19           ` Lance Yang
2024-08-13  5:11   ` Lance Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox