From: Joanne Koong
Date: Tue, 21 Jan 2025 16:29:57 -0800
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Improving large folio writeback performance
To: Jan Kara
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, "Matthew Wilcox (Oracle)"

On Mon, Jan 20, 2025 at 2:42 PM Jan Kara wrote:
>
> On Fri 17-01-25 14:45:01, Joanne Koong wrote:
> > On Fri, Jan 17, 2025 at 3:53 AM Jan Kara wrote:
> > > On Thu 16-01-25 15:38:54, Joanne Koong wrote:
> > > I think tweaking min_pause is a wrong way to do this. I think that is just a
> > > symptom. Can you run something like:
> > >
> > > while true; do
> > >         cat /sys/kernel/debug/bdi//stats
> > >         echo "---------"
> > >         sleep 1
> > > done >bdi-debug.txt
> > >
> > > while you are writing to the FUSE filesystem and share the output file?
> > > That should tell us a bit more about what's happening inside the writeback
> > > throttling. Also do you somehow configure min/max_ratio for the FUSE bdi?
> > > You can check in /sys/block//bdi/{min,max}_ratio . I suspect the
> > > problem is that the BDI dirty limit does not ramp up properly when we
> > > increase dirtied pages in large chunks.
> >
> > This is the debug info I see for FUSE large folio writes where bs=1M
> > and size=1G:
> >
> >
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 896 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1071104 kB
> > BdiWritten: 4096 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1290240 kB
> > BdiWritten: 4992 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1517568 kB
> > BdiWritten: 5824 kB
> > BdiWriteBandwidth: 25692 kBps
> > b_dirty: 0
> > b_io: 1
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 7
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1747968 kB
> > BdiWritten: 6720 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 896 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1949696 kB
> > BdiWritten: 7552 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3612 kB
> > DirtyThresh: 361300 kB
> > BackgroundThresh: 180428 kB
> > BdiDirtied: 2097152 kB
> > BdiWritten: 8128 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> >
> >
> > I didn't do anything to configure/change the FUSE bdi min/max_ratio.
> > This is what I see on my system:
> >
> > cat /sys/class/bdi/0:52/min_ratio
> > 0
> > cat /sys/class/bdi/0:52/max_ratio
> > 1
>
> OK, we can see that BdiDirtyThresh stabilized more or less at 3.6MB.
> Checking the code, this shows we are hitting __wb_calc_thresh() logic:
>
>         if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>                 unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
>                 u64 wb_scale_thresh = 0;
>
>                 if (limit > dtc->dirty)
>                         wb_scale_thresh = (limit - dtc->dirty) / 100;
>                 wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh /
>         }
>
> so BdiDirtyThresh is set to DirtyThresh/100. This also shows bdi never
> generates enough throughput to ramp up it's share from this initial value.
>
> > > Actually, there's a patch queued in mm tree that improves the ramping up of
> > > bdi dirty limit for strictlimit bdis [1]. It would be nice if you could
> > > test whether it changes something in the behavior you observe. Thanks!
> > >
> > >                                                         Honza
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page-writeback-consolidate-wb_thresh-bumping-logic-into-__wb_calc_thresh.patch
> >
> > I still see the same results (~230 MiB/s throughput using fio) with
> > this patch applied, unfortunately. Here's the debug info I see with
> > this patch (same test scenario as above on FUSE large folio writes
> > where bs=1M and size=1G):
> >
> > BdiWriteback: 0 kB
> > BdiReclaimable: 2048 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359132 kB
> > BackgroundThresh: 179348 kB
> > BdiDirtied: 51200 kB
> > BdiWritten: 128 kB
> > BdiWriteBandwidth: 102400 kBps
> > b_dirty: 1
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 5
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 331776 kB
> > BdiWritten: 1216 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 562176 kB
> > BdiWritten: 2176 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 792576 kB
> > BdiWritten: 3072 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 64 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 1026048 kB
> > BdiWritten: 3904 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
>
> Yeah, here the situation is really the same. As an experiment can you
> experiment with setting min_ratio for the FUSE bdi to 1, 2, 3, ..., 10 (I
> don't expect you should need to go past 10) and figure out when there's
> enough slack space for the writeback bandwidth to ramp up to a full speed?
> Thanks!
>
>                                                         Honza

When locally testing this, I'm seeing that the max_ratio affects the
bandwidth more so than min_ratio (eg the different min_ratios have
roughly the same bandwidth per max_ratio). I'm also seeing somewhat
high variance across runs which makes it hard to gauge what's
accurate, but on average this is what I'm seeing:

max_ratio=1 --- bandwidth= ~230 MiB/s
max_ratio=2 --- bandwidth= ~420 MiB/s
max_ratio=3 --- bandwidth= ~550 MiB/s
max_ratio=4 --- bandwidth= ~653 MiB/s
max_ratio=5 --- bandwidth= ~700 MiB/s
max_ratio=6 --- bandwidth= ~810 MiB/s
max_ratio=7 --- bandwidth= ~1040 MiB/s (and then a lot of times, 561
MiB/s on subsequent runs)

Thanks,
Joanne

> --
> Jan Kara
> SUSE Labs, CR
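
As a sanity check on the "BdiDirtyThresh is set to DirtyThresh/100"
reading quoted above, here is a minimal sketch that redoes the
arithmetic on the (DirtyThresh, BdiDirtyThresh) pairs from the debug
dumps in this mail. It rests on two assumptions that are not stated in
the thread: the page size is 4 kB, and the division is done on page
counts with integer truncation. The helper name one_percent_kb is made
up for illustration and is not a kernel function.

```python
# Sanity check: is BdiDirtyThresh about 1% of DirtyThresh in the dumps?
# Assumptions (not from the thread): 4 kB pages, and the division is
# performed on page counts with integer truncation.

PAGE_KB = 4  # assumed page size in kB


def one_percent_kb(dirty_thresh_kb: int) -> int:
    """Return DirtyThresh/100 computed in whole pages, converted back to kB."""
    pages = dirty_thresh_kb // PAGE_KB
    return (pages // 100) * PAGE_KB


# (DirtyThresh kB, BdiDirtyThresh kB) pairs copied from the debug output.
samples = [
    (359824, 3596),  # without the patch
    (361300, 3612),  # without the patch, last sample
    (359132, 3588),  # with the patch applied, first sample
    (359144, 3588),  # with the patch applied, later samples
]

for dirty_kb, bdi_kb in samples:
    computed = one_percent_kb(dirty_kb)
    verdict = "match" if computed == bdi_kb else "differs"
    print(f"DirtyThresh={dirty_kb} kB -> 1% = {computed} kB, "
          f"reported BdiDirtyThresh={bdi_kb} kB ({verdict})")
```

Under those assumptions all four pairs line up with the reported
values. The two samples that report BdiDirtyThresh: 896 kB are outside
what this check covers.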