From: Michael Stapelberg
Date: Tue, 22 Jun 2021 14:29:32 +0200
Subject: Re: [PATCH] backing_dev_info: introduce min_bw/max_bw limits
References: <20210617095309.3542373-1-stapelberg+linux@google.com> <20210622121205.GG14261@quack2.suse.cz>
In-Reply-To: <20210622121205.GG14261@quack2.suse.cz>
To: Jan Kara
Cc:
 Miklos Szeredi, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm,
 linux-fsdevel@vger.kernel.org, Tejun Heo, Dennis Zhou, Jens Axboe,
 Roman Gushchin, Johannes Thumshirn, Song Liu, David Sterba

Thanks for taking a look! Comments inline:

On Tue, 22 Jun 2021 at 14:12, Jan Kara wrote:
>
> On Mon 21-06-21 11:20:10, Michael Stapelberg wrote:
> > Hey Miklos
> >
> > On Fri, 18 Jun 2021 at 16:42, Miklos Szeredi wrote:
> > >
> > > On Fri, 18 Jun 2021 at 10:31, Michael Stapelberg
> > > wrote:
> > >
> > > > Maybe, but I don’t have the expertise, motivation or time to
> > > > investigate this any further, let alone commit to get it done.
> > > > During our previous discussion I got the impression that nobody else
> > > > had any cycles for this either:
> > > > https://lore.kernel.org/linux-fsdevel/CANnVG6n=ySfe1gOr=0ituQidp56idGARDKHzP0hv=ERedeMrMA@mail.gmail.com/
> > > >
> > > > Have you had a look at the China LSF report at
> > > > http://bardofschool.blogspot.com/2011/?
> > > > The author of the heuristic has spent significant effort and time
> > > > coming up with what we currently have in the kernel:
> > > >
> > > > """
> > > > Fengguang said he draw more than 10K performance graphs and read even
> > > > more in the past year.
> > > > """
> > > >
> > > > This implies that making changes to the heuristic will not be a quick fix.
> > >
> > > Having a piece of kernel code sitting there that nobody is willing to
> > > fix is certainly not a great situation to be in.
> >
> > Agreed.
> >
> > >
> > > And introducing band aids is not going improve the above situation,
> > > more likely it will prolong it even further.
> >
> > Sounds like “Perfect is the enemy of good” to me: you’re looking for a
> > perfect hypothetical solution,
> > whereas we have a known-working low risk fix for a real problem.
> >
> > Could we find a solution where medium-/long-term, the code in question
> > is improved,
> > perhaps via a Summer Of Code project or similar community efforts,
> > but until then, we apply the patch at hand?
> >
> > As I mentioned, I think adding min/max limits can be useful regardless
> > of how the heuristic itself changes.
> >
> > If that turns out to be incorrect or undesired, we can still turn the
> > knobs into a no-op, if removal isn’t an option.
>
> Well, removal of added knobs is more or less out of question as it can
> break some userspace. Similarly making them no-op is problematic unless we
> are pretty certain it cannot break some existing setup. That's why we have
> to think twice (or better three times ;) before adding any knobs. Also
> honestly the knobs you suggest will be pretty hard to tune when there are
> multiple cgroups with writeback control involved (which can be affected by
> the same problems you observe as well). So I agree with Miklos that this is
> not the right way to go. Speaking of tunables, did you try tuning
> /sys/devices/virtual/bdi//min_ratio? I suspect that may
> workaround your problems...

Back then, I did try the various tunables (vm.dirty_ratio and
vm.dirty_background_ratio on the global level,
/sys/class/bdi//{min,max}_ratio on the file system level), and they
had no observable effect on the problem at all in my tests.

>
> Looking into your original report and tracing you did (thanks for that,
> really useful), it seems that the problem is that writeback bandwidth is
> updated at most every 200ms (more frequent calls are just ignored) and are
> triggered only from balance_dirty_pages() (happen when pages are dirtied) and
> inode writeback code so if the workload tends to have short spikes of activity
> and extended periods of quiet time, then writeback bandwidth may indeed be
> seriously miscomputed because we just won't update writeback throughput
> after most of writeback has happened as you observed.
>
> I think the fix for this can be relatively simple. We just need to make
> sure we update writeback bandwidth reasonably quickly after the IO
> finishes. I'll write a patch and see if it helps.

Thank you! Please keep us posted.
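
For readers following along, here is a rough toy model of the effect Jan
describes above. It is only a sketch under loud assumptions and is not the
kernel's algorithm: the BandwidthEstimator class, the simulate() helper, the
100 MB/s device, and the 50 MB burst every 10 seconds are all made up for
illustration, the real code also smooths its estimate, and the rule that an
idle sample only moves the timestamp is my own simplification. The only
detail taken from the thread is the 200 ms minimum interval between updates.

```python
from dataclasses import dataclass

UPDATE_INTERVAL = 0.2          # seconds; the 200 ms floor mentioned in the thread
DEVICE_BW = 100e6              # hypothetical device speed: 100 MB/s
BURST_BYTES = 50e6             # hypothetical burst size: 50 MB
BURST_STARTS = [0.0, 10.0, 20.0]  # hypothetical workload: one burst every 10 s


@dataclass
class BandwidthEstimator:
    """Toy estimator: bytes written since the last accepted update / elapsed time."""
    last_time: float = 0.0
    last_written: float = 0.0
    bandwidth: float = 0.0     # bytes per second

    def update(self, now: float, written_total: float) -> bool:
        elapsed = now - self.last_time
        if elapsed < UPDATE_INTERVAL:
            return False       # too soon after the previous update: ignored
        delta = written_total - self.last_written
        if delta > 0:
            self.bandwidth = delta / elapsed
        # Assumption of this sketch: a sample with no completed writeback
        # only moves the timestamp and keeps the previous estimate.
        self.last_time = now
        self.last_written = written_total
        return True


def written_by(now: float) -> float:
    """Cumulative bytes the device has written back at time `now`."""
    total = 0.0
    for start in BURST_STARTS:
        if now > start:
            # Each burst drains at DEVICE_BW until BURST_BYTES are written.
            total += min(BURST_BYTES, (now - start) * DEVICE_BW)
    return total


def simulate(update_times: list[float]) -> float:
    """Run the estimator with updates at the given times; return the final estimate."""
    est = BandwidthEstimator()
    for t in update_times:
        est.update(t, written_by(t))
    return est.bandwidth


# Updates only when pages are dirtied (at the start of each burst).
only_on_dirty = simulate([0.0, 10.0, 20.0])
# Additionally update right after each burst's IO has finished.
also_after_io = simulate([0.0, 0.5, 10.0, 10.5, 20.0, 20.5])

print(f"updates only on dirtying: {only_on_dirty / 1e6:.1f} MB/s")
print(f"updates also after IO:    {also_after_io / 1e6:.1f} MB/s")
```

In this toy setup, the first schedule averages each burst over the whole
quiet period and ends up far below the device speed, while the second one
measures the burst over the time the IO actually took. Whether the real
patch behaves like this depends on details the sketch leaves out.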