From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B58EAC433F5 for ; Wed, 15 Sep 2021 22:39:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 63CD860FED for ; Wed, 15 Sep 2021 22:39:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 63CD860FED Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=fromorbit.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E8A396B0072; Wed, 15 Sep 2021 18:39:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E12226B0073; Wed, 15 Sep 2021 18:39:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CB3B16B0074; Wed, 15 Sep 2021 18:39:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0087.hostedemail.com [216.40.44.87]) by kanga.kvack.org (Postfix) with ESMTP id B88CF6B0072 for ; Wed, 15 Sep 2021 18:39:06 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6BE0C181EE432 for ; Wed, 15 Sep 2021 22:39:06 +0000 (UTC) X-FDA: 78591274692.03.9F10691 Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by imf10.hostedemail.com (Postfix) with ESMTP id 7A0966001980 for ; Wed, 15 Sep 2021 22:39:05 +0000 (UTC) Received: from dread.disaster.area (pa49-195-238-16.pa.nsw.optusnet.com.au [49.195.238.16]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 4357499FB; Thu, 16 Sep 2021 08:39:00 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1mQdYQ-00CvLU-9f; Thu, 16 Sep 2021 08:38:58 +1000 Date: Thu, 16 Sep 2021 08:38:58 +1000 From: Dave Chinner To: Mel Gorman Cc: NeilBrown , Andrew Morton , Theodore Ts'o , Andreas Dilger , "Darrick J. Wong" , Jan Kara , Michal Hocko , Matthew Wilcox , linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops. Message-ID: <20210915223858.GM2361455@dread.disaster.area> References: <163157808321.13293.486682642188075090.stgit@noble.brown> <163157838437.13293.14244628630141187199.stgit@noble.brown> <20210914163432.GR3828@suse.com> <20210914235535.GL2361455@dread.disaster.area> <20210915085904.GU3828@suse.com> <20210915143510.GE3959@techsingularity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210915143510.GE3959@techsingularity.net> X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=YKPhNiOx c=1 sm=1 tr=0 a=DzKKRZjfViQTE5W6EVc0VA==:117 a=DzKKRZjfViQTE5W6EVc0VA==:17 a=kj9zAlcOel0A:10 a=7QKq2e-ADPsA:10 a=7-415B0cAAAA:8 a=rs-OYbL5b6b9dbSePnwA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Authentication-Results: imf10.hostedemail.com; dkim=none; dmarc=none; spf=none (imf10.hostedemail.com: domain of david@fromorbit.com has no SPF policy when checking 211.29.132.53) smtp.mailfrom=david@fromorbit.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 7A0966001980 X-Stat-Signature: qorkq6ta5uszxp5k5wiez8wc8hy1n7m3 X-HE-Tag: 1631745545-853226 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 15, 2021 at 03:35:10PM +0100, Mel Gorman wrote: > On Wed, Sep 15, 2021 at 09:59:04AM +0100, Mel Gorman wrote: > > > Yup, that's what we need, but I don't see why it needs to be exposed > > > outside the allocation code at all. > > > > > > > Probably not. At least some of it could be contained within reclaim > > itself to block when reclaim is not making progress as opposed to anything > > congestion related. That might still livelock if no progress can be made > > but that's not new, the OOM hammer should eventually kick in. > > > > There are two sides to the reclaim-related throttling > > 1. throttling because zero progress is being made > 2. throttling because there are too many dirty pages or pages under > writeback cycling through the LRU too quickly. > > The dirty page aspects (and the removal of wait_iff_congested which is > almost completely broken) could be done with something like the following > (completly untested). The downside is that end_page_writeback() takes an > atomic penalty if reclaim is throttled but at that point the system is > struggling anyway so I doubt it matters. The atomics are pretty nasty, as is directly accessing the pgdat on every call to end_page_writeback(). Those will be performance limiting factors. Indeed, we don't use atomics for dirty page throttling, which does dirty page accounting via percpu counters on the BDI and doesn't require wakeups. Also, we've already got per-node and per-zone counters there for dirty/write pending stats, so do we actually need new counters and wakeups here? i.e. balance_dirty_pages() does not have an explicit wakeup - it bases it's sleep time on the (memcg aware) measured writeback rate on the BDI the page belongs to and the amount of outstanding dirty data on that BDI. i.e. it estimates fairly accurately what the wait time for this task should be given the dirty page demand and current writeback progress being made is and just sleeps for that length of time. Ideally, that's what should be happening here - we should be able to calculate a page cleaning rate estimation and then base the sleep time on that. No wakeups needed - when we've waited for the estimated time, we try to reclaim again... In fact, why can't this "too many dirty pages" case just use the balance_dirty_pages() infrastructure to do the "wait for writeback" reclaim backoff? Why do we even need to re-invent the wheel here? > diff --git a/mm/filemap.c b/mm/filemap.c > index dae481293b5d..b9be9afa4308 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1606,6 +1606,8 @@ void end_page_writeback(struct page *page) > smp_mb__after_atomic(); > wake_up_page(page, PG_writeback); > put_page(page); > + > + acct_reclaim_writeback(page); UAF - that would need to be before the put_page() call... Cheers, Dave. -- Dave Chinner david@fromorbit.com