From: Shakeel Butt
Date: Fri, 3 Dec 2021 09:50:51 -0800
Subject: Re: [PATCH v4 1/1] mm: vmscan: Reduce throttling due to a failure to make progress
To: Mel Gorman
Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Alexey Avramov,
    Rik van Riel, Mike Galbraith, Darrick Wong, regressions@lists.linux.dev,
    Linux-fsdevel, Linux-MM, LKML
References: <20211202150614.22440-1-mgorman@techsingularity.net>
    <20211202165220.GZ3366@techsingularity.net>
    <20211203090137.GA3366@techsingularity.net>
In-Reply-To: <20211203090137.GA3366@techsingularity.net>

On Fri, Dec 3, 2021 at 1:01 AM Mel Gorman wrote:
> [...]
>
> Not recently that I'm aware of but historically reclaim has been plagued by
> at least two classes of problems -- premature OOM and excessive CPU usage
> churning through the LRU. Going back, the solution was basically to sleep
> something like "disable kswapd if it fails to make progress for too long".
> Commit 69392a403f49 addressed a case where calling congestion_wait might as
> well have been schedule_timeout_uninterruptible(HZ/10) because congestion
> is no longer tracked by the block layer.
>
> Hence 69392a403f49 allows reclaim to throttle on NOPROGRESS but if
> another task makes progress, the throttled tasks can be woken before the
> timeout. The flaw was throttling too easily or for too long delaying OOM
> being properly detected.
>

To remove the congestion_wait() in mem_cgroup_force_empty_write(), commit
69392a403f49 changed the behavior of all memcg reclaim codepaths as well
as direct global reclaimers. Were there other congestion_wait() instances
that commit 69392a403f49 was targeting, but which were replaced or removed
by different commits?

[...]

> >
> > Isn't it better that the reclaim returns why it is failing instead of
> > littering the reclaim code with 'is this global reclaim', 'is this
> > memcg reclaim', 'am I kswapd' which is also a layering violation. IMO
> > this is the direction we should be going towards though not asking to
> > do this now.
> >
>
> It's not clear why you think the page allocator can make better decisions
> about reclaim than reclaim can. It might make sense if callers were
> returned enough information to make a decision but even if they could,
> it would not be popular as the API would be difficult to use properly.
>

The above is a separate discussion for later.

> Is your primary objection the cgroup_reclaim(sc) check?

No, I am of the opinion that we should revert 69392a403f49 and that we
should have just replaced the congestion_wait() in
mem_cgroup_force_empty_write() with a simple
schedule_timeout_interruptible(). memory.force_empty is a cgroup v1
interface (to be deprecated) and it is quite normal to expect that the
user will trigger that interface multiple times. We should not change the
behavior of all the memcg reclaimers and direct global reclaimers just so
that we can remove congestion_wait() from mem_cgroup_force_empty_write().

> If so, I can
> remove it. While there is a mild risk that OOM would be delayed, it's very
> unlikely because a memcg failing to make progress in the local case will
> probably call cond_resched() if there are not lots of pages pending
> writes globally.
>
> > Regarding this patch and 69392a403f49, I am still confused on the main
> > motivation behind 69392a403f49 to change the behavior of 'direct
> > reclaimers from page allocator'.
> >
>
> The main motivation of the series overall was to remove the reliance on
> congestion_wait and wait_iff_congested because both are fundamentally
> broken when congestion is not tracked by the block layer. Replacing with
> schedule_timeout_uninterruptible() would be silly because where possible
> decisions on whether to pause or throttle should be based on events,
> not time. For example, if there are too many pages waiting on writeback
> then throttle but if writeback completes, wake the throttled tasks
> instead of "sleep some time and hope for the best".
>

I am in agreement with the motivation of the whole series. I am just
making sure that the motivation for the VMSCAN_THROTTLE_NOPROGRESS-based
throttle is more than just the congestion_wait() in
mem_cgroup_force_empty_write().

thanks,
Shakeel