From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 21 May 2020 16:35:15 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Chris Down, Andrew Morton, Tejun Heo, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm, memcg: reclaim more aggressively before high
 allocator throttling
Message-ID: <20200521143515.GU6462@dhcp22.suse.cz>
References: <20200520143712.GA749486@chrisdown.name>
 <20200520160756.GE6462@dhcp22.suse.cz>
 <20200520165131.GB630613@cmpxchg.org>
 <20200520170430.GG6462@dhcp22.suse.cz>
 <20200520175135.GA793901@cmpxchg.org>
 <20200521073245.GI6462@dhcp22.suse.cz>
 <20200521135152.GA810429@cmpxchg.org>
In-Reply-To: <20200521135152.GA810429@cmpxchg.org>

On Thu 21-05-20 09:51:52, Johannes Weiner wrote:
> On Thu, May 21, 2020 at 09:32:45AM +0200, Michal Hocko wrote:
[...]
> > I am not saying the looping over try_to_free_pages is wrong. I do care
> > about the final reclaim target. That shouldn't be arbitrary. We have
> > established a target which is proportional to the requested amount of
> > memory. And there is a good reason for that. If any task tries to
> > reclaim down to the high limit then this might lead to a large
> > unfairness when heavy producers piggy back on the active reclaimer(s).
>
> Why is that different than any other form of reclaim?

Because high limit reclaim is best effort, rather than something that
must either get over the reclaim watermarks to continue the allocation
or meet the hard limit requirement to continue. In an ideal world even
the global resp.
hard limit reclaim should consider fairness. They don't because that is
easier, but that sucks. I have been involved in debugging countless
issues where direct reclaim was taking too long because of the
unfairness. Users simply see that as a bug and I am not surprised.

> > I wouldn't mind to loop over try_to_free_pages to meet the requested
> > memcg_nr_pages_over_high target.
>
> Should we do the same for global reclaim? Move reclaim to userspace
> resume where there are no GFP_FS, GFP_NOWAIT etc. restrictions and
> then have everybody just reclaim exactly what they asked for, and punt
> interrupts / kthread allocations to a worker/kswapd?

This would be quite challenging considering the page allocator wouldn't
be able to make forward progress without doing any reclaim. But maybe
you can be creative with watermarks.

> > > > > > Also if the current high reclaim scaling is insufficient then we should
> > > > > > be handling that via memcg_nr_pages_over_high rather than effectively
> > > > > > unbound number of reclaim retries.
> > > > >
> > > > > ???
> > > >
> > > > I am not sure what you are asking here.
> > >
> > > You expressed that some alternate solution B would be preferable,
> > > without any detail on why you think that is the case.
> > >
> > > And it's certainly not obvious or self-explanatory - in particular
> > > because Chris's proposal *is* obvious and self-explanatory, given how
> > > everybody else is already doing loops around page reclaim.
> >
> > Sorry, I could have been less cryptic. I hope the above and my response
> > to Chris goes into more details why I do not like this proposal and what
> > is the alternative. But let me summarize. I propose to use memcg_nr_pages_over_high
> > target. If the current calculation of the target is insufficient - e.g.
> > in situations where the high limit excess is very large then this should
> > be reflected in memcg_nr_pages_over_high.
> >
> > Is it more clear?
>
> Well you haven't made a good argument why memory.high is actually
> different than any other form of reclaim, and why it should be the
> only implementation of page reclaim that has special-cased handling
> for the inherent "unfairness" or rather raciness of that operation.
>
> You cut these lines from the quote:
>
>     Under pressure, page reclaim can struggle to satisfy the reclaim
>     goal and may return with less pages reclaimed than asked to.
>
>     Under concurrency, a parallel allocation can invalidate the reclaim
>     progress made by a thread.
>
> Even if we *could* invest more into trying to avoid any unfairness,
> you haven't made a point why we actually should do that here
> specifically, yet not everywhere else.

I have tried to explain my thinking elsewhere in the thread. The bottom
line is that the high limit is a way of throttling rather than meeting
a specific target. With the current implementation we scale the reclaim
activity by the consumer's demand, which is not terribly complex to
wrap your head around and reason about, because the objective is to not
increase the excess much. It offers some sort of fairness as well. I
fully recognize that full fairness is not something we can target, but
working reasonably well most of the time sounds good enough for me.

> (And people have tried to do it for global reclaim[1], but clearly
> this isn't a meaningful problem in practice.)
>
> I have a good reason why we shouldn't: because it's special casing
> memory.high from other forms of reclaim, and that is a maintainability
> problem. We've recently been discussing ways to make the memory.high
> implementation stand out less, not make it stand out even more. There
> is no solid reason it should be different from memory.max reclaim,
> except that it should sleep instead of invoke OOM at the end. It's
> already a mess we're trying to get on top of and straighten out, and
> you're proposing to add more kinks that will make this work harder.

I do see your point of course. But I do not give code consistency a
higher priority than the potential unfairness of the user visible
behavior for something that can do better. The direct reclaim
unfairness is really painful and hard to explain to users. You can
essentially only hand wave that the system is struggling so fairness is
not really a priority anymore.

> I have to admit, I'm baffled by this conversation. I consider this a
> fairly obvious, idiomatic change, and I cannot relate to the
> objections or counter-proposals in the slightest.

I have to admit that I would prefer a much less aggressive tone. We are
discussing a topic which is obviously not black and white and there are
different aspects of it.

Thanks!

> [1] http://lkml.iu.edu/hypermail//linux/kernel/0810.0/0169.html

--
Michal Hocko
SUSE Labs
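
The fairness concern raised in this message can be illustrated with a
toy model. This is only a sketch under loud assumptions: it is
userspace Python rather than kernel code, reclaim always succeeds
instantly, and the names reclaim_target(), simulate() and the policy
strings are hypothetical and do not exist in the kernel. It compares a
target proportional to what each task charged (the role
memcg_nr_pages_over_high plays in the discussion) with reclaiming all
the way down to the high limit.

```python
def reclaim_target(policy, usage, high, charged):
    """Pages one task tries to reclaim (hypothetical toy policy)."""
    excess = max(0, usage - high)
    if policy == "proportional":
        # Target scales with the task's own demand, capped by the excess.
        return min(charged, excess)
    if policy == "to_limit":
        # Whoever reclaims first pushes usage all the way back to high.
        return excess
    raise ValueError(f"unknown policy: {policy}")


def simulate(policy, high, charges, reclaim_order):
    """All tasks charge first, then reclaim one after another.

    charges maps task name -> pages charged; returns task -> pages
    reclaimed. Assumes reclaim never fails, which real reclaim does not
    guarantee.
    """
    usage = high + sum(charges.values())
    work = {}
    for task in reclaim_order:
        pages = reclaim_target(policy, usage, high, charges[task])
        usage -= pages
        work[task] = pages
    return work


# One light consumer and two heavy producers, light task reclaims first.
charges = {"light": 1, "heavy_a": 64, "heavy_b": 64}
order = ["light", "heavy_a", "heavy_b"]
for policy in ("proportional", "to_limit"):
    print(policy, simulate(policy, 1000, charges, order))
```

In this model the "to_limit" policy makes the light task reclaim the
whole excess while the heavy producers do nothing, whereas the
"proportional" policy has every task pay for its own charge. The model
says nothing about how often this ordering occurs in practice.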