From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 12 Nov 2021 09:36:43 +0100
From: Michal Hocko <mhocko@suse.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Theodore Ts'o, Greg Thelen, Shakeel Butt, Andrew Morton, Hugh Dickins,
 Roman Gushchin, Johannes Weiner, Tejun Heo, Vladimir Davydov, Muchun Song,
 riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: Re: [PATCH v3 2/4] mm/oom: handle remote ooms
References: <20211111234203.1824138-1-almasrymina@google.com>
 <20211111234203.1824138-3-almasrymina@google.com>

On Fri 12-11-21 00:12:52, Mina Almasry wrote:
> On Thu, Nov 11, 2021 at 11:52 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 11-11-21 15:42:01, Mina Almasry wrote:
> > > On remote ooms (OOMs due to remote charging), the oom-killer will
> > > attempt to find a task to kill in the memcg under oom; if the
> > > oom-killer is unable to find one, it should simply return ENOMEM
> > > to the allocating process.
> >
> > This really begs for some justification.
> >
> I'm thinking (and I can add to the commit message in v4) that we have
> two reasonable options when the oom-killer gets invoked and finds
> nothing to kill: (1) return ENOMEM, or (2) kill the allocating task.
> I'm thinking returning ENOMEM allows the application to gracefully
> handle the failure to remote charge and continue operation.
>
> For example, in the network service use case that I mentioned in the
> RFC proposal, it's beneficial for the network service to get an ENOMEM
> and continue to service network requests for other clients running on
> the machine, rather than get oom-killed when hitting the remote memcg
> limit. But this is not a hard requirement; the network service could
> fork a process that does the remote charging to guard against the
> remote charge bringing down the entire process.

This all belongs in the changelog so that we can discuss all potential
implications and not rely on any implicit assumptions. E.g. why does it
even make sense to kill a task in the origin cgroup?

> > > If we're in the pagefault path and we're unable to return ENOMEM to
> > > the allocating process, we instead kill the allocating process.
> >
> > Why do you handle those differently?
> >
> I'm thinking (possibly incorrectly) that it's beneficial to return
> ENOMEM to the allocating task rather than killing it. I would love to
> return ENOMEM in both these cases, but I can't return ENOMEM in the
> fault path. The behavior I see is that the oom-killer gets invoked
> over and over again, continually fails to find something to kill, and
> the pagefault never gets handled.

Just one remark. Until very recently, VM_FAULT_OOM (a result of ENOMEM)
would trigger the global OOM killer. This has changed with 60e2793d440a
("mm, oom: do not trigger out_of_memory from the #PF"). But you are
right that you might just end up looping in the page fault forever. Is
that bad, though? The situation is fundamentally unresolvable at this
stage. On the other hand, the task is still killable, so userspace can
decide to terminate it and break out of the loop.

I am not quite sure what the best approach is. As I've said earlier,
this is very likely going to open a can of worms, so it should be
evaluated very carefully. For that, please make sure to describe your
thinking in detail.

> I could, however, kill the allocating task whether it's in the
> pagefault path or not; it's not a hard requirement that I return
> ENOMEM. If this is what you'd like to see in v4, please let me know,
> but I do see some value in allowing some callers to gracefully handle
> the ENOMEM.
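
For concreteness, the choice being debated can be sketched in plain
user-space C. This is only an illustration of the policy (fail the
remote charge with ENOMEM vs. kill the allocating task when ENOMEM
cannot be propagated, as in the pagefault path); it is not the kernel
code path, and all names in it (remote_charge, the struct memcg
fields, can_fail) are made up for the example:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of a memcg: usage, limit and how many victims the
 * oom killer could find inside it. */
struct memcg {
	long usage;
	long limit;
	int nr_killable_tasks;
};

/* Attempt a "remote" charge against @memcg on behalf of a task living
 * in a different cgroup. @can_fail mirrors whether the caller is able
 * to propagate ENOMEM (e.g. a syscall) or not (e.g. the pagefault
 * path discussed above). */
static int remote_charge(struct memcg *memcg, long nr_pages, bool can_fail)
{
	if (memcg->usage + nr_pages <= memcg->limit) {
		memcg->usage += nr_pages;
		return 0;
	}

	if (memcg->nr_killable_tasks > 0) {
		/* Usual case: the oom killer finds a victim in the
		 * target memcg. */
		memcg->nr_killable_tasks--;
		printf("oom-kill a task in the target memcg\n");
		return -ENOMEM;	/* the caller may retry */
	}

	if (can_fail) {
		/* Nothing to kill: fail the remote charge gracefully. */
		return -ENOMEM;
	}

	/* Pagefault-like path: ENOMEM cannot be propagated, so in this
	 * model the only remaining option is to kill the allocating
	 * task. */
	printf("kill the allocating task\n");
	return -EINTR;
}

int main(void)
{
	struct memcg m = { .usage = 95, .limit = 100, .nr_killable_tasks = 0 };

	printf("syscall-like charge:   %d\n", remote_charge(&m, 10, true));
	printf("pagefault-like charge: %d\n", remote_charge(&m, 10, false));
	return 0;
}

Which branch is taken in the no-victim case is exactly the policy
question that should be spelled out in the changelog.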
> > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > >
> > > Cc: Michal Hocko
> > > Cc: Theodore Ts'o
> > > Cc: Greg Thelen
> > > Cc: Shakeel Butt
> > > Cc: Andrew Morton
> > > Cc: Hugh Dickins
> > > Cc: Roman Gushchin
> > > Cc: Johannes Weiner
> > > Cc: Hugh Dickins
> > > Cc: Tejun Heo
> > > Cc: Vladimir Davydov
> > > Cc: Muchun Song
> > > Cc: riel@surriel.com
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-fsdevel@vger.kernel.org
> > > Cc: cgroups@vger.kernel.org
> >
> > --
> > Michal Hocko
> > SUSE Labs

--
Michal Hocko
SUSE Labs