Date: Mon, 15 Nov 2021 11:58:45 +0100
From: Michal Hocko
To: Mina Almasry
Cc: Theodore Ts'o, Greg Thelen, Shakeel Butt, Andrew Morton, Hugh Dickins, Roman Gushchin, Johannes Weiner, Tejun Heo, Vladimir Davydov, Muchun Song, riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 2/4] mm/oom: handle remote ooms
References: <20211111234203.1824138-1-almasrymina@google.com> <20211111234203.1824138-3-almasrymina@google.com>

On Fri 12-11-21 09:59:22, Mina Almasry wrote:
> On Fri, Nov 12, 2021 at 12:36 AM Michal Hocko wrote:
> >
> > On Fri 12-11-21 00:12:52, Mina Almasry wrote:
> > > On Thu, Nov 11, 2021 at 11:52 PM Michal Hocko wrote:
> > > >
> > > > On Thu 11-11-21 15:42:01, Mina Almasry wrote:
> > > > > On remote ooms (OOMs due to remote charging), the oom-killer will attempt
> > > > > to find a task to kill in the memcg under oom, if the oom-killer
> > > > > is unable to find one, the oom-killer should simply return ENOMEM to the
> > > > > allocating process.
> > > >
> > > > This really begs for some justification.
> > > >
> > >
> > > I'm thinking (and I can add to the commit message in v4) that we have
> > > 2 reasonable options when the oom-killer gets invoked and finds
> > > nothing to kill: (1) return ENOMEM, (2) kill the allocating task. I'm
> > > thinking returning ENOMEM allows the application to gracefully handle
> > > the failure to remote charge and continue operation.
> > >
> > > For example, in the network service use case that I mentioned in the
> > > RFC proposal, it's beneficial for the network service to get an ENOMEM
> > > and continue to service network requests for other clients running on
> > > the machine, rather than get oom-killed when hitting the remote memcg
> > > limit. But, this is not a hard requirement, the network service could
> > > fork a process that does the remote charging to guard against the
> > > remote charge bringing down the entire process.
> >
> > This all belongs to the changelog so that we can discuss all potential
> > implication and do not rely on any implicit assumptions.
>
> Understood. Maybe I'll wait to collect more feedback and upload v4
> with a thorough explanation of the thought process.
>
> > E.g. why does
> > it even make sense to kill a task in the origin cgroup?
> >
>
> The behavior I saw returning ENOMEM for this edge case was that the
> code was forever looping the pagefault, and I was (seemingly
> incorrectly) under the impression that a suggestion to forever loop
> the pagefault would be completely fundamentally unacceptable.

Well, I have to say I am not entirely sure what the best way to handle
this situation is. Another option would be to treat this similarly to
an ENOSPC situation. This would result in SIGBUS IIRC.

The main problem with the OOM killer is that it will not resolve the
underlying problem in most situations. Shmem files would likely stay
lying around, and their charge along with them. Killing the allocating
task has problems of its own because this could be just a DoS vector
by other unrelated tasks sharing the shmem mount point without a
graceful fallback. Retrying the page fault is hard to detect. SIGBUS
might be something that helps with the latter. The question is how to
communicate this requirement down to the memcg code so that it knows
that the memory reclaim should happen (Should it? How hard should we
try?) but does not invoke the oom killer. The more I think about this,
the nastier this is.
-- 
Michal Hocko
SUSE Labs