From: Shakeel Butt
Date: Mon, 15 Nov 2021 09:32:42 -0800
Subject: Re: [PATCH v3 2/4] mm/oom: handle remote ooms
To: Michal Hocko
Cc: Mina Almasry, "Theodore Ts'o", Greg Thelen, Andrew Morton, Hugh Dickins, Roman Gushchin, Johannes Weiner, Tejun Heo, Vladimir Davydov, Muchun Song, riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org
On Mon, Nov 15, 2021 at 2:58 AM Michal Hocko wrote:
> [...]
> >
> > The behavior I saw returning ENOMEM for this edge case was that the
> > code was forever looping the pagefault, and I was (seemingly
> > incorrectly) under the impression that a suggestion to forever loop
> > the pagefault would be completely fundamentally unacceptable.
>
> Well, I have to say I am not entirely sure what is the best way to
> handle this situation. Another option would be to treat this similarly
> to the ENOSPC situation. This would result in SIGBUS IIRC.
>
> The main problem with OOM killer is that it will not resolve the
> underlying problem in most situations. Shmem files would likely stay
> lying around and their charge along with them.

This and similar topics were discussed during LSFMM 2019
(https://lwn.net/Articles/787626/).

> Killing the allocating
> task has problems on its own because this could be just a DoS vector by
> other unrelated tasks sharing the shmem mount point without a graceful
> fallback. Retrying the page fault is hard to detect. SIGBUS might be
> something that helps with the latter. The question is how to communicate
> this requirement down to the memcg code to know that the memory reclaim
> should happen (Should it? How hard should we try?) but do not invoke the
> oom killer. The more I think about this the nastier this is.

IMHO we should punt the resolution to userspace and keep the kernel simple.
This is an opt-in feature and the user is expected to know about and handle
exceptional scenarios. The kernel just needs some way to tell userspace that
this exceptional situation is happening.

How about this: for remote OOMs, irrespective of whether we are in the page
fault path, keep the allocator looping but keep incrementing a new memcg event
counter, MEMCG_OOM_NO_VICTIM? Userspace will learn about the situation either
through inotify or by polling, and can handle it either by increasing the
limit or by releasing the memory of the monitored memcg.

thanks,
Shakeel
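A minimal userspace sketch of the proposed flow might look like the following. It assumes, hypothetically, that MEMCG_OOM_NO_VICTIM would be exported as an "oom_no_victim <count>" line in the cgroup v2 memory.events file alongside the existing keys (low, high, max, oom, oom_kill); that key name and the helper names below are illustrative only and are not part of the patch series or mainline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * HYPOTHETICAL key name: assumes the proposed MEMCG_OOM_NO_VICTIM event
 * would appear as "oom_no_victim <count>" in cgroup v2 memory.events.
 */
#define OOM_NO_VICTIM_KEY "oom_no_victim"

/*
 * Parse the text of a memory.events file ("key value\n" per line) and
 * return the counter for @key, or -1 if the key is not present.
 */
long long memcg_event_count(const char *events, const char *key)
{
	const char *line = events;
	size_t klen;

	assert(events != NULL && key != NULL);
	klen = strlen(key);

	while (line && *line) {
		long long val;

		/* Require the whole key followed by a space, so "oom" does not match "oom_kill". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%lld", &val) == 1)
			return val;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * The counter grew since the last check: the kernel kept looping on a
 * remote charge without finding an OOM victim. Whether to raise the
 * limit or free memory in the monitored memcg is left to userspace policy.
 */
int needs_intervention(long long prev, long long cur)
{
	return cur > prev;
}
```

In a real monitor, one would re-read memory.events each time inotify reports IN_MODIFY on it (cgroup v2 generates a file modified event when its values change) or on each polling interval, then compare the new count with the previous one using the helpers above.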