From: Mina Almasry
Date: Fri, 19 Nov 2021 21:27:34 -0800
Subject: Re: [PATCH v4 0/4] Deterministic charging of shared memory
To: Matthew Wilcox
Cc: Jonathan Corbet, Alexander Viro, Andrew Morton, Johannes Weiner, Michal Hocko, Vladimir Davydov, Hugh Dickins, Shuah Khan, Shakeel Butt, Greg Thelen, Dave Chinner, Roman Gushchin, "Theodore Ts'o", linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
References: <20211120045011.3074840-1-almasrymina@google.com>

On Fri, Nov 19, 2021 at 9:01 PM Matthew Wilcox wrote:
>
> On Fri, Nov 19, 2021 at 08:50:06PM -0800, Mina Almasry wrote:
> > 1. One complication to address is the behavior when the target memcg
> > hits its memory.max limit because of remote charging. In this case the
> > oom-killer will be invoked, but the oom-killer may not find anything
> > to kill in the target memcg being charged. There are a number of considerations
> > in this case:
> >
> > 1. It's not great to kill the allocating process since the allocating process
> > is not running in the memcg under oom, and killing it will not free memory
> > in the memcg under oom.
> > 2. Pagefaults may hit the memcg limit, and we need to handle the pagefault
> > somehow. If not, the process will forever loop the pagefault in the upstream
> > kernel.
> >
> > In this case, I propose simply failing the remote charge and returning an ENOSPC
> > to the caller. This will cause the process executing the remote
> > charge to get an ENOSPC in non-pagefault paths, and get a SIGBUS on the pagefault
> > path. This will be documented behavior of remote charging, and this feature is
> > opt-in. Users can:
> > - Not opt-into the feature if they want.
> > - Opt-into the feature and accept the risk of receiving ENOSPC or SIGBUS and
> > abort if they desire.
> > - Gracefully handle any resulting ENOSPC or SIGBUS errors and continue their
> > operation without executing the remote charge if possible.
>
> Why is ENOSPC the right error instead of ENOMEM?

Returning ENOMEM from mem_cgroup_charge_mapping() will cause the
application to get ENOMEM from non-pagefault paths (which is perfectly
fine), and get stuck in a loop trying to resolve the pagefault in the
pagefault path (less fine).

The logic is here:
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1432

ENOMEM gets bubbled up here as VM_FAULT_OOM, and on remote charges the
behavior I see is that the kernel loops the pagefault forever until
memory is freed in the remote memcg, which may never happen. ENOSPC gets
bubbled up here as VM_FAULT_SIGBUS and sends a SIGBUS to the
allocating process. The conjecture here is that it's preferable to send a
SIGBUS to the allocating process rather than have it be stuck in a
loop trying to resolve a pagefault.
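
To make the difference between the two error codes concrete, here is a
minimal userspace sketch of the behavior described above. It is not
kernel code: the enum, charge_err_to_fault() and simulate_fault() are
hypothetical names made up for illustration, and the retry cap exists
only so the simulation terminates. The errno-to-fault mapping follows
the convention of the kernel's vmf_error() helper (-ENOMEM becomes
VM_FAULT_OOM, other errors become VM_FAULT_SIGBUS); whether the charge
path goes through that exact helper is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vm_fault_t codes. */
enum fault_result { FAULT_OK = 0, FAULT_OOM, FAULT_SIGBUS };

/*
 * Map a charge error to a fault result, following the vmf_error()
 * convention: -ENOMEM is reported as OOM, any other error as SIGBUS.
 */
static enum fault_result charge_err_to_fault(int err)
{
	if (err == 0)
		return FAULT_OK;
	if (err == -ENOMEM)
		return FAULT_OOM;
	return FAULT_SIGBUS;
}

/*
 * Simulate a task faulting on a page whose remote charge keeps
 * returning charge_err (the remote memcg stays at its limit).
 * Returns the number of fault attempts made; *sigbus is set when the
 * task would have received SIGBUS. max_attempts is an artificial cap:
 * in the scenario from the message, the OOM case has no such bound.
 */
static int simulate_fault(int charge_err, int max_attempts, bool *sigbus)
{
	int attempts = 0;

	*sigbus = false;
	while (attempts < max_attempts) {
		attempts++;
		switch (charge_err_to_fault(charge_err)) {
		case FAULT_OK:
			return attempts;	/* fault resolved */
		case FAULT_SIGBUS:
			*sigbus = true;		/* task is signalled, no retry */
			return attempts;
		case FAULT_OOM:
			break;			/* nothing to kill remotely: retry */
		}
	}
	return attempts;			/* still looping when the cap hit */
}
```

With -ENOSPC the simulated fault ends after one attempt with SIGBUS
pending, while with -ENOMEM it consumes every attempt it is given,
which is the looping behavior the message argues against.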