From: Yafang Shao
Date: Wed, 6 Jul 2022 12:02:49 +0800
Subject: Re: [PATCH] mm: memcontrol: do not miss MEMCG_MAX events for enforced allocations
To: Roman Gushchin
Cc: Michal Hocko, Shakeel Butt, Andrew Morton, Johannes Weiner, Muchun Song, Cgroups, Linux MM, bpf
References: <20220702033521.64630-1-roman.gushchin@linux.dev>

On Wed, Jul 6, 2022 at 11:56 AM Roman Gushchin wrote:
>
> On Wed, Jul 06, 2022 at 11:42:50AM +0800, Yafang Shao wrote:
> > On Wed, Jul 6, 2022 at 11:28 AM Roman Gushchin wrote:
> > >
> > > On Wed, Jul 06, 2022 at 10:46:48AM +0800, Yafang Shao wrote:
> > > > On Wed, Jul 6, 2022 at 4:49 AM Roman Gushchin wrote:
> > > > >
> > > > > On Mon, Jul 04, 2022 at 05:07:30PM +0200, Michal Hocko wrote:
> > > > > > On Sat 02-07-22 08:39:14, Roman Gushchin wrote:
> > > > > > > On Fri, Jul 01, 2022 at 10:50:40PM -0700, Shakeel Butt wrote:
> > > > > > > > On Fri, Jul 1, 2022 at 8:35 PM Roman Gushchin wrote:
> > > > > > > > >
> > > > > > > > > Yafang Shao reported an issue related to the accounting of bpf
> > > > > > > > > memory: if a bpf map is charged indirectly for memory consumed
> > > > > > > > > from an interrupt context and allocations are enforced,
> > > > > > > > > MEMCG_MAX events are not raised.
> > > > > > > > >
> > > > > > > > > It's not/less of an issue in a generic case because consequent
> > > > > > > > > allocations from a process context will trigger the reclaim and
> > > > > > > > > MEMCG_MAX events. However a bpf map can belong to a dying/abandoned
> > > > > > > > > memory cgroup, so it might never happen.
> > > > > > > >
> > > > > > > > The patch looks good but the above sentence is confusing. What might
> > > > > > > > never happen? Reclaim or MAX event on dying memcg?
> > > > > > >
> > > > > > > Direct reclaim and MAX events. I agree it might not be clear without
> > > > > > > looking into the code. How about something like this?
> > > > > > >
> > > > > > > "It's not/less of an issue in a generic case because consequent
> > > > > > > allocations from a process context will trigger the direct reclaim
> > > > > > > and MEMCG_MAX events will be raised. However a bpf map can belong
> > > > > > > to a dying/abandoned memory cgroup, so there will be no allocations
> > > > > > > from a process context and no MEMCG_MAX events will be triggered."
> > > > > >
> > > > > > Could you expand a little bit more on the situation? Can those charges to
> > > > > > offline memcg happen indefinitely?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > How can it ever go away then?
> > > > >
> > > > > Bpf map should be deleted by a user first.
> > > > >
> > > >
> > > > It can't apply to pinned bpf maps, because the user expects the bpf
> > > > maps to continue working after the user agent exits.
> > > >
> > > > > > Also is this something that we actually want to encourage?
> > > > >
> > > > > Not really. We can implement reparenting (probably objcg-based), I think it's
> > > > > a good idea in general. I can take a look, but can't promise it will be fast.
> > > > >
> > > > > In theory we can't forbid deleting cgroups with associated bpf maps, but I don't
> > > > > think it's a good idea.
> > > > >
> > > >
> > > > Agreed.
> > > > It is not a good idea.
> > > >
> > > > > > In other words shouldn't those remote charges be redirected when the
> > > > > > target memcg is offline?
> > > > >
> > > > > Reparenting is the best answer I have.
> > > > >
> > > >
> > > > At the cost of increasing the complexity of deployment, that may not
> > > > be a good idea either.
> > >
> > > What do you mean? Can you please elaborate on it?
> > >
> >
> >                   parent memcg
> >                        |
> >                    bpf memcg   <- limit the memory size of bpf programs
> >                    /       \
> >         bpf user agent     pinned bpf program
> >
> > After bpf user agents exit, the bpf memcg will be dead, and then all
> > its memory will be reparented.
> > That is okay for preallocated bpf maps, but not okay for
> > non-preallocated bpf maps.
> > Because the bpf maps will continue to charge, but as all its memory
> > and objcg are reparented, so we have to limit the bpf memory size in
> > the parent as follows,
>
> So you're relying on the memory limit of a dying cgroup?

No, I didn't say that. What I said is that you can't use a dying cgroup
to limit it; that's why I said we have to use the parent memcg to limit
it.

> Sorry, but I don't think we can seriously discuss such a design.
> A dying cgroup is invisible to a user, a user can't change any tunables,
> they have zero visibility into any stats or charges. Why would you do this?
>
> If you want the cgroup to be an active part of the memory management
> process, don't delete it. There are exactly zero guarantees about what
> happens with a memory cgroup after being deleted by a user, it's all
> implementation details.
>
> Anyway, here is the patch for reparenting bpf maps:
> https://github.com/rgushchin/linux/commit/f57df8bb35770507a4624fe52216b6c14f39c50c
>
> I'm going to post it to bpf@ after some testing.
>

I will take a look at it.
But AFAIK, reparenting can't resolve the problem of non-preallocated maps.

-- 
Regards
Yafang
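
For readers following the thread, the two points above (forced charges that go over the limit without a MEMCG_MAX event, and charges landing on the parent after reparenting) can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: `ToyMemcg`, `charge`, `reparent`, and the `raise_on_forced` switch are hypothetical names invented for illustration, usage is tracked per group rather than hierarchically, and reclaim is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMemcg:
    """Hypothetical stand-in for a memory cgroup; not the kernel structure."""
    name: str
    limit: int                          # models memory.max, arbitrary units
    parent: Optional["ToyMemcg"] = None
    usage: int = 0                      # non-hierarchical, for brevity
    max_events: int = 0                 # models the "max" counter in memory.events
    online: bool = True

    def charge(self, amount: int, forced: bool, raise_on_forced: bool) -> bool:
        """Charge `amount`; a forced charge succeeds even over the limit."""
        if self.usage + amount <= self.limit:
            self.usage += amount
            return True
        if not forced:
            # Process-context path: hitting the limit raises an event and
            # the charge fails (reclaim is omitted in this toy model).
            self.max_events += 1
            return False
        # Forced path (e.g. a charge made from interrupt context).
        if raise_on_forced:
            self.max_events += 1        # assumed behaviour with the patch
        self.usage += amount
        return True


def reparent(child: ToyMemcg) -> ToyMemcg:
    """Move the child's charges to its parent and mark the child offline."""
    assert child.parent is not None
    child.parent.usage += child.usage
    child.usage = 0
    child.online = False
    return child.parent


def run_forced_charges(raise_on_forced: bool) -> ToyMemcg:
    parent = ToyMemcg("parent", limit=100)
    bpf = ToyMemcg("bpf", limit=10, parent=parent)
    for _ in range(3):
        bpf.charge(5, forced=True, raise_on_forced=raise_on_forced)
    return bpf


before = run_forced_charges(raise_on_forced=False)
after = run_forced_charges(raise_on_forced=True)
print(before.usage, before.max_events)   # 15 0: over the limit, no event
print(after.usage, after.max_events)     # 15 1: over the limit, event raised

# After reparenting, new charges are checked only against the parent's limit.
target = reparent(after)
target.charge(5, forced=True, raise_on_forced=True)
print(target.name, target.usage, target.max_events)   # parent 20 0
```

In this model the "bpf" limit of 10 stops constraining anything once the group is reparented, which is the concern raised above for non-preallocated maps: they keep charging, so the only limit left to apply is the parent's.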