From: Yafang Shao <laoar.shao@gmail.com>
Date: Wed, 6 Jul 2022 12:33:43 +0800
Subject: Re: [PATCH] mm: memcontrol: do not miss MEMCG_MAX events for enforced allocations
To: Roman Gushchin
Cc: Michal Hocko, Shakeel Butt, Andrew Morton, Johannes Weiner, Muchun Song, Cgroups, Linux MM, bpf
References: <20220702033521.64630-1-roman.gushchin@linux.dev>
On Wed, Jul 6, 2022 at 12:19 PM Roman Gushchin wrote:
>
> On Wed, Jul 06, 2022 at 12:02:49PM +0800, Yafang Shao wrote:
> > On Wed, Jul 6, 2022 at 11:56 AM Roman Gushchin wrote:
> > >
> > > On Wed, Jul 06, 2022 at 11:42:50AM +0800, Yafang Shao wrote:
> > > > On Wed, Jul 6, 2022 at 11:28 AM Roman Gushchin wrote:
> > > > >
> > > > > On Wed, Jul 06, 2022 at 10:46:48AM +0800, Yafang Shao wrote:
> > > > > > On Wed, Jul 6, 2022 at 4:49 AM Roman Gushchin wrote:
> > > > > > >
> > > > > > > On Mon, Jul 04, 2022 at 05:07:30PM +0200, Michal Hocko wrote:
> > > > > > > > On Sat 02-07-22 08:39:14, Roman Gushchin wrote:
> > > > > > > > > On Fri, Jul 01, 2022 at 10:50:40PM -0700, Shakeel Butt wrote:
> > > > > > > > > > On Fri, Jul 1, 2022 at 8:35 PM Roman Gushchin wrote:
> > > > > > > > > > >
> > > > > > > > > > > Yafang Shao reported an issue related to the accounting of bpf
> > > > > > > > > > > memory: if a bpf map is charged indirectly for memory consumed
> > > > > > > > > > > from an interrupt context and allocations are enforced, MEMCG_MAX
> > > > > > > > > > > events are not raised.
> > > > > > > > > > >
> > > > > > > > > > > It's not/less of an issue in a generic case because consequent
> > > > > > > > > > > allocations from a process context will trigger the reclaim and
> > > > > > > > > > > MEMCG_MAX events. However a bpf map can belong to a dying/abandoned
> > > > > > > > > > > memory cgroup, so it might never happen.
> > > > > > > > > >
> > > > > > > > > > The patch looks good but the above sentence is confusing. What might
> > > > > > > > > > never happen? Reclaim or MAX event on dying memcg?
> > > > > > > > >
> > > > > > > > > Direct reclaim and MAX events. I agree it might not be clear without
> > > > > > > > > looking into the code. How about something like this?
> > > > > > > > >
> > > > > > > > > "It's not/less of an issue in a generic case because consequent
> > > > > > > > > allocations from a process context will trigger the direct reclaim
> > > > > > > > > and MEMCG_MAX events will be raised. However a bpf map can belong
> > > > > > > > > to a dying/abandoned memory cgroup, so there will be no allocations
> > > > > > > > > from a process context and no MEMCG_MAX events will be triggered."
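
[Aside for readers of the archive: the behavior described above can be
modeled in user space. The toy program below is not kernel code and not
the actual patch; it only mimics the accounting being discussed: a charge
from a context that cannot reclaim is enforced past the limit, and before
the fix no MEMCG_MAX event was counted on that enforced path.]

    /* Toy model of the charge path under discussion -- NOT kernel code. */
    #include <stdbool.h>
    #include <stdio.h>

    struct memcg {
            long usage;
            long limit;
            long max_events;        /* what memory.events "max" would show */
    };

    /* "enforced" models a charge from a context that cannot reclaim,
     * e.g. an interrupt context; "fixed" toggles the patched behavior. */
    static bool charge(struct memcg *cg, long pages, bool enforced, bool fixed)
    {
            if (cg->usage + pages <= cg->limit) {
                    cg->usage += pages;
                    return true;
            }
            if (!enforced) {
                    cg->max_events++;       /* process context: event is raised */
                    return false;           /* reclaim elided for brevity */
            }
            if (fixed)
                    cg->max_events++;       /* the fix: count the enforced breach */
            cg->usage += pages;             /* force the charge past the limit */
            return true;
    }

    int main(void)
    {
            struct memcg unpatched = { .limit = 100 };
            struct memcg patched = { .limit = 100 };

            for (int i = 0; i < 5; i++) {
                    charge(&unpatched, 30, true, false);
                    charge(&patched, 30, true, true);
            }
            printf("max events, unpatched: %ld\n", unpatched.max_events); /* 0 */
            printf("max events, patched:   %ld\n", patched.max_events);   /* 2 */
            return 0;
    }

[The real fix is of course inside the kernel's charge path; the model only
shows which counter moves when allocations are enforced.]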
> > > > > > > >
> > > > > > > > Could you expand a little bit more on the situation? Can those charges to
> > > > > > > > an offline memcg happen indefinitely?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > How can it ever go away then?
> > > > > > >
> > > > > > > The bpf map should be deleted by a user first.
> > > > > >
> > > > > > It can't apply to pinned bpf maps, because the user expects the bpf
> > > > > > maps to continue working after the user agent exits.
> > > > > >
> > > > > > > > Also is this something that we actually want to encourage?
> > > > > > >
> > > > > > > Not really. We can implement reparenting (probably objcg-based), I think it's
> > > > > > > a good idea in general. I can take a look, but can't promise it will be fast.
> > > > > > >
> > > > > > > In theory we can't forbid deleting cgroups with associated bpf maps, but I don't
> > > > > > > think it's a good idea.
> > > > > >
> > > > > > Agreed. It is not a good idea.
> > > > > >
> > > > > > > > In other words shouldn't those remote charges be redirected when the
> > > > > > > > target memcg is offline?
> > > > > > >
> > > > > > > Reparenting is the best answer I have.
> > > > > >
> > > > > > At the cost of increasing the complexity of deployment, that may not
> > > > > > be a good idea either.
> > > > >
> > > > > What do you mean? Can you please elaborate on it?
> > > >
> > > >      parent memcg
> > > >           |
> > > >       bpf memcg   <- limit the memory size of bpf programs
> > > >        /      \
> > > > bpf user agent   pinned bpf program
> > > >
> > > > After the bpf user agent exits, the bpf memcg will be dead, and then all
> > > > its memory will be reparented.
> > > > That is okay for preallocated bpf maps, but not okay for
> > > > non-preallocated bpf maps.
> > > > Because the bpf maps will continue to charge, but as all their memory
> > > > and objcg are reparented, we have to limit the bpf memory size in
> > > > the parent as follows,
> > >
> > > So you're relying on the memory limit of a dying cgroup?
> >
> > No, I didn't say that. What I said is that you can't use a dying cgroup to
> > limit it; that's why I said we have to use the parent memcg to limit it.
> >
> > > Sorry, but I don't think we can seriously discuss such a design.
> > > A dying cgroup is invisible to a user, a user can't change any tunables,
> > > they have zero visibility into any stats or charges. Why would you do this?
> > >
> > > If you want the cgroup to be an active part of the memory management
> > > process, don't delete it. There are exactly zero guarantees about what
> > > happens with a memory cgroup after being deleted by a user, it's all
> > > implementation details.
> > >
> > > Anyway, here is the patch for reparenting bpf maps:
> > > https://github.com/rgushchin/linux/commit/f57df8bb35770507a4624fe52216b6c14f39c50c
> > >
> > > I'm gonna post it to bpf@ after some testing.
> >
> > I will take a look at it.
> > But AFAIK the reparenting can't resolve the problem of non-preallocated maps.
>
> Sorry, what's the problem then?

The problem is that the bpf memcg or its parent memcg can't be destroyed
currently. IOW, you have to forbid the user from rmdir'ing it.
Reparenting is an improvement for the preallocated bpf map, because all
its memory is charged up front, so the memcg isn't useful anymore. It can
therefore be destroyed, and thus the reparenting is an improvement.
But for the non-preallocated bpf map, the memcg still has to do the limit
work, which means it can't be destroyed currently. If you reparent it,
then the parent can't be destroyed.
So why not forbid destroying the bpf memcg in the first place? The
reparenting just increases the complexity for this case.
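
[Aside: the trade-off described above can be sketched in a few lines of
user-space C. This is not kernel code; obj_cgroup here is just a pointer
indirection standing in for the kernel's objcg, but it shows why, after
reparenting, only the parent's limit can constrain a pinned,
non-preallocated map.]

    /* Toy model of objcg-based reparenting -- NOT kernel code. */
    #include <stdio.h>

    struct memcg {
            const char *name;
            long usage;
            struct memcg *parent;
    };

    /* One level of indirection between an object (a bpf map) and its
     * memcg, so charges can be redirected when the memcg is offlined. */
    struct obj_cgroup {
            struct memcg *memcg;
    };

    static void charge(struct obj_cgroup *objcg, long pages)
    {
            /* hierarchical charging: usage propagates up to the root */
            for (struct memcg *cg = objcg->memcg; cg; cg = cg->parent)
                    cg->usage += pages;
    }

    static void reparent(struct obj_cgroup *objcg)
    {
            /* on rmdir: future charges land in the parent instead */
            objcg->memcg = objcg->memcg->parent;
    }

    int main(void)
    {
            struct memcg parent = { .name = "parent" };
            struct memcg bpf = { .name = "bpf", .parent = &parent };
            struct obj_cgroup map_objcg = { .memcg = &bpf };

            charge(&map_objcg, 10);  /* the user agent is still running */
            reparent(&map_objcg);    /* the bpf memcg is rmdir'ed */
            charge(&map_objcg, 10);  /* the pinned map keeps charging */

            /* prints "bpf: 10, parent: 20" -- only the parent's
             * memory.max applies from now on */
            printf("bpf: %ld, parent: %ld\n", bpf.usage, parent.usage);
            return 0;
    }

[The indirection is what lets the kernel drop the reference to the dying
memcg; the cost, as noted above, is that the effective limit moves up one
level.]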
> Michal asked how we can prevent an indefinite pinning of a dying memcg by an associated
> bpf map being used by other processes, and I guess the objcg-based reparenting is
> the best answer here. You said it will complicate the deployment? What does that mean?

See my reply above.

> From a user's POV there is no visible difference. What am I missing here?
>
> Yes, if we reparent the bpf map, memory.max of the original memory cgroup will
> not apply, but as I said, if you want it to be effective, don't delete the cgroup.

--
Regards
Yafang