From: Yafang Shao
Date: Wed, 6 Jul 2022 11:42:50 +0800
Subject: Re: [PATCH] mm: memcontrol: do not miss MEMCG_MAX events for enforced allocations
To: Roman Gushchin
Cc: Michal Hocko, Shakeel Butt, Andrew Morton, Johannes Weiner, Muchun Song, Cgroups, Linux MM, bpf
References: <20220702033521.64630-1-roman.gushchin@linux.dev>

On Wed, Jul 6, 2022 at 11:28 AM Roman Gushchin wrote:
>
> On Wed, Jul 06, 2022 at 10:46:48AM +0800, Yafang Shao wrote:
> > On Wed, Jul 6, 2022 at 4:49 AM Roman Gushchin wrote:
> > >
> > > On Mon, Jul 04, 2022 at 05:07:30PM +0200, Michal Hocko wrote:
> > > > On Sat 02-07-22 08:39:14, Roman Gushchin wrote:
> > > > > On Fri, Jul 01, 2022 at 10:50:40PM -0700, Shakeel Butt wrote:
> > > > > > On Fri, Jul 1, 2022 at 8:35 PM Roman Gushchin wrote:
> > > > > > >
> > > > > > > Yafang Shao reported an issue related to the accounting of bpf
> > > > > > > memory: if a bpf map is charged indirectly for memory consumed
> > > > > > > from an interrupt context and allocations are enforced, MEMCG_MAX
> > > > > > > events are not raised.
> > > > > > >
> > > > > > > It's not/less of an issue in a generic case because consequent
> > > > > > > allocations from a process context will trigger the reclaim and
> > > > > > > MEMCG_MAX events. However a bpf map can belong to a dying/abandoned
> > > > > > > memory cgroup, so it might never happen.
> > > > > >
> > > > > > The patch looks good but the above sentence is confusing. What might
> > > > > > never happen? Reclaim or MAX event on dying memcg?
> > > > >
> > > > > Direct reclaim and MAX events. I agree it might be not clear without
> > > > > looking into the code. How about something like this?
> > > > >
> > > > > "It's not/less of an issue in a generic case because consequent
> > > > > allocations from a process context will trigger the direct reclaim
> > > > > and MEMCG_MAX events will be raised. However a bpf map can belong
> > > > > to a dying/abandoned memory cgroup, so there will be no allocations
> > > > > from a process context and no MEMCG_MAX events will be triggered."
> > > >
> > > > Could you expand a little bit more on the situation? Can those charges to
> > > > offline memcg happen indefinitely?
> > >
> > > Yes.
> > >
> > > > How can it ever go away then?
> > >
> > > Bpf map should be deleted by a user first.
> > >
> >
> > It can't apply to pinned bpf maps, because the user expects the bpf
> > maps to continue working after the user agent exits.
> >
> > > > Also is this something that we actually want to encourage?
> > >
> > > Not really. We can implement reparenting (probably objcg-based), I think it's
> > > a good idea in general. I can take a look, but can't promise it will be fast.
> > >
> > > In theory we can't forbid deleting cgroups with associated bpf maps, but I don't
> > > think it's a good idea.
> > >
> >
> > Agreed. It is not a good idea.
> >
> > > > In other words shouldn't those remote charges be redirected when the
> > > > target memcg is offline?
> > >
> > > Reparenting is the best answer I have.
> > >
> >
> > At the cost of increasing the complexity of deployment, that may not
> > be a good idea either.
>
> What do you mean? Can you please elaborate on it?
>

            parent memcg
                 |
             bpf memcg   <- limit the memory size of bpf programs
              /     \
 bpf user agent     pinned bpf program

After the bpf user agents exit, the bpf memcg will be dead, and then all
its memory will be reparented.
That is okay for preallocated bpf maps, but not okay for
non-preallocated bpf maps.
The maps will continue to charge, but because all of the memcg's memory
and its objcg have been reparented, we have to limit the bpf memory size
in the parent instead, as follows:

            parent memcg   <- limit the memory size of bpf programs
                 |
             bpf memcg
              /     \
 bpf user agent     pinned bpf program

That means the parent memcg can't be deleted and can only contain one
bpf memcg.
It may work if we use systemd to manage the memcgs, but it will be a
problem if we use k8s to manage the memcgs.

-- 
Regards
Yafang
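
To make the two layouts above concrete, here is a toy model in Python. It is only a sketch, not kernel code: the class `Memcg`, the helpers `charge_target`, `try_charge` and `scenario`, and the page counts are all made up for illustration. It assumes that objcg-based reparenting redirects new charges from a dead memcg to its nearest online ancestor, and that a charge fails outright when it would exceed a limit (the real kernel would try reclaim first, and enforced allocations can go over the limit).

```python
# Toy model only: hypothetical names, not kernel code.
class Memcg:
    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.limit = limit    # in pages; None stands for "max" (no limit)
        self.usage = 0        # hierarchical: includes charges of descendants
        self.online = True


def charge_target(owner):
    """Assumed reparenting rule: a charge aimed at an offline memcg
    lands on its nearest online ancestor."""
    cg = owner
    while not cg.online and cg.parent is not None:
        cg = cg.parent
    return cg


def try_charge(owner, nr_pages):
    """Charge the target and all its ancestors, or fail if any limit
    on that path would be exceeded."""
    path = []
    cg = charge_target(owner)
    while cg is not None:
        path.append(cg)
        cg = cg.parent
    if any(c.limit is not None and c.usage + nr_pages > c.limit for c in path):
        return False
    for c in path:
        c.usage += nr_pages
    return True


def scenario(limit_on_parent):
    """Return (charge before exit, charge after exit, parent usage)."""
    parent = Memcg("parent", limit=100 if limit_on_parent else None)
    bpf = Memcg("bpf", parent=parent, limit=None if limit_on_parent else 100)
    before = try_charge(bpf, 60)   # user agent still running
    bpf.online = False             # user agent exits, bpf memcg is dead
    after = try_charge(bpf, 60)    # pinned map keeps allocating
    return before, after, parent.usage


print(scenario(limit_on_parent=False))  # (True, True, 120)
print(scenario(limit_on_parent=True))   # (True, False, 60)
```

With the limit on the bpf memcg (first layout), the second charge bypasses the 100-page limit once the memcg is dead. With the limit on the parent (second layout), it is still enforced, which is why the parent would have to stay around and be dedicated to a single bpf memcg.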