From: Yafang Shao <laoar.shao@gmail.com>
Date: Fri, 17 May 2024 10:21:01 +0800
Subject: Re: [PATCH rfc 0/9] mm: memcg: separate legacy cgroup v1 code and put under config option
To: Roman Gushchin
Cc: Shakeel Butt, Yosry Ahmed, Andrew Morton, Muchun Song, Johannes Weiner,
    Michal Hocko, Matthew Wilcox, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, gthelen@google.coma, rientjes@google.com
References: <20240509034138.2207186-1-roman.gushchin@linux.dev>

On Fri, May 17, 2024 at 1:29 AM Roman Gushchin wrote:
>
> On Thu, May 16, 2024 at 11:35:57AM +0800, Yafang Shao wrote:
> > On Thu, May 9, 2024 at 2:33 PM Shakeel Butt wrote:
> > >
> > > On Wed, May 08, 2024 at 08:41:29PM -0700, Roman Gushchin wrote:
> > > > Cgroups v2 have been around for a while and many users have fully adopted them,
> > > > so they never use cgroups v1 features and functionality.
> > > > Yet they have to "pay" for the cgroup v1 support anyway:
> > > > 1) the kernel binary contains useless cgroup v1 code,
> > > > 2) some common structures like task_struct and mem_cgroup have never used
> > > >    cgroup v1-specific members,
> > > > 3) some code paths have additional checks which are not needed.
> > > >
> > > > Cgroup v1's memory controller has a number of features that are not supported
> > > > by cgroup v2 and their implementation is pretty much self contained.
> > > > Most notably, these features are: soft limit reclaim, oom handling in userspace,
> > > > complicated event notification system, charge migration.
> > > >
> > > > Cgroup v1-specific code in memcontrol.c is close to 4k lines in size and it's
> > > > intervened with generic and cgroup v2-specific code. It's a burden on
> > > > developers and maintainers.
> > > >
> > > > This patchset aims to solve these problems by:
> > > > 1) moving cgroup v1-specific memcg code to the new mm/memcontrol-v1.c file,
> > > > 2) putting definitions shared by memcontrol.c and memcontrol-v1.c into the
> > > >    mm/internal.h header
> > > > 3) introducing the CONFIG_MEMCG_V1 config option, turned on by default
> > > > 4) making memcontrol-v1.c to compile only if CONFIG_MEMCG_V1 is set
> > > > 5) putting unused struct memory_cgroup and task_struct members under
> > > >    CONFIG_MEMCG_V1 as well.
> > > >
> > > > This is an RFC version, which is not 100% polished yet, so but it would be great
> > > > to discuss and agree on the overall approach.
> > > >
> > > > Some open questions, opinions are appreciated:
> > > > 1) I consider renaming non-static functions in memcontrol-v1.c to have
> > > >    mem_cgroup_v1_ prefix. Is this a good idea?
> > > > 2) Do we want to extend it beyond the memory controller? Should
> > > > 3) Is it better to use a new include/linux/memcontrol-v1.h instead of
> > > >    mm/internal.h? Or mm/memcontrol-v1.h.
> > > >
> > >
> > > Hi Roman,
> > >
> > > A very timely and important topic and we should definitely talk about it
> > > during LSFMM as well. I have been thinking about this problem for quite
> > > sometime and I am getting more and more convinced that we should aim to
> > > completely deprecate memcg-v1.
> > >
> > > More specifically:
> > >
> > > 1. What are the memcg-v1 features which have no alternative in memcg-v2
> > >    and are blocker for memcg-v1 users? (setting aside the cgroup v2
> > >    structual restrictions)
> > >
> > > 2. What are unused memcg-v1 features which we should start deprecating?
> > >
> > > IMO we should systematically start deprecating memcg-v1 features and
> > > start unblocking the users stuck on memcg-v1.
> > >
> > > Now regarding the proposal in this series, I think it can be a first
> > > step but should not give an impression that we are done. The only
> > > concern I have is the potential of "out of sight, out of mind" situation
> > > with this change but if we keep the momentum of deprecation of memcg-v1
> > > it should be fine.
> > >
> > > I have CCed Greg and David from Google to get their opinion on what
> > > memcg-v1 features are blocker for their memcg-v2 migration and if they
> > > have concern in deprecation of memcg-v1 features.
> > >
> > > Anyone else still on memcg-v1, please do provide your input.
> >
> > Hi Shakeel,
> >
> > Hopefully I'm not too late. We are currently using memcg v1.
> >
> > One specific feature we rely on in v1 is skmem accounting. In v1, we
> > account for TCP memory usage without charging it to memcg v1, which is
> > useful for monitoring the TCP memory usage generated by tasks running
> > in a container. However, in memcg v2, monitoring TCP memory requires
> > charging it to the container, which can easily cause OOM issues.
> > It would be better if we could monitor skmem usage without charging it in
> > the memcg v2, allowing us to account for it without the risk of
> > triggering OOM conditions.
>
> Hi Yafang,
>
> the data itself is available on cgroup v2 in memory.stat:sock, however
> you're right, it's charged on pair with other types of memory. It was
> one of the main principles of cgroup v2's memory controller, so I don't
> think it can be changed.
>
> So the feature you need is not skmem accounting, but something quite
> opposite :)
>
> The question I have here: what makes socket memory different here?
>
> Is it something specific to your setup (e.g. you mostly use memory.max
> to protect against memory leaks in the userspace code, but socket memory
> spikes are always caused by external traffic and are legit) or we have
> more fundamental problems with the socket memory handling, e.g. we can't
> effectively reclaim it under the memory pressure?

It is the first case.

>
> In the first case you can maintain a ~2-lines non-upstream patch which will
> disable the charging while maintaining statistics - it's not a perfect, but
> likely the best option here. In the second case we need collectively fix it
> for cgroup v2.
>

Thank you for your advice. Currently, we do not have any immediate
plans to migrate to cgroup v2. If we are required to use cgroup v2 in
the future, we will need to maintain non-upstream patches.

By the way, is there any reason we cannot keep this behavior
consistent with memcg v1 in the upstream kernel? That would save us
from having to maintain it locally.

-- 
Regards
Yafang
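
For reference, the memory.stat:sock counter mentioned in the quoted reply can
be read from userspace without any kernel change. The following is only a
minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup; the helper
names and the example cgroup directory are hypothetical. It reads the
statistic only and does not change how socket memory is charged.

```python
from pathlib import Path


def parse_memory_stat(text):
    """Parse the flat "key value" lines of a cgroup v2 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def read_sock_bytes(cgroup_dir):
    """Return the "sock" counter (socket buffer memory, in bytes) of a cgroup.

    cgroup_dir is an assumption made for illustration, for example
    /sys/fs/cgroup/<container>; adjust it to the local hierarchy.
    """
    text = (Path(cgroup_dir) / "memory.stat").read_text()
    return parse_memory_stat(text)["sock"]


if __name__ == "__main__":
    # Made-up sample contents, used here instead of a real cgroup file.
    sample = "anon 4096\nfile 8192\nsock 12288\n"
    print(parse_memory_stat(sample)["sock"])
```

On a live system, read_sock_bytes() would be called periodically with the
container's cgroup directory to track socket memory over time.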