From: Shakeel Butt
Date: Thu, 4 May 2023 10:02:38 -0700
Subject: Re: [LSF/MM/BPF TOPIC] Reducing zombie memcgs
To: Chris Li
Cc: "T.J. Mercier", lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org, Yosry Ahmed, Tejun Heo, Muchun Song,
    Johannes Weiner, Roman Gushchin, Alistair Popple, Jason Gunthorpe,
    Kalesh Singh, Yu Zhao

On Wed, May 3, 2023 at 3:15 PM Chris Li wrote:
[...]
> I am also interested in this topic. T.J. and I have had some offline
> discussion about this. We have some proposals to solve this
> problem.
>
> I will share the write-up here for the upcoming LSF/MM discussion.
>
>
> Shared Memory Cgroup Controllers
>
> = Introduction
>
> The current memory cgroup controller does not support shared memory objects. For memory that is shared between different processes, it is not obvious which process should get charged.
> Google has some internal tmpfs "memcg=" mount option to charge tmpfs data to a specific memcg that's often different from where the charging processes run. However, it faces some difficulties when the charged memcg exits and becomes a zombie memcg.

What is the exact problem this proposal is solving? Is it the zombie
memcgs? To me that is just a side effect of memory shared between
different memcgs.

> Other approaches include "re-parenting" the memcg charge to the parent memcg, which has its own problem: if the charge is huge, iteration of the reparenting can be costly.

What is the iteration of the reparenting? Are you referring to
reparenting the LRUs or something else?

>
> = Proposed Solution
>
> The proposed solution is to add a new type of memory controller for shared memory usage, e.g. tmpfs, hugetlb, file system mmap and dma_buf. This shared memory cgroup controller object will have the same life cycle as the underlying shared memory.

I am confused by the relationship between the shared memory controller
and the underlying shared memory. What does "the same life cycle" mean?
Are the users expected to register the shared memory objects with the
smemcg? What about unnamed shared memory objects like MAP_SHARED or
memfds?

How does the charging work for smemcg? Is this new controller
hierarchical?

>
> Processes cannot be added to the shared memory cgroup. Instead, the shared memory cgroup can be added to the memcg using a "smemcg" API file, similar to adding a process into the "tasks" API file.

Does the charge of the underlying shared memory live with the smemcg or
with the memcg to which the smemcg is attached? Can a smemcg detach and
reattach to a different memcg?
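To make the attach semantics in the quoted write-up concrete, here is a toy userspace model. It is a sketch under stated assumptions, not kernel code: SharedMemcg, Memcg, add_task, attach_smemcg and memory_current are hypothetical names, and keeping the usage on the SharedMemcg object is only one possible answer to the question of where the charge lives. memory_current follows the write-up's statement that memory.current is the processes' anonymous memory plus the memory shared from the attached smemcg.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: two smemcgs compare equal only if they are the same object
class SharedMemcg:
    """Hypothetical model of one shared memory cgroup (e.g. backing a tmpfs
    file or a dma_buf). It has no task list, which is how this toy model
    encodes "processes cannot be added to the shared memory cgroup"."""
    name: str
    usage: int  # bytes of shared memory accounted to this smemcg


@dataclass
class Memcg:
    anon: int = 0  # bytes of anonymous memory of the member processes
    tasks: list[int] = field(default_factory=list)  # PIDs, like the "tasks" file
    attached: list[SharedMemcg] = field(default_factory=list)

    def add_task(self, pid: int) -> None:
        # Analogue of writing a PID into the "tasks" file.
        self.tasks.append(pid)

    def attach_smemcg(self, s: SharedMemcg) -> None:
        # Analogue of writing to the proposed "smemcg" file; attaching twice is a no-op.
        if s not in self.attached:
            self.attached.append(s)

    def memory_current(self) -> int:
        # memory.current = anon memory of the processes + memory shared from attached smemcgs.
        return self.anon + sum(s.usage for s in self.attached)


tmpfs = SharedMemcg("tmpfs", 300)
dmabuf = SharedMemcg("dma_buf", 200)
job = Memcg(anon=500)
job.add_task(1234)
job.attach_smemcg(tmpfs)
job.attach_smemcg(dmabuf)
job.attach_smemcg(tmpfs)  # already attached, not counted twice
print(job.memory_current())  # 1000
```

Nothing in this model stops the same SharedMemcg from being attached to two memcgs, in which case its usage is counted in both; the write-up does not say whether that is allowed.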
>
> When a smemcg is added to the memcg, the amount of memory that has been shared in the memcg's processes will be accounted for as part of the memcg's "memory.current". The memory.current of the memcg is made up of two parts: 1) the processes' anonymous memory and 2) the memory shared from the smemcg.

The above somewhat gives the impression that the charge of shared
memory lives with the smemcg. This can mess up or complicate the
hierarchical property of the original memcg.

>
> When the memcg's "memory.current" reaches the limit, the kernel will actively try to reclaim memory from the memcg to bring "smemcg memory + process anonymous memory" within the limit. Further memory allocations by the processes in that memcg will fail if the limit cannot be met. If many reclaim attempts fail to bring the memcg's "memory.current" within the limit, the process in this memcg will get OOM killed.

The OOM killing for remote charging needs much more thought. Please see
https://lwn.net/Articles/787626/ for previous discussion on a related
topic.
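The limit behaviour in the last quoted paragraph can be sketched as a toy userspace model. This is an illustration under stated assumptions, not the kernel's memcg charge path: Pool, Memcg, try_charge, MAX_RECLAIM_RETRIES and RECLAIM_BATCH are hypothetical, the reclaim order (shared before anon) is arbitrary, and "oom" is only a return value. It says nothing about which process an OOM kill should target, which is the open question raised for remote charging.

```python
from dataclasses import dataclass

# Hypothetical tuning knobs for this toy model; the write-up only says "many reclaim attempts".
MAX_RECLAIM_RETRIES = 5
RECLAIM_BATCH = 32  # pages one reclaim attempt may free at most


@dataclass
class Pool:
    used: int         # pages currently charged
    reclaimable: int  # subset of `used` that reclaim is able to free

    def reclaim(self, want: int) -> int:
        freed = min(want, self.reclaimable)
        self.used -= freed
        self.reclaimable -= freed
        return freed


@dataclass
class Memcg:
    limit: int    # limit on memory.current, in pages
    anon: Pool    # anonymous memory of the member processes
    shared: Pool  # memory shared from attached smemcgs

    def current(self) -> int:
        # memory.current = anon + shared, as in the quoted write-up.
        return self.anon.used + self.shared.used

    def reclaim(self, want: int) -> int:
        # Toy reclaim: shared first, then anon. This order is arbitrary;
        # the write-up does not specify one.
        budget = min(want, RECLAIM_BATCH)
        freed = self.shared.reclaim(budget)
        freed += self.anon.reclaim(budget - freed)
        return freed

    def try_charge(self, nr: int) -> str:
        """Charge `nr` new anon pages. Returns "ok", or "oom" when repeated
        reclaim could not make room ("oom" stands in for the OOM killer)."""
        retries = MAX_RECLAIM_RETRIES
        while self.current() + nr > self.limit:
            if retries == 0:
                return "oom"
            self.reclaim(self.current() + nr - self.limit)
            retries -= 1
        self.anon.used += nr  # new pages are not marked reclaimable in this toy
        return "ok"


job = Memcg(limit=100, anon=Pool(used=40, reclaimable=10), shared=Pool(used=50, reclaimable=20))
print(job.try_charge(5), job.current())   # ok 95
print(job.try_charge(20), job.current())  # ok 100
print(job.try_charge(50), job.current())  # oom 85
```

Because memory.current sums both pools, the second call succeeds only by reclaiming pages from the attached shared memory, i.e. an anon allocation in one memcg pushes reclaim onto memory that other memcgs may also be using.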