From: "T.J. Mercier"
Date: Thu, 20 Jul 2023 16:24:02 -0700
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
To: Tejun Heo
Cc: Yosry Ahmed, Johannes Weiner, Andrew Morton, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, "Matthew Wilcox (Oracle)",
    Zefan Li, Yu Zhao, Luis Chamberlain, Kees Cook, Iurii Zaikin,
    Greg Thelen, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org
References: <20230720070825.992023-1-yosryahmed@google.com>
    <20230720153515.GA1003248@cmpxchg.org>

On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo wrote:
>
> Hello,
>
> On Thu, Jul 20, 2023 at 03:23:59PM -0700, Yosry Ahmed wrote:
> > > On its own, AFAICS, I'm not sure the scope of problems it can actually solve
> > > is justifiably greater than what can be achieved with simple nesting.
> >
> > In our use case nesting is not a viable option.
> > As I said, in a large
> > fleet where a lot of different workloads are dynamically being
> > scheduled on different machines, and where there is no way of knowing
> > what resources are being shared among what workloads, and even if we
> > do, it wouldn't be constant, it's very difficult to construct the
> > hierarchy with nesting to keep the resources confined.
>
> Hmm... so, usually, the problems we see are resources that are persistent
> across different instances of the same application as they may want to share
> large chunks of memory like on-memory cache. I get that machines get
> different dynamic jobs but unrelated jobs usually don't share huge amount of
> memory at least in our case. The sharing across them comes down to things
> like some common library pages which don't really account for much these
> days.
>

This has also been my experience in terms of bytes of memory that are
incorrectly charged (because they're charged to a zombie), but that is
because memcg doesn't currently track the large shared allocations in
my case (primarily dma-buf). The greater issue I've seen so far is the
number of zombie cgroups that can accumulate over time. But my
understanding is that both of these problems are currently significant
for Yosry's case.