From: "T.J. Mercier"
Date: Fri, 26 Jul 2024 09:50:49 -0700
Subject: Re: [RFC PATCH] memcg: expose children memory usage for root
To: Yosry Ahmed
Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Greg Thelen, Facebook Kernel Team, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240722225306.1494878-1-shakeel.butt@linux.dev>

On Fri, Jul 26, 2024 at 9:26 AM
Yosry Ahmed wrote:
>
> On Fri, Jul 26, 2024 at 8:48 AM Shakeel Butt wrote:
> >
> > On Thu, Jul 25, 2024 at 04:20:45PM GMT, Yosry Ahmed wrote:
> > > On Mon, Jul 22, 2024 at 3:53 PM Shakeel Butt wrote:
> > > >
> > > > Linux kernel does not expose memory.current on the root memcg and there
> > > > are applications which have to traverse all the top level memcgs to
> > > > calculate the total memory charged in the system. This is more expensive
> > > > (directory traversal and multiple open and reads) and is racy on a busy
> > > > machine. As the kernel already have the needed information i.e. root's
> > > > memory.current, why not expose that?
> > > >
> > > > However root's memory.current will have a different semantics than the
> > > > non-root's memory.current as the kernel skips the charging for root, so
> > > > maybe it is better to have a different named interface for the root.
> > > > Something like memory.children_usage only for root memcg.
> > > >
> > > > Now there is still a question that why the kernel does not expose
> > > > memory.current for the root. The historical reason was that the memcg
> > > > charging was expensive and to provide the users to bypass the memcg
> > > > charging by letting them run in the root. However do we still want to
> > > > have this exception today? What is stopping us to start charging the
> > > > root memcg as well. Of course the root will not have limits but the
> > > > allocations will go through memcg charging and then the memory.current
> > > > of root and non-root will have the same semantics.
> > > >
> > > > This is an RFC to start a discussion on memcg charging for root.
> > >
> > > I vaguely remember when running some netperf tests (tcp_rr?) in a
> > > cgroup that the performance decreases considerably with every level
> > > down the hierarchy. I am assuming that charging was a part of the
> > > reason. If that's the case, charging the root will be similar to
> > > moving all workloads one level down the hierarchy in terms of charging
> > > overhead.
> >
> > No, the workloads running in non-root memcgs will not see any
> > difference. Only the workloads running in root will see charging
> > overhead.
>
> Oh yeah we already charge the root's page counters hierarchically in
> the upstream kernel, we just do not charge them if the origin of the
> charge is the root itself.
>
> We also have workloads that iterate top-level memcgs to calculate the
> total charged memory, so memory.children_usage for the root memcg
> would help.
>
> As for memory.current, do you have any data about how much memory is
> charged to the root itself?

Yeah, I wonder whether we'd see any significant regressions for stuff
that lives there today if we were to start charging it. I can try
running a test with Android next week. I guess try_charge() is the main
thing that would need to change to allow root charges?

> We think of the memory charged to the root
> as system overhead, while the memory charged to top-level memcgs
> isn't.
>
> So basically total_memory - root::memory.children_usage would be a
> fast way to get a rough estimation of system overhead. The same would
> not apply for total_memory - root::memory.current if I understand
> correctly.
>
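
For reference, the userspace workaround discussed in this thread (walking the top-level memcgs and summing their memory.current) can be sketched roughly as below. This is an illustration only, not code from the patch. It assumes cgroup v2 mounted at /sys/fs/cgroup. The function names are hypothetical, and total_memory is left as a caller-supplied value because the thread does not say where it comes from. memory.children_usage is only a proposed interface, so the sum here is just an approximation of what it might report.

```python
from pathlib import Path

# Assumption: cgroup v2 is mounted at its conventional location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def top_level_usage(cgroup_root: Path = CGROUP_ROOT) -> int:
    """Sum memory.current (bytes) over the direct children of the root memcg.

    Every child costs a separate open and read, and cgroups can appear or
    disappear while the directory is being walked, so the result is only a
    racy snapshot. That cost and raciness is what the RFC wants to avoid.
    """
    total = 0
    for child in cgroup_root.iterdir():
        if not child.is_dir():
            continue  # skip the root's own interface files
        try:
            total += int((child / "memory.current").read_text().strip())
        except (OSError, ValueError):
            # The cgroup was removed mid-walk, or the memory controller
            # is not enabled for it, so there is nothing to add.
            continue
    return total


def estimated_overhead(total_memory: int, children_usage: int) -> int:
    """Rough overhead estimate: total_memory minus memory charged to children.

    total_memory is whatever the caller treats as total system memory
    (hypothetical input, for example MemTotal converted to bytes).
    """
    return total_memory - children_usage
```

With a single root-level file, the loop in top_level_usage() would collapse into one read, and the subtraction in estimated_overhead() would stay the same.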