From: JP Kobryn <inwardvessel@gmail.com>
Date: Thu, 27 Feb 2025 15:44:12 -0800
Subject: Re: [PATCH 00/11] cgroup: separate rstat trees
To: Tejun Heo
Cc: shakeel.butt@linux.dev, mhocko@kernel.org, hannes@cmpxchg.org,
    yosryahmed@google.com, akpm@linux-foundation.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org, kernel-team@meta.com
References: <20250218031448.46951-1-inwardvessel@gmail.com>
On 2/20/25 7:51 AM, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 17, 2025 at 07:14:37PM -0800, JP Kobryn wrote:
> ...
>> The first experiment consisted of a parent cgroup with memory.swap.max=0
>> and memory.max=1G. On a 52-cpu machine, 26 child cgroups were created, and
>> within each child cgroup a process was spawned to encourage the updating of
>> memory cgroup stats by creating and then reading a file of size 1T
>> (encouraging reclaim). These 26 tasks were run in parallel. While this was
>> going on, a custom program was used to open the cpu.stat file of the parent
>> cgroup, read the entire file 1M times, then close it. The perf report for
>> the task performing the reading showed that most of the cycles (42%) were
>> spent in the function mem_cgroup_css_rstat_flush() on the control side. It
>> also showed a smaller but significant number of cycles spent in
>> __blkcg_rstat_flush(). The perf report for the patched kernel differed in
>> that no cycles were spent in these functions. Instead, most cycles were
>> spent in cgroup_base_stat_flush(). Aside from the perf reports, the amount
>> of time spent running the program reading cpu.stat showed a gain when
>> comparing the control kernel to the experimental kernel: the time spent in
>> kernel mode was reduced.
>>
>> before:
>> real    0m18.449s
>> user    0m0.209s
>> sys     0m18.165s
>>
>> after:
>> real    0m6.080s
>> user    0m0.170s
>> sys     0m5.890s
>>
>> Another experiment on the same host was set up using a parent cgroup with
>> two child cgroups. The same memory.swap.max and memory.max settings were
>> used as in the previous experiment. In the two child cgroups, kernel builds
>> were done in parallel, each using "-j 20". The program from the previous
>> experiment was used to perform 1M reads of the parent cpu.stat file. The
>> perf comparison showed similar results as the previous experiment.
>> For the control side, a majority of cycles (42%) were spent in
>> mem_cgroup_css_rstat_flush() and a significant number of cycles in
>> __blkcg_rstat_flush(). On the experimental side, most cycles were spent in
>> cgroup_base_stat_flush() and no cycles were spent flushing memory or io. As
>> for the time taken by the program reading cpu.stat, the measurements are
>> shown below.
>>
>> before:
>> real    0m17.223s
>> user    0m0.259s
>> sys     0m16.871s
>>
>> after:
>> real    0m6.498s
>> user    0m0.237s
>> sys     0m6.220s
>>
>> For the final experiment, perf events were recorded during a kernel build
>> with the same host and cgroup setup. The builds took place in the child
>> nodes. Control and experimental sides both showed similar cycles spent in
>> cgroup_rstat_updated(), which appeared insignificant compared to the other
>> events recorded with the workload.
>
> One of the reasons why the original design used one rstat tree is because
> readers, in addition to writers, can often be correlated too - e.g. you'd
> often have periodic monitoring tools which poll all the major stat files
> periodically. Splitting the trees will likely make those at least a bit
> worse. Can you test how much worse that'd be? ie. repeat the above tests but
> read all the major stat files - cgroup.stat, cpu.stat, memory.stat and
> io.stat.

Sure. I changed the experiment to read all of these files. It still showed
an improvement in performance. You can see the details in v2 [0], which I
sent out earlier today.

[0] https://lore.kernel.org/all/20250227215543.49928-1-inwardvessel@gmail.com/

>
> Thanks.
>
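For reference, the reader side of these experiments can be approximated with
a small loop like the one sketched below. This is only an illustration, not
the actual program used for the measurements above; the cgroup path, the
list of stat files, and the iteration count are assumptions (the v2 test
reads cgroup.stat, cpu.stat, memory.stat and io.stat as suggested).

    /*
     * Minimal sketch of a stat-file reader loop in the spirit of the
     * experiments above. The cgroup path, file list, and iteration count
     * are illustrative; adjust them to the cgroup being measured.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* hypothetical parent cgroup path */
            static const char *files[] = {
                    "/sys/fs/cgroup/parent/cgroup.stat",
                    "/sys/fs/cgroup/parent/cpu.stat",
                    "/sys/fs/cgroup/parent/memory.stat",
                    "/sys/fs/cgroup/parent/io.stat",
            };
            char buf[64 * 1024];

            for (long i = 0; i < 1000000; i++) {
                    for (unsigned long f = 0;
                         f < sizeof(files) / sizeof(files[0]); f++) {
                            int fd = open(files[f], O_RDONLY);

                            if (fd < 0) {
                                    perror(files[f]);
                                    return 1;
                            }
                            /*
                             * Read the whole file; each open/read forces the
                             * kernel to flush the relevant rstat data for
                             * that cgroup before returning the values.
                             */
                            while (read(fd, buf, sizeof(buf)) > 0)
                                    ;
                            close(fd);
                    }
            }
            return 0;
    }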