From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 6 May 2024 09:22:11 -0700
From: Shakeel Butt
To: Jesper Dangaard Brouer
Cc: Waiman Long, tj@kernel.org, hannes@cmpxchg.org,
	lizefan.x@bytedance.com, cgroups@vger.kernel.org,
	yosryahmed@google.com, netdev@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@cloudflare.com, Arnaldo Carvalho de Melo,
	Sebastian Andrzej Siewior, Daniel Dao, Ivan Babrou, jr@cloudflare.com
Subject: Re: [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
References: <171457225108.4159924.12821205549807669839.stgit@firesoul>
	<30d64e25-561a-41c6-ab95-f0820248e9b6@redhat.com>
	<4a680b80-b296-4466-895a-13239b982c85@kernel.org>
	<203fdb35-f4cf-4754-9709-3c024eecade9@redhat.com>
	<42a6d218-206b-4f87-a8fa-ef42d107fb23@kernel.org>
	<4gdfgo3njmej7a42x6x6x4b6tm267xmrfwedis4mq7f4mypfc7@4egtwzrfqkhp>
	<55854a94-681e-4142-9160-98b22fa64d61@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55854a94-681e-4142-9160-98b22fa64d61@kernel.org>

On Mon, May 06, 2024 at 02:03:47PM +0200, Jesper Dangaard Brouer wrote:
>
>
> On 03/05/2024 21.18, Shakeel Butt wrote:
[...]
> >
> > Hmm 128 usec is actually unexpectedly high.
>
> > How does the cgroup hierarchy on your system looks like?
> I didn't design this, so hopefully my co-workers can help me out here? (To
> @Daniel or @Jon)
>
> My low level view is that, there are 17 top-level directories in
> /sys/fs/cgroup/.
> There are 649 cgroups (counting occurrence of memory.stat).
> There are two directories that contain the major part.
>  - /sys/fs/cgroup/system.slice = 379
>  - /sys/fs/cgroup/production.slice = 233
>  - (production.slice have directory two levels)
>  - remaining 37
>
> We are open to changing this if you have any advice?
> (@Daniel and @Jon are actually working on restructuring this)
>
> > How many cgroups have actual workloads running?
> Do you have a command line trick to determine this?
>

The rstat infra maintains a per-cpu cgroup update tree so that it only
flushes stats of cgroups which have seen updates. So, even if you have a
large number of cgroups, if the workload is active in only a small number
of them, the update tree should be much smaller. That is the reason I
asked these questions. I don't have any advice yet. At the moment I am
trying to understand the usage and then hopefully work on optimizing
those.

>
> > Can the network softirqs run on any cpus or smaller
> > set of cpus? I am assuming these softirqs are processing packets from
> > any or all cgroups and thus have larger cgroup update tree.
>
> Softirq and specifically NET_RX is running half of the cores (e.g. 64).
> (I'm looking at restructuring this allocation)
>
> > I wonder if
> > you comment out MEMCG_SOCK stat update and still see the same holding
> > time.
> >
>
> It doesn't look like MEMCG_SOCK is used.
>
> I deduct you are asking:
>  - What is the update count for different types of mod_memcg_state() calls?
>
> // Dumped via BTF info
> enum memcg_stat_item {
>         MEMCG_SWAP = 43,
>         MEMCG_SOCK = 44,
>         MEMCG_PERCPU_B = 45,
>         MEMCG_VMALLOC = 46,
>         MEMCG_KMEM = 47,
>         MEMCG_ZSWAP_B = 48,
>         MEMCG_ZSWAPPED = 49,
>         MEMCG_NR_STAT = 50,
> };
>
> sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state{@[args->idx]=count()}
> END{printf("\nEND time elapsed: %d sec\n", elapsed / 1000000000);}'
> Attaching 2 probes...
> ^C
> END time elapsed: 99 sec
>
> @[45]: 17996
> @[46]: 18603
> @[43]: 61858
> @[47]: 21398919
>
> It seems clear that MEMCG_KMEM = 47 is the main "user".
>  - 21398919/99 = 216150 calls per sec
>
> Could someone explain to me what this MEMCG_KMEM is used for?
>

MEMCG_KMEM is the kernel memory charged to a cgroup. It also contains
the untyped kernel memory which is not included in kernel_stack,
pagetables, percpu, vmalloc, slab, etc.

The reason I asked about MEMCG_SOCK was that it might be causing larger
update trees (more cgroups) on the CPUs processing NET_RX.

Anyway, did the mutex change help your production workload regarding
latencies?
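
To make the update-tree point above concrete, here is a rough toy sketch.
It is not the kernel's rstat code: every name in it (toy_updated,
toy_flush_cpu, NR_CPUS = 4) is hypothetical, and it uses a flat per-CPU
list where the real structure is a tree that follows the cgroup
hierarchy. It only shows that a per-CPU flush visits the cgroups that
saw updates on that CPU, not all 649.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS    4    /* toy value, not the machine discussed above */
#define NR_CGROUPS 649  /* cgroup count quoted in the thread */

/* Hypothetical per-CPU record of cgroups that saw a stat update. */
struct toy_cpu_updates {
	bool   queued[NR_CGROUPS]; /* is this cgroup already on the list? */
	int    list[NR_CGROUPS];   /* cgroup ids waiting to be flushed */
	size_t nr;                 /* number of ids on the list */
};

static struct toy_cpu_updates toy_updates[NR_CPUS];

/* Record a stat update: queue the cgroup on this CPU at most once. */
void toy_updated(int cpu, int cgrp)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(cgrp >= 0 && cgrp < NR_CGROUPS);

	struct toy_cpu_updates *u = &toy_updates[cpu];

	if (!u->queued[cgrp]) {
		u->queued[cgrp] = true;
		u->list[u->nr++] = cgrp;
	}
}

/*
 * Flush one CPU: walk only the queued cgroups and reset the list.
 * Returns the number of cgroups visited, which stands in for the
 * amount of work done while the per-CPU lock would be held.
 */
size_t toy_flush_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	struct toy_cpu_updates *u = &toy_updates[cpu];
	size_t visited = u->nr;

	for (size_t i = 0; i < u->nr; i++)
		u->queued[u->list[i]] = false;
	u->nr = 0;

	return visited;
}
```

In this toy model, a CPU whose softirqs touch many different cgroups
builds a longer list and so does more work per flush, which is the
concern behind the NET_RX and MEMCG_SOCK questions.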