Date: Tue, 28 Nov 2023 10:32:50 +0100
From: Michal Hocko
To: Dmitry Rokosov
Cc: rostedt@goodmis.org, mhiramat@kernel.org, hannes@cmpxchg.org,
	roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
	akpm@linux-foundation.org, kernel@sberdevices.ru, rockosov@gmail.com,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH v3 2/2] mm: memcg: introduce new event to trace shrink_memcg

On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> On Mon, Nov 27, 2023 at 01:50:22PM +0100, Michal Hocko wrote:
> > On Mon 27-11-23 14:36:44, Dmitry Rokosov wrote:
> > > On Mon, Nov 27, 2023 at 10:33:49AM +0100, Michal Hocko wrote:
> > > > On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote:
> > > > > The shrink_memcg flow plays a crucial role in memcg reclamation.
> > > > > Currently, it is not possible to trace this point from non-direct
> > > > > reclaim paths. However, direct reclaim has its own tracepoint, so there
> > > > > is no issue there. In certain cases, when debugging memcg pressure,
> > > > > developers may need to identify all potential requests for memcg
> > > > > reclamation including kswapd(). The patchset introduces the tracepoints
> > > > > mm_vmscan_memcg_shrink_{begin|end}() to address this problem.
> > > > >
> > > > > Example of output in the kswapd context (non-direct reclaim):
> > > > > kswapd0-39 [001] ..... 240.356378: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > kswapd0-39 [001] ..... 240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16
> > > > > kswapd0-39 [001] ..... 240.356420: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > kswapd0-39 [001] ..... 240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16
> > > > > kswapd0-39 [001] ..... 240.356479: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > kswapd0-39 [001] ..... 240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16
> > > > > kswapd0-39 [001] ..... 240.356525: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > kswapd0-39 [001] ..... 240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16
> > > > > kswapd0-39 [001] ..... 240.356614: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > kswapd0-39 [001] ..... 240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16
> > > > > kswapd0-39 [001] ..... 240.356790: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16
> > > >
> > > > In the previous version I asked why we need this specific
> > > > tracepoint when we already have trace_mm_vmscan_lru_shrink_{in}active,
> > > > which already give very good insight. That includes the number of
> > > > reclaimed pages but also more. I do see that we do not include the memcg
> > > > id of the reclaimed LRU, but that shouldn't be a big problem to add, no?
> > >
> > > From my point of view, memcg reclaim includes two points: LRU shrink and
> > > slab shrink, as mentioned in the vmscan.c file.
> > >
> > >
> > > static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> > > ...
> > > 		reclaimed = sc->nr_reclaimed;
> > > 		scanned = sc->nr_scanned;
> > >
> > > 		shrink_lruvec(lruvec, sc);
> > >
> > > 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> > > 			    sc->priority);
> > > ...
> > >
> > > So, both of these operations are important for understanding whether
> > > memcg reclaiming was successful or not, as well as its effectiveness. I
> > > believe it would be beneficial to summarize them, which is why I have
> > > created new tracepoints.
> >
> > This sounds like a nice-to-have rather than a must. To put it differently:
> > if you make the existing reclaim tracepoints memcg aware (print the memcg
> > id), then what prevents you from doing the analysis you need?
>
> You are right, nothing prevents me from making this analysis... but...
>
> This approach does have some disadvantages:
> 1) It requires more changes to vmscan. At the very least, the memcg
> object should be forwarded to all subfunctions for LRU and SLAB
> shrinkers.

We should have either the lruvec or the memcg available there, and
lruvec_memcg() can be used to get the memcg from the lruvec.
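To illustrate the idea, a minimal userspace model of resolving the memcg id from an lruvec for a tracepoint, assuming simplified structs. The struct layouts, the memcg_disabled flag and the trace_memcg_id() helper are hypothetical stand-ins, not the kernel's definitions. My understanding is that in the kernel, lruvec_memcg() returns NULL when the controller is disabled and mem_cgroup_id() already returns 0 in that case, so the trace format would not need to change with the configuration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only: simplified, hypothetical stand-ins for the
 * kernel's struct mem_cgroup, struct lruvec and struct
 * mem_cgroup_per_node. The real layouts differ.
 */
struct mem_cgroup {
	unsigned short id;
};

struct lruvec {
	unsigned long nr_pages; /* placeholder member */
};

struct mem_cgroup_per_node {
	struct lruvec lruvec; /* lruvec embedded per memcg and node */
	struct mem_cgroup *memcg;
};

/* Stand-in for mem_cgroup_disabled(), e.g. booting with cgroup_disable=memory. */
static bool memcg_disabled;

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Model of lruvec_memcg(): no memcg to report when the controller is off. */
static struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
{
	if (memcg_disabled)
		return NULL;
	return container_of(lruvec, struct mem_cgroup_per_node, lruvec)->memcg;
}

/* Hypothetical helper: the id a memcg-aware LRU tracepoint would record. */
static unsigned short trace_memcg_id(struct lruvec *lruvec)
{
	struct mem_cgroup *memcg = lruvec_memcg(lruvec);

	/* Print 0 rather than dropping the field when there is no memcg. */
	return memcg ? memcg->id : 0;
}
```

With the controller enabled, trace_memcg_id() on an lruvec belonging to a memcg with id 16 yields 16; with it disabled, the same call yields 0, so the existing shrink_{in}active events could carry the field unconditionally.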
Making the existing tracepoints memcg aware might mean more places to
add the id to, but arguably this would improve them by identifying
where the memory has been scanned/reclaimed from.

> 2) With this approach, we will not have the ability to trace a situation
> where the kernel is requesting reclaim for a specific memcg, but due to
> limits issues, we are unable to run it.

I do not follow. Could you be more specific, please?

> 3) LRU and SLAB shrinkers are too common places to handle memcg-related
> tasks. Additionally, memcg can be disabled in the kernel configuration.

Right. This could all be hidden in the tracing code. You simply do not
print the memcg id when the controller is disabled, or just print 0. I
do not really see any major problems with that.

I would really prefer to focus on that direction rather than adding
another begin/end tracepoint pair which overlaps with the existing
begin/end traces and provides much more limited information, because I
would bet that somebody will complain that a mere nr_reclaimed is not
sufficient.

-- 
Michal Hocko
SUSE Labs