Date: Wed, 29 Nov 2023 18:26:13 +0300
From: Dmitry Rokosov
To: Michal Hocko
Subject: Re: [PATCH v3 2/2] mm: memcg: introduce new event to trace shrink_memcg
Message-ID: <20231129152613.6vfz4b675u7wbz25@CAB-WSD-L081021>
In-Reply-To: <20231129152057.x7fhbcvwtsmkbdpb@CAB-WSD-L081021>
References: <20231123193937.11628-1-ddrokosov@salutedevices.com>
 <20231123193937.11628-3-ddrokosov@salutedevices.com>
 <20231127113644.btg2xrcpjhq4cdgu@CAB-WSD-L081021>
 <20231127161637.5eqxk7xjhhyr5tj4@CAB-WSD-L081021>
 <20231129152057.x7fhbcvwtsmkbdpb@CAB-WSD-L081021>

On Wed, Nov 29, 2023 at 06:20:57PM +0300, Dmitry Rokosov wrote:
> On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > > On Mon, Nov 27, 2023 at 01:50:22PM +0100, Michal Hocko wrote:
> > > > On Mon 27-11-23 14:36:44, Dmitry Rokosov wrote:
> > > > > On Mon, Nov 27, 2023 at 10:33:49AM +0100, Michal Hocko wrote:
> > > > > > On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote:
> > > > > > > The shrink_memcg flow plays a crucial role in memcg reclamation.
> > > > > > > Currently, it is not possible to trace this point from non-direct
> > > > > > > reclaim paths. However, direct reclaim has its own tracepoint, so there
> > > > > > > is no issue there. In certain cases, when debugging memcg pressure,
> > > > > > > developers may need to identify all potential requests for memcg
> > > > > > > reclamation including kswapd(). The patchset introduces the tracepoints
> > > > > > > mm_vmscan_memcg_shrink_{begin|end}() to address this problem.
> > > > > > >
> > > > > > > Example of output in the kswapd context (non-direct reclaim):
> > > > > > >   kswapd0-39 [001] ..... 240.356378: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356420: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356479: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356525: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356614: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.356790: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >   kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16
> > > > > >
> > > > > > In the previous version I have asked why do we need this specific
> > > > > > tracepoint when we already do have trace_mm_vmscan_lru_shrink_{in}active
> > > > > > which already give you a very good insight. That includes the number of
> > > > > > reclaimed pages but also more. I do see that we do not include memcg id
> > > > > > of the reclaimed LRU, but that shouldn't be a big problem to add, no?
> > > > >
> > > > > From my point of view, memcg reclaim includes two points: LRU shrink and
> > > > > slab shrink, as mentioned in the vmscan.c file.
> > > > >
> > > > >
> > > > > static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> > > > > ...
> > > > > 		reclaimed = sc->nr_reclaimed;
> > > > > 		scanned = sc->nr_scanned;
> > > > >
> > > > > 		shrink_lruvec(lruvec, sc);
> > > > >
> > > > > 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> > > > > 			    sc->priority);
> > > > > ...
> > > > >
> > > > > So, both of these operations are important for understanding whether
> > > > > memcg reclaiming was successful or not, as well as its effectiveness. I
> > > > > believe it would be beneficial to summarize them, which is why I have
> > > > > created new tracepoints.
> > > >
> > > > This sounds like nice to have rather than must. Put it differently. If
> > > > you make existing reclaim trace points memcg aware (print memcg id) then
> > > > what prevents you from making analysis you need?
> > >
> > > You are right, nothing prevents me from making this analysis... but...
> > >
> > > This approach does have some disadvantages:
> > > 1) It requires more changes to vmscan. At the very least, the memcg
> > > object should be forwarded to all subfunctions for LRU and SLAB
> > > shrinkers.
> >
> > We should have lruvec or memcg available. lruvec_memcg() could be used
> > to get memcg from the lruvec. It might be more places to add the id but
> > arguably this would improve them to identify where the memory has been
> > scanned/reclaimed from.
> >
>
> Oh, thank you, didn't see this conversion function before...
>
> > > 2) With this approach, we will not have the ability to trace a situation
> > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > limits issues, we are unable to run it.
> >
> > I do not follow. Could you be more specific please?
> >
>
> I'm referring to a situation where kswapd() or other kernel mm code
> requests reclaim of some pages from a memcg, but the memcg rejects it
> due to the limit checks. This occurs in the shrink_node_memcgs() function.
>
> ===
> 		mem_cgroup_calculate_protection(target_memcg, memcg);
>
> 		if (mem_cgroup_below_min(target_memcg, memcg)) {
> 			/*
> 			 * Hard protection.
> 			 * If there is no reclaimable memory, OOM.
> 			 */
> 			continue;
> 		} else if (mem_cgroup_below_low(target_memcg, memcg)) {
> 			/*
> 			 * Soft protection.
> 			 * Respect the protection only as long as
> 			 * there is an unprotected supply
> 			 * of reclaimable memory from other cgroups.
> 			 */
> 			if (!sc->memcg_low_reclaim) {
> 				sc->memcg_low_skipped = 1;
> 				continue;
> 			}
> 			memcg_memory_event(memcg, MEMCG_LOW);
> 		}
> ===
>
> With separate shrink begin()/end() tracepoints we can detect such a
> problem.
>
> > > 3) LRU and SLAB shrinkers are too common places to handle memcg-related
> > > tasks. Additionally, memcg can be disabled in the kernel configuration.
> >
> > Right. This could be all hidden in the tracing code. You simply do not
> > print memcg id when the controller is disabled. Or just simply print 0.
> > I do not really see any major problems with that.
> >
> > I would really prefer to focus on that direction rather than adding
> > another begin/end tracepoint which overlaps with existing begin/end
> > traces and provides much more limited information because I would bet we
> > will have somebody complaining that mere nr_reclaimed is not sufficient.
>
> Okay, I will try to prepare a new patch version with memcg printing from
> lruvec and slab tracepoints.
>
> Then Andrew should drop the previous patchsets, I suppose. Please advise
> on the correct workflow steps here.

Actually, it has already been merged into linux-next... I just checked.

Maybe it would be better to prepare lruvec and slab memcg printing as a
separate patch series?

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0e7f0c52a76cb22c8633f21bff6e48fabff6016e

-- 
Thank you,
Dmitry
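
To make the skip case from point 2) concrete, here is a minimal userspace
sketch of the protection checks quoted from shrink_node_memcgs(). It is a
simplified model, not kernel code: struct memcg_model, classify() and
count_traced() are hypothetical names introduced only for illustration, and
the comparisons assume that "below min/low" means the usage does not exceed
the effective protection value. A memcg classified as skipped never reaches
shrink_lruvec()/shrink_slab(), so tracepoints placed only inside those paths
would record nothing for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
struct memcg_model {
	int id;
	unsigned long usage;	/* charged pages */
	unsigned long emin;	/* effective memory.min, in pages */
	unsigned long elow;	/* effective memory.low, in pages */
};

enum shrink_decision {
	SHRINK_RUN,		/* reclaim proceeds for this memcg */
	SHRINK_SKIP_MIN,	/* hard protection: skipped */
	SHRINK_SKIP_LOW,	/* soft protection: skipped on this pass */
};

/*
 * Simplified model of the quoted checks. Assumption: the memcg is
 * "below min" when usage <= emin and "below low" when usage <= elow.
 */
static enum shrink_decision classify(const struct memcg_model *m,
				     bool memcg_low_reclaim)
{
	assert(m != NULL);

	if (m->usage <= m->emin)
		return SHRINK_SKIP_MIN;
	if (m->usage <= m->elow && !memcg_low_reclaim)
		return SHRINK_SKIP_LOW;
	return SHRINK_RUN;
}

/*
 * Count the memcgs that would actually be shrunk (and so could show up in
 * LRU/slab trace events); *skipped receives the number that would not.
 */
static size_t count_traced(const struct memcg_model *v, size_t n,
			   bool memcg_low_reclaim, size_t *skipped)
{
	size_t traced = 0;

	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (classify(&v[i], memcg_low_reclaim) == SHRINK_RUN)
			traced++;
		else
			(*skipped)++;
	}
	return traced;
}
```

For example, calling count_traced() with memcg_low_reclaim set to false on
three memcgs, one unprotected, one under its min and one under its low,
reports one shrunk memcg and two skipped ones. The two skipped memcgs are the
reclaim requests that would be invisible to tracepoints living only in the
shrinkers.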