Date: Wed, 28 Aug 2024 13:32:21 +1000
From: Dave Chinner
To: Kent Overstreet
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Qi Zheng, Roman Gushchin
Subject: Re: [PATCH 02/10] mm: shrinker: Add a .to_text() method for shrinkers
References: <20240824191020.3170516-1-kent.overstreet@linux.dev> <20240824191020.3170516-3-kent.overstreet@linux.dev>
In-Reply-To: <20240824191020.3170516-3-kent.overstreet@linux.dev>

On Sat, Aug 24, 2024 at 03:10:09PM -0400, Kent Overstreet wrote:
> This adds a new callback method to shrinkers which they can use to
> describe anything relevant to memory reclaim about their internal state,
> for example object dirtyness.
....
> +	if (!mutex_trylock(&shrinker_mutex)) {
> +		seq_buf_puts(out, "(couldn't take shrinker lock)");
> +		return;
> +	}

Please don't use the shrinker_mutex like this. There can be tens of
thousands of entries in the shrinker list (because memcgs) and
holding the shrinker_mutex for long running traversals like this is
known to cause latency problems for memcg reaping. If we are at
ENOMEM, the last thing we want to be doing is preventing memcgs from
being reaped.

> +	list_for_each_entry(shrinker, &shrinker_list, list) {
> +		struct shrink_control sc = { .gfp_mask = GFP_KERNEL, };

This iteration and counting setup is neither node nor memcg aware.
For node aware shrinkers, this will only count the items freeable
on node 0, and ignore all the other memory in the system. For memcg
systems, it will also only scan the root memcg and so miss counting
any memory in memcg owned caches.

IOWs, the shrinker iteration mechanism needs to iterate both by NUMA
node and by memcg. On large machines with multiple nodes and hosting
thousands of memcgs, a total shrinker state iteration has to walk
a -lot- of structures.

An example of this is drop_slab() - called from
/proc/sys/vm/drop_caches. It does this to iterate all the
shrinkers for all the nodes and memcgs in the system:

static unsigned long drop_slab_node(int nid)
{
	unsigned long freed = 0;
	struct mem_cgroup *memcg = NULL;

	memcg = mem_cgroup_iter(NULL, NULL, NULL);
	do {
		freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

	return freed;
}

void drop_slab(void)
{
	int nid;
	int shift = 0;
	unsigned long freed;

	do {
		freed = 0;
		for_each_online_node(nid) {
			if (fatal_signal_pending(current))
				return;

			freed += drop_slab_node(nid);
		}
	} while ((freed >> shift++) > 1);
}

Hence any iteration for finding the 10 largest shrinkable caches in
the system needs to do something similar.
Only, it needs to iterate memcgs first and then aggregate object
counts across all nodes for shrinkers that are NUMA aware.

Because it needs direct access to the shrinkers, it will need to use
the RCU lock + refcount method of traversal because that's the only
safe way to go from memcg to shrinker instance. IOWs, it needs to
mirror the code in shrink_slab/shrink_slab_memcg to obtain a safe
reference to the relevant shrinker so it can call ->count_objects()
and store a refcounted pointer to the shrinker(s) that will get
printed out after the scan is done....

Once the shrinker iteration is sorted out, I'll look further at the
rest of the code in this patch...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
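
Editorial sketch, not part of the original mail: the counting
argument above (a node 0 / root memcg count misses most of the
memory, so the walk has to go over memcgs first and then sum across
nodes before picking the largest caches) can be illustrated with a
small userspace model. This is not kernel code. Every name in it
(toy_cache, count_root_node0, count_all, top_caches, NR_MEMCGS,
NR_NODES) is hypothetical, and it deliberately leaves out the RCU
lock + refcount traversal and the ->count_objects() calls that a real
implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy userspace model, NOT kernel code; all names are hypothetical.
 * objs[m][n] is the freeable object count one cache reports for
 * memcg m on NUMA node n. memcg 0 stands in for the root memcg.
 */
#define NR_MEMCGS 3
#define NR_NODES  2

struct toy_cache {
	const char *name;
	unsigned long objs[NR_MEMCGS][NR_NODES];
};

/* What a single count against node 0 with no memcg would see. */
unsigned long count_root_node0(const struct toy_cache *c)
{
	return c->objs[0][0];
}

/* Walk memcgs first, then aggregate across all nodes for each one. */
unsigned long count_all(const struct toy_cache *c)
{
	unsigned long total = 0;

	for (int m = 0; m < NR_MEMCGS; m++)
		for (int n = 0; n < NR_NODES; n++)
			total += c->objs[m][n];
	return total;
}

/*
 * Store pointers to at most 'max' caches with the largest full counts
 * in top[], largest first. Returns the number of entries stored.
 */
size_t top_caches(const struct toy_cache *caches, size_t nr,
		  const struct toy_cache **top, size_t max)
{
	size_t used = 0;

	assert(top != NULL || max == 0);

	for (size_t i = 0; i < nr; i++) {
		unsigned long cnt = count_all(&caches[i]);
		size_t pos = used;

		/* Find the slot that keeps top[] sorted in descending order. */
		while (pos > 0 && count_all(top[pos - 1]) < cnt)
			pos--;
		if (pos >= max)
			continue;	/* smaller than everything already kept */

		/* Shift smaller entries down, dropping the last one if full. */
		size_t last = used < max ? used : max - 1;
		for (size_t j = last; j > pos; j--)
			top[j] = top[j - 1];
		top[pos] = &caches[i];
		if (used < max)
			used++;
	}
	return used;
}
```

In this model, a cache holding 1 object in the root memcg on node 0
and 599 objects spread over the other memcg/node slots reports 1 from
count_root_node0() but 600 from count_all(), so a ranking built on
the former would place it far too low.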