linux-mm.kvack.org archive mirror
* [PATCH v2 0/6] page allocation tag compression
@ 2024-09-02  4:41 Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb

This patchset implements several improvements:
1. Gracefully handles module unloading while there are still live
allocations made from that module;
2. Provides an option to reduce the memory overhead of storing page
allocation references by indexing allocation tags;
3. Provides an option to store page allocation tag references in the
page flags, removing the dependency on page extensions and eliminating
the memory overhead of storing page allocation references (~0.2% of
total system memory; see the note below);
4. Improves page allocation performance when CONFIG_MEM_ALLOC_PROFILING
is enabled by eliminating the page extension lookup. Page allocation
performance overhead is reduced from 14% to 5.5%.
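
A note on the ~0.2% figure: with the current scheme each page consumes
a union codetag_ref (one 8-byte pointer) in page_ext, so with 4KiB
pages the reference storage alone costs 8 / 4096 ~= 0.2% of total
memory. Patch #6 eliminates this overhead when the reference fits into
the page flags.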

Patch #1 introduces mas_for_each_rev() helper function.

Patch #2 copies module tags into virtually contiguous memory, which
serves two purposes:
- It lets us deal with the situation when a module is unloaded while
there are still live allocations from that module. Since we are using a
copy of the tags, we can safely unload the module. Used space and gaps
in this contiguous memory are managed using a maple tree.
- It enables simple indexing of the tags in the later patches.

The size of the preallocated virtually contiguous memory can be
configured using the max_module_alloc_tags kernel parameter.
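
For example, to reserve room for twice the default number of module
tags, one could boot with (the value is illustrative):

	max_module_alloc_tags=200000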

Patch #3 is a code cleanup to simplify later changes.

Patch #4 abstracts page allocation tag reference to simplify later
changes.

Patch #5 lets us control page allocation tag reference sizes and
introduces tag indexing.

Patch #6 adds a config to store page allocation tag references inside
page flags if they fit.

Patchset applies to mm-unstable.

Changes since v1 [1]:
- introduced mas_for_each_rev() and use it, per Liam Howlett
- use advanced maple_tree API to minimize lookups, per Liam Howlett
- fixed CONFIG_MODULES=n configuration build, per kernel test robot

[1] https://lore.kernel.org/all/20240819151512.2363698-1-surenb@google.com/

Suren Baghdasaryan (6):
  maple_tree: add mas_for_each_rev() helper
  alloc_tag: load module tags into separate contiguous memory
  alloc_tag: eliminate alloc_tag_ref_set
  alloc_tag: introduce pgalloc_tag_ref to abstract page tag references
  alloc_tag: make page allocation tag reference size configurable
  alloc_tag: config to store page allocation tag refs in page flags

 .../admin-guide/kernel-parameters.txt         |   4 +
 include/asm-generic/codetag.lds.h             |  19 ++
 include/linux/alloc_tag.h                     |  46 ++-
 include/linux/codetag.h                       |  40 ++-
 include/linux/maple_tree.h                    |  14 +
 include/linux/mmzone.h                        |   3 +
 include/linux/page-flags-layout.h             |  10 +-
 include/linux/pgalloc_tag.h                   | 287 +++++++++++++---
 kernel/module/main.c                          |  67 ++--
 lib/Kconfig.debug                             |  36 +-
 lib/alloc_tag.c                               | 321 ++++++++++++++++--
 lib/codetag.c                                 | 104 +++++-
 mm/mm_init.c                                  |   1 +
 mm/page_ext.c                                 |   2 +-
 scripts/module.lds.S                          |   5 +-
 15 files changed, 826 insertions(+), 133 deletions(-)


base-commit: 18d35b7e30d5a217ff1cc976bb819e1aa2873301
-- 
2.46.0.469.g59c65b2a67-goog




* [PATCH v2 1/6] maple_tree: add mas_for_each_rev() helper
  2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
@ 2024-09-02  4:41 ` Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 2/6] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb, Liam R. Howlett

Add the mas_for_each_rev() helper to iterate over maple tree entries in
reverse order.
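
A minimal usage sketch (the tree, its contents and the locking context
are illustrative, not part of this patch):

	void *entry;
	MA_STATE(mas, &my_tree, ULONG_MAX, ULONG_MAX);

	rcu_read_lock();
	mas_for_each_rev(&mas, entry, 0) {
		/*
		 * Entries are visited from the highest index down to 0;
		 * mas.index and mas.last hold each entry's range.
		 */
	}
	rcu_read_unlock();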

Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/maple_tree.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 8e1504a81cd2..45e633806da2 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
 #define mas_for_each(__mas, __entry, __max) \
 	while (((__entry) = mas_find((__mas), (__max))) != NULL)
 
+/**
+ * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
+ * @__mas: Maple Tree operation state (maple_state)
+ * @__entry: Entry retrieved from the tree
+ * @__min: minimum index to retrieve from the tree
+ *
+ * When returned, mas->index and mas->last will hold the entire range for the
+ * entry.
+ *
+ * Note: may return the zero entry.
+ */
+#define mas_for_each_rev(__mas, __entry, __min) \
+	while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
+
 #ifdef CONFIG_DEBUG_MAPLE_TREE
 enum mt_dump_format {
 	mt_dump_dec,
-- 
2.46.0.469.g59c65b2a67-goog




* [PATCH v2 2/6] alloc_tag: load module tags into separate contiguous memory
  2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
@ 2024-09-02  4:41 ` Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 3/6] alloc_tag: eliminate alloc_tag_ref_set Suren Baghdasaryan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb

When a module gets unloaded there is a possibility that some of the
allocations it made are still used and therefore the allocation tags
corresponding to these allocations are still referenced. As such, the
memory for these tags can't be freed. This is currently handled as an
abnormal situation and the module's data section is not unloaded.
To handle this situation without keeping the module's data in memory,
allow codetags with a longer lifespan than the module to be loaded into
their own separate memory. The in-use memory areas and the gaps left in
this separate memory after module unloading are tracked using maple
trees. Allocation tags arrange their separate memory so that it is
virtually contiguous, which allows simple allocation tag indexing later
in this patchset. The size of this virtually contiguous memory is set
to store up to 100000 allocation tags, and the max_module_alloc_tags
kernel parameter is introduced to change this size.
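
A rough sketch of the bookkeeping this enables (simplified; the actual
code below uses the advanced mas_* API under the tree lock, and offset,
size and mod here are illustrative):

	static struct maple_tree mod_area_mt =
		MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);

	/* module load: mark [offset, offset + size - 1] as used by mod */
	mtree_store_range(&mod_area_mt, offset, offset + size - 1,
			  mod, GFP_KERNEL);

	/* unload with live tags: keep the range but mark it stale */
	mtree_store_range(&mod_area_mt, offset, offset + size - 1,
			  &unloaded_mod, GFP_KERNEL);

	/* once all its counters drop to zero, the range can be reused */
	mtree_erase(&mod_area_mt, offset);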

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 .../admin-guide/kernel-parameters.txt         |   4 +
 include/asm-generic/codetag.lds.h             |  19 ++
 include/linux/alloc_tag.h                     |  13 +-
 include/linux/codetag.h                       |  37 ++-
 kernel/module/main.c                          |  67 +++--
 lib/alloc_tag.c                               | 265 ++++++++++++++++--
 lib/codetag.c                                 | 100 ++++++-
 scripts/module.lds.S                          |   5 +-
 8 files changed, 451 insertions(+), 59 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dd0aefea01e..991b1c9ecf0e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3205,6 +3205,10 @@
 			devices can be requested on-demand with the
 			/dev/loop-control interface.
 
+
+	max_module_alloc_tags=	[KNL] Max supported number of allocation tags
+			in modules.
+
 	mce		[X86-32] Machine Check Exception
 
 	mce=option	[X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
index 64f536b80380..372c320c5043 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -11,4 +11,23 @@
 #define CODETAG_SECTIONS()		\
 	SECTION_WITH_BOUNDARIES(alloc_tags)
 
+/*
+ * Module codetags that are not used after module unload have the same
+ * lifespan as the module and can be safely unloaded with it.
+ */
+#define MOD_CODETAG_SECTIONS()
+
+#define MOD_SEPARATE_CODETAG_SECTION(_name)	\
+	.codetag.##_name : {			\
+		SECTION_WITH_BOUNDARIES(_name)	\
+	}
+
+/*
+ * For codetags that might be used after module unload and might therefore
+ * stay in memory longer. Each such codetag type has its own section so we can
+ * unload them individually once unused.
+ */
+#define MOD_SEPARATE_CODETAG_SECTIONS()		\
+	MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
+
 #endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 8c61ccd161ba..99cbc7f086ad 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {
 	struct alloc_tag_counters __percpu	*counters;
 } __aligned(8);
 
+struct alloc_tag_module_section {
+	unsigned long start_addr;
+	unsigned long end_addr;
+	/* used size */
+	unsigned long size;
+};
+
 #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
 
 #define CODETAG_EMPTY	((void *)1)
@@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+#define ALLOC_TAG_SECTION_NAME	"alloc_tags"
+
 struct codetag_bytes {
 	struct codetag *ct;
 	s64 bytes;
@@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 
 #define DEFINE_ALLOC_TAG(_alloc_tag)						\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
-	__section("alloc_tags") = {						\
+	__section(ALLOC_TAG_SECTION_NAME) = {					\
 		.ct = CODE_TAG_INIT,						\
 		.counters = &_shared_alloc_tag };
 
@@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 #define DEFINE_ALLOC_TAG(_alloc_tag)						\
 	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
-	__section("alloc_tags") = {						\
+	__section(ALLOC_TAG_SECTION_NAME) = {					\
 		.ct = CODE_TAG_INIT,						\
 		.counters = &_alloc_tag_cntr };
 
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..fb4e7adfa746 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,8 +35,15 @@ struct codetag_type_desc {
 	size_t tag_size;
 	void (*module_load)(struct codetag_type *cttype,
 			    struct codetag_module *cmod);
-	bool (*module_unload)(struct codetag_type *cttype,
+	void (*module_unload)(struct codetag_type *cttype,
 			      struct codetag_module *cmod);
+#ifdef CONFIG_MODULES
+	void (*module_replaced)(struct module *mod, struct module *new_mod);
+	bool (*needs_section_mem)(struct module *mod, unsigned long size);
+	void *(*alloc_section_mem)(struct module *mod, unsigned long size,
+				   unsigned int prepend, unsigned long align);
+	void (*free_section_mem)(struct module *mod, bool unused);
+#endif
 };
 
 struct codetag_iterator {
@@ -71,11 +78,31 @@ struct codetag_type *
 codetag_register_type(const struct codetag_type_desc *desc);
 
 #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
+
+bool codetag_needs_module_section(struct module *mod, const char *name,
+				  unsigned long size);
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+				   unsigned long size, unsigned int prepend,
+				   unsigned long align);
+void codetag_free_module_sections(struct module *mod);
+void codetag_module_replaced(struct module *mod, struct module *new_mod);
 void codetag_load_module(struct module *mod);
-bool codetag_unload_module(struct module *mod);
-#else
+void codetag_unload_module(struct module *mod);
+
+#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
+
+static inline bool
+codetag_needs_module_section(struct module *mod, const char *name,
+			     unsigned long size) { return false; }
+static inline void *
+codetag_alloc_module_section(struct module *mod, const char *name,
+			     unsigned long size, unsigned int prepend,
+			     unsigned long align) { return NULL; }
+static inline void codetag_free_module_sections(struct module *mod) {}
+static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
 static inline void codetag_load_module(struct module *mod) {}
-static inline bool codetag_unload_module(struct module *mod) { return true; }
-#endif
+static inline void codetag_unload_module(struct module *mod) {}
+
+#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
 
 #endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 71396e297499..d195d788835c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1225,18 +1225,12 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 	return 0;
 }
 
-static void module_memory_free(struct module *mod, enum mod_mem_type type,
-			       bool unload_codetags)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
-	void *ptr = mod->mem[type].base;
-
-	if (!unload_codetags && mod_mem_type_is_core_data(type))
-		return;
-
-	execmem_free(ptr);
+	execmem_free(mod->mem[type].base);
 }
 
-static void free_mod_mem(struct module *mod, bool unload_codetags)
+static void free_mod_mem(struct module *mod)
 {
 	for_each_mod_mem_type(type) {
 		struct module_memory *mod_mem = &mod->mem[type];
@@ -1247,25 +1241,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
 		/* Free lock-classes; relies on the preceding sync_rcu(). */
 		lockdep_free_key_range(mod_mem->base, mod_mem->size);
 		if (mod_mem->size)
-			module_memory_free(mod, type, unload_codetags);
+			module_memory_free(mod, type);
 	}
 
 	/* MOD_DATA hosts mod, so free it at last */
 	lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
-	module_memory_free(mod, MOD_DATA, unload_codetags);
+	module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
 static void free_module(struct module *mod)
 {
-	bool unload_codetags;
-
 	trace_module_free(mod);
 
-	unload_codetags = codetag_unload_module(mod);
-	if (!unload_codetags)
-		pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
-			mod->name);
+	codetag_unload_module(mod);
 
 	mod_sysfs_teardown(mod);
 
@@ -1308,7 +1297,7 @@ static void free_module(struct module *mod)
 	kfree(mod->args);
 	percpu_modfree(mod);
 
-	free_mod_mem(mod, unload_codetags);
+	free_mod_mem(mod);
 }
 
 void *__symbol_get(const char *symbol)
@@ -1573,6 +1562,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
 			if (WARN_ON_ONCE(type == MOD_INVALID))
 				continue;
 
+			/*
+			 * Do not allocate codetag memory as we load it into
+			 * preallocated contiguous memory. s->sh_entsize will
+			 * not be used for this section, so leave it as is.
+			 */
+			if (codetag_needs_module_section(mod, sname, s->sh_size))
+				continue;
+
 			s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
 			pr_debug("\t%s\n", sname);
 		}
@@ -2247,6 +2244,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	int i;
 	enum mod_mem_type t = 0;
 	int ret = -ENOMEM;
+	bool codetag_section_found = false;
 
 	for_each_mod_mem_type(type) {
 		if (!mod->mem[type].size) {
@@ -2257,7 +2255,7 @@ static int move_module(struct module *mod, struct load_info *info)
 		ret = module_memory_alloc(mod, type);
 		if (ret) {
 			t = type;
-			goto out_enomem;
+			goto out_err;
 		}
 	}
 
@@ -2267,11 +2265,27 @@ static int move_module(struct module *mod, struct load_info *info)
 		void *dest;
 		Elf_Shdr *shdr = &info->sechdrs[i];
 		enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
+		const char *sname;
 
 		if (!(shdr->sh_flags & SHF_ALLOC))
 			continue;
 
-		dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
+		sname = info->secstrings + shdr->sh_name;
+		/*
+		 * Load codetag sections separately as they might still be used
+		 * after module unload.
+		 */
+		if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
+			dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
+					arch_mod_section_prepend(mod, i), shdr->sh_addralign);
+			if (IS_ERR(dest)) {
+				ret = PTR_ERR(dest);
+				goto out_err;
+			}
+			codetag_section_found = true;
+		} else {
+			dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
+		}
 
 		if (shdr->sh_type != SHT_NOBITS) {
 			/*
@@ -2283,7 +2297,7 @@ static int move_module(struct module *mod, struct load_info *info)
 			if (i == info->index.mod &&
 			   (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
 				ret = -ENOEXEC;
-				goto out_enomem;
+				goto out_err;
 			}
 			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
 		}
@@ -2299,9 +2313,12 @@ static int move_module(struct module *mod, struct load_info *info)
 	}
 
 	return 0;
-out_enomem:
+out_err:
 	for (t--; t >= 0; t--)
-		module_memory_free(mod, t, true);
+		module_memory_free(mod, t);
+	if (codetag_section_found)
+		codetag_free_module_sections(mod);
+
 	return ret;
 }
 
@@ -2422,6 +2439,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
+	codetag_module_replaced(info->mod, mod);
+
 	return mod;
 }
 
@@ -2431,7 +2450,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
 	percpu_modfree(mod);
 	module_arch_freeing_init(mod);
 
-	free_mod_mem(mod, true);
+	free_mod_mem(mod);
 }
 
 int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 81e5f9a70f22..19ef02a18611 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include <linux/alloc_tag.h>
+#include <linux/execmem.h>
 #include <linux/fs.h>
 #include <linux/gfp.h>
 #include <linux/module.h>
@@ -149,30 +150,244 @@ static void __init procfs_init(void)
 	proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
 }
 
-static bool alloc_tag_module_unload(struct codetag_type *cttype,
-				    struct codetag_module *cmod)
+#ifdef CONFIG_MODULES
+
+static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
+/* A dummy object used to indicate an unloaded module */
+static struct module unloaded_mod;
+/* A dummy object used to indicate a module prepended area */
+static struct module prepend_mod;
+
+static struct alloc_tag_module_section module_tags;
+
+static bool needs_section_mem(struct module *mod, unsigned long size)
 {
-	struct codetag_iterator iter = codetag_get_ct_iter(cttype);
-	struct alloc_tag_counters counter;
-	bool module_unused = true;
+	return size >= sizeof(struct alloc_tag);
+}
+
+/* Called under RCU read lock */
+static void clean_unused_module_areas_locked(void)
+{
+	MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+	struct module *val;
+
+	mas_for_each(&mas, val, module_tags.size) {
+		if (val == &unloaded_mod) {
+			struct alloc_tag *tag;
+			struct alloc_tag *last;
+			bool unused = true;
+
+			tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
+			last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
+			while (tag <= last) {
+				struct alloc_tag_counters counter;
+
+				counter = alloc_tag_read(tag);
+				if (counter.bytes) {
+					unused = false;
+					break;
+				}
+				tag++;
+			}
+			if (unused)
+				mas_erase(&mas);
+		}
+	}
+}
+
+static void *reserve_module_tags(struct module *mod, unsigned long size,
+				 unsigned int prepend, unsigned long align)
+{
+	unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
+	MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
+	bool cleanup_done = false;
+	unsigned long offset;
+	void *ret;
+
+	/* If the section has no tags, return NULL */
+	if (size < sizeof(struct alloc_tag))
+		return NULL;
+
+	/*
+	 * align is always a power of 2, so we can use IS_ALIGNED and ALIGN.
+	 * align of 0 or 1 means no alignment; to simplify, set it to 1.
+	 */
+	if (!align)
+		align = 1;
+
+	mas_lock(&mas);
+repeat:
+	/* Try finding exact size and hope the start is aligned */
+	if (mas_empty_area(&mas, 0, section_size - 1, prepend + size))
+		goto cleanup;
+
+	if (IS_ALIGNED(mas.index + prepend, align))
+		goto found;
+
+	/* Try finding larger area to align later */
+	mas_reset(&mas);
+	if (!mas_empty_area(&mas, 0, section_size - 1,
+			    size + prepend + align - 1))
+		goto found;
+
+cleanup:
+	/* No free area; try cleaning up stale data and repeat the search once */
+	if (!cleanup_done) {
+		clean_unused_module_areas_locked();
+		cleanup_done = true;
+		mas_reset(&mas);
+		goto repeat;
+	} else {
+		ret = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+found:
+	offset = mas.index;
+	offset += prepend;
+	offset = ALIGN(offset, align);
+
+	if (offset != mas.index) {
+		unsigned long pad_start = mas.index;
+
+		mas.last = offset - 1;
+		mas_store(&mas, &prepend_mod);
+		if (mas_is_err(&mas)) {
+			ret = ERR_PTR(xa_err(mas.node));
+			goto out;
+		}
+		mas.index = offset;
+		mas.last = offset + size - 1;
+		mas_store(&mas, mod);
+		if (mas_is_err(&mas)) {
+			ret = ERR_PTR(xa_err(mas.node));
+			mas.index = pad_start;
+			mas_erase(&mas);
+			goto out;
+		}
+
+	} else {
+		mas.last = offset + size - 1;
+		mas_store(&mas, mod);
+		if (mas_is_err(&mas)) {
+			ret = ERR_PTR(xa_err(mas.node));
+			goto out;
+		}
+	}
+
+	if (module_tags.size < offset + size)
+		module_tags.size = offset + size;
+
+	ret = (struct alloc_tag *)(module_tags.start_addr + offset);
+out:
+	mas_unlock(&mas);
+
+	return ret;
+}
+
+static void release_module_tags(struct module *mod, bool unused)
+{
+	MA_STATE(mas, &mod_area_mt, module_tags.size, module_tags.size);
+	struct alloc_tag *last;
 	struct alloc_tag *tag;
-	struct codetag *ct;
+	struct module *val;
 
-	for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
-		if (iter.cmod != cmod)
-			continue;
+	if (unused)
+		return;
+
+	mas_lock(&mas);
+	mas_for_each_rev(&mas, val, 0)
+		if (val == mod)
+			break;
+
+	if (!val) /* module not found */
+		goto out;
+
+	tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
+	last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
+
+	unused = true;
+	while (tag <= last) {
+		struct alloc_tag_counters counter;
 
-		tag = ct_to_alloc_tag(ct);
 		counter = alloc_tag_read(tag);
+		if (counter.bytes) {
+			struct codetag *ct = &tag->ct;
+
+			pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
+				ct->filename, ct->lineno, ct->modname,
+				ct->function, counter.bytes);
+			unused = false;
+			break;
+		}
+		tag++;
+	}
+
+	mas_store(&mas, unused ? NULL : &unloaded_mod);
+	val = mas_prev_range(&mas, 0);
+	if (val == &prepend_mod)
+		mas_store(&mas, NULL);
+out:
+	mas_unlock(&mas);
+}
+
+static void replace_module(struct module *mod, struct module *new_mod)
+{
+	MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+	struct module *val;
 
-		if (WARN(counter.bytes,
-			 "%s:%u module %s func:%s has %llu allocated at module unload",
-			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
-			module_unused = false;
+	mas_lock(&mas);
+	mas_for_each(&mas, val, module_tags.size) {
+		if (val != mod)
+			continue;
+
+		mas_store_gfp(&mas, new_mod, GFP_KERNEL);
+		break;
 	}
+	mas_unlock(&mas);
+}
+
+static unsigned long max_module_alloc_tags __initdata = 100000;
 
-	return module_unused;
+static int __init set_max_module_alloc_tags(char *arg)
+{
+	if (!arg)
+		return -EINVAL;
+
+	return kstrtoul(arg, 0, &max_module_alloc_tags);
 }
+early_param("max_module_alloc_tags", set_max_module_alloc_tags);
+
+static int __init alloc_mod_tags_mem(void)
+{
+	unsigned long module_tags_mem_sz;
+
+	module_tags_mem_sz = max_module_alloc_tags * sizeof(struct alloc_tag);
+	pr_info("%lu bytes reserved for module allocation tags\n", module_tags_mem_sz);
+
+	/* Allocate space to copy allocation tags */
+	module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
+							      module_tags_mem_sz);
+	if (!module_tags.start_addr)
+		return -ENOMEM;
+
+	module_tags.end_addr = module_tags.start_addr + module_tags_mem_sz;
+
+	return 0;
+}
+
+static void __init free_mod_tags_mem(void)
+{
+	execmem_free((void *)module_tags.start_addr);
+	module_tags.start_addr = 0;
+}
+
+#else /* CONFIG_MODULES */
+
+static inline int alloc_mod_tags_mem(void) { return 0; }
+static inline void free_mod_tags_mem(void) {}
+
+#endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 static bool mem_profiling_support __meminitdata = true;
@@ -255,14 +470,26 @@ static inline void sysctl_init(void) {}
 static int __init alloc_tag_init(void)
 {
 	const struct codetag_type_desc desc = {
-		.section	= "alloc_tags",
-		.tag_size	= sizeof(struct alloc_tag),
-		.module_unload	= alloc_tag_module_unload,
+		.section		= ALLOC_TAG_SECTION_NAME,
+		.tag_size		= sizeof(struct alloc_tag),
+#ifdef CONFIG_MODULES
+		.needs_section_mem	= needs_section_mem,
+		.alloc_section_mem	= reserve_module_tags,
+		.free_section_mem	= release_module_tags,
+		.module_replaced	= replace_module,
+#endif
 	};
+	int res;
+
+	res = alloc_mod_tags_mem();
+	if (res)
+		return res;
 
 	alloc_tag_cttype = codetag_register_type(&desc);
-	if (IS_ERR(alloc_tag_cttype))
+	if (IS_ERR(alloc_tag_cttype)) {
+		free_mod_tags_mem();
 		return PTR_ERR(alloc_tag_cttype);
+	}
 
 	sysctl_init();
 	procfs_init();
diff --git a/lib/codetag.c b/lib/codetag.c
index afa8a2d4f317..60463ef4bb85 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -207,6 +207,94 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 }
 
 #ifdef CONFIG_MODULES
+#define CODETAG_SECTION_PREFIX	".codetag."
+
+/* Some codetag types need a separate module section */
+bool codetag_needs_module_section(struct module *mod, const char *name,
+				  unsigned long size)
+{
+	const char *type_name;
+	struct codetag_type *cttype;
+	bool ret = false;
+
+	if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
+		return false;
+
+	type_name = name + strlen(CODETAG_SECTION_PREFIX);
+	mutex_lock(&codetag_lock);
+	list_for_each_entry(cttype, &codetag_types, link) {
+		if (strcmp(type_name, cttype->desc.section) == 0) {
+			if (!cttype->desc.needs_section_mem)
+				break;
+
+			down_write(&cttype->mod_lock);
+			ret = cttype->desc.needs_section_mem(mod, size);
+			up_write(&cttype->mod_lock);
+			break;
+		}
+	}
+	mutex_unlock(&codetag_lock);
+
+	return ret;
+}
+
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+				   unsigned long size, unsigned int prepend,
+				   unsigned long align)
+{
+	const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
+	struct codetag_type *cttype;
+	void *ret = NULL;
+
+	mutex_lock(&codetag_lock);
+	list_for_each_entry(cttype, &codetag_types, link) {
+		if (strcmp(type_name, cttype->desc.section) == 0) {
+			if (WARN_ON(!cttype->desc.alloc_section_mem))
+				break;
+
+			down_write(&cttype->mod_lock);
+			ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
+			up_write(&cttype->mod_lock);
+			break;
+		}
+	}
+	mutex_unlock(&codetag_lock);
+
+	return ret;
+}
+
+void codetag_free_module_sections(struct module *mod)
+{
+	struct codetag_type *cttype;
+
+	mutex_lock(&codetag_lock);
+	list_for_each_entry(cttype, &codetag_types, link) {
+		if (!cttype->desc.free_section_mem)
+			continue;
+
+		down_write(&cttype->mod_lock);
+		cttype->desc.free_section_mem(mod, true);
+		up_write(&cttype->mod_lock);
+	}
+	mutex_unlock(&codetag_lock);
+}
+
+void codetag_module_replaced(struct module *mod, struct module *new_mod)
+{
+	struct codetag_type *cttype;
+
+	mutex_lock(&codetag_lock);
+	list_for_each_entry(cttype, &codetag_types, link) {
+		if (!cttype->desc.module_replaced)
+			continue;
+
+		down_write(&cttype->mod_lock);
+		cttype->desc.module_replaced(mod, new_mod);
+		up_write(&cttype->mod_lock);
+	}
+	mutex_unlock(&codetag_lock);
+}
+
 void codetag_load_module(struct module *mod)
 {
 	struct codetag_type *cttype;
@@ -220,13 +308,12 @@ void codetag_load_module(struct module *mod)
 	mutex_unlock(&codetag_lock);
 }
 
-bool codetag_unload_module(struct module *mod)
+void codetag_unload_module(struct module *mod)
 {
 	struct codetag_type *cttype;
-	bool unload_ok = true;
 
 	if (!mod)
-		return true;
+		return;
 
 	mutex_lock(&codetag_lock);
 	list_for_each_entry(cttype, &codetag_types, link) {
@@ -243,18 +330,17 @@ bool codetag_unload_module(struct module *mod)
 		}
 		if (found) {
 			if (cttype->desc.module_unload)
-				if (!cttype->desc.module_unload(cttype, cmod))
-					unload_ok = false;
+				cttype->desc.module_unload(cttype, cmod);
 
 			cttype->count -= range_size(cttype, &cmod->range);
 			idr_remove(&cttype->mod_idr, mod_id);
 			kfree(cmod);
 		}
 		up_write(&cttype->mod_lock);
+		if (found && cttype->desc.free_section_mem)
+			cttype->desc.free_section_mem(mod, false);
 	}
 	mutex_unlock(&codetag_lock);
-
-	return unload_ok;
 }
 #endif /* CONFIG_MODULES */
 
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 3f43edef813c..711c6e029936 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -50,7 +50,7 @@ SECTIONS {
 	.data : {
 		*(.data .data.[0-9a-zA-Z_]*)
 		*(.data..L*)
-		CODETAG_SECTIONS()
+		MOD_CODETAG_SECTIONS()
 	}
 
 	.rodata : {
@@ -59,9 +59,10 @@ SECTIONS {
 	}
 #else
 	.data : {
-		CODETAG_SECTIONS()
+		MOD_CODETAG_SECTIONS()
 	}
 #endif
+	MOD_SEPARATE_CODETAG_SECTIONS()
 }
 
 /* bring in arch-specific sections */
-- 
2.46.0.469.g59c65b2a67-goog




* [PATCH v2 3/6] alloc_tag: eliminate alloc_tag_ref_set
  2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 2/6] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
@ 2024-09-02  4:41 ` Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 4/6] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb

To simplify further refactoring, open-code alloc_tag_ref_set() at its
only two call sites.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/alloc_tag.h   | 25 ++-----------------------
 include/linux/pgalloc_tag.h | 12 +++++++++++-
 2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 99cbc7f086ad..21e3098220e3 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -143,36 +143,15 @@ static inline void alloc_tag_add_check(union codetag_ref *ref, struct alloc_tag
 static inline void alloc_tag_sub_check(union codetag_ref *ref) {}
 #endif
 
-/* Caller should verify both ref and tag to be valid */
-static inline void __alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
-{
-	ref->ct = &tag->ct;
-	/*
-	 * We need in increment the call counter every time we have a new
-	 * allocation or when we split a large allocation into smaller ones.
-	 * Each new reference for every sub-allocation needs to increment call
-	 * counter because when we free each part the counter will be decremented.
-	 */
-	this_cpu_inc(tag->counters->calls);
-}
-
-static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
-{
-	alloc_tag_add_check(ref, tag);
-	if (!ref || !tag)
-		return;
-
-	__alloc_tag_ref_set(ref, tag);
-}
-
 static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
 {
 	alloc_tag_add_check(ref, tag);
 	if (!ref || !tag)
 		return;
 
-	__alloc_tag_ref_set(ref, tag);
+	ref->ct = &tag->ct;
 	this_cpu_add(tag->counters->bytes, bytes);
+	this_cpu_inc(tag->counters->calls);
 }
 
 static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 207f0c83c8e9..244a328dff62 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -103,7 +103,17 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
 	page_ext = page_ext_next(page_ext);
 	for (i = 1; i < nr; i++) {
 		/* Set new reference to point to the original tag */
-		alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
+		ref = codetag_ref_from_page_ext(page_ext);
+		alloc_tag_add_check(ref, tag);
+		if (ref) {
+			ref->ct = &tag->ct;
+			/*
+			 * We need to increment the call counter every time we split a
+			 * large allocation into smaller ones because when we free each
+			 * part the counter will be decremented.
+			 */
+			this_cpu_inc(tag->counters->calls);
+		}
 		page_ext = page_ext_next(page_ext);
 	}
 out:
-- 
2.46.0.469.g59c65b2a67-goog




* [PATCH v2 4/6] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references
  2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
                   ` (2 preceding siblings ...)
  2024-09-02  4:41 ` [PATCH v2 3/6] alloc_tag: eliminate alloc_tag_ref_set Suren Baghdasaryan
@ 2024-09-02  4:41 ` Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
  2024-09-02  4:41 ` [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
  5 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb

To simplify later changes to page tag references, introduce new
pgalloc_tag_ref and pgtag_ref_handle types. This allows easy replacement
of page_ext as the storage for page allocation tag references.
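
With this abstraction every accessor follows the same pattern: obtain a
handle together with a local copy of the reference, modify the copy,
write it back through the handle and drop the handle. For example,
mirroring pgalloc_tag_sub() below:

	pgtag_ref_handle handle;
	union codetag_ref ref;

	handle = get_page_tag_ref(page, &ref);
	if (handle) {
		alloc_tag_sub(&ref, PAGE_SIZE * nr);
		update_page_tag_ref(handle, &ref);
		put_page_tag_ref(handle);
	}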

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/pgalloc_tag.h | 144 +++++++++++++++++++++++-------------
 lib/alloc_tag.c             |   3 +-
 2 files changed, 95 insertions(+), 52 deletions(-)

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 244a328dff62..c76b629d0206 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -9,48 +9,76 @@
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+typedef union codetag_ref	pgalloc_tag_ref;
+
+static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+	ref->ct = pgref->ct;
+}
+
+static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+	pgref->ct = ref->ct;
+}
 #include <linux/page_ext.h>
 
 extern struct page_ext_operations page_alloc_tagging_ops;
 
-static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+static inline pgalloc_tag_ref *pgref_from_page_ext(struct page_ext *page_ext)
 {
-	return (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+	return (pgalloc_tag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
 }
 
-static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+static inline struct page_ext *page_ext_from_pgref(pgalloc_tag_ref *pgref)
 {
-	return (void *)ref - page_alloc_tagging_ops.offset;
+	return (void *)pgref - page_alloc_tagging_ops.offset;
 }
 
+typedef pgalloc_tag_ref	*pgtag_ref_handle;
+
 /* Should be called only if mem_alloc_profiling_enabled() */
-static inline union codetag_ref *get_page_tag_ref(struct page *page)
+static inline pgtag_ref_handle get_page_tag_ref(struct page *page, union codetag_ref *ref)
 {
 	if (page) {
 		struct page_ext *page_ext = page_ext_get(page);
 
-		if (page_ext)
-			return codetag_ref_from_page_ext(page_ext);
+		if (page_ext) {
+			pgalloc_tag_ref *pgref = pgref_from_page_ext(page_ext);
+
+			read_pgref(pgref, ref);
+			return pgref;
+		}
 	}
 	return NULL;
 }
 
-static inline void put_page_tag_ref(union codetag_ref *ref)
+static inline void put_page_tag_ref(pgtag_ref_handle pgref)
 {
-	if (WARN_ON(!ref))
+	if (WARN_ON(!pgref))
 		return;
 
-	page_ext_put(page_ext_from_codetag_ref(ref));
+	page_ext_put(page_ext_from_pgref(pgref));
+}
+
+static inline void update_page_tag_ref(pgtag_ref_handle pgref, union codetag_ref *ref)
+{
+	if (WARN_ON(!pgref || !ref))
+		return;
+
+	write_pgref(pgref, ref);
 }
 
 static inline void clear_page_tag_ref(struct page *page)
 {
 	if (mem_alloc_profiling_enabled()) {
-		union codetag_ref *ref = get_page_tag_ref(page);
-
-		if (ref) {
-			set_codetag_empty(ref);
-			put_page_tag_ref(ref);
+		pgtag_ref_handle handle;
+		union codetag_ref ref;
+
+		handle = get_page_tag_ref(page, &ref);
+		if (handle) {
+			set_codetag_empty(&ref);
+			update_page_tag_ref(handle, &ref);
+			put_page_tag_ref(handle);
 		}
 	}
 }
@@ -59,11 +87,14 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int nr)
 {
 	if (mem_alloc_profiling_enabled()) {
-		union codetag_ref *ref = get_page_tag_ref(page);
-
-		if (ref) {
-			alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE * nr);
-			put_page_tag_ref(ref);
+		pgtag_ref_handle handle;
+		union codetag_ref ref;
+
+		handle = get_page_tag_ref(page, &ref);
+		if (handle) {
+			alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
+			update_page_tag_ref(handle, &ref);
+			put_page_tag_ref(handle);
 		}
 	}
 }
@@ -71,53 +102,58 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
 {
 	if (mem_alloc_profiling_enabled()) {
-		union codetag_ref *ref = get_page_tag_ref(page);
-
-		if (ref) {
-			alloc_tag_sub(ref, PAGE_SIZE * nr);
-			put_page_tag_ref(ref);
+		pgtag_ref_handle handle;
+		union codetag_ref ref;
+
+		handle = get_page_tag_ref(page, &ref);
+		if (handle) {
+			alloc_tag_sub(&ref, PAGE_SIZE * nr);
+			update_page_tag_ref(handle, &ref);
+			put_page_tag_ref(handle);
 		}
 	}
 }
 
 static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
 {
-	int i;
-	struct page_ext *first_page_ext;
-	struct page_ext *page_ext;
-	union codetag_ref *ref;
+	pgtag_ref_handle first_pgref;
+	union codetag_ref first_ref;
 	struct alloc_tag *tag;
+	int i;
 
 	if (!mem_alloc_profiling_enabled())
 		return;
 
-	first_page_ext = page_ext = page_ext_get(page);
-	if (unlikely(!page_ext))
+	first_pgref = get_page_tag_ref(page, &first_ref);
+	if (unlikely(!first_pgref))
 		return;
 
-	ref = codetag_ref_from_page_ext(page_ext);
-	if (!ref->ct)
+	if (!first_ref.ct)
 		goto out;
 
-	tag = ct_to_alloc_tag(ref->ct);
-	page_ext = page_ext_next(page_ext);
+	tag = ct_to_alloc_tag(first_ref.ct);
 	for (i = 1; i < nr; i++) {
-		/* Set new reference to point to the original tag */
-		ref = codetag_ref_from_page_ext(page_ext);
-		alloc_tag_add_check(ref, tag);
-		if (ref) {
-			ref->ct = &tag->ct;
+		pgtag_ref_handle handle;
+		union codetag_ref ref;
+
+		page++;
+		handle = get_page_tag_ref(page, &ref);
+		if (handle) {
+			/* Set new reference to point to the original tag */
+			alloc_tag_add_check(&ref, tag);
+			ref.ct = &tag->ct;
 			/*
 			 * We need to increment the call counter every time we split a
 			 * large allocation into smaller ones because when we free each
 			 * part the counter will be decremented.
 			 */
 			this_cpu_inc(tag->counters->calls);
+			update_page_tag_ref(handle, &ref);
+			put_page_tag_ref(handle);
 		}
-		page_ext = page_ext_next(page_ext);
 	}
 out:
-	page_ext_put(first_page_ext);
+	put_page_tag_ref(first_pgref);
 }
 
 static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
@@ -125,13 +161,15 @@ static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
 	struct alloc_tag *tag = NULL;
 
 	if (mem_alloc_profiling_enabled()) {
-		union codetag_ref *ref = get_page_tag_ref(page);
-
-		alloc_tag_sub_check(ref);
-		if (ref) {
-			if (ref->ct)
-				tag = ct_to_alloc_tag(ref->ct);
-			put_page_tag_ref(ref);
+		pgtag_ref_handle handle;
+		union codetag_ref ref;
+
+		handle = get_page_tag_ref(page, &ref);
+		if (handle) {
+			alloc_tag_sub_check(&ref);
+			if (ref.ct)
+				tag = ct_to_alloc_tag(ref.ct);
+			put_page_tag_ref(handle);
 		}
 	}
 
@@ -146,8 +184,12 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
 
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
-static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
-static inline void put_page_tag_ref(union codetag_ref *ref) {}
+typedef void	*pgtag_ref_handle;
+
+static inline pgtag_ref_handle
+get_page_tag_ref(struct page *page, union codetag_ref *ref) { return NULL; }
+static inline void put_page_tag_ref(pgtag_ref_handle handle) {}
+static inline void update_page_tag_ref(pgtag_ref_handle handle, union codetag_ref *ref) {}
 static inline void clear_page_tag_ref(struct page *page) {}
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int nr) {}
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 19ef02a18611..53bd3236d30b 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -5,6 +5,7 @@
 #include <linux/gfp.h>
 #include <linux/module.h>
 #include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_buf.h>
 #include <linux/seq_file.h>
@@ -436,7 +437,7 @@ static __init void init_page_alloc_tagging(void)
 }
 
 struct page_ext_operations page_alloc_tagging_ops = {
-	.size = sizeof(union codetag_ref),
+	.size = sizeof(pgalloc_tag_ref),
 	.need = need_page_alloc_tagging,
 	.init = init_page_alloc_tagging,
 };
-- 
2.46.0.469.g59c65b2a67-goog




* [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
                   ` (3 preceding siblings ...)
  2024-09-02  4:41 ` [PATCH v2 4/6] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
@ 2024-09-02  4:41 ` Suren Baghdasaryan
  2024-09-02  5:09   ` Andrew Morton
  2024-09-02  4:41 ` [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
  5 siblings, 1 reply; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb

Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the page
allocation tag references. When the size is configured to be less than
that of a direct pointer, the tags are looked up using an index stored
as the tag reference.
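
For example, with CONFIG_PGALLOC_TAG_REF_BITS=16 a reference becomes a
u16 index instead of an 8-byte pointer: value 0 (CODETAG_ID_NULL) means
"no tag", value 1 (CODETAG_ID_EMPTY) marks a deliberately empty
reference, and values from CODETAG_ID_FIRST on index first into the
kernel's alloc_tags section and then into the preallocated module tag
area, leaving 65534 addressable tags.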

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/alloc_tag.h   | 10 +++-
 include/linux/codetag.h     |  3 ++
 include/linux/pgalloc_tag.h | 99 +++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug           | 11 +++++
 lib/alloc_tag.c             | 51 ++++++++++++++++++-
 lib/codetag.c               |  4 +-
 mm/mm_init.c                |  1 +
 7 files changed, 175 insertions(+), 4 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 21e3098220e3..b5cf24517333 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,8 +30,16 @@ struct alloc_tag {
 	struct alloc_tag_counters __percpu	*counters;
 } __aligned(8);
 
+struct alloc_tag_kernel_section {
+	struct alloc_tag *first_tag;
+	unsigned long count;
+};
+
 struct alloc_tag_module_section {
-	unsigned long start_addr;
+	union {
+		unsigned long start_addr;
+		struct alloc_tag *first_tag;
+	};
 	unsigned long end_addr;
 	/* used size */
 	unsigned long size;
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index fb4e7adfa746..401fc297eeda 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -13,6 +13,9 @@ struct codetag_module;
 struct seq_buf;
 struct module;
 
+#define CODETAG_SECTION_START_PREFIX	"__start_"
+#define CODETAG_SECTION_STOP_PREFIX	"__stop_"
+
 /*
  * An instance of this structure is created in a special ELF section at every
  * code location being tagged.  At runtime, the special section is treated as
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index c76b629d0206..a7f8f00c118f 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -9,7 +9,18 @@
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+#if !defined(CONFIG_PGALLOC_TAG_REF_BITS) || CONFIG_PGALLOC_TAG_REF_BITS > 32
+#define PGALLOC_TAG_DIRECT_REF
 typedef union codetag_ref	pgalloc_tag_ref;
+#else /* !defined(CONFIG_PGALLOC_TAG_REF_BITS) || CONFIG_PGALLOC_TAG_REF_BITS > 32 */
+#if CONFIG_PGALLOC_TAG_REF_BITS > 16
+typedef u32	pgalloc_tag_ref;
+#else
+typedef u16	pgalloc_tag_ref;
+#endif
+#endif /* !defined(CONFIG_PGALLOC_TAG_REF_BITS) || CONFIG_PGALLOC_TAG_REF_BITS > 32 */
+
+#ifdef PGALLOC_TAG_DIRECT_REF
 
 static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
 {
@@ -20,6 +31,93 @@ static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
 {
 	pgref->ct = ref->ct;
 }
+
+static inline void alloc_tag_sec_init(void) {}
+
+#else /* PGALLOC_TAG_DIRECT_REF */
+
+extern struct alloc_tag_kernel_section kernel_tags;
+
+#define CODETAG_ID_NULL		0
+#define CODETAG_ID_EMPTY	1
+#define CODETAG_ID_FIRST	2
+
+#ifdef CONFIG_MODULES
+
+extern struct alloc_tag_module_section module_tags;
+
+static inline struct codetag *get_module_ct(pgalloc_tag_ref pgref)
+{
+	return &module_tags.first_tag[pgref - kernel_tags.count].ct;
+}
+
+static inline pgalloc_tag_ref get_module_pgref(struct alloc_tag *tag)
+{
+	return CODETAG_ID_FIRST + kernel_tags.count + (tag - module_tags.first_tag);
+}
+
+#else /* CONFIG_MODULES */
+
+static inline struct codetag *get_module_ct(pgalloc_tag_ref pgref)
+{
+	pr_warn("invalid page tag reference %lu\n", (unsigned long)pgref);
+	return NULL;
+}
+
+static inline pgalloc_tag_ref get_module_pgref(struct alloc_tag *tag)
+{
+	pr_warn("invalid page tag 0x%lx\n", (unsigned long)tag);
+	return CODETAG_ID_NULL;
+}
+
+#endif /* CONFIG_MODULES */
+
+static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+	pgalloc_tag_ref pgref_val = *pgref;
+
+	switch (pgref_val) {
+	case (CODETAG_ID_NULL):
+		ref->ct = NULL;
+		break;
+	case (CODETAG_ID_EMPTY):
+		set_codetag_empty(ref);
+		break;
+	default:
+		pgref_val -= CODETAG_ID_FIRST;
+		ref->ct = pgref_val < kernel_tags.count ?
+			&kernel_tags.first_tag[pgref_val].ct :
+			get_module_ct(pgref_val);
+		break;
+	}
+}
+
+static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+	struct alloc_tag *tag;
+
+	if (!ref->ct) {
+		*pgref = CODETAG_ID_NULL;
+		return;
+	}
+
+	if (is_codetag_empty(ref)) {
+		*pgref = CODETAG_ID_EMPTY;
+		return;
+	}
+
+	tag = ct_to_alloc_tag(ref->ct);
+	if (tag >= kernel_tags.first_tag && tag < kernel_tags.first_tag + kernel_tags.count) {
+		*pgref = CODETAG_ID_FIRST + (tag - kernel_tags.first_tag);
+		return;
+	}
+
+	*pgref = get_module_pgref(tag);
+}
+
+void __init alloc_tag_sec_init(void);
+
+#endif /* PGALLOC_TAG_DIRECT_REF */
 #include <linux/page_ext.h>
 
 extern struct page_ext_operations page_alloc_tagging_ops;
@@ -197,6 +295,7 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
 static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
 static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
 static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
+static inline void alloc_tag_sec_init(void) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..253f9c2028da 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1000,6 +1000,17 @@ config MEM_ALLOC_PROFILING_DEBUG
 	  Adds warnings with helpful error messages for memory allocation
 	  profiling.
 
+config PGALLOC_TAG_REF_BITS
+	int "Number of bits for page allocation tag reference (10-64)"
+	range 10 64
+	default "64"
+	depends on MEM_ALLOC_PROFILING
+	help
+	  Number of bits used to encode a page allocation tag reference.
+
+	  A smaller number results in less memory overhead but limits the number of
+	  allocations which can be tagged (including allocations from modules).
+
 source "lib/Kconfig.kasan"
 source "lib/Kconfig.kfence"
 source "lib/Kconfig.kmsan"
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 53bd3236d30b..73791aa55ab6 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -3,6 +3,7 @@
 #include <linux/execmem.h>
 #include <linux/fs.h>
 #include <linux/gfp.h>
+#include <linux/kallsyms.h>
 #include <linux/module.h>
 #include <linux/page_ext.h>
 #include <linux/pgalloc_tag.h>
@@ -151,6 +152,26 @@ static void __init procfs_init(void)
 	proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
 }
 
+#ifndef PGALLOC_TAG_DIRECT_REF
+
+#define SECTION_START(NAME)	(CODETAG_SECTION_START_PREFIX NAME)
+#define SECTION_STOP(NAME)	(CODETAG_SECTION_STOP_PREFIX NAME)
+
+struct alloc_tag_kernel_section kernel_tags = { NULL, 0 };
+
+void __init alloc_tag_sec_init(void)
+{
+	struct alloc_tag *last_codetag;
+
+	kernel_tags.first_tag = (struct alloc_tag *)kallsyms_lookup_name(
+					SECTION_START(ALLOC_TAG_SECTION_NAME));
+	last_codetag = (struct alloc_tag *)kallsyms_lookup_name(
+					SECTION_STOP(ALLOC_TAG_SECTION_NAME));
+	kernel_tags.count = last_codetag - kernel_tags.first_tag;
+}
+
+#endif /* PGALLOC_TAG_DIRECT_REF */
+
 #ifdef CONFIG_MODULES
 
 static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
@@ -159,7 +180,16 @@ static struct module unloaded_mod;
 /* A dummy object used to indicate a module prepended area */
 static struct module prepend_mod;
 
-static struct alloc_tag_module_section module_tags;
+struct alloc_tag_module_section module_tags;
+
+#ifndef PGALLOC_TAG_DIRECT_REF
+static inline unsigned long alloc_tag_align(unsigned long val)
+{
+	if (val % sizeof(struct alloc_tag) == 0)
+		return val;
+	return ((val / sizeof(struct alloc_tag)) + 1) * sizeof(struct alloc_tag);
+}
+#endif /* PGALLOC_TAG_DIRECT_REF */
 
 static bool needs_section_mem(struct module *mod, unsigned long size)
 {
@@ -216,6 +246,21 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
 	if (!align)
 		align = 1;
 
+#ifndef PGALLOC_TAG_DIRECT_REF
+	/*
+	 * If the alloc_tag size is not a multiple of the required alignment,
+	 * tag indexing does not work.
+	 */
+	if (!IS_ALIGNED(sizeof(struct alloc_tag), align)) {
+		pr_err("%s: alignment %lu is incompatible with allocation tag indexing (CONFIG_PGALLOC_TAG_REF_BITS)",
+			mod->name, align);
+		return ERR_PTR(-EINVAL);
+	}
+
+	/* Ensure prepend consumes multiple of alloc_tag-sized blocks */
+	if (prepend)
+		prepend = alloc_tag_align(prepend);
+#endif /* PGALLOC_TAG_DIRECT_REF */
 	mas_lock(&mas);
 repeat:
 	/* Try finding exact size and hope the start is aligned */
@@ -373,6 +418,10 @@ static int __init alloc_mod_tags_mem(void)
 		return -ENOMEM;
 
 	module_tags.end_addr = module_tags.start_addr + module_tags_mem_sz;
+#ifndef PGALLOC_TAG_DIRECT_REF
+	/* Ensure the base is alloc_tag aligned */
+	module_tags.start_addr = alloc_tag_align(module_tags.start_addr);
+#endif
 
 	return 0;
 }
diff --git a/lib/codetag.c b/lib/codetag.c
index 60463ef4bb85..cb3aa1631417 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -149,8 +149,8 @@ static struct codetag_range get_section_range(struct module *mod,
 					      const char *section)
 {
 	return (struct codetag_range) {
-		get_symbol(mod, "__start_", section),
-		get_symbol(mod, "__stop_", section),
+		get_symbol(mod, CODETAG_SECTION_START_PREFIX, section),
+		get_symbol(mod, CODETAG_SECTION_STOP_PREFIX, section),
 	};
 }
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba5607aaf19..231a95782455 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2650,6 +2650,7 @@ void __init mm_core_init(void)
 	report_meminit();
 	kmsan_init_shadow();
 	stack_depot_early_init();
+	alloc_tag_sec_init();
 	mem_init();
 	kmem_cache_init();
 	/*
-- 
2.46.0.469.g59c65b2a67-goog




* [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
                   ` (4 preceding siblings ...)
  2024-09-02  4:41 ` [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
@ 2024-09-02  4:41 ` Suren Baghdasaryan
  2024-09-02  5:16   ` Andrew Morton
  5 siblings, 1 reply; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-02  4:41 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
	surenb

Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
references directly in the page flags. This removes the dependency on
page_ext and results in better performance for page allocations as
well as reduced page_ext memory overhead.
CONFIG_PGALLOC_TAG_REF_BITS controls the number of bits required to be
available in the page flags to store the references. If the number of
page flag bits is insufficient, the build will fail and either
CONFIG_PGALLOC_TAG_REF_BITS will have to be lowered or
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS disabled.
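
An illustrative configuration (the values are an example, not a
recommendation):

	CONFIG_MEM_ALLOC_PROFILING=y
	CONFIG_PGALLOC_TAG_REF_BITS=16
	CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=y

With this setup the page flags must have 16 spare bits left after the
zone, node, section, last_cpupid and KASAN tag fields; if they do not,
the "Not enough bits in page flags" check in page-flags-layout.h fails
the build.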

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mmzone.h            |  3 ++
 include/linux/page-flags-layout.h | 10 +++++--
 include/linux/pgalloc_tag.h       | 48 +++++++++++++++++++++++++++++++
 lib/Kconfig.debug                 | 27 +++++++++++++++--
 lib/alloc_tag.c                   |  4 +++
 mm/page_ext.c                     |  2 +-
 6 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..0dd2b42f7cb6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1085,6 +1085,7 @@ static inline bool zone_is_empty(struct zone *zone)
 #define KASAN_TAG_PGOFF		(LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH)
 #define LRU_GEN_PGOFF		(KASAN_TAG_PGOFF - LRU_GEN_WIDTH)
 #define LRU_REFS_PGOFF		(LRU_GEN_PGOFF - LRU_REFS_WIDTH)
+#define ALLOC_TAG_REF_PGOFF	(LRU_REFS_PGOFF - ALLOC_TAG_REF_WIDTH)
 
 /*
  * Define the bit shifts to access each section.  For non-existent
@@ -1096,6 +1097,7 @@ static inline bool zone_is_empty(struct zone *zone)
 #define ZONES_PGSHIFT		(ZONES_PGOFF * (ZONES_WIDTH != 0))
 #define LAST_CPUPID_PGSHIFT	(LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0))
 #define KASAN_TAG_PGSHIFT	(KASAN_TAG_PGOFF * (KASAN_TAG_WIDTH != 0))
+#define ALLOC_TAG_REF_PGSHIFT	(ALLOC_TAG_REF_PGOFF * (ALLOC_TAG_REF_WIDTH != 0))
 
 /* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */
 #ifdef NODE_NOT_IN_PAGE_FLAGS
@@ -1116,6 +1118,7 @@ static inline bool zone_is_empty(struct zone *zone)
 #define LAST_CPUPID_MASK	((1UL << LAST_CPUPID_SHIFT) - 1)
 #define KASAN_TAG_MASK		((1UL << KASAN_TAG_WIDTH) - 1)
 #define ZONEID_MASK		((1UL << ZONEID_SHIFT) - 1)
+#define ALLOC_TAG_REF_MASK	((1UL << ALLOC_TAG_REF_WIDTH) - 1)
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 7d79818dc065..21bba7c8c965 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -5,6 +5,12 @@
 #include <linux/numa.h>
 #include <generated/bounds.h>
 
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+#define ALLOC_TAG_REF_WIDTH	CONFIG_PGALLOC_TAG_REF_BITS
+#else
+#define ALLOC_TAG_REF_WIDTH	0
+#endif
+
 /*
  * When a memory allocation must conform to specific limitations (such
  * as being suitable for DMA) the caller will pass in hints to the
@@ -91,7 +97,7 @@
 #endif
 
 #if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \
-	KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
+	KASAN_TAG_WIDTH + ALLOC_TAG_REF_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
 #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
 #else
 #define LAST_CPUPID_WIDTH 0
@@ -102,7 +108,7 @@
 #endif
 
 #if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \
-	KASAN_TAG_WIDTH + LAST_CPUPID_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
+	KASAN_TAG_WIDTH + ALLOC_TAG_REF_WIDTH + LAST_CPUPID_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
 #error "Not enough bits in page flags"
 #endif
 
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index a7f8f00c118f..dcb6706dee15 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -118,6 +118,52 @@ static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
 void __init alloc_tag_sec_init(void);
 
 #endif /* PGALLOC_TAG_DIRECT_REF */
+
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
+typedef struct page	*pgtag_ref_handle;
+
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline pgtag_ref_handle get_page_tag_ref(struct page *page,
+						union codetag_ref *ref)
+{
+	if (page) {
+		pgalloc_tag_ref pgref;
+
+		pgref = (page->flags >> ALLOC_TAG_REF_PGSHIFT) & ALLOC_TAG_REF_MASK;
+		read_pgref(&pgref, ref);
+		return page;
+	}
+
+	return NULL;
+}
+
+static inline void put_page_tag_ref(pgtag_ref_handle page)
+{
+	WARN_ON(!page);
+}
+
+static inline void update_page_tag_ref(pgtag_ref_handle page, union codetag_ref *ref)
+{
+	unsigned long old_flags, flags, val;
+	pgalloc_tag_ref pgref;
+
+	if (WARN_ON(!page || !ref))
+		return;
+
+	write_pgref(&pgref, ref);
+	val = (unsigned long)pgref;
+	val = (val & ALLOC_TAG_REF_MASK) << ALLOC_TAG_REF_PGSHIFT;
+	do {
+		old_flags = READ_ONCE(page->flags);
+		flags = old_flags;
+		flags &= ~(ALLOC_TAG_REF_MASK << ALLOC_TAG_REF_PGSHIFT);
+		flags |= val;
+	} while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));
+}
+
+#else /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
 #include <linux/page_ext.h>
 
 extern struct page_ext_operations page_alloc_tagging_ops;
@@ -166,6 +212,8 @@ static inline void update_page_tag_ref(pgtag_ref_handle pgref, union codetag_ref
 	write_pgref(pgref, ref);
 }
 
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
 static inline void clear_page_tag_ref(struct page *page)
 {
 	if (mem_alloc_profiling_enabled()) {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 253f9c2028da..9fc8c1981f27 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -979,7 +979,7 @@ config MEM_ALLOC_PROFILING
 	depends on PROC_FS
 	depends on !DEBUG_FORCE_WEAK_PER_CPU
 	select CODE_TAGGING
-	select PAGE_EXTENSION
+	select PAGE_EXTENSION if !PGALLOC_TAG_USE_PAGEFLAGS
 	select SLAB_OBJ_EXT
 	help
 	  Track allocation source code and record total allocation size
@@ -1000,10 +1000,26 @@ config MEM_ALLOC_PROFILING_DEBUG
 	  Adds warnings with helpful error messages for memory allocation
 	  profiling.
 
+config PGALLOC_TAG_USE_PAGEFLAGS
+	bool "Use pageflags to encode page allocation tag reference"
+	default n
+	depends on MEM_ALLOC_PROFILING
+	help
+	  When set, page allocation tag references are encoded inside page
+	  flags, otherwise they are encoded in page extensions.
+
+	  Setting this flag reduces memory and performance overhead of memory
+	  allocation profiling but also limits how many allocations can be
+	  tagged. The number of bits is set by PGALLOC_TAG_REF_BITS and
+	  they must fit in the page flags field.
+
+	  Say N if unsure.
+
 config PGALLOC_TAG_REF_BITS
 	int "Number of bits for page allocation tag reference (10-64)"
 	range 10 64
-	default "64"
+	default "16" if PGALLOC_TAG_USE_PAGEFLAGS
+	default "64" if !PGALLOC_TAG_USE_PAGEFLAGS
 	depends on MEM_ALLOC_PROFILING
 	help
 	  Number of bits used to encode a page allocation tag reference.
@@ -1011,6 +1027,13 @@ config PGALLOC_TAG_REF_BITS
 	  Smaller number results in less memory overhead but limits the number of
 	  allocations which can be tagged (including allocations from modules).
 
+	  If PGALLOC_TAG_USE_PAGEFLAGS is set, the number of requested bits should
+	  fit inside the page flags.
+
+	  If PGALLOC_TAG_USE_PAGEFLAGS is not set, the number of bits used to store
+	  a reference is rounded up to the closest basic type. If set higher than 32,
+	  a direct pointer to the allocation tag is stored for performance reasons.
+
 source "lib/Kconfig.kasan"
 source "lib/Kconfig.kfence"
 source "lib/Kconfig.kmsan"
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 73791aa55ab6..34820650186d 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -476,6 +476,8 @@ static int __init setup_early_mem_profiling(char *str)
 }
 early_param("sysctl.vm.mem_profiling", setup_early_mem_profiling);
 
+#ifndef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
 static __init bool need_page_alloc_tagging(void)
 {
 	return mem_profiling_support;
@@ -492,6 +494,8 @@ struct page_ext_operations page_alloc_tagging_ops = {
 };
 EXPORT_SYMBOL(page_alloc_tagging_ops);
 
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
 #ifdef CONFIG_SYSCTL
 static struct ctl_table memory_allocation_profiling_sysctls[] = {
 	{
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 641d93f6af4c..5f993c271ee7 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -83,7 +83,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
 #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
-#ifdef CONFIG_MEM_ALLOC_PROFILING
+#if defined(CONFIG_MEM_ALLOC_PROFILING) && !defined(CONFIG_PGALLOC_TAG_USE_PAGEFLAGS)
 	&page_alloc_tagging_ops,
 #endif
 #ifdef CONFIG_PAGE_TABLE_CHECK
-- 
2.46.0.469.g59c65b2a67-goog




* Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-02  4:41 ` [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
@ 2024-09-02  5:09   ` Andrew Morton
  2024-09-04  1:07     ` Suren Baghdasaryan
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2024-09-02  5:09 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Sun,  1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> page allocation tag references. When the size is configured to be
> less than a direct pointer, the tags are searched using an index
> stored as the tag reference.
> 
> ...
>
> +config PGALLOC_TAG_REF_BITS
> +	int "Number of bits for page allocation tag reference (10-64)"
> +	range 10 64
> +	default "64"
> +	depends on MEM_ALLOC_PROFILING
> +	help
> +	  Number of bits used to encode a page allocation tag reference.
> +
> +	  Smaller number results in less memory overhead but limits the number of
> +	  allocations which can be tagged (including allocations from modules).
> +

In other words, "we have no idea what's best for you, you're on your
own".

I pity our poor users.

Can we at least tell them what they should look at to determine whether
whatever random number they chose was helpful or harmful?




* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-02  4:41 ` [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
@ 2024-09-02  5:16   ` Andrew Morton
  2024-09-03 18:19     ` Suren Baghdasaryan
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2024-09-02  5:16 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> references directly in the page flags. This removes dependency on
> page_ext and results in better performance for page allocations as
> well as reduced page_ext memory overhead.
> CONFIG_PGALLOC_TAG_REF_BITS controls the number of bits required
> to be available in the page flags to store the references. If the
> number of page flag bits is insufficient, the build will fail and
> either CONFIG_PGALLOC_TAG_REF_BITS would have to be lowered or
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS should be disabled.
> 
> ...
>
> +config PGALLOC_TAG_USE_PAGEFLAGS
> +	bool "Use pageflags to encode page allocation tag reference"
> +	default n
> +	depends on MEM_ALLOC_PROFILING
> +	help
> +	  When set, page allocation tag references are encoded inside page
> +	  flags, otherwise they are encoded in page extensions.
> +
> +	  Setting this flag reduces memory and performance overhead of memory
> +	  allocation profiling but also limits how many allocations can be
> +	  tagged. The number of bits is set by PGALLOC_TAG_REF_BITS and
> +	  they must fit in the page flags field.

Again.  Please put yourself in the position of one of the all-minus-two
people in this world who aren't kernel-memory-profiling-developers. 
How the heck are they to decide whether or not to enable this?  OK, 59%
of them are likely to say "yes" because reasons.  But then what?  How
are they to determine whether it was the correct choice for them?  If
we don't tell them, who will?

>  config PGALLOC_TAG_REF_BITS
>  	int "Number of bits for page allocation tag reference (10-64)"
>  	range 10 64
> -	default "64"
> +	default "16" if PGALLOC_TAG_USE_PAGEFLAGS
> +	default "64" if !PGALLOC_TAG_USE_PAGEFLAGS
>  	depends on MEM_ALLOC_PROFILING
>  	help
>  	  Number of bits used to encode a page allocation tag reference.
> @@ -1011,6 +1027,13 @@ config PGALLOC_TAG_REF_BITS
>  	  Smaller number results in less memory overhead but limits the number of
>  	  allocations which can be tagged (including allocations from modules).
>  
> +	  If PGALLOC_TAG_USE_PAGEFLAGS is set, the number of requested bits should
> +	  fit inside the page flags.

What does "should fit" mean?  "It is your responsibility to make it
fit"?  "We think it will fit but we aren't really sure"?

> +	  If PGALLOC_TAG_USE_PAGEFLAGS is not set, the number of bits used to store
> +	  a reference is rounded up to the closest basic type. If set higher than 32,
> +	  a direct pointer to the allocation tag is stored for performance reasons.
> +

We shouldn't be offering things like this to our users.  If we cannot decide, how
can they?



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-02  5:16   ` Andrew Morton
@ 2024-09-03 18:19     ` Suren Baghdasaryan
  2024-09-04  1:25       ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-03 18:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> > references directly in the page flags. This removes dependency on
> > page_ext and results in better performance for page allocations as
> > well as reduced page_ext memory overhead.
> > CONFIG_PGALLOC_TAG_REF_BITS controls the number of bits required
> > to be available in the page flags to store the references. If the
> > number of page flag bits is insufficient, the build will fail and
> > either CONFIG_PGALLOC_TAG_REF_BITS would have to be lowered or
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS should be disabled.
> >
> > ...
> >
> > +config PGALLOC_TAG_USE_PAGEFLAGS
> > +     bool "Use pageflags to encode page allocation tag reference"
> > +     default n
> > +     depends on MEM_ALLOC_PROFILING
> > +     help
> > +       When set, page allocation tag references are encoded inside page
> > +       flags, otherwise they are encoded in page extensions.
> > +
> > +       Setting this flag reduces memory and performance overhead of memory
> > +       allocation profiling but also limits how many allocations can be
> > +       tagged. The number of bits is set by PGALLOC_TAG_REF_BITS and
> > +       they must fit in the page flags field.
>
> Again.  Please put yourself in the position of one of the all-minus-two
> people in this world who aren't kernel-memory-profiling-developers.
> How the heck are they to decide whether or not to enable this?  OK, 59%
> of them are likely to say "yes" because reasons.  But then what?  How
> are they to determine whether it was the correct choice for them?  If
> we don't tell them, who will?

Fair point. I think one would want to enable this config unless there
aren't enough unused bits in the page flags to address all page
allocation tags. That last part, determining how many bits we need,
is a bit tricky.
If we put aside loadable modules for now, there are 3 cases:

1. The number of unused page flag bits is enough to address all page
allocations.
2. The number of unused page flag bits is enough if we push
last_cpupid out of page flags. In that case we get the warning at
https://elixir.bootlin.com/linux/v6.11-rc6/source/mm/mm_init.c#L124.
3. The number of unused page flag bits is not enough even if we push
last_cpupid out of page flags. In that case we get the "Not enough
bits in page flags" build time error.

So, maybe I should make this option "default y" when
CONFIG_MEM_ALLOC_PROFILING=y and let the user disable it if they hit
case #3 or (case #2 and the performance hit is unacceptable)?

For loadable modules, if we hit the limit when loading a module at
runtime, we could issue a warning and disable allocation tagging via
the static key. Another option is to fail to load the module with a
proper warning but that IMO would be less appealing.
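
As a minimal sketch of how that fallback could look (module_tags_fit()
is a hypothetical helper; mem_alloc_profiling_key is the existing
static key gating the profiling fast paths):

	/* Sketch only: run while loading a module's allocation tags */
	static void check_module_tags(struct module *mod)
	{
		if (module_tags_fit(mod))	/* hypothetical space check */
			return;

		pr_warn("%s: no free page allocation tag indices, disabling memory allocation profiling\n",
			mod->name);
		/* the tagging fast paths become no-ops once the key is off */
		static_branch_disable(&mem_alloc_profiling_key);
	}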

>
> >  config PGALLOC_TAG_REF_BITS
> >       int "Number of bits for page allocation tag reference (10-64)"
> >       range 10 64
> > -     default "64"
> > +     default "16" if PGALLOC_TAG_USE_PAGEFLAGS
> > +     default "64" if !PGALLOC_TAG_USE_PAGEFLAGS
> >       depends on MEM_ALLOC_PROFILING
> >       help
> >         Number of bits used to encode a page allocation tag reference.
> > @@ -1011,6 +1027,13 @@ config PGALLOC_TAG_REF_BITS
> >         Smaller number results in less memory overhead but limits the number of
> >         allocations which can be tagged (including allocations from modules).
> >
> > +       If PGALLOC_TAG_USE_PAGEFLAGS is set, the number of requested bits should
> > +       fit inside the page flags.
>
> What does "should fit" mean?  "It is your responsibility to make it
> fit"?  "We think it will fit but we aren't really sure"?

This is the case #3 I described above, the user will get a "Not enough
bits in page flags" build time error. If we stick with this config, I
can clarify that in this description.

>
> > +       If PGALLOC_TAG_USE_PAGEFLAGS is not set, the number of bits used to store
> > +       a reference is rounded up to the closest basic type. If set higher than 32,
> > +       a direct pointer to the allocation tag is stored for performance reasons.
> > +
>
> We shouldn't be offering things like this to our users.  If we cannot decide, how
> can they?

Thinking about the ease of use, the CONFIG_PGALLOC_TAG_REF_BITS is the
hardest one to set. The user does not know how many page allocations
are there. I think I can simplify this by trying to use all unused
page flag bits for addressing the tags. Then, after compilation we can
follow the rules I mentioned before:
- If the available bits are not enough to address all kernel page
allocations, we issue an error. The user should disable
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
- If there are enough unused bits but we have to push last_cpupid out
of page flags, we issue a warning and continue. The user can disable
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
flags.
- If we run out of addressing space during module loading, we disable
allocation tagging and continue. The user should disable
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.

This leaves one outstanding case:
- If we run out of addressing space during module loading but we would
not run out of space if we pushed last_cpupid out of page flags during
compilation.
In this case I would want the user to have an option to request a
larger addressing space for page allocation tags at compile time.
Maybe I can keep CONFIG_PGALLOC_TAG_REF_BITS for such explicit
requests for a larger space? This would limit the use of
CONFIG_PGALLOC_TAG_REF_BITS to this case only. In all other cases the
number of bits would be set automatically. WDYT?



* Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-02  5:09   ` Andrew Morton
@ 2024-09-04  1:07     ` Suren Baghdasaryan
  2024-09-04  1:16       ` Kent Overstreet
  0 siblings, 1 reply; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04  1:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Sun, Sep 1, 2024 at 10:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sun,  1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> > page allocation tag references. When the size is configured to be
> > less than a direct pointer, the tags are searched using an index
> > stored as the tag reference.
> >
> > ...
> >
> > +config PGALLOC_TAG_REF_BITS
> > +     int "Number of bits for page allocation tag reference (10-64)"
> > +     range 10 64
> > +     default "64"
> > +     depends on MEM_ALLOC_PROFILING
> > +     help
> > +       Number of bits used to encode a page allocation tag reference.
> > +
> > +       Smaller number results in less memory overhead but limits the number of
> > +       allocations which can be tagged (including allocations from modules).
> > +
>
> In other words, "we have no idea what's best for you, you're on your
> own".
>
> I pity our poor users.
>
> Can we at least tell them what they should look at to determine whether
> whatever random number they chose was helpful or harmful?

At the end of my reply in
https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
I suggested using all unused page flags. That would simplify things
for the user at the expense of potentially using more memory than we
need. In practice 13 bits should be more than enough to cover all
kernel page allocations with enough headroom for page allocations
coming from loadable modules. I guess using 13 as the default would
cover most cases. In the unlikely case a specific system needs more
tags, the user can increase this value. It can also be set to 64 to
force direct references instead of indexing for better performance.
Would that approach be acceptable?

>



* Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-04  1:07     ` Suren Baghdasaryan
@ 2024-09-04  1:16       ` Kent Overstreet
  2024-09-04  2:04         ` Suren Baghdasaryan
  0 siblings, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2024-09-04  1:16 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
	bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Tue, Sep 03, 2024 at 06:07:28PM GMT, Suren Baghdasaryan wrote:
> On Sun, Sep 1, 2024 at 10:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Sun,  1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> > > page allocation tag references. When the size is configured to be
> > > less than a direct pointer, the tags are searched using an index
> > > stored as the tag reference.
> > >
> > > ...
> > >
> > > +config PGALLOC_TAG_REF_BITS
> > > +     int "Number of bits for page allocation tag reference (10-64)"
> > > +     range 10 64
> > > +     default "64"
> > > +     depends on MEM_ALLOC_PROFILING
> > > +     help
> > > +       Number of bits used to encode a page allocation tag reference.
> > > +
> > > +       Smaller number results in less memory overhead but limits the number of
> > > +       allocations which can be tagged (including allocations from modules).
> > > +
> >
> > In other words, "we have no idea what's best for you, you're on your
> > own".
> >
> > I pity our poor users.
> >
> > Can we at least tell them what they should look at to determine whether
> > whatever random number they chose was helpful or harmful?
> 
> At the end of my reply in
> https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
> I suggested using all unused page flags. That would simplify things
> for the user at the expense of potentially using more memory than we
> need.

Why would that use more memory, and how much?

> In practice 13 bits should be more than enough to cover all
> kernel page allocations with enough headroom for page allocations
> coming from loadable modules. I guess using 13 as the default would
> cover most cases. In the unlikely case a specific system needs more
> tags, the user can increase this value. It can also be set to 64 to
> force direct references instead of indexing for better performance.
> Would that approach be acceptable?

Any knob that has to be kept track of and adjusted is a real hassle -
e.g. lockdep has a bunch of knobs that have to be periodically tweaked,
that's used by _developers_, and they're often wrong.



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-03 18:19     ` Suren Baghdasaryan
@ 2024-09-04  1:25       ` John Hubbard
  2024-09-04  2:05         ` John Hubbard
                           ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: John Hubbard @ 2024-09-04  1:25 UTC (permalink / raw)
  To: Suren Baghdasaryan, Andrew Morton
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
> On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
...
>> We shouldn't be offering things like this to our users.  If we cannot decide, how
>> can they?
> 
> Thinking about the ease of use, the CONFIG_PGALLOC_TAG_REF_BITS is the
> hardest one to set. The user does not know how many page allocations
> are there. I think I can simplify this by trying to use all unused
> page flag bits for addressing the tags. Then, after compilation we can
> follow the rules I mentioned before:
> - If the available bits are not enough to address all kernel page
> allocations, we issue an error. The user should disable
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
> - If there are enough unused bits but we have to push last_cpupid out
> of page flags, we issue a warning and continue. The user can disable
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
> flags.
> - If we run out of addressing space during module loading, we disable
> allocation tagging and continue. The user should disable
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.

If the computer already knows what to do, it should do it, rather than
prompting the user to disable a deeply mystifying config parameter.

> 
> This leaves one outstanding case:
> - If we run out of addressing space during module loading but we would
> not run out of space if we pushed last_cpupid out of page flags during
> compilation.
> In this case I would want the user to have an option to request a
> larger addressing space for page allocation tags at compile time.
> Maybe I can keep CONFIG_PGALLOC_TAG_REF_BITS for such explicit
> requests for a larger space? This would limit the use of
> CONFIG_PGALLOC_TAG_REF_BITS to this case only. In all other cases the
> number of bits would be set automatically. WDYT?

Manually dealing with something like this is just not going to work.

The more I read this story, the clearer it becomes that this should be
entirely done by the build system: set it, or don't set it, automatically.

And if you can make it not even a kconfig item at all, that's probably even
better.

And if there is no way to set it automatically, then that probably means
that the feature is still too raw to unleash upon the world.

thanks,
-- 
John Hubbard
NVIDIA




* Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-04  1:16       ` Kent Overstreet
@ 2024-09-04  2:04         ` Suren Baghdasaryan
  2024-09-04 16:25           ` Kent Overstreet
  0 siblings, 1 reply; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04  2:04 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
	bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Tue, Sep 3, 2024 at 6:17 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, Sep 03, 2024 at 06:07:28PM GMT, Suren Baghdasaryan wrote:
> > On Sun, Sep 1, 2024 at 10:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Sun,  1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> > > > page allocation tag references. When the size is configured to be
> > > > less than a direct pointer, the tags are searched using an index
> > > > stored as the tag reference.
> > > >
> > > > ...
> > > >
> > > > +config PGALLOC_TAG_REF_BITS
> > > > +     int "Number of bits for page allocation tag reference (10-64)"
> > > > +     range 10 64
> > > > +     default "64"
> > > > +     depends on MEM_ALLOC_PROFILING
> > > > +     help
> > > > +       Number of bits used to encode a page allocation tag reference.
> > > > +
> > > > +       Smaller number results in less memory overhead but limits the number of
> > > > +       allocations which can be tagged (including allocations from modules).
> > > > +
> > >
> > > In other words, "we have no idea what's best for you, you're on your
> > > own".
> > >
> > > I pity our poor users.
> > >
> > > Can we at least tell them what they should look at to determine whether
> > > whatever random number they chose was helpful or harmful?
> >
> > At the end of my reply in
> > https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
> > I suggested using all unused page flags. That would simplify things
> > for the user at the expense of potentially using more memory than we
> > need.
>
> Why would that use more memory, and how much?

Say our kernel has 5000 page allocation sites and there are an
additional 100 from all the modules we load at runtime. They can all
be addressed using 13 bits (8192 addressable tags), so the contiguous
memory we will be preallocating to store these tags is 8192 *
sizeof(alloc_tag). sizeof(alloc_tag) is 40 bytes as of today but might
increase in the future if we add more fields there for other uses
(like gfp_flags for example). So, currently this would use 320KB. If
we always use 16 bits we would be preallocating 2.5MB, wasting 2.2MB.
Needing more than 16 bits (65536 addressable tags) is unlikely anytime
soon (the current number IIRC is a bit over 4000).
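
A quick userspace sanity check of those numbers (assuming the 40-byte
struct alloc_tag mentioned above):

	#include <stdio.h>

	int main(void)
	{
		const unsigned long tag_size = 40;	/* sizeof(struct alloc_tag) today */
		unsigned long kb13 = ((1UL << 13) * tag_size) >> 10;
		unsigned long kb16 = ((1UL << 16) * tag_size) >> 10;

		printf("13 bits: %lu KB\n", kb13);		/* 320 KB */
		printf("16 bits: %lu KB, wasted: %lu KB\n",
		       kb16, kb16 - kb13);			/* 2560 KB, 2240 KB */
		return 0;
	}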


>
> > In practice 13 bits should be more than enough to cover all
> > kernel page allocations with enough headroom for page allocations
> > coming from loadable modules. I guess using 13 as the default would
> > cover most cases. In the unlikely case a specific system needs more
> > tags, the user can increase this value. It can also be set to 64 to
> > force direct references instead of indexing for better performance.
> > Would that approach be acceptable?
>
> Any knob that has to be kept track of and adjusted is a real hassle -
> e.g. lockdep has a bunch of knobs that have to be periodically tweaked,
> that's used by _developers_, and they're often wrong.

Yes, I understand, but this config would let us avoid wasting those
couple of MBs, give the user a way to request direct addressing of
the tags, and help us deal with the case I described in the last
paragraph of my posting at
https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04  1:25       ` John Hubbard
@ 2024-09-04  2:05         ` John Hubbard
  2024-09-04 16:08           ` Suren Baghdasaryan
  2024-09-04  2:18         ` Matthew Wilcox
  2024-09-04  2:41         ` Kent Overstreet
  2 siblings, 1 reply; 27+ messages in thread
From: John Hubbard @ 2024-09-04  2:05 UTC (permalink / raw)
  To: Suren Baghdasaryan, Andrew Morton
  Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
	tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On 9/3/24 6:25 PM, John Hubbard wrote:
> On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
>> On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> ...
>>> We shouldn't be offering things like this to our users.  If we cannot decide, how
>>> can they?
>>
>> Thinking about the ease of use, the CONFIG_PGALLOC_TAG_REF_BITS is the
>> hardest one to set. The user does not know how many page allocations

I should probably clarify my previous reply, so here is the more detailed
version:

>> are there. I think I can simplify this by trying to use all unused
>> page flag bits for addressing the tags. Then, after compilation we can

Yes.

>> follow the rules I mentioned before:
>> - If the available bits are not enough to address all kernel page
>> allocations, we issue an error. The user should disable
>> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.

The configuration should disable itself in this case. But if that is
too big of a change for now, I suppose we could fall back to an error
message to the effect of, "please disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
because the kernel build system is still too primitive to do that for you". :)


>> - If there are enough unused bits but we have to push last_cpupid out
>> of page flags, we issue a warning and continue. The user can disable
>> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
>> flags.

Let's try to decide now what that tradeoff should be. Just pick one based
on what some of us perceive to be the expected usefulness and frequency of
use between last_cpupid and these tag refs.

If someone really needs to change the tradeoff for that one bit, then that
someone is also likely able to hack up a change for it.

thanks,
-- 
John Hubbard

>> - If we run out of addressing space during module loading, we disable
>> allocation tagging and continue. The user should disable
>> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
> 
> If the computer already knows what to do, it should do it, rather than
> prompting the user to disable a deeply mystifying config parameter.
> 
>>
>> This leaves one outstanding case:
>> - If we run out of addressing space during module loading but we would
>> not run out of space if we pushed last_cpupid out of page flags during
>> compilation.
>> In this case I would want the user to have an option to request a
>> larger addressing space for page allocation tags at compile time.
>> Maybe I can keep CONFIG_PGALLOC_TAG_REF_BITS for such explicit
>> requests for a larger space? This would limit the use of
>> CONFIG_PGALLOC_TAG_REF_BITS to this case only. In all other cases the
>> number of bits would be set automatically. WDYT?
> 
> Manually dealing with something like this is just not going to work.
> 
> The more I read this story, the clearer it becomes that this should be
> entirely done by the build system: set it, or don't set it, automatically.
> 
> And if you can make it not even a kconfig item at all, that's probably even
> better.
> 
> And if there is no way to set it automatically, then that probably means
> that the feature is still too raw to unleash upon the world.
> 
> thanks,






* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04  1:25       ` John Hubbard
  2024-09-04  2:05         ` John Hubbard
@ 2024-09-04  2:18         ` Matthew Wilcox
  2024-09-04 16:18           ` Suren Baghdasaryan
  2024-09-04  2:41         ` Kent Overstreet
  2 siblings, 1 reply; 27+ messages in thread
From: Matthew Wilcox @ 2024-09-04  2:18 UTC (permalink / raw)
  To: John Hubbard
  Cc: Suren Baghdasaryan, Andrew Morton, kent.overstreet, corbet, arnd,
	mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
	david, vbabka, mhocko, hannes, roman.gushchin, dave,
	liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
	yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
	kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
	linux-modules, kernel-team

On Tue, Sep 03, 2024 at 06:25:52PM -0700, John Hubbard wrote:
> The more I read this story, the clearer it becomes that this should be
> entirely done by the build system: set it, or don't set it, automatically.
> 
> And if you can make it not even a kconfig item at all, that's probably even
> better.
> 
> And if there is no way to set it automatically, then that probably means
> that the feature is still too raw to unleash upon the world.

I'd suggest that this implementation is just too whack.

What if you use a maple tree for this?  For each allocation range, you
can store a pointer to a tag instead of storing an index in each folio.
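
Roughly this, as an untested sketch (record_page_tag/lookup_page_tag
are made-up names; DEFINE_MTREE, mtree_store_range and mtree_load are
the real maple tree API):

	/* sketch: pfn range -> alloc_tag, instead of a per-page index */
	static DEFINE_MTREE(page_tags);

	static int record_page_tag(unsigned long pfn, unsigned int order,
				   struct alloc_tag *tag)
	{
		return mtree_store_range(&page_tags, pfn,
					 pfn + (1UL << order) - 1,
					 tag, GFP_KERNEL);
	}

	static struct alloc_tag *lookup_page_tag(unsigned long pfn)
	{
		return mtree_load(&page_tags, pfn);
	}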



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04  1:25       ` John Hubbard
  2024-09-04  2:05         ` John Hubbard
  2024-09-04  2:18         ` Matthew Wilcox
@ 2024-09-04  2:41         ` Kent Overstreet
  2 siblings, 0 replies; 27+ messages in thread
From: Kent Overstreet @ 2024-09-04  2:41 UTC (permalink / raw)
  To: John Hubbard
  Cc: Suren Baghdasaryan, Andrew Morton, corbet, arnd, mcgrof, rppt,
	paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
	mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
	pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
	rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
	linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
	kernel-team

On Tue, Sep 03, 2024 at 06:25:52PM GMT, John Hubbard wrote:
> On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
> > On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> ...
> > > We shouldn't be offering things like this to our users.  If we cannot decide, how
> > > can they?
> > 
> > Thinking about the ease of use, the CONFIG_PGALLOC_TAG_REF_BITS is the
> > hardest one to set. The user does not know how many page allocations
> > are there. I think I can simplify this by trying to use all unused
> > page flag bits for addressing the tags. Then, after compilation we can
> > follow the rules I mentioned before:
> > - If the available bits are not enough to address all kernel page
> > allocations, we issue an error. The user should disable
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
> > - If there are enough unused bits but we have to push last_cpupid out
> > of page flags, we issue a warning and continue. The user can disable
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
> > flags.
> > - If we run out of addressing space during module loading, we disable
> > allocation tagging and continue. The user should disable
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
> 
> If the computer already knows what to do, it should do it, rather than
> prompting the user to disable a deeply mystifying config parameter.
> 
> > 
> > This leaves one outstanding case:
> > - If we run out of addressing space during module loading but we would
> > not run out of space if we pushed last_cpupid out of page flags during
> > compilation.
> > In this case I would want the user to have an option to request a
> > larger addressing space for page allocation tags at compile time.
> > Maybe I can keep CONFIG_PGALLOC_TAG_REF_BITS for such explicit
> > requests for a larger space? This would limit the use of
> > CONFIG_PGALLOC_TAG_REF_BITS to this case only. In all other cases the
> > number of bits would be set automatically. WDYT?
> 
> Manually dealing with something like this is just not going to work.
> 
> The more I read this story, the clearer it becomes that this should be
> entirely done by the build system: set it, or don't set it, automatically.

The difficulty is that we don't know how many modules are going to be
loaded, and an allyesconfig or allmodconfig is going to require quite a
few bits - it's not something we can really predict or calculate.

So there is some genuine trickyness here.



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04  2:05         ` John Hubbard
@ 2024-09-04 16:08           ` Suren Baghdasaryan
  2024-09-04 18:58             ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04 16:08 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, kent.overstreet, corbet, arnd, mcgrof, rppt,
	paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
	mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
	pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
	rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
	linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
	kernel-team

On Tue, Sep 3, 2024 at 7:06 PM 'John Hubbard' via kernel-team
<kernel-team@android.com> wrote:
>
> On 9/3/24 6:25 PM, John Hubbard wrote:
> > On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
> >> On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> > ...
> >>> We shouldn't be offering things like this to our users.  If we cannot decide, how
> >>> can they?
> >>
> >> Thinking about the ease of use, the CONFIG_PGALLOC_TAG_REF_BITS is the
> >> hardest one to set. The user does not know how many page allocations
>
> I should probably clarify my previous reply, so here is the more detailed
> version:
>
> >> are there. I think I can simplify this by trying to use all unused
> >> page flag bits for addressing the tags. Then, after compilation we can
>
> Yes.
>
> >> follow the rules I mentioned before:
> >> - If the available bits are not enough to address all kernel page
> >> allocations, we issue an error. The user should disable
> >> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
>
> The configuration should disable itself, in this case. But if that is
> too big of a change for now, I suppose we could fall back to an error
> message to the effect of, "please disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
> because the kernel build system is still too primitive to do that for you". :)

I don't think we can detect this at build time. We need to know how
many page allocations there are, which we find out only after we build
the kernel image (from the section size that holds allocation tags).
Therefore it would have to be a post-build check. So I think the best
we can do is to generate an error like the one you suggested after we
build the image.
Dependency on CONFIG_PAGE_EXTENSION is yet another complexity because
if we auto-disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS, we would have to
also auto-enable CONFIG_PAGE_EXTENSION if it's not already enabled.

I'll dig around some more to see if there is a better way.

>
>
> >> - If there are enough unused bits but we have to push last_cpupid out
> >> of page flags, we issue a warning and continue. The user can disable
> >> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
> >> flags.
>
> Let's try to decide now, what that tradeoff should be. Just pick one based
> on what some of us perceive to be the expected usefulness and frequency of
> use between last_cpuid and these tag refs.
>
> If someone really needs to change the tradeoff for that one bit, then that
> someone is also likely able to hack up a change for it.

Yeah, from all the feedback, I realize that by pursuing the maximum
flexibility I made configuring this mechanism close to impossible. I
think the first step towards simplifying this would be to identify
usable configurations. From that POV, I can see 3 useful modes:

1. Page flags are not used. In this mode we will use direct pointer
references and page extensions, like we do today. This mode is used
when we don't have enough page flags. This can be a safe default which
keeps things as they are today and should always work.
2. Page flags are used but not forced. This means we will try to use
all free page flag bits (up to a reasonable limit of 16) without
pushing out last_cpupid.
3. Page flags are forced. This means we will try to use all free page
flag bits after pushing last_cpupid out of page flags. This mode
could be used if the user cares about memory profiling more than the
performance overhead caused by last_cpupid.

I'm not 100% sure (3) is needed, so I think we can skip it until
someone asks for it. It should be easy to add that in the future.
If we detect at build time that we don't have enough page flag bits to
cover kernel allocations for modes (2) or (3), we issue an error
prompting the user to reconfigure to mode (1).

Ideally, I would like to have (2) as default mode and automatically
fall back to (1) when it's impossible but as I mentioned before, I
don't yet see a way to do that automatically.

For loadable modules, I think my earlier suggestion should work fine.
If a module causes us to run out of space for tags, we disable memory
profiling at runtime and log a warning telling the user that profiling
was disabled and that they should configure mode (1) if they need it.
I *think* I can even disable profiling only for that module and not
globally but I need to try that first.

I can start with support for modes (1) and (2), which requires only
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS defaulted to N. Any user can try
enabling this config and, if it builds fine, keep it for better
performance and memory usage. Does that sound acceptable?
Thanks,
Suren.

>
> thanks,
> --
> John Hubbard
>
> >> - If we run out of addressing space during module loading, we disable
> >> allocation tagging and continue. The user should disable
> >> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
> >
> > If the computer already knows what to do, it should do it, rather than
> > prompting the user to disable a deeply mystifying config parameter.
> >
> >>
> >> This leaves one outstanding case:
> >> - If we run out of addressing space during module loading but we would
> >> not run out of space if we pushed last_cpupid out of page flags during
> >> compilation.
> >> In this case I would want the user to have an option to request a
> >> larger addressing space for page allocation tags at compile time.
> >> Maybe I can keep CONFIG_PGALLOC_TAG_REF_BITS for such explicit
> >> requests for a larger space? This would limit the use of
> >> CONFIG_PGALLOC_TAG_REF_BITS to this case only. In all other cases the
> >> number of bits would be set automatically. WDYT?
> >
> > Manually dealing with something like this is just not going to work.
> >
> > The more I read this story, the clearer it becomes that this should be
> > entirely done by the build system: set it, or don't set it, automatically.
> >
> > And if you can make it not even a kconfig item at all, that's probably even
> > better.
> >
> > And if there is no way to set it automatically, then that probably means
> > that the feature is still too raw to unleash upon the world.
> >
> > thanks,
>



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04  2:18         ` Matthew Wilcox
@ 2024-09-04 16:18           ` Suren Baghdasaryan
  2024-09-04 16:21             ` Kent Overstreet
  2024-09-04 16:35             ` Matthew Wilcox
  0 siblings, 2 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04 16:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: John Hubbard, Andrew Morton, kent.overstreet, corbet, arnd,
	mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
	david, vbabka, mhocko, hannes, roman.gushchin, dave,
	liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
	yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
	kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
	linux-modules, kernel-team

On Tue, Sep 3, 2024 at 7:19 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Sep 03, 2024 at 06:25:52PM -0700, John Hubbard wrote:
> > The more I read this story, the clearer it becomes that this should be
> > entirely done by the build system: set it, or don't set it, automatically.
> >
> > And if you can make it not even a kconfig item at all, that's probably even
> > better.
> >
> > And if there is no way to set it automatically, then that probably means
> > that the feature is still too raw to unleash upon the world.
>
> I'd suggest that this implementation is just too whack.
>
> What if you use a maple tree for this?  For each allocation range, you
> can store a pointer to a tag instead of storing an index in each folio.

I'm not sure I understand your suggestion, Matthew. We allocate a
folio and need to store a reference to the tag associated with the
code that allocated that folio. We are not operating on ranges here.
Are you suggesting we use a maple tree instead of page_ext to store
this reference?



* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04 16:18           ` Suren Baghdasaryan
@ 2024-09-04 16:21             ` Kent Overstreet
  2024-09-04 16:35             ` Matthew Wilcox
  1 sibling, 0 replies; 27+ messages in thread
From: Kent Overstreet @ 2024-09-04 16:21 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, John Hubbard, Andrew Morton, corbet, arnd,
	mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
	david, vbabka, mhocko, hannes, roman.gushchin, dave,
	liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
	yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
	kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
	linux-modules, kernel-team

On Wed, Sep 04, 2024 at 09:18:01AM GMT, Suren Baghdasaryan wrote:
> On Tue, Sep 3, 2024 at 7:19 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Sep 03, 2024 at 06:25:52PM -0700, John Hubbard wrote:
> > > The more I read this story, the clearer it becomes that this should be
> > > entirely done by the build system: set it, or don't set it, automatically.
> > >
> > > And if you can make it not even a kconfig item at all, that's probably even
> > > better.
> > >
> > > And if there is no way to set it automatically, then that probably means
> > > that the feature is still too raw to unleash upon the world.
> >
> > I'd suggest that this implementation is just too whack.
> >
> > What if you use a maple tree for this?  For each allocation range, you
> > can store a pointer to a tag instead of storing an index in each folio.
> 
> I'm not sure I understand your suggestion, Matthew. We allocate a
> folio and need to store a reference to the tag associated with the
> code that allocated that folio. We are not operating with ranges here.
> Are you suggesting to use a maple tree instead of page_ext to store
> this reference?

yeah, I don't think the maple tree idea makes any sense either

we already have a way of going from index -> alloc tag, this is just
about how we store the index
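
roughly, the decode is just pointer arithmetic into the contiguous tag
arrays (illustrative sketch, not the series' actual helpers):

	/* illustrative: indices cover kernel tags first, then module tags */
	static inline struct alloc_tag *tag_from_idx(unsigned long idx)
	{
		if (idx < nr_kernel_tags)		/* assumed counter */
			return kernel_first_tag + idx;	/* codetag section base */
		return module_first_tag + (idx - nr_kernel_tags);
	}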



* Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-04  2:04         ` Suren Baghdasaryan
@ 2024-09-04 16:25           ` Kent Overstreet
  2024-09-04 16:35             ` Suren Baghdasaryan
  0 siblings, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2024-09-04 16:25 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
	bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Tue, Sep 03, 2024 at 07:04:51PM GMT, Suren Baghdasaryan wrote:
> On Tue, Sep 3, 2024 at 6:17 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Tue, Sep 03, 2024 at 06:07:28PM GMT, Suren Baghdasaryan wrote:
> > > On Sun, Sep 1, 2024 at 10:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Sun,  1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > > Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> > > > > page allocation tag references. When the size is configured to be
> > > > > less than a direct pointer, the tags are searched using an index
> > > > > stored as the tag reference.
> > > > >
> > > > > ...
> > > > >
> > > > > +config PGALLOC_TAG_REF_BITS
> > > > > +     int "Number of bits for page allocation tag reference (10-64)"
> > > > > +     range 10 64
> > > > > +     default "64"
> > > > > +     depends on MEM_ALLOC_PROFILING
> > > > > +     help
> > > > > +       Number of bits used to encode a page allocation tag reference.
> > > > > +
> > > > > +       Smaller number results in less memory overhead but limits the number of
> > > > > +       allocations which can be tagged (including allocations from modules).
> > > > > +
> > > >
> > > > In other words, "we have no idea what's best for you, you're on your
> > > > own".
> > > >
> > > > I pity our poor users.
> > > >
> > > > Can we at least tell them what they should look at to determine whether
> > > > whatever random number they chose was helpful or harmful?
> > >
> > > At the end of my reply in
> > > https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
> > > I suggested using all unused page flags. That would simplify things
> > > for the user at the expense of potentially using more memory than we
> > > need.
> >
> > Why would that use more memory, and how much?
> 
> Say our kernel uses 5000 page allocations and there are additional 100
> allocations from all the modules we are loading at runtime. They all
> can be addressed using 13 bits (8192 addressable tags), so the
> contiguous memory we will be preallocating to store these tags is 8192
> * sizeof(alloc_tag). sizeof(alloc_tag) is 40 bytes as of today but
> might increase in the future if we add more fields there for other
> uses (like gfp_flags for example). So, currently this would use 320KB.
> If we always use 16 bits we would be preallocating 2.5MB. So, that
> would be 2.2MB of wasted memory. Using more than 16 bits (65536
> addressable tags) will be impractical anytime soon (current number
> IIRC is a bit over 4000).

I see, it's not about the page bits, it's about the contiguous array of
alloc tags?

What if we just reserved address space, and only filled it in as needed?
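
something like this, maybe (sketch only; vmap_pages_range() lives in
mm/internal.h today, so a real version would need an accessible
helper):

	/* reserve the worst-case virtual range once, back it lazily */
	static struct vm_struct *tag_area;
	static unsigned long tag_bytes_mapped;

	static int __init reserve_tag_space(unsigned long size)
	{
		tag_area = get_vm_area(size, VM_ALLOC);
		return tag_area ? 0 : -ENOMEM;
	}

	/* called when module tags outgrow the currently mapped range */
	static int grow_tag_space(unsigned long new_size)
	{
		unsigned long addr = (unsigned long)tag_area->addr + tag_bytes_mapped;

		while (tag_bytes_mapped < new_size) {
			struct page *page = alloc_page(GFP_KERNEL);

			if (!page)
				return -ENOMEM;
			if (vmap_pages_range(addr, addr + PAGE_SIZE, PAGE_KERNEL,
					     &page, PAGE_SHIFT)) {
				__free_page(page);
				return -ENOMEM;
			}
			addr += PAGE_SIZE;
			tag_bytes_mapped += PAGE_SIZE;
		}
		return 0;
	}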



* Re: [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable
  2024-09-04 16:25           ` Kent Overstreet
@ 2024-09-04 16:35             ` Suren Baghdasaryan
  0 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04 16:35 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
	bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
	roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
	souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
	iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
	linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team

On Wed, Sep 4, 2024 at 9:25 AM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, Sep 03, 2024 at 07:04:51PM GMT, Suren Baghdasaryan wrote:
> > On Tue, Sep 3, 2024 at 6:17 PM Kent Overstreet
> > <kent.overstreet@linux.dev> wrote:
> > >
> > > On Tue, Sep 03, 2024 at 06:07:28PM GMT, Suren Baghdasaryan wrote:
> > > > On Sun, Sep 1, 2024 at 10:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > >
> > > > > On Sun,  1 Sep 2024 21:41:27 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> > > > >
> > > > > > Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
> > > > > > page allocation tag references. When the size is configured to be
> > > > > > less than a direct pointer, the tags are searched using an index
> > > > > > stored as the tag reference.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > +config PGALLOC_TAG_REF_BITS
> > > > > > +     int "Number of bits for page allocation tag reference (10-64)"
> > > > > > +     range 10 64
> > > > > > +     default "64"
> > > > > > +     depends on MEM_ALLOC_PROFILING
> > > > > > +     help
> > > > > > +       Number of bits used to encode a page allocation tag reference.
> > > > > > +
> > > > > > +       Smaller number results in less memory overhead but limits the number of
> > > > > > +       allocations which can be tagged (including allocations from modules).
> > > > > > +
> > > > >
> > > > > In other words, "we have no idea what's best for you, you're on your
> > > > > own".
> > > > >
> > > > > I pity our poor users.
> > > > >
> > > > > Can we at least tell them what they should look at to determine whether
> > > > > whatever random number they chose was helpful or harmful?
> > > >
> > > > At the end of my reply in
> > > > https://lore.kernel.org/all/CAJuCfpGNYgx0GW4suHRzmxVH28RGRnFBvFC6WO+F8BD4HDqxXA@mail.gmail.com/#t
> > > > I suggested using all unused page flags. That would simplify things
> > > > for the user at the expense of potentially using more memory than we
> > > > need.
> > >
> > > Why would that use more memory, and how much?
> >
> > Say our kernel has 5000 page allocation sites and there are an
> > additional 100 sites in all the modules we load at runtime. They all
> > can be addressed using 13 bits (8192 addressable tags), so the
> > contiguous memory we preallocate to store these tags is 8192 *
> > sizeof(alloc_tag). sizeof(alloc_tag) is 40 bytes as of today but
> > might increase in the future if we add more fields there for other
> > uses (gfp_flags, for example). So, currently this would use 320KB.
> > If we always use 16 bits we would be preallocating 2.5MB, wasting
> > about 2.2MB. Needing more than 16 bits (65536 addressable tags) is
> > unlikely anytime soon (the current number IIRC is a bit over 4000).
>
> I see, it's not about the page bits, it's about the contiguous array of
> alloc tags?
>
> What if we just reserved address space, and only filled it in as needed?

That might be possible. I'll have to try that. Thanks!


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04 16:18           ` Suren Baghdasaryan
  2024-09-04 16:21             ` Kent Overstreet
@ 2024-09-04 16:35             ` Matthew Wilcox
  2024-09-04 16:37               ` Kent Overstreet
  1 sibling, 1 reply; 27+ messages in thread
From: Matthew Wilcox @ 2024-09-04 16:35 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: John Hubbard, Andrew Morton, kent.overstreet, corbet, arnd,
	mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
	david, vbabka, mhocko, hannes, roman.gushchin, dave,
	liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
	yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
	kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
	linux-modules, kernel-team

On Wed, Sep 04, 2024 at 09:18:01AM -0700, Suren Baghdasaryan wrote:
> I'm not sure I understand your suggestion, Matthew. We allocate a
> folio and need to store a reference to the tag associated with the
> code that allocated that folio. We are not operating with ranges here.
> Are you suggesting to use a maple tree instead of page_ext to store
> this reference?

I'm saying that a folio has a physical address.  So you can use a physical
address as an index into a maple tree to store additional information
instead of using page_ext or trying to hammer the additional information
into struct page somewhere.
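
Roughly (a sketch only, using the public maple tree API with the
folio's PFN as the key; the alloc_tag type is from the existing
profiling code, and the GFP/locking choices here are placeholders,
not something the allocation fast path could use as-is):

#include <linux/alloc_tag.h>
#include <linux/maple_tree.h>
#include <linux/mm.h>

static DEFINE_MTREE(pgalloc_tag_tree);	/* pfn -> struct alloc_tag * */

/* Record the tag when the folio is allocated. */
static int tag_store(struct folio *folio, struct alloc_tag *tag)
{
	return mtree_store(&pgalloc_tag_tree, folio_pfn(folio), tag,
			   GFP_KERNEL);
}

/* Look the tag back up from the folio's physical location. */
static struct alloc_tag *tag_lookup(struct folio *folio)
{
	return mtree_load(&pgalloc_tag_tree, folio_pfn(folio));
}

/* Drop the entry when the folio is freed. */
static void tag_erase(struct folio *folio)
{
	mtree_erase(&pgalloc_tag_tree, folio_pfn(folio));
}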


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04 16:35             ` Matthew Wilcox
@ 2024-09-04 16:37               ` Kent Overstreet
  2024-09-04 16:39                 ` Suren Baghdasaryan
  0 siblings, 1 reply; 27+ messages in thread
From: Kent Overstreet @ 2024-09-04 16:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Suren Baghdasaryan, John Hubbard, Andrew Morton, corbet, arnd,
	mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
	david, vbabka, mhocko, hannes, roman.gushchin, dave,
	liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
	yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
	kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
	linux-modules, kernel-team

On Wed, Sep 04, 2024 at 05:35:49PM GMT, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 09:18:01AM -0700, Suren Baghdasaryan wrote:
> > I'm not sure I understand your suggestion, Matthew. We allocate a
> > folio and need to store a reference to the tag associated with the
> > code that allocated that folio. We are not operating with ranges here.
> > Are you suggesting to use a maple tree instead of page_ext to store
> > this reference?
> 
> I'm saying that a folio has a physical address.  So you can use a physical
> address as an index into a maple tree to store additional information
> instead of using page_ext or trying to hammer the additional information
> into struct page somewhere.

Ah, thanks, that makes more sense.

But it would add a lot of overhead to the page alloc/free paths...


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04 16:37               ` Kent Overstreet
@ 2024-09-04 16:39                 ` Suren Baghdasaryan
  0 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04 16:39 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, John Hubbard, Andrew Morton, corbet, arnd,
	mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
	david, vbabka, mhocko, hannes, roman.gushchin, dave,
	liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
	yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
	kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
	linux-modules, kernel-team

On Wed, Sep 4, 2024 at 9:37 AM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Wed, Sep 04, 2024 at 05:35:49PM GMT, Matthew Wilcox wrote:
> > On Wed, Sep 04, 2024 at 09:18:01AM -0700, Suren Baghdasaryan wrote:
> > > I'm not sure I understand your suggestion, Matthew. We allocate a
> > > folio and need to store a reference to the tag associated with the
> > > code that allocated that folio. We are not operating with ranges here.
> > > Are you suggesting to use a maple tree instead of page_ext to store
> > > this reference?
> >
> > I'm saying that a folio has a physical address.  So you can use a physical
> > address as an index into a maple tree to store additional information
> > instead of using page_ext or trying to hammer the additional information
> > into struct page somewhere.
>
> Ah, thanks, that makes more sense.
>
> But it would add a lot of overhead to the page alloc/free paths...

Yeah, inserting into a maple_tree in the fast path of page allocation
would introduce considerable performance overhead.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04 16:08           ` Suren Baghdasaryan
@ 2024-09-04 18:58             ` John Hubbard
  2024-09-04 21:07               ` Suren Baghdasaryan
  0 siblings, 1 reply; 27+ messages in thread
From: John Hubbard @ 2024-09-04 18:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, kent.overstreet, corbet, arnd, mcgrof, rppt,
	paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
	mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
	pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
	rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
	linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
	kernel-team

On 9/4/24 9:08 AM, Suren Baghdasaryan wrote:
> On Tue, Sep 3, 2024 at 7:06 PM 'John Hubbard' via kernel-team
> <kernel-team@android.com> wrote:
>> On 9/3/24 6:25 PM, John Hubbard wrote:
>>> On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
>>>> On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
...
>> The configuration should disable itself, in this case. But if that is
>> too big of a change for now, I suppose we could fall back to an error
>> message to the effect of, "please disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
>> because the kernel build system is still too primitive to do that for you". :)
> 
> I don't think we can detect this at build time. We need to know how
> many page allocation tags there are, which we find out only after we
> build the kernel image (from the size of the section that holds the
> allocation tags). Therefore it would have to be a post-build check.
> So I think the best we can do is to generate an error like the one
> you suggested after we build the image.
> Dependency on CONFIG_PAGE_EXTENSION is yet another complexity because
> if we auto-disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS, we would have to
> also auto-enable CONFIG_PAGE_EXTENSION if it's not already enabled.
> 
> I'll dig around some more to see if there is a better way.
>>
>>>> - If there are enough unused bits but we have to push last_cpupid out
>>>> of page flags, we issue a warning and continue. The user can disable
>>>> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
>>>> flags.
>>
>> Let's try to decide now, what that tradeoff should be. Just pick one based
>> on what some of us perceive to be the expected usefulness and frequency of
>> use between last_cpupid and these tag refs.
>>
>> If someone really needs to change the tradeoff for that one bit, then that
>> someone is also likely able to hack up a change for it.
> 
> Yeah, from all the feedback, I realize that by pursuing the maximum
> flexibility I made configuring this mechanism close to impossible. I
> think the first step towards simplifying this would be to identify
> usable configurations. From that POV, I can see 3 useful modes:
> 
> 1. Page flags are not used. In this mode we will use direct pointer
> references and page extensions, like we do today. This mode is used
> when we don't have enough page flags. This can be a safe default which
> keeps things as they are today and should always work.

Definitely my favorite so far.

> 2. Page flags are used but not forced. This means we will try to use
> all free page flags bits (up to a reasonable limit of 16) without
> pushing out last_cpupid.

This is a logical next step, agreed.

> 3. Page flags are forced. This means we will try to use all free page
> flags bits after pushing last_cpupid out of page flags. This mode
> could be used if the user cares about memory profiling more than the
> performance overhead caused by last_cpupid.
> 
> I'm not 100% sure (3) is needed, so I think we can skip it until
> someone asks for it. It should be easy to add that in the future.

Right.

> If we detect at build time that we don't have enough page flag bits to
> cover kernel allocations for modes (2) or (3), we issue an error
> prompting the user to reconfigure to mode (1).
> 
> Ideally, I would like to have (2) as default mode and automatically
> fall back to (1) when it's impossible but as I mentioned before, I
> don't yet see a way to do that automatically.
> 
> For loadable modules, I think my earlier suggestion should work fine.
> If a module causes us to run out of space for tags, we disable memory
> profiling at runtime and log a warning stating that memory profiling
> has been disabled and that, if the user needs it, they should
> configure mode (1). I *think* I can even disable profiling only for
> that module and not globally, but I need to try that first.
> 
> I can start with support for modes (1) and (2), which requires only
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS defaulted to N. Any user can try
> enabling this config and, if it builds fine, keep it for better
> performance and memory usage. Does that sound acceptable?
> Thanks,
> Suren.
> 

How badly do we need (2)? Because this is really expensive:

    a) It adds complexity to a complex, delicate core part of mm.

    b) It adds constraints, which prevent possible future features.

It's not yet clear that (2) is valuable enough (compared to (1))
to compensate, at least from what I've read. Unless I missed
something big.


thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags
  2024-09-04 18:58             ` John Hubbard
@ 2024-09-04 21:07               ` Suren Baghdasaryan
  0 siblings, 0 replies; 27+ messages in thread
From: Suren Baghdasaryan @ 2024-09-04 21:07 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, kent.overstreet, corbet, arnd, mcgrof, rppt,
	paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
	mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
	pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
	rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
	linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
	kernel-team

On Wed, Sep 4, 2024 at 11:58 AM 'John Hubbard' via kernel-team
<kernel-team@android.com> wrote:
>
> On 9/4/24 9:08 AM, Suren Baghdasaryan wrote:
> > On Tue, Sep 3, 2024 at 7:06 PM 'John Hubbard' via kernel-team
> > <kernel-team@android.com> wrote:
> >> On 9/3/24 6:25 PM, John Hubbard wrote:
> >>> On 9/3/24 11:19 AM, Suren Baghdasaryan wrote:
> >>>> On Sun, Sep 1, 2024 at 10:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>>>> On Sun,  1 Sep 2024 21:41:28 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> ...
> >> The configuration should disable itself, in this case. But if that is
> >> too big of a change for now, I suppose we could fall back to an error
> >> message to the effect of, "please disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
> >> because the kernel build system is still too primitive to do that for you". :)
> >
> > I don't think we can detect this at build time. We need to know how
> > many page allocation tags there are, which we find out only after we
> > build the kernel image (from the size of the section that holds the
> > allocation tags). Therefore it would have to be a post-build check.
> > So I think the best we can do is to generate an error like the one
> > you suggested after we build the image.
> > Dependency on CONFIG_PAGE_EXTENSION is yet another complexity because
> > if we auto-disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS, we would have to
> > also auto-enable CONFIG_PAGE_EXTENSION if it's not already enabled.
> >
> > I'll dig around some more to see if there is a better way.
> >>
> >>>> - If there are enough unused bits but we have to push last_cpupid out
> >>>> of page flags, we issue a warning and continue. The user can disable
> >>>> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS if last_cpupid has to stay in page
> >>>> flags.
> >>
> >> Let's try to decide now, what that tradeoff should be. Just pick one based
> >> on what some of us perceive to be the expected usefulness and frequency of
> >> use between last_cpupid and these tag refs.
> >>
> >> If someone really needs to change the tradeoff for that one bit, then that
> >> someone is also likely able to hack up a change for it.
> >
> > Yeah, from all the feedback, I realize that by pursuing the maximum
> > flexibility I made configuring this mechanism close to impossible. I
> > think the first step towards simplifying this would be to identify
> > usable configurations. From that POV, I can see 3 useful modes:
> >
> > 1. Page flags are not used. In this mode we will use direct pointer
> > references and page extensions, like we do today. This mode is used
> > when we don't have enough page flags. This can be a safe default which
> > keeps things as they are today and should always work.
>
> Definitely my favorite so far.
>
> > 2. Page flags are used but not forced. This means we will try to use
> > all free page flags bits (up to a reasonable limit of 16) without
> > pushing out last_cpupid.
>
> This is a logical next step, agreed.
>
> > 3. Page flags are forced. This means we will try to use all free page
> > flags bits after pushing last_cpupid out of page flags. This mode
> > could be used if the user cares about memory profiling more than the
> > performance overhead caused by last_cpupid.
> >
> > I'm not 100% sure (3) is needed, so I think we can skip it until
> > someone asks for it. It should be easy to add that in the future.
>
> Right.
>
> > If we detect at build time that we don't have enough page flag bits to
> > cover kernel allocations for modes (2) or (3), we issue an error
> > prompting the user to reconfigure to mode (1).
> >
> > Ideally, I would like to have (2) as default mode and automatically
> > fall back to (1) when it's impossible but as I mentioned before, I
> > don't yet see a way to do that automatically.
> >
> > For loadable modules, I think my earlier suggestion should work fine.
> > If a module causes us to run out of space for tags, we disable memory
> > profiling at runtime and log a warning stating that memory profiling
> > has been disabled and that, if the user needs it, they should
> > configure mode (1). I *think* I can even disable profiling only for
> > that module and not globally, but I need to try that first.
> >
> > I can start with support for modes (1) and (2), which requires only
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS defaulted to N. Any user can try
> > enabling this config and, if it builds fine, keep it for better
> > performance and memory usage. Does that sound acceptable?
> > Thanks,
> > Suren.
> >
>
> How badly do we need (2)? Because this is really expensive:
>
>     a) It adds complexity to a complex, delicate core part of mm.
>
>     b) It adds constraints, which prevent possible future features.
>
> It's not yet clear that (2) is valuable enough (compared to (1))
> to compensate, at least from what I've read. Unless I missed
> something big.

(1) is what we already have today.
(2) would allow us to drop the page_ext dependency, so there is a
considerable memory saving and a performance improvement (no need to
look up the page extension on each tag reference). The benefits are
described in the cover letter at:
https://lore.kernel.org/all/20240902044128.664075-1-surenb@google.com/.
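
To make the comparison concrete, a sketch of the two lookup paths
(PGALLOC_TAG_SHIFT/PGALLOC_TAG_MASK are made-up names for this
illustration, and error handling is trimmed; the real layout would be
defined by the patchset):

#include <linux/mm.h>
#include <linux/page_ext.h>
#include <linux/pgalloc_tag.h>

/* Mode (1): the tag ref lives in page_ext, so every access pays an
 * extra page_ext table lookup. */
static union codetag_ref *ref_via_page_ext(struct page *page)
{
	struct page_ext *ext = page_ext_get(page);
	union codetag_ref *ref;

	if (!ext)
		return NULL;
	ref = page_ext_data(ext, &page_alloc_tagging_ops);
	page_ext_put(ext);	/* sketch only: real code holds it in use */
	return ref;
}

/* Mode (2): the tag index is read straight out of page->flags with
 * no extra lookup (shift/mask names are hypothetical). */
static unsigned long idx_via_page_flags(struct page *page)
{
	return (page->flags >> PGALLOC_TAG_SHIFT) & PGALLOC_TAG_MASK;
}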


>
>
> thanks,
> --
> John Hubbard
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-09-04 21:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-02  4:41 [PATCH v2 0/6] page allocation tag compression Suren Baghdasaryan
2024-09-02  4:41 ` [PATCH v2 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
2024-09-02  4:41 ` [PATCH v2 2/6] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
2024-09-02  4:41 ` [PATCH v2 3/6] alloc_tag: eliminate alloc_tag_ref_set Suren Baghdasaryan
2024-09-02  4:41 ` [PATCH v2 4/6] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
2024-09-02  4:41 ` [PATCH v2 5/6] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
2024-09-02  5:09   ` Andrew Morton
2024-09-04  1:07     ` Suren Baghdasaryan
2024-09-04  1:16       ` Kent Overstreet
2024-09-04  2:04         ` Suren Baghdasaryan
2024-09-04 16:25           ` Kent Overstreet
2024-09-04 16:35             ` Suren Baghdasaryan
2024-09-02  4:41 ` [PATCH v2 6/6] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
2024-09-02  5:16   ` Andrew Morton
2024-09-03 18:19     ` Suren Baghdasaryan
2024-09-04  1:25       ` John Hubbard
2024-09-04  2:05         ` John Hubbard
2024-09-04 16:08           ` Suren Baghdasaryan
2024-09-04 18:58             ` John Hubbard
2024-09-04 21:07               ` Suren Baghdasaryan
2024-09-04  2:18         ` Matthew Wilcox
2024-09-04 16:18           ` Suren Baghdasaryan
2024-09-04 16:21             ` Kent Overstreet
2024-09-04 16:35             ` Matthew Wilcox
2024-09-04 16:37               ` Kent Overstreet
2024-09-04 16:39                 ` Suren Baghdasaryan
2024-09-04  2:41         ` Kent Overstreet
