linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH stable 5.10 0/1] Fix memleak during hotremove memory
@ 2023-06-14  6:18 Wupeng Ma
  2023-06-14  6:19 ` [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block Wupeng Ma
  0 siblings, 1 reply; 13+ messages in thread
From: Wupeng Ma @ 2023-06-14  6:18 UTC (permalink / raw)
  To: akpm, david; +Cc: linux-mm, linux-kernel, mawupeng1, stable

From: Ma Wupeng <mawupeng1@huawei.com>

Hi maintainers:

Our test find a memleak in init_memory_block, it is clear that mem is never
been released due to wrong refcount. Commit 08b3acd7a68f ("mm/memory_hotplug:
Introduce offline_and_remove_memory()") failed to dec refcount after
find_memory_block which fail to dec refcount to zero in remove memory
causing the leak.

Commit 8dc4bb58a146 ("mm/memory_hotplug: extend offline_and_remove_memory()
to handle more than one memory block") introduce walk_memory_blocks to
replace find_memory_block which dec refcount by calling put_device after
find_memory_block_by_id. In the way, the memleak is fixed.

Here is the simplified calltrace:

  kmem_cache_alloc_trace+0x664/0xed0
  init_memory_block+0x8c/0x170
  create_memory_block_devices+0xa4/0x150
  add_memory_resource+0x188/0x530
  __add_memory+0x78/0x104
  add_memory+0x6c/0xb0

David Hildenbrand (1):
  mm/memory_hotplug: extend offline_and_remove_memory() to handle more
    than one memory block

 mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 89 insertions(+), 16 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-14  6:18 [PATCH stable 5.10 0/1] Fix memleak during hotremove memory Wupeng Ma
@ 2023-06-14  6:19 ` Wupeng Ma
  2023-06-14  6:35   ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: Wupeng Ma @ 2023-06-14  6:19 UTC (permalink / raw)
  To: akpm, david
  Cc: linux-mm, linux-kernel, mawupeng1, stable, Wei Yang,
	Michael S. Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

From: David Hildenbrand <david@redhat.com>

virtio-mem soon wants to use offline_and_remove_memory() memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.

Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).

This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128MB.

While we could compress the state into 2 bit, using 8 bit is much
easier.

This handling is similar, but different to acpi_scan_try_to_offline():

a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.

b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 89 insertions(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f0633f9a9116..9ec9e1e67705 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1788,39 +1788,112 @@ int remove_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 
+static int try_offline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t online_type = MMOP_ONLINE_KERNEL;
+	uint8_t **online_types = arg;
+	struct page *page;
+	int rc;
+
+	/*
+	 * Sense the online_type via the zone of the memory block. Offlining
+	 * with multiple zones within one memory block will be rejected
+	 * by offlining code ... so we don't care about that.
+	 */
+	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
+		online_type = MMOP_ONLINE_MOVABLE;
+
+	rc = device_offline(&mem->dev);
+	/*
+	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
+	 * so try_reonline_memory_block() can do the right thing.
+	 */
+	if (!rc)
+		**online_types = online_type;
+
+	(*online_types)++;
+	/* Ignore if already offline. */
+	return rc < 0 ? rc : 0;
+}
+
+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t **online_types = arg;
+	int rc;
+
+	if (**online_types != MMOP_OFFLINE) {
+		mem->online_type = **online_types;
+		rc = device_online(&mem->dev);
+		if (rc < 0)
+			pr_warn("%s: Failed to re-online memory: %d",
+				__func__, rc);
+	}
+
+	/* Continue processing all remaining memory blocks. */
+	(*online_types)++;
+	return 0;
+}
+
 /*
- * Try to offline and remove a memory block. Might take a long time to
- * finish in case memory is still in use. Primarily useful for memory devices
- * that logically unplugged all memory (so it's no longer in use) and want to
- * offline + remove the memory block.
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
  */
 int offline_and_remove_memory(int nid, u64 start, u64 size)
 {
-	struct memory_block *mem;
-	int rc = -EINVAL;
+	const unsigned long mb_count = size / memory_block_size_bytes();
+	uint8_t *online_types, *tmp;
+	int rc;
 
 	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
-	    size != memory_block_size_bytes())
-		return rc;
+	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+		return -EINVAL;
+
+	/*
+	 * We'll remember the old online type of each memory block, so we can
+	 * try to revert whatever we did when offlining one memory block fails
+	 * after offlining some others succeeded.
+	 */
+	online_types = kmalloc_array(mb_count, sizeof(*online_types),
+				     GFP_KERNEL);
+	if (!online_types)
+		return -ENOMEM;
+	/*
+	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
+	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
+	 * try_reonline_memory_block().
+	 */
+	memset(online_types, MMOP_OFFLINE, mb_count);
 
 	lock_device_hotplug();
-	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
-	if (mem)
-		rc = device_offline(&mem->dev);
-	/* Ignore if the device is already offline. */
-	if (rc > 0)
-		rc = 0;
+
+	tmp = online_types;
+	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
 
 	/*
-	 * In case we succeeded to offline the memory block, remove it.
+	 * In case we succeeded to offline all memory, remove it.
 	 * This cannot fail as it cannot get onlined in the meantime.
 	 */
 	if (!rc) {
 		rc = try_remove_memory(nid, start, size);
-		WARN_ON_ONCE(rc);
+		if (rc)
+			pr_err("%s: Failed to remove memory: %d", __func__, rc);
+	}
+
+	/*
+	 * Rollback what we did. While memory onlining might theoretically fail
+	 * (nacked by a notifier), it barely ever happens.
+	 */
+	if (rc) {
+		tmp = online_types;
+		walk_memory_blocks(start, size, &tmp,
+				   try_reonline_memory_block);
 	}
 	unlock_device_hotplug();
 
+	kfree(online_types);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
-- 
2.25.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-14  6:19 ` [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block Wupeng Ma
@ 2023-06-14  6:35   ` Greg KH
  2023-06-14  6:45     ` mawupeng
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2023-06-14  6:35 UTC (permalink / raw)
  To: Wupeng Ma
  Cc: akpm, david, linux-mm, linux-kernel, stable, Wei Yang,
	Michael S. Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On Wed, Jun 14, 2023 at 02:19:00PM +0800, Wupeng Ma wrote:
> From: David Hildenbrand <david@redhat.com>
> 
> virtio-mem soon wants to use offline_and_remove_memory() memory that
> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> remove that restriction.
> 
> Let's remember the old state and try to restore that if anything goes
> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> happen (usually only when a notifier fails to allocate memory, and these
> are rather rare).
> 
> This will be used by virtio-mem to offline+remove memory ranges that are
> bigger than a single memory block - for example, with a device block
> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> block size of 128MB.
> 
> While we could compress the state into 2 bit, using 8 bit is much
> easier.
> 
> This handling is similar, but different to acpi_scan_try_to_offline():
> 
> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> optimization is still relevant - it should only apply to ZONE_NORMAL
> (where we have no guarantees). If relevant, we can always add it.
> 
> b) acpi_scan_try_to_offline() simply onlines all memory in case
> something goes wrong. It doesn't restore previous online type. Let's do
> that, so we won't overwrite what e.g., user space configured.
> 
> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 89 insertions(+), 16 deletions(-)

As you forwarded this patch on, you too need to sign-off on it.

Also, what is the git id of the commit in Linus's tree?

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-14  6:35   ` Greg KH
@ 2023-06-14  6:45     ` mawupeng
  2023-06-19  6:20       ` Greg KH
  2023-06-19  6:51       ` [PATCH stable 5.10] " Wupeng Ma
  0 siblings, 2 replies; 13+ messages in thread
From: mawupeng @ 2023-06-14  6:45 UTC (permalink / raw)
  To: gregkh
  Cc: mawupeng1, akpm, david, linux-mm, linux-kernel, stable,
	richard.weiyang, mst, jasowang, pankaj.gupta.linux, mhocko,
	osalvador



On 2023/6/14 14:35, Greg KH wrote:
> On Wed, Jun 14, 2023 at 02:19:00PM +0800, Wupeng Ma wrote:
>> From: David Hildenbrand <david@redhat.com>
>>
>> virtio-mem soon wants to use offline_and_remove_memory() memory that
>> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
>> remove that restriction.
>>
>> Let's remember the old state and try to restore that if anything goes
>> wrong. While re-onlining can, in general, fail, it's highly unlikely to
>> happen (usually only when a notifier fails to allocate memory, and these
>> are rather rare).
>>
>> This will be used by virtio-mem to offline+remove memory ranges that are
>> bigger than a single memory block - for example, with a device block
>> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
>> block size of 128MB.
>>
>> While we could compress the state into 2 bit, using 8 bit is much
>> easier.
>>
>> This handling is similar, but different to acpi_scan_try_to_offline():
>>
>> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
>> optimization is still relevant - it should only apply to ZONE_NORMAL
>> (where we have no guarantees). If relevant, we can always add it.
>>
>> b) acpi_scan_try_to_offline() simply onlines all memory in case
>> something goes wrong. It doesn't restore previous online type. Let's do
>> that, so we won't overwrite what e.g., user space configured.
>>
>> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> Acked-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 89 insertions(+), 16 deletions(-)
> 
> As you forwarded this patch on, you too need to sign-off on it.

Thanks for reminding me.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>

> 
> Also, what is the git id of the commit in Linus's tree?

Sorry, here is the commit in Linus's tree.

commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.

> 
> thanks,
> 
> greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-14  6:45     ` mawupeng
@ 2023-06-19  6:20       ` Greg KH
  2023-06-19  6:54         ` mawupeng
  2023-06-19  6:51       ` [PATCH stable 5.10] " Wupeng Ma
  1 sibling, 1 reply; 13+ messages in thread
From: Greg KH @ 2023-06-19  6:20 UTC (permalink / raw)
  To: mawupeng
  Cc: akpm, david, linux-mm, linux-kernel, stable, richard.weiyang,
	mst, jasowang, pankaj.gupta.linux, mhocko, osalvador

On Wed, Jun 14, 2023 at 02:45:58PM +0800, mawupeng wrote:
> 
> 
> On 2023/6/14 14:35, Greg KH wrote:
> > On Wed, Jun 14, 2023 at 02:19:00PM +0800, Wupeng Ma wrote:
> >> From: David Hildenbrand <david@redhat.com>
> >>
> >> virtio-mem soon wants to use offline_and_remove_memory() memory that
> >> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> >> remove that restriction.
> >>
> >> Let's remember the old state and try to restore that if anything goes
> >> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> >> happen (usually only when a notifier fails to allocate memory, and these
> >> are rather rare).
> >>
> >> This will be used by virtio-mem to offline+remove memory ranges that are
> >> bigger than a single memory block - for example, with a device block
> >> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> >> block size of 128MB.
> >>
> >> While we could compress the state into 2 bit, using 8 bit is much
> >> easier.
> >>
> >> This handling is similar, but different to acpi_scan_try_to_offline():
> >>
> >> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> >> optimization is still relevant - it should only apply to ZONE_NORMAL
> >> (where we have no guarantees). If relevant, we can always add it.
> >>
> >> b) acpi_scan_try_to_offline() simply onlines all memory in case
> >> something goes wrong. It doesn't restore previous online type. Let's do
> >> that, so we won't overwrite what e.g., user space configured.
> >>
> >> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: Oscar Salvador <osalvador@suse.de>
> >> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> Acked-by: Andrew Morton <akpm@linux-foundation.org>
> >> ---
> >>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
> >>  1 file changed, 89 insertions(+), 16 deletions(-)
> > 
> > As you forwarded this patch on, you too need to sign-off on it.
> 
> Thanks for reminding me.
> 
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> 
> > 
> > Also, what is the git id of the commit in Linus's tree?
> 
> Sorry, here is the commit in Linus's tree.
> 
> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.

Please resend the change with both of these things fixed up, so I don't
have to manually do it :)

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-14  6:45     ` mawupeng
  2023-06-19  6:20       ` Greg KH
@ 2023-06-19  6:51       ` Wupeng Ma
  2023-06-19  7:16         ` Greg KH
  1 sibling, 1 reply; 13+ messages in thread
From: Wupeng Ma @ 2023-06-19  6:51 UTC (permalink / raw)
  To: akpm, david, gregkh
  Cc: linux-mm, linux-kernel, mawupeng1, stable, Wei Yang,
	Michael S. Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

From: David Hildenbrand <david@redhat.com>

commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.

virtio-mem soon wants to use offline_and_remove_memory() memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.

Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).

This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128MB.

While we could compress the state into 2 bit, using 8 bit is much
easier.

This handling is similar, but different to acpi_scan_try_to_offline():

a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.

b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 89 insertions(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f0633f9a9116..9ec9e1e67705 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1788,39 +1788,112 @@ int remove_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 
+static int try_offline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t online_type = MMOP_ONLINE_KERNEL;
+	uint8_t **online_types = arg;
+	struct page *page;
+	int rc;
+
+	/*
+	 * Sense the online_type via the zone of the memory block. Offlining
+	 * with multiple zones within one memory block will be rejected
+	 * by offlining code ... so we don't care about that.
+	 */
+	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
+		online_type = MMOP_ONLINE_MOVABLE;
+
+	rc = device_offline(&mem->dev);
+	/*
+	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
+	 * so try_reonline_memory_block() can do the right thing.
+	 */
+	if (!rc)
+		**online_types = online_type;
+
+	(*online_types)++;
+	/* Ignore if already offline. */
+	return rc < 0 ? rc : 0;
+}
+
+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
+{
+	uint8_t **online_types = arg;
+	int rc;
+
+	if (**online_types != MMOP_OFFLINE) {
+		mem->online_type = **online_types;
+		rc = device_online(&mem->dev);
+		if (rc < 0)
+			pr_warn("%s: Failed to re-online memory: %d",
+				__func__, rc);
+	}
+
+	/* Continue processing all remaining memory blocks. */
+	(*online_types)++;
+	return 0;
+}
+
 /*
- * Try to offline and remove a memory block. Might take a long time to
- * finish in case memory is still in use. Primarily useful for memory devices
- * that logically unplugged all memory (so it's no longer in use) and want to
- * offline + remove the memory block.
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
  */
 int offline_and_remove_memory(int nid, u64 start, u64 size)
 {
-	struct memory_block *mem;
-	int rc = -EINVAL;
+	const unsigned long mb_count = size / memory_block_size_bytes();
+	uint8_t *online_types, *tmp;
+	int rc;
 
 	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
-	    size != memory_block_size_bytes())
-		return rc;
+	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+		return -EINVAL;
+
+	/*
+	 * We'll remember the old online type of each memory block, so we can
+	 * try to revert whatever we did when offlining one memory block fails
+	 * after offlining some others succeeded.
+	 */
+	online_types = kmalloc_array(mb_count, sizeof(*online_types),
+				     GFP_KERNEL);
+	if (!online_types)
+		return -ENOMEM;
+	/*
+	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
+	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
+	 * try_reonline_memory_block().
+	 */
+	memset(online_types, MMOP_OFFLINE, mb_count);
 
 	lock_device_hotplug();
-	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
-	if (mem)
-		rc = device_offline(&mem->dev);
-	/* Ignore if the device is already offline. */
-	if (rc > 0)
-		rc = 0;
+
+	tmp = online_types;
+	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
 
 	/*
-	 * In case we succeeded to offline the memory block, remove it.
+	 * In case we succeeded to offline all memory, remove it.
 	 * This cannot fail as it cannot get onlined in the meantime.
 	 */
 	if (!rc) {
 		rc = try_remove_memory(nid, start, size);
-		WARN_ON_ONCE(rc);
+		if (rc)
+			pr_err("%s: Failed to remove memory: %d", __func__, rc);
+	}
+
+	/*
+	 * Rollback what we did. While memory onlining might theoretically fail
+	 * (nacked by a notifier), it barely ever happens.
+	 */
+	if (rc) {
+		tmp = online_types;
+		walk_memory_blocks(start, size, &tmp,
+				   try_reonline_memory_block);
 	}
 	unlock_device_hotplug();
 
+	kfree(online_types);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
-- 
2.25.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  6:20       ` Greg KH
@ 2023-06-19  6:54         ` mawupeng
  0 siblings, 0 replies; 13+ messages in thread
From: mawupeng @ 2023-06-19  6:54 UTC (permalink / raw)
  To: gregkh
  Cc: mawupeng1, akpm, david, linux-mm, linux-kernel, stable,
	richard.weiyang, mst, jasowang, pankaj.gupta.linux, mhocko,
	osalvador



On 2023/6/19 14:20, Greg KH wrote:
> On Wed, Jun 14, 2023 at 02:45:58PM +0800, mawupeng wrote:
>>
>>
>> On 2023/6/14 14:35, Greg KH wrote:
>>> On Wed, Jun 14, 2023 at 02:19:00PM +0800, Wupeng Ma wrote:
>>>> From: David Hildenbrand <david@redhat.com>
>>>>
>>>> virtio-mem soon wants to use offline_and_remove_memory() memory that
>>>> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
>>>> remove that restriction.
>>>>
>>>> Let's remember the old state and try to restore that if anything goes
>>>> wrong. While re-onlining can, in general, fail, it's highly unlikely to
>>>> happen (usually only when a notifier fails to allocate memory, and these
>>>> are rather rare).
>>>>
>>>> This will be used by virtio-mem to offline+remove memory ranges that are
>>>> bigger than a single memory block - for example, with a device block
>>>> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
>>>> block size of 128MB.
>>>>
>>>> While we could compress the state into 2 bit, using 8 bit is much
>>>> easier.
>>>>
>>>> This handling is similar, but different to acpi_scan_try_to_offline():
>>>>
>>>> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
>>>> optimization is still relevant - it should only apply to ZONE_NORMAL
>>>> (where we have no guarantees). If relevant, we can always add it.
>>>>
>>>> b) acpi_scan_try_to_offline() simply onlines all memory in case
>>>> something goes wrong. It doesn't restore previous online type. Let's do
>>>> that, so we won't overwrite what e.g., user space configured.
>>>>
>>>> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>> Acked-by: Andrew Morton <akpm@linux-foundation.org>
>>>> ---
>>>>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>>>>  1 file changed, 89 insertions(+), 16 deletions(-)
>>>
>>> As you forwarded this patch on, you too need to sign-off on it.
>>
>> Thanks for reminding me.
>>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>
>>>
>>> Also, what is the git id of the commit in Linus's tree?
>>
>> Sorry, here is the commit in Linus's tree.
>>
>> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
> 
> Please resend the change with both of these things fixed up, so I don't
> have to manually do it :)

I have resend the patch but use the parent message id for in-reply-to.

Sorry.

> 
> thanks,
> 
> greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  6:51       ` [PATCH stable 5.10] " Wupeng Ma
@ 2023-06-19  7:16         ` Greg KH
  2023-06-19  7:22           ` mawupeng
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2023-06-19  7:16 UTC (permalink / raw)
  To: Wupeng Ma
  Cc: akpm, david, linux-mm, linux-kernel, stable, Wei Yang,
	Michael S. Tsirkin, Jason Wang, Pankaj Gupta, Michal Hocko,
	Oscar Salvador

On Mon, Jun 19, 2023 at 02:51:21PM +0800, Wupeng Ma wrote:
> From: David Hildenbrand <david@redhat.com>
> 
> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
> 
> virtio-mem soon wants to use offline_and_remove_memory() memory that
> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> remove that restriction.
> 
> Let's remember the old state and try to restore that if anything goes
> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> happen (usually only when a notifier fails to allocate memory, and these
> are rather rare).
> 
> This will be used by virtio-mem to offline+remove memory ranges that are
> bigger than a single memory block - for example, with a device block
> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> block size of 128MB.
> 
> While we could compress the state into 2 bit, using 8 bit is much
> easier.
> 
> This handling is similar, but different to acpi_scan_try_to_offline():
> 
> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> optimization is still relevant - it should only apply to ZONE_NORMAL
> (where we have no guarantees). If relevant, we can always add it.
> 
> b) acpi_scan_try_to_offline() simply onlines all memory in case
> something goes wrong. It doesn't restore previous online type. Let's do
> that, so we won't overwrite what e.g., user space configured.
> 
> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 89 insertions(+), 16 deletions(-)
> 

Why is this needed in 5.10.y?  Looks like a new feature to me, what
problem does it solve there?

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  7:16         ` Greg KH
@ 2023-06-19  7:22           ` mawupeng
  2023-06-19  7:41             ` David Hildenbrand
  2023-06-19  7:48             ` Greg KH
  0 siblings, 2 replies; 13+ messages in thread
From: mawupeng @ 2023-06-19  7:22 UTC (permalink / raw)
  To: gregkh
  Cc: mawupeng1, akpm, david, linux-mm, linux-kernel, stable,
	richard.weiyang, mst, jasowang, pankaj.gupta.linux, mhocko,
	osalvador



On 2023/6/19 15:16, Greg KH wrote:
> On Mon, Jun 19, 2023 at 02:51:21PM +0800, Wupeng Ma wrote:
>> From: David Hildenbrand <david@redhat.com>
>>
>> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
>>
>> virtio-mem soon wants to use offline_and_remove_memory() memory that
>> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
>> remove that restriction.
>>
>> Let's remember the old state and try to restore that if anything goes
>> wrong. While re-onlining can, in general, fail, it's highly unlikely to
>> happen (usually only when a notifier fails to allocate memory, and these
>> are rather rare).
>>
>> This will be used by virtio-mem to offline+remove memory ranges that are
>> bigger than a single memory block - for example, with a device block
>> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
>> block size of 128MB.
>>
>> While we could compress the state into 2 bit, using 8 bit is much
>> easier.
>>
>> This handling is similar, but different to acpi_scan_try_to_offline():
>>
>> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
>> optimization is still relevant - it should only apply to ZONE_NORMAL
>> (where we have no guarantees). If relevant, we can always add it.
>>
>> b) acpi_scan_try_to_offline() simply onlines all memory in case
>> something goes wrong. It doesn't restore previous online type. Let's do
>> that, so we won't overwrite what e.g., user space configured.
>>
>> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> Acked-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 89 insertions(+), 16 deletions(-)
>>
> 
> Why is this needed in 5.10.y?  Looks like a new feature to me, what
> problem does it solve there?
> 
> thanks,
> 
> greg k-h

It do introduce a new feature. But at the same time, it fix a memleak introduced
in Commit 08b3acd7a68f ("mm/memory_hotplug: Introduce offline_and_remove_memory()"

Our test find a memleak in init_memory_block, it is clear that mem is never
been released due to wrong refcount. Commit 08b3acd7a68f ("mm/memory_hotplug:
Introduce offline_and_remove_memory()") failed to dec refcount after
find_memory_block which fail to dec refcount to zero in remove memory
causing the leak.

Commit 8dc4bb58a146 ("mm/memory_hotplug: extend offline_and_remove_memory()
to handle more than one memory block") introduce walk_memory_blocks to
replace find_memory_block which dec refcount by calling put_device after
find_memory_block_by_id. In the way, the memleak is fixed.

Here is the simplified calltrace:

  kmem_cache_alloc_trace+0x664/0xed0
  init_memory_block+0x8c/0x170
  create_memory_block_devices+0xa4/0x150
  add_memory_resource+0x188/0x530
  __add_memory+0x78/0x104
  add_memory+0x6c/0xb0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  7:22           ` mawupeng
@ 2023-06-19  7:41             ` David Hildenbrand
  2023-06-19  7:53               ` mawupeng
  2023-06-19  7:48             ` Greg KH
  1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2023-06-19  7:41 UTC (permalink / raw)
  To: mawupeng, gregkh
  Cc: akpm, linux-mm, linux-kernel, stable, richard.weiyang, mst,
	jasowang, pankaj.gupta.linux, mhocko, osalvador

On 19.06.23 09:22, mawupeng wrote:
> 
> 
> On 2023/6/19 15:16, Greg KH wrote:
>> On Mon, Jun 19, 2023 at 02:51:21PM +0800, Wupeng Ma wrote:
>>> From: David Hildenbrand <david@redhat.com>
>>>
>>> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
>>>
>>> virtio-mem soon wants to use offline_and_remove_memory() memory that
>>> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
>>> remove that restriction.
>>>
>>> Let's remember the old state and try to restore that if anything goes
>>> wrong. While re-onlining can, in general, fail, it's highly unlikely to
>>> happen (usually only when a notifier fails to allocate memory, and these
>>> are rather rare).
>>>
>>> This will be used by virtio-mem to offline+remove memory ranges that are
>>> bigger than a single memory block - for example, with a device block
>>> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
>>> block size of 128MB.
>>>
>>> While we could compress the state into 2 bit, using 8 bit is much
>>> easier.
>>>
>>> This handling is similar, but different to acpi_scan_try_to_offline():
>>>
>>> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
>>> optimization is still relevant - it should only apply to ZONE_NORMAL
>>> (where we have no guarantees). If relevant, we can always add it.
>>>
>>> b) acpi_scan_try_to_offline() simply onlines all memory in case
>>> something goes wrong. It doesn't restore previous online type. Let's do
>>> that, so we won't overwrite what e.g., user space configured.
>>>
>>> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>> Cc: Michal Hocko <mhocko@kernel.org>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> Acked-by: Andrew Morton <akpm@linux-foundation.org>
>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>> ---
>>>   mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>>>   1 file changed, 89 insertions(+), 16 deletions(-)
>>>
>>
>> Why is this needed in 5.10.y?  Looks like a new feature to me, what
>> problem does it solve there?
>>
>> thanks,
>>
>> greg k-h
> 
> It do introduce a new feature. But at the same time, it fix a memleak introduced
> in Commit 08b3acd7a68f ("mm/memory_hotplug: Introduce offline_and_remove_memory()"
> 
> Our test find a memleak in init_memory_block, it is clear that mem is never
> been released due to wrong refcount. Commit 08b3acd7a68f ("mm/memory_hotplug:
> Introduce offline_and_remove_memory()") failed to dec refcount after
> find_memory_block which fail to dec refcount to zero in remove memory
> causing the leak.
> 
> Commit 8dc4bb58a146 ("mm/memory_hotplug: extend offline_and_remove_memory()
> to handle more than one memory block") introduce walk_memory_blocks to
> replace find_memory_block which dec refcount by calling put_device after
> find_memory_block_by_id. In the way, the memleak is fixed.
> 
> Here is the simplified calltrace:
> 
>    kmem_cache_alloc_trace+0x664/0xed0
>    init_memory_block+0x8c/0x170
>    create_memory_block_devices+0xa4/0x150
>    add_memory_resource+0x188/0x530
>    __add_memory+0x78/0x104
>    add_memory+0x6c/0xb0
> 

Makes sense to me. Of course, we could think about a simplified stable 
fix that only drops the ref.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  7:22           ` mawupeng
  2023-06-19  7:41             ` David Hildenbrand
@ 2023-06-19  7:48             ` Greg KH
  1 sibling, 0 replies; 13+ messages in thread
From: Greg KH @ 2023-06-19  7:48 UTC (permalink / raw)
  To: mawupeng
  Cc: akpm, david, linux-mm, linux-kernel, stable, richard.weiyang,
	mst, jasowang, pankaj.gupta.linux, mhocko, osalvador

On Mon, Jun 19, 2023 at 03:22:20PM +0800, mawupeng wrote:
> 
> 
> On 2023/6/19 15:16, Greg KH wrote:
> > On Mon, Jun 19, 2023 at 02:51:21PM +0800, Wupeng Ma wrote:
> >> From: David Hildenbrand <david@redhat.com>
> >>
> >> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
> >>
> >> virtio-mem soon wants to use offline_and_remove_memory() memory that
> >> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> >> remove that restriction.
> >>
> >> Let's remember the old state and try to restore that if anything goes
> >> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> >> happen (usually only when a notifier fails to allocate memory, and these
> >> are rather rare).
> >>
> >> This will be used by virtio-mem to offline+remove memory ranges that are
> >> bigger than a single memory block - for example, with a device block
> >> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> >> block size of 128MB.
> >>
> >> While we could compress the state into 2 bit, using 8 bit is much
> >> easier.
> >>
> >> This handling is similar, but different to acpi_scan_try_to_offline():
> >>
> >> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> >> optimization is still relevant - it should only apply to ZONE_NORMAL
> >> (where we have no guarantees). If relevant, we can always add it.
> >>
> >> b) acpi_scan_try_to_offline() simply onlines all memory in case
> >> something goes wrong. It doesn't restore previous online type. Let's do
> >> that, so we won't overwrite what e.g., user space configured.
> >>
> >> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: Oscar Salvador <osalvador@suse.de>
> >> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> Acked-by: Andrew Morton <akpm@linux-foundation.org>
> >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> >> ---
> >>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
> >>  1 file changed, 89 insertions(+), 16 deletions(-)
> >>
> > 
> > Why is this needed in 5.10.y?  Looks like a new feature to me, what
> > problem does it solve there?
> > 
> > thanks,
> > 
> > greg k-h
> 
> It do introduce a new feature. But at the same time, it fix a memleak introduced
> in Commit 08b3acd7a68f ("mm/memory_hotplug: Introduce offline_and_remove_memory()"
> 
> Our test find a memleak in init_memory_block, it is clear that mem is never
> been released due to wrong refcount. Commit 08b3acd7a68f ("mm/memory_hotplug:
> Introduce offline_and_remove_memory()") failed to dec refcount after
> find_memory_block which fail to dec refcount to zero in remove memory
> causing the leak.
> 
> Commit 8dc4bb58a146 ("mm/memory_hotplug: extend offline_and_remove_memory()
> to handle more than one memory block") introduce walk_memory_blocks to
> replace find_memory_block which dec refcount by calling put_device after
> find_memory_block_by_id. In the way, the memleak is fixed.
> 
> Here is the simplified calltrace:
> 
>   kmem_cache_alloc_trace+0x664/0xed0
>   init_memory_block+0x8c/0x170
>   create_memory_block_devices+0xa4/0x150
>   add_memory_resource+0x188/0x530
>   __add_memory+0x78/0x104
>   add_memory+0x6c/0xb0

Ok, thanks for the information, now queued up.

greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  7:41             ` David Hildenbrand
@ 2023-06-19  7:53               ` mawupeng
  2023-06-19  8:04                 ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: mawupeng @ 2023-06-19  7:53 UTC (permalink / raw)
  To: david, gregkh
  Cc: mawupeng1, akpm, linux-mm, linux-kernel, stable, richard.weiyang,
	mst, jasowang, pankaj.gupta.linux, mhocko, osalvador



On 2023/6/19 15:41, David Hildenbrand wrote:
> On 19.06.23 09:22, mawupeng wrote:
>>
>>
>> On 2023/6/19 15:16, Greg KH wrote:
>>> On Mon, Jun 19, 2023 at 02:51:21PM +0800, Wupeng Ma wrote:
>>>> From: David Hildenbrand <david@redhat.com>
>>>>
>>>> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
>>>>
>>>> virtio-mem soon wants to use offline_and_remove_memory() memory that
>>>> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
>>>> remove that restriction.
>>>>
>>>> Let's remember the old state and try to restore that if anything goes
>>>> wrong. While re-onlining can, in general, fail, it's highly unlikely to
>>>> happen (usually only when a notifier fails to allocate memory, and these
>>>> are rather rare).
>>>>
>>>> This will be used by virtio-mem to offline+remove memory ranges that are
>>>> bigger than a single memory block - for example, with a device block
>>>> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
>>>> block size of 128MB.
>>>>
>>>> While we could compress the state into 2 bit, using 8 bit is much
>>>> easier.
>>>>
>>>> This handling is similar, but different to acpi_scan_try_to_offline():
>>>>
>>>> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
>>>> optimization is still relevant - it should only apply to ZONE_NORMAL
>>>> (where we have no guarantees). If relevant, we can always add it.
>>>>
>>>> b) acpi_scan_try_to_offline() simply onlines all memory in case
>>>> something goes wrong. It doesn't restore previous online type. Let's do
>>>> that, so we won't overwrite what e.g., user space configured.
>>>>
>>>> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>> Acked-by: Andrew Morton <akpm@linux-foundation.org>
>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>>> ---
>>>>   mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>>>>   1 file changed, 89 insertions(+), 16 deletions(-)
>>>>
>>>
>>> Why is this needed in 5.10.y?  Looks like a new feature to me, what
>>> problem does it solve there?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> It do introduce a new feature. But at the same time, it fix a memleak introduced
>> in Commit 08b3acd7a68f ("mm/memory_hotplug: Introduce offline_and_remove_memory()"
>>
>> Our test find a memleak in init_memory_block, it is clear that mem is never
>> been released due to wrong refcount. Commit 08b3acd7a68f ("mm/memory_hotplug:
>> Introduce offline_and_remove_memory()") failed to dec refcount after
>> find_memory_block which fail to dec refcount to zero in remove memory
>> causing the leak.
>>
>> Commit 8dc4bb58a146 ("mm/memory_hotplug: extend offline_and_remove_memory()
>> to handle more than one memory block") introduce walk_memory_blocks to
>> replace find_memory_block which dec refcount by calling put_device after
>> find_memory_block_by_id. In the way, the memleak is fixed.
>>
>> Here is the simplified calltrace:
>>
>>    kmem_cache_alloc_trace+0x664/0xed0
>>    init_memory_block+0x8c/0x170
>>    create_memory_block_devices+0xa4/0x150
>>    add_memory_resource+0x188/0x530
>>    __add_memory+0x78/0x104
>>    add_memory+0x6c/0xb0
>>
> 
> Makes sense to me. Of course, we could think about a simplified stable fix that only drops the ref.

Since the new patch does not introduce any kabi change, maybe we can merge this one?

However the changelog may lead to some confusion for other people just like the question
you asked.

> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH stable 5.10] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
  2023-06-19  7:53               ` mawupeng
@ 2023-06-19  8:04                 ` Greg KH
  0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2023-06-19  8:04 UTC (permalink / raw)
  To: mawupeng
  Cc: david, akpm, linux-mm, linux-kernel, stable, richard.weiyang,
	mst, jasowang, pankaj.gupta.linux, mhocko, osalvador

On Mon, Jun 19, 2023 at 03:53:40PM +0800, mawupeng wrote:
> 
> 
> On 2023/6/19 15:41, David Hildenbrand wrote:
> > On 19.06.23 09:22, mawupeng wrote:
> >>
> >>
> >> On 2023/6/19 15:16, Greg KH wrote:
> >>> On Mon, Jun 19, 2023 at 02:51:21PM +0800, Wupeng Ma wrote:
> >>>> From: David Hildenbrand <david@redhat.com>
> >>>>
> >>>> commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
> >>>>
> >>>> virtio-mem soon wants to use offline_and_remove_memory() memory that
> >>>> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> >>>> remove that restriction.
> >>>>
> >>>> Let's remember the old state and try to restore that if anything goes
> >>>> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> >>>> happen (usually only when a notifier fails to allocate memory, and these
> >>>> are rather rare).
> >>>>
> >>>> This will be used by virtio-mem to offline+remove memory ranges that are
> >>>> bigger than a single memory block - for example, with a device block
> >>>> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> >>>> block size of 128MB.
> >>>>
> >>>> While we could compress the state into 2 bit, using 8 bit is much
> >>>> easier.
> >>>>
> >>>> This handling is similar, but different to acpi_scan_try_to_offline():
> >>>>
> >>>> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> >>>> optimization is still relevant - it should only apply to ZONE_NORMAL
> >>>> (where we have no guarantees). If relevant, we can always add it.
> >>>>
> >>>> b) acpi_scan_try_to_offline() simply onlines all memory in case
> >>>> something goes wrong. It doesn't restore previous online type. Let's do
> >>>> that, so we won't overwrite what e.g., user space configured.
> >>>>
> >>>> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
> >>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >>>> Cc: Jason Wang <jasowang@redhat.com>
> >>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >>>> Cc: Michal Hocko <mhocko@kernel.org>
> >>>> Cc: Oscar Salvador <osalvador@suse.de>
> >>>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> >>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>>> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
> >>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>> Acked-by: Andrew Morton <akpm@linux-foundation.org>
> >>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> >>>> ---
> >>>>   mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
> >>>>   1 file changed, 89 insertions(+), 16 deletions(-)
> >>>>
> >>>
> >>> Why is this needed in 5.10.y?  Looks like a new feature to me, what
> >>> problem does it solve there?
> >>>
> >>> thanks,
> >>>
> >>> greg k-h
> >>
> >> It do introduce a new feature. But at the same time, it fix a memleak introduced
> >> in Commit 08b3acd7a68f ("mm/memory_hotplug: Introduce offline_and_remove_memory()"
> >>
> >> Our test find a memleak in init_memory_block, it is clear that mem is never
> >> been released due to wrong refcount. Commit 08b3acd7a68f ("mm/memory_hotplug:
> >> Introduce offline_and_remove_memory()") failed to dec refcount after
> >> find_memory_block which fail to dec refcount to zero in remove memory
> >> causing the leak.
> >>
> >> Commit 8dc4bb58a146 ("mm/memory_hotplug: extend offline_and_remove_memory()
> >> to handle more than one memory block") introduce walk_memory_blocks to
> >> replace find_memory_block which dec refcount by calling put_device after
> >> find_memory_block_by_id. In the way, the memleak is fixed.
> >>
> >> Here is the simplified calltrace:
> >>
> >>    kmem_cache_alloc_trace+0x664/0xed0
> >>    init_memory_block+0x8c/0x170
> >>    create_memory_block_devices+0xa4/0x150
> >>    add_memory_resource+0x188/0x530
> >>    __add_memory+0x78/0x104
> >>    add_memory+0x6c/0xb0
> >>
> > 
> > Makes sense to me. Of course, we could think about a simplified stable fix that only drops the ref.
> 
> Since the new patch does not introduce any kabi change, maybe we can merge this one?

stable kernels never care about "kabi", that is a made up thing that
some distros work to enforce only.  It has nothing to do with the
community.

And I will always prefer to take the real commit that is in Linus's tree
over any "custom" patch, as 90%+ of the time, custom changes are almost
always wrong.

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-06-19  8:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-14  6:18 [PATCH stable 5.10 0/1] Fix memleak during hotremove memory Wupeng Ma
2023-06-14  6:19 ` [PATCH stable 5.10 1/1] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block Wupeng Ma
2023-06-14  6:35   ` Greg KH
2023-06-14  6:45     ` mawupeng
2023-06-19  6:20       ` Greg KH
2023-06-19  6:54         ` mawupeng
2023-06-19  6:51       ` [PATCH stable 5.10] " Wupeng Ma
2023-06-19  7:16         ` Greg KH
2023-06-19  7:22           ` mawupeng
2023-06-19  7:41             ` David Hildenbrand
2023-06-19  7:53               ` mawupeng
2023-06-19  8:04                 ` Greg KH
2023-06-19  7:48             ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox