linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org, linux-mm@kvack.org,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@suse.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Laurent Vivier <lvivier@redhat.com>, Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH for 4.19-stable 00/25] mm/memory_hotplug: backport of pending stable fixes
Date: Wed, 15 Jan 2020 16:54:59 +0100
Message-ID: <4a09f161-e2f1-b506-f0fd-2d6c4ea1437c@redhat.com>
In-Reply-To: <20200115153927.GC3881751@kroah.com>

On 15.01.20 16:39, Greg Kroah-Hartman wrote:
> On Wed, Jan 15, 2020 at 04:33:14PM +0100, David Hildenbrand wrote:
>> This is the backport of the following fixes for 4.19-stable:
>>
>> - a31b264c2b41 ("mm/memory_hotplug: make
>>   unregister_memory_block_under_nodes() never fail")
>> -- Turned out to not only be a cleanup but also a fix

I took the wrong commit here; the correct one is d84f2f5a7552
("drivers/base/node.c: simplify unregister_memory_block_under_nodes()")

>> - 2c91f8fc6c99 ("mm/memory_hotplug: fix try_offline_node()")
>> -- Automatic stable backport failed due to missing dependencies.
>> - feee6b298916 ("mm/memory_hotplug: shrink zones when offlining memory")
>> -- Was marked as stable 5.0+ due to the backport complexity, but it's also
>>    relevant for 4.19/4.14. As I have to backport quite some cleanups
>>    already ...
>>
>> To minimize manual code changes, I decided to pull in quite some cleanups.
>> Still some manual code changes are necessary (indicated in the individual
>> patches). Especially missing arm64 hot(un)plug, missing sub-section hotadd
>> support, and missing unification of mm/hmm.c and kernel/memremap.c require
>> care.
>>
>> Due to:
>> - 4e0d2e7ef14d ("mm, sparse: pass nid instead of pgdat to
>>   sparse_add_one_section()")
>> I need:
>> - afe9b36ca890 ("mm/memunmap: don't access uninitialized memmap in
>>   memunmap_pages()")
>>
>> Please note that:
>> - 4c4b7f9ba948 ("mm/memory_hotplug: remove memory block devices
>>   before arch_remove_memory()")
>> Makes big (e.g., 32TB) machines boot up slower (e.g., 2h vs 10m). There is
>> a performance fix in linux-next, but it does not seem to classify as a
>> fix for current RC / stable.
>>
>> I did quite some testing with hot(un)plug, onlining/offlining of memory
>> blocks and memory-less/CPU-less NUMA nodes under x86_64 - the same set of
>> tests I run against upstream on a fairly regular basis. I compile-tested
>> on PowerPC. I did not test any ZONE_DEVICE/HMM thingies.
>>
>> Let's see what people think - it's a lot of patches. If we want this,
>> then I can try to prepare a similar set for 4.4-stable.
> 
> What bug(s) are these trying to fix here?

All of them tackle memory unplug issues, especially when memory that was
never onlined (or whose onlining failed) gets unplugged. When we try to
access garbage memmaps, we crash the kernel (e.g., because the derived
pgdat pointer is broken).


d84f2f5a7552 ("drivers/base/node.c: simplify
unregister_memory_block_under_nodes()")

->
https://lore.kernel.org/linux-mm/b2e31976-b07d-11e6-f806-f13f4619be4d@redhat.com/

"If the memory we are removing was never onlined,
get_nid_for_pfn()->pfn_to_nid() will return garbage. Removing will
succeed but links will remain in place. [...] We will trigger the
BUG_ON(ret) in add_memory_resource(), because
link_mem_sections() will return with -EEXIST."


2c91f8fc6c99 ("mm/memory_hotplug: fix try_offline_node()")

On memory unplug, we might access garbage memmaps when trying to offline
the node, and crash.


feee6b298916 ("mm/memory_hotplug: shrink zones when offlining memory")

Memory unplug will access garbage memmaps (resulting in crashes) and the
zones might not get fixed up properly. Relevant when memory was never
onlined, when memory blocks of a DIMM were onlined to different zones,
or when memory blocks were re-onlined to different zones.


This backports the remaining "don't access uninitialized memmaps"-like
fixes. The other ones were already backported.

> 
> And why would 4.9 and 4.4 care about them?

The crashes can also be triggered under 4.9 and 4.4. If we decide that we
don't care, this series can be dropped.

-- 
Thanks,

David / dhildenb




Thread overview: 37+ messages
2020-01-15 15:33 David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 01/25] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 02/25] mm, sparse: drop pgdat_resize_lock in sparse_add/remove_one_section() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 03/25] mm, sparse: pass nid instead of pgdat to sparse_add_one_section() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 04/25] drivers/base/memory.c: remove an unnecessary check on NR_MEM_SECTIONS David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 05/25] mm, memory_hotplug: add nid parameter to arch_remove_memory David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 06/25] mm/memory_hotplug: release memory resource after arch_remove_memory() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 07/25] drivers/base/memory.c: clean up relics in function parameters David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 08/25] mm, memory_hotplug: update a comment in unregister_memory() David Hildenbrand
2020-01-15 15:38   ` Greg Kroah-Hartman
2020-01-15 15:41     ` David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 09/25] mm/memory_hotplug: make unregister_memory_section() never fail David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 10/25] mm/memory_hotplug: make __remove_section() " David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 11/25] powerpc/mm: Fix section mismatch warning David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 12/25] powerpc/mm: move warning from resize_hpt_for_hotplug() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 13/25] mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 14/25] s390x/mm: implement arch_remove_memory() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 15/25] mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 16/25] drivers/base/memory: pass a block_id to init_memory_block() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 17/25] mm/memory_hotplug: create memory block devices after arch_add_memory() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 18/25] mm/memory_hotplug: remove memory block devices before arch_remove_memory() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 19/25] mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 20/25] mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 21/25] mm/hotplug: kill is_dev_zone() usage in __remove_pages() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 22/25] drivers/base/node.c: simplify unregister_memory_block_under_nodes() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 23/25] mm/memunmap: don't access uninitialized memmap in memunmap_pages() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 24/25] mm/memory_hotplug: fix try_offline_node() David Hildenbrand
2020-01-15 15:33 ` [PATCH for 4.19-stable 25/25] mm/memory_hotplug: shrink zones when offlining memory David Hildenbrand
2020-01-15 15:39 ` [PATCH for 4.19-stable 00/25] mm/memory_hotplug: backport of pending stable fixes Greg Kroah-Hartman
2020-01-15 15:54   ` David Hildenbrand [this message]
2020-01-16  8:34     ` Greg Kroah-Hartman
2020-01-16  8:42       ` David Hildenbrand
2020-01-16  8:54         ` Greg Kroah-Hartman
2020-01-16  8:59           ` David Hildenbrand
2020-01-16  9:26             ` Greg Kroah-Hartman
2020-01-16  9:35               ` David Hildenbrand
2020-01-16 14:32               ` David Hildenbrand
