From: David Hildenbrand <david@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Oscar Salvador <osalvador@suse.com>,
Michal Hocko <mhocko@suse.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Wei Yang <richard.weiyang@gmail.com>, Qian Cai <cai@lca.pw>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Dave Airlie <airlied@redhat.com>,
Andrey Konovalov <andreyknvl@google.com>,
Alexander Duyck <alexander.h.duyck@linux.intel.com>,
Ira Weiny <ira.weiny@intel.com>,
John Hubbard <jhubbard@nvidia.com>,
Arun KS <arunks@codeaurora.org>,
Souptick Joarder <jrdr.linux@gmail.com>,
Robin Murphy <robin.murphy@arm.com>,
Yang Shi <yang.shi@linux.alibaba.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Logan Gunthorpe <logang@deltatee.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@techsingularity.net>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Alexander Potapenko <glider@google.com>
Subject: Re: [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device()
Date: Thu, 29 Aug 2019 18:55:42 +0200 [thread overview]
Message-ID: <46b8e215-97d4-9097-9941-58e050c5efcc@redhat.com> (raw)
In-Reply-To: <CAKgT0UdKwYhF++=b=vN_Kw_SORtgJ3ehCAn8a9kp8tb79HknyQ@mail.gmail.com>
On 29.08.19 18:39, Alexander Duyck wrote:
> On Thu, Aug 29, 2019 at 12:00 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> As far as I can see, the original split was meant to speed things up
>> by avoiding duplicate initialization. We now only initialize once - and
>> we really should initialize under the lock when resizing the zones.
>
> What do you mean by duplicate initialization? Basically the issue was
> that we can have systems with massive memory footprints and I was just
> trying to get the initialization time under a minute. The compromise I
> made was to split the initialization so that we only initialized the
> pages in the altmap and defer the rest so that they can be initialized
> in parallel.
>
> What this patch does is serialize the initialization so it will likely
> take 2 to 4 minutes or more to initialize memory on a system where I
> had brought the init time under about 30 seconds.
>
>> As soon as we drop the lock we might have memory unplug running, trying
>> to shrink the zone again. In case the memmap was not properly initialized,
>> the zone/node shrinking code might get false negatives when searching for
>> the new zone/node boundaries - bad. We suddenly could no longer span the
>> memory we just added.
>
> The issue as I see it is that we can have systems with multiple nodes
> and each node can populate a fairly significant amount of data. By
> pushing this all back under the hotplug lock you are serializing the
> initialization for each node resulting in a 4x or 8x increase in
> initialization time on some of these bigger systems.
>
>> Also, once we want to fix set_zone_contiguous(zone) inside
>> move_pfn_range_to_zone() to actually work with ZONE_DEVICE (instead of
>> always immediately stopping and never setting zone->contiguous) we have
>> to have the whole memmap initialized at that point. (not sure if we
>> want that in the future, just a note)
>>
>> Let's just keep things simple and initialize the memmap when resizing
>> the zones under the lock.
>>
>> If this is a real performance issue, we have to watch out for
>> alternatives.
>
> My concern is that this is going to become a performance issue, but I
> don't have access to the hardware at the moment to test how much of
> one. I'll have to check to see if I can get access to a system to test
> this patch set.
>
Thanks for having a look - I already dropped this patch again. We will
instead stop shrinking ZONE_DEVICE zones, so assume this patch is gone.
[...]
> So if you are moving this all under the lock then this is going to
> serialize initialization and it is going to be quite expensive on big
> systems. I was only ever able to get the init time down to something
> like ~20s with the optimized function. Since that has been torn apart
> and you are back to doing things with memmap_init_zone we are probably
> looking at more like 25-30s for each node, and on a 4 node system we
> are looking at 2 minutes or so which may lead to issues if people are
> mounting this.
>
> Instead of forcing this all to be done under the hotplug lock is there
> some way we could do this under the zone span_seqlock instead to
> achieve the same result?
I guess the right approach really is, as Michal suggests, to not shrink
at all (at least for ZONE_DEVICE) :)
>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c5d62f1c2851..44038665fe8e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5845,7 +5845,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>> */
>> void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> unsigned long start_pfn, enum memmap_context context,
>> - struct vmem_altmap *altmap)
>> + struct dev_pagemap *pgmap)
>> {
>> unsigned long pfn, end_pfn = start_pfn + size;
>> struct page *page;
>> @@ -5853,24 +5853,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> if (highest_memmap_pfn < end_pfn - 1)
>> highest_memmap_pfn = end_pfn - 1;
>>
>> -#ifdef CONFIG_ZONE_DEVICE
>> - /*
>> - * Honor reservation requested by the driver for this ZONE_DEVICE
>> - * memory. We limit the total number of pages to initialize to just
>> - * those that might contain the memory mapping. We will defer the
>> - * ZONE_DEVICE page initialization until after we have released
>> - * the hotplug lock.
>> - */
>> - if (zone == ZONE_DEVICE) {
>> - if (!altmap)
>> - return;
>> -
>> - if (start_pfn == altmap->base_pfn)
>> - start_pfn += altmap->reserve;
>> - end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>> - }
>> -#endif
>> -
>> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> /*
>> * There can be holes in boot-time mem_map[]s handed to this
>> @@ -5892,6 +5874,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> if (context == MEMMAP_HOTPLUG)
>> __SetPageReserved(page);
>>
>> +#ifdef CONFIG_ZONE_DEVICE
>> + if (zone == ZONE_DEVICE) {
>> + WARN_ON_ONCE(!pgmap);
>> + /*
>> + * ZONE_DEVICE pages union ->lru with a ->pgmap back
>> + * pointer and zone_device_data. It is a bug if a
>> + * ZONE_DEVICE page is ever freed or placed on a driver
>> + * private list.
>> + */
>> + page->pgmap = pgmap;
>> + page->zone_device_data = NULL;
>> + }
>> +#endif
>> +
>
> So I am not sure this is right. Part of the reason for the split is
> that the pages that were a part of the altmap had an LRU setup, not
> the pgmap/zone_device_data setup. This is changing that.
Yeah, you might be right; we would have to handle the altmap part
separately.
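For reference, the range clamping that the deleted ZONE_DEVICE hunk performed for altmap-backed memory can be sketched roughly as follows. This is a standalone illustration with mock types, not kernel code; the struct layout and the `clamp_to_altmap` helper name are simplifications for the example:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct vmem_altmap: a device
 * memory range where the first "reserve" pfns are set aside by the
 * driver and the next "free" pfns hold the memmap itself. */
struct vmem_altmap {
	unsigned long base_pfn;
	unsigned long reserve;
	unsigned long free;
};

/* Mirrors vmem_altmap_offset(): the number of pfns at the start of
 * the range that do not back usable memory. */
static unsigned long vmem_altmap_offset(const struct vmem_altmap *altmap)
{
	return altmap->reserve + altmap->free;
}

/* What the deleted hunk did: clamp [*start_pfn, *end_pfn) to just the
 * pfns that might contain the memory map, leaving the rest of the
 * range for deferred, parallel initialization after the hotplug lock
 * is dropped. */
static void clamp_to_altmap(const struct vmem_altmap *altmap,
			    unsigned long *start_pfn, unsigned long *end_pfn)
{
	if (*start_pfn == altmap->base_pfn)
		*start_pfn += altmap->reserve;
	*end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
}
```

With, say, 16 reserved pfns and 512 memmap-backing pfns, only that small prefix of the range is initialized under the lock; the bulk of the device memory is skipped here and initialized later.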
>
> Also, I am more a fan of just testing pgmap and if it is present then
> assign page->pgmap and reset zone_device_data. Then you can avoid the
> test for zone on every iteration and the WARN_ON_ONCE check, or at
> least you could move it outside the loop so we don't incur the cost
> with each page. Are there situations where you would have pgmap but
> not a ZONE_DEVICE page?
Don't think so. But I am definitely not a ZONE_DEVICE expert.
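A rough standalone sketch of the restructuring Alexander suggests (mock types, not actual kernel code; the `init_range` helper and field layout are simplified for illustration), keying off pgmap instead of re-testing the zone on every iteration:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; in the real struct page,
 * pgmap and zone_device_data live in a union with the lru list head. */
struct dev_pagemap { int id; };

struct page {
	struct dev_pagemap *pgmap;
	void *zone_device_data;
};

/* The suggested shape: decide once per call whether this is a
 * ZONE_DEVICE init (pgmap != NULL), so any sanity check can be hoisted
 * outside the loop instead of warning on every page. */
static void init_range(struct page *pages, unsigned long npages,
		       struct dev_pagemap *pgmap)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		/* per-page init common to all zones would go here */
		if (pgmap) {
			pages[i].pgmap = pgmap;
			pages[i].zone_device_data = NULL;
		}
	}
}
```

The per-iteration cost then reduces to a single predictable branch, which matters when npages runs into the hundreds of millions on the large systems discussed above.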
>
>> /*
>> * Mark the block movable so that blocks are reserved for
>> * movable at startup. This will force kernel allocations
>> @@ -5951,14 +5947,6 @@ void __ref memmap_init_zone_device(struct zone *zone,
>> */
>> __SetPageReserved(page);
>>
>> - /*
>> - * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
>> - * and zone_device_data. It is a bug if a ZONE_DEVICE page is
>> - * ever freed or placed on a driver-private list.
>> - */
>> - page->pgmap = pgmap;
>> - page->zone_device_data = NULL;
>> -
>> /*
>> * Mark the block movable so that blocks are reserved for
>> * movable at startup. This will force kernel allocations
>
> Shouldn't you be removing this function instead of just gutting it?
> I'm kind of surprised you aren't getting warnings about unused code
> since you also pulled the declaration for it from the header.
>
Don't ask me what went wrong here, I guess I messed this up while rebasing.
--
Thanks,
David / dhildenb
Thread overview: 22+ messages
2019-08-29 7:00 [PATCH v3 00/11] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device() David Hildenbrand
2019-08-29 16:39 ` Alexander Duyck
2019-08-29 16:55 ` David Hildenbrand [this message]
2019-08-29 7:00 ` [PATCH v3 02/11] mm/memory_hotplug: Simplify shrink_pgdat_span() David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 03/11] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 04/11] mm/memory_hotplug: Drop local variables in shrink_zone_span() David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 05/11] mm/memory_hotplug: Optimize zone shrinking code when checking for holes David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 06/11] mm/memory_hotplug: Fix crashes in shrink_zone_span() David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 07/11] mm/memory_hotplug: Exit early in __remove_pages() on BUGs David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 08/11] mm: Exit early in set_zone_contiguous() if already contiguous David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 09/11] mm/memory_hotplug: Remove pages from a zone before removing memory David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 10/11] mm/memory_hotplug: Remove zone parameter from __remove_pages() David Hildenbrand
2019-08-29 7:00 ` [PATCH v3 11/11] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
2019-08-29 8:23 ` [PATCH v3 00/11] mm/memory_hotplug: Shrink zones before removing memory Michal Hocko
2019-08-29 11:33 ` David Hildenbrand
2019-08-29 11:43 ` David Hildenbrand
2019-08-29 12:08 ` David Hildenbrand
2019-08-29 12:15 ` Michal Hocko
2019-08-29 12:29 ` David Hildenbrand
2019-08-29 15:19 ` Michal Hocko
2019-08-29 15:28 ` David Hildenbrand