From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Sistare <steven.sistare@oracle.com>
Subject: Re: [PATCH v2] x86/mm: use max memory block size on bare metal
Date: Mon, 22 Jun 2020 15:17:39 -0400 [thread overview]
Message-ID: <20200622191739.4lekqrmjnzv2vwl2@ca-dmjordan1.us.oracle.com> (raw)
In-Reply-To: <20200619120704.GD12177@dhcp22.suse.cz>
Hello Michal,
(I've been away and may be slow to respond for a little while)
On Fri, Jun 19, 2020 at 02:07:04PM +0200, Michal Hocko wrote:
> On Tue 09-06-20 18:54:51, Daniel Jordan wrote:
> [...]
> > @@ -1390,6 +1391,15 @@ static unsigned long probe_memory_block_size(void)
> > goto done;
> > }
> >
> > + /*
> > + * Use max block size to minimize overhead on bare metal, where
> > + * alignment for memory hotplug isn't a concern.
>
> This really begs a clarification why this is not a concern. Bare metal
> can see physical memory hotadd as well. I just suspect that you do not
> consider that to be very common so it is not a big deal?
It's not only that physical hotadd is uncommon on bare metal; boot_mem_end there may not align with any available memory block size at all.  For instance, this server's boot_mem_end is only 4M aligned, and FWIW my desktop's is 2M aligned.  As far as I can tell, the logic that picks the size wasn't intended for bare metal.
> And I would
> tend to agree but still we are just going to wait until first user
> stumbles over this.
This isn't something new with this patch: 2G has been the default on big
machines for years.  It's addressing an unintended side effect of commit
078eb6aa50dc50, which was aimed at qemu, by restoring the original behavior on
bare metal to avoid oodles of sysfs files.
> Btw. memblock interface just doesn't scale and it is a terrible
> interface for large machines and for the memory hotplug in general (just
> look at ppc and their insanely small memblocks).
I agree that the status quo isn't ideal and is something to address going
forward.
> Most usecases I have seen simply want to either offline some portion of
> memory without a strong requirement of the physical memory range as long
> as it is from a particular node or simply offline and remove the full
> node.
Interesting, I would've thought that removing a single bad DIMM for RAS
purposes would also be common, relative to how often hotplug is done on real
systems.
> I believe that we should think about a future interface rather than
> trying to ducktape the blocksize anytime it causes problems. I would be
> even tempted to simply add a kernel command line option
> memory_hotplug=disable,legacy,new_shiny
>
> for disable it would simply drop all the sysfs crud and speed up boot
> for most users who simply do not care about memory hotplug. new_shiny
> would ideally provide an interface that would either export logically
> hotplugable memory ranges (e.g. DIMMs) or a query/action interface which
> accepts physical ranges as input. Having gazillions of sysfs files is
> simply unsustainable.
So in this idea, presumably the default would start off being legacy and then
later be changed to new_shiny?
If new_shiny scales well, maybe 'disable' wouldn't be needed, so the option
could be avoided most of the time.  Users who really don't want it can build
without it.
Thread overview: 13+ messages
2020-06-09 22:54 Daniel Jordan
2020-06-09 23:03 ` Daniel Jordan
2020-06-10 7:20 ` David Hildenbrand
2020-06-10 7:30 ` David Hildenbrand
2020-06-10 17:16 ` Daniel Jordan
2020-06-11 14:16 ` Dave Hansen
2020-06-11 16:59 ` Daniel Jordan
2020-06-11 17:05 ` Dave Hansen
2020-06-12 3:29 ` Daniel Jordan
2020-06-19 12:07 ` Michal Hocko
2020-06-22 19:17 ` Daniel Jordan [this message]
2020-06-26 12:47 ` Michal Hocko
2020-07-08 18:46 ` Daniel Jordan