linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@suse.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/8] implement "memmap on memory" feature on s390
Date: Fri, 17 Nov 2023 16:37:29 +0100	[thread overview]
Message-ID: <ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com> (raw)
In-Reply-To: <20231117140009.5d8a509c@thinkpad-T15>

On 17.11.23 14:00, Gerald Schaefer wrote:
> On Fri, 17 Nov 2023 00:08:31 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>> Hi All,
>>>
>>> The patch series implements "memmap on memory" feature on s390 and
>>> provides the necessary fixes for it.
>>
>> Thinking about this, one thing that makes s390x different from all the
>> other architectures in this series is the altmap handling.
>>
>> I'm curious, why is that even required?
>>
>> A memmep that is not marked as online in the section should not be
>> touched by anybody (except memory onlining code :) ). And if we do, it's
>> usually a BUG because that memmap might contain garbage/be poisoned or
>> completely stale, so we might want to track that down and fix it in any
>> case.
>>
>> So what speaks against just leaving add_memory() populate the memmap
>> from the altmap? Then, also the page tables for the memmap are already
>> in place when onlining memory.
> 
> Good question, I am not 100% sure if we ran into bugs, or simply assumed
> that it is not OK to call __add_pages() when the memory for the altmap
> is not accessible.

I mean, we create the direct map even though nobody should access that 
memory, so maybe we can simply map the altmap even though nobody should 
should access that memory.

As I said, then, even the page tables for the altmap are allocated 
already and memory onlining likely doesn't need any allocation anymore 
(except, there is kasan or some other memory notifiers have special 
demands).

Certainly simpler :)

> 
> Maybe there is also already a common code bug with that, s390 might be
> special but that is often also good for finding bugs in common code ...

If it's only the page_init_poison() as noted by Sumanth, we could 
disable that on s390x with an altmap some way or the other; should be 
possible.

I mean, you effectively have your own poisoning if the altmap is 
effectively inaccessible and makes your CPU angry on access :)

Last but not least, support for an inaccessible altmap might come in 
handy for virtio-mem eventually, and make altmap support eventually 
simpler. So added bonus points.

> 
>> Then, adding two new notifier calls on start of memory_block_online()
>> called something like MEM_PREPARE_ONLINE and end the end of
>> memory_block_offline() called something like MEM_FINISH_OFFLINE is still
>> suboptimal, but that's where standby memory could be
>> activated/deactivated, without messing with the altmap.
>>
>> That way, the only s390x specific thing is that the memmap that should
>> not be touched by anybody is actually inaccessible, and you'd
>> activate/deactivate simply from the new notifier calls just the way we
>> used to do.
>>
>> It's still all worse than just adding/removing memory properly, using a
>> proper interface -- where you could alloc/free an actual memmap when the
>> altmap is not desired. But I know that people don't want to spend time
>> just doing it cleanly from scratch.
> 
> Yes, sometimes they need to be forced to do that :-)

I certainly won't force you if we can just keep the __add_pages() calls 
as is; having an altmap that is inaccessible but fully prepared sounds 
reasonable to me.

I can see how this gives an immediate benefit to existing s390x 
installations without being too hacky and without taking a long time to 
settle.

But I'll strongly suggest to evaluate a new interface long-term.

> 
> So, we'll look into defining a "proper interface", and treat patches 1-3
> separately as bug fixes? Especially patch 3 might be interesting for arm,
> if they do not have ZONE_DEVICE, but still use the functions, they might
> end up with the no-op version, not really freeing any memory.

It might make sense to

1) Send the first 3 out separately
2) Look into a simple variant that leaves __add_pages() calls alone and
    only adds the new MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers --
    well, and deals with an inaccessible altmap, like the
    page_init_poison() when the altmap might be inaccessible.
3) Look into a proper interface to add/remove memory instead of relying
    on online/offline.

2) is certainly an improvement and might be desired in some cases. 3) is 
more powerful (e.g., where you don't want an altmap because of 
fragmentation) and future proof.

I suspect there will be installations where an altmap is undesired: it 
fragments your address space with unmovable (memmap) allocations. 
Currently, runtime allocations of gigantic pages are affected. Long-term 
other large allocations (if we ever see very large THP) will be affected.

For that reason, we want to either support variable-sized memory blocks 
long-term, or simulate that by "grouping" memory blocks that share a 
same altmap located on the first memory blocks in that group: but 
onlining one block forces onlining of the whole group.

On s390x that adds all memory ahead of time, it's hard to make a 
decision what the right granularity will be, and seeing sudden 
online/offline changed behavior might be quite "surprising" for users. 
The user can give better hints when adding/removing memory explicitly.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2023-11-17 15:37 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-14 18:02 Sumanth Korikkar
2023-11-14 18:02 ` [PATCH 2/8] mm/memory_hotplug: fix error handling in add_memory_resource() Sumanth Korikkar
2023-11-14 18:36   ` David Hildenbrand
2023-11-15 13:45     ` Sumanth Korikkar
2023-11-14 18:02 ` [PATCH 3/8] mm: use vmem_altmap code without CONFIG_ZONE_DEVICE Sumanth Korikkar
2023-11-16 18:43   ` David Hildenbrand
2023-11-17 21:39   ` kernel test robot
2023-11-14 18:02 ` [PATCH 5/8] s390/mm: allocate vmemmap pages from self-contained memory range Sumanth Korikkar
2023-11-14 18:02 ` [PATCH 7/8] s390/sclp: remove unhandled memory notifier type Sumanth Korikkar
2023-11-16 19:33   ` David Hildenbrand
     [not found] ` <20231114180238.1522782-2-sumanthk@linux.ibm.com>
2023-11-14 18:22   ` [PATCH 1/8] mm/memory_hotplug: fix memory hotplug locking order David Hildenbrand
     [not found]     ` <ZVTKk7J1AcoBBxhR@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com>
2023-11-16 18:40       ` David Hildenbrand
2023-11-17 13:42         ` Sumanth Korikkar
     [not found] ` <20231114180238.1522782-5-sumanthk@linux.ibm.com>
2023-11-14 18:27   ` [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers David Hildenbrand
2023-11-15 14:23     ` Sumanth Korikkar
2023-11-16 19:03       ` David Hildenbrand
2023-11-15 15:03     ` Gerald Schaefer
2023-11-16 19:02       ` David Hildenbrand
     [not found] ` <20231114180238.1522782-7-sumanthk@linux.ibm.com>
2023-11-14 18:39   ` [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE " David Hildenbrand
2023-11-15 14:20     ` Sumanth Korikkar
2023-11-16 19:16       ` David Hildenbrand
2023-11-16 19:23         ` David Hildenbrand
2023-11-17 13:00         ` Gerald Schaefer
2023-11-20 14:49           ` David Hildenbrand
2023-11-16 23:08 ` [PATCH 0/8] implement "memmap on memory" feature on s390 David Hildenbrand
2023-11-17 13:00   ` Gerald Schaefer
2023-11-17 15:37     ` David Hildenbrand [this message]
2023-11-17 19:46       ` Sumanth Korikkar
2023-11-21 13:13       ` Sumanth Korikkar
2023-11-21 13:21         ` Sumanth Korikkar
2023-11-21 14:42           ` David Hildenbrand
2023-11-21 19:24         ` David Hildenbrand
2023-11-22 11:44           ` Sumanth Korikkar
2023-11-17 13:56   ` Sumanth Korikkar
2023-11-17 15:37     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com \
    --to=david@redhat.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=sumanthk@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox