linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH RFC 0/8] mm: online/offline 4MB chunks controlled by device driver
Date: Fri, 13 Apr 2018 16:59:39 +0200
Message-ID: <e5430631-7bc2-3c42-43b0-2fce220feeeb@redhat.com>
In-Reply-To: <20180413142030.GU17484@dhcp22.suse.cz>

On 13.04.2018 16:20, Michal Hocko wrote:
> On Fri 13-04-18 16:01:43, David Hildenbrand wrote:
>> On 13.04.2018 15:44, Michal Hocko wrote:
>>> [If you choose to not CC the same set of people on all patches - which
>>> is sometimes a legit thing to do - then please cc them to the cover
>>> letter at least.]
>>>
>>> On Fri 13-04-18 15:16:24, David Hildenbrand wrote:
>>>> I am right now working on a paravirtualized memory device ("virtio-mem").
>>>> These devices control a memory region and the amount of memory available
>>>> via it. Memory will not be indicated via ACPI and friends, the device
>>>> driver is responsible for it.
>>>
>>> How does this compare to other ballooning solutions? And why your driver
>>> cannot simply use the existing sections and maintain subsections on top?
>>>
>>
>> (further down in this mail is a small paragraph about that)
> 
> Sorry, I just stopped right there and didn't even read to the end.
> Shame on me! I will do my homework and read it carefully (next week).
> 

Sure, in case you have any questions feel free to ask. And if you are
curious how this is used in practice, let me know and I can post the
current prototype that should run on x86 and s390x.

Have a nice weekend! :)

> [...]
>> "And why your driver cannot simply use the existing sections and
>> maintain subsections on top?"
>>
>> Can you elaborate how that is going to work? What I do as of now, is to
>> remember for each memory block (basically a section because I want to
>> make it as small as possible) which chunks ("subsections") are
>> online/offline. This works just fine. Is this what you are referring to?
> 
> Well, basically yes. I meant to suggest you simply mark pages reserved
> and pull them out. You can reuse some parts of such a struct page for
> your metadata because we should simply ignore those.

I store metadata in a separate structure (basically a uint64_t) right
now, because it is easier to track blocks, especially when I
remove_memory() again.
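
To illustrate the idea (a userspace sketch only, not the prototype code;
struct block_state and the helper names are made up here, and 128MB
blocks with 4MB chunks are just example sizes), one bit per chunk in a
uint64_t is enough to track which chunks of a block are online:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes only: a 128MB memory block split into 4MB chunks. */
#define BLOCK_SIZE_MB    128u
#define CHUNK_SIZE_MB    4u
#define CHUNKS_PER_BLOCK (BLOCK_SIZE_MB / CHUNK_SIZE_MB) /* 32 chunks */

/* Hypothetical per-block metadata: bit n set means chunk n is online. */
struct block_state {
	uint64_t online_chunks;
};

static inline void chunk_set_online(struct block_state *b, unsigned int chunk)
{
	assert(chunk < CHUNKS_PER_BLOCK);
	b->online_chunks |= UINT64_C(1) << chunk;
}

static inline void chunk_set_offline(struct block_state *b, unsigned int chunk)
{
	assert(chunk < CHUNKS_PER_BLOCK);
	b->online_chunks &= ~(UINT64_C(1) << chunk);
}

static inline bool chunk_is_online(const struct block_state *b,
				   unsigned int chunk)
{
	assert(chunk < CHUNKS_PER_BLOCK);
	return (b->online_chunks >> chunk) & 1u;
}

/* Only a block with no online chunk left is a candidate for removal. */
static inline bool block_is_removable(const struct block_state *b)
{
	return b->online_chunks == 0;
}
```

With this layout, the check for "all chunks offlined, the block can go
away via remove_memory()" is a single comparison against zero.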

The problem with reserved pages is that e.g. kdump will happily think it
can read all of them. So we need some way to tell dumping tools which
pages to skip. Also, offline_pages() has to be taught not to simply
offline a whole memory section just because a subset of its pages has
been offlined.
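
As a toy model of both points (plain userspace C, not kernel code;
struct model_page, MODEL_PG_OFFLINE and the helpers are invented for
this sketch and only stand in for the PG_offline flag proposed in this
series):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; it does not reflect the real page flag layout. */
#define MODEL_PG_OFFLINE (1u << 0)

/* Minimal stand-in for a struct page: only the flags matter here. */
struct model_page {
	uint32_t flags;
};

static inline bool page_is_offline(const struct model_page *p)
{
	return (p->flags & MODEL_PG_OFFLINE) != 0;
}

/* Count the pages a dump tool may read: everything not marked offline. */
static inline size_t count_dumpable(const struct model_page *pages, size_t n)
{
	size_t count = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!page_is_offline(&pages[i]))
			count++;
	return count;
}

/* A section counts as offline only once every one of its pages is offline. */
static inline bool section_is_offline(const struct model_page *pages, size_t n)
{
	return count_dumpable(pages, n) == 0;
}
```

The point of the second helper is that offlining some pages must not
flip the state of the whole section; only the last offlined chunk does.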

> 
> You still have to allocate memmap for the full section but 128MB
> sections have a nice effect that they fit into a single PMD for
> sparse-vmemmap. So you do not really need to touch mem sections, all you
> need is to keep your metadata on top.

Please keep in mind that we somehow have to get pages out of the system
when trying to remove 4MB chunks, in particular so that remove_memory()
works once all chunks have been offlined. We cannot use any current
allocator for this ("allocate memory only in a certain address range").
So the online_pages()/offline_pages() approach is the cleanest solution
I have found so far (e.g. offline_pages(): isolate and migrate a 4MB
block, flush its pages out of all data structures, fix up accounting).

Also, please note that the subsection size can vary. It could e.g. be
8MB or 16MB; it is not fixed to 4MB. The paravirtualized memory device
can configure it differently (e.g. a minimum granularity of 8MB).
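
A rough sketch of what a configurable chunk size means for the per-block
bookkeeping (userspace C for illustration; the 128MB block size, the
power-of-two requirement and the 64-chunk limit of a single uint64_t
bitmap are assumptions of this example, and the helper names are made
up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB         (UINT64_C(1) << 20)
/* Assumed block size for this example: one 128MB section. */
#define BLOCK_SIZE (128 * MB)

static inline bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Return how many chunks of chunk_size bytes make up one block, or 0 if
 * that chunk size cannot be tracked in a single 64-bit bitmap.
 */
static inline unsigned int chunks_per_block(uint64_t chunk_size)
{
	uint64_t n;

	if (!is_power_of_two(chunk_size) || chunk_size > BLOCK_SIZE)
		return 0;
	n = BLOCK_SIZE / chunk_size;
	return n <= 64 ? (unsigned int)n : 0;
}

/* Index of the chunk containing the given byte offset within the block. */
static inline unsigned int chunk_index(uint64_t offset, uint64_t chunk_size)
{
	assert(chunks_per_block(chunk_size) != 0 && offset < BLOCK_SIZE);
	return (unsigned int)(offset / chunk_size);
}
```

Under these assumptions a 4MB chunk size gives 32 chunks per block, 8MB
gives 16 and 16MB gives 8, so the same bitmap works for all of them.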

The current prototype allows a driver to:
- add/remove >4MB chunks to/from the system cleanly
- add memory blocks that it manages, when needed
- remove memory blocks when no longer needed (removing struct pages)
- teach kdump not to touch subsections that are offline

Thanks!

-- 

Thanks,

David / dhildenb

