linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Dave Young <dyoung@redhat.com>
Cc: "Michal Hocko" <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Alexander Potapenko" <glider@google.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"Balbir Singh" <bsingharora@gmail.com>,
	"Baoquan He" <bhe@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Hari Bathini" <hbathini@linux.vnet.ibm.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Jaewon Kim" <jaewon31.kim@samsung.com>,
	"Jan Kara" <jack@suse.cz>, "Jérôme Glisse" <jglisse@redhat.com>,
	"Joonsoo Kim" <iamjoonsoo.kim@lge.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Kate Stewart" <kstewart@linuxfoundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Matthew Wilcox" <mawilcox@microsoft.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Miles Chen" <miles.chen@mediatek.com>,
	"Oscar Salvador" <osalvador@techadventures.net>,
	"Paul Mackerras" <paulus@samba.org>,
	"Pavel Tatashin" <pasha.tatashin@oracle.com>,
	"Philippe Ombredanne" <pombredanne@nexb.com>,
	"Rashmica Gupta" <rashmica.g@gmail.com>,
	"Reza Arbab" <arbab@linux.vnet.ibm.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vlastimil Babka" <vbabka@suse.cz>
Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
Date: Thu, 24 May 2018 11:14:33 +0200
Message-ID: <e70de03e-6965-749a-6c3c-ecf6dcb60c71@redhat.com>
In-Reply-To: <20180524085610.GA5467@dhcp-128-65.nay.redhat.com>

On 24.05.2018 10:56, Dave Young wrote:
> Hi,
> 
> [snip]
>>>
>>>> For kdump and onlining/offlining code, we
>>>> have to mark pages as offline before a new segment is visible to the system
>>>> (e.g. as these pages might not be backed by real memory in the hypervisor).
>>>
>>> Please expand on the kdump part. That is really confusing because
>>> hotplug should simply not depend on kdump at all. Moreover why don't you
>>> simply mark those pages reserved and pull them out from the page
>>> allocator?
>>
>> 1. "hotplug should simply not depend on kdump at all"
>>
>> In theory yes. In the current state we already have to trigger kdump to
>> reload whenever we add/remove a memory block.
>>
>>
>> 2. kdump part
>>
>> Whenever we offline a page and tell the hypervisor about it ("unplug"),
>> we should not assume that we can read that page again. Now, if dumping
>> tools assume they can read all memory that is offline, we are in trouble.
>>
>> It is the same thing as we already have with PG_hwpoison. Just a
>> different meaning - "don't touch this page, it is offline" compared to
>> "don't touch this page, hw is broken".
> 
> Does that mean that in case of an offline there is no kdump reload, as
> mentioned in 1)?
> 
> If we have the offline event and reload kdump, I assume the memory state
> is refreshed so kdump will not read the memory offlined, am I missing
> something?

If a whole section is offline: yes. (ACPI hotplug)

If pages are online but broken ("logically offline" - hwpoison): no

If single pages are logically offline: no. (Balloon inflation - let's
call it unplug as that's what some people refer to)

If only subsections (4MB chunks) are offline: no.

Exporting memory ranges in a smaller granularity to kdump than section
size would a) be heavily complicated b) introduce a lot of overhead for
this tracking data c) make us retrigger kdump way too often.
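
To put rough numbers on point b), here is a minimal sketch (not code
from this series) that counts how many entries a range list needs in the
worst case, where every unit requires its own entry. The 128 MiB section
size is an assumption (typical for x86-64), and all demo_* names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define DEMO_SECTION_SIZE (128ULL << 20) /* assumed memory section size */
#define DEMO_CHUNK_SIZE   (4ULL << 20)   /* 4MB chunk, as in this series */

/*
 * Worst-case number of range entries needed to describe mem_bytes of
 * memory when its state is tracked at the given granularity.
 */
static uint64_t demo_ranges_needed(uint64_t mem_bytes, uint64_t granularity)
{
	assert(granularity != 0);
	/* Round up so a partial trailing unit still gets an entry. */
	return (mem_bytes + granularity - 1) / granularity;
}
```

Under these assumptions, one section holds 32 chunks, so tracking at
chunk granularity can need up to 32 times as many entries as tracking
at section granularity.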

So simply marking pages offline in the struct pages and telling kdump
about it is the straightforward thing to do. And it is fairly easy to
add and implement as we have the exact same thing in place for hwpoison.
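
A minimal sketch of how a dump tool could act on such a per-page marker,
treating "offline" the same way as hwpoison. This is not the actual
kdump/makedumpfile code: struct demo_page, the demo_* helpers and the
sentinel value are hypothetical stand-ins. A real tool would take the
marker value from VMCOREINFO rather than hard-code it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; a real tool would read it from VMCOREINFO. */
#define DEMO_PAGE_OFFLINE_MAPCOUNT_VALUE (-256)

/* Simplified stand-in for the per-page metadata a dump tool inspects. */
struct demo_page {
	int32_t mapcount; /* field that carries the offline marker */
	int hwpoison;     /* non-zero if the page is marked hwpoison */
};

/* Return 1 if the page contents must not be read, 0 otherwise. */
static int demo_must_skip(const struct demo_page *p)
{
	return p->hwpoison ||
	       p->mapcount == DEMO_PAGE_OFFLINE_MAPCOUNT_VALUE;
}

/* Count the pages in pages[0..n) that are safe to read and dump. */
static size_t demo_count_dumpable(const struct demo_page *pages, size_t n)
{
	size_t i, dumpable = 0;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!demo_must_skip(&pages[i]))
			dumpable++;
	return dumpable;
}
```

The point of the sketch is that the check is per page, so no memory
range list has to be updated and no kdump reload is needed when a
single page or 4MB chunk changes state.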

> 
>>
>> Balloon drivers solve this problem by always allowing unplugged memory
>> to be read. In virtio-mem, this cannot and should not even be guaranteed.
>>
> 
> Hmm, that sounds like a bug.

I can give you a simple example why reading such unplugged (or balloon
inflated) memory is problematic: Huge page backed guests.

There is no zero page for huge pages. So if we allow the guest to read
that memory any time, we cannot guarantee that we actually consume less
memory in the hypervisor. This is absolutely to be avoided.

Existing balloon drivers don't support huge page backed guests. (well
you can inflate, but the hypervisor cannot madvise() 4k on a huge page,
resulting in no action being performed). This scenario is to be
supported with virtio-mem.
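
The granularity mismatch can be sketched as follows, assuming the
hypervisor can only release backing pages that a discarded range covers
completely. The helper name is hypothetical, and the 2 MiB huge page
size used in the note below is an assumption (x86-64).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of backing pages of size 'backing' (a power of two) that lie
 * completely inside [start, start + len). Only those could be released
 * by the hypervisor. Overflow of start + len is not handled here.
 */
static uint64_t demo_fully_covered_pages(uint64_t start, uint64_t len,
					 uint64_t backing)
{
	uint64_t first, last;

	assert(backing != 0 && (backing & (backing - 1)) == 0);
	first = (start + backing - 1) & ~(backing - 1); /* round start up */
	last = (start + len) & ~(backing - 1);          /* round end down */
	return last > first ? (last - first) / backing : 0;
}
```

With 2 MiB backing pages, a single inflated 4 KiB page covers zero
backing pages, so nothing can be released, while an aligned 4MB chunk
covers two of them.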


So yes, this is actually a bug in e.g. virtio-balloon implementations:

With "VIRTIO_BALLOON_F_MUST_TELL_HOST" we have to tell the hypervisor
before we access a page again. kdump cannot do this and does not care,
so this page is silently accessed and dumped. This is one of the main
reasons why extending virtio-balloon hypervisor implementations to
support host-enforced R/W protection is impossible.

> 
>> And what we have to do to make this work is actually pretty simple: Just
>> like PG_hwpoison, track per page if it is online and provide this
>> information to kdump.
>>
>>
> 
> Thanks
> Dave
> 


-- 

Thanks,

David / dhildenb
