linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Baoquan He" <bhe@redhat.com>, "Dave Young" <dyoung@redhat.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Hari Bathini" <hbathini@linux.vnet.ibm.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Matthew Wilcox" <mawilcox@microsoft.com>,
	"Miles Chen" <miles.chen@mediatek.com>,
	"Pavel Tatashin" <pasha.tatashin@oracle.com>,
	"Petr Tesarik" <ptesarik@suse.cz>
Subject: Re: [PATCH v1 0/2] mm/kdump: exclude reserved pages in dumps
Date: Tue, 24 Jul 2018 16:13:09 +0200
Message-ID: <6c753cae-f8b6-5563-e5ba-7c1fefdeb74e@redhat.com>
In-Reply-To: <20180724133530.GN28386@dhcp22.suse.cz>

On 24.07.2018 15:35, Michal Hocko wrote:
> On Tue 24-07-18 15:27:51, David Hildenbrand wrote:
>> On 24.07.2018 15:13, Michal Hocko wrote:
>>> On Tue 24-07-18 14:17:12, David Hildenbrand wrote:
>>>> On 24.07.2018 09:25, Michal Hocko wrote:
>>>>> On Mon 23-07-18 19:20:43, David Hildenbrand wrote:
>>>>>> On 23.07.2018 14:30, Michal Hocko wrote:
>>>>>>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote:
>>>>>>>> On 07/20/2018 02:34 PM, David Hildenbrand wrote:
>>>>>>>>> Dumping tools (like makedumpfile) right now don't exclude reserved pages.
>>>>>>>>> So reserved pages might be accessed by dump tools although nobody except
>>>>>>>>> the owner should touch them.
>>>>>>>>
>>>>>>>> Are you sure about that? Or maybe I understand wrong. Maybe it changed
>>>>>>>> recently, but IIRC pages that are backing memmap (struct pages) are also
>>>>>>>> PG_reserved. And you definitely do want those in the dump.
>>>>>>>
>>>>>>> You are right. reserve_bootmem_region will make all early bootmem
>>>>>>> allocations (including those backing memmaps) PageReserved. I have asked
>>>>>>> several times but I haven't seen a satisfactory answer yet. Why do we
>>>>>>> even care for kdump about those? If they are reserved then nobody should
>>>>>>> really look at those specific struct pages and manipulate them. Kdump
>>>>>>> tools are using a kernel interface to read the content. If the specific
>>>>>>> content is backed by a non-existing memory then they should simply not
>>>>>>> return anything.
>>>>>>>
>>>>>>
>>>>>> "new kernel" provides an interface to read memory from "old kernel".
>>>>>>
>>>>>> The new kernel has no idea about
>>>>>> - which memory was added/online in the old kernel
>>>>>> - where struct pages of the old kernel are and what their content is
>>>>>> - which memory is safe to touch and which not
>>>>>>
>>>>>> Dump tools figure all that out by interpreting the VMCORE. They e.g.
>>>>>> identify "struct pages" and see if they should be dumped. The "new
>>>>>> kernel" only allows reading that memory. It cannot prevent the system
>>>>>> from crashing (e.g. if a dump tool would try to read a hwpoison page).
>>>>>>
>>>>>> So how should the "new kernel" know if a page can be touched or not?
>>>>>
>>>>> I am sorry I am not familiar with kdump much. But from what I remember
>>>>> it reads from /proc/vmcore and implementation of this interface should
>>>>> simply return EINVAL or the like when you try to dump an inaccessible memory
>>>>> range.
>>>>
>>>> Oh, and BTW, while something like -EINVAL could work, we usually don't
>>>> want to try to read certain pages at all (e.g. ballooned pages -
>>>> accessing the page might work but involves quite some overhead in the
>>>> hypervisor).
>>>>
>>>> So we should either handle this in dump tools (reserved + ...?) or while
>>>> doing the read similar to XEN (is_ram_page()).
>>>
>>> Yes, I think this is the proper way. Just test for PageOnline
>>> in read_from_oldmem/copy_oldmem_page. Btw. we already have
>>> pfn_to_online_page which checks the per-section online/offline
>>> status. This should be extendable to consider your new PageOffline
>>> state.
>>
>> That is the important bit:
>>
>> What the new kernel sees is not what the old kernel saw.
>>
>> Checking for pfn_to_online_page() from
>> read_from_oldmem/copy_oldmem_page() is plain wrong.
>>
>> E.g. ACPI hotplug memory is not even added in the new kernel - see
>> "acpi_no_memhotplug" which is used in kdump environments.
>>
>> The only thing we can do is
>> - query the hypervisor
>> - try to access and get an exception
> 
> But we do preserve struct page's (aka memmap) from the crash kernel,
> don't we? So you have the whole state there. Or am I missing something?
> 

Yes, they are preserved, but we don't interpret them; that is up to the
dump tools. We only provide access to the vmcore, which includes
reading/writing the memory indicated in it. The struct pages are simply
part of the vmcore, completely hidden from the new kernel.

Finding/interpreting the struct pages is not (and most probably should
never be) done in the kernel.

E.g. the old kernel could be a different kernel version or use a different
memory configuration (!SPARSE, SPARSE ...), page flags could be
different ... it's not a straightforward access.

That's why dump tools interpret struct pages instead. And also why I
want a simple identifier in them so user space dump tools can figure out
"this page is better not to be touched, the content is stale or not
accessible".
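
To illustrate what such an identifier would mean for a dump tool, here is
a minimal user space sketch. Everything in it is an assumption made for
illustration: the struct names, the "offline_marker" value and the helper
page_should_be_skipped() are hypothetical and are not taken from
makedumpfile or from this series. It only assumes that the tool already
decoded a struct page of the old kernel and learned the PG_reserved bit
number from VMCOREINFO.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical description of the old kernel's page flag layout.
 * A real tool would fill this from VMCOREINFO; field names are
 * illustrative only.
 */
struct old_kernel_layout {
	unsigned int pg_reserved_bit;	/* bit number of PG_reserved */
	uint32_t offline_marker;	/* hypothetical "offline" page type */
};

/* Minimal view of one old-kernel struct page as decoded by the tool. */
struct decoded_page {
	uint64_t flags;
	uint32_t page_type;
};

/*
 * Skip a page only if it is reserved AND carries the offline marker.
 * PG_reserved alone is not sufficient: pages backing the memmap are
 * reserved as well and have to stay in the dump.
 */
static bool page_should_be_skipped(const struct old_kernel_layout *l,
				   const struct decoded_page *p)
{
	bool reserved = (p->flags >> l->pg_reserved_bit) & 1u;

	return reserved && p->page_type == l->offline_marker;
}
```

The point of combining both checks is that the tool never has to touch
the page content to make the decision; the struct page alone is enough.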

So I see right now:

- PG_reserved + e.g. new page type (or some other unique identifier in
  combination with PG_reserved)
 -> Avoid reads of pages we know are offline
- extend is_ram_page()
 -> Fake zero memory for pages we know are offline

Or even both (avoid reading and don't crash the kernel if it is being done).
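
The second option could look roughly like the following user space
sketch. It is not the kernel implementation: read_old_page(),
example_pfn_is_ram(), SKETCH_PAGE_SIZE and the offline PFN table are
hypothetical names made up for this example, and the old kernel's memory
is simulated by a plain buffer. It only shows the idea of asking a
callback before touching a page and handing out zeroes otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Callback deciding whether a PFN of the old kernel may be read. */
typedef bool (*pfn_is_ram_fn)(unsigned long pfn);

/* Hypothetical answer, e.g. obtained from a hypervisor: PFN 1 is offline. */
static const unsigned long offline_pfns[] = { 1 };

static bool example_pfn_is_ram(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(offline_pfns) / sizeof(offline_pfns[0]); i++)
		if (offline_pfns[i] == pfn)
			return false;
	return true;
}

/*
 * Copy one page of (simulated) old memory into buf. Pages reported as
 * "not RAM" are never touched; the caller gets zeroes instead.
 */
static size_t read_old_page(unsigned long pfn, const unsigned char *old_mem,
			    unsigned char *buf, pfn_is_ram_fn is_ram)
{
	assert(buf != NULL);

	if (is_ram && !is_ram(pfn)) {
		memset(buf, 0, SKETCH_PAGE_SIZE);
		return SKETCH_PAGE_SIZE;
	}

	memcpy(buf, old_mem + (size_t)pfn * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE;
}
```

With this, a dump tool that ignores the identifier still gets a
well-defined result instead of triggering an access to the page.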

I am not a friend of the "try to access and get an exception" approach.

-- 

Thanks,

David / dhildenb
