From: David Hildenbrand <david@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>,
Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>,
akpm@linux-foundation.org, aarcange@redhat.com,
dan.j.williams@intel.com, dave.hansen@intel.com,
konrad.wilk@oracle.com, lcapitulino@redhat.com,
mm-commits@vger.kernel.org, mst@redhat.com, osalvador@suse.de,
pagupta@redhat.com, pbonzini@redhat.com, riel@surriel.com,
vbabka@suse.cz, wei.w.wang@intel.com, willy@infradead.org,
yang.zhang.wz@gmail.com, linux-mm@kvack.org
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
Date: Thu, 7 Nov 2019 00:38:42 +0100
Message-ID: <cff63b5c-0e1c-3da9-23c9-2409daecde87@redhat.com>
In-Reply-To: <20191106221150.GR3016@techsingularity.net>
>>> I definitely do not intend to nack this work, I just have maintainability
>>> concerns and considering there is an alternative approach that does not
>>> require to touch page allocator internals and which we need to compare
>>> against then I do not really think there is any need to push something
>>> in right away. Or is there any pressing reason to have this merged right
>>> now?
>>
>> The alternative approach doesn't touch the page allocator, however it
>> still has essentially the same changes to __free_one_page. I suspect the
>> performance issue seen is mostly due to the fact that because it doesn't
>> touch the page allocator it is taking the zone lock and probing the page
>> for each set bit to see if the page is still free. As such the performance
>> regression seen gets worse the lower the order used for reporting.
>>
>
> What confused me quite a lot is that this is enabled at compile time
> and then incurs a performance hit whether or not a hypervisor that
> even cares is involved. So I don't think the performance angle
> justifying this approach is a good one because this implementation has
> issues of its own. Previously I said:
>
> I worry that poking too much into the internal state of the
> allocator will be fragile long-term. There are the arch alloc/free
> hooks but they are typically about protections only and do not
> interfere with the internal state of the allocator. Compaction
> pokes in as well but once the page is off the free list, the page
> allocator no longer cares so again there is no interference with
> the internal state. If the state is interfered with externally,
> it becomes unclear what happens if things like page merging are
> deferred in a way the allocator cannot control, as high-order
> allocation requests may fail for example.
>
> Adding an API for compaction does not get away from the problem that
> it'll be fragile to depend on the internal state of the allocator for
> correctness. Existing users that poke into the state do so as an
> optimistic shortcut but if it fails, nothing actually breaks. The free
> list reporting stuff might, and it will not be routinely tested.
>
> Take compaction as an example: the initial implementation of it was dumb
> as rocks and only started maintaining additional state and later poking
> into the page allocator when there was empirical evidence it was necessary.
>
> The initial implementation of page reporting should be the same: it
> should do no harm at all to users that don't care (hiding behind
> kconfig is not good enough, use static branches) and it should not
> depend on the internal state of the page allocator and ordering of free
> lists for correctness until it's shown it's absolutely necessary.
>
> You say that the zone lock has to be taken in the alternative
> implementation to check if it's still free and sure, that would cost
> but unless you are isolating that page immediately then you are racing
> once the lock is released. If you are isolating immediately, then isolate
> pages in batches to amortise the lock costs. The details of this could
> be really hard but this approach is essentially saying "everything,
> everywhere should take a small hit so the overhead is not noticeable for
> virtio users" which is a curious choice for a new feature.
>
> Regardless of the details of any implementation, the first one should be
> basic, do no harm and be relatively simple giving just a bare interface
> to virtio/qemu/etc. Then optimise it until such point as there is no
> choice but to poke into the core.

I second that. If somebody asked me, I'd want to see a simple,
maintainable design that provides a net benefit, does not harm !virt and
possibly reuses existing core functionality (e.g., page isolation). We
can work from there to optimize.
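
To make the "isolate in batches" and "do no harm to users that don't
care" points concrete, here is a rough userspace sketch. It is not taken
from either patch series. Every name in it (zone_sketch,
reporting_enabled, isolate_batch, putback_batch, report_free_pages) is
hypothetical, the pthread mutex only stands in for the zone lock, and the
plain bool only stands in for what would be a static branch in the
kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 64	/* size of the toy zone (assumption) */
#define BATCH    16	/* pages isolated per lock acquisition (assumption) */

/* Hypothetical stand-in for a zone: one lock guarding per-page state. */
struct zone_sketch {
	pthread_mutex_t lock;
	bool free[NR_PAGES];		/* page is available to the allocator */
	bool reported[NR_PAGES];	/* page was already reported */
	unsigned long lock_acquisitions;
};

/* Stand-in for a static branch: off until a reporting device registers. */
static bool reporting_enabled;

static void zone_init(struct zone_sketch *z)
{
	pthread_mutex_init(&z->lock, NULL);
	for (size_t i = 0; i < NR_PAGES; i++) {
		z->free[i] = true;
		z->reported[i] = false;
	}
	z->lock_acquisitions = 0;
}

/*
 * Isolate up to max free, unreported pages under ONE lock acquisition.
 * Isolated pages are marked not free, so nothing can hand them out
 * after the lock is dropped and there is no race to re-check later.
 */
static size_t isolate_batch(struct zone_sketch *z, size_t *out, size_t max)
{
	size_t n = 0;

	if (!reporting_enabled)	/* users that don't care never take the lock */
		return 0;

	pthread_mutex_lock(&z->lock);
	z->lock_acquisitions++;
	for (size_t i = 0; i < NR_PAGES && n < max; i++) {
		if (z->free[i] && !z->reported[i]) {
			z->free[i] = false;
			out[n++] = i;
		}
	}
	pthread_mutex_unlock(&z->lock);
	return n;
}

/* Put a reported batch back, again under a single lock acquisition. */
static void putback_batch(struct zone_sketch *z, const size_t *pages, size_t n)
{
	pthread_mutex_lock(&z->lock);
	z->lock_acquisitions++;
	for (size_t i = 0; i < n; i++) {
		assert(!z->free[pages[i]]);	/* must still be isolated */
		z->reported[pages[i]] = true;
		z->free[pages[i]] = true;
	}
	pthread_mutex_unlock(&z->lock);
}

/* Report every free page in batches; returns the number reported. */
static size_t report_free_pages(struct zone_sketch *z)
{
	size_t batch[BATCH], total = 0, n;

	while ((n = isolate_batch(z, batch, BATCH)) > 0) {
		/* The report to the hypervisor would go here, lock not held. */
		total += n;
		putback_batch(z, batch, n);
	}
	return total;
}
```

In this toy setup, reporting all 64 pages takes 9 lock acquisitions
(4 isolate calls, 4 putback calls, and one final isolate call that finds
nothing left), whereas probing each page under the lock would take at
least 64. With reporting_enabled left false, the lock is never touched.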
--
Thanks,
David / dhildenb