Re: + mm-introduce-reported-pages.patch added to -mm tree

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, aarcange@redhat.com,
	dan.j.williams@intel.com, dave.hansen@intel.com,
	konrad.wilk@oracle.com, lcapitulino@redhat.com,
	mgorman@techsingularity.net, mm-commits@vger.kernel.org,
	mst@redhat.com, osalvador@suse.de, pagupta@redhat.com,
	pbonzini@redhat.com, riel@surriel.com, vbabka@suse.cz,
	wei.w.wang@intel.com, willy@infradead.org,
	yang.zhang.wz@gmail.com, linux-mm@kvack.org
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
Date: Thu, 7 Nov 2019 23:43:47 +0100	[thread overview]
Message-ID: <4cf64ff9-b099-d50a-5c08-9a8f3a2f52bf@redhat.com> (raw)
In-Reply-To: <f1f84779123b7d3f0613d20f6bd9b05c806b39f7.camel@linux.intel.com>

[...]
>>> The alternative approach doesn't touch the page allocator, however it
>>> still has essentially the same changes to __free_one_page. I suspect the
>>
>> Nitesh is working on Michals suggestion to use page isolation instead
>> AFAIK - which avoids this.
> 
> Okay. However it makes it much harder to discuss when we are comparing
> against code that isn't public. If the design is being redone do we have
> any ETA for when we will have something to actually compare to?

Maybe Nitesh got a little bit more careful with sending RFCs because he 
was getting negatives vibes due to the prototype quality. I might be 
wrong and he really is only looking into some performance aspects.

> 
>>> performance issue seen is mostly due to the fact that because it doesn't
>>> touch the page allocator it is taking the zone lock and probing the page
>>> for each set bit to see if the page is still free. As such the performance
>>> regression seen gets worse the lower the order used for reporting.
>>>
>>> Also I suspect Nitesh's patches are also in need of further review. I have
>>> provided feedback however my focus ended up being on more the kernel
>>> panics and 30% performance regression rather than debating architecture.
>>
>> Please don't take this personally, but I really dislike you taking about
>> Niteshs RFCs (!) and pushing for your approach (although it was you that
>> was late to the party!) in that way. If there are problems then please
>> collaborate and fix instead of using the same wrong arguments over and
>> over again.
> 
> Since Nitesh is in the middle of doing a full rewrite anyway I don't have
> much to compare against except for the previous set, which still needs
> fixes.  It is why I mentioned in the cover of the last patch set that I
> would prefer to not discuss it since I have no visibility into the patch
> set he is now working on.

Me too :) I'd love to see how Michals idea with page isolation worked 
out. But I can understand that Nitesh wants to explore some details first.

> 
>> a) hotplug/sparse zones: I explained a couple of times why we can ignore
>> that. There was never a reply from you, yet you keep coming up with
>> that. I don't enjoy talking to a wall.
> 
> This gets to the heart of how Nitesh's patch set works. It is assuming
> that every zone is linear, that there will be no overlap between zones,
> and that the zones don't really change. These are key architectural
> assumptions that should really be discussed instead of simply dismissed.

IMHO, implementation detail of using bitmaps for each zone right now. 
Maybe there is a better data structure for tracking this sparse data 
(e.g., sparse bitmaps), or a better way to handle bitmaps. I think this 
is a good start to get something relatively simple implemented (yeah, 
there were some pitfalls in previous versions, maybe page isolation will 
make that less error prone in an RFC).

> 
> I guess part of the difference between us is that I am looking for
> something that is production ready and not a proof of concept. It sounds
> like you would prefer this work stays in a proof of concept stage for some
> time longer.

Certainly not, but I don't think we have to rush. As I said, let's come 
to a conclusion if we want this in the allocator or not. For me, other 
things (e.g., maintainability) are more important. And AFAIKT, also for 
Michal and Mel.

> 
>> b) Locking optimizations: Come on, these are premature optimizations and
>> nothing to dictate your design. *nobody* but you cares about that in an
>> initial approach we get upstream. We can always optimize that.
> 
> My concern isn't so much the locking as the fact that it is the hunt and
> peck approach through a bitmap that will become increasingly more stale as
> you are processing the data. Every bit you have to test for requires
> taking a zone lock and then probing to see if the page is still free and
> the right size. My concern is how much time is going to be spent with the
> zone lock held while other CPUs are waiting on access.

Valid concerns, really. But I don't think these are road blockers.

> 
>> c) Kernel panics: Come on, we are talking about complicated RFCs here
>> with moving design decisions. We want do discuss *design* and
>> *architecture* here, not *implementation details*.
> 
> Then why ask me to compare performance against it? You were the one
> pushing for me to test it, not me. If you and Nitesh knew the design
> wasn't complete enough to run it why ask me to test it?

The design changed with Michals comment about page isolation, that was 
afterwards, no?

Your performance comparison was very helpful. I think, I said back then 
that I am interested in fundamental performance differences. You 
reported differences, AFAIK Nitesh was able to resolve one (MAX_ORDER - 
1 if I'm, not wrong) using implementation changes. I *think* he is still 
looking into another comparison.

> 
> Many of the kernel panics for the patch sets in the past have been related
> to fundamental architectural issues. For example ignoring things like
> NUMA, mangling the free_list by accessing it with the wrong locks held,
> etc.

Yeah, I think Nitesh was still fairly new to the kernel when he started 
working on Riks ideas. I assume he learned a lot during the last 
months/years :)

> 
>> d) Performance: We want to see a design that fits into the whole
>> architecture cleanly, is maintainable, and provides a benefit. Of
>> course, performance is relevant, but it certainly should not dictate our
>> design of a *virtualization specific optimization feature*. Performance
>> is not everything, otherwise please feel free and rewrite the kernel in
>> ASM and claim it is better because it is faster.
> 
> I agree performance is not everything. But when a system grinds down to
> 60% of what it was originally I find that significant.

I totally agree, that's why I asked for a fundamental performance 
comparison, which helps to make a decision. "is this gain in performance 
worth moving it into the core".

> 
>> Again, I do value your review and feedback, but I absolutely do not
>> enjoy the way you are trying to push your series here, sorry.
> 
> Well I am a bit frustrated as I have had to provide a significant amount
> of feedback on Nitesh's patches, and in spite of that I feel like I am
> getting nothing in return. I have pointed out the various issues and

I can understand the frustration. I reviewed all the parts I feel 
comfortable with (e.g., page flag vs. page type, cleanup patches), and 
left the core buddy review to experts (Mel), because that's not my aree 
of experience (yet, lol). Yeah, MM people are busy.

> opportunities to address the issues. At this point there are sections of
> his code that are directly copied from mine[1]. I have done everything I

Bad: he's not crediting you. Good: Both implementations came to the same 
conclusion virtio-wise.

> can to help the patches along but it seems like they aren't getting out of
> RFC or proof-of-concept state any time soon. So with that being the case

My gut feeling is that with page isolation the RFC stage could be over 
soon. It heavily simplifies locking/blocking pages from getting 
allocated. I might be wrong. But that's what it is when you explore new 
ideas.

> why not consider his patch set as something that could end up being a
> follow-on/refactor instead of an alternative to mine?

I guess MM people prefer to start simple and only add core functionality 
when really needed / it can be shown that there is a serious performance 
impact.

> 
>> Yes, if we end up finding out that there is real value in your approach,
>> nothing speaks against considering it. But please don't try to hurry and
>> push your series in that way. Please give everybody to time to evaluate.
> 
> I would love to argue this patch set on the merits. However I really don't
> feel like I am getting a fair comparison here, at least from you. Every
> other reply on the thread seems to be from you trying to reinforce any
> criticism and taking the opportunity to mention that there is another
> solution out there. It is fine to fight for your own idea, but at least

"for your own idea" - are you saying Nitesh's approach is my idea? I 
hope not, otherwise I would get credit for Rik's and Nitesh's work by 
simply providing review comments.

Of course it is okay to fight for your own idea.

> let me reply to the criticisms of my own patchset before you pile on. I

Me (+ Michal): Are these core buddy changes really wanted and required. 
Can we evaluate the alternatives properly. (Michal even proposed 
something very similar to Nitesh's approach before even looking into it)

You: Please take my patch set, it is better than the alternatives 
because of X, for X in {RFC quality, sparse zones, locking internals, 
current performance differences}

And all I am requesting is that we do the evaluation, discuss if there 
are really no alternatives, and sort out fundamental issues with 
external tracking.

Michal asked the very same question again at the beginning of this 
thread: "Is there really a consensus"

Reading the replies, "no".

-- 

Thanks,

David / dhildenb

next prev parent reply	other threads:[~2019-11-07 22:44 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20191106000547.juQRi83gi%akpm@linux-foundation.org>
2019-11-06 12:16 ` Michal Hocko
2019-11-06 14:09   ` David Hildenbrand
2019-11-06 16:35     ` Alexander Duyck
2019-11-06 16:54       ` Michal Hocko
2019-11-06 17:48         ` Alexander Duyck
2019-11-06 22:11           ` Mel Gorman
2019-11-06 23:38             ` David Hildenbrand
2019-11-07  0:20             ` Alexander Duyck
2019-11-07 10:20               ` Mel Gorman
2019-11-07 16:07                 ` Alexander Duyck
2019-11-08  9:43                   ` Mel Gorman
2019-11-08 16:17                     ` Alexander Duyck
2019-11-08 18:41                       ` Mel Gorman
2019-11-08 20:29                         ` Alexander Duyck
2019-11-09 14:57                           ` Mel Gorman
2019-11-10 18:03                             ` Alexander Duyck
2019-11-06 23:33           ` David Hildenbrand
2019-11-07  0:20             ` Dave Hansen
2019-11-07  0:52               ` David Hildenbrand
2019-11-07 17:12                 ` Dave Hansen
2019-11-07 17:46                   ` Michal Hocko
2019-11-07 18:08                     ` Dave Hansen
2019-11-07 18:12                     ` Alexander Duyck
2019-11-08  9:57                       ` Michal Hocko
2019-11-08 16:43                         ` Alexander Duyck
2019-11-07 18:46                   ` Qian Cai
2019-11-07 18:02             ` Alexander Duyck
2019-11-07 19:37               ` Nitesh Narayan Lal
2019-11-07 22:46                 ` Alexander Duyck
2019-11-07 22:43               ` David Hildenbrand [this message]
2019-11-08  0:42                 ` Alexander Duyck
2019-11-08  7:06                   ` David Hildenbrand
2019-11-08 17:18                     ` Alexander Duyck
2019-11-12 13:04                       ` David Hildenbrand
2019-11-12 18:34                         ` Alexander Duyck
2019-11-12 21:05                           ` David Hildenbrand
2019-11-12 22:17                             ` David Hildenbrand
2019-11-12 22:19                             ` Alexander Duyck
2019-11-12 23:10                               ` David Hildenbrand
2019-11-13  0:31                                 ` Alexander Duyck
2019-11-13 18:51                           ` Nitesh Narayan Lal
2019-11-06 16:49   ` Nitesh Narayan Lal
2019-11-11 18:52   ` Nitesh Narayan Lal
2019-11-11 22:00     ` Alexander Duyck
2019-11-12 15:19       ` Nitesh Narayan Lal
2019-11-12 16:18         ` Alexander Duyck
2019-11-13 18:39           ` Nitesh Narayan Lal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4cf64ff9-b099-d50a-5c08-9a8f3a2f52bf@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=konrad.wilk@oracle.com \
    --cc=lcapitulino@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=osalvador@suse.de \
    --cc=pagupta@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=riel@surriel.com \
    --cc=vbabka@suse.cz \
    --cc=wei.w.wang@intel.com \
    --cc=willy@infradead.org \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox