From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ID7B=Y7=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B056AC43331
	for <linux-mm@archiver.kernel.org>; Thu,  7 Nov 2019 18:02:30 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 61CC02178F
	for <linux-mm@archiver.kernel.org>; Thu,  7 Nov 2019 18:02:30 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 61CC02178F
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 115F96B0003; Thu,  7 Nov 2019 13:02:30 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 0C8986B0005; Thu,  7 Nov 2019 13:02:30 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id EF9CC6B0007; Thu,  7 Nov 2019 13:02:29 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0098.hostedemail.com [216.40.44.98])
	by kanga.kvack.org (Postfix) with ESMTP id D89A46B0003
	for <linux-mm@kvack.org>; Thu,  7 Nov 2019 13:02:29 -0500 (EST)
Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay05.hostedemail.com (Postfix) with SMTP id 6BD7A181AEF1D
	for <linux-mm@kvack.org>; Thu,  7 Nov 2019 18:02:29 +0000 (UTC)
X-FDA: 76130251218.01.uncle92_6cef0be67720a
X-HE-Tag: uncle92_6cef0be67720a
X-Filterd-Recvd-Size: 10403
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
	by imf20.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Thu,  7 Nov 2019 18:02:28 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
  by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 07 Nov 2019 10:02:25 -0800
X-IronPort-AV: E=Sophos;i="5.68,278,1569308400"; 
   d="scan'208";a="377509910"
Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76])
  by orsmga005-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 07 Nov 2019 10:02:24 -0800
Message-ID: <f1f84779123b7d3f0613d20f6bd9b05c806b39f7.camel@linux.intel.com>
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, aarcange@redhat.com,
 dan.j.williams@intel.com,  dave.hansen@intel.com, konrad.wilk@oracle.com,
 lcapitulino@redhat.com,  mgorman@techsingularity.net,
 mm-commits@vger.kernel.org, mst@redhat.com,  osalvador@suse.de,
 pagupta@redhat.com, pbonzini@redhat.com, riel@surriel.com,  vbabka@suse.cz,
 wei.w.wang@intel.com, willy@infradead.org, yang.zhang.wz@gmail.com, 
 linux-mm@kvack.org
Date: Thu, 07 Nov 2019 10:02:24 -0800
In-Reply-To: <f84f53b8-221e-02bd-2e7a-c0040ca03a38@redhat.com>
References: <20191106121605.GH8314@dhcp22.suse.cz>
	 <CD4A882A-91A7-43F2-B31C-3FFD85289907@redhat.com>
	 <dc1b2b3f4db8591303932351971f55688e0e240e.camel@linux.intel.com>
	 <20191106165416.GO8314@dhcp22.suse.cz>
	 <e90877ab21cdb7bb6a7a71a035f1b03e5544f384.camel@linux.intel.com>
	 <f84f53b8-221e-02bd-2e7a-c0040ca03a38@redhat.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.30.5 (3.30.5-1.fc29) 
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Thu, 2019-11-07 at 00:33 +0100, David Hildenbrand wrote:
> On 06.11.19 18:48, Alexander Duyck wrote:
> > On Wed, 2019-11-06 at 17:54 +0100, Michal Hocko wrote:
> > > On Wed 06-11-19 08:35:43, Alexander Duyck wrote:
> > > > On Wed, 2019-11-06 at 15:09 +0100, David Hildenbrand wrote:
> > > > > > Am 06.11.2019 um 13:16 schrieb Michal Hocko <mhocko@kernel.or=
g>:
> > > > > >=20
> > > > > > =EF=BB=BFI didn't have time to read through newer versions of=
 this patch series
> > > > > > but I remember there were concerns about this functionality b=
eing pulled
> > > > > > into the page allocator previously both by me and Mel [1][2].=
 Have those been
> > > > > > addressed? I do not see an ack from Mel or any other MM peopl=
e. Is there
> > > > > > really a consensus that we want something like that living in=
 the
> > > > > > allocator?
> > > > >=20
> > > > > I don=E2=80=98t think there is. The discussion is still ongoing=
 (although quiet,
> > > > > Nitesh is working on a new version AFAIK). I think we should no=
t rush
> > > > > this.
> > > >=20
> > > > How much time is needed to get a review? I waited 2 weeks since p=
osting
> > > > v12 and the only comments I got on the code were from Andrew. Mos=
t of this
> > > > hasn't changed much since v10 and that was posted back in mid Sep=
tember. I
> > > > have been down to making small tweaks here and there and haven't =
had any
> > > > real critiques on the approach since Mel had the comments about c=
onflicts
> > > > with compaction which I addressed by allowing compaction to punt =
the
> > > > reporter out so that it could split and splice the lists as it wa=
lked
> > > > through them.
> > >=20
> > > Well, people are busy and MM community is not a large one. I cannot
> > > really help you much other than keep poking those people and give
> > > reasonable arguments so they decide to ack your patch.
> >=20
> > I get that. But v10 was posted in mid September. Back then we had a
> > discussion about addressing what Mel had mentioned and I had mentione=
d
> > then that I had addressed it by allowing compaction to essentially re=
set
> > the reporter to get it out of the list so compaction could do this sp=
lit
> > and splice tumbling logic.
> >=20
> > > I definitely do not intent to nack this work, I just have maintaina=
bility
> > > concerns and considering there is an alternative approach that does=
 not
> > > require to touch page allocator internals and which we need to comp=
are
> > > against then I do not really think there is any need to push someth=
ing
> > > in right away. Or is there any pressing reason to have this merged =
right
> > > now?
> >=20
> > The alternative approach doesn't touch the page allocator, however it
> > still has essentially the same changes to __free_one_page. I suspect =
the
>=20
> Nitesh is working on Michals suggestion to use page isolation instead=20
> AFAIK - which avoids this.

Okay. However it makes it much harder to discuss when we are comparing
against code that isn't public. If the design is being redone do we have
any ETA for when we will have something to actually compare to?

> > performance issue seen is mostly due to the fact that because it does=
n't
> > touch the page allocator it is taking the zone lock and probing the p=
age
> > for each set bit to see if the page is still free. As such the perfor=
mance
> > regression seen gets worse the lower the order used for reporting.
> >=20
> > Also I suspect Nitesh's patches are also in need of further review. I=
 have
> > provided feedback however my focus ended up being on more the kernel
> > panics and 30% performance regression rather than debating architectu=
re.
>=20
> Please don't take this personally, but I really dislike you taking abou=
t=20
> Niteshs RFCs (!) and pushing for your approach (although it was you tha=
t=20
> was late to the party!) in that way. If there are problems then please=20
> collaborate and fix instead of using the same wrong arguments over and=20
> over again.

Since Nitesh is in the middle of doing a full rewrite anyway I don't have
much to compare against except for the previous set, which still needs
fixes.  It is why I mentioned in the cover of the last patch set that I
would prefer to not discuss it since I have no visibility into the patch
set he is now working on.

> a) hotplug/sparse zones: I explained a couple of times why we can ignor=
e=20
> that. There was never a reply from you, yet you keep coming up with=20
> that. I don't enjoy talking to a wall.

This gets to the heart of how Nitesh's patch set works. It is assuming
that every zone is linear, that there will be no overlap between zones,
and that the zones don't really change. These are key architectural
assumptions that should really be discussed instead of simply dismissed.

I guess part of the difference between us is that I am looking for
something that is production ready and not a proof of concept. It sounds
like you would prefer this work stays in a proof of concept stage for som=
e
time longer.

> b) Locking optimizations: Come on, these are premature optimizations an=
d=20
> nothing to dictate your design. *nobody* but you cares about that in an=
=20
> initial approach we get upstream. We can always optimize that.

My concern isn't so much the locking as the fact that it is the hunt and
peck approach through a bitmap that will become increasingly more stale a=
s
you are processing the data. Every bit you have to test for requires
taking a zone lock and then probing to see if the page is still free and
the right size. My concern is how much time is going to be spent with the
zone lock held while other CPUs are waiting on access.

> c) Kernel panics: Come on, we are talking about complicated RFCs here=20
> with moving design decisions. We want do discuss *design* and=20
> *architecture* here, not *implementation details*.

Then why ask me to compare performance against it? You were the one
pushing for me to test it, not me. If you and Nitesh knew the design
wasn't complete enough to run it why ask me to test it?

Many of the kernel panics for the patch sets in the past have been relate=
d
to fundamental architectural issues. For example ignoring things like
NUMA, mangling the free_list by accessing it with the wrong locks held,
etc.

> d) Performance: We want to see a design that fits into the whole=20
> architecture cleanly, is maintainable, and provides a benefit. Of=20
> course, performance is relevant, but it certainly should not dictate ou=
r=20
> design of a *virtualization specific optimization feature*. Performance=
=20
> is not everything, otherwise please feel free and rewrite the kernel in=
=20
> ASM and claim it is better because it is faster.

I agree performance is not everything. But when a system grinds down to
60% of what it was originally I find that significant.

> Again, I do value your review and feedback, but I absolutely do not=20
> enjoy the way you are trying to push your series here, sorry.

Well I am a bit frustrated as I have had to provide a significant amount
of feedback on Nitesh's patches, and in spite of that I feel like I am
getting nothing in return. I have pointed out the various issues and
opportunities to address the issues. At this point there are sections of
his code that are directly copied from mine[1]. I have done everything I
can to help the patches along but it seems like they aren't getting out o=
f
RFC or proof-of-concept state any time soon. So with that being the case
why not consider his patch set as something that could end up being a
follow-on/refactor instead of an alternative to mine?

> Yes, if we end up finding out that there is real value in your approach=
,=20
> nothing speaks against considering it. But please don't try to hurry an=
d=20
> push your series in that way. Please give everybody to time to evaluate=
.

I would love to argue this patch set on the merits. However I really don'=
t
feel like I am getting a fair comparison here, at least from you. Every
other reply on the thread seems to be from you trying to reinforce any
criticism and taking the opportunity to mention that there is another
solution out there. It is fine to fight for your own idea, but at least
let me reply to the criticisms of my own patchset before you pile on. I
would really prefer to discuss Nitesh's patches on a posting of his patch
set rather than here.

[1]: https://lore.kernel.org/lkml/101649ae-58d4-76ee-91f3-42ac1c145c46@re=
dhat.com/