From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 7 Nov 2019 10:20:45 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Alexander Duyck
Cc: Michal Hocko, David Hildenbrand, akpm@linux-foundation.org,
	aarcange@redhat.com, dan.j.williams@intel.com, dave.hansen@intel.com,
	konrad.wilk@oracle.com, lcapitulino@redhat.com, mm-commits@vger.kernel.org,
	mst@redhat.com, osalvador@suse.de, pagupta@redhat.com, pbonzini@redhat.com,
	riel@surriel.com, vbabka@suse.cz, wei.w.wang@intel.com, willy@infradead.org,
	yang.zhang.wz@gmail.com, linux-mm@kvack.org
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
Message-ID: <20191107102045.GS3016@techsingularity.net>
References: <20191106121605.GH8314@dhcp22.suse.cz>
 <20191106165416.GO8314@dhcp22.suse.cz>
 <20191106221150.GR3016@techsingularity.net>
 <673862eb7f0425f638ea3fc507d0e8049ee4133c.camel@linux.intel.com>
In-Reply-To: <673862eb7f0425f638ea3fc507d0e8049ee4133c.camel@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
User-Agent: Mutt/1.10.1 (2018-07-13)
On Wed, Nov 06, 2019 at 04:20:56PM -0800, Alexander Duyck wrote:
> > > I get that. But v10 was posted in mid September. Back then we had a
> > > discussion about addressing what Mel had mentioned and I had mentioned
> > > then that I had addressed it by allowing compaction to essentially
> > > reset the reporter to get it out of the list so compaction could do
> > > this split and splice tumbling logic.
> >
> > At the time I said "While in theory that could be addressed by always
> > going through an interface maintained by the page allocator, it would
> > be tricky to test the virtio case in particular."
> >
> > Now, I get that you added an interface for that *but* if compaction
> > were ever updated or yet another approach was taken to deal with it,
> > virtio could get broken. If the page allocator itself has a bug or
> > compaction has a bug, the effect is very visible. While stuff does
> > slip through, it tends to be an obscure corner case. However, when
> > virtio gets broken, it'll be a long time before we catch it.
>
> Specifically, all we are doing is walking through a list using a pointer
> as an iterator. I would think we would likely see the page reporting
> blow up pretty quickly if we were to somehow mess up the list so badly
> that it still had access to a page that was no longer on the list.
> Other than that, if the list is just shuffled without resetting the
> pointer, then the worst case would be that we end up with the reporting
> being rearmed as soon as we were complete, due to a batch of unreported
> pages being shuffled past the iterator.
>

And what I'm saying is that the initial version should have focused
exclusively on the mechanism to report free pages and kept the search as
simple as possible, with optimisations on top. By all means track pages
that are already reported and skip them, because that is a relatively
basic operation, with the caveat that even the page_reported checks
should be behind a static branch if there is no virtio balloon driver
loaded.

The page reporting to a hypervisor is done out of band from any
application, so even if it takes a little longer to do that reporting, is
there really an impact? If so, how large is it and how often is it
incurred? The primary impact I can see is that free pages are invisible
while the notification takes place, so that window may be wider if it
takes longer to find enough pages to send in a batch, but there will
always be a window where the free pages are invisible.

> > What confused me quite a lot is that this is enabled at compile time
> > and then incurs a performance hit whether or not a hypervisor that
> > even cares is involved. So I don't think the performance angle
> > justifying this approach is a good one because this implementation has
> > issues of its own. Previously I said
>
> We are adding code. There is always going to be some performance impact.

And that impact should be negligible in so far as possible. Parts of it
are protected by branches, but as it stands, just building the virtio
balloon driver enables all of this whether the running system is using
virtual machines or not (a rough sketch of the kind of guard I have in
mind follows below).

> > Adding an API for compaction does not get away from the problem that
> > it'll be fragile to depend on the internal state of the allocator for
> > correctness. Existing users that poke into the state do so as an
> > optimistic shortcut, but if it fails, nothing actually breaks. The
> > free list reporting stuff might break, and it will not be routinely
> > tested.
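To be concrete about the static branch point above, this is roughly the
pattern I have in mind. It is only a sketch: the key and function names
below are made up for illustration and are not taken from the patch.

/*
 * Illustrative sketch only: none of these names come from the series.
 * The key starts off, so the hot-path check is patched to a no-op until
 * a reporting backend (e.g. virtio-balloon) actually registers.
 */
#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(page_reporting_enabled);

void page_reporting_backend_register(void)	/* hypothetical name */
{
	static_branch_enable(&page_reporting_enabled);
}

void page_reporting_backend_unregister(void)	/* hypothetical name */
{
	static_branch_disable(&page_reporting_enabled);
}

/* The only thing the free/compaction hot paths would ever need to test. */
static inline bool page_reporting_active(void)
{
	return static_branch_unlikely(&page_reporting_enabled);
}

With a guard like that in place, a kernel that merely has the balloon
driver built pays next to nothing until a backend registers, and the
page_reported bookkeeping only runs behind it.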
> I view what I am doing as not being too different from that. I am only
> maintaining state when I can. I understand that it makes things more
> fragile as we have more routes where things could go wrong, but isn't
> that the case with adding any interface? I have simplified this about
> as far as it can go.
>

The enhanced tracking of state is not necessary initially to simply
report free pages to the hypervisor.

> All I am tracking is basically a pointer into the freelists so that we
> can remember where we left off, plus a flag indicating that those
> pointers are there. If the flag is cleared, we stop using the pointers.
> We can also just reset to the list_head when an individual freelist is
> manipulated, since the list_head itself will never actually go anywhere.
>
> > Take compaction as an example: the initial implementation of it was
> > dumb as rocks and only started maintaining additional state, and
> > later poking into the page allocator, when there was empirical
> > evidence it was necessary.
>
> The issue is that we are trying to do this iteratively. As such, I need
> to store some sort of state so that once I go off to report the small
> bunch of pages I have collected, I have some way to quickly get back to
> where I was. Without doing that I burn a large number of cycles having
> to rewalk the list.
>
> That is why I rewrote this so that the list can get completely shuffled
> and we don't care, as long as our iterator is reset to the list_head or
> the flag indicating that the iterators are active is cleared.
>

And again, it's not clear that the additional complexity is required.
Specifically, it will be very hard to tell whether the state tracking is
actually helping, because excessive list shuffling due to compaction may
mean that a lot of state is being tracked while the full free list ends
up having to be searched anyway.

As it stands, the optimisations are hard-wired into the provision of the
feature itself, making it an all-or-nothing approach with no option to
"merge a bare-bones mechanism that is behind static branches and build
optimisations on top". At least that way, any competing approach could
be based on optimisations alone while the feature is still available.

It also doesn't help that reviewing the series takes a lot of jumping
around. This patch has a kfree of a structure that is not allocated
until a later patch, for example.

-- 
Mel Gorman
SUSE Labs
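P.S. A rough, self-contained sketch of the cursor-plus-flag bookkeeping
described above; the names and layout are purely illustrative and are not
the ones used in the patch. The scheme only stays correct if every path
that manipulates the list remembers to reset the cursor, which is exactly
where the fragility discussed above comes from.

#include <linux/list.h>
#include <linux/types.h>

struct reporting_cursor {
	struct list_head *next;	/* where the next reporting pass resumes */
	bool active;		/* false: ignore 'next', rescan from the head */
};

/* Any path that adds, removes or rotates entries must call this. */
static inline void reporting_cursor_reset(struct reporting_cursor *rc,
					  struct list_head *list)
{
	/* The list_head itself never moves, so it is always a safe anchor. */
	rc->next = list;
}

/* Walk up to @nr entries, remembering where the walk stopped. */
static void report_batch(struct reporting_cursor *rc,
			 struct list_head *list, unsigned int nr)
{
	struct list_head *pos;

	if (!rc->active || rc->next == list)
		pos = list->next;	/* (re)start from the beginning */
	else
		pos = rc->next;		/* resume where the last batch stopped */

	while (nr-- && pos != list) {
		/* ...collect the entry at 'pos' for reporting to the host... */
		pos = pos->next;
	}

	rc->next = pos;			/* equals 'list' once the walk completed */
	rc->active = true;
}

The bare-bones alternative argued for above would drop the cursor
entirely at first and always rescan from list->next: slower, but with no
state that can go stale.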