Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
To: Alexander Duyck, Michal Hocko
Cc: akpm@linux-foundation.org, aarcange@redhat.com,
 dan.j.williams@intel.com, dave.hansen@intel.com, konrad.wilk@oracle.com,
 lcapitulino@redhat.com, mgorman@techsingularity.net,
 mm-commits@vger.kernel.org, mst@redhat.com, osalvador@suse.de,
 pagupta@redhat.com, pbonzini@redhat.com, riel@surriel.com,
 vbabka@suse.cz, wei.w.wang@intel.com, willy@infradead.org,
 yang.zhang.wz@gmail.com, linux-mm@kvack.org
References: <20191106121605.GH8314@dhcp22.suse.cz>
 <20191106165416.GO8314@dhcp22.suse.cz>
 <4cf64ff9-b099-d50a-5c08-9a8f3a2f52bf@redhat.com>
 <131f72aa-c4e6-572d-f616-624316b62842@redhat.com>
 <1d881e86ed58511b20883fd0031623fe6cade480.camel@linux.intel.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <8a407188-5dd2-648b-fc26-f03a826bfee3@redhat.com>
Date: Tue, 12 Nov 2019 14:04:07 +0100
In-Reply-To: <1d881e86ed58511b20883fd0031623fe6cade480.camel@linux.intel.com>

>> Yes, they are prototypes, RFCs. You did the right thing to report the
>> issues so Nitesh can look into them.
>
> I feel like you are arguing this both ways. On one side you are saying
> that these are alternatives and need to be evaluated. Then on the other
> side you say they are RFCs and it isn't fair to hold the outcome of a
> performance evaluation against them.

I guess my point is to identify whether there are fundamental
performance issues that cannot be solved easily later. If you have a
prototype and you perform some minor changes to solve them, then I don't
consider it a fundamental problem. It's still a prototype.

>
> Maybe instead of directly comparing the two approaches we should just look
> at defining what requirements need to be met for either approach to be
> considered acceptable. Then neither of us necessarily needs to be
> comparing things directly and instead we are marching toward a set of
> requirements to get to the solution that will work best overall.

Makes perfect sense to me.

>
>>>> You: Please take my patch set, it is better than the alternatives
>>>> because of X, for X in {RFC quality, sparse zones, locking internals,
>>>> current performance differences}
>>>
>>> I should have replied to Michal's original question and simply stated that
>>> Mel had not replied to the patches in the last month and a half. I half
>>> suspect that is the reason for Andrew applying it. It put some pressure on
>>> others to provide review feedback, which if nothing else I am grateful
>>> for.
>>>
>>> You had inserted the need to compare it against Nitesh's patch set. Which
>>> based on Nitesh's email is likely going to be a little while since he
>>> cannot give me an ETA.
>>
>> So I want us (you, me, Michal, Mel, Dave, ...) to discuss the direction
>> we want to go. I'd love to do this on a design level, instead of having
>> to wait for any patch set. But I guess this is harder to do?
>> And especially as you keep mentioning the performance difference, I think we
>> should evaluate if this is an unsolvable problem or just an issue in the
>> current prototype?
>>
>> I mean people could have a look at Nitesh's older series to see how it
>> fundamentally differs from your approach (external tracking). Nitesh might
>> have fixed some things in the meantime, and is replacing the "fake
>> allocation" by page isolation. But I mean, the general approach should
>> be obvious and sufficient for people to make a decision.
>
> I think the problem is what is being asked for versus how we are going
> about it. Asking for performance comparisons isn't going to really lead to
> a design discussion.

I tend to agree. I'm just wondering how good a performance we can
achieve with external tracking :)

>
> One thing that might be interesting, at least to me, might be to just
> start a new thread to discuss the options/approaches. I know I haven't
> really heard much about the page isolation approach. I would be interested
> in hearing how you guys are planning to go about implementing that.

Yes, we should definitely do that. I was only skimming over the other
mail exchange (won't really be working this week), but I think you
identified some improvements yourself for your approach.

[...]

>>> fact is it is still invasive, just to different parts of the mm subsystem.
>>
>> I'd love to see how it uses the page isolation framework, and only has a
>> single hook to queue pages. I don't like the way pages are pulled out of
>> the buddy in Nitesh's approach currently. What you have is cleaner.
>
> I don't see how you could use the page isolation framework to pull out
> free pages. Is there a thread somewhere on the topic that I missed?

It's basically only isolating pages while reporting them, and not
pulling them out of the buddy (IOW, you move the pages to the isolate
queues where nobody is allowed to touch them, and set the migratetype
properly). This, e.g., makes other users of page isolation (e.g., memory
offlining, alloc_contig_range()) play nicely with these isolated pages:
"somebody else just isolated them, please try again."

The building blocks would be
start_isolate_page_range()/undo_isolate_page_range()/test_pages_isolated()
along with a lockless check if the page is free.

I think it should be something like this (ignoring different
migratetypes and such for now):

1. Test lockless if page is free: Not free? Done.
2. start_isolate_page_range(): Busy? Rare race (with other isolate users
   or with an allocation). Done.
3. test_pages_isolated()
3a. no? Rare race, page not free anymore. undo_isolate_page_range()
3b. yes? Report, then undo_isolate_page_range()

If we run into performance issues with the current page isolation
implementation (esp. locking), I think there are some nice
cleanups/reworks possible from which all current users could benefit
(especially across pageblocks).

>
>>> I would argue that one of my concerns about the hotplug and sparse
>>> handling is that skipping those for now is essentially hiding what is
>>> likely to be some invasive code, likely not too different from what I had
>>> to deal with with compaction. At this point he adds more data to the zone
>>> struct than my changes, and I suspect as he progresses that may increase
>>> further.
>
>>> I do not think it is fair to hold up review and acceptance of this patch
>>> set for performance comparisons with a patch set with no definite ETA.
>>
>> Michal asked "Is there really a consensus". A consensus that we want
>> something like this, not that we want Nitesh's approach. It's just an
>> alternative worth discussing.
>>
>> And if you are reworking your patch set now with Mel, we might get
>> another alternative that everybody is pleased with. Nobody is against
>> reviewing your series - that's perfect. The objection is to picking it up
>> and sending it upstream. That's my concern and Michal's concern if I am
>> not wrong.
>
> I agree it is not necessarily ready for upstream yet. That is why I am
> working on a v14. My past experience has been that anything accepted at
> this state is going to spend at least a couple months in the mm tree
> before it is pushed. However I don't see the issue with it spending some
> time in the mm tree and linux-next to get more eyes on it to identify any
> potential issues or additional use cases. If anything I welcome the
> additional debate, as it allows for additional opportunities for
> improvement.

Indeed.

-- 
Thanks,

David / dhildenb
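
For illustration only, the numbered steps above (lockless free check,
isolate, test, report, undo) can be modeled as a small userspace toy. This
is a rough sketch in Python, not kernel code: ToyBlock, Outcome, the toy_*
helpers and the racing_alloc knob are hypothetical names invented here.
They only mimic the roles of start_isolate_page_range(),
test_pages_isolated() and undo_isolate_page_range(), and they ignore the
real signatures, locking, pageblock granularity and migratetypes.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    NOT_FREE = auto()   # step 1: page was not free, nothing to do
    BUSY = auto()       # step 2: isolation failed, someone else holds it
    RACED = auto()      # step 3a: page stopped being free in the meantime
    REPORTED = auto()   # step 3b: page reported, isolation undone


@dataclass
class ToyBlock:
    """Toy stand-in for one candidate page range (assumption, not a kernel type)."""
    free: bool = True
    isolated: bool = False       # models "moved to the isolate queue"
    racing_alloc: bool = False   # knob: an allocation sneaks in after step 1
    reported: bool = False


def toy_start_isolate(block: ToyBlock) -> bool:
    """Plays the role of start_isolate_page_range(); False means busy."""
    if block.isolated:
        return False             # another isolation user got there first
    if block.racing_alloc:
        block.free = False       # simulate the rare race with an allocation
    block.isolated = True
    return True


def toy_test_isolated(block: ToyBlock) -> bool:
    """Plays the role of test_pages_isolated(): isolated and still free?"""
    return block.isolated and block.free


def toy_undo_isolate(block: ToyBlock) -> None:
    """Plays the role of undo_isolate_page_range()."""
    block.isolated = False


def report_block(block: ToyBlock) -> Outcome:
    if not block.free:                    # 1. lockless check
        return Outcome.NOT_FREE
    if not toy_start_isolate(block):      # 2. try to isolate
        return Outcome.BUSY
    if not toy_test_isolated(block):      # 3a. lost the race, back off
        toy_undo_isolate(block)
        return Outcome.RACED
    block.reported = True                 # 3b. report (e.g. to the hypervisor)
    toy_undo_isolate(block)
    return Outcome.REPORTED


if __name__ == "__main__":
    cases = {
        "free page": ToyBlock(),
        "page in use": ToyBlock(free=False),
        "already isolated": ToyBlock(isolated=True),
        "allocation race": ToyBlock(racing_alloc=True),
    }
    for name, block in cases.items():
        print(f"{name}: {report_block(block).name}")
```

The point of the model is that every early exit leaves the range exactly
as it found it: a range isolated by somebody else is never undone by the
reporter, and a range the reporter isolated itself is always released,
whether or not it ended up being reported.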