Date: Fri, 18 Sep 2020 10:30:51 +0800
From: Wei Yang
To: David Hildenbrand
Cc: osalvador@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, linux-acpi@vger.kernel.org, Andrew Morton, Alexander Duyck, Dave Hansen, Haiyang Zhang, "K. Y. Srinivasan", Mel Gorman, Michael Ellerman, Michal Hocko, Mike Rapoport, Scott Cheloha, Stephen Hemminger, Vlastimil Babka, Wei Liu, Wei Yang
Subject: Re: [PATCH RFC 0/4] mm: place pages to the freelist tail when onling and undoing isolation
Message-ID: <20200918023051.GE54754@L-31X9LVDL-1304.local>
Reply-To: Wei Yang
References: <5c0910c2cd0d9d351e509392a45552fb@suse.de>

On Wed, Sep 16, 2020 at 09:31:21PM +0200, David Hildenbrand wrote:
>
>
>> On 16.09.2020 at 20:50, osalvador@suse.de wrote:
>>
>> On 2020-09-16 20:34, David Hildenbrand wrote:
>>> When adding separate memory blocks via add_memory*() and onlining them
>>> immediately, the metadata (especially the memmap) of the next block will be
>>> placed onto one of the just added+onlined blocks.
>>> This creates a chain
>>> of unmovable allocations: If the last memory block cannot get
>>> offlined+removed, then none of the dependent ones can either. We directly
>>> have unmovable allocations all over the place.
>>>
>>> This can be observed quite easily using virtio-mem, however, it can also
>>> be observed when using DIMMs. The freshly onlined pages will usually be
>>> placed to the head of the freelists, meaning they will be allocated next,
>>> turning the just-added memory usually immediately un-removable. The
>>> fresh pages are cold, so preferring to allocate others (that might be hot)
>>> also feels to be the natural thing to do.
>>>
>>> It also applies to the hyper-v balloon, xen-balloon, and ppc64 dlpar: when
>>> adding separate, successive memory blocks, each memory block will have
>>> unmovable allocations on it - for example gigantic pages will fail to
>>> allocate.
>>>
>>> While the ZONE_NORMAL doesn't provide any guarantees that memory can get
>>> offlined+removed again (any kind of fragmentation with unmovable
>>> allocations is possible), there are many scenarios (hotplugging a lot of
>>> memory, running workload, hotunplug some memory/as much as possible) where
>>> we can offline+remove quite a lot with this patchset.
>>
>> Hi David,
>>
>
>Hi Oscar.
>
>> I did not read through the patchset yet, so sorry if the question is nonsense, but is this not trying to fix the same issue the vmemmap patches did? [1]
>
>Not nonsense at all. It only helps to some degree, though. It solves the dependencies due to the memmap. However, it's not completely ideal, especially for single memory blocks.
>
>With single memory blocks (virtio-mem, xen-balloon, hv balloon, ppc dlpar) you still have unmovable allocations (vmemmap chunks) all over the physical address space. Consider the gigantic page example after hotplug. You directly fragmented all hotplugged memory.
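
The head-versus-tail point in the cover letter can be illustrated with a toy model. This is only a sketch and is not taken from the patchset: the free list is modelled as a plain double-ended queue, allocation is assumed to always take from the head, and the names online_block, allocate and touched_new_pages are hypothetical helpers invented for this illustration.

```python
from collections import deque


def online_block(freelist, block_pages, to_tail):
    """Add freshly onlined pages to the free list, at the tail or at the head."""
    for page in block_pages:
        if to_tail:
            freelist.append(page)
        else:
            freelist.appendleft(page)


def allocate(freelist, count):
    """Allocate `count` pages, always taking from the head of the free list."""
    return [freelist.popleft() for _ in range(count)]


def touched_new_pages(to_tail, old=8, new=8, allocs=4):
    """Count how many freshly onlined pages get used by the next allocations."""
    # Pages that were already free before the hotplug (assumed toy numbers).
    freelist = deque(("old", i) for i in range(old))
    # Online one new memory block and then run a few allocations.
    online_block(freelist, [("new", i) for i in range(new)], to_tail)
    taken = allocate(freelist, allocs)
    return sum(1 for origin, _ in taken if origin == "new")


if __name__ == "__main__":
    print("head placement, new pages used:", touched_new_pages(to_tail=False))
    print("tail placement, new pages used:", touched_new_pages(to_tail=True))
```

In this model, head placement means the very next allocations land in the new block, while tail placement leaves the new block untouched as long as older free pages remain. The real allocator has per-order, per-migratetype lists and more, so this only shows the ordering effect.
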
>
>Of course, there might be (less extreme) dependencies due to page tables for the identity mapping, extended struct pages and similar.
>
>That said, there are other benefits when preferring other memory over just hotplugged memory. Think about adding+onlining memory during boot (dimms under QEMU, virtio-mem), once the system is up you will have most (all) of that memory completely untouched.
>
>So while vmemmap on hotplugged memory would tackle some part of the issue, there are cases where this approach is better, and there are even benefits when combining both. Everything changes with shuffle, though.
>
>Thanks!
>
>David
>
>>
>> I was about to give it a new respin now that the hwpoison stuff has been settled.
>>
>> [1] https://patchwork.kernel.org/cover/11059175/
>>

-- 
Wei Yang
Help you, Help me