Date: Thu, 24 Sep 2020 09:57:00 +0800
From: Wei Yang
To: Vlastimil Babka
Cc: David Hildenbrand, osalvador@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, linux-acpi@vger.kernel.org, Andrew Morton, Alexander Duyck, Dave Hansen, Haiyang Zhang, "K. Y. Srinivasan", Mel Gorman, Michael Ellerman, Michal Hocko, Mike Rapoport, Scott Cheloha, Stephen Hemminger, Wei Liu, Wei Yang
Subject: Re: [PATCH RFC 0/4] mm: place pages to the freelist tail when onling and undoing isolation
Message-ID: <20200924015700.GA3145@L-31X9LVDL-1304.local>
References: <5c0910c2cd0d9d351e509392a45552fb@suse.de> <67928cbd-950a-3279-bf9b-29b04c87728b@suse.cz>
In-Reply-To: <67928cbd-950a-3279-bf9b-29b04c87728b@suse.cz>

On Wed, Sep 23, 2020 at 04:31:25PM +0200, Vlastimil Babka wrote:
>On 9/16/20 9:31 PM, David Hildenbrand wrote:
>>
>>> Am 16.09.2020 um 20:50 schrieb osalvador@suse.de:
>>>
>>> On 2020-09-16 20:34, David Hildenbrand wrote:
>>>> When adding separate memory blocks via add_memory*() and onlining them
>>>> immediately, the metadata (especially the memmap) of the next block will
>>>> be placed onto one of the just added+onlined blocks. This creates a
>>>> chain of unmovable allocations: if the last memory block cannot get
>>>> offlined+removed, neither can any of the dependent ones. We directly
>>>> have unmovable allocations all over the place.
>>>> This can be observed quite easily using virtio-mem; however, it can
>>>> also be observed when using DIMMs. The freshly onlined pages will
>>>> usually be placed at the head of the freelists, meaning they will be
>>>> allocated next, usually turning the just-added memory immediately
>>>> un-removable. The fresh pages are also cold; preferring to allocate
>>>> other (possibly hot) pages feels like the natural thing to do.
>>>> The same applies to the Hyper-V balloon, the Xen balloon, and ppc64
>>>> dlpar: when adding separate, successive memory blocks, each memory
>>>> block will have unmovable allocations on it - for example, gigantic
>>>> pages will fail to allocate.
>>>> While ZONE_NORMAL doesn't provide any guarantee that memory can get
>>>> offlined+removed again (any kind of fragmentation with unmovable
>>>> allocations is possible), there are many scenarios (hotplugging a lot
>>>> of memory, running a workload, then hotunplugging some memory/as much
>>>> as possible) where we can offline+remove quite a lot with this
>>>> patchset.
>>>
>>> Hi David,
>>>
>>
>> Hi Oscar.
>>
>>> I did not read through the patchset yet, so sorry if the question is
>>> nonsense, but is this not trying to fix the same issue the vmemmap
>>> patches did? [1]
>>
>> Not nonsense at all. It only helps to some degree, though. It solves the
>> dependencies due to the memmap. However, it's not completely ideal,
>> especially for single memory blocks.
>>
>> With single memory blocks (virtio-mem, xen-balloon, hv balloon, ppc
>> dlpar) you still have unmovable allocations (vmemmap chunks) all over
>> the physical address space. Consider the gigantic page example after
>> hotplug: you directly fragmented all hotplugged memory.
>>
>> Of course, there might be (less extreme) dependencies due to page tables
>> for the identity mapping, extended struct pages, and similar.
>>
>> Having said that, there are other benefits to preferring other memory
>> over just-hotplugged memory. Think about adding+onlining memory during
>> boot (DIMMs under QEMU, virtio-mem): once the system is up, you will
>> have most (or all) of that memory completely untouched.
>>
>> So while vmemmap on hotplugged memory would tackle some part of the
>> issue, there are cases where this approach is better, and there are even
>> benefits when combining both.
>
>I see the point, but I don't think the head/tail mechanism is great for
>this. It might sort of work, but with other interfering activity there are
>no guarantees, and it relies on a subtle implementation detail. There are
>better mechanisms possible, I think, such as preparing a larger
>MIGRATE_UNMOVABLE area in the existing memory before we allocate those
>long-term management structures. Or onlining a bunch of blocks as
>ZONE_MOVABLE first and only later converting them to ZONE_NORMAL in a
>controlled way when the existing normal zone becomes depleted?
>

To be honest, David's approach is easy for me to understand, and I don't
see any negative effect.

>I guess it's an issue that the e.g. 128M block onlines are so disconnected
>from each other that it's hard to employ a strategy that works best for
>e.g. a whole bunch of GB onlined at once. But I noticed some effort
>towards a new API, so maybe that will be solved there too?
>
>> Thanks!
>>
>> David
>>
>>>
>>> I was about to give it a new respin now that the hwpoison stuff has
>>> been settled.
>>>
>>> [1] https://patchwork.kernel.org/cover/11059175/
>>>

-- 
Wei Yang
Help you, Help me