From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55826C433DF for ; Tue, 20 Oct 2020 00:23:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A214C22240 for ; Tue, 20 Oct 2020 00:23:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A214C22240 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A53F36B005C; Mon, 19 Oct 2020 20:23:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9DCBB6B0062; Mon, 19 Oct 2020 20:23:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8A3826B0068; Mon, 19 Oct 2020 20:23:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70]) by kanga.kvack.org (Postfix) with ESMTP id 55D366B005C for ; Mon, 19 Oct 2020 20:23:41 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E4FCA180AD811 for ; Tue, 20 Oct 2020 00:23:40 +0000 (UTC) X-FDA: 77390405400.24.map34_410c16c2723b Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id C31AF1A4A0 for ; Tue, 20 Oct 2020 00:23:40 +0000 (UTC) X-HE-Tag: map34_410c16c2723b X-Filterd-Recvd-Size: 5751 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by imf12.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 00:23:39 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R501e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=richard.weiyang@linux.alibaba.com;NM=1;PH=DS;RN=11;SR=0;TI=SMTPD_---0UCbA6wY_1603153414; Received: from localhost(mailfrom:richard.weiyang@linux.alibaba.com fp:SMTPD_---0UCbA6wY_1603153414) by smtp.aliyun-inc.com(127.0.0.1); Tue, 20 Oct 2020 08:23:35 +0800 Date: Tue, 20 Oct 2020 08:23:34 +0800 From: Wei Yang To: David Hildenbrand Cc: Wei Yang , linux-kernel@vger.kernel.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org, Andrew Morton , "Michael S . Tsirkin" , Jason Wang , Pankaj Gupta , Michal Hocko , Oscar Salvador Subject: Re: [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug Message-ID: <20201020002334.GA61232@L-31X9LVDL-1304.local> Reply-To: Wei Yang References: <20201012125323.17509-1-david@redhat.com> <20201012125323.17509-30-david@redhat.com> <20201019075406.GE54484@L-31X9LVDL-1304.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Oct 19, 2020 at 10:50:39AM +0200, David Hildenbrand wrote: >On 19.10.20 09:54, Wei Yang wrote: >> On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote: >>> Let's add a safe mechanism to unplug memory, avoiding long/endless loops >>> when trying to offline memory - similar to in SBM. >>> >>> Fake-offline all memory (via alloc_contig_range()) before trying to >>> offline+remove it. Use this mode as default, but allow to enable the other >>> mode explicitly (which could give better memory hotunplug guarantees in >> >> I don't get the point how unsafe mode would have a better guarantees? > >It's primarily only relevant when there is a lot of concurrent action >going on while unplugging memory. Using alloc_contig_range() on >ZONE_MOVABLE can fail more easily than memory offlining. > >alloc_contig_range() doesn't try as hard as memory offlining code to >isolate memory. There are known issues with temporary page pinning >(e.g., when a process dies) and the PCP. (mostly discovered via CMA >allocations) > >See the TODO I add in patch #14. > >[...] >>> >>> + if (bbm_safe_unplug) { >>> + /* >>> + * Start by fake-offlining all memory. Once we marked the device >>> + * block as fake-offline, all newly onlined memory will >>> + * automatically be kept fake-offline. Protect from concurrent >>> + * onlining/offlining until we have a consistent state. >>> + */ >>> + mutex_lock(&vm->hotplug_mutex); >>> + virtio_mem_bbm_set_bb_state(vm, bb_id, >>> + VIRTIO_MEM_BBM_BB_FAKE_OFFLINE); >>> + >> >> State is set here. >> >>> + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { >>> + page = pfn_to_online_page(pfn); >>> + if (!page) >>> + continue; >>> + >>> + rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION); >>> + if (rc) { >>> + end_pfn = pfn; >>> + goto rollback_safe_unplug; >>> + } >>> + } >>> + mutex_unlock(&vm->hotplug_mutex); >>> + } >>> + >>> rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id); >>> - if (rc) >>> + if (rc) { >>> + if (bbm_safe_unplug) { >>> + mutex_lock(&vm->hotplug_mutex); >>> + goto rollback_safe_unplug; >>> + } >>> return rc; >>> + } >>> >>> rc = virtio_mem_bbm_unplug_bb(vm, bb_id); >>> if (rc) >> >> And changed to PLUGGED or UNUSED based on rc. > >Right, after offlining+remove succeeded. So no longer added to Linux. > >The final state depends on the success of the unplug request towards the >hypervisor. > >> >>> @@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm, >>> virtio_mem_bbm_set_bb_state(vm, bb_id, >>> VIRTIO_MEM_BBM_BB_UNUSED); >>> return rc; >>> + >>> +rollback_safe_unplug: >>> + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { >>> + page = pfn_to_online_page(pfn); >>> + if (!page) >>> + continue; >>> + virtio_mem_fake_online(pfn, PAGES_PER_SECTION); >>> + } >>> + virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED); >> >> And changed to ADDED if failed. > >Right, back to the initial state when entering this function. > >> >>> + mutex_unlock(&vm->hotplug_mutex); >>> + return rc; >>> } >> >> So in which case, the bbm state is FAKE_OFFLINE during >> virtio_mem_bbm_notify_going_offline() and >> virtio_mem_bbm_notify_cancel_offline() ? > >Exactly, so we can do our magic with fake-offline pages and our >virtio_mem_bbm_offline_and_remove_bb() can actually succeed. Ah, my fault. The exact code flow is this: virtio_mem_bbm_offline_remove_and_unplug_bb() virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_FAKE_OFFLINE) virtio_mem_fake_offline(pfn, PAGES_PER_SECTION) virtio_mem_bbm_offline_and_remove_bb(vm, bb_id) offline and trigger memory notification --- 1) The notification is necessary at 1) to release the refcount, which is grabbed during fake offline. > > >-- >Thanks, > >David / dhildenb -- Wei Yang Help you, Help me