From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94B90C433DF for ; Fri, 16 Oct 2020 10:02:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 087BE2073A for ; Fri, 16 Oct 2020 10:02:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 087BE2073A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 26EE0940007; Fri, 16 Oct 2020 06:02:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 21F34900002; Fri, 16 Oct 2020 06:02:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 136FC940007; Fri, 16 Oct 2020 06:02:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0005.hostedemail.com [216.40.44.5]) by kanga.kvack.org (Postfix) with ESMTP id F06CF900002 for ; Fri, 16 Oct 2020 06:02:16 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id B01F2181AEF10 for ; Fri, 16 Oct 2020 10:02:16 +0000 (UTC) X-FDA: 77377348272.16.lift29_5414aa52721c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 816EC10252410 for ; Fri, 16 Oct 2020 10:02:16 +0000 (UTC) X-HE-Tag: lift29_5414aa52721c X-Filterd-Recvd-Size: 6764 Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Fri, 16 Oct 2020 10:02:14 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R691e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=richard.weiyang@linux.alibaba.com;NM=1;PH=DS;RN=9;SR=0;TI=SMTPD_---0UCBIqR._1602842531; Received: from localhost(mailfrom:richard.weiyang@linux.alibaba.com fp:SMTPD_---0UCBIqR._1602842531) by smtp.aliyun-inc.com(127.0.0.1); Fri, 16 Oct 2020 18:02:11 +0800 Date: Fri, 16 Oct 2020 18:02:11 +0800 From: Wei Yang To: David Hildenbrand Cc: Wei Yang , linux-kernel@vger.kernel.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org, Andrew Morton , "Michael S . Tsirkin" , Jason Wang , Pankaj Gupta Subject: Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory Message-ID: <20201016100211.GI44269@L-31X9LVDL-1304.local> Reply-To: Wei Yang References: <20201012125323.17509-1-david@redhat.com> <20201012125323.17509-6-david@redhat.com> <20201015082808.GE86495@L-31X9LVDL-1304.local> <994394f3-c16d-911c-c9fc-d2280f32e7b1@redhat.com> <20201016021651.GI86495@L-31X9LVDL-1304.local> <5caec772-295c-436a-2b19-ca261ea1ad0c@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5caec772-295c-436a-2b19-ca261ea1ad0c@redhat.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Oct 16, 2020 at 11:11:24AM +0200, David Hildenbrand wrote: >>> That's an interesting corner case. Assume you have a 128MB memory block >>> but only 64MB are plugged. >> >> Since we just plug a part of memory block, this state is OFFLINE_PARTIAL >> first. But then we would add these memory and online it. This means the state >> of this memory block is ONLINE_PARTIAL. >> >> When this state is changed to OFFLINE_PARTIAL again? > >Please note that memory onlining is *completely* controllable by user >space. User space can offline/online memory blocks as it wants. Not >saying this might actually be the right thing to do - but we cannot >trust that user space does the right thing. > >So at any point in time, you have to assume that > >a) added memory might not get onlined >b) previously onlined memory might get offlined >c) previously offline memory might get onlined > >> >>> >>> As long as we have our online_pages callback in place, we can hinder the >>> unplugged 64MB from getting exposed to the buddy >>> (virtio_mem_online_page_cb()). However, once we unloaded the driver, >> >> Yes, >> >> virtio_mem_set_fake_offline() would __SetPageOffline() to those pages. >> >>> this is no longer the case. If someone would online that memory block, >>> we would expose unplugged memory to the buddy - very bad. >>> >> >> Per my understanding, at this point of time, the memory block is at online >> state. Even part of it is set to *fake* offline. >> >> So how could user trigger another online from sysfs interface? > >Assume we added a partially plugged memory block, which is now offline. >Further assume user space did not online the memory block (e.g., no udev >rules). > >User space could happily online the block after unloading the driver. >Again, we have to assume user space could do crazy things. > You are right, online memory is not a forced behavior. >> >>> So we have to remove these partially plugged, offline memory blocks when >>> losing control over them. >>> >>> I tried to document that via: >>> >>> "After we unregistered our callbacks, user space can online partially >>> plugged offline blocks. Make sure to remove them." >>> >>>> >>>> Also, during virtio_mem_remove(), we just handle OFFLINE_PARTIAL memory block. >>>> How about memory block in other states? It is not necessary to remove >>>> ONLINE[_PARTIAL] memroy blocks? >>> >>> Blocks that are fully plugged (ONLINE or OFFLINE) can get >>> onlined/offlined without us having to care. Works fine - we only have to >>> care about partially plugged blocks. >>> >>> While we *could* unplug OFFLINE blocks, there is no way we can >>> deterministically offline+remove ONLINE blocks. So that memory has to >>> stay, even after we unloaded the driver (similar to the dax/kmem driver). >> >> For OFFLINE memory blocks, would that leave the situation: >> >> Guest doesn't need those pages, while host still maps them? > >Yes, but the guest could online the memory and make use of it. > >(again, whoever decides to unload the driver better be knowing what he does) > >To do it even more cleanly, we would > >a) Have to remove completely plugged offline blocks (not done) >b) Have to remove partially plugged offline blocks (done) >c) Actually send unplug requests to the hypervisor > >Right now, only b) is done, because it might actually cause harm (as >discussed). However, the problem is, that c) might actually fail. > >Long short: we could add a) if it turns out to be a real issue. But >than, unloading the driver isn't really suggested, the current >implementation just "keeps it working without crashes" - and I guess >that's good enough for now. > >> >>> >>> ONLINE_PARTIAL is already taken care of: it cannot get offlined anymore, >>> as we still hold references to these struct pages >>> (virtio_mem_set_fake_offline()), and as we no longer have the memory >>> notifier in place, we can no longer agree to offline this memory (when >>> going_offline). >>> >> >> Ok, I seems to understand the logic now. >> >> But how we prevent ONLINE_PARTIAL memory block get offlined? There are three >> calls in virtio_mem_set_fake_offline(), while all of them adjust page's flag. >> How they hold reference to struct page? > >Sorry, I should have given you the right pointer. (similar to my other >reply) > >We hold a reference either via > >1. alloc_contig_range() I am not familiar with this one, need to spend some time to look into. >2. memmap init code, when not calling generic_online_page(). I may miss some code here. Before online pages, memmaps are allocated in section_activate(). They are supposed to be zero-ed. (I don't get the exact code line.) I am not sure when we grab a refcount here. > >So these fake-offline pages can never be actually offlined, because we >no longer have the memory notifier registered to fix that up. > >-- >Thanks, > >David / dhildenb -- Wei Yang Help you, Help me