Subject: Re: [PATCH] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
To: Anshuman Khandual, Muchun Song, will@kernel.org, akpm@linux-foundation.org, bodeddub@amazon.com, osalvador@suse.de, mike.kravetz@oracle.com, rientjes@google.com
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, zhengqi.arch@bytedance.com
References: <20210518091826.36937-1-songmuchun@bytedance.com> <1b9d008a-7544-cc85-5c2f-532b984eb5b5@arm.com> <88114091-fbb2-340d-b69b-a572fa340265@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Message-ID: <45c1a368-3d31-e92d-f120-4dca0eb2111d@redhat.com>
Date: Thu, 20 May 2021 13:59:28 +0200

On 20.05.21 13:54, Anshuman Khandual wrote:
>
> On 5/19/21 5:33 PM, David Hildenbrand wrote:
>> On 19.05.21 13:45, Anshuman Khandual wrote:
>>>
>>>
>>> On 5/18/21 2:48 PM, Muchun Song wrote:
>>>> The preparation of supporting freeing vmemmap associated with each
>>>> HugeTLB page is ready, so we can support this feature for arm64.
>>>>
>>>> Signed-off-by: Muchun Song
>>>> ---
>>>>   arch/arm64/mm/mmu.c | 5 +++++
>>>>   fs/Kconfig          | 2 +-
>>>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index 5d37e461c41f..967b01ce468d 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -23,6 +23,7 @@
>>>>   #include
>>>>   #include
>>>>   #include
>>>> +#include
>>>>     #include
>>>>   #include
>>>> @@ -1134,6 +1135,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>       pmd_t *pmdp;
>>>>         WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>> +
>>>> +    if (is_hugetlb_free_vmemmap_enabled() && !altmap)
>>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>>
>>> Not considering the fact that this will force the kernel to have only
>>> base page size mapping for vmemmap (unless altmap is also requested)
>>> which might reduce the performance, it also enables vmemmap mapping to
>>> be torn down or built up at runtime which could potentially collide
>>> with other kernel page table walkers like ptdump or memory hotremove
>>> operation ! How those possible collisions are protected right now ?
>>
>> Hi Anshuman,
>>
>> Memory hotremove is not an issue IIRC.
>> At the time memory is removed, all huge pages either have been migrated away or dissolved; the vmemmap is stable.
>
> But what happens when a hot remove section's vmemmap area (which is being
> torn down) is nearby another vmemmap area which is either created or
> being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
> pages inside the hot remove section might be safe. But what about other
> HugeTLB areas whose vmemmap area shares page table entries with vmemmap
> entries for a section being hot removed ? Massive HugeTLB alloc/use/free
> test cycle using memory just adjacent to a memory hotplug area, which is
> always added and removed periodically, should be able to expose this problem.
>
> IIUC unlike vmalloc(), vmemmap mapping areas in the kernel page table were
> always constant unless there are hotplug add or remove operations which
> are protected with a hotplug lock. Now with this change, we could have
> simultaneous walking and add or remove of the vmemmap areas without any
> synchronization. Is not this problematic ?
>
> On arm64 memory hot remove operation empties free portions of the vmemmap
> table after clearing them. Hence all concurrent walkers (hugetlb_vmemmap,
> hot remove, ptdump etc) need to be synchronized against hot remove.
>
> From arch/arm64/mm/mmu.c
>
> void vmemmap_free(unsigned long start, unsigned long end,
> 		struct vmem_altmap *altmap)
> {
> #ifdef CONFIG_MEMORY_HOTPLUG
> 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>
> 	unmap_hotplug_range(start, end, true, altmap);
> 	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
> #endif
> }

You are right, however, AFAIR

1) We always populate base pages, meaning we only modify PTEs and not
actually add/remove page tables when creating/destroying a hugetlb page.
Page table walkers should be fine and not suddenly run into a
use-after-free.
2) For pfn_to_page() users to never fault, we have to do an atomic
exchange of PTEs, meaning that someone traversing a page table looking for
pte_none() entries (like free_empty_tables() in your example) should
never get a false positive.

Makes sense, or am I missing something?

>
>>
>> vmemmap access (accessing the memmap via a virtual address) itself is not an issue. Manually walking (vmemmap) page tables might behave
>
> Right.
>
> differently, not sure if ptdump would require any synchronization.
>
> Dumping a wrong value is probably okay but crashing because a page table
> entry is being freed after ptdump acquired the pointer is bad. On arm64,
> ptdump() is protected against hotremove via [get|put]_online_mems().

Okay, and as the feature in question only exchanges PTEs, we should be
fine.

-- 
Thanks,

David / dhildenb
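
A minimal sketch of point 2) above, in userspace C11 and under stated
assumptions: the table is a toy array in which 0 stands for pte_none(),
and toy_pte_t, toy_pte_exchange(), toy_count_none() and toy_demo() are
hypothetical names, not kernel APIs. It is single-threaded, so it only
shows the invariant being relied on (an entry moves from one valid value
to another in one atomic step, so a scan for empty entries never finds
one); it does not exercise a real race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTRS_PER_TABLE 8

/* Hypothetical stand-in for a PTE: 0 means "none", non-zero means mapped. */
typedef _Atomic uint64_t toy_pte_t;

/* Install new_pte in a single atomic step and return the old entry. */
static uint64_t toy_pte_exchange(toy_pte_t *ptep, uint64_t new_pte)
{
	return atomic_exchange(ptep, new_pte);
}

/* Walker in the spirit of a scan for empty entries: count "none" slots. */
static size_t toy_count_none(toy_pte_t *table)
{
	size_t i, none = 0;

	for (i = 0; i < TOY_PTRS_PER_TABLE; i++)
		if (atomic_load(&table[i]) == 0)
			none++;
	return none;
}

/*
 * Remap every slot to a different (made-up) frame and run the walker
 * after each step. Returns how many "none" entries the walker saw.
 */
static size_t toy_demo(void)
{
	toy_pte_t table[TOY_PTRS_PER_TABLE];
	size_t i, seen_none = 0;

	/* Start with every slot mapped to some made-up frame. */
	for (i = 0; i < TOY_PTRS_PER_TABLE; i++)
		atomic_init(&table[i], ((uint64_t)(i + 1) << 12) | 1);

	for (i = 0; i < TOY_PTRS_PER_TABLE; i++) {
		uint64_t old = toy_pte_exchange(&table[i],
						((uint64_t)(i + 100) << 12) | 1);

		assert(old != 0);	/* the slot was mapped before the swap */
		seen_none += toy_count_none(table);
	}
	return seen_none;
}
```

Calling toy_demo() from a small main() returns 0 here, because no slot
ever holds the "none" value between the old and the new mapping. A
clear-then-set sequence, by contrast, would leave a window in which the
walker could classify the slot as empty.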