From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 17 Jun 2022 17:40:28 +0800
From: Muchun Song <songmuchun@bytedance.com>
To: David Hildenbrand
Cc: Oscar Salvador, corbet@lwn.net, akpm@linux-foundation.org,
	paulmck@kernel.org, mike.kravetz@oracle.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, smuchun@gmail.com
Subject: Re: [PATCH v2 2/2] mm: memory_hotplug: introduce SECTION_CANNOT_OPTIMIZE_VMEMMAP
Message-ID:
References: <20220520025538.21144-1-songmuchun@bytedance.com>
	<20220520025538.21144-3-songmuchun@bytedance.com>
	<53024884-0182-df5f-9ca2-00652c64ce36@redhat.com>
	<24d5ec20-9c9e-93aa-11f4-c4619f51f7d1@redhat.com>
	<79a1ca29-de8e-6456-460b-a9099340fec4@redhat.com>
	<751b003d-e7c2-dfcc-82b3-e80612cdc104@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <751b003d-e7c2-dfcc-82b3-e80612cdc104@redhat.com>
On Fri, Jun 17, 2022 at 11:25:20AM +0200, David Hildenbrand wrote:
> On 17.06.22 11:10, Muchun Song wrote:
> > On Fri, Jun 17, 2022 at 09:39:27AM +0200, David Hildenbrand wrote:
> >> On 17.06.22 09:28, Muchun Song wrote:
> >>> On Fri, Jun 17, 2022 at 07:46:53AM +0200, Oscar Salvador wrote:
> >>>> On Thu, Jun 16, 2022 at 09:30:33AM +0200, David Hildenbrand wrote:
> >>>>> IIRC, that was used to skip these pages on the offlining path before
> >>>>> we provided the ranges to offline_pages().
> >>>>
> >>>> Yeah, it was designed for that purpose back then.
> >>>>
> >>>>> I'd not mess with PG_reserved, and give them a clearer name, to not
> >>>>> confuse them with other, ordinary, vmemmap pages that are not
> >>>>> self-hosted (maybe in the future we might want to flag all vmemmap
> >>>>> pages with a new type?).
> >>>>
> >>>> Not sure whether a new type is really needed; to put it another way,
> >>>> I cannot see the benefit.
> >>>>
> >>>>> I'd just try reusing the flag PG_owner_priv_1. And eventually, flag
> >>>>> all (v)memmap pages with a type PG_memmap. However, the latter would
> >>>>> be optional and might not be strictly required.
> >>>>>
> >>>>> So what I think could make sense is:
> >>>>>
> >>>>> /* vmemmap pages that are self-hosted and cannot be optimized/freed. */
> >>>>> PG_vmemmap_self_hosted = PG_owner_priv_1,
> >>>>
> >>>> Sure, I just lightly tested the below, and it seems to work, but I am
> >>>> not sure whether that is what you are referring to.
> >>>> @Muchun: thoughts?
> >>>>
> >>>
> >>> I think it works and fits my requirement.
> >>>
> >>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> >>>> index e66f7aa3191d..a4556afd7bda 100644
> >>>> --- a/include/linux/page-flags.h
> >>>> +++ b/include/linux/page-flags.h
> >>>> @@ -193,6 +193,11 @@ enum pageflags {
> >>>>  
> >>>>  	/* Only valid for buddy pages. Used to track pages that are reported */
> >>>>  	PG_reported = PG_uptodate,
> >>>> +
> >>>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >>>> +	/* For self-hosted memmap pages */
> >>>> +	PG_vmemmap_self_hosted = PG_owner_priv_1,
> >>>> +#endif
> >>>>  };
> >>>>  
> >>>>  #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
> >>>> @@ -628,6 +633,10 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
> >>>>   */
> >>>>  __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
> >>>>  
> >>>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >>>> +PAGEFLAG(Vmemmap_self_hosted, vmemmap_self_hosted, PF_ANY)
> >>>> +#endif
> >>>> +
> >>>>  /*
> >>>>   * On an anonymous page mapped into a user virtual memory area,
> >>>>   * page->mapping points to its anon_vma, not to a struct address_space;
> >>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> >>>> index 1089ea8a9c98..e2de7ed27e9e 100644
> >>>> --- a/mm/hugetlb_vmemmap.c
> >>>> +++ b/mm/hugetlb_vmemmap.c
> >>>> @@ -101,6 +101,14 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
> >>>>  {
> >>>>  	unsigned long vmemmap_addr = (unsigned long)head;
> >>>>  	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
> >>>> +	struct mem_section *ms = __pfn_to_section(page_to_pfn(head));
> >>>> +	struct page *memmap;
> >>>> +
> >>>> +	memmap = sparse_decode_mem_map(ms->section_mem_map,
> >>>> +				       pfn_to_section_nr(page_to_pfn(head)));
> >>>> +
> >>>> +	if (PageVmemmap_self_hosted(memmap))
> >>>> +		return;
> >>>
> >>> I think we need a loop here if it is a 1GB page (it spans multiple
> >>> sections), right? Here is an implementation based on another approach,
> >>> but I think your implementation is simpler and more efficient. Would
> >>> you mind me squashing your diff into my patch with your
> >>> "Co-developed-by"?
> >>
> >> Due to hugetlb alignment requirements, and the vmemmap pages being at
> >> the start of the hotplugged memory region, I think that cannot
> >> currently happen. Checking the first vmemmap page might be good enough
> >> for now, and probably for the future.
> >>
> > 
> > If the memory block size is 128MB, then a 1GB huge page spans 8 blocks.
> > Is it possible that some of those blocks are vmemmap-hosted?
> 
> No, I don't think so. If you think about it, a huge/gigantic page can
> only start in a memmap-on-memory region but never end in one (or
> overlap one) -- because the reserved memmap part of the memory block
> always precedes the actually usable data.
> 
> So even with variable-size memory blocks and weird address alignment,
> checking the first memmap of a huge page for vmemmap-on-memory should
> be sufficient.
> 
> Unless I am missing something.
> 

Got it. You are awesome. I overlooked the fact that, once we have
reserved some memory as vmemmap pages in the memmap-on-memory case, the
whole memory block cannot be used as a gigantic page. Thanks for your
nice explanation.
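
[Editor's note: David's alignment argument can be checked mechanically
with a small userspace toy model. This is a sketch, not kernel code: all
sizes, names, and the prefix layout below are illustrative assumptions.
It models eight memory blocks backing a 1GB-aligned gigantic page range,
where a memmap-on-memory (m-o-m) block hosts its own vmemmap in a
reserved prefix, and verifies that a gigantic page can only be allocated
when no block in its range is m-o-m, so examining the first memmap page
is representative of every page.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative toy sizes: 8 blocks back one gigantic page range. */
#define BLOCKS            8
#define PAGES_PER_BLOCK   32
#define VMEMMAP_PREFIX    4   /* reserved vmemmap pages per m-o-m block */
#define TOTAL_PAGES       (BLOCKS * PAGES_PER_BLOCK)

/* Is pfn one of the reserved, self-hosted vmemmap pages of its block? */
static bool pfn_is_reserved_vmemmap(const bool *mom, unsigned long pfn)
{
	return mom[pfn / PAGES_PER_BLOCK] &&
	       (pfn % PAGES_PER_BLOCK) < VMEMMAP_PREFIX;
}

/* The memmap describing pfn is self-hosted iff its block is m-o-m. */
static bool memmap_is_self_hosted(const bool *mom, unsigned long pfn)
{
	return mom[pfn / PAGES_PER_BLOCK];
}

/*
 * A gigantic page covering the whole (aligned) range is allocatable
 * only if no pfn in it is a reserved vmemmap page -- the reserved
 * prefix of any m-o-m block in the range would be in the way.
 */
static bool gigantic_allocatable(const bool *mom)
{
	for (unsigned long pfn = 0; pfn < TOTAL_PAGES; pfn++)
		if (pfn_is_reserved_vmemmap(mom, pfn))
			return false;
	return true;
}

/*
 * David's claim: for an allocatable gigantic page, the self-hosted
 * property is uniform across the range, so checking only the memmap
 * of the head page suffices.
 */
static bool head_check_suffices(const bool *mom)
{
	bool head = memmap_is_self_hosted(mom, 0);

	for (unsigned long pfn = 1; pfn < TOTAL_PAGES; pfn++)
		if (memmap_is_self_hosted(mom, pfn) != head)
			return false;
	return true;
}
```

Enumerating all 2^8 block configurations shows that the only
configuration in which the gigantic page is allocatable at all is the
one where no block is memmap-on-memory, which is exactly why the
single head check in `hugetlb_vmemmap_free()` above is enough.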