Subject: Re: [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
From: David Hildenbrand
To: Oscar Salvador
Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin, linux-mm@kvack.org,
  linux-kernel@vger.kernel.org
References: <20210309175546.5877-1-osalvador@suse.de>
 <20210309175546.5877-2-osalvador@suse.de>
 <20210315102224.GA24699@linux>
Organization: Red Hat GmbH
Date: Tue, 16 Mar 2021 18:45:17 +0100

On 16.03.21 17:46, David Hildenbrand wrote:
> On 15.03.21 11:22, Oscar Salvador wrote:
>> On Thu, Mar 11, 2021 at 08:06:53PM +0100, David Hildenbrand wrote:
>>> This looks essentially good to me, except some parts in
>>> mhp_supports_memmap_on_memory()
>>>
>>>> +bool mhp_supports_memmap_on_memory(unsigned long size)
>>>> +{
>>>> +	unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
>>>> +	unsigned long remaining_mem = size - PMD_SIZE;
>>
>> Hi David, thanks for the review!
>>
>>> This looks weird. I think what you want to test is that
>>>
>>> a) "nr_vmemmap_pages * sizeof(struct page)" spans complete PMDs (IOW, we
>>>    won't map too much via the altmap when populating a PMD in the vmemmap)
>>>
>>> b) "remaining = size - nr_vmemmap_pages * sizeof(struct page)" spans
>>>    complete pageblocks
>>
>> We do not know the nr_vmemmap_pages at this point in time, although it is
>> easy to pre-compute.
>>
>> For every section we populate, we use PMD_SIZE. So, PMD_SIZE/PAGE_SIZE gives
>> the nr_vmemmap_pages that are used for populating a single section.
>
> I find that cross-reference to vmemmap code a little hard to digest.
> I would have assumed that we don't have to care about PMDs in this
> code here at all. The vmemmap population code should handle that.
>
> I think I already mentioned this somewhere; I think it should be like this:
>
> a) vmemmap code should *never* populate more memory than requested for
>    a single memory section when we are populating from the altmap.
>    If that cannot be guaranteed for PMDs, then we have to fall back
>    to populating base pages. Populating PMDs from an altmap with
>    sizeof(struct page) != 64 is highly dangerous.
>
>    Assume we have sizeof(struct page) == 56. A 128 MiB section
>    spans 32768 pages - we need 32768 * sizeof(struct page)
>    space for the vmemmap.
>
>    With sizeof(struct page) == 64 we *can* use exactly one 2 MiB PMD.
>    With sizeof(struct page) == 56 we need 448 individual (full!) 4 KiB
>    pages for the vmemmap.
>
>    IOW, we don't care how vmemmap code will do the mapping.
>    vmemmap code has to get it right. IMHO, asserting it in
>    this code is wrong.
>
> b) In this code, we really should only care about what
>    memory onlining/offlining code can or can't do.
>    We really only care that
>
>    1) size == memory_block_size_bytes()
>    2) remaining_size != 0
>    3) IS_ALIGNED(remaining_size, pageblock_size)
>
> I think a) is a logical consequence of b) for x86-64,
> whereby the pageblock size corresponds to the PMD size, so we
> might not have to care about a) right now.
>
> See below for my take.
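As a quick sanity check of the arithmetic above, here is a minimal
userspace sketch (not kernel code; the x86-64 page, PMD and section
sizes are hardcoded assumptions) that reproduces the two cases:

/* vmemmap_check.c: compare the vmemmap space needed per 128 MiB
 * section for sizeof(struct page) == 56 vs. == 64. */
#include <stdio.h>

#define PAGE_SIZE    4096UL
#define PMD_SIZE     (2UL << 20)    /* 2 MiB huge mapping */
#define SECTION_SIZE (128UL << 20)  /* 128 MiB memory section */

int main(void)
{
	unsigned long sizes[] = { 56, 64 }; /* candidate sizeof(struct page) */

	for (int i = 0; i < 2; i++) {
		unsigned long nr_pages = SECTION_SIZE / PAGE_SIZE; /* 32768 */
		unsigned long vmemmap_size = nr_pages * sizes[i];

		printf("sizeof(struct page) == %lu -> %lu bytes (%lu base pages): %s\n",
		       sizes[i], vmemmap_size, vmemmap_size / PAGE_SIZE,
		       (vmemmap_size % PMD_SIZE) == 0 ?
		       "whole PMDs, a PMD mapping fits exactly" :
		       "not a PMD multiple, a PMD would overshoot the altmap");
	}
	return 0;
}

With 56-byte struct pages this prints 1835008 bytes == 448 base pages
(not a PMD multiple), while 64-byte struct pages give 2097152 bytes ==
exactly one 2 MiB PMD, matching the numbers above.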
>
>
>> But let me explain the reasoning I use in the current code:
>>
>> I will enumerate the assumptions that must hold true in order to support
>> the feature, together with their checks:
>>
>> - We span a single memory block
>>
>>   size == memory_block_size_bytes()
>>
>> - The vmemmap pages span a complete PMD and no more than a PMD
>>
>>   !(PMD_SIZE % sizeof(struct page))
>>
>> - The vmemmap pages and the pages exposed to the buddy have to cover full
>>   pageblocks
>>
>>   remaining_mem = size - PMD_SIZE;
>>   IS_ALIGNED(remaining_mem, pageblock_size)
>>
>> Although this check only covers the range without the vmemmap pages, one
>> could argue that since we use exactly one PMD_SIZE at a time, and PMD_SIZE
>> is pageblock-aligned, the vmemmap range is pageblock-aligned as well.
>>
>> Now, I see how this might be confusing and rather incomplete.
>> So I guess a better and clearer way to write it would be:
>>
>> bool mhp_supports_memmap_on_memory(unsigned long size)
>> {
>> 	unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;
>> 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> 	unsigned long remaining_size = size - vmemmap_size;
>>
>> 	return memmap_on_memory &&
>> 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
>> 	       size == memory_block_size_bytes() &&
>> 	       !(PMD_SIZE % vmemmap_size) &&
>> 	       IS_ALIGNED(vmemmap_size, pageblock_size) &&
>> 	       remaining_size &&
>> 	       IS_ALIGNED(remaining_size, pageblock_size);
>> }
>>
>> Note that the above check is only for a single section, but if the
>> assumptions hold true for a single section, they will hold for many
>> as well.
>
> Okay, please document the statement about single sections; that's
> important for understanding what's happening.
>
> My take would be
>
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
> 	/*
> 	 * Note: We calculate for a single memory section. The calculation
> 	 * implicitly covers memory blocks that span multiple sections.
> 	 */
> 	unsigned long nr_vmemmap_pages = SECTION_SIZE / PAGE_SIZE;
> 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> 	unsigned long remaining_size = size - vmemmap_size;
>
> 	/*
> 	 * Note: vmemmap code has to make sure to not populate more memory
> 	 * via the altmap than absolutely necessary for a single section.
> 	 * This always holds when we allocate pageblock-sized chunks from
> 	 * the altmap, as we require pageblock alignment here.
> 	 *
> 	 * TODO, other doc
> 	 */
> 	return memmap_on_memory &&
> 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
> 	       size == memory_block_size_bytes() &&
> 	       remaining_size &&
> 	       IS_ALIGNED(remaining_size, pageblock_size);
> }

Oh, I forgot: we can also drop the check for "remaining_size" in that
code. If we ever have PAGE_SIZE == sizeof(struct page), we'll be in
bigger trouble :D

-- 
Thanks,

David / dhildenb
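P.S. For completeness, a compilable userspace model of the check as
sketched above. This is *not* the kernel implementation: the memory
block size, pageblock size, and sizeof(struct page) are hardcoded
x86-64 assumptions, and the kconfig/module-parameter gates are left
out; it only demonstrates the size/alignment logic.

/* memmap_on_memory_check.c: model of the size/alignment logic. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE          4096UL
#define SECTION_SIZE       (128UL << 20)  /* 128 MiB */
#define PAGEBLOCK_SIZE     (2UL << 20)    /* 512 pages on x86-64 */
#define MEMORY_BLOCK_SIZE  SECTION_SIZE   /* a typical x86-64 default */
#define STRUCT_PAGE_SIZE   64UL           /* stand-in for sizeof(struct page) */

static bool mhp_supports_memmap_on_memory(unsigned long size)
{
	/* Calculate for a single section; this implicitly covers memory
	 * blocks spanning multiple sections, as discussed above. */
	unsigned long nr_vmemmap_pages = SECTION_SIZE / PAGE_SIZE;
	unsigned long vmemmap_size = nr_vmemmap_pages * STRUCT_PAGE_SIZE;
	unsigned long remaining_size = size - vmemmap_size;

	/* Only the "one memory block" and "pageblock-aligned remainder"
	 * conditions; remaining_size != 0 is kept for clarity even though,
	 * as noted above, it could be dropped. */
	return size == MEMORY_BLOCK_SIZE &&
	       remaining_size &&
	       remaining_size % PAGEBLOCK_SIZE == 0; /* IS_ALIGNED() */
}

int main(void)
{
	/* A 2 MiB vmemmap leaves 126 MiB, a multiple of the 2 MiB
	 * pageblock size, so this prints "supported". */
	printf("128 MiB block: %ssupported\n",
	       mhp_supports_memmap_on_memory(SECTION_SIZE) ? "" : "not ");
	return 0;
}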