From: David Hildenbrand
Organization: Red Hat GmbH
To: Oscar Salvador
Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Thu, 18 Mar 2021 12:24:16 +0100
Message-ID: <51c645b3-1220-80c4-e44c-4c0411222148@redhat.com>
References: <20210309175546.5877-1-osalvador@suse.de> <20210309175546.5877-2-osalvador@suse.de> <20210315102224.GA24699@linux> <20210317140847.GA20407@linux>

On 18.03.21 11:38, Oscar Salvador wrote:
> On Thu, Mar 18, 2021 at 09:27:48AM +0100, Oscar Salvador wrote:
>>> If we check for
>>>
>>> IS_ALIGNED(nr_vmemmap_pages, PMD_SIZE), please add a proper TODO comment
>>> that this is most probably the wrong place to take care of this.
>>
>> Sure, I will stuff the check in there and place a big TODO comment so we
>> do not forget about addressing this issue the right way.
>
> Ok, I realized something while working on v5.
>
> Here is what I have right now:
>
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
>         /*
>          * Note: We calculate for a single memory section. The calculation
>          * implicitly covers memory blocks that span multiple sections.
>          *
>          * Not all archs define SECTION_SIZE, but MIN_MEMORY_BLOCK_SIZE always
>          * equals SECTION_SIZE, so use that instead.
>          */
>         unsigned long nr_vmemmap_pages = MIN_MEMORY_BLOCK_SIZE / PAGE_SIZE;

Even clearer would be just using "size / PAGE_SIZE" here. Then you can
even drop the comment.

>         unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>         unsigned long remaining_size = size - vmemmap_size;
>
>         /*
>          * Besides having arch support and the feature enabled at runtime, we
>          * need a few more assumptions to hold true:
>          *
>          * a) We span a single memory block: memory onlining/offlining happens
>          *    in memory block granularity. We don't want the vmemmap of online
>          *    memory blocks to reside on offline memory blocks. In the future,
>          *    we might want to support variable-sized memory blocks to make the
>          *    feature more versatile.
>          *
>          * b) The vmemmap pages span complete PMDs: We don't want vmemmap code
>          *    to populate memory from the altmap for unrelated parts (i.e.,
>          *    other memory blocks)
>          *
>          * c) The vmemmap pages (and thereby the pages that will be exposed to
>          *    the buddy) have to cover full pageblocks: memory onlining/offlining
>          *    code requires applicable ranges to be page-aligned, for example, to
>          *    set the migratetypes properly.
>          *
>          * TODO: Although we have a check here to make sure that vmemmap pages
>          *       fully populate a PMD, it is not the right place to check for
>          *       this. A much better solution involves improving vmemmap code
>          *       to fallback to base pages when trying to populate vmemmap using
>          *       altmap as an alternative source of memory, and we do not exactly
>          *       populate a single PMD.
>          */
>         return memmap_on_memory &&
>                IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
>                size == memory_block_size_bytes() &&
>                remaining_size &&
>                IS_ALIGNED(remaining_size, pageblock_size) &&
>                IS_ALIGNED(vmemmap_size, PMD_SIZE);
> }
>
> Assume we are on x86_64 to simplify the case.
>
> Above, nr_vmemmap_pages would be 32768 and vmemmap_size 2MB (exactly a
> PMD).
>
> Now, although correct, this nr_vmemmap_pages does not match with the
> altmap->alloc.
>
> static void * __meminit altmap_alloc_block_buf(unsigned long size,
>                                                struct altmap)
> {
> ...
> ...
>         nr_pfns = size >> PAGE_SHIFT; //size is PMD_SIZE
>         altmap->alloc += nr_pfns;
> }
>
> altmap->alloc will be 512, 512 * 4K pages = 2MB.
>
> Of course, the reason they do not match is because in one case, we are
> saying a) how many pfns we need to cover a PMD_SIZE, while in the
> other case we say b) how many pages we need to cover SECTION_SIZE
>
> Then b) multiply for page_size to get the current size of it.

I don't follow. 2MB == 2MB. And if there were a difference, we would
run into the problem I brought up: vmemmap code allocating too much via
the altmap, which can be very bad because we might be populating more
vmemmap than we actually need.

>
> So, I have mixed feelings about this.
> Would it be more clear to just do:
>
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
>         /*
>          * Note: We calculate for a single memory section. The calculation
>          * implicitly covers memory blocks that span multiple sections.
>          */

Then this comment is wrong.

>         unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;

And this stuff just gets confusing.

nr_vmemmap_pages = 2 MiB / 4 KiB = 512;

>         unsigned long vmemmap_size = nr_vmemmap_pages * PAGE_SIZE;

vmemmap_size = 512 * 4 KiB = 2 MiB.

That calculation wasn't very useful (/ PAGE_SIZE * PAGE_SIZE)?

>         unsigned long remaining_size = size - vmemmap_size;

And here we could get something like

remaining_size = 2 GiB - 2 MiB ?

Which does not make any sense.

>         ...
>         ...

-- 
Thanks,

David / dhildenb