From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89457C433DB for ; Fri, 26 Mar 2021 08:53:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C3452619FE for ; Fri, 26 Mar 2021 08:53:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C3452619FE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 40FE36B0036; Fri, 26 Mar 2021 04:53:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3BFEB6B006E; Fri, 26 Mar 2021 04:53:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 273026B0070; Fri, 26 Mar 2021 04:53:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 0DE956B0036 for ; Fri, 26 Mar 2021 04:53:07 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B53D2180ACF75 for ; Fri, 26 Mar 2021 08:53:06 +0000 (UTC) X-FDA: 77961410772.29.EF6CD99 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf11.hostedemail.com (Postfix) with ESMTP id E9F43200024F for ; Fri, 26 Mar 2021 08:53:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616748785; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=snShQz8EXjjPJgp2ULZFPA/A2D46xCzf1Hq2y3U4qug=; b=K+sasgzlTLJSnHmYGyeBZxEDvUR28YXVSapXUHSIQiCI0fXrcZD725/+U1xvYYOdQGufBM pPbTSS7H+JXvJZEHYWuTk0OONNUonRfzXo6KBJmUmP+SNy8jsc0z5dDdvd35LXBSdJxKXV 6x/LBf1hxLmZo1FgoRtnCfFFfXbJ85g= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-169-Ve0gsv2kNa6dnPMAREF3Cg-1; Fri, 26 Mar 2021 04:53:03 -0400 X-MC-Unique: Ve0gsv2kNa6dnPMAREF3Cg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5C187853EC3; Fri, 26 Mar 2021 08:53:01 +0000 (UTC) Received: from [10.36.114.167] (ovpn-114-167.ams2.redhat.com [10.36.114.167]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5993D891D0; Fri, 26 Mar 2021 08:52:59 +0000 (UTC) To: Michal Hocko , Oscar Salvador Cc: Andrew Morton , Anshuman Khandual , Vlastimil Babka , Pavel Tatashin , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <31c3e6f7-f631-7b00-2c33-518b0f24a75f@redhat.com> <40fac999-2d28-9205-23f0-516fa9342bbe@redhat.com> <92fe19d0-56ac-e929-a9c1-d6a4e0da39d1@redhat.com> From: David Hildenbrand Organization: Red Hat GmbH Subject: Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range Message-ID: <5be95091-b4ac-8e05-4694-ac5c65f790a4@redhat.com> Date: Fri, 26 Mar 2021 09:52:58 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Stat-Signature: tx8d716ixr51czxkwg1m3d1tfcfws31n X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: E9F43200024F Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf11; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616748782-755155 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 26.03.21 09:35, Michal Hocko wrote: > On Thu 25-03-21 23:06:50, Oscar Salvador wrote: >> On Thu, Mar 25, 2021 at 05:47:30PM +0100, Michal Hocko wrote: >>> On Thu 25-03-21 17:36:22, Michal Hocko wrote: >>>> If all it takes is to make pfn_to_online_page work (and my >>>> previous attempt is incorrect because it should consult block rather >>>> than section pfn range) >>> >>> This should work. >> >> Sorry, but while this solves some of the issues with that approach, I = really >> think that overcomplicates things and buys us not so much in return. >> To me it seems that we are just patching things to make it work that >> way. >=20 > I do agree that special casing vmemmap areas is something I do not > really like but we are in that schr=C3=B6dinger situation when this mem= ory is > not onlineable unless it shares memory section with an onlineable > memory. There are two ways around that, either special case it on > pfn_to_online_page or mark the vmemmap section online even though it is > not really. >=20 >> To be honest, I dislike this, and I guess we can only agree to disagre= e >> here. >=20 > No problem there. I will not insist on my approach unless I can convinc= e > you that it is a better solution. It seems I have failed and I can live > with that. >=20 >> I find the following much easier, cleaner, and less risky to encounter >> pitfalls in the future: >> >> (!!!It is untested and incomplete, and I would be surprised if it even >> compiles, but it is enough as a PoC !!!) >> >> diff --git a/drivers/base/memory.c b/drivers/base/memory.c >> index 5ea2b3fbce02..799d14fc2f9b 100644 >> --- a/drivers/base/memory.c >> +++ b/drivers/base/memory.c >> @@ -169,6 +169,60 @@ int memory_notify(unsigned long val, void *v) >> return blocking_notifier_call_chain(&memory_chain, val, v); >> } >> >> +static int memory_block_online(unsigned long start_pfn, unsigned long= nr_pages, >> + unsigned long nr_vmemmap_pages, int online_type, >> + int nid) >> +{ >> + int ret; >> + /* >> + * Despite vmemmap pages having a different lifecycle than the pages >> + * they describe, initialiating and accounting vmemmap pages at the >> + * online/offline stage eases things a lot. >=20 > This requires quite some explaining. >=20 >> + * We do it out of {online,offline}_pages, so those routines only ha= ve >> + * to deal with pages that are actual usable memory. >> + */ >> + if (nr_vmemmap_pages) { >> + struct zone *z; >> + >> + z =3D zone_for_pfn_range(online_type, nid, start_pfn, nr_pages); >> + move_pfn_range_to_zone(z, start_pfn, nr_vmemmap_pages, NULL, >> + MIGRATE_UNMOVABLE); >> + /* >> + * The below could go to a helper to make it less bulky here, >> + * so {online,offline}_pages could also use it. >> + */ >> + z->present_pages +=3D nr_pages; >> + pgdat_resize_lock(z->zone_pgdat, &flags); >> + z->zone_pgdat->node_present_pages +=3D nr_pages; >> + pgdat_resize_unlock(z->zone_pgdat, &flags); Might have to set fully spanned section online. (vmemmap >=3D SECTION_SIZ= E) >> + } >> + >> + ret =3D online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vme= mmap_pages, >> + online_type); >> + >> + /* >> + * In case online_pages() failed for some reason, we should cleanup = vmemmap >> + * accounting as well. >> + */ >> + return ret; >> +} >=20 > Yes this is much better! Just a minor suggestion would be to push > memory_block all the way to memory_block_online (it oline a memory > block). I would also slightly prefer to provide 2 helpers that would ma= ke > it clear that this is to reserve/cleanup the vmemamp space (defined in > the memory_hotplug proper). >=20 > Thanks! >=20 Something else to note: We'll not call the memory notifier (e.g., MEM_ONLINE) for the vmemmap.=20 The result is that 1. We won't allocate extended struct pages for the range. Don't think=20 this is really problematic (pages are never allocated/freed, so I guess=20 we don't care - like ZONE_DEVICE code). 2. We won't allocate kasan shadow memory. We most probably have to do it=20 explicitly via kasan_add_zero_shadow()/kasan_remove_zero_shadow(), see=20 mm/memremap.c:pagemap_range() Further a locking rework might be necessary. We hold the device hotplug=20 lock, but not the memory hotplug lock. E.g., for get_online_mems().=20 Might have to move that out online_pages. --=20 Thanks, David / dhildenb