From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=UKVd=OP=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3F80CC433EF
	for <linux-mm@archiver.kernel.org>; Sat, 25 Sep 2021 00:36:38 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id CC77761039
	for <linux-mm@archiver.kernel.org>; Sat, 25 Sep 2021 00:36:37 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org CC77761039
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=quicinc.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org
Received: by kanga.kvack.org (Postfix)
	id 56CA96B0071; Fri, 24 Sep 2021 20:36:37 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 51D106B0072; Fri, 24 Sep 2021 20:36:37 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 3BCCE900002; Fri, 24 Sep 2021 20:36:37 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0103.hostedemail.com [216.40.44.103])
	by kanga.kvack.org (Postfix) with ESMTP id 261DF6B0071
	for <linux-mm@kvack.org>; Fri, 24 Sep 2021 20:36:37 -0400 (EDT)
Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay04.hostedemail.com (Postfix) with ESMTP id D8DEB3015E
	for <linux-mm@kvack.org>; Sat, 25 Sep 2021 00:36:36 +0000 (UTC)
X-FDA: 78624229992.30.1C0E05E
Received: from alexa-out-sd-02.qualcomm.com (alexa-out-sd-02.qualcomm.com [199.106.114.39])
	by imf12.hostedemail.com (Postfix) with ESMTP id 3934B10000A5
	for <linux-mm@kvack.org>; Sat, 25 Sep 2021 00:36:36 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
  d=quicinc.com; i=@quicinc.com; q=dns/txt; s=qcdkim;
  t=1632530196; x=1664066196;
  h=message-id:date:mime-version:subject:to:cc:references:
   from:in-reply-to:content-transfer-encoding;
  bh=lnnXXdf/yo9MQY2KlKeLp08sm4OCbgh7fqceG1KReHM=;
  b=Kj/UqLRtK8sG4bZ5ouZ5i4O53M/PWfnrDJcQLz1U1Os/50UUeyu2WK3q
   O1zjyCni7jHxjuR647TfEEI8o6VObmzrf9/bGHEOnzBrV0V8CNShygDNY
   9h+s3d8S4OZL+nM22xoEqzazixAGu6gsBTLa83mucDtb9yAdPBNrgbq5E
   E=;
Received: from unknown (HELO ironmsg-SD-alpha.qualcomm.com) ([10.53.140.30])
  by alexa-out-sd-02.qualcomm.com with ESMTP; 24 Sep 2021 17:36:35 -0700
X-QCInternal: smtphost
Received: from nasanex01b.na.qualcomm.com ([10.46.141.250])
  by ironmsg-SD-alpha.qualcomm.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Sep 2021 17:36:34 -0700
Received: from [10.111.163.122] (10.80.80.8) by nasanex01b.na.qualcomm.com
 (10.46.141.250) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.922.7; Fri, 24 Sep 2021
 17:36:32 -0700
Message-ID: <a3bf84c4-8f35-f273-145c-55928a06f332@quicinc.com>
Date: Fri, 24 Sep 2021 17:36:30 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.1.1
Subject: Re: [RFC] arm64: mm: update max_pfn after memory hotplug
Content-Language: en-US
To: David Hildenbrand <david@redhat.com>, Florian Fainelli
	<f.fainelli@gmail.com>, Chris Goldsworthy <quic_cgoldswo@quicinc.com>,
	Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>, <anshuman.khandual@arm.com>
CC: <linux-arm-kernel@lists.infradead.org>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, Doug Berger <opendmb@gmail.com>
References: <cover.1632437225.git.quic_cgoldswo@quicinc.com>
 <595d09279824faf1f54961cef52b745609b05d97.1632437225.git.quic_cgoldswo@quicinc.com>
 <6eb8319d-acba-b69a-5db3-5dca9ef426e8@gmail.com>
 <41789cad-76c6-0ea5-4aa1-3e4a52acff86@redhat.com>
From: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
In-Reply-To: <41789cad-76c6-0ea5-4aa1-3e4a52acff86@redhat.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.80.80.8]
X-ClientProxiedBy: nasanex01b.na.qualcomm.com (10.46.141.250) To
 nasanex01b.na.qualcomm.com (10.46.141.250)
X-Rspamd-Server: rspam02
X-Rspamd-Queue-Id: 3934B10000A5
X-Stat-Signature: disthteb4e5xwbrkh384c78u31pnkgba
Authentication-Results: imf12.hostedemail.com;
	dkim=pass header.d=quicinc.com header.s=qcdkim header.b="Kj/UqLRt";
	spf=pass (imf12.hostedemail.com: domain of quic_sudaraja@quicinc.com designates 199.106.114.39 as permitted sender) smtp.mailfrom=quic_sudaraja@quicinc.com;
	dmarc=pass (policy=none) header.from=quicinc.com
X-HE-Tag: 1632530196-881480
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

+Anshuman

Thanks for the responses. Do we see any issues with having this patch go 
for merge? I agree with David that 'max_pfn' is just a hint about 
maximum possible PFN that can go. So it doesn't need updating when 
memory is hotunplugged out.

Also regarding comment about possible reserved memory region with 
"no-map" attribute - I'm not sure if we such reserved memory allocations 
are allowed post early init when memory blocks are hot added later on 
after arm64_memblock_init is done? These are all done during setup_arch 
right?

We found this issue/bug where kernel inits with limited boot memory 
using 'mem=' and a driver hot-adds memory blocks during module init 
using add_memory_driver_managed and owns it. This makes 'max_pfn' out of 
sync and breaks /proc/kpageXXX kpageflags_read() functionality where any 
processes using pages beyond this outdated 'max_pfn', 
stable_page_flags() would simply return KPF_NOPAGE.

I think this is generic bug that exists with arm64 memory hotplug where 
any consumers of 'max_pfn' would break its functionality after memory 
blocks are hotplugged in after init. If this patch looks OK and be 
ACKed, we can have it in GKI. Other suggestions on proper fix or 
feedback are also welcome.

On 9/24/2021 1:17 AM, David Hildenbrand wrote:
> On 24.09.21 04:47, Florian Fainelli wrote:
>>
>>
>> On 9/23/2021 3:54 PM, Chris Goldsworthy wrote:
>>> From: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
>>>
>>> After new memory blocks have been hotplugged, max_pfn and max_low_pfn
>>> needs updating to reflect on new PFNs being hot added to system.
>>>
>>> Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
>>> Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
>>> ---
>>>    arch/arm64/mm/mmu.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index cfd9deb..fd85b51 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1499,6 +1499,11 @@ int arch_add_memory(int nid, u64 start, u64 
>>> size,
>>>        if (ret)
>>>            __remove_pgd_mapping(swapper_pg_dir,
>>>                         __phys_to_virt(start), size);
>>> +    else {
>>> +        max_pfn = PFN_UP(start + size);
>>> +        max_low_pfn = max_pfn;
>>> +    }
>>
>> This is a drive by review, but it got me thinking about your changes 
>> a bit:
>>
>> - if you raise max_pfn when you hotplug memory, don't you need to lower
>> it when you hot unplug memory as well?
>
> The issue with lowering is that you actually have to do some search to 
> figure out the actual value -- and it's not really worth the trouble. 
> Raising the limit is easy.
>
> With memory hotunplug, anybody wanting to take a look at a "struct 
> page" via a pfn has to do a pfn_to_online_page() either way. That will 
> fail if there isn't actually a memmap anymore because the memory has 
> been unplugged. So "max_pfn" is actually rather a hint what maximum 
> pfn to look at, and it can be bigger than it actually is.
>
> The a look at the example usage in fs/proc/page.c:kpageflags_read()
>
> pfn_to_online_page() will simply fail and stable_page_flags() will 
> indicate a KPF_NOPAGE.
>
> Just like we would have a big memory hole now at the end of memory.
>
>>
>> - suppose that you have a platform which maps physical memory into the
>> CPU's address space at 0x00_4000_0000 (1GB offset) and the kernel boots
>> with 2GB of DRAM plugged by default. At that point we have not
>> registered a swiotlb because we have less than 4GB of addressable
>> physical memory, there is no IOMMU in that system, it's a happy world.
>> Now assume that we plug an additional 2GB of DRAM into that system
>> adjacent to the previous 2GB, from 0x00_C0000_0000 through
>> 0x14_0000_0000, now we have physical addresses above 4GB, but we still
>> don't have a swiotlb, some of our DMA_BIT_MASK(32) peripherals are going
>> to be unable to DMA from that hot plugged memory, but they could if we
>> had a swiotlb.
>
> That's why platforms that hotplug memory should indicate the maximum 
> possible PFN via some mechanism during boot. On x86-64 (and IIRC also 
> arm64 now), this is done via the ACPI SRAT.
>
> And that's where "max_possible_pfn" and "max_pfn" differ. See 
> drivers/acpi/numa/srat.c:acpi_numa_memory_affinity_init():
>
>     max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));$
>
>
> Using max_possible_pfn, the OS can properly setup the swiotlb, even 
> thought it wouldn't currently be required when just looking at max_pfn.
>
> I documented that for virtio-mem in
>     https://virtio-mem.gitlab.io/user-guide/user-guide-linux.html
> "swiotlb and DMA memory".
>
>>
>> - now let's go even further but this is very contrived. Assume that the
>> firmware has somewhat created a reserved memory region with a 'no-map'
>> attribute thus indicating it does not want a struct page to be created
>> for a specific PFN range, is it valid to "blindly" raise max_pfn if that
>> region were to be at the end of the just hot-plugged memory?
>
> no-map means that no direct mapping is to be created, right? We would 
> still have a memmap IIRC, and the pages are PG_reserved.
>
> Again, I think this is very similar to just having no-map regions like 
> random memory holes within the existing memory layout.
>
>
> What Chris proposes here is very similar to 
> arch/x86/mm/init_64.c:update_end_of_memory_vars() called during 
> arch_add_memory()->add_pages() on x86-64.
>