Message-ID: <306b2bc5-9637-0743-b8bb-7a60aa1ad65d@arm.com>
Date: Fri, 28 Jul 2023 09:31:11 +0530
Subject: Re: [RFC PATCH] arm64: mm: Fix kernel page tables incorrectly deleted during memory removal
To: David Hildenbrand, mawupeng, will@kernel.org
Cc: catalin.marinas@arm.com, akpm@linux-foundation.org, sudaraja@codeaurora.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com
References: <20230717115150.1806954-1-mawupeng1@huawei.com>
 <20230721103628.GA12601@willie-the-truck>
 <35a0dad6-4f3b-f2c3-f835-b13c1e899f8d@huawei.com>
 <732e0db0-eb41-6c58-85b7-46257b4ba0b7@redhat.com>
 <3149f5f8-7878-dfe1-5745-870fddcc1108@huawei.com>
From: Anshuman Khandual

On 7/26/23 13:20, David Hildenbrand wrote:
> On 26.07.23 08:20, mawupeng wrote:
>>
>>
>> On 2023/7/24 14:11, David Hildenbrand wrote:
>>> On 24.07.23 07:54, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 7/24/23 06:55, mawupeng wrote:
>>>>>
>>>>> On 2023/7/21 18:36, Will Deacon wrote:
>>>>>> On Mon, Jul 17, 2023 at 07:51:50PM +0800, Wupeng Ma wrote:
>>>>>>> From: Ma Wupeng
>>>>>>>
>>>>>>> During our test, we found that kernel page table may be unexpectedly
>>>>>>> cleared with rodata off.
>>>>>>> The root cause is that the kernel page is
>>>>>>> initialized with pud size(1G block mapping) while offline is memory
>>>>>>> block size(MIN_MEMORY_BLOCK_SIZE 128M), eg, if 2G memory is hot-added,
>>>>>>> when offline a memory block, the call trace is shown below,
>>>
>>> Is someone adding memory in 2 GiB granularity and then removing parts of it in 128 MiB granularity? That would be against what we support using the add_memory() / offline_and_remove_memory() API and that driver should be fixed instead.
>>
>> Yes, this kind of situation.
>>
>> The problem occurs in the following scenarios:
>> 1. use mem=xxG to reserve memory.
>> 2. add_memory to online memory.
>> 3. offline part of the memory via offline_and_remove_memory.
>>
>> During my research, ACPI memory removal use memory_subsys_offline to offline memory section and
>> this will not delete page table entry which do not trigger this kind of problem.
>>
>> So I understand what you are talking about.
>> 1. 3rd-party driver shouldn't use add_memory/offline_and_remove_memory to online/offline memory.
>>     If it have to use, this can be achieved by driver.
>> 2. memory_subsys_offline is preferred to do such thing.
>
> No, my point is that
>
> 1) If you use add_memory() and offline_and_remove_memory() in the *same
>    granularity* it has to be working, otherwise it has to be fixed.
>
> 2) If you use add_memory() and offline_and_remove_memory() in different
>    granularity (especially, add_memory() in bigger granularity) , then
>    change your code to do add_memory() in the same granularity.
>
>
> If you run into 1), then we populated a PUD for boot memory that also covers yet unpopulated physical memory ranges that are later populated by add_memory(). If that's the case, then we can either fix it by

Is that case possible? __create_pgd_mapping() is called to create the mapping
in both the hotplug and the boot memory cases. alloc_init_pud() ensures [1] that
both the virtual and the physical address ranges are PUD_MASK aligned before
creating a huge or block page entry.

	((addr | next | phys) & ~PUD_MASK) == 0

>
> a) Not doing that. Use PMD tables instead for that piece of memory.
>
> b) Detecting that that PUD still covers memory and refusing to remove
>    that PUD.
>
> c) Rejecting to hotadd memory in this situation at that location. We
>    have mhp_get_pluggable_range() -> arch_get_mappable_range() to kind-
>    of handle something like that.
>
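
To make the alignment argument above concrete, here is a minimal user-space
sketch of that check. It is not the kernel code: the helper name
pud_block_mapping_allowed() is hypothetical, PUD_SHIFT is assumed to be 30
(4K pages, 1 GiB PUD), and the assert() on the range is a simplification that
ignores address wrap-around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4K pages, so one PUD entry covers 1 GiB (PUD_SHIFT == 30). */
#define PUD_SHIFT 30
#define PUD_SIZE  (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

/*
 * Hypothetical helper mirroring the alignment test quoted above.
 * [addr, next) is the virtual range handled in one step and phys is the
 * physical address backing addr. A block entry is only acceptable when
 * all three values sit on a PUD boundary.
 */
static bool pud_block_mapping_allowed(uint64_t addr, uint64_t next,
				      uint64_t phys)
{
	/* Simplification: assume a non-empty range that does not wrap. */
	assert(next > addr);

	return ((addr | next | phys) & ~PUD_MASK) == 0;
}
```

With these assumptions, a fully aligned 1 GiB range passes the check, while a
range that ends 128 MiB into a PUD, or one whose physical base is offset by
128 MiB, does not and would have to be mapped at a smaller granularity.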