From: "guanghui.fgh" <guanghuifeng@linux.alibaba.com>
To: Mike Rapoport, Catalin Marinas
Cc: Will Deacon, Ard Biesheuvel, baolin.wang@linux.alibaba.com, akpm@linux-foundation.org, david@redhat.com, jianyong.wu@arm.com, james.morse@arm.com, quic_qiancai@quicinc.com, christophe.leroy@csgroup.eu, jonathan@marek.ca, mark.rutland@arm.com, thunder.leizhen@huawei.com, anshuman.khandual@arm.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, geert+renesas@glider.be, linux-mm@kvack.org, yaohongbo@linux.alibaba.com, alikernel-developer@linux.alibaba.com
Subject: Re: [PATCH v4] arm64: mm: fix linear mem mapping access performance degradation
Date: Wed, 6 Jul 2022 10:49:43 +0800
Message-ID: <7bf7c5ea-16eb-b02f-8ef5-bb94c157236d@linux.alibaba.com>
References: <6977c692-78ca-5a67-773e-0389c85f2650@linux.alibaba.com>
 <20220704163815.GA32177@willie-the-truck>
 <20220705095231.GB552@willie-the-truck>
 <5d044fdd-a61a-d60f-d294-89e17de37712@linux.alibaba.com>
 <20220705121115.GB1012@willie-the-truck>
Thanks.

On 2022/7/6 4:45, Mike Rapoport wrote:
> On Tue, Jul 05, 2022 at 06:05:01PM +0100, Catalin Marinas wrote:
>> On Tue, Jul 05, 2022 at 06:57:53PM +0300, Mike Rapoport wrote:
>>> On Tue, Jul 05, 2022 at 04:34:09PM +0100, Catalin Marinas wrote:
>>>> On Tue, Jul 05, 2022 at 06:02:02PM +0300, Mike Rapoport wrote:
>>>>> +void __init remap_crashkernel(void)
>>>>> +{
>>>>> +#ifdef CONFIG_KEXEC_CORE
>>>>> +	phys_addr_t start, end, size;
>>>>> +	phys_addr_t aligned_start, aligned_end;
>>>>> +
>>>>> +	if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
>>>>> +		return;
>>>>> +
>>>>> +	if (!crashk_res.end)
>>>>> +		return;
>>>>> +
>>>>> +	start = crashk_res.start & PAGE_MASK;
>>>>> +	end = PAGE_ALIGN(crashk_res.end);
>>>>> +
>>>>> +	aligned_start = ALIGN_DOWN(crashk_res.start, PUD_SIZE);
>>>>> +	aligned_end = ALIGN(end, PUD_SIZE);
>>>>> +
>>>>> +	/* Clear PUDs containing crash kernel memory */
>>>>> +	unmap_hotplug_range(__phys_to_virt(aligned_start),
>>>>> +			    __phys_to_virt(aligned_end), false, NULL);
>>>>
>>>> What I don't understand is what happens if there's valid kernel data
>>>> between aligned_start and crashk_res.start (or the other end of the
>>>> range).
>>>
>>> Data shouldn't go anywhere :)
>>>
>>> There is
>>>
>>> +	/* map area from PUD start to start of crash kernel with large pages */
>>> +	size = start - aligned_start;
>>> +	__create_pgd_mapping(swapper_pg_dir, aligned_start,
>>> +			     __phys_to_virt(aligned_start),
>>> +			     size, PAGE_KERNEL, early_pgtable_alloc, 0);
>>>
>>> and
>>>
>>> +	/* map area from end of crash kernel to PUD end with large pages */
>>> +	size = aligned_end - end;
>>> +	__create_pgd_mapping(swapper_pg_dir, end, __phys_to_virt(end),
>>> +			     size, PAGE_KERNEL, early_pgtable_alloc, 0);
>>>
>>> after the unmap, so after we tear down a part of a linear map we
>>> immediately recreate it, just with a different page size.
>>>
>>> This all happens before SMP, so there is no concurrency at that point.
>>
>> That brief period of unmap worries me. The kernel text, data and stack
>> are all in the vmalloc space but any other (memblock) allocation to this
>> point may be in the unmapped range before and after the crashkernel
>> reservation. The interrupts are off, so I think the only allocation and
>> potential access that may go in this range is the page table itself. But
>> it looks fragile to me.
>
> I agree there are chances there will be an allocation from the unmapped
> range.
>
> We can make sure this won't happen, though. We can cap the memblock
> allocations with memblock_set_current_limit(aligned_end) or
> memblock_reserve(aligned_start, aligned_end) until the mappings are
> restored.
>
>> --
>> Catalin

I think there is no need to worry about the vmalloc memory.

1. As mentioned above, when the crashkernel is reserved and the linear
mapping is rebuilt, only the boot cpu is running; no other cpu/thread
runs at the same time.

2. Although vmalloc may allocate memory from the unmapped area, we
immediately rebuild that part of the mapping with pte-level mappings
that keep each virtual address pointing at the same physical address
(and, again, no other cpu/thread is accessing vmalloc memory at that
point). As a result, it has no effect on the vmalloc memory.
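
To make point 2 concrete: the crash kernel range itself is recreated at
the same virtual address, only with base (pte level) pages. A rough
sketch reusing the helper names from the quoted patch hunks
(NO_BLOCK_MAPPINGS/NO_CONT_MAPPINGS are the existing arm64 mmu.c flags;
their use here is my illustration, not the exact patch code):

	/*
	 * Map the crash kernel region itself with pte level (base) pages
	 * so that the VA->PA relation is unchanged and individual pages
	 * can later be made invalid/valid for crash kernel protection.
	 */
	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
			     end - start, PAGE_KERNEL, early_pgtable_alloc,
			     NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);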
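
And for the memblock capping idea, one possible way to bracket the
unmap window could look like this (only a sketch:
memblock_get_current_limit()/memblock_set_current_limit() are existing
memblock APIs, but placing the cap inside remap_crashkernel() and using
aligned_start as the limit is my assumption, not the quoted suggestion
verbatim):

	phys_addr_t old_limit = memblock_get_current_limit();

	/*
	 * Keep any early allocation below the range that is about to be
	 * unmapped, so nothing can land in the temporarily missing part
	 * of the linear map.
	 */
	memblock_set_current_limit(aligned_start);

	unmap_hotplug_range(__phys_to_virt(aligned_start),
			    __phys_to_virt(aligned_end), false, NULL);

	/* ... recreate the three sub-ranges as in the quoted patch ... */

	/* the linear map is complete again, lift the cap */
	memblock_set_current_limit(old_limit);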