Date: Mon, 18 Dec 2023 10:59:26 +0000
From: Alexandru Elisei
To: Rob Herring
Cc: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev,
    maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com,
    yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org,
    mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
    rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
    bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org,
    rppt@kernel.org, hughd@google.com, pcc@google.com,
    steven.price@arm.com, anshuman.khandual@arm.com,
    vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com,
    kcc@google.com, hyesoo.yu@samsung.com,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-mm@kvack.org,
    linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v2 11/27] arm64: mte: Reserve tag storage memory

Hi,

On Thu, Dec 14, 2023 at 12:55:14PM -0600, Rob Herring wrote:
> On Thu, Dec 14, 2023 at 9:45 AM Alexandru Elisei wrote:
> >
> > Hi,
> >
> > On Wed, Dec 13, 2023 at 02:30:42PM -0600, Rob Herring wrote:
> > > On Wed, Dec 13, 2023 at 11:44 AM Alexandru Elisei wrote:
> > > >
> > > > On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> > > > > On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > > > > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei wrote:
> > > > > > > >
> > > > > > > > Hi Rob,
> > > > > > > >
> > > > > > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Rob,
> > > > > > > > > >
> > > > > > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > > > > > and any comments are very useful.
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > > > > > >
> > > > > > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > > > > > >
> > > > > > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > > > > > >                 block-size = <0x1000>;
> > > > > > > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > > > > > > >         };
> > > > > > > > > > > >
> > > > > > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > > > > > >
> > > > > > > > > > Ok, will do that.
> > > > > > > > > >
> > > > > > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > > > > > motivation.
> > > > > > > > >
> > > > > > > > > Simply so that /memory nodes describe all possible memory and
> > > > > > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > > > > > multiple things to handle early.
> > > > > > > > >
> > > > > > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > > > > > >
> > > > > > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > > > > > most h/w figured out they need IOMMUs).
> > > > > > > > >
> > > > > > > > > But for tag storage you know the size as it is a function of the
> > > > > > > > > memory size, right? After all, you are validating the size is correct.
> > > > > > > > > I guess there is still the aspect of whether you want to enable MTE or
> > > > > > > > > not which could be done in a variety of ways.
> > > > > > > >
> > > > > > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > > > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > > > > > >
> > > > > > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > > > > > If the location doesn't matter and you can calculate the size from the
> > > > > > > memory size, what else is there to add to the DT?
> > > > > >
> > > > > > I am afraid there has been a misunderstanding. What do you mean by
> > > > > > "location doesn't matter"?
> > > > >
> > > > > You said:
> > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > >
> > > > > Which I took as the kernel can figure out where to put it. But maybe
> > > > > you meant the h/w platform can hard code it to be anywhere in memory?
> > > > > If so, then yes, DT is needed.
> > > >
> > > > Ah, I see, sorry for not being clear enough, you are correct: tag storage
> > > > is a hardware property, and software needs a mechanism (in this case, the
> > > > dt) to discover its properties.
> > > >
> > > > >
> > > > > > At the very least, Linux needs to know the address and size of a memory
> > > > > > region to use it. The series is about using the tag storage memory for
> > > > > > data. Tag storage cannot be described as a regular memory node because it
> > > > > > cannot be tagged (and normal memory can).
> > > > >
> > > > > If the tag storage lives in the middle of memory, then it would be
> > > > > described in the memory node, but removed by being in reserved-memory
> > > > > node.
> > > >
> > > > I don't follow. Would you mind going into more details?
> > >
> > > It goes back to what I said earlier about /memory nodes describing all
> > > the memory. There's no reason to reserve memory if you haven't
> > > described that range as memory to begin with. One could presumably
> > > just have a memory node for each contiguous chunk and not need
> > > /reserved-memory (ignoring the need to say what things are reserved
> > > for). That would become very difficult to adjust. Note that the kernel
> > > has a hardcoded limit of 64 reserved regions currently and that is not
> > > enough for some people. Seems like a lot, but I have no idea how they
> > > are (ab)using /reserved-memory.
> >
> > Ah, I see what you mean, reserved memory is about marking existing memory
> > (from a /memory node) as special, not about adding new memory.
> >
> > After the memblock allocator is initialized, the kernel can use it for its
> > own allocations.
> > Kernel allocations are not movable.
> >
> > When a page is allocated as tagged, the associated tag storage cannot be
> > used for data, otherwise the tags would corrupt that data. To avoid this,
> > the requirement is that tag storage pages are only used for movable
> > allocations. When a page is allocated as tagged, the data in the associated
> > tag storage is migrated and the tag storage is taken from the page
> > allocator (via alloc_contig_range()).
> >
> > My understanding is that the memblock allocator can use all the memory from
> > a /memory node. If the tag storage memory is declared in a /memory node,
> > there exists the possibility that Linux will use tag storage memory for its
> > own allocation, which would make that tag storage memory unmovable, and
> > thus unusable for storing tags.
>
> No, because the tag storage would be reserved in /reserved-memory.
>
> Of course, the arch code could do something between scanning /memory
> nodes and /reserved-memory, but that would be broken arch code.
> Ideally, there wouldn't be any arch code in between those 2 points,
> but it's complicated. It used to mainly be powerpc, but we keep adding
> to the complexity on arm64.

Ah, yes, that's what I was referring to: the memory nodes are parsed in
setup_arch -> setup_machine_fdt -> early_init_dt_scan, and the reserved
memory is parsed later, in setup_arch -> arm64_memblock_init.

If the rule is that no memblock allocations can take place between
setup_machine_fdt() and arm64_memblock_init(), then putting tag storage in a
/memory node will work. Thank you for the clarification.

>
> > Looking at early_init_dt_scan_memory(), even if a /memory node is marked as
> > hotpluggable, memblock will still use it, unless "movable_node" is set on
> > the kernel command line.
> >
> > That's the reason why I'm not describing tag storage in a /memory node. Is
> > there a way to tell the memblock allocator not to use memory from a /memory
> > node?
> >
> > >
> > > Let me give an example. Presumably using MTE at all is configurable.
> > > If you boot a kernel with MTE disabled (or older and not supporting
> > > it), then I'd assume you'd want to use the tag storage for regular
> > > memory. Well, if tag storage is already part of /memory, then all you
> > > have to do is ignore the tag reserved-memory region. Tweaking the
> > > memory nodes would be more work.
> >
> > Right now, memory is added via memblock_reserve(), and if MTE is disabled
> > (for example, via the kernel command line), the code calls
> > free_reserved_page() for each tag storage page. I find that straightforward
> > to implement.
>
> But better to just not reserve the region in the first place. Also, it
> needs to be simple enough to back port.

I don't think that works: reserved memory is parsed in setup_arch ->
arm64_memblock_init, and the CPU capabilities are initialized later, in
smp_prepare_boot_cpu.

>
> Also, does free_reserved_page() work on ranges outside of memblock
> range (e.g. beyond end_of_DRAM())? If the tag storage happened to live
> at the end of DRAM and you shorten the /memory node size to remove tag
> storage, is it still going to work?

Tag storage memory is discovered in two stages: first it is added to
memblock with memblock_add(), then it is reserved with memblock_reserve().
This is performed in setup_arch(), after setup_machine_fdt() and before
arm64_memblock_init(). The tag storage code keeps an array of the discovered
tag regions. This is implemented in this patch.

The next patch [1] adds an arch_initcall that checks if
memblock_end_of_DRAM() is less than the upper address of a tag storage
region. If that is the case, then tag storage memory is kept as reserved and
remains unused by the kernel. The next check is for MTE being enabled: if it
is disabled, then the pages are unreserved by calling free_reserved_page().
And finally, if all the checks pass, the tag storage pages are put on the
MIGRATE_CMA lists with init_cma_reserved_pageblock().

[1] https://lore.kernel.org/all/20231119165721.9849-12-alexandru.elisei@arm.com/

Thanks,
Alex

>
> Rob
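
For illustration only, the order of the checks described above (tag storage
ending beyond the end of DRAM, then MTE disabled, then hand-off to CMA) can
be modelled in plain userspace C. This is a sketch under stated assumptions,
not the code from the series: the enum, the struct and the function name
tag_region_decide() are hypothetical, and the kernel functions named in the
thread appear only in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome names; they do not exist in the kernel. */
enum tag_region_fate {
	TAG_REGION_KEEP_RESERVED, /* stays reserved, unused by the kernel */
	TAG_REGION_FREE_PAGES,    /* kernel side: free_reserved_page() per page */
	TAG_REGION_TO_CMA,        /* kernel side: init_cma_reserved_pageblock() */
};

/* Hypothetical description of one discovered tag storage region. */
struct tag_region {
	uint64_t base; /* physical base address */
	uint64_t size; /* size in bytes */
};

/*
 * Model of the decision order described in the thread.
 * end_of_dram stands in for the value of memblock_end_of_DRAM().
 */
enum tag_region_fate tag_region_decide(const struct tag_region *r,
				       uint64_t end_of_dram,
				       bool mte_enabled)
{
	uint64_t upper;

	/* Guard against address wraparound in this model. */
	assert(r->base <= UINT64_MAX - r->size);
	upper = r->base + r->size; /* upper address of the region */

	/* Check 1: region extends past the end of DRAM, keep it reserved. */
	if (end_of_dram < upper)
		return TAG_REGION_KEEP_RESERVED;

	/* Check 2: MTE disabled, give the pages back as ordinary memory. */
	if (!mte_enabled)
		return TAG_REGION_FREE_PAGES;

	/* All checks passed: pages go on the MIGRATE_CMA lists. */
	return TAG_REGION_TO_CMA;
}
```

With the node from the commit message (base 0x8f8000000, size 0x4000000) the
region ends at 0x8fc000000, so an end-of-DRAM value of 0x8f8000000 selects
the keep-reserved outcome regardless of whether MTE is enabled.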