Date: Wed, 9 Apr 2025 12:06:27 +0300
From: Mike Rapoport
To: Jason Gunthorpe
Cc: Pratyush Yadav, Changyuan Lyu, linux-kernel@vger.kernel.org, graf@amazon.com, akpm@linux-foundation.org, luto@kernel.org, anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com, benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com, dave.hansen@linux.intel.com, dwmw2@infradead.org, ebiederm@xmission.com,
    mingo@redhat.com, jgowans@amazon.com, corbet@lwn.net, krzk@kernel.org, mark.rutland@arm.com, pbonzini@redhat.com, pasha.tatashin@soleen.com, hpa@zytor.com, peterz@infradead.org, robh+dt@kernel.org, robh@kernel.org, saravanak@google.com, skinsburskii@linux.microsoft.com, rostedt@goodmis.org, tglx@linutronix.de, thomas.lendacky@amd.com, usama.arif@bytedance.com, will@kernel.org, devicetree@vger.kernel.org, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation
References: <20250403142438.GF342109@nvidia.com> <20250404124729.GH342109@nvidia.com> <20250404143031.GB1336818@nvidia.com> <20250407141626.GB1557073@nvidia.com> <20250407170305.GI1557073@nvidia.com>
In-Reply-To: <20250407170305.GI1557073@nvidia.com>
On Mon, Apr 07, 2025 at 02:03:05PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 07, 2025 at 07:31:21PM +0300, Mike Rapoport wrote:
> >
> > Ok, let's stick with memdesc then. Putting the name aside, it looks
> > like we do agree that KHO needs to provide a way to preserve memory
> > allocated from the buddy allocator, along with some of the metadata
> > describing that memory, like the order of multi-order allocations.
> +1
>
> > The issue I see with bitmaps is that there's nothing except the order
> > that we can save. And if sometime later we'd have to recreate the
> > memdesc for that memory, that would mean allocating the correct data
> > structure, i.e. struct folio, struct slab, maybe struct vmalloc.
>
> Yes. The caller would have to take care of this using a caller-specific
> serialization of any memdesc data. Slab, for example, would presumably
> have to record the object size and the object allocation bitmap.
>
> > I'm not sure we are going to preserve slabs, at least in the
> > foreseeable future, but vmalloc seems like something that we'd have to
> > address.
>
> And I suspect vmalloc doesn't need to preserve any memdesc information?
> It can all be recreated

vmalloc does not have anything in the memdesc now, just plain order-0
pages from the alloc_pages variants.

Now that we've settled on terminology, and given that currently memdesc ==
struct page, I think we need kho_preserve_folio(struct folio *folio) for
actual struct folios and, apparently, other high-order allocations, and
kho_preserve_pages(struct page *page, int nr) for memblock, vmalloc and
alloc_pages_exact().

On the restore path kho_restore_folio() will recreate the multi-order
thingy by doing parts of what prep_new_page() does, and kho_restore_pages()
will recreate order-0 pages as if they were allocated from buddy. If the
caller needs more in its memdesc, it is responsible for filling in the
missing bits.

> > > Also the bitmap scanning to optimize the memblock reserve isn't
> > > implemented for xarray.. I don't think this is representative..
> >
> > I believe that even with optimized bitmap scanning, a maple tree would
> > perform much better when the memory is not fragmented.
>
> Hard to guess, bitmap scanning is not free, especially if there are
> lots of zeros, but memory allocating maple tree nodes and locking them
> is not free either so who knows where things cross over..
>
> > And when it is fragmented both will need to call memblock_reserve() a
> > similar number of times and there won't be a real difference. Of
> > course the maple tree will consume much more memory in the worst case.
>
> Yes.
>
> bitmaps are bounded like the comment says, 512K for 16G of memory with
> arbitrary order-0 fragmentation.
>
> Assuming absolute worst-case fragmentation, a maple tree (@24 bytes per
> range, alternating allocated/freed pattern) would require around 50M.
> Then almost double that, since we have the maple tree and then the
> serialized copy.
>
> 100Mb vs 512k - I will pick the 512K :)

Nah, memory is cheap nowadays :)

Ok, let's start with bitmaps and then see what the actual bottlenecks are
that we have to optimize.

> Jason

-- 
Sincerely yours,
Mike.