Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory
From: David Hildenbrand
Organization: Red Hat
Date: Fri, 5 Aug 2022 14:09:31 +0200
To: Vlastimil Babka, "Kirill A. Shutemov", Borislav Petkov, Andy Lutomirski,
 Sean Christopherson, Andrew Morton, Joerg Roedel, Ard Biesheuvel
Cc: Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes, Tom Lendacky,
 Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar, Dario Faggioli,
 Dave Hansen, Mike Rapoport, marcelo.cerri@canonical.com,
 tim.gardner@canonical.com, khalid.elmously@canonical.com,
 philip.cox@canonical.com, x86@kernel.org, linux-mm@kvack.org,
 linux-coco@lists.linux.dev, linux-efi@vger.kernel.org,
 linux-kernel@vger.kernel.org, Mike Rapoport, Mel Gorman
In-Reply-To: <8cf143e7-2b62-1a1e-de84-e3dcc6c027a4@suse.cz>
References: <20220614120231.48165-1-kirill.shutemov@linux.intel.com>
 <20220614120231.48165-3-kirill.shutemov@linux.intel.com>
 <8cf143e7-2b62-1a1e-de84-e3dcc6c027a4@suse.cz>

On 05.08.22 13:49, Vlastimil Babka wrote:
> On 6/14/22 14:02, Kirill A. Shutemov wrote:
>> UEFI Specification version 2.9 introduces the concept of memory
>> acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
>> SEV-SNP, require memory to be accepted before it can be used by the
>> guest. Accepting happens via a protocol specific to the Virtual Machine
>> platform.
>>
>> There are several ways the kernel can deal with unaccepted memory:
>>
>> 1. Accept all the memory during the boot. It is easy to implement and
>>    it doesn't have runtime cost once the system is booted. The downside
>>    is a very long boot time.
>>
>>    Accepting can be parallelized across multiple CPUs to keep it
>>    manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to
>>    saturate memory bandwidth and does not scale beyond a point.
>>
>> 2. Accept a block of memory on first use. It requires more
>>    infrastructure and changes in the page allocator to make it work,
>>    but it provides good boot time.
>>
>>    On-demand memory acceptance means latency spikes every time the
>>    kernel steps onto a new memory block. The spikes will go away once
>>    the workload's data set size stabilizes or all memory gets accepted.
>>
>> 3. Accept all memory in the background. Introduce a thread (or
>>    multiple) that gets memory accepted proactively. It will minimize
>>    the time the system experiences latency spikes on memory allocation
>>    while keeping boot time low.
>>
>>    This approach cannot function on its own. It is an extension of #2:
>>    background memory acceptance requires a functional scheduler, but
>>    the page allocator may need to tap into unaccepted memory before
>>    that.
>>
>>    The downside of the approach is that these threads also steal CPU
>>    cycles and memory bandwidth from the user's workload and may hurt
>>    user experience.
>>
>> Implement #2 for now. It is a reasonable default. Some workloads may
>> want to use #1 or #3 and they can be implemented later based on user
>> demand.
>>
>> Support of unaccepted memory requires a few changes in core-mm code:
>>
>>  - memblock has to accept memory on allocation;
>>
>>  - the page allocator has to accept memory on the first allocation of
>>    the page.
>>
>> The memblock change is trivial.
>>
>> The page allocator is modified to accept pages on the first allocation.
>> The new page type (encoded in the _mapcount) -- PageUnaccepted() -- is
>> used to indicate that the page requires acceptance.
>>
>> An architecture has to provide two helpers if it wants to support
>> unaccepted memory:
>>
>>  - accept_memory() makes a range of physical addresses accepted.
>>
>>  - range_contains_unaccepted_memory() checks whether anything within
>>    the range of physical addresses requires acceptance.
>>
>> Signed-off-by: Kirill A. Shutemov
>> Acked-by: Mike Rapoport   # memblock
>> Reviewed-by: David Hildenbrand
>
> Hmm, I realize it's not ideal to raise this at v7, and maybe it was
> discussed before, but it's really not great how this affects the core
> page allocator paths. Wouldn't it be possible to only release pages to
> the page allocator when accepted, and otherwise use some new per-zone
> variables together with the bitmap to track how much is left to accept
> and where? Then it could be hooked into get_page_from_freelist()
> similarly to CONFIG_DEFERRED_STRUCT_PAGE_INIT - if we fail
> zone_watermark_fast() and there are unaccepted pages in the zone, accept
> them and continue. With a static key to flip in case we eventually
> accept everything. Because this is a really similar scenario to the
> deferred init, and that one was solved in a way that adds minimal
> overhead.
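If I understand the suggestion correctly, it would be something like the
below in the zone loop of get_page_from_freelist(), next to the existing
deferred-init handling. Rough sketch only, untested; zone->unaccepted_pages
and try_to_accept_memory() are names I'm inventing here for illustration,
they are not from this patch set:

/* Flipped off once all zones have been fully accepted. */
static DEFINE_STATIC_KEY_FALSE(zones_with_unaccepted_pages);

static inline bool zone_has_unaccepted_pages(struct zone *zone)
{
	return static_branch_unlikely(&zones_with_unaccepted_pages) &&
	       atomic_long_read(&zone->unaccepted_pages);
}

/* In get_page_from_freelist(), where the fast watermark check fails: */
	if (!zone_watermark_fast(zone, order, mark,
				 ac->highest_zoneidx, alloc_flags,
				 gfp_mask)) {
		/*
		 * Try to accept more memory for this zone before falling
		 * back to reclaim, similar to what the deferred struct
		 * page init path does at this point.
		 */
		if (zone_has_unaccepted_pages(zone) &&
		    try_to_accept_memory(zone, order))
			goto try_this_zone;
		...
	}

That would keep acceptance out of the free/alloc hot paths entirely, at the
price of the free counters not reflecting the not-yet-accepted memory.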
I kind of like just having the memory stats being correct (e.g., free
memory) and acceptance being an internal detail to be triggered when
allocating pages -- just like the arch_alloc_page() callback.

I'm sure we could also optimize for the !unaccepted-memory case in this
version via static keys, with some checks at the right places, if we find
this to hurt performance?
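For the static-key idea, I was thinking of something as simple as the
below on top of this version. Again just a rough sketch, untested; the key
and the accept_page() helper name are made up here, only PageUnaccepted()
is from the patch:

/* Enabled during boot if any unaccepted memory was detected. */
static DEFINE_STATIC_KEY_FALSE(page_acceptance_needed);

/* Called from the page allocation path for freshly allocated pages. */
static inline void accept_page_if_needed(struct page *page,
					 unsigned int order)
{
	/* Compiles down to a NOP once everything has been accepted. */
	if (!static_branch_unlikely(&page_acceptance_needed))
		return;

	if (PageUnaccepted(page))
		accept_page(page, order);	/* made-up helper name */
}

-- 
Thanks,

David / dhildenb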