Message-ID: <3a68fabd-eaff-2164-5609-3a71fd4a7257@intel.com>
Date: Tue, 11 Jan 2022 11:46:37 -0800
Subject: Re: [PATCHv2 1/7] mm: Add support for unaccepted memory
Shutemov" , Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> <20220111113314.27173-2-kirill.shutemov@linux.intel.com> From: Dave Hansen In-Reply-To: <20220111113314.27173-2-kirill.shutemov@linux.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: deycsbfe9g8fsui1xkgyaokyztigjkhj Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="G/7BTdi4"; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf01.hostedemail.com: domain of dave.hansen@intel.com has no SPF policy when checking 134.134.136.100) smtp.mailfrom=dave.hansen@intel.com X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: BD3C34000B X-HE-Tag: 1641930403-518944 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > diff --git a/mm/memblock.c b/mm/memblock.c > index 1018e50566f3..6dfa594192de 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -1400,6 +1400,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size, > */ > kmemleak_alloc_phys(found, size, 0, 0); > > + accept_memory(found, found + size); > return found; > } This could use a comment. Looking at this, I also have to wonder if accept_memory() is a bit too generic. Should it perhaps be: cc_accept_memory() or cc_guest_accept_memory()? > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c5952749ad40..5707b4b5f774 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1064,6 +1064,7 @@ static inline void __free_one_page(struct page *page, > unsigned int max_order; > struct page *buddy; > bool to_tail; > + bool offline = PageOffline(page); > > max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order); > > @@ -1097,6 +1098,10 @@ static inline void __free_one_page(struct page *page, > clear_page_guard(zone, buddy, order, migratetype); > else > del_page_from_free_list(buddy, zone, order); > + > + if (PageOffline(buddy)) > + offline = true; > + > combined_pfn = buddy_pfn & pfn; > page = page + (combined_pfn - pfn); > pfn = combined_pfn; > @@ -1130,6 +1135,9 @@ static inline void __free_one_page(struct page *page, > done_merging: > set_buddy_order(page, order); > > + if (offline) > + __SetPageOffline(page); > + > if (fpi_flags & FPI_TO_TAIL) > to_tail = true; > else if (is_shuffle_order(order)) This is touching some pretty hot code paths. You mention both that accepting memory is slow and expensive, yet you're doing it in the core allocator. That needs at least some discussion in the changelog. > @@ -1155,7 +1163,8 @@ static inline void __free_one_page(struct page *page, > static inline bool page_expected_state(struct page *page, > unsigned long check_flags) > { > - if (unlikely(atomic_read(&page->_mapcount) != -1)) > + if (unlikely(atomic_read(&page->_mapcount) != -1) && > + !PageOffline(page)) > return false; Looking at stuff like this, I can't help but think that a: #define PageOffline PageUnaccepted and some other renaming would be a fine idea. 
I get that the Offline bit can be reused, but I'm not sure that the
"Offline" *naming* should be reused.  What you're doing here is logically
distinct from existing offlining.

>  	if (unlikely((unsigned long)page->mapping |
> @@ -1734,6 +1743,8 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
>  {
>  	if (early_page_uninitialised(pfn))
>  		return;
> +
> +	maybe_set_page_offline(page, order);
>  	__free_pages_core(page, order);
>  }
>  
> @@ -1823,10 +1834,12 @@ static void __init deferred_free_range(unsigned long pfn,
>  	if (nr_pages == pageblock_nr_pages &&
>  	    (pfn & (pageblock_nr_pages - 1)) == 0) {
>  		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +		maybe_set_page_offline(page, pageblock_order);
>  		__free_pages_core(page, pageblock_order);
>  		return;
>  	}
>  
> +	accept_memory(pfn << PAGE_SHIFT, (pfn + nr_pages) << PAGE_SHIFT);
>  	for (i = 0; i < nr_pages; i++, page++, pfn++) {
>  		if ((pfn & (pageblock_nr_pages - 1)) == 0)
>  			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> @@ -2297,6 +2310,9 @@ static inline void expand(struct zone *zone, struct page *page,
>  		if (set_page_guard(zone, &page[size], high, migratetype))
>  			continue;
>  
> +		if (PageOffline(page))
> +			__SetPageOffline(&page[size]);

Yeah, this is really begging for comments.  Please add some.

>  		add_to_free_list(&page[size], zone, high, migratetype);
>  		set_buddy_order(&page[size], high);
>  	}
> @@ -2393,6 +2409,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
>  	 */
>  	kernel_unpoison_pages(page, 1 << order);
>  
> +	if (PageOffline(page))
> +		accept_and_clear_page_offline(page, order);
> +
>  	/*
>  	 * As memory initialization might be integrated into KASAN,
>  	 * kasan_alloc_pages and kernel_init_free_pages must be

I guess once there are no more PageOffline() pages in the allocator, the
only impact from these patches will be a bunch of conditional branches
from the "if (PageOffline(page))" that always have the same result.  The
branch predictors should do a good job with that.

*BUT*, that overhead is going to be universally inflicted on all users on
x86, even those without TDX.  I guess the compiler will save non-x86
users because they'll have an empty stub for
accept_and_clear_page_offline() which the compiler will optimize away.
It sure would be nice to have some changelog material about why this is
OK, though.

This is especially true since there's a global spinlock hidden in
accept_and_clear_page_offline() wrapping a slow and "costly" operation.
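For reference, the stub arrangement I'm assuming for the non-x86 (and
!TDX) case is something like the sketch below.  The Kconfig symbol name
is made up for illustration, not taken from this series:

	#include <linux/types.h>	/* phys_addr_t */

	struct page;

	/* Sketch only: CONFIG_UNACCEPTED_MEMORY is a placeholder name. */
	#ifdef CONFIG_UNACCEPTED_MEMORY
	void accept_memory(phys_addr_t start, phys_addr_t end);
	void accept_and_clear_page_offline(struct page *page, unsigned int order);
	#else
	/* Compiles away entirely: all that remains at the call sites is
	 * the PageOffline() test itself. */
	static inline void accept_memory(phys_addr_t start, phys_addr_t end) { }
	static inline void accept_and_clear_page_offline(struct page *page,
							 unsigned int order) { }
	#endif

That keeps the other architectures clean, but it still leaves the
question above about x86 systems that never see any unaccepted memory.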