From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <0a8bd481-0b97-416d-935e-84828016445d@redhat.com>
Date: Thu, 20 Feb 2025 15:36:46 -0500
From: Luiz Capitulino <luizcap@redhat.com>
To: David Hildenbrand, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 yuzhao@google.com, pasha.tatashin@soleen.com
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, muchun.song@linux.dev
Subject: Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
In-Reply-To: <4cb93166-29fd-4aea-965b-5dfb62d4dc8c@redhat.com>
References: <3f0e058aef3951b39cf6bb4259c247352d4fe736.1739931468.git.luizcap@redhat.com>
 <4cb93166-29fd-4aea-965b-5dfb62d4dc8c@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025-02-20 05:59, David Hildenbrand wrote:
> On 19.02.25 03:17, Luiz Capitulino wrote:
>> The page extension implementation assumes that all page extensions of
>> a given page order are stored in the same memory section. The function
>> page_ext_next() relies on this assumption by adding an offset to the
>> current object to return the next adjacent page extension.
>>
>> This behavior works as expected for flatmem but fails for sparsemem when
>> using 1G pages. The commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
>> gigantic folios") exposes this issue, making it possible to trigger a
>> crash when using the page_owner or page_table_check page extensions.
>>
>> The problem is that for 1G pages, the page extensions may span memory
>> section boundaries and be stored in different memory sections. This issue
>> was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
>> for gigantic folios") because alloc_contig_pages() never passed more than
>> MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing that
>> commit changed this behavior, allowing the full 1G page order to be
>> passed.
>>
>> Reproducer:
>>
>>   1. Build the kernel with CONFIG_SPARSEMEM=y and page extensions
>>      support
>>   2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command-line
>>   3. Reserve one 1G page at run-time; this should crash (backtrace below)
>>
>> To address this issue, this commit introduces a new API for iterating
>> through page extensions. The main iteration loops are for_each_page_ext()
>> and for_each_page_ext_order(). Both must be called with the RCU read
>> lock taken. Here's a usage example:
>>
>> """
>> struct page_ext_iter iter;
>> struct page_ext *page_ext;
>>
>> ...
>>
>> rcu_read_lock();
>> for_each_page_ext_order(page, order, page_ext, iter) {
>>     struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
>>     ...
>> }
>> rcu_read_unlock();
>> """
>>
>
> [...]
>
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
>> +
>> +/**
>> + * page_ext_iter_get() - Get current page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Return: NULL if no page_ext exists for this iterator.
>> + */
>> +static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
>> +{
>> +    return iter->page_ext;
>> +}
>> +
>> +/**
>> + * for_each_page_ext(): iterate through page_ext objects.
>> + * @__page: the page we're interested in
>> + * @__pgcount: how many pages to iterate through
>> + * @__page_ext: struct page_ext pointer where the current page_ext
>> + *              object is returned
>> + * @__iter: struct page_ext_iter object (defined in the stack)
>> + *
>> + * IMPORTANT: must be called with RCU read lock taken.
>> + */
>> +#define for_each_page_ext(__page, __pgcount, __page_ext, __iter) \
>> +    __page_ext = page_ext_iter_begin(&__iter, __page);     \
>
> Doing stuff out of the loop is dangerous. Assume something does
>
> if (xxx)
>     for_each_page_ext()
>
> Just move that inside the for().
>
> for (__page_ext = page_ext_iter_begin(&__iter, __page), __iter.index = 0)

Oh, good point. Will do the change.
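Something like this, I think (rough, untested sketch of the reworked macro;
all names are the ones from this patch, the only change is folding the setup
into the for() initializer):

/* Untested sketch, not the final patch: the iterator setup and the index
 * reset both happen inside the for() initializer, so the macro expands to a
 * single statement and stays safe under an unbraced if/else. */
#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)   \
    for (__page_ext = page_ext_iter_begin(&__iter, __page),        \
         __iter.index = 0;                                          \
         __page_ext && __iter.index < __pgcount;                    \
         __page_ext = page_ext_iter_next(&__iter),                  \
         __iter.index++)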
>> +    for (__iter.index = 0;                                 \
>> +        __page_ext && __iter.index < __pgcount;        \
>> +        __page_ext = page_ext_iter_next(&__iter),      \
>> +        __iter.index++)
>
> Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn, and
> only adjust the index in page_ext_iter_next?
>
> Then you can set the index to 0 in page_ext_iter_begin() and have here
>
> for (__page_ext = page_ext_iter_begin(&__iter, __page),
>      __page_ext && __iter.index < __pgcount,
>      __page_ext = page_ext_iter_next(&__iter);)

I can do this if you feel strongly about it, but I prefer explicit over
implicit. I moved the index into the iter object just to avoid having to
define it in the macro's body. Also, the way I did it allows for using
page_ext_iter_begin()/page_ext_iter_next() on their own if the need arises.

> A page_ext_iter_reset() could then simply reset the index=0 and
> lookup the page_ext(start_pfn + index) == page_ext(start_pfn)

Just note we don't have page_ext_iter_reset() today (and I guess it's not
needed).

>> +
>> +/**
>> + * for_each_page_ext_order(): iterate through page_ext objects
>> + *                            for a given page order
>> + * @__page: the page we're interested in
>> + * @__order: page order to iterate through
>> + * @__page_ext: struct page_ext pointer where the current page_ext
>> + *              object is returned
>> + * @__iter: struct page_ext_iter object (defined in the stack)
>> + *
>> + * IMPORTANT: must be called with RCU read lock taken.
>> + */
>> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
>> +    for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
>> +
>>   #else /* !CONFIG_PAGE_EXTENSION */
>>   struct page_ext;
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 641d93f6af4c1..508deb04d5ead 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>>       rcu_read_unlock();
>>   }
>> +
>> +/**
>> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
>> + * @iter: page extension iterator.
>> + * @page: The page we're interested in.
>> + *
>> + * Must be called with RCU read lock taken.
>> + *
>> + * Return: NULL if no page_ext exists for this page.
>> + */
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
>> +{
>> +    iter->pfn = page_to_pfn(page);
>> +    iter->page_ext = lookup_page_ext(page);
>> +
>> +    return iter->page_ext;
>> +}
>> +
>> +/**
>> + * page_ext_iter_next() - Get next page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Must be called with RCU read lock taken.
>> + *
>> + * Return: NULL if no next page_ext exists.
>> + */
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
>> +{
>> +    if (WARN_ON_ONCE(!iter->page_ext))
>> +        return NULL;
>> +
>> +    iter->pfn++;
>> +
>> +    if (page_ext_iter_next_fast_possible(iter->pfn)) {
>> +        iter->page_ext = page_ext_next(iter->page_ext);
>> +    } else {
>> +        iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
>> +    }
>> +
>> +    return iter->page_ext;
>> +}
>
> We now always have a function call when calling into page_ext_iter_next().
> Could we move that to the header and rather expose lookup_page_ext()?

I personally don't like over-using inline functions, and I don't think this
code needs optimization, since the current clients make the affected code
paths slow anyway (this also applies to the likely/unlikely use in page_owner
and page_table_check; I'd drop all of them if you ask me). But again, I can
change it if this would prevent you from giving your ACK :)
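For context, this is roughly what the conversion looks like from a client's
point of view. It's a made-up client (__set_my_flag() is a placeholder, not a
real symbol, and declarations are omitted), not an actual hunk from this
series:

/*
 * Old pattern: page_ext_next() just adds an offset, so it only works while
 * all page_ext objects live in the same memory section. A 1G range that
 * crosses a section boundary walks into the wrong (or unmapped) storage.
 */
page_ext = lookup_page_ext(page);
for (i = 0; i < (1 << order); i++) {
    __set_my_flag(page_ext);
    page_ext = page_ext_next(page_ext);
}

/*
 * New pattern: page_ext_iter_next() re-does the lookup whenever it can't
 * take the fast path, so the same loop is safe for any order, including
 * 1G pages.
 */
rcu_read_lock();
for_each_page_ext_order(page, order, page_ext, iter)
    __set_my_flag(page_ext);
rcu_read_unlock();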