linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn
@ 2023-04-23 10:59 Baolin Wang
  2023-04-23 10:59 ` [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() Baolin Wang
  2023-04-24  9:50 ` [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: Baolin Wang @ 2023-04-23 10:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, ying.huang, mgorman, vbabka, mhocko, david, baolin.wang,
	linux-mm, linux-kernel

We've already used pfn_to_online_page() for start pfn to make sure
it is online and valid, so the pfn_valid() for the start pfn is
unnecessary, drop it.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
Changes from v1:
 - Collect reviewed tags. Thanks David and Ying.
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9de2a18519a1..6457b64fe562 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1512,7 +1512,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 	/* end_pfn is one past the range we are checking */
 	end_pfn--;
 
-	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
+	if (!pfn_valid(end_pfn))
 		return NULL;
 
 	start_page = pfn_to_online_page(start_pfn);
-- 
2.27.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-23 10:59 [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Baolin Wang
@ 2023-04-23 10:59 ` Baolin Wang
  2023-04-24  2:24   ` Huang, Ying
  2023-04-24  9:54   ` Michal Hocko
  2023-04-24  9:50 ` [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Michal Hocko
  1 sibling, 2 replies; 14+ messages in thread
From: Baolin Wang @ 2023-04-23 10:59 UTC (permalink / raw)
  To: akpm
  Cc: rppt, ying.huang, mgorman, vbabka, mhocko, david, baolin.wang,
	linux-mm, linux-kernel

Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
checks whether the given zone contains holes, and uses pfn_to_online_page()
to validate if the start pfn is online and valid, as well as using pfn_valid()
to validate the end pfn.

However, the __pageblock_pfn_to_page() function may return non-NULL even
if the end pfn of a pageblock is in a memory hole in some situations. For
example, if the pageblock order is MAX_ORDER, which will fall into 2
sub-sections, and the end pfn of the pageblock may be hole even though
the start pfn is online and valid.

This did not break anything until now, but the zone continuous is fragile
in this possible scenario. So as previous discussion[1], it is better to
add some comments to explain this possible issue in case there are some
future pfn walkers that rely on this.

[1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
Changes from v1:
 - Update the comments per Ying and Mike, thanks.
---
 mm/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6457b64fe562..9756d66f471c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
  * interleaving within a single pageblock. It is therefore sufficient to check
  * the first and last page of a pageblock and avoid checking each individual
  * page in a pageblock.
+ *
+ * Note: the function may return non-NULL even if the end pfn of a pageblock
+ * is in a memory hole in some situations. For example, if the pageblock
+ * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
+ * of the pageblock may be hole even though the start pfn is online and valid.
+ * This did not break anything until now, but be careful about this possible
+ * issue when checking whether all pfns of a pageblock are valid.
  */
 struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 				     unsigned long end_pfn, struct zone *zone)
-- 
2.27.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-23 10:59 ` [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() Baolin Wang
@ 2023-04-24  2:24   ` Huang, Ying
  2023-04-24  9:54   ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Huang, Ying @ 2023-04-24  2:24 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, mgorman, vbabka, mhocko, david, linux-mm, linux-kernel

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> checks whether the given zone contains holes, and uses pfn_to_online_page()
> to validate if the start pfn is online and valid, as well as using pfn_valid()
> to validate the end pfn.
>
> However, the __pageblock_pfn_to_page() function may return non-NULL even
> if the end pfn of a pageblock is in a memory hole in some situations. For
> example, if the pageblock order is MAX_ORDER, which will fall into 2
> sub-sections, and the end pfn of the pageblock may be hole even though
> the start pfn is online and valid.
>
> This did not break anything until now, but the zone continuous is fragile
> in this possible scenario. So as previous discussion[1], it is better to
> add some comments to explain this possible issue in case there are some
> future pfn walkers that rely on this.
>
> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Looks good to me!  Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

> ---
> Changes from v1:
>  - Update the comments per Ying and Mike, thanks.
> ---
>  mm/page_alloc.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6457b64fe562..9756d66f471c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
>   * interleaving within a single pageblock. It is therefore sufficient to check
>   * the first and last page of a pageblock and avoid checking each individual
>   * page in a pageblock.
> + *
> + * Note: the function may return non-NULL even if the end pfn of a pageblock
> + * is in a memory hole in some situations. For example, if the pageblock
> + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> + * of the pageblock may be hole even though the start pfn is online and valid.
> + * This did not break anything until now, but be careful about this possible
> + * issue when checking whether all pfns of a pageblock are valid.
>   */
>  struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  				     unsigned long end_pfn, struct zone *zone)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn
  2023-04-23 10:59 [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Baolin Wang
  2023-04-23 10:59 ` [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() Baolin Wang
@ 2023-04-24  9:50 ` Michal Hocko
  2023-04-24 10:46   ` Baolin Wang
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2023-04-24  9:50 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel

On Sun 23-04-23 18:59:10, Baolin Wang wrote:
> We've already used pfn_to_online_page() for start pfn to make sure

Who is we? I do not see any note explicitly requiring that start_pfn has
to be valid for __pageblock_pfn_to_page.

> it is online and valid, so the pfn_valid() for the start pfn is
> unnecessary, drop it.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> ---
> Changes from v1:
>  - Collect reviewed tags. Thanks David and Ying.
> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9de2a18519a1..6457b64fe562 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1512,7 +1512,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  	/* end_pfn is one past the range we are checking */
>  	end_pfn--;
>  
> -	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> +	if (!pfn_valid(end_pfn))
>  		return NULL;
>  
>  	start_page = pfn_to_online_page(start_pfn);
> -- 
> 2.27.0

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-23 10:59 ` [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() Baolin Wang
  2023-04-24  2:24   ` Huang, Ying
@ 2023-04-24  9:54   ` Michal Hocko
  2023-04-24 11:20     ` Baolin Wang
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2023-04-24  9:54 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel

On Sun 23-04-23 18:59:11, Baolin Wang wrote:
> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> checks whether the given zone contains holes, and uses pfn_to_online_page()
> to validate if the start pfn is online and valid, as well as using pfn_valid()
> to validate the end pfn.
> 
> However, the __pageblock_pfn_to_page() function may return non-NULL even
> if the end pfn of a pageblock is in a memory hole in some situations. For
> example, if the pageblock order is MAX_ORDER, which will fall into 2
> sub-sections, and the end pfn of the pageblock may be hole even though
> the start pfn is online and valid.
> 
> This did not break anything until now, but the zone continuous is fragile
> in this possible scenario. So as previous discussion[1], it is better to
> add some comments to explain this possible issue in case there are some
> future pfn walkers that rely on this.
> 
> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/

Do I remember correctly you've had a specific configuration that would
trigger this case?

> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> Changes from v1:
>  - Update the comments per Ying and Mike, thanks.
> ---
>  mm/page_alloc.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6457b64fe562..9756d66f471c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
>   * interleaving within a single pageblock. It is therefore sufficient to check
>   * the first and last page of a pageblock and avoid checking each individual
>   * page in a pageblock.
> + *
> + * Note: the function may return non-NULL even if the end pfn of a pageblock
> + * is in a memory hole in some situations. For example, if the pageblock
> + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> + * of the pageblock may be hole even though the start pfn is online and valid.
> + * This did not break anything until now, but be careful about this possible
> + * issue when checking whether all pfns of a pageblock are valid.

It is not really clear what you should be doing (other than to be
careful which is not helpful much TBH) when you encounter this
situation. If the reality changes and this would break in the future
what would breakage look like? What should be done about that?

>   */
>  struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  				     unsigned long end_pfn, struct zone *zone)
> -- 
> 2.27.0

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn
  2023-04-24  9:50 ` [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Michal Hocko
@ 2023-04-24 10:46   ` Baolin Wang
  2023-04-24 10:54     ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Baolin Wang @ 2023-04-24 10:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel



On 4/24/2023 5:50 PM, Michal Hocko wrote:
> On Sun 23-04-23 18:59:10, Baolin Wang wrote:
>> We've already used pfn_to_online_page() for start pfn to make sure
> 
> Who is we? I do not see any note explicitly requiring that start_pfn has
> to be valid for __pageblock_pfn_to_page.

Sorry for confusing, what I mean is the __pageblock_pfn_to_page() 
function, which has used pfn_to_online_page() for start pfn. So the 
pfn_valid() in __pageblock_pfn_to_page() for start pfn is unnecessary.

I will update the commit log to make it clear.

>> it is online and valid, so the pfn_valid() for the start pfn is
>> unnecessary, drop it.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
>> ---
>> Changes from v1:
>>   - Collect reviewed tags. Thanks David and Ying.
>> ---
>>   mm/page_alloc.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 9de2a18519a1..6457b64fe562 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1512,7 +1512,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>   	/* end_pfn is one past the range we are checking */
>>   	end_pfn--;
>>   
>> -	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>> +	if (!pfn_valid(end_pfn))
>>   		return NULL;
>>   
>>   	start_page = pfn_to_online_page(start_pfn);
>> -- 
>> 2.27.0
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn
  2023-04-24 10:46   ` Baolin Wang
@ 2023-04-24 10:54     ` Michal Hocko
  2023-04-24 11:21       ` Baolin Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2023-04-24 10:54 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel

On Mon 24-04-23 18:46:40, Baolin Wang wrote:
> 
> 
> On 4/24/2023 5:50 PM, Michal Hocko wrote:
> > On Sun 23-04-23 18:59:10, Baolin Wang wrote:
> > > We've already used pfn_to_online_page() for start pfn to make sure
> > 
> > Who is we? I do not see any note explicitly requiring that start_pfn has
> > to be valid for __pageblock_pfn_to_page.
> 
> Sorry for confusing, what I mean is the __pageblock_pfn_to_page() function,
> which has used pfn_to_online_page() for start pfn. So the pfn_valid() in
> __pageblock_pfn_to_page() for start pfn is unnecessary.
> 
> I will update the commit log to make it clear.

Your comment suggested that the check _has_ already been done. Which is
not the case. pfn_to_online_page is called later in the function so I
guess you should rephrase as following:

"
__pageblock_pfn_to_page currently performs both pfn_valid check and
pfn_to_online_page. The former one is redundant because the latter is a
stronger check. Drop pfn_valid.
"

With that or something going along with that. Feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

> > > it is online and valid, so the pfn_valid() for the start pfn is
> > > unnecessary, drop it.
> > > 
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> > > ---
> > > Changes from v1:
> > >   - Collect reviewed tags. Thanks David and Ying.
> > > ---
> > >   mm/page_alloc.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 9de2a18519a1..6457b64fe562 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1512,7 +1512,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> > >   	/* end_pfn is one past the range we are checking */
> > >   	end_pfn--;
> > > -	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> > > +	if (!pfn_valid(end_pfn))
> > >   		return NULL;
> > >   	start_page = pfn_to_online_page(start_pfn);
> > > -- 
> > > 2.27.0
> > 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-24  9:54   ` Michal Hocko
@ 2023-04-24 11:20     ` Baolin Wang
  2023-04-24 11:34       ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Baolin Wang @ 2023-04-24 11:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel



On 4/24/2023 5:54 PM, Michal Hocko wrote:
> On Sun 23-04-23 18:59:11, Baolin Wang wrote:
>> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
>> checks whether the given zone contains holes, and uses pfn_to_online_page()
>> to validate if the start pfn is online and valid, as well as using pfn_valid()
>> to validate the end pfn.
>>
>> However, the __pageblock_pfn_to_page() function may return non-NULL even
>> if the end pfn of a pageblock is in a memory hole in some situations. For
>> example, if the pageblock order is MAX_ORDER, which will fall into 2
>> sub-sections, and the end pfn of the pageblock may be hole even though
>> the start pfn is online and valid.
>>
>> This did not break anything until now, but the zone continuous is fragile
>> in this possible scenario. So as previous discussion[1], it is better to
>> add some comments to explain this possible issue in case there are some
>> future pfn walkers that rely on this.
>>
>> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
> 
> Do I remember correctly you've had a specific configuration that would
> trigger this case?

Yes, I provided an example in previous thread [2] so show the 
__pageblock_pfn_to_page() is fragile in some cases.

[2] 
https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/

> 
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> Changes from v1:
>>   - Update the comments per Ying and Mike, thanks.
>> ---
>>   mm/page_alloc.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 6457b64fe562..9756d66f471c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
>>    * interleaving within a single pageblock. It is therefore sufficient to check
>>    * the first and last page of a pageblock and avoid checking each individual
>>    * page in a pageblock.
>> + *
>> + * Note: the function may return non-NULL even if the end pfn of a pageblock
>> + * is in a memory hole in some situations. For example, if the pageblock
>> + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
>> + * of the pageblock may be hole even though the start pfn is online and valid.
>> + * This did not break anything until now, but be careful about this possible
>> + * issue when checking whether all pfns of a pageblock are valid.
> 
> It is not really clear what you should be doing (other than to be
> careful which is not helpful much TBH) when you encounter this
> situation. If the reality changes and this would break in the future
> what would breakage look like? What should be done about that?

That depends on what the future pfn walkers do, which may access some 
hole memory with zero-init page frame. For example, if checking the 
__PageMovable() for a zero-init page frame, that will crash the system. 
But I can not list all the possible cases.

So how about below words?

  * Note: the function may return non-NULL even if the end pfn of a 
pageblock
  * is in a memory hole in some situations. For example, if the pageblock
  * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
  * of the pageblock may be hole even though the start pfn is online and 
valid.
  * This did not break anything until now, but be careful about this 
possible
  * issue when checking whether all pfns of a pageblock are valid, that may
  * lead to accessing empty page frame, and the worst case can crash the 
system.
  * So you should use pfn_to_onlie_page() instead of pfn_valid() to 
valid the
  * pfns in a pageblock if such case happens.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn
  2023-04-24 10:54     ` Michal Hocko
@ 2023-04-24 11:21       ` Baolin Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Baolin Wang @ 2023-04-24 11:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel



On 4/24/2023 6:54 PM, Michal Hocko wrote:
> On Mon 24-04-23 18:46:40, Baolin Wang wrote:
>>
>>
>> On 4/24/2023 5:50 PM, Michal Hocko wrote:
>>> On Sun 23-04-23 18:59:10, Baolin Wang wrote:
>>>> We've already used pfn_to_online_page() for start pfn to make sure
>>>
>>> Who is we? I do not see any note explicitly requiring that start_pfn has
>>> to be valid for __pageblock_pfn_to_page.
>>
>> Sorry for confusing, what I mean is the __pageblock_pfn_to_page() function,
>> which has used pfn_to_online_page() for start pfn. So the pfn_valid() in
>> __pageblock_pfn_to_page() for start pfn is unnecessary.
>>
>> I will update the commit log to make it clear.
> 
> Your comment suggested that the check _has_ already been done. Which is
> not the case. pfn_to_online_page is called later in the function so I
> guess you should rephrase as following:
> 
> "
> __pageblock_pfn_to_page currently performs both pfn_valid check and
> pfn_to_online_page. The former one is redundant because the latter is a
> stronger check. Drop pfn_valid.
> "

Yes, will change the commit log.

> 
> With that or something going along with that. Feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks.

>>>> it is online and valid, so the pfn_valid() for the start pfn is
>>>> unnecessary, drop it.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
>>>> ---
>>>> Changes from v1:
>>>>    - Collect reviewed tags. Thanks David and Ying.
>>>> ---
>>>>    mm/page_alloc.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 9de2a18519a1..6457b64fe562 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1512,7 +1512,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>>>    	/* end_pfn is one past the range we are checking */
>>>>    	end_pfn--;
>>>> -	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>>>> +	if (!pfn_valid(end_pfn))
>>>>    		return NULL;
>>>>    	start_page = pfn_to_online_page(start_pfn);
>>>> -- 
>>>> 2.27.0
>>>
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-24 11:20     ` Baolin Wang
@ 2023-04-24 11:34       ` Michal Hocko
  2023-04-24 11:40         ` Baolin Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2023-04-24 11:34 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel

On Mon 24-04-23 19:20:43, Baolin Wang wrote:
> 
> 
> On 4/24/2023 5:54 PM, Michal Hocko wrote:
> > On Sun 23-04-23 18:59:11, Baolin Wang wrote:
> > > Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> > > checks whether the given zone contains holes, and uses pfn_to_online_page()
> > > to validate if the start pfn is online and valid, as well as using pfn_valid()
> > > to validate the end pfn.
> > > 
> > > However, the __pageblock_pfn_to_page() function may return non-NULL even
> > > if the end pfn of a pageblock is in a memory hole in some situations. For
> > > example, if the pageblock order is MAX_ORDER, which will fall into 2
> > > sub-sections, and the end pfn of the pageblock may be hole even though
> > > the start pfn is online and valid.
> > > 
> > > This did not break anything until now, but the zone continuous is fragile
> > > in this possible scenario. So as previous discussion[1], it is better to
> > > add some comments to explain this possible issue in case there are some
> > > future pfn walkers that rely on this.
> > > 
> > > [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
> > 
> > Do I remember correctly you've had a specific configuration that would
> > trigger this case?
> 
> Yes, I provided an example in previous thread [2] so show the
> __pageblock_pfn_to_page() is fragile in some cases.
> 
> [2] https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/

Please make it a part of the changelog.
 
> > 
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---
> > > Changes from v1:
> > >   - Update the comments per Ying and Mike, thanks.
> > > ---
> > >   mm/page_alloc.c | 7 +++++++
> > >   1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 6457b64fe562..9756d66f471c 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
> > >    * interleaving within a single pageblock. It is therefore sufficient to check
> > >    * the first and last page of a pageblock and avoid checking each individual
> > >    * page in a pageblock.
> > > + *
> > > + * Note: the function may return non-NULL even if the end pfn of a pageblock
> > > + * is in a memory hole in some situations. For example, if the pageblock
> > > + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> > > + * of the pageblock may be hole even though the start pfn is online and valid.
> > > + * This did not break anything until now, but be careful about this possible
> > > + * issue when checking whether all pfns of a pageblock are valid.
> > 
> > It is not really clear what you should be doing (other than to be
> > careful which is not helpful much TBH) when you encounter this
> > situation. If the reality changes and this would break in the future
> > what would breakage look like? What should be done about that?
> 
> That depends on what the future pfn walkers do, which may access some hole
> memory with zero-init page frame. For example, if checking the
> __PageMovable() for a zero-init page frame, that will crash the system. But
> I can not list all the possible cases.
> 
> So how about below words?
> 
>  * Note: the function may return non-NULL even if the end pfn of a pageblock
>  * is in a memory hole in some situations. For example, if the pageblock
>  * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
>  * of the pageblock may be hole even though the start pfn is online and
> valid.
>  * This did not break anything until now, but be careful about this possible
>  * issue when checking whether all pfns of a pageblock are valid, that may
>  * lead to accessing empty page frame, and the worst case can crash the
> system.
>  * So you should use pfn_to_onlie_page() instead of pfn_valid() to valid the
>  * pfns in a pageblock if such case happens.

Does that mean that struct page is not initialized and PagePoisoned will
trigger or it is just zero-prefilled?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-24 11:34       ` Michal Hocko
@ 2023-04-24 11:40         ` Baolin Wang
  2023-04-24 12:08           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Baolin Wang @ 2023-04-24 11:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel



On 4/24/2023 7:34 PM, Michal Hocko wrote:
> On Mon 24-04-23 19:20:43, Baolin Wang wrote:
>>
>>
>> On 4/24/2023 5:54 PM, Michal Hocko wrote:
>>> On Sun 23-04-23 18:59:11, Baolin Wang wrote:
>>>> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
>>>> checks whether the given zone contains holes, and uses pfn_to_online_page()
>>>> to validate if the start pfn is online and valid, as well as using pfn_valid()
>>>> to validate the end pfn.
>>>>
>>>> However, the __pageblock_pfn_to_page() function may return non-NULL even
>>>> if the end pfn of a pageblock is in a memory hole in some situations. For
>>>> example, if the pageblock order is MAX_ORDER, which will fall into 2
>>>> sub-sections, and the end pfn of the pageblock may be hole even though
>>>> the start pfn is online and valid.
>>>>
>>>> This did not break anything until now, but the zone continuous is fragile
>>>> in this possible scenario. So as previous discussion[1], it is better to
>>>> add some comments to explain this possible issue in case there are some
>>>> future pfn walkers that rely on this.
>>>>
>>>> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
>>>
>>> Do I remember correctly you've had a specific configuration that would
>>> trigger this case?
>>
>> Yes, I provided an example in previous thread [2] so show the
>> __pageblock_pfn_to_page() is fragile in some cases.
>>
>> [2] https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/
> 
> Please make it a part of the changelog.

Sure.

>   
>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> Changes from v1:
>>>>    - Update the comments per Ying and Mike, thanks.
>>>> ---
>>>>    mm/page_alloc.c | 7 +++++++
>>>>    1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 6457b64fe562..9756d66f471c 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
>>>>     * interleaving within a single pageblock. It is therefore sufficient to check
>>>>     * the first and last page of a pageblock and avoid checking each individual
>>>>     * page in a pageblock.
>>>> + *
>>>> + * Note: the function may return non-NULL even if the end pfn of a pageblock
>>>> + * is in a memory hole in some situations. For example, if the pageblock
>>>> + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
>>>> + * of the pageblock may be hole even though the start pfn is online and valid.
>>>> + * This did not break anything until now, but be careful about this possible
>>>> + * issue when checking whether all pfns of a pageblock are valid.
>>>
>>> It is not really clear what you should be doing (other than to be
>>> careful which is not helpful much TBH) when you encounter this
>>> situation. If the reality changes and this would break in the future
>>> what would breakage look like? What should be done about that?
>>
>> That depends on what the future pfn walkers do, which may access some hole
>> memory with zero-init page frame. For example, if checking the
>> __PageMovable() for a zero-init page frame, that will crash the system. But
>> I can not list all the possible cases.
>>
>> So how about below words?
>>
>>   * Note: the function may return non-NULL even if the end pfn of a pageblock
>>   * is in a memory hole in some situations. For example, if the pageblock
>>   * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
>>   * of the pageblock may be hole even though the start pfn is online and
>> valid.
>>   * This did not break anything until now, but be careful about this possible
>>   * issue when checking whether all pfns of a pageblock are valid, that may
>>   * lead to accessing empty page frame, and the worst case can crash the
>> system.
>>   * So you should use pfn_to_onlie_page() instead of pfn_valid() to valid the
>>   * pfns in a pageblock if such case happens.
> 
> Does that mean that struct page is not initialized and PagePoisoned will
> trigger or it is just zero-prefilled?

In the example I provided[2], these page frames of the hole memory are 
zero-prefilled.

[2] 
https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-24 11:40         ` Baolin Wang
@ 2023-04-24 12:08           ` Michal Hocko
  2023-04-24 12:48             ` Baolin Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2023-04-24 12:08 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel

On Mon 24-04-23 19:40:30, Baolin Wang wrote:
> 
> 
> On 4/24/2023 7:34 PM, Michal Hocko wrote:
> > On Mon 24-04-23 19:20:43, Baolin Wang wrote:
> > > 
> > > 
> > > On 4/24/2023 5:54 PM, Michal Hocko wrote:
> > > > On Sun 23-04-23 18:59:11, Baolin Wang wrote:
> > > > > Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> > > > > checks whether the given zone contains holes, and uses pfn_to_online_page()
> > > > > to validate if the start pfn is online and valid, as well as using pfn_valid()
> > > > > to validate the end pfn.
> > > > > 
> > > > > However, the __pageblock_pfn_to_page() function may return non-NULL even
> > > > > if the end pfn of a pageblock is in a memory hole in some situations. For
> > > > > example, if the pageblock order is MAX_ORDER, which will fall into 2
> > > > > sub-sections, and the end pfn of the pageblock may be hole even though
> > > > > the start pfn is online and valid.
> > > > > 
> > > > > This did not break anything until now, but the zone continuous is fragile
> > > > > in this possible scenario. So as previous discussion[1], it is better to
> > > > > add some comments to explain this possible issue in case there are some
> > > > > future pfn walkers that rely on this.
> > > > > 
> > > > > [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
> > > > 
> > > > Do I remember correctly you've had a specific configuration that would
> > > > trigger this case?
> > > 
> > > Yes, I provided an example in previous thread [2] so show the
> > > __pageblock_pfn_to_page() is fragile in some cases.
> > > 
> > > [2] https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/
> > 
> > Please make it a part of the changelog.
> 
> Sure.
> 
> > > > 
> > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > ---
> > > > > Changes from v1:
> > > > >    - Update the comments per Ying and Mike, thanks.
> > > > > ---
> > > > >    mm/page_alloc.c | 7 +++++++
> > > > >    1 file changed, 7 insertions(+)
> > > > > 
> > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > > index 6457b64fe562..9756d66f471c 100644
> > > > > --- a/mm/page_alloc.c
> > > > > +++ b/mm/page_alloc.c
> > > > > @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
> > > > >     * interleaving within a single pageblock. It is therefore sufficient to check
> > > > >     * the first and last page of a pageblock and avoid checking each individual
> > > > >     * page in a pageblock.
> > > > > + *
> > > > > + * Note: the function may return non-NULL even if the end pfn of a pageblock
> > > > > + * is in a memory hole in some situations. For example, if the pageblock
> > > > > + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> > > > > + * of the pageblock may be hole even though the start pfn is online and valid.
> > > > > + * This did not break anything until now, but be careful about this possible
> > > > > + * issue when checking whether all pfns of a pageblock are valid.
> > > > 
> > > > It is not really clear what you should be doing (other than to be
> > > > careful which is not helpful much TBH) when you encounter this
> > > > situation. If the reality changes and this would break in the future
> > > > what would breakage look like? What should be done about that?
> > > 
> > > That depends on what the future pfn walkers do, which may access some hole
> > > memory with zero-init page frame. For example, if checking the
> > > __PageMovable() for a zero-init page frame, that will crash the system. But
> > > I can not list all the possible cases.
> > > 
> > > So how about below words?
> > > 
> > >   * Note: the function may return non-NULL even if the end pfn of a pageblock
> > >   * is in a memory hole in some situations. For example, if the pageblock
> > >   * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> > >   * of the pageblock may be hole even though the start pfn is online and
> > > valid.
> > >   * This did not break anything until now, but be careful about this possible
> > >   * issue when checking whether all pfns of a pageblock are valid, that may
> > >   * lead to accessing empty page frame, and the worst case can crash the
> > > system.
> > >   * So you should use pfn_to_onlie_page() instead of pfn_valid() to valid the
> > >   * pfns in a pageblock if such case happens.
> > 
> > Does that mean that struct page is not initialized and PagePoisoned will
> > trigger or it is just zero-prefilled?
> 
> In the example I provided[2], these page frames of the hole memory are
> zero-prefilled.

OK, so make _that_ explicit in the comment. Essentially you want to say
that there are cases where we have zero-initialized struct pages for
memory holes. In general no pfn walker should touch a physical memory
range for pfn where the struct page doesn't contain any metadata it
recognizes. Zero fill struct pages do not contain any distinguishable
state so that makes it less of a problem.

All that being said I would reformulate the comment as follows:

	* Note: the function may return non-NULL struct page even for a
	* page block which contains a memory hole (i.e. there is no
	* physical memory for a subset of the pfn range). This should be
	* safe most of the time because struct pages are still zero
	* pre-filled and pfn walkers shouldn't touch any physical memory
	* range for which they do not recognize any specific metadata in
	* struct pages.


> [2] https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-24 12:08           ` Michal Hocko
@ 2023-04-24 12:48             ` Baolin Wang
  2023-04-24 13:08               ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Baolin Wang @ 2023-04-24 12:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel



On 4/24/2023 8:08 PM, Michal Hocko wrote:
> On Mon 24-04-23 19:40:30, Baolin Wang wrote:
>>
>>
>> On 4/24/2023 7:34 PM, Michal Hocko wrote:
>>> On Mon 24-04-23 19:20:43, Baolin Wang wrote:
>>>>
>>>>
>>>> On 4/24/2023 5:54 PM, Michal Hocko wrote:
>>>>> On Sun 23-04-23 18:59:11, Baolin Wang wrote:
>>>>>> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
>>>>>> checks whether the given zone contains holes, and uses pfn_to_online_page()
>>>>>> to validate if the start pfn is online and valid, as well as using pfn_valid()
>>>>>> to validate the end pfn.
>>>>>>
>>>>>> However, the __pageblock_pfn_to_page() function may return non-NULL even
>>>>>> if the end pfn of a pageblock is in a memory hole in some situations. For
>>>>>> example, if the pageblock order is MAX_ORDER, which will fall into 2
>>>>>> sub-sections, and the end pfn of the pageblock may be hole even though
>>>>>> the start pfn is online and valid.
>>>>>>
>>>>>> This did not break anything until now, but the zone continuous is fragile
>>>>>> in this possible scenario. So as previous discussion[1], it is better to
>>>>>> add some comments to explain this possible issue in case there are some
>>>>>> future pfn walkers that rely on this.
>>>>>>
>>>>>> [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
>>>>>
>>>>> Do I remember correctly you've had a specific configuration that would
>>>>> trigger this case?
>>>>
>>>> Yes, I provided an example in previous thread [2] so show the
>>>> __pageblock_pfn_to_page() is fragile in some cases.
>>>>
>>>> [2] https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/
>>>
>>> Please make it a part of the changelog.
>>
>> Sure.
>>
>>>>>
>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>> ---
>>>>>> Changes from v1:
>>>>>>     - Update the comments per Ying and Mike, thanks.
>>>>>> ---
>>>>>>     mm/page_alloc.c | 7 +++++++
>>>>>>     1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index 6457b64fe562..9756d66f471c 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
>>>>>>      * interleaving within a single pageblock. It is therefore sufficient to check
>>>>>>      * the first and last page of a pageblock and avoid checking each individual
>>>>>>      * page in a pageblock.
>>>>>> + *
>>>>>> + * Note: the function may return non-NULL even if the end pfn of a pageblock
>>>>>> + * is in a memory hole in some situations. For example, if the pageblock
>>>>>> + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
>>>>>> + * of the pageblock may be hole even though the start pfn is online and valid.
>>>>>> + * This did not break anything until now, but be careful about this possible
>>>>>> + * issue when checking whether all pfns of a pageblock are valid.
>>>>>
>>>>> It is not really clear what you should be doing (other than to be
>>>>> careful which is not helpful much TBH) when you encounter this
>>>>> situation. If the reality changes and this would break in the future
>>>>> what would breakage look like? What should be done about that?
>>>>
>>>> That depends on what the future pfn walkers do, which may access some hole
>>>> memory with zero-init page frame. For example, if checking the
>>>> __PageMovable() for a zero-init page frame, that will crash the system. But
>>>> I can not list all the possible cases.
>>>>
>>>> So how about below words?
>>>>
>>>>    * Note: the function may return non-NULL even if the end pfn of a pageblock
>>>>    * is in a memory hole in some situations. For example, if the pageblock
>>>>    * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
>>>>    * of the pageblock may be hole even though the start pfn is online and
>>>> valid.
>>>>    * This did not break anything until now, but be careful about this possible
>>>>    * issue when checking whether all pfns of a pageblock are valid, that may
>>>>    * lead to accessing empty page frame, and the worst case can crash the
>>>> system.
>>>>    * So you should use pfn_to_onlie_page() instead of pfn_valid() to valid the
>>>>    * pfns in a pageblock if such case happens.
>>>
>>> Does that mean that struct page is not initialized and PagePoisoned will
>>> trigger or it is just zero-prefilled?
>>
>> In the example I provided[2], these page frames of the hole memory are
>> zero-prefilled.
> 
> OK, so make _that_ explicit in the comment. Essentially you want to say
> that there are cases where we have zero-initialized struct pages for
> memory holes. In general no pfn walker should touch a physical memory
> range for pfn where the struct page doesn't contain any metadata it
> recognizes. Zero fill struct pages do not contain any distinguishable
> state so that makes it less of a problem.
> 
> All that being said I would reformulate the comment as follows:
> 
> 	* Note: the function may return non-NULL struct page even for a
> 	* page block which contains a memory hole (i.e. there is no
> 	* physical memory for a subset of the pfn range). This should be
> 	* safe most of the time because struct pages are still zero
> 	* pre-filled and pfn walkers shouldn't touch any physical memory
> 	* range for which they do not recognize any specific metadata in
> 	* struct pages.

Thanks. That makes sense to me. A trivial thing is I still want to add 
the example in the comments to make it clear. Are you okay with below 
description?

+ * Note: the function may return non-NULL struct page even for a page block
+ * which contains a memory hole (i.e. there is no physical memory for a 
subset
+ * of the pfn range). For example, if the pageblock order is MAX_ORDER, 
which
+ * will fall into 2 sub-sections, and the end pfn of the pageblock may 
be hole
+ * even though the start pfn is online and valid. This should be safe 
most of
+ * the time because struct pages are still zero pre-filled and pfn walkers
+ * shouldn't touch any physical memory range for which they do not 
recognize
+ * any specific metadata in struct pages.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  2023-04-24 12:48             ` Baolin Wang
@ 2023-04-24 13:08               ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2023-04-24 13:08 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, rppt, ying.huang, mgorman, vbabka, david, linux-mm, linux-kernel

On Mon 24-04-23 20:48:32, Baolin Wang wrote:
> 
> 
> On 4/24/2023 8:08 PM, Michal Hocko wrote:
> > On Mon 24-04-23 19:40:30, Baolin Wang wrote:
> > > 
> > > 
> > > On 4/24/2023 7:34 PM, Michal Hocko wrote:
> > > > On Mon 24-04-23 19:20:43, Baolin Wang wrote:
> > > > > 
> > > > > 
> > > > > On 4/24/2023 5:54 PM, Michal Hocko wrote:
> > > > > > On Sun 23-04-23 18:59:11, Baolin Wang wrote:
> > > > > > > Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
> > > > > > > checks whether the given zone contains holes, and uses pfn_to_online_page()
> > > > > > > to validate if the start pfn is online and valid, as well as using pfn_valid()
> > > > > > > to validate the end pfn.
> > > > > > > 
> > > > > > > However, the __pageblock_pfn_to_page() function may return non-NULL even
> > > > > > > if the end pfn of a pageblock is in a memory hole in some situations. For
> > > > > > > example, if the pageblock order is MAX_ORDER, which will fall into 2
> > > > > > > sub-sections, and the end pfn of the pageblock may be hole even though
> > > > > > > the start pfn is online and valid.
> > > > > > > 
> > > > > > > This did not break anything until now, but the zone continuous is fragile
> > > > > > > in this possible scenario. So as previous discussion[1], it is better to
> > > > > > > add some comments to explain this possible issue in case there are some
> > > > > > > future pfn walkers that rely on this.
> > > > > > > 
> > > > > > > [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
> > > > > > 
> > > > > > Do I remember correctly you've had a specific configuration that would
> > > > > > trigger this case?
> > > > > 
> > > > > Yes, I provided an example in previous thread [2] so show the
> > > > > __pageblock_pfn_to_page() is fragile in some cases.
> > > > > 
> > > > > [2] https://lore.kernel.org/all/52dfdd2e-9c99-eac4-233e-59919a24323e@linux.alibaba.com/
> > > > 
> > > > Please make it a part of the changelog.
> > > 
> > > Sure.
> > > 
> > > > > > 
> > > > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > > > ---
> > > > > > > Changes from v1:
> > > > > > >     - Update the comments per Ying and Mike, thanks.
> > > > > > > ---
> > > > > > >     mm/page_alloc.c | 7 +++++++
> > > > > > >     1 file changed, 7 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > > > > index 6457b64fe562..9756d66f471c 100644
> > > > > > > --- a/mm/page_alloc.c
> > > > > > > +++ b/mm/page_alloc.c
> > > > > > > @@ -1502,6 +1502,13 @@ void __free_pages_core(struct page *page, unsigned int order)
> > > > > > >      * interleaving within a single pageblock. It is therefore sufficient to check
> > > > > > >      * the first and last page of a pageblock and avoid checking each individual
> > > > > > >      * page in a pageblock.
> > > > > > > + *
> > > > > > > + * Note: the function may return non-NULL even if the end pfn of a pageblock
> > > > > > > + * is in a memory hole in some situations. For example, if the pageblock
> > > > > > > + * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> > > > > > > + * of the pageblock may be hole even though the start pfn is online and valid.
> > > > > > > + * This did not break anything until now, but be careful about this possible
> > > > > > > + * issue when checking whether all pfns of a pageblock are valid.
> > > > > > 
> > > > > > It is not really clear what you should be doing (other than to be
> > > > > > careful which is not helpful much TBH) when you encounter this
> > > > > > situation. If the reality changes and this would break in the future
> > > > > > what would breakage look like? What should be done about that?
> > > > > 
> > > > > That depends on what the future pfn walkers do, which may access some hole
> > > > > memory with zero-init page frame. For example, if checking the
> > > > > __PageMovable() for a zero-init page frame, that will crash the system. But
> > > > > I can not list all the possible cases.
> > > > > 
> > > > > So how about below words?
> > > > > 
> > > > >    * Note: the function may return non-NULL even if the end pfn of a pageblock
> > > > >    * is in a memory hole in some situations. For example, if the pageblock
> > > > >    * order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn
> > > > >    * of the pageblock may be hole even though the start pfn is online and
> > > > > valid.
> > > > >    * This did not break anything until now, but be careful about this possible
> > > > >    * issue when checking whether all pfns of a pageblock are valid, that may
> > > > >    * lead to accessing empty page frame, and the worst case can crash the
> > > > > system.
> > > > >    * So you should use pfn_to_onlie_page() instead of pfn_valid() to valid the
> > > > >    * pfns in a pageblock if such case happens.
> > > > 
> > > > Does that mean that struct page is not initialized and PagePoisoned will
> > > > trigger or it is just zero-prefilled?
> > > 
> > > In the example I provided[2], these page frames of the hole memory are
> > > zero-prefilled.
> > 
> > OK, so make _that_ explicit in the comment. Essentially you want to say
> > that there are cases where we have zero-initialized struct pages for
> > memory holes. In general no pfn walker should touch a physical memory
> > range for pfn where the struct page doesn't contain any metadata it
> > recognizes. Zero fill struct pages do not contain any distinguishable
> > state so that makes it less of a problem.
> > 
> > All that being said I would reformulate the comment as follows:
> > 
> > 	* Note: the function may return non-NULL struct page even for a
> > 	* page block which contains a memory hole (i.e. there is no
> > 	* physical memory for a subset of the pfn range). This should be
> > 	* safe most of the time because struct pages are still zero
> > 	* pre-filled and pfn walkers shouldn't touch any physical memory
> > 	* range for which they do not recognize any specific metadata in
> > 	* struct pages.
> 
> Thanks. That makes sense to me. A trivial thing is I still want to add the
> example in the comments to make it clear. Are you okay with below
> description?
> 
> + * Note: the function may return non-NULL struct page even for a page block
> + * which contains a memory hole (i.e. there is no physical memory for a
> subset
> + * of the pfn range). For example, if the pageblock order is MAX_ORDER,
> which
> + * will fall into 2 sub-sections, and the end pfn of the pageblock may be
> hole
> + * even though the start pfn is online and valid. This should be safe most
> of
> + * the time because struct pages are still zero pre-filled and pfn walkers
> + * shouldn't touch any physical memory range for which they do not
> recognize
> + * any specific metadata in struct pages.

No objections of course. I do not see an additional value, quite
honestly but if somebody does then it doesn't hurt.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-24 13:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-23 10:59 [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Baolin Wang
2023-04-23 10:59 ` [PATCH v2 2/2] mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() Baolin Wang
2023-04-24  2:24   ` Huang, Ying
2023-04-24  9:54   ` Michal Hocko
2023-04-24 11:20     ` Baolin Wang
2023-04-24 11:34       ` Michal Hocko
2023-04-24 11:40         ` Baolin Wang
2023-04-24 12:08           ` Michal Hocko
2023-04-24 12:48             ` Baolin Wang
2023-04-24 13:08               ` Michal Hocko
2023-04-24  9:50 ` [PATCH v2 1/2] mm/page_alloc: drop the unnecessary pfn_valid() for start pfn Michal Hocko
2023-04-24 10:46   ` Baolin Wang
2023-04-24 10:54     ` Michal Hocko
2023-04-24 11:21       ` Baolin Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox