linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2 2/4] mm: remap unused subpages to shared zeropage when splitting isolated thp
       [not found]   ` <20240807200241.GB1828817@cmpxchg.org>
@ 2024-08-08 15:53     ` Usama Arif
  0 siblings, 0 replies; 6+ messages in thread
From: Usama Arif @ 2024-08-08 15:53 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, linux-mm, riel, shakeel.butt, roman.gushchin, yuzhao,
	david, baohua, ryan.roberts, rppt, willy, cerasuolodomenico,
	corbet, linux-kernel, linux-doc, kernel-team, Shuang Zhai



On 07/08/2024 21:02, Johannes Weiner wrote:
> On Wed, Aug 07, 2024 at 02:46:47PM +0100, Usama Arif wrote:
>> @@ -177,13 +177,56 @@ void putback_movable_pages(struct list_head *l)
>>  	}
>>  }
>>  
>> +static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>> +					  struct folio *folio,
>> +					  unsigned long idx)
>> +{
>> +	struct page *page = folio_page(folio, idx);
>> +	bool contains_data;
>> +	pte_t newpte;
>> +	void *addr;
>> +
>> +	VM_BUG_ON_PAGE(PageCompound(page), page);
>> +	VM_BUG_ON_PAGE(!PageAnon(page), page);
>> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
>> +	VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page);
>> +
>> +	if (PageMlocked(page) || (pvmw->vma->vm_flags & VM_LOCKED))
>> +		return false;
>> +
>> +	/*
>> +	 * The pmd entry mapping the old thp was flushed and the pte mapping
>> +	 * this subpage has been non present. If the subpage is only zero-filled
>> +	 * then map it to the shared zeropage.
>> +	 */
>> +	addr = kmap_local_page(page);
>> +	contains_data = memchr_inv(addr, 0, PAGE_SIZE);
>> +	kunmap_local(addr);
>> +
>> +	if (contains_data || mm_forbids_zeropage(pvmw->vma->vm_mm))
>> +		return false;
>> +
>> +	newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
>> +					pvmw->vma->vm_page_prot));
> 
> Why not use my_zero_pfn() here? On many configurations this just
> returns zero_pfn and avoids the indirection through mem_map.
> 
>> @@ -904,7 +958,7 @@ static int writeout(struct address_space *mapping, struct folio *folio)
>>  	 * At this point we know that the migration attempt cannot
>>  	 * be successful.
>>  	 */
>> -	remove_migration_ptes(folio, folio, false);
>> +	remove_migration_ptes(folio, folio, false, false);
> 
> bool params are not great for callsite readability.
> 
> How about a flags parameter and using names?
> 
> enum rmp_flags {
> 	RMP_LOCKED	= 1 << 0,
> 	RMP_ZEROPAGES	= 1 << 1,
> }

Thanks! Will include both of the above changes in the next revision.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 4/4] mm: split underutilized THPs
       [not found] ` <20240807134732.3292797-5-usamaarif642@gmail.com>
@ 2024-08-08 15:55   ` David Hildenbrand
  2024-08-09 10:31     ` Usama Arif
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-08-08 15:55 UTC (permalink / raw)
  To: Usama Arif, akpm, linux-mm
  Cc: hannes, riel, shakeel.butt, roman.gushchin, yuzhao, baohua,
	ryan.roberts, rppt, willy, cerasuolodomenico, corbet,
	linux-kernel, linux-doc, kernel-team

On 07.08.24 15:46, Usama Arif wrote:
> This is an attempt to mitigate the issue of running out of memory when THP
> is always enabled. During runtime whenever a THP is being faulted in
> (__do_huge_pmd_anonymous_page) or collapsed by khugepaged
> (collapse_huge_page), the THP is added to  _deferred_list. Whenever memory
> reclaim happens in linux, the kernel runs the deferred_split
> shrinker which goes through the _deferred_list.
> 
> If the folio was partially mapped, the shrinker attempts to split it.
> A new boolean is added to be able to distinguish between partially
> mapped folios and others in the deferred_list at split time in
> deferred_split_scan. Its needed as __folio_remove_rmap decrements
> the folio mapcount elements, hence it won't be possible to distinguish
> between partially mapped folios and others in deferred_split_scan
> without the boolean.

Just so I get this right: Are you saying that we might now add fully 
mapped folios to the deferred split queue and that's what you want to 
distinguish?

If that's the case, then could we use a bit in folio->_flags_1 instead?

Further, I think you forgot to update at least one instance if a 
list_empty(&folio->_deferred_list) check where we want to detect 
"partially mapped". Please go over all and see what needs adjustments.

I would actually suggest to split decoupling of "_deferred_list" and 
"partially mapped" into a separate preparation patch.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/4] mm: free zapped tail pages when splitting isolated thp
       [not found] ` <20240807134732.3292797-2-usamaarif642@gmail.com>
@ 2024-08-08 15:56   ` David Hildenbrand
  0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-08-08 15:56 UTC (permalink / raw)
  To: Usama Arif, akpm, linux-mm
  Cc: hannes, riel, shakeel.butt, roman.gushchin, yuzhao, baohua,
	ryan.roberts, rppt, willy, cerasuolodomenico, corbet,
	linux-kernel, linux-doc, kernel-team, Shuang Zhai

On 07.08.24 15:46, Usama Arif wrote:
> From: Yu Zhao <yuzhao@google.com>
> 
> If a tail page has only two references left, one inherited from the
> isolation of its head and the other from lru_add_page_tail() which we
> are about to drop, it means this tail page was concurrently zapped.
> Then we can safely free it and save page reclaim or migration the
> trouble of trying it.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> Tested-by: Shuang Zhai <zhais@google.com>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
>   mm/huge_memory.c | 27 +++++++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0167dc27e365..35c1089d8d61 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2923,7 +2923,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>   	unsigned int new_nr = 1 << new_order;
>   	int order = folio_order(folio);
>   	unsigned int nr = 1 << order;
> +	struct folio_batch free_folios;
>   
> +	folio_batch_init(&free_folios);
>   	/* complete memcg works before add pages to LRU */
>   	split_page_memcg(head, order, new_order);
>   
> @@ -3007,6 +3009,26 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>   		if (subpage == page)
>   			continue;
>   		folio_unlock(new_folio);
> +		/*
> +		 * If a folio has only two references left, one inherited
> +		 * from the isolation of its head and the other from
> +		 * lru_add_page_tail() which we are about to drop, it means this
> +		 * folio was concurrently zapped. Then we can safely free it
> +		 * and save page reclaim or migration the trouble of trying it.
> +		 */
> +		if (list && page_ref_freeze(subpage, 2)) {
> +			VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
> +			VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
> +			VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
> +
> +			folio_clear_active(new_folio);
> +			folio_clear_unevictable(new_folio);
> +			if (folio_batch_add(&free_folios, folio) == 0) {

nit:  "!folio_batch_add(&free_folios, folio)"

Nothing else jumped at me.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 4/4] mm: split underutilized THPs
  2024-08-08 15:55   ` [PATCH v2 4/4] mm: split underutilized THPs David Hildenbrand
@ 2024-08-09 10:31     ` Usama Arif
  2024-08-09 13:21       ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Usama Arif @ 2024-08-09 10:31 UTC (permalink / raw)
  To: David Hildenbrand, akpm, linux-mm
  Cc: hannes, riel, shakeel.butt, roman.gushchin, yuzhao, baohua,
	ryan.roberts, rppt, willy, cerasuolodomenico, corbet,
	linux-kernel, linux-doc, kernel-team



On 08/08/2024 16:55, David Hildenbrand wrote:
> On 07.08.24 15:46, Usama Arif wrote:
>> This is an attempt to mitigate the issue of running out of memory when THP
>> is always enabled. During runtime whenever a THP is being faulted in
>> (__do_huge_pmd_anonymous_page) or collapsed by khugepaged
>> (collapse_huge_page), the THP is added to  _deferred_list. Whenever memory
>> reclaim happens in linux, the kernel runs the deferred_split
>> shrinker which goes through the _deferred_list.
>>
>> If the folio was partially mapped, the shrinker attempts to split it.
>> A new boolean is added to be able to distinguish between partially
>> mapped folios and others in the deferred_list at split time in
>> deferred_split_scan. Its needed as __folio_remove_rmap decrements
>> the folio mapcount elements, hence it won't be possible to distinguish
>> between partially mapped folios and others in deferred_split_scan
>> without the boolean.
> 
> Just so I get this right: Are you saying that we might now add fully mapped folios to the deferred split queue and that's what you want to distinguish?

Yes

> 
> If that's the case, then could we use a bit in folio->_flags_1 instead?
Yes, thats a good idea. Will create the below flag for the next revision

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5769fe6e4950..5825bd1cf6db 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -189,6 +189,11 @@ enum pageflags {
 
 #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
 
+enum folioflags_1 {
+       /* The first 8 bits of folio->_flags_1 are used to keep track of folio order */
+       FOLIO_PARTIALLY_MAPPED = 8,     /* folio is partially mapped */
+}
+
 #ifndef __GENERATING_BOUNDS_H


and use set/clear/test_bit(FOLIO_PARTIALLY_MAPPED, &folio->_flags_1) in the respective places.

> 
> Further, I think you forgot to update at least one instance if a list_empty(&folio->_deferred_list) check where we want to detect "partially mapped". Please go over all and see what needs adjustments.
> 

Ah I think its the one in free_tail_page_prepare? The way I wrote this patch is by going through all instances of "folio->_deferred_list" and deciding if partially_mapped needs to be set/cleared/tested. I think I missed it when rebasing to mm-unstable. Double checked now and the only one missing is free_tail_page_prepare ([1] was removed recently by Barry)

[1] https://lore.kernel.org/lkml/20240629234155.53524-1-21cnbao@gmail.com/

Will include the below diff in the next revision.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aae00ba3b3bd..b4e1393cbd4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -957,8 +957,9 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
                break;
        case 2:
                /* the second tail page: deferred_list overlaps ->mapping */
-               if (unlikely(!list_empty(&folio->_deferred_list))) {
-                       bad_page(page, "on deferred list");
+               if (unlikely(!list_empty(&folio->_deferred_list) &&
+                   test_bit(FOLIO_PARTIALLY_MAPPED, &folio->_flags_1))) {
+                       bad_page(page, "partially mapped folio on deferred list");
                        goto out;
                }
                break;

> I would actually suggest to split decoupling of "_deferred_list" and "partially mapped" into a separate preparation patch.
> 
Yes, will do. I will split it into 3 patches, 1st one that introduces FOLIO_PARTIALLY_MAPPED and sets/clear it in the right place without introducing any functional change, 2nd to split underutilized THPs and 3rd to add sysfs entry to enable/disable the shrinker. Should make the patches quite small and easy to review.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 4/4] mm: split underutilized THPs
  2024-08-09 10:31     ` Usama Arif
@ 2024-08-09 13:21       ` David Hildenbrand
  2024-08-09 14:25         ` Usama Arif
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-08-09 13:21 UTC (permalink / raw)
  To: Usama Arif, akpm, linux-mm
  Cc: hannes, riel, shakeel.butt, roman.gushchin, yuzhao, baohua,
	ryan.roberts, rppt, willy, cerasuolodomenico, corbet,
	linux-kernel, linux-doc, kernel-team

On 09.08.24 12:31, Usama Arif wrote:
> 
> 
> On 08/08/2024 16:55, David Hildenbrand wrote:
>> On 07.08.24 15:46, Usama Arif wrote:
>>> This is an attempt to mitigate the issue of running out of memory when THP
>>> is always enabled. During runtime whenever a THP is being faulted in
>>> (__do_huge_pmd_anonymous_page) or collapsed by khugepaged
>>> (collapse_huge_page), the THP is added to  _deferred_list. Whenever memory
>>> reclaim happens in linux, the kernel runs the deferred_split
>>> shrinker which goes through the _deferred_list.
>>>
>>> If the folio was partially mapped, the shrinker attempts to split it.
>>> A new boolean is added to be able to distinguish between partially
>>> mapped folios and others in the deferred_list at split time in
>>> deferred_split_scan. Its needed as __folio_remove_rmap decrements
>>> the folio mapcount elements, hence it won't be possible to distinguish
>>> between partially mapped folios and others in deferred_split_scan
>>> without the boolean.
>>
>> Just so I get this right: Are you saying that we might now add fully mapped folios to the deferred split queue and that's what you want to distinguish?
> 
> Yes
> 
>>
>> If that's the case, then could we use a bit in folio->_flags_1 instead?
> Yes, thats a good idea. Will create the below flag for the next revision
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5769fe6e4950..5825bd1cf6db 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -189,6 +189,11 @@ enum pageflags {
>   
>   #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
>   
> +enum folioflags_1 {
> +       /* The first 8 bits of folio->_flags_1 are used to keep track of folio order */
> +       FOLIO_PARTIALLY_MAPPED = 8,     /* folio is partially mapped */
> +}

This might be what you want to achieve:

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a0a29bd092f8..d4722ed60ef8 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -182,6 +182,7 @@ enum pageflags {
         /* At least one page in this folio has the hwpoison flag set */
         PG_has_hwpoisoned = PG_active,
         PG_large_rmappable = PG_workingset, /* anon or file-backed */
+       PG_partially_mapped, /* was identified to be partially mapped */
  };
  
  #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
@@ -861,8 +862,9 @@ static inline void ClearPageCompound(struct page *page)
         ClearPageHead(page);
  }
  FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
+FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
  #else
-FOLIO_FLAG_FALSE(large_rmappable)
+FOLIO_FLAG_FALSE(partially_mapped)
  #endif
  
  #define PG_head_mask ((1UL << PG_head))

The downside is an atomic op to set/clear, but it should likely not really matter
(initially, the flag will be clear, and we should only ever set it once when
partially unmapping). If it hurts, we can reconsider.

[...]

>> I would actually suggest to split decoupling of "_deferred_list" and "partially mapped" into a separate preparation patch.
>>
> Yes, will do. I will split it into 3 patches, 1st one that introduces FOLIO_PARTIALLY_MAPPED and sets/clear it in the right place without introducing any functional change, 2nd to split underutilized THPs and 3rd to add sysfs entry to enable/disable the shrinker. Should make the patches quite small and easy to review.


Great! As always, please shout if you disagree with something I propose :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 4/4] mm: split underutilized THPs
  2024-08-09 13:21       ` David Hildenbrand
@ 2024-08-09 14:25         ` Usama Arif
  0 siblings, 0 replies; 6+ messages in thread
From: Usama Arif @ 2024-08-09 14:25 UTC (permalink / raw)
  To: David Hildenbrand, akpm, linux-mm
  Cc: hannes, riel, shakeel.butt, roman.gushchin, yuzhao, baohua,
	ryan.roberts, rppt, willy, cerasuolodomenico, corbet,
	linux-kernel, linux-doc, kernel-team



On 09/08/2024 14:21, David Hildenbrand wrote:
> On 09.08.24 12:31, Usama Arif wrote:
>>
>>
>> On 08/08/2024 16:55, David Hildenbrand wrote:
>>> On 07.08.24 15:46, Usama Arif wrote:
>>>> This is an attempt to mitigate the issue of running out of memory when THP
>>>> is always enabled. During runtime whenever a THP is being faulted in
>>>> (__do_huge_pmd_anonymous_page) or collapsed by khugepaged
>>>> (collapse_huge_page), the THP is added to  _deferred_list. Whenever memory
>>>> reclaim happens in linux, the kernel runs the deferred_split
>>>> shrinker which goes through the _deferred_list.
>>>>
>>>> If the folio was partially mapped, the shrinker attempts to split it.
>>>> A new boolean is added to be able to distinguish between partially
>>>> mapped folios and others in the deferred_list at split time in
>>>> deferred_split_scan. Its needed as __folio_remove_rmap decrements
>>>> the folio mapcount elements, hence it won't be possible to distinguish
>>>> between partially mapped folios and others in deferred_split_scan
>>>> without the boolean.
>>>
>>> Just so I get this right: Are you saying that we might now add fully mapped folios to the deferred split queue and that's what you want to distinguish?
>>
>> Yes
>>
>>>
>>> If that's the case, then could we use a bit in folio->_flags_1 instead?
>> Yes, thats a good idea. Will create the below flag for the next revision
>>
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 5769fe6e4950..5825bd1cf6db 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -189,6 +189,11 @@ enum pageflags {
>>     #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
>>   +enum folioflags_1 {
>> +       /* The first 8 bits of folio->_flags_1 are used to keep track of folio order */
>> +       FOLIO_PARTIALLY_MAPPED = 8,     /* folio is partially mapped */
>> +}
> 
> This might be what you want to achieve:
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index a0a29bd092f8..d4722ed60ef8 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -182,6 +182,7 @@ enum pageflags {
>         /* At least one page in this folio has the hwpoison flag set */
>         PG_has_hwpoisoned = PG_active,
>         PG_large_rmappable = PG_workingset, /* anon or file-backed */
> +       PG_partially_mapped, /* was identified to be partially mapped */
>  };
>  
>  #define PAGEFLAGS_MASK         ((1UL << NR_PAGEFLAGS) - 1)
> @@ -861,8 +862,9 @@ static inline void ClearPageCompound(struct page *page)
>         ClearPageHead(page);
>  }
>  FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
> +FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
>  #else
> -FOLIO_FLAG_FALSE(large_rmappable)
> +FOLIO_FLAG_FALSE(partially_mapped)
>  #endif
>  
>  #define PG_head_mask ((1UL << PG_head))
> 
> The downside is an atomic op to set/clear, but it should likely not really matter
> (initially, the flag will be clear, and we should only ever set it once when
> partially unmapping). If it hurts, we can reconsider.
> 
> [...]

I was looking for where the bits for flags_1 were specified! I just saw the start of enum pageflags, saw that compound order isn't specified anywhere over there and ignored the end :)

Yes, this is what I wanted to do. Thanks.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-08-09 14:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240807134732.3292797-1-usamaarif642@gmail.com>
     [not found] ` <20240807134732.3292797-3-usamaarif642@gmail.com>
     [not found]   ` <20240807200241.GB1828817@cmpxchg.org>
2024-08-08 15:53     ` [PATCH v2 2/4] mm: remap unused subpages to shared zeropage when splitting isolated thp Usama Arif
     [not found] ` <20240807134732.3292797-5-usamaarif642@gmail.com>
2024-08-08 15:55   ` [PATCH v2 4/4] mm: split underutilized THPs David Hildenbrand
2024-08-09 10:31     ` Usama Arif
2024-08-09 13:21       ` David Hildenbrand
2024-08-09 14:25         ` Usama Arif
     [not found] ` <20240807134732.3292797-2-usamaarif642@gmail.com>
2024-08-08 15:56   ` [PATCH v2 1/4] mm: free zapped tail pages when splitting isolated thp David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox