linux-mm.kvack.org archive mirror
* [PATCH] mm: prohibit the last subpage from reusing the entire large folio
@ 2024-03-08  8:56 Barry Song
  2024-03-08  9:03 ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Barry Song @ 2024-03-08  8:56 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: minchan, fengwei.yin, linux-kernel, mhocko, peterx, ryan.roberts,
	shy828301, songmuchun, wangkefeng.wang, xiehuan09, zokeefe,
	chrisl, yuzhao, Barry Song, David Hildenbrand, Lance Yang

From: Barry Song <v-songbaohua@oppo.com>

In a Copy-on-Write (CoW) scenario, the last subpage will reuse the entire
large folio, resulting in the waste of (nr_pages - 1) pages. This wasted
memory remains allocated until it is either unmapped or memory
reclamation occurs.

The following small program demonstrates this behavior:

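 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>

 #define SIZE (1024 * 1024 * 1024UL)

 int main(void)
 {
         void *p = malloc(SIZE);
         if (!p)
                 return 1;
         memset(p, 0x11, SIZE);  /* fault in the whole region */
         if (fork() == 0)
                 _exit(0);       /* child exits without touching the memory */
         memset(p, 0x12, SIZE);  /* parent write-faults on every CoW-protected page */
         printf("done\n");
         while (1)
                 ;               /* stay alive so memory usage can be inspected */
 }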

For example, with 1024KiB mTHP enabled by:
 echo always > /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled

(1) Without the patch, the program takes 2GiB:

Before running the test program,
 / # free -m
                total        used        free      shared  buff/cache   available
 Mem:            5754          84        5692           0          17        5669
 Swap:              0           0           0

 / # /a.out &
 / # done

After running the test program,
 / # free -m
                 total        used        free      shared  buff/cache   available
 Mem:            5754        2149        3627           0          19        3605
 Swap:              0           0           0

(2) With the patch, the program takes only 1GiB:

Before running the test program,
 / # free -m
                 total        used        free      shared  buff/cache   available
 Mem:            5754          89        5687           0          17        5664
 Swap:              0           0           0

 / # /a.out &
 / # done

After running the test program,
 / # free -m
                total        used        free      shared  buff/cache   available
 Mem:            5754        1122        4655           0          17        4632
 Swap:              0           0           0

This patch migrates the last subpage to a small folio and immediately
returns the large folio to the system. It benefits both memory availability
and anti-fragmentation.
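
The (nr_pages - 1) figure can be checked with a little arithmetic. The following is only a rough sketch, assuming a 4KiB base page size (the report above does not state it); the helper names folio_nr_pages_for() and wasted_kib() are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Number of base pages in one large folio, e.g. 1024KiB / 4KiB = 256. */
static size_t folio_nr_pages_for(size_t folio_kib, size_t base_page_kib)
{
	assert(base_page_kib > 0 && folio_kib % base_page_kib == 0);
	return folio_kib / base_page_kib;
}

/*
 * KiB held by large folios that stay allocated only because their
 * last subpage was reused: (nr_pages - 1) wasted pages per folio.
 */
static size_t wasted_kib(size_t region_kib, size_t folio_kib,
			 size_t base_page_kib)
{
	size_t nr_pages = folio_nr_pages_for(folio_kib, base_page_kib);
	size_t nr_folios = region_kib / folio_kib;

	return nr_folios * (nr_pages - 1) * base_page_kib;
}
```

Under that assumption, wasted_kib(1024 * 1024, 1024, 4) gives 1044480 KiB, i.e. 1020 MiB for the 1GiB test region, which is in line with the roughly 1GiB difference in "used" between the two free -m runs above.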

Cc: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index e17669d4f72f..0200bfc15f94 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3523,6 +3523,14 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
 		folio_unlock(folio);
 		return false;
 	}
+	/*
+	 * If the last subpage reuses the entire large folio, it would
+	 * result in a waste of (nr_pages - 1) pages
+	 */
+	if (folio_ref_count(folio) == 1 && folio_test_large(folio)) {
+		folio_unlock(folio);
+		return false;
+	}
 	/*
 	 * Ok, we've got the only folio reference from our mapping
 	 * and the folio is locked, it's dark out, and we're wearing
-- 
2.34.1




* Re: [PATCH] mm: prohibit the last subpage from reusing the entire large folio
  2024-03-08  8:56 [PATCH] mm: prohibit the last subpage from reusing the entire large folio Barry Song
@ 2024-03-08  9:03 ` David Hildenbrand
  2024-03-08  9:07   ` Barry Song
  0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand @ 2024-03-08  9:03 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: minchan, fengwei.yin, linux-kernel, mhocko, peterx, ryan.roberts,
	shy828301, songmuchun, wangkefeng.wang, xiehuan09, zokeefe,
	chrisl, yuzhao, Barry Song, Lance Yang

On 08.03.24 09:56, Barry Song wrote:
> diff --git a/mm/memory.c b/mm/memory.c
> index e17669d4f72f..0200bfc15f94 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3523,6 +3523,14 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
>   		folio_unlock(folio);
>   		return false;
>   	}
> +	/*
> +	 * If the last subpage reuses the entire large folio, it would
> +	 * result in a waste of (nr_pages - 1) pages
> +	 */
> +	if (folio_ref_count(folio) == 1 && folio_test_large(folio)) {
> +		folio_unlock(folio);
> +		return false;
> +	}
>   	/*
>   	 * Ok, we've got the only folio reference from our mapping
>   	 * and the folio is locked, it's dark out, and we're wearing


Why not simply:

diff --git a/mm/memory.c b/mm/memory.c
index e17669d4f72f7..46d286bd450c6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3498,6 +3498,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
  static bool wp_can_reuse_anon_folio(struct folio *folio,
                                     struct vm_area_struct *vma)
  {
+
+       if (folio_test_large(folio))
+               return false;
+
         /*
          * We have to verify under folio lock: these early checks are
          * just an optimization to avoid locking the folio and freeing

We could only possibly succeed if we are the last one mapping a PTE 
either way. Now we simply give up right away for the time being.

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: prohibit the last subpage from reusing the entire large folio
  2024-03-08  9:03 ` David Hildenbrand
@ 2024-03-08  9:07   ` Barry Song
  2024-03-08  9:17     ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Barry Song @ 2024-03-08  9:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, linux-mm, minchan, fengwei.yin, linux-kernel, mhocko,
	peterx, ryan.roberts, shy828301, songmuchun, wangkefeng.wang,
	xiehuan09, zokeefe, chrisl, yuzhao, Barry Song, Lance Yang

On Fri, Mar 8, 2024 at 10:03 PM David Hildenbrand <david@redhat.com> wrote:
>
> Why not simply:
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e17669d4f72f7..46d286bd450c6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3498,6 +3498,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
>   static bool wp_can_reuse_anon_folio(struct folio *folio,
>                                      struct vm_area_struct *vma)
>   {
> +
> +       if (folio_test_large(folio))
> +               return false;
> +
>          /*
>           * We have to verify under folio lock: these early checks are
>           * just an optimization to avoid locking the folio and freeing
>
> We could only possibly succeed if we are the last one mapping a PTE
> either way. Now we simply give up right away for the time being.

nice !



* Re: [PATCH] mm: prohibit the last subpage from reusing the entire large folio
  2024-03-08  9:07   ` Barry Song
@ 2024-03-08  9:17     ` David Hildenbrand
  0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2024-03-08  9:17 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, minchan, fengwei.yin, linux-kernel, mhocko,
	peterx, ryan.roberts, shy828301, songmuchun, wangkefeng.wang,
	xiehuan09, zokeefe, chrisl, yuzhao, Barry Song, Lance Yang

On 08.03.24 10:07, Barry Song wrote:
> On Fri, Mar 8, 2024 at 10:03 PM David Hildenbrand <david@redhat.com> wrote:
>> Why not simply:
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index e17669d4f72f7..46d286bd450c6 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3498,6 +3498,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
>>    static bool wp_can_reuse_anon_folio(struct folio *folio,
>>                                       struct vm_area_struct *vma)
>>    {
>> +
>> +       if (folio_test_large(folio))
>> +               return false;
>> +
>>           /*
>>            * We have to verify under folio lock: these early checks are
>>            * just an optimization to avoid locking the folio and freeing
>>
>> We could only possibly succeed if we are the last one mapping a PTE
>> either way. Now we simply give up right away for the time being.
> 
> nice !

... of course, adding a comment like

"We could currently only reuse a subpage of a large folio if no other 
subpages of the large folio are still mapped. However, let's just 
consistently not reuse subpages even if we could reuse in that scenario, 
and give back a large folio a bit sooner."

The !fork() case would be more tricky to handle, because the folio has 
PAE set and may be pinned or we might race with pinning. So this here is 
the low-hanging fruit.
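
To make the comparison between the two variants concrete, here is a minimal userspace sketch. It is not the kernel code: struct model_folio and both function names are hypothetical stand-ins, and the model deliberately ignores the locking, swapcache, and LRU-cache handling that the real wp_can_reuse_anon_folio() performs. It only illustrates that checking for a large folio up front yields the same reuse decision as checking once the last reference is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio; not the real kernel layout. */
struct model_folio {
	int ref_count;	/* models folio_ref_count() */
	bool large;	/* models folio_test_large() */
};

/* Originally posted variant: refuse once we hold the last reference. */
static bool can_reuse_late_check(const struct model_folio *f)
{
	if (f->ref_count > 1)
		return false;	/* still shared, must copy */
	if (f->ref_count == 1 && f->large)
		return false;	/* last subpage: copy, let the large folio go */
	return true;
}

/* Suggested variant: give up on large folios before anything else. */
static bool can_reuse_early_check(const struct model_folio *f)
{
	if (f->large)
		return false;
	return f->ref_count == 1;
}

/* Both variants agree for every state in this simplified model. */
static void check_variants_agree(void)
{
	for (int refs = 1; refs <= 4; refs++) {
		struct model_folio small = { refs, false };
		struct model_folio large = { refs, true };

		assert(can_reuse_late_check(&small) == can_reuse_early_check(&small));
		assert(can_reuse_late_check(&large) == can_reuse_early_check(&large));
	}
}
```

In this model the early check never changes the outcome; it only skips the work that precedes the final decision for large folios.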

-- 
Cheers,

David / dhildenb


