From: zhiguojiang <justinjiang@vivo.com>
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, opensource.kernel@vivo.com,
willy@infradead.org
Subject: Re: [PATCH v7] mm: shrink skip folio mapped by an exiting process
Date: Wed, 10 Jul 2024 16:38:17 +0800
Message-ID: <85b144ba-ad45-4ce7-92d1-bd8f6fe222b7@vivo.com>
In-Reply-To: <CAGsJ_4ze50AYaBnAAt=pyZ0rWQ6scpeuYaFiqJfGeibET+anKg@mail.gmail.com>
On 2024/7/10 15:11, Barry Song wrote:
> On Wed, Jul 10, 2024 at 6:47 PM zhiguojiang <justinjiang@vivo.com> wrote:
>>
>>
>> On 2024/7/10 12:44, Barry Song wrote:
>>> On Wed, Jul 10, 2024 at 4:04 PM David Hildenbrand <david@redhat.com> wrote:
>>>> On 10.07.24 06:02, Barry Song wrote:
>>>>> On Wed, Jul 10, 2024 at 3:59 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>> On 10.07.24 05:32, Barry Song wrote:
>>>>>>> On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>>>>> On Tue, 9 Jul 2024 20:31:15 +0800 Zhiguo Jiang <justinjiang@vivo.com> wrote:
>>>>>>>>
>>>>>>>>> Releasing a non-shared anonymous folio mapped solely by an exiting
>>>>>>>>> process may go through two flows: 1) the anonymous folio is first
>>>>>>>>> swapped out to swap space and transformed into a swp_entry in
>>>>>>>>> shrink_folio_list; 2) the swp_entry is then released in the process
>>>>>>>>> exit flow. This results in high CPU load when releasing a non-shared
>>>>>>>>> anonymous folio mapped solely by an exiting process.
>>>>>>>>>
>>>>>>>>> This is likely to happen when system memory is low while a process is
>>>>>>>>> exiting, because a non-shared anonymous folio mapped solely by the
>>>>>>>>> exiting process may be reclaimed by shrink_folio_list.
>>>>>>>>>
>>>>>>>>> This patch makes shrink skip a non-shared anonymous folio mapped solely
>>>>>>>>> by an exiting process, so that the folio is released directly in the
>>>>>>>>> process exit flow. This saves swap-out time and alleviates the load of
>>>>>>>>> the exiting process.
>>>>>>>> It would be helpful to provide some before-and-after runtime
>>>>>>>> measurements, please. It's a performance optimization so please let's
>>>>>>>> see what effect it has.
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> This was something I was curious about too, so I created a small test program
>>>>>>> that allocates and continuously writes to 256MB of memory. Using QEMU, I set
>>>>>>> up a small machine with only 300MB of RAM to trigger kswapd.
>>>>>>>
>>>>>>> qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
>>>>>>> -smp cpus=4 -cpu max \
>>>>>>> -m 300M -kernel arch/arm64/boot/Image
>>>>>>>
>>>>>>> The test program will be randomly terminated by its subprocess to trigger
>>>>>>> the use case of this patch.
>>>>>>>
>>>>>>> #include <stdio.h>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <unistd.h>
>>>>>>> #include <string.h>
>>>>>>> #include <sys/types.h>
>>>>>>> #include <sys/wait.h>
>>>>>>> #include <time.h>
>>>>>>> #include <signal.h>
>>>>>>>
>>>>>>> #define MEMORY_SIZE (256 * 1024 * 1024)
>>>>>>>
>>>>>>> unsigned char *memory;
>>>>>>>
>>>>>>> void allocate_and_write_memory()
>>>>>>> {
>>>>>>>     memory = (unsigned char *)malloc(MEMORY_SIZE);
>>>>>>>     if (memory == NULL) {
>>>>>>>         perror("malloc");
>>>>>>>         exit(EXIT_FAILURE);
>>>>>>>     }
>>>>>>>
>>>>>>>     /* keep dirtying the whole buffer; this loop never returns */
>>>>>>>     while (1)
>>>>>>>         memset(memory, 0x11, MEMORY_SIZE);
>>>>>>> }
>>>>>>>
>>>>>>> int main()
>>>>>>> {
>>>>>>>     pid_t pid;
>>>>>>>     srand(time(NULL));
>>>>>>>
>>>>>>>     pid = fork();
>>>>>>>
>>>>>>>     if (pid < 0) {
>>>>>>>         perror("fork");
>>>>>>>         exit(EXIT_FAILURE);
>>>>>>>     }
>>>>>>>
>>>>>>>     if (pid == 0) {
>>>>>>>         /* random delay in milliseconds, between 10 and 20 seconds */
>>>>>>>         int delay = (rand() % 10000) + 10000;
>>>>>>>         usleep(delay * 1000);
>>>>>>>
>>>>>>>         /* kill parent when it is busy on swapping */
>>>>>>>         kill(getppid(), SIGKILL);
>>>>>>>         _exit(0);
>>>>>>>     } else {
>>>>>>>         allocate_and_write_memory();
>>>>>>>
>>>>>>>         /* not reached: the parent is killed while writing */
>>>>>>>         wait(NULL);
>>>>>>>
>>>>>>>         free(memory);
>>>>>>>     }
>>>>>>>
>>>>>>>     return 0;
>>>>>>> }
>>>>>>>
>>>>>>> I tracked the number of folios that could be redundantly
>>>>>>> swapped out by adding a simple counter as shown below:
>>>>>>>
>>>>>>> @@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
>>>>>>> check_stable_address_space(vma->vm_mm)) &&
>>>>>>> folio_test_swapbacked(folio) &&
>>>>>>> !folio_likely_mapped_shared(folio)) {
>>>>>>> + static long i, size;
>>>>>>> + size += folio_size(folio);
>>>>>>> + pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
>>>>>>> pra->referenced = -1;
>>>>>>> page_vma_mapped_walk_done(&pvmw);
>>>>>>> return false;
>>>>>>>
>>>>>>>
>>>>>>> This is what I have observed:
>>>>>>>
>>>>>>> / # /home/barry/develop/linux/skip_swap_out_test
>>>>>>> [ 82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
>>>>>>> [ 82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
>>>>>>> [ 82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
>>>>>>> [ 82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
>>>>>>> [ 82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
>>>>>>> [ 82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
>>>>>>> ...
>>>>>>> [ 88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
>>>>>>> [ 88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
>>>>>>> [ 89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480
>>>>>>>
>>>>>>> I observed that this patch effectively skipped 6115 folios (indices 0 to
>>>>>>> 6114, each either a 4KB page or a 64KB mTHP), potentially reducing swap-out
>>>>>>> by up to about 92.8MiB (97,300,480 bytes) during the process exit.
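
The figures in the log above can be cross-checked with a little arithmetic. This is only a sketch: it assumes every skipped folio is exactly 4KB or 64KB, as stated above, and that indices 0 through 6114 mean 6115 log lines. The helper names are made up for illustration.

```c
#include <stdio.h>

/*
 * Solve  small * s + large * l = total_bytes  with  s + l = nr_folios
 * for l, the number of large folios. Returns -1 when there is no
 * non-negative integer solution.
 */
long count_large_folios(long total_bytes, long nr_folios,
                        long small, long large)
{
    long rest = total_bytes - nr_folios * small;
    long step = large - small;

    if (step <= 0 || rest < 0 || rest % step != 0)
        return -1;
    if (rest / step > nr_folios)
        return -1;
    return rest / step;
}

/* Print the split implied by the log (hypothetical helper). */
void print_folio_split(void)
{
    long nr = 6115;         /* indices 0..6114 */
    long total = 97300480;  /* last "total size" value in the log */
    long large = count_large_folios(total, nr, 4096, 65536);

    printf("64KB folios: %ld, 4KB folios: %ld, total: %.1f MiB\n",
           large, nr - large, total / (1024.0 * 1024.0));
}
```

Calling print_folio_split() from main() reports 1176 folios of 64KB and 4939 of 4KB, about 92.8 MiB in total. With 6114 folios instead of 6115 the equation has no integer solution, which is consistent with the index being zero-based.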
>>>>>>>
>>>>>>> Despite the numerous mistakes Zhiguo made in sending this patch, it is still
>>>>>>> quite valuable. Please consider pulling his v9 into the mm tree for testing.
>>>>>> BTW, we dropped the folio_test_anon() check, but what about shmem? They
>>>>>> also do __folio_set_swapbacked()?
>>>>> My point is that the purpose is to skip redundant swap-out; if a shmem
>>>>> folio is mapped only once, it could also be skipped.
>>>> But they won't necessarily get *freed* when they are unmapped. They might
>>>> just continue living in tmpfs, where some other process might map
>>>> them later?
>>>>
>>> You're correct. I overlooked this aspect, focusing on swap and thinking of shmem
>>> solely in terms of swap.
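
The point about shmem outliving its mappings can be seen from userspace. Below is a minimal sketch using POSIX shm_open(), which on Linux is typically backed by tmpfs under /dev/shm. The object name and the function name are invented for this example, and it shows only that the data persists, not anything about reclaim itself.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if data written through a shmem mapping is still there
 * after the mapping and the fd are gone, 0 if not, -1 on setup failure.
 */
int shmem_survives_unmap(void)
{
    const size_t len = 4096;
    char name[64];
    char *p;
    int fd, ret = -1;

    /* hypothetical object name, made unique per process */
    snprintf(name, sizeof(name), "/shmem-demo-%ld", (long)getpid());
    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, len) < 0)
        goto out;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    strcpy(p, "still here");
    munmap(p, len);     /* the only mapping goes away ... */
    close(fd);          /* ... and so does the only open fd */

    /* a later user (here: the same process) maps the object again */
    fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0) {
        shm_unlink(name);
        return -1;
    }
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED) {
        ret = strcmp(p, "still here") == 0;
        munmap(p, len);
    }
out:
    close(fd);
    shm_unlink(name);
    return ret;
}
```

On older glibc versions this may need -lrt at link time. The contrast with private anonymous memory is that unmapping there drops the last reference, so nothing is left for anyone to map later.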
>>>
>>>> IMHO, there is a big difference here between anon and shmem. (well,
>>>> anon_shmem would actually be different :) )
>>> Even though anon_shmem behaves similarly to anonymous memory when
>>> releasing memory, it doesn't seem worth the added complexity?
>>>
>>> So unfortunately it seems Zhiguo still needs a v10 that brings
>>> folio_test_anon() back? Sorry, my bad, Zhiguo.
>> If the folio_test_anon(folio) && folio_test_swapbacked(folio) condition is
>> used, does it mean that the folio is definitely anonymous rather than shmem?
>> And does folio_likely_mapped_shared() then need to be removed?
> No, shared memory (shmem) isn't necessarily shared, and private anonymous
> memory isn't necessarily unshared. There is no direct relationship between
> them.
>
> In the case of a fork, your private anonymous folio can be shared by
> two or more processes before CoW.
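
The fork case can be illustrated from userspace with a small sketch. It only demonstrates copy-on-write semantics (the child's write stays invisible to the parent); it does not inspect mapcounts or anything else inside the kernel, and the function name is made up for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a write made by the child after fork() is invisible
 * to the parent, 0 if the parent sees it, -1 on error.
 */
int child_write_is_private(void)
{
    char *buf = malloc(4096);   /* private anonymous memory */
    int status, ret;
    pid_t pid;

    if (!buf)
        return -1;
    memset(buf, 0x11, 4096);    /* fault the memory in before fork */

    pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Until this write, parent and child map the same page. */
        buf[0] = 0x22;          /* CoW gives the child its own copy */
        _exit(buf[0] == 0x22 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        ret = -1;
    else
        ret = (buf[0] == 0x11); /* parent still sees its own data */
    free(buf);
    return ret;
}
```

Between fork() and the child's write, the same private anonymous page is mapped by two processes, which is why an "anonymous" check alone says nothing about whether a folio is shared.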
Hi,
I have added the folio_test_anon(folio) condition in v10.
Thanks
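
For reference, here is a toy userspace model of how the checks discussed in this thread combine. It is not kernel code: struct toy_folio, toy_should_skip_reclaim() and the mm_exiting flag are hypothetical stand-ins, and the mm-side part of the real condition is reduced to a single boolean as an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the checks look at. */
struct toy_folio {
    bool anon;           /* models folio_test_anon() */
    bool swapbacked;     /* models folio_test_swapbacked() */
    bool likely_shared;  /* models folio_likely_mapped_shared() */
};

/*
 * Toy model: skip reclaim only for an anonymous, swap-backed folio
 * that is not likely mapped by anyone else, and only while the
 * owning mm is exiting. 'mm_exiting' is an assumption that stands
 * in for the mm-side checks.
 */
bool toy_should_skip_reclaim(const struct toy_folio *f, bool mm_exiting)
{
    return mm_exiting && f->anon && f->swapbacked && !f->likely_shared;
}
```

In this model a shmem folio (anon == false) is never skipped, and an anonymous folio still shared after fork (likely_shared == true) is not skipped either, so the anon and shared checks each cover a case the other does not.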
>
>>>> --
>>>> Cheers,
>>>>
>>>> David / dhildenb
>>>>
>>> Thanks
>>> Barry
>> Thanks
>> Zhiguo
>>