From: David Hildenbrand <david@redhat.com>
To: Stefan Roesch <shr@devkernel.io>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com,
mhocko@suse.com, linux-kselftest@vger.kernel.org,
linux-doc@vger.kernel.org, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v4 0/3] mm: process/cgroup ksm support
Date: Thu, 6 Apr 2023 19:10:59 +0200 [thread overview]
Message-ID: <10dd1fd4-4d10-c25d-174b-de37f01bef48@redhat.com> (raw)
In-Reply-To: <qvqw4jptc59w.fsf@dev0134.prn3.facebook.com>
On 06.04.23 18:59, Stefan Roesch wrote:
>
> Stefan Roesch <shr@devkernel.io> writes:
>
>> David Hildenbrand <david@redhat.com> writes:
>>
>>>>>> Obviously we could spend months analysing which exact allocations are
>>>>>> identical, and then more months or years reworking the architecture to
>>>>>> deduplicate them by hand and in userspace. But this isn't practical,
>>>>>> and KSM is specifically for cases where this isn't practical.
>>>>>> Based on your request in the previous thread, we investigated whether
>>>>>> the boost was coming from the unintended side effects of KSM splitting
>>>>>> THPs. This wasn't the case.
>>>>>> If you have other theories on how the results could be bogus, we'd be
>>>>>> happy to investigate those as well. But you have to let us know what
>>>>>> you're looking for.
>>>>>>
>>>>>
>>>>> Maybe I'm bad at making such requests but
>>>>>
>>>>> "Stefan, can you do me a favor and investigate which pages we end up
>>>>> deduplicating -- especially if it's mostly only the zeropage and if it's
>>>>> still that significant when disabling THP?"
>>>>>
>>>>> "In any case, it would be nice to get a feeling for how much variety in
>>>>> these 20% of deduplicated pages are. "
>>>>>
>>>>> is pretty clear to me. And shouldn't take months.
>>>>>
>>>
>>> Just to clarify: the details I requested are not meant to decide whether to
>>> reject the patch set (I understand that it can be beneficial to have); I
>>> primarily want to understand if we're really dealing with a workload where KSM
>>> is able to deduplicate pages that are non-trivial, to maybe figure out if there
>>> are other workloads that could similarly benefit -- or if we could optimize KSM
>>> for these specific cases or avoid the memory deduplication altogether.
>>>
>>> In contrast to e.g.:
>>>
>>> 1) THP resulted in many zeropages we end up deduplicating again. The THP
>>> placement was unfortunate.
>>>
>>> 2) Unoptimized memory allocators that leave many identical pages mapped
>>> after freeing up memory (e.g., zeroed pages, pages all filled with
>>> poison values) instead of e.g., using MADV_DONTNEED to free up that
>>> memory.
>>>
>>>
>>
>> I repeated an experiment with and without KSM. In terms of THP there is
>> no huge difference between the two. On a machine with 64GB of main
>> memory I see between 100 and 400MB in AnonHugePages.
>>
>>>> /sys/kernel/mm/ksm/pages_shared is over 10000 when we run this on an
>>>> Instagram workload. The workload consists of 36 processes plus a few
>>>> sidecar processes.
>>>
>>> Thanks! To which value is /sys/kernel/mm/ksm/max_page_sharing set in that
>>> environment?
>>>
>>
>> It's set to the default value of 256.
>>
>> In the meantime I have run experiments with different settings for
>> pages_to_scan. With the default value of 100, we only get a relatively
>> small benefit from KSM. If I increase the value to, for instance, 2000
>> or 3000, the savings are substantial. (The workload is memory bound,
>> not CPU bound.)
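
Nice. In case anybody wants to replay that tuning, here is a rough,
untested sketch of how the knobs could be driven from a script; it needs
root and simply uses the values from your run:

    import pathlib

    KSM = pathlib.Path("/sys/kernel/mm/ksm")

    def set_knob(name, value):
        # sysfs writes need root
        (KSM / name).write_text(f"{value}\n")

    # values taken from the experiment described above
    set_knob("pages_to_scan", 3000)
    set_knob("sleep_millisecs", 20)
    set_knob("run", 1)

    # dump all counters after letting ksmd run for a while
    for knob in sorted(KSM.iterdir()):
        print(knob.name, knob.read_text().strip())
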
>>
>> Here are some stats for setting pages_to_scan to 3000:
>>
>> full_scans: 560
>> general_profit: 20620539008
>> max_page_sharing: 256
>> merge_across_nodes: 1
>> pages_shared: 125446
>> pages_sharing: 5259506
>> pages_to_scan: 3000
>> pages_unshared: 1897537
>> pages_volatile: 12389223
>> run: 1
>> sleep_millisecs: 20
>> stable_node_chains: 176
>> stable_node_chains_prune_millisecs: 2000
>> stable_node_dups: 2604
>> use_zero_pages: 0
>> zero_pages_sharing: 0
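
The reported general_profit lines up quite well with the documented
approximation. A quick sanity check (just a sketch: 4KiB base pages
assumed, and the sizeof(struct ksm_rmap_item) below is only a rough,
config-dependent guess):

    PAGE_SIZE = 4096          # assuming 4KiB base pages
    RMAP_ITEM_SIZE = 48       # rough guess for sizeof(struct ksm_rmap_item)

    pages_shared   = 125446
    pages_sharing  = 5259506
    pages_unshared = 1897537
    pages_volatile = 12389223

    # Each page ksmd tracks has an rmap_item; the per-state counters above
    # add up to (roughly) that total.
    all_rmap_items = pages_shared + pages_sharing + pages_unshared + pages_volatile

    # general_profit ~= pages_sharing * PAGE_SIZE - all_rmap_items * sizeof(rmap_item)
    profit = pages_sharing * PAGE_SIZE - all_rmap_items * RMAP_ITEM_SIZE
    print(profit)             # ~2.06e10, close to the reported 20620539008
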
>>
>>
>>> What would be interesting is pages_shared after max_page_sharing was set to a
>>> very high number such that pages_shared does not include duplicates. Then
>>> pages_shared actually expresses how many different pages we deduplicate. No need
>>> to run without THP in that case.
>>>
>>
>> That's on my list for the next set of experiments.
>>
>
> In the new experiment I increased the max_page_sharing value to 16384.
> This reduced the number of stable_node_dups considerably (it's around 3%
> of the previous value). However, pages_sharing is still very high for
> this workload.
>
> full_scans: 138
> general_profit: 24442268608
> max_page_sharing: 16384
> merge_across_nodes: 1
> pages_shared: 144590
> pages_sharing: 6230983
> pages_to_scan: 3000
> pages_unshared: 2120307
> pages_volatile: 14590780
> run: 1
> sleep_millisecs: 20
> stable_node_chains: 23
> stable_node_chains_prune_millisecs: 2000
> stable_node_dups: 78
> use_zero_pages: 0
> zero_pages_sharing: 0
Interesting, thanks!
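
With the duplicates mostly out of the way, the numbers suggest a fairly
high average sharing factor. A quick back-of-the-envelope (again assuming
4KiB base pages):

    PAGE_SIZE = 4096                         # assuming 4KiB base pages

    pages_shared  = 144590                   # distinct KSM pages (stable nodes)
    pages_sharing = 6230983                  # extra mappings deduplicated into them

    print(pages_sharing / pages_shared)      # ~43 extra sharers per shared page
    print(pages_shared * PAGE_SIZE / 2**30)  # ~0.55 GiB of distinct shared content
    print(pages_sharing * PAGE_SIZE / 2**30) # ~23.8 GiB saved, before metadata costs

So roughly 144k distinct pages, each shared ~43 times on average; that is
quite a bit of variety.
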
I wonder if it's really many interpreters producing (and caching?)
essentially the same blobs (for example, for a JIT the IR and/or target
executable code). So maybe, in general, such multi-instance interpreters
are good candidates for KSM. (I recall there were some setups where a
server would perform and cache the translations instead.) But that's just
pure speculation :)
--
Thanks,
David / dhildenb