linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Roesch <shr@devkernel.io>
To: David Hildenbrand <david@redhat.com>
Cc: kernel-team@fb.com, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org,
	riel@surriel.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v4] proc/ksm: add ksm stats to /proc/pid/smaps
Date: Wed, 23 Aug 2023 09:41:56 -0700	[thread overview]
Message-ID: <qvqwzg2hr9oq.fsf@devbig1114.prn1.facebook.com> (raw)
In-Reply-To: <b380a853-4466-d060-1084-dcdd65a5ce13@redhat.com>


David Hildenbrand <david@redhat.com> writes:

> On 22.08.23 20:05, Stefan Roesch wrote:
>> With madvise and prctl KSM can be enabled for different VMA's. Once it
>> is enabled we can query how effective KSM is overall. However we cannot
>> easily query if an individual VMA benefits from KSM.
>> This commit adds a KSM section to the /prod/<pid>/smaps file. It reports
>> how many of the pages are KSM pages. The returned value for KSM is
>> independent of the use of the shared zeropage.
>
> Maybe phrase that to something like "The returned value for KSM includes
> KSM-placed zeropages, so we can observe the actual KSM benefit independent
> of the usage of the shared zeropage for KSM.".
>
> But thinking about it (see below), maybe we really just let any user figure that out by
> temporarily disabling the shared zeropage.
>
> So this would be
>
> "It reports how many of the pages are KSM pages. Note that KSM-placed zeropages
> are not included, only actual KSM pages."
>

I'll replace the commit message and the documentation with the above
sentence.

>> Here is a typical output:
>> 7f420a000000-7f421a000000 rw-p 00000000 00:00 0
>> Size:             262144 kB
>> KernelPageSize:        4 kB
>> MMUPageSize:           4 kB
>> Rss:               51212 kB
>> Pss:                8276 kB
>> Shared_Clean:        172 kB
>> Shared_Dirty:      42996 kB
>> Private_Clean:       196 kB
>> Private_Dirty:      7848 kB
>> Referenced:        15388 kB
>> Anonymous:         51212 kB
>> KSM:               41376 kB
>> LazyFree:              0 kB
>> AnonHugePages:         0 kB
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:             202016 kB
>> SwapPss:            3882 kB
>> Locked:                0 kB
>> THPeligible:    0
>> ProtectionKey:         0
>> ksm_state:          0
>> ksm_skip_base:      0
>> ksm_skip_count:     0
>> VmFlags: rd wr mr mw me nr mg anon
>> This information also helps with the following workflow:
>> - First enable KSM for all the VMA's of a process with prctl.
>> - Then analyze with the above smaps report which VMA's benefit the most
>> - Change the application (if possible) to add the corresponding madvise
>> calls for the VMA's that benefit the most
>> Signed-off-by: Stefan Roesch <shr@devkernel.io>
>> ---
>>   Documentation/filesystems/proc.rst |  4 ++++
>>   fs/proc/task_mmu.c                 | 16 +++++++++++-----
>>   2 files changed, 15 insertions(+), 5 deletions(-)
>> diff --git a/Documentation/filesystems/proc.rst
>> b/Documentation/filesystems/proc.rst
>> index 7897a7dafcbc..d5bdfd59f5b0 100644
>> --- a/Documentation/filesystems/proc.rst
>> +++ b/Documentation/filesystems/proc.rst
>> @@ -461,6 +461,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
>>       Private_Dirty:         0 kB
>>       Referenced:          892 kB
>>       Anonymous:             0 kB
>> +    KSM:                   0 kB
>>       LazyFree:              0 kB
>>       AnonHugePages:         0 kB
>>       ShmemPmdMapped:        0 kB
>> @@ -501,6 +502,9 @@ accessed.
>>   a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>   and a page is modified, the file page is replaced by a private anonymous copy.
>>   +"KSM" shows the amount of anonymous memory that has been de-duplicated. The
>> +value is independent of the use of shared zeropage.
>
> Maybe here as well.
>
>> +
>>   "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
>>   The memory isn't freed immediately with madvise(). It's freed in memory
>>   pressure if the memory is clean. Please note that the printed value might
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 51315133cdc2..4532caa8011c 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -4,6 +4,7 @@
>>   #include <linux/hugetlb.h>
>>   #include <linux/huge_mm.h>
>>   #include <linux/mount.h>
>> +#include <linux/ksm.h>
>>   #include <linux/seq_file.h>
>>   #include <linux/highmem.h>
>>   #include <linux/ptrace.h>
>> @@ -396,6 +397,7 @@ struct mem_size_stats {
>>   	unsigned long swap;
>>   	unsigned long shared_hugetlb;
>>   	unsigned long private_hugetlb;
>> +	unsigned long ksm;
>>   	u64 pss;
>>   	u64 pss_anon;
>>   	u64 pss_file;
>> @@ -435,9 +437,9 @@ static void smaps_page_accumulate(struct mem_size_stats *mss,
>>   	}
>>   }
>>   -static void smaps_account(struct mem_size_stats *mss, struct page *page,
>> -		bool compound, bool young, bool dirty, bool locked,
>> -		bool migration)
>> +static void smaps_account(struct mem_size_stats *mss, pte_t *pte,
>> +		struct page *page, bool compound, bool young, bool dirty,
>> +		bool locked, bool migration)
>>   {
>>   	int i, nr = compound ? compound_nr(page) : 1;
>>   	unsigned long size = nr * PAGE_SIZE;
>> @@ -452,6 +454,9 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
>>   			mss->lazyfree += size;
>>   	}
>>   +	if (PageKsm(page) && (!pte || !is_ksm_zero_pte(*pte)))
>
> I think this won't work either way, because smaps_pte_entry() never ends up calling
> this function with !page. And the shared zeropage here always gives us !page.
>
> What would work is:
>
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 15ddf4653a19..ef6f39d7c5a2 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -528,6 +528,9 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>                 page = vm_normal_page(vma, addr, ptent);
>                 young = pte_young(ptent);
>                 dirty = pte_dirty(ptent);
> +
> +               if (!page && is_ksm_zero_pte(ptent))
> +                       mss->ksm += size;
>         } else if (is_swap_pte(ptent)) {
>                 swp_entry_t swpent = pte_to_swp_entry(ptent);
>  That means that "KSM" can be bigger than "Anonymous" and "RSS" when the shared
> zeropage is used.
>
> Interestingly, right now we account each KSM page individually towards
> "Anonymous" and "RSS".
>
> So if we have 100 times the same KSM page in a VMA, we will have 100 times anon
> and 100 times rss.
>
> Thinking about it, I guess considering the KSM-placed zeropage indeed adds more
> confusion to that. Eventually, we might just want separate "Shared-zeropages" count.
>
>
> So maybe v3 is better, clarifying the documentation a bit, that the
> KSM-placed zeropage is not considered.
>
o
> Sorry for changing my mind :D Thoughts?

I agree I think v3 is better, I'll revert to v3 and add the above
documentation change.


      reply	other threads:[~2023-08-23 16:43 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-22 18:05 Stefan Roesch
2023-08-23 10:09 ` David Hildenbrand
2023-08-23 16:41   ` Stefan Roesch [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=qvqwzg2hr9oq.fsf@devbig1114.prn1.facebook.com \
    --to=shr@devkernel.io \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@surriel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox