linux-mm.kvack.org archive mirror
From: Longlong Xia <xialonglong@kylinos.cn>
To: xu.xin16@zte.com.cn, david@redhat.com
Cc: akpm@linux-foundation.org, chengming.zhou@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	shr@devkernel.io, corbet@lwn.net, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz
Subject: Re: [PATCH 1/1] mm/ksm: add ksm_pages_sharing for each process to calculate profit more accurately
Date: Tue, 24 Jun 2025 16:35:27 +0800
Message-ID: <910a462a-8d4d-4b5d-941c-ba1396e287dc@kylinos.cn>
In-Reply-To: <202506181714096412Nvp5B3BkFpi3-CKLQ9ep@zte.com.cn>


On 2025/6/18 17:14, xu.xin16@zte.com.cn wrote:
>>>> and /proc/self/ksm_stat/ to indicate the saved pages of this process.
>>>> (not including ksm_zero_pages)
>>> Curious, why is updating ksm_process_profit() insufficient and we also
>>> have to expose ksm_pages_sharing?
>>>
>> Since ksm_process_profit() uses ksm_merging_pages (pages_sharing +
>> pages_shared) to calculate the profit for individual processes,
>> while general_profit uses pages_sharing for profit calculation, this can
>> lead to the total profit calculated for each process being greater than
>> that of general_profit.
>>
>> Additionally, exposing ksm_pages_sharing under /proc/self/ksm_stat/ may
>> be sufficient.
>>
> Hi,
>
> Although that's true, this patch may not be okay. It can only ensure
> that the sum of each process's profit roughly equals the system's general_profit,
> but it gives a totally wrong profit result for an individual process. For example, when
> two pages from two different processes are merged, one process's pages_shared
> increments by 1, while the other's pages_sharing increments by 1, which
> results in different calculated profits for the two processes, even though
> their actual profits are identical. In more extreme cases, this could even
> render a process's profit entirely unreadable.
>
> Lastly, do we really need each process’s profit sum to perfectly match the general
> profit, or do we just want a rough estimate of the process’s profit from KSM?
>
Hi,

In extreme cases, stable nodes may be distributed quite unevenly across
processes, because stable nodes are not per-mm.
Consider also 1000 pairs of pages, where the two pages in each pair are
identical but every pair differs from all other pages.
In that case pages_sharing equals pages_shared, so using
ksm_merging_pages (pages_sharing + pages_shared) gives an error of
about 50%.
In practice, we may only need to enable KSM for specific applications
and calculate the total benefit for those processes.
Because pages_shared is also included in the statistics, the calculated
benefit can be higher than the actual one.
In practical testing, the error may reach 20%. For example, in one test,
the total benefit summed over all processes was estimated at around 528MB,
while the profit calculated through general_profit was only around 428MB.
The theoretical error may be around 50%.
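
To make the arithmetic above concrete, here is a minimal sketch in C. It
ignores rmap_item overhead and ksm_zero_pages, and the helper names are
hypothetical (they are not kernel functions); it only contrasts the
merging-based count with the pages_sharing-based one.

```c
/*
 * Hypothetical helpers, not kernel code. They ignore rmap_item overhead
 * and ksm_zero_pages and only compare the two ways of counting savings.
 */

/* Per-process style count: pages_sharing + pages_shared. */
unsigned long merging_pages(unsigned long pages_sharing,
			    unsigned long pages_shared)
{
	return pages_sharing + pages_shared;
}

/*
 * How much of 'estimate' is not backed by 'actual', as a percentage of
 * 'estimate' (integer division, so the result is truncated).
 */
unsigned long overcount_percent(unsigned long estimate, unsigned long actual)
{
	if (estimate == 0 || estimate < actual)
		return 0;
	return (estimate - actual) * 100 / estimate;
}
```

With the numbers from this mail, overcount_percent(merging_pages(1000, 1000), 1000)
gives 50 for the 1000-pairs case, and overcount_percent(528, 428) gives 18
(roughly 18.9 before truncation) for the measured 528MB versus 428MB.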

If we expose ksm_pages_sharing for each process, we can not only
calculate the actual benefit but also determine each process's
ksm_pages_shared as the difference between its ksm_merging_pages and
ksm_pages_sharing.
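
As a rough userspace sketch of that derivation: the code below assumes the
proposed ksm_pages_sharing line would appear in /proc/<pid>/ksm_stat in the
same "name value" form as the existing ksm_merging_pages line. The function
names are hypothetical, and the buffer is expected to hold the file contents
already read by the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find a "name value" line in buf and store the value in *out.
 * Returns 0 on success, -1 if the field is missing or malformed.
 */
int ksm_stat_field(const char *buf, const char *name, long *out)
{
	size_t len = strlen(name);
	const char *line = buf;

	while (line && *line) {
		/* The name must be followed by a space to avoid prefix matches. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ' &&
		    sscanf(line + len, "%ld", out) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * pages_shared = ksm_merging_pages - ksm_pages_sharing.
 * ksm_pages_sharing is the field proposed in this thread (an assumption
 * here, not an existing interface).
 */
int ksm_derive_pages_shared(const char *buf, long *shared)
{
	long merging, sharing;

	if (ksm_stat_field(buf, "ksm_merging_pages", &merging) ||
	    ksm_stat_field(buf, "ksm_pages_sharing", &sharing))
		return -1;
	*shared = merging - sharing;
	return 0;
}
```

If the proposed field is absent, ksm_derive_pages_shared() returns -1
rather than guessing a value.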

>>
>>> Hm, I am wondering if that works. Stable nodes are not per MM, so
>>> can't we create an accounting imbalance for one MM somehow?
>>>
>>> (did not look into all the details, just something that came to mind)
>>>
>> Indeed, using the method in this patch to calculate ksm_pages_sharing
>> for each process to determine ksm_pages_shared
>> can sometimes result in negative values for ksm_pages_shared.
>>
>> Example of calculating mm->ksm_pages_shared:
>>
>>           if (rmap_item->hlist.next) {
>>               ksm_pages_sharing--;
>>               rmap_item->mm->ksm_pages_sharing--;
>>
>>           } else {
>>               ksm_pages_shared--;
>>               rmap_item->mm->ksm_pages_shared--; // can be negative
>>           }
>>
>>           rmap_item->mm->ksm_merging_pages--;
>>
>>
>> Would it be possible to compare the ratio of each process's rmap_item to
>> the total rmap_item and the ratio of the process's page_shared to the
>> total page_shared to assess this imbalance? For now, I don't have any
>> better ideas.
> Although stable_node is not per-mm, if you really add ksm_shared to mm,
> it won't cause negative ksm_pages_shared, because the count of ksm_shared
> will only be attributed to the process of the first rmap_item.
Yes, the negative values were caused by an incorrect method I used
during testing. After fixing it, they have not occurred again.
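
The attribution rule quoted above (the shared count goes to the process of
the first rmap_item, later ones count as sharing) can be illustrated with a
toy model in C. This is not kernel code: all types and names are
hypothetical, a single stable node is modeled, and remembering in each item
which counter it incremented is an assumption made here to keep the per-mm
counters from going negative.

```c
#include <stddef.h>

/* Toy per-process counters (hypothetical, not struct mm_struct). */
struct toy_mm {
	long pages_shared;	/* credited to the first item's process */
	long pages_sharing;	/* credited to every later item's process */
};

struct toy_item {
	struct toy_mm *mm;
	int counted_as_shared;	/* which counter this item incremented */
};

struct toy_node {
	size_t nr_items;	/* items currently attached to the node */
};

/* The first item on an empty node counts as shared, the rest as sharing. */
void toy_append(struct toy_node *node, struct toy_item *item,
		struct toy_mm *mm)
{
	item->mm = mm;
	item->counted_as_shared = (node->nr_items == 0);
	if (item->counted_as_shared)
		mm->pages_shared++;
	else
		mm->pages_sharing++;
	node->nr_items++;
}

/* Undo exactly what toy_append() did for this item. */
void toy_remove(struct toy_node *node, struct toy_item *item)
{
	if (item->counted_as_shared)
		item->mm->pages_shared--;
	else
		item->mm->pages_sharing--;
	node->nr_items--;
}
```

Appending one page from process A and then one from process B leaves A with
pages_shared = 1 and B with pages_sharing = 1, which is the per-process
asymmetry described in the quoted reply; removing both items in any order
returns all counters to zero.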

Thank you for your time.
Best regards,
Longlong Xia






