linux-mm.kvack.org archive mirror
From: <xu.xin16@zte.com.cn>
To: <xialonglong@kylinos.cn>
Cc: <david@redhat.com>, <akpm@linux-foundation.org>,
	<chengming.zhou@linux.dev>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <shr@devkernel.io>,
	<corbet@lwn.net>, <lorenzo.stoakes@oracle.com>,
	<Liam.Howlett@oracle.com>, <vbabka@suse.cz>
Subject: Re: [PATCH 1/1] mm/ksm: add ksm_pages_sharing for each process to calculate profit more accurately
Date: Tue, 24 Jun 2025 17:47:11 +0800 (CST)
Message-ID: <20250624174711752wxRCqiy0LZGukb1R7z_6D@zte.com.cn>
In-Reply-To: <910a462a-8d4d-4b5d-941c-ba1396e287dc@kylinos.cn>

> >>>> and /proc/self/ksm_stat/ to indicate the saved pages of this process.
> >>>> (not including ksm_zero_pages)
> >>> Curious, why is updating ksm_process_profit() insufficient and we also
> >>> have to expose ksm_pages_sharing?
> >>>
> >> Since ksm_process_profit() uses ksm_merging_pages(pages_sharing +
> >> pages_shared) to calculate the profit for individual processes,
> >>
> >> while general_profit uses pages_sharing for profit calculation, this can
> >> lead to the total profit calculated for each process being greater than
> >> that of general_profit.
> >>
> >> Additionally, exposing ksm_pages_sharing under /proc/self/ksm_stat/ may
> >> be sufficient.
> >>
> > Hi,
> >
> > Although that's true, this patch may not be okay. It can only ensure
> > that the sum of all processes' profits roughly equals the system's general_profit,
> > but it gives a totally wrong profit result for an individual process. For example, when
> > two pages from two different processes are merged, one process's pages_shared
> > increments by 1 while the other's pages_sharing increments by 1,
> > resulting in different calculated profits for the two processes, even though
> > their actual profits are identical. In more extreme cases, this could even
> > render a process's profit entirely meaningless.
> >
> > Lastly, do we really need the sum of all processes' profits to perfectly match the general
> > profit, or do we just want a rough estimate of each process's profit from KSM?
> >
> Hi,
> 
> In extreme cases, stable nodes may be distributed quite unevenly, which 
> is due to stable nodes not being per mm, of course.
> There are also situations where there are 1000 pairs of pages, with the 
> pages within each pair being identical, while each pair is different 
> from all other pages.
> This results in pages_sharing and pages_shared being equal. In that
> case, using ksm_merging_pages (pages_sharing + pages_shared) gives an
> average error of 50%.

I don't agree with the 50% error figure from your tests, because your
assumption that a process's benefit equals its pages_sharing is fundamentally flawed.

The real question is what the most accurate definition of a process's KSM profit is.
Since stable_node isn't per-mm, we cannot calculate a process's
benefit solely from pages_sharing. The cost of a stable_node should be split
fairly among all processes sharing that stable_node, rather than being charged
to a single one of them.

It is inaccurate to claim that when two processes' pages merge into a
single KSM page, one process gains 4K - sizeof(rmap_item) while
the other gains 0. That is unfair to the second process, which actively
participated in the KSM merge.


The most accurate and fair profit calculation would be:

    profit = (ksm_merging_pages - united_stable_nodes) * PAGE_SIZE - sizeof(rmap_item) * ksm_rmap_items

where 'united_stable_nodes' is this process's share of the stable nodes it maps: each stable_node counts as 1/(number of processes sharing it). This is too complex to implement.

For example, suppose one page of process A is merged with one page of process B:

process A      process B
    page          page
     \           /
      \         /
       \       /
        \     /
        Ksm Page

A: pages_sharing(1), pages_shared(0)
B: pages_sharing(0), pages_shared(1)

Then:

profit(A) = (pages_sharing + pages_shared - united_stable_nodes) * 4K - sizeof(rmap_item) * ksm_rmap_items
          = (1 + 0 - 1/2) * 4K - sizeof(rmap_item) * ksm_rmap_items
          = 0.5 * 4K - sizeof(rmap_item) * ksm_rmap_items

profit(B) = (pages_sharing + pages_shared - united_stable_nodes) * 4K - sizeof(rmap_item) * ksm_rmap_items
          = (0 + 1 - 1/2) * 4K - sizeof(rmap_item) * ksm_rmap_items
          = 0.5 * 4K - sizeof(rmap_item) * ksm_rmap_items

so A and B get the same profit, as they should.


> In practical testing, we may only need to enable KSM for specific 
> applications and calculate the total benefits of these processes.
> Since pages_shared is also included in the statistics, this may lead to 
> the calculated benefits being higher than the actual ones.
> In practical testing, the error may reach 20%. For example, in one test, 
> the total benefits of all processes were estimated to be around 528MB,
> while the profit calculated through general_profit was only around 428MB.
> The theoretical error may be around 50%.
> 
> If we expose ksm_pages_sharing for each process, we can not only
> calculate the actual benefits but also determine how many
> ksm_pages_shared there are from the difference between
> ksm_merging_pages and ksm_pages_sharing of each process.


Thread overview: 6+ messages
2025-06-06  7:03 Longlong Xia
2025-06-06 10:08 ` David Hildenbrand
2025-06-18  7:13   ` Longlong Xia
2025-06-18  9:14     ` xu.xin16
2025-06-24  8:35       ` Longlong Xia
2025-06-24  9:47         ` xu.xin16 [this message]
