linux-mm.kvack.org archive mirror
From: Jianfeng Wang <jianfeng.w.wang@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>,
	"Christoph Lameter (Ampere)" <cl@linux.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, junxiao.bi@oracle.com
Subject: Re: [External] : Re: [PATCH] slub: limit number of slabs to scan in count_partial()
Date: Fri, 12 Apr 2024 10:29:55 -0700
Message-ID: <a8e208fb-7842-4bca-9d2d-3aae21da030c@oracle.com>
In-Reply-To: <38ef26aa-169b-48ad-81ad-8378e7a38f25@suse.cz>



On 4/12/24 12:48 AM, Vlastimil Babka wrote:
> On 4/11/24 7:02 PM, Christoph Lameter (Ampere) wrote:
>> On Thu, 11 Apr 2024, Jianfeng Wang wrote:
>>
>>> So, the fix is to limit the number of slabs to scan in
>>> count_partial(), and output an approximated result if the list is too
>>> long. Default to 10000 which should be enough for most sane cases.
>>
>>
>> That is a creative approach. The problem though is that objects on the 
>> partial lists are kind of sorted. The partial slabs with only a few 
>> objects available are at the start of the list so that allocations cause 
>> them to be removed from the partial list fast. Full slabs do not need to 
>> be tracked on any list.
>>
>> The partial slabs with few objects are put at the end of the partial list 
>> in the hope that the few objects remaining will also be freed which would 
>> allow the freeing of the slab folio.
>>
>> So the object density may be higher at the beginning of the list.
>>
>> kmem_cache_shrink() will explicitly sort the partial lists to put the 
>> partial pages in that order.
>>
>> Can you run some tests showing the difference between the estimation and 
>> the real count?

Yes.
On a server with one NUMA node, I created a workload that uses many dentry
objects. For "dentry", the partial list holds slightly more than 250000 slabs.
I then compared my approach of scanning N slabs from the head of the list
against the original approach of scanning the full list. I obtained both
results by calling the new and the original count_partial() and printing them
in /proc/slabinfo.
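
For illustration only, here is a minimal userspace sketch of the bounded scan
described above. It is not the patch itself: struct partial_slab,
count_free_bounded() and the max_scan parameter are hypothetical names, and
scaling the sampled sum by nr_partial / scanned is an assumed way of producing
the approximated result.

```c
#include <stddef.h>

/* Hypothetical stand-in for a slab on a node's partial list. */
struct partial_slab {
	unsigned int free_objects;	/* per-slab count being summed */
	struct partial_slab *next;	/* next slab, NULL at the tail */
};

/*
 * Sum free_objects over a list of nr_partial slabs, visiting at most
 * max_scan slabs from the head. If the list is longer than that, scale
 * the sampled sum by nr_partial / scanned (assumed extrapolation).
 */
unsigned long long count_free_bounded(const struct partial_slab *head,
				      unsigned long nr_partial,
				      unsigned long max_scan)
{
	unsigned long long total = 0;
	unsigned long scanned = 0;
	const struct partial_slab *s;

	for (s = head; s && scanned < max_scan; s = s->next) {
		total += s->free_objects;
		scanned++;
	}

	/* Nothing scanned, or the whole list was visited: no scaling. */
	if (scanned == 0 || scanned >= nr_partial)
		return total;

	return total * nr_partial / scanned;
}
```

With max_scan at or above the list length the result is exact; otherwise the
cost of the walk is bounded by max_scan regardless of how long the list grows.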

N = 10000
my_result = 4741651
org_result = 4744966
diff = (org_result - my_result) / org_result = 0.00069 = 0.069 %

Increasing N further, up to 25000, only slightly improves the accuracy:
N = 15000 -> diff =  0.02 %
N = 20000 -> diff =  0.01 %
N = 25000 -> diff = -0.017 %
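
The diff values above use the formula (org_result - my_result) / org_result.
As a small worked sketch, the helper below (rel_diff_percent() is a
hypothetical name) computes that ratio in percent. A negative value, as for
N = 25000, means the estimate came out above the full-scan count.

```c
/*
 * Relative difference between the full-scan count and the estimate,
 * in percent. Positive: the estimate is too low. Negative: too high.
 */
double rel_diff_percent(unsigned long exact, unsigned long estimate)
{
	return ((double)exact - (double)estimate) / (double)exact * 100.0;
}
```

For the N = 10000 numbers, rel_diff_percent(4744966, 4741651) is
3315 / 4744966 * 100, which is just under 0.07.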

Based on the measurement, I think the difference between the estimation and
the real count is very limited (i.e. less than 0.1% for N = 10000). The
benefit is significant: shorter execution time for get_slabinfo(); no more
soft lockups or crashes caused by count_partial().

> 
> Maybe we could also get a more accurate picture by counting N slabs from the
> head and N from the tail and approximating from both. Also not perfect, but
> could be able to answer the question if the kmem_cache is significantly
> fragmented. Which is probably the only information we can get from the
> slabinfo <active_objs> vs <num_objs>. IIRC the latter is always accurate,
> the former never because of cpu slabs, so we never know how many objects are
> exactly in use. By comparing both we can get an idea of the fragmentation,
> and if this change won't make that estimate significantly worse, it should
> be acceptable.
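
A rough userspace sketch of the head-and-tail idea quoted above, assuming the
per-slab counts are available as an array for simplicity (the real partial
list is a linked list). estimate_head_tail() and its parameters are
hypothetical, and the linear scaling is only one possible way to approximate
from both ends.

```c
#include <stddef.h>

/*
 * Estimate the sum of per_slab[0..nr_slabs-1] by sampling n entries from
 * the head and n from the tail, then scaling to the full length.
 * Falls back to an exact sum when the two samples would cover the array.
 */
unsigned long long estimate_head_tail(const unsigned int *per_slab,
				      size_t nr_slabs, size_t n)
{
	unsigned long long sum = 0;
	size_t i;

	if (nr_slabs <= 2 * n) {
		for (i = 0; i < nr_slabs; i++)
			sum += per_slab[i];
		return sum;
	}

	if (n == 0)
		return 0;

	for (i = 0; i < n; i++)
		sum += (unsigned long long)per_slab[i] +
		       per_slab[nr_slabs - 1 - i];

	/* 2 * n slabs were sampled out of nr_slabs. */
	return sum * nr_slabs / (2 * n);
}
```

If the two ends of the list differ in density, as the sorting argument
earlier in the thread suggests they might, sampling both ends averages the
two extremes instead of extrapolating from one of them.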



Thread overview: 15+ messages
2024-04-11 16:40 Jianfeng Wang
2024-04-11 17:02 ` Christoph Lameter (Ampere)
2024-04-12  7:48   ` Vlastimil Babka
2024-04-12 17:29     ` Jianfeng Wang [this message]
2024-04-12 18:16       ` Christoph Lameter (Ampere)
2024-04-12 18:32         ` Jianfeng Wang
2024-04-12 20:20       ` [External] : " Vlastimil Babka
2024-04-12 20:44         ` Jianfeng Wang
2024-04-13  1:17           ` Jianfeng Wang
2024-04-15  7:35             ` Vlastimil Babka
2024-04-16 18:58               ` Jianfeng Wang
2024-04-16 20:14                 ` Vlastimil Babka
2024-04-15 16:20             ` Christoph Lameter (Ampere)
2024-04-13  4:43         ` [External] : " Matthew Wilcox
2024-04-12  7:41 ` Vlastimil Babka
