From: Vlastimil Babka <vbabka@suse.cz>
To: Jianfeng Wang <jianfeng.w.wang@oracle.com>,
	"Christoph Lameter (Ampere)" <cl@linux.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"penberg@kernel.org" <penberg@kernel.org>,
	"rientjes@google.com" <rientjes@google.com>,
	"iamjoonsoo.kim@lge.com" <iamjoonsoo.kim@lge.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Junxiao Bi <junxiao.bi@oracle.com>
Subject: Re: [PATCH] slub: limit number of slabs to scan in count_partial()
Date: Mon, 15 Apr 2024 09:35:18 +0200
Message-ID: <567ed01c-f0f5-45ee-9711-cc5719ee7666@suse.cz>
In-Reply-To: <5552D041-8549-4E76-B3EC-03C76C117077@oracle.com>

On 4/13/24 3:17 AM, Jianfeng Wang wrote:
> 
>> On Apr 12, 2024, at 1:44 PM, Jianfeng Wang <jianfeng.w.wang@oracle.com> wrote:
>> 
>> On 4/12/24 1:20 PM, Vlastimil Babka wrote:
>>> On 4/12/24 7:29 PM, Jianfeng Wang wrote:
>>>> 
>>>> On 4/12/24 12:48 AM, Vlastimil Babka wrote:
>>>>> On 4/11/24 7:02 PM, Christoph Lameter (Ampere) wrote:
>>>>>> On Thu, 11 Apr 2024, Jianfeng Wang wrote:
>>>>>> 
>>>>>>> So, the fix is to limit the number of slabs to scan in
>>>>>>> count_partial(), and output an approximated result if the list is too
>>>>>>> long. Default to 10000 which should be enough for most sane cases.
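
For illustration, that limiting idea could look roughly like the sketch
below (the function name and the extrapolation step are hypothetical, not
the actual patch; only the n->partial walk and the get_count() callback
mirror count_partial() in mm/slub.c):

static unsigned long count_partial_limited(struct kmem_cache_node *n,
                                           int (*get_count)(struct slab *))
{
        const unsigned long max_scan = 10000;   /* proposed default limit */
        unsigned long flags, scanned = 0, x = 0;
        struct slab *slab;

        spin_lock_irqsave(&n->list_lock, flags);
        list_for_each_entry(slab, &n->partial, slab_list) {
                if (scanned >= max_scan)
                        break;
                x += get_count(slab);
                scanned++;
        }
        spin_unlock_irqrestore(&n->list_lock, flags);

        /*
         * If the list is longer than the limit, scale the sampled sum up
         * to the full list length: an approximation, not an exact count.
         */
        if (scanned >= max_scan && n->nr_partial > scanned)
                x = x * n->nr_partial / scanned;

        return x;
}
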
>>>>>> 
>>>>>> 
>>>>>> That is a creative approach. The problem, though, is that slabs on the 
>>>>>> partial lists are kind of sorted. The partial slabs with only a few 
>>>>>> free objects available are at the start of the list, so that 
>>>>>> allocations cause them to be removed from the partial list fast. Full 
>>>>>> slabs do not need to be tracked on any list.
>>>>>> 
>>>>>> The partial slabs with only a few used objects are put at the end of 
>>>>>> the partial list, in the hope that the few remaining objects will also 
>>>>>> be freed, which would allow the freeing of the slab folio.
>>>>>> 
>>>>>> So the object density may be higher at the beginning of the list.
>>>>>> 
>>>>>> kmem_cache_shrink() will explicitly sort the partial lists to put the 
>>>>>> partial pages in that order.
>>>>>> 
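
In other words, the placement policy amounts to roughly the following (a
simplified paraphrase for illustration, not verbatim mm/slub.c code):

/*
 * Nearly-full slabs go to the head so allocations drain them and get them
 * off the list quickly; nearly-empty slabs go to the tail in the hope
 * that their last objects are freed and the slab folio can be released.
 */
static void add_partial_sketch(struct kmem_cache_node *n,
                               struct slab *slab, bool nearly_empty)
{
        n->nr_partial++;
        if (nearly_empty)
                list_add_tail(&slab->slab_list, &n->partial);   /* tail */
        else
                list_add(&slab->slab_list, &n->partial);        /* head */
}
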
> 
> I realized that I could do "echo 1 > /sys/kernel/slab/dentry/shrink" to sort the list explicitly.
> After that, the numbers become:
> N = 10000 -> diff = 7.1 %
> N = 20000 -> diff = 5.7 %
> N = 25000 -> diff = 5.4 %
> So, I'd expect a ~5-7% difference after shrinking.
> 
>>>>>> Can you run some tests showing the difference between the estimation and 
>>>>>> the real count?
>>>> 
>>>> Yes.
>>>> On a server with one NUMA node, I created a test case that uses many dentry objects.
>>> 
>>> Could you describe in more detail how you make the dentry cache grow such
>>> a large partial slab list? Thanks.
>>> 
>> 
>> I utilized the fact that creating a folder creates a new dentry object,
>> while deleting a folder deletes all of its sub-folders' dentry objects.
>> 
>> I then created N folders, each containing M empty sub-folders. These
>> operations consume a large number of dentry objects in sequential order,
>> so their slabs were very likely full slabs. After all folders were
>> created, I deleted a subset of the N folders (i.e., one out of every two
>> folders). This created many holes, which turned a subset of the full
>> slabs into partial slabs.

Thanks, right, so that's quite a deterministic way to achieve long partial
lists with a very close to uniform free/used ratio, so it's no wonder the
resulting accuracy is good and the diff is very small. But in practice the
workloads that lead to long lists will not be so uniform. The result after
shrinking shows what happens when there's a bias in which slabs we inspect
due to the sorting. Still, most of the slabs here have a near-uniform
free/used ratio, so the sorting does not make that much of a difference;
another workload might.
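
If I understand correctly, the procedure boils down to something like the
sketch below (plain C with made-up directory counts, not the actual
reproducer):

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define NFOLDERS 100000 /* hypothetical sizes; adjust to the machine */
#define MSUB     20

int main(void)
{
        char path[64];

        /* create N folders, each with M empty sub-folders: dentries are
         * allocated back to back, filling up dentry slabs sequentially */
        for (int i = 0; i < NFOLDERS; i++) {
                snprintf(path, sizeof(path), "d%d", i);
                mkdir(path, 0755);
                for (int j = 0; j < MSUB; j++) {
                        snprintf(path, sizeof(path), "d%d/s%d", i, j);
                        mkdir(path, 0755);
                }
        }

        /* delete every other folder (sub-folders first): the freed dentries
         * punch holes that turn full slabs into partial slabs */
        for (int i = 0; i < NFOLDERS; i += 2) {
                for (int j = 0; j < MSUB; j++) {
                        snprintf(path, sizeof(path), "d%d/s%d", i, j);
                        rmdir(path);
                }
                snprintf(path, sizeof(path), "d%d", i);
                rmdir(path);
        }
        return 0;
}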

So what happens if you inspect X slabs from the head and X from the tail,
as I suggested? That should help your test case even after you sort, and
should in theory also be more accurate for less uniform workloads.
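
For concreteness, the two-ended sampling I have in mind would be roughly
the sketch below (an illustration only; the function name and the per-end
limit of 5000 are made up, only the n->partial walk and the get_count()
callback mirror count_partial() in mm/slub.c):

static unsigned long count_partial_two_ended(struct kmem_cache_node *n,
                                             int (*get_count)(struct slab *))
{
        const unsigned long limit = 5000;       /* slabs to scan per end */
        unsigned long flags, scanned = 0, x = 0;
        struct slab *slab;

        spin_lock_irqsave(&n->list_lock, flags);
        if (n->nr_partial <= 2 * limit) {
                /* short list: count everything exactly */
                list_for_each_entry(slab, &n->partial, slab_list)
                        x += get_count(slab);
        } else {
                /* sample 'limit' slabs from the dense head of the list... */
                list_for_each_entry(slab, &n->partial, slab_list) {
                        if (scanned >= limit)
                                break;
                        x += get_count(slab);
                        scanned++;
                }
                /* ...and 'limit' slabs from the sparse tail, so both ends
                 * contribute to the sampled average */
                list_for_each_entry_reverse(slab, &n->partial, slab_list) {
                        if (scanned >= 2 * limit)
                                break;
                        x += get_count(slab);
                        scanned++;
                }
        }
        spin_unlock_irqrestore(&n->list_lock, flags);

        /* scale the sampled sum up to the whole list length */
        if (scanned)
                x = x * n->nr_partial / scanned;

        return x;
}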

