From: Chuck Lever III <chuck.lever@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Feng Tang <feng.tang@intel.com>,
	"Sang, Oliver" <oliver.sang@intel.com>,
	"oe-lkp@lists.linux.dev" <oe-lkp@lists.linux.dev>,
	lkp <lkp@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>,
	Liam Howlett <liam.howlett@oracle.com>
Subject: Re: [linus:master] [shmem]  a2e459555c:  aim9.disk_src.ops_per_sec -19.0% regression
Date: Thu, 4 Jan 2024 19:33:47 +0000
Message-ID: <D00399D9-D629-4CE3-AC32-636FD6F06C24@oracle.com>
In-Reply-To: <ZQCLdzmtVcjxZWXt@casper.infradead.org>



> On Sep 12, 2023, at 12:01 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Tue, Sep 12, 2023 at 11:14:42PM +0800, Feng Tang wrote:
>>> Well that's the problem. Since I can't run the reproducer, there's
>>> nothing I can do to troubleshoot the problem myself.
>> 
>> We dug further into the perf and other profiling data from the 0Day
>> server running this case, and it seems that the new simple_offset_add()
>> called by shmem_mknod() adds extra slab cost, specifically for
>> 'radix_tree_node', which causes the regression.
>> 
>> Here is a slabinfo diff for commit a2e459555c5f and its parent:
>> 
>> 23a31d87645c6527 a2e459555c5f9da3e619b7e47a6 
>> ---------------- --------------------------- 
>> 
>>     26363           +40.2%      36956        slabinfo.radix_tree_node.active_objs
>>    941.00           +40.4%       1321        slabinfo.radix_tree_node.active_slabs
>>     26363           +40.3%      37001        slabinfo.radix_tree_node.num_objs
>>    941.00           +40.4%       1321        slabinfo.radix_tree_node.num_slabs
> 
> I can't find the benchmark source, but my suspicion is that this
> creates and deletes a lot of files in a directory.  The 'stable
> directory offsets' series uses xa_alloc_cyclic(), so we'll end up
> with a very sparse radix tree.  That is, it'll look something like this:
> 
> 0 - "."
> 1 - ".."
> 6 - "d"
> 27 - "y"
> 4000 - "fzz"
> 65537 - "czzz"
> 643289767 - "bzzzzzz"
> 
> (I didn't work out the names precisely here, but this is approximately
> what you'd get if you create files a-z, aa-zz, aaa-zzz, etc and delete
> almost all of them)
> 
> The radix tree does not handle this well.  It'll allocate one node for:
> 
> entries 0-63 (covers the first 4 entries)
> entries 0-4095
> entries 3968-4031 (the first 5)
> entries 0-262143
> entries 65536-69631
> entries 65536-65599 (the first 6)
> entries 0-16777215
> entries 0-1073741823
> entries 637534208-654311423
> entries 643039232-643301375
> entries 643289088-643293183
> entries 643289728-643289791 (all 7)
> 
> That ends up being 12 nodes (you get 7 nodes per page) to store 7
> pointers.  Admittedly to get here, you have to do 643289765 creations
> and nearly as many deletions, so are we going to see it in a
> non-benchmark situation?
> 
> The maple tree is more resilient against this kind of shenanigan, but
> we're not there in terms of supporting the kind of allocation you
> want.  For this kind of allocation pattern, you'd get all 7 pointers
> in a single 256-byte node.
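
For reference, the node count in the quoted example can be reproduced
with a small model. This is a minimal sketch in Python, not kernel code:
it assumes a fixed 64-way fan-out (6 index bits per level), a tree height
set by the largest stored index, and one node for every populated chunk
at every level. The name covering_nodes() is made up for illustration,
and the model ignores XArray details such as how a lone entry at index 0
is stored.

```python
CHUNK_SHIFT = 6  # assumed fan-out: 64 slots per node, 6 index bits per level


def covering_nodes(indices):
    """Return the (first, last) index range of every node needed to reach
    each index in a fixed fan-out radix tree (simplified model)."""
    if not indices:
        return []

    # Tree height: add levels until the root's range covers the largest index.
    top = max(indices)
    levels = 1
    while top >> (CHUNK_SHIFT * levels):
        levels += 1

    nodes = set()
    for idx in indices:
        for level in range(1, levels + 1):
            span_shift = CHUNK_SHIFT * level
            first = (idx >> span_shift) << span_shift
            last = first + (1 << span_shift) - 1
            nodes.add((first, last))
    return sorted(nodes)


# The seven directory offsets from the quoted example.
OFFSETS = [0, 1, 6, 27, 4000, 65537, 643289767]

nodes = covering_nodes(OFFSETS)
for first, last in nodes:
    print(f"entries {first}-{last}")
print(f"{len(nodes)} nodes for {len(OFFSETS)} pointers")
```

Under these assumptions the model reports 12 nodes for the 7 offsets,
matching the list in the quoted message.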

Hello Matthew, it's been a couple of kernel releases, so I'm
following up.

Is the Maple tree ready for libfs to use for managing directory
offsets?

Should we just go for broke and convert libfs from xarray to
Maple tree now?


--
Chuck Lever


