From: Kent Overstreet <kent.overstreet@linux.dev>
To: David Wang <00107082@163.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo
Date: Thu, 8 May 2025 09:33:50 -0400 [thread overview]
Message-ID: <y6d7vzvii5wvfby5446ukpvdmulwd5lzcyki6rpxckh432d6jz@xwtlwnkhztuo> (raw)
In-Reply-To: <e1cc19.5287.196ae733594.Coremail.00107082@163.com>
On Thu, May 08, 2025 at 01:51:48PM +0800, David Wang wrote:
> At 2025-05-08 12:07:40, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> >Another thing to note is that memory layout - avoiding pointer chasing -
> >is hugely important, but it'll almost never show up as allocator calls.
> >
> >To give you some examples, mempools and biosets used to be separately
> >allocated. This was mainly to make error paths in outer object
> >constructors/destructors easier and safer: instead of keeping track of
> >what's initialized and what's not, if you've got a pointer to a
> >mempool/bioset you call *_free() on it.
> >
> >(People hadn't yet clued that you can just kzalloc() the entire outer
> >object, and then if the inner object is zeroed it wasn't initialized).
> >
> >But that means you're adding a pointer chase to every mempool_alloc()
> >call, and since bioset itself has mempools allocating bios had _two_
> >unnecessary pointer derefs. That's death for performance when you're
> >running cache cold, but since everyone benchmarks cache-hot...
> >
> >(I was the one who fixed that).
> >
> >Another big one was generic_file_buffered_read(). Main buffered read
> >path, everyone wants it to be as fast as possible.
> >
> >But the core is (was) a loop that walks the pagecache radix tree to get
> >the page, then copies 4k of data out to userspace (there goes l1), then
> >repeats all that pointer chasing for the next 4k. Pre large folios, it
> >was horrific.
> >
> >Solution - vectorize it. Look up all the pages we're copying from all at
> >once, stuff them in a (dynamically allocated! for each read!) vector,
> >and then do the copying out to userspace all at once. Massive
> >performance gain.
> >
> >Of course, to do that I first had to clean up a tangled 250+ line
> >monstrosity of half baked, poorly thought out "optimizations" (the worst
> >spaghetti of gotos you'd ever seen) and turn it into something
> >manageable...
> >
> >So - keep things simple, don't overthink the little stuff, so you can
> >spot and tackle the big algorithmic wins :)
> I will keep this in mind~! :)
>
> And thanks for the enlightening notes~!!
>
> Though I could not quite catch up with the first one, I think I got
> the point: avoid unnecessary pointer chasing and keep the pointer
> chasing as short(balanced) as possible~
To illustrate - DRAM latency is 30-70n.
At 4GHz, that's 120-280 cycles, and a properly fed CPU can do multiple
instructions per clock - so a cache miss all the way to DRAM can cost
you hundreds of instructions.
> The second one, about copy 4k by 4k, seems quite similar to seq_file,
> at least the "4k" part, literally. seq_file read() defaults to alloc
> 4k buffer, and read data until EOF or the 4k buffer is full, and
> start over again for the next read().
>
> One solution could be make changes to seq_file, do not stop until user
> buffer is full for each read. kind of similar to your second note, in
> a sequential style, I think.
>
> If user read with 128K buffer, and seq_file fill the buffer 4k by
> 4k, it would only need ~3 read calls for allocinfo. (I did post a
> patch for seq_file to fill user buffer, but start/stop still happens
> at 4k boundary , so no help for
> the iterator rewinding when read /proc/allocinfo yet.
> https://lore.kernel.org/lkml/20241220140819.9887-1-00107082@163.com/ )
> The solution in this patch is keeping the iterator alive and valid
> cross read boundary, this can also avoid the cost for each start
> over.
The first question is - does it matter? If the optimization is just for
/proc/allocinfo, who's reading it at a high enough rate that we care?
If it's only being used interactively, it doesn't matter. If it's being
read at a high rate by some sort of profiling program, we'd want to skip
the text interface entirely and add an ioctl to read the data out in a
binary format.
The idea of changing seq_file to continue until the user buffer is full
- that'd be a good one, if you're making changes that benefit all
seq_file users.
next prev parent reply other threads:[~2025-05-08 13:34 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-07 17:55 David Wang
2025-05-07 18:19 ` David Wang
2025-05-07 23:42 ` [PATCH] " Suren Baghdasaryan
2025-05-08 0:01 ` Kent Overstreet
2025-05-08 3:06 ` David Wang
2025-05-08 3:31 ` Kent Overstreet
2025-05-08 3:35 ` David Wang
2025-05-08 4:07 ` Kent Overstreet
2025-05-08 5:51 ` David Wang
2025-05-08 13:33 ` Kent Overstreet [this message]
2025-05-08 16:24 ` David Wang
2025-05-08 16:34 ` Kent Overstreet
2025-05-08 16:58 ` David Wang
2025-05-08 17:17 ` David Wang
2025-05-08 17:26 ` Kent Overstreet
2025-05-08 2:24 ` David Wang
2025-05-07 23:36 ` Suren Baghdasaryan
2025-05-08 3:10 ` David Wang
2025-05-08 15:32 ` David Wang
2025-05-08 21:41 ` Suren Baghdasaryan
2025-05-09 5:53 ` [PATCH 2/2] alloc_tag: keep codetag iterator cross read() calls David Wang
2025-05-09 17:34 ` Suren Baghdasaryan
2025-05-09 17:45 ` David Wang
2025-05-09 17:39 ` [PATCH v2 2/2] alloc_tag: keep codetag iterator active between " David Wang
2025-05-09 18:33 ` Tim Chen
2025-05-09 19:36 ` Suren Baghdasaryan
2025-05-09 19:46 ` Tim Chen
2025-05-09 20:46 ` Suren Baghdasaryan
2025-05-09 21:15 ` Suren Baghdasaryan
2025-05-10 3:10 ` David Wang
2025-05-10 3:30 ` Suren Baghdasaryan
2025-05-10 3:58 ` David Wang
2025-05-10 4:03 ` Suren Baghdasaryan
2025-05-10 3:35 ` David Wang
2025-05-10 3:25 ` David Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=y6d7vzvii5wvfby5446ukpvdmulwd5lzcyki6rpxckh432d6jz@xwtlwnkhztuo \
--to=kent.overstreet@linux.dev \
--cc=00107082@163.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=surenb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox