* [LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
@ 2026-02-19 9:54 Hannes Reinecke
2026-02-19 14:32 ` Theodore Tso
2026-02-19 14:53 ` Bart Van Assche
0 siblings, 2 replies; 5+ messages in thread
From: Hannes Reinecke @ 2026-02-19 9:54 UTC (permalink / raw)
To: lsf-pc, linux-nvme, linux-block, linux-mm
Hi all,
I (together with the Czech Technical University) ran some experiments
to measure memory fragmentation with large block sizes.
The testbed was an NVMe setup talking to nvmet storage over
the network.
Doing so raised some challenges:
- How do you _generate_ memory fragmentation? The MM subsystem is
precisely geared up to avoid it, so you would need to come up
with some idea how to defeat it. With the help from Willy I managed
to come up with something, but I really would like to discuss
what would be the best option here.
- What is acceptable memory fragmentation? Are we good enough if the
measured fragmentation does not grow during the test runs?
- Do we have better visibility into memory fragmentation other than
just reading /proc/buddyinfo?
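
As a rough illustration of the visibility question above, the per-order counts in /proc/buddyinfo can be reduced to one number per zone: the fraction of free memory that sits in blocks too small to satisfy an allocation of a given order. The sketch below is only that, a sketch. It assumes 4 KiB base pages, assumes that a block size of N bytes needs a contiguous allocation of order ceil(log2(N / page size)), and uses made-up sample counts rather than measured data. The function names are hypothetical. The kernel exposes a similar index through debugfs (extfrag/unusable_index) on suitably configured builds.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages

# Made-up sample in the /proc/buddyinfo layout; NOT measured data.
SAMPLE = "Node 0, zone   Normal      4      2      1      0      0\n"


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(c) for c in parts[4:]]
    return zones


def block_order(block_size, page_size=PAGE_SIZE):
    """Smallest order whose 2**order pages cover one block."""
    pages = max(block_size, page_size) // page_size
    return (pages - 1).bit_length()


def unusable_fraction(counts, order):
    """Fraction of free pages held in blocks smaller than 2**order pages."""
    total = sum(n << i for i, n in enumerate(counts))
    if total == 0:
        return 0.0
    usable = sum(n << i for i, n in enumerate(counts) if i >= order)
    return (total - usable) / total


if __name__ == "__main__":
    # On a live system, replace SAMPLE with open("/proc/buddyinfo").read().
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        for bs in (4096, 8192, 16384):
            order = block_order(bs)
            frac = unusable_fraction(counts, order)
            print(f"node {node} zone {zone} bs={bs} order={order} "
                  f"unusable={frac:.3f}")
```

Sampling this value before, during, and after a test run gives one curve per block size, which is one possible way to phrase "does not grow during the test runs" as something measurable.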
And, of course, I would like to present (and discuss) the results
of the test runs done with 4k, 8k, and 16k block sizes.
Not sure if this should be a storage or MM topic; I'll let the
lsf-pc decide.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
From: Theodore Tso @ 2026-02-19 14:32 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: lsf-pc, linux-nvme, linux-block, linux-mm
On Thu, Feb 19, 2026 at 10:54:48AM +0100, Hannes Reinecke wrote:
> Hi all,
>
> I (together with the Czech Technical University) did some experiments trying
> to measure memory fragmentation with large block sizes.
> Testbed used was an nvme setup talking to a nvmet storage over
> the network.
>
> Doing so raised some challenges:
>
> - How do you _generate_ memory fragmentation? The MM subsystem is
> precisely geared up to avoid it, so you would need to come up
> with some idea how to defeat it. With the help from Willy I managed
> to come up with something, but I really would like to discuss
> what would be the best option here.
I'm trying to understand the goal of the experiment. I'm guessing
that the goal was to see how much memory fragmentation would result
from using large block sizes with the control being to use, say, 4k
blocks. Is that correct?
So I guess the question here is: what are realistic workloads that
people would run in real-world situations, so we can do the A-B
experiments to see what using LBS results in?
> - What is acceptable memory fragmentation? Are we good enough if the
> measured fragmentation does not grow during the test runs?
I can think of two possible metrics. The first is whether it results
in degradation of performance given certain real world workloads.
The second is whether given a particular memory pressure, the memory
fragmentation results in more jobs getting OOM killed.
- Ted
* Re: [LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
From: Bart Van Assche @ 2026-02-19 14:53 UTC (permalink / raw)
To: Hannes Reinecke, lsf-pc, linux-nvme, linux-block, linux-mm
On 2/19/26 1:54 AM, Hannes Reinecke wrote:
> I (together with the Czech Technical University) did some experiments
> trying to measure memory fragmentation with large block sizes.
> Testbed used was an nvme setup talking to a nvmet storage over
> the network.
>
> Doing so raised some challenges:
>
> - How do you _generate_ memory fragmentation? The MM subsystem is
> precisely geared up to avoid it, so you would need to come up
> with some idea how to defeat it. With the help from Willy I managed
> to come up with something, but I really would like to discuss
> what would be the best option here.
> - What is acceptable memory fragmentation? Are we good enough if the
> measured fragmentation does not grow during the test runs?
> - Do we have better visibility into memory fragmentation other than
> just reading /proc/buddyinfo?
The larger the block size, the higher the write amplification factor
(WAF), isn't it? Why increase the block size when there is a solution
available that doesn't increase WAF, namely zoned storage?
Additionally, why is contiguous memory required for block sizes
larger than the page size? Does this perhaps come from the VFS layer?
If so, is this something that can be fixed?
Thanks,
Bart.
* Re: [LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
From: Matthew Wilcox @ 2026-02-19 15:00 UTC (permalink / raw)
To: Bart Van Assche
Cc: Hannes Reinecke, lsf-pc, linux-nvme, linux-block, linux-mm
On Thu, Feb 19, 2026 at 06:53:28AM -0800, Bart Van Assche wrote:
> Additionally, why is contiguous memory required for block sizes
> larger than the page size? Does this perhaps come from the VFS layer?
> If so, is this something that can be fixed?
No.
* Re: [LSF/MM/BPF TOPIC] Memory fragmentation with large block sizes
From: Hannes Reinecke @ 2026-02-20 7:44 UTC (permalink / raw)
To: Theodore Tso; +Cc: lsf-pc, linux-nvme, linux-block, linux-mm
On 2/19/26 15:32, Theodore Tso wrote:
> On Thu, Feb 19, 2026 at 10:54:48AM +0100, Hannes Reinecke wrote:
>> Hi all,
>>
>> I (together with the Czech Technical University) did some experiments trying
>> to measure memory fragmentation with large block sizes.
>> Testbed used was an nvme setup talking to a nvmet storage over
>> the network.
>>
>> Doing so raised some challenges:
>>
>> - How do you _generate_ memory fragmentation? The MM subsystem is
>> precisely geared up to avoid it, so you would need to come up
>> with some idea how to defeat it. With the help from Willy I managed
>> to come up with something, but I really would like to discuss
>> what would be the best option here.
>
> I'm trying to understand the goal of the experiment. I'm guessing
> that the goal was to see how much memory fragmentation would result
> from using large block sizes with the control being to use, say, 4k
> blocks. Is that correct?
>
The main goal was to figure out whether memory fragmentation
increases when using LBS.
Clearly, most (internal) allocations still work on page-sized
objects, so one can argue that using LBS might increase fragmentation.
On the other hand, all _filesystem_ objects will be in LBS sizes,
so we won't increase fragmentation if we only allocate in LBS sizes.
So which is it?
> So I guess the question here is: what are realistic workloads that
> people would run in real-world situations, so we can do the A-B
> experiments to see what using LBS results in?
>
Yes.
>> - What is acceptable memory fragmentation? Are we good enough if the
>> measured fragmentation does not grow during the test runs?
>
> I can think of two possible metrics. The first is whether it results
> in degradation of performance given certain real world workloads.
>
> The second is whether given a particular memory pressure, the memory
> fragmentation results in more jobs getting OOM killed.
>
That would be ideal, but we first need to have a program exerting
memory pressure...
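
For illustration only, here is a minimal userspace sketch of one way such a program could start out. It is not the approach worked out with Willy (the thread does not describe that), and whether it produces lasting physical fragmentation depends on the allocator, compaction, and THP settings, so the effect would have to be checked against /proc/buddyinfo. It maps anonymous memory, touches every page, and then hands every second page back to the kernel, leaving single-page holes in the mapping. The function name and the stride parameter are hypothetical.

```python
import mmap

PAGE = mmap.PAGESIZE


def punch_holes(num_pages, stride=2):
    """Map num_pages anonymous pages, fault them all in, then release
    every stride-th page. Returns (mapping, number of pages released)."""
    buf = mmap.mmap(-1, num_pages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    for i in range(num_pages):
        buf[i * PAGE] = 1  # write one byte so the page is really allocated
    released = 0
    for i in range(0, num_pages, stride):
        # Linux-specific: drop the backing page but keep the mapping.
        buf.madvise(mmap.MADV_DONTNEED, i * PAGE, PAGE)
        released += 1
    return buf, released


if __name__ == "__main__":
    # 64 MiB with 4 KiB pages; keep the mapping alive while I/O tests run.
    mapping, freed = punch_holes(16384)
    print(f"holding {16384 - freed} pages, released {freed}")
    mapping.close()
```

Scaling num_pages towards the size of RAM turns this into a crude memory-pressure source as well, at which point the OOM-kill metric suggested above could be observed.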
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich