* [RFC] More deterministic SLOB for real time embedded systems
@ 2021-10-17 4:28 Hyeonggon Yoo
From: Hyeonggon Yoo @ 2021-10-17 4:28 UTC
To: linux-mm
Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton, Vlastimil Babka, Hyeonggon Yoo

I've been reading the SLUB/SLOB code for a while. SLUB recently became
real-time compatible by reducing the area covered by its locks.

For now, SLUB is the only slab allocator available for PREEMPT_RT, because
it works better than SLAB on RT and because SLOB uses a non-deterministic
method, sequential fit.

But the memory usage of SLUB is too high for systems with low memory.
So in my local repository I made SLOB use the segregated free list
method, which is more deterministic, to provide bounded latency.

This is done by managing global lists of partial pages, one for every
power-of-two size (8, 16, 32, ..., PAGE_SIZE), per NUMA node.
The minimum allocation size is the size of a pointer, so that each free
object can hold a pointer to the next free object, like SLUB.

Because all objects in a page have the same size, there's no need to
iterate over the free blocks in a page. (Iterating over pages isn't
needed either.)
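
The segregated free list idea described above can be sketched in userspace C.
This is only an illustration under stated assumptions, not the patch from the
local repository: the names (sketch_alloc, sketch_free, class_index, refill,
SKETCH_PAGE_SIZE) are hypothetical, pages come from malloc(), there is no
locking and no NUMA handling, and the per-page partial lists are flattened
into one free list per size class. The caller is assumed to pass the original
request size back on free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u           /* assumed page size */
#define MIN_OBJ_SIZE     sizeof(void *)  /* a free object must hold a next pointer */
#define NR_CLASSES       16              /* generous upper bound on size classes */

/* One free list per power-of-two size class (hypothetical layout). */
struct size_class {
	void *free;  /* head of a singly linked list of free objects */
};

static struct size_class classes[NR_CLASSES];

/* Round a request up to its power-of-two class; return the class index. */
static int class_index(size_t size, size_t *obj_size)
{
	size_t s = MIN_OBJ_SIZE;
	int idx = 0;

	while (s < size) {
		s <<= 1;
		idx++;
	}
	*obj_size = s;
	return idx;
}

/* Carve a fresh page into equal-sized objects and thread them onto the list. */
static int refill(struct size_class *c, size_t obj_size)
{
	char *page = malloc(SKETCH_PAGE_SIZE);

	if (!page)
		return -1;
	for (size_t off = 0; off + obj_size <= SKETCH_PAGE_SIZE; off += obj_size) {
		void *obj = page + off;

		*(void **)obj = c->free;  /* the next pointer lives inside the free object */
		c->free = obj;
	}
	return 0;
}

/* Allocation never searches: it pops the head of the matching class. */
static void *sketch_alloc(size_t size)
{
	size_t obj_size;
	struct size_class *c;
	void *obj;

	if (size == 0 || size > SKETCH_PAGE_SIZE)
		return NULL;
	c = &classes[class_index(size, &obj_size)];
	if (!c->free && refill(c, obj_size) != 0)
		return NULL;
	obj = c->free;
	c->free = *(void **)obj;
	return obj;
}

/* Freeing pushes the object back onto the head of its class list. */
static void sketch_free(void *obj, size_t size)
{
	size_t obj_size;
	struct size_class *c = &classes[class_index(size, &obj_size)];

	*(void **)obj = c->free;
	c->free = obj;
}
```

Because every object on a list has the same size, both paths touch only the
list head, which is where the bounded latency comes from; the price is the
internal fragmentation of rounding each request up to a power of two.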

Some cleanups and more tests (especially with NUMA/RT configs) are needed,
but I want to hear your opinion about the idea. I did not test on RT yet.

Below are the benchmark results and memory usage (on !RT).
With a 13% increase in memory usage, it is about nine times faster, has
bounded fragmentation, and, importantly, provides predictable execution time.

current SLOB:
memory usage:
Slab: 7936 kB
hackbench:
Time: 263.900
Performance counter stats for 'hackbench -g 4 -l 10000':
527649.37 msec cpu-clock # 1.999 CPUs utilized
12451963 context-switches # 23.599 K/sec
251231 cpu-migrations # 476.132 /sec
4112 page-faults # 7.793 /sec
342196899596 cycles # 0.649 GHz
228439896295 instructions # 0.67 insn per cycle
3228211614 branch-misses
65667138978 cache-references # 124.452 M/sec
7406902357 cache-misses # 11.279 % of all cache refs
263.956975106 seconds time elapsed
5.213166000 seconds user
521.716737000 seconds sys
SLOB with segregated free list:
memory usage:
Slab: 8976 kB
hackbench:
Time: 28.896
Performance counter stats for 'hackbench -g 4 -l 10000':
57669.66 msec cpu-clock # 1.995 CPUs utilized
902343 context-switches # 15.647 K/sec
10569 cpu-migrations # 183.268 /sec
4116 page-faults # 71.372 /sec
72101524728 cycles # 1.250 GHz
68780577270 instructions # 0.95 insn per cycle
230133481 branch-misses
23610741192 cache-references # 409.414 M/sec
896060729 cache-misses # 3.795 % of all cache refs
28.909188264 seconds time elapsed
1.521686000 seconds user
56.105718000 seconds sys

* segregated list + slab merging is much better than original SLOB
From: Hyeonggon Yoo @ 2021-10-17 13:36 UTC
To: linux-mm
Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton, Vlastimil Babka, Hyeonggon Yoo

On Sun, Oct 17, 2021 at 04:28:52AM +0000, Hyeonggon Yoo wrote:
> [...]
> Below is result of benchmarks and memory usage. (on !RT)
> with 13% increase in memory usage, it's nine times faster and
> bounded fragmentation, and importantly provides predictable execution time.

Hello linux-mm, I improved it: it now uses less memory and is 9x~13x faster
than the original SLOB. It also shows much less fragmentation after hackbench.

Rather than managing a global freelist for each power-of-two size,
I made each kmem_cache manage its own freelist (one per NUMA node) and
added support for slab merging. So it looks quite like a lightweight SLUB now.

I'll send an RFC patch after some testing and code cleanup.

I think it is more RT-friendly because it uses a more deterministic
algorithm (but the lock is still shared among CPUs). Any opinions regarding RT?

current SLOB:
memory usage:
after boot:
Slab: 7908 kB
after hackbench:
Slab: 8544 kB
Time: 189.947
Performance counter stats for 'hackbench -g 4 -l 10000':
379413.20 msec cpu-clock # 1.997 CPUs utilized
8818226 context-switches # 23.242 K/sec
375186 cpu-migrations # 988.859 /sec
3954 page-faults # 10.421 /sec
269923095290 cycles # 0.711 GHz
212341582012 instructions # 0.79 insn per cycle
2361087153 branch-misses
58222839688 cache-references # 153.455 M/sec
6786521959 cache-misses # 11.656 % of all cache refs
190.002062273 seconds time elapsed
3.486150000 seconds user
375.599495000 seconds sys

SLOB with segregated list + slab merging:
memory usage:
after boot:
Slab: 7560 kB
after hackbench:
Slab: 7836 kB
hackbench:
Time: 20.780
Performance counter stats for 'hackbench -g 4 -l 10000':
41509.79 msec cpu-clock # 1.996 CPUs utilized
630032 context-switches # 15.178 K/sec
8287 cpu-migrations # 199.640 /sec
4036 page-faults # 97.230 /sec
57477161020 cycles # 1.385 GHz
62775453932 instructions # 1.09 insn per cycle
164902523 branch-misses
22559952993 cache-references # 543.485 M/sec
832404011 cache-misses # 3.690 % of all cache refs
20.791893590 seconds time elapsed
1.423282000 seconds user
40.072449000 seconds sys

Thanks,
Hyeonggon

* Do we really need SLOB nowdays?
From: Hyeonggon Yoo @ 2021-10-17 13:57 UTC
To: linux-mm
Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton, Vlastimil Babka, Hyeonggon Yoo

On Sun, Oct 17, 2021 at 01:36:18PM +0000, Hyeonggon Yoo wrote:
> [...]
> Rather than managing global freelist that has power of 2 sizes,
> I made a kmem_cache to manage its own freelist (for each NUMA nodes) and
> Added support for slab merging. So It quite looks like a lightweight SLUB now.
> [...]

Hi there. After some thinking, I have a new question:
if a lightweight SLUB is better than SLOB, do we really need SLOB nowadays?

And one more question: Christoph's presentation [1] says SLOB uses
300 KB of memory, but on my system it uses almost 8000 KB.
What accounts for the difference?

[1] https://events.static.linuxfound.org/sites/events/files/slides/slaballocators.pdf

SLUB without cpu partials:
memory usage:
after boot:
Slab: 8672 kB
after hackbench:
Slab: 9540 kB
Performance counter stats for 'hackbench -g 4 -l 10000':
48463.05 msec cpu-clock # 1.995 CPUs utilized
944154 context-switches # 19.482 K/sec
8161 cpu-migrations # 168.396 /sec
4117 page-faults # 84.951 /sec
52570808507 cycles # 1.085 GHz
65083778667 instructions # 1.24 insn per cycle
234990576 branch-misses
23628671709 cache-references # 487.561 M/sec
739599271 cache-misses # 3.130 % of all cache refs
24.287392120 seconds time elapsed
1.509198000 seconds user
46.942748000 seconds sys

* Re: Do we really need SLOB nowdays?
From: Matthew Wilcox @ 2021-10-17 14:39 UTC
To: Hyeonggon Yoo
Cc: linux-mm, linux-kernel, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Sun, Oct 17, 2021 at 01:57:08PM +0000, Hyeonggon Yoo wrote:
> Hi there. after some thinking, I got a new question:
> If a lightweight SLUB is better than SLOB,
> Do we really need SLOB nowdays?

Better for what use case? SLOB is for machines with 1-16MB of RAM.

* Re: Do we really need SLOB nowdays?
From: Hyeonggon Yoo @ 2021-10-18 9:45 UTC
To: Matthew Wilcox
Cc: Linux Memory Management List, LKML, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Sun, Oct 17, 2021, 11:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> Better for what use case? SLOB is for machines with 1-16MB of RAM.

1~16M is smaller than I thought. Hmm... I'm going to see how it works on
a tiny configuration. Thank you Matthew!

* Re: Do we really need SLOB nowdays?
From: Christoph Lameter @ 2021-10-25 8:17 UTC
To: Hyeonggon Yoo
Cc: Matthew Wilcox, Linux Memory Management List, LKML, Pekka Enberg,
David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Mon, 18 Oct 2021, Hyeonggon Yoo wrote:
> > Better for what use case? SLOB is for machines with 1-16MB of RAM.
>
> 1~16M is smaller than I thought. Hmm... I'm going to see how it works on
> tiny configuration. Thank you Matthew!

Is there any reference where we can see such a configuration? Sure it does
not work with SLUB too?

* Re: Do we really need SLOB nowdays?
From: Hyeonggon Yoo @ 2021-10-28 10:04 UTC
To: Christoph Lameter
Cc: Matthew Wilcox, Linux Memory Management List, LKML, Pekka Enberg,
David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Mon, Oct 25, 2021 at 10:17:08AM +0200, Christoph Lameter wrote:
> Is there any reference where we can see such a configuration? Sure it does
> not work with SLUB too?

I thought the reason Matthew said "SLOB is for machines with 1-16MB of RAM"
is that when memory is that scarce, the system is sensitive to memory usage.

(But I still doubt whether we can run Linux on machines like that.)

* Re: Do we really need SLOB nowdays?
From: Matthew Wilcox @ 2021-10-28 12:08 UTC
To: Hyeonggon Yoo
Cc: Christoph Lameter, Linux Memory Management List, LKML, Pekka Enberg,
David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Thu, Oct 28, 2021 at 10:04:14AM +0000, Hyeonggon Yoo wrote:
> I thought why Matthew said "SLOB is for machines with 1-16MB of RAM"
> is because if memory is so low, then it is sensitive to memory usage.
>
> (But I still have doubt if we can run linux on machines like that.)

I sent you a series of articles about making Linux run in 1MB.

* Re: Do we really need SLOB nowdays?
From: Hyeonggon Yoo @ 2021-10-30 6:12 UTC
To: Matthew Wilcox
Cc: Christoph Lameter, Linux Memory Management List, LKML, Pekka Enberg,
David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Thu, Oct 28, 2021 at 01:08:02PM +0100, Matthew Wilcox wrote:
> I sent you a series of articles about making Linux run in 1MB.

Oh, I missed your mail. I'm going to read it! Thanks!

Thanks,
Hyeonggon.

[parent not found: <20211210110835.GA632811@odroid>]
* Re: Do we really need SLOB nowdays?
From: Christoph Lameter @ 2021-12-10 12:06 UTC
To: Hyeonggon Yoo
Cc: Matthew Wilcox, Christoph Lameter, Linux Memory Management List, LKML,
Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Fri, 10 Dec 2021, Hyeonggon Yoo wrote:
> > > (But I still have doubt if we can run linux on machines like that.)
> >
> > I sent you a series of articles about making Linux run in 1MB.
>
> After some time playing with the size of kernel,
> I was able to run linux in 6.6MiB of RAM. and the SLOB used
> around 300KiB of memory.

What is the minimal size you need for SLUB?

* Re: Do we really need SLOB nowdays?
From: Vlastimil Babka @ 2021-12-14 17:24 UTC
To: Christoph Lameter, Hyeonggon Yoo
Cc: Matthew Wilcox, Christoph Lameter, Linux Memory Management List, LKML,
Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton

On 12/10/21 13:06, Christoph Lameter wrote:
> On Fri, 10 Dec 2021, Hyeonggon Yoo wrote:
>> After some time playing with the size of kernel,
>> I was able to run linux in 6.6MiB of RAM. and the SLOB used
>> around 300KiB of memory.
>
> What is the minimal size you need for SLUB?

Good question. Meanwhile I tried to compare Slab: in /proc/meminfo on a virtme run:

virtme-run --mods=auto --kdir /home/vbabka/wrk/linux/ --memory 2G,slots=2,maxmem=4G --qemu-opts --smp 4

Got ~30800kB with SLOB, 34500kB with SLUB without DEBUG and PERCPU_PARTIAL.
Then did a quick and dirty patch (below) to never load c->slab in ___slab_alloc()
and got to 32200kB. Fiddling with slub_min_order/slub_max_order didn't actually
help, probably due to causing more internal fragmentation.

So that's relatively close, but on a really small system the difference can
be possibly more prominent. Also my test doesn't account for text/data or
percpu usage differences.

diff --git a/mm/slub.c b/mm/slub.c
index 68aa112e469b..fd9c853971d1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3054,6 +3054,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		 */
 		goto return_single;

+	goto return_single;
+
 retry_load_slab:

 	local_lock_irqsave(&s->cpu_slab->lock, flags);

[parent not found: <20211215062904.GA1150813@odroid>]
* Re: Do we really need SLOB nowdays?
From: Vlastimil Babka @ 2021-12-15 10:10 UTC
To: Hyeonggon Yoo
Cc: Christoph Lameter, Matthew Wilcox, Christoph Lameter,
Linux Memory Management List, LKML, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton

On 12/15/21 07:29, Hyeonggon Yoo wrote:
> On Tue, Dec 14, 2021 at 06:24:58PM +0100, Vlastimil Babka wrote:
>> On 12/10/21 13:06, Christoph Lameter wrote:
>> > What is the minimal size you need for SLUB?
>
> I don't know why Christoph's mail is not in my mailbox. Maybe I deleted it
> by mistake or I'm not cc-ed.
>
> Anyway, I tried to measure this again with SLUB and SLOB.
>
> SLUB uses a few hundred kilobytes more than SLOB.
>
> There isn't much difference in 'Memory required to boot'.
> (Interestingly, SLUB requires less.)
>
> 'Memory required to boot' is measured by reducing memory
> until it says 'System is deadlocked on memory'. I don't know
> the exact reason why they differ.
>
> Note that the configuration is based on tinyconfig and
> I added initramfs support + tty layer (+ uart driver) + procfs support,
> + ELF binary support + etc.
>
> There isn't even a block layer, but it's a good starting point to see
> what happens in a small system.
>
> SLOB:
> Memory required to boot: 6950K
> Slab: 368 kB
>
> SLUB:
> Memory required to boot: 6800K
> Slab: 552 kB
>
> SLUB with slab merging:
> Slab: 536 kB

A 168kB difference on a system with less than 8MB of memory looks too
significant to me to simply delete SLOB, I'm afraid.

* Re: Do we really need SLOB nowdays?
From: Christoph Lameter @ 2021-12-15 15:23 UTC
To: Vlastimil Babka
Cc: Hyeonggon Yoo, Matthew Wilcox, Christoph Lameter,
Linux Memory Management List, LKML, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton

On Wed, 15 Dec 2021, Vlastimil Babka wrote:
> > SLOB:
> > Memory required to boot: 6950K
> > Slab: 368 kB
> >
> > SLUB:
> > Memory required to boot: 6800K
> > Slab: 552 kB
> >
> > SLUB with slab merging:
> > Slab: 536 kB
>
> 168kB different on a system with less than 8MB memory looks rather
> significant to me to simply delete SLOB, I'm afraid.

This looks more like a bug/difference in SLAB accounting of SLOB. How could
SLOB require more memory to boot but use less SLAB memory?

This looks to me like a significant enough reason to remove SLOB, since SLUB
works with less memory than SLOB.

* Re: Do we really need SLOB nowdays?
From: Hyeonggon Yoo @ 2022-02-18 10:13 UTC
To: Vlastimil Babka
Cc: Christoph Lameter, Matthew Wilcox, Christoph Lameter,
Linux Memory Management List, LKML, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton

On Wed, Dec 15, 2021 at 11:10:06AM +0100, Vlastimil Babka wrote:
> 168kB different on a system with less than 8MB memory looks rather
> significant to me to simply delete SLOB, I'm afraid.

Just FYI...
Some experiments based on v5.17-rc3:

SLOB:
Slab: 388 kB

SLUB:
Slab: 540 kB (+152 kB)

SLUB with s->min_partial = 0:
Slab: 452 kB (+64 kB)

SLUB with s->min_partial = 0 && slub_max_order = 0:
Slab: 436 kB (+48 kB)

SLUB with s->min_partial = 0 && slub_max_order = 0
+ merging slabs crazily (just ignore SLAB_NEVER_MERGE/SLAB_MERGE_SAME):
Slab: 408 kB (+20 kB)

Decreasing it further seems to be hard, and
I guess the remaining +20 kB is due to partial slabs.

I think SLUB can be as memory-efficient as SLOB.
Is SLOB (Address-Ordered next fit) more resistant to fragmentation than SLUB?

* Re: Do we really need SLOB nowdays?
From: Hyeonggon Yoo @ 2022-02-18 10:37 UTC
To: Vlastimil Babka
Cc: Christoph Lameter, Matthew Wilcox, Christoph Lameter,
Linux Memory Management List, LKML, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton

On Fri, Feb 18, 2022 at 10:13:29AM +0000, Hyeonggon Yoo wrote:
> I think SLUB can be memory-efficient as SLOB.
> Is SLOB (Address-Ordered next fit) stronger to fragmentation than SLUB?

(Address-Ordered *first* fit)
* RE: Do we really need SLOB nowdays?
2022-02-18 10:13 ` Hyeonggon Yoo
2022-02-18 10:37 ` Hyeonggon Yoo
@ 2022-02-18 16:10 ` David Laight
2022-02-19 11:59 ` Hyeonggon Yoo
1 sibling, 1 reply; 19+ messages in thread
From: David Laight @ 2022-02-18 16:10 UTC (permalink / raw)
To: 'Hyeonggon Yoo', Vlastimil Babka
Cc: Christoph Lameter, Matthew Wilcox, Christoph Lameter,
Linux Memory Management List, LKML, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton

From: Hyeonggon Yoo
> Sent: 18 February 2022 10:13
...
> I think SLUB can be memory-efficient as SLOB.
> Is SLOB (Address-Ordered next^Wfirst fit) stronger to fragmentation than SLUB?

Dunno, but I had to patch the vxworks malloc to use 'best fit'
because 'first fit' based on a fifo free list was really horrid.

I can't imagine an address ordered 'first fit' really being that much better.

There are probably a lot more allocs and frees than the kernel used to have.

Also isn't the performance of a 'first fit' going to get horrid
when there are a lot of small items on the free list.

Does SLUB split pages into 3s and 5s (on cache line boundaries)
as well as powers of 2?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply [flat|nested] 19+ messages in thread
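[Editorial illustration, not part of the thread.] The 'first fit' versus 'best fit' trade-off raised in the message above can be sketched in C. This is only a minimal sketch under stated assumptions: `struct free_block`, `first_fit()` and `best_fit()` are hypothetical names invented here, and the code is neither the vxworks malloc nor SLOB. It shows only the search policy on a singly linked free list, without splitting or coalescing blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list node: one free block of `size` bytes. */
struct free_block {
    size_t size;
    struct free_block *next;
};

/* First fit: return the first block that is large enough.
 * The walk stops early, but it still has to step over every
 * too-small block that sits in front of a usable one. */
static struct free_block *first_fit(struct free_block *head, size_t want)
{
    assert(want > 0);
    for (struct free_block *b = head; b != NULL; b = b->next)
        if (b->size >= want)
            return b;
    return NULL;
}

/* Best fit: walk the whole list and return the smallest block
 * that still satisfies the request. Always a full scan. */
static struct free_block *best_fit(struct free_block *head, size_t want)
{
    struct free_block *best = NULL;

    assert(want > 0);
    for (struct free_block *b = head; b != NULL; b = b->next)
        if (b->size >= want && (best == NULL || b->size < best->size))
            best = b;
    return best;
}
```

With a list holding blocks of 128, 24 and 64 bytes in that order, a 16-byte request would take the 128-byte block under first fit and the 24-byte block under best fit. Both searches are linear in the length of the free list, which is the non-determinism the RFC at the top of this thread tries to avoid.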
* Re: Do we really need SLOB nowdays?
2022-02-18 16:10 ` David Laight
@ 2022-02-19 11:59 ` Hyeonggon Yoo
0 siblings, 0 replies; 19+ messages in thread
From: Hyeonggon Yoo @ 2022-02-19 11:59 UTC (permalink / raw)
To: David Laight
Cc: Vlastimil Babka, Christoph Lameter, Matthew Wilcox, Christoph Lameter,
Linux Memory Management List, LKML, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton

On Fri, Feb 18, 2022 at 04:10:28PM +0000, David Laight wrote:
> From: Hyeonggon Yoo
> > Sent: 18 February 2022 10:13
> ...
> > I think SLUB can be memory-efficient as SLOB.
> > Is SLOB (Address-Ordered next^Wfirst fit) stronger to fragmentation than SLUB?
>
> Dunno, but I had to patch the vxworks malloc to use 'best fit'
> because 'first fit' based on a fifo free list was really horrid.
>
> I can't imagine an address ordered 'first fit' really being that much better.
>
> There are probably a lot more allocs and frees than the kernel used to have.
>
> Also isn't the performance of a 'first fit' going to get horrid
> when there are a lot of small items on the free list.

SLOB is focused on low memory usage, at the cost of poor performance.
Its speed is not a concern.

I think the Address-Ordered sequential fit method works pretty well in
terms of low memory usage.

And I think SLUB may replace SLOB, but we need to be sure SLUB is the
absolute winner. I wonder what the slab maintainers think?

>
> Does SLUB split pages into 3s and 5s (on cache line boundaries)
> as well as powers of 2?
>

SLUB/SLAB use a different strategy than SLOB, for better allocation
performance. It's a variant of the segregated storage method.

SLUB/SLAB both create dedicated "caches" for each type of object. For
example, on my system, there are slab caches for dentry(192), filp(256),
fs_cache(64) ... etc.

Objects of different types are by default managed by different caches,
each of which holds and manages its own pages. Slab caches can be merged
for better cacheline utilization.

SLUB/SLAB also create global kmalloc caches at boot time for power-of-2
object sizes (128, 256, 512, 1K, 2K, 4K, 8K on my system).

Thanks,
Hyeonggon.

> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply [flat|nested] 19+ messages in thread
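[Editorial illustration, not part of the thread.] The power-of-2 kmalloc caches described in the message above amount to rounding each request up to a size class. The following is a minimal sketch of that rounding, assuming a smallest class of 8 bytes (the figure given in the RFC at the top of the thread) and a largest class of 8K (the largest size listed in the message). `MIN_CLASS_SHIFT`, `MAX_CLASS_SHIFT`, `size_class_index()` and `size_class_bytes()` are hypothetical names invented here, not kernel APIs, and the sketch does not reproduce the kernel's own size-to-cache lookup.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CLASS_SHIFT 3   /* assumption: smallest class is 8 bytes */
#define MAX_CLASS_SHIFT 13  /* assumption: largest class is 8K */

/* Return the index of the smallest power-of-two class that can hold
 * `size` bytes, or -1 if the request is zero or too large. */
static int size_class_index(size_t size)
{
    int shift = MIN_CLASS_SHIFT;

    if (size == 0 || size > ((size_t)1 << MAX_CLASS_SHIFT))
        return -1;
    while (((size_t)1 << shift) < size)
        shift++;
    return shift - MIN_CLASS_SHIFT;
}

/* Return the object size, in bytes, served by class `index`. */
static size_t size_class_bytes(int index)
{
    assert(index >= 0 && index <= MAX_CLASS_SHIFT - MIN_CLASS_SHIFT);
    return (size_t)1 << (index + MIN_CLASS_SHIFT);
}
```

Under these assumptions a 192-byte request lands in the 256-byte class, so 64 bytes are lost to internal fragmentation. Mainline kernels also ship kmalloc-96 and kmalloc-192 caches to soften exactly that rounding loss; the sketch ignores them for brevity. Because every object in a class has the same size, no free-block search is needed within a page, which is the determinism argument made in the original RFC.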
* Re: Do we really need SLOB nowdays?
2021-10-17 13:57 ` Do we really need SLOB nowdays? Hyeonggon Yoo
2021-10-17 14:39 ` Matthew Wilcox
@ 2021-10-25 8:15 ` Christoph Lameter
1 sibling, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2021-10-25 8:15 UTC (permalink / raw)
To: Hyeonggon Yoo
Cc: linux-mm, linux-kernel, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Vlastimil Babka

On Sun, 17 Oct 2021, Hyeonggon Yoo wrote:

> And one more question:
> in Christoph's presentation [1], it says SLOB uses
> 300 KB of memory. but on my system it uses almost 8000 KB.
> what's is differences?

Hmmm.... Someone already made "improvements" to SLOB?

Kernel needs to be compiled for minimal overhead and debugging removed.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] More deterministic SLOB for real time embedded systems
2021-10-17 4:28 [RFC] More deterministic SLOB for real time embedded systems Hyeonggon Yoo
2021-10-17 13:36 ` segregated list + slab merging is much better than original SLOB Hyeonggon Yoo
@ 2021-10-25 8:14 ` Christoph Lameter
1 sibling, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2021-10-25 8:14 UTC (permalink / raw)
To: Hyeonggon Yoo
Cc: linux-mm, linux-kernel, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Vlastimil Babka

On Sun, 17 Oct 2021, Hyeonggon Yoo wrote:

> But memory usage of SLUB is too high for systems with low memory.

The memory usage of SLUB without all the extras (partial slabs, debugging
etc etc) is very comparable to SLOB.

^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2022-02-19 11:59 UTC | newest]
Thread overview: 19+ messages
2021-10-17 4:28 [RFC] More deterministic SLOB for real time embedded systems Hyeonggon Yoo
2021-10-17 13:36 ` segregated list + slab merging is much better than original SLOB Hyeonggon Yoo
2021-10-17 13:57 ` Do we really need SLOB nowdays? Hyeonggon Yoo
2021-10-17 14:39 ` Matthew Wilcox
2021-10-18 9:45 ` Hyeonggon Yoo
2021-10-25 8:17 ` Christoph Lameter
2021-10-28 10:04 ` Hyeonggon Yoo
2021-10-28 12:08 ` Matthew Wilcox
2021-10-30 6:12 ` Hyeonggon Yoo
[not found] ` <20211210110835.GA632811@odroid>
2021-12-10 12:06 ` Christoph Lameter
2021-12-14 17:24 ` Vlastimil Babka
[not found] ` <20211215062904.GA1150813@odroid>
2021-12-15 10:10 ` Vlastimil Babka
2021-12-15 15:23 ` Christoph Lameter
2022-02-18 10:13 ` Hyeonggon Yoo
2022-02-18 10:37 ` Hyeonggon Yoo
2022-02-18 16:10 ` David Laight
2022-02-19 11:59 ` Hyeonggon Yoo
2021-10-25 8:15 ` Christoph Lameter
2021-10-25 8:14 ` [RFC] More deterministic SLOB for real time embedded systems Christoph Lameter