* SLUB
From: Mark Seger @ 2007-12-20 15:06 UTC
To: linux-mm, clameter

Forgive me if this is the wrong place to be asking this, but if so could
someone point me to a better place?

This past summer I released a tool on sourceforge called collectl - see
http://collectl.sourceforge.net/ - which does some pretty nifty system
monitoring, one component of which is slabs. I finally got around to trying
it out on a newer kernel, 2.6.23, and lo and behold, it didn't work because
/proc/slabinfo has disappeared, replaced by /sys/slab. I've been looking
around to try to better understand how to map the old slab data onto SLUB
and couldn't find anything written up on the definitions of the fields in
/sys/slab. I also suspect that while some of the information reported by
SLUB may map directly, there could be other useful information worth
tracking.

To back up a few steps: in collectl I can monitor slabs in real time or log
that data to a file for later playback. The display format is modeled after
slabtop, but I simply record data for all slabs (you can supply a filter).
What I think is particularly useful about collectl is a switch that only
shows allocations that have changed. This means if you run the tool with a
monitoring interval of a second (the default interval for slabs is 60
seconds since it is more work to read/process all of slabinfo) you only see
occasional changes as they occur. I've also found this feature very useful
when analyzing longer-term data collected at 60-second intervals.

Here's an example of running it with a 1-second monitoring interval on a
relatively idle system:

#                        <-----------Objects----------><---------Slab Allocation------>
#        Name            InUse  Bytes Alloc  Bytes InUse  Bytes Total  Bytes
09:28:54 sgpool-32          32  32768    36  36864     8  32768     9  36864
09:28:54 blkdev_requests    12   3168    30   7920     1   4096     2   8192
09:28:54 bio               313  40064   372  47616    11  45056    12  49152
09:28:55 sgpool-32          32  32768    32  32768     8  32768     8  32768
09:28:55 blkdev_requests    12   3168    15   3960     1   4096     1   4096
09:28:55 bio               313  40064   341  43648    11  45056    11  45056
09:28:56 bio               287  36736   341  43648    10  40960    11  45056
09:28:56 task_struct       128 253952   140 277760    69 282624    70 286720
09:28:58 sgpool-64          33  67584    34  69632    17  69632    17  69632
09:28:58 bio               403  51584   403  51584    13  53248    13  53248
09:28:58 task_struct       124 246016   140 277760    68 278528    70 286720
09:28:59 journal_handle      0      0     0      0     0      0     0      0
09:28:59 task_struct       124 246016   136 269824    68 278528    68 278528
09:29:00 journal_handle     16    768    81   3888     1   4096     1   4096
09:29:00 scsi_cmd_cache     24  12288    35  17920     5  20480     5  20480
09:29:00 sgpool-64          32  65536    34  69632    16  65536    17  69632
09:29:00 sgpool-8           51  13056    75  19200     5  20480     5  20480

The thing that is especially useful about collectl is that by monitoring
slabs at the same time as cpu, processes, disk, network and more, you can
get a very comprehensive picture of what's going on at any one time.

My main question for this list then becomes: what makes the most sense to
do with slabs under the new SLUB allocator? Should I simply report these
same fields? Are there others that make more sense? Do I need to read all
184 entries in /sys/slab and then all the entries under them? Clearly I
want to do this efficiently and provide meaningful data at the same time.
Perhaps someone would like to take this discussion off-line with me and
even collaborate with me on enhancements for SLUB in collectl?

-mark
* Re: SLUB
From: Christoph Lameter @ 2007-12-20 19:44 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 20 Dec 2007, Mark Seger wrote:

> This past summer I released a tool on sourceforge called collectl - see
> http://collectl.sourceforge.net/ - which does some pretty nifty system
> monitoring, one component of which is slabs. I finally got around to
> trying it out on a newer kernel, 2.6.23, and lo and behold, it didn't
> work because /proc/slabinfo has disappeared, replaced by /sys/slab.

Yes. The information available about slabs is different now.

> The thing that is especially useful about collectl is that by monitoring
> slabs at the same time as cpu, processes, disk, network and more, you can
> get a very comprehensive picture of what's going on at any one time.

Good idea.

> My main question for this list then becomes: what makes the most sense to
> do with slabs under the new SLUB allocator? Should I simply report these
> same fields? Are there others that make more sense? Do I need to read all
> 184 entries in /sys/slab and then all the entries under them? Clearly I
> want to do this efficiently and provide meaningful data at the same time.

You only need to read the files that carry the information you want to
display.

> Perhaps someone would like to take this discussion off-line with me and
> even collaborate with me on enhancements for SLUB in collectl?

I think we had better keep it public (so that it goes into the archive).
Here is a short description of the fields in /sys/kernel/slab/<slabcache>
that you would need:

-r--r--r-- 1 root root 4096 Dec 20 11:41 object_size

The size of an object. Subtract object_size from slab_size and you have
the per-object overhead generated by alignment and slab metadata. Does not
change; you only need to read this once.

-r--r--r-- 1 root root 4096 Dec 20 11:41 objects

Number of objects in use. This changes and you may want to monitor it.

-r--r--r-- 1 root root 4096 Dec 20 11:41 slab_size

Total memory used for a single object. Read this only once.

-r--r--r-- 1 root root 4096 Dec 20 11:41 slabs

Number of slab pages in use for this slab cache. May change if the slab
cache is extended.
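[A minimal sketch of reading these four fields in Perl (collectl's own
language), assuming the /sys/kernel/slab layout described above; on some
kernels of this era the tree may appear as /sys/slab instead, and the
helper name is illustrative, not collectl's actual code:]

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $base = '/sys/kernel/slab';

    sub read_field {
        my ($cache, $field) = @_;
        open(my $fh, '<', "$base/$cache/$field") or return;
        my $line = <$fh>;
        close $fh;
        chomp $line;
        # on NUMA systems "objects" reads like "49 N0=19 N1=30"; keep the total
        my ($val) = split ' ', $line;
        return $val;
    }

    opendir(my $dh, $base) or die "cannot open $base: $!";
    for my $cache (sort grep { !/^\./ } readdir $dh) {
        # object_size and slab_size never change: read them once at startup
        my $obj_size = read_field($cache, 'object_size');
        next unless defined $obj_size;
        my $slab_size = read_field($cache, 'slab_size');
        # objects and slabs do change: re-read these every sample interval
        my $objects = read_field($cache, 'objects');
        my $slabs   = read_field($cache, 'slabs');
        printf "%-24s objsize=%-7s slabsize=%-7s objects=%-8s slabs=%s\n",
            $cache, $obj_size, $slab_size, $objects, $slabs;
    }
    closedir $dh;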
* Re: SLUB
From: Mark Seger @ 2007-12-20 23:36 UTC
To: Christoph Lameter; +Cc: linux-mm

>> Perhaps someone would like to take this discussion off-line with me and
>> even collaborate with me on enhancements for SLUB in collectl?

sounds good to me, I just didn't want to annoy anyone...

> I think we had better keep it public (so that it goes into the archive).
> Here is a short description of the fields in /sys/kernel/slab/<slabcache>
> that you would need:
>
> object_size - The size of an object. Subtract object_size from slab_size
> and you have the per-object overhead generated by alignment and slab
> metadata. Does not change; you only need to read this once.
>
> objects - Number of objects in use. This changes and you may want to
> monitor it.
>
> slab_size - Total memory used for a single object. Read this only once.
>
> slabs - Number of slab pages in use for this slab cache. May change if
> the slab cache is extended.

What I'm not sure about is how this maps to the old slab info.
Specifically, I believe in the old model one reported the size taken up by
the slabs (number of slabs X number of objects/slab X object size). There
was a second size for the actual number of objects in use, so in my report
that looked like this:

#                 <-----------Objects----------><---------Slab Allocation------>
#Name             InUse  Bytes Alloc  Bytes InUse  Bytes Total  Bytes
nfs_direct_cache      0      0     0      0     0      0     0      0
nfs_write_data       36  27648    40  30720     8  32768     8  32768

The slab allocation was real memory allocated for the slabs (which should
come close to Slab: in /proc/meminfo, right?) while the object bytes were
those in use. Is it worth continuing this model or do things work
differently now? It sounds like I can still do this with the numbers you've
pointed me to above, and I now realize I only need to monitor the number of
slabs and the number of objects since the others are constants.

To get back to my original question, I'd like to make sure that I'm
reporting useful information and not just data for the sake of it. In one
of your postings I saw a report that showed:

slubinfo - version: 1.0
# name <objects> <order> <objsize> <slabs>/<partial>/<cpu> <flags> <nodes>

How useful are order, cpu, flags and nodes? Do people really care about how
much memory is taken up by objects vs slabs? If not, I could see reporting
for each slab:
- object size
- number of objects
- slab size
- number of slabs
- total memory (slab size X number of slabs)
- whatever else people might think useful, such as order, cpu, flags, etc.

Another thing I noticed is that a number of the slabs are simply links to
the same base name; is it sufficient to just report the base names and not
those linked to them? Seems reasonable to me...

The interesting thing about collectl is that it's written in perl (but I'm
trying to be very careful to keep it efficient and it tends to use <0.1%
cpu when run as a daemon) and the good news is it's pretty easy to get
something implemented, depending on my free time.
If we can get some level of agreement on what seems useful I could get a
version up fairly quickly for people to start playing with if there is any
interest.

-mark
* Re: SLUB
From: Mark Seger @ 2007-12-21 1:09 UTC
To: Christoph Lameter; +Cc: linux-mm

I did some preliminary prototyping and I guess I'm not sure of the math. If
I understand what you're saying, an object has a particular size, but given
the fact that you may need alignment, the true size is really the
slab_size, and the difference is the overhead. What I don't understand is
how to calculate how much memory a particular slab cache takes up. If the
slab_size is really the size of an object, wouldn't I multiply that by the
number of objects? But when I do that I get a number smaller than that
reported in /proc/meminfo, in my case 15997K vs 17388K. Given that memory
numbers rarely seem to add up, maybe this IS close enough? If so, what's
the significance of the number of slabs? Would I divide the 15997K by the
number of slabs to find out how big a single slab is? I would have thought
that's what the slab_size is, but clearly it isn't.

In any event, here's a table of what I see on my machine. The first 4
columns come from /sys/slab and the 5th I calculated by just multiplying
SlabSize X NumObj. If I should be doing something else, please tell me.
Also be sure to tell me if I should include other data. For example, the
number of objects is a little misleading since when I look at the file I
really see something like:

49 N0=19 N1=30

which I'm guessing may mean 19 objects are allocated to socket 0 and 30 to
socket 1? This is a dual-core, dual-socket system.

-mark
* Re: SLUB
From: Mark Seger @ 2007-12-21 1:27 UTC
To: Christoph Lameter; +Cc: linux-mm

I just realized I forgot to include an example of the output I was
generating, so here it is:

Slab Name              ObjSize   NumObj  SlabSize  NumSlab      Total
:0000008                     8     2185         8        5      17480
:0000016                    16     1604        16        9      25664
:0000024                    24      409        24        4       9816
:0000032                    32      380        32        5      12160
:0000040                    40      204        40        2       8160
:0000048                    48        0        48        0          0
:0000064                    64      843        64       17      53952
:0000072                    72      167        72        3      12024
:0000088                    88     5549        88      121     488312
:0000096                    96     1400        96       40     134400
:0000112                   112        0       112        0          0
:0000128                   128      385       128       21      49280
:0000136                   136       70       136        4       9520
:0000152                   152       59       152        4       8968
:0000160                   160       46       160        4       7360
:0000176                   176     2071       176       93     364496
:0000192                   192      400       192       24      76800
:0000256                   256     1333       256      100     341248
:0000288                   288       54       288        6      15552
:0000320                   320       53       320        7      16960
:0000384                   384       29       384        5      11136
:0000448                   420       22       448        4       9856
:0000512                   512      150       512       22      76800
:0000704                   696       33       704        3      23232
:0000768                   768       82       768       21      62976
:0000832                   776       98       832       15      81536
:0000896                   896       48       896       14      43008
:0000960                   944       39       960       15      37440
:0001024                  1024      303      1024       80     310272
:0001088                  1048       28      1088        4      30464
:0001608                  1608       34      1608        7      54672
:0001728                  1712       16      1728        5      27648
:0001856                  1856        8      1856        2      14848
:0001904                  1904       87      1904       28     165648
:0002048                  2048      504      2048      131    1032192
:0004096                  4096       49      4096       28     200704
:0008192                  8192        8      8192       12      65536
:0016384                 16384        4     16384        7      65536
:0032768                 32768        3     32768        3      98304
:0065536                 65536        1     65536        1      65536
:0131072                131072        0    131072        0          0
:0262144                262144        0    262144        0          0
:0524288                524288        0    524288        0          0
:1048576               1048576        0   1048576        0          0
:2097152               2097152        0   2097152        0          0
:4194304               4194304        0   4194304        0          0
:a-0000088                  88        0        88        0          0
:a-0000104                 104    13963       104      359    1452152
:a-0000168                 168        0       168        0          0
:a-0000224                 224    11113       224      619    2489312
:a-0000256                 248        0       256        0          0
anon_vma                    40      796        48       12      38208
bdev_cache                 960       32      1024        8      32768
ext2_inode_cache           920        0       928        0          0
ext3_inode_cache           968     4775       976     1194    4660400
file_lock_cache            192       58       200        4      11600
hugetlbfs_inode_cache      752        5       760        1       3800
idr_layer_cache            528       91       536       14      48776
inode_cache                720     3015       728      604    2194920
isofs_inode_cache          768        0       776        0          0
kmem_cache_node             72      232        72        6      16704
mqueue_inode_cache        1040        7      1088        1       7616
nfs_inode_cache           1120      102      1128       15     115056
proc_inode_cache           752      503       760      102     382280
radix_tree_node            552     2666       560      381    1492960
rpc_inode_cache            928       16       960        4      15360
shmem_inode_cache          960      243       968       61     235224
sighand_cache             2120       86      2176       31     187136
sock_inode_cache           816       81       832       11      67392

TOTAL K: 17169

and here's /proc/meminfo:

MemTotal:      4040768 kB
MemFree:       3726112 kB
Buffers:         13864 kB
Cached:         196920 kB
SwapCached:          0 kB
Active:         127264 kB
Inactive:       127864 kB
SwapTotal:     4466060 kB
SwapFree:      4466060 kB
Dirty:              60 kB
Writeback:           0 kB
AnonPages:       44364 kB
Mapped:          16124 kB
Slab:            18608 kB
SReclaimable:    11768 kB
SUnreclaim:       6840 kB
PageTables:       2240 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6486444 kB
Committed_AS:    64064 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     32364 kB
VmallocChunk: 34359705775 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

-mark
* Re: SLUB
From: Christoph Lameter @ 2007-12-21 21:41 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 20 Dec 2007, Mark Seger wrote:

> I did some preliminary prototyping and I guess I'm not sure of the math.
> If I understand what you're saying, an object has a particular size, but
> given the fact that you may need alignment, the true size is really the
> slab_size, and the difference is the overhead. What I don't understand is
> how to calculate how much memory a particular slab cache takes up.

If you want the usage in terms of pages allocated from the page allocator,
then you do

	slabs << order

If you want the usage in actual bytes of allocated objects by the users of
a slab cache, then you can do

	objects * obj_size

> maybe this IS close enough? If so, what's the significance of the number
> of slabs?

It is the number of pages that were taken from the page allocator.

> Would I divide the 15997K by the number of slabs to find out how big a
> single slab is? I would have thought that's what the slab_size is, but
> clearly it isn't.

The size of a single slab that contains multiple objects is

	PAGE_SIZE << order

> 49 N0=19 N1=30
>
> which I'm guessing may mean 19 objects are allocated to socket 0 and 30
> to socket 1? This is a dual-core, dual-socket system.

Right. There are 49 objects in use: 19 of those are on node 0 and 30 on
node 1. The Nx values only show up on NUMA systems; otherwise they are
omitted.
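[In Perl, the two calculations Christoph gives, plus a parse of the NUMA
form of "objects", might look like the following sketch; $order, $slabs,
$objects and $obj_size are assumed to have been read from the sysfs files
shown earlier, and $objects_line is the raw contents of the objects file:]

    use POSIX qw(sysconf _SC_PAGESIZE);
    my $page_size = sysconf(_SC_PAGESIZE) || 4096;

    # pages taken from the page allocator, converted to bytes
    my $total_bytes = ($slabs << $order) * $page_size;

    # bytes actually handed out to the users of this cache
    my $used_bytes = $objects * $obj_size;

    # the NUMA form, e.g. "49 N0=19 N1=30": total first, then per-node counts
    my ($total_objs, @rest) = split ' ', $objects_line;
    my %per_node = map { /^N(\d+)=(\d+)$/ ? ($1 => $2) : () } @rest;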
* Re: SLUB
From: Mark Seger @ 2007-12-27 14:22 UTC
To: Christoph Lameter; +Cc: linux-mm

Now that I've had some more time to think about this and play around with
the slabinfo tool, I fear my problem was getting my head wrapped around the
terminology, but that's my problem. Since there are entries called
object_size, objs_per_slab and slab_size, I would have thought that
object_size * objs_per_slab = slab_size, but that clearly isn't the case.
Since slabs are allocated in pages, the actual size of the slabs is always
a power-of-two multiple of the page size, and that's why I see calculations
in slabinfo like page_size << order, but I guess I'm still not sure what
the actual definition of 'order' is.

Anyhow, when I run slabinfo I see the following entry:

Slabcache: skbuff_fclone_cache  Aliases: 0  Order: 0  Objects: 25
** Hardware cacheline aligned

Sizes (bytes)     Slabs              Debug                 Memory
------------------------------------------------------------------------
Object :    420   Total  :      4    Sanity Checks : Off   Total: 16384
SlabObj:    448   Full   :      0    Redzoning     : Off   Used : 10500
SlabSiz:   4096   Partial:      0    Poisoning     : Off   Loss :  5884
Loss   :     28   CpuSlab:      4    Tracking      : Off   Lalig:   700
Align  :      0   Objects:      9    Tracing       : Off   Lpadd:   256

According to the entries under /sys/slab/skbuff_fclone_cache, it looks like
the slab_size field is being reported above as 'SlabObj', objs_per_slab is
being reported as 'Objects', and as I mentioned above, SlabSiz is based on
'order'.

Anyhow, as I understand what's going on at a very high level, memory is
reserved for use as slabs (which themselves are multiples of pages) and
processes allocate objects from within slabs as they need them. Therefore
the 2 high-level numbers that seem of interest from a memory usage
perspective are the memory allocated and the amount in use. I think these
are the "Total" and "Used" fields in slabinfo:

Total = page_size << order

As for 'Used', that looks to be a straight calculation of objects *
object_size.

The Slab field in /proc/meminfo is the total of the individual 'Total's...

Stay tuned: at some point I'll have support for reporting total/allocated
usage by slab in collectl, though perhaps I'll post a 'proposal' first in
the hopes of getting some constructive feedback, as I want to present
useful information rather than just columns of numbers.

-mark
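[One way to sanity-check that understanding, sketched in Perl: sum each
cache's Total and compare it against the Slab: line in /proc/meminfo,
skipping the symlinked aliases so nothing is counted twice. This reuses
$base, $page_size and read_field from the earlier sketches; as the thread
notes, the numbers come out close rather than identical:]

    my $sum_bytes = 0;
    opendir(my $dh, $base) or die "cannot open $base: $!";
    for my $cache (grep { !/^\./ } readdir $dh) {
        next if -l "$base/$cache";   # aliases point at the same cache
        my $order = read_field($cache, 'order');
        my $slabs = read_field($cache, 'slabs');
        next unless defined $order && defined $slabs;
        $sum_bytes += $slabs * ($page_size << $order);
    }
    closedir $dh;

    open(my $mi, '<', '/proc/meminfo') or die $!;
    my ($slab_kb) = map { /^Slab:\s+(\d+)\s+kB/ ? $1 : () } <$mi>;
    close $mi;
    printf "sysfs total: %dK   meminfo Slab: %dK\n",
        $sum_bytes / 1024, $slab_kb;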
* Re: SLUB
From: Mark Seger @ 2007-12-27 15:59 UTC
To: Christoph Lameter; +Cc: linux-mm

I now have a 'prototype' of something I think makes sense, at least from my
collectl tool's perspective. Keep in mind the philosophy behind collectl is
to have a tool you can run both interactively and as a daemon that will
give you enough information to paint a picture of what's happening on your
system, and in this case I'm focused on slabs. This is not intended to be a
highly analytical tool but rather a starting point to identify areas
potentially requiring a deeper dive. For example, with the current version
that's driven off /proc/slabinfo, it's been possible to look at the
long-term changes to individual slabs to get a picture of how memory is
being allocated, and when there are memory issues it can be useful to see
which slabs (if any) are growing at an unexpected rate.

That said, I'm thinking of reporting something like the following:

           <-------- objects -------><---- slabs ----><---- memory ---->
Slab Name   Size  In Use   Avail     Size   Number     Used      Total
:0000008       8    2164    2560     4096        5    17312      20480
:0000016      16    1448    2816     4096       11    23168      45056
:0000024      24     460     680     4096        4    11040      16384
:0000032      32     384    1152     4096        9    12288      36864
:0000040      40     306     306     4096        3    12240      12288

The idea here is that for each slab, in the 'objects' section one can see
how many objects are 'in use' and how many are 'available', the point being
one can look at the difference to see how many more objects are available
before the system needs to allocate another slab. Under the 'slabs' section
you can see how big the individual slabs are and how many of them there
are, and finally under 'memory' you can see how much has been used by
processes vs how much is still allocated as slabs.

There are all sorts of other ways to present the data such as percentages,
differences, etc., but this is more or less the way I did it in the past
and the information was useful. One could also argue that the real key
information here is Used/Total and the rest is just window dressing, and I
couldn't disagree with that either, but I do think it helps paint a more
complete picture.

-mark
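[A sketch in Perl of how one line of that proposed layout might be
produced; the format widths and variable names are illustrative, not
collectl's actual code, and the values are assumed to come from the sysfs
reads shown earlier:]

    # total object slots available: slabs * objs_per_slab
    # (objs_per_slab is another constant sysfs file, mentioned earlier)
    my $avail = $slabs * $objs_per_slab;

    # one report line per cache in the proposed column layout
    printf "%-20s %6d %7d %7d %7d %7d %9d %9d\n",
        $name, $obj_size, $objects, $avail,
        $page_size << $order, $slabs,
        $objects * $obj_size, $slabs * ($page_size << $order);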
* Re: SLUB
From: Christoph Lameter @ 2007-12-27 19:43 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 27 Dec 2007, Mark Seger wrote:

>            <-------- objects -------><---- slabs ----><---- memory ---->
> Slab Name   Size  In Use   Avail     Size   Number     Used      Total
> :0000008       8    2164    2560     4096        5    17312      20480

The right hand side is okay. Could you list all the slab names that are
covered by :0000008 on the left side (maybe separated by commas)? Having
the :0000008 there is ugly. slabinfo can show you a way to get the names.

> There are all sorts of other ways to present the data such as
> percentages, differences, etc., but this is more or less the way I did it
> in the past and the information was useful. One could also argue that the
> real key information here is Used/Total and the rest is just window
> dressing, and I couldn't disagree with that either, but I do think it
> helps paint a more complete picture.

I agree.
* Re: SLUB
From: Mark Seger @ 2007-12-27 19:57 UTC
To: Christoph Lameter; +Cc: linux-mm

Christoph Lameter wrote:
> The right hand side is okay. Could you list all the slab names that are
> covered by :0000008 on the left side (maybe separated by commas)? Having
> the :0000008 there is ugly. slabinfo can show you a way to get the names.

Here's the challenge - I only want to use a single line per entry AND I
want all the columns to line up for easy reading (I don't want much, do
I?). I'll have to do some experiments to see what might look better. One
thought is to list a 'primary' name (whatever that might mean) in the
left-hand column and perhaps line up the rest of the names to the right of
the total. Another option could be to just repeat the line with each slab
entry, but that generates a lot of output, and one of the other notions
behind collectl is to make it real easy to see what's going on; repeating
information can be confusing.

I'm assuming the way slabinfo gets the names (or at least the way I can
think of doing it) is to just look for entries in /sys/slab that are links.

> I agree.

The neat thing about collectl is it's written in perl and contains lots of
switches and print statements. I can easily see additional switches that
might control how the information is printed, such as the 'node' level
allocations, but I figure that can come later.

-mark
* Re: SLUB
From: Christoph Lameter @ 2007-12-27 19:58 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 27 Dec 2007, Mark Seger wrote:

> Here's the challenge - I only want to use a single line per entry AND I
> want all the columns to line up for easy reading (I don't want much, do
> I?). I'll have to do some experiments to see what might look better. One
> thought is to list a 'primary' name (whatever that might mean) in the
> left-hand column and perhaps line up the rest of the names to the right
> of the total.

slabinfo has the concept of the "first" name of a slab. See the -f option.

> Another option could be to just repeat the line with each slab entry, but
> that generates a lot of output, and one of the other notions behind
> collectl is to make it real easy to see what's going on; repeating
> information can be confusing.

I'd say just pack as much as fits into the space and then create a new line
if there are too many aliases of the slab.

> I'm assuming the way slabinfo gets the names (or at least the way I can
> think of doing it) is to just look for entries in /sys/slab that are
> links.

It scans for symlinks pointing to that strange name. Source code for
slabinfo is in Documentation/vm/slabinfo.c.
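[That symlink scan might look like the following Perl sketch, grouping
every alias under the cache it points at; $base is the sysfs directory
from the earlier sketches, and readlink targets are assumed to end in the
real cache name:]

    my %aliases;   # real cache name => list of alias names
    opendir(my $dh, $base) or die "cannot open $base: $!";
    for my $entry (grep { !/^\./ } readdir $dh) {
        if (-l "$base/$entry") {
            my $target = readlink "$base/$entry";
            $target =~ s{.*/}{};   # keep only the final path component
            push @{ $aliases{$target} }, $entry;
        } else {
            $aliases{$entry} ||= [];   # real cache, possibly unaliased
        }
    }
    closedir $dh;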
* Re: SLUB
From: Mark Seger @ 2007-12-27 20:17 UTC
To: Christoph Lameter; +Cc: linux-mm

Christoph Lameter wrote:
> slabinfo has the concept of the "first" name of a slab. See the -f
> option.

slick!

> I'd say just pack as much as fits into the space and then create a new
> line if there are too many aliases of the slab.

lemme play with it some

> It scans for symlinks pointing to that strange name. Source code for
> slabinfo is in Documentation/vm/slabinfo.c.

gotcha...

-mark
* Re: SLUB
From: Mark Seger @ 2007-12-27 20:55 UTC
To: Christoph Lameter; +Cc: linux-mm

OK, here's a dumb question... I've been looking at slabinfo and see a
routine called find_one_alias which returns the alias that gets printed
with the -f switch. The only thing is, the leading comment says "Find the
shortest alias of a slab" but it looks like it returns the longest name.
Did you change the functionality after you wrote the comment? That'll teach
you for commenting your code! 8-)

I'm also not sure why it would stop the search when it finds an alias that
starts with 'kmall'. Is there some reason you wouldn't want to use any of
those names as potential candidates? Does it really matter how I choose the
'first' name? It's certainly easy enough to pick the longest, I'm just not
sure about the test for 'kmall'.

-mark
* Re: SLUB
From: Christoph Lameter @ 2007-12-27 20:59 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 27 Dec 2007, Mark Seger wrote:

> OK, here's a dumb question... I've been looking at slabinfo and see a
> routine called find_one_alias which returns the alias that gets printed
> with the -f switch. The only thing is, the leading comment says "Find the
> shortest alias of a slab" but it looks like it returns the longest name.
> Did you change the functionality after you wrote the comment? That'll
> teach you for commenting your code! 8-)

Yuck.

> I'm also not sure why it would stop the search when it finds an alias
> that starts with 'kmall'. Is there some reason you wouldn't want to use
> any of those names as potential candidates? Does it really matter how I
> choose the 'first' name? It's certainly easy enough to pick the longest,
> I'm just not sure about the test for 'kmall'.

Well, the kmallocs are generic and just give size information. You want a
slab name that is more informative than that.
* collectl and the new slab allocator [slub] statistics
From: Mark Seger @ 2007-12-27 23:49 UTC
To: Christoph Lameter; +Cc: linux-mm

I hope you don't mind, but I changed the subject from the pretty generic
'SLUB'.

My latest thought about handling the multiple aliases is to do something
like slabinfo does - pick a 'primary' name based on a similar criterion,
such as the longest name that isn't 'kmalloc' or that other funky format
with the size in its name. Then provide a second option that shows the
mappings of all the names to the primary ones. That way, if you're
interested in a particular slab you can always look up its mapping. I would
also provide a mechanism for specifying the slabs you want to monitor, and
even a non-'primary' name would work there.

Today's kind of over for me, but perhaps I can send out an updated
prototype format tomorrow.

-mark
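[A sketch of that 'primary' name selection in Perl, following the rule
discussed above - the longest alias that is not a generic kmalloc name,
falling back to the raw cache name when nothing better exists; %aliases is
the hash built in the symlink-scan sketch earlier:]

    sub primary_name {
        my ($cache) = @_;
        # skip the generic kmalloc aliases, which only encode a size
        my @named = grep { !/^kmalloc/ } @{ $aliases{$cache} || [] };
        # prefer the longest remaining name as the most descriptive
        my ($longest) = sort { length($b) <=> length($a) } @named;
        return defined $longest ? $longest : $cache;
    }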
* Re: collectl and the new slab allocator [slub] statistics
From: Christoph Lameter @ 2007-12-27 23:52 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 27 Dec 2007, Mark Seger wrote:

> That way, if you're interested in a particular slab you can always look
> up its mapping. I would also provide a mechanism for specifying the slabs
> you want to monitor, and even a non-'primary' name would work there.

Sounds good.

> Today's kind of over for me, but perhaps I can send out an updated
> prototype format tomorrow.

Great. But I will only be back next Wednesday.
* Re: collectl and the new slab allocator [slub] statistics
From: Mark Seger @ 2007-12-28 15:10 UTC
To: Christoph Lameter; +Cc: linux-mm

Christoph Lameter wrote:
> Great. But I will only be back next Wednesday.

So here's the latest... I made a couple of tweaks to the format, but I
think it's getting real close, and as you can see I'm now printing the
longest alias associated with a slab, as is done in slabinfo. I'm also
including the time to make it easier to read, but typically this is an
option in case the user doesn't want to use the extra screen real estate.
As a minor point, as I was debugging this and comparing its output to
slabinfo (we don't always get the same aliases if there are multiple
aliases of the same length), I found that slabinfo reports 'kmalloc-1024'
where I report 'biovec-64'. I thought you wanted to print the kmalloc*
names only when there was nothing else, so I suspect a slight bug in
slabinfo...

Note that I decided to print the number of objects in a slab, even though
one could derive that oneself. I also decided to report the size of the
slabs in KB, as well as the used/total memory. I'm still reporting the
objects in-use/avail in bytes since these are often <1K and I really don't
want to report fractions.

                              <----------- objects ----------><-- slabs --><--- memory --->
Time     Slab Name            Size /slab In Use  Avail  SizeK Number  UsedK TotalK
10:25:04 TCP                  1728     4     13     20      8      5     21     40
10:25:04 TCPv6                1856     4     15     20      8      5     27     40
10:25:04 UDP-Lite              896     4     51     64      4     16     44     64
10:25:04 UDPLITEv6            1088     7     28     28      8      4     29     32
10:25:04 anon_vma               48    85    773   1105      4     13     36     52

Anyhow, here's an example of watching the system once a second for any
slabs that change while the system is idle:

                              <----------- objects ----------><-- slabs --><--- memory --->
Time     Slab Name            Size /slab In Use  Avail  SizeK Number  UsedK TotalK
10:25:34 skbuff_fclone_cache   448     9     16     36      4      4      7     16
10:25:34 skbuff_head_cache     256    16   1266   1552      4     97    316    388
10:25:35 skbuff_fclone_cache   448     9     23     36      4      4     10     16
10:25:35 skbuff_head_cache     256    16   1265   1552      4     97    316    388
10:25:36 biovec-64            1024     4    303    320      4     80    303    320
10:25:36 dentry                224    18 215543 215568      4  11976  47150  47904
10:25:36 skbuff_fclone_cache   448     9     19     36      4      4      8     16
10:25:36 skbuff_head_cache     256    16   1269   1552      4     97    317    388

And finally, here's watching a single slab while writing a large file,
noting the I/O started at 10:26:30...

                              <----------- objects ----------><-- slabs --><--- memory --->
Time     Slab Name            Size /slab In Use  Avail  SizeK Number  UsedK TotalK
10:26:25 blkdev_requests       288    14     39     84      4      6     10     24
10:26:30 blkdev_requests       288    14    189    224      4     16     53     64
10:26:31 blkdev_requests       288    14    187    224      4     16     52     64
10:26:32 blkdev_requests       288    14    174    224      4     16     48     64
10:26:33 blkdev_requests       288    14    173    224      4     16     48     64
10:26:34 blkdev_requests       288    14     46     84      4      6     12     24

It shouldn't take too much time to actually implement this in collectl, but
I do need to find a block of time to update the code, man pages, etc.
before releasing it, so if there are any final tweaks, now is the time to
say so...

-mark
* Re: collectl and the new slab allocator [slub] statistics
From: Mark Seger @ 2007-12-31 18:30 UTC
To: Christoph Lameter; +Cc: linux-mm

Even though I know you won't be around for a few days, I found a few more
cycles to put into this and have implemented quite a lot in collectl.
Rather than send along a bunch of output, I started to put together a web
page as part of the collectl web site, though I haven't linked it in yet as
I haven't released the associated version. In any event, I took a shot at
including a few high-level words about slabs in general as well as showing
what some of the different output formats will look like, as I'd much
rather make changes before I release it than after. That said, if you or
anyone else on this list wants to have a look at what I've been up to, you
can see it at http://collectl.sourceforge.net/SlabInfo.html

-mark
* Re: SLUB
From: Christoph Lameter @ 2007-12-27 19:40 UTC
To: Mark Seger; +Cc: linux-mm

On Thu, 27 Dec 2007, Mark Seger wrote:

> Since there are entries called object_size, objs_per_slab and slab_size,
> I would have thought that object_size * objs_per_slab = slab_size, but
> that clearly isn't the case. Since slabs are allocated in pages, the
> actual size of the slabs is always a power-of-two multiple of the page
> size, and that's why I see calculations in slabinfo like page_size <<
> order, but I guess I'm still not sure what the actual definition of
> 'order' is.

order is the shift you apply to PAGE_SIZE to get the allocation size you
want. Order 0 = PAGE_SIZE, order 1 = PAGE_SIZE << 1 (PAGE_SIZE * 2), order
2 = PAGE_SIZE << 2 (PAGE_SIZE * 4), etc.

> Therefore the 2 high-level numbers that seem of interest from a memory
> usage perspective are the memory allocated and the amount in use. I think
> these are the "Total" and "Used" fields in slabinfo.

Total is the total memory allocated from the page allocator. There are 4
slabs allocated with a size of 4096 bytes each; that is 16k.

The Used value is the memory that was actually handed out through kmalloc
and friends.

> Total = page_size << order

Order = 0, so that would give 4096 << 0 = 4096. Wrong value - you still
need to multiply by the number of slabs.

> As for 'Used', that looks to be a straight calculation of objects *
> object_size.

Right.

> The Slab field in /proc/meminfo is the total of the individual 'Total's...

Right.

> Stay tuned: at some point I'll have support for reporting total/allocated
> usage by slab in collectl, though perhaps I'll post a 'proposal' first in
> the hopes of getting some constructive feedback, as I want to present
> useful information rather than just columns of numbers.

Ahh, great. Thanks for all your work.
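[In code form, the correction amounts to multiplying the per-slab size by
the slab count; a small Perl sketch with the skbuff_fclone_cache numbers
quoted above (order 0, 4 slabs, 25 objects of 420 bytes):]

    my $slab_bytes  = $page_size << $order;   # order 0: 4096 << 0 = 4096
    my $total_bytes = $slab_bytes * $slabs;   # 4096 * 4 = 16384, slabinfo's Total
    my $used_bytes  = $objects * $obj_size;   # 25 * 420 = 10500, slabinfo's Used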
* Re: SLUB
  2007-12-27 19:40 ` SLUB Christoph Lameter
@ 2007-12-27 19:51 ` Mark Seger
  2007-12-27 19:53 ` SLUB Christoph Lameter
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Seger @ 2007-12-27 19:51 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm

It feels like we're closing in on something, as I'm getting more 'Right's
from you than before. 8-) Just a few more comments/questions on your
comments below...

Christoph Lameter wrote:
> On Thu, 27 Dec 2007, Mark Seger wrote:
>
>> Now that I've had some more time to think about this and play around
>> with the slabinfo tool, I fear my problem was getting my head wrapped
>> around the terminology, but that's my problem. Since there are entries
>> called object_size, objs_per_slab and slab_size, I would have thought
>> that object_size * objs_per_slab = slab_size, but that clearly isn't
>> the case. Since slabs are allocated in pages, the actual size of a slab
>> is always a power-of-2 multiple of the page size, and that's why I see
>> calculations in slabinfo like page_size << order, but I guess I'm still
>> not sure what the actual definition of 'order' is.
>
> order is the shift you apply to PAGE_SIZE to get to the allocation size
> you want. Order 0 = PAGE_SIZE, order 1 = PAGE_SIZE << 1 (PAGE_SIZE * 2),
> order 2 = PAGE_SIZE << 2 (PAGE_SIZE * 4), etc.

I think the thing that was throwing me here for a while was the name
'order'. I thought it meant order in the ordinal sense, but clearly it's
meant in the 'power of' sense.

>> Slabcache: skbuff_fclone_cache   Aliases: 0   Order: 0   Objects: 25
>> ** Hardware cacheline aligned
>>
>> Sizes (bytes)     Slabs              Debug                 Memory
>> ------------------------------------------------------------------------
>> Object :     420  Total  :       4   Sanity Checks : Off   Total: 16384
>> SlabObj:     448  Full   :       0   Redzoning     : Off   Used : 10500
>> SlabSiz:    4096  Partial:       0   Poisoning     : Off   Loss :  5884
>> Loss   :      28  CpuSlab:       4   Tracking      : Off   Lalig:   700
>> Align  :       0  Objects:       9   Tracing       : Off   Lpadd:   256
>>
>> According to the entries under /sys/slab/skbuff_fclone_cache, it looks
>> like the slab_size field is being reported above as 'SlabObj',
>> objs_per_slab is being reported as 'Objects', and, as I mentioned above,
>> SlabSiz is based on 'order'.
>>
>> Anyhow, as I understand what's going on at a very high level, memory is
>> reserved for use as slabs (which themselves are multiples of pages) and
>> processes allocate objects from within slabs as they need them.
>> Therefore the two high-level numbers that seem of interest from a memory
>> usage perspective are the memory allocated and the amount in use. I
>> think these are the "Total" and "Used" fields in slabinfo.
>
> Total is the total memory allocated from the page allocator. There are 4
> slabs allocated, 4096 bytes each. That is 16k.
>
> The used value is the memory that was actually handed out through kmalloc
> and friends.
>
>> Total = page_size << order
>
> Order = 0. So Total would be 4096 << 0 = 4096. Wrong value.

I'm not sure what you mean by 'Wrong value'. I think it's because I said
page_size << order instead of (page_size << order) * number of slabs,
right?

>> As for 'Used', that looks to be a straight calculation of
>> objects * object_size.
>
> Right.
>
>> The Slabs field in /proc/meminfo is the total of the individual
>> 'Total's...
>
> Right.
>
>> Stay tuned, and at some point I'll have support in collectl for
>> reporting total/allocated usage by slab, though perhaps I'll post a
>> 'proposal' first in the hopes of getting some constructive feedback, as
>> I want to present useful information rather than just columns of
>> numbers.
>
> Ahh great. Thanks for all your work.

Now the only assumption is that someone will actually use it! 8-)

One more thing - can I assume order is a constant for a particular type of
slab and only need to read it at initialization time?

-mark
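Plugging the skbuff_fclone_cache numbers quoted above into the corrected
formula bears this out; a quick sanity check in the same sketch style, using
only figures already shown in the thread:

    # skbuff_fclone_cache, per the slabinfo output above:
    # order = 0, slabs = 4, objects = 25, object_size = 420
    total = (4096 << 0) * 4   # = 16384, matching slabinfo's "Total: 16384"
    used = 25 * 420           # = 10500, matching slabinfo's "Used : 10500"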
* Re: SLUB
  2007-12-27 19:51 ` SLUB Mark Seger
@ 2007-12-27 19:53 ` Christoph Lameter
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-12-27 19:53 UTC (permalink / raw)
To: Mark Seger; +Cc: linux-mm

On Thu, 27 Dec 2007, Mark Seger wrote:

>> Order = 0. So Total would be 4096 << 0 = 4096. Wrong value.
>
> I'm not sure what you mean by 'Wrong value'. I think it's because I said
> page_size << order instead of (page_size << order) * number of slabs,
> right?

Right.

> One more thing - can I assume order is a constant for a particular type
> of slab and only need to read it at initialization time?

Correct. Only the number of slabs and the number of objects change.
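That constancy suggests the obvious sampling structure: read the fixed
attributes once at startup and poll only the changing ones on each interval.
A rough sketch under the same assumptions as before (illustrative cache
names, assumed 4096-byte page size and /sys/slab layout):

    import time

    CACHES = ["shmem_inode_cache", "skbuff_fclone_cache"]  # examples
    PAGE_SIZE = 4096

    def read_int(cache, field):
        with open("/sys/slab/%s/%s" % (cache, field)) as f:
            return int(f.read().split()[0])

    # Constant per cache: read once at initialization time.
    const = {}
    for c in CACHES:
        const[c] = (read_int(c, "order"), read_int(c, "object_size"))

    while True:
        for c in CACHES:
            order, objsize = const[c]
            slabs = read_int(c, "slabs")      # changes as the cache grows
            objects = read_int(c, "objects")  # changes constantly
            print(c, (PAGE_SIZE << order) * slabs, objects * objsize)
        time.sleep(60)  # collectl's default slab interval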
* Re: SLUB
  2007-12-20 23:36 ` SLUB Mark Seger
  2007-12-21 1:09 ` SLUB Mark Seger
@ 2007-12-21 21:32 ` Christoph Lameter
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-12-21 21:32 UTC (permalink / raw)
To: Mark Seger; +Cc: linux-mm

On Thu, 20 Dec 2007, Mark Seger wrote:

> What I'm not sure about is how this maps to the old slab info.
> Specifically, I believe in the old model one reported on the size taken
> up by the slabs (number of slabs X number of objects/slab X object size).
> There was a second size for the actual number of objects in use, so in my
> report that looked like this:
>
> #                 <-----------Objects----------><---------Slab Allocation------>
> #Name             InUse  Bytes  Alloc  Bytes    InUse  Bytes  Total  Bytes
> nfs_direct_cache      0      0      0      0        0      0      0      0
> nfs_write_data       36  27648     40  30720        8  32768      8  32768
>
> the slab allocation was real memory allocated (which should come close to
> Slab: in /proc/meminfo, right?) for the slabs while the object bytes were

The real memory allocated can be deduced from the "slabs" field. Multiply
it by the slab size (page_size << order) and you have the total. The
"objects" are the actual objects in current use.

> To get back to my original question, I'd like to make sure that I'm
> reporting useful information and not just data for the sake of it. In one
> of your postings I saw a report you had that showed:
>
> slubinfo - version: 1.0
> # name <objects> <order> <objsize> <slabs>/<partial>/<cpu> <flags> <nodes>

That report can be had using the slabinfo tool. See
Documentation/vm/slabinfo.c.

> How useful are order, cpu, flags and nodes?
> Do people really care about how much memory is taken up by objects vs.
> slabs? If not, I could see reporting for each slab:
> - object size
> - number of objects
> - slab size
> - number of slabs
> - total memory (slab size X number of slabs)
> - whatever else people might think to be useful, such as order, cpu,
>   flags, etc.

Sounds fine.

> Another thing I noticed is that a number of the slabs are simply links to
> the same base name; is it sufficient to just report the base names and
> not those linked to them?

Seems reasonable to me... slabinfo reports it like that.
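Since the aliased entries Mark mentions appear in sysfs as links to a
shared base cache, a tool can filter them out before monitoring. A hedged
sketch; treating aliases as symlinks matches the thread's description of
the layout, but is an assumption rather than a documented guarantee:

    import os

    SLAB_DIR = "/sys/slab"  # /sys/kernel/slab on some kernels

    # Keep only real caches; aliases are just symlinks to a base cache.
    caches = []
    for name in sorted(os.listdir(SLAB_DIR)):
        if not os.path.islink(os.path.join(SLAB_DIR, name)):
            caches.append(name)
    print("%d caches to monitor" % len(caches))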
* Re: SLUB
  2007-12-20 19:44 ` SLUB Christoph Lameter
  2007-12-20 23:36 ` SLUB Mark Seger
@ 2007-12-21 16:59 ` Mark Seger
  2007-12-21 21:37 ` SLUB Christoph Lameter
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Seger @ 2007-12-21 16:59 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm

> I think we better keep it public (so that it goes into the archive).
> Here is a short description of the fields in /sys/kernel/slab/<slabcache>
> that you would need:
>
> -r--r--r-- 1 root root 4096 Dec 20 11:41 object_size
>
> The size of an object. Subtract object_size from slab_size and you have
> the per-object overhead generated by alignment and slab metadata. Does
> not change; you only need to read this once.
>
> -r--r--r-- 1 root root 4096 Dec 20 11:41 objects
>
> Number of objects in use. This changes and you may want to monitor it.
>
> -r--r--r-- 1 root root 4096 Dec 20 11:41 slab_size
>
> Total memory used for a single object. Read this only once.
>
> -r--r--r-- 1 root root 4096 Dec 20 11:41 slabs
>
> Number of slab pages in use for this slab cache. May change if the slab
> is extended.

Sorry for being confused, but I thought that a slab was made up of a number
of objects, and above you're saying slab_size is the size of a single
object. Furthermore, looking at /sys/slab/shmem_inode_cache I see:

object_size = 960
objs_per_slab = 4

which implies a slab is made up of more than one object, so which is it?
Could it be a simple matter of clearer names? I also see

slab_size = 968

which certainly supports your statement about this being the size of an
object, and it looks like there are 8 bytes of overhead. Finally, I also
see

objects = 242

and objects * objs_per_slab = slab_size. Is that a coincidence?

-mark
* Re: SLUB
  2007-12-21 16:59 ` SLUB Mark Seger
@ 2007-12-21 21:37 ` Christoph Lameter
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-12-21 21:37 UTC (permalink / raw)
To: Mark Seger; +Cc: linux-mm

On Fri, 21 Dec 2007, Mark Seger wrote:

> Sorry for being confused, but I thought that a slab was made up of a
> number of objects, and above you're saying slab_size is the size of a
> single object. Furthermore, looking at /sys/slab/shmem_inode_cache I see:
>
> object_size = 960
> objs_per_slab = 4
>
> which implies a slab is made up of more than one object, so which is it?
> Could it be a simple matter of clearer names? I also see

Yes, a slab holds "objs_per_slab" objects.

> slab_size = 968
>
> which certainly supports your statement about this being the size of an
> object, and it looks like there are 8 bytes of overhead. Finally, I also
> see
>
> objects = 242
>
> and objects * objs_per_slab = slab_size. Is that a coincidence?

This means that the slab cache contains 242 active objects. From the
"slabs" field you can deduce how many objects the cache could hold:
slabs * objs_per_slab. If you subtract "objects" from this, you have the
number of unused objects in the slabs.
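Those two relationships, per-object overhead and unused capacity, are easy
to compute together; a small sketch under the same illustrative assumptions
about the /sys/slab file layout as the earlier ones:

    def read_int(cache, field):
        with open("/sys/slab/%s/%s" % (cache, field)) as f:
            return int(f.read().split()[0])

    cache = "shmem_inode_cache"
    overhead = read_int(cache, "slab_size") - read_int(cache, "object_size")
    capacity = read_int(cache, "slabs") * read_int(cache, "objs_per_slab")
    unused = capacity - read_int(cache, "objects")
    print("%s: %d bytes overhead/object, %d unused object slots"
          % (cache, overhead, unused))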
end of thread, other threads:[~2007-12-31 18:30 UTC | newest]

Thread overview: 24+ messages
2007-12-20 15:06 SLUB Mark Seger
2007-12-20 19:44 ` SLUB Christoph Lameter
2007-12-20 23:36 ` SLUB Mark Seger
2007-12-21  1:09 ` SLUB Mark Seger
2007-12-21  1:27 ` SLUB Mark Seger
2007-12-21 21:41 ` SLUB Christoph Lameter
2007-12-27 14:22 ` SLUB Mark Seger
2007-12-27 15:59 ` SLUB Mark Seger
2007-12-27 19:43 ` SLUB Christoph Lameter
2007-12-27 19:57 ` SLUB Mark Seger
2007-12-27 19:58 ` SLUB Christoph Lameter
2007-12-27 20:17 ` SLUB Mark Seger
2007-12-27 20:55 ` SLUB Mark Seger
2007-12-27 20:59 ` SLUB Christoph Lameter
2007-12-27 23:49 ` collectl and the new slab allocator [slub] statistics Mark Seger
2007-12-27 23:52 ` Christoph Lameter
2007-12-28 15:10 ` Mark Seger
2007-12-31 18:30 ` Mark Seger
2007-12-27 19:40 ` SLUB Christoph Lameter
2007-12-27 19:51 ` SLUB Mark Seger
2007-12-27 19:53 ` SLUB Christoph Lameter
2007-12-21 21:32 ` SLUB Christoph Lameter
2007-12-21 16:59 ` SLUB Mark Seger
2007-12-21 21:37 ` SLUB Christoph Lameter