On 10/23/2012 12:07 PM, Glauber Costa wrote:
> On 10/23/2012 04:48 AM, JoonSoo Kim wrote:
>> Hello, Glauber.
>>
>> 2012/10/23 Glauber Costa :
>>> On 10/22/2012 06:45 PM, Christoph Lameter wrote:
>>>> On Mon, 22 Oct 2012, Glauber Costa wrote:
>>>>
>>>>> + * kmem_cache_free - Deallocate an object
>>>>> + * @cachep: The cache the allocation was from.
>>>>> + * @objp: The previously allocated object.
>>>>> + *
>>>>> + * Free an object which was previously allocated from this
>>>>> + * cache.
>>>>> + */
>>>>> +void kmem_cache_free(struct kmem_cache *s, void *x)
>>>>> +{
>>>>> +        __kmem_cache_free(s, x);
>>>>> +        trace_kmem_cache_free(_RET_IP_, x);
>>>>> +}
>>>>> +EXPORT_SYMBOL(kmem_cache_free);
>>>>> +
>>>>
>>>> This results in an additional indirection if tracing is off. Wonder if
>>>> there is a performance impact?
>>>>
>>> If tracing is on, you mean?
>>>
>>> Tracing already incurs overhead; I am not sure how much a function call
>>> would add on top of that.
>>>
>>> I would not be concerned with this, but I can measure, if you have any
>>> specific workload in mind.
>>
>> With this patch, kmem_cache_free() invokes __kmem_cache_free(),
>> that is, it adds one more "call" instruction than before.
>>
>> I think that is what Christoph's comment means.
>
> Ah, this. OK, I got fooled by his mention of tracing.
>
> I do agree, but since freeing is ultimately dependent on the allocator
> layout, I don't see a clean way of doing this without dropping tears of
> sorrow around. The calls in slub/slab/slob would have to be somehow
> inlined. Hmm... maybe it is possible to do it from
> include/linux/sl*b_def.h...
>
> Let me give it a try and see what I can come up with.

OK, I am attaching a PoC for this for your appreciation. It gets quite
ugly, but it is the way I found to do it without including sl{a,u,o}b.c
directly, which would be even worse. But I guess if we really want to
avoid the cost of a function call, there has to be a tradeoff...

For the record, the proposed usage would be:

1) Given an (inline) function translate_cache(), defined in mm/slab.h,
that derives the cache from the object's address (and then sanity-checks
it against the cache parameter):

#define KMEM_CACHE_FREE(allocator_fn)                           \
void kmem_cache_free(struct kmem_cache *s, void *x)             \
{                                                               \
        struct kmem_cache *cachep;                              \
                                                                \
        cachep = translate_cache(s, x);                         \
        if (!cachep)                                            \
                return;                                         \
        allocator_fn(cachep, x);                                \
        trace_kmem_cache_free(_RET_IP_, x);                     \
}                                                               \
EXPORT_SYMBOL(kmem_cache_free)
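
As an illustration of how the pieces would fit together, here is a
minimal sketch. translate_cache() below is only a guess based on the
description above (look the cache up from the object's page, then
sanity-check it against the caller's argument), and slub_free_impl()
is a hypothetical name for the allocator's internal free path; neither
is code from the actual PoC:

/* mm/slab.h - sketch, assuming the object's page records its cache */
static inline struct kmem_cache *translate_cache(struct kmem_cache *s,
                                                 void *x)
{
        struct kmem_cache *cachep;

        /* derive the cache from the page backing the object */
        cachep = virt_to_head_page(x)->slab_cache;

        /* sanity check: the object must belong to the passed-in cache */
        if (WARN_ON_ONCE(cachep != s))
                return NULL;

        return cachep;
}

/* mm/slub.c - instantiate the exported wrapper once per allocator */
static __always_inline void slub_free_impl(struct kmem_cache *s, void *x)
{
        /* allocator-specific free fast path would go here */
}

KMEM_CACHE_FREE(slub_free_impl);

Because slub_free_impl() is __always_inline and KMEM_CACHE_FREE()
expands in the same translation unit, the allocator's free path ends up
inlined into the exported kmem_cache_free(), removing the extra call
instruction JoonSoo pointed out.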