linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Glauber Costa <glommer@parallels.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Lameter <cl@linux.com>,
	linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: Any reason to use put_page in slub.c?
Date: Wed, 1 Aug 2012 16:42:59 +0400	[thread overview]
Message-ID: <50192453.9080706@parallels.com> (raw)
In-Reply-To: <1343746344.8473.4.camel@dabdike.int.hansenpartnership.com>

On 07/31/2012 06:52 PM, James Bottomley wrote:
> On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>
>>> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
>>>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>>>
>>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>>>>> That is understood. Typically these object where page sized though and
>>>>>> various assumptions (pretty dangerous ones as you are finding out) are
>>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>>>>> to the page allocator avoids these problems for higher order pages.
>>>>> omg...
>>>>
>>>> I would be very thankful if you would go through the tree and check for
>>>> any remaining use cases like that. Would take care of your problem.
>>>
>>> I would be happy to do it. Do you have any example of any user that
>>> behaved like this in the past, so I can search for something similar?
>>>
>>> This can potentially take many forms, and auditing every kfree out there
>>> is not humanly possible. The best I can do is to search for known
>>> patterns here...
>>
>> The basic problem is that someone will take the address of an object that
>> is allocated via slab and then access the page struct to increase the page
>> count.
>>
>> So you would see
>>
>> page = virt_to_page(<slab_object>);
>>
>> get_page(page);
>>
>>
>> The main cuprit in the past has been the DMA code in the SCSI layer. I
>> think it was the first 512 byte control block for the device that was the
>> main issue. There was a discussion betwen Hugh Dickins and me when SLUB
>> was first released about this issue and it resulted in some changes so
>> that certain fields in the page struct were not touched by SLUB since they
>> were needed for I/O.
> 
> Hey, don't try to pin this on me.  We don't use get_page() at all on the
> ordinary DMA route.  There are four get_page() calls in the whole of
> drivers/scsi.  One is in the sg.c fault path, which looks genuine.  The
> other three are in fcoe and iSCSI ... what they're trying to do is to
> ensure that the page hangs around until the device sees the data in a
> network tx path.
> 
> I can't see why any of these pages would come from kmalloc() or any
> other slab object since they should all be user pages.
> 
> James
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

On 07/31/2012 06:52 PM, James Bottomley wrote:
> On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>
>>> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
>>>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>>>
>>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>>>>> That is understood. Typically these object where page sized though and
>>>>>> various assumptions (pretty dangerous ones as you are finding out) are
>>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>>>>> to the page allocator avoids these problems for higher order pages.
>>>>> omg...
>>>>
>>>> I would be very thankful if you would go through the tree and check for
>>>> any remaining use cases like that. Would take care of your problem.
>>>
>>> I would be happy to do it. Do you have any example of any user that
>>> behaved like this in the past, so I can search for something similar?
>>>
>>> This can potentially take many forms, and auditing every kfree out there
>>> is not humanly possible. The best I can do is to search for known
>>> patterns here...
>>
>> The basic problem is that someone will take the address of an object that
>> is allocated via slab and then access the page struct to increase the page
>> count.
>>
>> So you would see
>>
>> page = virt_to_page(<slab_object>);
>>
>> get_page(page);
>>
>>
>> The main cuprit in the past has been the DMA code in the SCSI layer. I
>> think it was the first 512 byte control block for the device that was the
>> main issue. There was a discussion betwen Hugh Dickins and me when SLUB
>> was first released about this issue and it resulted in some changes so
>> that certain fields in the page struct were not touched by SLUB since they
>> were needed for I/O.
> 
> Hey, don't try to pin this on me.  We don't use get_page() at all on the
> ordinary DMA route.  There are four get_page() calls in the whole of
> drivers/scsi.  One is in the sg.c fault path, which looks genuine.  The
> other three are in fcoe and iSCSI ... what they're trying to do is to
> ensure that the page hangs around until the device sees the data in a
> network tx path.
> 
> I can't see why any of these pages would come from kmalloc() or any
> other slab object since they should all be user pages.
> 

I've audited all users of get_page() in the drivers/ directory for
patterns like this. In general, they kmalloc something like a table of
entries, and then get_page() the entries. The entries are either user
pages, pages allocated by the page allocator, or physical addresses
through their pfn (in 2 cases from the vga ones...)

I took a look about some other instances where virt_to_page occurs
together with kmalloc as well, and they all seem to fall in the same
category.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-08-01 12:45 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-27 12:19 Glauber Costa
2012-07-27 15:55 ` Christoph Lameter
2012-07-30  7:53   ` Glauber Costa
2012-07-30 19:23     ` Christoph Lameter
2012-07-31  8:25       ` Glauber Costa
2012-07-31 14:09         ` Christoph Lameter
2012-07-31 14:09           ` Glauber Costa
2012-07-31 14:17             ` Christoph Lameter
2012-07-31 14:18               ` Glauber Costa
2012-07-31 14:31                 ` Christoph Lameter
2012-07-31 14:52                   ` James Bottomley
2012-08-01 12:42                     ` Glauber Costa [this message]
2012-08-01 18:10                       ` Christoph Lameter
2012-08-02  7:55                         ` Glauber Costa
2012-08-02  8:07                         ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50192453.9080706@parallels.com \
    --to=glommer@parallels.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox