From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx139.postini.com [74.125.245.139])
	by kanga.kvack.org (Postfix) with SMTP id D7A6B6B004D
	for ; Wed, 1 Aug 2012 08:45:58 -0400 (EDT)
Message-ID: <50192453.9080706@parallels.com>
Date: Wed, 1 Aug 2012 16:42:59 +0400
From: Glauber Costa
MIME-Version: 1.0
Subject: Re: Any reason to use put_page in slub.c?
References: <1343391586-18837-1-git-send-email-glommer@parallels.com>
	<50163D94.5050607@parallels.com>
	<5017968C.6050301@parallels.com>
	<5017E72D.2060303@parallels.com>
	<5017E929.70602@parallels.com>
	<1343746344.8473.4.camel@dabdike.int.hansenpartnership.com>
In-Reply-To: <1343746344.8473.4.camel@dabdike.int.hansenpartnership.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: James Bottomley
Cc: Christoph Lameter, linux-mm@kvack.org, Pekka Enberg, David Rientjes, Andrew Morton

On 07/31/2012 06:52 PM, James Bottomley wrote:
> On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>
>>> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
>>>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>>>
>>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>>>>> That is understood. Typically these objects were page sized though and
>>>>>> various assumptions (pretty dangerous ones as you are finding out) are
>>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>>>>> to the page allocator avoids these problems for higher order pages.
>>>>> omg...
>>>>
>>>> I would be very thankful if you would go through the tree and check for
>>>> any remaining use cases like that. Would take care of your problem.
>>>
>>> I would be happy to do it. Do you have any example of any user that
>>> behaved like this in the past, so I can search for something similar?
>>>
>>> This can potentially take many forms, and auditing every kfree out there
>>> is not humanly possible. The best I can do is to search for known
>>> patterns here...
>>
>> The basic problem is that someone will take the address of an object that
>> is allocated via slab and then access the page struct to increase the page
>> count.
>>
>> So you would see
>>
>> page = virt_to_page();
>>
>> get_page(page);
>>
>>
>> The main culprit in the past has been the DMA code in the SCSI layer. I
>> think it was the first 512 byte control block for the device that was the
>> main issue. There was a discussion between Hugh Dickins and me when SLUB
>> was first released about this issue and it resulted in some changes so
>> that certain fields in the page struct were not touched by SLUB since they
>> were needed for I/O.
>
> Hey, don't try to pin this on me. We don't use get_page() at all on the
> ordinary DMA route. There are four get_page() calls in the whole of
> drivers/scsi. One is in the sg.c fault path, which looks genuine. The
> other three are in fcoe and iSCSI ... what they're trying to do is to
> ensure that the page hangs around until the device sees the data in a
> network tx path.
>
> I can't see why any of these pages would come from kmalloc() or any
> other slab object since they should all be user pages.
>
> James
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>
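
As a rough illustration of the kind of pattern search discussed in this
thread, here is a minimal user-space sketch in C. It is not kernel code and
it is not the method used for the audit; the names count_suspect_sites() and
line_contains(), and the "window" parameter, are made up for illustration.
It only flags places where "get_page(" appears textually on the same line
as, or within a few lines after, "virt_to_page(", so every hit would still
need manual review to see whether the address really came from kmalloc().

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the first len bytes of line contain needle, else 0. */
static int line_contains(const char *line, size_t len, const char *needle)
{
	size_t n = strlen(needle);
	size_t i;

	if (n > len)
		return 0;
	for (i = 0; i + n <= len; i++)
		if (memcmp(line + i, needle, n) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: count sites where a line containing "virt_to_page("
 * is followed, on the same line or within `window` later lines, by a line
 * containing "get_page(". Purely textual, so it yields candidates only.
 */
static int count_suspect_sites(const char *src, int window)
{
	int hits = 0;
	int since_virt = -1;	/* lines since last virt_to_page(), -1 if none */
	const char *line = src;

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (line_contains(line, len, "virt_to_page("))
			since_virt = 0;
		else if (since_virt >= 0)
			since_virt++;

		if (since_virt >= 0 && since_virt <= window &&
		    line_contains(line, len, "get_page(")) {
			hits++;
			since_virt = -1;	/* this virt_to_page() is accounted for */
		} else if (since_virt > window) {
			since_virt = -1;	/* too far away, stop tracking */
		}

		line = end ? end + 1 : line + len;
	}
	return hits;
}
```

Feeding it the two-line pattern quoted above with a window of a few lines
reports one candidate; a get_page() on a page that came from alloc_page()
reports none. A textual match like this cannot tell where the address
originated, which is why the results have to be read by hand.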

I've audited all users of get_page() in the drivers/ directory for
patterns like this. In general, they kmalloc something like a table of
entries, and then get_page() the entries. The entries are either user
pages, pages allocated by the page allocator, or physical addresses
through their pfn (in 2 cases from the vga ones...)

I also took a look at some other instances where virt_to_page occurs
together with kmalloc, and they all seem to fall into the same category.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org