* Any reason to use put_page in slub.c? @ 2012-07-27 12:19 Glauber Costa 2012-07-27 15:55 ` Christoph Lameter 0 siblings, 1 reply; 15+ messages in thread From: Glauber Costa @ 2012-07-27 12:19 UTC (permalink / raw) To: linux-mm; +Cc: Pekka Enberg, David Rientjes, Christoph Lameter, Andrew Morton Hi, I've recently came across a bug in my kmemcg slab implementation, where memory wasn't being unaccounted every time I expected it to be. (bugs found by myself are becoming a lot lot rarer, for the record) I tracked it down to be due to the fact that we are now unaccounting at the page allocator by calling __free_accounted_pages instead of normal __free_pages. However, higher order kmalloc allocations in the slub doesn't do that. They call put_page() instead, and I missed the conversion spot when converting __free_pages() to __free_accounted_pages(). Now, although of course I can come up with put_accounted_page(), this is a bit more awkward: first, it is in everybody's interest in keeping changes to the page allocator to a minimum; also, put_page will not necessarily free the page, so the semantics can get a bit complicated. Since we are not doing any kind of page sharing with those pages in the slub - and are already doing compound checks ourselves, I was wondering why couldn't we just use __free_pages() instead. I see no reason not to. Replacing it with __free_page() seems to work - my patched kernel is up and running, and doing fine. But I am still wondering if there is anything I am overlooking. Do you guys think the following patch is safe? --- mm/slub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index a136a75..a8fffeb 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3399,7 +3399,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kmemleak_free(x); - put_page(page); + __free_pages(page); return; } slab_free(page->slab, page, object, _RET_IP_); -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-27 12:19 Any reason to use put_page in slub.c? Glauber Costa @ 2012-07-27 15:55 ` Christoph Lameter 2012-07-30 7:53 ` Glauber Costa 0 siblings, 1 reply; 15+ messages in thread From: Christoph Lameter @ 2012-07-27 15:55 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Fri, 27 Jul 2012, Glauber Costa wrote: > But I am still wondering if there is anything I am overlooking. put_page() is necessary because other subsystems may still be holding a refcount on the page (if f.e. there is DMA still pending to that page). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-27 15:55 ` Christoph Lameter @ 2012-07-30 7:53 ` Glauber Costa 2012-07-30 19:23 ` Christoph Lameter 0 siblings, 1 reply; 15+ messages in thread From: Glauber Costa @ 2012-07-30 7:53 UTC (permalink / raw) To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On 07/27/2012 07:55 PM, Christoph Lameter wrote: > On Fri, 27 Jul 2012, Glauber Costa wrote: > >> But I am still wondering if there is anything I am overlooking. > > put_page() is necessary because other subsystems may still be holding a > refcount on the page (if f.e. there is DMA still pending to that page). > Humm, this seems to be extremely unsafe in my read. If you do kmalloc, the API - AFAIK - does not provide us with any guarantee that the object (it's not even a page, in the strict sense!) allocated is reference counted internally. So relying on kfree to do it doesn't bode well. For one thing, slab doesn't go to the page allocator for high order allocations, and this code would crash miserably if running with the slab. Or am I missing something ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-30 7:53 ` Glauber Costa @ 2012-07-30 19:23 ` Christoph Lameter 2012-07-31 8:25 ` Glauber Costa 0 siblings, 1 reply; 15+ messages in thread From: Christoph Lameter @ 2012-07-30 19:23 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Mon, 30 Jul 2012, Glauber Costa wrote: > On 07/27/2012 07:55 PM, Christoph Lameter wrote: > > On Fri, 27 Jul 2012, Glauber Costa wrote: > > > >> But I am still wondering if there is anything I am overlooking. > > > > put_page() is necessary because other subsystems may still be holding a > > refcount on the page (if f.e. there is DMA still pending to that page). > > > > Humm, this seems to be extremely unsafe in my read. I do not like it either. Hopefully these usecases have been removed in the meantime but that used to be an issue. > If you do kmalloc, the API - AFAIK - does not provide us with any > guarantee that the object (it's not even a page, in the strict sense!) > allocated is reference counted internally. So relying on kfree to do it > doesn't bode well. For one thing, slab doesn't go to the page allocator > for high order allocations, and this code would crash miserably if > running with the slab. > > Or am I missing something ? Yes the refcounting is done at the page level by the page allocator. It is safe. The slab allocator can free a page removing all references from its internal structure while the subsystem page reference will hold off the page allocator from actually freeing the page until the subsystem itself drops the page count. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-30 19:23 ` Christoph Lameter @ 2012-07-31 8:25 ` Glauber Costa 2012-07-31 14:09 ` Christoph Lameter 0 siblings, 1 reply; 15+ messages in thread From: Glauber Costa @ 2012-07-31 8:25 UTC (permalink / raw) To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On 07/30/2012 11:23 PM, Christoph Lameter wrote: > On Mon, 30 Jul 2012, Glauber Costa wrote: > >> On 07/27/2012 07:55 PM, Christoph Lameter wrote: >>> On Fri, 27 Jul 2012, Glauber Costa wrote: >>> >>>> But I am still wondering if there is anything I am overlooking. >>> >>> put_page() is necessary because other subsystems may still be holding a >>> refcount on the page (if f.e. there is DMA still pending to that page). >>> >> >> Humm, this seems to be extremely unsafe in my read. > > I do not like it either. Hopefully these usecases have been removed in the > meantime but that used to be an issue. > >> If you do kmalloc, the API - AFAIK - does not provide us with any >> guarantee that the object (it's not even a page, in the strict sense!) >> allocated is reference counted internally. So relying on kfree to do it >> doesn't bode well. For one thing, slab doesn't go to the page allocator >> for high order allocations, and this code would crash miserably if >> running with the slab. >> >> Or am I missing something ? > > Yes the refcounting is done at the page level by the page allocator. It is > safe. The slab allocator can free a page removing all references from its > internal structure while the subsystem page reference will hold off the > page allocator from actually freeing the page until the subsystem itself > drops the page count. > pages, yes. But when you do kfree, you don't free a page. You free an object. The allocator is totally free to keep the page around and pass it on to someone else. The use case that put_page protect against, would be totally and absolutely broken with every other allocator. They could give an object in the same address to another user in the very next moment. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 8:25 ` Glauber Costa @ 2012-07-31 14:09 ` Christoph Lameter 2012-07-31 14:09 ` Glauber Costa 0 siblings, 1 reply; 15+ messages in thread From: Christoph Lameter @ 2012-07-31 14:09 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Tue, 31 Jul 2012, Glauber Costa wrote: > >> Or am I missing something ? > > > > Yes the refcounting is done at the page level by the page allocator. It is > > safe. The slab allocator can free a page removing all references from its > > internal structure while the subsystem page reference will hold off the > > page allocator from actually freeing the page until the subsystem itself > > drops the page count. > > > > pages, yes. But when you do kfree, you don't free a page. You free an > object. The allocator is totally free to keep the page around and pass > it on to someone else. That is understood. Typically these object where page sized though and various assumptions (pretty dangerous ones as you are finding out) are made regarding object reuse. The fallback of SLUB for higher order allocs to the page allocator avoids these problems for higher order pages. It would be better and cleaner if all callers would not use slab allocators but the page allocators directly for any page that requires an increased refcount for DMA operations. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 14:09 ` Christoph Lameter @ 2012-07-31 14:09 ` Glauber Costa 2012-07-31 14:17 ` Christoph Lameter 0 siblings, 1 reply; 15+ messages in thread From: Glauber Costa @ 2012-07-31 14:09 UTC (permalink / raw) To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On 07/31/2012 06:09 PM, Christoph Lameter wrote: > That is understood. Typically these object where page sized though and > various assumptions (pretty dangerous ones as you are finding out) are > made regarding object reuse. The fallback of SLUB for higher order allocs > to the page allocator avoids these problems for higher order pages. omg... I am curious how slab handles this, since it doesn't seem to refcount in the same way slub does? Now, I am still left with the original problem: __free_pages() here would be a superior solution, and the right thing to do. Should we just convert it - and then fix whoever we find to be abusing it (it doesn't mean anything, but I am running it on my systems since then - 0 problems), or should I just create a hacky put_accounted_page()? I really, really dislike the later. Anyone else would care to comment on this ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 14:09 ` Glauber Costa @ 2012-07-31 14:17 ` Christoph Lameter 2012-07-31 14:18 ` Glauber Costa 0 siblings, 1 reply; 15+ messages in thread From: Christoph Lameter @ 2012-07-31 14:17 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Tue, 31 Jul 2012, Glauber Costa wrote: > On 07/31/2012 06:09 PM, Christoph Lameter wrote: > > That is understood. Typically these object where page sized though and > > various assumptions (pretty dangerous ones as you are finding out) are > > made regarding object reuse. The fallback of SLUB for higher order allocs > > to the page allocator avoids these problems for higher order pages. > omg... I would be very thankful if you would go through the tree and check for any remaining use cases like that. Would take care of your problem. > I am curious how slab handles this, since it doesn't seem to refcount in > the same way slub does? Slabs are not refcounting in general. With slab larger sized free pages may be queued for awhile on the freelist. I guess this has taken care of these issues in the past. > Now, I am still left with the original problem: > __free_pages() here would be a superior solution, and the right thing to > do. Should we just convert it - and then fix whoever we find to be > abusing it (it doesn't mean anything, but I am running it on my systems > since then - 0 problems), or should I just create a hacky > put_accounted_page()? > > I really, really dislike the later. So do I. If you can verify that this no longer occurs then your patch wil be fine and we can get rid of the put_page(). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 14:17 ` Christoph Lameter @ 2012-07-31 14:18 ` Glauber Costa 2012-07-31 14:31 ` Christoph Lameter 0 siblings, 1 reply; 15+ messages in thread From: Glauber Costa @ 2012-07-31 14:18 UTC (permalink / raw) To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On 07/31/2012 06:17 PM, Christoph Lameter wrote: > On Tue, 31 Jul 2012, Glauber Costa wrote: > >> On 07/31/2012 06:09 PM, Christoph Lameter wrote: >>> That is understood. Typically these object where page sized though and >>> various assumptions (pretty dangerous ones as you are finding out) are >>> made regarding object reuse. The fallback of SLUB for higher order allocs >>> to the page allocator avoids these problems for higher order pages. >> omg... > > I would be very thankful if you would go through the tree and check for > any remaining use cases like that. Would take care of your problem. I would be happy to do it. Do you have any example of any user that behaved like this in the past, so I can search for something similar? This can potentially take many forms, and auditing every kfree out there is not humanly possible. The best I can do is to search for known patterns here... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 14:18 ` Glauber Costa @ 2012-07-31 14:31 ` Christoph Lameter 2012-07-31 14:52 ` James Bottomley 0 siblings, 1 reply; 15+ messages in thread From: Christoph Lameter @ 2012-07-31 14:31 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Tue, 31 Jul 2012, Glauber Costa wrote: > On 07/31/2012 06:17 PM, Christoph Lameter wrote: > > On Tue, 31 Jul 2012, Glauber Costa wrote: > > > >> On 07/31/2012 06:09 PM, Christoph Lameter wrote: > >>> That is understood. Typically these object where page sized though and > >>> various assumptions (pretty dangerous ones as you are finding out) are > >>> made regarding object reuse. The fallback of SLUB for higher order allocs > >>> to the page allocator avoids these problems for higher order pages. > >> omg... > > > > I would be very thankful if you would go through the tree and check for > > any remaining use cases like that. Would take care of your problem. > > I would be happy to do it. Do you have any example of any user that > behaved like this in the past, so I can search for something similar? > > This can potentially take many forms, and auditing every kfree out there > is not humanly possible. The best I can do is to search for known > patterns here... The basic problem is that someone will take the address of an object that is allocated via slab and then access the page struct to increase the page count. So you would see page = virt_to_page(<slab_object>); get_page(page); The main cuprit in the past has been the DMA code in the SCSI layer. I think it was the first 512 byte control block for the device that was the main issue. There was a discussion betwen Hugh Dickins and me when SLUB was first released about this issue and it resulted in some changes so that certain fields in the page struct were not touched by SLUB since they were needed for I/O. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 14:31 ` Christoph Lameter @ 2012-07-31 14:52 ` James Bottomley 2012-08-01 12:42 ` Glauber Costa 0 siblings, 1 reply; 15+ messages in thread From: James Bottomley @ 2012-07-31 14:52 UTC (permalink / raw) To: Christoph Lameter Cc: Glauber Costa, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote: > On Tue, 31 Jul 2012, Glauber Costa wrote: > > > On 07/31/2012 06:17 PM, Christoph Lameter wrote: > > > On Tue, 31 Jul 2012, Glauber Costa wrote: > > > > > >> On 07/31/2012 06:09 PM, Christoph Lameter wrote: > > >>> That is understood. Typically these object where page sized though and > > >>> various assumptions (pretty dangerous ones as you are finding out) are > > >>> made regarding object reuse. The fallback of SLUB for higher order allocs > > >>> to the page allocator avoids these problems for higher order pages. > > >> omg... > > > > > > I would be very thankful if you would go through the tree and check for > > > any remaining use cases like that. Would take care of your problem. > > > > I would be happy to do it. Do you have any example of any user that > > behaved like this in the past, so I can search for something similar? > > > > This can potentially take many forms, and auditing every kfree out there > > is not humanly possible. The best I can do is to search for known > > patterns here... > > The basic problem is that someone will take the address of an object that > is allocated via slab and then access the page struct to increase the page > count. > > So you would see > > page = virt_to_page(<slab_object>); > > get_page(page); > > > The main cuprit in the past has been the DMA code in the SCSI layer. I > think it was the first 512 byte control block for the device that was the > main issue. There was a discussion betwen Hugh Dickins and me when SLUB > was first released about this issue and it resulted in some changes so > that certain fields in the page struct were not touched by SLUB since they > were needed for I/O. Hey, don't try to pin this on me. We don't use get_page() at all on the ordinary DMA route. There are four get_page() calls in the whole of drivers/scsi. One is in the sg.c fault path, which looks genuine. The other three are in fcoe and iSCSI ... what they're trying to do is to ensure that the page hangs around until the device sees the data in a network tx path. I can't see why any of these pages would come from kmalloc() or any other slab object since they should all be user pages. James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-07-31 14:52 ` James Bottomley @ 2012-08-01 12:42 ` Glauber Costa 2012-08-01 18:10 ` Christoph Lameter 0 siblings, 1 reply; 15+ messages in thread From: Glauber Costa @ 2012-08-01 12:42 UTC (permalink / raw) To: James Bottomley Cc: Christoph Lameter, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On 07/31/2012 06:52 PM, James Bottomley wrote: > On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote: >> On Tue, 31 Jul 2012, Glauber Costa wrote: >> >>> On 07/31/2012 06:17 PM, Christoph Lameter wrote: >>>> On Tue, 31 Jul 2012, Glauber Costa wrote: >>>> >>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote: >>>>>> That is understood. Typically these object where page sized though and >>>>>> various assumptions (pretty dangerous ones as you are finding out) are >>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs >>>>>> to the page allocator avoids these problems for higher order pages. >>>>> omg... >>>> >>>> I would be very thankful if you would go through the tree and check for >>>> any remaining use cases like that. Would take care of your problem. >>> >>> I would be happy to do it. Do you have any example of any user that >>> behaved like this in the past, so I can search for something similar? >>> >>> This can potentially take many forms, and auditing every kfree out there >>> is not humanly possible. The best I can do is to search for known >>> patterns here... >> >> The basic problem is that someone will take the address of an object that >> is allocated via slab and then access the page struct to increase the page >> count. >> >> So you would see >> >> page = virt_to_page(<slab_object>); >> >> get_page(page); >> >> >> The main cuprit in the past has been the DMA code in the SCSI layer. I >> think it was the first 512 byte control block for the device that was the >> main issue. There was a discussion betwen Hugh Dickins and me when SLUB >> was first released about this issue and it resulted in some changes so >> that certain fields in the page struct were not touched by SLUB since they >> were needed for I/O. > > Hey, don't try to pin this on me. We don't use get_page() at all on the > ordinary DMA route. There are four get_page() calls in the whole of > drivers/scsi. One is in the sg.c fault path, which looks genuine. The > other three are in fcoe and iSCSI ... what they're trying to do is to > ensure that the page hangs around until the device sees the data in a > network tx path. > > I can't see why any of these pages would come from kmalloc() or any > other slab object since they should all be user pages. > > James > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > On 07/31/2012 06:52 PM, James Bottomley wrote: > On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote: >> On Tue, 31 Jul 2012, Glauber Costa wrote: >> >>> On 07/31/2012 06:17 PM, Christoph Lameter wrote: >>>> On Tue, 31 Jul 2012, Glauber Costa wrote: >>>> >>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote: >>>>>> That is understood. Typically these object where page sized though and >>>>>> various assumptions (pretty dangerous ones as you are finding out) are >>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs >>>>>> to the page allocator avoids these problems for higher order pages. >>>>> omg... >>>> >>>> I would be very thankful if you would go through the tree and check for >>>> any remaining use cases like that. Would take care of your problem. >>> >>> I would be happy to do it. Do you have any example of any user that >>> behaved like this in the past, so I can search for something similar? >>> >>> This can potentially take many forms, and auditing every kfree out there >>> is not humanly possible. The best I can do is to search for known >>> patterns here... >> >> The basic problem is that someone will take the address of an object that >> is allocated via slab and then access the page struct to increase the page >> count. >> >> So you would see >> >> page = virt_to_page(<slab_object>); >> >> get_page(page); >> >> >> The main cuprit in the past has been the DMA code in the SCSI layer. I >> think it was the first 512 byte control block for the device that was the >> main issue. There was a discussion betwen Hugh Dickins and me when SLUB >> was first released about this issue and it resulted in some changes so >> that certain fields in the page struct were not touched by SLUB since they >> were needed for I/O. > > Hey, don't try to pin this on me. We don't use get_page() at all on the > ordinary DMA route. There are four get_page() calls in the whole of > drivers/scsi. One is in the sg.c fault path, which looks genuine. The > other three are in fcoe and iSCSI ... what they're trying to do is to > ensure that the page hangs around until the device sees the data in a > network tx path. > > I can't see why any of these pages would come from kmalloc() or any > other slab object since they should all be user pages. > I've audited all users of get_page() in the drivers/ directory for patterns like this. In general, they kmalloc something like a table of entries, and then get_page() the entries. The entries are either user pages, pages allocated by the page allocator, or physical addresses through their pfn (in 2 cases from the vga ones...) I took a look about some other instances where virt_to_page occurs together with kmalloc as well, and they all seem to fall in the same category. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-08-01 12:42 ` Glauber Costa @ 2012-08-01 18:10 ` Christoph Lameter 2012-08-02 7:55 ` Glauber Costa 2012-08-02 8:07 ` James Bottomley 0 siblings, 2 replies; 15+ messages in thread From: Christoph Lameter @ 2012-08-01 18:10 UTC (permalink / raw) To: Glauber Costa Cc: James Bottomley, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Wed, 1 Aug 2012, Glauber Costa wrote: > I've audited all users of get_page() in the drivers/ directory for > patterns like this. In general, they kmalloc something like a table of > entries, and then get_page() the entries. The entries are either user > pages, pages allocated by the page allocator, or physical addresses > through their pfn (in 2 cases from the vga ones...) > > I took a look about some other instances where virt_to_page occurs > together with kmalloc as well, and they all seem to fall in the same > category. The case that was notorious in the past was a scsi control structure allocated from slab that was then written to the device via DMA. And it was not on x86 but some esoteric platform (powerpc?), A reference to the discussion of this issue in 2007: http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0424.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-08-01 18:10 ` Christoph Lameter @ 2012-08-02 7:55 ` Glauber Costa 2012-08-02 8:07 ` James Bottomley 1 sibling, 0 replies; 15+ messages in thread From: Glauber Costa @ 2012-08-02 7:55 UTC (permalink / raw) To: Christoph Lameter Cc: James Bottomley, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On 08/01/2012 10:10 PM, Christoph Lameter wrote: > On Wed, 1 Aug 2012, Glauber Costa wrote: > >> I've audited all users of get_page() in the drivers/ directory for >> patterns like this. In general, they kmalloc something like a table of >> entries, and then get_page() the entries. The entries are either user >> pages, pages allocated by the page allocator, or physical addresses >> through their pfn (in 2 cases from the vga ones...) >> >> I took a look about some other instances where virt_to_page occurs >> together with kmalloc as well, and they all seem to fall in the same >> category. > > The case that was notorious in the past was a scsi control structure > allocated from slab that was then written to the device via DMA. And it > was not on x86 but some esoteric platform (powerpc?), > > A reference to the discussion of this issue in 2007: > > http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0424.html > Thanks. So again, I've scanned across that thread, and found some very useful excerpts from it, that can only argue in favor of my patch =) "There are no kmalloced pages. There is only kmalloced memory. You allocate pages from the page allocator. Its a layering violation to expect a page struct operation on a slab object to work." "So someone played loose ball with the slab, was successful and that makes it right now?" Looking at the code again, I see that page_mapping(), that ends up being called to do the translation in those pathological cases now features a VM_BUG_ON(), put in place by yourself. This dates back from 2007, giving me enough reason to believe that whatever issue still existed back then is already sorted out - or nobody really cares. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Any reason to use put_page in slub.c? 2012-08-01 18:10 ` Christoph Lameter 2012-08-02 7:55 ` Glauber Costa @ 2012-08-02 8:07 ` James Bottomley 1 sibling, 0 replies; 15+ messages in thread From: James Bottomley @ 2012-08-02 8:07 UTC (permalink / raw) To: Christoph Lameter Cc: Glauber Costa, linux-mm, Pekka Enberg, David Rientjes, Andrew Morton On Wed, 2012-08-01 at 13:10 -0500, Christoph Lameter wrote: > On Wed, 1 Aug 2012, Glauber Costa wrote: > > > I've audited all users of get_page() in the drivers/ directory for > > patterns like this. In general, they kmalloc something like a table of > > entries, and then get_page() the entries. The entries are either user > > pages, pages allocated by the page allocator, or physical addresses > > through their pfn (in 2 cases from the vga ones...) > > > > I took a look about some other instances where virt_to_page occurs > > together with kmalloc as well, and they all seem to fall in the same > > category. > > The case that was notorious in the past was a scsi control structure > allocated from slab that was then written to the device via DMA. And it > was not on x86 but some esoteric platform (powerpc?), > > A reference to the discussion of this issue in 2007: > > http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0424.html What you wrote above bears no relation to the thread. The thread is a long argument about what flush_dcache_page() should be doing if called on slab memory. Hugh told you about five times: "nothing". Which is the correct answer since flush_dcache_page() is our user to kernel memory coherence API ... it's actually nothing to do with DMA although it can be used to cause coherence for the purposes for DMA, but often it's simply used to allow the kernel to write to or read from user memory. What you seem to be worried about is DMA cache line interference caused by object misalignment? But even in what you wrote above it's clear you don't understand what that actually is. As long as you flush correctly, you can never actually get DMA cache line interference on memory being sent to a device via DMA ... however misaligned it is. The killer case is unresolvable incoherencies when you DMA *from* a device. For this case, if you have a cache line overlapping an object like this -------------------------------- OBJ | Other OBJ -------------------------------- | CPU Cache line | -------------------------------- If OBJ gets its underlying page altered by DMA at the same time someone writes to other OBJ (causing the CPU to pull in the cache line with the old pre-DMA value for OBJ and then dirty the component for Other OBJ), you have both a dirty cache line and altered (i.e. dirty) underlying memory. Here an invalidate will destroy the update to other OBJ and a flush will destroy the DMA update to OBJ, so the alias is unresolvable. Is that what the worry is? James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-08-02 8:07 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-07-27 12:19 Any reason to use put_page in slub.c? Glauber Costa 2012-07-27 15:55 ` Christoph Lameter 2012-07-30 7:53 ` Glauber Costa 2012-07-30 19:23 ` Christoph Lameter 2012-07-31 8:25 ` Glauber Costa 2012-07-31 14:09 ` Christoph Lameter 2012-07-31 14:09 ` Glauber Costa 2012-07-31 14:17 ` Christoph Lameter 2012-07-31 14:18 ` Glauber Costa 2012-07-31 14:31 ` Christoph Lameter 2012-07-31 14:52 ` James Bottomley 2012-08-01 12:42 ` Glauber Costa 2012-08-01 18:10 ` Christoph Lameter 2012-08-02 7:55 ` Glauber Costa 2012-08-02 8:07 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox