* HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment @ 2018-06-01 0:03 Anton Eidelman 2018-06-01 19:02 ` Laura Abbott 0 siblings, 1 reply; 10+ messages in thread From: Anton Eidelman @ 2018-06-01 0:03 UTC (permalink / raw) To: linux-mm [-- Attachment #1: Type: text/plain, Size: 4160 bytes --] Hello, Here's a rare issue I reproduce on 4.12.10 (centos config): full log sample below. An innocent process (dhcpclient) is about to receive a datagram, but during skb_copy_datagram_iter() usercopy triggers a BUG in: usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because the sk_buff fragment being copied crosses the 64-byte slub object boundary. Example __check_heap_object() context: n=128 << usually 128, sometimes 192. object_size=64 s->size=64 page_address(page)=0xffff880233f7c000 ptr=0xffff880233f7c540 My take on the root cause: When adding data to an skb, new data is appended to the current fragment if the new chunk immediately follows the last one: by simply increasing the frag->size, skb_frag_size_add(). See include/linux/skbuff.h:skb_can_coalesce() callers. This happens very frequently for kmem_cache objects (slub/slab) with intensive kernel level TCP traffic, and produces sk_buff fragments that span multiple kmem_cache objects. However, if the same happens to receive data intended for user space, usercopy triggers a BUG. This is quite rare but possible: fails after 5-60min of network traffic (needs some unfortunate timing, e.g. only on QEMU, without CONFIG_SLUB_DEBUG_ON etc). I used an instrumentation that counts coalesced chunks in the fragment, in order to confirm that the failing fragment was legally constructed from multiple slub objects. On 4.17.0.rc3: I could not reproduce the issue with the latest kernel, but the changes in usercopy.c and slub.c since 4.12 do not address the issue. Moreover, it would be quite hard to do without effectively disabling the heap protection. 
However, looking at the recent changes in include/linux/skbuff.h I see skb_zcopy(), which makes skb_can_coalesce() return false and may have masked the problem. Please let me know what you think. 4.12.10 is the official CentOS kernel with CONFIG_HARDENED_USERCOPY enabled: if the problem is real we had better have an erratum for it. Regards, Anton Eidelman [ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes) [ 655.604254] ------------[ cut here ]------------ [ 655.604877] kernel BUG at mm/usercopy.c:72! [ 655.606302] invalid opcode: 0000 [#1] SMP [ 655.618390] CPU: 3 PID: 2335 Comm: dhclient Tainted: G O 4.12.10-1.el7.elrepo.x86_64 #1 [ 655.619666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 655.620926] task: ffff880229ab2d80 task.stack: ffffc90001198000 [ 655.621786] RIP: 0010:__check_object_size+0x74/0x190 [ 655.622489] RSP: 0018:ffffc9000119bbb8 EFLAGS: 00010246 [ 655.623236] RAX: 0000000000000060 RBX: ffff88022a31aa00 RCX: 0000000000000000 [ 655.624234] RDX: 0000000000000000 RSI: ffff88023fcce108 RDI: ffff88023fcce108 [ 655.625237] RBP: ffffc9000119bbd8 R08: 00000000fffffffe R09: 0000000000000271 [ 655.626248] R10: 0000000000000005 R11: 0000000000000270 R12: 00000000000000c0 [ 655.627256] R13: ffff88022a31aac0 R14: 0000000000000001 R15: 00000000000000c0 [ 655.628268] FS: 00007fb54413b880(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000 [ 655.629561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 655.630289] CR2: 00007fb5439dc5c0 CR3: 000000023211d000 CR4: 00000000003406e0 [ 655.631268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 655.632281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 655.633318] Call Trace: [ 655.633696] copy_page_to_iter_iovec+0x9c/0x180 [ 655.634351] copy_page_to_iter+0x22/0x160 [ 655.634943] skb_copy_datagram_iter+0x157/0x260 [ 655.635604] packet_recvmsg+0xcb/0x460 [ 655.636156] ?
selinux_socket_recvmsg+0x17/0x20 [ 655.636816] sock_recvmsg+0x3d/0x50 [ 655.637330] ___sys_recvmsg+0xd7/0x1f0 [ 655.637892] ? kvm_clock_get_cycles+0x1e/0x20 [ 655.638533] ? ktime_get_ts64+0x49/0xf0 [ 655.639101] ? _copy_to_user+0x26/0x40 [ 655.639657] __sys_recvmsg+0x51/0x90 [ 655.640184] SyS_recvmsg+0x12/0x20 [ 655.640696] entry_SYSCALL_64_fastpath+0x1a/0xa5 -------------------------------------------------------------------------------------------------------------------------------------------- [-- Attachment #2: Type: text/html, Size: 43005 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
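The coalescing path Anton describes can be modeled outside the kernel as a minimal sketch. This is a toy user-space model with made-up names (`toy_frag`, `toy_can_coalesce`, `toy_coalesce`), not kernel code; the real counterparts are skb_frag_struct, skb_can_coalesce() and skb_frag_size_add() in include/linux/skbuff.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an sk_buff page fragment; the names are made up, the real
 * structure is skb_frag_struct in include/linux/skbuff.h. */
struct toy_frag {
    const char *page;   /* stands in for the struct page pointer */
    size_t off;         /* frag->page_offset */
    size_t size;        /* skb_frag_size(frag) */
};

/* Mirrors the adjacency test in skb_can_coalesce(): same page, and the
 * new chunk starts exactly where the fragment currently ends.  Nothing
 * here knows whether that boundary is also a slab object boundary. */
static bool toy_can_coalesce(const struct toy_frag *f, const char *page,
                             size_t off)
{
    return page == f->page && off == f->off + f->size;
}

/* Mirrors skb_frag_size_add(): coalescing just grows the fragment. */
static void toy_coalesce(struct toy_frag *f, size_t len)
{
    f->size += len;
}
```

With the numbers from the report (a 64-byte slub object at page offset 0x540 followed by its neighbor at 0x580), the adjacency test passes and the fragment grows to 128 bytes, spanning two slab objects.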
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 0:03 HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment Anton Eidelman @ 2018-06-01 19:02 ` Laura Abbott 2018-06-01 20:49 ` Kees Cook 0 siblings, 1 reply; 10+ messages in thread From: Laura Abbott @ 2018-06-01 19:02 UTC (permalink / raw) To: Anton Eidelman, linux-mm, Kees Cook; +Cc: linux-hardened (cc-ing some interested people) On 05/31/2018 05:03 PM, Anton Eidelman wrote: > Hello, > > Here's a rare issue I reproduce on 4.12.10 (centos config): full log sample below. > An innocent process (dhcpclient) is about to receive a datagram, but during skb_copy_datagram_iter() usercopy triggers a BUG in: > usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because the sk_buff fragment being copied crosses the 64-byte slub object boundary. > > Example __check_heap_object() context: > n=128 << usually 128, sometimes 192. > object_size=64 > s->size=64 > page_address(page)=0xffff880233f7c000 > ptr=0xffff880233f7c540 > > My take on the root cause: > When adding data to an skb, new data is appended to the current fragment if the new chunk immediately follows the last one: by simply increasing the frag->size, skb_frag_size_add(). > See include/linux/skbuff.h:skb_can_coalesce() callers. > This happens very frequently for kmem_cache objects (slub/slab) with intensive kernel level TCP traffic, and produces sk_buff fragments that span multiple kmem_cache objects. > However, if the same happens to receive data intended for user space, usercopy triggers a BUG. > This is quite rare but possible: fails after 5-60min of network traffic (needs some unfortunate timing, e.g. only on QEMU, without CONFIG_SLUB_DEBUG_ON etc). > I used an instrumentation that counts coalesced chunks in the fragment, in order to confirm that the failing fragment was legally constructed from multiple slub objects.
> > On 4.17.0.rc3: > I could not reproduce the issue with the latest kernel, but the changes in usercopy.c and slub.c since 4.12 do not address the issue. > Moreover, it would be quite hard to do without effectively disabling the heap protection. > However, looking at the recent changes in include/linux/skbuff.h I see skb_zcopy(), which makes skb_can_coalesce() return false and may have masked the problem. > > Please let me know what you think. > 4.12.10 is the official CentOS kernel with CONFIG_HARDENED_USERCOPY enabled: if the problem is real we had better have an erratum for it. > > Regards, > Anton Eidelman > > > [ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes) > [ 655.604254] ------------[ cut here ]------------ > [ 655.604877] kernel BUG at mm/usercopy.c:72! > [ 655.606302] invalid opcode: 0000 [#1] SMP > [ 655.618390] CPU: 3 PID: 2335 Comm: dhclient Tainted: G O 4.12.10-1.el7.elrepo.x86_64 #1 > [ 655.619666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 > [ 655.620926] task: ffff880229ab2d80 task.stack: ffffc90001198000 > [ 655.621786] RIP: 0010:__check_object_size+0x74/0x190 > [ 655.622489] RSP: 0018:ffffc9000119bbb8 EFLAGS: 00010246 > [ 655.623236] RAX: 0000000000000060 RBX: ffff88022a31aa00 RCX: 0000000000000000 > [ 655.624234] RDX: 0000000000000000 RSI: ffff88023fcce108 RDI: ffff88023fcce108 > [ 655.625237] RBP: ffffc9000119bbd8 R08: 00000000fffffffe R09: 0000000000000271 > [ 655.626248] R10: 0000000000000005 R11: 0000000000000270 R12: 00000000000000c0 > [ 655.627256] R13: ffff88022a31aac0 R14: 0000000000000001 R15: 00000000000000c0 > [ 655.628268] FS: 00007fb54413b880(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000 > [ 655.629561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 655.630289] CR2: 00007fb5439dc5c0 CR3: 000000023211d000 CR4: 00000000003406e0 > [ 655.631268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [
655.632281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 655.633318] Call Trace: > [ 655.633696] copy_page_to_iter_iovec+0x9c/0x180 > [ 655.634351] copy_page_to_iter+0x22/0x160 > [ 655.634943] skb_copy_datagram_iter+0x157/0x260 > [ 655.635604] packet_recvmsg+0xcb/0x460 > [ 655.636156] ? selinux_socket_recvmsg+0x17/0x20 > [ 655.636816] sock_recvmsg+0x3d/0x50 > [ 655.637330] ___sys_recvmsg+0xd7/0x1f0 > [ 655.637892] ? kvm_clock_get_cycles+0x1e/0x20 > [ 655.638533] ? ktime_get_ts64+0x49/0xf0 > [ 655.639101] ? _copy_to_user+0x26/0x40 > [ 655.639657] __sys_recvmsg+0x51/0x90 > [ 655.640184] SyS_recvmsg+0x12/0x20 > [ 655.640696] entry_SYSCALL_64_fastpath+0x1a/0xa5 > -------------------------------------------------------------------------------------------------------------------------------------------- > The analysis makes sense. Kees, any thoughts about what we might do? It seems unlikely we can fix the networking code so do we need some kind of override in usercopy? Thanks, Laura ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 19:02 ` Laura Abbott @ 2018-06-01 20:49 ` Kees Cook 2018-06-01 20:58 ` Matthew Wilcox 0 siblings, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-06-01 20:49 UTC (permalink / raw) To: Laura Abbott; +Cc: Anton Eidelman, Linux-MM, linux-hardened On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> wrote: > (cc-ing some interested people) > > > > On 05/31/2018 05:03 PM, Anton Eidelman wrote: >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log >> sample below. Thanks for digging into this! Do you have any specific reproducer for this? If so, I'd love to try a bisection, as I'm surprised this has only now surfaced: hardened usercopy was introduced in 4.8 ... >> An innocent process (dhcpclient) is about to receive a datagram, but >> during skb_copy_datagram_iter() usercopy triggers a BUG in: >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because >> the sk_buff fragment being copied crosses the 64-byte slub object boundary. >> >> Example __check_heap_object() context: >> n=128 << usually 128, sometimes 192. >> object_size=64 >> s->size=64 >> page_address(page)=0xffff880233f7c000 >> ptr=0xffff880233f7c540 >> >> My take on the root cause: >> When adding data to an skb, new data is appended to the current >> fragment if the new chunk immediately follows the last one: by simply >> increasing the frag->size, skb_frag_size_add(). >> See include/linux/skbuff.h:skb_can_coalesce() callers. Oooh, sneaky: return page == skb_frag_page(frag) && off == frag->page_offset + skb_frag_size(frag); Originally I was thinking that slab red-zoning would get triggered too, but I see the above is checking to see if these are precisely neighboring allocations, I think. But then ... how does freeing actually work? I'm really not sure how this seeming layering violation could be safe in other areas? > The analysis makes sense. 
Kees, any thoughts about what > we might do? It seems unlikely we can fix the networking > code so do we need some kind of override in usercopy? If this really is safe against kfree(), then I'd like to find the logic that makes it safe and either teach skb_can_coalesce() different rules (i.e. do not cross slab objects) or teach __check_heap_object() about skb... which seems worse. Wheee. -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
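The check that fires here can be summarized with a small stand-alone sketch of the bounds test __check_heap_object() performs. The names (`toy_usercopy_ok`) and the flat `object_base`/`object_size` arguments are simplifications; the real code in mm/slub.c derives the object base from the page address and the kmem_cache geometry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified version of the hardened-usercopy heap check: a copy of n
 * bytes starting at ptr is only allowed if it stays inside the single
 * slab object that begins at object_base and is object_size bytes long.
 * A fragment coalesced across two objects fails this test even though
 * every byte of it was legitimately allocated. */
static bool toy_usercopy_ok(uintptr_t object_base, size_t object_size,
                            uintptr_t ptr, size_t n)
{
    if (ptr < object_base)
        return false;
    return (ptr - object_base) + n <= object_size;
}
```

Plugging in the values from the report (object_size=64, ptr at the start of an object, n=128 or 192) makes the check fail, which is the BUG at mm/usercopy.c:72.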
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 20:49 ` Kees Cook @ 2018-06-01 20:58 ` Matthew Wilcox 2018-06-01 21:55 ` Kees Cook 2018-06-05 14:51 ` Christopher Lameter 0 siblings, 2 replies; 10+ messages in thread From: Matthew Wilcox @ 2018-06-01 20:58 UTC (permalink / raw) To: Kees Cook; +Cc: Laura Abbott, Anton Eidelman, Linux-MM, linux-hardened On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote: > On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> wrote: > > (cc-ing some interested people) > > > > > > > > On 05/31/2018 05:03 PM, Anton Eidelman wrote: > >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log > >> sample below. > > Thanks for digging into this! Do you have any specific reproducer for > this? If so, I'd love to try a bisection, as I'm surprised this has > only now surfaced: hardened usercopy was introduced in 4.8 ... > > >> An innocent process (dhcpclient) is about to receive a datagram, but > >> during skb_copy_datagram_iter() usercopy triggers a BUG in: > >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because > >> the sk_buff fragment being copied crosses the 64-byte slub object boundary. > >> > >> Example __check_heap_object() context: > >> n=128 << usually 128, sometimes 192. > >> object_size=64 > >> s->size=64 > >> page_address(page)=0xffff880233f7c000 > >> ptr=0xffff880233f7c540 > >> > >> My take on the root cause: > >> When adding data to an skb, new data is appended to the current > >> fragment if the new chunk immediately follows the last one: by simply > >> increasing the frag->size, skb_frag_size_add(). > >> See include/linux/skbuff.h:skb_can_coalesce() callers. 
> > Oooh, sneaky: > return page == skb_frag_page(frag) && > off == frag->page_offset + skb_frag_size(frag); > > Originally I was thinking that slab red-zoning would get triggered > too, but I see the above is checking to see if these are precisely > neighboring allocations, I think. > > But then ... how does freeing actually work? I'm really not sure how > this seeming layering violation could be safe in other areas? I'm confused ... I thought skb frags came from the page_frag allocator, not the slab allocator. But then why would the slab hardening trigger? > > The analysis makes sense. Kees, any thoughts about what > > we might do? It seems unlikely we can fix the networking > > code so do we need some kind of override in usercopy? > > If this really is safe against kfree(), then I'd like to find the > logic that makes it safe and either teach skb_can_coalesce() different > rules (i.e. do not cross slab objects) or teach __check_heap_object() > about skb... which seems worse. Wheee. > > -Kees > > -- > Kees Cook > Pixel Security > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 20:58 ` Matthew Wilcox @ 2018-06-01 21:55 ` Kees Cook 2018-06-01 23:34 ` Anton Eidelman 2018-06-05 14:51 ` Christopher Lameter 1 sibling, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-06-01 21:55 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Laura Abbott, Anton Eidelman, Linux-MM, linux-hardened On Fri, Jun 1, 2018 at 1:58 PM, Matthew Wilcox <willy@infradead.org> wrote: > On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote: >> On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> wrote: >> > (cc-ing some interested people) >> > >> > >> > >> > On 05/31/2018 05:03 PM, Anton Eidelman wrote: >> >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log >> >> sample below. >> >> Thanks for digging into this! Do you have any specific reproducer for >> this? If so, I'd love to try a bisection, as I'm surprised this has >> only now surfaced: hardened usercopy was introduced in 4.8 ... >> >> >> An innocent process (dhcpclient) is about to receive a datagram, but >> >> during skb_copy_datagram_iter() usercopy triggers a BUG in: >> >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because >> >> the sk_buff fragment being copied crosses the 64-byte slub object boundary. >> >> >> >> Example __check_heap_object() context: >> >> n=128 << usually 128, sometimes 192. >> >> object_size=64 >> >> s->size=64 >> >> page_address(page)=0xffff880233f7c000 >> >> ptr=0xffff880233f7c540 >> >> >> >> My take on the root cause: >> >> When adding data to an skb, new data is appended to the current >> >> fragment if the new chunk immediately follows the last one: by simply >> >> increasing the frag->size, skb_frag_size_add(). >> >> See include/linux/skbuff.h:skb_can_coalesce() callers. 
>> >> Oooh, sneaky: >> return page == skb_frag_page(frag) && >> off == frag->page_offset + skb_frag_size(frag); >> >> Originally I was thinking that slab red-zoning would get triggered >> too, but I see the above is checking to see if these are precisely >> neighboring allocations, I think. >> >> But then ... how does freeing actually work? I'm really not sure how >> this seeming layering violation could be safe in other areas? > > I'm confused ... I thought skb frags came from the page_frag allocator, > not the slab allocator. But then why would the slab hardening trigger? Well that would certainly make more sense (well, the sense about alloc/free). Having it overlap with a slab allocation, though, that's quite bad. Perhaps this is a very odd use-after-free case? I.e. freed page got allocated to slab, and when it got copied out, usercopy found it spanned a slub object? [ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes) This wouldn't be the first time usercopy triggered due to a memory corruption... -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 21:55 ` Kees Cook @ 2018-06-01 23:34 ` Anton Eidelman 2018-06-05 15:27 ` Christopher Lameter 0 siblings, 1 reply; 10+ messages in thread From: Anton Eidelman @ 2018-06-01 23:34 UTC (permalink / raw) To: Kees Cook; +Cc: Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened [-- Attachment #1: Type: text/plain, Size: 5401 bytes --] Hi all, I do not yet have a reproducer decent enough to recommend: I'll keep digging. The page belongs to a slub cache while the fragment is being constructed in __skb_fill_page_desc(), see the instrumentation I used below. When usercopy triggers, .coals shows values of 2/3 for 128/192 bytes respectively. The question is how the RX sk_buff ends up with a data fragment in a PageSlab page. Some network drivers use netdev_alloc_frag(), so pages indeed come from the page_frag allocator. Others (Mellanox, Intel) just alloc_page() when filling their RX descriptors. In both cases the pages will be refcounted properly. I suspect my kernel-level TCP traffic, which uses kernel_sendpage() for bio pages AND slub pages. Thanks a lot! Anton diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a098d95..7cd744c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -40,6 +40,7 @@ #include <linux/in6.h> #include <linux/if_packet.h> #include <net/flow.h> +#include <linux/slub_def.h> /* The interface for checksum offload between the stack and networking drivers * is as follows...
@@ -316,7 +317,8 @@ struct skb_frag_struct { } page; #if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536) __u32 page_offset; - __u32 size; + __u16 size; + __u16 coals; #else __u16 page_offset; __u16 size; @@ -1850,9 +1852,11 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, */ frag->page.p = page; frag->page_offset = off; + frag->coals = 0; skb_frag_size_set(frag, size); page = compound_head(page); + WARN_ON(PageSlab(page) && (page->slab_cache->size < size)); // does NOT trigger if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; } @@ -2849,10 +2853,14 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i, const struct page *page, int off) { if (i) { - const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1]; + struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1]; - return page == skb_frag_page(frag) && + bool ret = page == skb_frag_page(frag) && off == frag->page_offset + skb_frag_size(frag); + if (unlikely(ret)) + if (PageSlab(compound_head((struct page *)page))) + frag->coals++; + return ret; } return false; } On Fri, Jun 1, 2018 at 2:55 PM, Kees Cook <keescook@chromium.org> wrote: > On Fri, Jun 1, 2018 at 1:58 PM, Matthew Wilcox <willy@infradead.org> > wrote: > > On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote: > >> On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> > wrote: > >> > (cc-ing some interested people) > >> > > >> > > >> > > >> > On 05/31/2018 05:03 PM, Anton Eidelman wrote: > >> >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log > >> >> sample below. > >> > >> Thanks for digging into this! Do you have any specific reproducer for > >> this? If so, I'd love to try a bisection, as I'm surprised this has > >> only now surfaced: hardened usercopy was introduced in 4.8 ...
> >> > >> >> An innocent process (dhcpclient) is about to receive a datagram, but > >> >> during skb_copy_datagram_iter() usercopy triggers a BUG in: > >> >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), > because > >> >> the sk_buff fragment being copied crosses the 64-byte slub object > boundary. > >> >> > >> >> Example __check_heap_object() context: > >> >> n=128 << usually 128, sometimes 192. > >> >> object_size=64 > >> >> s->size=64 > >> >> page_address(page)=0xffff880233f7c000 > >> >> ptr=0xffff880233f7c540 > >> >> > >> >> My take on the root cause: > >> >> When adding data to an skb, new data is appended to the current > >> >> fragment if the new chunk immediately follows the last one: by simply > >> >> increasing the frag->size, skb_frag_size_add(). > >> >> See include/linux/skbuff.h:skb_can_coalesce() callers. > >> > >> Oooh, sneaky: > >> return page == skb_frag_page(frag) && > >> off == frag->page_offset + skb_frag_size(frag); > >> > >> Originally I was thinking that slab red-zoning would get triggered > >> too, but I see the above is checking to see if these are precisely > >> neighboring allocations, I think. > >> > >> But then ... how does freeing actually work? I'm really not sure how > >> this seeming layering violation could be safe in other areas? > > > > I'm confused ... I thought skb frags came from the page_frag allocator, > > not the slab allocator. But then why would the slab hardening trigger? > > Well that would certainly make more sense (well, the sense about > alloc/free). Having it overlap with a slab allocation, though, that's > quite bad. Perhaps this is a very odd use-after-free case? I.e. freed > page got allocated to slab, and when it got copied out, usercopy found > it spanned a slub object? > > [ 655.602500] usercopy: kernel memory exposure attempt detected from > ffff88022a31aa00 (kmalloc-64) (192 bytes) > > This wouldn't be the first time usercopy triggered due to a memory > corruption... 
> > -Kees > > -- > Kees Cook > Pixel Security > [-- Attachment #2: Type: text/html, Size: 11295 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 23:34 ` Anton Eidelman @ 2018-06-05 15:27 ` Christopher Lameter 2018-06-05 20:45 ` Anton Eidelman 0 siblings, 1 reply; 10+ messages in thread From: Christopher Lameter @ 2018-06-05 15:27 UTC (permalink / raw) To: Anton Eidelman Cc: Kees Cook, Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened On Fri, 1 Jun 2018, Anton Eidelman wrote: > I do not have a way of reproducing this decent enough to recommend: I'll > keep digging. If you can reproduce it: Could you try the following patch? Subject: [NET] Fix false positives of skb_can_coalesce Skb fragments may be slab objects. Two slab objects may reside in the same slab page. In that case skb_can_coalesce() may return true although the skb cannot be expanded because it would cross a slab boundary. Enabling slab debugging will avoid the issue since red zones will be inserted and thus the skb_can_coalesce() check will not detect neighboring objects and return false. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/include/linux/skbuff.h =================================================================== --- linux.orig/include/linux/skbuff.h +++ linux/include/linux/skbuff.h @@ -3010,8 +3010,29 @@ static inline bool skb_can_coalesce(stru if (i) { const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1]; - return page == skb_frag_page(frag) && - off == frag->page_offset + skb_frag_size(frag); + if (page != skb_frag_page(frag)) + return false; + + if (off != frag->page_offset + skb_frag_size(frag)) + return false; + + /* + * This may be a slab page and we may have pointers + * to different slab objects in the same page + */ + if (!PageSlab(skb_frag_page(frag))) + return true; + + /* + * We could still return true if we would check here + * if the two fragments are within the same + * slab object.
But that is complicated and + * I guess we would need a new slab function + * to check if two pointers are within the same + * object. + */ + return false; + } return false; } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-05 15:27 ` Christopher Lameter @ 2018-06-05 20:45 ` Anton Eidelman 2018-06-05 21:43 ` Christopher Lameter 0 siblings, 1 reply; 10+ messages in thread From: Anton Eidelman @ 2018-06-05 20:45 UTC (permalink / raw) To: Christopher Lameter Cc: Kees Cook, Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened [-- Attachment #1: Type: text/plain, Size: 3343 bytes --] Hi Christopher, This eliminates the failure as expected (at the source). I do not think such a solution is required, and it would probably affect performance. As Matthew said, slab objects should not be used in sk_buff fragments. The source of these is my kernel TCP sockets, where kernel_sendpage() is used with slab payload. I eliminated this, and the failure disappeared, although with this kind of fine timing issue the absence of a failure does not prove much. Moreover, I tried triggering on slab pages in sk_buff fragments and nothing came up. So far: 1) Use of slab payload in kernel_sendpage() is not polite, even though we do not BUG on it and the documentation does not say it is wrong. 2) The RX path cannot produce sk_buffs with fragments in slab pages: drivers use the alloc_page*() family or page_frag_alloc(). What I am still wondering about (and investigating) is how kernel_sendpage() with slab payload results in slab payload on another socket's RX. Do you see how page ref-counting can be broken with extra references taken on a slab page containing the fragments, and dropped when networking is done with them? Thanks, Anton On Tue, Jun 5, 2018 at 8:27 AM, Christopher Lameter <cl@linux.com> wrote: > On Fri, 1 Jun 2018, Anton Eidelman wrote: > > > I do not have a way of reproducing this decent enough to recommend: I'll > > keep digging. > > If you can reproduce it: Could you try the following patch? > > > > Subject: [NET] Fix false positives of skb_can_coalesce > > Skb fragments may be slab objects. Two slab objects may reside > in the same slab page.
In that case skb_can_coalesce() may return > true althought the skb cannot be expanded because it would > cross a slab boundary. > > Enabling slab debugging will avoid the issue since red zones will > be inserted and thus the skb_can_coalesce() check will not detect > neighboring objects and return false. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/include/linux/skbuff.h > =================================================================== > --- linux.orig/include/linux/skbuff.h > +++ linux/include/linux/skbuff.h > @@ -3010,8 +3010,29 @@ static inline bool skb_can_coalesce(stru > if (i) { > const struct skb_frag_struct *frag = > &skb_shinfo(skb)->frags[i - 1]; > > - return page == skb_frag_page(frag) && > - off == frag->page_offset + skb_frag_size(frag); > + if (page != skb_frag_page(frag)) > + return false; > + > + if (off != frag->page_offset + skb_frag_size(frag)) > + return false; > + > + /* > + * This may be a slab page and we may have pointers > + * to different slab objects in the same page > + */ > + if (!PageSlab(skb_frag_page(frag))) > + return true; > + > + /* > + * We could still return true if we would check here > + * if the two fragments are within the same > + * slab object. But that is complicated and > + * I guess we would need a new slab function > + * to check if two pointers are within the same > + * object. > + */ > + return false; > + > } > return false; > } > [-- Attachment #2: Type: text/html, Size: 5611 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-05 20:45 ` Anton Eidelman @ 2018-06-05 21:43 ` Christopher Lameter 0 siblings, 0 replies; 10+ messages in thread From: Christopher Lameter @ 2018-06-05 21:43 UTC (permalink / raw) To: Anton Eidelman Cc: Kees Cook, Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened On Tue, 5 Jun 2018, Anton Eidelman wrote: > What I am still wondering about (and investigating) is how kernel_sendpage() > with slab payload results in slab payload on another socket's RX. > Do you see how page ref-counting can be broken with extra references taken > on a slab page containing the fragments, and dropped when networking is > done with them? The slab allocators do not use page refcounting. The objects may be destroyed via poisoning etc. if you use kfree() while still holding a refcount on the page. Even without poisoning the slab allocator may overwrite the object. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 20:58 ` Matthew Wilcox 2018-06-01 21:55 ` Kees Cook @ 2018-06-05 14:51 ` Christopher Lameter 1 sibling, 0 replies; 10+ messages in thread From: Christopher Lameter @ 2018-06-05 14:51 UTC (permalink / raw) To: Matthew Wilcox Cc: Kees Cook, Laura Abbott, Anton Eidelman, Linux-MM, linux-hardened On Fri, 1 Jun 2018, Matthew Wilcox wrote: > > >> My take on the root cause: > > >> When adding data to an skb, new data is appended to the current > > >> fragment if the new chunk immediately follows the last one: by simply > > >> increasing the frag->size, skb_frag_size_add(). > > >> See include/linux/skbuff.h:skb_can_coalesce() callers. > > > > Oooh, sneaky: > > return page == skb_frag_page(frag) && > > off == frag->page_offset + skb_frag_size(frag); > > > > Originally I was thinking that slab red-zoning would get triggered > > too, but I see the above is checking to see if these are precisely > > neighboring allocations, I think. > > > > But then ... how does freeing actually work? I'm really not sure how > > this seeming layering violation could be safe in other areas? So if there are two neighboring slab objects, the page struct addresses will match and the network code will coalesce the chunks even though they are in two different slab objects? The check in skb_can_coalesce() must verify that these are distinct slab objects. The simple thing would be to return false if one object is a slab object, but then the coalescing would not work within a single slab object either. So what needs to happen is that we need to check whether this is 1) A page. Then the proper length of the segment within which we can coalesce is the page size. 2) A slab page. Then we can use ksize() to establish the end of the slab object, and we should only coalesce within that boundary. ^ permalink raw reply [flat|nested] 10+ messages in thread
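The two-case rule outlined above can be expressed compactly. The following is a hedged user-space sketch, not kernel code: `toy_may_grow` and `TOY_PAGE_SIZE` are made-up names, and the `object_end` argument stands in for the bound ksize() would establish for the slab object containing the fragment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096  /* stand-in for PAGE_SIZE */

/* May a fragment at frag_off with frag_size bytes grow by add bytes?
 * Case 1: a plain page -- the usable segment ends at the page boundary.
 * Case 2: a slab page -- the segment ends at the end of the slab object
 * (the bound ksize() would give), so we never cross into a neighboring
 * object. */
static bool toy_may_grow(bool page_is_slab, size_t frag_off,
                         size_t frag_size, size_t add, size_t object_end)
{
    size_t limit = page_is_slab ? object_end : TOY_PAGE_SIZE;
    return frag_off + frag_size + add <= limit;
}
```

Under this rule the failing case from the report (a full 64-byte object at offset 0x540 growing by another 64 bytes) would be refused, while coalescing within a single page or within a single slab object still works.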