* HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment @ 2018-06-01 0:03 Anton Eidelman 2018-06-01 19:02 ` Laura Abbott 0 siblings, 1 reply; 10+ messages in thread From: Anton Eidelman @ 2018-06-01 0:03 UTC (permalink / raw) To: linux-mm [-- Attachment #1: Type: text/plain, Size: 4160 bytes --] Hello, Here's a rare issue I reproduce on 4.12.10 (centos config): full log sample below. An innocent process (dhcpclient) is about to receive a datagram, but during skb_copy_datagram_iter() usercopy triggers a BUG in: usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because the sk_buff fragment being copied crosses the 64-byte slub object boundary. Example __check_heap_object() context: n=128 << usually 128, sometimes 192. object_size=64 s->size=64 page_address(page)=0xffff880233f7c000 ptr=0xffff880233f7c540 My take on the root cause: When adding data to an skb, new data is appended to the current fragment if the new chunk immediately follows the last one: by simply increasing the frag->size, skb_frag_size_add(). See include/linux/skbuff.h:skb_can_coalesce() callers. This happens very frequently for kmem_cache objects (slub/slab) with intensive kernel level TCP traffic, and produces sk_buff fragments that span multiple kmem_cache objects. However, if the same happens to receive data intended for user space, usercopy triggers a BUG. This is quite rare but possible: fails after 5-60min of network traffic (needs some unfortunate timing, e.g. only on QEMU, without CONFIG_SLUB_DEBUG_ON etc). I used an instrumentation that counts coalesced chunks in the fragment, in order to confirm that the failing fragment was legally constructed from multiple slub objects. On 4.17.0.rc3: I could not reproduce the issue with the latest kernel, but the changes in usercopy.c and slub.c since 4.12 do not address the issue. Moreover, it would be quite hard to do without effectively disabling the heap protection. 
However, looking at the recent changes in include/linux/skbuff.h I see skb_zcopy(), which makes skb_can_coalesce() return false and may have masked the problem. Please let me know what you think. 4.12.10 is the official CentOS kernel with CONFIG_HARDENED_USERCOPY enabled: if the problem is real we had better have an erratum for it. Regards, Anton Eidelman [ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes) [ 655.604254] ------------[ cut here ]------------ [ 655.604877] kernel BUG at mm/usercopy.c:72! [ 655.606302] invalid opcode: 0000 [#1] SMP [ 655.618390] CPU: 3 PID: 2335 Comm: dhclient Tainted: G O 4.12.10-1.el7.elrepo.x86_64 #1 [ 655.619666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 655.620926] task: ffff880229ab2d80 task.stack: ffffc90001198000 [ 655.621786] RIP: 0010:__check_object_size+0x74/0x190 [ 655.622489] RSP: 0018:ffffc9000119bbb8 EFLAGS: 00010246 [ 655.623236] RAX: 0000000000000060 RBX: ffff88022a31aa00 RCX: 0000000000000000 [ 655.624234] RDX: 0000000000000000 RSI: ffff88023fcce108 RDI: ffff88023fcce108 [ 655.625237] RBP: ffffc9000119bbd8 R08: 00000000fffffffe R09: 0000000000000271 [ 655.626248] R10: 0000000000000005 R11: 0000000000000270 R12: 00000000000000c0 [ 655.627256] R13: ffff88022a31aac0 R14: 0000000000000001 R15: 00000000000000c0 [ 655.628268] FS: 00007fb54413b880(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000 [ 655.629561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 655.630289] CR2: 00007fb5439dc5c0 CR3: 000000023211d000 CR4: 00000000003406e0 [ 655.631268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 655.632281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 655.633318] Call Trace: [ 655.633696] copy_page_to_iter_iovec+0x9c/0x180 [ 655.634351] copy_page_to_iter+0x22/0x160 [ 655.634943] skb_copy_datagram_iter+0x157/0x260 [ 655.635604] packet_recvmsg+0xcb/0x460 [ 655.636156] ?
selinux_socket_recvmsg+0x17/0x20 [ 655.636816] sock_recvmsg+0x3d/0x50 [ 655.637330] ___sys_recvmsg+0xd7/0x1f0 [ 655.637892] ? kvm_clock_get_cycles+0x1e/0x20 [ 655.638533] ? ktime_get_ts64+0x49/0xf0 [ 655.639101] ? _copy_to_user+0x26/0x40 [ 655.639657] __sys_recvmsg+0x51/0x90 [ 655.640184] SyS_recvmsg+0x12/0x20 [ 655.640696] entry_SYSCALL_64_fastpath+0x1a/0xa5 -------------------------------------------------------------------------------------------------------------------------------------------- [-- Attachment #2: Type: text/html, Size: 43005 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
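The coalescing path Anton describes can be modeled outside the kernel as a minimal sketch. This is a toy user-space model with made-up names (`toy_frag`, `toy_can_coalesce`, `toy_coalesce`), not kernel code; the real counterparts are skb_frag_struct, skb_can_coalesce() and skb_frag_size_add() in include/linux/skbuff.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an sk_buff page fragment; the names are made up, the real
 * structure is skb_frag_struct in include/linux/skbuff.h. */
struct toy_frag {
    const char *page;   /* stands in for the struct page pointer */
    size_t off;         /* frag->page_offset */
    size_t size;        /* skb_frag_size(frag) */
};

/* Mirrors the adjacency test in skb_can_coalesce(): same page, and the
 * new chunk starts exactly where the fragment currently ends.  Nothing
 * here knows whether that boundary is also a slab object boundary. */
static bool toy_can_coalesce(const struct toy_frag *f, const char *page,
                             size_t off)
{
    return page == f->page && off == f->off + f->size;
}

/* Mirrors skb_frag_size_add(): coalescing just grows the fragment. */
static void toy_coalesce(struct toy_frag *f, size_t len)
{
    f->size += len;
}
```

With the numbers from the report (a 64-byte slub object at page offset 0x540 followed by its neighbor at 0x580), the adjacency test passes and the fragment grows to 128 bytes, spanning two slab objects.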
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 0:03 HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment Anton Eidelman @ 2018-06-01 19:02 ` Laura Abbott 2018-06-01 20:49 ` Kees Cook 0 siblings, 1 reply; 10+ messages in thread From: Laura Abbott @ 2018-06-01 19:02 UTC (permalink / raw) To: Anton Eidelman, linux-mm, Kees Cook; +Cc: linux-hardened (cc-ing some interested people) On 05/31/2018 05:03 PM, Anton Eidelman wrote: > Hello, > > Here's a rare issue I reproduce on 4.12.10 (centos config): full log sample below. > An innocent process (dhcpclient) is about to receive a datagram, but during skb_copy_datagram_iter() usercopy triggers a BUG in: > usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because the sk_buff fragment being copied crosses the 64-byte slub object boundary. > > Example __check_heap_object() context: > n=128 << usually 128, sometimes 192. > object_size=64 > s->size=64 > page_address(page)=0xffff880233f7c000 > ptr=0xffff880233f7c540 > > My take on the root cause: > When adding data to an skb, new data is appended to the current fragment if the new chunk immediately follows the last one: by simply increasing the frag->size, skb_frag_size_add(). > See include/linux/skbuff.h:skb_can_coalesce() callers. > This happens very frequently for kmem_cache objects (slub/slab) with intensive kernel level TCP traffic, and produces sk_buff fragments that span multiple kmem_cache objects. > However, if the same happens to receive data intended for user space, usercopy triggers a BUG. > This is quite rare but possible: fails after 5-60min of network traffic (needs some unfortunate timing, e.g. only on QEMU, without CONFIG_SLUB_DEBUG_ON etc). > I used an instrumentation that counts coalesced chunks in the fragment, in order to confirm that the failing fragment was legally constructed from multiple slub objects.
> > On 4.17.0.rc3: > I could not reproduce the issue with the latest kernel, but the changes in usercopy.c and slub.c since 4.12 do not address the issue. > Moreover, it would be quite hard to do without effectively disabling the heap protection. > However, looking at the recent changes in include/linux/skbuff.h I see skb_zcopy(), which makes skb_can_coalesce() return false and may have masked the problem. > > Please let me know what you think. > 4.12.10 is the official CentOS kernel with CONFIG_HARDENED_USERCOPY enabled: if the problem is real we had better have an erratum for it. > > Regards, > Anton Eidelman > > > [ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes) > [ 655.604254] ------------[ cut here ]------------ > [ 655.604877] kernel BUG at mm/usercopy.c:72! > [ 655.606302] invalid opcode: 0000 [#1] SMP > [ 655.618390] CPU: 3 PID: 2335 Comm: dhclient Tainted: G O 4.12.10-1.el7.elrepo.x86_64 #1 > [ 655.619666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 > [ 655.620926] task: ffff880229ab2d80 task.stack: ffffc90001198000 > [ 655.621786] RIP: 0010:__check_object_size+0x74/0x190 > [ 655.622489] RSP: 0018:ffffc9000119bbb8 EFLAGS: 00010246 > [ 655.623236] RAX: 0000000000000060 RBX: ffff88022a31aa00 RCX: 0000000000000000 > [ 655.624234] RDX: 0000000000000000 RSI: ffff88023fcce108 RDI: ffff88023fcce108 > [ 655.625237] RBP: ffffc9000119bbd8 R08: 00000000fffffffe R09: 0000000000000271 > [ 655.626248] R10: 0000000000000005 R11: 0000000000000270 R12: 00000000000000c0 > [ 655.627256] R13: ffff88022a31aac0 R14: 0000000000000001 R15: 00000000000000c0 > [ 655.628268] FS: 00007fb54413b880(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000 > [ 655.629561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 655.630289] CR2: 00007fb5439dc5c0 CR3: 000000023211d000 CR4: 00000000003406e0 > [ 655.631268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [
655.632281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 655.633318] Call Trace: > [ 655.633696] copy_page_to_iter_iovec+0x9c/0x180 > [ 655.634351] copy_page_to_iter+0x22/0x160 > [ 655.634943] skb_copy_datagram_iter+0x157/0x260 > [ 655.635604] packet_recvmsg+0xcb/0x460 > [ 655.636156] ? selinux_socket_recvmsg+0x17/0x20 > [ 655.636816] sock_recvmsg+0x3d/0x50 > [ 655.637330] ___sys_recvmsg+0xd7/0x1f0 > [ 655.637892] ? kvm_clock_get_cycles+0x1e/0x20 > [ 655.638533] ? ktime_get_ts64+0x49/0xf0 > [ 655.639101] ? _copy_to_user+0x26/0x40 > [ 655.639657] __sys_recvmsg+0x51/0x90 > [ 655.640184] SyS_recvmsg+0x12/0x20 > [ 655.640696] entry_SYSCALL_64_fastpath+0x1a/0xa5 > -------------------------------------------------------------------------------------------------------------------------------------------- > The analysis makes sense. Kees, any thoughts about what we might do? It seems unlikely we can fix the networking code so do we need some kind of override in usercopy? Thanks, Laura ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 19:02 ` Laura Abbott @ 2018-06-01 20:49 ` Kees Cook 2018-06-01 20:58 ` Matthew Wilcox 0 siblings, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-06-01 20:49 UTC (permalink / raw) To: Laura Abbott; +Cc: Anton Eidelman, Linux-MM, linux-hardened On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> wrote: > (cc-ing some interested people) > > > > On 05/31/2018 05:03 PM, Anton Eidelman wrote: >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log >> sample below. Thanks for digging into this! Do you have any specific reproducer for this? If so, I'd love to try a bisection, as I'm surprised this has only now surfaced: hardened usercopy was introduced in 4.8 ... >> An innocent process (dhcpclient) is about to receive a datagram, but >> during skb_copy_datagram_iter() usercopy triggers a BUG in: >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because >> the sk_buff fragment being copied crosses the 64-byte slub object boundary. >> >> Example __check_heap_object() context: >> n=128 << usually 128, sometimes 192. >> object_size=64 >> s->size=64 >> page_address(page)=0xffff880233f7c000 >> ptr=0xffff880233f7c540 >> >> My take on the root cause: >> When adding data to an skb, new data is appended to the current >> fragment if the new chunk immediately follows the last one: by simply >> increasing the frag->size, skb_frag_size_add(). >> See include/linux/skbuff.h:skb_can_coalesce() callers. Oooh, sneaky: return page == skb_frag_page(frag) && off == frag->page_offset + skb_frag_size(frag); Originally I was thinking that slab red-zoning would get triggered too, but I see the above is checking to see if these are precisely neighboring allocations, I think. But then ... how does freeing actually work? I'm really not sure how this seeming layering violation could be safe in other areas? > The analysis makes sense. 
Kees, any thoughts about what > we might do? It seems unlikely we can fix the networking > code so do we need some kind of override in usercopy? If this really is safe against kfree(), then I'd like to find the logic that makes it safe and either teach skb_can_coalesce() different rules (i.e. do not cross slab objects) or teach __check_heap_object() about skb... which seems worse. Wheee. -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
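The check that fires here can be summarized with a small stand-alone sketch of the bounds test __check_heap_object() performs. The names (`toy_usercopy_ok`) and the flat `object_base`/`object_size` arguments are simplifications; the real code in mm/slub.c derives the object base from the page address and the kmem_cache geometry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified version of the hardened-usercopy heap check: a copy of n
 * bytes starting at ptr is only allowed if it stays inside the single
 * slab object that begins at object_base and is object_size bytes long.
 * A fragment coalesced across two objects fails this test even though
 * every byte of it was legitimately allocated. */
static bool toy_usercopy_ok(uintptr_t object_base, size_t object_size,
                            uintptr_t ptr, size_t n)
{
    if (ptr < object_base)
        return false;
    return (ptr - object_base) + n <= object_size;
}
```

Plugging in the values from the report (object_size=64, ptr at the start of an object, n=128 or 192) makes the check fail, which is the BUG at mm/usercopy.c:72.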
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 20:49 ` Kees Cook @ 2018-06-01 20:58 ` Matthew Wilcox 2018-06-01 21:55 ` Kees Cook 2018-06-05 14:51 ` Christopher Lameter 0 siblings, 2 replies; 10+ messages in thread From: Matthew Wilcox @ 2018-06-01 20:58 UTC (permalink / raw) To: Kees Cook; +Cc: Laura Abbott, Anton Eidelman, Linux-MM, linux-hardened On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote: > On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> wrote: > > (cc-ing some interested people) > > > > > > > > On 05/31/2018 05:03 PM, Anton Eidelman wrote: > >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log > >> sample below. > > Thanks for digging into this! Do you have any specific reproducer for > this? If so, I'd love to try a bisection, as I'm surprised this has > only now surfaced: hardened usercopy was introduced in 4.8 ... > > >> An innocent process (dhcpclient) is about to receive a datagram, but > >> during skb_copy_datagram_iter() usercopy triggers a BUG in: > >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because > >> the sk_buff fragment being copied crosses the 64-byte slub object boundary. > >> > >> Example __check_heap_object() context: > >> n=128 << usually 128, sometimes 192. > >> object_size=64 > >> s->size=64 > >> page_address(page)=0xffff880233f7c000 > >> ptr=0xffff880233f7c540 > >> > >> My take on the root cause: > >> When adding data to an skb, new data is appended to the current > >> fragment if the new chunk immediately follows the last one: by simply > >> increasing the frag->size, skb_frag_size_add(). > >> See include/linux/skbuff.h:skb_can_coalesce() callers. 
> > Oooh, sneaky: > return page == skb_frag_page(frag) && > off == frag->page_offset + skb_frag_size(frag); > > Originally I was thinking that slab red-zoning would get triggered > too, but I see the above is checking to see if these are precisely > neighboring allocations, I think. > > But then ... how does freeing actually work? I'm really not sure how > this seeming layering violation could be safe in other areas? I'm confused ... I thought skb frags came from the page_frag allocator, not the slab allocator. But then why would the slab hardening trigger? > > The analysis makes sense. Kees, any thoughts about what > > we might do? It seems unlikely we can fix the networking > > code so do we need some kind of override in usercopy? > > If this really is safe against kfree(), then I'd like to find the > logic that makes it safe and either teach skb_can_coalesce() different > rules (i.e. do not cross slab objects) or teach __check_heap_object() > about skb... which seems worse. Wheee. > > -Kees > > -- > Kees Cook > Pixel Security > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 20:58 ` Matthew Wilcox @ 2018-06-01 21:55 ` Kees Cook 2018-06-01 23:34 ` Anton Eidelman 2018-06-05 14:51 ` Christopher Lameter 1 sibling, 1 reply; 10+ messages in thread From: Kees Cook @ 2018-06-01 21:55 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Laura Abbott, Anton Eidelman, Linux-MM, linux-hardened On Fri, Jun 1, 2018 at 1:58 PM, Matthew Wilcox <willy@infradead.org> wrote: > On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote: >> On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> wrote: >> > (cc-ing some interested people) >> > >> > >> > >> > On 05/31/2018 05:03 PM, Anton Eidelman wrote: >> >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log >> >> sample below. >> >> Thanks for digging into this! Do you have any specific reproducer for >> this? If so, I'd love to try a bisection, as I'm surprised this has >> only now surfaced: hardened usercopy was introduced in 4.8 ... >> >> >> An innocent process (dhcpclient) is about to receive a datagram, but >> >> during skb_copy_datagram_iter() usercopy triggers a BUG in: >> >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because >> >> the sk_buff fragment being copied crosses the 64-byte slub object boundary. >> >> >> >> Example __check_heap_object() context: >> >> n=128 << usually 128, sometimes 192. >> >> object_size=64 >> >> s->size=64 >> >> page_address(page)=0xffff880233f7c000 >> >> ptr=0xffff880233f7c540 >> >> >> >> My take on the root cause: >> >> When adding data to an skb, new data is appended to the current >> >> fragment if the new chunk immediately follows the last one: by simply >> >> increasing the frag->size, skb_frag_size_add(). >> >> See include/linux/skbuff.h:skb_can_coalesce() callers. 
>> >> Oooh, sneaky: >> return page == skb_frag_page(frag) && >> off == frag->page_offset + skb_frag_size(frag); >> >> Originally I was thinking that slab red-zoning would get triggered >> too, but I see the above is checking to see if these are precisely >> neighboring allocations, I think. >> >> But then ... how does freeing actually work? I'm really not sure how >> this seeming layering violation could be safe in other areas? > > I'm confused ... I thought skb frags came from the page_frag allocator, > not the slab allocator. But then why would the slab hardening trigger? Well that would certainly make more sense (well, the sense about alloc/free). Having it overlap with a slab allocation, though, that's quite bad. Perhaps this is a very odd use-after-free case? I.e. freed page got allocated to slab, and when it got copied out, usercopy found it spanned a slub object? [ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes) This wouldn't be the first time usercopy triggered due to a memory corruption... -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 21:55 ` Kees Cook @ 2018-06-01 23:34 ` Anton Eidelman 2018-06-05 15:27 ` Christopher Lameter 0 siblings, 1 reply; 10+ messages in thread From: Anton Eidelman @ 2018-06-01 23:34 UTC (permalink / raw) To: Kees Cook; +Cc: Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened [-- Attachment #1: Type: text/plain, Size: 5401 bytes --] Hi all, I do not yet have a reproducer decent enough to recommend: I'll keep digging. The page belongs to a slub cache while the fragment is being constructed in __skb_fill_page_desc(), see the instrumentation I used below. When usercopy triggers, .coals shows values of 2/3 for 128/192 bytes respectively. The question is how the RX sk_buff ends up with a data fragment in a PageSlab page. Some network drivers use netdev_alloc_frag(), so pages indeed come from the page_frag allocator. Others (Mellanox, Intel) just alloc_page() when filling their RX descriptors. In both cases the pages will be refcounted properly. I suspect my kernel-level TCP traffic, which uses kernel_sendpage() for bio pages AND slub pages. Thanks a lot! Anton diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a098d95..7cd744c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -40,6 +40,7 @@ #include <linux/in6.h> #include <linux/if_packet.h> #include <net/flow.h> +#include <linux/slub_def.h> /* The interface for checksum offload between the stack and networking drivers * is as follows...
@@ -316,7 +317,8 @@ struct skb_frag_struct { } page; #if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536) __u32 page_offset; - __u32 size; + __u16 size; + __u16 coals; #else __u16 page_offset; __u16 size; @@ -1850,9 +1852,11 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, */ frag->page.p = page; frag->page_offset = off; + frag->coals = 0; skb_frag_size_set(frag, size); page = compound_head(page); + WARN_ON(PageSlab(page) && (page->slab_cache->size < size)); // does NOT trigger if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; } @@ -2849,10 +2853,14 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i, const struct page *page, int off) { if (i) { - const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1]; + struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1]; - return page == skb_frag_page(frag) && + bool ret = page == skb_frag_page(frag) && off == frag->page_offset + skb_frag_size(frag); + if (unlikely(ret)) + if (PageSlab(compound_head((struct page *)page))) + frag->coals++; + return ret; } return false; } On Fri, Jun 1, 2018 at 2:55 PM, Kees Cook <keescook@chromium.org> wrote: > On Fri, Jun 1, 2018 at 1:58 PM, Matthew Wilcox <willy@infradead.org> > wrote: > > On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote: > >> On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@redhat.com> > wrote: > >> > (cc-ing some interested people) > >> > > >> > > >> > > >> > On 05/31/2018 05:03 PM, Anton Eidelman wrote: > >> >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log > >> >> sample below. > >> > >> Thanks for digging into this! Do you have any specific reproducer for > >> this? If so, I'd love to try a bisection, as I'm surprised this has > >> only now surfaced: hardened usercopy was introduced in 4.8 ...
> >> > >> >> An innocent process (dhcpclient) is about to receive a datagram, but > >> >> during skb_copy_datagram_iter() usercopy triggers a BUG in: > >> >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), > because > >> >> the sk_buff fragment being copied crosses the 64-byte slub object > boundary. > >> >> > >> >> Example __check_heap_object() context: > >> >> n=128 << usually 128, sometimes 192. > >> >> object_size=64 > >> >> s->size=64 > >> >> page_address(page)=0xffff880233f7c000 > >> >> ptr=0xffff880233f7c540 > >> >> > >> >> My take on the root cause: > >> >> When adding data to an skb, new data is appended to the current > >> >> fragment if the new chunk immediately follows the last one: by simply > >> >> increasing the frag->size, skb_frag_size_add(). > >> >> See include/linux/skbuff.h:skb_can_coalesce() callers. > >> > >> Oooh, sneaky: > >> return page == skb_frag_page(frag) && > >> off == frag->page_offset + skb_frag_size(frag); > >> > >> Originally I was thinking that slab red-zoning would get triggered > >> too, but I see the above is checking to see if these are precisely > >> neighboring allocations, I think. > >> > >> But then ... how does freeing actually work? I'm really not sure how > >> this seeming layering violation could be safe in other areas? > > > > I'm confused ... I thought skb frags came from the page_frag allocator, > > not the slab allocator. But then why would the slab hardening trigger? > > Well that would certainly make more sense (well, the sense about > alloc/free). Having it overlap with a slab allocation, though, that's > quite bad. Perhaps this is a very odd use-after-free case? I.e. freed > page got allocated to slab, and when it got copied out, usercopy found > it spanned a slub object? > > [ 655.602500] usercopy: kernel memory exposure attempt detected from > ffff88022a31aa00 (kmalloc-64) (192 bytes) > > This wouldn't be the first time usercopy triggered due to a memory > corruption... 
> > -Kees > > -- > Kees Cook > Pixel Security > [-- Attachment #2: Type: text/html, Size: 11295 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 23:34 ` Anton Eidelman @ 2018-06-05 15:27 ` Christopher Lameter 2018-06-05 20:45 ` Anton Eidelman 0 siblings, 1 reply; 10+ messages in thread From: Christopher Lameter @ 2018-06-05 15:27 UTC (permalink / raw) To: Anton Eidelman Cc: Kees Cook, Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened On Fri, 1 Jun 2018, Anton Eidelman wrote: > I do not have a way of reproducing this decent enough to recommend: I'll > keep digging. If you can reproduce it: Could you try the following patch? Subject: [NET] Fix false positives of skb_can_coalesce Skb fragments may be slab objects. Two slab objects may reside in the same slab page. In that case skb_can_coalesce() may return true although the skb cannot be expanded because it would cross a slab boundary. Enabling slab debugging will avoid the issue since red zones will be inserted and thus the skb_can_coalesce() check will not detect neighboring objects and return false. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/include/linux/skbuff.h =================================================================== --- linux.orig/include/linux/skbuff.h +++ linux/include/linux/skbuff.h @@ -3010,8 +3010,29 @@ static inline bool skb_can_coalesce(stru if (i) { const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1]; - return page == skb_frag_page(frag) && - off == frag->page_offset + skb_frag_size(frag); + if (page != skb_frag_page(frag)) + return false; + + if (off != frag->page_offset + skb_frag_size(frag)) + return false; + + /* + * This may be a slab page and we may have pointers + * to different slab objects in the same page + */ + if (!PageSlab(skb_frag_page(frag))) + return true; + + /* + * We could still return true if we would check here + * if the two fragments are within the same + * slab object.
But that is complicated and + * I guess we would need a new slab function + * to check if two pointers are within the same + * object. + */ + return false; + } return false; } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-05 15:27 ` Christopher Lameter @ 2018-06-05 20:45 ` Anton Eidelman 2018-06-05 21:43 ` Christopher Lameter 0 siblings, 1 reply; 10+ messages in thread From: Anton Eidelman @ 2018-06-05 20:45 UTC (permalink / raw) To: Christopher Lameter Cc: Kees Cook, Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened [-- Attachment #1: Type: text/plain, Size: 3343 bytes --] Hi Christopher, This eliminates the failure as expected (at the source). I do not think such a solution is required, and it would probably affect performance. As Matthew said, slab objects should not be used in sk_buff fragments. The source of these is my kernel TCP sockets, where kernel_sendpage() is used with slab payload. I eliminated this, and the failure disappeared, although with this kind of fine timing issue the absence of a failure does not prove much. Moreover, I tried triggering on slab pages in sk_buff fragments and nothing came up. So far: 1) Use of slab payload in kernel_sendpage() is not polite, even though we do not BUG on it and the documentation does not say it is wrong. 2) The RX path cannot produce sk_buffs with fragments in slab pages: drivers use the alloc_page*() family or page_frag_alloc(). What I am still wondering about (and investigating) is how kernel_sendpage() with slab payload results in slab payload on another socket's RX. Do you see how page ref-counting can be broken with extra references taken on a slab page containing the fragments, and dropped when networking is done with them? Thanks, Anton On Tue, Jun 5, 2018 at 8:27 AM, Christopher Lameter <cl@linux.com> wrote: > On Fri, 1 Jun 2018, Anton Eidelman wrote: > > > I do not have a way of reproducing this decent enough to recommend: I'll > > keep digging. > > If you can reproduce it: Could you try the following patch? > > > > Subject: [NET] Fix false positives of skb_can_coalesce > > Skb fragments may be slab objects. Two slab objects may reside > in the same slab page.
In that case skb_can_coalesce() may return > true althought the skb cannot be expanded because it would > cross a slab boundary. > > Enabling slab debugging will avoid the issue since red zones will > be inserted and thus the skb_can_coalesce() check will not detect > neighboring objects and return false. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/include/linux/skbuff.h > =================================================================== > --- linux.orig/include/linux/skbuff.h > +++ linux/include/linux/skbuff.h > @@ -3010,8 +3010,29 @@ static inline bool skb_can_coalesce(stru > if (i) { > const struct skb_frag_struct *frag = > &skb_shinfo(skb)->frags[i - 1]; > > - return page == skb_frag_page(frag) && > - off == frag->page_offset + skb_frag_size(frag); > + if (page != skb_frag_page(frag)) > + return false; > + > + if (off != frag->page_offset + skb_frag_size(frag)) > + return false; > + > + /* > + * This may be a slab page and we may have pointers > + * to different slab objects in the same page > + */ > + if (!PageSlab(skb_frag_page(frag))) > + return true; > + > + /* > + * We could still return true if we would check here > + * if the two fragments are within the same > + * slab object. But that is complicated and > + * I guess we would need a new slab function > + * to check if two pointers are within the same > + * object. > + */ > + return false; > + > } > return false; > } > [-- Attachment #2: Type: text/html, Size: 5611 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-05 20:45 ` Anton Eidelman @ 2018-06-05 21:43 ` Christopher Lameter 0 siblings, 0 replies; 10+ messages in thread From: Christopher Lameter @ 2018-06-05 21:43 UTC (permalink / raw) To: Anton Eidelman Cc: Kees Cook, Matthew Wilcox, Laura Abbott, Linux-MM, linux-hardened On Tue, 5 Jun 2018, Anton Eidelman wrote: > What I am still wondering about (and investigating) is how kernel_sendpage() > with slab payload results in slab payload on another socket's RX. > Do you see how page ref-counting can be broken with extra references taken > on a slab page containing the fragments, and dropped when networking is > done with them? The slab allocators do not use page refcounting. The objects may be destroyed via poisoning etc. if you use kfree() while still holding a refcount on the page. Even without poisoning the slab allocator may overwrite the object. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment 2018-06-01 20:58 ` Matthew Wilcox 2018-06-01 21:55 ` Kees Cook @ 2018-06-05 14:51 ` Christopher Lameter 1 sibling, 0 replies; 10+ messages in thread From: Christopher Lameter @ 2018-06-05 14:51 UTC (permalink / raw) To: Matthew Wilcox Cc: Kees Cook, Laura Abbott, Anton Eidelman, Linux-MM, linux-hardened On Fri, 1 Jun 2018, Matthew Wilcox wrote: > > >> My take on the root cause: > > >> When adding data to an skb, new data is appended to the current > > >> fragment if the new chunk immediately follows the last one: by simply > > >> increasing the frag->size, skb_frag_size_add(). > > >> See include/linux/skbuff.h:skb_can_coalesce() callers. > > > > Oooh, sneaky: > > return page == skb_frag_page(frag) && > > off == frag->page_offset + skb_frag_size(frag); > > > > Originally I was thinking that slab red-zoning would get triggered > > too, but I see the above is checking to see if these are precisely > > neighboring allocations, I think. > > > > But then ... how does freeing actually work? I'm really not sure how > > this seeming layering violation could be safe in other areas? So if there are two neighboring slab objects, the page struct addresses will match and the network code will coalesce the chunks even though they are in two different slab objects? The check in skb_can_coalesce() must verify that these are distinct slab objects. The simple thing would be to return false if one object is a slab object, but then the coalescing would not work within a single slab object either. So what needs to happen is that we need to check whether this is 1) A page. Then the proper length of the segment within which we can coalesce is the page size. 2) A slab page. Then we can use ksize() to establish the end of the slab object, and we should only coalesce within that boundary. ^ permalink raw reply [flat|nested] 10+ messages in thread
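The two-case rule outlined above can be expressed compactly. The following is a hedged user-space sketch, not kernel code: `toy_may_grow` and `TOY_PAGE_SIZE` are made-up names, and the `object_end` argument stands in for the bound ksize() would establish for the slab object containing the fragment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096  /* stand-in for PAGE_SIZE */

/* May a fragment at frag_off with frag_size bytes grow by add bytes?
 * Case 1: a plain page -- the usable segment ends at the page boundary.
 * Case 2: a slab page -- the segment ends at the end of the slab object
 * (the bound ksize() would give), so we never cross into a neighboring
 * object. */
static bool toy_may_grow(bool page_is_slab, size_t frag_off,
                         size_t frag_size, size_t add, size_t object_end)
{
    size_t limit = page_is_slab ? object_end : TOY_PAGE_SIZE;
    return frag_off + frag_size + add <= limit;
}
```

Under this rule the failing case from the report (a full 64-byte object at offset 0x540 growing by another 64 bytes) would be refused, while coalescing within a single page or within a single slab object still works.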