From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71])
	by kanga.kvack.org (Postfix) with ESMTP id A55FD6B0005
	for ; Thu, 31 May 2018 20:03:32 -0400 (EDT)
Received: by mail-wm0-f71.google.com with SMTP id g78-v6so77256wmg.9
	for ; Thu, 31 May 2018 17:03:32 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
	by mx.google.com with SMTPS id a206-v6sor146190wmh.77.2018.05.31.17.03.29
	for (Google Transport Security);
	Thu, 31 May 2018 17:03:29 -0700 (PDT)
MIME-Version: 1.0
From: Anton Eidelman
Date: Thu, 31 May 2018 17:03:27 -0700
Message-ID:
Subject: HARDENED_USERCOPY will BUG on multiple slub objects coalesced into an sk_buff fragment
Content-Type: multipart/alternative; boundary="0000000000004e20a5056d895068"
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org

--0000000000004e20a5056d895068
Content-Type: text/plain; charset="UTF-8"

Hello,

Here's a rare issue I can reproduce on 4.12.10 (centos config): full log sample below.
An innocent process (dhclient) is about to receive a datagram, but during skb_copy_datagram_iter() usercopy triggers a BUG in:
usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because the sk_buff fragment being copied crosses the 64-byte slub object boundary.

Example __check_heap_object() context:
  n=128    << usually 128, sometimes 192.
  object_size=64
  s->size=64
  page_address(page)=0xffff880233f7c000
  ptr=0xffff880233f7c540

My take on the root cause:
  When adding data to an skb, new data is appended to the current fragment if the new chunk immediately follows the last one, simply by increasing frag->size via skb_frag_size_add().
  See include/linux/skbuff.h:skb_can_coalesce() callers.
  This happens very frequently for kmem_cache objects (slub/slab) with intensive kernel level TCP traffic, and produces sk_buff fragments that span multiple kmem_cache objects.
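  To make the two steps above concrete, here is a minimal user-space sketch. It is not kernel code: struct frag_model and all function names are hypothetical stand-ins, the range test is my simplified reading of __check_heap_object() with debug redzones ignored, and the 4 KiB page base in example_coalesced_size() is an assumption derived from the address in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one skb page fragment (not the kernel struct). */
struct frag_model {
	uint64_t page;  /* base address of the backing page */
	size_t off;     /* offset of the fragment within that page */
	size_t size;    /* fragment length in bytes; 0 means "no fragment yet" */
};

/* Models skb_can_coalesce(): the new chunk must start exactly where the
 * current fragment ends, on the same page. */
bool can_coalesce(const struct frag_model *f, uint64_t page, size_t off)
{
	return f->size != 0 && page == f->page && off == f->off + f->size;
}

/* Append a chunk: grow the fragment if adjacent, otherwise start a new one. */
void add_chunk(struct frag_model *f, uint64_t page, size_t off, size_t len)
{
	assert(len > 0);
	if (can_coalesce(f, page, off)) {
		f->size += len;  /* the effect of skb_frag_size_add() */
		return;
	}
	f->page = page;
	f->off = off;
	f->size = len;
}

/* Simplified model of the final range test in __check_heap_object():
 * true only if [ptr, ptr + n) stays inside a single slab object. */
bool within_one_object(uint64_t ptr, size_t n, uint64_t page_addr,
		       size_t slot_size, size_t object_size)
{
	size_t offset;

	if (ptr < page_addr)
		return false;
	offset = (size_t)((ptr - page_addr) % slot_size);
	return offset <= object_size && n <= object_size - offset;
}

/* Three adjacent kmalloc-64 objects starting at ffff88022a31aa00 (address
 * from the log); the page base below assumes a 4 KiB slab page. */
size_t example_coalesced_size(void)
{
	struct frag_model f = {0};
	const uint64_t page = 0xffff88022a31a000ULL;

	add_chunk(&f, page, 0xa00, 64);
	add_chunk(&f, page, 0xa40, 64);
	add_chunk(&f, page, 0xa80, 64);
	return f.size;
}
```

  With the values from the context above (ptr=0xffff880233f7c540, page_address(page)=0xffff880233f7c000, s->size=64, object_size=64), within_one_object() accepts n=64 and rejects n=128, and example_coalesced_size() yields the 192 bytes reported in the log.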
  However, if the same happens to receive data intended for user space, usercopy triggers a BUG.
  This is quite rare but possible: it fails after 5-60min of network traffic (needs some unfortunate timing, e.g. only on QEMU, without CONFIG_SLUB_DEBUG_ON etc).
  I used instrumentation that counts coalesced chunks in the fragment, in order to confirm that the failing fragment was legally constructed from multiple slub objects.

On 4.17.0-rc3:
  I could not reproduce the issue with the latest kernel, but the changes in usercopy.c and slub.c since 4.12 do not address the issue.
  Moreover, it would be quite hard to address without effectively disabling the heap protection.
  However, looking at the recent changes in include/linux/skbuff.h I see skb_zcopy(), which makes skb_can_coalesce() return false and may have masked the problem.

Please let me know what you think.
4.12.10 is the centos official kernel with CONFIG_HARDENED_USERCOPY enabled: if the problem is real, we had better have an erratum for it.

Regards,
Anton Eidelman

[ 655.602500] usercopy: kernel memory exposure attempt detected from ffff88022a31aa00 (kmalloc-64) (192 bytes)
[ 655.604254] ----------[ cut here ]----------
[ 655.604877] kernel BUG at mm/usercopy.c:72!
[ 655.606302] invalid opcode: 0000 1 SMP
[ 655.618390] CPU: 3 PID: 2335 Comm: dhclient Tainted: G O 4.12.10-1.el7.elrepo.x86_64 #1
[ 655.619666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 655.620926] task: ffff880229ab2d80 task.stack: ffffc90001198000
[ 655.621786] RIP: 0010:__check_object_size+0x74/0x190
[ 655.622489] RSP: 0018:ffffc9000119bbb8 EFLAGS: 00010246
[ 655.623236] RAX: 0000000000000060 RBX: ffff88022a31aa00 RCX: 0000000000000000
[ 655.624234] RDX: 0000000000000000 RSI: ffff88023fcce108 RDI: ffff88023fcce108
[ 655.625237] RBP: ffffc9000119bbd8 R08: 00000000fffffffe R09: 0000000000000271
[ 655.626248] R10: 0000000000000005 R11: 0000000000000270 R12: 00000000000000c0
[ 655.627256] R13: ffff88022a31aac0 R14: 0000000000000001 R15: 00000000000000c0
[ 655.628268] FS: 00007fb54413b880(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000
[ 655.629561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 655.630289] CR2: 00007fb5439dc5c0 CR3: 000000023211d000 CR4: 00000000003406e0
[ 655.631268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 655.632281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 655.633318] Call Trace:
[ 655.633696] copy_page_to_iter_iovec+0x9c/0x180
[ 655.634351] copy_page_to_iter+0x22/0x160
[ 655.634943] skb_copy_datagram_iter+0x157/0x260
[ 655.635604] packet_recvmsg+0xcb/0x460
[ 655.636156] ? selinux_socket_recvmsg+0x17/0x20
[ 655.636816] sock_recvmsg+0x3d/0x50
[ 655.637330] ___sys_recvmsg+0xd7/0x1f0
[ 655.637892] ? kvm_clock_get_cycles+0x1e/0x20
[ 655.638533] ? ktime_get_ts64+0x49/0xf0
[ 655.639101] ? _copy_to_user+0x26/0x40
[ 655.639657] __sys_recvmsg+0x51/0x90
[ 655.640184] SyS_recvmsg+0x12/0x20
[ 655.640696] entry_SYSCALL_64_fastpath+0x1a/0xa5
--------------------------------------------------------------------------------------------------------------------------------------------

--0000000000004e20a5056d895068
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000004e20a5056d895068--