From: Ilya Dryomov
Date: Mon, 25 Mar 2024 20:37:06 +0100
Subject: Re: kernel BUG at mm/usercopy.c:102 -- pc : usercopy_abort
To: David Hildenbrand
Cc: Xiubo Li, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Ceph Development, linux-fsdevel@vger.kernel.org, Vlastimil Babka
In-Reply-To: <6f5b9d18-2d04-495c-970c-eb5eada5f676@redhat.com>

On Mon, Mar 25, 2024 at 6:39 PM David Hildenbrand wrote:
>
> On 25.03.24 13:06, Xiubo Li wrote:
> >
> > On 3/25/24 18:14, David Hildenbrand wrote:
> >> On 25.03.24 08:45, Xiubo Li wrote:
> >>> Hi guys,
> >>>
> >>> We are hitting the same crash frequently recently with the latest kernel
> >>> when testing kceph, and the call trace will be something likes:
> >>>
> >>> [ 1580.034891] usercopy: Kernel memory exposure attempt detected from
> >>> SLUB object 'kmalloc-192' (offset 82, size 499712)!
> >>> [ 1580.045866] ------------[ cut here ]------------
> >>> [ 1580.050551] kernel BUG at mm/usercopy.c:102!
> >>>
> >>> Entering kdb (current=0xffff8881211f5500, pid 172901) on processor 4
> >>> Oops: (null)
> >>> due to oops @ 0xffffffff8138cabd
> >>> CPU: 4 PID: 172901 Comm: fsstress Tainted: G S 6.6.0-g623393c9d50c #1
> >>> Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
> >>> RIP: 0010:usercopy_abort+0x6d/0x80
> >>> Code: 4c 0f 44 d0 41 53 48 c7 c0 1c e9 13 82 48 c7 c6 71 62 13 82 48 0f
> >>> 45 f0 48 89 f9 48 c7 c7 f0 6b 1b 82 4c 89 d2 e8 63 2b df ff <0f> 0b 49
> >>> c7 c1 44 c8 14 82 4d 89 cb 4d 89 c8 eb a5 66 90 f3 0f 1e
> >>> RSP: 0018:ffffc90006dfba88 EFLAGS: 00010246
> >>> RAX: 000000000000006a RBX: 000000000007a000 RCX: 0000000000000000
> >>> RDX: 0000000000000000 RSI: ffff88885fd1d880 RDI: ffff88885fd1d880
> >>> RBP: 000000000007a000 R08: 0000000000000000 R09: c0000000ffffdfff
> >>> R10: 0000000000000001 R11: ffffc90006dfb930 R12: 0000000000000001
> >>> R13: ffff8882b7bbed12 R14: ffff88827a375830 R15: ffff8882b7b44d12
> >>> FS: 00007fb24c859500(0000) GS:ffff88885fd00000(0000)
> >>> knlGS:0000000000000000
> >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> CR2: 000055c2bcf9eb00 CR3: 000000028956c005 CR4: 00000000001706e0
> >>> Call Trace:
> >>>
> >>> ? kdb_main_loop+0x32c/0xa10
> >>> ? kdb_stub+0x216/0x420
> >>> more>
> >>>
> >>> You can see more detail in ceph tracker
> >>> https://tracker.ceph.com/issues/64471.
> >>
> >> Where is the full backtrace? Above contains only the backtrace of kdb.
> >>
> > Hi David,
> >
> > The bad news is that there is no more backtrace. All the failures we hit
> > are similar with the following logs:
> >
>
> That's unfortunate :/
>
> "exposure" in the message means we are in copy_to_user().
>
> SLUB object 'kmalloc-192' means that we come from __check_heap_object()
> ... we have 192 bytes, but the length we want to access is 499712 ...
> 488 KiB.
>
> So we ended up somehow in
>
> __copy_to_user()->check_object_size()->__check_object_size()->
> check_heap_object()->__check_heap_object()->usercopy_abort()
>
>
> ... but the big question is which code tried to copy way too much memory
> out of a slab folio to user space.
>
> >
> >> That link also contains:
> >>
> >> Entering kdb (current=0xffff9115d14fb980, pid 61925) on processor 5
> >> Oops: (null)
> >> due to oops @ 0xfffffffface3a1d2
> >> CPU: 5 PID: 61925 Comm: ld Kdump: loaded Not tainted
> >> 5.14.0-421.el9.x86_64 #1
> >> Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
> >> RIP: 0010:usercopy_abort+0x74/0x76
> >> Code: 14 74 ad 51 48 0f 44 d6 49 c7 c3 cb 9f 73 ad 4c 89 d1 57 48 c7
> >> c6 60 83 75 ad 48 c7 c7 00 83 75 ad 49 0f 44 f3 e8 1b 3b ff ff <0f> 0b
> >> 0f b6 d3 4d 89 e0 48 89 e9 31 f6 48 c7 c7 7f 83 75 ad e8 73
> >> RSP: 0018:ffffbb97c16af8d0 EFLAGS: 00010246
> >> RAX: 0000000000000072 RBX: 0000000000000112 RCX: 0000000000000000
> >> RDX: 0000000000000000 RSI: ffff911d1fd60840 RDI: ffff911d1fd60840
> >> RBP: 0000000000004000 R08: 80000000ffff84b4 R09: 0000000000ffff0a
> >> R10: 0000000000000004 R11: 0000000000000076 R12: ffff9115c0be8b00
> >> R13: 0000000000000001 R14: ffff911665df9f68 R15: ffff9115d16be112
> >> FS: 00007ff20442eb80(0000) GS:ffff911d1fd40000(0000)
> >> knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 00007ff20446142d CR3: 00000001215ec003 CR4: 00000000003706e0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Call Trace:
> >>
> >> ? show_trace_log_lvl+0x1c4/0x2df
> >
> ... are we stuck in show_trace_log_lvl(), probably deadlocked not being
> able to print the actuall callstack? If so, that's nasty.

Hi David,

I don't think so. This appears to be a cut-and-paste from what is
essentially a non-interactive serial console. Stack trace entries
prefixed with ? aren't exact and kdb prompt more> is there in all cases
which is what hides the rest of the stack.

There are four ways to get the entire stack trace here:

a) try to attach to the serial console and interact with kdb -- this is
   very much hit or miss due to general IPMI/BMC unreliability and the
   fact that the console would already be attached for logging

b) disable kdb by passing "kdb: false" in the job definition -- this
   should result in /sys/module/kgdboc/parameters/kgdboc being cleared
   after booting into the kernel under test (or just hack teuthology to
   not pass "kdb: true", which it does by default if "-k " is given
   when scheduling)

c) if b) fails, rebuild the kernel with kdb disabled in Kconfig

d) configure kdump and grab a vmcore -- there is no teuthology support
   for this, so it would be challenging, but it would provide the most
   data to chew on

Xiubo, I'd recommend going with b), but take your pick ;)

Thanks,

                Ilya
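
For option b), one way to confirm on the test node that kdb is really out of the picture is to read the kgdboc parameter file named above and check that it is empty. The following is only a minimal sketch of that check: the helper name kgdboc_is_cleared() is hypothetical, and treating a missing file (kgdboc not built in or not loaded) as "cleared" is an assumption of the sketch, not something teuthology does.

```python
from pathlib import Path

# Path as given in option b); everything else here is illustrative.
KGDBOC_PARAM = Path("/sys/module/kgdboc/parameters/kgdboc")


def kgdboc_is_cleared(param_path: Path = KGDBOC_PARAM) -> bool:
    """Return True if no kgdboc console is configured.

    Assumption: a missing parameter file means kgdboc is not present,
    which this sketch also counts as cleared.
    """
    try:
        value = param_path.read_text()
    except FileNotFoundError:
        return True
    # An unset parameter reads back as an empty string or a bare newline.
    return value.strip() == ""


if __name__ == "__main__":
    state = "cleared" if kgdboc_is_cleared() else "still set"
    print(f"kgdboc parameter is {state}")
```

If this reports "still set" after booting into the kernel under test, the "kdb: false" setting did not take effect and the next oops would most likely stop at the more> prompt again.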