From: Yi Zhang <yi.zhang@redhat.com>
To: linux-block <linux-block@vger.kernel.org>,
"open list:NVM EXPRESS DRIVER" <linux-nvme@lists.infradead.org>,
linux-mm@kvack.org
Cc: Daniel Wagner <dwagner@suse.de>, Ming Lei <ming.lei@redhat.com>,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [bug report] kernel BUG at mm/hugetlb.c:5880! triggered by blktests nvme/029
Date: Mon, 23 Jun 2025 12:35:57 +0800 [thread overview]
Message-ID: <CAHj4cs8VJxeDQtjHvRN+unBrhzU+-vAweUz+eRY3wmhS9LM1fQ@mail.gmail.com> (raw)
In-Reply-To: <CAHj4cs92q3Lc8f=mEZ-e9piZtLj62eJ2Z5iSO-wJuRepspkbsA@mail.gmail.com>
The issue can still be reproduced on the latest Linux tree; it occurs when
running "test_user_io /dev/nvme0n1 511 1024" in blktests nvme/029. Adding
linux-mm@ since the BUG fires on the hugetlb side. Here is the log:
+ test_user_io /dev/nvme0n1 511 1024
+ local disk=/dev/nvme0n1
+ local start=511
+ local cnt=1024
+ local bs size img img1
++ blockdev --getss /dev/nvme0n1
+ bs=512
+ size=524288
++ mktemp /tmp/blk_img_XXXXXX
+ img=/tmp/blk_img_4aWO9O
++ mktemp /tmp/blk_img_XXXXXX
+ img1=/tmp/blk_img_mFMZKv
+ dd if=/dev/urandom of=/tmp/blk_img_4aWO9O bs=512 count=1024 status=none
+ (( cnt-- ))
+ nvme write --start-block=511 --block-count=1023 --data-size=524288 --data=/tmp/blk_img_4aWO9O /dev/nvme0n1
failed to read data buffer from input file Bad address
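For reference, the arguments of the failing nvme write follow directly from the test's inputs; a quick shell sketch of the arithmetic, using the values from the trace above:

```shell
# Derive the nvme-write parameters used by blktests nvme/029 above.
bs=512        # logical block size, from `blockdev --getss /dev/nvme0n1`
start=511     # deliberately odd start LBA passed to test_user_io
cnt=1024      # number of blocks written

size=$((bs * cnt))      # 524288 bytes, the --data-size argument
blocks=$((cnt - 1))     # 1023, the zero-based --block-count argument
echo "size=$size blocks=$blocks"
```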
[ 508.364146] loop0: detected capacity change from 0 to 2097152
[ 516.268535] Key type psk registered
[ 522.033843] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 533.542882] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 533.545106] nvme nvme0: creating 32 I/O queues.
[ 533.552928] nvme nvme0: new ctrl: "blktests-subsystem-1"
[ 723.596113] ------------[ cut here ]------------
[ 723.596779] kernel BUG at mm/hugetlb.c:5880!
[ 723.597481] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 723.597851] CPU: 25 UID: 0 PID: 1637 Comm: nvme Not tainted 6.16.0-rc2+ #26 PREEMPT(undef)
[ 723.598334] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 03/14/2018
[ 723.598688] RIP: 0010:__unmap_hugepage_range+0x7bb/0x840
[ 723.598987] Code: 3c 24 e8 f8 3b da 00 e9 ac fb ff ff 48 b8 00 00 00 c0 7f 00 00 00 48 89 44 24 28 e9 06 f9 ff ff b9 0c 00 00 00 e9 4b ff ff ff <0f> 0b 0f 0b e9 a0 f8 ff ff 0f 0b 65 48 8b 05 3a 6d 02 03 48 8b 10
[ 723.600315] RSP: 0018:ffffd1abca5ffa98 EFLAGS: 00010206
[ 723.600612] RAX: 0000000000400000 RBX: ffff888d09f93500 RCX: 0000000000000009
[ 723.601376] RDX: 00000000001fffff RSI: ffff888d09f93500 RDI: ffffd1abca5ffbf0
[ 723.602161] RBP: 0000000000000000 R08: ffffffff868d5ed8 R09: 0000000000000003
[ 723.602945] R10: 00007f48b7603000 R11: 00007f48b7803000 R12: 00007f48b7603000
[ 723.603720] R13: ffffd1abca5ffbf0 R14: ffffd1abca5ffbf0 R15: ffff888d09f93500
[ 723.604465] FS:  00007f48b878e840(0000) GS:ffff888f70297000(0000) knlGS:0000000000000000
[ 723.604906] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 723.605592] CR2: 00007f48b85b9d90 CR3: 0000000646365000 CR4: 00000000000406f0
[ 723.606407] Call Trace:
[ 723.606599] <TASK>
[ 723.607096] ? unmap_page_range+0x1a4/0x260
[ 723.607363] unmap_vmas+0xa1/0x180
[ 723.607924] exit_mmap+0xe5/0x3c0
[ 723.608484] __mmput+0x41/0x140
[ 723.609052] exit_mm+0xb1/0x110
[ 723.609623] do_exit+0x19a/0x440
[ 723.610195] do_group_exit+0x2d/0xc0
[ 723.610440] __x64_sys_exit_group+0x18/0x20
[ 723.610665] x64_sys_call+0xfdb/0x14f0
[ 723.610911] do_syscall_64+0x82/0x2c0
[ 723.611148] ? do_read_fault+0x107/0x260
[ 723.611399] ? handle_pte_fault+0x118/0x240
[ 723.611623] ? do_fault+0x150/0x210
[ 723.612182] ? __handle_mm_fault+0x3a7/0x700
[ 723.612872] ? count_memcg_events+0x15d/0x230
[ 723.613521] ? handle_mm_fault+0x248/0x360
[ ...] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 723.713877] RIP: 0033:0x7f48b85b9da8
[ 723.714116] Code: Unable to access opcode bytes at 0x7f48b85b9d7e.
[ 723.714835] RSP: 002b:00007ffc2853b238 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
[ 723.715633] RAX: ffffffffffffffda RBX: 00007f48b86e4fe8 RCX: 00007f48b85b9da8
[ 723.716442] RDX: 00007f48b878eb48 RSI: ffffffffffffff78 RDI: 0000000000000001
[ 723.717218] RBP: 00007ffc2853b290 R08: 0000000000000000 R09: 0000000000000000
[ 723.718029] R10: 00007ffc2853b020 R11: 0000000000000202 R12: 0000000000000001
[ 723.718836] R13: 0000000000000001 R14: 00007f48b86e3680 R15: 00007f48b86e5000
[ 723.719613] </TASK>
5863 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
5864 		unsigned long start, unsigned long end,
5865 		struct folio *folio, zap_flags_t zap_flags)
5866 {
5867 	struct mm_struct *mm = vma->vm_mm;
5868 	const bool folio_provided = !!folio;
5869 	unsigned long address;
5870 	pte_t *ptep;
5871 	pte_t pte;
5872 	spinlock_t *ptl;
5873 	struct hstate *h = hstate_vma(vma);
5874 	unsigned long sz = huge_page_size(h);
5875 	bool adjust_reservation = false;
5876 	unsigned long last_addr_mask;
5877 	bool force_flush = false;
5878
5879 	WARN_ON(!is_vm_hugetlb_page(vma));
5880 	BUG_ON(start & ~huge_page_mask(h));
On Mon, Jun 16, 2025 at 7:45 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> Hi
> CKI triggered the following issue[2] with the block/for-next commit[1],
> please help check it, thanks.
>
> [1]
> commit: 1cbac730bb6b Merge branch 'block-6.16' into for-next
>
> [2]
> [ 1207.436193] run blktests nvme/029 at 2025-06-13 16:11:12
> [ 1207.476177] loop0: detected capacity change from 0 to 2097152
> [ 1207.488130] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [ 1207.506313] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [ 1207.556941] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [ 1207.560824] nvme nvme0: creating 32 I/O queues.
> [ 1207.561919] nvme nvme0: failed to connect socket: -512
> [ 1207.569392] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [ 1207.572517] nvme nvme0: creating 32 I/O queues.
> [ 1207.580131] nvme nvme0: mapped 32/0/0 default/read/poll queues.
> [ 1207.599121] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [ 1207.916342] ------------[ cut here ]------------
> [ 1207.917026] kernel BUG at mm/hugetlb.c:5880!
> [ 1207.917801] Oops: invalid opcode: 0000 [#1] SMP NOPTI
> [ 1207.918227] CPU: 18 UID: 0 PID: 47801 Comm: nvme Not tainted 6.16.0-rc1 #1 PREEMPT(lazy)
> [ 1207.918683] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 03/14/2018
> [ 1207.919300] RIP: 0010:__unmap_hugepage_range+0x7a4/0x7f0
> [ 1207.919611] Code: 89 ef 48 89 c6 e8 2c 90 ff ff 48 8b 3c 24 e8 73 c3 d9 00 e9 bf fb ff ff 0f 0b 49 8b 50 30 48 f7 d2 4c 85 e2 0f 84 e3 f8 ff ff <0f> 0b 0f 0b 65 48 8b 05 28 a2 0c 03 48 8b 10 f7 c2 00 00 00 20 74
> [ 1207.920942] RSP: 0018:ffffcd058ced7ae0 EFLAGS: 00010206
> [ 1207.922070] RAX: 0000000000400000 RBX: 0000000000000000 RCX: 0000000000000009
> [ 1207.922850] RDX: 00000000001fffff RSI: ffff8cb38c141c80 RDI: ffffcd058ced7c48
> [ 1207.923618] RBP: ffffffffffffffff R08: ffffffffacb56e98 R09: 0000000000200000
> [ 1207.924421] R10: 00007f6120006000 R11: ffff8cb4fb586000 R12: 00007f611fe06000
> [ 1207.925221] R13: ffffcd058ced7c48 R14: ffff8cb38c141c80 R15: ffffcd058ced7c00
> [ 1207.925639] FS:  00007f61210db840(0000) GS:ffff8cb70b096000(0000) knlGS:0000000000000000
> [ 1207.926357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1207.927164] CR2: 00007f6120f4d710 CR3: 000000025c3ba000 CR4: 00000000000406f0
> [ 1207.927164] Call Trace:
> [ 1207.927319] <TASK>
> [ 1207.927833] unmap_vmas+0xa6/0x180
> [ 1207.928565] exit_mmap+0xf0/0x3b0
> [ 1207.929175] __mmput+0x3e/0x130
> [ 1207.929790] exit_mm+0xaf/0x110
> [ 1207.930457] do_exit+0x1a5/0x450
> [ 1207.931054] do_group_exit+0x30/0x80
> [ 1207.931287] __x64_sys_exit_group+0x18/0x20
> [ 1207.931504] x64_sys_call+0xfdb/0x14f0
> [ 1207.931751] do_syscall_64+0x84/0x2c0
> [ 1207.931977] ? count_memcg_events+0x167/0x1d0
> [ 1207.932623] ? handle_mm_fault+0x220/0x340
> [ 1207.932879] ? do_user_addr_fault+0x2c3/0x7f0
> [ 1207.933528] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1207.933842] RIP: 0033:0x7f6120f4d728
> [ 1207.934079] Code: Unable to access opcode bytes at 0x7f6120f4d6fe.
> [ 1207.934818] RSP: 002b:00007ffe4c80b528 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
> [ 1207.935606] RAX: ffffffffffffffda RBX: 00007f6121076fc8 RCX: 00007f6120f4d728
> [ 1207.936392] RDX: 00007f61210dbb48 RSI: ffffffffffffff90 RDI: 0000000000000001
> [ 1207.937391] RBP: 00007ffe4c80b580 R08: 0000000000000000 R09: 0000000000000000
> [ 1207.938510] R10: 00007ffe4c80b310 R11: 0000000000000206 R12: 0000000000000001
> [ 1207.939326] R13: 0000000000000001 R14: 00007f6121075680 R15: 00007f6121076fe0
> [ 1207.940102] </TASK>
> [ 1207.940260] Modules linked in: nvmet_tcp nvmet nvme_tcp
> nvme_fabrics nvme nvme_core nvme_keyring nvme_auth nbd pktcdvd rfkill
> sunrpc amd64_edac edac_mce_amd ipmi_ssif kvm tg3 i2c_piix4
> fam15h_power acpi_power_meter k10temp hpilo pcspkr ipmi_si i2c_smbus
> irqbypass acpi_ipmi acpi_cpufreq ipmi_devintf ipmi_msghandler fuse
> loop nfnetlink zram lz4hc_compress lz4_compress xfs polyval_clmulni
> ata_generic ghash_clmulni_intel pata_acpi sha512_ssse3 hpsa mgag200
> serio_raw sha1_ssse3 pata_atiixp sp5100_tco scsi_transport_sas hpwdt
> i2c_algo_bit [last unloaded: nvmet]
> [ ...] R10: 00007f6120006000 R11: ffff8cb4fb586000 R12: 00007f611fe06000
> [ 1208.443808] R13: ffffcd058ced7c48 R14: ffff8cb38c141c80 R15: ffffcd058ced7c00
> [ 1208.444605] FS:  00007f61210db840(0000) GS:ffff8cb70b016000(0000) knlGS:0000000000000000
> [ 1208.445062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1208.445777] CR2: 0000562c0be4a000 CR3: 000000025c3ba000 CR4: 00000000000406f0
> [ 1208.446610] Kernel panic - not syncing: Fatal exception
> [ 1208.447172] Kernel Offset: 0x28200000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1208.451887] ERST: [Firmware Warn]: Firmware does not respond in time.
> [ 1208.484105] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> --
> Best Regards,
> Yi Zhang
>
--
Best Regards,
Yi Zhang