From: Yi Zhang <yi.zhang@redhat.com>
To: linux-block <linux-block@vger.kernel.org>,
"open list:NVM EXPRESS DRIVER" <linux-nvme@lists.infradead.org>,
linux-mm@kvack.org
Cc: Daniel Wagner <dwagner@suse.de>, Ming Lei <ming.lei@redhat.com>,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [bug report] kernel BUG at mm/hugetlb.c:5880! triggered by blktests nvme/029
Date: Mon, 23 Jun 2025 12:35:57 +0800 [thread overview]
Message-ID: <CAHj4cs8VJxeDQtjHvRN+unBrhzU+-vAweUz+eRY3wmhS9LM1fQ@mail.gmail.com> (raw)
In-Reply-To: <CAHj4cs92q3Lc8f=mEZ-e9piZtLj62eJ2Z5iSO-wJuRepspkbsA@mail.gmail.com>
The issue can still be reproduced on the latest Linux tree; it occurs when
running "test_user_io /dev/nvme0n1 511 1024" in blktests nvme/029. Adding
linux-mm@ since the BUG fires on the hugetlb side. Here is the log:
+ test_user_io /dev/nvme0n1 511 1024
+ local disk=/dev/nvme0n1
+ local start=511
+ local cnt=1024
+ local bs size img img1
++ blockdev --getss /dev/nvme0n1
+ bs=512
+ size=524288
++ mktemp /tmp/blk_img_XXXXXX
+ img=/tmp/blk_img_4aWO9O
++ mktemp /tmp/blk_img_XXXXXX
+ img1=/tmp/blk_img_mFMZKv
+ dd if=/dev/urandom of=/tmp/blk_img_4aWO9O bs=512 count=1024 status=none
+ (( cnt-- ))
+ nvme write --start-block=511 --block-count=1023 --data-size=524288 --data=/tmp/blk_img_4aWO9O /dev/nvme0n1
failed to read data buffer from input file Bad address
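For reference, the arguments of the failing nvme write follow directly from the test's inputs; a quick shell sketch of the arithmetic, using the values from the trace above:

```shell
# Derive the nvme-write parameters used by blktests nvme/029 above.
bs=512        # logical block size, from `blockdev --getss /dev/nvme0n1`
start=511     # deliberately odd start LBA passed to test_user_io
cnt=1024      # number of blocks written

size=$((bs * cnt))      # 524288 bytes, the --data-size argument
blocks=$((cnt - 1))     # 1023, the zero-based --block-count argument
echo "size=$size blocks=$blocks"
```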
[ 508.364146] loop0: detected capacity change from 0 to 2097152
[ 516.268535] Key type psk registered
[ 522.033843] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 533.542882] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 533.545106] nvme nvme0: creating 32 I/O queues.
[ 533.552928] nvme nvme0: new ctrl: "blktests-subsystem-1"
[ 723.596113] ------------[ cut here ]------------
[ 723.596779] kernel BUG at mm/hugetlb.c:5880!
[ 723.597481] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 723.597851] CPU: 25 UID: 0 PID: 1637 Comm: nvme Not tainted 6.16.0-rc2+ #26 PREEMPT(undef)
[ 723.598334] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 03/14/2018
[ 723.598688] RIP: 0010:__unmap_hugepage_range+0x7bb/0x840
[ 723.598987] Code: 3c 24 e8 f8 3b da 00 e9 ac fb ff ff 48 b8 00 00 00 c0 7f 00 00 00 48 89 44 24 28 e9 06 f9 ff ff b9 0c 00 00 00 e9 4b ff ff ff <0f> 0b 0f 0b e9 a0 f8 ff ff 0f 0b 65 48 8b 05 3a 6d 02 03 48 8b 10
[ 723.600315] RSP: 0018:ffffd1abca5ffa98 EFLAGS: 00010206
[ 723.600612] RAX: 0000000000400000 RBX: ffff888d09f93500 RCX: 0000000000000009
[ 723.601376] RDX: 00000000001fffff RSI: ffff888d09f93500 RDI: ffffd1abca5ffbf0
[ 723.602161] RBP: 0000000000000000 R08: ffffffff868d5ed8 R09: 0000000000000003
[ 723.602945] R10: 00007f48b7603000 R11: 00007f48b7803000 R12: 00007f48b7603000
[ 723.603720] R13: ffffd1abca5ffbf0 R14: ffffd1abca5ffbf0 R15: ffff888d09f93500
[ 723.604465] FS:  00007f48b878e840(0000) GS:ffff888f70297000(0000) knlGS:0000000000000000
[ 723.604906] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 723.605592] CR2: 00007f48b85b9d90 CR3: 0000000646365000 CR4: 00000000000406f0
[ 723.606407] Call Trace:
[ 723.606599] <TASK>
[ 723.607096] ? unmap_page_range+0x1a4/0x260
[ 723.607363] unmap_vmas+0xa1/0x180
[ 723.607924] exit_mmap+0xe5/0x3c0
[ 723.608484] __mmput+0x41/0x140
[ 723.609052] exit_mm+0xb1/0x110
[ 723.609623] do_exit+0x19a/0x440
[ 723.610195] do_group_exit+0x2d/0xc0
[ 723.610440] __x64_sys_exit_group+0x18/0x20
[ 723.610665] x64_sys_call+0xfdb/0x14f0
[ 723.610911] do_syscall_64+0x82/0x2c0
[ 723.611148] ? do_read_fault+0x107/0x260
[ 723.611399] ? handle_pte_fault+0x118/0x240
[ 723.611623] ? do_fault+0x150/0x210
[ 723.612182] ? __handle_mm_fault+0x3a7/0x700
[ 723.612872] ? count_memcg_events+0x15d/0x230
[ 723.613521] ? handle_mm_fault+0x248/0x360
[ ...] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 723.713877] RIP: 0033:0x7f48b85b9da8
[ 723.714116] Code: Unable to access opcode bytes at 0x7f48b85b9d7e.
[ 723.714835] RSP: 002b:00007ffc2853b238 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
[ 723.715633] RAX: ffffffffffffffda RBX: 00007f48b86e4fe8 RCX: 00007f48b85b9da8
[ 723.716442] RDX: 00007f48b878eb48 RSI: ffffffffffffff78 RDI: 0000000000000001
[ 723.717218] RBP: 00007ffc2853b290 R08: 0000000000000000 R09: 0000000000000000
[ 723.718029] R10: 00007ffc2853b020 R11: 0000000000000202 R12: 0000000000000001
[ 723.718836] R13: 0000000000000001 R14: 00007f48b86e3680 R15: 00007f48b86e5000
[ 723.719613] </TASK>
5863 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
5864 		unsigned long start, unsigned long end,
5865 		struct folio *folio, zap_flags_t zap_flags)
5866 {
5867 	struct mm_struct *mm = vma->vm_mm;
5868 	const bool folio_provided = !!folio;
5869 	unsigned long address;
5870 	pte_t *ptep;
5871 	pte_t pte;
5872 	spinlock_t *ptl;
5873 	struct hstate *h = hstate_vma(vma);
5874 	unsigned long sz = huge_page_size(h);
5875 	bool adjust_reservation = false;
5876 	unsigned long last_addr_mask;
5877 	bool force_flush = false;
5878
5879 	WARN_ON(!is_vm_hugetlb_page(vma));
5880 	BUG_ON(start & ~huge_page_mask(h));
On Mon, Jun 16, 2025 at 7:45 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> Hi
> CKI triggered the following issue[2] with the block/for-next commit[1],
> please help check it, thanks.
>
> [1]
> commit: 1cbac730bb6b Merge branch 'block-6.16' into for-next
>
> [2]
> [ 1207.436193] run blktests nvme/029 at 2025-06-13 16:11:12
> [ 1207.476177] loop0: detected capacity change from 0 to 2097152
> [ 1207.488130] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [ 1207.506313] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [ 1207.556941] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [ 1207.560824] nvme nvme0: creating 32 I/O queues.
> [ 1207.561919] nvme nvme0: failed to connect socket: -512
> [ 1207.569392] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [ 1207.572517] nvme nvme0: creating 32 I/O queues.
> [ 1207.580131] nvme nvme0: mapped 32/0/0 default/read/poll queues.
> [ 1207.599121] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [ 1207.916342] ------------[ cut here ]------------
> [ 1207.917026] kernel BUG at mm/hugetlb.c:5880!
> [ 1207.917801] Oops: invalid opcode: 0000 [#1] SMP NOPTI
> [ 1207.918227] CPU: 18 UID: 0 PID: 47801 Comm: nvme Not tainted 6.16.0-rc1 #1 PREEMPT(lazy)
> [ 1207.918683] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 03/14/2018
> [ 1207.919300] RIP: 0010:__unmap_hugepage_range+0x7a4/0x7f0
> [ 1207.919611] Code: 89 ef 48 89 c6 e8 2c 90 ff ff 48 8b 3c 24 e8 73 c3 d9 00 e9 bf fb ff ff 0f 0b 49 8b 50 30 48 f7 d2 4c 85 e2 0f 84 e3 f8 ff ff <0f> 0b 0f 0b 65 48 8b 05 28 a2 0c 03 48 8b 10 f7 c2 00 00 00 20 74
> [ 1207.920942] RSP: 0018:ffffcd058ced7ae0 EFLAGS: 00010206
> [ 1207.922070] RAX: 0000000000400000 RBX: 0000000000000000 RCX: 0000000000000009
> [ 1207.922850] RDX: 00000000001fffff RSI: ffff8cb38c141c80 RDI: ffffcd058ced7c48
> [ 1207.923618] RBP: ffffffffffffffff R08: ffffffffacb56e98 R09: 0000000000200000
> [ 1207.924421] R10: 00007f6120006000 R11: ffff8cb4fb586000 R12: 00007f611fe06000
> [ 1207.925221] R13: ffffcd058ced7c48 R14: ffff8cb38c141c80 R15: ffffcd058ced7c00
> [ 1207.925639] FS:  00007f61210db840(0000) GS:ffff8cb70b096000(0000) knlGS:0000000000000000
> [ 1207.926357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1207.927164] CR2: 00007f6120f4d710 CR3: 000000025c3ba000 CR4: 00000000000406f0
> [ 1207.927164] Call Trace:
> [ 1207.927319] <TASK>
> [ 1207.927833] unmap_vmas+0xa6/0x180
> [ 1207.928565] exit_mmap+0xf0/0x3b0
> [ 1207.929175] __mmput+0x3e/0x130
> [ 1207.929790] exit_mm+0xaf/0x110
> [ 1207.930457] do_exit+0x1a5/0x450
> [ 1207.931054] do_group_exit+0x30/0x80
> [ 1207.931287] __x64_sys_exit_group+0x18/0x20
> [ 1207.931504] x64_sys_call+0xfdb/0x14f0
> [ 1207.931751] do_syscall_64+0x84/0x2c0
> [ 1207.931977] ? count_memcg_events+0x167/0x1d0
> [ 1207.932623] ? handle_mm_fault+0x220/0x340
> [ 1207.932879] ? do_user_addr_fault+0x2c3/0x7f0
> [ 1207.933528] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1207.933842] RIP: 0033:0x7f6120f4d728
> [ 1207.934079] Code: Unable to access opcode bytes at 0x7f6120f4d6fe.
> [ 1207.934818] RSP: 002b:00007ffe4c80b528 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
> [ 1207.935606] RAX: ffffffffffffffda RBX: 00007f6121076fc8 RCX: 00007f6120f4d728
> [ 1207.936392] RDX: 00007f61210dbb48 RSI: ffffffffffffff90 RDI: 0000000000000001
> [ 1207.937391] RBP: 00007ffe4c80b580 R08: 0000000000000000 R09: 0000000000000000
> [ 1207.938510] R10: 00007ffe4c80b310 R11: 0000000000000206 R12: 0000000000000001
> [ 1207.939326] R13: 0000000000000001 R14: 00007f6121075680 R15: 00007f6121076fe0
> [ 1207.940102] </TASK>
> [ 1207.940260] Modules linked in: nvmet_tcp nvmet nvme_tcp
> nvme_fabrics nvme nvme_core nvme_keyring nvme_auth nbd pktcdvd rfkill
> sunrpc amd64_edac edac_mce_amd ipmi_ssif kvm tg3 i2c_piix4
> fam15h_power acpi_power_meter k10temp hpilo pcspkr ipmi_si i2c_smbus
> irqbypass acpi_ipmi acpi_cpufreq ipmi_devintf ipmi_msghandler fuse
> loop nfnetlink zram lz4hc_compress lz4_compress xfs polyval_clmulni
> ata_generic ghash_clmulni_intel pata_acpi sha512_ssse3 hpsa mgag200
> serio_raw sha1_ssse3 pata_atiixp sp5100_tco scsi_transport_sas hpwdt
> i2c_algo_bit [last unloaded: nvmet]
> [ ...] R10: 00007f6120006000 R11: ffff8cb4fb586000 R12: 00007f611fe06000
> [ 1208.443808] R13: ffffcd058ced7c48 R14: ffff8cb38c141c80 R15: ffffcd058ced7c00
> [ 1208.444605] FS:  00007f61210db840(0000) GS:ffff8cb70b016000(0000) knlGS:0000000000000000
> [ 1208.445062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1208.445777] CR2: 0000562c0be4a000 CR3: 000000025c3ba000 CR4: 00000000000406f0
> [ 1208.446610] Kernel panic - not syncing: Fatal exception
> [ 1208.447172] Kernel Offset: 0x28200000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1208.451887] ERST: [Firmware Warn]: Firmware does not respond in time.
> [ 1208.484105] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> --
> Best Regards,
> Yi Zhang
>
--
Best Regards,
Yi Zhang