* [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps
@ 2024-08-25 23:26 Forest
2024-08-27 3:32 ` Steve French
0 siblings, 1 reply; 6+ messages in thread
From: Forest @ 2024-08-25 23:26 UTC (permalink / raw)
To: David Howells, Steve French, Shyam Prasad N, Rohith Surabattula,
Jeff Layton
Cc: linux-cifs, netfs, linux-fsdevel, linux-mm, regressions
#regzbot introduced: e3786b29c54c
Dear maintainers,
Recent kernel release candidates have a cifs regression that produces
unexpected errors in userspace and a WARNING (with stack trace) in dmesg.
I can consistently reproduce it with these commands on a mounted Samba
share:
$ echo hello > foo
$ ls -l foo
-rw-r----- 1 user user 6 Aug 25 15:41 foo
$ cat foo
cat: foo: Invalid argument
$ xxd foo
00000000: 6865 6c6c 6f0a 0000 0000 0000 0000 0000 hello...........
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
(...more null bytes...)
00001fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
xxd: No data available
The xxd command above also triggers these kernel log messages...
Subreq overread: R3[1] 312 > 8192 - 7956
WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495
...followed by the usual module list and stack trace.
The bug is present in kernel v6.11-rc5, but not the 6.10 series.
Git bisect says:
e3786b29c54cdae3490b07180a54e2461f42144c is the first bad commit
commit e3786b29c54cdae3490b07180a54e2461f42144c
Author: Dominique Martinet <asmadeus@codewreck.org>
Date: Thu Aug 8 14:29:38 2024 +0100
9p: Fix DIO read through netfs
Here's the full dmesg output when I run xxd on kernel v6.11-rc5:
[ 48.137018] ------------[ cut here ]------------
[ 48.137021] Subreq overread: R3[1] 312 > 8192 - 7956
[ 48.137029] WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495 netfs_subreq_terminated+0x276/0x2d0 [netfs]
[ 48.137046] Modules linked in: rfcomm algif_hash algif_skcipher af_alg cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs nft_masq nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink nvme_fabrics essiv authenc crypto_null snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr zstd
zram bnep binfmt_misc nls_ascii nls_cp437 vfat fat mt7921e snd_hda_codec_realtek mt7921_common snd_hda_codec_generic mt792x_lib snd_hda_scodec_component mt76_connac_lib
snd_hda_codec_hdmi mt76 btusb snd_hda_intel amd_atl btrtl intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi btintel amd64_edac edac_mce_amd mac80211 btbcm
snd_hda_codec asus_nb_wmi eeepc_wmi btmtk asus_wmi kvm_amd snd_hda_core sparse_keymap bluetooth libarc4 snd_hwdep platform_profile kvm cfg80211 snd_pcm battery wmi_bmof rapl
snd_timer sp5100_tco ccp pcspkr watchdog snd k10temp rfkill soundcore joydev sg evdev nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport loop efi_pstore
[ 48.137103] configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 btrfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod hid_generic usbhid hid amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_display_helper
sd_mod cec crct10dif_pclmul rc_core crc32_pclmul xhci_pci drm_ttm_helper crc32c_intel ttm ahci xhci_hcd drm_kms_helper libahci r8169 ghash_clmulni_intel libata sha512_ssse3
realtek nvme mdio_devres sha256_ssse3 drm usbcore scsi_mod sha1_ssse3 i2c_piix4 libphy video nvme_core i2c_smbus usb_common scsi_common crc16 wmi gpio_amdpt gpio_generic button
aesni_intel gf128mul crypto_simd cryptd
[ 48.137148] CPU: 30 UID: 0 PID: 421 Comm: kworker/30:1 Not tainted 6.11.0-rc5 #3
[ 48.137150] Hardware name: ASUS System XXXXXXXXXX
[ 48.137151] Workqueue: cifsiod smb2_readv_worker [cifs]
[ 48.137176] RIP: 0010:netfs_subreq_terminated+0x276/0x2d0 [netfs]
[ 48.137182] Code: 66 ff ff ff 0f 1f 44 00 00 e9 5c ff ff ff 48 89 f1 0f b7 93 86 00 00 00 8b b5 ac 01 00 00 48 c7 c7 78 81 7a c2 e8 ba 68 2f da <0f> 0b 48 8b 43 70 31 d2 4c
8d ab 98 00 00 00 66 89 93 84 00 00 00
[ 48.137183] RSP: 0018:ffffad8942637e58 EFLAGS: 00010282
[ 48.137185] RAX: 0000000000000000 RBX: ffff9d09639a7200 RCX: 0000000000000027
[ 48.137186] RDX: ffff9d107e721788 RSI: 0000000000000001 RDI: ffff9d107e721780
[ 48.137187] RBP: ffff9d094bf38a00 R08: 0000000000000000 R09: 0000000000000003
[ 48.137187] R10: ffffad8942637ce8 R11: ffff9d109de3cfe8 R12: 0000000000000001
[ 48.137188] R13: ffff9d095fcf6000 R14: ffff9d09639a7208 R15: 0000000000000000
[ 48.137189] FS: 0000000000000000(0000) GS:ffff9d107e700000(0000) knlGS:0000000000000000
[ 48.137190] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.137191] CR2: 000055d1a3402760 CR3: 00000006b0222000 CR4: 0000000000750ef0
[ 48.137192] PKRU: 55555554
[ 48.137193] Call Trace:
[ 48.137194] <TASK>
[ 48.137197] ? __warn+0x80/0x120
[ 48.137201] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
[ 48.137207] ? report_bug+0x164/0x190
[ 48.137210] ? prb_read_valid+0x1b/0x30
[ 48.137213] ? handle_bug+0x41/0x70
[ 48.137215] ? exc_invalid_op+0x17/0x70
[ 48.137216] ? asm_exc_invalid_op+0x1a/0x20
[ 48.137220] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
[ 48.137225] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
[ 48.137230] process_one_work+0x179/0x390
[ 48.137233] worker_thread+0x249/0x350
[ 48.137235] ? __pfx_worker_thread+0x10/0x10
[ 48.137237] kthread+0xcf/0x100
[ 48.137240] ? __pfx_kthread+0x10/0x10
[ 48.137242] ret_from_fork+0x31/0x50
[ 48.137244] ? __pfx_kthread+0x10/0x10
[ 48.137246] ret_from_fork_asm+0x1a/0x30
[ 48.137250] </TASK>
[ 48.137251] ---[ end trace 0000000000000000 ]---
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps
2024-08-25 23:26 [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps Forest
@ 2024-08-27 3:32 ` Steve French
2024-08-27 4:27 ` Steve French
2024-08-27 18:15 ` Forest
0 siblings, 2 replies; 6+ messages in thread
From: Steve French @ 2024-08-27 3:32 UTC (permalink / raw)
To: Forest
Cc: David Howells, Steve French, Shyam Prasad N, Rohith Surabattula,
Jeff Layton, linux-cifs, netfs, linux-fsdevel, linux-mm,
regressions
Yes - thanks for reporting this. I can reproduce this (although
slightly different error on "cat foo" with 6.11-rc5 vs. later patches
in for-next - I get ""No data available" now with some of David's
additional patches instead of "Invalid argument" which you get on
6.11-rc5).
This problem looks related to something that David has discussed
earlier - on the wire (with 6.11-rc5) we see SMB3 READ with the server
returning "STATUS_INVALID_PARAMETER" - there are actually two reads
in a row that are similar (1MB at offset 0), the first succeeds, the
second request has a "credit charge" of 1 (instead of 16 which is
what I would have expected) which is likely related to the cause of
the problem (and the second read fails).
It looks like some of this is fixed with David's patch:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=netfs-fixes&id=78d0d91398ad7bc37e73cdda65602ac8b6d675bb
0014-cifs-Fix-lack-of-credit-renegotiation-on-read-retry.patch
It is possible that the second part of this is fixed with:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=netfs-fixes&id=e81de4d9db1c25dc3e9feb21548fc5f3e2b3ad8e
Is there an easy way for you to retry with those two patches?
On Sun, Aug 25, 2024 at 6:26 PM Forest <forestix@nom.one> wrote:
>
> #regzbot introduced: e3786b29c54c
>
> Dear maintainers,
>
> Recent kernel release candidates have a cifs regression that produces
> unexpected errors in userspace and a WARNING (with stack trace) in dmesg.
>
> I can consistently reproduce it with these commands on a mounted Samba
> share:
>
>
> $ echo hello > foo
> $ ls -l foo
> -rw-r----- 1 user user 6 Aug 25 15:41 foo
> $ cat foo
> cat: foo: Invalid argument
> $ xxd foo
> 00000000: 6865 6c6c 6f0a 0000 0000 0000 0000 0000 hello...........
> 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> (...more null bytes...)
> 00001fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 00001ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> xxd: No data available
>
>
> The xxd command above also triggers these kernel log messages...
>
> Subreq overread: R3[1] 312 > 8192 - 7956
> WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495
>
> ...followed by the usual module list and stack trace.
>
>
> The bug is present in kernel v6.11-rc5, but not the 6.10 series.
>
> Git bisect says:
>
> e3786b29c54cdae3490b07180a54e2461f42144c is the first bad commit
> commit e3786b29c54cdae3490b07180a54e2461f42144c
> Author: Dominique Martinet <asmadeus@codewreck.org>
> Date: Thu Aug 8 14:29:38 2024 +0100
> 9p: Fix DIO read through netfs
>
>
> Here's the full dmesg output when I run xxd on kernel v6.11-rc5:
>
> [ 48.137018] ------------[ cut here ]------------
> [ 48.137021] Subreq overread: R3[1] 312 > 8192 - 7956
> [ 48.137029] WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495 netfs_subreq_terminated+0x276/0x2d0 [netfs]
> [ 48.137046] Modules linked in: rfcomm algif_hash algif_skcipher af_alg cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs nft_masq nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink nvme_fabrics essiv authenc crypto_null snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr zstd
> zram bnep binfmt_misc nls_ascii nls_cp437 vfat fat mt7921e snd_hda_codec_realtek mt7921_common snd_hda_codec_generic mt792x_lib snd_hda_scodec_component mt76_connac_lib
> snd_hda_codec_hdmi mt76 btusb snd_hda_intel amd_atl btrtl intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi btintel amd64_edac edac_mce_amd mac80211 btbcm
> snd_hda_codec asus_nb_wmi eeepc_wmi btmtk asus_wmi kvm_amd snd_hda_core sparse_keymap bluetooth libarc4 snd_hwdep platform_profile kvm cfg80211 snd_pcm battery wmi_bmof rapl
> snd_timer sp5100_tco ccp pcspkr watchdog snd k10temp rfkill soundcore joydev sg evdev nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport loop efi_pstore
> [ 48.137103] configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 btrfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod hid_generic usbhid hid amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_display_helper
> sd_mod cec crct10dif_pclmul rc_core crc32_pclmul xhci_pci drm_ttm_helper crc32c_intel ttm ahci xhci_hcd drm_kms_helper libahci r8169 ghash_clmulni_intel libata sha512_ssse3
> realtek nvme mdio_devres sha256_ssse3 drm usbcore scsi_mod sha1_ssse3 i2c_piix4 libphy video nvme_core i2c_smbus usb_common scsi_common crc16 wmi gpio_amdpt gpio_generic button
> aesni_intel gf128mul crypto_simd cryptd
> [ 48.137148] CPU: 30 UID: 0 PID: 421 Comm: kworker/30:1 Not tainted 6.11.0-rc5 #3
> [ 48.137150] Hardware name: ASUS System XXXXXXXXXX
> [ 48.137151] Workqueue: cifsiod smb2_readv_worker [cifs]
> [ 48.137176] RIP: 0010:netfs_subreq_terminated+0x276/0x2d0 [netfs]
> [ 48.137182] Code: 66 ff ff ff 0f 1f 44 00 00 e9 5c ff ff ff 48 89 f1 0f b7 93 86 00 00 00 8b b5 ac 01 00 00 48 c7 c7 78 81 7a c2 e8 ba 68 2f da <0f> 0b 48 8b 43 70 31 d2 4c
> 8d ab 98 00 00 00 66 89 93 84 00 00 00
> [ 48.137183] RSP: 0018:ffffad8942637e58 EFLAGS: 00010282
> [ 48.137185] RAX: 0000000000000000 RBX: ffff9d09639a7200 RCX: 0000000000000027
> [ 48.137186] RDX: ffff9d107e721788 RSI: 0000000000000001 RDI: ffff9d107e721780
> [ 48.137187] RBP: ffff9d094bf38a00 R08: 0000000000000000 R09: 0000000000000003
> [ 48.137187] R10: ffffad8942637ce8 R11: ffff9d109de3cfe8 R12: 0000000000000001
> [ 48.137188] R13: ffff9d095fcf6000 R14: ffff9d09639a7208 R15: 0000000000000000
> [ 48.137189] FS: 0000000000000000(0000) GS:ffff9d107e700000(0000) knlGS:0000000000000000
> [ 48.137190] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 48.137191] CR2: 000055d1a3402760 CR3: 00000006b0222000 CR4: 0000000000750ef0
> [ 48.137192] PKRU: 55555554
> [ 48.137193] Call Trace:
> [ 48.137194] <TASK>
> [ 48.137197] ? __warn+0x80/0x120
> [ 48.137201] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
> [ 48.137207] ? report_bug+0x164/0x190
> [ 48.137210] ? prb_read_valid+0x1b/0x30
> [ 48.137213] ? handle_bug+0x41/0x70
> [ 48.137215] ? exc_invalid_op+0x17/0x70
> [ 48.137216] ? asm_exc_invalid_op+0x1a/0x20
> [ 48.137220] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
> [ 48.137225] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
> [ 48.137230] process_one_work+0x179/0x390
> [ 48.137233] worker_thread+0x249/0x350
> [ 48.137235] ? __pfx_worker_thread+0x10/0x10
> [ 48.137237] kthread+0xcf/0x100
> [ 48.137240] ? __pfx_kthread+0x10/0x10
> [ 48.137242] ret_from_fork+0x31/0x50
> [ 48.137244] ? __pfx_kthread+0x10/0x10
> [ 48.137246] ret_from_fork_asm+0x1a/0x30
> [ 48.137250] </TASK>
> [ 48.137251] ---[ end trace 0000000000000000 ]---
>
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps
2024-08-27 3:32 ` Steve French
@ 2024-08-27 4:27 ` Steve French
2024-08-27 4:38 ` Dominique Martinet
2024-08-27 18:15 ` Forest
1 sibling, 1 reply; 6+ messages in thread
From: Steve French @ 2024-08-27 4:27 UTC (permalink / raw)
To: Forest
Cc: David Howells, Steve French, Shyam Prasad N, Rohith Surabattula,
Jeff Layton, linux-cifs, netfs, linux-fsdevel, linux-mm,
regressions, Dominique Martinet, Christian Brauner
I have also confirmed your theory that the regressions (there are
multiple) were likely caused by the netfs change added between
6.11-rc3 and 6.11-rc4:
" 9p: Fix DIO read through netfs"
But reverting the cifs.ko part of that patch alters the error but does
not completely fix the problem, so the netfs change is also related
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index f9b302cb8233..f92d437c9e55 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -217,7 +217,7 @@ static void cifs_req_issue_read(struct
netfs_io_subrequest *subreq)
goto out;
}
- if (subreq->rreq->origin != NETFS_DIO_READ)
+/* if (subreq->rreq->origin != NETFS_DIO_READ) */ /* TEST WITH
THIS CHANGE from "9p Fix DIO read through netfs patch */
__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
rc = rdata->server->ops->async_readv(rdata);
<
On Mon, Aug 26, 2024 at 10:32 PM Steve French <smfrench@gmail.com> wrote:
>
> Yes - thanks for reporting this. I can reproduce this (although
> slightly different error on "cat foo" with 6.11-rc5 vs. later patches
> in for-next - I get ""No data available" now with some of David's
> additional patches instead of "Invalid argument" which you get on
> 6.11-rc5).
>
> This problem looks related to something that David has discussed
> earlier - on the wire (with 6.11-rc5) we see SMB3 READ with the server
> returning "STATUS_INVALID_PARAMETER" - there are actually two reads
> in a row that are similar (1MB at offset 0), the first succeeds, the
> second request has a "credit charge" of 1 (instead of 16 which is
> what I would have expected) which is likely related to the cause of
> the problem (and the second read fails).
>
> It looks like some of this is fixed with David's patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=netfs-fixes&id=78d0d91398ad7bc37e73cdda65602ac8b6d675bb
>
> 0014-cifs-Fix-lack-of-credit-renegotiation-on-read-retry.patch
>
> It is possible that the second part of this is fixed with:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=netfs-fixes&id=e81de4d9db1c25dc3e9feb21548fc5f3e2b3ad8e
>
> Is there an easy way for you to retry with those two patches?
>
> On Sun, Aug 25, 2024 at 6:26 PM Forest <forestix@nom.one> wrote:
> >
> > #regzbot introduced: e3786b29c54c
> >
> > Dear maintainers,
> >
> > Recent kernel release candidates have a cifs regression that produces
> > unexpected errors in userspace and a WARNING (with stack trace) in dmesg.
> >
> > I can consistently reproduce it with these commands on a mounted Samba
> > share:
> >
> >
> > $ echo hello > foo
> > $ ls -l foo
> > -rw-r----- 1 user user 6 Aug 25 15:41 foo
> > $ cat foo
> > cat: foo: Invalid argument
> > $ xxd foo
> > 00000000: 6865 6c6c 6f0a 0000 0000 0000 0000 0000 hello...........
> > 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> > (...more null bytes...)
> > 00001fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> > 00001ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> > xxd: No data available
> >
> >
> > The xxd command above also triggers these kernel log messages...
> >
> > Subreq overread: R3[1] 312 > 8192 - 7956
> > WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495
> >
> > ...followed by the usual module list and stack trace.
> >
> >
> > The bug is present in kernel v6.11-rc5, but not the 6.10 series.
> >
> > Git bisect says:
> >
> > e3786b29c54cdae3490b07180a54e2461f42144c is the first bad commit
> > commit e3786b29c54cdae3490b07180a54e2461f42144c
> > Author: Dominique Martinet <asmadeus@codewreck.org>
> > Date: Thu Aug 8 14:29:38 2024 +0100
> > 9p: Fix DIO read through netfs
> >
> >
> > Here's the full dmesg output when I run xxd on kernel v6.11-rc5:
> >
> > [ 48.137018] ------------[ cut here ]------------
> > [ 48.137021] Subreq overread: R3[1] 312 > 8192 - 7956
> > [ 48.137029] WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495 netfs_subreq_terminated+0x276/0x2d0 [netfs]
> > [ 48.137046] Modules linked in: rfcomm algif_hash algif_skcipher af_alg cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs nft_masq nft_chain_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink nvme_fabrics essiv authenc crypto_null snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr zstd
> > zram bnep binfmt_misc nls_ascii nls_cp437 vfat fat mt7921e snd_hda_codec_realtek mt7921_common snd_hda_codec_generic mt792x_lib snd_hda_scodec_component mt76_connac_lib
> > snd_hda_codec_hdmi mt76 btusb snd_hda_intel amd_atl btrtl intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi btintel amd64_edac edac_mce_amd mac80211 btbcm
> > snd_hda_codec asus_nb_wmi eeepc_wmi btmtk asus_wmi kvm_amd snd_hda_core sparse_keymap bluetooth libarc4 snd_hwdep platform_profile kvm cfg80211 snd_pcm battery wmi_bmof rapl
> > snd_timer sp5100_tco ccp pcspkr watchdog snd k10temp rfkill soundcore joydev sg evdev nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport loop efi_pstore
> > [ 48.137103] configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 btrfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> > xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod hid_generic usbhid hid amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_display_helper
> > sd_mod cec crct10dif_pclmul rc_core crc32_pclmul xhci_pci drm_ttm_helper crc32c_intel ttm ahci xhci_hcd drm_kms_helper libahci r8169 ghash_clmulni_intel libata sha512_ssse3
> > realtek nvme mdio_devres sha256_ssse3 drm usbcore scsi_mod sha1_ssse3 i2c_piix4 libphy video nvme_core i2c_smbus usb_common scsi_common crc16 wmi gpio_amdpt gpio_generic button
> > aesni_intel gf128mul crypto_simd cryptd
> > [ 48.137148] CPU: 30 UID: 0 PID: 421 Comm: kworker/30:1 Not tainted 6.11.0-rc5 #3
> > [ 48.137150] Hardware name: ASUS System XXXXXXXXXX
> > [ 48.137151] Workqueue: cifsiod smb2_readv_worker [cifs]
> > [ 48.137176] RIP: 0010:netfs_subreq_terminated+0x276/0x2d0 [netfs]
> > [ 48.137182] Code: 66 ff ff ff 0f 1f 44 00 00 e9 5c ff ff ff 48 89 f1 0f b7 93 86 00 00 00 8b b5 ac 01 00 00 48 c7 c7 78 81 7a c2 e8 ba 68 2f da <0f> 0b 48 8b 43 70 31 d2 4c
> > 8d ab 98 00 00 00 66 89 93 84 00 00 00
> > [ 48.137183] RSP: 0018:ffffad8942637e58 EFLAGS: 00010282
> > [ 48.137185] RAX: 0000000000000000 RBX: ffff9d09639a7200 RCX: 0000000000000027
> > [ 48.137186] RDX: ffff9d107e721788 RSI: 0000000000000001 RDI: ffff9d107e721780
> > [ 48.137187] RBP: ffff9d094bf38a00 R08: 0000000000000000 R09: 0000000000000003
> > [ 48.137187] R10: ffffad8942637ce8 R11: ffff9d109de3cfe8 R12: 0000000000000001
> > [ 48.137188] R13: ffff9d095fcf6000 R14: ffff9d09639a7208 R15: 0000000000000000
> > [ 48.137189] FS: 0000000000000000(0000) GS:ffff9d107e700000(0000) knlGS:0000000000000000
> > [ 48.137190] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 48.137191] CR2: 000055d1a3402760 CR3: 00000006b0222000 CR4: 0000000000750ef0
> > [ 48.137192] PKRU: 55555554
> > [ 48.137193] Call Trace:
> > [ 48.137194] <TASK>
> > [ 48.137197] ? __warn+0x80/0x120
> > [ 48.137201] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
> > [ 48.137207] ? report_bug+0x164/0x190
> > [ 48.137210] ? prb_read_valid+0x1b/0x30
> > [ 48.137213] ? handle_bug+0x41/0x70
> > [ 48.137215] ? exc_invalid_op+0x17/0x70
> > [ 48.137216] ? asm_exc_invalid_op+0x1a/0x20
> > [ 48.137220] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
> > [ 48.137225] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
> > [ 48.137230] process_one_work+0x179/0x390
> > [ 48.137233] worker_thread+0x249/0x350
> > [ 48.137235] ? __pfx_worker_thread+0x10/0x10
> > [ 48.137237] kthread+0xcf/0x100
> > [ 48.137240] ? __pfx_kthread+0x10/0x10
> > [ 48.137242] ret_from_fork+0x31/0x50
> > [ 48.137244] ? __pfx_kthread+0x10/0x10
> > [ 48.137246] ret_from_fork_asm+0x1a/0x30
> > [ 48.137250] </TASK>
> > [ 48.137251] ---[ end trace 0000000000000000 ]---
> >
>
>
> --
> Thanks,
>
> Steve
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps
2024-08-27 4:27 ` Steve French
@ 2024-08-27 4:38 ` Dominique Martinet
2024-08-27 5:02 ` Steve French
0 siblings, 1 reply; 6+ messages in thread
From: Dominique Martinet @ 2024-08-27 4:38 UTC (permalink / raw)
To: Steve French
Cc: Forest, David Howells, Steve French, Shyam Prasad N,
Rohith Surabattula, Jeff Layton, linux-cifs, netfs,
linux-fsdevel, linux-mm, regressions, Christian Brauner
Steve French wrote on Mon, Aug 26, 2024 at 11:27:33PM -0500:
> I have also confirmed your theory that the regressions (there are
> multiple) were likely caused by the netfs change added between
> 6.11-rc3 and 6.11-rc4:
> " 9p: Fix DIO read through netfs"
>
> But reverting the cifs.ko part of that patch alters the error but does
> not completely fix the problem, so the netfs change is also related
David sent a bunch of cifs fixes including this patch:
https://lore.kernel.org/r/20240823200819.532106-8-dhowells@redhat.com
"netfs, cifs: Fix handling of short DIO read"
I don't have any samba server around to try myself, did you have a
chance to have a look?
That "9p" commit touches all netfs filesystems and definitely shouldn't
have been labeled 9p (even if it does fix a 9p regression from an
earlier netfs commit...), and this all feels a bit like falling forward
but hopefully we can get this all fixed by the time 6.11 is ready...
--
Dominique
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps
2024-08-27 4:38 ` Dominique Martinet
@ 2024-08-27 5:02 ` Steve French
0 siblings, 0 replies; 6+ messages in thread
From: Steve French @ 2024-08-27 5:02 UTC (permalink / raw)
To: Dominique Martinet
Cc: Forest, David Howells, Steve French, Shyam Prasad N,
Rohith Surabattula, Jeff Layton, linux-cifs, netfs,
linux-fsdevel, linux-mm, regressions, Christian Brauner
On Mon, Aug 26, 2024 at 11:38 PM Dominique Martinet
<asmadeus@codewreck.org> wrote:
>
> Steve French wrote on Mon, Aug 26, 2024 at 11:27:33PM -0500:
> > I have also confirmed your theory that the regressions (there are
> > multiple) were likely caused by the netfs change added between
> > 6.11-rc3 and 6.11-rc4:
> > " 9p: Fix DIO read through netfs"
> >
> > But reverting the cifs.ko part of that patch alters the error but does
> > not completely fix the problem, so the netfs change is also related
>
> David sent a bunch of cifs fixes including this patch:
> https://lore.kernel.org/r/20240823200819.532106-8-dhowells@redhat.com
> "netfs, cifs: Fix handling of short DIO read"
>
>
> I don't have any samba server around to try myself, did you have a
> chance to have a look?
Yes - I just verified that David's latest branch does fix the problem
you reported (ie "cat foo" and "xxd foo" on mounts to Samba)
I ran with 12 patches (listed below) of David on top of 6.11-rc5 (and
3 unrelated patches from my for-next branch), but most
of his patches are not in linux-next yet
540511907bbf (HEAD -> for-next, origin/for-next, origin/HEAD) netfs,
cifs: Fix handling of short DIO read
76b6078dcc72 cifs: Fix lack of credit renegotiation on read retry
de7b2d4f8789 netfs: Fix interaction of streaming writes with zero-point tracker
0d718d75935d netfs: Fix missing iterator reset on retry of short read
eb54e19d9cb8 netfs: Fix trimming of streaming-write folios in
netfs_inval_folio()
b10b7505454c netfs: Fix netfs_release_folio() to say no if folio dirty
5fee95cba515 mm: Fix missing folio invalidation calls during truncation
68c4ccecdf68 backing-file: convert to using fops->splice_write
65a5aad19a17 Revert "pidfd: prevent creation of pidfds for kthreads"
3e99ab871c90 netfs, ceph: Partially revert "netfs: Replace PG_fscache
by setting folio->private and marking dirty"
06f646e2d0d4 netfs, cifs: Improve some debugging bits
416871f4fb84 cifs: Fix FALLOC_FL_PUNCH_HOLE support
017d17017436 smb/client: fix rdma usage in smb2_async_writev()
b608e2c31878 smb/client: remove unused rq_iter_size from struct smb_rqst
c724b2ab6a46 smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req()
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps
2024-08-27 3:32 ` Steve French
2024-08-27 4:27 ` Steve French
@ 2024-08-27 18:15 ` Forest
1 sibling, 0 replies; 6+ messages in thread
From: Forest @ 2024-08-27 18:15 UTC (permalink / raw)
To: Steve French
Cc: David Howells, Steve French, Shyam Prasad N, Rohith Surabattula,
Jeff Layton, linux-cifs, netfs, linux-fsdevel, linux-mm,
regressions
I applied those two patches on top of commit 3e9bff3bbe13 ("Merge tag
'vfs-6.11-rc6.fixes'"). With those, the echo/cat/xxd test behaves as
expected, at least the few times I tried it. No errors, no extra null bytes,
no warning in dmesg.
On Mon, 26 Aug 2024 22:32:35 -0500, Steve French wrote:
>Yes - thanks for reporting this. I can reproduce this (although
>slightly different error on "cat foo" with 6.11-rc5 vs. later patches
>in for-next - I get ""No data available" now with some of David's
>additional patches instead of "Invalid argument" which you get on
>6.11-rc5).
>
>This problem looks related to something that David has discussed
>earlier - on the wire (with 6.11-rc5) we see SMB3 READ with the server
>returning "STATUS_INVALID_PARAMETER" - there are actually two reads
>in a row that are similar (1MB at offset 0), the first succeeds, the
>second request has a "credit charge" of 1 (instead of 16 which is
>what I would have expected) which is likely related to the cause of
>the problem (and the second read fails).
>
>It looks like some of this is fixed with David's patch:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=netfs-fixes&id=78d0d91398ad7bc37e73cdda65602ac8b6d675bb
>
>0014-cifs-Fix-lack-of-credit-renegotiation-on-read-retry.patch
>
>It is possible that the second part of this is fixed with:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=netfs-fixes&id=e81de4d9db1c25dc3e9feb21548fc5f3e2b3ad8e
>
>Is there an easy way for you to retry with those two patches?
>
>On Sun, Aug 25, 2024 at 6:26?PM Forest <forestix@nom.one> wrote:
>>
>> #regzbot introduced: e3786b29c54c
>>
>> Dear maintainers,
>>
>> Recent kernel release candidates have a cifs regression that produces
>> unexpected errors in userspace and a WARNING (with stack trace) in dmesg.
>>
>> I can consistently reproduce it with these commands on a mounted Samba
>> share:
>>
>>
>> $ echo hello > foo
>> $ ls -l foo
>> -rw-r----- 1 user user 6 Aug 25 15:41 foo
>> $ cat foo
>> cat: foo: Invalid argument
>> $ xxd foo
>> 00000000: 6865 6c6c 6f0a 0000 0000 0000 0000 0000 hello...........
>> 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>> (...more null bytes...)
>> 00001fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>> 00001ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>> xxd: No data available
>>
>>
>> The xxd command above also triggers these kernel log messages...
>>
>> Subreq overread: R3[1] 312 > 8192 - 7956
>> WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495
>>
>> ...followed by the usual module list and stack trace.
>>
>>
>> The bug is present in kernel v6.11-rc5, but not the 6.10 series.
>>
>> Git bisect says:
>>
>> e3786b29c54cdae3490b07180a54e2461f42144c is the first bad commit
>> commit e3786b29c54cdae3490b07180a54e2461f42144c
>> Author: Dominique Martinet <asmadeus@codewreck.org>
>> Date: Thu Aug 8 14:29:38 2024 +0100
>> 9p: Fix DIO read through netfs
>>
>>
>> Here's the full dmesg output when I run xxd on kernel v6.11-rc5:
>>
>> [ 48.137018] ------------[ cut here ]------------
>> [ 48.137021] Subreq overread: R3[1] 312 > 8192 - 7956
>> [ 48.137029] WARNING: CPU: 30 PID: 421 at fs/netfs/io.c:495 netfs_subreq_terminated+0x276/0x2d0 [netfs]
>> [ 48.137046] Modules linked in: rfcomm algif_hash algif_skcipher af_alg cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs nft_masq nft_chain_nat nf_nat
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink nvme_fabrics essiv authenc crypto_null snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr zstd
>> zram bnep binfmt_misc nls_ascii nls_cp437 vfat fat mt7921e snd_hda_codec_realtek mt7921_common snd_hda_codec_generic mt792x_lib snd_hda_scodec_component mt76_connac_lib
>> snd_hda_codec_hdmi mt76 btusb snd_hda_intel amd_atl btrtl intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi btintel amd64_edac edac_mce_amd mac80211 btbcm
>> snd_hda_codec asus_nb_wmi eeepc_wmi btmtk asus_wmi kvm_amd snd_hda_core sparse_keymap bluetooth libarc4 snd_hwdep platform_profile kvm cfg80211 snd_pcm battery wmi_bmof rapl
>> snd_timer sp5100_tco ccp pcspkr watchdog snd k10temp rfkill soundcore joydev sg evdev nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport loop efi_pstore
>> [ 48.137103] configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 btrfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod hid_generic usbhid hid amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_display_helper
>> sd_mod cec crct10dif_pclmul rc_core crc32_pclmul xhci_pci drm_ttm_helper crc32c_intel ttm ahci xhci_hcd drm_kms_helper libahci r8169 ghash_clmulni_intel libata sha512_ssse3
>> realtek nvme mdio_devres sha256_ssse3 drm usbcore scsi_mod sha1_ssse3 i2c_piix4 libphy video nvme_core i2c_smbus usb_common scsi_common crc16 wmi gpio_amdpt gpio_generic button
>> aesni_intel gf128mul crypto_simd cryptd
>> [ 48.137148] CPU: 30 UID: 0 PID: 421 Comm: kworker/30:1 Not tainted 6.11.0-rc5 #3
>> [ 48.137150] Hardware name: ASUS System XXXXXXXXXX
>> [ 48.137151] Workqueue: cifsiod smb2_readv_worker [cifs]
>> [ 48.137176] RIP: 0010:netfs_subreq_terminated+0x276/0x2d0 [netfs]
>> [ 48.137182] Code: 66 ff ff ff 0f 1f 44 00 00 e9 5c ff ff ff 48 89 f1 0f b7 93 86 00 00 00 8b b5 ac 01 00 00 48 c7 c7 78 81 7a c2 e8 ba 68 2f da <0f> 0b 48 8b 43 70 31 d2 4c
>> 8d ab 98 00 00 00 66 89 93 84 00 00 00
>> [ 48.137183] RSP: 0018:ffffad8942637e58 EFLAGS: 00010282
>> [ 48.137185] RAX: 0000000000000000 RBX: ffff9d09639a7200 RCX: 0000000000000027
>> [ 48.137186] RDX: ffff9d107e721788 RSI: 0000000000000001 RDI: ffff9d107e721780
>> [ 48.137187] RBP: ffff9d094bf38a00 R08: 0000000000000000 R09: 0000000000000003
>> [ 48.137187] R10: ffffad8942637ce8 R11: ffff9d109de3cfe8 R12: 0000000000000001
>> [ 48.137188] R13: ffff9d095fcf6000 R14: ffff9d09639a7208 R15: 0000000000000000
>> [ 48.137189] FS: 0000000000000000(0000) GS:ffff9d107e700000(0000) knlGS:0000000000000000
>> [ 48.137190] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 48.137191] CR2: 000055d1a3402760 CR3: 00000006b0222000 CR4: 0000000000750ef0
>> [ 48.137192] PKRU: 55555554
>> [ 48.137193] Call Trace:
>> [ 48.137194] <TASK>
>> [ 48.137197] ? __warn+0x80/0x120
>> [ 48.137201] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
>> [ 48.137207] ? report_bug+0x164/0x190
>> [ 48.137210] ? prb_read_valid+0x1b/0x30
>> [ 48.137213] ? handle_bug+0x41/0x70
>> [ 48.137215] ? exc_invalid_op+0x17/0x70
>> [ 48.137216] ? asm_exc_invalid_op+0x1a/0x20
>> [ 48.137220] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
>> [ 48.137225] ? netfs_subreq_terminated+0x276/0x2d0 [netfs]
>> [ 48.137230] process_one_work+0x179/0x390
>> [ 48.137233] worker_thread+0x249/0x350
>> [ 48.137235] ? __pfx_worker_thread+0x10/0x10
>> [ 48.137237] kthread+0xcf/0x100
>> [ 48.137240] ? __pfx_kthread+0x10/0x10
>> [ 48.137242] ret_from_fork+0x31/0x50
>> [ 48.137244] ? __pfx_kthread+0x10/0x10
>> [ 48.137246] ret_from_fork_asm+0x1a/0x30
>> [ 48.137250] </TASK>
>> [ 48.137251] ---[ end trace 0000000000000000 ]---
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-08-27 18:15 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-25 23:26 [REGRESSION] cifs: Subreq overread in dmesg, invalid argument & no data available in apps Forest
2024-08-27 3:32 ` Steve French
2024-08-27 4:27 ` Steve French
2024-08-27 4:38 ` Dominique Martinet
2024-08-27 5:02 ` Steve French
2024-08-27 18:15 ` Forest
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox