From: Filipe Manana <fdmanana@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Josef Bacik <josef@toxicpanda.com>
Subject: Re: WARNING at asm/kfence.h:44 kfence_protect_page
Date: Mon, 14 Jun 2021 21:42:56 +0100 [thread overview]
Message-ID: <CAL3q7H7kCNs+fbqyHqwpuJvUudeMKOSWewEfCSdTKc1qFkSiJg@mail.gmail.com> (raw)
In-Reply-To: <CAJCQCtSDYJKq7rrjMLyaHVz3ELgM7g8DBGJrFMOkrw7aBQW+kA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 10353 bytes --]
On Mon, Jun 14, 2021 at 9:07 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Mon, Jun 14, 2021 at 11:50 AM Filipe Manana <fdmanana@gmail.com> wrote:
> >
> > On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
> > >
> > > 5.12.9-300.fc34.x86_64
> > >
> > > File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> > > workload is:
> > > * bees doing dedup
> > > * btrfs send
> > > * digikam scanning files and writing image metadata to a sqlite database
> > > * chrome doing some webby things
> > >
> > > And then kaboom.
> > >
> > > Call trace (easy to read, expires in 7 days):
> > > https://pastebin.com/x5KRU23V
> > >
> > > Same call trace (MUA mangled, for the archive):
> > > [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> > > arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> > > [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> > > snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> > > nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > > nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> > > iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> > > edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> > > snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> > > snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> > > snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> > > snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> > > k10temp rfkill i2c_piix4 fam15h_power
> > > [60015.902361] kernel: pcspkr snd_timer snd r8169 soundcore
> > > acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> > > crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> > > i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> > > [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> > > 5.12.9-300.fc34.x86_64 #1
> > > [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> > > be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> > > [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> > > [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> > > 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> > > 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> > > db 75 36 48
> > > [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> > > [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> > > RCX: ffff9fb583453224
> > > [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> > > RDI: 0000000000000000
> > > [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> > > R09: 0000000000000000
> > > [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> > > R12: 0000000000000002
> > > [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> > > R15: 0000000000000001
> > > [60015.902408] kernel: FS: 00007f158e62d8c0(0000)
> > > GS:ffff93bd37580000(0000) knlGS:0000000000000000
> > > [60015.902410] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> > > CR4: 00000000000506e0
> > > [60015.902414] kernel: Call Trace:
> > > [60015.902419] kernel: kfence_unprotect+0x13/0x30
> > > [60015.902423] kernel: page_fault_oops+0x89/0x270
> > > [60015.902427] kernel: ? search_module_extables+0xf/0x40
> > > [60015.902431] kernel: ? search_bpf_extables+0x57/0x70
> > > [60015.902435] kernel: kernelmode_fixup_or_oops+0xd6/0xf0
> > > [60015.902437] kernel: __bad_area_nosemaphore+0x142/0x180
> > > [60015.902440] kernel: exc_page_fault+0x67/0x150
> > > [60015.902445] kernel: asm_exc_page_fault+0x1e/0x30
> > > [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> > > [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> > > 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> > > 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> > > 09 c2 0f 88
> > > [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> > > [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> > > RCX: 0000000000000000
> > > [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> > > RDI: 0000000000000039
> > > [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> > > R09: 0000000000000000
> > > [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> > > R12: 0000000000000001
> > > [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> > > R15: ffff93bc0c92b000
> > > [60015.902468] kernel: btrfs_commit_inode_delayed_inode+0x5d/0x120
> > > [60015.902473] kernel: btrfs_evict_inode+0x2c5/0x3f0
> > > [60015.902476] kernel: evict+0xd1/0x180
> > > [60015.902480] kernel: inode_lru_isolate+0xe7/0x180
> > > [60015.902483] kernel: __list_lru_walk_one+0x77/0x150
> > > [60015.902487] kernel: ? iput+0x1a0/0x1a0
> > > [60015.902489] kernel: ? iput+0x1a0/0x1a0
> > > [60015.902491] kernel: list_lru_walk_one+0x47/0x70
> > > [60015.902495] kernel: prune_icache_sb+0x39/0x50
> > > [60015.902497] kernel: super_cache_scan+0x161/0x1f0
> > > [60015.902501] kernel: do_shrink_slab+0x142/0x240
> > > [60015.902505] kernel: shrink_slab+0x164/0x280
> > > [60015.902509] kernel: shrink_node+0x2c8/0x6e0
> > > [60015.902512] kernel: do_try_to_free_pages+0xcb/0x4b0
> > > [60015.902514] kernel: try_to_free_pages+0xda/0x190
> > > [60015.902516] kernel: __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> > > [60015.902521] kernel: ? __memcg_kmem_charge_page+0xc2/0x1e0
> > > [60015.902525] kernel: __alloc_pages_nodemask+0x30a/0x340
> > > [60015.902528] kernel: pipe_write+0x30b/0x5c0
> > > [60015.902531] kernel: ? set_next_entity+0xad/0x1e0
> > > [60015.902534] kernel: ? switch_mm_irqs_off+0x58/0x440
> > > [60015.902538] kernel: __kernel_write+0x13a/0x2b0
> > > [60015.902541] kernel: kernel_write+0x73/0x150
> > > [60015.902543] kernel: send_cmd+0x7b/0xd0
> > > [60015.902545] kernel: send_extent_data+0x5a3/0x6b0
> > > [60015.902549] kernel: process_extent+0x19b/0xed0
> > > [60015.902551] kernel: btrfs_ioctl_send+0x1434/0x17e0
> > > [60015.902554] kernel: ? _btrfs_ioctl_send+0xe1/0x100
> > > [60015.902557] kernel: _btrfs_ioctl_send+0xbf/0x100
> > > [60015.902559] kernel: ? enqueue_entity+0x18c/0x7b0
> > > [60015.902562] kernel: btrfs_ioctl+0x185f/0x2f80
> > > [60015.902564] kernel: ? psi_task_change+0x84/0xc0
> > > [60015.902569] kernel: ? _flat_send_IPI_mask+0x21/0x40
> > > [60015.902572] kernel: ? check_preempt_curr+0x2f/0x70
> > > [60015.902576] kernel: ? selinux_file_ioctl+0x137/0x1e0
> > > [60015.902579] kernel: ? expand_files+0x1cb/0x1d0
> > > [60015.902582] kernel: ? __x64_sys_ioctl+0x82/0xb0
> > > [60015.902585] kernel: __x64_sys_ioctl+0x82/0xb0
> > > [60015.902588] kernel: do_syscall_64+0x33/0x40
> > > [60015.902591] kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
> >
> > The problem is we end up trying to start a transaction while in a send
> > context, which is unexpected. But it does happen if memory allocation
> > triggers inode eviction due to memory pressure.
> >
> > You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?
>
> No, but Fedora debug kernels do have it enabled so I can ask the
> reporter to use 5.12.10-debug or 5.13-rc6. Any preference?
No preference.
It's a problem that has been around for several years now, and not
something recent.
Ok, so it was reported before. Anywhere public, or was it something private?
>
>
> > That would trigger an assertion right away:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
> >
> > With assertions disabled, we just cast current->journal_info into a
> > transaction handle and dereference it, which is obviously wrong since
> > ->jornal_info has a value that is not a pointer to a transaction
> > handle, resulting in the crash.
> >
> > How easy is it to reproduce for you?
>
> I'm not sure.
>
> Do you think following this splat we could have somehow been IO
> limited or stalled without any other kernel messages appearing? That
> would then cause reclaim activity, and high memory and swap usage? I'm
> wondering if it's likely or unlikely, because at the moment I think
> it's unlikely since /var/log/journal was successfully recording
> systemd-journald journals for the 81 minutes between this call trace
> and when systemd-oomd started killing things. I'm trying to separate
> out what's causing what.
Don't know about that. And honestly I don't think that it matters.
The send task writes to a pipe and writing to a pipe allocates pages
without GFP_NOFS (with GFP_HIGHUSER | __GFP_ACCOUNT to be precise),
meaning we can end up evicting an inode, which in turn needs to start
a transaction.
That patch would be my preferred way to fix it.
If it doesn't work for any reason, then we can simply set up a NOFS
context surrounding the calls in send to write to the pipe, like this:
https://pastebin.com/raw/tTrfAw0m
(also attached)
I would have to test the first patch in a scenario under heavy memory
pressure to see if there isn't anything I might have missed, but I
think it should work just fine.
I'll test it tomorrow too.
Thanks.
>
> > Can you try the following patch and see if everything works as expected?
>
> I'll ask the reporter. I don't have a setup even remotely approximate
> to this one for testing.
>
> Thanks!
>
> --
> Chris Murphy
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
[-- Attachment #2: send_nofs_pipe_write.patch --]
[-- Type: text/x-patch, Size: 1053 bytes --]
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index a2b3c594379d..d8667c07c4eb 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -548,24 +548,32 @@ static struct btrfs_path *alloc_path_for_send(void)
static int write_buf(struct file *filp, const void *buf, u32 len, loff_t *off)
{
- int ret;
+ int ret = 0;
u32 pos = 0;
+ unsigned nofs_flag;
+ /*
+ * Pipe writes can allocate pages which can trigger reclaim and cause
+ * a crash if reclaim triggers inode eviction and that needs to start a
+ * transaction. So use a NOFS context.
+ */
+ nofs_flag = memalloc_nofs_save();
while (pos < len) {
ret = kernel_write(filp, buf + pos, len - pos, off);
/* TODO handle that correctly */
/*if (ret == -ERESTARTSYS) {
continue;
}*/
+ if (ret == 0)
+ ret = -EIO;
if (ret < 0)
- return ret;
- if (ret == 0) {
- return -EIO;
- }
+ break;
pos += ret;
+ ret = 0;
}
+ memalloc_nofs_restore(nofs_flag);
- return 0;
+ return ret;
}
static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
next prev parent reply other threads:[~2021-06-14 20:43 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-14 17:02 Chris Murphy
2021-06-14 17:50 ` Filipe Manana
2021-06-14 20:07 ` Chris Murphy
2021-06-14 20:42 ` Filipe Manana [this message]
2021-06-14 21:29 ` Chris Murphy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H7kCNs+fbqyHqwpuJvUudeMKOSWewEfCSdTKc1qFkSiJg@mail.gmail.com \
--to=fdmanana@gmail.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lists@colorremedies.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox