Date: Wed, 21 Jun 2023 17:38:03 +0200
From: Petr Mladek
To: Sebastian Andrzej Siewior
Cc: Tetsuo Handa, linux-mm@kvack.org, "Luis Claudio R. Goncalves",
 Andrew Morton, Mel Gorman, Michal Hocko, Thomas Gleixner, John Ogness
Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead
 write_seqlock() + local_irq_save().
References: <20230621104034.HT6QnNkQ@linutronix.de>
 <0e9fc992-8e05-2e63-b3b1-d8d3ce89fc16@I-love.SAKURA.ne.jp>
 <20230621130641.-5iueY1I@linutronix.de>
 <20230621143421.BgHjJklo@linutronix.de>
In-Reply-To: <20230621143421.BgHjJklo@linutronix.de>
Sender:
 owner-linux-mm@kvack.org

On Wed 2023-06-21 16:34:21, Sebastian Andrzej Siewior wrote:
> On 2023-06-21 22:32:40 [+0900], Tetsuo Handa wrote:
> > include/linux/seqlock.h says
> …
> > Is above understanding correct?
>
> That is correct.
>
> > And you are trying to replace
> >
> >   local_irq_save(flags);
> >   printk_deferred_enter();
> >   write_seqlock(&zonelist_update_seq);
> >
> > with
> >
> >   write_seqlock_irqsave(&zonelist_update_seq, flags);
> >   printk_deferred_enter();
> >
> > , aren't you?
>
> correct.
>
> > But include/linux/printk.h says
> >
> >   /*
> >    * The printk_deferred_enter/exit macros are available only as a hack for
> >    * some code paths that need to defer all printk console printing. Interrupts
> >    * must be disabled for the deferred duration.
> >    */
> >   #define printk_deferred_enter __printk_safe_enter
> >   #define printk_deferred_exit __printk_safe_exit
> >
> > which means that local_irq_save() is _required_ before printk_deferred_enter().
>
> It says that, yes, but that requirement is described as too heavy. The
> requirement is that printk_deferred_enter() happens on the same CPU as
> printk_deferred_exit().

True.

> This can be achieved by an explicit
> local_irq_save(), yes, but also by something like spin_lock_irq() which
> _ensures_ that the task does not migrate to another CPU while the lock
> is acquired. This is the requirement by the current implementation.

Good to know.

> > If local_irq_save() is hidden by your patch, what guarantees that
> > printk_deferred_enter() and printk_deferred_exit() run on the same CPU?
>
> Because spin_lock_irqsave() on CONFIG_PREEMPT_RT uses migrate_disable().
> The function ensures that the scheduler does not migrate the task to
> another CPU. The CPU is even block from going down (as in CPU-hotplug)
> until the matching migrate_enable() occurs.
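
Just to spell out the same-CPU requirement: below is a toy userspace
model, not kernel code. It assumes that the enter/exit helpers do nothing
more than increment/decrement a per-CPU counter, which is my understanding
of __printk_safe_enter()/__printk_safe_exit(). All model_* names and
MODEL_NR_CPUS are made up for the illustration.

```c
/* Userspace model only, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2

/* Models the per-CPU counter that marks the deferred printk context. */
static int model_deferred_depth[MODEL_NR_CPUS];

/* Models the enter side: bump the counter of the CPU we run on. */
static void model_deferred_enter(int cpu)
{
	model_deferred_depth[cpu]++;
}

/* Models the exit side: drop the counter of the CPU we run on. */
static void model_deferred_exit(int cpu)
{
	model_deferred_depth[cpu]--;
}

/* True when every per-CPU counter is back at zero. */
static bool model_balanced(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_deferred_depth[cpu] != 0)
			return false;
	return true;
}

/*
 * Run one enter/exit pair starting on CPU 0. When @migrate is true, the
 * "task" moves to CPU 1 in between. This is the situation that disabled
 * interrupts, disabled preemption or migrate_disable() rule out.
 */
static bool model_run_section(bool migrate)
{
	int cpu = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		model_deferred_depth[i] = 0;

	model_deferred_enter(cpu);
	if (migrate)
		cpu = 1;
	model_deferred_exit(cpu);

	return model_balanced();
}
```

In this model, model_run_section(false) leaves all counters at zero.
model_run_section(true) leaves CPU 0 stuck in the deferred state and
CPU 1 with a negative counter. That is why anything that pins the task
to the CPU would be enough for the pairing itself.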
>
> > Also, if local_irq_save() is hidden due to RT, what guarantees that
> >
> >   write_seqlock_irqsave(&zonelist_update_seq, flags);
> >   <>
> >     some_timer_function() {
> >       printk();
> >     }
> >   <>
> >   printk_deferred_enter();
> >
> > does not happen because write_seqlock_irqsave() does not disable IRQ?
>
> I don't see how zonelist_update_seq and printk here are connected
> without the port lock/ or memory allocation. But there are two things
> that are different on RT which probably answer your question:
>
> - If the reader observes an odd sequence number then it acquires the
>   lock of the sequence counter (it is held by the writer) which
>   forces the writer to complete the write critical section and then the
>   reader can continue. There are _no_ memory allocation within a
>   hard IRQ context (as in the actual interrupt). The timer (hrtimer or
>   timer_list timer) are served in task context and we have
>   forced-threaded interrupts. Clearly this means that the seqlock_t (as
>   used here) can only be used task context and not in hard IRQ context.
>
> - The printk implementation is slightly different and it is being worked
>   to merge it upstream. The two important differences here:
>   - Printing happens by default in a dedicated printing thread.
>   - In emergency cases like panic(), printing happens directly within
>     the invocation of printk(). This requires a so called atomic console
>     which does not use the tty_port::lock.

If I get it correctly, RT solves both possible deadlocks by offloading
the nested operation into a kthread (irq and printk threads).
Plus, printk uses emergency_write() when the kthread is not usable.

If this is true then printk_safe_enter() might be a nop on RT.
All possible deadlocks are prevented either by the kthread or
the console->emergency_write() call.

But wait. AFAIK, the printk kthread is implemented only for the serial
console. So, it won't be safe for other consoles, especially
the problematic tty_insert_flip_string_and_push_buffer() call.

Note that adding a printk thread for the graphical consoles will
be a real challenge. We have a hard time even with the basic
UART 8250. There are still races possible in the implementation
in the RT tree...

OK, what about using migrate_disable() in printk_deferred_enter()?
Something like:

/*
 * The printk_deferred_enter/exit macros are available only as a hack.
 * They define a per-CPU context where all printk console printing
 * is deferred because it might cause a deadlock otherwise.
 *
 * It is highly recommended to use them only in a context with interrupts
 * disabled. Otherwise, other unrelated printk() calls might be deferred
 * when they interrupt/preempt the deferred code section.
 *
 * They should not be used to defer printing of many messages. It might
 * create a softlockup when they are flushed later.
 *
 * IMPORTANT: Any new use of these MUST be consulted with printk maintainers.
 * It might have unexpected side effects on the printk infrastructure.
 */
#ifdef CONFIG_PREEMPT_RT

#define printk_deferred_enter()			\
	do {					\
		migrate_disable();		\
		__printk_safe_enter();		\
	} while (0)

#define printk_deferred_exit()			\
	do {					\
		__printk_safe_exit();		\
		migrate_enable();		\
	} while (0)

#else /* CONFIG_PREEMPT_RT */

#define printk_deferred_enter()			\
	do {					\
		preempt_disable();		\
		__printk_safe_enter();		\
	} while (0)

#define printk_deferred_exit()			\
	do {					\
		__printk_safe_exit();		\
		preempt_enable();		\
	} while (0)

#endif /* CONFIG_PREEMPT_RT */

Note that I have used preempt_disable() on non-RT because it is much
cheaper. And IRQs will be disabled anyway on non-RT systems
in this code path.

> > Disabling IRQ before incrementing zonelist_update_seq is _required_ for both
> >
> >   making printk_deferred_enter() safe
> >
> > and
> >
> >   making sure that printk_deferred_enter() takes effect
> >
> > .
> Did I explain why it is sufficient to do
> 	write_seqlock_irqsave()
> 	printk_deferred_enter()
>
> assuming we have
>
> | static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
> | {
> | 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
> | 	do_raw_write_seqcount_begin(s);
> | }

Will this prevent any printk() called on the same CPU before
printk_deferred_enter() is called?

Best Regards,
Petr
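
PS: To make the last question concrete, here is another toy userspace
model, again not kernel code. It only shows the window I am asking about:
the write section is already open but the deferred context is not entered
yet. It does not say whether a printk() can really run there on RT. The
win_* names are hypothetical and a single CPU is assumed.

```c
/* Userspace model only, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum win_path { WIN_DIRECT, WIN_DEFERRED };

/* Models the deferred-context counter of one CPU. */
static int win_depth;

static void win_deferred_enter(void)
{
	win_depth++;
}

static void win_deferred_exit(void)
{
	win_depth--;
}

/* Models printk(): console output is deferred only inside the context. */
static enum win_path win_printk(void)
{
	return win_depth > 0 ? WIN_DEFERRED : WIN_DIRECT;
}

/*
 * Models the sequence
 *
 *	write_seqlock_irqsave();
 *	<something on the same CPU calls printk()>
 *	printk_deferred_enter();
 *
 * When @lock_disables_irqs is true, nothing can run in the window.
 * Returns true when a non-deferred printk() was seen in the window.
 */
static bool win_direct_printk_in_window(bool lock_disables_irqs)
{
	bool direct_seen = false;

	win_depth = 0;

	/* The write section would be opened here. */
	if (!lock_disables_irqs && win_printk() == WIN_DIRECT)
		direct_seen = true;

	win_deferred_enter();
	/* In the model, a printk() from here on is deferred. */
	assert(win_printk() == WIN_DEFERRED);
	win_deferred_exit();

	return direct_seen;
}
```

The model returns false for win_direct_printk_in_window(true) and true
for win_direct_printk_in_window(false). The open question is whether
the second case is reachable, and harmful, in the real code.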