From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 898BEEB64DC for ; Wed, 28 Jun 2023 13:56:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 183D28D0003; Wed, 28 Jun 2023 09:56:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 133518D0001; Wed, 28 Jun 2023 09:56:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F3D708D0003; Wed, 28 Jun 2023 09:56:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id E28698D0001 for ; Wed, 28 Jun 2023 09:56:26 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id A577D140363 for ; Wed, 28 Jun 2023 13:56:26 +0000 (UTC) X-FDA: 80952306372.03.7FF1621 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf25.hostedemail.com (Postfix) with ESMTP id 62E18A0024 for ; Wed, 28 Jun 2023 13:56:23 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=EVwkpnA3; spf=pass (imf25.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1687960583; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UBkDkoRg0HA6vloPwRjoXM7rrbP50/wBU1LZzUkViAU=; b=UomhfYRNfintvcT2oCruLP0hxBU1Q+ArSqsxe5dLnmeUK3xqKOx6UdNWlbb/7doXUeLwzn sjgJfd4zJyc/TjgMLPPaeym8P1ZC+Gklxivk7ZLQbaJNwUSIa5rGnlgdcln9vRswjzD/pN G7rGrcNC4hCaNFiSLQxyaoR/psNvt9Q= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1687960583; a=rsa-sha256; cv=none; b=hYRLyTwmjjXuT3yv8h3hEwJCOAPtn28ONKNYHW0LpfH5DpPRi056Lhk5kztdDips75Wm2F Rg9W25n6RRpMnFC2nMyYRnaMEaELgH0ZZQfnzLUW5P0NLryINXx0XLPJWioSCuB/wNknNk FbGQo+Jjv/1d4ZPvHAIoYXLiDL+T3UU= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=EVwkpnA3; spf=pass (imf25.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 704E61F6E6; Wed, 28 Jun 2023 13:56:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1687960581; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UBkDkoRg0HA6vloPwRjoXM7rrbP50/wBU1LZzUkViAU=; b=EVwkpnA3ZnBl4diVvgSAhsFc0JP4kr6t+CDom0M9DLYq1eYt8N+THhgRu+9j36T7vPKeOi LJKLNiRmMCwYTXeE2s7k5fcD2XdqTTr3tXb9JZLxdpzZVoMn6LcE+m6Kg3XmAsIUjdAX1Q mBPDBX8sXuZ+VbgdBGvJF4NcamLfP+A= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4F01C138EF; Wed, 28 Jun 2023 13:56:21 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id l3/3EAU8nGRSDwAAMHmgww (envelope-from ); Wed, 28 Jun 2023 13:56:21 +0000 Date: Wed, 28 Jun 2023 15:56:20 +0200 From: Michal Hocko To: Sebastian Andrzej Siewior , Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Luis Claudio R. Goncalves" , Boqun Feng , Ingo Molnar , John Ogness , Mel Gorman , Peter Zijlstra , Petr Mladek , Tetsuo Handa , Thomas Gleixner , Waiman Long , Will Deacon Subject: Re: [PATCH v3 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save(). Message-ID: References: <20230623171232.892937-1-bigeasy@linutronix.de> <20230623171232.892937-3-bigeasy@linutronix.de> <20230623201517.yw286Knb@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20230623201517.yw286Knb@linutronix.de> X-Rspamd-Queue-Id: 62E18A0024 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: mnb48414z8yhq9s5fz9y849hy1ggznw7 X-HE-Tag: 1687960583-991549 X-HE-Meta: U2FsdGVkX18yaGof9HcpW+FQYAN8l7RsByluU4as0zpSyOY9vH28vvkNyVVS5b+5pfm7E9sq/3YOTgxGAUfYeYkEVypnwwKHoFs9KZzIiy/hhrXRneLgKiRnrPfx/pMQU987/yByixYRAn+BDhwZMMc4NrbfxteMYD9x2YIzjuncNrZITGJ4NRZm69pnV2qHzkgVpfpesQApMV7mQRqbXGQcyIucnE/SxYVMwPHO9sbRAQOU/afkg+ui96igxto4mUYjU0O8mZG9ld5P/oF64GBhgzcZX6MsKvf88zkZqkWGkQaH/+U3RiK54cAIEjw/F6CC7nKREX0VFWivlY4XSooYK5wHlTej2TmmKHjFrz/kni8izhaGcd5MC8cFPDSWeokIOvYIBW/2YdIpMBPHQEjA4YEZvcIJ7UiBbGRKRcK4sZ+S+AMfTPP9etpw/stAc4hwBMgcFIajaV66GMfJWuRcZNLdHORrWGGQhqMXbKzBqXqlVAOgfqDY3eVmWMDsWuGyhW+Ot9vk4jUXQn9ugwnkq7pS3lXWznOj6hMpcSRmV+EdZEe//iw3vlejYz3YGsRjxXKj5csv9WnR5GX0CiqLpB/bSte6IUlk8Hu54YTap865e7+xxX/tSM2PSeEyp2NPNEs96HhxxoocL7crdz41SIJFeHYhc2hlUkALi2cLmDM0CujiBjnDPP+jLzvVNaXDMlrObgub358l2DCnW9USq9CLdchg0BCDTIBrdKCkty54j9Xo/2X/r2id720vK1YBL1yoFvHLVmf79u7cm7YFMWeN0JGurtPgAb2b+K7OnWlmCfHYfRMWcz6WZ1cqztkiTgasWhKdW/6WmCNbpmLo+eorpdDmPkg3xnBopxu50zKuaeS4WLUhL/VoD04B2Lrvr0/Rc3vZDqKu2CDJ3ofjQZs7Tii3tGDsouKRnfKNhR7EHJ7FuSHtxXnJtv46+FF+yGQfKN/imO6yHtj u/i9hCe4 BHCl2uA+UMS4ibVwpFGO8EbrJgL4TBtAh141D38ixIRtLFXZhpIwDyJiYFu40JJq98uxJVIcLgBLSTtg5KVC+TXgFjPjwqCyC03yakSalzKbKjYpV8ciAnW0AfT5YbYZe01KrM8FHVM0L6Vdm+YktL7X+7g5b8uMCU6hbz9x44titUgZWQNPxas96UE0G6UHRu+bgH3sKhygTv94EUUlAzziiF12YT+VEybmOT7H7Z4eAOZ42FdvMIpYQGdeRmW/L7F6jWa1nMNHLCM/R+qbFFaGkySTvH8CGrADnX68QP/IOF/aAJN1/v61EqWRTYVEBnMWhDXp1vr7DsqA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Andrew, it seems that we have a consensus on the MM side of things that this is good enough to go. I am not sure about patch 1, that is more on lockdep people but I think that this patch is good enough on this own. Can we get this patch merged into mm tree and see whether any of Tetsuo concerns pop out? On Fri 23-06-23 22:15:17, Sebastian Andrzej Siewior wrote: > __build_all_zonelists() acquires zonelist_update_seq by first disabling > interrupts via local_irq_save() and then acquiring the seqlock with > write_seqlock(). This is troublesome and leads to problems on > PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping > lock on PREEMPT_RT and must not be acquired with disabled interrupts. > > The API provides write_seqlock_irqsave() which does the right thing in > one step. > printk_deferred_enter() has to be invoked in non-migrate-able context to > ensure that deferred printing is enabled and disabled on the same CPU. > This is the case after zonelist_update_seq has been acquired. > > There was discussion on the first submission that the order should be: > local_irq_disable(); > printk_deferred_enter(); > write_seqlock(); > > to avoid pitfalls like having an unaccounted printk() coming from > write_seqlock_irqsave() before printk_deferred_enter() is invoked. The > only origin of such a printk() can be a lockdep splat because the > lockdep annotation happens after the sequence count is incremented. > This is exceptional and subject to change. > > It was also pointed that PREEMPT_RT can be affected by the printk > problem since its write_seqlock_irqsave() does not really disable > interrupts. This isn't the case because PREEMPT_RT's printk > implementation differs from the mainline implementation in two important > aspects: > - Printing happens in a dedicated threads and not at during the > invocation of printk(). > - In emergency cases where synchronous printing is used, a different > driver is used which does not use tty_port::lock. > > Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer > printk output. > > Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock") > Signed-off-by: Sebastian Andrzej Siewior > Acked-by: Michal Hocko > --- > v2…v3 > - Update comment as per Michal's suggestion. > > v1…v2: > - Improve commit description > > mm/page_alloc.c | 15 ++++++--------- > 1 file changed, 6 insertions(+), 9 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 47421bedc12b7..440e9af67b48d 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5808,19 +5808,17 @@ static void __build_all_zonelists(void *data) > unsigned long flags; > > /* > - * Explicitly disable this CPU's interrupts before taking seqlock > - * to prevent any IRQ handler from calling into the page allocator > - * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock. > + * The zonelist_update_seq must be acquired with irqsave because the > + * reader can be invoked from IRQ with GFP_ATOMIC. > */ > - local_irq_save(flags); > + write_seqlock_irqsave(&zonelist_update_seq, flags); > /* > - * Explicitly disable this CPU's synchronous printk() before taking > - * seqlock to prevent any printk() from trying to hold port->lock, for > + * Also disable synchronous printk() to prevent any printk() from > + * trying to hold port->lock, for > * tty_insert_flip_string_and_push_buffer() on other CPU might be > * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. > */ > printk_deferred_enter(); > - write_seqlock(&zonelist_update_seq); > > #ifdef CONFIG_NUMA > memset(node_load, 0, sizeof(node_load)); > @@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data) > #endif > } > > - write_sequnlock(&zonelist_update_seq); > printk_deferred_exit(); > - local_irq_restore(flags); > + write_sequnlock_irqrestore(&zonelist_update_seq, flags); > } > > static noinline void __init > -- > 2.40.1 -- Michal Hocko SUSE Labs