From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D237EB64D7 for ; Fri, 23 Jun 2023 08:12:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 822E68D0003; Fri, 23 Jun 2023 04:12:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7D1F88D0001; Fri, 23 Jun 2023 04:12:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 699CD8D0003; Fri, 23 Jun 2023 04:12:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 575DE8D0001 for ; Fri, 23 Jun 2023 04:12:57 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 22A67C0D40 for ; Fri, 23 Jun 2023 08:12:57 +0000 (UTC) X-FDA: 80933296794.11.1FF6994 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by imf18.hostedemail.com (Postfix) with ESMTP id 1C2BE1C000A for ; Fri, 23 Jun 2023 08:12:54 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linutronix.de header.s=2020 header.b=2RavZHw3; dkim=pass header.d=linutronix.de header.s=2020e header.b=regLXRkd; dmarc=pass (policy=none) header.from=linutronix.de; spf=pass (imf18.hostedemail.com: domain of bigeasy@linutronix.de designates 193.142.43.55 as permitted sender) smtp.mailfrom=bigeasy@linutronix.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1687507975; a=rsa-sha256; cv=none; b=7GnTC6BLJ06dkjwzChRcVxFntPt42Jfekw/GtVWFwW+uZ45FjcxOyX5zBtMIByJJYGnZjH wyr7NhAWW0Ahn8dciecR0s4+FP5Th7EsvavDWXY1ph69sFJVerwFtpDBHWrxuFoao6v3VK 8SzdWvo/QfEr9oagtWvz5Jq7uM1NAIw= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=linutronix.de header.s=2020 header.b=2RavZHw3; dkim=pass header.d=linutronix.de header.s=2020e header.b=regLXRkd; dmarc=pass (policy=none) header.from=linutronix.de; spf=pass (imf18.hostedemail.com: domain of bigeasy@linutronix.de designates 193.142.43.55 as permitted sender) smtp.mailfrom=bigeasy@linutronix.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1687507975; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Y74j66fJYTOIaSN1Bbkmuq/kT6uIIFo+itoMvVIFwms=; b=2Gj4svuYyuxYgVWnV0Hv5v8RNa2f6FJrDOhnKfzuZsyFWHRFcPp/XwNo6G2ZRKW9ZM58PL nHOoUgZtOJYoHyb38pkPbUapqzCyMYkdSVanaHiEdQ3JjQc9CV8XQmpPr/e/a5l+4jE6u9 /gvh/3mKM5RhAyBaDVj1C6CtIVh4cd8= Date: Fri, 23 Jun 2023 10:12:50 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1687507973; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y74j66fJYTOIaSN1Bbkmuq/kT6uIIFo+itoMvVIFwms=; b=2RavZHw3rJGcDUhVSZb6BQ5F3HG0relk2vt2mXdoZ3CGC9Au3dEoWdRBewPUvXpy6GTtKy RSVrmuw538sOqkyI1t2GZQ1PLkr03oEVQhLwMDFqNbIJ4sQojPFq9COgYBvwDFRaRXBK/4 nccrsMiouVksoYv21FyW9LrWaWe4HxR4cWBiyi/Ms0x+HXdp24BuSUgwWYY4iMmUeP5Etp 1ETwpGKRZb0Eg5TJgTxVamPLh2o5o4PxNfE6dLD/LDn7bPY+HdPjZrCU58rjrwZYKBIAXc GLJXOfAh9qazkd98Rz+6FL2tlY+kgLDWLt4VaAeP4oxyT4sgS70Jcdr+J3TnYg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1687507973; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y74j66fJYTOIaSN1Bbkmuq/kT6uIIFo+itoMvVIFwms=; b=regLXRkdvJ0gUaxaLYEor1NFdRfd8BCaM1ZIhNRCcY0WMamO2UQaJ9SWxyAn+FlpLtSwQG U7aAnRQeSUhwU8Dw== From: Sebastian Andrzej Siewior To: Petr Mladek Cc: Tetsuo Handa , linux-mm@kvack.org, "Luis Claudio R. Goncalves" , Andrew Morton , Mel Gorman , Michal Hocko , Thomas Gleixner , John Ogness Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save(). Message-ID: <20230623081250.h20rlLik@linutronix.de> References: <20230621104034.HT6QnNkQ@linutronix.de> <0e9fc992-8e05-2e63-b3b1-d8d3ce89fc16@I-love.SAKURA.ne.jp> <20230621130641.-5iueY1I@linutronix.de> <20230621143421.BgHjJklo@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 1C2BE1C000A X-Stat-Signature: gkregx4zr166zaoppkhgi4m5amaeu9sh X-HE-Tag: 1687507974-922918 X-HE-Meta: U2FsdGVkX1/tBbNuGNNl8boiUnbMrroVFsVaD2PAoc9kzYaaHIyr3KoI0KeOmmXDgfz79DLUONTv6gHY3d5VUs3auR37xKNtrpp4rNosFVwiHR0cgkB/LN8pS0910kHv3p1bRb/1D735xo7sfiyUeIrC3MW9kSxF8fbBhtpvi1xcOsRd1PbIKj+hUnTaHlrQQDk+PzgVzvnzYqbQM+4FkaEAD2lyxqsfGAIbDpYduaB1iGEY7BkAhQfthc6xk9F/EaW4s/wUHMcv272HUKiCIUunOVs7u6Fd4a4Yn3eyAYqnklcatQHEoIwf0G9kKycvucBK+BnNIWq+S3D1PW0ui7K4R+scgmAozq2aFzatGiR/c5b9Xd8mi/v4hDezO2NTLigyYL3NvVWWXNYsUqvFGwJFdb6DJhBFwl5M4N+WDRcFzVXX8yxLIQeIyCzS4L9PoPtGFLgundmGFNYnPU9rwpkEetYPagJ0B8qtxfMU/u5ZlazgXKkWQ5ZJQyIPcnskMDOFvq8AHsdPGjk37S9IDPonUznUHTynV+ZX6egRcTjXFF8C1cJ53ip5nQJqKVoDR/smObOL9xvDC4SdVg5vTmoIQNcLYq60XNk14iD/oBwTvSQ622XAdzyZHjjmM6MV3eEU6vyxgGso4FRovAR5tCIsjftxU+cjb4yOk9+vW1c6D1d0NurVrcRKhHLe6NcHaEG2dXWCWaGXxxYuEsGsn5JLnCg8VEn1RZJSxznS0hoYlQL6dtaLmvCNk/c0LOIeL134tZAqkxb3n30xLklAuKwie66jFO77q1Rcyf4KdmZE7HERuLlw7+TlvUs2EQzDzXPs/BP0EOW5By+sR5259L6Nss1Hm8wkTRFHUrFpZvsIzOop6DOiY1KrPsOPCruiz3QtYsn1zxvH86zf28KDKBoAmuJK2ncbcwiOrHvHK7REnONYBAh5+PjAP+/3ybrNj5ZnAcWexD2aRj2vnWc Va57+hMo oQv5nAL70jg/9+97GhVRYHRaIl1RTdbUmk0BWcJmy58F0k+qTbxaUllRd36wj/gxSUujmbukcPYHOYthXPMUOLBscpDHceyh2c59U0/w3ji+Vj3sYTNliY2pX0Wf4PSmzpZ8egsxTujRcQ6o= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2023-06-21 17:38:03 [+0200], Petr Mladek wrote: > If I get it correctly, RT solve both possible deadlocks by offloading > the nested operation into a kthread (irq and printk threads). > Plus, printk uses emergency_write() when the kthread is not usable. correct. > If this is true then printk_safe_enter() might be a nop on RT. it is gone actually. > All possible deadlocks are prevented either by the kthread or > the console->emergency_write() call. correct. > But wait. AFAIK, the printk kthread is implemented only for the serial > console. So, it won't be safe for other consoles, especially > the problematic tty_insert_flip_string_and_push_buffer() call. if the special printing is not implemented then there is only printing via the kthread meaning no emergency printing if needed. Once the interface is upstream, more drivers can be added including GPU based. > Note that adding printk thread for the graphical consoles will > be a real challenge. We have hard times even with the basic > UART 8250. There are still races possible in the implementation > in the RT tree... John has been mumbling something about those but did not send anything. > OK, what about using migrate_disable() in printk_deferred_enter()? > Something like: >=20 > /* > * The printk_deferred_enter/exit macros are available only as a hack. > * They define a per-CPU context where all printk console printing > * is deferred because it might cause a deadlock otherwise. > * > * It is highly recommended to use them only in a context with interrupts > * disabled. Otherwise, other unrelated printk() calls might be deferred > * when they interrupt/preempt the deferred code section. > * > * They should not be used for to deffer printing of many messages. It mi= ght > * create softlockup when they are flushed later. > * > * IMPORTANT: Any new use of these MUST be consulted with printk maintain= ers. > * It might have unexpected side effects on the printk infrastructure. > */ > #ifdef CONFIG_PREEMPT_RT >=20 > #define printk_deferred_enter() \ > do { \ > migrate_disable(); \ > __printk_safe_enter(); \ > } while (0) >=20 > #define printk_deferred_exit() \ > do { \ > __printk_safe_exit(); \ > migrate_enable(); \ > } while (0) >=20 > #else /* CONFIG_PREEMPT_RT */ >=20 > #define printk_deferred_enter() \ > do { \ > preempt_disable(); \ > __printk_safe_enter(); \ > } while (0) >=20 > #define printk_deferred_exit() \ > do { \ > __printk_safe_exit(); \ > preempt_enable(); \ > } while (0) >=20 > #endif /* CONFIG_PREEMPT_RT */ >=20 > Note that I have used preempt_disable() on non-RT because it is much > cheaper. And IRQs will be disabled anyway on non-RT system > in this code path. It would do _but_ is it really needed? All users but one (the one in __build_all_zonelists()) have interrupts already disabled. This isn't a hot path and is not used very common. Does it justify the ifdef? An unconditional migrate_disable() would do the job if you want it to be safe but I would rather suggest lockdep annotation instead of disabling either preemption or migration. If I read the comment on top right, this API isn't open for the public. The global variable would be probably be simpler but I guess you want to have direct printing on other CPUs. To be honst, I would suggest to work towards always printing in a thread with an emergency console than this deferred/ safe duckt tape. Once tty_port::lock is acquired, every possible printk output, say a BUG, kasan,=E2=80=A6 is a possible death trap. This limitation here wasn't obvio= us and syzbot had to figure it out. There might be other side effects. > > > Disabling IRQ before incrementing zonelist_update_seq is _required_ f= or both > > >=20 > > > making printk_deferred_enter() safe > > >=20 > > > and > > >=20 > > > making sure that printk_deferred_enter() takes effect > > >=20 > > > . > > Did I explain why it is sufficient to do > > write_seqlock_irqsave() > > printk_deferred_enter() > >=20 > > assuming we have > >=20 > > | static inline void do_write_seqcount_begin_nested(seqcount_t *s, int = subclass) > > | { > > | seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_); > > | do_raw_write_seqcount_begin(s); > > | } >=20 > Will this prevent any printk() called on the same CPU before > printk_deferred_enter() is called? Yes. There is the _irqsave() which disables interrupts at the very beginning. Then we have here (seqcount_acquire()) the last possible origin of a lockdep splat. It ends with do_raw_write_seqcount_begin() which increments the counter (the "locking") and this is followed by printk_deferred_enter(). > Best Regards, > Petr Sebastian