Date: Thu, 22 Jun 2023 17:04:43 +0200
From: Petr Mladek
To: Tetsuo Handa
Cc: Sebastian Andrzej Siewior, linux-mm@kvack.org, "Luis Claudio R. Goncalves", Andrew Morton, Mel Gorman, Michal Hocko, Thomas Gleixner
Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
References: <20230621104034.HT6QnNkQ@linutronix.de>
 <0e9fc992-8e05-2e63-b3b1-d8d3ce89fc16@I-love.SAKURA.ne.jp>
 <20230621130641.-5iueY1I@linutronix.de>
 <20230621143421.BgHjJklo@linutronix.de>
 <01031ffe-c81f-9cec-76fb-e70d548429cf@I-love.SAKURA.ne.jp>
 <8b6d3f39-c573-ca2b-957b-8c48c2fa68ad@I-love.SAKURA.ne.jp>

On Thu 2023-06-22 16:11:41, Petr Mladek wrote:
> On Thu 2023-06-22 22:36:27, Tetsuo Handa wrote:
> > On 2023/06/22 8:24, Tetsuo Handa wrote:
> > > By the way, given
> > >
> > >   write_seqlock_irqsave(&zonelist_update_seq, flags);
> > >   <>
> > >   some_timer_function() {
> > >     kmalloc(GFP_ATOMIC);
> > >   }
> > >   <>
> > >   printk_deferred_enter();
> > >
> > > scenario in CONFIG_PREEMPT_RT=y case is handled by executing some_timer_function()
> > > on a dedicated kernel thread for IRQs, what guarantees that the kernel thread for
> > > IRQs gives up CPU and the user thread which called write_seqlock() gains CPU until
> > > write_sequnlock() is called? How can the kernel figure out that executing the user
> > > thread needs higher priority than the kernel thread?

My understanding is that this is achieved by
spin_lock_irqsave(&sl->lock, flags). When RT is enabled,
rt_spin_lock(lock) is used instead.

AFAIK, rt_spin_lock(lock) fulfills exactly the above requirements.
The owner can schedule. The waiter can schedule as well, so both can
run on the same CPU. Also, the current owner's priority is boosted
when there is a waiter with a higher priority, which avoids priority
inversion.

> > I haven't got response on this question.
> >
> > Several years ago, I demonstrated that a SCHED_IDLE priority userspace thread holding
> > oom_lock causes other concurrently allocating !SCHED_IDLE priority threads to
> > misunderstand that mutex_trylock(&oom_lock) failure implies we are making forward
> > progress (despite the SCHED_IDLE priority userspace thread was unable to wake up for
> > minutes).
> >
> > If a SCHED_IDLE priority thread which called write_seqlock_irqsave() is preempted by
> > some other !SCHED_IDLE priority threads (especially realtime priority threads), and
> > such !SCHED_IDLE priority thread calls kmalloc(GFP_ATOMIC) or printk(), a similar thing
> > (misunderstand that spinning on read_seqbegin() from zonelist_iter_begin() can make
> > forward progress despite a thread which called write_seqlock_irqsave() cannot make
> > progress due to preemption) can happen.
> >
> > Question to Sebastian:
> > To make sure that such thing cannot happen, we should make sure that
> > a thread which entered write_seqcount_begin(&zonelist_update_seq.seqcount) from
> > write_seqlock_irqsave(&zonelist_update_seq, flags) can continue using CPU until
> > write_seqcount_end(&zonelist_update_seq.seqcount) from
> > write_seqlock_irqrestore(&zonelist_update_seq, flags).
> > Does adding preempt_disable() before write_seqlock(&zonelist_update_seq, flags) help?
> >
> > Question to Peter:
> > Even if local_irq_save(flags) disables IRQ, NMI context can enqueue message via printk().
> > When does the message enqueued from NMI context gets printed?
>
> They are flushed to the console either by irq_work or by another
> printk(). The irq_work could not be proceed when IRQs are disabled.
> But another non-deferred printk() would try to flush them immediately.
>
> > If there is a possibility
> > that the message enqueued from NMI context gets printed between
> > "write_seqlock_irqsave(&zonelist_update_seq, flags) and printk_deferred_enter()" or
> > "printk_deferred_exit() and write_sequnlock_irqrestore(&zonelist_update_seq, flags)" ?
> > If yes, we can't increment zonelist_update_seq.seqcount before printk_deferred_enter()...
>
> It might happen when a printk() is called in these holes.

I believe that this hole is the only remaining problem. IMHO, the only
solution is to disable migration/preemption in printk_deferred_enter().
It is not a big deal, really.
__rt_spin_lock() would call migrate_disable() anyway. And nested
migrate_disable() is fast. It just increments the counter when it is
not 0.

I would suggest making this change in printk_deferred_enter() first.
It will allow calling it before write_seqlock_irqsave(), and we will
not need to change the ordering back and forth.

The result would look like:

in include/linux/printk.h:

static inline void printk_deferred_enter(void)
{
	/* Stay on this CPU while printk is in deferred mode. */
	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		preempt_disable();
	else
		migrate_disable();

	__printk_safe_enter();
}

in mm/page_alloc.c:

	printk_deferred_enter();
	write_seqlock_irqsave(&zonelist_update_seq, flags);

Best Regards,
Petr
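
For readers following the thread, the ordering argument can be sketched
as a small userspace model. This is only an illustration under stated
assumptions, not kernel code: struct model_state and every model_* or
unsafe_windows_* name below are hypothetical, and the model assumes (as
discussed in the thread) that a non-deferred printk() issued while the
sequence count is odd may end up spinning in read_seqbegin(). It counts
the points at which such a printk() would be unsafe for each ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. None of these are real kernel interfaces. */
struct model_state {
	unsigned int seq;		/* stands in for zonelist_update_seq.seqcount */
	int printk_deferred_depth;	/* stands in for the printk_safe nesting count */
	int pinned_depth;		/* stands in for preempt/migrate disable nesting */
};

/* Proposed shape: pin the task to the CPU first, then defer printk. */
static void model_printk_deferred_enter(struct model_state *s)
{
	s->pinned_depth++;
	s->printk_deferred_depth++;
}

static void model_printk_deferred_exit(struct model_state *s)
{
	assert(s->printk_deferred_depth > 0 && s->pinned_depth > 0);
	s->printk_deferred_depth--;
	s->pinned_depth--;
}

/* The writer makes the count odd on entry and even again on exit. */
static void model_write_seqlock(struct model_state *s)   { s->seq++; }
static void model_write_sequnlock(struct model_state *s) { s->seq++; }

/*
 * Assumption of this model: a printk() is unsafe when a write section is
 * open (odd count) and console flushing is not deferred, because the
 * flush could allocate memory and wait for the count to become even.
 */
static bool model_printk_would_spin(const struct model_state *s)
{
	return (s->seq & 1) && s->printk_deferred_depth == 0;
}

/* Ordering questioned in the thread: seqlock first, deferral inside. */
int unsafe_windows_seqlock_first(void)
{
	struct model_state s = {0};
	int unsafe = 0;

	model_write_seqlock(&s);
	unsafe += model_printk_would_spin(&s);	/* hole before deferral */
	model_printk_deferred_enter(&s);
	unsafe += model_printk_would_spin(&s);
	model_printk_deferred_exit(&s);
	unsafe += model_printk_would_spin(&s);	/* hole after deferral */
	model_write_sequnlock(&s);
	unsafe += model_printk_would_spin(&s);
	return unsafe;
}

/* Suggested ordering: deferral wraps the whole write section. */
int unsafe_windows_deferred_first(void)
{
	struct model_state s = {0};
	int unsafe = 0;

	model_printk_deferred_enter(&s);
	unsafe += model_printk_would_spin(&s);
	model_write_seqlock(&s);
	unsafe += model_printk_would_spin(&s);
	model_write_sequnlock(&s);
	unsafe += model_printk_would_spin(&s);
	model_printk_deferred_exit(&s);
	unsafe += model_printk_would_spin(&s);
	return unsafe;
}
```

In this model the seqlock-first ordering leaves two unsafe points (the
two "holes" quoted above), while the deferred-first ordering leaves
none. The model says nothing about scheduling or priority inheritance;
it only shows why the call order matters.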