From: Tetsuo Handa
To: Sebastian Andrzej Siewior, Petr Mladek
Cc: linux-mm@kvack.org, "Luis Claudio R. Goncalves", Andrew Morton, Mel Gorman, Michal Hocko, Thomas Gleixner
Date: Thu, 22 Jun 2023 22:36:27 +0900
Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().

On 2023/06/22 8:24, Tetsuo Handa wrote:
> By the way, given
>
> write_seqlock_irqsave(&zonelist_update_seq, flags);
> <>
> some_timer_function() {
> kmalloc(GFP_ATOMIC);
> }
> <>
> printk_deferred_enter();
>
> scenario in CONFIG_PREEMPT_RT=y case is handled by executing some_timer_function()
> on a dedicated kernel thread for IRQs, what guarantees that the kernel thread for
> IRQs gives up CPU and the user thread which called write_seqlock() gains CPU until
> write_sequnlock() is called? How can the kernel figure out that executing the user
> thread needs higher priority than the kernel thread?

I haven't received a response to this question.

Several years ago, I demonstrated that a SCHED_IDLE priority userspace thread holding oom_lock causes other concurrently allocating !SCHED_IDLE priority threads to wrongly assume that a mutex_trylock(&oom_lock) failure implies forward progress is being made (even though the SCHED_IDLE priority userspace thread was unable to get CPU time for minutes).

If a SCHED_IDLE priority thread which called write_seqlock_irqsave() is preempted by some other !SCHED_IDLE priority thread (especially a realtime priority thread), and that !SCHED_IDLE priority thread calls kmalloc(GFP_ATOMIC) or printk(), a similar thing can happen: the preempting thread keeps spinning on read_seqbegin() from zonelist_iter_begin() on the assumption that this will eventually make forward progress, while the thread which called write_seqlock_irqsave() cannot make progress because it has been preempted.

Question to Sebastian:
To make sure that such a thing cannot happen, we need to guarantee that a thread which entered write_seqcount_begin(&zonelist_update_seq.seqcount) from write_seqlock_irqsave(&zonelist_update_seq, flags) can keep using the CPU until it reaches write_seqcount_end(&zonelist_update_seq.seqcount) from write_sequnlock_irqrestore(&zonelist_update_seq, flags).
Does adding preempt_disable() before write_seqlock_irqsave(&zonelist_update_seq, flags) help?

Question to Petr:
Even if local_irq_save(flags) disables IRQs, NMI context can still enqueue a message via printk(). When does a message enqueued from NMI context get printed? Is there a possibility that such a message gets printed between write_seqlock_irqsave(&zonelist_update_seq, flags) and printk_deferred_enter(), or between printk_deferred_exit() and write_sequnlock_irqrestore(&zonelist_update_seq, flags)? If yes, we can't increment zonelist_update_seq.seqcount before printk_deferred_enter()...