Date: Fri, 11 Aug 2023 00:35:03 +0900
Subject: Re: [PATCH v2] mm/page_alloc: don't check zonelist_update_seq from atomic allocations
From: Tetsuo Handa
To: Sebastian Andrzej Siewior
Cc: Michal Hocko, Andrew Morton, Petr Mladek, linux-mm, LKML,
 "Luis Claudio R. Goncalves", Boqun Feng, Ingo Molnar, John Ogness,
 Mel Gorman, Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon
References: <6cc13636-eda6-6a95-6564-db1c9ae76bb6@I-love.SAKURA.ne.jp> <20230810072637.6Sc3UU3R@linutronix.de> <566173d4-84d1-c76b-6fe4-f5ea5f24f613@I-love.SAKURA.ne.jp>
In-Reply-To: <566173d4-84d1-c76b-6fe4-f5ea5f24f613@I-love.SAKURA.ne.jp>

On 2023/08/10 18:58, Tetsuo Handa wrote:
> If __build_all_zonelists() can run without being switched to other threads
> (except interrupt handlers), I consider that this approach works.

If there is no way to make sure that the section between
write_seqlock(&zonelist_update_seq) and write_sequnlock(&zonelist_update_seq)
runs without context switching (interrupt handlers are fine), something like
the patch below could be used in order to keep

  spin_lock(s->lock);
  spin_unlock(s->lock);

in seqprop_sequence() away from atomic allocations. But I think that loses
the reason to replace read_seqbegin() with raw_seqcount_begin(); that will be
essentially the same as
https://lkml.kernel.org/r/dfdb9da6-ca8f-7a81-bfdd-d74b4c401f11@I-love.SAKURA.ne.jp .

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7d3460c7a480..f2f79caab2cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3644,20 +3644,20 @@ EXPORT_SYMBOL_GPL(fs_reclaim_release);
  */
 static DEFINE_SEQLOCK(zonelist_update_seq);
 
-static unsigned int zonelist_iter_begin(void)
+static unsigned int zonelist_iter_begin(gfp_t gfp)
 {
-	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
-		return read_seqbegin(&zonelist_update_seq);
+	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp & __GFP_DIRECT_RECLAIM))
+		return data_race(raw_seqcount_begin(&zonelist_update_seq.seqcount));
 
 	return 0;
 }
 
-static unsigned int check_retry_zonelist(unsigned int seq)
+static unsigned int check_retry_zonelist(gfp_t gfp, unsigned int seq)
 {
-	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
-		return read_seqretry(&zonelist_update_seq, seq);
+	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp & __GFP_DIRECT_RECLAIM))
+		return data_race(read_seqcount_retry(&zonelist_update_seq.seqcount, seq));
 
-	return seq;
+	return 0;
 }
 
 /* Perform direct synchronous page reclaim */
@@ -3968,7 +3968,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	no_progress_loops = 0;
 	compact_priority = DEF_COMPACT_PRIORITY;
 	cpuset_mems_cookie = read_mems_allowed_begin();
-	zonelist_iter_cookie = zonelist_iter_begin();
+	zonelist_iter_cookie = zonelist_iter_begin(gfp_mask);
 
 	/*
 	 * The fast path uses conservative alloc_flags to succeed only until
@@ -4146,7 +4146,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * a unnecessary OOM kill.
 	 */
 	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
-	    check_retry_zonelist(zonelist_iter_cookie))
+	    check_retry_zonelist(gfp_mask, zonelist_iter_cookie))
 		goto restart;
 
 	/* Reclaim has failed us, start killing things */
@@ -4172,7 +4172,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * a unnecessary OOM kill.
 	 */
 	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
-	    check_retry_zonelist(zonelist_iter_cookie))
+	    check_retry_zonelist(gfp_mask, zonelist_iter_cookie))
 		goto restart;
 
 	/*
@@ -5138,20 +5138,7 @@ static void __build_all_zonelists(void *data)
 	pg_data_t *self = data;
 	unsigned long flags;
 
-	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
-	 */
-	local_irq_save(flags);
-	/*
-	 * Explicitly disable this CPU's synchronous printk() before taking
-	 * seqlock to prevent any printk() from trying to hold port->lock, for
-	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
-	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
-	 */
-	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
+	write_seqlock_irqsave(&zonelist_update_seq, flags);
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -5188,9 +5175,7 @@ static void __build_all_zonelists(void *data)
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
-	printk_deferred_exit();
-	local_irq_restore(flags);
+	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
 }
 
 static noinline void __init
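
For readers less familiar with the pattern, the reader side of the diff above can be modelled in userspace C11. This is only a rough sketch, not kernel code: every demo_* name and DEMO_MAY_SLEEP are hypothetical stand-ins (DEMO_MAY_SLEEP plays the role of __GFP_DIRECT_RECLAIM), and memory ordering is simplified to the sequentially consistent defaults. It shows the two ideas the patch relies on: a "raw" begin never waits for a writer and instead masks off the low bit so that a snapshot taken in the middle of an update fails the later retry check, and callers that cannot sleep skip the counter entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag standing in for __GFP_DIRECT_RECLAIM. */
#define DEMO_MAY_SLEEP 0x1u

/* Hypothetical stand-in for a sequence counter; odd means "writer active". */
struct demo_seqcount {
	atomic_uint sequence;
};

/* Writer side: bump to odd on entry and back to even on exit. */
static void demo_write_begin(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

static void demo_write_end(struct demo_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/*
 * Reader begin that never spins or takes a lock. Clearing the low bit
 * means a snapshot taken while a writer is active can never match the
 * counter later, so the retry check is guaranteed to fire.
 */
static unsigned int demo_raw_begin(struct demo_seqcount *s)
{
	return atomic_load(&s->sequence) & ~1u;
}

/* True if the counter moved since the snapshot was taken. */
static bool demo_retry(struct demo_seqcount *s, unsigned int start)
{
	return atomic_load(&s->sequence) != start;
}

/* Gated helpers: callers that cannot sleep never touch the counter. */
static unsigned int demo_iter_begin(struct demo_seqcount *s, unsigned int flags)
{
	if (flags & DEMO_MAY_SLEEP)
		return demo_raw_begin(s);
	return 0;
}

static bool demo_check_retry(struct demo_seqcount *s, unsigned int flags,
			     unsigned int cookie)
{
	if (flags & DEMO_MAY_SLEEP)
		return demo_retry(s, cookie);
	return false;	/* atomic callers never ask for a restart */
}
```

In this model, a caller that passes DEMO_MAY_SLEEP and takes its snapshot while a writer is active receives the even value below the current counter, so demo_check_retry() reports a retry once the writer finishes. A caller without the flag always gets cookie 0 and is never asked to retry.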