From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 397D8C001E0 for ; Sat, 29 Jul 2023 11:06:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E7F38D0002; Sat, 29 Jul 2023 07:06:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 195738D0001; Sat, 29 Jul 2023 07:06:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 05D4B8D0002; Sat, 29 Jul 2023 07:06:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id EA4CD8D0001 for ; Sat, 29 Jul 2023 07:06:06 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id A4DDD1A013C for ; Sat, 29 Jul 2023 11:06:06 +0000 (UTC) X-FDA: 81064369932.12.4C18E3A Received: from www262.sakura.ne.jp (www262.sakura.ne.jp [202.181.97.72]) by imf13.hostedemail.com (Postfix) with ESMTP id 2019E20006 for ; Sat, 29 Jul 2023 11:06:03 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=none; spf=none (imf13.hostedemail.com: domain of penguin-kernel@I-love.SAKURA.ne.jp has no SPF policy when checking 202.181.97.72) smtp.mailfrom=penguin-kernel@I-love.SAKURA.ne.jp; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690628764; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZEAyOtYSMbPY1efDALYxlmmM24YaoFsX+Jp8PX7Fvxs=; b=eFT41KCDwH5p+Z62qnVTgdm94T5Tw0X4fSLPYRDN9VVuT0CFaxksWe7kjqMUFwqUBW0cdZ jz2lLqC2yHvC+lxu9PJOst6PcmExbWPADz9mqw/81mCpBQW5TyGCEhuPEw2HjPvy2Zsx18 rIxbti6LJ9Ul3dprtE/MEcAQHGqfuZ0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690628764; a=rsa-sha256; cv=none; b=OBr2L85vrOTUchoe2gs09PapJN3LYWJvLY0PQ4kcOZdWJ4f8XpKq1hYe5T5hH0PSNWQoLc qjNifZOWxDLMScadglSaAKL3IGk+weYL+NsrDOka8JRQdDy4Hn0ebwBTdGJJiwsEnJaST4 6Xh9MI+JlZ6Sh/zCsqB39NYyw5MDXwE= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=none; spf=none (imf13.hostedemail.com: domain of penguin-kernel@I-love.SAKURA.ne.jp has no SPF policy when checking 202.181.97.72) smtp.mailfrom=penguin-kernel@I-love.SAKURA.ne.jp; dmarc=none Received: from fsav311.sakura.ne.jp (fsav311.sakura.ne.jp [153.120.85.142]) by www262.sakura.ne.jp (8.15.2/8.15.2) with ESMTP id 36TB5jLB045161; Sat, 29 Jul 2023 20:05:45 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) Received: from www262.sakura.ne.jp (202.181.97.72) by fsav311.sakura.ne.jp (F-Secure/fsigk_smtp/550/fsav311.sakura.ne.jp); Sat, 29 Jul 2023 20:05:45 +0900 (JST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/fsav311.sakura.ne.jp) Received: from [192.168.1.6] (M106072142033.v4.enabler.ne.jp [106.72.142.33]) (authenticated bits=0) by www262.sakura.ne.jp (8.15.2/8.15.2) with ESMTPSA id 36TB5iJX045156 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NO); Sat, 29 Jul 2023 20:05:45 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) Message-ID: <649fa1a7-4efd-8cc7-92c7-ac7944adc283@I-love.SAKURA.ne.jp> Date: Sat, 29 Jul 2023 20:05:43 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH v2 1/2] seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested() Content-Language: en-US From: Tetsuo Handa To: Sebastian Andrzej Siewior , Michal Hocko Cc: Petr Mladek , linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Luis Claudio R. Goncalves" , Andrew Morton , Boqun Feng , Ingo Molnar , John Ogness , Mel Gorman , Peter Zijlstra , Thomas Gleixner , Waiman Long , Will Deacon References: <20230623171232.892937-1-bigeasy@linutronix.de> <20230623171232.892937-2-bigeasy@linutronix.de> <20230626081254.XmorFrhs@linutronix.de> <20230727151029.e_M9bi8N@linutronix.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 2019E20006 X-Rspam-User: X-Stat-Signature: su7w7ffmpe9yefeh3xd4k9y9kkej67yk X-Rspamd-Server: rspam03 X-HE-Tag: 1690628763-384647 X-HE-Meta: U2FsdGVkX1/1dPS+6PLTEyltWR57ejefvAU+e3iFbRtAwiPGLz9e5pchYFnqELZ68TyFnXDqTxEzPi2cuYhlZa3lIagvcjv4a/072vMxs0da/vahuYdVvJ1nlra+D6BICYC4CrbV7APaCHaK0XaF6JNPSPNLV9o85v4/vXHqZimyWEag83etmPV+J8Qa8LF7z/R9nZOLTGHJ1jmNBpRQMmprFxvFIyfHzooutmRWx462oo9wiLxZ2hSaSnj5xL5BrI1eV42j94ingWL9XuC/e8hrhzyJCrx/P5qFh3Ik0QomNy3G+2QJYWtY2UxXetnM5aOVoqkOJCrwdmJV+3KFFgzpv18PwKxuA85KR/L1GyogUdHP11ALkUHJ7jqpXBph/RWH4qRc2RGu14cv2zjjuhUIkEMdChoSS9Kw98KlyytYvXox+hFXO+jnca/c9q6TwQKJ3Nyw1sW6//RvrWff30J8COBBfGOoFRN4iSqADPAHWoPybLcjbFfRI/baEVfBMjM2fD2M1AwwGctX0XMOnYzh2dneaul7tMz1VZxJzapLt185vaZEe+QcAQNBNM2msy0EhiWWxQ9pPhDcxki6xtLJ6om6OdP4t9Sr0oTk3z4ksyk7IPCkZA1QArmjNeglsx20VLD8/D7Vrwz3BdCaEG08NYpQs0bqE/fq9bvkhQWgX5kb9tGcBuflQ8vf3uLkGawug5ve0AH1HM57gJvRYiXVYdBqMX/8R4+dBMh4EOC4RtAHFjgDY7MjY9qQWNE8xb3VMDj8zslx68PQxX5JetRXAeycJW0OLwGXA73o7EF9k3xuGf17yUqoEDTwdvokNhxOUGkKl/uto0t5w1KbNoIn2P2oKNCDcam8+THevkBtb9kbThAjVyhK/PgMIepAt9O6ECw42gFOnG3CGTSDobrfKVG93dg3YmHDTwaFKzXovsblSQdrTG3W25assb8ULgwaavVzUL/xAF+7wLt IwLLwP3y VllAKvrZfmI7LRgTSEngELrZqJV7FdUrRk/HDQV87pIINU6UW4HdPpfD6Jy0uUdiCwvHC/QXwYr3fSxrnuvJO8vvc8rKoj3D7GdOqfp5gEJmc87OwBECfXdUM/HFR7TrCQmnVCWe+xbt6nM4PMdVenHPgLpP3Yr5abG5NGxpa7UvSqz4xdA9O/o5SqcceEUGUgrLMX+Y+0lTgUFIAEt6NNm/m4doS2+xV8xaNtj+Wwc//Eq2sIpv8xPwFlJQV4SYe4xu4klyTc/o8KXkLBJm9xIpBPmpmqKSOmtMnqukGsqXskcRWHpwo+cosRU02SaeFb7Qxqk1GrF6MKrJQDnGTS1D75ZXquuJtuk3U19tMqH+71AKKmkJcIDUm1OLRl+TPRHpWkDQvcUhUT48= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2023/07/29 14:31, Tetsuo Handa wrote: > On 2023/07/28 0:10, Sebastian Andrzej Siewior wrote: >> On 2023-06-28 21:14:16 [+0900], Tetsuo Handa wrote: >>>> Anyway, please do not do this change only because of printk(). >>>> IMHO, the current ordering is more logical and the printk() problem >>>> should be solved another way. >>> >>> Then, since [PATCH 1/2] cannot be applied, [PATCH 2/2] is automatically >>> rejected. >> >> My understanding is that this patch gets applied and your objection will >> be noted. > > My preference is that zonelist_update_seq is not checked by !__GFP_DIRECT_RECLAIM > allocations, which is a low-hanging fruit towards GFP_LOCKLESS mentioned at > https://lkml.kernel.org/r/ZG3+l4qcCWTPtSMD@dhcp22.suse.cz and > https://lkml.kernel.org/r/ZJWWpGZMJIADQvRS@dhcp22.suse.cz . > > Maybe we can defer checking zonelist_update_seq till retry check like below, > for this is really an infrequent event. > An updated version with comments added. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7d3460c7a480..92ecf5f2f76b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3638,26 +3638,44 @@ EXPORT_SYMBOL_GPL(fs_reclaim_release); /* * Zonelists may change due to hotplug during allocation. Detect when zonelists - * have been rebuilt so allocation retries. Reader side does not lock and - * retries the allocation if zonelist changes. Writer side is protected by the - * embedded spin_lock. + * have been rebuilt so __GFP_DIRECT_RECLAIM allocation retries. Writer side + * makes this sequence odd before rebuilding zonelists and makes this sequence + * even after rebuilding zonelists. Sine writer side disables context switching + * when rebuilding zonelists, and !__GFP_DIRECT_RECLAIM allocation will not + * retry when zonelists changed, reader side does not need to hold a lock (but + * needs to use data_race() annotation), making locking dependency simpler. */ -static DEFINE_SEQLOCK(zonelist_update_seq); +static unsigned int zonelist_update_seq; static unsigned int zonelist_iter_begin(void) { if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) - return read_seqbegin(&zonelist_update_seq); + /* See comment above. */ + return data_race(READ_ONCE(zonelist_update_seq)); return 0; } -static unsigned int check_retry_zonelist(unsigned int seq) +static unsigned int check_retry_zonelist(gfp_t gfp, unsigned int seq) { - if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) - return read_seqretry(&zonelist_update_seq, seq); + if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp & __GFP_DIRECT_RECLAIM)) { + /* See comment above. */ + unsigned int seq2 = data_race(READ_ONCE(zonelist_update_seq)); - return seq; + /* + * "seq != seq2" indicates that __build_all_zonelists() has + * started or has finished rebuilding zonelists, hence retry. + * "seq == seq2 && (seq2 & 1)" indicates that + * __build_all_zonelists() is still rebuilding zonelists + * with context switching disabled, hence retry. + * "seq == seq2 && !(seq2 & 1)" indicates that + * __build_all_zonelists() did not rebuilt zonelists, hence + * no retry. + */ + return unlikely(seq != seq2 || (seq2 & 1)); + } + + return 0; } /* Perform direct synchronous page reclaim */ @@ -4146,7 +4164,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * a unnecessary OOM kill. */ if (check_retry_cpuset(cpuset_mems_cookie, ac) || - check_retry_zonelist(zonelist_iter_cookie)) + check_retry_zonelist(gfp_mask, zonelist_iter_cookie)) goto restart; /* Reclaim has failed us, start killing things */ @@ -4172,7 +4190,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * a unnecessary OOM kill. */ if (check_retry_cpuset(cpuset_mems_cookie, ac) || - check_retry_zonelist(zonelist_iter_cookie)) + check_retry_zonelist(gfp_mask, zonelist_iter_cookie)) goto restart; /* @@ -5136,22 +5154,17 @@ static void __build_all_zonelists(void *data) int nid; int __maybe_unused cpu; pg_data_t *self = data; + static DEFINE_SPINLOCK(lock); unsigned long flags; - /* - * Explicitly disable this CPU's interrupts before taking seqlock - * to prevent any IRQ handler from calling into the page allocator - * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock. - */ - local_irq_save(flags); - /* - * Explicitly disable this CPU's synchronous printk() before taking - * seqlock to prevent any printk() from trying to hold port->lock, for - * tty_insert_flip_string_and_push_buffer() on other CPU might be - * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. - */ - printk_deferred_enter(); - write_seqlock(&zonelist_update_seq); +#ifdef CONFIG_PREEMPT_RT + migrate_disable() +#endif + /* Serialize increments of zonelist_update_seq. */ + spin_lock_irqsave(&lock, flags); + zonelist_update_seq++; + /* Tell check_retry_zonelist() that we started rebuilding zonelists. */ + smp_wmb(); #ifdef CONFIG_NUMA memset(node_load, 0, sizeof(node_load)); @@ -5188,9 +5201,13 @@ static void __build_all_zonelists(void *data) #endif } - write_sequnlock(&zonelist_update_seq); - printk_deferred_exit(); - local_irq_restore(flags); + /* Tell check_retry_zonelist() that we finished rebuilding zonelists. */ + smp_wmb(); + zonelist_update_seq++; + spin_unlock_irqrestore(&lock, flags); +#ifdef CONFIG_PREEMPT_RT + migrate_enable() +#endif } static noinline void __init