From: Tetsuo Handa
To: Sebastian Andrzej Siewior
Cc: linux-mm@kvack.org, "Luis Claudio R. Goncalves", Andrew Morton, Mel Gorman, Michal Hocko, Thomas Gleixner, Petr Mladek
Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Thu, 22 Jun 2023 08:24:30 +0900
Message-ID: <8b6d3f39-c573-ca2b-957b-8c48c2fa68ad@I-love.SAKURA.ne.jp>
In-Reply-To: <01031ffe-c81f-9cec-76fb-e70d548429cf@I-love.SAKURA.ne.jp>
References: <20230621104034.HT6QnNkQ@linutronix.de> <0e9fc992-8e05-2e63-b3b1-d8d3ce89fc16@I-love.SAKURA.ne.jp> <20230621130641.-5iueY1I@linutronix.de> <20230621143421.BgHjJklo@linutronix.de> <01031ffe-c81f-9cec-76fb-e70d548429cf@I-love.SAKURA.ne.jp>

On 2023/06/21 23:50, Tetsuo Handa wrote:
> On 2023/06/21 23:34, Sebastian Andrzej Siewior wrote:
>>> Also, if local_irq_save() is hidden due to RT, what guarantees that
>>>
>>>   write_seqlock_irqsave(&zonelist_update_seq, flags);
>>>   <>
>>>     some_timer_function() {
>>>       printk();
>>>     }
>>>   <>
>>>   printk_deferred_enter();
>>>
>>> does not happen because write_seqlock_irqsave() does not disable IRQ?
>>
>> I don't see how zonelist_update_seq and printk here are connected
>> without the port lock/ or memory allocation. But there are two things
>> that are different on RT which probably answer your question:
>
> It is explained as the first deadlock scenario in commit 1007843a9190
> ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock").
> We have to disable IRQ before making zonelist_update_seq.seqcount odd.
>

Since we must replace local_irq_save() + write_seqlock() with
write_seqlock_irqsave() in the CONFIG_PREEMPT_RT=y case, but must not make
that replacement in the CONFIG_PREEMPT_RT=n case, is the proper fix
something like the patch below?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..e3e9bd719dcc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5798,28 +5798,30 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 #define BOOT_PAGESET_HIGH	0
 #define BOOT_PAGESET_BATCH	1
 static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats);

 static void __build_all_zonelists(void *data)
 {
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+#ifndef CONFIG_PREEMPT_RT
 	unsigned long flags;

 	/*
 	 * Explicitly disable this CPU's interrupts before taking seqlock
 	 * to prevent any IRQ handler from calling into the page allocator
 	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
 	 */
 	local_irq_save(flags);
+#endif
 	/*
 	 * Explicitly disable this CPU's synchronous printk() before taking
 	 * seqlock to prevent any printk() from trying to hold port->lock, for
 	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
 	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
 	 */
 	printk_deferred_enter();
 	write_seqlock(&zonelist_update_seq);

 #ifdef CONFIG_NUMA
@@ -5852,21 +5854,23 @@ static void __build_all_zonelists(void *data)
 		 * secondary cpus' numa_mem as they come on-line. During
 		 * node/memory hotplug, we'll fixup all on-line cpus.
 		 */
 		for_each_online_cpu(cpu)
 			set_cpu_numa_mem(cpu, local_memory_node(cpu_to_node(cpu)));
 #endif
 	}

 	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
+#ifndef CONFIG_PREEMPT_RT
 	local_irq_restore(flags);
+#endif
 }

 static noinline void __init build_all_zonelists_init(void)
 {
 	int cpu;

 	__build_all_zonelists(NULL);

 	/*

By the way, given that the

  write_seqlock_irqsave(&zonelist_update_seq, flags);
  <>
    some_timer_function() {
      kmalloc(GFP_ATOMIC);
    }
  <>
  printk_deferred_enter();

scenario in the CONFIG_PREEMPT_RT=y case is handled by executing
some_timer_function() on a dedicated kernel thread for IRQs, what
guarantees that the kernel thread for IRQs gives up the CPU and the user
thread which called write_seqlock() gains the CPU until write_sequnlock()
is called? How can the kernel figure out that executing the user thread
needs higher priority than the kernel thread?
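
To make the livelock concern concrete, here is a minimal userspace sketch.
It is not kernel code: struct toy_seqcount and every toy_*() helper are
hypothetical names invented for illustration, and the max_spins bound
exists only so that the sketch terminates (a real seqlock reader would
keep retrying). It models a reader, such as the zonelist_iter_begin path
mentioned in the patch comment, running on the same CPU as the writer
while the sequence count is odd.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sequence counter; not the kernel's seqlock_t. */
struct toy_seqcount {
	unsigned int sequence;
};

/* Writer side: an odd sequence means "update in progress". */
static void toy_write_begin(struct toy_seqcount *s)
{
	assert(!(s->sequence & 1));	/* must not already be inside a write */
	s->sequence++;
}

static void toy_write_end(struct toy_seqcount *s)
{
	assert(s->sequence & 1);	/* must be inside a write */
	s->sequence++;
}

/*
 * Reader side: retry while the sequence is odd. The max_spins limit is an
 * assumption of this sketch; it turns "spins forever" into "returns false".
 */
static bool toy_read_begin_bounded(const struct toy_seqcount *s,
				   unsigned int max_spins, unsigned int *seq)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		unsigned int v = s->sequence;

		if (!(v & 1)) {
			*seq = v;
			return true;
		}
	}
	return false;
}

/*
 * Models an IRQ handler that interrupts the writer on the same CPU and
 * ends up in a reader, e.g. through an atomic allocation. The interrupted
 * writer cannot run until the handler returns, so the sequence stays odd.
 */
static bool toy_irq_handler(const struct toy_seqcount *s)
{
	unsigned int seq;

	return toy_read_begin_bounded(s, 1000, &seq);
}
```

Calling toy_irq_handler() between toy_write_begin() and toy_write_end()
returns false in this sketch, which corresponds to the reader never making
progress; outside the write section it returns true. On CONFIG_PREEMPT_RT=n
the local_irq_save() before write_seqlock() prevents the handler from
running in that window at all.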