From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C729C761A6 for ; Tue, 4 Apr 2023 15:20:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E3B2A6B0071; Tue, 4 Apr 2023 11:20:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DC515900002; Tue, 4 Apr 2023 11:20:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C3E496B0074; Tue, 4 Apr 2023 11:20:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id AC9746B0071 for ; Tue, 4 Apr 2023 11:20:42 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 401CC80EF3 for ; Tue, 4 Apr 2023 15:20:41 +0000 (UTC) X-FDA: 80644070682.26.20586EE Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf02.hostedemail.com (Postfix) with ESMTP id 94A4680027 for ; Tue, 4 Apr 2023 15:20:37 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=uQuxU21R; dmarc=pass (policy=quarantine) header.from=suse.com; spf=pass (imf02.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.28 as permitted sender) smtp.mailfrom=mhocko@suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1680621637; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cjbfHmCCjLJH+dlTfw5k2w28+HBd1vdQYQzxswazcHo=; b=fgg8W4X4W4nvqYLtpIWo4V1qTVsva2Gqn3r+4fIFYm2g4LxymVFL1lR7AsHXONvxTbBIAJ C6msOt3Gyh7c64GCJoO59fc6BQaxHvnRqybuq2rQvDB8MVV+nb39jOFK2lr/u3oDmoysTb CN5ulfO1qFFXiwm6WAu2XZNHVlRLQ1M= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=uQuxU21R; dmarc=pass (policy=quarantine) header.from=suse.com; spf=pass (imf02.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.28 as permitted sender) smtp.mailfrom=mhocko@suse.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1680621637; a=rsa-sha256; cv=none; b=Ow8iRaDVU7td2AUwi4E2aQaV0z9C2IMOCFTsuq51dD2G2oDlvHMK7humwO0rw6s/TV4Fr5 zbxsCCNBAK/seb8KRW8713ZHzZgJ3wx/O2tv26JgKZn6I4KJ3cu5Rew91/dPEF0DH+Ckn0 kw0PV1z5cAVD1ORnKili5EdxVmMO6/Y= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id EBD022285F; Tue, 4 Apr 2023 15:20:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680621635; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=cjbfHmCCjLJH+dlTfw5k2w28+HBd1vdQYQzxswazcHo=; b=uQuxU21R4RNs+Nf3HHOy8/fcB/1RNP3Ylv+Fiyikc0U0jFIy7eKvCk5Qpg/i63skgfrgKc oLUTLwZWHXARZ5z8WPXGQEZPMnqKxuBiML179yTnUjo7F9CQ2hEfJV3AVGmTRaEXMRhXCi KCUQMZ/RxqbvUE4/YS3/VYqB8sMzP0w= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id DA9E013920; Tue, 4 Apr 2023 15:20:35 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id zMysNENALGR3AwAAMHmgww (envelope-from ); Tue, 04 Apr 2023 15:20:35 +0000 Date: Tue, 4 Apr 2023 17:20:35 +0200 From: Michal Hocko To: Tetsuo Handa Cc: Petr Mladek , Patrick Daly , Mel Gorman , David Hildenbrand , Andrew Morton , Sergey Senozhatsky , Steven Rostedt , John Ogness , syzkaller-bugs@googlegroups.com, Ilpo =?iso-8859-1?Q?J=E4rvinen?= , syzbot , linux-mm Subject: Re: [PATCH v2] mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock Message-ID: References: <78ff6e70-e986-1fcb-eafb-3edd5f2bceae@I-love.SAKURA.ne.jp> <6266b161-e4c3-7d65-6590-da6cc04d93ec@I-love.SAKURA.ne.jp> <0585ddb9-5de8-8cdd-202e-53887bbb6b5f@I-love.SAKURA.ne.jp> <8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp> X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 94A4680027 X-Stat-Signature: qcgs1h3omkeneyzmixpscnansrpkbkk9 X-HE-Tag: 1680621637-200679 X-HE-Meta: U2FsdGVkX19B691cybHZwm1KnUN2eYnCnRH+zESnoQPwptxHipN4i6juQ/prGy1GjRJKdskszLZKI0Onulm/wI8m30YzZZKpHEuS8hNvz3CIJds++ZiXt1tFsLxR5nrMryQ7FQefQml2HHRGPHnkKkYz0UAp3+oxjgExg+/0i22ldrhJ5oGd2yJWcwilDo1Qp2qOvsOoAywsZPgQciX9qDVpDTZrfCia1bO0124F+rpXog8tIXOd8YFyLMNCamBUZDJa4PmfL6OJkU57L+TI6NC1kHz1mHz+Y2CwVrkVHQaqQFb+DzXTmUi0YE32EjuWS0dEyQISTsGw7Lg/FDss3E+C8FPR/fkVnuwovX4lLMGYH620scZrofQoeA+9WLii8CbQ5+c7jfFFPthJZ07oe8cKE6VDEo0oi7bua8O1Fxpnfwkbujaw4eKJLvvozAV/7xzZlLGVTnAQaKjeNmTd2s5GBSr5Y8jiY5swJZg48hoYX4NJb69BqhpH4stLWjcoa4tj8GsZvjI1bnE1YMbdGa62tb0k+bKYBVQ+SyCKBLrvkg7GRg6Z4W3cFiSEkdJFsP5DJeqPCF8SmPJiPyn7FC8rpqkR5reebgnmBEWEn3+ZQkarfCdR1bP6ckso65/7YPQ/bdqbg2JEYwjgMvmCealW8sEvxtKXPH176U9QOYHkoBylcDKJ7yYwmHWvu5p7WNlMrFNMgIoAYpXjRJDpva8Ff3OWF16y5LaNgShZTMbTUqaE7yA9QcRlXde3t45IWqRvWsJqMBFOPFid65CechzQcKHNPCgiSXqYRxfU1a6yIftka/NcNzAsjbvWY+Go4R3PwAzUy8lFZRrKCwVAzjL11Tj2WsDYs6+rzwPbEx/iu/EdZatSUEBzyLbcIBzOH6U086rl/dSghWR6N/WM8yT2Brji4t8IkxXQtICplCPRSrTUcJAmH9MaXluarwDhRLpbJVOTPQIN2F9uha5 CCgW1mZq ERfGsxQaHuzBiu3TkjVjvZDX+UOmg/Bh4yDYjhivf8ruwg8rUBAWrdzdYdoQjB3biYwzY7sGXLZiwKLPhi7Ab6BTLM13KQmtV+QgaeFLGOMkP07Xq9FQ4AY1pk0iNUJSU4b5fzSn7XJuvDwbg3VJaCJ1sCfo0sJATwueHGoU2IrRaYLfGsvtSOFxInp2AHWW+8tGY8V4MtYwT1wHmGoNTGgKgHOiXAa/t8ZAxFSjl77MhWaHOtDrd7plfv7wimqVz3z+M7kIFffjr3X7miTZF0u0SdeWWpl42G1odU8WyR1tNVIdnozcASBvjSEsDqNP+FF28CV9X4Lqttupjsn3jG12xCvWvdWkqKevjMF5fklg9lVN/Zulj9/MBwZcdp4Z35PPih8kGBQnorGOJ9q6Oct5K1FEe4wMjFft/vq9BaH54fdhm8poO8SR407f6haAYGmKn4qwDiQt6c0U0INi7X0tOXstwe9wm98tGjxJCBgnqvy+7DJaugBnyKUmJVmoId0SPt0JrcgMDFk5RTZCYlGE+jcVaBtJzTS5AZrPanfaI9GktHFp8krLlrAPoVcUkSQwp X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 04-04-23 23:31:58, Tetsuo Handa wrote: > syzbot is reporting circular locking dependency which involves > zonelist_update_seq seqlock [1], for this lock is checked by memory > allocation requests which do not need to be retried. > > One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler. > > CPU0 > ---- > __build_all_zonelists() { > write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd > // e.g. timer interrupt handler runs at this moment > some_timer_func() { > kmalloc(GFP_ATOMIC) { > __alloc_pages_slowpath() { > read_seqbegin(&zonelist_update_seq) { > // spins forever because zonelist_update_seq.seqcount is odd > } > } > } > } > // e.g. timer interrupt handler finishes > write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even > } > > This deadlock scenario can be easily eliminated by not calling > read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation > requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation > requests. But Michal Hocko does not know whether we should go with this > approach. It would have been more useful to explain why that is not preferred or desirable. > Another deadlock scenario which syzbot is reporting is a race between > kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() > with port->lock held and printk() from __build_all_zonelists() with > zonelist_update_seq held. > > CPU0 CPU1 > ---- ---- > pty_write() { > tty_insert_flip_string_and_push_buffer() { > __build_all_zonelists() { > write_seqlock(&zonelist_update_seq); > build_zonelists() { > printk() { > vprintk() { > vprintk_default() { > vprintk_emit() { > console_unlock() { > console_flush_all() { > console_emit_next_record() { > con->write() = serial8250_console_write() { > spin_lock_irqsave(&port->lock, flags); > tty_insert_flip_string() { > tty_insert_flip_string_fixed_flag() { > __tty_buffer_request_room() { > tty_buffer_alloc() { > kmalloc(GFP_ATOMIC | __GFP_NOWARN) { > __alloc_pages_slowpath() { > zonelist_iter_begin() { > read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd > spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held > } > } > } > } > } > } > } > } > spin_unlock_irqrestore(&port->lock, flags); > // message is printed to console > spin_unlock_irqrestore(&port->lock, flags); > } > } > } > } > } > } > } > } > } > write_sequnlock(&zonelist_update_seq); > } > } > } > > This deadlock scenario can be eliminated by > > preventing interrupt context from calling kmalloc(GFP_ATOMIC) > > and > > preventing printk() from calling console_flush_all() > > while zonelist_update_seq.seqcount is odd. > > Since Petr Mladek thinks that __build_all_zonelists() can become a > candidate for deferring printk() [2], let's address this problem by > > disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) > > and > > disabling synchronous printk() in order to avoid console_flush_all() > > . > > As a side effect of minimizing duration of zonelist_update_seq.seqcount > being odd by disabling synchronous printk(), latency at > read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and > __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from > lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. > do not record unnecessary locking dependency) from interrupt context is > still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside > write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) > section... I have really hard time to wrap my head around this changelog. I would rephrase as follows. The syzbot has noticed the following deadlock scenario[1] CPU0 CPU1 pty_write() { tty_insert_flip_string_and_push_buffer() { __build_all_zonelists() { write_seqlock(&zonelist_update_seq); (A) build_zonelists() { printk() { vprintk() { vprintk_default() { vprintk_emit() { console_unlock() { console_flush_all() { console_emit_next_record() { con->write() = serial8250_console_write() { spin_lock_irqsave(&port->lock, flags); (B) spin_lock_irqsave(&port->lock, flags); <<< spinning on (B) tty_insert_flip_string() { tty_insert_flip_string_fixed_flag() { __tty_buffer_request_room() { tty_buffer_alloc() { kmalloc(GFP_ATOMIC | __GFP_NOWARN) { __alloc_pages_slowpath() { zonelist_iter_begin() { read_seqbegin(&zonelist_update_seq); <<< spinning on (A) This can happen during memory hotplug operation. This means that write_seqlock on zonelist_update_seq is not allowed to call into synchronous printk code path. This can be avoided by using a deferred printk context. This is not the only problematic scenario though. Another one would be __build_all_zonelists() { write_seqlock(&zonelist_update_seq); <<< (A) kmalloc(GFP_ATOMIC) { __alloc_pages_slowpath() { read_seqbegin(&zonelist_update_seq) <<< spinning on (A) Allocations from (soft)IRQ contexts are quite common. This can be avoided by disabling interrupts for this path so we won't self livelock. > Reported-by: syzbot > Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] > Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") > Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] > Signed-off-by: Tetsuo Handa > Cc: Michal Hocko > Cc: Petr Mladek Anyway the patch is correct Acked-by: Michal Hocko > --- > Changes in v2: > Update patch description and comment. > > mm/page_alloc.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 7136c36c5d01..e8b4f294d763 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data) > int nid; > int __maybe_unused cpu; > pg_data_t *self = data; > + unsigned long flags; > > + /* > + * Explicitly disable this CPU's interrupts before taking seqlock > + * to prevent any IRQ handler from calling into the page allocator > + * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock. > + */ > + local_irq_save(flags); > + /* > + * Explicitly disable this CPU's synchronous printk() before taking > + * seqlock to prevent any printk() from trying to hold port->lock, for > + * tty_insert_flip_string_and_push_buffer() on other CPU might be > + * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. > + */ > + printk_deferred_enter(); > write_seqlock(&zonelist_update_seq); > > #ifdef CONFIG_NUMA > @@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data) > } > > write_sequnlock(&zonelist_update_seq); > + printk_deferred_exit(); > + local_irq_restore(flags); > } > > static noinline void __init > -- > 2.34.1 -- Michal Hocko SUSE Labs