From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBD82D64072 for ; Fri, 8 Nov 2024 17:33:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 510C06B009C; Fri, 8 Nov 2024 12:33:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4C0886B009B; Fri, 8 Nov 2024 12:33:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 388E76B009E; Fri, 8 Nov 2024 12:33:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 1BD396B009B for ; Fri, 8 Nov 2024 12:33:15 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 977CAA0134 for ; Fri, 8 Nov 2024 17:33:14 +0000 (UTC) X-FDA: 82763622018.01.1A17471 Received: from nyc.source.kernel.org (nyc.source.kernel.org [147.75.193.91]) by imf28.hostedemail.com (Postfix) with ESMTP id 47804C0016 for ; Fri, 8 Nov 2024 17:32:35 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=SAAEV9jv; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf28.hostedemail.com: domain of sj@kernel.org designates 147.75.193.91 as permitted sender) smtp.mailfrom=sj@kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731087132; a=rsa-sha256; cv=none; b=YjL/AXVEMG4gM6/Pb2anP488oT68l+8S/YylM45WjSukBG0G7XQg+V7jUgmMA5I97ooekd td85Uudp2vei11o2gHMLh9HPb9SKRGSsSC0vScpBvTTY2TnONJwmJ7GOCtN2/IPJ9nuBPv 58vcuvwaDo+0+EkFXD2uAPKs1Vct/QA= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=SAAEV9jv; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf28.hostedemail.com: domain of sj@kernel.org designates 147.75.193.91 as permitted sender) smtp.mailfrom=sj@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731087132; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WaWHZJ9TQwtt8oc7jEuQTSDE1UsXu8EpJX8JPG1hkbU=; b=t69croDy0T/NBcfOneAgIP417Pbb8zLsBa3uKLQnaOQUK2qQl7AxyyzIR8aW1SBhvbOemh 53hR0H40upDQ/fkAPq1+7AXDN2xHQRZ7HjDTlw87tySUXSgdqCFv5MzvszA7Hr/q/dbsmm lcsaNJh1Ud4lwTGXoSxeA/nZ+oWWZbU= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by nyc.source.kernel.org (Postfix) with ESMTP id 62352A444D7; Fri, 8 Nov 2024 17:31:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 931FFC4CECD; Fri, 8 Nov 2024 17:33:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1731087191; bh=DKAomHDMSV3ChJFG0xijXuKMqsNJv7vwnmfJy9ep5Bc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SAAEV9jv0Ztb4YUnjOOXZgd6yMg8nk/mzDlTItNojR/Qd/wUpmwQNSnDEOHzpMQoU kQiC8qOt/Fb/vqqQfcYH0WOQ+9UvshWnDH9z8Wy6tjPs9LOfWv+imazG/lFE5bRPwJ CRhugCfGYEA6ReHmWYBajBt6ggEbuPXjIqtI5dBJLDd8b8Yboo1VdxLlKEVtOCnoNE hXxraskFWgXJvjtQdsdNIKrwIt7p5SV21soT/6OgTrLQ5/zeCtQWow8zrYvpZhGq6d r6HdQ1GQ+UaoQGpCw+t30T1Rup3XqS2CtfcLZDSga/BXKYqQMARR8hlPqzyxySRz9+ w8OLwepgJs4lA== From: SeongJae Park To: Joanne Koong Cc: SeongJae Park , miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com, David Hildenbrand Subject: Re: [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails Date: Fri, 8 Nov 2024 09:33:09 -0800 Message-Id: <20241108173309.71619-1-sj@kernel.org> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241107235614.3637221-5-joannelkoong@gmail.com> References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Queue-Id: 47804C0016 X-Rspamd-Server: rspam01 X-Stat-Signature: fd8n55791397x46m3ri3i9nmj749xn6h X-HE-Tag: 1731087155-799249 X-HE-Meta: U2FsdGVkX1/FuGErs91/p9uj3LyhRd0uLTKvbLFwv0DlVXyMoaPigTqATN55QPzxlos8N+iK+ibDNbG8TcAgh5KLjGhihGrcxMsGft6cjXp8SakJ8kTkGrxEEOvQO1DR8qF/Cx9+LBage5V0nZ08TUrCxNKp5Hd6RBmnQ6Rx/gX+EmvZxFWIaAgs3j4g3i3im8uoOmKw5nib1y6zOOaor51YHgcbq7sD1M8Q4bRUksMe+k6iMHqK9lfoDS3Hor4vKZobCxV26dtwHey+YLxLv7+lHl6igjv2ezqAEUDBkxRF8wmJ/JJBoX9LK2p6IJX5X9rprdc+QgRmkA83tlyMru5Lb4GAajBiapVgW0Wzuq4j8bC8VKwH2zPfmP0EgzVLuaXSEaTiZ6+452jVpcDtsTvE5Ku5sNcsmWVPyt7McUK4TC8Ph36ZUXf77i1J3KF7ghUH3bMHeG3hOmoW3KDvSTczLAE1tfQdQX0ap2AjXWA1K2LLvvupgzi1fON5zSldry2wiNaR2P+eREEDPoA52vI6Hxnozjqen9Ic7AjJlGiM1efSiZfGS+M7XglzJTyzAUbCawc0Hm9DJwx+0fgCDLhSJCpS0vsJjNRotQXPhSv+RMkXqIqUvCh+NN1ozs8a86XOuspzEfsK+dZDI16mVwYvxIY76WHHGZaUSEGLPnRC1ZdqqdipBc5QPkEPNhyi3NAiiiOXLeNs+6uWnsj9wdKO1Jh/5gQkXCEibIQkt8ZjnjSX4gWnW16FPSFNuG6G2QDTO3X54YUEikas3e3LgRaCQfB+Wm/8qrS6F9TCq6Uif6BohcuaeNt4JcknbWWB6CiSKuOZeIIGlayCo9M5JNqS0L06/fAjSKlMBe7k3cwYQbG5ImII7/i8HCTRJfCIeL8dUNgIwlv+MyA0d8N314LNZYp1oXifGzqjaN/0HLHBpHjLvkvt7i9/kMicTmdjCsp6SVMAqRpUqsOEmq1 OjL2DpYk WPzqHtrKgHx+VbpKZlF6BI7j+JGYaaMbrzXPXTOGS5EjN/bZkSGe0Kev6aB40UU8hcMZB0rmg5ojQsi0r5H7FpbPQgKprRQZsddoc4eb+Tlxf6QMgPaxGjv6io7ugZ++SkMlMcnRK0cLtUnC80CT4KBZQ5oExdoT4CJO5DROjTo2NDdeWjAmW4vkhbSm/UlWmFDSWot0sIFkCKXkFkJZby//u0nxpaOdEIH6uA8S+IdoNjGCvGEzy9fjbUqB16U4QjQvGC61IBGx/w7U7BU+5brc2yw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: + David Hildenbrand On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong wrote: > In offline_pages(), do_migrate_range() may potentially retry forever if > the migration fails. Add a return value for do_migrate_range(), and > allow offline_page() to try migrating pages 5 times before erroring > out, similar to how migration failures in __alloc_contig_migrate_range() > is handled. I'm curious if this could cause unexpected behavioral differences to memory hotplugging users, and how '5' is chosen. Could you please enlighten me? > > Signed-off-by: Joanne Koong > --- > mm/memory_hotplug.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 621ae1015106..49402442ea3b 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long start, unsigned long end, > return 0; > } > > -static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > +static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) Seems the return value is used for only knowing if it is failed or not. If there is no plan to use the error code in future, what about using bool return type? > { > struct folio *folio; > unsigned long pfn; > LIST_HEAD(source); > static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > + int ret = 0; > > for (pfn = start_pfn; pfn < end_pfn; pfn++) { > struct page *page; > @@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL, > .reason = MR_MEMORY_HOTPLUG, > }; > - int ret; > > /* > * We have checked that migration range is on a single zone so > @@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > putback_movable_pages(&source); > } > } > + return ret; > } > > static int __init cmdline_parse_movable_node(char *p) > @@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, > const int node = zone_to_nid(zone); > unsigned long flags; > struct memory_notify arg; > + unsigned int tries = 0; > char *reason; > int ret; > > @@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, > > ret = scan_movable_pages(pfn, end_pfn, &pfn); > if (!ret) { > - /* > - * TODO: fatal migration failures should bail > - * out > - */ > - do_migrate_range(pfn, end_pfn); > + if (do_migrate_range(pfn, end_pfn) && ++tries == 5) > + ret = -EBUSY; > } In the '++tries == 5' case, users will show the failure reason as "unmovable page" from the debug log. What about setting 'reason' here to be more specific, e.g., "multiple migration failures"? Also, my humble understanding of the intention of this change is as follow. If there are 'AS_WRITEBACK_MAY_BLOCK' pages in the migration target range, do_migrate_range() will continuously fail. And hence this could become infinite loop. Pleae let me know if I'm misunderstanding. But if I'm not wrong... There is a check for expected failures above (scan_movable_pages()). What about adding 'AS_WRITEBACK_MAY_BLOCK' pages existence check there? > } while (!ret); > > -- > 2.43.5 Thanks, SJ