From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF320D64098 for ; Fri, 8 Nov 2024 21:59:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 36618900007; Fri, 8 Nov 2024 16:59:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2EDAE8D0002; Fri, 8 Nov 2024 16:59:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 141B0900007; Fri, 8 Nov 2024 16:59:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id E52E08D0002 for ; Fri, 8 Nov 2024 16:59:41 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 94AFBA0FA7 for ; Fri, 8 Nov 2024 21:59:41 +0000 (UTC) X-FDA: 82764294480.05.7024CD9 Received: from mail-qt1-f169.google.com (mail-qt1-f169.google.com [209.85.160.169]) by imf13.hostedemail.com (Postfix) with ESMTP id 5B89A20022 for ; Fri, 8 Nov 2024 21:59:01 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=FrckxHAb; spf=pass (imf13.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.169 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731102991; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2EmE/xkKMxXBW1hJZW1gknUNBtQNSTs3M2bOiwkKXiE=; b=edkh2Db+ly/EiCpG/Re5hW6bb4SJnMrAwecxpuCcN8MqNQFnYXYbX4ZCFY6vhRLl01dTHw DCagl22oxUBV0NXRkwXsrn3rtmI9Ibyf0c4JQCW16/wKpRzZ0B1OTsMnNTSZzq4tdvnAbp jMOuAXWjQxOroSx2qwbFbBzLCWjFpa8= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=FrckxHAb; spf=pass (imf13.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.169 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731102991; a=rsa-sha256; cv=none; b=0Fwmgiw5zrGLcoqMe+fsTxfHPHjKnnj+s/WLT78QD/F4DqzZuA+CMJq1awhTqtNlhCA1IJ hvG4gbcGF7lBjpjqgcs/WDpBmpeFGwg1MQuDzM1bfKBqj/bBZ1SLU55uzP9AE9GFWVQ93Y robb2NvkfE4DcNCRTEF5fY5yAMYrEqw= Received: by mail-qt1-f169.google.com with SMTP id d75a77b69052e-460ad98b043so23097311cf.1 for ; Fri, 08 Nov 2024 13:59:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731103179; x=1731707979; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=2EmE/xkKMxXBW1hJZW1gknUNBtQNSTs3M2bOiwkKXiE=; b=FrckxHAbnfIhBTarPDngT5c3OpMjd/rztZybXfORoH2W527pVtm8OZGyogvzdFJno4 SIJHR7rV54lUUJuX6JG+wzLA5IusdITF7JgKbF57ethRcBbXAgWgGL7KHyTQMgpaxv9Z cGTz0GwDAO+MJGsrg1yrrx8p3wX1wts/gOvmYpLHg95uGLMdmcZ0P7t+fKPGioYQIvlh hf90gF2q5+G4eJoIHKLhGYroOgSrtUGTyaHOxHxHS0ejGjtqWQtTykBACSxUzY5BpHBD C72+/ZxYewmMcHKTw/w+GQqg3wudoyG7HyEIjK6i0ryQ7uzxK9Ho6XhgrmuXueo/JKA0 VR4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731103179; x=1731707979; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2EmE/xkKMxXBW1hJZW1gknUNBtQNSTs3M2bOiwkKXiE=; b=jvj2eBsX3Z9Nf0oaAbewTxqJ4Lmt4cTD40RptjURxmKklkL6o11UG00Zq51nj+/+G4 GcLhvj7eooNiKhETkY+1eezenYw3Ioh4DyvTZ56vs0UpDnDdIhYvsRbI/FjZCmHpjOhs tyXKLbCpJHB6+FWywTj2qMhlySspc/tBTht7o+g9WsioyRsmactYKLm84uehybbgOWXT 0W2iV9SOPaYieQgHtJh4/Mh2UzKwPRJGE+7ur2fFvLqW+TipyQhFgPeML6DNX3nYBGJr PTVtJSC28WFPUOyUb0nhGfrGl0BoA5ovmPAwl9VpTA+q9C8ew3tUqQsLH3xW3yLjgzq5 2pmQ== X-Forwarded-Encrypted: i=1; AJvYcCWySWodMC7fek47Pxl7aIYWt+ZNVv/SnDkoAtQlpFUyVjQuzE+WQjWgX4oCbNRSMaaanYRnAInyHg==@kvack.org X-Gm-Message-State: AOJu0Ywnf0k0HThS+mnQ4nc7xEJKc0cnuESM40R/0Qv/NuF+Jzttt1Vr 69DsXW/dUBRLRPgo4xVQhWMRMvx9SsXuE9OVD5zDmuMf4q3GANIXvRPkR+CYipZALX+YR3RjTn4 AQpoXiJDxlZhV/YEgTXOLy22bJvg= X-Google-Smtp-Source: AGHT+IEaVgqOaiDInJrP3kyBLWVE6eHkRkjvc+Mp0Kw0OiaUGY2GRb4AfWE8fZEb1u/d2Ey1zFuwCi+nzKiVJCoc0Ag= X-Received: by 2002:ac8:7f44:0:b0:461:7558:892f with SMTP id d75a77b69052e-463085f5466mr78904531cf.15.1731103178783; Fri, 08 Nov 2024 13:59:38 -0800 (PST) MIME-Version: 1.0 References: <20241107235614.3637221-5-joannelkoong@gmail.com> <20241108173309.71619-1-sj@kernel.org> In-Reply-To: <20241108173309.71619-1-sj@kernel.org> From: Joanne Koong Date: Fri, 8 Nov 2024 13:59:27 -0800 Message-ID: Subject: Re: [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails To: SeongJae Park Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com, David Hildenbrand Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 5B89A20022 X-Stat-Signature: oq1jd6ah35k9pohkwdjd81ijterthf8i X-Rspam-User: X-HE-Tag: 1731103141-690842 X-HE-Meta: U2FsdGVkX1+HY6FGv1j7wWnOIAcY80fGn/qj301NcmWhHMpknjlELuhpXLo0SNipZ4+bBOFj0AosyChTSgL2uFXsuByGKmoSIvrUP6DufHEILdCc2FEXZ6kLgxuqP/EZx4OqzpQqZG37bcXSC4oXyWw1mWg9nHol72iHicYz/gGTHIfZbK+6pk3qviBchjUbcVbM7UaboAfZtrI17VsBVmbR84ON7MM0Tr+toPeORFTKcDhAgrHfhOsfoVvoFXQfg1Mm+nCGKkyC19D7s18pKNKb9gqoexJL+nj+ZNysbT3gGJViFkeZXTY02LHNicqgGFljMGNXGkCKbI+8WfZflViM1ZUJCXBhvbVxhQeCAGjhtjNsfqDfeOJkxGwxJHKrP4wvK2q01NJgR5PMm9PXKq8bw3RU5+gDezMRbjUE/d6EoNvkIqgYmkRju5Ri5ELDRv3g5q5lM1lsaaxQN3ZnRVwiCEFBsgZn1oPB7QqW+a4SafC8xACyy0TvqExt1rcSIbqP9vML1gKCx4dqKtxHdLFcil31IULd67aoetoSR2dVn+zbgTbv5NhbZsmGE++yIOA5u+PNQoTuntWU7ctYvx2AU3ucSDYZBiOs23P43b4h1ZsblvUkAEl5baKjHoLNSdH3W9vc+eE0do/Df9JNcwjP17GuNRj2fEAGGp7UEMAfKgyB9eJj5+SVffxay0gFw2cr0ohe3/XjX47CoXwi07aziPTUwSx15oKo5MuD0yHh+jPHTSVWziDpEjgHqH4a1lFJauHaK2/FZCYVQhWcbEYWriz9ggSWAEvDEL4X325qSrQV4g5CYLwsC+U5coQ81YuUjLhvYVkNEtyh3eqa9bb4c5mqKE7mo6rBBGK75UF9cX412UOdmSWFTt6dcEIjhAvRuucpT0sch9EtXHAN40RX/85mduBHizFdnT9CqHKRkA3AYcWnDwE/P0ZJSuHL0xOL0xwkEhFWGFkwyhW EHPNPkFE qDNVDqlkEZWKCrY63yxBuVsozh+nqzss2fBFdc7xGLIz7D6oWV4P5zxAFFwd7GOOmdipw8JUtPRLx4stM33rI6OQ0HnS8PEGuhmDg4jL4HqxkwhZuSeLzaYwQl3ROYXex2IyQJL+zODTEN25Wg8qT1Qw/c59dQRP7Oye68sf/UYsdi+6HNF7g9ZUn9hqjiHaBfDtXde88BAUf2BO2ISM+OCIRoczFg6rDsqc3gY6wCZmipR+czXDzXMJEAUxvxQ09hl2SvtMr4f8TimB9LxF1Od1Qj007hRTFO8PODFM3uMBtGXASIkDF5rX1TY2LCZVzc7GTF7FzBJs+reCmqvHV10zcBFf4dcgyIfgG0VVinA8UFfHlm7qX4gLxqccsObUBTib+htwt9/O5b6qT/TejeXLtWtBDLgTBSd+1 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Nov 8, 2024 at 9:33=E2=80=AFAM SeongJae Park wrote: > > + David Hildenbrand > > On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong w= rote: > > > In offline_pages(), do_migrate_range() may potentially retry forever if > > the migration fails. Add a return value for do_migrate_range(), and > > allow offline_page() to try migrating pages 5 times before erroring > > out, similar to how migration failures in __alloc_contig_migrate_range(= ) > > is handled. > Hi SeongJae, Thanks for taking a look. I'm going to drop this patch per the conversation with David and Shakeel below, but wanted to reply back to some of the questions here for completion's sake. > I'm curious if this could cause unexpected behavioral differences to memo= ry > hotplugging users, and how '5' is chosen. Could you please enlighten me? > Most of this logic was copied from __alloc_contig_migrate_range() - in that function, '5' is hard-coded as the number of times to retry for migration failures. No other reason for '5' other than that. > > > > Signed-off-by: Joanne Koong > > --- > > mm/memory_hotplug.c | 13 ++++++------- > > 1 file changed, 6 insertions(+), 7 deletions(-) > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 621ae1015106..49402442ea3b 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long sta= rt, unsigned long end, > > return 0; > > } > > > > -static void do_migrate_range(unsigned long start_pfn, unsigned long en= d_pfn) > > +static int do_migrate_range(unsigned long start_pfn, unsigned long end= _pfn) > > Seems the return value is used for only knowing if it is failed or not. = If > there is no plan to use the error code in future, what about using bool r= eturn > type? > > > { > > struct folio *folio; > > unsigned long pfn; > > LIST_HEAD(source); > > static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTER= VAL, > > DEFAULT_RATELIMIT_BURST); > > + int ret =3D 0; > > > > for (pfn =3D start_pfn; pfn < end_pfn; pfn++) { > > struct page *page; > > @@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_= pfn, unsigned long end_pfn) > > .gfp_mask =3D GFP_USER | __GFP_MOVABLE | __GFP_RE= TRY_MAYFAIL, > > .reason =3D MR_MEMORY_HOTPLUG, > > }; > > - int ret; > > > > /* > > * We have checked that migration range is on a single zo= ne so > > @@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_= pfn, unsigned long end_pfn) > > putback_movable_pages(&source); > > } > > } > > + return ret; > > } > > > > static int __init cmdline_parse_movable_node(char *p) > > @@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsign= ed long nr_pages, > > const int node =3D zone_to_nid(zone); > > unsigned long flags; > > struct memory_notify arg; > > + unsigned int tries =3D 0; > > char *reason; > > int ret; > > > > @@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsig= ned long nr_pages, > > > > ret =3D scan_movable_pages(pfn, end_pfn, &pfn); > > if (!ret) { > > - /* > > - * TODO: fatal migration failures should = bail > > - * out > > - */ > > - do_migrate_range(pfn, end_pfn); > > + if (do_migrate_range(pfn, end_pfn) && ++t= ries =3D=3D 5) > > + ret =3D -EBUSY; > > } > > In the '++tries =3D=3D 5' case, users will show the failure reason as "un= movable > page" from the debug log. What about setting 'reason' here to be more > specific, e.g., "multiple migration failures"? > > Also, my humble understanding of the intention of this change is as follo= w. If > there are 'AS_WRITEBACK_MAY_BLOCK' pages in the migration target range, > do_migrate_range() will continuously fail. And hence this could become > infinite loop. Pleae let me know if I'm misunderstanding. > > But if I'm not wrong... There is a check for expected failures above > (scan_movable_pages()). What about adding 'AS_WRITEBACK_MAY_BLOCK' pages > existence check there? The main difference between adding migrate_pages() retries (this patch) vs adding an 'AS_WRITEBACK_MAY_BLOCK' check in scan_movable_pages() is that in the latter, all pages in an 'AS_WRITEBACK_MAY_BLOCK' mapping will be skipped for migration whereas in the former, only pages under writeback will be skipped. I think the latter is probably fine too for this case but the former seemed a bit more optimal to me. Thanks, Joanne > > > } while (!ret); > > > > -- > > 2.43.5 > > > Thanks, > SJ