From: "Huang, Ying"
To: Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zi Yan, Andrew Morton, "Matthew Wilcox (Oracle)", David Hildenbrand, Baolin Wang
Subject: Re: [PATCH v2] mm/migrate: correct nr_failed in migrate_pages_sync()
In-Reply-To: <20231016154156.1948815-1-zi.yan@sent.com> (Zi Yan's message of "Mon, 16 Oct 2023 11:41:56 -0400")
References: <20231016154156.1948815-1-zi.yan@sent.com>
Date: Tue, 17 Oct 2023 12:56:12 +0800
Message-ID: <871qdtdez7.fsf@yhuang6-desk2.ccr.corp.intel.com>

Zi Yan writes:

> From: Zi Yan
>
> nr_failed was missing the large folio splits from migrate_pages_batch()
> and can cause a mismatch between migrate_pages() return value and the
> number of not migrated pages, i.e., when the return value of
> migrate_pages() is 0, there are still pages left in the from page list.
> It will happen when a non-PMD THP large folio fails to migrate due to
> -ENOMEM and is split successfully but not all the split pages are not
> migrated, migrate_pages_batch() would return non-zero, but
> astats.nr_thp_split = 0. nr_failed would be 0 and returned to the caller
> of migrate_pages(), but the not migrated pages are left in the from page
> list without being added back to LRU lists.
>
> Fix it by adding a new nr_split counter for large folio splits and adding
> it to nr_failed in migrate_page_sync() after migrate_pages_batch() is
> done.
>
> Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
> Signed-off-by: Zi Yan
> ---
>  include/trace/events/migrate.h | 24 ++++++++++++++----------
>  mm/migrate.c                   | 15 +++++++++++----
>  2 files changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 061b5128f335..0190ef725b43 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -49,10 +49,11 @@ TRACE_EVENT(mm_migrate_pages,
>
>  	TP_PROTO(unsigned long succeeded, unsigned long failed,
>  		 unsigned long thp_succeeded, unsigned long thp_failed,
> -		 unsigned long thp_split, enum migrate_mode mode, int reason),
> +		 unsigned long thp_split, unsigned long large_folio_split,
> +		 enum migrate_mode mode, int reason),
>
>  	TP_ARGS(succeeded, failed, thp_succeeded, thp_failed,
> -		thp_split, mode, reason),
> +		thp_split, large_folio_split, mode, reason),
>
>  	TP_STRUCT__entry(
>  		__field(	unsigned long,		succeeded)
> @@ -60,26 +61,29 @@ TRACE_EVENT(mm_migrate_pages,
>  		__field(	unsigned long,		thp_succeeded)
>  		__field(	unsigned long,		thp_failed)
>  		__field(	unsigned long,		thp_split)
> +		__field(	unsigned long,		large_folio_split)
>  		__field(	enum migrate_mode,	mode)
>  		__field(	int,			reason)
>  	),
>
>  	TP_fast_assign(
> -		__entry->succeeded = succeeded;
> -		__entry->failed = failed;
> -		__entry->thp_succeeded = thp_succeeded;
> -		__entry->thp_failed = thp_failed;
> -		__entry->thp_split = thp_split;
> -		__entry->mode = mode;
> -		__entry->reason = reason;
> +		__entry->succeeded = succeeded;
> +		__entry->failed = failed;
> +		__entry->thp_succeeded = thp_succeeded;
> +		__entry->thp_failed = thp_failed;
> +		__entry->thp_split = thp_split;
> +		__entry->large_folio_split = large_folio_split;
> +		__entry->mode = mode;
> +		__entry->reason = reason;
>  	),
>
> -	TP_printk("nr_succeeded=%lu nr_failed=%lu nr_thp_succeeded=%lu nr_thp_failed=%lu nr_thp_split=%lu mode=%s reason=%s",
> +	TP_printk("nr_succeeded=%lu nr_failed=%lu nr_thp_succeeded=%lu nr_thp_failed=%lu nr_thp_split=%lu nr_split=%lu mode=%s reason=%s",
>  		__entry->succeeded,
>  		__entry->failed,
>  		__entry->thp_succeeded,
>  		__entry->thp_failed,
>  		__entry->thp_split,
> +		__entry->large_folio_split,
>  		__print_symbolic(__entry->mode, MIGRATE_MODE),
>  		__print_symbolic(__entry->reason, MIGRATE_REASON))
>  );

I suggest splitting the trace event change into a separate patch, because
this patch will be backported to earlier stable kernels. It is more
convenient for users if the trace event stays unchanged in stable kernels,
and the trace event change isn't part of the fix.

> diff --git a/mm/migrate.c b/mm/migrate.c
> index c602bf6dec97..4caf405b6504 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1495,6 +1495,7 @@ struct migrate_pages_stats {
>  	int nr_thp_succeeded;	/* THP migrated successfully */
>  	int nr_thp_failed;	/* THP failed to be migrated */
>  	int nr_thp_split;	/* THP split before migrating */
> +	int nr_split;	/* Large folio (include THP) split before migrating */
>  };
>
>  /*
> @@ -1614,6 +1615,7 @@ static int migrate_pages_batch(struct list_head *from,
>  	int nr_retry_pages = 0;
>  	int pass = 0;
>  	bool is_thp = false;
> +	bool is_large = false;
>  	struct folio *folio, *folio2, *dst = NULL, *dst2;
>  	int rc, rc_saved = 0, nr_pages;
>  	LIST_HEAD(unmap_folios);
> @@ -1629,7 +1631,8 @@ static int migrate_pages_batch(struct list_head *from,
>  		nr_retry_pages = 0;
>
>  		list_for_each_entry_safe(folio, folio2, from, lru) {
> -			is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
> +			is_large = folio_test_large(folio);
> +			is_thp = is_large && folio_test_pmd_mappable(folio);
>  			nr_pages = folio_nr_pages(folio);
>
>  			cond_resched();
> @@ -1649,6 +1652,7 @@ static int migrate_pages_batch(struct list_head *from,
>  				stats->nr_thp_failed++;
>  				if (!try_split_folio(folio, split_folios)) {
>  					stats->nr_thp_split++;
> +					stats->nr_split++;
>  					continue;
>  				}
>  				stats->nr_failed_pages += nr_pages;
> @@ -1677,11 +1681,12 @@ static int migrate_pages_batch(struct list_head *from,
>  				nr_failed++;
>  				stats->nr_thp_failed += is_thp;
>  				/* Large folio NUMA faulting doesn't split to retry. */
> -				if (folio_test_large(folio) && !nosplit) {
> +				if (is_large && !nosplit) {

It appears that there's only one user of "is_large"? If so, it seems
unnecessary to introduce another variable. But this isn't a big issue.

>  					int ret = try_split_folio(folio, split_folios);
>
>  					if (!ret) {
>  						stats->nr_thp_split += is_thp;
> +						stats->nr_split++;
>  						break;
>  					} else if (reason == MR_LONGTERM_PIN &&
>  						   ret == -EAGAIN) {
> @@ -1827,6 +1832,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
>  	stats->nr_succeeded += astats.nr_succeeded;
>  	stats->nr_thp_succeeded += astats.nr_thp_succeeded;
>  	stats->nr_thp_split += astats.nr_thp_split;
> +	stats->nr_split += astats.nr_split;
>  	if (rc < 0) {
>  		stats->nr_failed_pages += astats.nr_failed_pages;
>  		stats->nr_thp_failed += astats.nr_thp_failed;
> @@ -1834,7 +1840,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
>  		return rc;
>  	}
>  	stats->nr_thp_failed += astats.nr_thp_split;
> -	nr_failed += astats.nr_thp_split;
> +	nr_failed += astats.nr_split + astats.nr_thp_split;

THP splits are counted in both .nr_thp_split and .nr_split, so adding both
here counts every THP split twice. We should add only .nr_split here.

>  	/*
>  	 * Fall back to migrate all failed folios one by one synchronously. All
>  	 * failed folios except split THPs will be retried, so their failure
> @@ -1969,7 +1975,8 @@ int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
>  	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
>  	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
>  			       stats.nr_thp_succeeded, stats.nr_thp_failed,
> -			       stats.nr_thp_split, mode, reason);
> +			       stats.nr_thp_split, stats.nr_split, mode,
> +			       reason);
>
>  	if (ret_succeeded)
>  		*ret_succeeded = stats.nr_succeeded;

--
Best Regards,
Huang, Ying