From: "Huang, Ying"
To: Zi Yan
Cc: Baolin Wang , , , , , , , Fengwei Yin
Subject: Re: [PATCH] mm: migrate: record the mlocked page status to remove unnecessary lru drain
In-Reply-To: <1F80D8DA-8BB5-4C7E-BC2F-030BF52931F7@nvidia.com> (Zi Yan's message of "Wed, 18 Oct 2023 10:00:52 -0400")
References: <64899ad0bb78cde88b52abed1a5a5abbc9919998.1697632761.git.baolin.wang@linux.alibaba.com> <1F80D8DA-8BB5-4C7E-BC2F-030BF52931F7@nvidia.com>
Date: Thu, 19 Oct 2023 14:09:18 +0800
Message-ID: <87il73uos1.fsf@yhuang6-desk2.ccr.corp.intel.com>

Zi Yan writes:

> On 18 Oct 2023, at 9:04, Baolin Wang wrote:
>
>> When doing compaction, I found that lru_add_drain() is an obvious hotspot
>> when migrating pages. The distribution of this hotspot is as follows:
>>    - 18.75% compact_zone
>>       - 17.39% migrate_pages
>>          - 13.79% migrate_pages_batch
>>             - 11.66% migrate_folio_move
>>                - 7.02% lru_add_drain
>>                   + 7.02% lru_add_drain_cpu
>>                + 3.00% move_to_new_folio
>>                  1.23% rmap_walk
>>             + 1.92% migrate_folio_unmap
>>          + 3.20% migrate_pages_sync
>>       + 0.90% isolate_migratepages
>>
>> The lru_add_drain() was added by commit c3096e6782b7 ("mm/migrate:
>> __unmap_and_move() push good newpage to LRU") to drain the newpage to
>> the LRU immediately, to help build up the correct newpage->mlock_count
>> in remove_migration_ptes() for mlocked pages. However, if no mlocked
>> pages are being migrated, we can avoid this lru drain operation,
>> especially in heavy concurrent scenarios.
>
> lru_add_drain() is also used to drain pages out of folio_batch. Pages in
> folio_batch have an additional pin to prevent migration. See
> folio_get(folio); in folio_add_lru().

lru_add_drain() is called after the page reference count check in
move_to_new_folio().  So, I don't think this is an issue.
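To make the ordering concrete, here is a toy userspace model of that
argument (an illustration only; the toy_* names are local stubs that
merely mirror kernel names, not the kernel source). The migration core
bails out with -EAGAIN on an unexpected reference count before the new
folio is ever added to the LRU, so an extra folio_batch pin fails the
migration early instead of racing with the drain:

#include <errno.h>
#include <stdio.h>

#define MIGRATEPAGE_SUCCESS	0

struct toy_folio {
	int refcount;			/* models folio_ref_count() */
};

/* Models the check reached via move_to_new_folio() ->
 * folio_migrate_mapping(): refuse migration if extra references exist. */
static int toy_move_to_new_folio(struct toy_folio *src, int expected)
{
	if (src->refcount != expected)
		return -EAGAIN;		/* extra pin => caller retries */
	return MIGRATEPAGE_SUCCESS;
}

static int toy_migrate_folio_move(struct toy_folio *src, int expected)
{
	int rc = toy_move_to_new_folio(src, expected);

	if (rc != MIGRATEPAGE_SUCCESS)
		return rc;		/* LRU handling below never runs */

	/* Only reached after the refcount check has already passed. */
	printf("folio_add_lru(dst); lru_add_drain();\n");
	return MIGRATEPAGE_SUCCESS;
}

int main(void)
{
	struct toy_folio pinned = { .refcount = 2 };	/* folio_batch pin */
	struct toy_folio plain = { .refcount = 1 };

	printf("pinned: rc = %d\n", toy_migrate_folio_move(&pinned, 1));
	printf("plain:  rc = %d\n", toy_migrate_folio_move(&plain, 1));
	return 0;
}

With the extra pin, the move returns -EAGAIN before the LRU handling
runs; only the unpinned case reaches folio_add_lru()/lru_add_drain().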
>> So we can record the source pages' mlocked status in migrate_folio_unmap(),
>> and only drain the lru list when the mlocked status is set in
>> migrate_folio_move(). In addition, the page is already isolated from the
>> lru when migrating, so the mlocked status checked by folio_test_mlocked()
>> in migrate_folio_unmap() is stable.
>>
>> After this patch, I can see the hotspot of the lru_add_drain() is gone:
>>    - 9.41% migrate_pages_batch
>>       - 6.15% migrate_folio_move
>>          - 3.64% move_to_new_folio
>>             + 1.80% migrate_folio_extra
>>             + 1.70% buffer_migrate_folio
>>          + 1.41% rmap_walk
>>          + 0.62% folio_add_lru
>>       + 3.07% migrate_folio_unmap
>>
>> Meanwhile, the compaction latency shows some improvement when running
>> thpscale:
>>                                     base                   patched
>> Amean     fault-both-1      1131.22 (   0.00%)     1112.55 *   1.65%*
>> Amean     fault-both-3      2489.75 (   0.00%)     2324.15 *   6.65%*
>> Amean     fault-both-5      3257.37 (   0.00%)     3183.18 *   2.28%*
>> Amean     fault-both-7      4257.99 (   0.00%)     4079.04 *   4.20%*
>> Amean     fault-both-12     6614.02 (   0.00%)     6075.60 *   8.14%*
>> Amean     fault-both-18    10607.78 (   0.00%)     8978.86 *  15.36%*
>> Amean     fault-both-24    14911.65 (   0.00%)    11619.55 *  22.08%*
>> Amean     fault-both-30    14954.67 (   0.00%)    14925.66 *   0.19%*
>> Amean     fault-both-32    16654.87 (   0.00%)    15580.31 *   6.45%*
>>
>> Signed-off-by: Baolin Wang
>> ---
>>  mm/migrate.c | 50 ++++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 38 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 4caf405b6504..32c96f89710f 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1027,22 +1027,32 @@ union migration_ptr {
>>  	struct anon_vma *anon_vma;
>>  	struct address_space *mapping;
>>  };
>> +
>> +enum {
>> +	PAGE_WAS_MAPPED = 1 << 0,
>> +	PAGE_WAS_MLOCKED = 1 << 1,
>> +};
>> +
>>  static void __migrate_folio_record(struct folio *dst,
>> -				   unsigned long page_was_mapped,
>> +				   unsigned long page_flags,
>>  				   struct anon_vma *anon_vma)
>>  {
>>  	union migration_ptr ptr = { .anon_vma = anon_vma };
>>  	dst->mapping = ptr.mapping;
>> -	dst->private = (void *)page_was_mapped;
>> +	dst->private = (void *)page_flags;
>>  }
>>
>>  static void __migrate_folio_extract(struct folio *dst,
>>  				    int *page_was_mappedp,
>> +				    int *page_was_mlocked,
>>  				    struct anon_vma **anon_vmap)
>>  {
>>  	union migration_ptr ptr = { .mapping = dst->mapping };
>> +	unsigned long page_flags = (unsigned long)dst->private;
>> +
>>  	*anon_vmap = ptr.anon_vma;
>> -	*page_was_mappedp = (unsigned long)dst->private;
>> +	*page_was_mappedp = page_flags & PAGE_WAS_MAPPED ? 1 : 0;
>> +	*page_was_mlocked = page_flags & PAGE_WAS_MLOCKED ? 1 : 0;
>>  	dst->mapping = NULL;
>>  	dst->private = NULL;
>>  }
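As an aside, the trick these two helpers rely on can be shown in
isolation. Below is a small standalone userspace demo (illustrative
only; record()/extract() are local stand-ins for the kernel helpers):
because dst->private is a single pointer-sized slot, both booleans are
packed into it as bit flags on record and unpacked again on extract.

#include <assert.h>
#include <stdio.h>

enum {
	PAGE_WAS_MAPPED  = 1 << 0,
	PAGE_WAS_MLOCKED = 1 << 1,
};

/* Pack both flags into a pointer-sized slot, as the patch does with
 * dst->private. */
static void record(void **private, int was_mapped, int was_mlocked)
{
	unsigned long flags = 0;

	if (was_mapped)
		flags |= PAGE_WAS_MAPPED;
	if (was_mlocked)
		flags |= PAGE_WAS_MLOCKED;
	*private = (void *)flags;
}

/* Unpack the flags again, mirroring __migrate_folio_extract(). */
static void extract(void *private, int *was_mapped, int *was_mlocked)
{
	unsigned long flags = (unsigned long)private;

	*was_mapped = flags & PAGE_WAS_MAPPED ? 1 : 0;
	*was_mlocked = flags & PAGE_WAS_MLOCKED ? 1 : 0;
}

int main(void)
{
	void *private;
	int mapped, mlocked;

	record(&private, 1, 0);
	extract(private, &mapped, &mlocked);
	assert(mapped == 1 && mlocked == 0);
	printf("mapped=%d mlocked=%d\n", mapped, mlocked);
	return 0;
}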
>> @@ -1103,7 +1113,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>  {
>>  	struct folio *dst;
>>  	int rc = -EAGAIN;
>> -	int page_was_mapped = 0;
>> +	int page_was_mapped = 0, page_was_mlocked = 0;
>>  	struct anon_vma *anon_vma = NULL;
>>  	bool is_lru = !__folio_test_movable(src);
>>  	bool locked = false;
>> @@ -1157,6 +1167,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>  		folio_lock(src);
>>  	}
>>  	locked = true;
>> +	page_was_mlocked = folio_test_mlocked(src);
>>
>>  	if (folio_test_writeback(src)) {
>>  		/*
>> @@ -1206,7 +1217,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>  	dst_locked = true;
>>
>>  	if (unlikely(!is_lru)) {
>> -		__migrate_folio_record(dst, page_was_mapped, anon_vma);
>> +		__migrate_folio_record(dst, 0, anon_vma);
>>  		return MIGRATEPAGE_UNMAP;
>>  	}
>>
>> @@ -1236,7 +1247,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>  	}
>>
>>  	if (!folio_mapped(src)) {
>> -		__migrate_folio_record(dst, page_was_mapped, anon_vma);
>> +		unsigned int page_flags = 0;
>> +
>> +		if (page_was_mapped)
>> +			page_flags |= PAGE_WAS_MAPPED;
>> +		if (page_was_mlocked)
>> +			page_flags |= PAGE_WAS_MLOCKED;
>> +		__migrate_folio_record(dst, page_flags, anon_vma);
>>  		return MIGRATEPAGE_UNMAP;
>>  	}
>>
>> @@ -1261,12 +1278,13 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  			      struct list_head *ret)
>>  {
>>  	int rc;
>> -	int page_was_mapped = 0;
>> +	int page_was_mapped = 0, page_was_mlocked = 0;
>>  	struct anon_vma *anon_vma = NULL;
>>  	bool is_lru = !__folio_test_movable(src);
>>  	struct list_head *prev;
>>
>> -	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
>> +	__migrate_folio_extract(dst, &page_was_mapped,
>> +				&page_was_mlocked, &anon_vma);
>
> It is better to read out the flags, then check page_was_mapped and
> page_was_mlocked, to avoid future __migrate_folio_extract() interface
> churn.

IMHO, in contrast, it's better to use separate flags in
__migrate_folio_record() too, to avoid packing flags at each call site.
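To make the two options concrete, hypothetical prototypes for each
approach (illustration only; neither matches the code in the tree as
shown):

/* Forward declarations so the sketch stands alone. */
struct folio;
struct anon_vma;

enum { PAGE_WAS_MAPPED = 1 << 0, PAGE_WAS_MLOCKED = 1 << 1 };

/*
 * Option A (read out the flags): return the raw flag word and let
 * callers test bits, so adding a third flag never changes the
 * signature again.
 */
unsigned long __migrate_folio_extract_flags(struct folio *dst,
					    struct anon_vma **anon_vmap);
/* caller: if (flags & PAGE_WAS_MLOCKED) lru_add_drain(); */

/*
 * Option B (separate flags on both sides): one parameter per flag for
 * record and extract alike, so no call site packs or unpacks bits.
 */
void __migrate_folio_record(struct folio *dst, int page_was_mapped,
			    int page_was_mlocked,
			    struct anon_vma *anon_vma);
void __migrate_folio_extract(struct folio *dst, int *page_was_mappedp,
			     int *page_was_mlockedp,
			     struct anon_vma **anon_vmap);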
>>  	prev = dst->lru.prev;
>>  	list_del(&dst->lru);
>>
>> @@ -1287,7 +1305,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  	 * isolated from the unevictable LRU: but this case is the easiest.
>>  	 */
>>  	folio_add_lru(dst);
>> -	if (page_was_mapped)
>> +	if (page_was_mlocked)
>>  		lru_add_drain();
>
> Like I said at the top, this would be
> if (page_was_mapped || page_was_mlocked).
>
>>
>>  	if (page_was_mapped)
>> @@ -1321,8 +1339,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  	 * right list unless we want to retry.
>>  	 */
>>  	if (rc == -EAGAIN) {
>> +		unsigned int page_flags = 0;
>> +
>> +		if (page_was_mapped)
>> +			page_flags |= PAGE_WAS_MAPPED;
>> +		if (page_was_mlocked)
>> +			page_flags |= PAGE_WAS_MLOCKED;
>> +
>>  		list_add(&dst->lru, prev);
>> -		__migrate_folio_record(dst, page_was_mapped, anon_vma);
>> +		__migrate_folio_record(dst, page_flags, anon_vma);
>>  		return rc;
>>  	}
>>
>> @@ -1799,10 +1824,11 @@ static int migrate_pages_batch(struct list_head *from,
>>  		dst = list_first_entry(&dst_folios, struct folio, lru);
>>  		dst2 = list_next_entry(dst, lru);
>>  		list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
>> -			int page_was_mapped = 0;
>> +			int page_was_mapped = 0, page_was_mlocked = 0;
>>  			struct anon_vma *anon_vma = NULL;
>>
>> -			__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
>> +			__migrate_folio_extract(dst, &page_was_mapped,
>> +						&page_was_mlocked, &anon_vma);
>>  			migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
>>  					       true, ret_folios);
>>  			list_del(&dst->lru);
>> --
>> 2.39.3

--
Best Regards,
Huang, Ying