From: "Huang, Ying" <ying.huang@intel.com>
To: Douglas Anderson <dianders@chromium.org>
Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Alexander Viro, Christian Brauner, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao, linux-fsdevel@vger.kernel.org, Matthew Wilcox
Subject: Re: [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT
References: <20230421221249.1616168-1-dianders@chromium.org> <20230421151135.v2.3.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid>
Date: Sun, 23 Apr 2023 15:59:14 +0800
In-Reply-To: <20230421151135.v2.3.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid> (Douglas Anderson's message of "Fri, 21 Apr 2023 15:12:47 -0700")
Message-ID: <87h6t7kp0t.fsf@yhuang6-desk2.ccr.corp.intel.com>
Douglas Anderson writes:

> The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> finish quickly but not for things that will take a long time. Exactly
> how long is too long is not well defined, but waits of tens of
> milliseconds are likely non-ideal.
>
> Waiting on the folio lock in isolate_movable_page() is something that
> usually is pretty quick, but it is not officially bounded. Nothing
> stops another process from holding a folio lock while doing an
> expensive operation. Having an unbounded wait like this is not within
> the design goals of MIGRATE_SYNC_LIGHT.
>
> When putting a Chromebook under memory pressure (opening over 90 tabs
> on a 4GB machine) it was fairly easy to see delays of > 100 ms while
> waiting for the lock. While the laptop wasn't amazingly usable in this
> state, it was still limping along, and this state isn't something
> artificial; sometimes we simply end up with a lot of memory pressure.
>
> Putting the same Chromebook under memory pressure while it was running
> Android apps (though not stressing them) showed a much worse result
> (NOTE: this was on an older kernel, but the code paths here are
> similar). Android apps on ChromeOS currently run from a 128K-block,
> zlib-compressed, loopback-mounted squashfs disk. If we get a page
> fault on something backed by the squashfs filesystem, we could end up
> holding a folio lock while reading enough from disk to decompress 128K
> (and then decompressing it using the somewhat slow zlib algorithm).
> That reading goes through the ext4 subsystem (because it's a loopback
> mount) before eventually ending up in the block subsystem. This extra
> jaunt adds overhead. Without much work I could see cases where we
> ended up blocked on a folio lock for over a second. With more extreme
> memory pressure I could see waits of up to 25 seconds.
>
> Let's bound the amount of time we can wait for the folio lock. The
> SYNC_LIGHT migration mode can already handle failure for things that
> are slow, so adding this timeout is fairly straightforward.
>
> With this timeout, it can be seen that kcompactd can move on to more
> productive tasks if it's taking a long time to acquire a lock.

What is the maximum wait time of folio_lock_timeout()?

> NOTE: The reason I started digging into this isn't because some
> benchmark had gone awry, but because we've received in-the-field crash
> reports where we have a hung task waiting on the page lock (which is
> the equivalent code path on old kernels). While the root cause of
> those crashes is likely unrelated and won't be fixed by this patch,
> analyzing those crash reports did point out this unbounded wait, and
> it seemed like something good to fix.
>
> ALSO NOTE: the timeout mechanism used here uses "jiffies" and we also
> retry up to 7 times. That doesn't give us much accuracy in specifying
> the timeout. On 1000 Hz machines we'll end up timing out in 7-14 ms;
> on 100 Hz machines, in 70-140 ms. Given that we don't have a strong
> definition of how long "too long" is, this is probably OK.

You can use HZ to handle different configurations. It doesn't help much if your target is 1 ms, but I think it will be possible to set the timeout to something longer in the future, so a general, time-based definition looks better.
Best Regards,
Huang, Ying

> Suggested-by: Mel Gorman
> Signed-off-by: Douglas Anderson
> ---
>
> Changes in v2:
> - Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT"
>
>  mm/migrate.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index db3f154446af..60982df71a93 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -58,6 +58,23 @@
>  
>  #include "internal.h"
>  
> +/* Returns the schedule timeout for a non-async mode */
> +static long timeout_for_mode(enum migrate_mode mode)
> +{
> +	/*
> +	 * We'll always return 1 jiffy as the timeout. Since all places using
> +	 * this timeout are in a retry loop this means that the maximum time
> +	 * we might block is actually NR_MAX_MIGRATE_SYNC_RETRY jiffies.
> +	 * If a jiffy is 1 ms that's 7 ms, though with the accuracy of the
> +	 * timeouts it often ends up more like 14 ms; if a jiffy is 10 ms
> +	 * that's 70-140 ms.
> +	 */
> +	if (mode == MIGRATE_SYNC_LIGHT)
> +		return 1;
> +
> +	return MAX_SCHEDULE_TIMEOUT;
> +}
> +
>  bool isolate_movable_page(struct page *page, isolate_mode_t mode)
>  {
>  	struct folio *folio = folio_get_nontail_page(page);
> @@ -1162,7 +1179,8 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  		if (current->flags & PF_MEMALLOC)
>  			goto out;
>  
> -		folio_lock(src);
> +		if (folio_lock_timeout(src, timeout_for_mode(mode)))
> +			goto out;
>  	}
>  	locked = true;