Date: Mon, 24 Apr 2023 10:38:17 +0100
From: Mel Gorman
To: Douglas Anderson
Cc: Andrew Morton, Vlastimil Babka, Ying, Alexander Viro, Christian Brauner, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao, linux-fsdevel@vger.kernel.org, Matthew Wilcox
Subject: Re: [PATCH v2 3/4] migrate_pages: Don't wait forever locking pages in MIGRATE_SYNC_LIGHT
Message-ID: <20230424093817.am3qpsba35yrhmow@techsingularity.net>
References: <20230421221249.1616168-1-dianders@chromium.org> <20230421151135.v2.3.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid>
In-Reply-To: <20230421151135.v2.3.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid>

On Fri, Apr 21, 2023 at 03:12:47PM -0700, Douglas Anderson wrote:
> The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> finish quickly but not for things that will take a long time. Exactly
> how long is too long is not well defined, but waits of tens of
> milliseconds are likely non-ideal.
>
> Waiting on the folio lock in isolate_movable_page() is something that
> usually is pretty quick, but is not officially bounded.
> Nothing stops another process from holding a folio lock while doing an
> expensive operation. Having an unbounded wait like this is not within
> the design goals of MIGRATE_SYNC_LIGHT.
>
> When putting a Chromebook under memory pressure (opening over 90 tabs
> on a 4GB machine) it was fairly easy to see delays waiting for the
> lock of > 100 ms. While the laptop wasn't amazingly usable in this
> state, it was still limping along and this state isn't something
> artificial. Sometimes we simply end up with a lot of memory pressure.
>
> Putting the same Chromebook under memory pressure while it was running
> Android apps (though not stressing them) showed a much worse result
> (NOTE: this was on an older kernel but the codepaths here are
> similar). Android apps on ChromeOS currently run from a 128K-block,
> zlib-compressed, loopback-mounted squashfs disk. If we get a page
> fault from something backed by the squashfs filesystem we could end up
> holding a folio lock while reading enough from disk to decompress 128K
> (and then decompressing it using the somewhat slow zlib algorithms).
> That reading goes through the ext4 subsystem (because it's a loopback
> mount) before eventually ending up in the block subsystem. This extra
> jaunt adds extra overhead. Without much work I could see cases where
> we ended up blocked on a folio lock for over a second. With more
> extreme memory pressure I could see up to 25 seconds.
>
> Let's bound the amount of time we can wait for the folio lock. The
> SYNC_LIGHT migration mode can already handle failure for things that
> are slow, so adding this timeout in is fairly straightforward.
>
> With this timeout, it can be seen that kcompactd can move on to more
> productive tasks if it's taking a long time to acquire a lock.
>
> NOTE: The reason I started digging into this isn't because some
> benchmark had gone awry, but because we've received in-the-field crash
> reports where we have a hung task waiting on the page lock (which is
> the equivalent code path on old kernels). While the root cause of
> those crashes is likely unrelated and won't be fixed by this patch,
> analyzing those crash reports did point out this unbounded wait and it
> seemed like something good to fix.
>
> ALSO NOTE: the timeout mechanism used here uses "jiffies" and we also
> will retry up to 7 times. That doesn't give us much accuracy in
> specifying the timeout. On 1000 Hz machines we'll end up timing out in
> 7-14 ms. On 100 Hz machines we'll end up in 70-140 ms. Given that we
> don't have a strong definition of how long "too long" is, this is
> probably OK.
>
> Suggested-by: Mel Gorman
> Signed-off-by: Douglas Anderson
> ---
>
> Changes in v2:
> - Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT"
>
>  mm/migrate.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index db3f154446af..60982df71a93 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -58,6 +58,23 @@
>
>  #include "internal.h"
>
> +/* Returns the schedule timeout for a non-async mode */
> +static long timeout_for_mode(enum migrate_mode mode)
> +{
> +	/*
> +	 * We'll always return 1 jiffy as the timeout. Since all places using
> +	 * this timeout are in a retry loop this means that the maximum time
> +	 * we might block is actually NR_MAX_MIGRATE_SYNC_RETRY jiffies.
> +	 * If a jiffy is 1 ms that's 7 ms, though with the accuracy of the
> +	 * timeouts it often ends up more like 14 ms; if a jiffy is 10 ms
> +	 * that's 70-140 ms.
> +	 */
> +	if (mode == MIGRATE_SYNC_LIGHT)
> +		return 1;
> +

Use a switch here, with WARN_ON_ONCE for MIGRATE_ASYNC and a fallthrough
to MIGRATE_SYNC_LIGHT?

> +	return MAX_SCHEDULE_TIMEOUT;
> +}
> +

Even though HZ is defined at compile time, it is undesirable to use a
constant timeout unrelated to HZ because the resulting delay varies with
CONFIG_HZ. Please use a value like DIV_ROUND_UP(HZ, 250) or
DIV_ROUND_UP(HZ, 1000) for a 4ms or 1ms timeout respectively. Even though
it's still potentially variable, it would make any hypothetical
transition to [milli|micro|nano]seconds easier in the future as the
intent would be known. While there are no plans for change as such,
working in jiffies is occasionally problematic in kernel/sched/. At OSPM
this year, the notion of dynamic HZ was brought up (it would be hard) and
a preliminary step would be converting all uses of HZ to normal time.

-- 
Mel Gorman
SUSE Labs
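
For illustration only, here is a minimal user-space sketch of the two
review suggestions taken together: a switch that warns once on
MIGRATE_ASYNC and falls through to MIGRATE_SYNC_LIGHT, and a timeout
derived from HZ rather than a bare 1. It is not the patch as posted or
merged. The HZ default of 250, the WARN_ON_ONCE stand-in, the reduced
three-value enum and the helper name sync_light_timeout_jiffies() are all
assumptions made so that the example compiles outside the kernel.

```c
#include <limits.h>
#include <stdio.h>

/* User-space stand-ins for kernel definitions (assumptions for this sketch). */
#ifndef HZ
#define HZ 250
#endif
#define MAX_SCHEDULE_TIMEOUT LONG_MAX
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Simplified substitute for the kernel's WARN_ON_ONCE(): prints one warning. */
static int warned_once;
#define WARN_ON_ONCE(cond)						\
	do {								\
		if ((cond) && !warned_once) {				\
			warned_once = 1;				\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)

/* Reduced copy of the migrate modes discussed in this thread. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/*
 * Hypothetical helper: roughly 4 ms expressed in jiffies for a given tick
 * rate. Rounding up keeps the result at one jiffy or more, so on slow
 * tick rates the real delay is one full tick rather than zero.
 */
static long sync_light_timeout_jiffies(long hz)
{
	return DIV_ROUND_UP(hz, 250);
}

/* Returns the schedule timeout for a non-async mode. */
static long timeout_for_mode(enum migrate_mode mode)
{
	switch (mode) {
	case MIGRATE_ASYNC:
		/* Async callers should never sleep, so asking is a bug. */
		WARN_ON_ONCE(mode == MIGRATE_ASYNC);
		/* fall through */
	case MIGRATE_SYNC_LIGHT:
		return sync_light_timeout_jiffies(HZ);
	default:
		return MAX_SCHEDULE_TIMEOUT;
	}
}
```

Under these assumptions the helper yields 4 jiffies at HZ=1000 (4 ms) and
rounds up to 1 jiffy at HZ=250 (4 ms) or HZ=100 (10 ms), so the intent
"about 4 ms" is visible in the source even though the real delay still
depends on the tick rate.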