From: Doug Anderson
Date: Thu, 20 Apr 2023 17:36:18 -0700
Subject: Re: [RFC PATCH] migrate_pages: Never block waiting for the page lock
To: Mel Gorman
Cc: Vlastimil Babka, "Huang, Ying", Andrew Morton, Yu Zhao,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20230420102304.7wdquge2b7r3xerj@techsingularity.net>
References: <20230413182313.RFC.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid>
    <87v8hz17o9.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87ildvwbr5.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <20230420102304.7wdquge2b7r3xerj@techsingularity.net>

Hi,

On Thu, Apr 20, 2023 at 3:23 AM Mel Gorman wrote:
>
> On Tue, Apr 18, 2023 at 11:17:23AM +0200, Vlastimil Babka wrote:
> > > Actually, the more I think about it the more I think the right answer
> > > is to keep kcompactd as using MIGRATE_SYNC_LIGHT and make
> > > MIGRATE_SYNC_LIGHT not block on the folio lock. kcompactd can accept
> > > some blocking but we don't want long / unbounded blocking.
> > > Reading the comments for MIGRATE_SYNC_LIGHT, this also seems like it
> > > fits pretty well. MIGRATE_SYNC_LIGHT says that the stall time of
> > > writepage() is too much. It's entirely plausible that someone else
> > > holding the lock is doing something as slow as writepage() and thus
> > > waiting on the lock can be just as bad for latency.
> >
> > +Cc Mel for potential insights. Sounds like a good compromise at first
> > glance, but it's a tricky area.
> > Also there are other callers of migration than compaction, and we should
> > make sure we are not breaking them unexpectedly.
> >
>
> It's tricky because part of the point of SYNC_LIGHT was to block on
> some operations but not for too long. writepage was considered to be an
> exception because it can be very slow for a variety of reasons. I think
> at the time that writeback from reclaim context was possible and it was
> very inefficient, more inefficient than reads. Storage can also be
> very slow (e.g. USB stick plugged in) and there can be large differences
> between read/write performance (SMR, some ssd etc). Pages read were generally
> clean and could be migrated, pages for write may be redirtied etc. It was
> assumed that while both read/write could lock the page for a long time,
> write had worse hold times and most other users of lock page were transient.

I think some of the slowness comes from the complex way that systems
like ChromeOS currently work. As mentioned in the commit message of my
RFC patch, ChromeOS currently runs Android programs out of a
128K-block, zlib-compressed squashfs disk. That squashfs disk is
actually a loopback mounted file on the main ext4 filesystem, which is
stored on something like eMMC.

If I understand everything correctly, if we get a page fault on memory
backed by this squashfs filesystem, then we can end up holding a
page/folio lock and then trying to read a pile of pages (enough to
decompress the whole 128K block).
...but we don't read them directly; we instead block on ext4, which
might need to allocate memory and then finally blocks on the block
driver completing the task. This whole sequence of things is not
necessarily fast. I think this is responsible for some of the very
large numbers that were part of my original patch description.

Without the above squashfs setup, we can still run into slow times but
not quite as bad. I tried running a ChromeOS "memory pressure" test
against a mainline kernel plus _just_ the browser (Android disabled).
The test eventually opened over 90 tabs on my 4GB system and the
system got pretty janky, but was still barely usable. While running
the test, I saw dozens of cases of folio_lock() taking over 10 ms and
quite a few (~10?) of it taking over 100 ms. The peak I saw was
~380 ms. I also happened to time buffer locking. That was much less
bad, with the worst case topping out at ~70 ms. I'm not sure what
timeout you'd consider to be bad. 10 ms? 20 ms?

Also as a side note: I ran the same memory pressure test but _with_
Android running (though it doesn't actually stress Android, it's just
running in the background). To do this I had to run a downstream
kernel. Here it was easy to see a ~1.7 second wait on the page lock
without any ridiculous amount of stressing. ...and a ~1.5 second wait
for the buffer lock, too.

> A compromise for SYNC_LIGHT or even SYNC on lock page would be to try
> locking with a timeout. I don't think there is a specific helper but it
> should be possible to trylock, wait on the folio_waitqueue and attempt
> once to get the lock again. I didn't look very closely but it would
> be doing something similar to folio_wait_bit_common() with
> io_schedule_timeout instead of io_schedule. This will have false
> positives because the folio waitqueue may be woken for unrelated pages
> and obviously it can race with other wait queues.
>
> kcompactd is an out-of-line wait and can afford to wait for a long time
> without user-visible impact but 120 seconds or any potentially unbounded
> length of time is ridiculous and unhelpful. I would still be wary about
> adding new sync modes or making major modifications to kcompactd because
> detecting application stalls due to a kcompactd modification is difficult.

OK, I'll give this a shot. It doesn't look too hard, but we'll see.

> There is another approach -- kcompactd or proactive under heavy memory
> pressure is probably a waste of CPU time and resources and should
> avoid or minimise effort when under pressure. While direct compaction
> can capture a page for immediate use, kcompactd and proactive reclaim
> are just shuffling memory around for *potential* users and may be making
> the overall memory pressure even worse. If memory pressure detection was
> better and proactive/kcompactd reclaim bailed then the unbounded time to
> lock a page is mitigated or completely avoided.

I probably won't try to take this on, though it does sound like a good
idea for future research.
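
For reference, here is a rough userspace sketch of the "trylock, wait
once with a timeout, try once more" shape that Mel describes above. It
is only an analogy and not kernel code: a pthread mutex and condition
variable stand in for the folio lock bit and the folio waitqueue, and
struct page_lock, page_trylock(), page_unlock() and
page_lock_timeout() are names made up for illustration. A real version
would presumably live next to folio_wait_bit_common() and use
io_schedule_timeout(), as suggested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a folio's lock bit plus its waitqueue. */
struct page_lock {
	pthread_mutex_t mtx;	/* protects 'locked' */
	pthread_cond_t waitq;	/* plays the role of the folio waitqueue */
	bool locked;
};

/* Take the lock only if it is free right now; never sleeps on the lock. */
static bool page_trylock(struct page_lock *p)
{
	bool got;

	pthread_mutex_lock(&p->mtx);
	got = !p->locked;
	if (got)
		p->locked = true;
	pthread_mutex_unlock(&p->mtx);
	return got;
}

/* Drop the lock and wake every waiter, much like a shared waitqueue. */
static void page_unlock(struct page_lock *p)
{
	pthread_mutex_lock(&p->mtx);
	assert(p->locked);	/* unlocking an unlocked page is a caller bug */
	p->locked = false;
	pthread_mutex_unlock(&p->mtx);
	pthread_cond_broadcast(&p->waitq);
}

/*
 * Trylock; on failure wait at most timeout_ms for a wakeup, then try
 * exactly once more. Returns false if the lock is still held.
 */
static bool page_lock_timeout(struct page_lock *p, long timeout_ms)
{
	struct timespec deadline;
	bool got;

	if (page_trylock(p))
		return true;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME time. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&p->mtx);
	/* Single bounded wait: it may end early on an unrelated wakeup. */
	if (p->locked)
		pthread_cond_timedwait(&p->waitq, &p->mtx, &deadline);
	/* One retry only; on failure the caller skips this page. */
	got = !p->locked;
	if (got)
		p->locked = true;
	pthread_mutex_unlock(&p->mtx);
	return got;
}
```

A caller in the spirit of kcompactd would treat a false return as
"skip this page and move on" rather than looping. The false positives
Mel mentions show up here as the single wait returning early while the
lock is still held, in which case the one retry simply fails.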