From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 636A7C77B73 for ; Thu, 20 Apr 2023 01:40:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A7E296B0071; Wed, 19 Apr 2023 21:40:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A2D516B0072; Wed, 19 Apr 2023 21:40:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8F5176B0074; Wed, 19 Apr 2023 21:40:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 7F9E66B0071 for ; Wed, 19 Apr 2023 21:40:18 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 4D6B212044F for ; Thu, 20 Apr 2023 01:40:18 +0000 (UTC) X-FDA: 80700064116.22.8BC6BD4 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf02.hostedemail.com (Postfix) with ESMTP id A50B580008 for ; Thu, 20 Apr 2023 01:40:15 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="OHESIq/P"; spf=pass (imf02.hostedemail.com: domain of ying.huang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681954816; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BGmYLSp+bb5SLTzdl+IwGsZqjkJJ/QyC/hxiyrzojd4=; b=2aUFz/DpXiHloliIoWJEWtiTXihd6SiP5bGfyVRwE5QJGzuuZvdEzfcZSI63D+DNQGso84 7AVQGh3PKKTNRIQ4SMjoQto/t6ZxHKd3KyA2hWa2fKlixf+AzJ7UwT1LfaRP9vTEApufhC w2uyEd/S8vgmK7lBXVT00oPRKIylyPM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681954816; a=rsa-sha256; cv=none; b=pE0as/50W7saqR45y4+hJthdxQwT6VJtZDrec1EB9YGaUkstyImnu1v2t3HRMKN9WX0Hq9 P+f4VakThdFnIhI3UyU0thpNydgg+W/EdsWNt63EJdiEPdHp3nCnFI9EMoP2Gz9nxg2VKD swa56uedNtxbiqjFzM81THjaN3bq9ro= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="OHESIq/P"; spf=pass (imf02.hostedemail.com: domain of ying.huang@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1681954815; x=1713490815; h=from:to:cc:subject:references:date:in-reply-to: message-id:mime-version:content-transfer-encoding; bh=G1DkaHEZt8/FNUw8vlJ9t05zmgqwE4zlsOzsNff0UvY=; b=OHESIq/PjKJhmos0lxJZ3ELu7VIq2PGPRawv5RdjSem/KXXzPFbDjBSz 1Tmcw9btgO3b1piHt2c0RciE785CjKBxnQrif+gJJbESVhAz5C+x7Tsq9 0xWPzv6/4/isUwO0NvpE9BAz64t57cXwNgOMg6110GL3ij0uVtIJrgkAv bc0YLRbG8hA6SJE2/cdSwCJ4RpOXzvF4N2UlrqLhUuDRYvSJZyV1pE8bF 5fqBTTuJXEd72YpuslI9kval9YHTa91ydrtdhcbWYSWjvG71IJSPAQ+iW UGFJUvVYLh29j4FRx4yZxDmDbuf8m4N2eEgN8sfzI51yGIsjcwmgecxtz g==; X-IronPort-AV: E=McAfee;i="6600,9927,10685"; a="334425791" X-IronPort-AV: E=Sophos;i="5.99,211,1677571200"; d="scan'208";a="334425791" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2023 18:40:13 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10685"; a="669184169" X-IronPort-AV: E=Sophos;i="5.99,211,1677571200"; d="scan'208";a="669184169" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2023 18:40:11 -0700 From: "Huang, Ying" To: Doug Anderson Cc: Andrew Morton , Yu Zhao , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka , Mel Gorman Subject: Re: [RFC PATCH] migrate_pages: Never block waiting for the page lock References: <20230413182313.RFC.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid> <87v8hz17o9.fsf@yhuang6-desk2.ccr.corp.intel.com> <87ildvwbr5.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edohvpzk.fsf@yhuang6-desk2.ccr.corp.intel.com> <87wn28u2wy.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Thu, 20 Apr 2023 09:39:02 +0800 In-Reply-To: (Doug Anderson's message of "Wed, 19 Apr 2023 12:30:55 -0700") Message-ID: <87h6tbtjrd.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: A50B580008 X-Rspamd-Server: rspam09 X-Stat-Signature: cp7n89bnzjk7pm4nog1g5rr6xrm7c4ej X-HE-Tag: 1681954815-875169 X-HE-Meta: U2FsdGVkX183bz6BnoU9KGw1wq052c6oVFNiT2537n4u2MLhkLHXtIy4GfQZJy49fh8jmi/et1JwmHPT8E8l+HMBNqvKr2J8SjJpcvexF6FPsiaWh2c5OKAgYVJ46TA1mDcEXnRCQp3fa5oxnbRx6mWHQ49C0iTXm7xvQ6YhQGvb0eadOHHZZgkbU+JdK9Jr/8H+p4Kiufz0wJuS9/mv7DpXxae5UBhsX/4LxJ7eHB9iDkGShB8W/XxKPBazuQOCb/Dqp+XYFK4+ftzX7A63WkI0PK2/Orq79W+gjBqljjtfd2EwBJAdyl+K6QcjItR6NneNLVPaS21P0CtZ7onifMo726MlQcg8tD6QAuav3xVFB6uiu94zrcr2VUqjqHRyIiI5GDUg20FTiJgq5fLZfbkVvnbNTuobnXKMqPID+huRxppWWH3UgRXs6kRB7zFgppHo8MUzqM481xLa8oSPKmWhQO+E6CBuBljEUrzCWRG+vZunI2fVqmTL4QLWy2I3zaywfnM6SoCxJl6Ck+h1jjXLiX0BRTz1eNcLgzhWzbRqTdbF+W9jS1XOTmKxexPUumR7sTuvUP1so3O/cMafqYl38/7uZ3Az1c/au5D0T6AiN3qRdql3b4np8TEJ2pG7gwyai2lqKb2n/YT3mTMj2gXFmec4jci1blJ8yp8H4vNEh4bPp3GK7zw9IHQo/+/A5Iz44HEit0Hs1QE6jd1vIG/DMtYRARae5JgFLR+D/0upch56Euu79687OT5pnG8TyhsK1/ySaSTII5bTY79H3qhgTxjQ/Xpm3YKw/DOiHHr58ydPE/lK40gTdNirtXdhrSnQ++DR/Nd3l3tSDzIlFvQhjD3vk9lfKYP6Hin/QlTUpCtoXtsSnd6B3P/hyRck2XhLxZVbetn/9LjV/qqo18/rZMEJlwq7+aDuF8CKGWoUQqT6huwjPJEQfELHC19u3fysKk46n4I05uX2o6I 0P8fKIby PbrZZomyaKGWUTvZ7ca6AwWHNYJOJQwLFLOtlPkRBjkpFp2On1WqQgsmatSFxkVZGDgqpc5iah0AY8GejdJ2wLVJGYGvEk2932OUOpfXp+tcG10NDORxQthjCeJEW9tZrX83uiAOrRiZ5PKe7lu+hXslXl29X8wZA3ZcUlI14eeYjgSVZD67XbswKSDYEhvTDSSoBr0o32M5AYmeJFmP3AJ48+EIvJ7NSr1OkQiwvX5d2wKeFBxKjZAYn317EWFttuZPykhFZW0G5GqWFluJSGvHFjErdCg8NSlDlEbR8JZM8E0c= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Doug Anderson writes: > Hi, > > On Tue, Apr 18, 2023 at 5:34=E2=80=AFPM Huang, Ying wrote: >> >> >> >> >> TBH, the test case is too extreme for me. >> >> >> > >> >> >> > That's fair. That being said, I guess the point I was trying to = make >> >> >> > is that waiting for this lock could take an unbounded amount of = time. >> >> >> > Other parts of the system sometimes hold a page lock and then do= a >> >> >> > blocking operation. At least in the case of kcompactd there are = better >> >> >> > uses of its time than waiting for any given page. >> >> >> > >> >> >> >> And, we have multiple "sync" mode to deal with latency requirem= ent, for >> >> >> >> example, we use MIGRATE_SYNC_LIGHT for compaction to avoid too = long >> >> >> >> latency. If you have latency requirement for some users, you m= ay >> >> >> >> consider to add new "sync" mode. >> >> >> > >> >> >> > Sure. kcompactd_do_work() is currently using MIGRATE_SYNC_LIGHT.= I >> >> >> > guess my first thought would be to avoid adding a new mode and m= ake >> >> >> > MIGRATE_SYNC_LIGHT not block here. Then anyone that truly needs = to >> >> >> > wait for all the pages to be migrated can use the heavier sync m= odes. >> >> >> > It seems to me like the current users of MIGRATE_SYNC_LIGHT woul= d not >> >> >> > want to block for an unbounded amount of time here. What do you = think? >> >> >> >> >> >> It appears that you can just use MIGRATE_ASYNC if you think the co= rrect >> >> >> behavior is "NOT block at all". I found that there are more >> >> >> fine-grained controls on this in compaction code, please take a lo= ok at >> >> >> "enum compact_priority" and its comments. >> >> > >> >> > Actually, the more I think about it the more I think the right answ= er >> >> > is to keep kcompactd as using MIGRATE_SYNC_LIGHT and make >> >> > MIGRATE_SYNC_LIGHT not block on the folio lock. >> >> >> >> Then, what is the difference between MIGRATE_SYNC_LIGHT and >> >> MIGRATE_ASYNC? >> > >> > Aren't there still some differences even if we remove blocking this >> > one lock? ...or maybe your point is that maybe the other differences >> > have similar properties? >> >> Sorry for confusing words. Here, I asked you to list the implementation >> difference between MIGRATE_ASYNC and MIGRATE_SYNC_LIGHT after your >> proposed changes. Which are waited in MIGRATE_SYNC_LIGHT but not in >> MIGRATE_ASYNC? > > Ah, got it! It's not always the easiest to follow all the code paths, > but let's see what I can find. > > I guess to start with, though, I will assert that someone seems to > have believed that there was an important difference between > MIGRATE_ASYNC and MIGRATE_SYNC_LIGHT besides waiting on the lock in > migrate_folio_unmapt() since (as I found in my previous digging) the > "direct reclaim" path never grabs this lock but explicitly sometimes > chooses MIGRATE_ASYNC some times and MIGRATE_SYNC_LIGHT other times. > > OK, so looking at mainline Linux and comparing differences in behavior > between SYNC_LIGHT and ASYNC and thoughts about which one should be > used for kcompactd. Note that I won't go _too_ deep into all the > differences... > > -- > > In nfs.c: > > 1. We will wait for the fscache if SYNC_LIGHT but not ASYNC. No idea > what would be the most ideal for calls from kcompactd. This appears like something like disk writing. > In compaction.c: > > 2. We will update the non-async "compact_cached_migrate_pfn" for > SYNC_LIGHT but not ASYNC since we keep track of sync and async > progress separately. > > 3. compact_lock_irqsave() note contentions for ASYNC but not > SYNC_LIGHT and cause an earlier abort. Seems like kcompactd would want > the SYNC_LIGHT behavior since this isn't about things indefinitely > blocking. > > 4. isolate_migratepages_block() will bail if too_many_isolated() for > ASYNC but not SYNC_LIGHT. My hunch is that kcompactd wants the > SYNC_LIGHT behavior for kcompact. > > 5. If in direct compaction, isolate_migratepages_block() sets > "skip_on_failure" for ASYNC but not SYNC_LIGHT. My hunch is that > kcompactd wants the SYNC_LIGHT behavior for kcompact. > > 6. suitable_migration_source() considers more things suitable > migration sources when SYNC_LIGHT but not (ASYNC+direct_compaction). > Doesn't matter since kcompactd isn't direct compaction and non-direct > compaction is the same. > > 7. fast_isolate_around() does less scanning when SYNC_LIGHT but not > (ASYNC+direct_compaction). Again, it doesn't matter for kcompactd. > > 8. isolate_freepages() uses a different stride with SYNC_LIGHT vs. > ASYNC. My hunch is that kcompactd wants the SYNC_LIGHT behavior for > kcompact. > > In migrate.c: > > 9. buffer_migrate_lock_buffers() will block waiting to lock buffers > with SYNC_LIGHT but not ASYNC. I don't know for sure, but this feels > like something we _wouldn't_ want to block on in kcompactd and instead > should look for easier pickings. IIUC, this is similar as page lock. > 10. migrate_folio_unmap() has the case we've already talked about > > 11. migrate_folio_unmap() does batch flushes for async because it > doesn't need to worry about a class of deadlock. Somewhat recent code > actually ends up running this code first even for sync modes to get > the batch. > > 12. We'll retry a few more times for SYNC_LIGHT than ASYNC. Seems like > the more retries won't really hurt for kcompactd. > > -- > > So from looking at all the above, I'll say that kcompactd should stick > with SYNC_LIGHT and we should fix #10. In other words, like my > original patch except that we keep blocking on the lock in the full > SYNC modes. > > It's possible that we should also change case #9 I listed above. Do > you know if locking buffers is likely to block on something as slow as > page reading/writing? IIUC, this is related to page reading/writing. Buffer head is used by ext2/4 to read/write. Thank you very much for your research. It looks like ASYNC isn't appropriate for kcompactd. >From the comments of SYNC_LIGHT, * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking * on most operations but not ->writepage as the potential stall time * is too significant To make SYNC_LIGHT block on less operations than before, I guess that you need to prove the stall time can be long with the operation with not-so-extreme test cases. Best Regards, Huang, Ying