Date: Thu, 19 Dec 2024 08:26:06 -0800
From: Shakeel Butt
To: Zi Yan
Cc: Bernd Schubert, David Hildenbrand, Joanne Koong, miklos@szeredi.hu,
	linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
	josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com,
	Matthew Wilcox, Oscar Salvador, Michal Hocko
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
Message-ID: <7qyun2waznrduxpf2i5eebqdvpigrd5ycu4rlpawu336kqkyvh@xmfmlsmr43gw>
References: <0CF889CE-09ED-4398-88AC-920118D837A1@nvidia.com>
	<722A63E5-776E-4353-B3EE-DE202E4A4309@nvidia.com>
	<6FBDD501-25A0-4A21-8051-F8EE74AD177B@nvidia.com>
In-Reply-To: <6FBDD501-25A0-4A21-8051-F8EE74AD177B@nvidia.com>

On Thu, Dec 19, 2024 at 11:14:49AM -0500, Zi Yan wrote:
> On 19 Dec 2024, at 11:09, Bernd Schubert wrote:
>
> > On 12/19/24 17:02, Zi Yan wrote:
> >> On 19 Dec 2024, at 11:00, Zi Yan wrote:
> >>> On 19 Dec 2024, at 10:56, Bernd Schubert wrote:
> >>>
> >>>> On 12/19/24 16:55, Zi Yan wrote:
> >>>>> On 19 Dec 2024, at 10:53, Shakeel Butt wrote:
> >>>>>
> >>>>>> On Thu, Dec 19, 2024 at 04:47:18PM +0100, David Hildenbrand wrote:
> >>>>>>> On 19.12.24 16:43, Shakeel Butt wrote:
> >>>>>>>> On Thu, Dec 19, 2024 at 02:05:04PM +0100, David Hildenbrand wrote:
> >>>>>>>>> On 23.11.24 00:23, Joanne Koong wrote:
> >>>>>>>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the folio if
> >>>>>>>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag set on its
> >>>>>>>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the mapping, the
> >>>>>>>>>> writeback may take an indeterminate amount of time to complete, and
> >>>>>>>>>> waits may get stuck.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Joanne Koong
> >>>>>>>>>> Reviewed-by: Shakeel Butt
> >>>>>>>>>> ---
> >>>>>>>>>>  mm/migrate.c | 5 ++++-
> >>>>>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>>>>>>>> index df91248755e4..fe73284e5246 100644
> >>>>>>>>>> --- a/mm/migrate.c
> >>>>>>>>>> +++ b/mm/migrate.c
> >>>>>>>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> >>>>>>>>>>  		 */
> >>>>>>>>>>  		switch (mode) {
> >>>>>>>>>>  		case MIGRATE_SYNC:
> >>>>>>>>>> -			break;
> >>>>>>>>>> +			if (!src->mapping ||
> >>>>>>>>>> +			    !mapping_writeback_indeterminate(src->mapping))
> >>>>>>>>>> +				break;
> >>>>>>>>>> +			fallthrough;
> >>>>>>>>>>  		default:
> >>>>>>>>>>  			rc = -EBUSY;
> >>>>>>>>>>  			goto out;
> >>>>>>>>>
> >>>>>>>>> Ehm, doesn't this mean that any fuse user can essentially completely block
> >>>>>>>>> CMA allocations, memory compaction, memory hotunplug, memory poisoning... ?!
> >>>>>>>>>
> >>>>>>>>> That sounds very bad.
> >>>>>>>>
> >>>>>>>> The page under writeback are already unmovable while they are under
> >>>>>>>> writeback. This patch is only making potentially unrelated tasks to
> >>>>>>>> synchronously wait on writeback completion for such pages which in worst
> >>>>>>>> case can be indefinite. This actually is solving an isolation issue on a
> >>>>>>>> multi-tenant machine.
> >>>>>>>>
> >>>>>>> Are you sure, because I read in the cover letter:
> >>>>>>>
> >>>>>>> "In the current FUSE writeback design (see commit 3be5a52b30aa ("fuse:
> >>>>>>> support writable mmap"))), a temp page is allocated for every dirty
> >>>>>>> page to be written back, the contents of the dirty page are copied over to
> >>>>>>> the temp page, and the temp page gets handed to the server to write back.
> >>>>>>> This is done so that writeback may be immediately cleared on the dirty
> >>>>>>> page,"
> >>>>>>>
> >>>>>>> Which to me means that they are immediately movable again?
> >>>>>>
> >>>>>> Oh sorry, my mistake, yes this will become an isolation issue with the
> >>>>>> removal of the temp page in-between which this series is doing. I think
> >>>>>> the tradeoff is between extra memory plus slow write performance versus
> >>>>>> temporary unmovable memory.
> >>>>>
> >>>>> No, the tradeoff is slow FUSE performance vs whole system slowdown due to
> >>>>> memory fragmentation. AS_WRITEBACK_INDETERMINATE indicates it is not
> >>>>> temporary.
> >>>>
> >>>> Is there is a difference between FUSE TMP page being unmovable and
> >>>> AS_WRITEBACK_INDETERMINATE folios/pages being unmovable?
> >>
> >> (Fix my response location)
> >>
> >> Both are unmovable, but you can control where FUSE TMP page
> >> can come from to avoid spread across the entire memory space. For example,
> >> allocate a contiguous region as a TMP page pool.
> >
> > Wouldn't it make sense to have that for fuse writeback pages as well?
> > Fuse tries to limit dirty pages anyway.
>
> Can fuse constraint the location of writeback pages? Something like what
> I proposed[1], migrating pages to a location before their writeback? Will
> that be a performance concern?
>
> In terms of the number of dirty pages, you only need one page out of 512
> pages to prevent 2MB THP from allocation. For CMA allocation, one unmovable
> page can kill one contiguous range. What is the limit of fuse dirty pages?
>
> [1] https://lore.kernel.org/linux-mm/90C41581-179F-40B6-9801-9C9DBBEB1AF4@nvidia.com/

I think this whole concern about fuse making system memory unmovable
forever is overblown. Fuse already uses a temp (unmovable) page for
writeback; that scheme is slow and is being removed in this series.
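
To make the quoted hunk concrete, here is a rough user-space sketch of the
decision it adds. It is only an illustration, not the kernel code: the
sketch_* types, the under_writeback field and the
sketch_writeback_decision() helper are hypothetical stand-ins, and the
assumption that the switch runs only for a folio already under writeback
is taken from the commit message rather than from the hunk itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum sketch_migrate_mode {
	SKETCH_MIGRATE_ASYNC,
	SKETCH_MIGRATE_SYNC_LIGHT,
	SKETCH_MIGRATE_SYNC,
};

struct sketch_mapping {
	bool writeback_indeterminate;	/* models AS_WRITEBACK_INDETERMINATE */
};

struct sketch_folio {
	struct sketch_mapping *mapping;	/* may be NULL, as in the hunk */
	bool under_writeback;		/* assumed outer condition */
};

/*
 * Returns 0 when the migration path would go on (and, for a folio under
 * writeback, wait for that writeback), or -EBUSY when the folio is skipped.
 */
int sketch_writeback_decision(const struct sketch_folio *src,
			      enum sketch_migrate_mode mode)
{
	assert(src != NULL);

	if (!src->under_writeback)
		return 0;

	switch (mode) {
	case SKETCH_MIGRATE_SYNC:
		/* Waiting is only acceptable if writeback is bounded. */
		if (!src->mapping || !src->mapping->writeback_indeterminate)
			break;
		/* fall through */
	default:
		return -EBUSY;
	}

	return 0;
}
```

With this model, a folio under writeback whose mapping has the flag set
yields -EBUSY even in the SYNC mode, while the same folio on an ordinary
mapping yields 0, i.e. the caller would wait for writeback as before.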