From: Joanne Koong
Date: Fri, 27 Dec 2024 10:25:56 -0800
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
To: Bernd Schubert
Cc: David Hildenbrand, Shakeel Butt, Zi Yan, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com, Matthew Wilcox, Oscar Salvador, Michal Hocko

On Thu, Dec 26, 2024 at 2:44 PM Bernd Schubert wrote:
>
> On 12/23/24 20:00, Joanne Koong wrote:
> > On Sat,
> > Dec 21, 2024 at 1:59 PM Bernd Schubert wrote:
> >>
> >>
> >>
> >> On 12/21/24 17:25, David Hildenbrand wrote:
> >>> On 20.12.24 22:01, Joanne Koong wrote:
> >>>> On Fri, Dec 20, 2024 at 6:49 AM David Hildenbrand wrote:
> >>>>>
> >>>>>>> I'm wondering if there would be a way to just "cancel" the
> >>>>>>> writeback and mark the folio dirty again. That way it could be
> >>>>>>> migrated, but not reclaimed. At least we could avoid the whole
> >>>>>>> AS_WRITEBACK_INDETERMINATE thing.
> >>>>>>>
> >>>>>>
> >>>>>> That is what I basically meant with short timeouts. Obviously it is not
> >>>>>> that simple to cancel the request and to retry - it would add in quite
> >>>>>> some complexity, if all the issues that arise can be solved at all.
> >>>>>
> >>>>> At least it would keep that out of core-mm.
> >>>>>
> >>>>> AS_WRITEBACK_INDETERMINATE really has a weird smell to it ... we should
> >>>>> try to improve such scenarios, not acknowledge and integrate them, then
> >>>>> work around using timeouts that must be manually configured, and can
> >>>>> likely not be default enabled because it could hurt reasonable use
> >>>>> cases :(
> >>>>>
> >>>>> Right now we clear the writeback flag immediately, indicating that data
> >>>>> was written back, when in fact it was not written back at all. I suspect
> >>>>> fsync() currently handles that manually already, to wait for any of the
> >>>>> allocated pages to actually get written back by user space, so we have
> >>>>> control over when something was *actually* written back.
> >>>>>
> >>>>>
> >>>>> Similar to your proposal, I wonder if there could be a way to request
> >>>>> fuse to "abort" a writeback request (instead of using fixed timeouts per
> >>>>> request). Meaning, when we stumble over a folio that is under writeback
> >>>>> on some paths, we would tell fuse to "end writeback now", or "end
> >>>>> writeback now if it takes longer than X".
> >>>>> Essentially hidden inside
> >>>>> folio_wait_writeback().
> >>>>>
> >>>>> When aborting a request, as I said, we would essentially "end writeback"
> >>>>> and mark the folio as dirty again. The interesting thing is likely how
> >>>>> to handle user space that wants to process this request right now (stuck
> >>>>> in fuse_send_writepage() I assume?), correct?
> >>>>
> >>>> This would be fine if the writeback request hasn't been sent yet to
> >>>> userspace but if it has and the pages are spliced
> >>>
> >>> Can you point me at the code where that splicing happens?
> >>
> >> fuse_dev_splice_read()
> >>   fuse_dev_do_read()
> >>     fuse_copy_args()
> >>       fuse_copy_page
> >>
> >>
> >> Btw, for the non splice case, disabling migration should be
> >> only needed while it is copying to the userspace buffer?
> >
> > I don't think so. We don't currently disable migration when copying
> > to/from the userspace buffer for reads.
>
>
> Sorry for my late reply. I'm confused about "reads". This discussion
> is about writeback?

Whether we need to disable migration for copying to/from the userspace
buffers for non-tmp pages should be the same whether handling reads or
writes, no? That's why I brought up reads, but looking more at how
fuse handles readahead and read_folio(), it looks like the folio's
lock is held while it's being copied out, and IIUC that's enough to
disable migration since migration will wait on the lock. So if we end
writeback on the non-tmp page, it seems like we'd probably need to do
something similar first.

> Without your patches we have tmp-pages - migration disabled on these.
> With your patches we have AS_WRITEBACK_INDETERMINATE - migration
> also disabled?
>
> I think we have two code paths
>
> a) fuse_dev_read - does a full buffer copy. Why do we need tmp-pages
> for these at all? The only time migration must not run on these pages
> is while it is copying to the userspace buffer?

The tmp pages were originally introduced for avoiding deadlock on
reclaim and avoiding hanging sync()s as well [1].

[1] https://lore.kernel.org/linux-kernel/bd49fcba-3eb6-4e84-a0f0-e73bce31ddb2@linux.alibaba.com/

>
> b) fuse_dev_splice_read - isn't this our real problem, as we don't
> know when pages in the pipe are getting consumed?

Yes, the splice case nixes the idea unfortunately. Everything else we
could find a workaround for, but there's no way I can see to avoid
this for splice.

Thanks,
Joanne

>
>
> Thanks,
> Bernd
>