From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B88D8C77B71 for ; Fri, 14 Apr 2023 17:44:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D08AE900002; Fri, 14 Apr 2023 13:44:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CB8AC6B0075; Fri, 14 Apr 2023 13:44:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B8092900002; Fri, 14 Apr 2023 13:44:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id A78916B0072 for ; Fri, 14 Apr 2023 13:44:33 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 771AF12029D for ; Fri, 14 Apr 2023 17:44:33 +0000 (UTC) X-FDA: 80680721226.17.D5971CE Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) by imf13.hostedemail.com (Postfix) with ESMTP id A1EEC20008 for ; Fri, 14 Apr 2023 17:44:31 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=3Eyw0wQN; spf=pass (imf13.hostedemail.com: domain of fvdl@google.com designates 209.85.218.41 as permitted sender) smtp.mailfrom=fvdl@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681494271; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=DbN0jpcGA5XqhM/WcDWy1vf/Fqs9pIA93z/uU3WZM4I=; b=utxv2tX8o2IaNOId9K2JeNQkkgZAeot+RqtsvHCbGgFRMGFx5Zq1pdSaVXmadMFZK2nvuJ ktutNIyUJ6Ql2/uyEB76NPw0Vlf7oplk+djCqXLCRG0QViRBP53bkkZzNSbw7x1Ec9ZBOu CH7w9/58fP8r3DltdSyJh0bbZlmlCpE= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=3Eyw0wQN; spf=pass (imf13.hostedemail.com: domain of fvdl@google.com designates 209.85.218.41 as permitted sender) smtp.mailfrom=fvdl@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681494271; a=rsa-sha256; cv=none; b=DdtdONXFhCtKKyhSB7Ta1Z/clWvMAtkvC8GvEZGKbuwG9R7PinPc5JhPTq4z7O3PY9wTF9 r7qmzHCE4npnAaKaCaZq1TK3UQDNSzmO9JFLAOt55ayDvRpBd7IIVMytJRTm9drOjn2RGr yPjQf/L/PcCGJvWYb6rrNP0izBrC4Wg= Received: by mail-ej1-f41.google.com with SMTP id q23so38063724ejz.3 for ; Fri, 14 Apr 2023 10:44:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1681494270; x=1684086270; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=DbN0jpcGA5XqhM/WcDWy1vf/Fqs9pIA93z/uU3WZM4I=; b=3Eyw0wQNRnyWbfClJ4tTEf5mh6i6eE4NDoRK+I5X10/m9BF764kAjogFs4DvpmA1/p kqj4XpXAcvK8KKXTl+hGgLuG3FaQkhkJjhbBXhRSKIblro+q0Z7iEyiWyX7+Gju+Uo2v +G6b9RaWJr/DEqz8VfsdIGtporhgC5hVj5H9fVyq7+FSrP0vrdRkHSKboXAry5tDAvr5 O7XD8HgfnOzus9NRVrfVS4BPP1pT1zggl7iXZtw+4RhVrAjk/7FEPrxpprla1BA8rzKw t/6pGBA+8Br+mrEAmcrfc9LzH8SMBlV/2+VRKK07vOAqEdqKjZ2o8Akv/gxHIDE54lsY gxJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681494270; x=1684086270; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DbN0jpcGA5XqhM/WcDWy1vf/Fqs9pIA93z/uU3WZM4I=; b=S3+zltzwLZA8bUG/8dDDVuuLRMAg9Dv88JCksZzVBDg+RnIgoK0qBNg8D+NSfnOoIQ hD0ptX/wgTQ4/z96HTzSqIvretr9MLquiOl8RnW/7wL+HWcbcelJiZOZ7lN7bOB+bVNp 82Q6hKQDVtcYWoXtUnNBdS5Y40e5Jjib2B9+LnVJpIQymMBTo1dzLCl8L2f3/rr9bQhD hGy1/wUDf8Ww+fCNhIFrBvZ48nG95NBjPYN4l5mGEXY09u4OxRvPRZKvMvEnFlx6wxEg d4h1+HJZofvAXRXKEQstBAdr1FldW74e54ru3FQeAH2n9UbwCVo4oyzVLRTubZQlFv8u SrrA== X-Gm-Message-State: AAQBX9frDwTQxOejzzjnhu6bQ29gW89wUbv9/VmSXk3+dpVXfkcw4c3p VaU/FAfx+38J6h0wvO0mzuGHyj9Pz0q2vZ6io3L1Jw== X-Google-Smtp-Source: AKy350aGTIVKOyJPf/0/gaGmkDnb5nRJar59+IUjNbkN+dxrLST1bm64fJmchDEhz1/n7wR8u7+NVweid9zjz/D0/g0= X-Received: by 2002:a17:906:f0cc:b0:94e:d53e:cc7c with SMTP id dk12-20020a170906f0cc00b0094ed53ecc7cmr2039026ejb.15.1681494269910; Fri, 14 Apr 2023 10:44:29 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Frank van der Linden Date: Fri, 14 Apr 2023 10:44:18 -0700 Message-ID: Subject: Re: [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files To: Charan Teja Kalla Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org, markhemm@googlemail.com, rientjes@google.com, surenb@google.com, shakeelb@google.com, quic_pkondeti@quicinc.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: A1EEC20008 X-Stat-Signature: xtzpyamhefjn17q9oybjsh8h5f15z5kb X-HE-Tag: 1681494271-527581 X-HE-Meta: U2FsdGVkX19S47kdtOfQrtCA/X0SXlIsX7sCuirH4QaM/fSBGSGldrQls26JUTOHB/IqVLQo4Q0NJCcu4UUkpVDuDD17pT1zoWdhofeBqd1YeHBSD5BthRwbMjWPkE7f5jClFrr+2iQZgZeRFA0dXXD15wqDN4CjSZFhOg5rOYwDdbLmKAlLsCBrzW2zwv0jQp1AjZ+ixyC1wVOyXwS2xtiBVgVfzvuJR/7EsPCij6vLOp05TD5uj2hhPVu4mxNuXTaAAaTKrXZugH+od7LIVS3Zm6MvouqMm4Ai/Q+tlIKLXnic9+u0cH/gTaJGKSYl+FYRjHgQSUgQqhdruAgxzghGndBTTTMkk4b/j/ds1iU3W0zZoElzVdKtMKE9L7EepxCFdl6HX9dNfWYZfNLOWbVzOs9vYVAgeP+H/r0DevvqInQREy9adNRclL7LmLUqnSeW+ERBT8RanW1wRzbjodr27HL5sIu6yf3KE8Cjdc5EMvHoGz1YUyh+lmVQHfwDYLG+vNGrM94rFCV7KLdhQxCprYTRC8hXZlBFTucLp0N58RHm1MraRkYU5Y+kmSzwz3aX5ICdgqLofvmeg1XoM0spCnnDSJZnjNp1EwrbVd42aJpEhLecS1twvF84xnqZ26iyW5yaclhN9f3TgIXNZkJbuYc9VHjNqCiDJ2kDjgFhV5rhNXdsWqzTxie8al3Tp2aKrbAKAgo6Gyx2P/gK97lnP4LmThqHBdnTjFmYjXaYWYQVeXSZMy+56VCqyOi75/EeSikPEax8RRCjq+Dq3OAlgsoWuk5BzD/+DNQZlYdYlCuTw2BHt55s5tVPaRB2YCloeP163RlLmNmVYySDWCFfmhGh1WhIpylfRHNZl744C5NOfCGYYLTm5ctERE9E+w0Km1zcDklldN2cPaPqbAlIzBOHCw7GFgHLiz/W9F76gihWlVzdFXqmno0frqMm4Bv4fe8URZHu+DOdDx9 mdl8vjR5 sijLxLxvHdPUtnecUJr1mUC2pB9o+ylzThsdUYIA+JTWzEmjLw6JUdDOLYOw6QubemgjQ/rFIvPNR/o/O0gGeVRLs2GK92xlmgb95Z9uyxM7O3wcQn7s0KRZlytDe26Pc9XQtQSMOy/6S0fePsYwBS6PlW2V+1+rKAs79hiTh2kKG05ZTY147KZADTvgKDTyRk2zdC0+sr+ZF3meqIKvAJSElzsw4x+ODWC/Ady9Jg60/T7ocBoRU8dGjcMxFEhffcNlsBHqZmm6Ih8rScH2JxOJ+FwMscFdUd+1Pn4JixYAo+xK4FsIzbAOXBqnB6SHJ7EFYIqf/n7L5HYWnxPu8qwvVa2qf34wvKgn8WbuQVHdZGVHjY2sTHa88gMn164WCXP39jwEEGBEbSJ5HJlv5iV5pqbm65A6CgLwxYyLm9DxN6wGVdVcBOFQDi6HwM30lTJOr20E9GcXsfQTmC+7RdHw4cShW5Wk6HUcyLV163ku3gXs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.001444, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Apr 13, 2023 at 12:45=E2=80=AFPM Frank van der Linden wrote: > > On Tue, Feb 14, 2023 at 4:53=E2=80=AFAM Charan Teja Kalla > wrote: > > > > This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEE= D > > advices to shmem files which can be helpful for the drivers who may wan= t > > to manage the pages of shmem files on their own, like, that are created > > through shmem_file_setup[_with_mnt](). > > > > Changes in V7: > > -- Use folio based interface, shmem_read_folio(), for FADV_WILLNEED. > > -- Don't swap the SHM_LOCK'ed pages. > > > > Changes in V6: > > -- Replaced the pages with folio's for shmem changes. > > -- https://lore.kernel.org/all/cover.1675690847.git.quic_charante@quic= inc.com/ > > > > Changes in V5: > > -- Moved the 'endbyte' calculations to a header function for use by sh= mem_fadvise(). > > -- Addressed comments from suren. > > -- No changes in resend. Retested on the latest tip. > > -- https://lore.kernel.org/all/cover.1648706231.git.quic_charante@quic= inc.com/ > > > > Changes in V4: > > -- Changed the code to use reclaim_pages() to writeout the shmem page= s to swap and then reclaim. > > -- Addressed comments from Mark Hemment and Matthew. > > -- fadvise() on shmem file may even unmap a page. > > -- https://patchwork.kernel.org/project/linux-mm/patch/1644572051-240= 91-1-git-send-email-quic_charante@quicinc.com/ > > > > Changes in V3: > > -- Considered THP pages while doing FADVISE_[DONT|WILL]NEED, identifi= ed by Matthew. > > -- xarray used properly, as identified by Matthew. > > -- Excluded mapped pages as it requires unmapping and the man pages o= f fadvise don't talk about them. > > -- RESEND: Fixed the compilation issue when CONFIG_TMPFS is not defin= ed. > > -- https://patchwork.kernel.org/project/linux-mm/patch/1641488717-138= 65-1-git-send-email-quic_charante@quicinc.com/ > > > > Changes in V2: > > -- Rearranged the code to not to sleep with rcu_lock while using xas_= () functionality. > > -- Addressed the comments from Suren. > > -- https://patchwork.kernel.org/project/linux-mm/patch/1638442253-159= 1-1-git-send-email-quic_charante@quicinc.com/ > > > > changes in V1: > > -- Created the interface for fadvise(2) to work on shmem files. > > -- https://patchwork.kernel.org/project/linux-mm/patch/1633701982-223= 02-1-git-send-email-charante@codeaurora.org/ > > > > > > Charan Teja Kalla (2): > > mm: fadvise: move 'endbyte' calculations to helper function > > mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem > > > > mm/fadvise.c | 11 +----- > > mm/internal.h | 21 +++++++++++ > > mm/shmem.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++= ++++++++ > > 3 files changed, 138 insertions(+), 10 deletions(-) > > > > -- > > 2.7.4 > > > > > > I didn't see this patch before, so I looked a bit at the history. At > some point, in v3, dealing with mapped pages for DONTNEED was left > out, they are now skipped. Unfortunately, that makes this patch no > longer usable for a case that we have: restoring the (approximate) > swap state of a tmpfs file. This involves walking a potentially large > number of regions, and explicitly pushing them out to swap. This can > be used to e.g. restore the state VM memory that is backed by a tmpfs > file, avoiding memory usage by cold VM pages after resume. > > If DONTNEED also reclaims mapped pages (e.g. they get pushed out to > swap, if any), implementing the above use case efficiently is simple: > use io_uring with a vector that contains each region and the fadvise > method. > > Without DONTNEED reclaiming mapped pages, you'd have to do mmap + > madvise(MADV_PAGEOUT) for each region that you want swapped out, which > is rather inefficient. > > I understand that the semantics for POSIX_FADV_DONTNEED on shmem/tmpfs > files are open to interpretation, as it is a special case. And you can > certainly make the argument that relying on behavior caused by what > can be considered an implementation detail is bad. > > So, is there any way we can make this use case work efficiently using > this patch? > > You state in the commit message: > > > So, FADV_DONTNEED also covers the semantics of MADV_PAGEOUT for file pa= ges > > and there is no purpose of PAGEOUT for file pages. > > But that doesn't seem correct: for shmem file pages, there actually > can be a purpose, and the FADV_DONTNEED as implemented for shmem in > this patch set does not cover the semantics. > > You can say that it doesn't need to cover the pageout case of mapped > shmem pages, and that's fair. But I don't think you can claim that it > covers the case as currently implemented. > > I suppose there are three options here: > > 1) Do nothing, this use case will just have to spend more time doing > mmap+madvise > 2) Don't skip mapped pages for POSIX_FADV_DONTNEED in shmem_fadvise > 3) Implement something like POSIX_FADV_PAGEOUT_NP, which would include > mapped pages. > > What do people think? > > - Frank Hmm, actually, looking at it a bit more, there are several issues here. One is that with fadvise, you can't be sure if you are the only one dealing with the page in a mapped way(with madvise, if mapcount =3D=3D 1, that mean's it's just you, but you don't know that for fadvise, so that makes correctly dealing with mapped pages harder). Also, the madvise loop issue can probably also be dealt with via io_uring. So, I think we can deal with the use case I mentioned in a different way, that is mostly unrelated to this patchset, so basically: disregard my previous email. - Frank