From: Frank van der Linden
Date: Thu, 13 Apr 2023 12:45:24 -0700
Subject: Re: [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files
To: Charan Teja Kalla
Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org, markhemm@googlemail.com, rientjes@google.com, surenb@google.com, shakeelb@google.com, quic_pkondeti@quicinc.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, Feb 14, 2023 at 4:53 AM Charan Teja Kalla wrote:
>
> This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
> advices to shmem files which can be helpful for the drivers who may want
> to manage the pages of shmem files on their own, like, that are created
> through shmem_file_setup[_with_mnt]().
>
> Changes in V7:
>  -- Use folio based interface, shmem_read_folio(), for FADV_WILLNEED.
>  -- Don't swap the SHM_LOCK'ed pages.
>
> Changes in V6:
>  -- Replaced the pages with folio's for shmem changes.
>  -- https://lore.kernel.org/all/cover.1675690847.git.quic_charante@quicinc.com/
>
> Changes in V5:
>  -- Moved the 'endbyte' calculations to a header function for use by shmem_fadvise().
>  -- Addressed comments from suren.
>  -- No changes in resend. Retested on the latest tip.
>  -- https://lore.kernel.org/all/cover.1648706231.git.quic_charante@quicinc.com/
>
> Changes in V4:
>  -- Changed the code to use reclaim_pages() to writeout the shmem pages to swap and then reclaim.
>  -- Addressed comments from Mark Hemment and Matthew.
>  -- fadvise() on shmem file may even unmap a page.
>  -- https://patchwork.kernel.org/project/linux-mm/patch/1644572051-24091-1-git-send-email-quic_charante@quicinc.com/
>
> Changes in V3:
>  -- Considered THP pages while doing FADVISE_[DONT|WILL]NEED, identified by Matthew.
>  -- xarray used properly, as identified by Matthew.
>  -- Excluded mapped pages as it requires unmapping and the man pages of fadvise don't talk about them.
>  -- RESEND: Fixed the compilation issue when CONFIG_TMPFS is not defined.
>  -- https://patchwork.kernel.org/project/linux-mm/patch/1641488717-13865-1-git-send-email-quic_charante@quicinc.com/
>
> Changes in V2:
>  -- Rearranged the code to not to sleep with rcu_lock while using xas_() functionality.
>  -- Addressed the comments from Suren.
>  -- https://patchwork.kernel.org/project/linux-mm/patch/1638442253-1591-1-git-send-email-quic_charante@quicinc.com/
>
> changes in V1:
>  -- Created the interface for fadvise(2) to work on shmem files.
>  -- https://patchwork.kernel.org/project/linux-mm/patch/1633701982-22302-1-git-send-email-charante@codeaurora.org/
>
>
> Charan Teja Kalla (2):
>   mm: fadvise: move 'endbyte' calculations to helper function
>   mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem
>
>  mm/fadvise.c  |  11 +-----
>  mm/internal.h |  21 +++++++++++
>  mm/shmem.c    | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 138 insertions(+), 10 deletions(-)
>
> --
> 2.7.4
>
>

I didn't see this patch before, so I looked a bit at the history. At
some point, in v3, dealing with mapped pages for DONTNEED was left
out; they are now skipped. Unfortunately, that makes this patch no
longer usable for a case that we have: restoring the (approximate)
swap state of a tmpfs file. This involves walking a potentially large
number of regions, and explicitly pushing them out to swap. This can
be used to e.g. restore the state of VM memory that is backed by a
tmpfs file, avoiding memory usage by cold VM pages after resume.

If DONTNEED also reclaims mapped pages (e.g. they get pushed out to
swap, if there is any), implementing the above use case efficiently is
simple: use io_uring with a vector that contains each region and the
fadvise method.

Without DONTNEED reclaiming mapped pages, you'd have to do mmap +
madvise(MADV_PAGEOUT) for each region that you want swapped out, which
is rather inefficient.

I understand that the semantics for POSIX_FADV_DONTNEED on shmem/tmpfs
files are open to interpretation, as it is a special case. And you can
certainly make the argument that relying on behavior caused by what
can be considered an implementation detail is bad.

So, is there any way we can make this use case work efficiently using
this patch?

You state in the commit message:

> So, FADV_DONTNEED also covers the semantics of MADV_PAGEOUT for file pages
> and there is no purpose of PAGEOUT for file pages.

But that doesn't seem correct: for shmem file pages, there actually
can be a purpose, and the FADV_DONTNEED as implemented for shmem in
this patch set does not cover the semantics.

You can say that it doesn't need to cover the pageout case of mapped
shmem pages, and that's fair. But I don't think you can claim that it
covers the case as currently implemented.

I suppose there are three options here:

1) Do nothing, this use case will just have to spend more time doing
mmap+madvise
2) Don't skip mapped pages for POSIX_FADV_DONTNEED in shmem_fadvise
3) Implement something like POSIX_FADV_PAGEOUT_NP, which would include
mapped pages.

What do people think?

- Frank
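
For concreteness, here is a minimal userspace sketch of the per-region advice calls that option 2 would rely on. It is an illustration only: `struct region` and `advise_regions_dontneed()` are hypothetical names, and whether mapped shmem pages actually get pushed out to swap depends on how shmem_fadvise ends up handling them. With option 3, the advice value would be the proposed POSIX_FADV_PAGEOUT_NP, which does not exist today.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical descriptor for one byte range of the tmpfs file. */
struct region {
    off_t offset;
    off_t length;
};

/*
 * Issue POSIX_FADV_DONTNEED for each region in turn: one system call
 * per region, with no mapping set up or torn down.
 *
 * Returns 0 on success, or the first error number reported by
 * posix_fadvise(), which returns the error instead of setting errno.
 */
int advise_regions_dontneed(int fd, const struct region *regions, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int err = posix_fadvise(fd, regions[i].offset, regions[i].length,
                                POSIX_FADV_DONTNEED);
        if (err != 0)
            return err;
    }
    return 0;
}
```

The loop is synchronous to keep the sketch short. With io_uring, the same per-region requests could instead be queued as IORING_OP_FADVISE submissions and handed to the kernel in one batch.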