From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 047FBC636CC for ; Tue, 7 Feb 2023 22:49:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7C61A6B0093; Tue, 7 Feb 2023 17:48:59 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 74F926B0095; Tue, 7 Feb 2023 17:48:59 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5C9C86B0096; Tue, 7 Feb 2023 17:48:59 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 478206B0093 for ; Tue, 7 Feb 2023 17:48:59 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 1A3CF160912 for ; Tue, 7 Feb 2023 22:48:59 +0000 (UTC) X-FDA: 80441987598.05.02C717A Received: from mail-yw1-f182.google.com (mail-yw1-f182.google.com [209.85.128.182]) by imf15.hostedemail.com (Postfix) with ESMTP id 50FCAA0010 for ; Tue, 7 Feb 2023 22:48:57 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="GI/ToeaW"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf15.hostedemail.com: domain of surenb@google.com designates 209.85.128.182 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675810137; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dFjw02IGyYSdDqsEhGgv/e8XgYEcv3UpXCjNfFEdXS0=; b=KDYVEROSv3wolKY2QeA3l7HwjXse34hTHf9b3szYuwFQVItJhJ6tg/ZbAbW+qtLDN0uBfy dMHxCQd/wJ5D9IWclkYhuSXQGgk9TtQ1KBUtlGeBZey8+rfyRlxwj0+JeRMQDH/0wdLdGc 8arDUW55vmJHQSmelfBJFbplWkBJkLs= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="GI/ToeaW"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf15.hostedemail.com: domain of surenb@google.com designates 209.85.128.182 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675810137; a=rsa-sha256; cv=none; b=K8x+C/GE2jyb6jQMqzcYn5hFa6B79a7eEDYEIWVf1pbAFwA08SyQYUGyDri9MP3YCH9Pzx AlgOPLk6BPYoghxIbTU+ELISO8zD7hZfgFahMjwnSIsPMGB4XHuAiK8hT0eFM/RDs8PN/T okugAOqjgewHkM0EtXtJmtZA5+pRuGA= Received: by mail-yw1-f182.google.com with SMTP id 00721157ae682-51ba4b1b9feso213568027b3.11 for ; Tue, 07 Feb 2023 14:48:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=dFjw02IGyYSdDqsEhGgv/e8XgYEcv3UpXCjNfFEdXS0=; b=GI/ToeaWmvHz6ucktkDFQkkqRo3i5y+Gl6saKyeYIQ/pC5tRn0XVLeambEGioJVxWN 74/vWrgZbak2g8mxqJ3b39fwMzpg8AXkOqsq0WBQ40wMx9RryCeR+M/ypiDwVAAt1Dov Pj3zDXQaWYwOzDzCALo7WSz5sRGFCq5NLOUyJh0BlIp558muUp/xxKIAHtnsoSNY2PcO ZzwCK0NoOP0+hbDNcUgCUFykO8tmPMGM6bEfhbkiWhw0x+AgJVSljuKKbbq0IR0XFfcy zSa1QCyz44N0mu3+I2H6VeRYq27qmbA44vPNwAsAp/el994/bY0UeNt40FwWQ200DCHA qKMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=dFjw02IGyYSdDqsEhGgv/e8XgYEcv3UpXCjNfFEdXS0=; b=ucOJukp7TEtzVsGHsY7T0ITHcAQZqM4pyqdt7CObjCLa9i5Q7V05ahhH8ny0d8Ki/i 32YXLePDSkwsHuas8yGYH8Odo4ow6Ny7qxHyIcuHKSKemI7Ep1jpIujU5AcvQZp9UJD8 DyuZ5jKQXi1ulmLUplPx1LOP5FPhfYqv0QoEgSmb2OmTS/rnelCfARbN1YrPmIvZ4ZDD z+6gv5opXfzD5R2gjfFbPehRicNxTEwtqdBdOmcTYe32FNwy23u0rM0jywy8STfzqNyt ZFWw9QO7If5xHJrgXJLUtmbCUE/BxNAYqXgcTgvjnOjI4PGNmUiCvj5KyRiC8TJHRvkn lhHg== X-Gm-Message-State: AO0yUKVOWeU9vaxtP7hdrQxiO7EUETw9cNqTws7YDInlwweRIy0VzaJ2 YFyPdURfVM+iBAXqetIfItPFDiZNy+ZMYo2tNaR1HA== X-Google-Smtp-Source: AK7set/3u4YHNlC+/83bA4p+sLvvz8D/RgSPVT+/yvkr6U5mkJh85s9pupfui0U5sJPjLsQU9F32UlhNbSMsZOMimyQ= X-Received: by 2002:a0d:d7cf:0:b0:50f:a78f:8c2 with SMTP id z198-20020a0dd7cf000000b0050fa78f08c2mr511373ywd.438.1675810136127; Tue, 07 Feb 2023 14:48:56 -0800 (PST) MIME-Version: 1.0 References: <08e04b5d2fc7a2300a52fb7fff1bc6316a53927c.1675690847.git.quic_charante@quicinc.com> In-Reply-To: <08e04b5d2fc7a2300a52fb7fff1bc6316a53927c.1675690847.git.quic_charante@quicinc.com> From: Suren Baghdasaryan Date: Tue, 7 Feb 2023 14:48:44 -0800 Message-ID: Subject: Re: [PATCH V6 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem To: Charan Teja Kalla Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org, markhemm@googlemail.com, rientjes@google.com, shakeelb@google.com, mhocko@suse.com, vbabka@suse.cz, quic_pkondeti@quicinc.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 50FCAA0010 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: wxc55a6cwe38porufrc5rj5zzgndwgz9 X-HE-Tag: 1675810137-945312 X-HE-Meta: U2FsdGVkX1/27e70hL5pNaPubLmBqFwQSCcIc/bI4G+v7YdXrvyHEL/ZGiDZnOJnTPjVts1yt8ra1u5revv6o4NXEPXrD9r0gCflUKLLPV/ps3EYLiHQepK3OB8RMO60LEA7SoenpLMpaPOTMAhuS58zL7GNOKaS6JJUKFGlLRmRoUma4gBExl1A++28eXTw6p7w+R/fdUTbRCfoBcFb3BM5O4hGpBniHWTKQEWzWYdevoZ4i2kHwiaW89A9JVOlz8Z8OhWOGo1ufj+goG1qGXCI3O8aJogHhdEw+ZYUWMhpxPkCwgIUctyDZ35s9auK/EJnQVZ/RYYiyCIz9XsAoRkINwWKB/W1DUctK+9Fa4lZHB8oIPVT3fQBQaHeW9A4tN0Xrvf0Zj4XAk/gOs2rtqfFGmqfQOdk2xsuNfiCNEBQ7IMwWYEyHmklXhAUr1hpQuKQb+FZUQX2Yun7Ul6WhfnN5sjvBIZIH1KsPVzPzfzA0WwhVB0iyP9AjcafdJKuVXj1R4cAueEK0koFVZwooXxsMCG8dyFiOdp3GRq696i/xskZU0EAzhg2/IJyy+uR4ZRVO9iujkvbxRAsjX8lSoPCilOgqu5B0sVDJz3RH9mKgL9SKYv6F9qN60Kdh4aoE+L74sd+y7YUkjuw8yVNltgzJFkpUwsMoe62Btbuta5DaYS+d/9KQ33KY1uEWydzMJvF/g9izJfVEUJwRbPVZXaG2Ka5szBifRe7MMhw0YnYUDmgI/fPmKdINABdTTZhzboJxbydfWAWGtiB8LGWj5pnI0EI06ClThsQHRRf3fRyBqjsjMNT3yboJmNVjhggrmR4eoBZw//k0rUWWOOAw6CDo1CAOaqufWclrr1XzilJrMqBGlmzC/ycOghjTdvSLBgwaQ1ftqiaRH3y1SYTjZvsfOJn+DS8JMlIrKj3TGXvGBefZysL15l2CQvqTf51EHUSuHKuD+6K5uMcot7 wch42hqG lbWZySviFTOEuUSOHa8SbTBYpOUtfffCZ5jHsBK0JAOS3mH/3mC4vyKshl2tQcgSIEV3BvcB9FWSIdvB8cR92BciLLq9Zi2SUd7w0hnk9tzoZbs8Q4l2ogjL33w4AJXXm8i+xD6VZqfDiInAaWEPT8HeBwkdwsjLZcbpgwPgqV3xpx3USG991lQAwWwp3D2T7Y+YYJJh8ZXpKn9F3at7shY9L4BzarMsD1GQTFJWZSvLcgB8I9Zv5iTzOxHZ9UXEfCwT7rl4IMgFda0dQpfdVlZFz306hVcfw8/Cq0H/DJGUWzweQ2CBdBc5ME+s/sRhbStN8IB68CuaBKoILwmt7F5YVRVeFXJ1NuVRDPOhhXOG5A8y3mvYhtJAaTXydccKvV9ytuHl4qrTAPEqIBlzK7cDjin9npMb/dI4Ud77zY/0XSXsYiD0jMPM6k/im7F1dg0JD+LAZZZ8D74Y0ERKguazyAXnEtGywHBAvtkd6vyizRzIDyi77vIMEU4QfDGjqAboSag/XMQDLHcG/K8Z4cpCQUARI7hIwb/eWiNm/zXeo7tR+FIba59jMn2OWm5jKxMmDqCIIObIGvWHUkVIoil0gFNNcFCnRYH6pF7famG4zsSM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Charan, On Mon, Feb 6, 2023 at 6:22 AM Charan Teja Kalla wrote: > > Currently fadvise(2) is supported only for the files that doesn't > associated with noop_backing_dev_info thus for the files, like shmem, > fadvise results into NOP. But then there is file_operations->fadvise() > that lets the file systems to implement their own fadvise > implementation. Use this support to implement some of the POSIX_FADV_XXX > functionality for shmem files. > > This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED > advices to shmem files which can be helpful for the clients who may want > to manage the shmem pages of the files that are created through > shmem_file_setup[_with_mnt](). One usecase is implemented on the > Snapdragon SoC's running Android where the graphics client is allocating > lot of shmem pages per process and pinning them. When this process is > put to background, the instantaneous reclaim is performed on those shmem > pages using the logic implemented downstream[3][4]. With this patch, the Thanks for upstreaming this feature! > client can now issue the fadvise calls on the shmem files that does the > instantaneous reclaim which can aid the use cases like mentioned above. > > This usecase lead to ~2% reduction in average launch latencies of the > apps and 10% in total number of kills by the low memory killer running > on Android. > > Some questions asked while reviewing this patch: > Q) Can the same thing be achieved with FD mapped to user and use > madvise? > A) All drivers are not mapping all the shmem fd's to user space and want > to manage them with in the kernel. Ex: shmem memory can be mapped to the > other subsystems and they fill in the data and then give it to other > subsystem for further processing, where, the user mapping is not at all > required. A simple example, memory that is given for gpu subsystem > which can be filled directly and give to display subsystem. And the > respective drivers know well about when to keep that memory in ram or > swap based on may be a user activity. > > Q) Should we add the documentation section in Manual pages? > A) The man[1] pages for the fadvise() whatever says is also applicable > for shmem files. so couldn't feel it correct to add specific to shmem > files separately. > > Q) The proposed semantics of POSIX_FADV_DONTNEED is actually similar to > MADV_PAGEOUT and different from MADV_DONTNEED. This is a user facing API > and this difference will cause confusion? > A) man pages [2] says that "POSIX_FADV_DONTNEED attempts to free cached > pages associated with the specified region." This means on issuing this > FADV, it is expected to free the file cache pages. And it is > implementation defined If the dirty pages may be attempted to writeback. > And the unwritten dirty pages will not be freed. So, FADV_DONTNEED also > covers the semantics of MADV_PAGEOUT for file pages and there is no > purpose of PAGEOUT for file pages. > > [1] https://linux.die.net/man/2/fadvise > [2] https://man7.org/linux/man-pages/man2/posix_fadvise.2.html > [3] https://git.codelinaro.org/clo/la/platform/vendor/qcom/opensource/graphics-kernel/-/blob/gfx-kernel.lnx.1.0.r3-rel/kgsl_reclaim.c#L289 > [4] https://android.googlesource.com/kernel/common/+/refs/heads/android12-5.10/mm/shmem.c#4310 > > Signed-off-by: Charan Teja Kalla > --- > mm/shmem.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 115 insertions(+) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 0005ab2..58aa3d7 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -39,6 +39,9 @@ > #include > #include > #include > +#include > +#include > +#include > #include "swap.h" > > static struct vfsmount *shm_mnt; > @@ -2327,6 +2330,117 @@ static void shmem_set_inode_flags(struct inode *inode, unsigned int fsflags) > #define shmem_initxattrs NULL > #endif > > +static void shmem_isolate_pages_range(struct address_space *mapping, loff_t start, > + loff_t end, struct list_head *list) > +{ > + XA_STATE(xas, &mapping->i_pages, start); > + struct folio *folio; > + > + rcu_read_lock(); > + xas_for_each(&xas, folio, end) { > + if (xas_retry(&xas, folio)) > + continue; > + if (xa_is_value(folio)) > + continue; > + > + if (!folio_try_get(folio)) > + continue; > + if (folio_test_unevictable(folio) || folio_mapped(folio) || > + folio_isolate_lru(folio)) { > + folio_put(folio); > + continue; > + } > + folio_put(folio); > + > + /* > + * Prepare the page to be passed to the reclaim_pages(). > + * VM couldn't reclaim the page unless we clear PG_young. nit: Since you operate on folios now, should you update this comment as well? > + */ > + folio_clear_referenced(folio); > + folio_test_clear_young(folio); > + list_add(&folio->lru, list); > + if (need_resched()) { > + xas_pause(&xas); > + cond_resched_rcu(); > + } > + } > + rcu_read_unlock(); > +} > + > +static int shmem_fadvise_dontneed(struct address_space *mapping, loff_t start, > + loff_t end) > +{ > + LIST_HEAD(folio_list); > + > + if (!total_swap_pages) > + return 0; > + > + lru_add_drain(); > + shmem_isolate_pages_range(mapping, start, end, &folio_list); > + reclaim_pages(&folio_list); > + > + return 0; > +} > + > +static int shmem_fadvise_willneed(struct address_space *mapping, > + pgoff_t start, pgoff_t long end) > +{ > + struct page *page; > + pgoff_t index; > + > + xa_for_each_range(&mapping->i_pages, index, page, start, end) { > + if (!xa_is_value(page)) > + continue; > + page = shmem_read_mapping_page(mapping, index); > + if (!IS_ERR(page)) > + put_page(page); > + } > + > + return 0; > +} > + > +static int shmem_fadvise(struct file *file, loff_t offset, loff_t len, int advice) > +{ > + loff_t endbyte; > + pgoff_t start_index; > + pgoff_t end_index; > + struct address_space *mapping; > + struct inode *inode = file_inode(file); > + int ret = 0; > + > + if (S_ISFIFO(inode->i_mode)) > + return -ESPIPE; > + > + mapping = file->f_mapping; > + if (!mapping || len < 0 || !shmem_mapping(mapping)) > + return -EINVAL; > + > + endbyte = fadvise_calc_endbyte(offset, len); > + > + start_index = offset >> PAGE_SHIFT; > + end_index = endbyte >> PAGE_SHIFT; > + switch (advice) { > + case POSIX_FADV_DONTNEED: Should (SHMEM_I(inode)->flags & VM_LOCKED) be checked here too? > + ret = shmem_fadvise_dontneed(mapping, start_index, end_index); > + break; > + case POSIX_FADV_WILLNEED: > + ret = shmem_fadvise_willneed(mapping, start_index, end_index); > + break; > + case POSIX_FADV_NORMAL: > + case POSIX_FADV_RANDOM: > + case POSIX_FADV_SEQUENTIAL: > + case POSIX_FADV_NOREUSE: > + /* > + * No bad return value, but ignore advice. > + */ > + break; > + default: > + return -EINVAL; > + } > + > + return ret; > +} > + > static struct inode *shmem_get_inode(struct super_block *sb, struct inode *dir, > umode_t mode, dev_t dev, unsigned long flags) > { > @@ -3933,6 +4047,7 @@ static const struct file_operations shmem_file_operations = { > .splice_write = iter_file_splice_write, > .fallocate = shmem_fallocate, > #endif > + .fadvise = shmem_fadvise, > }; > > static const struct inode_operations shmem_inode_operations = { > -- > 2.7.4 >