On Wed, Jun 28, 2023 at 11:48:52AM +0100, David Howells wrote: > Fscache has an optimisation by which reads from the cache are skipped until > we know that (a) there's data there to be read and (b) that data isn't > entirely covered by pages resident in the netfs pagecache. This is done > with two flags manipulated by fscache_note_page_release(): > > if (... > test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) && > test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) > clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); > > where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate > that netfslib should download from the server or clear the page instead. > > The fscache_note_page_release() function is intended to be called from > ->releasepage() - but that only gets called if PG_private or PG_private_2 > is set - and currently the former is at the discretion of the network > filesystem and the latter is only set whilst a page is being written to the > cache, so sometimes we miss clearing the optimisation. > > Fix this by following Willy's suggestion[1] and adding an address_space > flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call > ->release_folio() if it's set, even if PG_private or PG_private_2 aren't > set. > > Note that this would require folio_test_private() and page_has_private() to > become more complicated. To avoid that, in the places[*] where these are > used to conditionalise calls to filemap_release_folio() and > try_to_release_page(), the tests are removed the those functions just > jumped to unconditionally and the test is performed there. > > [*] There are some exceptions in vmscan.c where the check guards more than > just a call to the releaser. I've added a function, folio_needs_release() > to wrap all the checks for that. > > AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from > fscache and cleared in ->evict_inode() before truncate_inode_pages_final() > is called. > > Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared > and the optimisation cancelled if a cachefiles object already contains data > when we open it. > > Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page") > Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines") > Reported-by: Rohith Surabattula > Suggested-by: Matthew Wilcox > Signed-off-by: David Howells Hi David, I was bisecting a use-after-free BUG on the latest mm-unstable, where HEAD is 347e208de0e4 ("rmap: pass the folio to __page_check_anon_rmap()"). According to my bisection, this is the first bad commit. Use-After-Free is triggered on reclamation path when swap is enabled. (and couldn't trigger without swap enabled) the config, KASAN splat, bisect log are attached. hope this isn't too late :( > cc: Matthew Wilcox > cc: Linus Torvalds > cc: Steve French > cc: Shyam Prasad N > cc: Rohith Surabattula > cc: Dave Wysochanski > cc: Dominique Martinet > cc: Ilya Dryomov > cc: linux-cachefs@redhat.com > cc: linux-cifs@vger.kernel.org > cc: linux-afs@lists.infradead.org > cc: v9fs-developer@lists.sourceforge.net > cc: ceph-devel@vger.kernel.org > cc: linux-nfs@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > cc: linux-mm@kvack.org > --- > > Notes: > ver #7) > - Make NFS set AS_RELEASE_ALWAYS. > > ver #4) > - Split out merging of folio_has_private()/filemap_release_folio() call > pairs into a preceding patch. > - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode(). > > ver #3) > - Fixed mapping_clear_release_always() to use clear_bit() not set_bit(). > - Moved a '&&' to the correct line. > > ver #2) > - Rewrote entirely according to Willy's suggestion[1]. > > fs/9p/cache.c | 2 ++ > fs/afs/internal.h | 2 ++ > fs/cachefiles/namei.c | 2 ++ > fs/ceph/cache.c | 2 ++ > fs/nfs/fscache.c | 3 +++ > fs/smb/client/fscache.c | 2 ++ > include/linux/pagemap.h | 16 ++++++++++++++++ > mm/internal.h | 5 ++++- > 8 files changed, 33 insertions(+), 1 deletion(-) > > diff --git a/fs/9p/cache.c b/fs/9p/cache.c > index cebba4eaa0b5..12c0ae29f185 100644 > --- a/fs/9p/cache.c > +++ b/fs/9p/cache.c > @@ -68,6 +68,8 @@ void v9fs_cache_inode_get_cookie(struct inode *inode) > &path, sizeof(path), > &version, sizeof(version), > i_size_read(&v9inode->netfs.inode)); > + if (v9inode->netfs.cache) > + mapping_set_release_always(inode->i_mapping); > > p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n", > inode, v9fs_inode_cookie(v9inode)); > diff --git a/fs/afs/internal.h b/fs/afs/internal.h > index 9d3d64921106..da73b97e19a9 100644 > --- a/fs/afs/internal.h > +++ b/fs/afs/internal.h > @@ -681,6 +681,8 @@ static inline void afs_vnode_set_cache(struct afs_vnode *vnode, > { > #ifdef CONFIG_AFS_FSCACHE > vnode->netfs.cache = cookie; > + if (cookie) > + mapping_set_release_always(vnode->netfs.inode.i_mapping); > #endif > } > > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c > index d9d22d0ec38a..7bf7a5fcc045 100644 > --- a/fs/cachefiles/namei.c > +++ b/fs/cachefiles/namei.c > @@ -585,6 +585,8 @@ static bool cachefiles_open_file(struct cachefiles_object *object, > if (ret < 0) > goto check_failed; > > + clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &object->cookie->flags); > + > object->file = file; > > /* Always update the atime on an object we've just looked up (this is > diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c > index 177d8e8d73fe..de1dee46d3df 100644 > --- a/fs/ceph/cache.c > +++ b/fs/ceph/cache.c > @@ -36,6 +36,8 @@ void ceph_fscache_register_inode_cookie(struct inode *inode) > &ci->i_vino, sizeof(ci->i_vino), > &ci->i_version, sizeof(ci->i_version), > i_size_read(inode)); > + if (ci->netfs.cache) > + mapping_set_release_always(inode->i_mapping); > } > > void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info *ci) > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c > index 8c35d88a84b1..b05717fe0d4e 100644 > --- a/fs/nfs/fscache.c > +++ b/fs/nfs/fscache.c > @@ -180,6 +180,9 @@ void nfs_fscache_init_inode(struct inode *inode) > &auxdata, /* aux_data */ > sizeof(auxdata), > i_size_read(inode)); > + > + if (netfs_inode(inode)->cache) > + mapping_set_release_always(inode->i_mapping); > } > > /* > diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c > index 8f6909d633da..3677525ee993 100644 > --- a/fs/smb/client/fscache.c > +++ b/fs/smb/client/fscache.c > @@ -108,6 +108,8 @@ void cifs_fscache_get_inode_cookie(struct inode *inode) > &cifsi->uniqueid, sizeof(cifsi->uniqueid), > &cd, sizeof(cd), > i_size_read(&cifsi->netfs.inode)); > + if (cifsi->netfs.cache) > + mapping_set_release_always(inode->i_mapping); > } > > void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index a56308a9d1a4..a1176ceb4a0c 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -199,6 +199,7 @@ enum mapping_flags { > /* writeback related tags are not used */ > AS_NO_WRITEBACK_TAGS = 5, > AS_LARGE_FOLIO_SUPPORT = 6, > + AS_RELEASE_ALWAYS, /* Call ->release_folio(), even if no private data */ > }; > > /** > @@ -269,6 +270,21 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping) > return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags); > } > > +static inline bool mapping_release_always(const struct address_space *mapping) > +{ > + return test_bit(AS_RELEASE_ALWAYS, &mapping->flags); > +} > + > +static inline void mapping_set_release_always(struct address_space *mapping) > +{ > + set_bit(AS_RELEASE_ALWAYS, &mapping->flags); > +} > + > +static inline void mapping_clear_release_always(struct address_space *mapping) > +{ > + clear_bit(AS_RELEASE_ALWAYS, &mapping->flags); > +} > + > static inline gfp_t mapping_gfp_mask(struct address_space * mapping) > { > return mapping->gfp_mask; > diff --git a/mm/internal.h b/mm/internal.h > index a76314764d8c..86aef26df905 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -175,7 +175,10 @@ static inline void set_page_refcounted(struct page *page) > */ > static inline bool folio_needs_release(struct folio *folio) > { > - return folio_has_private(folio); > + struct address_space *mapping = folio->mapping; > + > + return folio_has_private(folio) || > + (mapping && mapping_release_always(mapping)); > } > > extern unsigned long highest_memmap_pfn; >