From: Ming Lei
To: Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Ming Lei, Mike Snitzer, Don Dutile, Raghavendra K T
Subject: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
Date: Sun, 28 Jan 2024 22:25:22 +0800
Message-ID: <20240128142522.1524741-1-ming.lei@redhat.com>

Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead max_pages"), MADV_WILLNEED and
POSIX_FADV_WILLNEED only try to read ahead 512 pages, and the remaining
part of the advised range falls back to normal readahead.

If bdi->ra_pages is set small, normal readahead is not efficient enough
for the rest of the advised range. Increasing readahead may not be an
option, since the workload may mix random and sequential I/O.

Improve this situation by maintaining a maple tree of willneed ranges:
if a read falls in any willneed range, read ahead aggressively, as we
did before commit 6d2be915e589. Inside a willneed range, bdi->io_pages
is used in place of ra->ra_pages as the readahead size; ranges smaller
than 4 * ra->ra_pages are not tracked.

Cc: Mike Snitzer
Cc: Don Dutile
Cc: Raghavendra K T
Signed-off-by: Ming Lei
---
 fs/file_table.c    | 10 ++++++++++
 include/linux/fs.h | 15 +++++++++++++++
 mm/filemap.c       |  5 ++++-
 mm/internal.h      |  7 ++++++-
 mm/readahead.c     | 32 +++++++++++++++++++++++++++++++-
 5 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index b991f90571b4..bb0303683305 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -61,8 +61,18 @@ struct path *backing_file_user_path(struct file *f)
 }
 EXPORT_SYMBOL_GPL(backing_file_user_path);
 
+static inline void file_ra_free(struct file_ra_state *ra)
+{
+	if (ra->need_mt) {
+		mtree_destroy(ra->need_mt);
+		kfree(ra->need_mt);
+		ra->need_mt = NULL;
+	}
+}
+
 static inline void file_free(struct file *f)
 {
+	file_ra_free(&f->f_ra);
 	security_file_free(f);
 	if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
 		percpu_counter_dec(&nr_files);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed5966a70495..bdbd16990072 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -961,6 +962,7 @@ struct fown_struct {
  * @ra_pages: Maximum size of a readahead request, copied from the bdi.
  * @mmap_miss: How many mmap accesses missed in the page cache.
  * @prev_pos: The last byte in the most recent read request.
+ * @need_mt: maple tree for tracking WILL_NEED ranges
  *
  * When this structure is passed to ->readahead(), the "most recent"
  * readahead means the current readahead.
@@ -972,6 +974,7 @@ struct file_ra_state {
 	unsigned int ra_pages;
 	unsigned int mmap_miss;
 	loff_t prev_pos;
+	struct maple_tree *need_mt;
 };
 
 /*
@@ -983,6 +986,18 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index)
 		index < ra->start + ra->size);
 }
 
+/*
+ * Check if @index falls in the madvise/fadvise WILLNEED window.
+ */
+static inline bool ra_index_in_need_range(struct file_ra_state *ra,
+					  pgoff_t index)
+{
+	if (ra->need_mt)
+		return mtree_load(ra->need_mt, index) != NULL;
+
+	return false;
+}
+
 /*
  * f_{lock,count,pos_lock} members can be highly contended and share
  * the same cacheline. f_{lock,mode} are very frequently used together
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db..0ffe63d58421 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3147,7 +3147,10 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	 */
 	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 	ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2);
-	ra->size = ra->ra_pages;
+	if (ra_index_in_need_range(ra, vmf->pgoff))
+		ra->size = inode_to_bdi(mapping->host)->io_pages;
+	else
+		ra->size = ra->ra_pages;
 	ra->async_size = ra->ra_pages / 4;
 	ractl._index = ra->start;
 	page_cache_ra_order(&ractl, ra, 0);
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..17bd970ff23c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -120,13 +120,18 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
+void file_ra_add_need_range(struct file_ra_state *ra, pgoff_t start,
+			    pgoff_t end);
 void page_cache_ra_order(struct readahead_control *, struct file_ra_state *,
 		unsigned int order);
 void force_page_cache_ra(struct readahead_control *, unsigned long nr);
 static inline void force_page_cache_readahead(struct address_space *mapping,
 		struct file *file, pgoff_t index, unsigned long nr_to_read)
 {
-	DEFINE_READAHEAD(ractl, file, &file->f_ra, mapping, index);
+	struct file_ra_state *ra = &file->f_ra;
+	DEFINE_READAHEAD(ractl, file, ra, mapping, index);
+
+	file_ra_add_need_range(ra, index, index + nr_to_read - 1);
 	force_page_cache_ra(&ractl, nr_to_read);
 }
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 23620c57c122..0882ceecf9ff 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -140,9 +140,38 @@ file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
 	ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
 	ra->prev_pos = -1;
+	ra->need_mt = NULL;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
 
+static void file_ra_setup_need_mt(struct file_ra_state *ra)
+{
+	struct maple_tree *mt = kzalloc(sizeof(*mt), GFP_KERNEL);
+
+	if (!mt)
+		return;
+
+	mt_init(mt);
+	if (cmpxchg(&ra->need_mt, NULL, mt) != NULL)
+		kfree(mt);
+}
+
+/* Record a willneed range hint to speed up readahead */
+void file_ra_add_need_range(struct file_ra_state *ra, pgoff_t start,
+			    pgoff_t end)
+{
+	/* ignore small willneed range; @end is inclusive */
+	if (end - start + 1 < 4 * ra->ra_pages)
+		return;
+
+	if (!ra->need_mt)
+		file_ra_setup_need_mt(ra);
+
+	if (ra->need_mt)
+		mtree_insert_range(ra->need_mt, start, end, (void *)1,
+				   GFP_KERNEL);
+}
+
 static void read_pages(struct readahead_control *rac)
 {
 	const struct address_space_operations *aops = rac->mapping->a_ops;
@@ -552,9 +581,10 @@ static void ondemand_readahead(struct readahead_control *ractl,
 {
 	struct backing_dev_info *bdi = inode_to_bdi(ractl->mapping->host);
 	struct file_ra_state *ra = ractl->ra;
-	unsigned long max_pages = ra->ra_pages;
 	unsigned long add_pages;
 	pgoff_t index = readahead_index(ractl);
+	unsigned long max_pages = ra_index_in_need_range(ra, index) ?
+		bdi->io_pages : ra->ra_pages;
 	pgoff_t expected, prev_index;
 	unsigned int order = folio ? folio_order(folio) : 0;
 
-- 
2.41.0
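The range-tracking decision described in the commit message can be modeled outside the kernel. Below is a minimal userspace sketch, assuming a small fixed array in place of the per-file maple tree. The names `need_ranges`, `need_ranges_add()`, `need_ranges_max_pages()` and `MAX_NEED_RANGES` are hypothetical and are not part of the patch or of any kernel API. The sketch mirrors three behaviours of the patch. It ignores requests shorter than 4 * ra_pages. It rejects overlapping ranges, as `mtree_insert_range()` does by returning -EEXIST when part of the range is already occupied. It chooses io_pages over ra_pages for an index inside a tracked range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code: a fixed array of
 * non-overlapping page ranges stands in for the per-file maple tree.
 * Ranges are stored as inclusive [first, last], like maple tree ranges.
 */
#define MAX_NEED_RANGES 16

struct need_ranges {
	unsigned long first[MAX_NEED_RANGES];
	unsigned long last[MAX_NEED_RANGES];
	size_t nr;
};

/* True if [first, last] intersects any tracked range. */
static bool need_ranges_overlap(const struct need_ranges *r,
				unsigned long first, unsigned long last)
{
	for (size_t i = 0; i < r->nr; i++)
		if (first <= r->last[i] && r->first[i] <= last)
			return true;
	return false;
}

/*
 * Models the WILLNEED path: requests shorter than 4 * ra_pages are
 * ignored, and a request overlapping a tracked range is rejected.
 * Returns 1 if tracked, 0 if ignored as too small, -1 if rejected.
 */
static int need_ranges_add(struct need_ranges *r, unsigned long ra_pages,
			   unsigned long index, unsigned long nr_to_read)
{
	unsigned long last;

	assert(nr_to_read > 0);	/* the model assumes a non-empty request */
	last = index + nr_to_read - 1;	/* inclusive end page */

	if (nr_to_read < 4 * ra_pages)
		return 0;
	if (r->nr == MAX_NEED_RANGES || need_ranges_overlap(r, index, last))
		return -1;
	r->first[r->nr] = index;
	r->last[r->nr] = last;
	r->nr++;
	return 1;
}

/* Models the max_pages choice: io_pages inside a tracked range, else ra_pages. */
static unsigned long need_ranges_max_pages(const struct need_ranges *r,
					   unsigned long index,
					   unsigned long ra_pages,
					   unsigned long io_pages)
{
	return need_ranges_overlap(r, index, index) ? io_pages : ra_pages;
}
```

Take ra_pages = 32 and io_pages = 256 as an example. A 64-page WILLNEED request is ignored, because it is below the 128-page threshold. A 1024-page request starting at page 1000 is tracked as [1000, 2023]. A read at page 1500 then gets a 256-page window, while a read at page 10 keeps the 32-page one. Storing the end page inclusively lets a later request starting at page 2024 be tracked without colliding with the first range.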