linux-mm.kvack.org archive mirror
* Efficient mapping of sparse file holes to zero-pages
@ 2025-02-20 12:48 David Frank
From: David Frank @ 2025-02-20 12:48 UTC
  To: linux-mm, linux-kernel

Hi all,

I'd like to efficiently mmap a large sparse file (ext4), 95% of which
is holes. I was unsatisfied with the performance and after profiling,
I found that most of the time is spent in filemap_add_folio and
filemap_alloc_folio - much more than in my algorithm:

 - 97.87% filemap_fault
    - 97.57% do_sync_mmap_readahead
       - page_cache_ra_order
          - 97.28% page_cache_ra_unbounded
             - 40.80% filemap_add_folio
                + 21.93% __filemap_add_folio
                + 8.88% folio_add_lru
                + 7.56% workingset_refault
             + 28.73% filemap_alloc_folio
             + 22.34% read_pages
             + 3.29% xa_load

As a workaround, I started using lseek with SEEK_HOLE and SEEK_DATA and
changed the algorithm to use a static array filled with zeros instead
of reading from the holes. This is ~30x faster, but it introduces
substantial complexity in the implementation. I was wondering whether
mapping holes to zero pages with COW in the kernel is being
considered.
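
For illustration only, here is a minimal sketch of that kind of
workaround, written in Python for brevity. The actual implementation is
not shown in this message, so the helper names data_extents and in_data
are hypothetical. It assumes a Linux filesystem that reports holes
through lseek; on filesystems that do not, the whole file is reported
as a single data extent.

```python
import bisect
import errno
import os


def data_extents(fd, size):
    """Yield (start, end) byte ranges of fd that hold data, skipping holes.

    Note: os.lseek moves the file offset of fd as a side effect.
    """
    offset = 0
    while offset < size:
        try:
            # First byte at or after `offset` that is not inside a hole.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                # No data at or after `offset`: the rest is a trailing hole.
                return
            raise
        # Start of the next hole; end of file counts as an implicit hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def in_data(extents, pos):
    """Return True if byte `pos` lies in one of the sorted (start, end) extents."""
    # Index of the last extent whose start is <= pos.
    i = bisect.bisect_right(extents, (pos, float("inf"))) - 1
    return i >= 0 and extents[i][0] <= pos < extents[i][1]
```

A caller would build the extent list once, then consult in_data before
touching the mapping: positions outside every extent can be served from
a shared zero buffer, so no page-cache folio is ever allocated for them.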

I found [a related thread][1] from early 2022 which mentions mapping
to zero pages for shared memory objects. There seemed to be some
concerns about the complexity; I wonder whether it's different for
mmap (even if only for private/read-only mappings).

[1]: https://lore.kernel.org/lkml/4b1885b8-eb95-c50-2965-11e7c8efbf36@google.com/T/

Thanks,
David


Thread overview: 5+ messages
2025-02-20 12:48 Efficient mapping of sparse file holes to zero-pages David Frank
2025-02-20 13:47 ` Matthew Wilcox
2025-02-20 20:46   ` David Frank
2025-02-23  1:47     ` Matthew Wilcox
2025-02-24 16:17       ` Christoph Hellwig
