From: Ryan Roberts <ryan.roberts@arm.com>
To: Kundan Kumar <kundanthebest@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: Pages doesn't belong to same large order folio in block IO path
Date: Mon, 5 Feb 2024 11:16:56 +0000
Message-ID: <17f32e6d-6737-498b-9335-02d4372630ff@arm.com>
In-Reply-To: <CALYkqXrydCudnB-UYWCQB0A1uToo8dn6A8xUsw5N8f8oisUNww@mail.gmail.com>
On 05/02/2024 11:02, Kundan Kumar wrote:
[...]
>
>
> Thanks Ryan for the help and the detailed reply.
>
> I tried various combinations. The good news is that an aligned, mmap-backed
> buffer gets a large folio and solves the issue.
> Let's look at the various cases one by one:
>
> ==============
> Aligned malloc
> ==============
> Alignment alone didn't solve the issue. The command I used:
> fio -iodepth=1 -iomem_align=16K -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
> The block IO path still sees four separate pages, each in its own folio.
> Logs:
> Feb 5 15:27:32 kernel: [261992.075752] 1603 iov_iter_extract_user_pages addr =
> 55b2a0542000
This address is not 16K aligned, so I'm guessing that -iomem_align is being
ignored for the malloc backend. Probably malloc has done an mmap() for the 16K
without any padding applied, the kernel has chosen a VA that is not 16K
aligned, and so the range has been populated with small folios.
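For what it's worth, here is a minimal userspace sketch of the kind of
allocation I'd expect to work (the sizes and the assumption that a 16K mTHP
size is enabled are mine, not something your trace shows): allocate the buffer
with posix_memalign() so the 16K range is naturally aligned before it is
faulted in, which gives the anonymous fault path a chance to back it with a
single 16K folio.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	void *buf = NULL;
	size_t len = 16 * 1024;

	/* Ask for a 16K-aligned allocation; plain malloc() only
	 * guarantees 16-byte alignment, which is why the folios end
	 * up small. */
	int rc = posix_memalign(&buf, 16 * 1024, len);
	if (rc) {
		fprintf(stderr, "posix_memalign failed: %d\n", rc);
		return 1;
	}

	/* Touch the buffer so the anonymous memory is faulted in; if
	 * the aligned 16K range sits fully inside the VMA, anon mTHP
	 * can back it with one order-2 folio (on a 4K page kernel). */
	memset(buf, 0, len);

	printf("buf = %p (offset into 16K block = 0x%lx)\n",
	       buf, (unsigned long)((uintptr_t)buf & (16UL * 1024 - 1)));

	free(buf);
	return 0;
}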
> Feb 5 15:27:32 kernel: [261992.075762] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:27:32 kernel: [261992.075786] 1291 __bio_iov_iter_get_pages page =
> ffffea000d9461c0 folio = ffffea000d9461c0
> Feb 5 15:27:32 kernel: [261992.075812] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7ef7c0 folio = ffffea000d7ef7c0
> Feb 5 15:27:32 kernel: [261992.075836] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7d30c0 folio = ffffea000d7d30c0
> Feb 5 15:27:32 kernel: [261992.075861] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7f2680 folio = ffffea000d7f2680
>
>
> ==============
> Non aligned mmap
> ==============
> Unaligned mmap does somewhat better; we see 3 pages from the same folio.
> fio -iodepth=1 -iomem=mmap -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
> Feb 5 15:31:08 kernel: [262208.082789] 1603 iov_iter_extract_user_pages addr =
> 7f72bc711000
> Feb 5 15:31:08 kernel: [262208.082808] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:24:31 kernel: [261811.086973] 1291 __bio_iov_iter_get_pages page =
> ffffea000aed36c0 folio = ffffea000aed36c0
> Feb 5 15:24:31 kernel: [261811.087010] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0200 folio = ffffea000d2d0200
> Feb 5 15:24:31 kernel: [261811.087044] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0240 folio = ffffea000d2d0200
> Feb 5 15:24:31 kernel: [261811.087078] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0280 folio = ffffea000d2d0200
This looks strange to me. You should only get a 16K folio if the VMA has a big
enough 16K-aligned section. If you are only mmapping 16K and its address
(7f72bc711000) is correct, then the mapping is unaligned and you should only
see small folios. I could believe the pages are "accidentally contiguous", but
then their folios should all be different. So perhaps the program is mmapping
more and using the first part internally? Just a guess.
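For reference, the usual way a program guarantees an aligned anonymous mapping
is to over-map and trim the ends. A rough sketch of that idea follows; I
haven't checked whether fio's mmap backend does exactly this, so treat it as
an assumption about what the program might be doing rather than a description
of it.

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define ALIGN_16K (16UL * 1024)

/* Map len bytes at a 16K-aligned address by over-allocating and then
 * unmapping the unaligned head and the unused tail. */
static void *mmap_aligned_16k(size_t len)
{
	size_t map_len = len + ALIGN_16K;
	uint8_t *raw, *aligned;

	raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	/* Round up to the next 16K boundary inside the mapping. */
	aligned = (uint8_t *)(((uintptr_t)raw + ALIGN_16K - 1) &
			      ~(ALIGN_16K - 1));

	/* Give back the unaligned head and whatever remains after len. */
	if (aligned != raw)
		munmap(raw, aligned - raw);
	if (aligned + len < raw + map_len)
		munmap(aligned + len, raw + map_len - (aligned + len));

	return aligned;
}

int main(void)
{
	void *buf = mmap_aligned_16k(16 * 1024);

	if (!buf)
		return 1;
	printf("buf = %p\n", buf);
	return 0;
}

If fio is doing something like this, the remaining mapping is both 16K sized
and 16K aligned, which would explain why you then see a single large folio.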
>
>
> ==============
> Aligned mmap
> ==============
> mmap plus alignment ("-iomem_align=16K -iomem=mmap") solves the issue!
> Even with all the mTHP sizes enabled, I see that 1 folio is present,
> corresponding to the 4 pages.
>
> fio -iodepth=1 -iomem_align=16K -iomem=mmap -rw=write -ioengine=io_uring
> -direct=1 -hipri -bs=16K -numjobs=1 -size=16k -group_reporting
> -filename=/dev/nvme0n1 -name=io_uring_test
> Feb 5 15:29:36 kernel: [262115.791589] 1603 iov_iter_extract_user_pages addr =
> 7f5c9087b000
> Feb 5 15:29:36 kernel: [262115.791611] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:29:36 kernel: [262115.791635] 1291 __bio_iov_iter_get_pages page =
> ffffea000e0116c0 folio = ffffea000e011600
> Feb 5 15:29:36 kernel: [262115.791696] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011700 folio = ffffea000e011600
> Feb 5 15:29:36 kernel: [262115.791755] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011740 folio = ffffea000e011600
> Feb 5 15:29:36 kernel: [262115.791814] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011780 folio = ffffea000e011600
OK good, but addr (7f5c9087b000) is still not 16K aligned! Could this be a bug
in your logging?
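Just to show the arithmetic I'm doing on the address you logged (a throwaway
check, nothing more):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Address from your iov_iter_extract_user_pages log line. */
	uintptr_t addr = 0x7f5c9087b000UL;

	/* 16K alignment means the low 14 bits are zero; here the
	 * offset comes out as 0x3000, i.e. 12K into the 16K block. */
	printf("offset into 16K block = 0x%lx\n",
	       (unsigned long)(addr & (16UL * 1024 - 1)));
	return 0;
}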
>
> So it looks like normal malloc, even if aligned, doesn't get backed by large
> order folios. Only if we do an mmap which sets the flags "OS_MAP_ANON |
> MAP_PRIVATE" do we get the same folio.
>
> I was under the assumption that malloc would internally use mmap with
> MAP_ANON and we would get the same folio.
Yes it will, but it also depends on the alignment being correct.
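To make the alignment point concrete, here is a small glibc-specific sketch
(the chunk-header offset is my assumption about typical glibc behaviour, not
something your trace shows): even when malloc is forced down the mmap() path,
the returned pointer normally sits just past the allocator's chunk header, so
it is not 16K aligned and the fault path cannot place a single 16K folio over
your buffer.

#include <malloc.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* Force glibc to service every allocation with mmap() rather
	 * than the heap (glibc-specific tunable). */
	mallopt(M_MMAP_THRESHOLD, 0);

	void *p = malloc(16 * 1024);

	/* The pointer usually lands a chunk-header's worth past the
	 * start of the mapping, so it is almost never 16K aligned. */
	printf("malloc(16K) -> %p, offset into 16K block = 0x%lx\n",
	       p, (unsigned long)((uintptr_t)p & (16UL * 1024 - 1)));

	free(p);
	return 0;
}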
>
>
> For just the malloc case:
>
> On another front, I have logs in alloc_anon_folio. For just the malloc case I
> see an allocation of 64 pages. "addr = 5654feac0000" is the address malloced
> by fio (without align and without mmap).
>
> Feb 5 15:56:56 kernel: [263756.413095] alloc_anon_folio comm=fio order = 6
> folio = ffffea000e044000 addr = 5654feac0000 vma = ffff88814cfc7c20
> Feb 5 15:56:56 kernel: [263756.413110] alloc_anon_folio comm=fio folio_nr_pages
> = 64
>
> 64 pages will be 0x40000 bytes; when added to 5654feac0000 we get
> 5654feb00000. So user space addresses in this range should be covered by this
> folio itself.
>
> And after this, when IO is issued, I see a user space address in this range
> passed to the block IO path. But iov_iter_extract_user_pages() doesn't fetch
> pages from that folio.
> Feb 5 15:56:57 kernel: [263756.678586] 1603 iov_iter_extract_user_pages addr =
> 5654fead4000
> Feb 5 15:56:57 kernel: [263756.678606] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:56:57 kernel: [263756.678629] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2b80 folio = ffffea000dfc2b80
> Feb 5 15:56:57 kernel: [263756.678684] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2bc0 folio = ffffea000dfc2bc0
> Feb 5 15:56:57 kernel: [263756.678738] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9100 folio = ffffea000d7b9100
> Feb 5 15:56:57 kernel: [263756.678790] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9140 folio = ffffea000d7b9140
>
> Please let me know your thoughts on the same.
>
> --
> Kundan Kumar