From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <17f32e6d-6737-498b-9335-02d4372630ff@arm.com>
Date: Mon, 5 Feb 2024 11:16:56 +0000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Pages doesn't belong to same large order folio in block IO path
Content-Language: en-GB
To: Kundan Kumar
Cc: "linux-mm@kvack.org"
References: <7b4bb92f-bd57-49c8-8b95-0e10408914fb@arm.com>
From: Ryan Roberts
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On 05/02/2024 11:02, Kundan Kumar wrote:

[...]

> Thanks Ryan for the help and the thorough reply.
>
> I tried various combinations. The good news is that mmap with aligned memory
> allocates a large folio and solves the issue.
> Let's look at the various cases one by one:
>
> ==============
> Aligned malloc
> ==============
> Alignment alone didn't solve the issue.
> The command I used:
>
> fio -iodepth=1 -iomem_align=16K -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
>
> The block IO path sees separate pages and separate folios.
> Logs:
> Feb  5 15:27:32 kernel: [261992.075752] 1603 iov_iter_extract_user_pages addr =
> 55b2a0542000

This is not 16K aligned, so I'm guessing that -iomem_align is being ignored for
the malloc backend. Probably malloc has done an mmap() for the 16K without any
padding applied, and the kernel has chosen a VA that is not 16K aligned, so it's
been populated with small folios.

> Feb  5 15:27:32 kernel: [261992.075762] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:27:32 kernel: [261992.075786] 1291 __bio_iov_iter_get_pages page =
> ffffea000d9461c0 folio = ffffea000d9461c0
> Feb  5 15:27:32 kernel: [261992.075812] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7ef7c0 folio = ffffea000d7ef7c0
> Feb  5 15:27:32 kernel: [261992.075836] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7d30c0 folio = ffffea000d7d30c0
> Feb  5 15:27:32 kernel: [261992.075861] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7f2680 folio = ffffea000d7f2680
>
> ==============
> Non-aligned mmap
> ==============
> mmap without alignment does somewhat better; we see 3 pages from the same folio:
>
> fio -iodepth=1 -iomem=mmap -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
>
> Feb  5 15:31:08 kernel: [262208.082789] 1603 iov_iter_extract_user_pages addr =
> 7f72bc711000
> Feb  5 15:31:08 kernel: [262208.082808] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:24:31 kernel: [261811.086973] 1291 __bio_iov_iter_get_pages page =
> ffffea000aed36c0 folio = ffffea000aed36c0
> Feb  5 15:24:31 kernel: [261811.087010] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0200 folio = ffffea000d2d0200
> Feb  5 15:24:31 kernel: [261811.087044] 1291
> __bio_iov_iter_get_pages page =
> ffffea000d2d0240 folio = ffffea000d2d0200
> Feb  5 15:24:31 kernel: [261811.087078] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0280 folio = ffffea000d2d0200

This looks strange to me. You should only get a 16K folio if the VMA has a big
enough 16K-aligned section. If you are only mmapping 16K and its address
(7f72bc711000) is correct, then that's unaligned and you should only see small
folios. I could believe the pages are "accidentally contiguous", but then their
folios should all be different. So perhaps the program is mmapping more and
using the first part internally? Just a guess.

> ==============
> Aligned mmap
> ==============
> mmap plus alignment ("-iomem_align=16K -iomem=mmap") solves the issue!
> Even with all the mTHP sizes enabled, I see that 1 folio is present
> corresponding to the 4 pages.
>
> fio -iodepth=1 -iomem_align=16K -iomem=mmap -rw=write -ioengine=io_uring
> -direct=1 -hipri -bs=16K -numjobs=1 -size=16k -group_reporting
> -filename=/dev/nvme0n1 -name=io_uring_test
>
> Feb  5 15:29:36 kernel: [262115.791589] 1603 iov_iter_extract_user_pages addr =
> 7f5c9087b000
> Feb  5 15:29:36 kernel: [262115.791611] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:29:36 kernel: [262115.791635] 1291 __bio_iov_iter_get_pages page =
> ffffea000e0116c0 folio = ffffea000e011600
> Feb  5 15:29:36 kernel: [262115.791696] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011700 folio = ffffea000e011600
> Feb  5 15:29:36 kernel: [262115.791755] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011740 folio = ffffea000e011600
> Feb  5 15:29:36 kernel: [262115.791814] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011780 folio = ffffea000e011600

OK good, but addr (7f5c9087b000) is still not 16K aligned! Could this be a bug
in your logging?

> So it looks like normal malloc, even if aligned, doesn't allocate large order
> folios.
> Only if we do an mmap which sets the flags "MAP_ANON | MAP_PRIVATE"
> do we get the same folio.
>
> I was under the assumption that malloc would internally use mmap with MAP_ANON
> and we would get the same folio.

Yes it will, but it also depends on the alignment being correct.

> For just the malloc case:
>
> On another front, I have logs in alloc_anon_folio. For just the malloc case I
> see an allocation of 64 pages. "addr = 5654feac0000" is the address malloced by
> fio (without align and without mmap).
>
> Feb  5 15:56:56 kernel: [263756.413095] alloc_anon_folio comm=fio order = 6
> folio = ffffea000e044000 addr = 5654feac0000 vma = ffff88814cfc7c20
> Feb  5 15:56:56 kernel: [263756.413110] alloc_anon_folio comm=fio folio_nr_pages
> = 64
>
> 64 pages will be 0x40000 bytes; added to 5654feac0000 we get 5654feb00000.
> So this user-space address range should be covered by the folio itself.
>
> And after this, when IO is issued, I see a user-space address in this range
> passed to the block IO path. But iov_iter_extract_user_pages() doesn't
> fetch the same pages/folio:
>
> Feb  5 15:56:57 kernel: [263756.678586] 1603 iov_iter_extract_user_pages addr =
> 5654fead4000
> Feb  5 15:56:57 kernel: [263756.678606] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:56:57 kernel: [263756.678629] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2b80 folio = ffffea000dfc2b80
> Feb  5 15:56:57 kernel: [263756.678684] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2bc0 folio = ffffea000dfc2bc0
> Feb  5 15:56:57 kernel: [263756.678738] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9100 folio = ffffea000d7b9100
> Feb  5 15:56:57 kernel: [263756.678790] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9140 folio = ffffea000d7b9140
>
> Please let me know your thoughts on the same.
>
> --
> Kundan Kumar