From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59267D60CEC for ; Mon, 18 Nov 2024 22:05:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C01286B0085; Mon, 18 Nov 2024 17:05:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BB09E6B008A; Mon, 18 Nov 2024 17:05:04 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A51086B008C; Mon, 18 Nov 2024 17:05:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 8766B6B0085 for ; Mon, 18 Nov 2024 17:05:04 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 41D611605D2 for ; Mon, 18 Nov 2024 22:05:04 +0000 (UTC) X-FDA: 82800595374.10.CECC6B9 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) by imf07.hostedemail.com (Postfix) with ESMTP id 9B78640015 for ; Mon, 18 Nov 2024 22:03:52 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=imLnEUXQ; spf=pass (imf07.hostedemail.com: domain of kaleshsingh@google.com designates 209.85.214.182 as permitted sender) smtp.mailfrom=kaleshsingh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731967353; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=rZpbDY5mWrDv6OnVwiqKQCXRFl0cdal4ZeypHNgk58U=; b=oc8E7GAM3VGIiyv4pzo2Tc5zDxoMFYtRdU0RJeEfsvMg6D/8ZY7Gvl9cvIbntfIuFwzrgU rwmOSO1e909FaO2LZv7PHEEuIwImZFUM2uzLrhbCZT5v1Q4whuxZ6/Uby9+mzb5JF6l06z r1YqV8SHL7ysCHFYHoH+0k2JOoGaaMg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731967353; a=rsa-sha256; cv=none; b=hPSQowpiHC9JdoMn7GEnRJ1DIwNHLFUD7L1jz7OCGAplafF6PX3WufaT3USLTCWOdyKkYy 898fGZxy7ReG+U6NfuJRH9X8GM2lYXQ4qO3zTajex65L7Pd4q4w92i1zWAtDdDkaNUMQzm 0A0AOEs5wHMGppqSdVpN22hKgx0cwQY= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=imLnEUXQ; spf=pass (imf07.hostedemail.com: domain of kaleshsingh@google.com designates 209.85.214.182 as permitted sender) smtp.mailfrom=kaleshsingh@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-211f1b2bf2bso19405ad.1 for ; Mon, 18 Nov 2024 14:05:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1731967501; x=1732572301; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=rZpbDY5mWrDv6OnVwiqKQCXRFl0cdal4ZeypHNgk58U=; b=imLnEUXQpklxU6VDkMAYJjpdX5bxOBedn0NEcC0/rmltD9UkdU5TzVFSUOxITY/7Z7 FYHQKktYFMavtu54iNDNvFAUdmYmARQ5WsgnKEbDdq6AM5vjuJylPs2TDtg2geMb2xsR 8qrq7HjuSFZ0MCeAOpRnCxxwcwGtnqnj/lYSu7suwekhKLyvlvK1YMGMtJqhxp+AX1LX 1EwUf0kTqtIT5cDXUon95yQLf6GcMVwniG2b9mrX0T67lX4Wtg7ba0hgeVryiZgvz+v2 E2RHy7dOtGptM1BTiztzTjD/WHvge9xl2it39IntTh3o0SLiJp0KkXjqOC6d374OFbAn ouMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731967501; x=1732572301; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rZpbDY5mWrDv6OnVwiqKQCXRFl0cdal4ZeypHNgk58U=; b=hDYY/Ce6JhxGsOk8JZyqAayWs6vos7+tcL01t/HLyGeMzIHwEuc2BXj/IUa1t0bM8y dUC7+apnrfnEOEX+C1rDR2PqQm31Bz9tFafU+kBSnoKWxx7pYnmw6XJABuaMwrgfs2t2 wOmr+zvFTHDhiyHdczK/n2pATWqJtu7RK8EQ2TfaNOYHEYRHIjsnUbpXRr9sX1mwTR3u +4+jOaRUTMDnA+99scGUufMK+O2HmjRhLOlAJbut9NZB0vbPbgh01Fi8+KQIXpEhsLnZ 55HltTiftxprC2AaMqUag2sv71vhd0InPKyAtyNjS9jXnwEtHtIbiqZFxDDZLrd1AyyY xTMA== X-Forwarded-Encrypted: i=1; AJvYcCUIKUvamcESpjmFyeAvCOViYUeqLRiqUXuSfi4qoNV0zLhHyecyL+rIM+LBLHNAcrRiuJKjlMsaYg==@kvack.org X-Gm-Message-State: AOJu0YxUXgHbcqg+wWsH9hl/OaFZDmk+B2klojqMzZ6Nrm+snINbv8yX 4d6ZxJG8Goe2ggKf9Agm6EKCkelKumoewrrVpCfFAUGwC+snaz8AMSaLOG6WZk4bxE2XBuj7NW5 5Usp1/dXaA2QhVIjaRsT36WKBHS5nIn7zlwRq X-Gm-Gg: ASbGncuZg/C1m2w8mJeLDNjmH7gY5h/4yc9z+5ZtZYwKOR1F0PwQc58bNJuCsjjrDDh NIOmEMsSoxwozB+JHnVTY+NcifbooIk66gi6N9VyX3/3WFBT3wF2odhKkF2D+ X-Google-Smtp-Source: AGHT+IHDU1IGm7lewxumKHELmTgTS6tIR08JzMLNeZLhHDULczS9ajfRvPk8jGcBzWoEeEzD8Ssf6fHi+EwybokLFFU= X-Received: by 2002:a17:902:db11:b0:20b:81bb:4a81 with SMTP id d9443c01a7336-2124fffa518mr437955ad.7.1731967500584; Mon, 18 Nov 2024 14:05:00 -0800 (PST) MIME-Version: 1.0 References: <20241115215256.578125-1-kaleshsingh@google.com> In-Reply-To: From: Kalesh Singh Date: Mon, 18 Nov 2024 14:04:49 -0800 Message-ID: Subject: Re: [PATCH] mm: Respect mmap hint address when aligning for THP To: Yang Shi Cc: Vlastimil Babka , kernel-team@android.com, android-mm@google.com, Andrew Morton , Yang Shi , Rik van Riel , Ryan Roberts , Suren Baghdasaryan , Minchan Kim , Hans Boehm , Lokesh Gidra , stable@vger.kernel.org, "Liam R. Howlett" , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 9B78640015 X-Stat-Signature: r3c8xg9rgfjxxnhe3x6eubhhcoce3shf X-HE-Tag: 1731967432-847925 X-HE-Meta: U2FsdGVkX1+7Dz1jB5u2mlbs63tflcypHjbXMY17mlkpWyDo0RhicmAiXUccnuciu4OzQ4ro1mo8gqI2YUXrsICV0/IuH4TutoEAbXWplCA/ZJSTupWWeytLZtkZ2f2CwzxYfVsSWEDWCMRZ4KE8n5IcWuIYpc6d4/1T20u5kdDvxqOgIOjwB0XJKTuBRQAGrm72SKXsuIqYtVSH2aJXhcIMgt+hGedoa1DwcWoKA6O2g5ZHoJqNcKhXrp4i43sOiRkesJX15M3HWP9c7HxTTHe3X8UTGL3gq98kDjAlgPhAl2jViQTfC0tqzHg7kivwVk6HRgjrem6xxr6YKEJWyK0Z5FnlU48JCYThWYDUAX0+nSvfbrGdbRdOstcpTNNOU8AA+rzr6KnQLeHoBtKftt1+//KtSwawTD4YIYpM/ZhrK3eMC9TQ58uGXCieU9G16Jnm4ufPtDVmbK8X7kMetgcnZlRus+OjPb4YS/UXIQzZscVPVLk8S0WOh/G0fnnuWruQ9bsf2d2BB5p3T2Ze0Aok0PXdBAvdNdXGAR+EZwkfwRUAKuGez4T03Cp1ZyvJNjq0hePZdiZEb8MXfPTP/zwSrMtsxLgIkYmjNdhChaQGqVviYadgCACZ+ZKGKQUjpEVpkf0iqCbRO94Ynp+mL6bYOp99YqUWbOUeyfxjLtI8UypBSYsLV7Ia4K+Gr7Sw19LOApd86d86OH9EgFGhDTQtME+OZd9Fp8W/pfMoOcGOFtlPxKTxmZZ5mlu7pwesxc8aBR+DrXHyPQ36TOIZRIfKJYqcA2lLwmSv2tX+2mwWYOaW9qNPP8tyIoFa6gDc/NHd4PHdJvLXvoy4oOqNXYmvCJ7j9HUvzQa335BurQr8/fnyCaA+GVgRdCESOXJUwSvGJPyaL3U/L0VtaTucy1CVY16ciT/q1TMDlqY+aiet+JY+aYeYoQEzwfC43LsA0vqCzrIw08qhVkCtKQw ep3dhWgL 78ws9HqQvsmQZ9bJU71L4Uzgv0tXL1EiTCgWKhiqiNi4pdI7dY7cGII7goqzdSYYRTlpfzmSBI5+uwZNm32cBrVHG9Tb3rKlcUwitBXaNe1WsfXDtkcOhYtohzINCyNpvSKbypGlQQnlK2ndZ988llO94Lqk661MLwn7gQiwtav9puv1uypuU7OjZIpIafhtdv3vwMDYl1dbkKA628nYZor5+yd9PSuNTHlKJgnvwmlWowHqj9Jwas5Lh7Cz8trVhFdDx6a939zXksNhY6yJV/sJKiTydJqF8QrQ+bGELrYPQmefz9Uah6oBk40kaWzOFgYq3YWMiHnJkiatfBkOX42ep3vFBL/TkCtmTHskxIxHlXoeLZEK9F6FADG9pPLQeP/XUTE/5txztQJB9BCfI7Diu/W8UGhVSI3ddKTzhi392EtFuL2dHafJ8DJq7kqhtMEQ5JWCw5zi2iFvnIJe2fqGVWG2IRUu7aWrFnfk8N/mxUoF4aH+taxmwBVugweas2sJYlNghBPsKya18x3OrWMwv7Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Nov 18, 2024 at 1:44=E2=80=AFPM Yang Shi wrot= e: > > On Mon, Nov 18, 2024 at 9:52=E2=80=AFAM Kalesh Singh wrote: > > > > On Mon, Nov 18, 2024 at 9:05=E2=80=AFAM Yang Shi = wrote: > > > > > > On Sun, Nov 17, 2024 at 3:12=E2=80=AFAM Vlastimil Babka wrote: > > > > > > > > On 11/15/24 23:15, Yang Shi wrote: > > > > > On Fri, Nov 15, 2024 at 1:52=E2=80=AFPM Kalesh Singh wrote: > > > > >> > > > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP > > > > >> boundaries") updated __get_unmapped_area() to align the start ad= dress > > > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=3Dy= . > > > > >> > > > > >> It does this by effectively looking up a region that is of size, > > > > >> request_size + PMD_SIZE, and aligning up the start to a PMD boun= dary. > > > > >> > > > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page ali= gnment > > > > >> on 32 bit") opted out of this for 32bit due to regressions in mm= ap base > > > > >> randomization. > > > > >> > > > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous > > > > >> mappings to PMD-aligned sizes") restricted this to only mmap siz= es that > > > > >> are multiples of the PMD_SIZE due to reported regressions in som= e > > > > >> performance benchmarks -- which seemed mostly due to the reduced= spatial > > > > >> locality of related mappings due to the forced PMD-alignment. > > > > >> > > > > >> Another unintended side effect has emerged: When a user specifie= s an mmap > > > > >> hint address, the THP alignment logic modifies the behavior, pot= entially > > > > >> ignoring the hint even if a sufficiently large gap exists at the= requested > > > > >> hint location. > > > > >> > > > > >> Example Scenario: > > > > >> > > > > >> Consider the following simplified virtual address (VA) space: > > > > >> > > > > >> ... > > > > >> > > > > >> 0x200000-0x400000 --- VMA A > > > > >> 0x400000-0x600000 --- Hole > > > > >> 0x600000-0x800000 --- VMA B > > > > >> > > > > >> ... > > > > >> > > > > >> A call to mmap() with hint=3D0x400000 and len=3D0x200000 behaves= differently: > > > > >> > > > > >> - Before THP alignment: The requested region (size 0x200000) f= its into > > > > >> the gap at 0x400000, so the hint is respected. > > > > >> > > > > >> - After alignment: The logic searches for a region of size > > > > >> 0x400000 (len + PMD_SIZE) starting at 0x400000. > > > > >> This search fails due to the mapping at 0x600000 (VMA B), an= d the hint > > > > >> is ignored, falling back to arch_get_unmapped_area[_topdown]= (). > > > > > > > > Hi all, Thanks for the reviews. > > > > > > Hmm looks like the search is not done in the optimal way regardless= of > > > > whether or not it ignores a hint - it should be able to find the ho= le, no? > > > > It's not able to find the hole in the example case because the size we > > are looking for is now > > (requested size + padding len) which is larger than the hole we have > > at the hint address. > > > > > > > > > > >> In general the hint is effectively ignored, if there is any > > > > >> existing mapping in the below range: > > > > >> > > > > >> [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE) > > > > >> > > > > >> This changes the semantics of mmap hint; from ""Respect the hint= if a > > > > >> sufficiently large gap exists at the requested location" to "Res= pect the > > > > >> hint only if an additional PMD-sized gap exists beyond the reque= sted size". > > > > >> > > > > >> This has performance implications for allocators that allocate t= heir heap > > > > >> using mmap but try to keep it "as contiguous as possible" by usi= ng the > > > > >> end of the exisiting heap as the address hint. With the new beha= vior > > > > >> it's more likely to get a much less contiguous heap, adding extr= a > > > > >> fragmentation and performance overhead. > > > > >> > > > > >> To restore the expected behavior; don't use thp_get_unmapped_are= a_vmflags() > > > > >> when the user provided a hint address. > > > > > > > > Agreed, the hint should take precendence. > > > > > > > > > Thanks for fixing it. I agree we should respect the hint address.= But > > > > > this patch actually just fixed anonymous mapping and the file map= pings > > > > > which don't support thp_get_unmapped_area(). So I think you shoul= d > > > > > move the hint check to __thp_get_unmapped_area(). > > > > > > > > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment = of > > > > > anonymous mappings to PMD-aligned sizes") should be moved to ther= e too > > > > > IMHO. > > > > > > > > This was brought up, but I didn't want to do it as part of the stab= le fix as > > > > that would change even situations that Rik's change didn't. > > > > If the mmap hint change is another stable hotfix, I wouldn't confla= te it > > > > either. But we can try it for further development. But careful abou= t just > > > > moving the code as-is, the file-based mappings are different than a= nonymous > > > > memory and I believe file offsets matter: > > > > > > > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@su= se.cz/ > > > > > > > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@su= se.cz/ > > > > > > > I see, so I think we should keep the check here to address the > > immediate regression for anonymous mappings. > > > > I believe what we would need to address this longer term is to have > > vma_iter_lowest() [1] vma_iter_highest() [2] take into account the > > alignment when doing the search. That way we don't need to inflate the > > search size to facilitate the manual alignment after the fact. I > > haven't looked too too deeply into this, maybe Liam has some ideas > > about that? > > > > [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420 > > > > [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426 > > > > > Did some research about the history of the code, I found this commit: > > > > > > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hin= t > > > address and PMD alignment"), it tried to fix "the function would not > > > try to allocate PMD-aligned area if *any* hint address specified." > > > It was for file mapping back then since anonymous mapping THP > > > alignment was not supported yet. > > > > > > But it seems like this patch somehow tried to do something reverse. I= t > > > may not be correct either. > > > > > > > IIUC Kirill's patch is doing the right thing (mostly), i.e. it will > > return the hint address (without any rounding to PMD alignment). The > > case it doesn't handle is what I mentioned above, where we aren't able > > to find the hole at the hint address in the first place because the > > hole is smaller than (size + padding len) > > Yes. But my point is your fix (just simply skip PMD alignment when > hint is specified) actually broke what Kirill fixed IIUC. Hi Yang, It's true the PMD alignment is skipped in that case for the anon mappings. Though I believe that's still what we want to have here initially as we shouldn't regress the hint behaviour. I've posted the v2 here: https://lore.kernel.org/lkml/20241118214650.3667577-1-kaleshsingh@google.co= m/ > > > > > > With Vlastimis's fix, we just try to make the address THP aligned for > > > anonymous mapping when the size is PMD aligned. So we don't need to > > > take into account the padding for anonymous mapping anymore. > > > > > > > We still need to have padding length because PMD alignment of the size > > doesn't mean that the start address returned by the search will be PMD > > aligned. Inherently those are only PAGE aligned. > > Yes, you are right, I overlooked this. I think we can do this in two > passes. Use len w/o padding in the first pass. If the returned address > equals the hint or it is already PMD aligned, just return it. > Otherwise it means there is no hole with suitable size and alignment. > In the second pass, we redo it with padding. Just off the top of my > head, others may have better ideas. > You are right, it's one way we can do it. Though, I am concerned that the 2 passes will add overhead on mmap() performance. One idea I have is to move the alignment handling lower down to vma_iter_highest()/lowest(). Interested to hear your thoughts on that. We can do this in a follow up patch, which should fix file mappings as well. Thanks, Kalesh > > > > Thanks, > > Kalesh > > > > > So IIUC we should do something like: > > > > > > @@ -1085,7 +1085,11 @@ static unsigned long > > > __thp_get_unmapped_area(struct file *filp, > > > if (off_end <=3D off_align || (off_end - off_align) < size) > > > return 0; > > > > > > - len_pad =3D len + size; > > > + if (filp) > > > + len_pad =3D len + size; > > > + else > > > + len_pad =3D len; > > > + > > > if (len_pad < len || (off + len_pad) < off) > > > return 0; > > > > > > > > > > > >> Cc: Andrew Morton > > > > >> Cc: Vlastimil Babka > > > > >> Cc: Yang Shi > > > > >> Cc: Rik van Riel > > > > >> Cc: Ryan Roberts > > > > >> Cc: Suren Baghdasaryan > > > > >> Cc: Minchan Kim > > > > >> Cc: Hans Boehm > > > > >> Cc: Lokesh Gidra > > > > >> Cc: > > > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP= boundaries") > > > > >> Signed-off-by: Kalesh Singh > > > > > > > > Reviewed-by: Vlastimil Babka > > > > > > > > >> --- > > > > >> mm/mmap.c | 1 + > > > > >> 1 file changed, 1 insertion(+) > > > > >> > > > > >> diff --git a/mm/mmap.c b/mm/mmap.c > > > > >> index 79d541f1502b..2f01f1a8e304 100644 > > > > >> --- a/mm/mmap.c > > > > >> +++ b/mm/mmap.c > > > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsig= ned long addr, unsigned long len, > > > > >> if (get_area) { > > > > >> addr =3D get_area(file, addr, len, pgoff, flags)= ; > > > > >> } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) > > > > >> + && !addr /* no hint */ > > > > >> && IS_ALIGNED(len, PMD_SIZE)) { > > > > >> /* Ensures that larger anonymous mappings are TH= P aligned. */ > > > > >> addr =3D thp_get_unmapped_area_vmflags(file, add= r, len, > > > > >> > > > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623 > > > > >> -- > > > > >> 2.47.0.338.g60cca15819-goog > > > > >> > > > >