From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35AA6D60CEB for ; Mon, 18 Nov 2024 21:44:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4A8E86B007B; Mon, 18 Nov 2024 16:44:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 458906B0082; Mon, 18 Nov 2024 16:44:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2F88E6B0083; Mon, 18 Nov 2024 16:44:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 0EA386B007B for ; Mon, 18 Nov 2024 16:44:45 -0500 (EST) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 78912805E6 for ; Mon, 18 Nov 2024 21:44:44 +0000 (UTC) X-FDA: 82800545226.14.1AB3627 Received: from mail-ed1-f43.google.com (mail-ed1-f43.google.com [209.85.208.43]) by imf27.hostedemail.com (Postfix) with ESMTP id 6347F4000D for ; Mon, 18 Nov 2024 21:43:51 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=C1meJtwR; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf27.hostedemail.com: domain of shy828301@gmail.com designates 209.85.208.43 as permitted sender) smtp.mailfrom=shy828301@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731966099; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=yRfriMQu+9mTxB6xkzjxY3uVJzxTJgqFClD86fzO5dc=; b=K+QfH8uXC9pntB1iNjA4CTysP8ZrHseF36cPFUFQPvCQogl7RoNLM4+6yv0vV0h0O8pnha IRVI4yTzpmejR6WXRFDFk+/wK/U1ZJptkBpiXPmloYvV5p/yVf+koOuvJr+8jAAVXuwKCC 6xCzqtdIc5KPNwbm8ZsacBzLDTX6C3U= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=C1meJtwR; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf27.hostedemail.com: domain of shy828301@gmail.com designates 209.85.208.43 as permitted sender) smtp.mailfrom=shy828301@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731966099; a=rsa-sha256; cv=none; b=hfopdcXqxxT4mX9+YLQr0w2DWWX0uJUCwi++E5fnd5wBb9oHMcNL+dfwwmNvgEH08eiS9p rkBRnIN1Z63mPyYcT8UIhA6QvcDiMHQXa9PPjltq2rmoIKxnECX7zdzBoA0pJF5REmBZYK MRy5JLTvam1Rr5HqpjwDGZ/nMzOn/oY= Received: by mail-ed1-f43.google.com with SMTP id 4fb4d7f45d1cf-5cedf5fe237so3028731a12.3 for ; Mon, 18 Nov 2024 13:44:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731966281; x=1732571081; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=yRfriMQu+9mTxB6xkzjxY3uVJzxTJgqFClD86fzO5dc=; b=C1meJtwRZ35B0or/w9eiIyPovDrpWfVO4FCUNPXj5XFxSny2CE/PJjt5aEFTDBH6LC y5YKTC+qd+csVDCfGQylUy83avbThzzqVE0Yk4ITAOJXdN1knTsO7+VyD7PvbHdqyAl+ Lx40M/zcxRTYRT5ipyT/RhV0ul8FvtGnB7+jw9i11TK/xwoXwRPGE+cH/XHnx4+G5bn7 FxsRT73IHUvh+tVBafolLhzyofO5xagDs5SwoA1rjhftvqh0YckrkqyUCxa+00NE7Ef3 QjKK7a+s2/Y1sn9bvFPltsD6sIqmWGU8zJk9GIC1mXfpGfj4BIZ+qsdwe2Gg0iFn7NIU /xYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731966281; x=1732571081; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yRfriMQu+9mTxB6xkzjxY3uVJzxTJgqFClD86fzO5dc=; b=UXpnPZtiQYpkllaiZD7vanBBM1oV0WJwRzkN3uEcfhdEnmN4dOs3C/GmP/uX2Xlh9y 094B9w57ZHofoNVaKDn7vlnXuDyXZU8ivXCjxWxRaYUbEpR9hZ/8iL8t+agv+2525tFI BJ6BQiObo0kIkqfiSzPnRg8Ql9AwAqCSqMAdQOUE7+9bOMRtL0La7l5Xk2q1dq17YSOe nHG4LgqKwhd42J06v6ihia5rz1pAINKYRZK0CnfBnyxWEj8IlndIiKR/vBi+qWOJcl4L Q4jI3u9O6Xm4LEIMolwX1BbnrK6LmJUtXo/TOec12ezoLtJ6ajqS6CMIei2FCO8LJeSi xeXQ== X-Forwarded-Encrypted: i=1; AJvYcCUJ4IrFuzXNZ2FwGBfIoJzTQQu/npC+t3jf7pXQHA1pITxTrr6G5goKDTQ3n0HDg1O/we4gEgQY0g==@kvack.org X-Gm-Message-State: AOJu0YzkG+f7kHXzesMy5AI3+ym0VrN/oi5vo+70hyHHHrqb1+MNeeAG Nk91P3GZMiZlr6UutgfsfntZwbPplDuArUA3Z2exx9NbbmPDIp4gWYnL/Nmqdncp0LXm2+On+pp 5wetrPLjjzPkQW2aGp0ZV4aX3Fs4= X-Google-Smtp-Source: AGHT+IGILNYbmfG19eVFGOYb5KN1Iq5ZhEmhjDA+8mqCgpJJZjE1CC2wXLnImgrqYrI+QLReoyoTBJnAsy5N2wM+IeA= X-Received: by 2002:a05:6402:2695:b0:5cf:e022:24f5 with SMTP id 4fb4d7f45d1cf-5cfe022337amr150632a12.2.1731966280598; Mon, 18 Nov 2024 13:44:40 -0800 (PST) MIME-Version: 1.0 References: <20241115215256.578125-1-kaleshsingh@google.com> In-Reply-To: From: Yang Shi Date: Mon, 18 Nov 2024 13:44:29 -0800 Message-ID: Subject: Re: [PATCH] mm: Respect mmap hint address when aligning for THP To: Kalesh Singh Cc: Vlastimil Babka , kernel-team@android.com, android-mm@google.com, Andrew Morton , Yang Shi , Rik van Riel , Ryan Roberts , Suren Baghdasaryan , Minchan Kim , Hans Boehm , Lokesh Gidra , stable@vger.kernel.org, "Liam R. Howlett" , Lorenzo Stoakes , Jann Horn , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 6347F4000D X-Stat-Signature: f3rwuofhe6kt7qddtgu6csuatk98hw81 X-Rspam-User: X-HE-Tag: 1731966231-647674 X-HE-Meta: U2FsdGVkX1/yNqybmpwAJVvJXISCLCpGfsOLI8qSwnwU2NCOEvDnZnPILmWMoEycrZlv2kDzjpBUPjxQF+dCPgNeOIOYiSd3A9Egghl8pTcQWZ/K7I5fAJ94oZrLdaIw7/ZGH6tD5YJdQCMAh6N5xPymD4X6rIlf6qn+sTHsbXokO3nnrXmGPrGGsvE/gHtmE8rGED38AIHwV38cC/XUCveya4Jo5x/+0xZrx2W4PDl1nK3Gf3W+Tqdab6KSZGvPEKui/4K1k2KBkSmGmaI09VQo0SJ3tEP/xsZCV9hjhys9Ix9EXRtj5CsriM+nHs5dXNmUKW8IqRF7D8iUeu2Or4PI+sZWRezbGdA+S+m7vX6/cpstfeHhOK3dGW914oWrZOp4jfrIckvOKV9aBbqmHHDQ13djIGIhVPQaf9yP1wbUiFjD0fiiIAljIrriRCFXcIA1AtraI1c/gjg4xTyB+6iXVmochAiZjghGZJiuZ6WS3UOUDF4DJfcZpYpM0JnSuYgVlX3E8Q+jGBf2ciyrh24Zj1Kgdjxf/4/CdBLjwqlMCrNamnHNyfa7YR2mRRrkbb0qnerPjwaXX0MBEZXcAYSrHZxODhNHNaXLgyRQNebvJxVOlquf/CwT53RuCphmSgQjMbnsZ/FhTga6GvY6M0wtUmaSKkGrI0N3SzveiMnvQ1AFQ6kNKcp4MVNZiwXTwPzXyOi4nhq5UKDavecSyX/fpGUNl0UwRygJHOutwO/v40jSsASn6yGlf0qVMxtuiXWuDoOjZ+t4BLdfy8zCtX9/WtdS3kTr5znphEaWQRjnrrH71pCirlM+Ju83q4n5+lhhW7g5Tu434KUSPYXkVcKzgRDp7zPcB8tE9VRbG31ht+chmEp/70khV9f6I3tvUTo3QA5zNyh3/nohp1Nr2cUxiKQsb0jzGa9znE46n4OLPDl/W38K5OneGWDKrB6WW0VvlexzBUuIkzFfj2a X8q8xwAp 822hfrEXpGa1lMTmgyuU5yovm4Mg1abQmrpDxoV9nPKlZ6sC2SQVWGPhq0vQ+XxjtSAEbPD6UP2ZjulBu6YFrig9bgZQlgLbW70+w15bnDoPjJjzwgDG4qm4YMyllpAhYsXtq5scGCUUOCy59nd/+JO8k6xwnUg2ulN+AZIu062dj6w0BV/+djg16Phu5zgP4NyEcJVowpl4TlHgzSkzBdcYXftNM0uYO+2Lm64FHlbrBAwemJ8WdmTuQlL8XPipgrNysR1kAh4mEkV9f0b6bE+TR8vr1Nzv34G2skqw3KOu3cA5hgcIjV0cgXUbNWfNLGjc4Oc2mdyAzHWhEPtvf1PDbd83lZjsvXtrnTaNePXIK76KZb8eEt460/ZdEcitr20aUlzYWTSkukr24ESQjgcVGdLQiGywLnnuk4TmU1MNFlSbXusjHlYVDxTBOmsBWfk/8X445rpKwV5ZCmlZuvNGPVYOa9Bms0Wmv+5CQEn6juERBtwgFM6Hx6suIy6xxjL2ydTMe2Mm9Nj3aPnyzYXvspfMRWB1iDt0AB9K1WhPv2SQIMWyND910316hU20iBCHwAr/Fgaks+Z1kGVhmWmn3l1DJEk9/7nEVF4uYYaH3ra9tC8Gl18uSLEj3Ey6hGyrXZCEB163IIkxJNUza00tvmg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Nov 18, 2024 at 9:52=E2=80=AFAM Kalesh Singh wrote: > > On Mon, Nov 18, 2024 at 9:05=E2=80=AFAM Yang Shi wr= ote: > > > > On Sun, Nov 17, 2024 at 3:12=E2=80=AFAM Vlastimil Babka wrote: > > > > > > On 11/15/24 23:15, Yang Shi wrote: > > > > On Fri, Nov 15, 2024 at 1:52=E2=80=AFPM Kalesh Singh wrote: > > > >> > > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP > > > >> boundaries") updated __get_unmapped_area() to align the start addr= ess > > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=3Dy. > > > >> > > > >> It does this by effectively looking up a region that is of size, > > > >> request_size + PMD_SIZE, and aligning up the start to a PMD bounda= ry. > > > >> > > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page align= ment > > > >> on 32 bit") opted out of this for 32bit due to regressions in mmap= base > > > >> randomization. > > > >> > > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous > > > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes= that > > > >> are multiples of the PMD_SIZE due to reported regressions in some > > > >> performance benchmarks -- which seemed mostly due to the reduced s= patial > > > >> locality of related mappings due to the forced PMD-alignment. > > > >> > > > >> Another unintended side effect has emerged: When a user specifies = an mmap > > > >> hint address, the THP alignment logic modifies the behavior, poten= tially > > > >> ignoring the hint even if a sufficiently large gap exists at the r= equested > > > >> hint location. > > > >> > > > >> Example Scenario: > > > >> > > > >> Consider the following simplified virtual address (VA) space: > > > >> > > > >> ... > > > >> > > > >> 0x200000-0x400000 --- VMA A > > > >> 0x400000-0x600000 --- Hole > > > >> 0x600000-0x800000 --- VMA B > > > >> > > > >> ... > > > >> > > > >> A call to mmap() with hint=3D0x400000 and len=3D0x200000 behaves d= ifferently: > > > >> > > > >> - Before THP alignment: The requested region (size 0x200000) fit= s into > > > >> the gap at 0x400000, so the hint is respected. > > > >> > > > >> - After alignment: The logic searches for a region of size > > > >> 0x400000 (len + PMD_SIZE) starting at 0x400000. > > > >> This search fails due to the mapping at 0x600000 (VMA B), and = the hint > > > >> is ignored, falling back to arch_get_unmapped_area[_topdown]()= . > > > > > Hi all, Thanks for the reviews. > > > > Hmm looks like the search is not done in the optimal way regardless o= f > > > whether or not it ignores a hint - it should be able to find the hole= , no? > > It's not able to find the hole in the example case because the size we > are looking for is now > (requested size + padding len) which is larger than the hole we have > at the hint address. > > > > > > > >> In general the hint is effectively ignored, if there is any > > > >> existing mapping in the below range: > > > >> > > > >> [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE) > > > >> > > > >> This changes the semantics of mmap hint; from ""Respect the hint i= f a > > > >> sufficiently large gap exists at the requested location" to "Respe= ct the > > > >> hint only if an additional PMD-sized gap exists beyond the request= ed size". > > > >> > > > >> This has performance implications for allocators that allocate the= ir heap > > > >> using mmap but try to keep it "as contiguous as possible" by using= the > > > >> end of the exisiting heap as the address hint. With the new behavi= or > > > >> it's more likely to get a much less contiguous heap, adding extra > > > >> fragmentation and performance overhead. > > > >> > > > >> To restore the expected behavior; don't use thp_get_unmapped_area_= vmflags() > > > >> when the user provided a hint address. > > > > > > Agreed, the hint should take precendence. > > > > > > > Thanks for fixing it. I agree we should respect the hint address. B= ut > > > > this patch actually just fixed anonymous mapping and the file mappi= ngs > > > > which don't support thp_get_unmapped_area(). So I think you should > > > > move the hint check to __thp_get_unmapped_area(). > > > > > > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of > > > > anonymous mappings to PMD-aligned sizes") should be moved to there = too > > > > IMHO. > > > > > > This was brought up, but I didn't want to do it as part of the stable= fix as > > > that would change even situations that Rik's change didn't. > > > If the mmap hint change is another stable hotfix, I wouldn't conflate= it > > > either. But we can try it for further development. But careful about = just > > > moving the code as-is, the file-based mappings are different than ano= nymous > > > memory and I believe file offsets matter: > > > > > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse= .cz/ > > > > > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse= .cz/ > > > > I see, so I think we should keep the check here to address the > immediate regression for anonymous mappings. > > I believe what we would need to address this longer term is to have > vma_iter_lowest() [1] vma_iter_highest() [2] take into account the > alignment when doing the search. That way we don't need to inflate the > search size to facilitate the manual alignment after the fact. I > haven't looked too too deeply into this, maybe Liam has some ideas > about that? > > [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420 > > [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426 > > > Did some research about the history of the code, I found this commit: > > > > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint > > address and PMD alignment"), it tried to fix "the function would not > > try to allocate PMD-aligned area if *any* hint address specified." > > It was for file mapping back then since anonymous mapping THP > > alignment was not supported yet. > > > > But it seems like this patch somehow tried to do something reverse. It > > may not be correct either. > > > > IIUC Kirill's patch is doing the right thing (mostly), i.e. it will > return the hint address (without any rounding to PMD alignment). The > case it doesn't handle is what I mentioned above, where we aren't able > to find the hole at the hint address in the first place because the > hole is smaller than (size + padding len) Yes. But my point is your fix (just simply skip PMD alignment when hint is specified) actually broke what Kirill fixed IIUC. > > > With Vlastimis's fix, we just try to make the address THP aligned for > > anonymous mapping when the size is PMD aligned. So we don't need to > > take into account the padding for anonymous mapping anymore. > > > > We still need to have padding length because PMD alignment of the size > doesn't mean that the start address returned by the search will be PMD > aligned. Inherently those are only PAGE aligned. Yes, you are right, I overlooked this. I think we can do this in two passes. Use len w/o padding in the first pass. If the returned address equals the hint or it is already PMD aligned, just return it. Otherwise it means there is no hole with suitable size and alignment. In the second pass, we redo it with padding. Just off the top of my head, others may have better ideas. > > Thanks, > Kalesh > > > So IIUC we should do something like: > > > > @@ -1085,7 +1085,11 @@ static unsigned long > > __thp_get_unmapped_area(struct file *filp, > > if (off_end <=3D off_align || (off_end - off_align) < size) > > return 0; > > > > - len_pad =3D len + size; > > + if (filp) > > + len_pad =3D len + size; > > + else > > + len_pad =3D len; > > + > > if (len_pad < len || (off + len_pad) < off) > > return 0; > > > > > > > > >> Cc: Andrew Morton > > > >> Cc: Vlastimil Babka > > > >> Cc: Yang Shi > > > >> Cc: Rik van Riel > > > >> Cc: Ryan Roberts > > > >> Cc: Suren Baghdasaryan > > > >> Cc: Minchan Kim > > > >> Cc: Hans Boehm > > > >> Cc: Lokesh Gidra > > > >> Cc: > > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP b= oundaries") > > > >> Signed-off-by: Kalesh Singh > > > > > > Reviewed-by: Vlastimil Babka > > > > > > >> --- > > > >> mm/mmap.c | 1 + > > > >> 1 file changed, 1 insertion(+) > > > >> > > > >> diff --git a/mm/mmap.c b/mm/mmap.c > > > >> index 79d541f1502b..2f01f1a8e304 100644 > > > >> --- a/mm/mmap.c > > > >> +++ b/mm/mmap.c > > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigne= d long addr, unsigned long len, > > > >> if (get_area) { > > > >> addr =3D get_area(file, addr, len, pgoff, flags); > > > >> } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) > > > >> + && !addr /* no hint */ > > > >> && IS_ALIGNED(len, PMD_SIZE)) { > > > >> /* Ensures that larger anonymous mappings are THP = aligned. */ > > > >> addr =3D thp_get_unmapped_area_vmflags(file, addr,= len, > > > >> > > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623 > > > >> -- > > > >> 2.47.0.338.g60cca15819-goog > > > >> > > >