References: <20231214223423.1133074-1-yang@os.amperecomputing.com> <1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.com>
From: Yang Shi
Date: Mon, 22 Jan 2024 11:43:53 -0800
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries
To: Ryan Roberts
Cc: Matthew Wilcox, Yang Shi, riel@surriel.com, cl@linux.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts wrote:
>
> On 20/01/2024 16:39, Matthew Wilcox wrote:
> > On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >> gap between each VMA.
> >> This causes 2 problems: 1) mmap becomes MUCH slower
> >> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >> causes a subsequent calloc() to fail, which causes the test to fail.
> >>
> >> Looking at the code, I think the problem is that arm64 selects
> >> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >
> > As a quick hack, perhaps
> >
> > #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> >     take-the-top-half
> > #else
> >     current-take-bottom-half-code
> > #endif
> >
> > ?

Thanks for the suggestion. It makes sense to me. Doing the alignment
needs to take this into account.

> There is a general problem though: there is a trade-off between abutting
> VMAs and aligning them to PMD boundaries. This patch has decided that in
> general the latter is preferable. The case I'm hitting is special though, in
> that both requirements could be achieved, but currently they are not.
>
> The below fixes it, but I feel like there should be some bitwise magic that
> would give the correct answer without the conditional - but my head is gone
> and I can't see it. Any thoughts?

Thanks Ryan for the patch. TBH I didn't see any bitwise magic that avoids
the conditional either.

> Beyond this, though, there is also a latent bug where the offset provided to
> mmap() is carried all the way through to the get_unmapped_area()
> implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> that use the default get_unmapped_area(), any non-zero offset would not have
> been used. But this change starts using it, which is incorrect. That said, there
That said= , there > are some arches that override the default get_unmapped_area() and do use = the > offset. So I'm not sure if this is a bug or a feature that user space can= pass > an arbitrary value to the implementation for anon memory?? Thanks for noticing this. If I read the code correctly, the pgoff used by some arches to workaround VIPT caches, and it looks like it is for shared mapping only (just checked arm and mips). And I believe everybody assumes 0 should be used when doing anonymous mapping. The offset should have nothing to do with seeking proper unmapped virtual area. But the pgoff does make sense for file THP due to the alignment requirements. I think it should be zero'ed for anonymous mappings, like: diff --git a/mm/mmap.c b/mm/mmap.c index 2ff79b1d1564..a9ed353ce627 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, pgoff =3D 0; get_area =3D shmem_get_unmapped_area; } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { + pgoff =3D 0; /* Ensures that larger anonymous mappings are THP aligned. = */ get_area =3D thp_get_unmapped_area; } > > Finally, the second test failure I reported (ksm_tests) is actually cause= d by a > bug in the test code, but provoked by this change. So I'll send out a fix= for > the test code separately. 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4f542444a91f..68ac54117c77 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>  {
>  	loff_t off_end = off + len;
>  	loff_t off_align = round_up(off, size);
> -	unsigned long len_pad, ret;
> +	unsigned long len_pad, ret, off_sub;
>
>  	if (off_end <= off_align || (off_end - off_align) < size)
>  		return 0;
>
> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>  	if (ret == addr)
>  		return addr;
>
> -	ret += (off - ret) & (size - 1);
> +	off_sub = (off - ret) & (size - 1);
> +
> +	if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> +	    !off_sub)
> +		return ret + size;
> +
> +	ret += off_sub;
>  	return ret;
>  }

I didn't spot any problem. Would you please come up with a formal patch?