From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Shi <shy828301@gmail.com>
Date: Tue, 23 Jan 2024 09:14:27 -0800
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries
To: Ryan Roberts
Cc: Matthew Wilcox, Yang Shi, riel@surriel.com, cl@linux.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20231214223423.1133074-1-yang@os.amperecomputing.com> <1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.com>
On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts wrote:
>
> On 22/01/2024 19:43, Yang Shi wrote:
> > On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts wrote:
> >>
> >> On 20/01/2024 16:39, Matthew Wilcox wrote:
> >>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >>>> gap
> >>>> between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >>>> causes a subsequent calloc() to fail, which causes the test to fail.
> >>>>
> >>>> Looking at the code, I think the problem is that arm64 selects
> >>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >>>
> >>> As a quick hack, perhaps
> >>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> >>> take-the-top-half
> >>> #else
> >>> current-take-bottom-half-code
> >>> #endif
> >>>
> >>> ?
> >
> > Thanks for the suggestion. It makes sense to me. Doing the alignment
> > needs to take this into account.
> >
> >>
> >> There is a general problem though: there is a trade-off between abutting
> >> VMAs and aligning them to PMD boundaries. This patch has decided that in
> >> general the latter is preferable. The case I'm hitting is special though, in
> >> that both requirements could be achieved but currently are not.
> >>
> >> The below fixes it, but I feel like there should be some bitwise magic that
> >> would give the correct answer without the conditional - but my head is gone and
> >> I can't see it. Any thoughts?
> >
> > Thanks Ryan for the patch. TBH I didn't see any bitwise magic without
> > the conditional either.
> >
> >>
> >> Beyond this, though, there is also a latent bug where the offset provided to
> >> mmap() is carried all the way through to the get_unmapped_area()
> >> implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> >> force-zeroing it for MAP_ANONYMOUS?
> >> Certainly before this change, for arches
> >> that use the default get_unmapped_area(), any non-zero offset would not have
> >> been used. But this change starts using it, which is incorrect. That said, there
> >> are some arches that override the default get_unmapped_area() and do use the
> >> offset. So I'm not sure if this is a bug, or a feature that user space can pass
> >> an arbitrary value to the implementation for anon memory?
> >
> > Thanks for noticing this. If I read the code correctly, the pgoff is used
> > by some arches to work around VIPT caches, and it looks like it is for
> > shared mappings only (just checked arm and mips). And I believe
> > everybody assumes 0 should be used when doing an anonymous mapping. The
> > offset should have nothing to do with seeking a proper unmapped virtual
> > area. But the pgoff does make sense for file THP due to the alignment
> > requirements. I think it should be zeroed for anonymous mappings,
> > like:
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 2ff79b1d1564..a9ed353ce627 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> >                 pgoff = 0;
> >                 get_area = shmem_get_unmapped_area;
> >         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > +               pgoff = 0;
> >                 /* Ensures that larger anonymous mappings are THP aligned. */
> >                 get_area = thp_get_unmapped_area;
> >         }
>
> I think it would be cleaner to just zero pgoff if file == NULL; then it covers
> the shared case, the THP case, and the non-THP case properly. I'll prepare a
> separate patch for this.

IIUC I don't think this is OK for those arches which have to work around
VIPT caches, since MAP_ANONYMOUS | MAP_SHARED with a NULL file pointer is a
common case for creating a tmpfs mapping. For example, arm's
arch_get_unmapped_area() has:

        if (aliasing)
                do_align = filp || (flags & MAP_SHARED);

The pgoff is needed if do_align is true.
So we should just zero pgoff iff !file && !MAP_SHARED, like what my patch
does; we can move the zeroing to a better place.

> >
> >>
> >> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> >> bug in the test code, but provoked by this change. So I'll send out a fix for
> >> the test code separately.
> >>
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 4f542444a91f..68ac54117c77 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>  {
> >>         loff_t off_end = off + len;
> >>         loff_t off_align = round_up(off, size);
> >> -       unsigned long len_pad, ret;
> >> +       unsigned long len_pad, ret, off_sub;
> >>
> >>         if (off_end <= off_align || (off_end - off_align) < size)
> >>                 return 0;
> >> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>         if (ret == addr)
> >>                 return addr;
> >>
> >> -       ret += (off - ret) & (size - 1);
> >> +       off_sub = (off - ret) & (size - 1);
> >> +
> >> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> >> +           !off_sub)
> >> +               return ret + size;
> >> +
> >> +       ret += off_sub;
> >>         return ret;
> >>  }
> >
> > I didn't spot any problem, would you please come up with a formal patch?
>
> Yeah, I'll aim to post today. Thanks!