From: Barry Song <21cnbao@gmail.com>
Date: Thu, 25 Jul 2024 18:21:33 +1200
Subject: Re: [RFC PATCH v2] mm/vmalloc: fix incorrect __vmap_pages_range_noflush() if vm_area_alloc_pages() from high order fallback to order0
To: hailong.liu@oppo.com
Cc: Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Vlastimil Babka, Michal Hocko, Baoquan He, Matthew Wilcox, "Tangquan . Zheng", linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240725035318.471-1-hailong.liu@oppo.com>
In-Reply-To: <20240725035318.471-1-hailong.liu@oppo.com>

On Thu, Jul 25, 2024 at 3:53 PM wrote:
>
> From: "Hailong.Liu"
>
> The scenario where the issue occurs is as follows:
> CONFIG: vmap_allow_huge = true && 2M is for PMD_SIZE
> kvmalloc(2M, __GFP_NOFAIL|GFP_XXX)
>     __vmalloc_node_range(vm_flags=VM_ALLOW_HUGE_VMAP)
>         vm_area_alloc_pages(order=9) --->allocs order9 failed and fallback to order0
>                                         and phys_addr is aligned with PMD_SIZE
>             vmap_pages_range
>                 vmap_pages_range_noflush
>                     __vmap_pages_range_noflush(page_shift = 21) ----> incorrect vmap *huge* here
>
> In fact, as long as page_shift is not equal to PAGE_SHIFT, there
> might be issues with the __vmap_pages_range_noflush().
>
> The patch also remove VM_ALLOW_HUGE_VMAP in kvmalloc_node(), There
> are several reasons for this:
> - This increases memory footprint because ALIGNMENT.
> - This increases the likelihood of kvmalloc allocation failures.
> - Without this it fixes the origin issue of kvmalloc with __GFP_NOFAIL may return NULL.
> Besides if drivers want to vmap huge, user vmalloc_huge instead.
>
> Fix it by disabling fallback and remove VM_ALLOW_HUGE_VMAP in
> kvmalloc_node().
> Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
>
> CC: Barry Song <21cnbao@gmail.com>
> CC: Baoquan He
> CC: Matthew Wilcox
> Reported-by: Tangquan.Zheng
> Signed-off-by: Hailong.Liu

The implementation of HUGE_VMAP appears to be quite disorganized. A major
change is needed.

1. When allocating 2.1MB with kvmalloc, we shouldn't allocate 4MB of memory,
which is what HUGE_VMAP does now. This is even worse than PMD-mapped THP for
userspace; we don't even do this for THP. The vmap could be done with one
PMD mapping plus 0.1MB of PTE mappings instead.

2. We need to allow fallback to order-0 pages if we're unable to allocate
2MB. In this case, we should perform PMD/PTE mapping based on how the pages
are acquired, rather than assuming they always form contiguous 2MB blocks.

3. Memory is entirely corrupted after Michal's "mm, vmalloc: fix high order
__GFP_NOFAIL allocations", but without it, forcing 2MB allocations was
causing OOM.

> ---
>  mm/util.c    | 2 +-
>  mm/vmalloc.c | 9 ---------
>  2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 669397235787..b23133b738cf 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -657,7 +657,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>          * protection games.
>          */
>         return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> -                       flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
> +                       flags, PAGE_KERNEL, 0,
>                         node, __builtin_return_address(0));

I'd vote +1 for this. We don't want to waste memory, for example wasting
1.9MB while allocating 2.1MB with kvmalloc. But this should be a separate
patch.

>  }
>  EXPORT_SYMBOL(kvmalloc_node);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 03c78fae06f3..1914768f473e 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3577,15 +3577,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>                         page = alloc_pages(alloc_gfp, order);
>                 else
>                         page = alloc_pages_node(nid, alloc_gfp, order);
> -               if (unlikely(!page)) {
> -                       if (!nofail)
> -                               break;
> -
> -                       /* fall back to the zero order allocations */
> -                       alloc_gfp |= __GFP_NOFAIL;
> -                       order = 0;
> -                       continue;
> -               }
>
>                 /*
>                  * Higher order allocations must be able to be treated as
> --
> After 1) I check the code and I can't find a resonable band-aid to fix
> this. so the v2 patch works but ugly. Glad to hear a better solution :)

This is still incorrect because it undoes Michal's work. We also need to
break the loop if (!nofail), which you're currently omitting.
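
To make the two requirements concrete (keep the order-0 fallback, still break out when !nofail, and map according to what was actually allocated), here is a rough user-space sketch. It is not kernel code: struct alloc_result, fake_alloc(), sim_alloc_pages() and sim_map_shift() are hypothetical names made up for illustration, a 4KB base page (shift of 12) is assumed, and the fake allocator fails high-order requests all-or-nothing, which is simpler than what a real system does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of one simulated vm_area_alloc_pages()-style run. */
struct alloc_result {
	unsigned int nr_allocated;	/* base pages obtained so far */
	unsigned int final_order;	/* order in use when the loop ended */
};

/* Stand-in for alloc_pages(): order-0 always succeeds in this model. */
static bool fake_alloc(unsigned int order, bool high_order_fails)
{
	return order == 0 || !high_order_fails;
}

static struct alloc_result
sim_alloc_pages(unsigned int nr_pages, unsigned int order,
		bool nofail, bool high_order_fails)
{
	struct alloc_result res = { 0, order };

	assert(order < 31);	/* keep the shift below well defined */

	while (res.nr_allocated < nr_pages) {
		if (!fake_alloc(res.final_order, high_order_fails)) {
			if (!nofail)
				break;	/* caller sees a short allocation */

			/* fall back to the zero order allocations */
			res.final_order = 0;
			continue;
		}
		res.nr_allocated += 1U << res.final_order;
	}
	return res;
}

/* The mapping shift follows the order that was really used. */
static unsigned int sim_map_shift(const struct alloc_result *res)
{
	return 12 + res->final_order;	/* assumes a base page shift of 12 */
}
```

In this model, a nofail request for 512 pages at order 9 that cannot get a high-order page still ends up with all 512 pages, but at order 0, so the mapping shift has to be 12 rather than 21. The same request without nofail simply ends short.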

To avoid reverting Michal's work, the simplest "fix" would be:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index caf032f0bd69..0011ca30df1c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3775,7 +3775,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
                return NULL;
        }

-       if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
+       if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP) && !(gfp_mask & __GFP_NOFAIL)) {
                unsigned long size_per_node;

                /*

>
> [1] https://lore.kernel.org/lkml/20240724182827.nlgdckimtg2gwns5@oppo.com/
> 2.34.1

Thanks
Barry
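
P.S. A note on the condition in the hunk above: the VM_ALLOW_HUGE_VMAP test and the __GFP_NOFAIL test have to be joined with a logical &&. With a bitwise & the flag mask would be ANDed with a 0/1 value and could only be non-zero if the flag were bit 0. The stand-alone sketch below shows the difference. The bit values are assumptions for illustration (SIM_GFP_NOFAIL in particular is a made-up stand-in, not the real __GFP_NOFAIL), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; check the kernel headers for the real ones. */
#define VM_ALLOW_HUGE_VMAP	0x00000400UL
#define SIM_GFP_NOFAIL		0x8000U		/* hypothetical stand-in */

/* Bitwise form: (mask) & (0 or 1) is always 0 unless the flag is bit 0. */
static bool huge_allowed_bitwise(unsigned long vm_flags, unsigned int gfp)
{
	return (vm_flags & VM_ALLOW_HUGE_VMAP) & !(gfp & SIM_GFP_NOFAIL);
}

/* Logical form: true only if the flag is set and NOFAIL is not requested. */
static bool huge_allowed_logical(unsigned long vm_flags, unsigned int gfp)
{
	return (vm_flags & VM_ALLOW_HUGE_VMAP) && !(gfp & SIM_GFP_NOFAIL);
}
```

With these assumed values the bitwise variant never enables the huge path, even for a plain VM_ALLOW_HUGE_VMAP request, while the logical variant disables it only when the NOFAIL bit is present.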