From: Barry Song <21cnbao@gmail.com>
Date: Fri, 26 Jul 2024 13:31:30 +1200
Subject: Re: [RFC PATCH v2] mm/vmalloc: fix incorrect __vmap_pages_range_noflush() if vm_area_alloc_pages() from high order fallback to order0
To: Hailong Liu
Cc: Baoquan He, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Vlastimil Babka, Michal Hocko, Matthew Wilcox, Tangquan Zheng, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240725164003.ft6huabwa5dqoy2g@oppo.com>
References: <20240725035318.471-1-hailong.liu@oppo.com> <20240725164003.ft6huabwa5dqoy2g@oppo.com>
On Fri, Jul 26, 2024 at 4:40 AM Hailong Liu wrote:
>
> On Thu, 25. Jul 19:39, Baoquan He wrote:
> > On 07/25/24 at 11:53am, hailong.liu@oppo.com wrote:
> > > From: "Hailong.Liu"
> > >
> > > The scenario where the issue occurs is as follows:
> > > CONFIG: vmap_allow_huge = true && 2M is for PMD_SIZE
> > > kvmalloc(2M, __GFP_NOFAIL|GFP_XXX)
> > >     __vmalloc_node_range(vm_flags=VM_ALLOW_HUGE_VMAP)
> > >         vm_area_alloc_pages(order=9) ---> order-9 allocation fails and falls back to order 0,
> > >                                           and phys_addr is aligned with PMD_SIZE
> > >         vmap_pages_range
> > >             vmap_pages_range_noflush
> > >                 __vmap_pages_range_noflush(page_shift = 21) ----> incorrect *huge* vmap here
> > >
> > > In fact, as long as page_shift is not equal to PAGE_SHIFT, there
> > > might be issues with __vmap_pages_range_noflush().
> > >
> > > The patch also removes VM_ALLOW_HUGE_VMAP in kvmalloc_node(). There
> > > are several reasons for this:
> > > - It increases the memory footprint because of alignment.
> > > - It increases the likelihood of kvmalloc allocation failures.
> > > - Without it, the original issue of kvmalloc with __GFP_NOFAIL
> > >   possibly returning NULL is fixed.
> > > Besides, if drivers want huge vmap mappings, they should use vmalloc_huge instead.
> >
> > There seem to be two issues you are folding into one patch:
>
> Got it. I will separate them in the next version.
>
> > one is the wrong information passed into __vmap_pages_range_noflush();
> > the other is that you want to take off VM_ALLOW_HUGE_VMAP on kvmalloc().
> >
> > About the 1st one, does the draft below look OK to you?
> >
> > Pass out the fallback order and adjust the order and shift for later
> > usage, mainly for vmap_pages_range().
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 260897b21b11..5ee9ae518f3d 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3508,9 +3508,9 @@ EXPORT_SYMBOL_GPL(vmap_pfn);
> >
> >  static inline unsigned int
> >  vm_area_alloc_pages(gfp_t gfp, int nid,
> > -		unsigned int order, unsigned int nr_pages, struct page **pages)
> > +		unsigned int *page_order, unsigned int nr_pages, struct page **pages)
> >  {
> > -	unsigned int nr_allocated = 0;
> > +	unsigned int nr_allocated = 0, order = *page_order;
> >  	gfp_t alloc_gfp = gfp;
> >  	bool nofail = gfp & __GFP_NOFAIL;
> >  	struct page *page;
> > @@ -3611,6 +3611,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >  		cond_resched();
> >  		nr_allocated += 1U << order;
> >  	}
> > +	*page_order = order;
> >
> >  	return nr_allocated;
> >  }
> > @@ -3654,7 +3655,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	page_order = vm_area_page_order(area);
> >
> >  	area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
> > -			node, page_order, nr_small_pages, area->pages);
> > +			node, &page_order, nr_small_pages, area->pages);
> >
> >  	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
> >  	if (gfp_mask & __GFP_ACCOUNT) {
> > @@ -3686,6 +3687,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  		goto fail;
> >  	}
> >
> > +
> > +	set_vm_area_page_order(area, page_order);
> > +	page_shift = page_order + PAGE_SHIFT;
> > +
> >  	/*
> >  	 * page tables allocations ignore external gfp mask, enforce it
> >  	 * by the scope API
> >
>
> The logic of this patch is somewhat similar to my first one. If the
> high-order allocation fails, it will fall back to normal mapping.
>
> However, I also save the fallback position. The pages before this position
> are used for huge mapping, and the ones >= the position for normal mapping,
> as Barry said: "support the combination of PMD and PTE mapping". This will
> take some time, as it needs to address the corner cases and some testing.
>
> IMO, the draft can fix the current issue, and it also does not have
> significant side effects. Barry, what do you think about this patch? If you
> think it's okay, I will split this patch into two: one to remove
> VM_ALLOW_HUGE_VMAP and the other to address the current mapping issue.

Yes, it's acceptable, even though it's not perfect. However, addressing
the mapping issue is an urgent requirement. Memory corruption is currently
occurring, and we need to ensure the fix reaches the stable kernel and
mainline as soon as possible.

Removing VM_ALLOW_HUGE_VMAP in kvmalloc is not as urgent. With Baoquan's
patch, we can even extend the fallback to the non-nofail case; then maybe
we can retain VM_ALLOW_HUGE_VMAP in kvmalloc. This at least fixes one
problem of kvmalloc: we are likely to fail because it is really difficult
to get 2MB of contiguous memory from the buddy allocator while memory is
fragmented.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index caf032f0bd69..6f47b01cbe2e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3585,11 +3585,11 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		else
 			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
 		if (unlikely(!page)) {
-			if (!nofail)
+			if (!nofail && order == 0)
 				break;
-
+			if (nofail)
+				alloc_gfp |= __GFP_NOFAIL;
 			/* fall back to the zero order allocations */
-			alloc_gfp |= __GFP_NOFAIL;
 			order = 0;
 			continue;
 		}

The other two optimizations can be deferred tasks:
1. to save memory, such as avoiding allocating 4MB when users request
   2.1MB with kvmalloc;
2. to do mixed and adaptive mapping; for example, for the 2.1MB kvmalloc:
   * if the first 2MB is contiguous, we map 0~2MB as PMD and 2MB~2.1MB as PTE;
   * if the first 2MB is not contiguous, we map the whole 0~2.1MB as PTE.

>
> --
> help you, help me,
> Hailong.

Thanks
Barry