From: Barry Song <21cnbao@gmail.com>
Date: Fri, 26 Jul 2024 17:29:44 +1200
Subject: Re: [RFC PATCH v2] mm/vmalloc: fix incorrect __vmap_pages_range_noflush() if vm_area_alloc_pages() from high order fallback to order0
To: Hailong Liu
Cc: Baoquan He, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Vlastimil Babka, Michal Hocko, Matthew Wilcox, Tangquan Zheng, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240726050356.ludmpxfee6erlxxt@oppo.com>
References: <20240725035318.471-1-hailong.liu@oppo.com> <20240725164003.ft6huabwa5dqoy2g@oppo.com> <20240726040052.hs2gvpktrnlbvhsq@oppo.com> <20240726050356.ludmpxfee6erlxxt@oppo.com>

On Fri, Jul 26, 2024 at 5:04 PM Hailong Liu wrote:
>
> On Fri, 26. Jul 12:00, Hailong Liu wrote:
> > On Fri, 26. Jul 10:31, Baoquan He wrote:
> > [...]
> > > > The logic of this patch is somewhat similar to my first one. If high order
> > > > allocation fails, it will go normal mapping.
> > > >
> > > > However, I also save the fallback position. The ones before this position are
> > > > used for huge mapping, the ones >= position for normal mapping, as Barry said:
> > > > "support the combination of PMD and PTE mapping". This will take some
> > > > time as it needs to address the corner cases and do some tests.
> > >
> > > Hmm, we may not need to worry about the imperfect mapping. Currently
> > > there are two places setting VM_ALLOW_HUGE_VMAP: __kvmalloc_node_noprof()
> > > and vmalloc_huge().
> > >
> > > For vmalloc_huge(), it's called in the three places below, which are all
> > > invoked during boot. Basically they can succeed in getting the required
> > > contiguous physical memory. I guess that's why Tangquan only spotted this
> > > issue on kvmalloc invocation when the required size exceeds e.g. 2M. For
> > > kvmalloc_node(), we have said in the code comment above
> > > __kvmalloc_node_noprof() that it's a best-effort behaviour.
> > >
> > Take __vmalloc_node_range(2.1M, VM_ALLOW_HUGE_VMAP) as an example.
> > Because of the alignment requirement of huge mapping, the real size is 4M.
> > If the first order-9 allocation succeeds and the next one fails, then because
> > of the fallback the page layout would be like one order-9 block followed by
> > 512 order-0 pages. Order-9 supports huge mapping, but order-0 does not.
> > With the patch above, it would call vmap_small_pages_range_noflush() and do
> > normal mapping, so the huge mapping would not exist.
> >
> > > mm/mm_init.c <>
> > > table = vmalloc_huge(size, gfp_flags);
> > > net/ipv4/inet_hashtables.c <>
> > > new_hashinfo->ehash = vmalloc_huge(ehash_entries * sizeof(struct inet_ehash_bucket),
> > > net/ipv4/udp.c <>
> > > udptable->hash = vmalloc_huge(hash_entries * 2 * sizeof(struct udp_hslot)
> > >
> > > Maybe we should add a code comment or documentation to tell people that
> > > contiguous physical pages are not guaranteed for vmalloc_huge() if it is
> > > used after boot.
> > >
> > > >
> > > > IMO, the draft can fix the current issue, and it does not have significant
> > > > side effects. Barry, what do you think about this patch? If you think it's
> > > > okay, I will split this patch into two: one to remove VM_ALLOW_HUGE_VMAP and
> > > > the other to address the current mapping issue.
> > > >
> > > > --
> > > > help you, help me,
> > > > Hailong.
> > >
> >
> I checked the code: the issue only happens when gfp_mask has __GFP_NOFAIL and
> the allocation falls back to order 0. Actually, without commit
> e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"),
> if the __vmalloc_area_node() allocation failed, it would go to "fail" and
> retry with order-0.
>
> fail:
>         if (shift > PAGE_SHIFT) {
>                 shift = PAGE_SHIFT;
>                 align = real_align;
>                 size = real_size;
>                 goto again;
>         }
>
> So do we really need the fallback to order-0 if nofail?

Good catch, this is what I missed. I feel we can revert Michal's fix and just
drop the __GFP_NOFAIL bit while we are still allocating with a high order.
When "goto again" happens, we allocate with order-0, and in that case we keep
__GFP_NOFAIL, something like the untested sketch in the PS below.

> >
> > --
> > help you, help me,
> > Hailong.

Thanks
Barry
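
PS: for illustration only, a rough and untested sketch of the idea, not an
actual patch. It assumes the current __vmalloc_area_node() and
vm_area_alloc_pages() code in mm/vmalloc.c, and the exact placement and
naming are guesses:

	/*
	 * Clear __GFP_NOFAIL for the high-order attempt only.  If that
	 * attempt fails, __vmalloc_node_range() takes the "fail:" path
	 * quoted above, resets shift to PAGE_SHIFT and does "goto again",
	 * and the order-0 retry still carries __GFP_NOFAIL.
	 */
	unsigned int page_order = vm_area_page_order(area);
	gfp_t alloc_gfp = gfp_mask | __GFP_NOWARN;

	if (page_order > 0)
		alloc_gfp &= ~__GFP_NOFAIL;	/* high-order attempt may fail */

	area->nr_pages = vm_area_alloc_pages(alloc_gfp, node, page_order,
					     nr_small_pages, area->pages);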