MIME-Version: 1.0
References: <20201103162740.6a7c835276b5a704d5b219cc@linux-foundation.org>
In-Reply-To: <20201103162740.6a7c835276b5a704d5b219cc@linux-foundation.org>
From: Hsin-Hui Wu
Date: Wed, 4 Nov 2020 08:55:11 -0500
Subject: Re: [Bug 210023] New: Crash when allocating > 2 TB memory
To: Andrew Morton
Cc: bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org
Content-Type: multipart/alternative; boundary="00000000000090883f05b3485498"
Sender: owner-linux-mm@kvack.org

--00000000000090883f05b3485498
Content-Type: text/plain; charset="UTF-8"

> > On a machine with 3 TB of memory (more than 2 TB), using vmalloc to
> > allocate 2 TB or more overflows the array_size variable below.
>
> How was this observed?
>
> Is there any known userspace operation which causes the kernel to try to
> vmalloc such a large hunk of memory?

[Frank] The Dell PowerEdge R740/R940 can have up to 3 TB/6 TB of memory
installed. Our application needs to reserve contiguous memory in kernel
space that is protected from userspace programs.

----------------------------------------------------------------------

> OK, thanks.  Against current mainline your proposed change would look
> like this, yes?

[Frank] Yes. This will support allocations of just under 16 TB. To support
16 TB or more, we also need to widen nr_pages to unsigned long, as
Matthew pointed out.

Would it be possible to add this fix to kernel 3.10.0-957.27.2.el7.x86_64?

Thanks,
Frank

On Tue, Nov 3, 2020 at 7:27 PM Andrew Morton wrote:

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 03 Nov 2020 18:50:07 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=210023
> >
> >             Bug ID: 210023
> >            Summary: Crash when allocating > 2 TB memory
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 3.10.0-957.27.2.el7.x86_64
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: blocking
> >           Priority: P1
> >          Component: Slab Allocator
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: hsinhuiwu@gmail.com
> >         Regression: No
> >
> > On a machine with 3 TB of memory (more than 2 TB), using vmalloc to
> > allocate 2 TB or more overflows the array_size variable below.
>
> How was this observed?
>
> Is there any known userspace operation which causes the kernel to try to
> vmalloc such a large hunk of memory?
>
> > array_size is an unsigned int, so it can only describe allocations of
> > less than 2 TB. If you pass 2*1024*1024*1024*1024 = 2 * 2^40 bytes as
> > the argument to vmalloc, that is 2^29 pages of 4 KB, and array_size
> > becomes 2^29 * 8 = 2^32. The value 2^32 cannot be stored in a 32-bit
> > integer.
> >
> > The fix is to change the type of array_size to unsigned long.
> >
> > vmalloc.c
> >
> > void *vmalloc(unsigned long size)
> > {
> >         return __vmalloc_node_flags(size, NUMA_NO_NODE,
> >                                     GFP_KERNEL | __GFP_HIGHMEM);
> > }
>
> OK, thanks.  Against current mainline your proposed change would look
> like this, yes?
>
> --- a/mm/vmalloc.c~a
> +++ a/mm/vmalloc.c
> @@ -2461,9 +2461,11 @@ static void *__vmalloc_area_node(struct
>  {
>          const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>          unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> -        unsigned int array_size = nr_pages * sizeof(struct page *), i;
> +        unsigned long array_size;
> +        unsigned int i;
>          struct page **pages;
>  
> +        array_size = (unsigned long)nr_pages * sizeof(struct page *);
>          gfp_mask |= __GFP_NOWARN;
>          if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
>                  gfp_mask |= __GFP_HIGHMEM;
> _
>

--00000000000090883f05b3485498
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--00000000000090883f05b3485498--