From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta12.messagelabs.com (mail6.bemta12.messagelabs.com [216.82.250.247])
        by kanga.kvack.org (Postfix) with ESMTP id 84E546B0023
        for ; Wed, 18 May 2011 20:51:07 -0400 (EDT)
Received: by bwz17 with SMTP id 17so2894292bwz.14
        for ; Wed, 18 May 2011 17:51:03 -0700 (PDT)
MIME-Version: 1.0
Reply-To: aquini@linux.com
In-Reply-To: <20110518153445.GA18127@sgi.com>
References: <20110518153445.GA18127@sgi.com>
Date: Wed, 18 May 2011 21:51:03 -0300
Message-ID:
Subject: Re: [PATCH] [BUGFIX] mm: hugepages can cause negative commitlimit
From: Rafael Aquini
Content-Type: multipart/alternative; boundary=00032555aefe4da90e04a39666ee
Sender: owner-linux-mm@kvack.org
List-ID:
To: Russ Anderson
Cc: Andrea Arcangeli, linux-mm, linux-kernel, Christoph Lameter, Andrew Morton

--00032555aefe4da90e04a39666ee
Content-Type: text/plain; charset=ISO-8859-1

Howdy,

On Wed, May 18, 2011 at 12:34 PM, Russ Anderson wrote:

> If the total size of hugepages allocated on a system is
> over half of the total memory size, commitlimit becomes
> a negative number.
>
> What happens in fs/proc/meminfo.c is this calculation:
>
>         allowed = ((totalram_pages - hugetlb_total_pages())
>                 * sysctl_overcommit_ratio / 100) + total_swap_pages;
>
> The problem is that hugetlb_total_pages() is larger than
> totalram_pages resulting in a negative number.  Since
> allowed is an unsigned long the negative shows up as a
> big number.
>
> A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.
>
> A symptom of this problem is that /proc/meminfo prints a
> very large CommitLimit number.
>
> CommitLimit:    737869762947802600 kB
>
> To reproduce the problem reserve over half of memory as hugepages.
> For example "default_hugepagesz=1G hugepagesz=1G hugepages=64
> Then look at /proc/meminfo "CommitLimit:" to see if it is too big.
>
> The fix is to not subtract hugetlb_total_pages().  When hugepages
> are allocated totalram_pages is decremented so there is no need to
> subtract out hugetlb_total_pages() a second time.
>
> Reported-by: Russ Anderson
> Signed-off-by: Russ Anderson
>
> ---
>
> Example of "CommitLimit:" being too big.
>
> uv1-sys:~ # cat /proc/meminfo
> MemTotal:       32395508 kB
> MemFree:        32029276 kB
> Buffers:            8656 kB
> Cached:            89548 kB
> SwapCached:            0 kB
> Active:            55336 kB
> Inactive:          73916 kB
> Active(anon):      31220 kB
> Inactive(anon):       36 kB
> Active(file):      24116 kB
> Inactive(file):    73880 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:              1692 kB
> Writeback:             0 kB
> AnonPages:         31132 kB
> Mapped:            15668 kB
> Shmem:               152 kB
> Slab:              70256 kB
> SReclaimable:      17148 kB
> SUnreclaim:        53108 kB
> KernelStack:        6536 kB
> PageTables:         3704 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    737869762947802600 kB
> Committed_AS:     394044 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      713960 kB
> VmallocChunk:   34325764204 kB
> HardwareCorrupted:     0 kB
> HugePages_Total:      32
> HugePages_Free:       32
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:    1048576 kB
> DirectMap4k:       16384 kB
> DirectMap2M:     2064384 kB
> DirectMap1G:    65011712 kB
>
>  fs/proc/meminfo.c |    2 +-
>  mm/mmap.c         |    3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)
>
> Index: linux/fs/proc/meminfo.c
> ===================================================================
> --- linux.orig/fs/proc/meminfo.c        2011-05-17 16:03:50.935658801 -0500
> +++ linux/fs/proc/meminfo.c     2011-05-18 08:53:00.568784147 -0500
> @@ -36,7 +36,7 @@ static int meminfo_proc_show(struct seq_
>         si_meminfo(&i);
>         si_swapinfo(&i);
>         committed = percpu_counter_read_positive(&vm_committed_as);
> -       allowed = ((totalram_pages - hugetlb_total_pages())
> +       allowed = (totalram_pages
>                 * sysctl_overcommit_ratio / 100) + total_swap_pages;
>
>         cached = global_page_state(NR_FILE_PAGES) -
> Index: linux/mm/mmap.c
> ===================================================================
> --- linux.orig/mm/mmap.c        2011-05-17 16:03:51.727658828 -0500
> +++ linux/mm/mmap.c     2011-05-18 08:54:34.912222405 -0500
> @@ -167,8 +167,7 @@ int __vm_enough_memory(struct mm_struct
>                 goto error;
>         }
>
> -       allowed = (totalram_pages - hugetlb_total_pages())
> -               * sysctl_overcommit_ratio / 100;
> +       allowed = totalram_pages * sysctl_overcommit_ratio / 100;
>         /*
>          * Leave the last 3% for root
>          */
> --
> Russ Anderson, OS RAS/Partitioning Project Lead
> SGI - Silicon Graphics Inc          rja@sgi.com

I'm afraid this will introduce a bug in how accurately the kernel accounts
memory for overcommit limits.

totalram_pages is not decremented as hugepages are allocated. Since
hugepages are reserved, hugetlb_total_pages() has to be accounted for and
subtracted from totalram_pages in order to give an accurate number of
remaining pages available for general memory commitment.

I've tried to reproduce your findings on my boxes, without success,
unfortunately.

I'll keep trying to hit this behaviour, though.

Cheers!
--aquini
--00032555aefe4da90e04a39666ee--
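
The wraparound described in the quoted patch can be checked against the
/proc/meminfo figures it includes. The sketch below is not kernel code: the
function names are made up for illustration, uint64_t stands in for the
kernel's unsigned long on a 64-bit machine, and it assumes 4 kB base pages
and the default overcommit ratio of 50, neither of which is stated in the
thread.

```c
#include <stdint.h>

/* Assumption: 4 kB base pages, so one page is 4 kB in /proc/meminfo units. */
#define KB_PER_PAGE 4u

/*
 * Pre-patch formula: subtract the hugetlb pages before scaling.
 * The subtraction is unsigned, so it wraps modulo 2^64 whenever
 * hugetlb_pages is larger than totalram_pages.
 */
uint64_t allowed_subtracting(uint64_t totalram_pages,
                             uint64_t hugetlb_pages,
                             uint64_t overcommit_ratio,
                             uint64_t total_swap_pages)
{
    return (totalram_pages - hugetlb_pages) * overcommit_ratio / 100
           + total_swap_pages;
}

/* Formula from the quoted diff: no hugetlb subtraction at all. */
uint64_t allowed_patched(uint64_t totalram_pages,
                         uint64_t overcommit_ratio,
                         uint64_t total_swap_pages)
{
    return totalram_pages * overcommit_ratio / 100 + total_swap_pages;
}

/* Convert a page count to kB, as /proc/meminfo reports it. */
uint64_t pages_to_kb(uint64_t pages)
{
    return pages * KB_PER_PAGE;
}
```

With the reported MemTotal of 32395508 kB (8098877 pages), 32 hugepages of
1 GB (8388608 pages), ratio 50 and no swap,
pages_to_kb(allowed_subtracting(8098877, 8388608, 50, 0)) gives
737869762947802600, the CommitLimit shown in the report, while
pages_to_kb(allowed_patched(8098877, 50, 0)) gives 16197752. On a
hypothetical machine where totalram_pages still included the hugepages (say
16777216 pages with the same 8388608 hugetlb pages), the subtracting form
gives 16777216 kB and the patched form gives 33554432 kB, which is the
overcount the reply is concerned about.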