* The 4GB memory thing
From: Neil Conway <nconway.list@ukaea.org.uk>
To: "linux-smp@vger.rutgers.edu" <linux-smp@vger.rutgers.edu>
Cc: Linux MM
Date: Wed, 03 Nov 1999 09:46:14 +0000
Message-ID: <38200466.5839E78E@ukaea.org.uk>

The recent thread about >4GB surprised me, as I didn't even think >2GB
was very stable yet.  Am I wrong?  Are people out there using 4GB boxes
with decent stability?  I presume it's a 2.3 feature, yes?

Sorry for my ignorance, I guess I've been dozing a bit of late.

Neil
* The 64GB memory thing
From: Ingo Molnar
To: Neil Conway
Cc: Linux MM
Date: Wed, 3 Nov 1999 16:25 UTC

On Wed, 3 Nov 1999, Neil Conway wrote:

> The recent thread about >4GB surprised me, as I didn't even think >2GB
> was very stable yet.  Am I wrong?  Are people out there using 4GB
> boxes with decent stability?  I presume it's a 2.3 feature, yes?

The 64GB stuff got included recently.  It's a significant rewrite of
the low-level x86 MM and the generic MM layer; here is a short
description of it.

My 'HIGHMEM patch' went into the 2.3 kernel starting at pre4-2.3.23.
This means a heavily rewritten VM subsystem (to deal with ptes bigger
than the machine word size) and a much-rewritten x86 memory and boot
architecture.  In fact there is no bigmem anymore; it has been replaced
by (the, I think, more correct term) 'high memory'.

It utilizes 3-level page tables on PPro+ CPUs, called 'Physical Address
Extension' (PAE) mode.  In PAE mode the CPU uses a completely different
and incompatible page-table structure, which is 3-level, has 64-bit
page table entries, and covers up to 64GB of physical RAM.  Virtual
space is unchanged at 4GB.  Highmem is completely transparent to
user space.

There is a new 'High Memory Support' option under 'Processor type and
features':

    High Memory Support
    ( ) off
    ( ) 4GB
    (X) 64GB

'off' is for up to 1GB RAM, utilizing 2-level page tables and no
highmem support.  '4GB' utilizes 2-level page tables plus high-memory
support for any <4GB physical RAM that cannot be permanently mapped by
the kernel.  '64GB' mode utilizes 3-level page tables (for everything).
The theoretical limit of high memory on IA32 boxes is 16TB: there is
lots of space in the 64-bit PAE ptes, although current CPUs only
support up to 64GB RAM.  (The biggest current chipsets support up to
32GB RAM.)

About the structure of the patch/feature itself, kernel internals:
pgtable.h got split up into pgtable-2level.h and pgtable-3level.h,
which should be the only 'global #ifdef' distinguishing 3-level from
2-level page tables on x86.  Lots of places throughout the arch/i386
tree assumed 2-level page tables; these all had to be fixed and
converted to generic 3-level page-table code.  There are only a few
CONFIG_X86_PAE #ifdefs left; I intend to cut the number of these down
even more, to keep the x86 low-level MM/boot code easy to maintain.

The generic kernel was almost safe wrt. 3-level page tables, but it
nevertheless had bugs which only triggered in PAE mode.  For example,
one pgd entry in PAE mode covers 1GB of virtual memory, and some loops
which iterated through virtual memory had buggy exit conditions and
broke in subtle ways when they were running in the upper-most 1GB of
virtual memory (i.e. kernel space).  There were about 20 such buggy
loops throughout the MM code.

The much bigger generic change was that ptes became 64-bit although the
architecture itself is still 32-bit.  Lots of VM-internal code had to
be reworked to never assume that sizeof(pte_t) == sizeof(unsigned
long); examples are the swapping code and IPC shared memory.  Ptes
bigger than the machine word were not supported by Linux previously.
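To make the loop-bug class above concrete, here is a minimal sketch in
C.  It is not the actual 2.3 code: walk_broken/walk_fixed and
PAE_PGDIR_SIZE are illustrative names (the kernel's own constant for
the span of one pgd entry is PGDIR_SIZE).

    /* With PAE, one pgd entry spans 1GB.  Stepping a 32-bit address
     * through the top-most 1GB of the 4GB virtual space overflows past
     * 0xffffffff, so a naive exit test misbehaves. */
    #define PAE_PGDIR_SIZE (1UL << 30)           /* 1GB per pgd entry */

    void walk_broken(unsigned long start, unsigned long end)
    {
            unsigned long addr;
            /* if start lies in the top pgd slot, addr + PAE_PGDIR_SIZE
             * wraps to a small value, so the loop never terminates or
             * walks the wrong addresses */
            for (addr = start; addr < end; addr += PAE_PGDIR_SIZE)
                    ;  /* ... operate on the pgd slot covering addr ... */
    }

    void walk_fixed(unsigned long start, unsigned long end)
    {
            unsigned long addr = start;
            do {
                    unsigned long next = (addr + PAE_PGDIR_SIZE)
                                         & ~(PAE_PGDIR_SIZE - 1);
                    if (next <= addr || next > end) /* wrapped or past end */
                            next = end;
                    /* ... operate on the range [addr, next) ... */
                    addr = next;
            } while (addr < end);
    }

(The real fixes take various forms; the point is only that every
advance of the iteration variable has to survive 32-bit wraparound once
kernel space occupies a whole pgd slot.)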
Also, I guess many of you have noticed the new mm/bootmem.c allocator.
This was necessary because on my 8GB box mem_map is more than 100MB
(!), and the 'naive' boot-time allocation we did in earlier kernels
simply did not work on 'slightly noncontiguous' physical maps like my
box has.  (At 64MB there is an ACPI area which caused problems.)

[This short description should give you a sense of the scope of the
changes, and I/we are still fixing some of the impact in 2.3.25.
(Christoph just posted his problems with IPC shared memory.)]

Backporting to 2.2: while the bigmem patch was small and simple and got
backported to 2.2, the highmem patch is basically impossible to
backport in a maintainable way, as it touches some 60 files all over
the kernel.

64GB PAE mode works just fine on my 8GB RAM, 8-way Xeon box:

    11:25pm  up 5 min,  2 users,  load average: 7.78, 4.30, 1.77
    30 processes: 21 sleeping, 9 running, 0 zombie, 0 stopped
    CPU states:  0.0% user,  7.2% system, 92.8% nice,  0.0% idle
    Mem:  8241152K av, 7720960K used, 520192K free,   0K shrd, 2168K buff
    Swap:       0K av,       0K used,      0K free,          9756K cached

     PID USER  PRI NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM  TIME COMMAND
     215 root   19 19 1000M 1.0G   232 R N  1.0G 11.6 12.4  0:11 db_serv
     180 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:31 db_serv
     182 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
     183 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
     184 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
     185 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
     186 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:29 db_serv
     247 root   19 19  500M 500M   232 R N     0 11.5  6.2  0:04 db_serv2
     181 root    1  0   996  996   828 R       0  7.2  0.0  0:16 top
     177 root    0  0   984  984   768 S       0  0.1  0.0  0:00 bash
       1 root    0  0   476  476   408 S       0  0.0  0.0  0:00 init

(These are 8x ~1GB-RSS processes using up all 8GB physical RAM, one
running on each CPU.)

Future plans: right now high memory is seriously underused on typical
servers, due to the page cache still being in low memory.  On 2.2 with
bigmem the lowmem:highmem ratio is around 5:1; this means the
'effective' size of my 8GB box on 2.2 is only ~2.4GB.  The exception is
workloads where most memory is allocated as shared memory or user
memory, but that is not the case for a typical web or file server.  On
my box the pagecache is already in high memory (and we are ready to add
64-bit DMA to device drivers), and the lowmem:highmem ratio is up to
1:10.  This means that 8GB RAM is already fully utilized under a
typical server workload.

	Ingo

ps. To have correct memory statistics (top, vmstat, free) with >4GB RAM
you need the newest procps package (or I can send the patches).
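The bitmap idea behind mm/bootmem.c that Ingo mentions, as a minimal
self-contained sketch.  The names and sizes here are illustrative, not
the kernel's exact interface (the kernel exposes calls along the lines
of reserve_bootmem()/alloc_bootmem()):

    #define PAGE_SHIFT 12

    /* one bit per 4K physical page; sized here for 8GB of RAM */
    static unsigned int bootmap[65536];     /* 65536 * 32 = 2M pages */
    static unsigned long max_pfn = 65536UL * 32;

    /* mark a physical byte range in use, e.g. the ACPI hole at 64MB */
    void boot_reserve(unsigned long long addr, unsigned long long size)
    {
            unsigned long pfn = addr >> PAGE_SHIFT;
            unsigned long end = ((addr + size - 1) >> PAGE_SHIFT) + 1;
            for (; pfn < end; pfn++)
                    bootmap[pfn >> 5] |= 1U << (pfn & 31);
    }

    /* first-fit scan for n contiguous free pages; returns pfn or -1 */
    long boot_alloc(unsigned long n)
    {
            unsigned long pfn, run = 0;
            for (pfn = 0; pfn < max_pfn; pfn++) {
                    if (bootmap[pfn >> 5] & (1U << (pfn & 31))) {
                            run = 0;
                    } else if (++run == n) {
                            unsigned long first = pfn - n + 1;
                            boot_reserve((unsigned long long)first << PAGE_SHIFT,
                                         (unsigned long long)n << PAGE_SHIFT);
                            return (long)first;
                    }
            }
            return -1;      /* no hole big enough */
    }

Reserving the holes up front is what makes a 'slightly noncontiguous'
physical map a non-event: the allocator simply never hands those pages
out.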
* Re: The 64GB memory thing
From: Neil Conway
To: Ingo Molnar
Cc: Linux MM
Date: Wed, 3 Nov 1999 15:46 UTC

Ingo Molnar wrote:
>
> On Wed, 3 Nov 1999, Neil Conway wrote:
>
> > The recent thread about >4GB surprised me, as I didn't even think >2GB
> > was very stable yet.  Am I wrong?  Are people out there using 4GB
> > boxes with decent stability?  I presume it's a 2.3 feature, yes?
>
> the 64GB stuff got included recently. It's a significant rewrite of the
> lowlevel x86 MM and generic MM layer, here is a short description about
> it:
>
> my 'HIGHMEM patch' went into the 2.3 kernel starting at pre4-2.3.23. This
> ...

Wow, that's good news.  But hang on a second ;-) wasn't there a feature
freeze at 2.3.18?

And presumably each process is still limited to a 32-bit address space,
right?

As for stability, anyone got any comments?

> 64 GB PAE mode works just fine on my 8GB RAM, 8-way Xeon box:

Mmmmm :-)  Could you give us the source and a ballpark price on that,
please?

Thanks for the update,
Neil
* Re: The 64GB memory thing
From: Ingo Molnar
To: Neil Conway
Cc: Linux MM
Date: Wed, 3 Nov 1999 17:09 UTC

On Wed, 3 Nov 1999, Neil Conway wrote:

> > the 64GB stuff got included recently. It's a significant rewrite of the
> > lowlevel x86 MM and generic MM layer, here is a short description about
> > it:
> >
> > my 'HIGHMEM patch' went into the 2.3 kernel starting at pre4-2.3.23. This
> > ...
>
> Wow, that's good news.  But hang on a second ;-) wasn't there a
> feature freeze at 2.3.18?

Most of the changes are in the cleanup category, but yes, it's a
boundary case.  I had to, and still have to, work hard to make this as
painless as possible...

> And presumably each process is still limited to a 32-bit address
> space, right?

Yes, this is a fundamental limitation of x86 processors.  Under Linux,
in all three high-memory modes, user-space virtual memory is 3GB.
Nevertheless, on an 8-way box you likely want to run either lots of
processes, or a few (but more than 8) processes/threads, to use up all
available CPU time.  This means that with 8x 2GB-RSS number-crunching
processes we already cover 16GB RAM, so it's not at all unrealistic to
have support for more than 4GB RAM!  The foundation for this is that
under Linux all 64GB RAM can be mapped into user processes
transparently.  I believe other x86 Unices (not to talk about NT) do
not have this property; they handle 'high memory' as a special kind of
RAM which can only be accessed through special system calls.

	mingo
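A sketch of why the mapping can be transparent: a PAE pte is 64 bits
wide, so the physical frame number of a page above the 4GB line fits in
the entry even though the process's virtual addresses stay 32-bit.  The
field masks below follow Intel's 36-bit PAE layout; the helper is
illustrative, not kernel code:

    typedef unsigned long long pae_pte_t;       /* 64-bit entry */

    #define PAGE_SHIFT   12
    #define PTE_PFN_MASK 0x0000000ffffff000ULL  /* 36-bit phys space */
    #define PTE_PRESENT  0x001ULL
    #define PTE_RW       0x002ULL
    #define PTE_USER     0x004ULL

    /* build a user-visible pte for a physical page, possibly >4GB */
    pae_pte_t mk_user_pte(unsigned long long phys)
    {
            return (phys & PTE_PFN_MASK) | PTE_PRESENT | PTE_RW | PTE_USER;
    }

    /* mk_user_pte(6ULL << 30) maps physical 6GB: frame number 0x180000
     * needs 21 bits, one more than a 32-bit pte has room for, but only
     * a fraction of what the 64-bit entry offers. */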
* Re: The 64GB memory thing
From: Tom Hull <thull@kscable.com>
To: Ingo Molnar
Cc: Linux MM
Date: Wed, 3 Nov 1999 20:47 UTC

Ingo Molnar wrote:
>
> On Wed, 3 Nov 1999, Neil Conway wrote:
>
> > And presumably each process is still limited to a 32-bit address space,
> > right?
>
> yes, this is a fundamental limitation of x86 processors. Under Linux -in
> all 3 high memory modes- user-space virtual memory is 3GB. Nevertheless on
> a 8-way box you likely want to run either lots of processes, or a few (but
> more than 8 ) processes/threads to use up all available CPU time. This
> means with 8x 2GB RSS number crunching processes we already cover 16GB
> RAM. So it's not at all unrealistic to have support for more than 4GB RAM!
> The foundation for this is that under Linux all 64GB RAM can be mapped
> into user processes transparently. I believe other x86 unices (not to talk
> about NT) do not have this propertly, they handle 'high memory' as a
> special kind of RAM which can be accessed through special system calls.

SCO UnixWare (as of Release 7.1) has the ability to transparently map
PAE-addressable memory into user processes.

A brief history: starting with 7.0 (Spring 1998), SCO UnixWare supports
PAE mode for accessing physical memory up to 64GB.  In 7.0, such memory
(called Dedicated Memory) was only available for shared memory
segments, and new system calls (called dshm) were introduced at that
time.  The dshm calls are not necessary for a user process to access
high (above 4GB) memory, but without dshm a user process is limited to
its 3GB virtual address space.  Dshm provides a window for dynamically
mapping a portion of a much larger shared memory segment into the
user's address space.

UnixWare 7.1 (Spring 1999) added support for using high memory for all
purposes.  The default tuning limits general-purpose memory to 8GB and
retains the concept of Dedicated Memory for memory above the
general-purpose tune point.  The tuning can be adjusted by changing a
boot parameter, so that the general-purpose memory size can be
increased to larger values if the workload permits.  (The example of
8x 2GB-RSS number-crunching processes is a relatively painless
scenario.)

/*
 * Tom Hull -- mailto:thull@kscable.com or thull@ocston.org
 * http://www.ocston.org/~thull/
 */
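To illustrate the windowing concept Tom describes, here is an analogy
using POSIX shared memory.  This is *not* the UnixWare dshm interface,
just the same idea, and it assumes a 64-bit off_t so the segment can
exceed the 3GB address space:

    #include <sys/mman.h>

    /* slide a fixed-size virtual window to a new (page-aligned) offset
     * inside a shared segment far larger than the address space */
    void *slide_window(int shm_fd, void *window, size_t window_size,
                       off_t new_offset)
    {
            if (window)
                    munmap(window, window_size);   /* drop the old view */
            return mmap(NULL, window_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, shm_fd, new_offset);
    }

The process touches at most window_size bytes of the segment at a time,
which is exactly the trade-off Tom describes: dshm is unnecessary for
merely reaching high physical memory, but it lets a 3GB address space
cycle through a much bigger shared segment.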
* Re: The 4GB memory thing
From: Kanoj Sarcar
To: Neil Conway
Cc: linux-mm
Date: Wed, 3 Nov 1999 18:09 UTC

> The recent thread about >4GB surprised me, as I didn't even think >2GB
> was very stable yet.  Am I wrong?  Are people out there using 4GB
> boxes with decent stability?  I presume it's a 2.3 feature, yes?
>
> Sorry for my ignorance, I guess I've been dozing a bit of late.
>
> Neil

I have a 2.2 patch for 4GB support, which has seen a lot of stress
testing by now.  The 2.3 >2GB support uses a different (and better)
approach, but last I checked, things like rawio did not work above 2GB.
The 64GB support is completely new...

Kanoj
Thread overview: 6 messages

1999-11-03  9:48  The 4GB memory thing                Neil Conway
1999-11-03 16:25  ` The 64GB memory thing             Ingo Molnar
1999-11-03 15:46    ` Re: The 64GB memory thing       Neil Conway
1999-11-03 17:09      ` Re: The 64GB memory thing     Ingo Molnar
1999-11-03 20:47        ` Re: The 64GB memory thing   Tom Hull
1999-11-03 18:09  ` Re: The 4GB memory thing          Kanoj Sarcar