* The 4GB memory thing
@ 1999-11-03  9:48 Neil Conway
  1999-11-03 16:25 ` The 64GB " Ingo Molnar
  1999-11-03 18:09 ` The 4GB " Kanoj Sarcar
  0 siblings, 2 replies; 12+ messages in thread
From: Neil Conway @ 1999-11-03 9:48 UTC (permalink / raw)
To: Linux MM

[-- Attachment #1: Type: text/plain, Size: 272 bytes --]

The recent thread about >4GB surprised me, as I didn't even think >2GB
was very stable yet. Am I wrong? Are people out there using 4GB boxes
with decent stability? I presume it's a 2.3 feature, yes?

Sorry for my ignorance, I guess I've been dozing a bit of late.

Neil

[-- Attachment #2: Type: message/rfc822, Size: 670 bytes --]

From: Neil Conway <nconway.list@ukaea.org.uk>
To: "linux-smp@vger.rutgers.edu" <linux-smp@vger.rutgers.edu>
Subject: The 4GB memory thing
Date: Wed, 03 Nov 1999 09:46:14 +0000
Message-ID: <38200466.5839E78E@ukaea.org.uk>

^ permalink raw reply [flat|nested] 12+ messages in thread
* The 64GB memory thing
  1999-11-03  9:48 The 4GB memory thing Neil Conway
@ 1999-11-03 16:25 ` Ingo Molnar
  1999-11-03 15:46   ` Neil Conway
  1999-11-03 18:09 ` The 4GB " Kanoj Sarcar
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 1999-11-03 16:25 UTC (permalink / raw)
To: Neil Conway; +Cc: Linux MM

On Wed, 3 Nov 1999, Neil Conway wrote:

> The recent thread about >4GB surprised me, as I didn't even think >2GB
> was very stable yet. Am I wrong? Are people out there using 4GB
> boxes with decent stability? I presume it's a 2.3 feature, yes?

the 64GB stuff got included recently. It's a significant rewrite of the
lowlevel x86 MM and generic MM layer; here is a short description of it:

my 'HIGHMEM patch' went into the 2.3 kernel starting at pre4-2.3.23. This
means a heavily rewritten VM subsystem (to deal with ptes bigger than the
machine word size), and a much-rewritten x86 memory and boot
architecture. In fact there is no bigmem anymore; it has been replaced by
(the i think more correct term) 'high memory'.

It utilizes 3-level page tables on PPro+ CPUs, called 'Physical Address
Extension' (PAE) mode. In PAE mode the CPU uses a completely different
and incompatible page table structure, which is 3-level, has 64-bit page
table entries and covers up to 64GB physical RAM. Virtual space is still
unchanged, 4GB. Highmem is completely transparent to user-space.

There is a new 'High Memory Support' option under 'Processor type and
features':

    High Memory Support
    ( ) off
    ( ) 4GB
    (X) 64GB

'off' is up to 1GB RAM, utilizing 2-level page tables and no highmem
support. '4GB' utilizes 2-level page tables and high memory support for
any <4GB physical RAM that cannot be permanently mapped by the kernel.
'64GB' mode utilizes 3-level page tables (for everything). The
theoretical limit of high memory on IA32 boxes is 16TB - there is lots
of space in 64-bit PAE ptes, although current CPUs only support up to
64GB RAM.
(the biggest current chipsets support up to 32GB RAM)

about the structure of the patch/feature itself, kernel internals:

pgtable.h got split up into pgtable-2level.h and pgtable-3level.h, which
should be the only 'global #ifdef' distinguishing 3-level from 2-level
page tables on x86. There were lots of assumptions throughout the
arch/i386 tree that assumed 2-level page tables; these places all had to
be fixed and converted to 'generic 3-level page table code'. There are
only a few CONFIG_X86_PAE #ifdefs left, and i intend to cut the number
of these down even more, to keep the x86 lowlevel MM/boot code easy to
maintain.

the generic kernel was almost safe wrt. 3-level page tables, but it
nevertheless had bugs which only triggered in PAE mode. For example, one
pgd entry in PAE mode covers 1GB of virtual memory, and some loops which
iterated through virtual memory had buggy exit conditions and broke in
subtle ways when they were running in the upper-most 1GB of virtual
memory (ie. kernel space). There were about 20 such buggy loops
throughout the MM code.

the much bigger generic change was that ptes got 64-bit, although the
architecture itself is still 32-bit. Lots of VM-internal code had to be
reworked to never assume that 'sizeof(pte_t) == sizeof(unsigned long)'.
Examples are the swapping code and IPC shared memory. Ptes bigger than a
machine word were not supported by Linux previously.

also, i guess many of you have noticed the new mm/bootmem.c allocator -
this was necessary because on my 8GB box mem_map is more than 100MB (!),
and the 'naive' boot-time allocation we did in earlier kernels simply
did not work on 'slightly noncontiguous' physical maps like my box has.
(at 64MB there is an ACPI area which caused problems)

[this short description should give you a scope of the changes, and
i/we are still fixing some of the impact in 2.3.25.
(Christoph just posted his problems with IPC shared memory)]

Backporting to 2.2: while the bigmem patch was small and simple and got
backported to 2.2, the highmem patch is basically impossible to backport
in a maintainable way, as it touches some 60 files all over the kernel.

64GB PAE mode works just fine on my 8GB RAM, 8-way Xeon box:

 11:25pm up 5 min, 2 users, load average: 7.78, 4.30, 1.77
 30 processes: 21 sleeping, 9 running, 0 zombie, 0 stopped
 CPU states:  0.0% user,  7.2% system, 92.8% nice,  0.0% idle
 Mem:  8241152K av, 7720960K used, 520192K free, 0K shrd, 2168K buff
 Swap:       0K av,       0K used,      0K free            9756K cached

  PID USER  PRI NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM  TIME COMMAND
  215 root   19 19 1000M 1.0G   232 R N  1.0G 11.6 12.4  0:11 db_serv
  180 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:31 db_serv
  182 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
  183 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
  184 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
  185 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:30 db_serv
  186 root   19 19 1000M 1.0G   232 R N  1.0G 11.5 12.4  0:29 db_serv
  247 root   19 19  500M 500M   232 R N     0 11.5  6.2  0:04 db_serv2
  181 root    1  0   996  996   828 R       0  7.2  0.0  0:16 top
  177 root    0  0   984  984   768 S       0  0.1  0.0  0:00 bash
    1 root    0  0   476  476   408 S       0  0.0  0.0  0:00 init

(these are 8x ~1GB-RSS processes using up all 8GB physical RAM, one
running on each CPU)

future plans: right now high memory is seriously underused on typical
servers due to the page cache still being in low memory. On 2.2 with
bigmem the lowmem:highmem ratio is around 5:1; this means that the
'effective' size of my 8GB box in 2.2 is only ~2.4GB. The exception is
workloads where most memory is allocated as shared memory or user
memory, but this is not the case for a typical web or file server. On my
box the pagecache is already in high memory (and we are ready to add
64-bit DMA to device drivers), and the lowmem:highmem ratio is up to
1:10.
This means that 8GB RAM is already fully utilized on a typical server
workload.

	Ingo

ps. to have correct memory statistics (top, vmstat, free) with >4GB RAM
you need the newest procps package (or i can send the patches).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply [flat|nested] 12+ messages in thread
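Ingo's note above about the ~20 buggy loops is worth a concrete sketch. The following hypothetical C fragment (the names and exact shape are invented for illustration, not taken from the kernel source) shows how a loop walking the upper-most 1GB of a 32-bit address space can break: the computed end address wraps past 4GB to 0, so the exit condition misfires.

```c
#include <stdint.h>

#define PGDIR_SIZE ((uint32_t)1 << 30)   /* one PAE pgd entry maps 1GB */

/* Buggy shape: 'end = addr + len' wraps past 4GB to 0 in 32-bit
 * arithmetic, so 'addr < end' is false immediately and the top 1GB
 * of the address space (kernel space) is silently skipped. */
static unsigned visit_buggy(uint32_t addr, uint32_t len)
{
    uint32_t end = addr + len;           /* 0xC0000000 + 1GB wraps to 0 */
    unsigned n = 0;
    while (addr < end) {
        n++;                             /* "visit" one pgd's worth */
        addr += PGDIR_SIZE;
    }
    return n;
}

/* Fixed shape: count down the remaining length instead of comparing
 * addresses, so wrap-around cannot corrupt the exit test.
 * (Assumes len is a multiple of PGDIR_SIZE, as pgd walks are.) */
static unsigned visit_fixed(uint32_t addr, uint32_t len)
{
    unsigned n = 0;
    while (len) {
        n++;
        addr += PGDIR_SIZE;
        len -= PGDIR_SIZE;
    }
    return n;
}
```

Called with addr = 0xC0000000 and len = 1GB, the buggy variant visits 0 pgd entries while the fixed one visits 1 - exactly the kind of "broke in subtle ways in the upper-most 1GB" failure described above.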
* Re: The 64GB memory thing
  1999-11-03 16:25 ` The 64GB " Ingo Molnar
@ 1999-11-03 15:46   ` Neil Conway
  1999-11-03 17:09     ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Neil Conway @ 1999-11-03 15:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux MM

Ingo Molnar wrote:
>
> On Wed, 3 Nov 1999, Neil Conway wrote:
>
> > The recent thread about >4GB surprised me, as I didn't even think >2GB
> > was very stable yet. Am I wrong? Are people out there using 4GB
> > boxes with decent stability? I presume it's a 2.3 feature, yes?
>
> the 64GB stuff got included recently. It's a significant rewrite of the
> lowlevel x86 MM and generic MM layer, here is a short description about
> it:
>
> my 'HIGHMEM patch' went into the 2.3 kernel starting at pre4-2.3.23. This
> ...

Wow, that's good news. But hang on a second ;-) - wasn't there a
feature freeze at 2.3.18?

And presumably each process is still limited to a 32-bit address space,
right?

As for stability, anyone got any comments?

> 64 GB PAE mode works just fine on my 8GB RAM, 8-way Xeon box:

Mmmmm :-) Could you give us the source and a ballpark price on that
please?

thanks for the update,
Neil

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: The 64GB memory thing
  1999-11-03 15:46   ` Neil Conway
@ 1999-11-03 17:09     ` Ingo Molnar
  1999-11-03 20:47       ` Tom Hull
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 1999-11-03 17:09 UTC (permalink / raw)
To: Neil Conway; +Cc: Linux MM

On Wed, 3 Nov 1999, Neil Conway wrote:

> > the 64GB stuff got included recently. It's a significant rewrite of the
> > lowlevel x86 MM and generic MM layer, here is a short description about
> > it:
> >
> > my 'HIGHMEM patch' went into the 2.3 kernel starting at pre4-2.3.23. This
> > ...
>
> Wow, that's good news. But hang on a second ;-) - wasn't there a
> feature freeze at 2.3.18?

most of the changes are in the cleanup category, but yes, it's a
boundary case. I had to, and still have to, work hard to make this as
painless as possible ...

> And presumably each process is still limited to a 32-bit address space,
> right?

yes, this is a fundamental limitation of x86 processors. Under Linux -
in all 3 high memory modes - user-space virtual memory is 3GB.
Nevertheless, on an 8-way box you likely want to run either lots of
processes, or a few (but more than 8) processes/threads, to use up all
available CPU time. This means that with 8x 2GB-RSS number-crunching
processes we already cover 16GB RAM. So it's not at all unrealistic to
have support for more than 4GB RAM! The foundation for this is that
under Linux all 64GB RAM can be mapped into user processes
transparently. I believe other x86 unices (not to talk about NT) do not
have this property; they handle 'high memory' as a special kind of RAM
which can be accessed through special system calls.

-- mingo

^ permalink raw reply [flat|nested] 12+ messages in thread
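A rough sketch of why this transparency is possible: a PAE page table entry is 64-bit even though virtual addresses stay 32-bit, so its physical-frame field can point anywhere in a 36-bit (64GB) physical space. The bit layout below follows the published i386 PAE format (bit 0 = present, physical address in bits 12-35), but the helper functions are invented for illustration, not the kernel's actual macros.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_PRESENT   0x1ULL
#define PTE_PHYS_MASK 0x0000000FFFFFF000ULL  /* bits 12-35: 36-bit phys */

/* Build a present pte pointing at a physical page frame. The physical
 * address may lie far above 4GB even though the CPU is 32-bit. */
static uint64_t mk_pte(uint64_t phys)
{
    return (phys & PTE_PHYS_MASK) | PTE_PRESENT;
}

/* Recover the physical address a pte maps. */
static uint64_t pte_phys(uint64_t pte)
{
    return pte & PTE_PHYS_MASK;
}
```

A user mapping whose pte points at, say, the frame at physical 5GB works like any other mapping: the 32-bit virtual address the process uses is independent of where the frame sits physically, which is what lets all 64GB be handed to user processes without special system calls.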
* Re: The 64GB memory thing
  1999-11-03 17:09     ` Ingo Molnar
@ 1999-11-03 20:47       ` Tom Hull
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Hull @ 1999-11-03 20:47 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux MM

Ingo Molnar wrote:
>
> On Wed, 3 Nov 1999, Neil Conway wrote:
>
> > And presumably each process is still limited to a 32-bit address space,
> > right?
>
> yes, this is a fundamental limitation of x86 processors. Under Linux -in
> all 3 high memory modes- user-space virtual memory is 3GB. Nevertheless on
> a 8-way box you likely want to run either lots of processes, or a few (but
> more than 8 ) processes/threads to use up all available CPU time. This
> means with 8x 2GB RSS number crunching processes we already cover 16GB
> RAM. So it's not at all unrealistic to have support for more than 4GB RAM!
> The foundation for this is that under Linux all 64GB RAM can be mapped
> into user processes transparently. I believe other x86 unices (not to talk
> about NT) do not have this property, they handle 'high memory' as a
> special kind of RAM which can be accessed through special system calls.

SCO UnixWare (as of Release 7.1) has the ability to transparently map
PAE-addressable memory into user processes. A brief history:

Starting with 7.0 (Spring 1998), SCO UnixWare supports PAE mode for
accessing physical memory up to 64GB. In 7.0, such memory (called
Dedicated Memory) was only available for shared memory segments. New
system calls (called dshm) were introduced at that time. The dshm calls
are not necessary for a user process to access high (above 4GB) memory,
but without dshm a user process is limited to its 3GB virtual address
space. Dshm provides a window for dynamically mapping a portion of a
much larger shared memory segment into the user's address space.

UnixWare 7.1 (Spring 1999) added support for using high memory for all
purposes.
The default tuning limits general-purpose memory to 8GB, and retains
the concept of Dedicated Memory for memory above the general-purpose
memory tune point. The tuning can be adjusted by changing a boot
parameter, so that the general-purpose memory size can be increased to
larger values if the workload permits. (The example of 8x 2GB-RSS
number-crunching processes is a relatively painless scenario.)

--
/*
 * Tom Hull -- mailto:thull@kscable.com or thull@ocston.org
 * http://www.ocston.org/~thull/
 */

^ permalink raw reply [flat|nested] 12+ messages in thread
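The dshm "window" idea described above can be caricatured in a few lines of C: a process whose virtual address space is smaller than the segment slides a fixed-size mapping window over it. Everything here (names, types, sizes) is invented to illustrate the concept - it is not the UnixWare API.

```c
#include <stdint.h>
#include <stddef.h>

#define WINDOW_SIZE ((uint64_t)16 << 20)  /* 16MB virtual window */

/* A shared segment that may be far larger than the 3GB user space. */
struct big_segment {
    uint64_t size;                        /* e.g. 32GB */
    unsigned char *backing;               /* stand-in for physical RAM */
};

/* "Remap" the window at a new offset into the segment. A real
 * implementation would rewrite page tables; this simulation just
 * returns a pointer into the backing store, or NULL when the window
 * would run past the end of the segment. */
static unsigned char *window_map(struct big_segment *seg, uint64_t off)
{
    if (off > seg->size || seg->size - off < WINDOW_SIZE)
        return NULL;
    return seg->backing + off;
}
```

The process sees only WINDOW_SIZE bytes at a time, but by repeatedly calling window_map at different offsets it can touch the whole segment - which is exactly the trade-off the dshm calls remove the need for once high memory can be mapped transparently.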
* Re: The 4GB memory thing
  1999-11-03  9:48 The 4GB memory thing Neil Conway
  1999-11-03 16:25 ` The 64GB " Ingo Molnar
@ 1999-11-03 18:09 ` Kanoj Sarcar
  1 sibling, 0 replies; 12+ messages in thread
From: Kanoj Sarcar @ 1999-11-03 18:09 UTC (permalink / raw)
To: Neil Conway; +Cc: linux-mm

> The recent thread about >4GB surprised me, as I didn't even think >2GB
> was very stable yet. Am I wrong? Are people out there using 4GB boxes
> with decent stability? I presume it's a 2.3 feature, yes?
>
> Sorry for my ignorance, I guess I've been dozing a bit of late.
>
> Neil

I have a 2.2 patch for 4Gb support, which has seen a lot of stress
testing by now. The 2.3 >2Gb support uses a different (and better)
approach, but last I checked, things like rawio did not work above 2Gb.
The 64Gb support is completely new ...

Kanoj

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: The 4GB memory thing
@ 1999-11-04 18:19 Andrea Arcangeli
1999-11-04 18:48 ` Kanoj Sarcar
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 1999-11-04 18:19 UTC (permalink / raw)
To: Kanoj Sarcar; +Cc: Neil Conway, linux-mm
kanoj@google.engr.sgi.com (Kanoj Sarcar) writes:
> I have a 2.2 patch for 4Gb support, which has seen a lot of stress
> testing by now. The 2.3 >2gb support uses a different (and better
> approach), but last I checked, things like rawio did not work above
> >2Gb. The 64Gb support is completely new ...
2.2.13aa3 includes both 4g bigmem support and rawio and you can do
rawio on all the memory (bigmem included).
ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.13aa3/
This is the README on how to go in sync with 2.2.13aa3:
ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches/README.gz
--
Andrea
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: The 4GB memory thing
  1999-11-04 18:19 Andrea Arcangeli
@ 1999-11-04 18:48 ` Kanoj Sarcar
  1999-11-04 22:50   ` Ingo Molnar
  1999-11-04 23:41   ` Andrea Arcangeli
  0 siblings, 2 replies; 12+ messages in thread
From: Kanoj Sarcar @ 1999-11-04 18:48 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: nconway.list, linux-mm

> kanoj@google.engr.sgi.com (Kanoj Sarcar) writes:
>
> > I have a 2.2 patch for 4Gb support, which has seen a lot of stress
> > testing by now. The 2.3 >2gb support uses a different (and better
> > approach), but last I checked, things like rawio did not work above
> > >2Gb. The 64Gb support is completely new ...
>
> 2.2.13aa3 includes both 4g bigmem support and rawio and you can do
> rawio on all the memory (bigmem included).
>
> ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.13aa3/
>
> This is the README on how to go in sync with 2.2.13aa3:
>
> ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches/README.gz
>
> --
> Andrea

I don't see a README.gz under
http://www.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches/

In any case, did you have a small technical README on how rawio works
on bigmem in 2.2.13aa3? Btw, I haven't seen the rawio 2.2 port; I am
assuming it's very similar to 2.3 ... where brw_kiovec() refuses to
accept PageHighMem pages. I didn't see anything in z-bigmem-2.2.13aa3-7
that tinkers with fs/buffer.c either.

Thanks.

Kanoj

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: The 4GB memory thing
  1999-11-04 18:48 ` Kanoj Sarcar
@ 1999-11-04 22:50   ` Ingo Molnar
  1999-11-04 22:08     ` Kanoj Sarcar
  1999-11-04 23:41   ` Andrea Arcangeli
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 1999-11-04 22:50 UTC (permalink / raw)
To: Kanoj Sarcar; +Cc: nconway.list, linux-mm, Stephen C. Tweedie

On Thu, 4 Nov 1999, Kanoj Sarcar wrote:

> assuming its very similar to 2.3 ... where brw_kiovec() refuses to
> accept PageHighMem pages. [...]

(btw, i have removed this limitation already in my tree, now that
ll_rw_block() accepts highmem pages as well.)

-- mingo

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: The 4GB memory thing
  1999-11-04 22:50   ` Ingo Molnar
@ 1999-11-04 22:08     ` Kanoj Sarcar
  1999-11-05  8:21       ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Kanoj Sarcar @ 1999-11-04 22:08 UTC (permalink / raw)
To: Ingo Molnar; +Cc: nconway.list, linux-mm, sct

> On Thu, 4 Nov 1999, Kanoj Sarcar wrote:
>
> > assuming its very similar to 2.3 ... where brw_kiovec() refuses to
> > accept PageHighMem pages. [...]
>
> (btw, i have removed this limitation already in my tree, now that
> ll_rw_block() accepts highmem pages as well.)
>
> -- mingo

Ohh! Are you talking about ll_rw_block() in your tree, or in 2.3.25?
If in 2.3.25, where was the bouncing added?

Kanoj

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: The 4GB memory thing
  1999-11-04 22:08     ` Kanoj Sarcar
@ 1999-11-05  8:21       ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 1999-11-05 8:21 UTC (permalink / raw)
To: Kanoj Sarcar; +Cc: nconway.list, linux-mm, sct

On Thu, 4 Nov 1999, Kanoj Sarcar wrote:

> > > assuming its very similar to 2.3 ... where brw_kiovec() refuses to
> > > accept PageHighMem pages. [...]
> >
> > (btw, i have removed this limitation already in my tree, now that
> > ll_rw_block() accepts highmem pages as well.)
>
> Ohh! Are you talking about ll_rw_block() in your tree, or in 2.3.25?
> If in 2.3.25, where was the bouncing added?

my tree.

	Ingo

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: The 4GB memory thing
  1999-11-04 18:48 ` Kanoj Sarcar
  1999-11-04 22:50   ` Ingo Molnar
@ 1999-11-04 23:41   ` Andrea Arcangeli
  1 sibling, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 1999-11-04 23:41 UTC (permalink / raw)
To: Kanoj Sarcar; +Cc: nconway.list, linux-mm

kanoj@google.engr.sgi.com (Kanoj Sarcar) writes:

> I don't see a README.gz under
> http://www.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches/

lftp> pwd
ftp://ftp.it.kernel.org/pub/linux/kernel/people/andrea/tools/apply-patches
lftp> ls README.gz
-rw-r--r--   1 ftp      daemon        875 Oct 25 00:43 README.gz
lftp>

> In any case, did you have a small technical README on how rawio works
> on bigmem in 2.2.13aa3? Btw, I haven't seen the rawio 2.2 port, I am

As a first step you can have a look at the rawio patch:

ftp://ftp.it.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.13aa3/z-bigmem-rawio-2.2.13aa2-1.gz

The above patch includes all the necessary stuff to make rawio work
fine on bigmem pages. Basically the code uses regular pages as bounce
buffers to do the I/O on bigmem pages.

> assuming its very similar to 2.3 ... where brw_kiovec() refuses to
> accept PageHighMem pages. I didn't see anything in
> z-bigmem-2.2.13aa3-7

No. The z-bigmem-rawio-2.2.13aa2-1.gz in 2.2.13aa3 allows brw_kiovec to
do I/O on bigmem pages.

> that tinkers either with fs/buffer.c.

I keep the bigmem stuff separate from rawio. The rawio patch (pointed
out above) included in 2.2.13aa3 is an incremental patch that goes on
top of bigmem. I keep all the patches separate to allow everybody out
there to merge my stuff easily, and to see only the changes related to
each topic.

--
Andrea

^ permalink raw reply [flat|nested] 12+ messages in thread
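A minimal model of the bounce-buffer scheme Andrea describes: the driver can only DMA to "low" pages, so a read into a bigmem page goes through a freshly allocated low page and is copied up afterwards. The struct and function names below are invented for illustration; this shows the idea, not the actual 2.2 patch code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

struct page {
    int highmem;                  /* 1 if above the driver's DMA limit */
    unsigned char data[PAGE_SIZE];
};

/* Stand-in for a device read: real hardware can only fill pages the
 * driver can address, which in this model means !highmem pages. */
static void device_read(struct page *p, unsigned char fill)
{
    assert(!p->highmem);
    memset(p->data, fill, PAGE_SIZE);
}

/* Read into any page: bounce through a low page when needed. */
static int bigmem_read(struct page *target, unsigned char fill)
{
    if (!target->highmem) {
        device_read(target, fill);                 /* fast path */
        return 0;
    }
    struct page *bounce = calloc(1, sizeof *bounce);  /* low page */
    if (!bounce)
        return -1;
    device_read(bounce, fill);                     /* "DMA" to bounce */
    memcpy(target->data, bounce->data, PAGE_SIZE); /* copy up */
    free(bounce);
    return 0;
}
```

Writes go the other way around: copy the bigmem page down into the bounce page first, then hand the bounce page to the driver. The cost is an extra page copy per I/O, which is the price of doing rawio on memory the device cannot address directly.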
end of thread, other threads:[~1999-11-05 8:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-11-03  9:48 The 4GB memory thing Neil Conway
1999-11-03 16:25 ` The 64GB " Ingo Molnar
1999-11-03 15:46   ` Neil Conway
1999-11-03 17:09     ` Ingo Molnar
1999-11-03 20:47       ` Tom Hull
1999-11-03 18:09 ` The 4GB " Kanoj Sarcar
1999-11-04 18:19 Andrea Arcangeli
1999-11-04 18:48 ` Kanoj Sarcar
1999-11-04 22:50   ` Ingo Molnar
1999-11-04 22:08     ` Kanoj Sarcar
1999-11-05  8:21       ` Ingo Molnar
1999-11-04 23:41   ` Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox