* Re: large page patch (fwd) (fwd)
  [not found] <Pine.LNX.4.33.0208021252090.2466-100000@penguin.transmeta.com>
From: Martin J. Bligh @ 2002-08-02 23:54 UTC
To: Linus Torvalds; +Cc: Hubertus Franke, wli, gh, akpm, swj, linux-mm mailing list

>> Let me then turn around the table. Have you looked at our patch for 2.4.18.
>> It doesn't add anything to the hot path either, if the (vma->pg_order == 0).
>> Period.
>
> Nobody has forwarded the patch, and I've seen no discussion of it on the
> kernel mailing lists.
>
> Guess what the answer is?
>
> Is it 10 lines of code in the VM subsystem?

No, and you're not going to like the patch in its current incarnation by
the sound of it. So, having listened to your objections, we're going to
take a slightly different course - we will prepare a minimal version of
the patch with very low impact on the core VM code, but using more
standard interfaces to access it (eg the shmem method you outlined
earlier). It'll have a little less functionality, but so be it.

There are other apps apart from Oracle that want the ability to use
large pages (eg DB2 and Java), and it seems that most of those want them
for anonymous mmap or shmem. If we can provide an interface that's more
standard, it'll make people's porting much easier.

IBM Research has done some significant benchmarking of large page
support in a variety of applications, and has seen a 20-40% performance
boost for Java, and a 6-22% improvement for the SPEC CPU2000 set of
tests. For the full details, see the OLS paper at:

http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz

Moreover, we need large pages to reduce PTE consumption in a variety of
applications using shared memory, especially given the additional
overhead of rmap.

We should have this available in a few days - if you could hold off
until then, we should be able to do an objective comparison? I believe
we can make something that's acceptable to you.

Thanks,

Martin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
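[Editorial aside] The PTE consumption point can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not taken from any patch in this thread. It assumes non-PAE x86 figures (4 KiB base pages, 4 MiB large pages, 4-byte page-table entries), treats each large page as costing one entry, and ignores rmap overhead entirely. The function names pte_bytes and total_pte_bytes are hypothetical.

```c
#include <stdint.h>

/*
 * Bytes of page-table entries one process needs in order to map `bytes`
 * of memory using pages of `page_size` bytes, at `pte_size` bytes per
 * entry. Upper-level tables and rmap structures are deliberately ignored.
 */
static uint64_t pte_bytes(uint64_t bytes, uint64_t page_size, uint64_t pte_size)
{
    uint64_t pages = (bytes + page_size - 1) / page_size; /* round up */
    return pages * pte_size;
}

/*
 * Total across `nprocs` processes that each map the same shared segment
 * with their own private page tables (assumption: no page-table sharing).
 */
static uint64_t total_pte_bytes(uint64_t bytes, uint64_t page_size,
                                uint64_t pte_size, uint64_t nprocs)
{
    return pte_bytes(bytes, page_size, pte_size) * nprocs;
}
```

Under these assumptions, a 1 GiB shared segment costs each process 1 MiB of entries with 4 KiB pages but only 1 KiB with 4 MiB pages. With 100 attached processes that is roughly 100 MiB against 100 KiB.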
* Re: large page patch (fwd) (fwd)
From: Andrew Morton @ 2002-08-03 0:35 UTC
To: Martin J. Bligh
Cc: Linus Torvalds, Hubertus Franke, wli, swj, linux-mm mailing list

"Martin J. Bligh" wrote:
>
> >> Let me then turn around the table. Have you looked at our patch for 2.4.18.
> >> It doesn't add anything to the hot path either, if the (vma->pg_order == 0).
> >> Period.
> >
> > Nobody has forwarded the patch, and I've seen no discussion of it on the
> > kernel mailing lists.
> >
> > Guess what the answer is?
> >
> > Is it 10 lines of code in the VM subsystem?
>
> No, and you're not going to like the patch in its current incarnation by
> the sound of it. So, having listened to your objections, we're going to
> take a slightly different course - we will prepare a minimal version of
> the patch with very low impact on the core VM code, but using more
> standard interfaces to access it (eg the shmem method you outlined
> earlier). It'll have a little less functionality, but so be it.

Remind me again what's wrong with wrapping the Intel syscalls
inside malloc() and then maybe grafting a little hook into the shm code?

>...
> We should have this available in a few days - if you could hold off
> until then, we should be able to do an objective comparison? I believe
> we can make something that's acceptable to you.

More than a few days. The patch which went around isn't Rohit's
latest, and it hasn't even been tested in 2.5 and we're considering
replacing the shm key with an fd, and...
* Re: large page patch (fwd) (fwd)
From: Linus Torvalds @ 2002-08-03 1:26 UTC
To: Andrew Morton; +Cc: Martin J. Bligh, Hubertus Franke, wli

On Fri, 2 Aug 2002, Andrew Morton wrote:
>
> Remind me again what's wrong with wrapping the Intel syscalls
> inside malloc() and then maybe grafting a little hook into the shm code?

Indeed.

However, don't think "Intel syscalls", think instead "bring out the
architecture-defined mapping features". In particular, the main objection
I had to Ingo's patch (which, by the sound of it, is fairly similar to the
IBM patches which I haven't seen) was that it was much too Intel-centric.

I admit to being x86-centric when it comes to implementation (simply due
to the fact that they are cheap and everywhere), but I try very hard to
avoid making _design_ revolve around x86. In particular, while I'm not a
big fan of the PPC hash tables (understatement of the year), I _do_ like
the BAT mapping that PPC has.

(Alternatively, if you aren't familiar with BAT registers, think
software-filled extra TLB entries that are outside the normal fill policy
and have large sizes. For some architectures it makes sense to do this at
sw TLB fill time, for others that isn't very practical because the page
table lookup is fixed in various ways.)

This is sometimes also referred to as "superpages".

And I think people will find the "separate path" approach more palatable
if you think of it as an interface to BAT registers (with the "normal" VM
path being the interface to the regular page tables). And keeping very
much in mind that on some CPUs these two things really _are_ totally
separate (PPC being the best example).

The fact that on x86, which doesn't have a BAT array, we use the
PMD-spanning "large pages" instead, should be seen as the anomaly, not as
the design case.

This also hopefully explains why I consider anything that touches or cares
about page tables in generic VM code wrt the largepage support to be
fundamentally broken. If the largepage patch messes around with page
tables, it cannot be generic.

		Linus
* Re: large page patch (fwd) (fwd)
From: Gerrit Huizenga @ 2002-08-03 4:26 UTC
To: Linus Torvalds
Cc: Andrew Morton, Martin J. Bligh, Hubertus Franke, wli, swj, linux-mm mailing list

In message <Pine.LNX.4.44.0208021757490.2210-100000@home.transmeta.com>,
Linus Torvalds writes:
>
>
> On Fri, 2 Aug 2002, Andrew Morton wrote:
> >
> > Remind me again what's wrong with wrapping the Intel syscalls
> > inside malloc() and then maybe grafting a little hook into the shm code?
>
> Indeed.

Do you really want all calls to malloc to allocate non-pageable
memory? And I doubt that this memory will be pageable in time for
2.5.

> However, don't think "Intel syscalls", think instead "bring out the
> architecture-defined mapping features". In particular, the main objection
> I had to Ingo's patch (which, by the sound of it is fairly similar to the
> IBM patches which I haven't seen) was that it was much too Intel-centric.

The IBM patch (Simon Winwood's work) was first done for PPC64 and
then ported at my insistence to IA32, since we had an immediate need
and an opportunity to do some specific application porting work on
IA32. The patch was intended to be both architecture neutral and
to support multiple page sizes.

In the interest of hitting the Halloween deadline, we believe that
dropping back for the moment to IA32, pinned, mmap()/madvise()/shm*()
versions, possibly gated by a capability (or not, easily debatable
and I doubt that it matters too much), will get at least IBM apps on
IA32 through the lifetime of 2.6 and probably have the framework in
place such that PPC64 can also easily fit in, possibly pre-freeze,
possibly post-freeze, with mostly arch-specific mods.

> I admit to being x86-centric when it comes to implementation (simply due
> to the fact that they are cheap and everywhere), but I try very hard to
> avoid making _design_ revolve around x86. In particular, while I'm not a
> big fan of the PPC hash tables (understatement of the year), I _do_ like
> the BAT mapping that PPC has.

We folks in the LTC have much the same interest. In addition to
the obvious IA32/PPC32/PPC64/zSeries/IA64/AMD issues (keep in mind we
probably sell more servers with PPC than with IA32 ;-), we have software
products which run on nearly every platform and every distro in
existence. So, we too try to qualify most of our work on its potential
application to multiple architectures.

> (Alternatively, if you aren't familiar with BAT registers, think
> software-filled extra TLB entries that are outside the normal fill policy
> and have large sizes. For some architectures it makes sense to do this at
> sw TLB fill time, for others that isn't very practical because the page
> table lookup is fixed in various ways.)

From what I've heard from the Watson Research experts on PPC64, BAT
registers are actually a bad idea for this and AIX is slowly removing
its dependency on BAT registers. I'd be interested in a read from
Anton or Paul Mackerras or even the Research folks involved in the
chip design.

And, we are doing everything possible to at least provide code to
demonstrate the solutions we are talking about. It just may take a
few days to get it properly accelerated. ;-)

gerrit
* Re: large page patch (fwd) (fwd)
From: Linus Torvalds @ 2002-08-03 4:39 UTC
To: Gerrit Huizenga
Cc: Andrew Morton, Martin J. Bligh, Hubertus Franke, wli, swj, linux-mm mailing list

On Fri, 2 Aug 2002, Gerrit Huizenga wrote:
> In message <Pine.LNX.4.44.0208021757490.2210-100000@home.transmeta.com>,
> Linus Torvalds writes:
> >
> >
> > On Fri, 2 Aug 2002, Andrew Morton wrote:
> > >
> > > Remind me again what's wrong with wrapping the Intel syscalls
> > > inside malloc() and then maybe grafting a little hook into the shm code?
> >
> > Indeed.
>
> Do you really want all calls to malloc to allocate non-pageable
> memory? And I doubt that this memory will be pageable in time for
> 2.5.

No, I'm saying that you can do the SHM_LARGEPAGE bit testing in user space
if you want to. And obviously it will only succeed for root or a similar
user anyway.

But hey, the proof is in the pudding. If you guys can come up with a
better scheme that does not pollute the VM paths and has better semantics,
I don't think anybody will complain.

		Linus
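[Editorial aside] The "do the bit testing in user space" idea can be sketched as an allocator wrapper that first asks for a large-page shared-memory segment and falls back to plain malloc() when the kernel refuses (for example, for an unprivileged caller). This is only an illustration under stated assumptions, not the interface discussed in the thread: SHM_LARGEPAGE was a proposed flag, so the sketch substitutes SHM_HUGETLB, the flag that later mainline Linux kernels accept in shmget(). The names struct region, region_alloc, region_free and the 4 MiB rounding constant are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000 /* Linux value, defined here only if the headers lack it */
#endif

/* Assumption: round requests to 4 MiB; real code should query the system. */
#define ASSUMED_LARGE_PAGE (4UL * 1024 * 1024)

struct region {
    void  *addr;     /* usable memory, or NULL if allocation failed */
    size_t len;      /* length the caller asked for */
    int    is_large; /* 1 if backed by a large-page shm segment */
};

static struct region region_alloc(size_t len)
{
    struct region r = { NULL, len, 0 };
    size_t rounded = (len + ASSUMED_LARGE_PAGE - 1)
                     / ASSUMED_LARGE_PAGE * ASSUMED_LARGE_PAGE;

    /* Try the large-page path first; this commonly fails without privilege. */
    int id = shmget(IPC_PRIVATE, rounded, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id >= 0) {
        void *p = shmat(id, NULL, 0);
        /* Mark for removal now so the segment disappears on last detach. */
        shmctl(id, IPC_RMID, NULL);
        if (p != (void *)-1) {
            r.addr = p;
            r.is_large = 1;
            return r;
        }
    }

    /* Fallback: ordinary pageable memory from the normal allocator. */
    r.addr = malloc(len);
    return r;
}

static void region_free(struct region *r)
{
    if (r->addr == NULL)
        return;
    if (r->is_large)
        shmdt(r->addr);
    else
        free(r->addr);
    r->addr = NULL;
}
```

Only callers that opt in through region_alloc() ever receive pinned large-page memory, so ordinary malloc() users are unaffected, which is one way to read the answer to the "all calls to malloc" objection above.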