* Why don't we make mmap MAP_SHARED with /dev/zero possible?
@ 1999-10-26 1:57 fxzhang
1999-10-26 7:35 ` Christoph Rohland
1999-10-26 12:07 ` Stephen C. Tweedie
0 siblings, 2 replies; 12+ messages in thread
From: fxzhang @ 1999-10-26 1:57 UTC (permalink / raw)
To: linux-mm
and find this:
/usr/src/linux/drivers/char/mem.c
static int mmap_zero(struct file * file, struct vm_area_struct * vma)
{
        if (vma->vm_flags & VM_SHARED)
                return -EINVAL;
I don't understand why people don't implement it. Yes, in the source I found a
comment like "the shared case is complex". Could someone tell me what the
difficulty is? As it is a driver, I think there should not be too much to be
concerned about; at least I know this works in Solaris.
I want to implement it, but I know I am not competent yet; I am just beginning
to dig into it :).
Is there any good way to share memory between processes at page granularity?
That is, can I share individual pages between them? Threads may be a
substitute, but there are many things that I don't want to share.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-10-26 1:57 Why don't we make mmap MAP_SHARED with /dev/zero possible? fxzhang
@ 1999-10-26 7:35 ` Christoph Rohland
1999-10-26 12:05 ` Stephen C. Tweedie
1999-10-26 12:07 ` Stephen C. Tweedie
1 sibling, 1 reply; 12+ messages in thread
From: Christoph Rohland @ 1999-10-26 7:35 UTC (permalink / raw)
To: fxzhang; +Cc: linux-mm
fxzhang <fxzhang@chpc.ict.ac.cn> writes:
> and find this:
> /usr/src/linux/drivers/char/mem.c
> static int mmap_zero(struct file * file, struct vm_area_struct * vma)
> {
> if (vma->vm_flags & VM_SHARED)
> return -EINVAL;
>
> I don't understand why people don't implement it. Yes, in the source I
> found a comment like "the shared case is complex". Could someone tell me
> what the difficulty is? As it is a driver, I think there should not be
> too much to be concerned about; at least I know this works in Solaris.
> I want to implement it, but I know I am not competent yet; I am just
> beginning to dig into it :).
Yes, I would like to see it too, but at least in the 2.0 days it was
really difficult, if not impossible.
> Is there any good way to share memory between processes at page
> granularity? That is, can I share individual pages between them? Threads
> may be a substitute, but there are many things that I don't want to share.
Using SYSV shm with key IPC_PRIVATE gives you the same behaviour.
Greetings
Christoph
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-10-26 7:35 ` Christoph Rohland
@ 1999-10-26 12:05 ` Stephen C. Tweedie
0 siblings, 0 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 1999-10-26 12:05 UTC (permalink / raw)
To: Christoph Rohland; +Cc: fxzhang, linux-mm
Hi,
On 26 Oct 1999 09:35:45 +0200, Christoph Rohland
<hans-christoph.rohland@sap.com> said:
> Yes I would like to see it also, but at least in 2.0 days it was
> really difficult/impossible.
In 2.2 it is much easier --- I did most of the required work when
making swap cache sharing persistent. Then the 2.2 code freeze hit...
The first remaining problem is initialisation of demand-zero pages for
shared vmas. You have to be able to ensure that when one process
faults in a shared page for the first time, all other processes pick
up the correct new page.
There are several ways you could do this. The SysV-shm mechanism
would work, but it would be harder to garbage-collect all of the
resources used by a page which is no longer shared. Normal
demand-zero page instantiation would work provided that it was
performed atomically over all the vmas concerned, which would require
careful locking on 2.3 for SMP.
The only fly in the ointment is that 2.3's new bigmem code doesn't
observe the swap cache rules so carefully, and shared pages can become
separated. We'd have to make the swap cache capable of working
properly on high memory pages.
The other thing that still needs doing is to make the swap cache work
properly for writable pages --- there are still various places in the
VM where we assume mapped swap cache pages are readonly.
--Stephen
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-10-26 1:57 Why don't we make mmap MAP_SHARED with /dev/zero possible? fxzhang
1999-10-26 7:35 ` Christoph Rohland
@ 1999-10-26 12:07 ` Stephen C. Tweedie
1 sibling, 0 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 1999-10-26 12:07 UTC (permalink / raw)
To: fxzhang; +Cc: linux-mm, Stephen Tweedie
Hi,
On Tue, 26 Oct 1999 9:57:48 +0800, fxzhang <fxzhang@chpc.ict.ac.cn>
said:
> static int mmap_zero(struct file * file, struct vm_area_struct * vma)
> {
> if (vma->vm_flags & VM_SHARED)
> return -EINVAL;
> I don't understand why people don't implement it. Yes, in the source I
> found a comment like "the shared case is complex". Could someone tell
> me what the difficulty is? As it is a driver, I think there should not
> be too much to be concerned about.
It is not a driver issue --- it is core to the VM. The VM cannot
handle shared writable anonymous pages. We're not talking about mmap
pages in this special case: we are talking about normal anonymous data
pages.
> Is there any good way to share memory between processes at page
> granularity? That is, can I share individual pages between them?
> Threads may be a substitute, but there are many things that I don't
> want to share.
SysV shared memory. "man shmget; man shmop; man shmctl"
--Stephen
[parent not found: <qwwzox6l3nh.fsf@sap.com>]
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
[not found] <qwwzox6l3nh.fsf@sap.com>
@ 1999-11-03 14:29 ` Ingo Molnar
1999-11-03 14:50 ` Eric W. Biederman
0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 1999-11-03 14:29 UTC (permalink / raw)
To: Christoph Rohland
Cc: Stephen C. Tweedie, Eric W. Biederman, fxzhang, linux-mm
On 26 Oct 1999, Christoph Rohland wrote:
> This lines up with some remarks from Eric Biederman about his shmfs,
> which is BTW a feature I would _love_ to have in Linux to do posix shm
> and perhaps redo sysv shm. He said that he would like to make the
> pagecache highmem-capable, and AFAIK the main work for shmfs was
> making the pagecache work with writable pages.
hm, i've got the pagecache in high memory already on my box, patch under
cleanup right now. It was the next natural step after doing all the hard
work to get 64GB RAM support. Eric, is there any conflicting work here?
-- mingo
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 14:29 ` Ingo Molnar
@ 1999-11-03 14:50 ` Eric W. Biederman
1999-11-03 16:46 ` Ingo Molnar
0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 1999-11-03 14:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: Christoph Rohland, Stephen C. Tweedie, Eric W. Biederman,
fxzhang, linux-mm
Ingo Molnar <mingo@chiara.csoma.elte.hu> writes:
> On 26 Oct 1999, Christoph Rohland wrote:
>
> > This lines up with some remarks from Eric Biederman about his shmfs,
> > which is BTW a feature I would _love_ to have in Linux to do posix shm
> > and perhaps redo sysv shm. He said that he would like to make the
> > pagecache highmem-capable, and AFAIK the main work for shmfs was
> > making the pagecache work with writable pages.
>
> hm, i've got the pagecache in high memory already on my box, patch under
> cleanup right now. It was the next natural step after doing all the hard
> work to get 64GB RAM support. Eric, is there any conflicting work here?
Not really. I played with the idea, and the only really tricky aspect I saw
was how to write a version of copy_to/from_user that would handle the bigmem
case. Because kmap ... copy .. kunmap isn't safe as you can sleep due
to a page fault.
I got about half way to a solution by having the page fault handler basically
act like an exception handler, and switch the return address for this one specific
case. So eventually, when the page fault returned, the area would magically
re-kmap itself and continue with life. Doing it this way is important
as it only penalizes the uncommon case. So within a clock or two
highmem_copy_to/from_user should be as fast as copy_to/from_user.
And I played with putting a wrapper around ll_rw_block calls in buffer.c
that would allocate bounce buffers from the buffer cache as needed.
I've been a little busy, so keeping up with the kernel changes has been too much
just lately. I wound up hacking on dosemu instead, where I can take a six month
old patch and finish it up...
I'll probably get back to shmfs in a kernel version or two.
From the last pre-2.3.25-3 it looks like everything I have proposed,
except moving bdflush to the page cache level, is finding its way into
the kernel. And that last isn't critical for 2.4+.
So when I get back to hacking it, I'm going to concentrate on the
practical things needed to get shmfs working on 2.3.25+,
and let some of the rest of you work on the generic mechanisms;
you are doing fine right now...
If you'd like to compare mechanisms or whatever, I'd be happy to.
Eric
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 14:50 ` Eric W. Biederman
@ 1999-11-03 16:46 ` Ingo Molnar
1999-11-03 18:55 ` Eric W. Biederman
1999-11-03 19:16 ` Eric W. Biederman
0 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 1999-11-03 16:46 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Christoph Rohland, Stephen C. Tweedie, fxzhang, linux-mm
On 3 Nov 1999, Eric W. Biederman wrote:
> Not really. I played with the idea, and the only really tricky aspect I saw
> was how to write a version of copy_to/from_user that would handle the bigmem
> case. Because kmap ... copy .. kunmap isn't safe as you can sleep due
> to a page fault.
yes, i implemented a new 'kaddr = kmap_permanent(page)'
'kunmap_permanent(kaddr)' interface which is schedulable. This is now
getting used in exec.c (argument pages can be significantly big) and the
page cache.
> And I played with putting a wrapper around ll_rw_block calls in
> buffer.c that would allocate bounce buffers from the buffer cache as
> needed.
that is a much more problematic issue, especially if you consider future
64-bit PCI DMAing. What i did was to change bh->b_data to bh->b_page,
where b_page is a 32-bit value describing the physical address of the
buffer in 512-byte units. This also meant changing a bazillion places where
b_data was used (lowlevel fs, buffer-cache and block layer, device
drivers) ... But it's working just fine on my box:
moon:~> cat /proc/meminfo
MemTotal: 8249708 kB
MemFree: 7760256 kB
MemShared: 0 kB
Buffers: 20292 kB
Cached: 432052 kB <=== 432M pagecache
HighTotal: 7471104 kB
HighFree: 7035928 kB <=== 444M high memory allocated
LowTotal: 778604 kB
LowFree: 724328 kB <=== 50M normal memory allocated
SwapTotal: 0 kB
SwapFree: 0 kB
> I'll probably get back to shmfs in a kernel version or two.
looking forward to test it, i believe we could get some spectacular
benchmark numbers with that thing and 2.4 ...
-- mingo
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 16:46 ` Ingo Molnar
@ 1999-11-03 18:55 ` Eric W. Biederman
1999-11-03 19:16 ` Eric W. Biederman
1 sibling, 0 replies; 12+ messages in thread
From: Eric W. Biederman @ 1999-11-03 18:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric W. Biederman, Christoph Rohland, Stephen C. Tweedie,
fxzhang, linux-mm
Ingo Molnar <mingo@chiara.csoma.elte.hu> writes:
> On 3 Nov 1999, Eric W. Biederman wrote:
>
> > Not really. I played with the idea, and the only really tricky aspect I saw
> > was how to write a version of copy_to/from_user that would handle the bigmem
> > case. Because kmap ... copy .. kunmap isn't safe as you can sleep due
> > to a page fault.
>
> yes, i implemented a new 'kaddr = kmap_permanent(page)'
> 'kunmap_permanent(kaddr)' interface which is schedulable. This is now
> getting used in exec.c (argument pages can be significantly big) and the
> page cache.
Do you have a patch around that the rest of us can look at?
> that is a much more problematic issue, especially if you consider future
> 64-bit PCI DMAing. What i did was to change bh->b_data to bh->b_page,
> where b_page is a 32-bit value describing the physical address of the
> buffer in 512-byte units. This also meant changing a bazillion places where
> b_data was used (lowlevel fs, buffer-cache and block layer, device
> drivers) ... But it's working just fine on my box:
Click.
That lets us access up to 2 terabytes of RAM on a 32-bit machine.
And you have to do something like that, or you can't put buffers on
those high pages even temporarily. I missed that trick.
> > I'll probably get back to shmfs in a kernel version or two.
>
> looking forward to test it, i believe we could get some spectacular
> benchmark numbers with that thing and 2.4 ...
We'll see. I just want to get it functional first.
There are no binary compatibility constraints, so once it works
any optimizations are easy :)
Eric
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 16:46 ` Ingo Molnar
1999-11-03 18:55 ` Eric W. Biederman
@ 1999-11-03 19:16 ` Eric W. Biederman
1999-11-03 20:24 ` Ingo Molnar
1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 1999-11-03 19:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Christoph Rohland, Stephen C. Tweedie, fxzhang, linux-mm
[snip]
Q: Highmem allocation for the page cache.
Note: page_cache_alloc currently doesn't take any parameters.
It should take __GFP_BIGMEM or whatever so we can say yes, high memory is OK.
Or no, I'm going to put metadata in this page and high memory is not OK, too inconvenient.
Eric
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 19:16 ` Eric W. Biederman
@ 1999-11-03 20:24 ` Ingo Molnar
1999-11-03 19:32 ` Benjamin C.R. LaHaise
0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 1999-11-03 20:24 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Christoph Rohland, Stephen C. Tweedie, fxzhang, linux-mm
On 3 Nov 1999, Eric W. Biederman wrote:
> Q: Highmem allocation for the page cache.
>
> Note: page_cache_alloc currently doesn't take any parameters.
> It should take __GFP_BIGMEM or whatever so we can say yes, high memory is OK.
it's now an unconditional __GFP_HIGHMEM in my tree. HIGHMEM gfp()
allocation automatically falls back to allocate in lowmem, if highmem
lists are empty.
> Or no, I'm going to put metadata in this page and high memory is not
> OK, too inconvenient.
hm, i see, this makes sense. permanent mappings are not inconvenient at
all (you can hold a number of them, and can sleep in between), but maybe we
still want to allocate in low memory in some cases, for performance
reasons. It's not a problem at all and completely legal, as low memory
pages can happen anyway, so everything is completely symmetric. Any 'high
memory enabled code' automatically works with low memory (or exclusively low
memory) as well.
-- mingo
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 20:24 ` Ingo Molnar
@ 1999-11-03 19:32 ` Benjamin C.R. LaHaise
1999-11-03 21:41 ` Ingo Molnar
0 siblings, 1 reply; 12+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-11-03 19:32 UTC (permalink / raw)
To: Ingo Molnar; +Cc: fxzhang, linux-mm
On Wed, 3 Nov 1999, Ingo Molnar wrote:
> it's now an unconditional __GFP_HIGHMEM in my tree. HIGHMEM gfp()
> allocation automatically falls back to allocate in lowmem, if highmem
> lists are empty.
I'd like to look through the patches to see how you're doing things before
making any comments. Specifically, I want to look at the buffer head
address thing that was mentioned -- given that the devices that support
addressing memory above 4G will expect a 64 bit address, I don't think the
shift is the right way to go.
-ben
* Re: Why don't we make mmap MAP_SHARED with /dev/zero possible?
1999-11-03 19:32 ` Benjamin C.R. LaHaise
@ 1999-11-03 21:41 ` Ingo Molnar
0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 1999-11-03 21:41 UTC (permalink / raw)
To: Benjamin C.R. LaHaise; +Cc: fxzhang, linux-mm
On Wed, 3 Nov 1999, Benjamin C.R. LaHaise wrote:
> > it's now an unconditional __GFP_HIGHMEM in my tree. HIGHMEM gfp()
> > allocation automatically falls back to allocate in lowmem, if highmem
> > lists are empty.
>
> I'd like to look through the patches to see how you're doing things before
> making any comments. Specifically, I want to look at the buffer head
(soon, probably tomorrow)
> address thing that was mentioned -- given that the devices that support
> addressing memory above 4G will expect a 64 bit address, I don't think the
> shift is the right way to go.
i introduced bh->b_page _specifically_ to support 64-bit addresses. Below
4GB of RAM, DMA32 can be supported with no IO-layer changes. Right now it
supports up to 2TB of DMA64 target space (on 32-bit boxes), which is more
than enough.
-- mingo