* mmap/munmap semantics
@ 2000-02-22 17:46 Richard Guenther
From: Richard Guenther @ 2000-02-22 17:46 UTC
To: Linux Kernel List; +Cc: glame-devel, Linux-MM
Hi!
With the ongoing development of GLAME there arise the following
problems with the backing-store management, which is a mmaped
file and does "userspace virtual memory management":
- I cannot see a way to mmap a part of the file but set the
contents initially to zero, i.e. I want to setup an initially
dirty zero-mapping which is assigned to a part of the file.
Currently I'm just mmaping the part and do the zeroing by
reading from /dev/zero (which does as I understand from the
kernel code just create this zero mappings) - is there a more
portable way to achieve this?
- I need to "drop" a mapping sometimes without writing the contents
back to disk - I cannot see a way to do this with linux currently.
Ideally a hole could be created in the mmapped file on drop time -
is this possible at all with the VFS/ext2 at the moment (creating
a hole in a file by dropping parts of it)?
So for the first case we could add a flag to mmap like MAP_ZERO to
indicate a zero-map (dirty).
For the second case either the munmap call needs to be extended or
some sort of madvise with a MADV_CLEAN flag? Or we can just adjust
mprotect(PROT_NONE) and subsequent munmap() to do the dropping?
Richard.
--
The GLAME Project: http://www.glame.de/
Hosted by SourceForge: http://glame.sourceforge.net/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: mmap/munmap semantics
From: James Antill @ 2000-02-22 18:36 UTC
To: Richard Guenther; +Cc: Linux Kernel List, glame-devel, Linux-MM

> Hi!
>
> With the ongoing development of GLAME there arise the following
> problems with the backing-store management, which is a mmaped
> file and does "userspace virtual memory management":
> - I cannot see a way to mmap a part of the file but set the
> contents initially to zero, i.e. I want to setup an initially
> dirty zero-mapping which is assigned to a part of the file.
> Currently I'm just mmaping the part and do the zeroing by
> reading from /dev/zero (which does as I understand from the
> kernel code just create this zero mappings) - is there a more
> portable way to achieve this?

I think you want to truncate/lseek after open but before you mmap, if
I'm reading what you want to do properly. This is portable (at least it
works on Linux/FreeBSD/Solaris).

> - I need to "drop" a mapping sometimes without writing the contents
> back to disk - I cannot see a way to do this with linux currently.
> Ideally a hole could be created in the mmapped file on drop time -
> is this possible at all with the VFS/ext2 at the moment (creating
> a hole in a file by dropping parts of it)?

The mapping can be synchronized with the file before you munmap(), so
I'm not sure what you want: mrevert()? I'm positive you're going to
have to do this in user space (i.e. copy the file and then rename() or
don't rename() at munmap() time, or do a private mapping and write() to
the file at munmap()).

--
James Antill -- james@and.org
I am always an optimist, but frankly there is no hope.
-Hosni Mubarek
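The "truncate before you mmap" suggestion can be sketched as follows. This is only an illustration under stated assumptions: the function name and the scratch path under /tmp are hypothetical, and it covers the case of extending a file, not zeroing a range that already holds data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a scratch file, extend it with ftruncate(), map it shared, and
 * report whether every mapped byte reads back as zero.
 * Returns 1 if all bytes are zero, 0 if not, -1 on error.
 * The name and the /tmp path are hypothetical, for illustration only. */
int extended_file_reads_zero(size_t len)
{
    assert(len > 0);

    char path[] = "/tmp/zero_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears once fd is closed */

    /* Extending a file with ftruncate() makes the new range read as zero. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(p, len);
    close(fd);
    return all_zero;
}
```

Calling extended_file_reads_zero(65536) should return 1 on a filesystem that behaves as described; the pages are only materialised when first touched.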
* Re: mmap/munmap semantics
From: Benjamin C.R. LaHaise @ 2000-02-22 18:41 UTC
To: Richard Guenther; +Cc: Linux Kernel List, glame-devel, Linux-MM

On Tue, 22 Feb 2000, Richard Guenther wrote:

> Hi!
>
> With the ongoing development of GLAME there arise the following
> problems with the backing-store management, which is a mmaped
> file and does "userspace virtual memory management":
> - I cannot see a way to mmap a part of the file but set the
> contents initially to zero, i.e. I want to setup an initially
> dirty zero-mapping which is assigned to a part of the file.
> Currently I'm just mmaping the part and do the zeroing by
> reading from /dev/zero (which does as I understand from the
> kernel code just create this zero mappings) - is there a more
> portable way to achieve this?

Do you mean that you want to go above and beyond what ftruncate does? If
that's the case, reading from /dev/zero is probably the easiest thing,
although I suspect doing a sendfile from /dev/zero to the file will
ultimately end up being more efficient.

If you managed to do a read from /dev/zero into a shared file mapping
beyond the end of file without getting a SIGBUS, then that's a bug.

> - I need to "drop" a mapping sometimes without writing the contents
> back to disk - I cannot see a way to do this with linux currently.
> Ideally a hole could be created in the mmapped file on drop time -
> is this possible at all with the VFS/ext2 at the moment (creating
> a hole in a file by dropping parts of it)?

No, this is insanity. Creating holes in the middle of files actually came
up when talking about ext2 changes, and frankly it doesn't make sense.
For example: on a filesystem that uses extents, creating a hole in the
middle of a file means that you might have to allocate more disk space in
order to free the disk space.

> So for the first case we could add a flag to mmap like MAP_ZERO to
> indicate a zero-map (dirty).

Or teach truncate about preallocation.

> For the second case either the munmap call needs to be extended or
> some sort of madvise with a MADV_CLEAN flag? Or we can just adjust
> mprotect(PROT_NONE) and subsequent munmap() to do the dropping?

Remember that madvise is only giving the system hints about what you want
it to do. If madvise allows you to mark a dirty page as clean without
doing a writeback, that could result in stale data residing in the page
cache for other users to come along and read without that data going to
disk -- not behaviour I want to see. If read returns it, it should be on
disk.

-ben
* Re: mmap/munmap semantics
From: Richard Guenther @ 2000-02-23 10:57 UTC
To: Benjamin C.R. LaHaise; +Cc: Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

On Tue, 22 Feb 2000, Benjamin C.R. LaHaise wrote:

> On Tue, 22 Feb 2000, Richard Guenther wrote:
>
> > Hi!
> >
> > With the ongoing development of GLAME there arise the following
> > problems with the backing-store management, which is a mmaped
> > file and does "userspace virtual memory management":
> > - I cannot see a way to mmap a part of the file but set the
> > contents initially to zero, i.e. I want to setup an initially
> > dirty zero-mapping which is assigned to a part of the file.
> > Currently I'm just mmaping the part and do the zeroing by
> > reading from /dev/zero (which does as I understand from the
> > kernel code just create this zero mappings) - is there a more
> > portable way to achieve this?
>
> Do you mean that you want to go above and beyond what ftruncate does? If
> that's the case, reading from /dev/zero is probably the easiest thing,
> although I suspect doing a sendfile from /dev/zero to the file will
> ultimately end up being more efficient.
>
> If you managed to do a read from /dev/zero into a shared file mapping
> beyond the end of file without getting a SIGBUS, then that's a bug.

No, I do not want to extend the file. I want to do a mmap of a _part_
(in the middle) of an existing file and have this part automagically
zeroed regardless if there was a hole (already zeroed) or some other data.
With the mmap & read(/dev/zero) approach I can achieve this, but I do not
know if it is portable (wrt effectiveness).

> > - I need to "drop" a mapping sometimes without writing the contents
> > back to disk - I cannot see a way to do this with linux currently.
> > Ideally a hole could be created in the mmapped file on drop time -
> > is this possible at all with the VFS/ext2 at the moment (creating
> > a hole in a file by dropping parts of it)?
>
> No, this is insanity. Creating holes in the middle of files actually came
> up when talking about ext2 changes, and frankly it doesn't make sense.
> For example: on a filesystem that uses extents, creating a hole in the
> middle of a file means that you might have to allocate more disk space in
> order to free the disk space.

Ok, so this case is closed.

> > So for the first case we could add a flag to mmap like MAP_ZERO to
> > indicate a zero-map (dirty).
>
> Or teach truncate about preallocation.

?? I do not understand this. Truncate does not operate on ranges in the
middle of a file, no?

> > For the second case either the munmap call needs to be extended or
> > some sort of madvise with a MADV_CLEAN flag? Or we can just adjust
> > mprotect(PROT_NONE) and subsequent munmap() to do the dropping?
>
> Remember that madvise is only giving the system hints about what you want
> it to do. If madvise allows you to mark a dirty page as clean without
> doing a writeback, that could result in stale data residing in the page
> cache for other users to come along and read without that data going to
> disk -- not behaviour I want to see. If read returns it, it should be on
> disk.

Yes, so I need a new API for this. The scenario is
- I have a dirty (shared) mapping of some part of a file (in a
  multithreaded environment)
- I do reference counting on the "mmaps", i.e. I hand out
  clusters of the (big) file to the threads for storing data. Also I
  do cache the mappings to save munmap (and in turn disk io) calls.
- If the reference count drops to zero at any time, then I do not need
  the data in the (possibly dirty) mapping anymore - so I could just
  munmap them - BUT I generate needless disk io in this case (I don't
  need to get the pages written to disk, because I don't care for the
  contents in this case anyway).

So how can I throw away a dirty (shared) mapping of a file without
generating disk io? Remember, I do not care about the contents of the file
at the mmap place.
A possible solution would be to be able to convert a shared mapping to
a private one? If I'm the only user of the shared mapping (so it's a
virtually private one) this should be easy - just "disconnect" it. In the
other case I do not really know how to handle this.

Richard.
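The mmap & read(/dev/zero) approach described in this message could look roughly like the sketch below. The function names, the 0xAB fill pattern, the cluster size, and the scratch path are all assumptions made for the illustration; GLAME's real swapfile code is not shown in the thread. The sketch only relies on read() storing zeros into the mapping, not on any kernel-side mapping trick.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read exactly len zero bytes from /dev/zero into dst. Returns 0 or -1. */
static int zero_fill_from_dev_zero(void *dst, size_t len)
{
    int zfd = open("/dev/zero", O_RDONLY);
    if (zfd < 0)
        return -1;
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(zfd, (char *)dst + done, len - done);
        if (n <= 0) {
            close(zfd);
            return -1;
        }
        done += (size_t)n;
    }
    close(zfd);
    return 0;
}

/* Fill a scratch file with 0xAB, map one page-aligned "cluster" in its
 * middle, zero that cluster via /dev/zero, and verify the result.
 * Returns 1 if the cluster is zero and its neighbour is untouched,
 * 0 if the check fails, -1 on error. All names are hypothetical. */
int zero_middle_cluster_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t cluster = (size_t)page * 4;
    size_t total = cluster * 3;

    char path[] = "/tmp/cluster_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    char *buf = malloc(total);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 0xAB, total);
    ssize_t written = write(fd, buf, total);
    free(buf);
    if (written != (ssize_t)total) {
        close(fd);
        return -1;
    }

    /* Map only the middle cluster; the offset is page aligned. */
    unsigned char *p = mmap(NULL, cluster, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, (off_t)cluster);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (zero_fill_from_dev_zero(p, cluster) != 0) {
        munmap(p, cluster);
        close(fd);
        return -1;
    }

    int ok = 1;
    for (size_t i = 0; i < cluster; i++)
        if (p[i] != 0)
            ok = 0;

    /* The byte just before the cluster must still hold the old data. */
    unsigned char before = 0;
    if (pread(fd, &before, 1, (off_t)cluster - 1) != 1 || before != 0xAB)
        ok = 0;

    munmap(p, cluster);
    close(fd);
    return ok;
}
```

Note that the zeroed pages are dirty in the shared mapping, so they remain eligible for writeback, which is exactly the disk io the message wants to avoid when the cluster is later dropped.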
* Re: mmap/munmap semantics
From: Benjamin C.R. LaHaise @ 2000-02-23 15:58 UTC
To: Richard Guenther; +Cc: Linux Kernel List, glame-devel, Linux-MM

On Wed, 23 Feb 2000, Richard Guenther wrote:

> No, I do not want to extend the file. I want to do a mmap of a _part_
> (in the middle) of an existing file and have this part automagically
> zeroed regardless if there was a hole (already zeroed) or some other data.
> With the mmap & read(/dev/zero) approach I can achieve this, but I do not
> know if it is portable (wrt effectiveness).

It should be portable, but it still isn't as efficient as just dumping the
pages since it generates io.

> > > So for the first case we could add a flag to mmap like MAP_ZERO to
> > > indicate a zero-map (dirty).
> >
> > Or teach truncate about preallocation.
>
> ?? I do not understand this. Truncate does not operate on ranges in the
> middle of a file, no?

I think I see what you're trying to do now, so just ignore this part =)

> So how can I throw away a dirty (shared) mapping of a file without
> generating disk io? Remember, I do not care about the contents of the file
> at the mmap place.
> A possible solution would be to be able to convert a shared mapping to
> a private one? If I'm the only user of the shared mapping (so it's a
> virtually private one) this should be easy - just "disconnect" it. In the
> other case I do not really know how to handle this.

The most portable and easiest way to achieve this behaviour right now is
to use individual files or shm segments for the shared mappings. Using
SysV shared memory will get you the most performance since it won't get
written back to disk early (like mmaped files). If that doesn't give you
enough space, I strongly recommend using 1 file per shared "segment",
since the semantics you get by truncating and then extending the mapping
are exactly what you want. As a bonus, this technique works on
filesystems that don't support files with holes =)

-ben
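The "one file per segment, truncate and then extend" technique can be sketched like this. It is a minimal illustration assuming Linux behaviour for shared mappings of a truncated file; the function name, the 0x5A pattern, and the scratch path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty a shared mapping of a per-segment file, then discard the data by
 * truncating the file to zero length and extending it again.
 * Returns 1 if the mapping reads back as zero afterwards, 0 if not,
 * -1 on error. Names and the /tmp path are hypothetical. */
int truncate_reextend_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page * 2;

    char path[] = "/tmp/segment_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(p, 0x5A, len);               /* the segment is now dirty */

    /* Truncating to 0 drops the cached pages, dirty or not. The mapping
     * must not be touched until the file has been extended again, since
     * accesses beyond end of file raise SIGBUS. */
    if (ftruncate(fd, 0) != 0 || ftruncate(fd, (off_t)len) != 0) {
        munmap(p, len);
        close(fd);
        return -1;
    }

    int all_zero = 1;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            all_zero = 0;

    munmap(p, len);
    close(fd);
    return all_zero;
}
```

In a multithreaded program no other thread may touch the mapping between the two ftruncate() calls, which is one reason this fits a design where the segment's reference count has already dropped to zero.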
* Re: mmap/munmap semantics
From: Richard Guenther @ 2000-02-24 10:06 UTC
To: Benjamin C.R. LaHaise; +Cc: Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

On Wed, 23 Feb 2000, Benjamin C.R. LaHaise wrote:

> On Wed, 23 Feb 2000, Richard Guenther wrote:
>
> > So how can I throw away a dirty (shared) mapping of a file without
> > generating disk io? Remember, I do not care about the contents of the file
> > at the mmap place.
> > A possible solution would be to be able to convert a shared mapping to
> > a private one? If I'm the only user of the shared mapping (so its a
> > virtually private one) this should be easy - just "disconnect" it. In the
> > other case I do not really know how to handle this.
>
> The most portable and easiest way to achieve this behaviour right now is
> to use individual files or shm segments for the shared mappings. Using
> SysV shared memory will get you the most performance since it won't get
> written back to disk early (like mmaped files). If that doesn't give you
> enough space, I strongly recommend using 1 file per shared "segment",
> since the semantics you get by truncating and then extending the mapping
> are exactly what you want. As a bonus, this technique works on
> filesystems that don't support files with holes =)

Yes, but unfortunately the individual file approach does not work in case
we (ideally) want to operate on a whole disk...

Richard.
* Re: mmap/munmap semantics
From: Richard Gooch @ 2000-02-22 21:48 UTC
To: Richard Guenther; +Cc: Linux Kernel List, glame-devel, Linux-MM

Richard Guenther writes:
> Hi!
>
> With the ongoing development of GLAME there arise the following
> problems with the backing-store management, which is a mmaped
> file and does "userspace virtual memory management":
> - I cannot see a way to mmap a part of the file but set the
> contents initially to zero, i.e. I want to setup an initially
> dirty zero-mapping which is assigned to a part of the file.
> Currently I'm just mmaping the part and do the zeroing by
> reading from /dev/zero (which does as I understand from the
> kernel code just create this zero mappings) - is there a more
> portable way to achieve this?
> - I need to "drop" a mapping sometimes without writing the contents
> back to disk - I cannot see a way to do this with linux currently.
> Ideally a hole could be created in the mmapped file on drop time -
> is this possible at all with the VFS/ext2 at the moment (creating
> a hole in a file by dropping parts of it)?
>
> So for the first case we could add a flag to mmap like MAP_ZERO to
> indicate a zero-map (dirty).
>
> For the second case either the munmap call needs to be extended or
> some sort of madvise with a MADV_CLEAN flag? Or we can just adjust
> mprotect(PROT_NONE) and subsequent munmap() to do the dropping?

Maybe you can make use of the same driver that I'd like to use:

- a process opens the driver device file, and passes the FD to
  another "daemon" process (a child or whatever)
- one process mmap(2)s the FD and does reads and writes to the VMA
- the "daemon" does an ioctl(2) waiting for page fault events, and
  proceeds to read(2)/write(2) to satisfy the page faults
- the driver passes page fault and free memory request events to the
  daemon.

This allows you to set up a user-space virtual memory system. This is
useful for me, it may be useful for you. All I need to do is find a
victim to volunteer to write this ;-)

Regards,

Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
* Re: mmap/munmap semantics
From: Eric W. Biederman @ 2000-02-23 3:49 UTC
To: Richard Guenther; +Cc: Linux Kernel List, glame-devel, Linux-MM

Richard Guenther <richard.guenther@student.uni-tuebingen.de> writes:

> Hi!
>
> With the ongoing development of GLAME there arise the following
> problems with the backing-store management, which is a mmaped
> file and does "userspace virtual memory management":

For this to be a productive discussion we need to know why
you want this. What advantage do your changes/proposals provide?

Most of this sounds like you want to zero memory quickly.
Does your code care that the memory is zero, or can you get away
with simply getting memory quickly?

> - I cannot see a way to mmap a part of the file but set the
> contents initially to zero, i.e. I want to setup an initially
> dirty zero-mapping which is assigned to a part of the file.

Why dirty?

> Currently I'm just mmaping the part and do the zeroing by
> reading from /dev/zero (which does as I understand from the
> kernel code just create this zero mappings) - is there a more
> portable way to achieve this?

memset...

> - I need to "drop" a mapping sometimes without writing the contents
> back to disk -

You do know that with a shared mapping the kernel can write
the contents back to disk whenever it feels like it.
What is the benefit of not writing things to disk?

> I cannot see a way to do this with linux currently.
> Ideally a hole could be created in the mmapped file on drop time -
> is this possible at all with the VFS/ext2 at the moment (creating
> a hole in a file by dropping parts of it)?

Again why do you want this?

>
> So for the first case we could add a flag to mmap like MAP_ZERO to
> indicate a zero-map (dirty).

Possibly. madvise(MADV_WILLNEED) sounds probably like what you
want. (After using ftruncate to zero everything quickly).

>
> For the second case either the munmap call needs to be extended or
> some sort of madvise with a MADV_CLEAN flag?

Poking holes is probably not what you want. The zeroing cost
will be paid somewhere.

> Or we can just adjust
> mprotect(PROT_NONE) and subsequent munmap() to do the dropping?

This is definitely not right. mprotect(PROT_NONE) has very clear
semantics, as does munmap, and this suggestion would break them.

Eric
* Re: mmap/munmap semantics
From: Richard Guenther @ 2000-02-23 11:14 UTC
To: Eric W. Biederman; +Cc: Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

On 22 Feb 2000, Eric W. Biederman wrote:

> Richard Guenther <richard.guenther@student.uni-tuebingen.de> writes:
>
> > Hi!
> >
> > With the ongoing development of GLAME there arise the following
> > problems with the backing-store management, which is a mmaped
> > file and does "userspace virtual memory management":
>
> For this to be a productive discussion we need to know why
> you want this. What advantage do your changes/proposals provide?
>
> Most of this sounds like you want to zero memory quickly.
> Does your code care that the memory is zero, or can you get away
> with simply getting memory quickly?

No, I do not want to simply zero the memory quickly. The main goal is
to avoid needless disk io. I'll try to elaborate on the use of the GLAME
(Audio processing tool) "swapfile":
- The swapfile is the backing store (large, preallocated, either
  a file or a complete disk) for audio tracks.
- The swapfile is organized into clusters (as is virtual memory
  into pages) which are reference counted and may be shared between
  multiple audio tracks.
- Access to the data of the audio tracks goes through handing out
  a memory map of one swapfile cluster at a time (clusters are not
  fixed in size but at least page aligned)
- Those memory maps of the clusters are cached to avoid the disk io
  of mmap/munmap sequences, a cluster is mmapped at most one time
  (from the GLAME program) but the address of the mapping is
  handed out possibly n times to the worker threads (this is
  reference counted).
- If the reference count of a cluster (not the mapping) drops to zero,
  it is no longer used by any of the audio tracks which populate
  the swapfile. Now I want to munmap the mapping of the no longer
  used swapfile cluster _without_ generating any disk io (i.e. I
  do not care about the actual contents of the swapfile at the place
  of the cluster). This cannot be done at the moment?
- If a cluster is mmaped the first time the semantics of the GLAME
  swapfile require it to be zeroed. At the moment I achieve this
  by mmapping the cluster and zeroing it by reading from /dev/zero -
  this is fine, but ideally I do not want the disk copy of the cluster
  to change (to these zeros) until somebody actually _writes_ to
  the mapping. I.e. ideally I want to have a clean ro mmap of
  /dev/zero, handle a SIGSEGV in user space and somehow exchange
  the private mapping of /dev/zero with a shared rw mapping of the
  swapfile (of course including zeroing this mapping first) and
  just continue.

So I have sort of virtual memory management with an automatically
updated disk copy - but please with the least amount of disk io possible.

Does this sound reasonable?

Richard.

> > - I cannot see a way to mmap a part of the file but set the
> > contents initially to zero, i.e. I want to setup an initially
> > dirty zero-mapping which is assigned to a part of the file.
> Why dirty?

This would be the easiest way - I want the zeroes written back to disk.

> > Currently I'm just mmaping the part and do the zeroing by
> > reading from /dev/zero (which does as I understand from the
> > kernel code just create this zero mappings) - is there a more
> > portable way to achieve this?
> memset...

Portable, yes - efficient? Certainly not in the read-only case.

> > - I need to "drop" a mapping sometimes without writing the contents
> > back to disk -
>
> You do know that with a shared mapping the kernel can write
> the contents back to disk whenever it feels like it.
> What is the benefit of not writing things to disk?

I know that the kernel can write back to disk at any time it likes. I do
not care about this possibility. But at a certain point I know I will not
need the contents of a _part_ of the file anymore. So I want to avoid
syncing back a huge amount of memory to disk at munmap time if I know
I won't need the data. (huge amount == 4 to 32 MB usually)

> > I cannot see a way to do this with linux currently.
> > Ideally a hole could be created in the mmapped file on drop time -
> > is this possible at all with the VFS/ext2 at the moment (creating
> > a hole in a file by dropping parts of it)?
> Again why do you want this?

See my above description

> >
> > So for the first case we could add a flag to mmap like MAP_ZERO to
> > indicate a zero-map (dirty).
>
> Possibly. madvise(MADV_WILLNEED) sounds probably like what you
> want. (After using ftruncate to zero everything quickly).

I'm satisfied with the read from /dev/zero approach, but ideally - see
above - I really want to switch from a private to a shared mapping
on-demand (on write).

> > For the second case either the munmap call needs to be extended or
> > some sort of madvise with a MADV_CLEAN flag?
> Poking holes is probably not what you want. The zeroing cost
> will be paid somewhere.

Ok, I got this. Creating holes is no longer on my wishlist - but avoiding
needless disk io is.

> > Or we can just adjust
> > mprotect(PROT_NONE) and subsequent munmap() to do the dropping?
>
> This is definitely not right. mprotect(PROT_NONE) has very clear
> semantics, as does munmap, and this suggestion would break them.

Ok, just an idea.

Richard.
* Re: mmap/munmap semantics
From: Jamie Lokier @ 2000-02-23 15:44 UTC
To: Eric W. Biederman; +Cc: Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

Eric W. Biederman wrote:
> > For the second case either the munmap call needs to be extended or
> > some sort of madvise with a MADV_CLEAN flag?
> Poking holes is probably not what you want. The zeroing cost
> will be paid somewhere.

MADV_CLEAN, or perhaps a different syscall mdiscard() (as it's page
based and doesn't change vmas) looks utterly wrong for this application,
but it does have a very nice use for memory allocators.

With memory allocators you could use mdiscard to tell the kernel to
decide whether to replace a privately mapped page by its original
backing page. For /dev/zero that means you can let the kernel decide
whether to reclaim the memory, or if the application can keep the page.

The nice part is that the decision can be deferred: you are simply
informing the kernel that a page can be reclaimed later on demand. But
the application doesn't need to know when the decision happens -- it
assumes it is immediate.

This is appropriate for freed memory areas, and is not something that
the application can do itself. mmaping /dev/zero over the page doesn't
work because that _always_ causes an undesirable zero copy, not to
mention expensive vma operations, when what you want is to simply mark
pages for potential reclaim _if_ the kernel decides it could reclaim the
page in the intervening time.

enjoy,
-- Jamie
* Re: mmap/munmap semantics
From: Stephen C. Tweedie @ 2000-02-23 18:48 UTC
To: Richard Guenther; +Cc: Linux Kernel List, glame-devel, Linux-MM

Hi,

On Tue, 22 Feb 2000 18:46:02 +0100 (MET), Richard Guenther
<richard.guenther@student.uni-tuebingen.de> said:

> With the ongoing development of GLAME there arise the following
> problems with the backing-store management, which is a mmaped
> file and does "userspace virtual memory management":
> - I cannot see a way to mmap a part of the file but set the
> contents initially to zero,

All file contents default to zero anyway, so just ftruncate() the file
to create as much demand-zeroed mmapable memory as you want.

> - I need to "drop" a mapping sometimes without writing the contents
> back to disk - I cannot see a way to do this with linux currently.

The only way is to use Chuck Lever's madvise() patches:
madvise(MADV_DONTNEED) is exactly what you need there. It's not yet in
Linus's 2.3 tree, but the API is pretty standard.

> Ideally a hole could be created in the mmapped file on drop time -

No, if the mmaped area has already been flushed to disk then there is no
way at all to recreate the hole except by truncating and then
re-extending the file (which destroys everything until EOF, of course).

--Stephen
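For readers on later kernels: Linux has since gained fallocate() with FALLOC_FL_PUNCH_HOLE, which deallocates a byte range in the middle of a file so that it reads back as zero. That interface postdates this thread, and support depends on the filesystem. The following is a hedged sketch only; the function name, the 0xCD pattern, and the scratch path are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write three blocks of 0xCD to a scratch file, punch a hole over the
 * middle block, and check that it now reads as zero while the file size
 * is unchanged.
 * Returns 1 if the hole reads as zero, 0 if the filesystem does not
 * support hole punching or the check fails, -1 on other errors.
 * Names and the /tmp path are hypothetical. */
int punch_hole_demo(size_t block)
{
    assert(block > 0);
    size_t total = block * 3;

    char path[] = "/tmp/hole_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    unsigned char *buf = malloc(total);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 0xCD, total);
    if (write(fd, buf, total) != (ssize_t)total) {
        free(buf);
        close(fd);
        return -1;
    }

    /* PUNCH_HOLE must be combined with KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  (off_t)block, (off_t)block) != 0) {
        int unsupported = (errno == EOPNOTSUPP || errno == ENOSYS);
        free(buf);
        close(fd);
        return unsupported ? 0 : -1;
    }

    int ok = 1;
    if (pread(fd, buf, block, (off_t)block) != (ssize_t)block)
        ok = 0;
    for (size_t i = 0; ok && i < block; i++)
        if (buf[i] != 0)
            ok = 0;
    if (lseek(fd, 0, SEEK_END) != (off_t)total)
        ok = 0;

    free(buf);
    close(fd);
    return ok;
}
```

Whether punching a hole avoids disk io for pages that are still dirty in memory is not something this sketch demonstrates; it only shows that the range reads as zero afterwards.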
* Re: mmap/munmap semantics
From: Jamie Lokier @ 2000-02-24 2:35 UTC
To: Stephen C. Tweedie; +Cc: Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

Stephen C. Tweedie wrote:
> > - I need to "drop" a mapping sometimes without writing the contents
> > back to disk - I cannot see a way to do this with linux currently.
>
> The only way is to use Chuck Lever's madvise() patches:
> madvise(MADV_DONTNEED) is exactly what you need there. It's not yet in
> Linus's 2.3 tree, but the API is pretty standard.

I don't think MADV_DONTNEED actually drops privately modified data does
it? I thought it was merely a hint to the kernel that the data will not
be accessed again soon, so it can be paged out or, if unmodified,
dropped.

All the other MADV_* flags are access hints.

-- Jamie
* Re: mmap/munmap semantics
From: Stephen C. Tweedie @ 2000-02-24 12:13 UTC
To: Jamie Lokier; +Cc: Stephen C. Tweedie, Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

Hi,

On Thu, 24 Feb 2000 03:35:02 +0100, Jamie Lokier
<lk@tantalophile.demon.co.uk> said:

> I don't think MADV_DONTNEED actually drops privately modified data does
> it?

Yes, it does. From the DU man pages:

  MADV_DONTNEED
      Do not need these pages

      The system will free any whole pages in the specified
      region. All modifications will be lost and any swapped
      out pages will be discarded. Subsequent access to the
      region will result in a zero-fill-on-demand fault as
      though it is being accessed for the first time.
      Reserved swap space is not affected by this call.

Regarding the other half of the problem --- zeroing out a portion of a
file without further IO --- the splice code I hope to have using kiobufs
in 2.5 will allow this to be done very easily. You'll be able to take a
region of /dev/zero and splice it into your open file with zero-copy.

--Stephen
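The zero-fill-on-demand behaviour quoted from the DU man page matches what Linux does today for private anonymous mappings, and the sketch below shows only that case. It is Linux-specific and the function name is hypothetical. For a MAP_SHARED file mapping, the Linux madvise(2) man page says later accesses repopulate the pages from the up-to-date contents of the underlying file, so there the call should not be assumed to throw away dirty data or to suppress writeback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty a private anonymous mapping, discard it with MADV_DONTNEED, and
 * check that it reads back as zero.
 * Returns 1 if the region is zero afterwards, 0 if not, -1 on error.
 * Linux-specific behaviour; the function name is hypothetical. */
int dontneed_discards_private_data(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page * 4;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0x7F, len);               /* modify every page */

    /* Drop the pages; the mapping itself stays in place. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int all_zero = 1;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            all_zero = 0;

    munmap(p, len);
    return all_zero;
}
```

POSIX's posix_madvise() with POSIX_MADV_DONTNEED is purely advisory, so portable code should not rely on the discard semantics shown here.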
* Re: mmap/munmap semantics
2000-02-24 12:13 ` Stephen C. Tweedie
@ 2000-02-24 12:24 ` Richard Guenther
2000-02-24 13:51 ` Stephen C. Tweedie
2000-02-24 15:01 ` kernel
2000-02-24 13:06 ` lars brinkhoff
2000-02-24 13:41 ` Eric W. Biederman
2 siblings, 2 replies; 22+ messages in thread
From: Richard Guenther @ 2000-02-24 12:24 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Jamie Lokier, Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

On Thu, 24 Feb 2000, Stephen C. Tweedie wrote:

> Hi,
>
> On Thu, 24 Feb 2000 03:35:02 +0100, Jamie Lokier
> <lk@tantalophile.demon.co.uk> said:
>
> > I don't think MADV_DONTNEED actually drops privately modified data does
> > it?
>
> Yes, it does. From the DU man pages:
>
>     MADV_DONTNEED
>         Do not need these pages
>
>         The system will free any whole pages in the specified
>         region. All modifications will be lost and any swapped
>         out pages will be discarded. Subsequent access to the
>         region will result in a zero-fill-on-demand fault as
>         though it is being accessed for the first time.
>         Reserved swap space is not affected by this call.

Ah, this is cool - exactly what we need. I.e. an madvise(MADV_DONTNEED)
and a subsequent munmap should not generate any disk io?

> Regarding the other half of the problem --- zeroing out a portion of a
> file without further IO --- the splice code I hope to have using kiobufs
> in 2.5 will allow this to be done very easily. You'll be able to take a
> region of /dev/zero and splice it into your open file with zero-copy.

Cool, too. So for now we will stay with zeroing by reading from /dev/zero
which does vm tricks in linux already.

Richard.

> --Stephen
* Re: mmap/munmap semantics
2000-02-24 12:24 ` Richard Guenther
@ 2000-02-24 13:51 ` Stephen C. Tweedie
2000-02-24 15:01 ` kernel
1 sibling, 0 replies; 22+ messages in thread
From: Stephen C. Tweedie @ 2000-02-24 13:51 UTC (permalink / raw)
To: Richard Guenther
Cc: Stephen C. Tweedie, Jamie Lokier, Linux Kernel List, glame-devel, Linux-MM

Hi,

On Thu, 24 Feb 2000 13:24:07 +0100 (MET), Richard Guenther
<richard.guenther@student.uni-tuebingen.de> said:

> Ah, this is cool - exactly what we need. I.e. an
> madvise(MADV_DONTNEED) and a subsequent munmap should not generate any
> disk io?

If you do the MADV_DONTNEED before the VM system has decided to flush
things out for its own reasons, then yes. At least, according to one
reading of the specs. There doesn't seem to be consensus yet on
precisely what this call is supposed to do --- BSD and Digital Unix man
pages are contradictory on this.

--Stephen
* Re: mmap/munmap semantics
2000-02-24 12:24 ` Richard Guenther
2000-02-24 13:51 ` Stephen C. Tweedie
@ 2000-02-24 15:01 ` kernel
2000-02-24 15:03 ` Richard Guenther
1 sibling, 1 reply; 22+ messages in thread
From: kernel @ 2000-02-24 15:01 UTC (permalink / raw)
To: Richard Guenther
Cc: Stephen C. Tweedie, Jamie Lokier, Linux Kernel List, glame-devel, Linux-MM

On Thu, 24 Feb 2000, Richard Guenther wrote:

> Cool, too. So for now we will stay with zeroing by reading from /dev/zero
> which does vm tricks in linux already.

It does not do tricks when you are dealing with a shared mapping.

-ben
* Re: mmap/munmap semantics
2000-02-24 15:01 ` kernel
@ 2000-02-24 15:03 ` Richard Guenther
2000-02-24 15:15 ` Jamie Lokier
0 siblings, 1 reply; 22+ messages in thread
From: Richard Guenther @ 2000-02-24 15:03 UTC (permalink / raw)
To: kernel
Cc: Richard Guenther, Stephen C. Tweedie, Jamie Lokier, Linux Kernel List, glame-devel, Linux-MM

On Thu, 24 Feb 2000 kernel@kvack.org wrote:

> On Thu, 24 Feb 2000, Richard Guenther wrote:
>
> > Cool, too. So for now we will stay with zeroing by reading from /dev/zero
> > which does vm tricks in linux already.
>
> It does not do tricks when you are dealing with a shared mapping.

Oops, so I misread the code in drivers/char/mem.c ... well, so how can I
get the same effect as for the private mapping? Not at the moment, I
think? So memset should be faster than reading from /dev/zero?

Richard.
* Re: mmap/munmap semantics
2000-02-24 15:03 ` Richard Guenther
@ 2000-02-24 15:15 ` Jamie Lokier
0 siblings, 0 replies; 22+ messages in thread
From: Jamie Lokier @ 2000-02-24 15:15 UTC (permalink / raw)
To: Richard Guenther
Cc: kernel, Stephen C. Tweedie, Linux Kernel List, glame-devel, Linux-MM

Richard Guenther wrote:
> Oops, so I misread the code in drivers/char/mem.c ... well, so how can I
> get the same effect as for the private mapping? Not at the moment, I
> think? So memset should be faster than reading from /dev/zero?

Try them both. /dev/zero may be faster eventually, once kiobufs do
clever things. For the moment they should be about the same speed apart
from syscall entry cost, so zeroing a large region would be fine with
/dev/zero. For a small region, even when kiobufs are working, you
probably don't want the overhead of messing with page tables.

enjoy,
-- Jamie
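For anyone who wants to "try them both", the two ways of zeroing a region discussed here can be sketched as below. This is an illustrative sketch only: zero_by_dev_zero() and zero_by_memset() are hypothetical helper names, and whether the kernel does anything cleverer than a plain copy for the /dev/zero read depends on the kernel version and the kind of mapping, as noted in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Zero len bytes at buf by reading from /dev/zero.
 * Returns 0 on success, -1 if the device cannot be opened or read.
 */
static int zero_by_dev_zero(void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted, retry */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;             /* short reads are allowed */
    }

    close(fd);
    return 0;
}

/* Zero the same region entirely in user space, with no system call. */
static void zero_by_memset(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```

Timing both helpers over the region sizes GLAME actually uses would show where the system call overhead stops mattering. In a real program the /dev/zero descriptor would presumably be opened once and reused rather than reopened per call.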
* Re: mmap/munmap semantics
2000-02-24 12:13 ` Stephen C. Tweedie
2000-02-24 12:24 ` Richard Guenther
@ 2000-02-24 13:06 ` lars brinkhoff
2000-02-24 14:42 ` Jamie Lokier
2000-02-24 13:41 ` Eric W. Biederman
2 siblings, 1 reply; 22+ messages in thread
From: lars brinkhoff @ 2000-02-24 13:06 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Jamie Lokier, Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

"Stephen C. Tweedie" wrote:
> On Thu, 24 Feb 2000 03:35:02 +0100, Jamie Lokier
> <lk@tantalophile.demon.co.uk> said:
> > I don't think MADV_DONTNEED actually drops privately modified data does
> > it?
> Yes, it does. From the DU man pages:
>
>     MADV_DONTNEED
>         Do not need these pages
>
>         The system will free any whole pages in the specified
>         region. All modifications will be lost and any swapped
>         out pages will be discarded. Subsequent access to the
>         region will result in a zero-fill-on-demand fault as
>         though it is being accessed for the first time.
>         Reserved swap space is not affected by this call.

From a FreeBSD man page at
http://dorifer.heim3.tu-clausthal.de/cgi-bin/man/madvise.2.html

    MADV_DONTNEED  Allows the VM system to decrease the in-memory priority
                   of pages in the specified range. Additionally future
                   references to this address range will incur a page
                   fault.

    MADV_FREE      Gives the VM system the freedom to free pages, and tells
                   the system that information in the specified page range
                   is no longer important. This is an efficient way of
                   allowing malloc(3) to free pages anywhere in the address
                   space, while keeping the address space valid. The next
                   time that the page is referenced, the page might be
                   demand zeroed, or might contain the data that was there
                   before the MADV_FREE call. References made to that
                   address space range will not make the VM system page the
                   information back in from backing store until the page is
                   modified again.
* Re: mmap/munmap semantics
2000-02-24 13:06 ` lars brinkhoff
@ 2000-02-24 14:42 ` Jamie Lokier
0 siblings, 0 replies; 22+ messages in thread
From: Jamie Lokier @ 2000-02-24 14:42 UTC (permalink / raw)
To: lars brinkhoff
Cc: Stephen C. Tweedie, Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

lars brinkhoff wrote:
> From a FreeBSD man page at
> http://dorifer.heim3.tu-clausthal.de/cgi-bin/man/madvise.2.html

I like the FreeBSD one best, of all the man page snippets posted
recently. Everything it does is actually useful. It's unfortunate that
different systems define MADV_DONTNEED differently though.

Let's run through useful behaviours one by one.

1. A hint to the VM system: I've finished using this data. If it's
   modified, you can write it back right away. If not, you can
   discard it.

   FreeBSD's MADV_DONTNEED does this, but DU's doesn't.

   FreeBSD:
   > MADV_DONTNEED  Allows the VM system to decrease the in-memory priority
   >                of pages in the specified range. Additionally future
   >                references to this address range will incur a page
   >                fault.

   To avoid ambiguity, perhaps we could call this one MADV_DONE?
   In BSD compatibility mode, Glibc would define MADV_DONTNEED to be
   MADV_DONE. In standard mode it would not define MADV_DONTNEED at all.

2. Zeroing a range in a private map. DU's MADV_DONTNEED does this --
   that's my reading of the man page.

   Digital Unix: (?yes)
   > MADV_DONTNEED  Do not need these pages
   >                The system will free any whole pages in the specified
   >                region. All modifications will be lost and any swapped
   >                out pages will be discarded. Subsequent access to the
   >                region will result in a zero-fill-on-demand fault as
   >                though it is being accessed for the first time.
   >                Reserved swap space is not affected by this call.

   For Linux, simply read /dev/zero into the selected range. The kernel
   already optimises this case for anonymous mappings.

   If doing it in general turns out to be too hard to implement, I
   propose MADV_ZERO should have this effect: exactly like reading
   /dev/zero into the range, but always efficient.

3. Zeroing a range in a shared map.

   I have no idea if DU's MADV_DONTNEED has this effect, or whether it
   only has this effect on shared anonymous mappings.

   In any case, reading /dev/zero into the range will always have the
   desired effect, and Stephen's work will eventually make this
   efficient on Linux.

   Again, if the kiobuf work doesn't have the desired effect, I propose
   MADV_ZERO should be exactly like reading /dev/zero into the range,
   and efficiently if the underlying mapped object can do so
   efficiently.

4. Deferred freeing of pages.

   FreeBSD's MADV_FREE does this, according to the posted manual
   snippet. I like this very much -- it is perfect for a wide variety
   of memory allocators.

   FreeBSD:
   > MADV_FREE  Gives the VM system the freedom to free pages, and tells
   >            the system that information in the specified page range
   >            is no longer important. This is an efficient way of
   >            allowing malloc(3) to free pages anywhere in the address
   >            space, while keeping the address space valid. The next
   >            time that the page is referenced, the page might be
   >            demand zeroed, or might contain the data that was there
   >            before the MADV_FREE call. References made to that
   >            address space range will not make the VM system page the
   >            information back in from backing store until the page is
   >            modified again.

   I like this so much I started coding it a long time ago, as an
   mdiscard syscall. But then I got onto something else.

   The principle here is very simple: MADV_FREE marks all the pages in
   the region as "discardable", and clears the accessed and dirty bits
   of those pages.

   Later when the kernel needs to free some memory, it is permitted to
   free "discardable" pages immediately provided they are still not
   accessed or dirty. When vmscan is clearing the accessed and dirty
   bits on pages, if they were set it must clear the "discardable" bit.

   This allows malloc() and other user space allocators to free pages
   back to the system. Unlike DU's MADV_DONTNEED, or mmapping
   /dev/zero, if the system does not need the page there is no
   inefficient zero-fill. If there was, malloc() would be better off
   not bothering to return the pages.

   The FreeBSD man page seems ambiguous about the effect on a shared
   mapping: is the underlying page marked "discardable", or just the
   page table entry in this particular vm mapping?

   Also I note that the page is always zero filled if it was discarded.
   That's fine for anonymous mappings. For mapped files, is MADV_FREE
   permitted at all? If so, should discarding the page replace it with
   a zero page, or with the underlying file's page before private
   modifications?

   I propose this is a useful behaviour, and MADV_FREE is a fine name.
   Alternatively, MADV_RESTORE if the behaviour is defined in terms of
   discarding private modifications, just as if you had re-done the
   mmap() in the region. For private anonymous mappings the behaviours
   are equivalent; for file mappings, they are not.

Summary
-------

Four handy behaviours.

1. MADV_DONE
2. MADV_ZERO or read /dev/zero
3. MADV_ZERO or read /dev/zero
4. MADV_FREE and/or MADV_RESTORE

have a nice day,
-- Jamie
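The deferred-free behaviour described in point 4 can be sketched from user space as follows. This is a hedged illustration, not what the thread itself had available: it assumes a system whose headers define MADV_FREE (FreeBSD, or Linux kernels that later gained the flag) and a private anonymous mapping. The function name lazy_free_page_state() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fill one private anonymous page with `fill` (must be non-zero to be
 * informative), mark it MADV_FREE, then read it back.
 *
 * Returns  1 if every byte is either the old value or zero, which are
 *            the only two outcomes the FreeBSD text allows,
 *          0 if some other value was observed,
 *         -1 if MADV_FREE is not available or a call failed.
 */
static int lazy_free_page_state(unsigned char fill)
{
#ifdef MADV_FREE
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, fill, len);

    /* The kernel may now reclaim the page at any time, or never. */
    if (madvise(p, len, MADV_FREE) != 0) {
        munmap(p, len);
        return -1;
    }

    int ok = 1;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0 && p[i] != fill)
            ok = 0;

    munmap(p, len);
    return ok;
#else
    (void)fill;
    return -1;                          /* flag not defined here */
#endif
}
```

An allocator would call madvise(..., MADV_FREE) on whole free pages and simply reuse them later. Since the old contents may or may not still be there, it must not rely on either outcome.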
* Re: mmap/munmap semantics
2000-02-24 12:13 ` Stephen C. Tweedie
2000-02-24 12:24 ` Richard Guenther
2000-02-24 13:06 ` lars brinkhoff
@ 2000-02-24 13:41 ` Eric W. Biederman
2000-02-24 13:49 ` Stephen C. Tweedie
2 siblings, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2000-02-24 13:41 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Jamie Lokier, Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

"Stephen C. Tweedie" <sct@redhat.com> writes:

> Hi,
>
> On Thu, 24 Feb 2000 03:35:02 +0100, Jamie Lokier
> <lk@tantalophile.demon.co.uk> said:
>
> > I don't think MADV_DONTNEED actually drops privately modified data does
> > it?
>
> Yes, it does. From the DU man pages:
>
>     MADV_DONTNEED
>         Do not need these pages
>
>         The system will free any whole pages in the specified
>         region. All modifications will be lost and any swapped
>         out pages will be discarded. Subsequent access to the
>         region will result in a zero-fill-on-demand fault as
>         though it is being accessed for the first time.
>         Reserved swap space is not affected by this call.

Which is fine, but if it works this way on shared memory it is broken,
at least unless all mappings set (MADV_DONTNEED) and you can prove there
was no file-io. Otherwise you could lose legitimate file writes.

Also from an irix man page:

    MADV_DONTNEED  informs the system that the address range from addr to
                   addr + len will likely not be referenced in the near
                   future. The memory to which the indicated addresses are
                   mapped will be the first to be reclaimed when memory is
                   needed by the system.

Eric
* Re: mmap/munmap semantics
2000-02-24 13:41 ` Eric W. Biederman
@ 2000-02-24 13:49 ` Stephen C. Tweedie
0 siblings, 0 replies; 22+ messages in thread
From: Stephen C. Tweedie @ 2000-02-24 13:49 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Stephen C. Tweedie, Jamie Lokier, Richard Guenther, Linux Kernel List, glame-devel, Linux-MM

Hi,

On 24 Feb 2000 07:41:45 -0600, ebiederm+eric@ccr.net (Eric W. Biederman)
said:

>> The system will free any whole pages in the specified
>> region. All modifications will be lost and any swapped
>> out pages will be discarded. Subsequent access to the
>> region will result in a zero-fill-on-demand fault as
>> though it is being accessed for the first time.
>> Reserved swap space is not affected by this call.

> Which is fine but if it works this way on shared memory it is broken,
> at least unless all mappings set (MADV_DONTNEED) and you can prove there
> was no file-io. Otherwise you could lose legitimate file writes.

Not necessarily, if this behaviour is defined. It is no more broken
than the fact that write() can overwrite another process's data, or
truncate() can invalidate another process's mapping. This is an
explicitly destructive system call and the user must have write access
to the file.

The discarding of modifications is obviously correct if the mapping is
MAP_PRIVATE, but I'd be interested in seeing what other Unixen actually
do on MAP_SHARED maps.

Similarly, msync(MS_INVALIDATE) is expected to discard modifications by
some applications (and I've personally had requests for this
functionality from vendors whose applications use it on shared memory
segments). Its definition in DU includes:

    After a successful call to the msync() function with the flags
    parameter set to MS_INVALIDATE, all previous modifications to the
    file using the write() function are visible to the mapped region.
    Previous direct modifications to the mapped region might be lost.

Again it isn't explicit whether this applies only to MAP_PRIVATE or to
MAP_SHARED too.

--Stephen
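The MAP_PRIVATE versus MAP_SHARED question can be probed empirically on any given system. The sketch below is one way to do that, under stated assumptions: byte_after_dontneed() is a hypothetical helper, the scratch file lives in /tmp, and the results noted afterwards reflect my understanding of present-day Linux only. Other Unixen may well answer differently, which is the point of the question.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map one page of a scratch file filled with 'F', overwrite the mapping
 * with 'M', call madvise(MADV_DONTNEED), and return the first byte seen
 * afterwards. map_flags is MAP_PRIVATE or MAP_SHARED.
 * Returns -1 on any error.
 */
static int byte_after_dontneed(int map_flags)
{
    char path[] = "/tmp/dontneed-XXXXXX";   /* assumed scratch location */
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                            /* file vanishes on close */

    char *fill = malloc(len);
    if (fill == NULL) {
        close(fd);
        return -1;
    }
    memset(fill, 'F', len);
    ssize_t written = write(fd, fill, len);
    free(fill);
    if (written != (ssize_t)len) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    if (p != MAP_FAILED) {
        memset(p, 'M', len);                 /* modify through the mapping */
        if (madvise(p, len, MADV_DONTNEED) == 0)
            result = (unsigned char)p[0];    /* refault and inspect */
        munmap(p, len);
    }

    close(fd);
    return result;
}
```

As far as I know, on Linux the private case yields 'F' (the private modifications are dropped and the file's page is faulted back in), while the shared case yields 'M' (the modified page stays in the page cache, so nothing is lost). A system implementing the DU wording literally for shared maps would be expected to behave differently.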