* [RFC] sys_punchhole()
From: Badari Pulavarty @ 2005-11-10 23:23 UTC
To: akpm, andrea, hugh; +Cc: lkml, linux-mm

Hi Andrew,

We discussed this in madvise(REMOVE) thread - to add support
for sys_punchhole(fd, offset, len) to complete the functionality
(in the future).

http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2

What I am wondering is, should I invest time now to do it ?
Or wait till need arises ?

My thought line is, I would add a generic_zeroblocks_range() function
which would zero out the given range of pages and flush to disk. Use
this as a default operation, if the filesystem doesn't provide a
specific function to free up the blocks.

Would this work ? Suggestions ?

Thanks,
Badari

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

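The fallback proposed above (free the blocks if the filesystem can, otherwise zero the range and flush it) can be sketched in userspace. This is only an illustration under stated assumptions: punch_hole() and zero_range_fallback() are hypothetical names, not the proposed kernel functions, and the sketch relies on fallocate(2) with FALLOC_FL_PUNCH_HOLE, an interface that later Linux kernels provide and that is not part of this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: overwrite [offset, offset + len) with zeroes
 * and flush to disk. Unlike a real hole punch, this keeps the blocks
 * allocated and extends the file if the range reaches past EOF. */
static int zero_range_fallback(int fd, off_t offset, off_t len)
{
    char zeros[4096];

    assert(offset >= 0 && len >= 0);
    memset(zeros, 0, sizeof zeros);
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        if (n == 0) {           /* no progress, avoid looping forever */
            errno = EIO;
            return -1;
        }
        offset += n;
        len -= n;
    }
    return fsync(fd);
}

/* Hypothetical punch_hole(): ask the filesystem to free the blocks and
 * fall back to zeroing when it cannot. Returns 0 on success, -1 on error. */
int punch_hole(int fd, off_t offset, off_t len)
{
#ifdef FALLOC_FL_PUNCH_HOLE
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;              /* a real error, not "unsupported" */
#endif
    return zero_range_fallback(fd, offset, len);
}
```

Either path leaves the range reading back as zeroes; only the first one actually returns the blocks to the filesystem.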
* Re: [RFC] sys_punchhole()
From: Andrew Morton @ 2005-11-10 23:32 UTC
To: Badari Pulavarty; +Cc: andrea, hugh, linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> We discussed this in madvise(REMOVE) thread - to add support
> for sys_punchhole(fd, offset, len) to complete the functionality
> (in the future).
>
> http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
>
> What I am wondering is, should I invest time now to do it ?

I haven't even heard anyone mention a need for this in the past 1-2 years.

> Or wait till need arises ?

A long wait, I suspect..

* Re: [RFC] sys_punchhole()
From: Badari Pulavarty @ 2005-11-10 23:41 UTC
To: Andrew Morton; +Cc: andrea, hugh, lkml, linux-mm

On Thu, 2005-11-10 at 15:32 -0800, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > We discussed this in madvise(REMOVE) thread - to add support
> > for sys_punchhole(fd, offset, len) to complete the functionality
> > (in the future).
> >
> > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> >
> > What I am wondering is, should I invest time now to do it ?
>
> I haven't even heard anyone mention a need for this in the past 1-2 years.
>
> > Or wait till need arises ?
>
> A long wait, I suspect..
>

Okay. I guess, I will wait till someone needs it.

I am just trying to increase my chances of "getting my madvise(REMOVE)
patch into mainline" :)

Thanks,
Badari

* Re: [RFC] sys_punchhole()
From: Anton Altaparmakov @ 2005-11-10 23:55 UTC
To: Badari Pulavarty; +Cc: Andrew Morton, andrea, hugh, lkml, linux-mm

On Thu, 10 Nov 2005, Badari Pulavarty wrote:
> On Thu, 2005-11-10 at 15:32 -0800, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > We discussed this in madvise(REMOVE) thread - to add support
> > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > (in the future).
> > >
> > > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> > >
> > > What I am wondering is, should I invest time now to do it ?
> >
> > I haven't even heard anyone mention a need for this in the past 1-2 years.
> >
> > > Or wait till need arises ?
> >
> > A long wait, I suspect..
> >
>
> Okay. I guess, I will wait till someone needs it.
>
> I am just trying to increase my chances of "getting my madvise(REMOVE)
> patch into mainline" :)
>

It may be worth asking the Samba people if they want it given that Windows
has such a function (but it is not a syscall, it is a fsctl -
FSCTL_SET_ZERO_DATA), so Samba may want to have it, too...

And in case you care, NTFS already has such functionality (currently only
used in error handling) and implementing the sys_punchhole() fs-specific
function for ntfs will therefore be trivial...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

* Re: [RFC] sys_punchhole()
From: Ingo Oeser @ 2005-11-11 8:25 UTC
To: linux-kernel; +Cc: Andrew Morton, Badari Pulavarty, andrea, hugh, linux-mm

Hi,

On Friday 11 November 2005 00:32, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > We discussed this in madvise(REMOVE) thread - to add support
> > for sys_punchhole(fd, offset, len) to complete the functionality
> > (in the future).
> >
> > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> >
> > What I am wondering is, should I invest time now to do it ?
>
> I haven't even heard anyone mention a need for this in the past 1-2 years.

Because the people who need it are usually at the application level.
It would be useful with hard disk editing.

But this would need a move_blocks within the filesystem, which
could attach a given list of blocks to another file.

E.g. mremap() for files :-)

Both together would make hard disk video editing with Linux
quite performant and less error prone.

Regards

Ingo Oeser

* Re: [RFC] sys_punchhole()
From: Christoph Lameter @ 2005-11-11 19:07 UTC
To: Ingo Oeser; +Cc: linux-kernel, Andrew Morton, Badari Pulavarty, andrea, hugh, linux-mm

On Fri, 11 Nov 2005, Ingo Oeser wrote:
> > I haven't even heard anyone mention a need for this in the past 1-2 years.
>
> Because the people who need it are usually at the application level.
> It would be useful with hard disk editing.
>
> But this would need a move_blocks within the filesystem, which
> could attach a given list of blocks to another file.
>
> E.g. mremap() for files :-)

Something similar to that is included in my page migration patchsets.
It will also allow you to selectively push pages in a range out. So
it does something similar to hole punching.

For that you would scan over the range to be cleared and put the pages
on a list using isolate_lru_page(). Then do whatever you need to with the
pages. Push them out with migrate_pages(list, NULL) etc.

* Re: [RFC] sys_punchhole()
From: Rob Landley @ 2005-11-16 12:08 UTC
To: Ingo Oeser; +Cc: linux-kernel, Andrew Morton, Badari Pulavarty, andrea, hugh, linux-mm

On Friday 11 November 2005 02:25, Ingo Oeser wrote:
> Hi,
>
> On Friday 11 November 2005 00:32, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > We discussed this in madvise(REMOVE) thread - to add support
> > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > (in the future).

You know, if you wanted to get really really gross and disgusting about this,
you could always have write(fd, NULL, count) punch a hole in the file. (Then
have libc's write() check for NULL and error out, and have a separate punch()
call that does the write with the null...)

Just one way to avoid introducing a new syscall...

Rob

* Re: [RFC] sys_punchhole()
From: Andrea Arcangeli @ 2005-11-16 12:20 UTC
To: Rob Landley; +Cc: Ingo Oeser, linux-kernel, Andrew Morton, Badari Pulavarty, hugh, linux-mm

On Wed, Nov 16, 2005 at 06:08:18AM -0600, Rob Landley wrote:
> You know, if you wanted to get really really gross and disgusting about this,
> you could always have write(fd, NULL, count) punch a hole in the file. (Then
> have libc's write() check for NULL and error out, and have a separate punch()
> call that does the write with the null...)
>
> Just one way to avoid introducing a new syscall...

That would add an unnecessary branch in write(3). I don't think it's worth
it, we'd rather go full speed and use the syscall table for it.

Plus it sounds safer in general to keep it separate (just in case
someone isn't using glibc but some other dietlibc or similar ;)

* Re: [RFC] sys_punchhole()
From: Pavel Machek @ 2005-11-13 15:09 UTC
To: Andrew Morton; +Cc: Badari Pulavarty, andrea, hugh, linux-kernel, linux-mm

Hi!

> > We discussed this in madvise(REMOVE) thread - to add support
> > for sys_punchhole(fd, offset, len) to complete the functionality
> > (in the future).
> >
> > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> >
> > What I am wondering is, should I invest time now to do it ?
>
> I haven't even heard anyone mention a need for this in the past 1-2 years.

Some database people wanted it maybe a month ago. It was replaced by some
madvise hack...

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

* Re: [RFC] sys_punchhole()
From: Badari Pulavarty @ 2005-11-16 22:01 UTC
To: Pavel Machek; +Cc: Andrew Morton, andrea, hugh, lkml, linux-mm

On Sun, 2005-11-13 at 15:09 +0000, Pavel Machek wrote:
> Hi!
>
> > > We discussed this in madvise(REMOVE) thread - to add support
> > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > (in the future).
> > >
> > > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> > >
> > > What I am wondering is, should I invest time now to do it ?
> >
> > I haven't even heard anyone mention a need for this in the past 1-2 years.
>
> Some database people wanted it maybe a month ago. It was replaced by some
> madvise hack...

Hmm. Someone other than me asking for it ?

I did the madvise() hack and am asking to see if anyone really needs
sys_punchhole().

Thanks,
Badari

* Re: [RFC] sys_punchhole()
From: Ric Wheeler @ 2005-11-16 23:37 UTC
To: Badari Pulavarty; +Cc: Pavel Machek, Andrew Morton, andrea, hugh, lkml, linux-mm

Badari Pulavarty wrote:
> On Sun, 2005-11-13 at 15:09 +0000, Pavel Machek wrote:
> > Hi!
> >
> > > > We discussed this in madvise(REMOVE) thread - to add support
> > > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > > (in the future).
> > > >
> > > > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> > > >
> > > > What I am wondering is, should I invest time now to do it ?
> > >
> > > I haven't even heard anyone mention a need for this in the past 1-2 years.
> >
> > Some database people wanted it maybe a month ago. It was replaced by some
> > madvise hack...
>
> Hmm. Someone other than me asking for it ?
>
> I did the madvise() hack and am asking to see if anyone really needs
> sys_punchhole().
>
> Thanks,
> Badari

I think that sys_punchhole() would be useful for some object based storage
systems.

Specifically, when you have a box that is trying to store potentially a
billion objects on one file system, pushing several objects into a file
("container") can be useful to keep the object count down. The punch hole
would be useful in reclaiming space in this type of scheme.

On the other side of the argument, you can argue that file systems that
support large file counts and really big directories should perform well
enough to make this use case less important.

ric

* Re: [RFC] sys_punchhole()
From: Rob Landley @ 2005-11-21 6:46 UTC
To: Badari Pulavarty; +Cc: Pavel Machek, Andrew Morton, andrea, hugh, lkml, linux-mm

On Wednesday 16 November 2005 16:01, Badari Pulavarty wrote:
> Hmm. Someone other than me asking for it ?
>
> I did the madvise() hack and am asking to see if anyone really needs
> sys_punchhole().

I run into a potential use case for it every once in a while. For example,
there was recent discussion on the User Mode Linux list about this, since the
"physical memory" that it uses is an mmapped file so the logical way to give
unused memory back to the host OS (initially via a hotplug memory interface
driven by some kind of daemon, since the pagecache expands to fill all
available space even when the data is also redundantly cached by the host OS)
would be via sys_punchhole().

Of course UML's physmem file is normally on a tmpfs mount, where
madvise(DONTNEED) has special behavior to work like punch anyway. So it looks
like special cases to work around this lack can be added ad infinitum so
there's never any immediate need for the actual generic functionality.

On the other hand, if you're going to support holes at all, having to recreate
the file to get your hole back is kind of silly. I personally think the
ability to create holes in a new file but not create holes in an existing file
is every bit as strange as being able to extend a file but not truncate it.
(See the Java 1.1 API for an example of _that_ particular thinko...)

Rob

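The asymmetry described above is easy to see from userspace: a hole can be created in a new file simply by seeking past the end before writing, while nothing comparable exists for a range that already holds data. A minimal sketch, assuming a POSIX system; make_sparse_file() is a hypothetical helper name, and whether the skipped range is really left unallocated depends on the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` containing a hole of `hole_len`
 * bytes followed by a single data byte. Returns an open file
 * descriptor on success, -1 on error. */
int make_sparse_file(const char *path, off_t hole_len)
{
    int fd;

    assert(path != NULL && hole_len >= 0);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Seeking past EOF and then writing leaves the skipped range
     * unwritten; it reads back as zeroes. */
    if (lseek(fd, hole_len, SEEK_SET) == (off_t)-1 ||
        write(fd, "x", 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}
```

On a filesystem that supports sparse files, st_blocks from fstat() will typically be far smaller than st_size suggests for such a file.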
* Re: [RFC] sys_punchhole()
From: Ragnar Kjørstad @ 2005-11-18 16:42 UTC
To: Pavel Machek; +Cc: Andrew Morton, Badari Pulavarty, andrea, hugh, linux-kernel, linux-mm

On Sun, Nov 13, 2005 at 03:09:06PM +0000, Pavel Machek wrote:
> > > We discussed this in madvise(REMOVE) thread - to add support
> > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > (in the future).
> > >
> > > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> > >
> > > What I am wondering is, should I invest time now to do it ?
> >
> > I haven't even heard anyone mention a need for this in the past 1-2 years.
>
> Some database people wanted it maybe a month ago. It was replaced by some
> madvise hack...

sys_punchhole is also potentially very useful for Hierarchical Storage
Management. (Holes are typically used for data that have been migrated
to tape).

--
Ragnar Kjorstad

* Re: [RFC] sys_punchhole()
From: Badari Pulavarty @ 2005-11-18 16:54 UTC
To: Ragnar Kjørstad; +Cc: Pavel Machek, Andrew Morton, andrea, hugh, lkml, linux-mm

On Fri, 2005-11-18 at 17:42 +0100, Ragnar Kjørstad wrote:
> On Sun, Nov 13, 2005 at 03:09:06PM +0000, Pavel Machek wrote:
> > > > We discussed this in madvise(REMOVE) thread - to add support
> > > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > > (in the future).
> > > >
> > > > http://marc.theaimsgroup.com/?l=linux-mm&m=113036713810002&w=2
> > > >
> > > > What I am wondering is, should I invest time now to do it ?
> > >
> > > I haven't even heard anyone mention a need for this in the past 1-2 years.
> >
> > Some database people wanted it maybe a month ago. It was replaced by some
> > madvise hack...
>
> sys_punchhole is also potentially very useful for Hierarchical Storage
> Management. (Holes are typically used for data that have been migrated
> to tape).

I agree. But I am not interested in adding a whole lot of complexity
to the kernel just because of some "potential" use for this.

I want to know if there are people/products which really, really need
this feature, and what their requirements are, before I go down that path.

For that matter, HSM folks really care about DMAPI. But I never got them to
explicitly tell me what the minimal subset of interfaces is that they
*absolutely* need (and why) in the whole DMAPI spec :(
I always hear complaints about not having DMAPI.

Thanks,
Badari

* Re: [RFC] sys_punchhole()
From: Arjan van de Ven @ 2005-11-11 5:18 UTC
To: Badari Pulavarty; +Cc: akpm, andrea, hugh, lkml, linux-mm

On Thu, 2005-11-10 at 15:23 -0800, Badari Pulavarty wrote:
>
> We discussed this in madvise(REMOVE) thread - to add support
> for sys_punchhole(fd, offset, len) to complete the functionality
> (in the future).

in the past always this was said to be "really hard" in linux locking
wise, esp. the locking with respect to truncate...

did you find a solution to this problem ?

* Re: [RFC] sys_punchhole()
From: Badari Pulavarty @ 2005-11-16 16:05 UTC
To: Arjan van de Ven; +Cc: akpm, andrea, hugh, lkml, linux-mm

On Fri, 2005-11-11 at 06:18 +0100, Arjan van de Ven wrote:
> On Thu, 2005-11-10 at 15:23 -0800, Badari Pulavarty wrote:
> >
> > We discussed this in madvise(REMOVE) thread - to add support
> > for sys_punchhole(fd, offset, len) to complete the functionality
> > (in the future).
>
> in the past always this was said to be "really hard" in linux locking
> wise, esp. the locking with respect to truncate...
>
> did you find a solution to this problem ?

I have been thinking about some of the race conditions we might run into.
It's hard to think of all of them, when I really don't have any code to play
with :(

Anyway, I think race against truncate is fine. We hold i_alloc_sem -
which should serialize against truncates. This should also serialize
against DIO. Holding i_sem should take care of writers.

One concern I can think of is, racing with read(2). While we are
thrashing pagecache and calling filesystem to free up the blocks -
a read(2) could read old disk block and give old data (since it won't
find it in pagecache). This could become a security hole :(

Thanks,
Badari

* Re: [RFC] sys_punchhole()
From: Anton Altaparmakov @ 2005-11-16 16:38 UTC
To: Badari Pulavarty; +Cc: Arjan van de Ven, akpm, andrea, hugh, lkml, linux-mm

On Wed, 16 Nov 2005, Badari Pulavarty wrote:
> On Fri, 2005-11-11 at 06:18 +0100, Arjan van de Ven wrote:
> > On Thu, 2005-11-10 at 15:23 -0800, Badari Pulavarty wrote:
> > >
> > > We discussed this in madvise(REMOVE) thread - to add support
> > > for sys_punchhole(fd, offset, len) to complete the functionality
> > > (in the future).
> >
> > in the past always this was said to be "really hard" in linux locking
> > wise, esp. the locking with respect to truncate...
> >
> > did you find a solution to this problem ?
>
> I have been thinking about some of the race conditions we might run into.
> It's hard to think of all of them, when I really don't have any code to play
> with :(
>
> Anyway, I think race against truncate is fine. We hold i_alloc_sem -
> which should serialize against truncates. This should also serialize
> against DIO. Holding i_sem should take care of writers.
>
> One concern I can think of is, racing with read(2). While we are
> thrashing pagecache and calling filesystem to free up the blocks -
> a read(2) could read old disk block and give old data (since it won't
> find it in pagecache). This could become a security hole :(

So why not tell the fs to perform the "punch" before dealing with the page
cache? If you do it in that order, a racing read(2) (or a racing mmapped
access for that matter) will see the hole, not the old data.

btw. I sometimes wonder whether it is correct for truncate to do the page
cache update before calling down into the fs for similar reasons but I
think that it is ok after all because truncate only ever converts between
(exists/hole -> does not exist) or (does not exist -> exists as
zeroes/hole) but it never deals with (exists A -> exists B/hole) which is
what sys_punchhole does. I just had to adapt the address space operations
readpage and writepage in ntfs to cope with a read/write request outside
the end of the file which does happen when a racing truncate has extended
the file's i_size but the fs has not done the necessary metadata updates
yet...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

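The ordering argument above can be illustrated with a toy, single-threaded model in which a read is interleaved between the two steps of a punch. Everything here is an assumption made for illustration: struct toy_file, toy_read(), fs_punch(), drop_cache() and stale_after_punch() are hypothetical names, and the model ignores real locking, block reallocation and mmap entirely. It only shows that when the page cache is dropped first, a racing read can repopulate it from the old block, so old data is still visible after the punch has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { BLOCK = 8 };

/* Toy one-block "file": an on-disk block plus a page-cache copy. */
struct toy_file {
    char disk[BLOCK];   /* on-disk contents */
    bool mapped;        /* false once the fs has punched the block */
    char cache[BLOCK];  /* page-cache copy */
    bool cached;        /* true while cache[] is valid */
};

/* A read is served from the cache, filling it from disk on a miss;
 * an unmapped block (a hole) reads as zeroes. */
static void toy_read(struct toy_file *f, char *out)
{
    if (!f->cached) {
        if (f->mapped)
            memcpy(f->cache, f->disk, BLOCK);
        else
            memset(f->cache, 0, BLOCK);
        f->cached = true;
    }
    memcpy(out, f->cache, BLOCK);
}

static void fs_punch(struct toy_file *f)   { f->mapped = false; }
static void drop_cache(struct toy_file *f) { f->cached = false; }

/* Run a punch with one read racing between its two steps. Returns
 * true if a read issued after the punch finished still sees old data. */
bool stale_after_punch(bool fs_first)
{
    struct toy_file f = { .mapped = true, .cached = true };
    char buf[BLOCK];

    memcpy(f.disk, "SECRET!", BLOCK);   /* 7 characters plus NUL */
    memcpy(f.cache, f.disk, BLOCK);

    if (fs_first) {
        fs_punch(&f);
        toy_read(&f, buf);              /* racing read */
        drop_cache(&f);
    } else {
        drop_cache(&f);
        toy_read(&f, buf);              /* racing read refills the cache */
        fs_punch(&f);
    }
    assert(!f.mapped);                  /* the punch has completed */

    toy_read(&f, buf);                  /* read after the punch */
    return buf[0] != 0;
}
```

In this model stale_after_punch(false) reports stale data while stale_after_punch(true) does not, which matches the suggestion to punch in the filesystem before touching the page cache.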
Thread overview: 17+ messages (newest: 2005-11-21 6:46 UTC)

2005-11-10 23:23 [RFC] sys_punchhole() Badari Pulavarty
2005-11-10 23:32 ` Andrew Morton
2005-11-10 23:41   ` Badari Pulavarty
2005-11-10 23:55     ` Anton Altaparmakov
2005-11-11  8:25   ` Ingo Oeser
2005-11-11 19:07     ` Christoph Lameter
2005-11-16 12:08     ` Rob Landley
2005-11-16 12:20       ` Andrea Arcangeli
2005-11-13 15:09   ` Pavel Machek
2005-11-16 22:01     ` Badari Pulavarty
2005-11-16 23:37       ` Ric Wheeler
2005-11-21  6:46       ` Rob Landley
2005-11-18 16:42     ` Ragnar Kjørstad
2005-11-18 16:54       ` Badari Pulavarty
2005-11-11  5:18 ` Arjan van de Ven
2005-11-16 16:05   ` Badari Pulavarty
2005-11-16 16:38     ` Anton Altaparmakov