From mboxrd@z Thu Jan 1 00:00:00 1970 Subject: Re: page locking and error handling References: From: ebiederm@xmission.com (Eric W. Biederman) Date: 15 Feb 2001 08:33:07 -0700 In-Reply-To: Roman Zippel's message of "Thu, 15 Feb 2001 16:02:28 +0100 (MET)" Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org Return-Path: To: Roman Zippel Cc: linux-mm@kvack.org List-ID: Roman Zippel writes: > Hi, > > I'm currently trying to exactly understand the current page state handling > and I have a few problems around the generic_file_write function. The > problems I see are: > - the mentioned deadlock in generic_file_write is not really fixed, it's > just a bit harder to exploit? I don't know I haven't traced that path. > - if copy_from_user() fails the page is set as not uptodate. AFAIK this > assumes that the page->buffers are still uptodate, so previous writes > are not lost. If copy_from_user fails that invokes undefined behavior, and you just lost your previous writes because you ``overwrote'' them. > - a concurrent read might outrun a write and so possibly get some new data > of the write and some old data. Read/write are not atomic so no problem. > > Please correct me, if I'm wrong. Anyway, here are some ideas to address > this. > 1. We can add a nonblocking version from copy_(from|to)_user, which > returns EAGAIN if it finds a locked page. Most of the needed code is in > the slow path, so it doesn't affect performance. Also see > arch/m68k/mm/fault.c how to pass additional info from the fault > handler. > 2. We should make the state of a page completely independent of the > underlying mapping mechanism. > - A page shouldn't get suddenly not uptodate because a read from user > space fails, so we need to _clearly_ define who sets/clears which > bit. Especially in error situations ClearPageUptodate() is called, > but gets the page data really not uptodate? Some kind of ``correct'' handling for this should happen so that if the page is mapped we don't break invariants, (like a mapped page should always be uptodate. But other than that we are fine. > - This also includes that the buffer pointer becomes private to the > mapping mechanism, so it can be used for other caching mechanism > (e.g. nfs doesn't have to store it separately). Possibly. This is a bit of a corner case. It looks good on paper certainly. > 3. During a write we always lock at least one page and we don't release > the previous page until we got the next. This means: > - the i_sem is not needed anymore, so multiple writes can access the > file at the same time. i_sem I believe is to protect the file length, and avoid weird truncation races. As well allowing things like O_APPEND work. I don't see how page level locking helps with file size changes. > - a read can't outrun a write anymore. That isn't a problem. You have to do locking in user space to avoid that if you want it. > - page locking has to happen completely at the higher layer and keeping > multiple pages locked would require something like 1). ? > - this would allow to pass multiple pages at once to the mapping > mechanism, as we can easily link several pages together. This > actually is all what is needed/wanted for streaming and no need for a > heavyweight kiobuf. Hmm. For what you are suggesting kiobufs aren't that bad. Not that I'm supporting them, but since you are aiming at the cases they handle just fine I won't criticize them either. > > This is probably is a bit sketchy, but the main idea is to further improve > the page state handling and remove dependencies/assumptions to the buffer > handling. This would also allow better error handling, e.g. data for a > removed media could be saved in a temporary file instead of throwing away > the data or one could even keep two medias mounted in the same > drive. Create a pseudo block device if you want these kinds of semantics they should not be handled directly by the filesystem layer. At least not unless so one comes up with a design where it just happens to fall out naturally. (Unlikely). > Another possibility is to use/test several i/o mechanism at the same time, > or even to make them modular. > Anyway, just some wild ideas :). I'm interested in any comments, whether > above is needed/desired. As it would mean some quite heavy changes, I'd > like to make sure I'm not missing anything before starting hacking on it > (what won't be to soon, as more important things are pending...). Have fun reading the code and kibitzing. Eric -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/