linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Roman Zippel <zippel@fh-brandenburg.de>
Cc: linux-mm@kvack.org
Subject: Re: page locking and error handling
Date: 15 Feb 2001 08:33:07 -0700	[thread overview]
Message-ID: <m1snlg6jws.fsf@frodo.biederman.org> (raw)
In-Reply-To: Roman Zippel's message of "Thu, 15 Feb 2001 16:02:28 +0100 (MET)"

Roman Zippel <zippel@fh-brandenburg.de> writes:

> Hi,
> 
> I'm currently trying to exactly understand the current page state handling
> and I have a few problems around the generic_file_write function. The
> problems I see are:
> - the mentioned deadlock in generic_file_write is not really fixed, it's
>   just a bit harder to exploit?

I don't know I haven't traced that path.

> - if copy_from_user() fails the page is set as not uptodate. AFAIK this
>   assumes that the page->buffers are still uptodate, so previous writes
>   are not lost.
If copy_from_user fails that invokes undefined behavior, and you just lost
your previous writes because you ``overwrote'' them.

> - a concurrent read might outrun a write and so possibly get some new data
>   of the write and some old data.

Read/write are not atomic so no problem.
> 

> Please correct me, if I'm wrong. Anyway, here are some ideas to address
> this.
> 1. We can add a nonblocking version from copy_(from|to)_user, which
>    returns EAGAIN if it finds a locked page. Most of the needed code is in
>    the slow path, so it doesn't affect performance. Also see
>    arch/m68k/mm/fault.c how to pass additional info from the fault
>    handler.
> 2. We should make the state of a page completely independent of the
>    underlying mapping mechanism.
>    - A page shouldn't get suddenly not uptodate because a read from user
>      space fails, so we need to _clearly_ define who sets/clears which
>      bit. Especially in error situations ClearPageUptodate() is called,
>      but gets the page data really not uptodate?

Some kind of ``correct'' handling for this should happen so that if
the page is mapped we don't break invariants, (like a mapped page
should always be uptodate.  But other than that we are fine.

>    - This also includes that the buffer pointer becomes private to the
>      mapping mechanism, so it can be used for other caching mechanism
>      (e.g. nfs doesn't have to store it separately).

Possibly.  This is a bit of a corner case.  It looks good on paper
certainly.

> 3. During a write we always lock at least one page and we don't release
>    the previous page until we got the next. This means:
>    - the i_sem is not needed anymore, so multiple writes can access the
>      file at the same time.

i_sem I believe is to protect the file length, and avoid weird
truncation races.  As well allowing things like O_APPEND work.
I don't see how page level locking helps with file size changes.

>    - a read can't outrun a write anymore.

That isn't a problem.  You have to do locking in user space to avoid
that if you want it.

>    - page locking has to happen completely at the higher layer and keeping
>      multiple pages locked would require something like 1).
?
>    - this would allow to pass multiple pages at once to the mapping
>      mechanism, as we can easily link several pages together. This
>      actually is all what is needed/wanted for streaming and no need for a
>      heavyweight kiobuf.

Hmm.  For what you are suggesting kiobufs aren't that bad.  Not that
I'm supporting them, but since you are aiming at the cases they handle
just fine I won't criticize them either.

> 
> This is probably is a bit sketchy, but the main idea is to further improve
> the page state handling and remove dependencies/assumptions to the buffer
> handling. This would also allow better error handling, e.g. data for a
> removed media could be saved in a temporary file instead of throwing away
> the data or one could even keep two medias mounted in the same
> drive.

Create a pseudo block device if you want these kinds of semantics they
should not be handled directly by the filesystem layer.  At least not
unless so one comes up with a design where it just happens to fall out
naturally. (Unlikely).

> Another possibility is to use/test several i/o mechanism at the same time,
> or even to make them modular.
> Anyway, just some wild ideas :). I'm interested in any comments, whether
> above is needed/desired. As it would mean some quite heavy changes, I'd
> like to make sure I'm not missing anything before starting hacking on it
> (what won't be to soon, as more important things are pending...).

Have fun reading the code and kibitzing.

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

  reply	other threads:[~2001-02-15 15:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-02-15 15:02 Roman Zippel
2001-02-15 15:33 ` Eric W. Biederman [this message]
2001-02-15 18:50   ` Roman Zippel
2001-02-15 23:37     ` Eric W. Biederman
2001-02-16 18:20 ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1snlg6jws.fsf@frodo.biederman.org \
    --to=ebiederm@xmission.com \
    --cc=linux-mm@kvack.org \
    --cc=zippel@fh-brandenburg.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox