Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Yu Zhao <yuzhao@google.com>,
	Mauricio Faria de Oliveira <mfo@canonical.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>,
	Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>
Subject: Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Date: Tue, 11 Jan 2022 12:20:13 -0800	[thread overview]
Message-ID: <Yd3mfROPwP72QPt3@google.com> (raw)
In-Reply-To: <e75c8f37-782f-f4d4-b197-8fda18090b42@nvidia.com>

On Tue, Jan 11, 2022 at 11:29:36AM -0800, John Hubbard wrote:
> On 1/11/22 10:54, Minchan Kim wrote:
> ...
> > Hi Yu,
> > 
> > I think you're correct. I think we don't like memory barrier
> > there in page_dup_rmap. Then, how about make gup_fast is aware
> > of FOLL_TOUCH?
> > 
> > FOLL_TOUCH means it's going to write something so the page
> 
> Actually, my understanding of FOLL_TOUCH is that it does *not* mean that
> data will be written to the page. That is what FOLL_WRITE is for.
> FOLL_TOUCH means: update the "accessed" metadata, without actually
> writing to the memory that the page represents.

Exactly. I should have mentioned the FOLL_TOUCH with FOLL_WRITE.
What I wanted to hit with FOLL_TOUCH was 

follow_page_pte:

        if (flags & FOLL_TOUCH) {
                if ((flags & FOLL_WRITE) &&
                    !pte_dirty(pte) && !PageDirty(page))
                        set_page_dirty(page);
                mark_page_accessed(page);
        }

> 
> 
> > should be dirty. Currently, get_user_pages works like that.
> > Howver, problem is get_user_pages_fast since it looks like
> > that lockless_pages_from_mm doesn't support FOLL_TOUCH.
> > 
> > So the idea is if param in internal_get_user_pages_fast
> > includes FOLL_TOUCH, gup_{pmd,pte}_range try to make the
> > page dirty under trylock_page(If the lock fails, it goes
> 
> Marking a page dirty solely because FOLL_TOUCH is specified would
> be an API-level mistake. That's why it isn't "supported". Or at least,
> that's how I'm reading things.
> 
> Hope that helps!
> 
> > slow path with __gup_longterm_unlocked and set_dirty_pages
> > for them).
> > 
> > This approach would solve other cases where map userspace
> > pages into kernel space and then write. Since the write
> > didn't go through with the process's page table, we will
> > lose the dirty bit in the page table of the process and
> > it turns out same problem. That's why I'd like to approach
> > this.
> > 
> > If it doesn't work, the other option to fix this specific
> > case is can't we make pages dirty in advance in DIO read-case?
> > 
> > When I look at DIO code, it's already doing in async case.
> > Could't we do the same thing for the other cases?
> > I guess the worst case we will see would be more page
> > writeback since the page becomes dirty unnecessary.
> 
> Marking pages dirty after pinning them is a pre-existing area of
> problems. See the long-running LWN articles about get_user_pages() [1].

Oh, Do you mean marking page dirty in DIO path is already problems?
Let me read the pages in the link.
Thanks!

> 
> 
> [1] https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
>

next prev parent reply	other threads:[~2022-01-11 20:20 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-05 23:34 Mauricio Faria de Oliveira
2022-01-06 23:15 ` Minchan Kim
2022-01-07  0:11   ` Yang Shi
2022-01-07  1:08     ` Yang Shi
2022-01-11  1:34   ` Huang, Ying
2022-01-11  6:48 ` Yu Zhao
2022-01-11 18:54   ` Minchan Kim
2022-01-11 19:29     ` John Hubbard
2022-01-11 20:20       ` Minchan Kim [this message]
2022-01-11 20:21         ` Minchan Kim
2022-01-11 21:59           ` Minchan Kim
2022-01-11 23:38             ` John Hubbard
2022-01-12  0:01               ` Minchan Kim
2022-01-12  1:46   ` Huang, Ying
2022-01-12 17:33     ` Minchan Kim
2022-01-12 21:53       ` Mauricio Faria de Oliveira
2022-01-12 22:37         ` Minchan Kim
2022-01-13  8:54           ` Huang, Ying
2022-01-13 12:30             ` Huang, Ying
2022-01-13 14:54               ` Mauricio Faria de Oliveira
2022-01-13 14:30           ` Mauricio Faria de Oliveira
2022-01-13  7:29         ` Yu Zhao
2022-01-14  0:35           ` Minchan Kim
2022-01-31 23:10             ` Mauricio Faria de Oliveira
2022-01-13  5:47       ` Huang, Ying
2022-01-13  6:37         ` Miaohe Lin
2022-01-13  8:04           ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yd3mfROPwP72QPt3@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=jhubbard@nvidia.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mfo@canonical.com \
    --cc=shy828301@gmail.com \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox