From: Minchan Kim <minchan@kernel.org>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Yu Zhao <yuzhao@google.com>,
Mauricio Faria de Oliveira <mfo@canonical.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-block@vger.kernel.org,
Huang Ying <ying.huang@intel.com>,
Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>
Subject: Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Date: Tue, 11 Jan 2022 12:20:13 -0800
Message-ID: <Yd3mfROPwP72QPt3@google.com>
In-Reply-To: <e75c8f37-782f-f4d4-b197-8fda18090b42@nvidia.com>

On Tue, Jan 11, 2022 at 11:29:36AM -0800, John Hubbard wrote:
> On 1/11/22 10:54, Minchan Kim wrote:
> ...
> > Hi Yu,
> >
> > I think you're correct. I think we don't like memory barrier
> > there in page_dup_rmap. Then, how about make gup_fast is aware
> > of FOLL_TOUCH?
> >
> > FOLL_TOUCH means it's going to write something so the page
>
> Actually, my understanding of FOLL_TOUCH is that it does *not* mean that
> data will be written to the page. That is what FOLL_WRITE is for.
> FOLL_TOUCH means: update the "accessed" metadata, without actually
> writing to the memory that the page represents.

Exactly. I should have mentioned FOLL_TOUCH together with FOLL_WRITE.
What I wanted to hit with FOLL_TOUCH was this part of follow_page_pte:

	if (flags & FOLL_TOUCH) {
		if ((flags & FOLL_WRITE) &&
		    !pte_dirty(pte) && !PageDirty(page))
			set_page_dirty(page);
		mark_page_accessed(page);
	}

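To make the flag interaction concrete, here is a minimal userspace sketch of the branch above. It is not kernel code: the `sketch_` names, the flag values, and the `struct sketch_page` fields are assumptions made up for illustration, standing in for `pte_dirty()`, `PageDirty()`, `set_page_dirty()` and `mark_page_accessed()`. It only models the point being discussed, that FOLL_TOUCH alone updates the accessed state and that the page is dirtied only when FOLL_WRITE is also set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel defines its own. */
#define SKETCH_FOLL_WRITE 0x01u
#define SKETCH_FOLL_TOUCH 0x02u

/* Hypothetical stand-in for the page/pte state this branch reads and writes. */
struct sketch_page {
	bool pte_dirty;   /* models pte_dirty(pte) */
	bool page_dirty;  /* models PageDirty(page) */
	bool accessed;    /* models the effect of mark_page_accessed(page) */
};

static void sketch_touch(struct sketch_page *page, unsigned int flags)
{
	assert(page != NULL);

	/* Without FOLL_TOUCH the branch is skipped entirely. */
	if (!(flags & SKETCH_FOLL_TOUCH))
		return;

	/* Dirty the page only for a write request that finds both bits clear. */
	if ((flags & SKETCH_FOLL_WRITE) &&
	    !page->pte_dirty && !page->page_dirty)
		page->page_dirty = true;

	/* FOLL_TOUCH always records the access. */
	page->accessed = true;
}
```

In this model, calling `sketch_touch()` with only `SKETCH_FOLL_TOUCH` leaves `page_dirty` clear, which matches John's reading that FOLL_TOUCH by itself does not imply a write.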
>
>
> > should be dirty. Currently, get_user_pages works like that.
> > However, problem is get_user_pages_fast since it looks like
> > that lockless_pages_from_mm doesn't support FOLL_TOUCH.
> >
> > So the idea is if param in internal_get_user_pages_fast
> > includes FOLL_TOUCH, gup_{pmd,pte}_range try to make the
> > page dirty under trylock_page(If the lock fails, it goes
>
> Marking a page dirty solely because FOLL_TOUCH is specified would
> be an API-level mistake. That's why it isn't "supported". Or at least,
> that's how I'm reading things.
>
> Hope that helps!
>
> > slow path with __gup_longterm_unlocked and set_dirty_pages
> > for them).
> >
> > This approach would solve other cases where map userspace
> > pages into kernel space and then write. Since the write
> > didn't go through with the process's page table, we will
> > lose the dirty bit in the page table of the process and
> > it turns out same problem. That's why I'd like to approach
> > this.
> >
> > If it doesn't work, the other option to fix this specific
> > case is can't we make pages dirty in advance in DIO read-case?
> >
> > When I look at DIO code, it's already doing in async case.
> > Couldn't we do the same thing for the other cases?
> > I guess the worst case we will see would be more page
> > writeback since the page becomes dirty unnecessary.
>
> Marking pages dirty after pinning them is a pre-existing area of
> problems. See the long-running LWN articles about get_user_pages() [1].

Oh, do you mean that marking pages dirty in the DIO path is already a
known problem? Let me read the articles at that link.

Thanks!

>
>
> [1] https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
>
> thanks,
> --
> John Hubbard
> NVIDIA
>