From: Andy Lutomirski <luto@amacapital.net>
To: Daniel Micay <danielmicay@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Linux API <linux-api@vger.kernel.org>,
Hugh Dickins <hughd@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Jason Evans <je@fb.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Shaohua Li <shli@kernel.org>, Michal Hocko <mhocko@suse.cz>,
yalin wang <yalin.wang2010@gmail.com>
Subject: Re: [PATCH v3 01/17] mm: support madvise(MADV_FREE)
Date: Fri, 13 Nov 2015 11:46:07 -0800
Message-ID: <CALCETrVx0JFchtJrrKVqEYvTwWvC+DwSLxzhD_A7EdNu2PiG7w@mail.gmail.com>
In-Reply-To: <56459B9A.7080501@gmail.com>
On Fri, Nov 13, 2015 at 12:13 AM, Daniel Micay <danielmicay@gmail.com> wrote:
> On 13/11/15 02:03 AM, Minchan Kim wrote:
>> On Fri, Nov 13, 2015 at 01:45:52AM -0500, Daniel Micay wrote:
>>>> And now I am thinking that if we use the access bit, we could implement
>>>> MADV_FREE_UNDO easily when we need it. Maybe that's what you want, right?
>>>
>>> Yes, but why the access bit instead of the dirty bit for that? It could
>>> always be made more strict (i.e. access bit) in the future, while going
>>> the other way won't be possible. So I think the dirty bit is really the
>>> more conservative choice since if it turns out to be a mistake it can be
>>> fixed without a backwards incompatible change.
>>
>> Absolutely true. That's why I have insisted on the dirty bit until now,
>> although I didn't give the reason. But I thought you wanted to switch to
>> the access bit in the future, too. It seems MADV_FREE keeps bloating
>> over and over again before we know the real problems and use cases.
>> It's almost the same situation as with volatile ranges, so I really want
>> to stop at a proper point, which I hope the maintainer will decide.
>> Without that, we will make the feature much heavier just by brainstorming,
>> causing lots of churn in MM code without real benefit.
>> It would be very painful for us.
>
> Well, I don't think you need more than a good API and an implementation
> with no known bugs, kernel security concerns or backwards compatibility
> issues. Configuration and API extensions are something for later (i.e.
> land a baseline, then submit stuff like sysctl tunables). Just my take
> on it though...
>
As long as it's anonymous MAP_PRIVATE only, the security aspects
should be okay. MADV_DONTNEED seems to work on pretty much any VMA,
and there's been a long history of interesting bugs there.
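A minimal sketch of what the anonymous-only restriction looks like from userspace, assuming a kernel that, like mainline Linux, rejects MADV_FREE with EINVAL on any VMA that is not anonymous (a kernel without MADV_FREE at all also returns EINVAL). The helper name `madv_free_errno` is hypothetical, and the fallback value 8 for `MADV_FREE` is an assumption for libc headers that predate it. Note that even a private file mapping whose page has been copied-on-write is refused, because the check is on the VMA, not on the page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* assumption: the Linux value, for older libc headers */
#endif

/* Hypothetical helper: map one page (anonymous or file-backed, always
 * MAP_PRIVATE), touch it, and try MADV_FREE on it.  Returns 0 if madvise
 * succeeded, the errno it failed with otherwise, or -1 if the mapping
 * itself could not be set up. */
static int madv_free_errno(int file_backed)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = NULL;
    int fd = -1;
    int flags = MAP_PRIVATE;

    assert(page > 0);
    if (file_backed) {
        f = tmpfile();                          /* private scratch file */
        if (f == NULL)
            return -1;
        fd = fileno(f);
        if (ftruncate(fd, (off_t)page) != 0) {  /* give it one page of data */
            fclose(f);
            return -1;
        }
    } else {
        flags |= MAP_ANONYMOUS;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) {
        if (f != NULL)
            fclose(f);
        return -1;
    }
    p[0] = 1; /* fault the page in; for the file case this makes a private COW copy */

    /* Capture errno before munmap/fclose can overwrite it. */
    int err = madvise(p, (size_t)page, MADV_FREE) == 0 ? 0 : errno;

    munmap(p, (size_t)page);
    if (f != NULL)
        fclose(f);
    return err;
}
```

On such a kernel, `madv_free_errno(1)` should give `EINVAL`, while `madv_free_errno(0)` gives 0 where MADV_FREE is supported and `EINVAL` where it is not.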
As for dirty vs accessed, an argument in favor of going straight to
accessed is that it means that users can write code like this without
worrying about whether they have a kernel that uses the dirty bit:
    x = mmap(...);
    *x = 1;  /* mark it present */

    /* I'm done with it */
    *x = 1;
    madvise(x, ..., MADV_FREE);

    /* wait a while */

    /* is it still there? */
    if (*x == 1) {
        /* use whatever was cached there */
    } else {
        /* reinitialize it */
        *x = 1;
    }
With the dirty bit, this will look like it works, but on occasion
users will lose the race: the probe of *x is only a read, so it doesn't
make the page dirty again, and the kernel can still discard the data
after the probe says it is there but before the next write comes in.
Sure, that load from *x could be changed to a read-modify-write (RMW),
or users could do a dummy write (e.g. x[1] = 1; if (*x == 1) ...), but
people might forget to do that, and the caching implications are a
little bit worse.
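The dummy-write workaround can be sketched as follows, assuming a hypothetical layout in which x[0] is a "cache valid" flag and x[1] is a scratch word on the same page. `mark_freeable` and `reuse_after_dummy_write` are illustrative names, not an existing API, and the fallback `MADV_FREE` value is an assumption for old headers. The `volatile` qualifier keeps the compiler from moving the dummy store after the probe or dropping it; beyond that, the sketch assumes, as the email does, that a write to the page before the probe is enough to stop the kernel from discarding it under dirty-bit semantics.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* assumption: the Linux value, for older libc headers */
#endif

/* Hypothetical helper: hand a region back to the kernel lazily.  Note the
 * argument order: address, length, advice.  Returns 0 on success or the
 * errno value (e.g. EINVAL on kernels without MADV_FREE). */
static int mark_freeable(void *addr, size_t len)
{
    return madvise(addr, len, MADV_FREE) == 0 ? 0 : errno;
}

/* Hypothetical layout: x[0] is a "cache valid" flag (1 = valid) and x[1]
 * is a scratch word on the same page, reserved for the dummy write.
 * Returns 1 if the cached contents can be reused, 0 if they were
 * reinitialized. */
static int reuse_after_dummy_write(volatile int *x)
{
    assert(x != NULL);
    x[1] = 1;     /* dummy write: the page is dirty again, so the lazy
                     free no longer applies to it */
    if (x[0] == 1)
        return 1; /* the page survived; its contents are still ours */
    x[0] = 1;     /* the page was discarded and came back zero-filled */
    return 0;
}
```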
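Note that switching to RMW is really, really dangerous. Doing:

    *x &= 1;
    if (*x == 1) ...;

is safe on x86 if the compiler generates:

    andl $1, (%[x]);
    cmpl $1, (%[x]);

but is unsafe if the compiler generates the following, because the page
can be discarded between the load and the store, so %eax still holds the
old value while the store lands on a freshly zeroed page:

    movl (%[x]), %eax;
    andl $1, %eax;
    movl %eax, (%[x]);
    cmpl $1, %eax;

and it's even worse if the compiler omits the write entirely because it
is "provably" unnecessary.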
OTOH, if switching to the accessed bit is too much of a mess, then
using the dirty bit at first isn't so bad.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC