From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Hugh Dickins <hughd@google.com>, "H. Peter Anvin" <hpa@zytor.com>,
Jan Kara <jack@suse.cz>, Dave Hansen <dave.hansen@intel.com>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Tony Luck <tony.luck@intel.com>
Subject: Re: Dirty/Access bits vs. page content
Date: Mon, 28 Apr 2014 09:13:01 +1000
Message-ID: <1398640381.8437.82.camel@pasglop>
In-Reply-To: <CA+55aFwLumAqA6mYyPKRZYOCr2TRPxUVdCKhHMg0nYN_KbBDbQ@mail.gmail.com>
On Sun, 2014-04-27 at 09:21 -0700, Linus Torvalds wrote:
> So in theory a CPU could just remember what address it loaded the TLB
> entry from, and do a blind "set the dirty bit" with just an atomic
> "or" operation. In fact, for a while I thought that CPU's could do
> that, and the TLB flushing sequence would be:
>
> entry = atomic_xchg(pte, 0);
> flush_tlb();
> entry |= *pte;
>
> so that we'd catch any races with the A/D bit getting set.
>
> It turns out no CPU actually does that, and I'm not sure we ever had
> that code sequence in the kernel (but some code archaeologist might go
> look).
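
The quoted clear/flush/merge sequence can be sketched in userspace C11 as
below. This is only an illustration of the idea, not kernel code: the bit
layout (PTE_PRESENT, PTE_DIRTY), the helper names, and the way the "flush"
stands in for a late hardware write are all assumptions made up for the
example, and as noted above no real CPU is known to behave this way.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define PTE_PRESENT (1ull << 0)
#define PTE_DIRTY   (1ull << 6)

/* One simulated page table entry. */
static _Atomic uint64_t pte;

/*
 * Stand-in for the imagined hardware: it remembers where the TLB entry
 * was loaded from and blindly ORs the dirty bit into that location.
 */
static void hw_blind_set_dirty(void)
{
    atomic_fetch_or(&pte, PTE_DIRTY);
}

/*
 * Stand-in for the TLB flush. In this model the stale TLB entry gets
 * one last chance to write its dirty bit before it is dropped.
 */
static void flush_tlb_sim(void)
{
    hw_blind_set_dirty();
}

/* The sequence from the quoted text: xchg to zero, flush, merge. */
static uint64_t clear_flush_collect(void)
{
    uint64_t entry = atomic_exchange(&pte, 0); /* entry = atomic_xchg(pte, 0) */
    flush_tlb_sim();                           /* flush_tlb()                 */
    entry |= atomic_load(&pte);                /* entry |= *pte               */
    return entry;
}
```

If the entry starts out present and clean, the value returned by
clear_flush_collect() still carries PTE_DIRTY, because the bit that landed
in the zeroed slot during the flush is merged back in by the final OR.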
Today, hash-based PowerPCs do the update in the hash table using a byte
store, not an atomic compare. That's one of the reasons we don't
currently exploit the HW facility for dirty/accessed. (There are others:
pages can be evicted from the hash, we would need a path to transfer
dirty back to the struct page, etc...)
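
One way to picture the difference between a byte store and an atomic
compare is the single-threaded C11 model below. It is a sketch under
stated assumptions, not a description of the real hash PTE format: the
bit positions (HPTE_VALID, HPTE_R, HPTE_C), the choice of the low byte as
the "flag byte", and both helper names are invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, chosen only for this sketch. */
#define HPTE_VALID     (1ull << 63)
#define HPTE_R         (1ull << 1)  /* referenced */
#define HPTE_C         (1ull << 0)  /* changed */
#define FLAG_BYTE_MASK 0xffull      /* the byte that holds R and C */

/*
 * Model of a blind byte store: only the flag byte is rewritten, from a
 * stale copy of the entry, without checking the other bytes.
 */
static void hw_byte_store_changed(_Atomic uint64_t *hpte, uint64_t stale)
{
    uint64_t cur = atomic_load(hpte);
    uint64_t low = (stale & FLAG_BYTE_MASK) | HPTE_C;
    atomic_store(hpte, (cur & ~FLAG_BYTE_MASK) | low);
}

/*
 * Model of an atomic compare: the update only happens if the entry
 * still equals the copy the hardware translated with.
 */
static bool hw_cas_changed(_Atomic uint64_t *hpte, uint64_t stale)
{
    uint64_t expected = stale;
    return atomic_compare_exchange_strong(hpte, &expected, stale | HPTE_C);
}
```

In this model, if software zeroes the entry after the hardware took its
stale copy, hw_byte_store_changed() still writes R and C into the now
invalid slot, whereas hw_cas_changed() returns false and leaves the slot
untouched.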
.../...
> Of course, *If* a CPU were to remember the address it loaded the TLB
> entry from, then such a CPU might as well make the TLB be part of the
> cache-coherency domain, and then we wouldn't need to do any TLB
> flushing at all. I wish.
Hrm... Remembering the address as part of the data is one thing, having
it in the tag for snoops is another :) I can see CPU designers wanting
to do the first and not the second.... Though most CPUs I've seen are 4
or 8 ways set-associative so it's not as bad as adding a big CAM
thankfully.
> > Will the hardware fault when it does a translation and needs to update
> > the dirty/access bits while the PTE entry is !present?
>
> Yes indeed, see above (but see how broken hardware _could_ work, which
> would be really painful for us).
>
> What we are fighting is race #3: the TLB happily exists on this or
> other CPU's, and is _not_ getting updated (so no re-walk), but _is_
> getting used.
Right, and its little brother which is that the update and the access
that caused it aren't atomic with each other, thus the access can be
seen some time after the R/C update. (This was my original concern until
I realized that it was in fact the same race as the dirty TLB entry
still in the other CPUs).
Cheers,
Ben.