From: Toshi Kani <toshi.kani@hp.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: hpa@zytor.com, tglx@linutronix.de, mingo@redhat.com,
arnd@arndb.de, linux-mm@kvack.org, x86@kernel.org,
linux-kernel@vger.kernel.org, Elliott@hp.com
Subject: Re: [PATCH v2 0/7] Kernel huge I/O mapping support
Date: Mon, 23 Feb 2015 16:54:40 -0700 [thread overview]
Message-ID: <1424735680.17007.94.camel@misato.fc.hp.com> (raw)
In-Reply-To: <20150223122224.c55554325cc4dadeca067234@linux-foundation.org>
On Mon, 2015-02-23 at 12:22 -0800, Andrew Morton wrote:
> On Mon, 9 Feb 2015 15:45:28 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
>
> > ioremap() and its related interfaces are used to create I/O
> > mappings to memory-mapped I/O devices. The mapping sizes of
> > the traditional I/O devices are relatively small. Non-volatile
> > memory (NVM), however, has many GB and is going to have TB soon.
> > It is not very efficient to create large I/O mappings with 4KB.
>
> The changelogging is very good - thanks for taking the time to do this.
>
> > This patchset extends the ioremap() interfaces to transparently
> > create I/O mappings with huge pages whenever possible.
>
> I'm wondering if this is prudent. Existing code which was tested with
> 4k mappings will magically start to use huge tlb mappings. I don't
> know what could go wrong, but I'd prefer not to find out! Wouldn't it
> be safer to make this an explicit opt-in?
There were related discussions on this. This v2 patchset actually has
CONFIG_HUGE_IOMAP, which allows user to select this feature. As
suggested in the thread below, I am going to remove this
CONFIG_HUGE_IOMAP, so that it will be simpler and similar to how we
create huge mappings to the kernel itself. If bugs are found, they will
be fixed.
https://lkml.org/lkml/2015/2/18/677
> What operations can presently be performed against an ioremapped area?
> Can kernel code perform change_page_attr() against individual pages?
> Can kernel code run iounmap() against just part of that region (I
> forget). There does seem to be potential for breakage if we start
> using hugetlb mappings for such things?
Yes, kernel code can use the CPA interfaces, such as set_memory_x() and
set_memory_ro() to an ioremapped area. CPA breaks a huge page to
smaller pages. I have included them into my test cases and confirmed
they work. (Note, memory type change interfaces, such as
set_memory_uc() and set_memory_wc(), are not supported to an ioremapped
area regardless of their page size.)
iounmap() only takes a single argument, virtual base addr. It looks up
the corresponding vm area object from the virt addr, and always removes
the entire mapping.
> > ioremap()
> > continues to use 4KB mappings when a huge page does not fit into
> > a requested range. There is no change necessary to the drivers
> > using ioremap(). A requested physical address must be aligned by
> > a huge page size (1GB or 2MB on x86) for using huge page mapping,
> > though. The kernel huge I/O mapping will improve performance of
> > NVM and other devices with large memory, and reduce the time to
> > create their mappings as well.
> >
> > On x86, the huge I/O mapping may not be used when a target range is
> > covered by multiple MTRRs with different memory types. The caller
> > must make a separate request for each MTRR range, or the huge I/O
> > mapping can be disabled with the kernel boot option "nohugeiomap".
> > The detail of this issue is described in the email below, and this
> > patch takes option C) in favor of simplicity since MTRRs are legacy
> > feature.
> > https://lkml.org/lkml/2015/2/5/638
>
> How is this mtrr clash handled?
>
> - The iomap call will fail if there are any MTRRs covering the region?
>
> - The iomap call will fail if there are more than one MTRRs covering
> the region?
>
> - If the ioremap will succeed if a single MTRR covers the region,
> must that MTRR cover the *entire* region?
>
> - What happens if userspace tried fiddling the MTRRs after the region
> has been established?
>
> <reads the code>
This issue was also discussed in the same thread:
https://lkml.org/lkml/2015/2/18/677
I am going to implement option D -- the iomap call will fail if there
are more than one MTRRs with "different types" covering the region.
> Oh. We don't do any checking at all. We're just telling userspace
> programmers "don't do that". hrm. What are your thoughts on adding
> the overlap checks to the kernel?
>
> This adds more potential for breaking existing code, doesn't it? If
> there's code which is using 4k ioremap on regions which are covered by
> mtrrs, the transparent switch to hugeptes will cause that code to enter
> the "undefined behaviour" space?
Yes, I agree with your concern, and I am going to add the check. I do
not think we have such platform today, and will be affected by this
change, though.
> > The patchset introduces the following configs:
> > HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
> > Require HAVE_ARCH_HUGE_VMAP set.
> > HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
> > Require X86_PAE set on X86_32.
> >
> > Patch 1-4 changes common files to support huge I/O mappings. There
> > is no change in the functinalities until HUGE_IOMAP is set in patch 7.
> >
> > Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> > and set HAVE_ARCH_HUGE_VMAP on x86.
> >
> > Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> > x86.
>
> What do other architectures need to do to utilize this?
Other architectures can implement their version of patch 5/7 and 6/7 to
utilize this feature.
Thanks,
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-02-23 23:55 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-09 22:45 Toshi Kani
2015-02-09 22:45 ` [PATCH v2 1/7] mm: Change __get_vm_area_node() to use fls_long() Toshi Kani
2015-02-09 22:45 ` [PATCH v2 2/7] lib: Add huge I/O map capability interfaces Toshi Kani
2015-02-09 22:45 ` [PATCH v2 3/7] mm: Change ioremap to set up huge I/O mappings Toshi Kani
2015-02-09 22:45 ` [PATCH v2 4/7] mm: Change vunmap to tear down huge KVA mappings Toshi Kani
2015-02-09 22:45 ` [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86 Toshi Kani
2015-02-10 18:59 ` Dave Hansen
2015-02-10 20:42 ` Toshi Kani
2015-02-10 20:51 ` Dave Hansen
2015-02-10 22:13 ` Toshi Kani
2015-02-10 22:20 ` Toshi Kani
2015-02-10 23:10 ` Toshi Kani
2015-03-03 0:37 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 6/7] x86, mm: Support huge I/O " Toshi Kani
2015-02-18 20:44 ` Ingo Molnar
2015-02-18 21:13 ` Toshi Kani
2015-02-18 21:15 ` Ingo Molnar
2015-02-18 21:33 ` Toshi Kani
2015-02-18 21:57 ` Ingo Molnar
2015-02-18 22:14 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings Toshi Kani
2015-02-23 20:22 ` [PATCH v2 0/7] Kernel huge I/O mapping support Andrew Morton
2015-02-23 23:54 ` Toshi Kani [this message]
2015-02-24 8:09 ` Ingo Molnar
2015-03-02 15:51 ` Toshi Kani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1424735680.17007.94.camel@misato.fc.hp.com \
--to=toshi.kani@hp.com \
--cc=Elliott@hp.com \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox