From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>,
	Will Deacon <will@kernel.org>,
	 "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Nick Piggin <npiggin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Mayuresh Chitale <mchitale@ventanamicro.com>,
	Vincent Chen <vincent.chen@sifive.com>,
	 Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	 Albert Ou <aou@eecs.berkeley.edu>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	 Andrew Jones <ajones@ventanamicro.com>
Subject: Re: [PATCH v3 4/4] riscv: Improve flush_tlb_kernel_range()
Date: Fri, 8 Sep 2023 14:34:13 +0200
Message-ID: <CAHVXubia1=pSN_CJ8RaE=HXr7+2Fb2WSha+_N1GzsJGau7n9fg@mail.gmail.com>
In-Reply-To: <CA+V-a8tGGR8q1Wv=dJJKLkbAsfmH8p8Fn9Ycns7+1LCSzxvpZA@mail.gmail.com>

Hi Prabhakar,

On Thu, Sep 7, 2023 at 12:50 PM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Alexandre,
>
> On Thu, Sep 7, 2023 at 10:06 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > Hi Prabhakar,
> >
> > On Wed, Sep 6, 2023 at 3:55 PM Lad, Prabhakar
> > <prabhakar.csengg@gmail.com> wrote:
> > >
> > > Hi Alexandre,
> > >
> > > On Wed, Sep 6, 2023 at 1:43 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >
> > > > On Wed, Sep 6, 2023 at 2:24 PM Lad, Prabhakar
> > > > <prabhakar.csengg@gmail.com> wrote:
> > > > >
> > > > > Hi Alexandre,
> > > > >
> > > > > On Wed, Sep 6, 2023 at 1:18 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > >
> > > > > > On Wed, Sep 6, 2023 at 2:09 PM Lad, Prabhakar
> > > > > > <prabhakar.csengg@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Alexandre,
> > > > > > >
> > > > > > > On Wed, Sep 6, 2023 at 1:01 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > >
> > > > > > > > Hi Prabhakar,
> > > > > > > >
> > > > > > > > On Wed, Sep 6, 2023 at 1:49 PM Lad, Prabhakar
> > > > > > > > <prabhakar.csengg@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Alexandre,
> > > > > > > > >
> > > > > > > > > On Tue, Aug 1, 2023 at 9:58 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > > > >
> > > > > > > > > > This function used to simply flush the whole TLB of all harts; be more
> > > > > > > > > > subtle and try to flush only the range.
> > > > > > > > > >
> > > > > > > > > > The problem is that we can only use PAGE_SIZE as the stride since we don't
> > > > > > > > > > know the size of the underlying mapping, so this function will be improved
> > > > > > > > > > only if the size of the region to flush is < threshold * PAGE_SIZE.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > > > > > > > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> > > > > > > > > > ---
> > > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 11 +++++-----
> > > > > > > > > >  arch/riscv/mm/tlbflush.c          | 34 +++++++++++++++++++++++--------
> > > > > > > > > >  2 files changed, 31 insertions(+), 14 deletions(-)
> > > > > > > > > >
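Here is a minimal C model of the trade-off described above. The threshold of 64 pages is an assumption, and plan_kernel_range_flush() and struct flush_plan are hypothetical names, not kernel API. Ranges smaller than threshold * PAGE_SIZE are walked one stride at a time. Anything larger falls back to a single full flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold, in pages: 64 is assumed for this sketch. */
#define SKETCH_FLUSH_ALL_THRESHOLD 64ULL

struct flush_plan {
	bool flush_all;      /* true: one flush of the whole TLB */
	uint64_t nr_flushes; /* otherwise: number of per-page flushes */
};

/*
 * Plan the flush of [start, start + size) when the size of the underlying
 * mapping is unknown, so the only safe stride is one base page.
 */
static struct flush_plan plan_kernel_range_flush(uint64_t size, uint64_t stride)
{
	struct flush_plan plan = { false, 0 };

	assert(stride != 0);
	if (size >= SKETCH_FLUSH_ALL_THRESHOLD * stride) {
		/* Range too large: walking it page by page costs more than a full flush. */
		plan.flush_all = true;
	} else {
		/* DIV_ROUND_UP(size, stride) without risking overflow. */
		plan.nr_flushes = size / stride + (size % stride != 0);
	}
	return plan;
}
```

For example, plan_kernel_range_flush(16 * 4096, 4096) plans 16 per-page flushes. A 64-page range already takes the full-flush path.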
> > > > > > > > > After applying this patch, I am seeing module load issues on RZ/Five
> > > > > > > > > (complete log [0]). I am testing defconfig + [1] (rz/five related
> > > > > > > > > configs).
> > > > > > > > >
> > > > > > > > > Any pointers on what could be an issue here?
> > > > > > > >
> > > > > > > > Can you give me the exact version of the kernel you use? The trap
> > > > > > > > addresses are vmalloc addresses, and a fix for those landed very late
> > > > > > > > in the release cycle.
> > > > > > > >
> > > > > > > I am using next-20230906; I've pushed a branch [1] for you to have a look.
> > > > > > >
> > > > > > > [0] https://github.com/prabhakarlad/linux/tree/rzfive-debug
> > > > > >
> > > > > > Great, thanks, I had to rule out this possibility :)
> > > > > >
> > > > > > As it stands, I have no idea. Can you try to "bisect" the problem, i.e.
> > > > > > find which patch in the series leads to those traps?
> > > > > >
> > > > > Oops, sorry for not mentioning it earlier: this is the offending patch,
> > > > > which leads to the issues seen on RZ/Five.
> > > >
> > > > Ok, so at least I found the following problem, but I don't see how
> > > > it could fix your issue: can you give it a try anyway? I'll keep looking
> > > > into this, thanks.
> > > >
> > > > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > > > index df2a0838c3a1..b5692bc6c76a 100644
> > > > --- a/arch/riscv/mm/tlbflush.c
> > > > +++ b/arch/riscv/mm/tlbflush.c
> > > > @@ -239,7 +239,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > > >
> > > >  void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > >  {
> > > > -       __flush_tlb_range(NULL, start, end, PAGE_SIZE);
> > > > +       __flush_tlb_range(NULL, start, end - start, PAGE_SIZE);
> > > >  }
> > > >
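This sketch shows why the fix above is unlikely to change anything. It makes two assumptions. First, the third argument of __flush_tlb_range() is a size, as the "end - start" fix implies. Second, the threshold rule from the patch description applies, with an assumed threshold of 64 pages. The helper names are hypothetical. Passing end, a kernel virtual address, as a size makes the range look enormous. The old code would therefore take the full-flush path, which flushes too much rather than too little.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_FLUSH_ALL_THRESHOLD 64ULL /* assumed value, in pages */

/* Hypothetical model: true when the range is handled by one full flush. */
static bool falls_back_to_flush_all(uint64_t size, uint64_t stride)
{
	assert(stride != 0);
	return size >= SKETCH_FLUSH_ALL_THRESHOLD * stride;
}

/* Original call: passes the end address where a size is expected. */
static bool buggy_kernel_range(uint64_t start, uint64_t end)
{
	(void)start;
	return falls_back_to_flush_all(end, SKETCH_PAGE_SIZE);
}

/* Fixed call: passes the length of the range. */
static bool fixed_kernel_range(uint64_t start, uint64_t end)
{
	return falls_back_to_flush_all(end - start, SKETCH_PAGE_SIZE);
}
```

Take an 8 KiB range starting at 0xffffffff015ca000. buggy_kernel_range() reports a full flush for it, while fixed_kernel_range() walks two pages.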
> > > I am able to reproduce the issue with the above change too.
> >
> > I can't reproduce the problem on my Unmatched or on QEMU, so it is not
> > easy to debug. But I took another look at your traces and something
> > looks odd to me. In the following trace (and there is another one), the
> > trap is taken at 0xffffffff015ca034, which is the beginning of
> > rz_ssi_probe(): that's a page fault, so no translation was found (or
> > an invalid one was cached).
> >
> > [   16.586527] Unable to handle kernel paging request at virtual address ffffffff015ca034
> > [   16.594750] Oops [#3]
> > ...
> > [   16.622000] epc : rz_ssi_probe+0x0/0x52a [snd_soc_rz_ssi]
> > ...
> > [   16.708697] status: 0000000200000120 badaddr: ffffffff015ca034 cause: 000000000000000c
> > [   16.716580] [<ffffffff015ca034>] rz_ssi_probe+0x0/0x52a [snd_soc_rz_ssi]
> > ...
> >
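The cause field is what identifies the trap as a page fault. In the RISC-V privileged specification, bit 63 of scause marks an interrupt on RV64, and exception code 12 (0xc) is an instruction page fault. That matches epc and badaddr both pointing at the first instruction of rz_ssi_probe(). Below is a small decoder for the fault-related codes. The function and constant names are local to this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCAUSE_INTERRUPT (1ULL << 63) /* RV64: MSB set for interrupts */

/* Exception codes from the RISC-V privileged specification. */
enum {
	EXC_INST_ACCESS_FAULT  = 1,
	EXC_LOAD_ACCESS_FAULT  = 5,
	EXC_STORE_ACCESS_FAULT = 7,
	EXC_INST_PAGE_FAULT    = 12,
	EXC_LOAD_PAGE_FAULT    = 13,
	EXC_STORE_PAGE_FAULT   = 15,
};

static const char *scause_name(uint64_t scause)
{
	if (scause & SCAUSE_INTERRUPT)
		return "interrupt";

	switch (scause) {
	case EXC_INST_ACCESS_FAULT:  return "instruction access fault";
	case EXC_LOAD_ACCESS_FAULT:  return "load access fault";
	case EXC_STORE_ACCESS_FAULT: return "store/AMO access fault";
	case EXC_INST_PAGE_FAULT:    return "instruction page fault";
	case EXC_LOAD_PAGE_FAULT:    return "load page fault";
	case EXC_STORE_PAGE_FAULT:   return "store/AMO page fault";
	default:                     return "other exception";
	}
}

/* A page fault means no usable translation was found for the address. */
static bool scause_is_page_fault(uint64_t scause)
{
	return scause == EXC_INST_PAGE_FAULT ||
	       scause == EXC_LOAD_PAGE_FAULT ||
	       scause == EXC_STORE_PAGE_FAULT;
}
```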
> > But then here we are able to read the code at this same address:
> > [   16.821620] Code: 0109 6597 0000 8593 5f65 7097 7f34 80e7 7aa0 b7a9 (7139) f822
> >
> > So that looks like a "transient" error. Do you know if your uarch
> > caches invalid TLB entries? If you don't know, I have just written
> > a piece of code to determine whether it does; let me know.
> >
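The "transient" hypothesis can be illustrated with a deliberately simplified, hypothetical model of a single TLB entry. Suppose the uarch keeps an invalid translation cached, and a PTE then becomes valid without a following sfence.vma. Hardware accesses keep faulting. Meanwhile, code that reads the page table or memory directly (like the oops handler dumping the instruction bytes) sees the new, valid state. This does not model any particular core.

```c
#include <assert.h>
#include <stdbool.h>

/* One PTE in memory plus one cached TLB entry for it (toy model). */
struct toy_mmu {
	bool pte_valid;      /* V bit as stored in the page table */
	bool tlb_present;    /* whether the TLB holds an entry */
	bool tlb_valid;      /* cached copy of the V bit */
	bool caches_invalid; /* uarch property being probed */
};

/* Hardware access: true on success, false on page fault. */
static bool toy_access(struct toy_mmu *m)
{
	if (!m->tlb_present) {
		/* TLB miss: walk the page table. */
		if (!m->pte_valid && !m->caches_invalid)
			return false; /* fault, nothing cached */
		m->tlb_present = true;
		m->tlb_valid = m->pte_valid;
	}
	return m->tlb_valid;
}

/* sfence.vma: drop the cached translation. */
static void toy_sfence_vma(struct toy_mmu *m)
{
	m->tlb_present = false;
}
```

In this model, only toy_sfence_vma() clears the stale invalid entry. A uarch with caches_invalid set keeps faulting after the PTE is made valid, even though pte_valid already reads true.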
> No, I don't. Can you please share the details so that I can pass the
> information on to you?
>
> > Do those errors always happen?
> >
> Yes they do.
>

I still can't reproduce those errors: I built different configs,
including yours, and insmod/rmmod'ed a few modules, but still can't
reproduce them. I'm having a hard time understanding how the correct
mapping magically appears in the trap handler. In the end, we removed
this patchset from 6.6...

You can give the following patch a try to determine whether your uarch
caches invalid TLB entries, though honestly I'm not sure it will help
(but it will test my patch :)). The result is printed in dmesg as
"uarch caches invalid entries:".

If the trap addresses are constant, I would try to set a breakpoint on
flush_tlb_kernel_range() for those addresses and see what happens:
maybe that's an alignment issue or something else, maybe it's not
even called before the trap, etc. More info is welcome :)
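
Here is one hypothetical form an alignment problem could take, sketched only to show what the breakpoint would help rule out. Suppose a loop issues DIV_ROUND_UP(size, PAGE_SIZE) per-page flushes starting at an unaligned start address. It can miss the last page the range overlaps. The helpers and the 4 KiB page size are assumptions of this sketch, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Number of base pages overlapped by [start, start + size). */
static uint64_t pages_touched(uint64_t start, uint64_t size)
{
	assert(size != 0);
	return (start + size - 1) / SKETCH_PAGE_SIZE - start / SKETCH_PAGE_SIZE + 1;
}

/*
 * Hypothetical naive loop: DIV_ROUND_UP(size, PAGE_SIZE) per-page flushes at
 * start, start + PAGE_SIZE, ... Returns true if the last page of the range
 * is never flushed.
 */
static bool naive_loop_misses_last_page(uint64_t start, uint64_t size)
{
	uint64_t nr, last_flushed_addr;

	assert(size != 0);
	nr = size / SKETCH_PAGE_SIZE + (size % SKETCH_PAGE_SIZE != 0);
	last_flushed_addr = start + (nr - 1) * SKETCH_PAGE_SIZE;
	return last_flushed_addr / SKETCH_PAGE_SIZE <
	       (start + size - 1) / SKETCH_PAGE_SIZE;
}
```

With start = 0x1800 and size = 0x1000, the range overlaps two pages but the loop flushes only one. Either counting pages_touched() or aligning start down first covers both pages.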

Thanks!

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 80af436c04ac..8f863b251898 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -58,6 +58,8 @@ bool pgtable_l5_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KER
 EXPORT_SYMBOL(pgtable_l4_enabled);
 EXPORT_SYMBOL(pgtable_l5_enabled);

+bool tlb_caching_invalid_entries;
+
 phys_addr_t phys_ram_base __ro_after_init;
 EXPORT_SYMBOL(phys_ram_base);

@@ -752,6 +754,18 @@ static void __init disable_pgtable_l4(void)
        satp_mode = SATP_MODE_39;
 }

+static void __init enable_pgtable_l5(void)
+{
+       pgtable_l5_enabled = true;
+       satp_mode = SATP_MODE_57;
+}
+
+static void __init enable_pgtable_l4(void)
+{
+       pgtable_l4_enabled = true;
+       satp_mode = SATP_MODE_48;
+}
+
 static int __init print_no4lvl(char *p)
 {
        pr_info("Disabled 4-level and 5-level paging");
@@ -828,6 +842,113 @@ static __init void set_satp_mode(uintptr_t dtb_pa)
        memset(early_pud, 0, PAGE_SIZE);
        memset(early_pmd, 0, PAGE_SIZE);
 }
+
+/* Determine at runtime if the uarch caches invalid TLB entries */
+static __init void set_tlb_caching_invalid_entries(void)
+{
+#define NR_RETRIES_CACHING_INVALID_ENTRIES     50
+       uintptr_t set_tlb_caching_invalid_entries_pmd = ((unsigned long)set_tlb_caching_invalid_entries) & PMD_MASK;
+       // TODO the test_addr as defined below could go into another pud...
+       uintptr_t test_addr = set_tlb_caching_invalid_entries_pmd + 2 * PMD_SIZE;
+       pmd_t valid_pmd;
+       u64 satp;
+       int i = 0;
+
+       /* To ease the page table creation */
+       // TODO use variable instead, like in the clean, nop satp_mode too
+       disable_pgtable_l5();
+       disable_pgtable_l4();
+
+       /* Establish a mapping for set_tlb_caching_invalid_entries() in sv39 */
+       create_pgd_mapping(early_pg_dir,
+                          set_tlb_caching_invalid_entries_pmd,
+                          (uintptr_t)early_pmd,
+                          PGDIR_SIZE, PAGE_TABLE);
+
+       /* Handle the case where set_tlb_caching_invalid_entries straddles 2 PMDs */
+       create_pmd_mapping(early_pmd,
+                          set_tlb_caching_invalid_entries_pmd,
+                          set_tlb_caching_invalid_entries_pmd,
+                          PMD_SIZE, PAGE_KERNEL_EXEC);
+       create_pmd_mapping(early_pmd,
+                          set_tlb_caching_invalid_entries_pmd + PMD_SIZE,
+                          set_tlb_caching_invalid_entries_pmd + PMD_SIZE,
+                          PMD_SIZE, PAGE_KERNEL_EXEC);
+
+       /* Establish an invalid mapping */
+       create_pmd_mapping(early_pmd, test_addr, 0, PMD_SIZE, __pgprot(0));
+
+       /* Precompute the valid pmd here because the mapping for pfn_pmd() won't exist */
+       valid_pmd = pfn_pmd(PFN_DOWN(set_tlb_caching_invalid_entries_pmd), PAGE_KERNEL);
+
+       local_flush_tlb_all();
+       satp = PFN_DOWN((uintptr_t)&early_pg_dir) | SATP_MODE_39;
+       csr_write(CSR_SATP, satp);
+
+       /*
+        * Set stvec to after the trapping access, access this invalid mapping
+        * and legitimately trap
+        */
+       // TODO: Should I save the previous stvec?
+#define ASM_STR(x)     __ASM_STR(x)
+       asm volatile(
+               "la a0, 1f                              \n"
+               "csrw " ASM_STR(CSR_TVEC) ", a0         \n"
+               "ld a0, 0(%0)                           \n"
+               ".align 2                               \n"
+               "1:                                     \n"
+               :
+               : "r" (test_addr)
+               : "a0"
+       );
+
+       /* Now establish a valid mapping to check if the invalid one is cached */
+       early_pmd[pmd_index(test_addr)] = valid_pmd;
+
+       /*
+        * Access the valid mapping multiple times: indeed, we can't use
+        * sfence.vma as a barrier to make sure the cpu did not reorder accesses
+        * so we may trap even if the uarch does not cache invalid entries. By
+        * trying a few times, we make sure that those uarchs will see the right
+        * mapping at some point.
+        */
+
+       i = NR_RETRIES_CACHING_INVALID_ENTRIES;
+
+#define ASM_STR(x)     __ASM_STR(x)
+       asm_volatile_goto(
+               "la a0, 1f                                      \n"
+               "csrw " ASM_STR(CSR_TVEC) ", a0                 \n"
+               ".align 2                                       \n"
+               "1:                                             \n"
+               "addi %0, %0, -1                                \n"
+               "blt %0, zero, %l[caching_invalid_entries]      \n"
+               "ld a0, 0(%1)                                   \n"
+               :
+               : "r" (i), "r" (test_addr)
+               : "a0"
+               : caching_invalid_entries
+       );
+
+       csr_write(CSR_SATP, 0ULL);
+       local_flush_tlb_all();
+
+       /* If we don't trap, the uarch does not cache invalid entries! */
+       tlb_caching_invalid_entries = false;
+       goto clean;
+
+caching_invalid_entries:
+       csr_write(CSR_SATP, 0ULL);
+       local_flush_tlb_all();
+
+       tlb_caching_invalid_entries = true;
+clean:
+       memset(early_pg_dir, 0, PAGE_SIZE);
+       memset(early_pmd, 0, PAGE_SIZE);
+
+       enable_pgtable_l4();
+       enable_pgtable_l5();
+}
 #endif

 /*
@@ -1040,6 +1161,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 #endif

 #if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL)
+       set_tlb_caching_invalid_entries();
        set_satp_mode(dtb_pa);
 #endif

@@ -1290,6 +1412,9 @@ static void __init setup_vm_final(void)
        local_flush_tlb_all();

        pt_ops_set_late();
+
+       pr_info("uarch caches invalid entries: %s",
+               tlb_caching_invalid_entries ? "yes": "no");
 }
 #else
 asmlinkage void __init setup_vm(uintptr_t dtb_pa)
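
Two aspects of the probe can be checked on the host with a small C model. The helpers below are hypothetical and use Sv39 geometry (2 MiB PMDs, 1 GiB per top-level entry) with example addresses, not RZ/Five ones.

test_addr_crosses_pgd() makes the TODO about test_addr concrete. test_addr is two PMDs above the PMD holding set_tlb_caching_invalid_entries(). If that step crosses a 1 GiB boundary, test_addr sits under a top-level entry the probe never points at early_pmd. Assuming that entry is empty, every access would then trap, giving a spurious "yes".

probe_caches_invalid() mirrors the decrement/branch/load sequence of the second asm block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sv39 geometry: 4 KiB pages, 9-bit index per level. */
#define SV39_PMD_SHIFT   21
#define SV39_PGDIR_SHIFT 30
#define SV39_PMD_SIZE    (1ULL << SV39_PMD_SHIFT)
#define SV39_PMD_MASK    (~(SV39_PMD_SIZE - 1))

static uint64_t sv39_pgd_index(uint64_t addr)
{
	return (addr >> SV39_PGDIR_SHIFT) & 511;
}

/*
 * Same address choice as the probe: round the function address down to its
 * PMD, then go two PMDs up. True if test_addr needs another top-level entry.
 */
static bool test_addr_crosses_pgd(uint64_t func_addr)
{
	uint64_t func_pmd = func_addr & SV39_PMD_MASK;
	uint64_t test_addr = func_pmd + 2 * SV39_PMD_SIZE;

	return sv39_pgd_index(test_addr) != sv39_pgd_index(func_pmd);
}

/*
 * Model of the retry loop in the second asm block. first_ok_access is the
 * 1-based attempt at which the load stops trapping (0: it never does).
 * Returns true when the probe concludes "caches invalid entries".
 */
static bool probe_caches_invalid(int first_ok_access, int nr_retries)
{
	int i = nr_retries;
	int attempt = 0;

	assert(nr_retries > 0 && first_ok_access >= 0);
	for (;;) {
		i -= 1;                 /* addi %0, %0, -1 */
		if (i < 0)              /* blt %0, zero, caching_invalid_entries */
			return true;
		attempt++;              /* ld a0, 0(test_addr) */
		if (first_ok_access != 0 && attempt >= first_ok_access)
			return false;   /* no trap: fall through to "no" */
		/* trap: stvec points back at label 1 */
	}
}
```

For a function at 0xbfd12345, test_addr is 0xc0000000, which is in the next 1 GiB slot. A function at 0x80201000 keeps test_addr in its own slot. With 50 retries, a load that first succeeds on any attempt up to the 50th yields "no". Only a load that keeps trapping yields "yes".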


> Cheers,
> Prabhakar


Thread overview: 22+ messages
2023-08-01  8:53 [PATCH v3 0/4] riscv: tlb flush improvements Alexandre Ghiti
2023-08-01  8:53 ` [PATCH v3 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
2023-08-01  8:54 ` [PATCH v3 2/4] riscv: Improve flush_tlb_range() for hugetlb pages Alexandre Ghiti
2023-08-01  8:54 ` [PATCH v3 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb Alexandre Ghiti
2023-08-01  8:54 ` [PATCH v3 4/4] riscv: Improve flush_tlb_kernel_range() Alexandre Ghiti
2023-09-06 11:48   ` Lad, Prabhakar
2023-09-06 12:01     ` Alexandre Ghiti
2023-09-06 12:08       ` Lad, Prabhakar
2023-09-06 12:18         ` Alexandre Ghiti
2023-09-06 12:23           ` Lad, Prabhakar
2023-09-06 12:43             ` Alexandre Ghiti
2023-09-06 13:16               ` Palmer Dabbelt
2023-09-06 13:54               ` Lad, Prabhakar
2023-09-07  9:05                 ` Alexandre Ghiti
2023-09-07 10:49                   ` Lad, Prabhakar
2023-09-08 12:34                     ` Alexandre Ghiti [this message]
2023-09-06 20:22     ` Nadav Amit
2023-09-07 13:47       ` Alexandre Ghiti
2023-09-09 19:00   ` Samuel Holland
2023-09-11  8:33     ` Alexandre Ghiti
2023-09-06 13:00 ` [PATCH v3 0/4] riscv: tlb flush improvements patchwork-bot+linux-riscv
2023-09-09 20:11 ` Samuel Holland
