linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
       [not found] ` <718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net>
@ 2024-08-06  2:40   ` Linus Torvalds
  2024-08-06 11:02     ` Vlastimil Babka
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2024-08-06  2:40 UTC (permalink / raw)
  To: Guenter Roeck, Vlastimil Babka; +Cc: linux-kernel, Linux-MM

[ Let's drop random people and bring in Vlastimil ]

Vlastimil,
 it turns out that the "this patch" is entirely a red herring, and the
problem comes and goes randomly with just some code layout issues. See

   http://server.roeck-us.net/qemu/parisc64-6.10.3/

for more detail, particularly you'll see the "log.bad.gz" with the full log.

See also

   https://lore.kernel.org/all/87y15a4p4h.ffs@tglx/

for this thread.

I don't think this is really a slub issue, since it only happens on
parisc, but maybe you can see what would make parisc different, and
what could possibly make it all timing- or layout-dependent.

                 Linus

On Sun, 4 Aug 2024 at 11:36, Guenter Roeck <linux@roeck-us.net> wrote:
>
> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>
> [    0.000000] =============================================================================
> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
> [    0.000000] -----------------------------------------------------------------------------
>
> This never stops until the emulation aborts.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-06  2:40   ` [PATCH 6.10 000/809] 6.10.3-rc3 review Linus Torvalds
@ 2024-08-06 11:02     ` Vlastimil Babka
  2024-08-06 17:33       ` Thomas Gleixner
       [not found]       ` <f63c6789-b01a-4d76-b7c9-74c04867bc13@roeck-us.net>
  0 siblings, 2 replies; 21+ messages in thread
From: Vlastimil Babka @ 2024-08-06 11:02 UTC (permalink / raw)
  To: Linus Torvalds, Guenter Roeck; +Cc: linux-kernel, Linux-MM, Thomas Gleixner

On 8/6/24 04:40, Linus Torvalds wrote:
> [ Let's drop random people and bring in Vlastimil ]

tglx was reproducing it so I add him back

> Vlastimil,
>  it turns out that the "this patch" is entirely a red herring, and the
> problem comes and goes randomly with just some code layout issues. See
> 
>    http://server.roeck-us.net/qemu/parisc64-6.10.3/
> 
> for more detail, particularly you'll see the "log.bad.gz" with the full log.

[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
[    0.000000] Slab 0x0000000041ed0000 objects=21 used=5 fp=0x00000000434003d0 flags=0x200(workingset|section=0|zone=0)

flags tell us this came from the partial list (workingset), there's no head flag so order-0

since the error was detected it basically throws the slab page away and tries another one

[    0.000000] BUG kmem_cache (Tainted: G    B             ): objects 25 > max 16
[    0.000000] Slab 0x0000000041ed0080 objects=25 used=6 fp=0x0000000043402790 flags=0x240(workingset|head|section=0|zone=0)

this was also from the partial list but head flag so at least order-1, two things are weird:
- max=16 is same as above even though it should be at least double as
slab page's order is larger
- objects=25 also isn't at least twice than objects=21

All the following are:
[    0.000000] BUG kmem_cache (Tainted: G    B             ): objects 25 > max 16
[    0.000000] Slab 0x0000000041ed0300 objects=25 used=1 fp=0x000000004340c150 flags=0x40(head|section=0|zone=0)

we depleted the partial list so it's allocating new slab pages, that are
also at least order-1

It looks like maxobj calculation is bogus, would be useful to see what values it
calculates from. I'm attaching a diff, but maybe it will also hide the issue...

If someone has a /proc/slabinfo from a working boot with otherwise same config
it might be also enough to guess what values should be expected there,
at least the s-size.

objects=21 vs 25 also seem odd though

used=5 with used=6 in the first two also suggests we already passed this code
successfully for creating a number of kmalloc caches and only then it started
failing, that's also weird.

> See also
> 
>    https://lore.kernel.org/all/87y15a4p4h.ffs@tglx/
> 
> for this thread.
> 
> I don't think this is really a slub issue, since it only happens on
> parisc, but maybe you can see what would make parisc different, and
> what could possibly make it all timing- or layout-dependent.
> 
>                  Linus
> 
> On Sun, 4 Aug 2024 at 11:36, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>
>> [    0.000000] =============================================================================
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> [    0.000000] -----------------------------------------------------------------------------
>>
>> This never stops until the emulation aborts.

diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..ec4ed5215f2f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1386,8 +1386,8 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
 
 	maxobj = order_objects(slab_order(slab), s->size);
 	if (slab->objects > maxobj) {
-		slab_err(s, slab, "objects %u > max %u",
-			slab->objects, maxobj);
+		slab_err(s, slab, "objects %u > max %u (order %d size %u)",
+			slab->objects, maxobj, slab_order(slab), s->size);
 		return 0;
 	}
 	if (slab->inuse > slab->objects) {



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-06 11:02     ` Vlastimil Babka
@ 2024-08-06 17:33       ` Thomas Gleixner
       [not found]         ` <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz>
       [not found]       ` <f63c6789-b01a-4d76-b7c9-74c04867bc13@roeck-us.net>
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-08-06 17:33 UTC (permalink / raw)
  To: Vlastimil Babka, Linus Torvalds, Guenter Roeck; +Cc: linux-kernel, Linux-MM

On Tue, Aug 06 2024 at 13:02, Vlastimil Babka wrote:
> On 8/6/24 04:40, Linus Torvalds wrote:
> It looks like maxobj calculation is bogus, would be useful to see what values it
> calculates from. I'm attaching a diff, but maybe it will also hide the issue...

It does hide it :(

> If someone has a /proc/slabinfo from a working boot with otherwise same config
> it might be also enough to guess what values should be expected there,
> at least the s-size.
>
> objects=21 vs 25 also seem odd though
>
> used=5 with used=6 in the first two also suggests we already passed this code
> successfully for creating a number of kmalloc caches and only then it started
> failing, that's also weird.

I added a printk() to check_slab() and on the non-failing boot this
looks like:

[    0.000000] c 000000004017b0f8 c 0000000041ed0000 objects 21 max 21 order 0 size 192, inuse 2
[    0.000000] c 000000004017b1c8 c 0000000041ed0080 objects 25 max 25 order 1 size 320, inuse 1
[    0.000000] c 0000000043402010 c 0000000041ed0080 objects 25 max 25 order 1 size 320, inuse 2
[    0.000000] c 0000000043402010 c 0000000041ed0080 objects 25 max 25 order 1 size 320, inuse 3
[    0.000000] c 0000000043402150 c 0000000041ed0000 objects 21 max 21 order 0 size 192, inuse 3
[    0.000000] c 0000000043402010 c 0000000041ed0080 objects 25 max 25 order 1 size 320, inuse 4
[    0.000000] c 0000000043402150 c 0000000041ed0000 objects 21 max 21 order 0 size 192, inuse 4
[    0.000000] c 0000000043402010 c 0000000041ed0080 objects 25 max 25 order 1 size 320, inuse 5
[    0.000000] c 0000000043402150 c 0000000041ed0000 objects 21 max 21 order 0 size 192, inuse 5
[    0.000000] c 0000000043402010 c 0000000041ed0080 objects 25 max 25 order 1 size 320, inuse 6
[    0.000000] c 0000000043402150 c 0000000041ed0000 objects 21 max 21 order 0 size 192, inuse 6

I did some more experiments to figure out why adding or removing text
cures it. The minimal change which makes it boot again is:

 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
+	current->flags &= ~PF_MEMALLOC;
 	handle_softirqs(false);
 }

That results in the following System.map delta:

--- ../upstream.txt	2024-08-06 16:52:49.746528992 +0200
+++ ../build-misc/System.map	2024-08-06 19:02:32.652201977 +0200
@@ -47600,15 +47600,15 @@
 0000000041218c30 T __do_softirq
 0000000041218c30 T __irqentry_text_end
 0000000041218c30 T __softirqentry_text_start
-0000000041218c70 T $$divoI
-0000000041218c70 T __softirqentry_text_end
-00000000412190d0 T $$divI_2
-00000000412190d0 T $$divide_by_constant
-00000000412190e0 T $$divI_4
-00000000412190f0 T $$divI_8
-0000000041219100 T $$divI_16
-00000000412192d8 T $$divI_17
-000000004121930c T $$divU_17
+0000000041218c80 T $$divoI
+0000000041218c80 T __softirqentry_text_end
+00000000412190e0 T $$divI_2
+00000000412190e0 T $$divide_by_constant
+00000000412190f0 T $$divI_4
+0000000041219100 T $$divI_8
+0000000041219110 T $$divI_16
+00000000412192e8 T $$divI_17
+000000004121931c T $$divU_17
 000000004121a000 D __start_opd
 000000004121a000 D _etext
 000000004121a000 D _sdata

So this change adds 16 bytes to __softirq() which moves the division
functions up by 16 bytes. That's all it takes to make the stupid go
away....

I wonder whether this is some qemu stupid.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
       [not found]         ` <CAHk-=wjmumbT73xLkSAnnxDwaFE__Ny=QCp6B_LE2aG1SUqiTg@mail.gmail.com>
@ 2024-08-06 17:49           ` Linus Torvalds
  0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2024-08-06 17:49 UTC (permalink / raw)
  To: Guenter Roeck, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Thomas Gleixner

[ Adding s390 people, this is strange ]

New people, see

  https://lore.kernel.org/all/CAHk-=wjmumbT73xLkSAnnxDwaFE__Ny=QCp6B_LE2aG1SUqiTg@mail.gmail.com/

for context. There's a heisenbug that depends on random code layout
issues on s390.

On Tue, 6 Aug 2024 at 10:34, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. Do we have some alignment confusion?
>
> The alignment rules for 192 are to align it to 64-byte boundaries
> (because that's the largest power of two that divides it), and that
> means it stays at 192, and that would give 21 objects per 4kB page.
>
> But if we use the "align up to next power of two", you get 256 bytes,
> and 16 objects per page.
>
> And that 21-vs-16 confusion would seem to match this pretty well:
>
>   [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>
> which makes me wonder...

I'd suspect commit ad59baa31695 ("slab, rust: extend kmalloc()
alignment guarantees to remove Rust padding"), perhaps with some odd
s390 code generation issue for 'ffs()'.

IOW, this new code in mm/slab_common.c

        if (flags & SLAB_KMALLOC)
                align = max(align, 1U << (ffs(size) - 1));

might not match some other alignment code.

Or maybe it's the s390 ffs().

It looks like

  static inline int ffs(int word)
  {
        unsigned long mask = 2 * BITS_PER_LONG - 1;
        unsigned int val = (unsigned int)word;

        return (1 + (__flogr(-val & val) ^ (BITS_PER_LONG - 1))) & mask;
  }

where s390 has this very odd "flogr" instruction ("find last one G
register"?) for the non-constant case.

That uses a "union register_pair" but only ever uses the "even"
register without ever using the full 128-bit part or the odd register.
So the other register in the register pair is uninitialized.

Does that cause random compiler issues based on register allocation?

Just for fun, does something like this make any difference?

  --- a/arch/s390/include/asm/bitops.h
  +++ b/arch/s390/include/asm/bitops.h
  @@ -305,6 +305,7 @@ static inline unsigned char __flogr(unsigned long word)
                union register_pair rp;

                rp.even = word;
  +             rp.odd = 0;
                asm volatile(
                        "       flogr   %[rp],%[rp]\n"
                        : [rp] "+d" (rp.pair) : : "cc");


Thomas notices that the special "div by constant" routines moved
around, and I'm not seeing how *that* would matter, but it's all
obviously very strange.

              Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
       [not found]               ` <87plqjz6aa.ffs@tglx>
@ 2024-08-08 15:53                 ` Linus Torvalds
  2024-08-08 16:12                   ` Thomas Gleixner
       [not found]                 ` <cffe30ed-43a3-46ac-ad03-afb7633f17e5@roeck-us.net>
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2024-08-08 15:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Careful vs. the pr_once(). It's not necessarily the first allocation
> which trips up. I removed slab_err() in that condition and just printed
> the data:
>
> [    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14
> [    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 18
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 19
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 20
> [    0.000000] Order: 0 Size:  160 Nobj: 25 Maxobj: 16 25 Inuse:  5
> [    0.000000] Order: 2 Size:  672 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  2
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10
>
> The maxobj column shows the failed result and the result from the second
> invocation inside of the printk().

Hmm. There's a few patterns there:

 - the incorrect Maxobj is always 16, with wildly different sizes.

 - the correct value is always in that 21-25 range

and neither of these are particularly common cases for slab objects
(well, at least on x86-64).

I actually went into the gcc sources to look at the libgcc routines
for the hppa $$divU routine, but apart from checking for trivial
powers-of-two and for divisions with small divisor values (<=17), all
it is ends up being a series of "ds" (divide step) and "addc"
instructions. I don't see how that could possibly mess up. It does end
up with the final addc in the delay slot of the return, but that's
normal parisc behavior (and here by "normal" I mean "it's a really
messed up instruction set that did everything wrong, including branch
delay slots")

I do note that the $$divU function (which is what this all should use)
oddly doesn't show up as defined in 'nm' for me when I look at
Guenter's vmlinux file. So there's some odd linker thing going on, and
it *only* affects the $$div* functions.

Thomas' System.map shows some of the same effects, ie it shows $$divoI
(signed integer divide with overflow checking), but doesn't show
$$divU that is right after it. The reason I was looking was exactly
because this should be using $$divU, and clearly code alignment is
implicated somehow, but the exact alignment of $$divU wasn't obvious.

But it looks like "$$divU" should be somewhere between $$divoI and
$$divl_2, and in Guenter's bad case that's

  0000000041218c70 T $$divoI
  00000000412190d0 T $$divI_2

so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
into 0000000041219000?

Some ITLB fill issue together with that delayed branch and a qemu bug?

                Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
       [not found]                 ` <cffe30ed-43a3-46ac-ad03-afb7633f17e5@roeck-us.net>
@ 2024-08-08 15:58                   ` John David Anglin
  0 siblings, 0 replies; 21+ messages in thread
From: John David Anglin @ 2024-08-08 15:58 UTC (permalink / raw)
  To: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 2024-08-08 10:59 a.m., Guenter Roeck wrote:
>> Let's wait for PARISC people to run it on actual hardware.
>>
>
> Agreed. I suspect that there is a carry or upper 32 bit of a register not
> handled properly, but I have no idea where that might be or why that would
> only be seen if the div functions are located in a certain address range.
I'm doubtful there's a coding issue in $$divoI.  The routine was written by HP and
it's used on both HP-UX and Linux.  The routine can be found in libgcc/config/pa/milli64.S

The routine can trap:
    Divide by zero is trapped.
    Divide of -2**31 by -1 is trapped for $$divoI but not for $$divI.

$$divoI is a millicode routine.  Not sure what calls it.  gcc doesn't call it.  gcc uses $$divI.

It appears you are testing a 64-bit kernel.  There might be issues calling millicode routines
when branch distance exceeds approximately 4 MB.  Millicode routines have a special
calling sequence.  You could try building kernel with -mlong-calls. Kernel will get bugger
and slower with this option.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 15:53                 ` Linus Torvalds
@ 2024-08-08 16:12                   ` Thomas Gleixner
  2024-08-08 16:33                     ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-08-08 16:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, Aug 08 2024 at 08:53, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@linutronix.de> wrote:
> Hmm. There's a few patterns there:
>
>  - the incorrect Maxobj is always 16, with wildly different sizes.

Which means that the size value is rounded up to the next power of 2

>> [    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14

   8192/16 = 512

>> [    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1

   4096/16 = 256

>> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1

  32768/16 = 2048

>> The maxobj column shows the failed result and the result from the second
>> invocation inside of the printk().

> I actually went into the gcc sources to look at the libgcc routines
> for the hppa $$divU routine, but apart from checking for trivial
> powers-of-two and for divisions with small divisor values (<=17), all
> it is ends up being a series of "ds" (divide step) and "addc"
> instructions. I don't see how that could possibly mess up. It does end
> up with the final addc in the delay slot of the return, but that's
> normal parisc behavior (and here by "normal" I mean "it's a really
> messed up instruction set that did everything wrong, including branch
> delay slots")
>
> I do note that the $$divU function (which is what this all should use)
> oddly doesn't show up as defined in 'nm' for me when I look at
> Guenter's vmlinux file. So there's some odd linker thing going on, and
> it *only* affects the $$div* functions.
>
> Thomas' System.map shows some of the same effects, ie it shows $$divoI
> (signed integer divide with overflow checking), but doesn't show
> $$divU that is right after it. The reason I was looking was exactly
> because this should be using $$divU, and clearly code alignment is
> implicated somehow, but the exact alignment of $$divU wasn't obvious.
>
> But it looks like "$$divU" should be somewhere between $$divoI and
> $$divl_2, and in Guenter's bad case that's
>
>   0000000041218c70 T $$divoI
>   00000000412190d0 T $$divI_2
>
> so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
> into 0000000041219000?

It uses $$divU which is at $$divoI + 0x250. I validated that in the
disassembly.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 16:12                   ` Thomas Gleixner
@ 2024-08-08 16:33                     ` Linus Torvalds
  2024-08-08 17:48                       ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2024-08-08 16:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, 8 Aug 2024 at 09:12, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > But it looks like "$$divU" should be somewhere between $$divoI and
> > $$divl_2, and in Guenter's bad case that's
> >
> >   0000000041218c70 T $$divoI
> >   00000000412190d0 T $$divI_2
> >
> > so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
> > into 0000000041219000?
>
> It uses $$divU which is at $$divoI + 0x250. I validated that in the
> disassembly.

Well, that does support "maybe we have a page crosser issue", but it's
not quite at the delayed branch.

Because that would mean that $$divU starts at 0x41218ec0, and that
means that there are 80 instructions from the start of $$divU to the
end of that 0x41218xxx page.

And if I counted instructions right (I don't have a disassembler, so
I'm just looking at the libgcc sources), that puts the page crosser
not quite at the delayed branch slot, but it does put it somewhere
roughly at or around

        ds      temp,arg1,temp          /* 29th divide step */
        addc    retreg,retreg,retreg    /* shift retreg with/into carry */

so it's around the last few bits of the result. The ones we get wrong.

Which is intriguing, but honestly, I don't see how we could get itlb
misses horribly wrong and not crash left and right.

The $$divU routine is unusual in that it uses that millicode calling
convention, but I think that's just a different register for the
return address.

And it obviously depends on the carry flag, which is pretty unusual.
Maybe if the ITLB fill messes up C, it wouldn't show up in other
areas? But the $$divU result error is more than one single bit getting
cleared.

              Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 16:33                     ` Linus Torvalds
@ 2024-08-08 17:48                       ` Thomas Gleixner
  2024-08-08 18:19                         ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-08-08 17:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, Aug 08 2024 at 09:33, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 09:12, Thomas Gleixner <tglx@linutronix.de> wrote:
>> It uses $$divU which is at $$divoI + 0x250. I validated that in the
>> disassembly.
>
> Well, that does support "maybe we have a page crosser issue", but it's
> not quite at the delayed branch.
>
> Because that would mean that $$divU starts at 0x41218ec0, and that
> means that there are 80 instructions from the start of $$divU to the
> end of that 0x41218xxx page.
>
> And if I counted instructions right (I don't have a disassembler, so
> I'm just looking at the libgcc sources), that puts the page crosser
> not quite at the delayed branch slot, but it does put it somewhere
> roughly at or around
>
>         ds      temp,arg1,temp          /* 29th divide step */
>         addc    retreg,retreg,retreg    /* shift retreg with/into carry */
>
> so it's around the last few bits of the result. The ones we get wrong.
>
> Which is intriguing, but honestly, I don't see how we could get itlb
> misses horribly wrong and not crash left and right.

Here is the disassembly from my latest crashing debug kernel which
shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. 

    4121dec0:	37 21 3f ff 	ldo -1(r25),r1
    4121dec4:	08 39 22 00 	and,= r25,r1,r0
    4121dec8:	e8 00 00 88 	b,l 4121df14 <$$divoI+0x2a4>,r0
    4121decc:	b3 20 20 00 	addi,tc,= 0,r25,r0
    4121ded0:	08 1a 02 5d 	copy r26,ret1
    4121ded4:	d3 21 39 f0 	extrw,u,= r25,15,16,r1
    4121ded8:	d3 bd 19 f0 	extrw,u ret1,15,16,ret1
    4121dedc:	08 39 02 59 	or r25,r1,r25
    4121dee0:	34 1a 01 98 	ldi cc,r26
    4121dee4:	d3 21 3a f8 	extrw,u,= r25,23,8,r1
    4121dee8:	d3 bd 1a e8 	extrw,u ret1,23,24,ret1
    4121deec:	08 39 02 59 	or r25,r1,r25
    4121def0:	34 01 01 54 	ldi aa,r1
    4121def4:	d3 20 3b 7c 	extrw,u,= r25,27,4,r0
    4121def8:	d3 bd 1b 64 	extrw,u ret1,27,28,ret1
    4121defc:	0b 59 22 00 	and,= r25,r26,r0
    4121df00:	d3 bd 1b a2 	extrw,u ret1,29,30,ret1
    4121df04:	08 39 22 00 	and,= r25,r1,r0
    4121df08:	d3 bd 1b c1 	extrw,u ret1,30,31,ret1
    4121df0c:	e8 40 c0 02 	bv,n r0(rp)
    4121df10:	08 00 02 40 	nop
    4121df18:	97 21 00 00 	subi 0,r25,r1
    4121df1c:	08 20 04 40 	ds r0,r1,r0
    4121df20:	0b 5a 06 1d 	add r26,r26,ret1
    4121df24:	0b 20 04 41 	ds r0,r25,r1
    4121df28:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df2c:	0b 21 04 41 	ds r1,r25,r1
    4121df30:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df34:	0b 21 04 41 	ds r1,r25,r1
    4121df38:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df3c:	0b 21 04 41 	ds r1,r25,r1
    4121df40:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df44:	0b 21 04 41 	ds r1,r25,r1
    4121df48:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df4c:	0b 21 04 41 	ds r1,r25,r1
    4121df50:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df54:	0b 21 04 41 	ds r1,r25,r1
    4121df58:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df5c:	0b 21 04 41 	ds r1,r25,r1
    4121df60:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df64:	0b 21 04 41 	ds r1,r25,r1
    4121df68:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df6c:	0b 21 04 41 	ds r1,r25,r1
    4121df70:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df74:	0b 21 04 41 	ds r1,r25,r1
    4121df78:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df7c:	0b 21 04 41 	ds r1,r25,r1
    4121df80:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df84:	0b 21 04 41 	ds r1,r25,r1
    4121df88:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df8c:	0b 21 04 41 	ds r1,r25,r1
    4121df90:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df94:	0b 21 04 41 	ds r1,r25,r1
    4121df98:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df9c:	0b 21 04 41 	ds r1,r25,r1
    4121dfa0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfa4:	0b 21 04 41 	ds r1,r25,r1
    4121dfa8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfac:	0b 21 04 41 	ds r1,r25,r1
    4121dfb0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfb4:	0b 21 04 41 	ds r1,r25,r1
    4121dfb8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfbc:	0b 21 04 41 	ds r1,r25,r1
    4121dfc0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfc4:	0b 21 04 41 	ds r1,r25,r1
    4121dfc8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfcc:	0b 21 04 41 	ds r1,r25,r1
    4121dfd0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfd4:	0b 21 04 41 	ds r1,r25,r1
    4121dfd8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfdc:	0b 21 04 41 	ds r1,r25,r1
    4121dfe0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfe4:	0b 21 04 41 	ds r1,r25,r1
    4121dfe8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfec:	0b 21 04 41 	ds r1,r25,r1
    4121dff0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dff4:	0b 21 04 41 	ds r1,r25,r1
    4121dff8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dffc:	0b 21 04 41 	ds r1,r25,r1
    4121e000:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e004:	0b 21 04 41 	ds r1,r25,r1
    4121e008:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e00c:	0b 21 04 41 	ds r1,r25,r1
    4121e010:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e014:	0b 21 04 41 	ds r1,r25,r1
    4121e018:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e01c:	0b 21 04 41 	ds r1,r25,r1
    4121e020:	e8 40 c0 00 	bv r0(rp)
    4121e024:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e028:	f3 20 0c 00 	depd,* r0,31,32,r25
    4121e02c:	8f 20 61 10 	cmpib,> 0,r25,4121e0bc <$$divoI+0x44c>
    4121e030:	08 00 02 40 	nop
    4121e034:	e8 19 40 00 	blr r25,r0
    4121e038:	08 00 02 40 	nop
    4121e03c:	b3 20 20 00 	addi,tc,= 0,r25,r0
    4121e040:	08 00 02 40 	nop
    4121e044:	e8 40 c0 00 	bv r0(rp)
    4121e048:	08 1a 02 5d 	copy r26,ret1
    4121e04c:	e8 40 c0 00 	bv r0(rp)
    4121e050:	d3 5d 1b c1 	extrw,u r26,30,31,ret1
    4121e054:	e8 00 01 c2 	b,l,n 4121e13c <$$divI_16+0x3c>,r0
    4121e058:	08 00 02 40 	nop
    4121e05c:	e8 40 c0 00 	bv r0(rp)
    4121e060:	d3 5d 1b a2 	extrw,u r26,29,30,ret1
    4121e064:	e8 00 02 2a 	b,l,n 4121e180 <$$divI_16+0x80>,r0
    4121e068:	08 00 02 40 	nop
    4121e06c:	e8 00 02 aa 	b,l,n 4121e1c8 <$$divI_16+0xc8>,r0
    4121e070:	08 00 02 40 	nop
    4121e074:	e8 00 06 9a 	b,l,n 4121e3c8 <$$divU_17+0xbc>,r0
    4121e078:	08 00 02 40 	nop
    4121e07c:	e8 40 c0 00 	bv r0(rp)
    4121e080:	d3 5d 1b 83 	extrw,u r26,28,29,ret1
    4121e084:	e8 00 07 12 	b,l,n 4121e414 <$$divU_17+0x108>,r0
    4121e088:	08 00 02 40 	nop
    4121e08c:	e8 00 02 9a 	b,l,n 4121e1e0 <$$divI_16+0xe0>,r0
    4121e090:	08 00 02 40 	nop
    4121e094:	e8 1f 1d 0d 	b,l 4121df20 <$$divoI+0x2b0>,r0
    4121e098:	08 20 04 40 	ds r0,r1,r0
    4121e09c:	e8 00 03 fa 	b,l,n 4121e2a0 <$$divI_16+0x1a0>,r0
    4121e0a0:	08 00 02 40 	nop
    4121e0a4:	e8 1f 1c ed 	b,l 4121df20 <$$divoI+0x2b0>,r0
    4121e0a8:	08 20 04 40 	ds r0,r1,r0
    4121e0ac:	e8 00 07 02 	b,l,n 4121e434 <$$divU_17+0x128>,r0
    4121e0b0:	08 00 02 40 	nop
    4121e0b4:	e8 00 04 22 	b,l,n 4121e2cc <$$divI_16+0x1cc>,r0
    4121e0b8:	08 00 02 40 	nop
    4121e0bc:	0b 3a 04 00 	sub r26,r25,r0
    4121e0c0:	e8 40 c0 00 	bv r0(rp)
    4121e0c4:	08 00 07 1d 	add,c r0,r0,ret1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 17:48                       ` Thomas Gleixner
@ 2024-08-08 18:19                         ` Linus Torvalds
  2024-08-08 20:52                           ` Guenter Roeck
  2024-09-03  7:54                           ` Helge Deller
  0 siblings, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2024-08-08 18:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Here is the disassembly from my latest crashing debug kernel which
> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.

Looks like I was off by an instruction, it's the 28th divide-step (not
29) that does the page crosser:

>     4121dffc:   0b 21 04 41     ds r1,r25,r1
>     4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1

but my parisc knowledge is not good enough to even guess at what could go wrong.

And I have no actual reason to believe this has *anything* to do with
an itlb miss, except for that whole "exact placement seems to matter,
and it crosses a page boundary" detail.

None of this makes sense. I think we'll have to wait for Helge. It's
not like parisc is a huge concern, and for all we know this is all a
qemu bug to begin with.

          Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 18:19                         ` Linus Torvalds
@ 2024-08-08 20:52                           ` Guenter Roeck
  2024-08-08 21:50                             ` John David Anglin
  2024-08-08 22:15                             ` Richard Henderson
  2024-09-03  7:54                           ` Helge Deller
  1 sibling, 2 replies; 21+ messages in thread
From: Guenter Roeck @ 2024-08-08 20:52 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 8/8/24 11:19, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Here is the disassembly from my latest crashing debug kernel which
>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
> 
> Looks like I was off by an instruction, it's the 28th divide-step (not
> 29) that does the page crosser:
> 
>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
> 
> but my parisc knowledge is not good enough to even guess at what could go wrong.
> 
> And I have no actual reason to believe this has *anything* to do with
> an itlb miss, except for that whole "exact placement seems to matter,
> and it crosses a page boundary" detail.
> 
> None of this makes sense. I think we'll have to wait for Helge. It's
> not like parisc is a huge concern, and for all we know this is all a
> qemu bug to begin with.
> 

Copying Richard Henderson who recently made a number of changes to the
parisc/hppa qemu implementation (which unfortunately didn't fix the problem).

Guenter



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 20:52                           ` Guenter Roeck
@ 2024-08-08 21:50                             ` John David Anglin
  2024-08-08 22:29                               ` John David Anglin
  2024-08-09  0:50                               ` Guenter Roeck
  2024-08-08 22:15                             ` Richard Henderson
  1 sibling, 2 replies; 21+ messages in thread
From: John David Anglin @ 2024-08-08 21:50 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 2024-08-08 4:52 p.m., Guenter Roeck wrote:
> On 8/8/24 11:19, Linus Torvalds wrote:
>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Here is the disassembly from my latest crashing debug kernel which
>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>>
>> Looks like I was off by an instruction, it's the 28th divide-step (not
>> 29) that does the page crosser:
>>
>>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
I think this macro might clobber the C/B bits on a ITLB missing:

         /* This is for ILP32 PA2.0 only.  The TLB insertion needs
          * to extend into I/O space if the address is 0xfXXXXXXX
          * so we extend the f's into the top word of the pte in
          * this case */
         .macro          f_extend        pte,tmp
         extrd,s         \pte,42,4,\tmp
         addi,<>         1,\tmp,%r0
         extrd,s         \pte,63,25,\pte
         .endm

The addi instruction affects the C/B bits.  However, it is only used for 32-bit PA 2.0 kernels.
A second tmp register would be needed to change the addi to an add logical.

The mode likely problem is the shladd instruction in the following macro in entry.S:

         .macro          L2_ptep pmd,pte,index,va,fault
#if CONFIG_PGTABLE_LEVELS == 3
         extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
#else
         extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
#endif
         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
#if CONFIG_PGTABLE_LEVELS < 3
         copy            %r0,\pte
#endif
         ldw,s           \index(\pmd),\pmd
         bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
         dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
         SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
         extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
         shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
         .endm

I believe the shladd instruction should be changed to shladd,l (shift left and add logical).

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 20:52                           ` Guenter Roeck
  2024-08-08 21:50                             ` John David Anglin
@ 2024-08-08 22:15                             ` Richard Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2024-08-08 22:15 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 8/9/24 06:52, Guenter Roeck wrote:
> On 8/8/24 11:19, Linus Torvalds wrote:
>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Here is the disassembly from my latest crashing debug kernel which
>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>>
>> Looks like I was off by an instruction, it's the 28th divide-step (not
>> 29) that does the page crosser:
>>
>>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
>>
>> but my parisc knowledge is not good enough to even guess at what could go wrong.
>>
>> And I have no actual reason to believe this has *anything* to do with
>> an itlb miss, except for that whole "exact placement seems to matter,
>> and it crosses a page boundary" detail.
>>
>> None of this makes sense. I think we'll have to wait for Helge. It's
>> not like parisc is a huge concern, and for all we know this is all a
>> qemu bug to begin with.
>>
> 
> Copying Richard Henderson who recently made a number of changes to the
> parisc/hppa qemu implementation (which unfortunately didn't fix the problem).

Wow, that's quite the agile bug you've got there.

You can eliminate one class of qemu bug by attempting to reproduce in qemu-linux-user: 
arrange for the page crossing at the appropriate spot and see if the split between two 
translation blocks causes carry flag weirdness.

If that doesn't reproduce, then I'd be likely to blame something in the exception delivery 
or return process.  Still could be a qemu problem, but it would be something in the system 
emulation of the exception path.

It should be possible to write a small system mode test case for this hypothesis.  Ideally 
the itlb miss handler would be as simple a possible, e.g. computing an identity mapping 
rather than using real page tables.

Only after that would I start digging into the linux kernel's exception paths.


r~


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 21:50                             ` John David Anglin
@ 2024-08-08 22:29                               ` John David Anglin
  2024-08-08 23:33                                 ` Linus Torvalds
  2024-08-09  0:56                                 ` Guenter Roeck
  2024-08-09  0:50                               ` Guenter Roeck
  1 sibling, 2 replies; 21+ messages in thread
From: John David Anglin @ 2024-08-08 22:29 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 2024-08-08 5:50 p.m., John David Anglin wrote:
> The mode likely problem is the shladd instruction in the following macro in entry.S:
>
>         .macro          L2_ptep pmd,pte,index,va,fault
> #if CONFIG_PGTABLE_LEVELS == 3
>         extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
> #else
>         extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> #endif
>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
> #if CONFIG_PGTABLE_LEVELS < 3
>         copy            %r0,\pte
> #endif
>         ldw,s           \index(\pmd),\pmd
>         bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
>         dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
>         SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
>         extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>         shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>         .endm
>
> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index ab23e61a6f01..1ec60406f841 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -399,7 +399,7 @@
      SHLREG        \pmd,PxD_VALUE_SHIFT,\pmd
      extru_safe    \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
      dep        %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
-    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
+    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
      .endm

      /* Look up PTE in a 3-Level scheme. */

Boots okay.  Fixing the addi instruction is harder and it would take some time to test.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 22:29                               ` John David Anglin
@ 2024-08-08 23:33                                 ` Linus Torvalds
  2024-08-09  0:33                                   ` John David Anglin
  2024-08-09  0:56                                 ` Guenter Roeck
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2024-08-08 23:33 UTC (permalink / raw)
  To: John David Anglin
  Cc: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, linux-kernel,
	Linux-MM, Helge Deller, linux-parisc, Richard Henderson

On Thu, 8 Aug 2024 at 15:30, John David Anglin <dave.anglin@bell.net> wrote:
>
> > I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
>
> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
> index ab23e61a6f01..1ec60406f841 100644
> --- a/arch/parisc/kernel/entry.S
> +++ b/arch/parisc/kernel/entry.S
> @@ -399,7 +399,7 @@
> -    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
> +    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */

This doesn't seem wrong, but doesn't RFIR already restore the status word?

So even if the itlb fill modifies C/B, I don't see why that should
actually matter.

But again, parisc is very much not one of the architectures I've ever
worked with, so..

          Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 23:33                                 ` Linus Torvalds
@ 2024-08-09  0:33                                   ` John David Anglin
  0 siblings, 0 replies; 21+ messages in thread
From: John David Anglin @ 2024-08-09  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, linux-kernel,
	Linux-MM, Helge Deller, linux-parisc, Richard Henderson

On 2024-08-08 7:33 p.m., Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 15:30, John David Anglin <dave.anglin@bell.net> wrote:
>>> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
>> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
>> index ab23e61a6f01..1ec60406f841 100644
>> --- a/arch/parisc/kernel/entry.S
>> +++ b/arch/parisc/kernel/entry.S
>> @@ -399,7 +399,7 @@
>> -    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>> +    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
> This doesn't seem wrong, but doesn't RFIR already restore the status word?
Yes.
>
> So even if the itlb fill modifies C/B, I don't see why that should
> actually matter.
Probably won't help unless there's a bug in qemu rfir emulation.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 21:50                             ` John David Anglin
  2024-08-08 22:29                               ` John David Anglin
@ 2024-08-09  0:50                               ` Guenter Roeck
  1 sibling, 0 replies; 21+ messages in thread
From: Guenter Roeck @ 2024-08-09  0:50 UTC (permalink / raw)
  To: John David Anglin, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 8/8/24 14:50, John David Anglin wrote:
> On 2024-08-08 4:52 p.m., Guenter Roeck wrote:
>> On 8/8/24 11:19, Linus Torvalds wrote:
>>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>
>>>> Here is the disassembly from my latest crashing debug kernel which
>>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>>>
>>> Looks like I was off by an instruction, it's the 28th divide-step (not
>>> 29) that does the page crosser:
>>>
>>>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
> I think this macro might clobber the C/B bits on a ITLB missing:
> 
>          /* This is for ILP32 PA2.0 only.  The TLB insertion needs
>           * to extend into I/O space if the address is 0xfXXXXXXX
>           * so we extend the f's into the top word of the pte in
>           * this case */
>          .macro          f_extend        pte,tmp
>          extrd,s         \pte,42,4,\tmp
>          addi,<>         1,\tmp,%r0
>          extrd,s         \pte,63,25,\pte
>          .endm
> 
> The addi instruction affects the C/B bits.  However, it is only used for 32-bit PA 2.0 kernels.
> A second tmp register would be needed to change the addi to an add logical.
> 
> The mode likely problem is the shladd instruction in the following macro in entry.S:
> 
>          .macro          L2_ptep pmd,pte,index,va,fault
> #if CONFIG_PGTABLE_LEVELS == 3
>          extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
> #else
>          extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> #endif
>          dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
> #if CONFIG_PGTABLE_LEVELS < 3
>          copy            %r0,\pte
> #endif
>          ldw,s           \index(\pmd),\pmd
>          bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
>          dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
>          SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
>          extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>          dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>          shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>          .endm
> 
> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
> 

That doesn't help, at least not in qemu.

Guenter



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 22:29                               ` John David Anglin
  2024-08-08 23:33                                 ` Linus Torvalds
@ 2024-08-09  0:56                                 ` Guenter Roeck
  1 sibling, 0 replies; 21+ messages in thread
From: Guenter Roeck @ 2024-08-09  0:56 UTC (permalink / raw)
  To: John David Anglin, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 8/8/24 15:29, John David Anglin wrote:
> On 2024-08-08 5:50 p.m., John David Anglin wrote:
>> The mode likely problem is the shladd instruction in the following macro in entry.S:
>>
>>         .macro          L2_ptep pmd,pte,index,va,fault
>> #if CONFIG_PGTABLE_LEVELS == 3
>>         extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
>> #else
>>         extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
>> #endif
>>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>> #if CONFIG_PGTABLE_LEVELS < 3
>>         copy            %r0,\pte
>> #endif
>>         ldw,s           \index(\pmd),\pmd
>>         bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
>>         dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
>>         SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
>>         extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>>         shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>>         .endm
>>
>> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
> index ab23e61a6f01..1ec60406f841 100644
> --- a/arch/parisc/kernel/entry.S
> +++ b/arch/parisc/kernel/entry.S
> @@ -399,7 +399,7 @@
>       SHLREG        \pmd,PxD_VALUE_SHIFT,\pmd
>       extru_safe    \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>       dep        %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
> -    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
> +    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>       .endm
> 
>       /* Look up PTE in a 3-Level scheme. */
> 
> Boots okay.  Fixing the addi instruction is harder and it would take some time to test.
> 

Odd, it doesn't help for me. Does it crash for you without the above change ?
Or, in other words, is divI at the objecting location ?

Guenter




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 18:19                         ` Linus Torvalds
  2024-08-08 20:52                           ` Guenter Roeck
@ 2024-09-03  7:54                           ` Helge Deller
  2024-09-03 14:13                             ` Guenter Roeck
  2024-09-03 18:43                             ` Linus Torvalds
  1 sibling, 2 replies; 21+ messages in thread
From: Helge Deller @ 2024-09-03  7:54 UTC (permalink / raw)
  To: Linus Torvalds, Richard Henderson, Guenter Roeck
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, linux-parisc, Thomas Gleixner

On 8/8/24 20:19, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Here is the disassembly from my latest crashing debug kernel which
>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>
> Looks like I was off by an instruction, it's the 28th divide-step (not
> 29) that does the page crosser:
>
>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
>
> but my parisc knowledge is not good enough to even guess at what could go wrong.
>
> And I have no actual reason to believe this has *anything* to do with
> an itlb miss, except for that whole "exact placement seems to matter,
> and it crosses a page boundary" detail.

Well, you were on the right track :-)

Guenters kernel from
http://server.roeck-us.net/qemu/parisc64-6.10.3/
boots nicely on my physical C3700 machine, but crashes with Qemu.

So, it's not some bug in the kernel ITLB miss handler or other
Linux kernel code.

Instead it's a Qemu bug, which gets triggered by the page
boundary crossing of:
    41218ffc:   0b 21 04 41     ds r1,r25,r1
    41219000:   0b bd 07 1d     add,c ret1,ret1,ret1

During the ITLB miss, the carry bits and the PSW-V-bit
(from the divide step) are saved in the IPSW register and restored
upon irq return.

During packaging the bits there is a qemu coding bug, where we missed
to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU.
The (copy&pasted) patch below fixes the crash for me.

Helge

diff --git a/target/hppa/helper.c b/target/hppa/helper.c
index b79ddd8184..d4b1a3cd5a 100644
--- a/target/hppa/helper.c
+++ b/target/hppa/helper.c
@@ -53,7 +53,7 @@ target_ulong cpu_hppa_get_psw(CPUHPPAState *env)
      }

      psw |= env->psw_n * PSW_N;
-    psw |= (env->psw_v < 0) * PSW_V;
+    psw |= ((env->psw_v >> 31) & 1) * PSW_V;
      psw |= env->psw | env->psw_xb;

      return psw;



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-09-03  7:54                           ` Helge Deller
@ 2024-09-03 14:13                             ` Guenter Roeck
  2024-09-03 18:43                             ` Linus Torvalds
  1 sibling, 0 replies; 21+ messages in thread
From: Guenter Roeck @ 2024-09-03 14:13 UTC (permalink / raw)
  To: Helge Deller
  Cc: Linus Torvalds, Richard Henderson, Vlastimil Babka, linux-kernel,
	Linux-MM, linux-parisc, Thomas Gleixner

On Tue, Sep 03, 2024 at 09:54:19AM +0200, Helge Deller wrote:
> On 8/8/24 20:19, Linus Torvalds wrote:
> > On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > 
> > > Here is the disassembly from my latest crashing debug kernel which
> > > shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
> > 
> > Looks like I was off by an instruction, it's the 28th divide-step (not
> > 29) that does the page crosser:
> > 
> > >      4121dffc:   0b 21 04 41     ds r1,r25,r1
> > >      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
> > 
> > but my parisc knowledge is not good enough to even guess at what could go wrong.
> > 
> > And I have no actual reason to believe this has *anything* to do with
> > an itlb miss, except for that whole "exact placement seems to matter,
> > and it crosses a page boundary" detail.
> 
> Well, you were on the right track :-)
> 
> Guenters kernel from
> http://server.roeck-us.net/qemu/parisc64-6.10.3/
> boots nicely on my physical C3700 machine, but crashes with Qemu.
> 
> So, it's not some bug in the kernel ITLB miss handler or other
> Linux kernel code.
> 
> Instead it's a Qemu bug, which gets triggered by the page
> boundary crossing of:
>    41218ffc:   0b 21 04 41     ds r1,r25,r1
>    41219000:   0b bd 07 1d     add,c ret1,ret1,ret1
> 
> During the ITLB miss, the carry bits and the PSW-V-bit
> (from the divide step) are saved in the IPSW register and restored
> upon irq return.
> 
> During packaging the bits there is a qemu coding bug, where we missed
> to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU.
> The (copy&pasted) patch below fixes the crash for me.
> 

Yes, that works for me as well. Thanks a lot for the fix!

Guenter

> Helge
> 
> diff --git a/target/hppa/helper.c b/target/hppa/helper.c
> index b79ddd8184..d4b1a3cd5a 100644
> --- a/target/hppa/helper.c
> +++ b/target/hppa/helper.c
> @@ -53,7 +53,7 @@ target_ulong cpu_hppa_get_psw(CPUHPPAState *env)
>      }
> 
>      psw |= env->psw_n * PSW_N;
> -    psw |= (env->psw_v < 0) * PSW_V;
> +    psw |= ((env->psw_v >> 31) & 1) * PSW_V;
>      psw |= env->psw | env->psw_xb;
> 
>      return psw;
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-09-03  7:54                           ` Helge Deller
  2024-09-03 14:13                             ` Guenter Roeck
@ 2024-09-03 18:43                             ` Linus Torvalds
  1 sibling, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2024-09-03 18:43 UTC (permalink / raw)
  To: Helge Deller
  Cc: Richard Henderson, Guenter Roeck, Vlastimil Babka, linux-kernel,
	Linux-MM, linux-parisc, Thomas Gleixner

On Tue, 3 Sept 2024 at 00:54, Helge Deller <deller@gmx.de> wrote:
>
> During packaging the bits there is a qemu coding bug, where we missed
> to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU.

Well, that was a fun bug.

And by "fun" I obviously mean "let's hope to never do this again".

               Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-09-03 18:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240731095022.970699670@linuxfoundation.org>
     [not found] ` <718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net>
2024-08-06  2:40   ` [PATCH 6.10 000/809] 6.10.3-rc3 review Linus Torvalds
2024-08-06 11:02     ` Vlastimil Babka
2024-08-06 17:33       ` Thomas Gleixner
     [not found]         ` <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz>
     [not found]           ` <87frrh44mf.ffs@tglx>
     [not found]             ` <76c643ee-17d6-463b-8ee1-4e30b0133671@roeck-us.net>
     [not found]               ` <87plqjz6aa.ffs@tglx>
2024-08-08 15:53                 ` Linus Torvalds
2024-08-08 16:12                   ` Thomas Gleixner
2024-08-08 16:33                     ` Linus Torvalds
2024-08-08 17:48                       ` Thomas Gleixner
2024-08-08 18:19                         ` Linus Torvalds
2024-08-08 20:52                           ` Guenter Roeck
2024-08-08 21:50                             ` John David Anglin
2024-08-08 22:29                               ` John David Anglin
2024-08-08 23:33                                 ` Linus Torvalds
2024-08-09  0:33                                   ` John David Anglin
2024-08-09  0:56                                 ` Guenter Roeck
2024-08-09  0:50                               ` Guenter Roeck
2024-08-08 22:15                             ` Richard Henderson
2024-09-03  7:54                           ` Helge Deller
2024-09-03 14:13                             ` Guenter Roeck
2024-09-03 18:43                             ` Linus Torvalds
     [not found]                 ` <cffe30ed-43a3-46ac-ad03-afb7633f17e5@roeck-us.net>
2024-08-08 15:58                   ` John David Anglin
     [not found]       ` <f63c6789-b01a-4d76-b7c9-74c04867bc13@roeck-us.net>
     [not found]         ` <CAHk-=wjmumbT73xLkSAnnxDwaFE__Ny=QCp6B_LE2aG1SUqiTg@mail.gmail.com>
2024-08-06 17:49           ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox