* mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: kernel test robot @ 2026-01-13 17:59 UTC
To: David Laight
Cc: oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head: b71e635feefc852405b14620a7fc58c4c80c0f73
commit: d10bb374c41e4c4dced04ae7d2fe2d782a5858a0 lib: mul_u64_u64_div_u64(): optimise the divide code
date: 8 weeks ago
config: mips-randconfig-r113-20260113 (https://download.01.org/0day-ci/archive/20260114/202601140146.hMLODc6v-lkp@intel.com/config)
compiler: mips64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260114/202601140146.hMLODc6v-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601140146.hMLODc6v-lkp@intel.com/
All errors (new ones prefixed by >>):
mips64-linux-ld: lib/math/div64.o: in function `mul_u64_add_u64_div_u64':
div64.c:(.text+0x84): undefined reference to `__multi3'
>> mips64-linux-ld: div64.c:(.text+0x11c): undefined reference to `__multi3'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: David Laight @ 2026-01-13 20:04 UTC
To: kernel test robot
Cc: oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
tsbogend
On Wed, 14 Jan 2026 01:59:24 +0800
kernel test robot <lkp@intel.com> wrote:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> head: b71e635feefc852405b14620a7fc58c4c80c0f73
> commit: d10bb374c41e4c4dced04ae7d2fe2d782a5858a0 lib: mul_u64_u64_div_u64(): optimise the divide code
> date: 8 weeks ago
> config: mips-randconfig-r113-20260113 (https://download.01.org/0day-ci/archive/20260114/202601140146.hMLODc6v-lkp@intel.com/config)
> compiler: mips64-linux-gcc (GCC) 8.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260114/202601140146.hMLODc6v-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202601140146.hMLODc6v-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> mips64-linux-ld: lib/math/div64.o: in function `mul_u64_add_u64_div_u64':
> div64.c:(.text+0x84): undefined reference to `__multi3'
> >> mips64-linux-ld: div64.c:(.text+0x11c): undefined reference to `__multi3'
>
This looks like a bug in the mips 'port'.
arch/mips/lib/multi3.c has the comment:
/*
* GCC 7 & older can suboptimally generate __multi3 calls for mips64r6, so for
* that specific case only we implement that intrinsic here.
*
* See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981
*/
#if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ < 8)
So this code is excluded for gcc 8.5 but the compiler is generating the call.
Looking at the git log for that file there is a comment that includes:
"we wouldn't expect any calls to __multi3 to be generated from
kernel code".
Not true....
Not sure why the link didn't fail before though, something subtle must
have changed.
I think the fix is just to remove the gcc version check.
The code itself just adds the results of four multiply instructions together.
David
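For readers following along, the "four multiplies added together" description can be illustrated with a minimal C sketch of a truncating 128x128-bit multiply. This is not the code from arch/mips/lib/multi3.c: the names `u128_parts`, `mul_lo`, `mul_hi` and `multi3_sketch` are hypothetical, and `unsigned __int128` is used only to model the high-part multiply, which on MIPS64r6 would presumably be a single instruction.

```c
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit value (not a kernel type). */
struct u128_parts {
	uint64_t lo;
	uint64_t hi;
};

/* Low 64 bits of a 64x64-bit product. */
static uint64_t mul_lo(uint64_t a, uint64_t b)
{
	return a * b;
}

/* High 64 bits of an unsigned 64x64-bit product. */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/*
 * Truncating 128x128-bit multiply: four multiplies and two adds.
 * The a.hi * b.hi term only affects bits 128 and up, so it is dropped.
 */
static struct u128_parts multi3_sketch(struct u128_parts a, struct u128_parts b)
{
	struct u128_parts r;

	r.lo = mul_lo(a.lo, b.lo);
	r.hi = mul_hi(a.lo, b.lo) + mul_lo(a.hi, b.lo) + mul_lo(a.lo, b.hi);
	return r;
}
```

When both high words are known to be zero, only the `mul_lo(a.lo, b.lo)` and `mul_hi(a.lo, b.lo)` terms remain, which is the widening 64x64-bit case that triggers the link error here.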
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: David Laight @ 2026-01-13 21:58 UTC
To: kernel test robot
Cc: oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
tsbogend
On Tue, 13 Jan 2026 20:04:55 +0000
David Laight <david.laight.linux@gmail.com> wrote:
Resend fixing Thomas's email
> On Wed, 14 Jan 2026 01:59:24 +0800
> kernel test robot <lkp@intel.com> wrote:
>
> > tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > head: b71e635feefc852405b14620a7fc58c4c80c0f73
> > commit: d10bb374c41e4c4dced04ae7d2fe2d782a5858a0 lib: mul_u64_u64_div_u64(): optimise the divide code
> > date: 8 weeks ago
> > config: mips-randconfig-r113-20260113 (https://download.01.org/0day-ci/archive/20260114/202601140146.hMLODc6v-lkp@intel.com/config)
> > compiler: mips64-linux-gcc (GCC) 8.5.0
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260114/202601140146.hMLODc6v-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202601140146.hMLODc6v-lkp@intel.com/
> >
> > All errors (new ones prefixed by >>):
> >
> > mips64-linux-ld: lib/math/div64.o: in function `mul_u64_add_u64_div_u64':
> > div64.c:(.text+0x84): undefined reference to `__multi3'
> > >> mips64-linux-ld: div64.c:(.text+0x11c): undefined reference to `__multi3'
> >
>
> This looks like a bug in the mips 'port'.
> arch/mips/lib/multi3.c has the comment:
>
> /*
> * GCC 7 & older can suboptimally generate __multi3 calls for mips64r6, so for
> * that specific case only we implement that intrinsic here.
> *
> * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981
> */
> #if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ < 8)
>
> So this code is excluded for gcc 8.5 but the compiler is generating the call.
>
> Looking at the git log for that file there is a comment that includes:
> "we wouldn't expect any calls to __multi3 to be generated from
> kernel code".
> Not true....
> Not sure why the link didn't fail before though, something subtle must
> have changed.
>
> I think the fix is just to remove the gcc version check.
> The code itself just adds the results of four multiply instructions together.
>
> David
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: Maciej W. Rozycki @ 2026-01-14 6:19 UTC
To: David Laight
Cc: kernel test robot, oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
Thomas Bogendoerfer
On Tue, 13 Jan 2026, David Laight wrote:
> > All errors (new ones prefixed by >>):
> >
> > mips64-linux-ld: lib/math/div64.o: in function `mul_u64_add_u64_div_u64':
> > div64.c:(.text+0x84): undefined reference to `__multi3'
> > >> mips64-linux-ld: div64.c:(.text+0x11c): undefined reference to `__multi3'
> >
>
> This looks like a bug in the mips 'port'.
> arch/mips/lib/multi3.c has the comment:
>
> /*
> * GCC 7 & older can suboptimally generate __multi3 calls for mips64r6, so for
> * that specific case only we implement that intrinsic here.
> *
> * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981
> */
> #if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ < 8)
>
> So this code is excluded for gcc 8.5 but the compiler is generating the call.
>
> Looking at the git log for that file there is a comment that includes:
> "we wouldn't expect any calls to __multi3 to be generated from
> kernel code".
> Not true....
> Not sure why the link didn't fail before though, something subtle must
> have changed.
>
> I think the fix is just to remove the gcc version check.
Or rather fix the version check. The GCC fix went in with GCC 10:
$ git log -1 --pretty=oneline 48b2123f6336
48b2123f6336ba6c06846d7c8b60bd14eaeae7ec re PR target/82981 (unnecessary __multi3 call for mips64r6 linux kernel)
$ git show 48b2123f6336:gcc/BASE-VER
10.0.0
$
I don't know why the PR got it all wrong; I've fixed it now.
Maciej
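A sketch of what fixing the version check might look like, assuming the existing guard is kept and only the GCC bound changes. This is not a posted patch, and `MULTI3_FALLBACK_NEEDED` is a hypothetical macro used here only so that the fragment compiles on its own.

```c
/*
 * Sketch only: same condition as the guard quoted from
 * arch/mips/lib/multi3.c, with the GCC bound raised from 8 to 10
 * to match the release that carries the PR target/82981 fix.
 */
#if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ < 10)
#define MULTI3_FALLBACK_NEEDED 1	/* hypothetical: build the local __multi3 */
#else
#define MULTI3_FALLBACK_NEEDED 0	/* hypothetical: rely on the compiler */
#endif
```

With this bound the GCC 8.5.0 build from the report would take the first branch, provided the configuration selects a 64-bit MIPSr6 CPU.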
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: David Laight @ 2026-01-14 10:31 UTC
To: Maciej W. Rozycki
Cc: kernel test robot, oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
Thomas Bogendoerfer
On Wed, 14 Jan 2026 06:19:02 +0000 (GMT)
"Maciej W. Rozycki" <macro@orcam.me.uk> wrote:
> On Tue, 13 Jan 2026, David Laight wrote:
>
> > > All errors (new ones prefixed by >>):
> > >
> > > mips64-linux-ld: lib/math/div64.o: in function `mul_u64_add_u64_div_u64':
> > > div64.c:(.text+0x84): undefined reference to `__multi3'
> > > >> mips64-linux-ld: div64.c:(.text+0x11c): undefined reference to `__multi3'
> > >
> >
> > This looks like a bug in the mips 'port'.
> > arch/mips/lib/multi3.c has the comment:
> >
> > /*
> > * GCC 7 & older can suboptimally generate __multi3 calls for mips64r6, so for
> > * that specific case only we implement that intrinsic here.
> > *
> > * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981
> > */
> > #if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ < 8)
> >
> > So this code is excluded for gcc 8.5 but the compiler is generating the call.
> >
> > Looking at the git log for that file there is a comment that includes:
> > "we wouldn't expect any calls to __multi3 to be generated from
> > kernel code".
> > Not true....
> > Not sure why the link didn't fail before though, something subtle must
> > have changed.
> >
> > I think the fix is just to remove the gcc version check.
>
> Or rather fix the version check. The GCC fix went in with GCC 10:
Does that mean that GCC 10 generates the multiply instructions and never calls
__multi3?
(Rather than just not using __multi3() for that specific example.)
In this case gcc knows the high bits are all zero - so just needs the two
instructions to generate the high and low parts.
David
>
> $ git log -1 --pretty=oneline 48b2123f6336
> 48b2123f6336ba6c06846d7c8b60bd14eaeae7ec re PR target/82981 (unnecessary __multi3 call for mips64r6 linux kernel)
> $ git show 48b2123f6336:gcc/BASE-VER
> 10.0.0
> $
>
> I don't know why the PR got it all wrong; I've fixed it now.
>
> Maciej
>
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: Maciej W. Rozycki @ 2026-01-14 15:50 UTC
To: David Laight
Cc: kernel test robot, oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
Thomas Bogendoerfer
On Wed, 14 Jan 2026, David Laight wrote:
> > > Looking at the git log for that file there is a comment that includes:
> > > "we wouldn't expect any calls to __multi3 to be generated from
> > > kernel code".
> > > Not true....
> > > Not sure why the link didn't fail before though, something subtle must
> > > have changed.
> > >
> > > I think the fix is just to remove the gcc version check.
> >
> > Or rather fix the version check. The GCC fix went in with GCC 10:
>
> Does that mean that GCC 10 generates the multiply instructions and never calls
> __multi3?
> (Rather than just not using __multi3() for that specific example.)
Of course it still does call `__multi3' for 128x128bit multiplication.
It doesn't for widening 64x64bit one though, which was a missed case for
MIPS64r6 only, having been supported by GCC ever since MIPS III ISA. I
think we do want to fail link in the 128x128bit case.
> In this case gcc knows the high bits are all zero - so just needs the two
> instructions to generate the high and low parts.
Distinct RTL insns are produced, so all the usual RTL optimisations
apply (in addition to any tree optimisations already made):
mul_u64_u64_add_u64:
	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
	.mask	0x00000000,0
	.fmask	0x00000000,0
	.set	noreorder
	.set	nomacro
	dmul	$2,$5,$6	# 9	[c=20 l=4]  muldi3_mul3_nohilo
	dmuhu	$5,$5,$6	# 10	[c=44 l=4]  umuldi3_highpart_r6
	daddu	$7,$2,$7	# 14	[c=4 l=4]  *adddi3/1
	sltu	$2,$7,$2	# 16	[c=4 l=4]  *sltu_didi
	sd	$7,0($4)	# 21	[c=4 l=4]  *movdi_64bit/4
	jr	$31		# 44	[c=0 l=4]  *simple_return
	daddu	$2,$2,$5	# 29	[c=4 l=4]  *adddi3/1
(hmm, I wonder why the cost for the high-part RTX is over twice that for
the low-part one; this seems outright wrong, also taking the possibility
of fusing into account).
Maciej
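For reference, a C function of roughly the following shape would be expected to produce a listing like the one above: one widening 64x64-bit multiply, a 64-bit add, and the carry folded into the high word. This is a sketch inferred from the register usage in the listing, not a quote of the kernel source; the `_sketch` suffix marks the name as hypothetical.

```c
#include <stdint.h>

/*
 * Computes a * b + c as a 128-bit value, stores the low 64 bits
 * through p_lo and returns the high 64 bits.
 */
static uint64_t mul_u64_u64_add_u64_sketch(uint64_t *p_lo, uint64_t a,
					   uint64_t b, uint64_t c)
{
	/* Widening multiply: both operands are zero-extended 64-bit values. */
	unsigned __int128 prod = (unsigned __int128)a * b + c;

	*p_lo = (uint64_t)prod;
	return (uint64_t)(prod >> 64);
}
```

The sum cannot overflow 128 bits, since the largest possible product plus the largest possible addend is 2^128 - 2^64.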
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: David Laight @ 2026-01-14 17:34 UTC
To: Maciej W. Rozycki
Cc: kernel test robot, oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
Thomas Bogendoerfer
On Wed, 14 Jan 2026 15:50:09 +0000 (GMT)
"Maciej W. Rozycki" <macro@orcam.me.uk> wrote:
> On Wed, 14 Jan 2026, David Laight wrote:
>
> > > > Looking at the git log for that file there is a comment that includes:
> > > > "we wouldn't expect any calls to __multi3 to be generated from
> > > > kernel code".
> > > > Not true....
> > > > Not sure why the link didn't fail before though, something subtle must
> > > > have changed.
> > > >
> > > > I think the fix is just to remove the gcc version check.
> > >
> > > Or rather fix the version check. The GCC fix went in with GCC 10:
> >
> > Does that mean that GCC 10 generates the multiply instructions and never calls
> > __multi3?
> > (Rather than just not using __multi3() for that specific example.)
>
> Of course it still does call `__multi3' for 128x128bit multiplication.
> It doesn't for widening 64x64bit one though, which was a missed case for
> MIPS64r6 only, having been supported by GCC ever since MIPS III ISA. I
> think we do want to fail link in the 128x128bit case.
That's fine by me.
I only get blamed for the widening one :-)
...
> Distinct RTL insns are produced, so all the usual RTL optimisations
> apply (in addition to any tree optimisations already made):
>
> mul_u64_u64_add_u64:
> .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
> .mask 0x00000000,0
> .fmask 0x00000000,0
> .set noreorder
> .set nomacro
> dmul $2,$5,$6 # 9 [c=20 l=4] muldi3_mul3_nohilo
> dmuhu $5,$5,$6 # 10 [c=44 l=4] umuldi3_highpart_r6
> daddu $7,$2,$7 # 14 [c=4 l=4] *adddi3/1
> sltu $2,$7,$2 # 16 [c=4 l=4] *sltu_didi
> sd $7,0($4) # 21 [c=4 l=4] *movdi_64bit/4
> jr $31 # 44 [c=0 l=4] *simple_return
> daddu $2,$2,$5 # 29 [c=4 l=4] *adddi3/1
>
> (hmm, I wonder why the cost for the high-part RTX is over twice that for
> the low-part one; this seems outright wrong, also taking the possibility
> of fusing into account).
They might be different, if the wide multiply is implemented with multiple
narrow ones then the high result bits don't need to be generated if only
the low result bits are needed.
If the data is clocked through a single multiplier that might be significant,
but probably not double (I think you need 3/4 of the products).
If the results of separate narrow multipliers have to be added together
then the carry-ripple of all the adds might make a small difference, but I'd
only expect 1 (perhaps 2) clocks for that.
If those are gcc's costs I suspect they may not match reality, after all they
usually only have to be 'good enough' or 'reasonable'.
I nearly got the 32x32 multiply to run in a single clock on my Nios-II
re-implementation, but the 64bit ripple carry delayed things too much for
the other logic that needed to happen to feed the product back into the ALU.
The product itself could be latched - so it wasn't far off.
I think I could have fed back the low bits, but that would have been
complicated. Detecting 'short' (18bit by 18bit) unsigned multiplies
would have been more use - they are common for array indexes.
(That was a 'fun' project... Nios-II is, AFAICT, basically MIPS 32.)
David
>
> Maciej
>
* Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
From: Maciej W. Rozycki @ 2026-01-14 20:59 UTC
To: David Laight
Cc: kernel test robot, oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Nicolas Pitre, linux-mips,
Thomas Bogendoerfer
On Wed, 14 Jan 2026, David Laight wrote:
> > dmul $2,$5,$6 # 9 [c=20 l=4] muldi3_mul3_nohilo
> > dmuhu $5,$5,$6 # 10 [c=44 l=4] umuldi3_highpart_r6
> > daddu $7,$2,$7 # 14 [c=4 l=4] *adddi3/1
> > sltu $2,$7,$2 # 16 [c=4 l=4] *sltu_didi
> > sd $7,0($4) # 21 [c=4 l=4] *movdi_64bit/4
> > jr $31 # 44 [c=0 l=4] *simple_return
> > daddu $2,$2,$5 # 29 [c=4 l=4] *adddi3/1
> >
> > (hmm, I wonder why the cost for the high-part RTX is over twice that for
> > the low-part one; this seems outright wrong, also taking the possibility
> > of fusing into account).
>
> They might be different, if the wide multiply is implemented with multiple
> narrow ones then the high result bits don't need to be generated if only
> the low result bits are needed.
Well, it's GCC that has DImode multiplication in `muldi3_mul3_nohilo' RTX
but then TImode one combined with a shift and a truncation operation in
`umuldi3_highpart_r6' RTX, and then applies some generic cost figures to
the respective complete expression. Instead the MIPS backend ought to
provide the correct cost in both cases.
Given the technology involved with MIPS MDUs I'd expect the same latency
for both operations (DMULT/U used to produce both parts in one operation,
but required a dedicated MDU accumulator register, which complicated both
the pipeline and instruction scheduling in the compiler), and indeed e.g.
the figures for the MIPS I6500 CPU give the latency of 4 for both DMUL/U
and DMUH/U each. That would be 16 in terms of GCC insn costs, as that's
cycles multiplied by 4 so as to allow "fractional" costs in special cases,
and while using 20 instead is not too bad, the value of 44 is way off as
it's almost triple the actual cost.
Incidentally, the repeat rate is 1 for all these instructions, so the
multiplier is fully pipelined in the I6500 implementation. No fusion is
mentioned though.
> If those are gcc's costs I suspect they may not match reality, after all they
> usually only have to be 'good enough' or 'reasonable'.
Well, they need to be good enough for the compiler not to come up with a
worse alternative, such as e.g. with repeated addition when one of the
operands is immediate.
Maciej
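The cost arithmetic above can be written out as a small sketch. `COSTS_N_INSNS` mirrors the GCC macro of the same name, which scales a count by 4; the latency of 4 is the I6500 figure quoted above, and 20 and 44 are the `c=` values from the listing. The enum and helper names are hypothetical.

```c
/* Same scaling as GCC's COSTS_N_INSNS(N): one unit of latency is 4 cost units. */
#define COSTS_N_INSNS(n) ((n) * 4)

enum {
	I6500_MUL_LATENCY  = 4,	/* DMUL/U and DMUH/U, per the figures above */
	EXPECTED_COST      = COSTS_N_INSNS(I6500_MUL_LATENCY),
	GCC_LOW_PART_COST  = 20,	/* c=20 for muldi3_mul3_nohilo */
	GCC_HIGH_PART_COST = 44,	/* c=44 for umuldi3_highpart_r6 */
};

/* Ratio of a reported cost to the expected one, in percent (truncated). */
static int cost_ratio_percent(int reported)
{
	return reported * 100 / EXPECTED_COST;
}
```

Under these assumptions the low-part cost comes out at 125% of the expected value and the high-part cost at 275%, which is the "almost triple" figure.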