Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'

linux-mm.kvack.org archive mirror

From: David Laight <david.laight.linux@gmail.com>
To: "Maciej W. Rozycki" <macro@orcam.me.uk>
Cc: kernel test robot <lkp@intel.com>,
	oe-kbuild-all@lists.linux.dev, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Nicolas Pitre <npitre@baylibre.com>,
	linux-mips@vger.kernel.org,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
Date: Wed, 14 Jan 2026 17:34:35 +0000
Message-ID: <20260114173435.51cf556d@pumpkin>
In-Reply-To: <alpine.DEB.2.21.2601141530510.6421@angie.orcam.me.uk>

On Wed, 14 Jan 2026 15:50:09 +0000 (GMT)
"Maciej W. Rozycki" <macro@orcam.me.uk> wrote:

> On Wed, 14 Jan 2026, David Laight wrote:
> 
> > > > Looking at the git log for that file there is a comment that includes:
> > > > 	"we wouldn't expect any calls to __multi3 to be generated from
> > > > 	 kernel code".
> > > > Not true....
> > > > Not sure why the link didn't fail before though, something subtle must
> > > > have changed.
> > > > 
> > > > I think the fix is just to remove the gcc version check.    
> > > 
> > >  Or rather fix the version check.  The GCC fix went in with GCC 10:  
> > 
> > Does that mean the GCC 10 generates the multiply instructions and never calls
> > __multi3?
> > (Rather than just not using __multi3() for that specific example.)  
> 
>  Of course it still does call `__multi3' for 128x128bit multiplication.  
> It doesn't for widening 64x64bit one though, which was a missed case for 
> MIPS64r6 only, having been supported by GCC ever since MIPS III ISA.  I 
> think we do want to fail link in the 128x128bit case.

That's fine by me.
I only get blamed for the widening one :-)
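
For anyone following along, the two cases being distinguished can be
sketched in C. This is an illustration only: the function names are made up
here, `unsigned __int128` is a GCC/Clang extension on 64-bit targets, and
whether a given build emits inline multiply instructions or a call to
`__multi3` for either form depends on the compiler version and target, as
discussed above.

```c
#include <stdint.h>

typedef unsigned __int128 u128;	/* GCC/Clang extension, 64-bit targets */

/*
 * Widening multiply: both operands are 64 bits wide, only the result is
 * 128 bits.  This is the case that (per the thread) a fixed compiler
 * should expand inline rather than via __multi3.
 */
static uint64_t mul_64x64_lo(uint64_t a, uint64_t b)
{
	return (uint64_t)((u128)a * b);
}

static uint64_t mul_64x64_hi(uint64_t a, uint64_t b)
{
	return (uint64_t)(((u128)a * b) >> 64);
}

/*
 * Full multiply: both operands are genuinely 128 bits wide.  This is the
 * case that (per the thread) is still expected to call __multi3.
 */
static u128 mul_128x128(u128 a, u128 b)
{
	return a * b;
}
```

The only difference at source level is whether the operands were 64-bit
values before being widened, which is what the compiler has to notice.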

...
>  Distinct RTL insns are produced, so all the usual RTL optimisations 
> apply (in addition to any tree optimisations already made):
> 
> mul_u64_u64_add_u64:
> 	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
> 	.mask	0x00000000,0
> 	.fmask	0x00000000,0
> 	.set	noreorder
> 	.set	nomacro
> 	dmul	$2,$5,$6	 # 9	[c=20 l=4]  muldi3_mul3_nohilo
> 	dmuhu	$5,$5,$6	 # 10	[c=44 l=4]  umuldi3_highpart_r6
> 	daddu	$7,$2,$7	 # 14	[c=4 l=4]  *adddi3/1
> 	sltu	$2,$7,$2	 # 16	[c=4 l=4]  *sltu_didi
> 	sd	$7,0($4)	 # 21	[c=4 l=4]  *movdi_64bit/4
> 	jr	$31	 # 44	[c=0 l=4]  *simple_return
> 	daddu	$2,$2,$5	 # 29	[c=4 l=4]  *adddi3/1
> 
> (hmm, I wonder why the cost for the high-part RTX is over twice that for 
> the low-part one; this seems outright wrong, also taking the possibility 
> of fusing into account).

They might be different: if the wide multiply is implemented with multiple
narrow ones, then the high result bits don't need to be generated when only
the low result bits are needed.
If the data is clocked through a single multiplier that might be significant,
but probably not double (I think you need 3/4 of the products).
If the results of separate narrow multipliers have to be added together
then the carry-ripple of all the adds might make a small difference, but I'd
only expect 1 (perhaps 2) clocks for that.
If those are gcc's costs I suspect they may not match reality; after all,
they usually only have to be 'good enough' or 'reasonable'.
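
To make that concrete, here is a rough C sketch under stated assumptions.
The first function is one reading of the quoted sequence, with the prototype
inferred from register usage ($4..$7 as arguments, $2 as the return value)
rather than taken from the source. The other two build a 64x64 multiply from
32x32->64 products purely to show which products each half needs; the 32-bit
split and all the names are illustrative, not a description of any real
multiplier.

```c
#include <stdint.h>

/* Hypothetical prototype: store the low 64 bits of a*b+c, return the high. */
static uint64_t mul_add_sketch(uint64_t *lo, uint64_t a, uint64_t b, uint64_t c)
{
	unsigned __int128 p = (unsigned __int128)a * b;	/* dmul + dmuhu   */
	uint64_t sum = (uint64_t)p + c;			/* daddu $7,$2,$7 */
	uint64_t carry = sum < (uint64_t)p;		/* sltu  $2,$7,$2 */

	*lo = sum;					/* sd    $7,0($4) */
	return (uint64_t)(p >> 64) + carry;		/* daddu $2,$2,$5 */
}

/* Low 64 bits: three of the four narrow products are enough. */
static uint64_t mul_lo_from_32(uint64_t a, uint64_t b)
{
	uint64_t al = (uint32_t)a, ah = a >> 32;
	uint64_t bl = (uint32_t)b, bh = b >> 32;

	/* ah*bh is scaled by 2^64, so it cannot affect the low half. */
	return al * bl + ((al * bh + ah * bl) << 32);
}

/* High 64 bits: all four products, plus the carries out of the middle. */
static uint64_t mul_hi_from_32(uint64_t a, uint64_t b)
{
	uint64_t al = (uint32_t)a, ah = a >> 32;
	uint64_t bl = (uint32_t)b, bh = b >> 32;
	uint64_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
	/* Everything landing in bits 32..63 of the product; cannot overflow. */
	uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

	return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

The low half skips ah*bh entirely, which is where the 3/4 figure comes from;
the extra adds in the high half are the carry chain mentioned above.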

I nearly got the 32x32 multiply to run in a single clock on my Nios-II
re-implementation, but the 64bit ripple carry delayed things too much for
the other logic that needed to happen to feed the product back into the ALU.
The product itself could be latched - so it wasn't far off.
I think I could have fed back the low bits, but that would have been
complicated. Detecting 'short' (18bit by 18bit) unsigned multiplies
would have been more use - they are common for array indexes.
(That was a 'fun' project... Nios-II is, AFAICT, basically MIPS 32.)

	David

> 
>   Maciej
>
