From: Mateusz Guzik <mjguzik@gmail.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	akpm@linux-foundation.org,  vbabka@suse.cz, lstoakes@gmail.com,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: batch unlink_file_vma calls in free_pgd_range
Date: Wed, 22 May 2024 19:22:25 +0200
Message-ID: <fxrzu3h6qb7mptx4av4e7k55iod6amaob75tisg75eg2x3jmpk@2nkifqbb4yiz>
In-Reply-To: <v4k3u3h5b4xkss3qlltfqnlmobbihzoelqhnmjbhc57jup52wp@csaqg7h45co2>

On Wed, May 22, 2024 at 11:19:45AM -0400, Liam R. Howlett wrote:
> * Mateusz Guzik <mjguzik@gmail.com> [240521 19:43]:
> > Execs of dynamically linked binaries at 20-ish cores are bottlenecked on
> > the i_mmap_rwsem semaphore, while the biggest singular contributor is
> > free_pgd_range inducing the lock acquire back-to-back for all
> > consecutive mappings of a given file.
> > 
> > Tracing the count of said acquires while building the kernel shows:
> > [1, 2)     799579 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [2, 3)          0 |                                                    |
> > [3, 4)       3009 |                                                    |
> > [4, 5)       3009 |                                                    |
> > [5, 6)     326442 |@@@@@@@@@@@@@@@@@@@@@                               |
> > 
> > So in particular there were 326442 opportunities to coalesce 5 acquires
> > into 1.
> > 
> > Doing so increases execs per second by 4% (~50k to ~52k) when running
> > the benchmark linked below.
> > 
> > The lock remains the main bottleneck, I have not looked at other spots
> > yet.
> 
> Thanks.  This change is compact and allows for a performance gain.  It
> looks good to me.
> 
> I guess this would cause a regression on single mappings, probably
> within the noise and probably not a real workload.  Just something to
> keep in mind in case the bots yell about some contrived benchmark.
> 

Trivial tidy-ups can be done should someone be adamant there is a
slowdown that needs to be recouped, starting with inlining the new
routines (apart from unlink_file_vma_batch_process).
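
For context, the caller side boils down to an init/add-per-vma/flush
sequence around the existing vma walk, so inlining init and add saves a
call per vma.  Roughly (the flush helper is not part of the hunk quoted
below, so its name here is a guess):

	struct unlink_vma_file_batch vb;

	unlink_file_vma_batch_init(&vb);
	/* for every vma being torn down: */
	unlink_anon_vmas(vma);
	unlink_file_vma_batch_add(&vb, vma);
	/* after the walk, unlink whatever is still queued up: */
	unlink_file_vma_batch_final(&vb);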

> On that note, kernel/fork.c uses this lock for each cloned vma right
> now.  If you saved the file pointer in your struct, it could be used
> for bulk add as well.  The only complication I see is the insert order
> (each copy goes "just after mpnt"), so maybe a bulk add version of the
> struct would need two lists of vmas.  If the size of the struct is a
> concern, I don't think it needs to be.
> 

Looks like it would need a different spin on batching than the one
implemented above.
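
To sketch the shape of it (purely illustrative, all names made up): the
insert-after anchor would have to travel with each new vma, so the
batch would look more like

	struct link_vma_file_batch {
		struct address_space *mapping;
		int count;
		/* the new vma and the mpnt it gets inserted after, in lockstep */
		struct vm_area_struct *new[8];
		struct vm_area_struct *after[8];
	};

rather than the flat vma array keyed on vm_file used below.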

Maybe I'll get around to this some time early next month.

> > @@ -131,6 +131,47 @@ void unlink_file_vma(struct vm_area_struct *vma)
> >  	}
> >  }
> >  
> > +void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
> > +{
> > +	vb->count = 0;
> > +}
> > +
> > +static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
> > +{
> > +	struct address_space *mapping;
> > +	int i;
> > +
> > +	mapping = vb->vmas[0]->vm_file->f_mapping;
> > +	i_mmap_lock_write(mapping);
> > +	for (i = 0; i < vb->count; i++) {
> > +		VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping != mapping);
> > +		__remove_shared_vm_struct(vb->vmas[i], mapping);
> > +	}
> > +	i_mmap_unlock_write(mapping);
> > +
> > +	unlink_file_vma_batch_init(vb);
> > +}
> > +
> > +void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
> > +			       struct vm_area_struct *vma)
> > +{
> > +	if (vma->vm_file == NULL)
> > +		return;
> > +
> 
> It might be worth a comment about count always being one ahead of the
> last vma in the array.  At first glance I was concerned about an
> off-by-one here (and in the process function).  But maybe it's just me;
> the increment is pretty close to this statement and I had to think
> about the ARRAY_SIZE() check here.
> 

I think that's upbringing on different codebases.

Idiomatic array iteration over n elements being "for (i = 0; i < n; i++)",
the assignment + counter bump pair below looks obviously correct to me.

That is to say some other arrangement would require me to do a double
take. :)
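
That is, the invariant is that count always names the next free slot
(and thus the number of queued vmas), roughly:

	/* add: store into the next free slot, then bump the counter */
	vb->vmas[vb->count] = vma;
	vb->count++;

	/* process: count entries are valid, indices 0 .. count - 1 */
	for (i = 0; i < vb->count; i++)
		__remove_shared_vm_struct(vb->vmas[i], mapping);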

> > +	if ((vb->count > 0 && vb->vmas[0]->vm_file != vma->vm_file) ||
> > +	    vb->count == ARRAY_SIZE(vb->vmas))
> 
> Since you are checking vm_file and only support a single vm_file in this
> version, it might be worth saving it in your unlink_vma_file_batch
> struct.  It could also be used in the processing to reduce the
> dereferencing needed to get at f_mapping.
> 
> I'm not sure if this is worth it with modern cpus, though.  I'm just
> thinking that this step is executed the most so any speedup here will
> help you.
> 

I had it originally but it imo uglified the code.
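
For reference the suggested variant amounts to something like this
(illustrative only, not the exact code I had):

	struct unlink_vma_file_batch {
		int count;
		struct file *file;
		struct vm_area_struct *vmas[8];
	};

with batch_add() stashing vma->vm_file on the first element, the flush
check comparing against vb->file and batch_process() going straight to
vb->file->f_mapping.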

> Feel free to add
> 
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> 

thanks

