linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Konstantin Khlebnikov <koct9i@gmail.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Tim Hartrick <tim@edgecast.com>,
	Daniel Forrest <dan.forrest@ssec.wisc.edu>,
	Hugh Dickins <hughd@google.com>,
	Michel Lespinasse <walken@google.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v3] mm: prevent endless growth of anon_vma hierarchy
Date: Thu, 27 Nov 2014 00:20:14 +0400	[thread overview]
Message-ID: <CALYGNiOy1vbm82Yy93__g+Pszgc_zUaemR+d=06axfaTVywYew@mail.gmail.com> (raw)
In-Reply-To: <54762A73.4080801@redhat.com>

On Wed, Nov 26, 2014 at 10:30 PM, Rik van Riel <riel@redhat.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 11/26/2014 01:11 PM, Konstantin Khlebnikov wrote:
>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h index
>> c0c2bce..b1d140c 100644 --- a/include/linux/rmap.h +++
>> b/include/linux/rmap.h @@ -45,6 +45,22 @@ struct anon_vma { *
>> mm_take_all_locks() (mm_all_locks_mutex). */ struct rb_root
>> rb_root;      /* Interval tree of private "related" vmas */ + +       /* +     *
>> Count of child anon_vmas and VMAs which points to this anon_vma. +
>> * +    * This counter is used for making decision about reusing old
>> anon_vma +     * instead of forking new one. It allows to detect
>> anon_vmas which have +         * just one direct descendant and no vmas.
>> Reusing such anon_vma not +    * leads to significant preformance
>> regression but prevents degradation +  * of anon_vma hierarchy to
>> endless linear chain. +        * +     * Root anon_vma is never reused
>> because it is its own parent and it has +      * at leat one vma or
>> child, thus at fork it's degree is at least 2. +       */ +   unsigned
>> degree; + +   struct anon_vma *parent;        /* Parent of this anon_vma */
>> };
>
> Could this be put earlier in the struct, so the "unsigned degree" can be
> packed into the same long word as the spinlock, on 64 bit systems?
>
> Otherwise there are two 4-byte entities in the struct, each of which get
> padded out to 8 bytes.

As I see, rw-semaphore is used here now.
But here is 4-byte padding after atomic counter.
So we could shrink structure from 8 to 7 64-bit words.

>
>
>> diff --git a/mm/rmap.c b/mm/rmap.c index 19886fb..df5c44e 100644
>> --- a/mm/rmap.c +++ b/mm/rmap.c @@ -72,6 +72,8 @@ static inline
>> struct anon_vma *anon_vma_alloc(void) anon_vma =
>> kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL); if (anon_vma) {
>> atomic_set(&anon_vma->refcount, 1); +         anon_vma->degree = 1;   /*
>> Reference for first vma */ +          anon_vma->parent = anon_vma; /* *
>> Initialise the anon_vma root to point to itself. If called * from
>> fork, the root will be reset to the parents anon_vma. @@ -188,6
>> +190,8 @@ int anon_vma_prepare(struct vm_area_struct *vma) if
>> (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma;
>> anon_vma_chain_link(vma, avc, anon_vma); +                    /* vma link if merged
>> or child link for new root */ +                       anon_vma->degree++; allocated =
>> NULL; avc = NULL; } @@ -256,7 +260,17 @@ int anon_vma_clone(struct
>> vm_area_struct *dst, struct vm_area_struct *src) anon_vma =
>> pavc->anon_vma; root = lock_anon_vma_root(root, anon_vma);
>> anon_vma_chain_link(dst, avc, anon_vma); + +          /* +             * Reuse
>> existing anon_vma if its degree lower than two, +              * that means it
>> has no vma and just one anon_vma child. +              */ +           if
>> (!dst->anon_vma && anon_vma != src->anon_vma && +
>> anon_vma->degree < 2) +                       dst->anon_vma = anon_vma; }
>
> Why can src->anon_vma not be reused if it still not shared with
> any other task?

First child will see parent anon-vma with degree counter == 1.
Child haven't incremented this counter yet.
It could be shared but this isn't necessary for fixing original problem.
So I've decided to minimized influence.

>
> Would it be more readable to use a "reuse_anon_vma" pointer here,
> and assign dst->anon_vma the value of reuse_anon_vma if we choose
> to reuse one?
>
> Assigning different things to dst->anon_vma looks a little confusing.

What different things? It always points to anon_vma, but sometimes it's NULL.

>
> Would it make sense to rename anon_vma->degree to anon_vma->sharing
> or anon_vma->shared, or even anon_vma->users, to indicate that it is
> a counter of how many VMAs are sharing this anon_vma?

It counts two different things: vmas and directly descended anon_vmas.
So I give it abstract name to prevent misunderstanding.
For example, if it's 1 that doesn't means that here is just one user:
that might be one child anon_vma and a lot of grandchilds.

>
>> +     if (dst->anon_vma) +            dst->anon_vma->degree++;
>> unlock_anon_vma_root(root); return 0;
>>
>> @@ -279,6 +293,9 @@ int anon_vma_fork(struct vm_area_struct *vma,
>> struct vm_area_struct *pvma) if (!pvma->anon_vma) return 0;
>>
>> +     /* Drop inherited anon_vma, we'll reuse old one or allocate new.
>> */ +  vma->anon_vma = NULL;
>
> Use of a temporary variable in anon_vma_clone() would avoid this.

anon_vma_clone() is used in some other places where we clone vma
without creating new level: for example in vma_split. This trick allows
to keep interface untouched.

>
>> @@ -286,6 +303,10 @@ int anon_vma_fork(struct vm_area_struct *vma,
>> struct vm_area_struct *pvma) if (anon_vma_clone(vma, pvma)) return
>> -ENOMEM;
>>
>> +     /* An old anon_vma has been reused. */
>
> s/old/existing/  ?

or maybe antecedent

>
>> +     if (vma->anon_vma) +            return 0; + /* Then add our own anon_vma.
>> */ anon_vma = anon_vma_alloc(); if (!anon_vma)
>
> Overall the patch looks good.
>
> - --
> All rights reversed
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQEcBAEBAgAGBQJUdipzAAoJEM553pKExN6Db2kH/20CfKy2ntayKb03tqYnlohu
> OUxtCwqiow8XsfYc2cEBrCznCNPD5B0sdDdEgRWybBnRCikHNQS4vUBhNl/F13gS
> Hu8LM+RElhZwr69cCshUXefIx5xMKimUeAsHutpvy09onZy0DvYutdR958/Nhca/
> 1OjXqtE+LbPd0aG87OQQlagk4DQls0uA2l609qBRKsfMgm2444MRPAN0RGQwlYIv
> SENvzVFN4ZvIdzsU8IoSw2EkhBankYDKbbTxAy+sHCHaxKzq0eKn+JgRaoZLjxU9
> +43snI/fkWNN+S5KLgshUKIVO84kAmRAIfdKUjt/DYYkOj6YPp48aJnOKVFwjIY=
> =Bhp4
> -----END PGP SIGNATURE-----

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-11-26 20:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-26 18:11 Konstantin Khlebnikov
2014-11-26 19:30 ` Rik van Riel
2014-11-26 20:20   ` Konstantin Khlebnikov [this message]
2014-11-26 21:05 ` Daniel Forrest
2014-11-26 22:35   ` Andrew Morton
2014-11-27  9:13   ` Michal Hocko
2014-12-16 10:42 ` Michal Hocko
2014-12-16 23:50   ` Andrew Morton
2014-12-17  9:04     ` Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALYGNiOy1vbm82Yy93__g+Pszgc_zUaemR+d=06axfaTVywYew@mail.gmail.com' \
    --to=koct9i@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.forrest@ssec.wisc.edu \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=tim@edgecast.com \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox