Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

From: Tariq Toukan <tariqt@mellanox.com>
To: Tom Herbert <tom@herbertland.com>, Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Tariq Toukan <tariqt@mellanox.com>,
	Martin KaFai Lau <kafai@fb.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	Willem de Bruijn <willemb@google.com>,
	Brenden Blanco <bblanco@plumgrid.com>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX
Date: Wed, 15 Feb 2017 18:42:14 +0200
Message-ID: <ccc4cb9e-9863-02e1-2789-4869aea3c661@mellanox.com>
In-Reply-To: <CALx6S3530_2DYU-3VRmvRYZ3n05OqJZpJ3x02vXQd6Q7FUJQvw@mail.gmail.com>



On 14/02/2017 7:29 PM, Tom Herbert wrote:
> On Tue, Feb 14, 2017 at 7:51 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2017-02-14 at 16:56 +0200, Tariq Toukan wrote:
>>
>>> As the previous series caused hangs, we must run functional regression
>>> tests over this series as well.
>>> Run has already started, and results will be available tomorrow morning.
>>>
>>> In general, I really like this series. The refactoring looks more
>>> elegant and more correct, functionally.
>>>
>>> However, performance-wise, we fear that the numbers will be drastically
>>> lower with this transition to order-0 pages,
>>> because the page allocator and DMA operations are becoming critical
>>> bottlenecks, especially on systems with costly
>>> DMA operations, such as ARM, iommu=on, etc.
>>>
>> So, again, performance after this patch series is higher,
>> once you have sensible RX queue parameters for the expected workload.
>>
>> Only in pathological cases, you might have some regression.
>>
>> The old scheme was _maybe_ better _when_ memory is not fragmented.
>>
>> When you run hosts for months, memory _is_ fragmented.
>>
>> You never see that on benchmarks, unless you force memory to be
>> fragmented.
>>
>>
>>
>>> We already have this exact issue in mlx5, where we moved to order-0
>>> allocations with a fixed size cache, but that was not enough.
>>> Customers of mlx5 have already complained about the performance
>>> degradation, and currently this is hurting our business.
>>> We get a clear nack from our performance regression team regarding doing
>>> the same in mlx4.
>>> So, the question is, can we live with this degradation until those
>>> bottleneck challenges are addressed?
>> Again, there is no degradation.
>>
>> We have been using order-0 pages for years at Google.
>>
>> Only when we made the mistake of rebasing onto the upstream driver and
>> its order-3 pages did we get horrible regressions, causing production outages.
>>
>> I was silly to believe that mm layer got better.
>>
>>> Following our perf experts' feedback, I cannot simply Ack. We need
>>> to have a clear plan to close the perf gap or reduce the impact.
>> Your perf experts need to talk to me, or any experts at Google and
>> Facebook, really.
>>
> I agree with this 100%! To be blunt, power users like this are testing
> your drivers far beyond what Mellanox is doing and understand how
> performance gains in benchmarks translate to possible gains in real
> production way more than your perf experts can. Listen to Eric!
>
> Tom
>
>
>> Anything _relying_ on order-3 pages being available to impress
>> friends/customers is a lie.

Isn't it the same principle in page_frag_alloc()?
It is called from __netdev_alloc_skb()/__napi_alloc_skb().

Why is it OK to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
With netdev_alloc_skb()/napi_alloc_skb(), the SKB's linear data is a frag
of a high-order page, and that page is not going to be freed before the
other non-linear frags are.
Can't this cause the same problems (memory pinning and so on)?
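
To make the pinning concern concrete, here is a minimal userspace sketch of a
refcounted fragment cache. It is a toy model, not the kernel's
page_frag_alloc(): every toy_* name is hypothetical, and the 4 KiB base page
size is an assumption. It only illustrates that a block carved into fragments
cannot be released until the last fragment is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes: 4 KiB base pages, order-3 block (8 pages = 32 KiB). */
#define TOY_PAGE_SIZE  4096u
#define TOY_MAX_ORDER  3u
#define TOY_BLOCK_SIZE ((size_t)TOY_PAGE_SIZE << TOY_MAX_ORDER)

/* Hypothetical stand-in for one high-order page. */
struct toy_block {
	unsigned int refcount;	/* cache reference + one per live fragment */
	bool freed;		/* set when the last reference is dropped */
};

/* Hypothetical stand-in for a per-CPU fragment cache. */
struct toy_frag_cache {
	struct toy_block *block;
	size_t remaining;	/* bytes not yet handed out */
};

/* Drop one reference; the block is released only when none remain. */
void toy_block_put(struct toy_block *blk)
{
	assert(blk->refcount > 0);
	if (--blk->refcount == 0)
		blk->freed = true;
}

/* Attach a fresh block to the cache; the cache holds one reference. */
void toy_cache_init(struct toy_frag_cache *c, struct toy_block *blk)
{
	blk->refcount = 1;
	blk->freed = false;
	c->block = blk;
	c->remaining = TOY_BLOCK_SIZE;
}

/*
 * Carve a fragment out of the block. Each fragment takes its own
 * reference. Returns NULL when the request does not fit (a real
 * allocator would switch to a new block here).
 */
struct toy_block *toy_frag_alloc(struct toy_frag_cache *c, size_t size)
{
	if (size == 0 || size > c->remaining)
		return NULL;
	c->remaining -= size;
	c->block->refcount++;
	return c->block;
}

/* Free one fragment, e.g. when the SKB that owns it is consumed. */
void toy_frag_free(struct toy_block *blk)
{
	toy_block_put(blk);
}

/* The cache moves on to another block and drops its own reference. */
void toy_cache_drop(struct toy_frag_cache *c)
{
	toy_block_put(c->block);
	c->block = NULL;
	c->remaining = 0;
}
```

In this model, allocating a 256-byte and a 2048-byte fragment, dropping the
cache reference and freeing only the first fragment leaves the block's freed
flag false: the whole 32 KiB stays pinned until the second fragment is freed.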

Currently, mlx4 doesn't use this generic API, while most other drivers do.

Similar claims are true for TX:
https://github.com/torvalds/linux/commit/5640f7685831e088fe6c2e1f863a6805962f8e81

>>

