History log of /linux/include/uapi/linux/if_xdp.h (Results 1 – 25 of 106)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7
# 06d07429 29-Feb-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to get the drm_printer changes to drm-intel-next.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


Revision tags: v6.8-rc6, v6.8-rc5
# 41c177cf 11-Feb-2024 Rob Clark <robdclark@chromium.org>

Merge tag 'drm-misc-next-2024-02-08' into msm-next

Merge the drm-misc tree to uprev MSM CI.

Signed-off-by: Rob Clark <robdclark@chromium.org>


Revision tags: v6.8-rc4, v6.8-rc3
# 4db102dc 29-Jan-2024 Maxime Ripard <mripard@kernel.org>

Merge drm/drm-next into drm-misc-next

Kickstart 6.9 development cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


Revision tags: v6.8-rc2
# be3382ec 23-Jan-2024 Lucas De Marchi <lucas.demarchi@intel.com>

Merge drm/drm-next into drm-xe-next

Sync to v6.8-rc1.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>


# 03c11eb3 14-Feb-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch

Conflicts:
arch/x86/include/asm/percpu.h
arch/x86/include/asm/text-patching.h

Signed-off-by: Ingo Molnar <mingo@k

Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch

Conflicts:
arch/x86/include/asm/percpu.h
arch/x86/include/asm/text-patching.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 42ac0be1 26-Jan-2024 Ingo Molnar <mingo@kernel.org>

Merge branch 'linus' into x86/mm, to refresh the branch and pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


Revision tags: v6.8-rc1
# fe33c0fb 17-Jan-2024 Andrew Morton <akpm@linux-foundation.org>

Merge branch 'master' into mm-hotfixes-stable


# cf79f291 22-Jan-2024 Maxime Ripard <mripard@kernel.org>

Merge v6.8-rc1 into drm-misc-fixes

Let's kickstart the 6.8 fix cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6
# a1c613ae 24-Oct-2023 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Merge drm/drm-next into drm-intel-gt-next

Work that needs to land in drm-intel-gt-next depends on two patches only
present in drm-intel-next, absence of which is causing a merge conflict:

3b918f4

Merge drm/drm-next into drm-intel-gt-next

Work that needs to land in drm-intel-gt-next depends on two patches only
present in drm-intel-next, absence of which is causing a merge conflict:

3b918f4f0c8b ("drm/i915/pxp: Optimize GET_PARAM:PXP_STATUS")
ac765b7018f6 ("drm/i915/pxp/mtl: intel_pxp_init_hw needs runtime-pm inside pm-complete")

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

show more ...


# 3e7aeb78 11-Jan-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"The most interesting thing is probably the networking structs

Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"The most interesting thing is probably the networking structs
reorganization and a significant amount of changes is around
self-tests.

Core & protocols:

- Analyze and reorganize core networking structs (socks, netdev,
netns, mibs) to optimize cacheline consumption and set up build
time warnings to safeguard against future header changes

This improves TCP performances with many concurrent connections up
to 40%

- Add page-pool netlink-based introspection, exposing the memory
usage and recycling stats. This helps indentify bad PP users and
possible leaks

- Refine TCP/DCCP source port selection to no longer favor even
source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
lowers the time taken by connect() for hosts having many active
connections to the same destination

- Refactor the TCP bind conflict code, shrinking related socket
structs

- Refactor TCP SYN-Cookie handling, as a preparation step to allow
arbitrary SYN-Cookie processing via eBPF

- Tune optmem_max for 0-copy usage, increasing the default value to
128KB and namespecifying it

- Allow coalescing for cloned skbs coming from page pools, improving
RX performances with some common configurations

- Reduce extension header parsing overhead at GRO time

- Add bridge MDB bulk deletion support, allowing user-space to
request the deletion of matching entries

- Reorder nftables struct members, to keep data accessed by the
datapath first

- Introduce TC block ports tracking and use. This allows supporting
multicast-like behavior at the TC layer

- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
classifiers (RSVP and tcindex)

- More data-race annotations

- Extend the diag interface to dump TCP bound-only sockets

- Conditional notification of events for TC qdisc class and actions

- Support for WPAN dynamic associations with nearby devices, to form
a sub-network using a specific PAN ID

- Implement SMCv2.1 virtual ISM device support

- Add support for Batman-avd mulicast packet type

BPF:

- Tons of verifier improvements:
- BPF register bounds logic and range support along with a large
test suite
- log improvements
- complete precision tracking support for register spills
- track aligned STACK_ZERO cases as imprecise spilled registers.
This improves the verifier "instructions processed" metric from
single digit to 50-60% for some programs
- support for user's global BPF subprogram arguments with few
commonly requested annotations for a better developer
experience
- support tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
like
- several fixes

- Add initial TX metadata implementation for AF_XDP with support in
mlx5 and stmmac drivers. Two types of offloads are supported right
now, that is, TX timestamp and TX checksum offload

- Fix kCFI bugs in BPF all forms of indirect calls from BPF into
kernel and from kernel into BPF work with CFI enabled. This allows
BPF to work with CONFIG_FINEIBT=y

- Change BPF verifier logic to validate global subprograms lazily
instead of unconditionally before the main program, so they can be
guarded using BPF CO-RE techniques

- Support uid/gid options when mounting bpffs

- Add a new kfunc which acquires the associated cgroup of a task
within a specific cgroup v1 hierarchy where the latter is
identified by its id

- Extend verifier to allow bpf_refcount_acquire() of a map value
field obtained via direct load which is a use-case needed in
sched_ext

- Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter

- Support for VLAN tag in XDP hints

- Remove deprecated bpfilter kernel leftovers given the project is
developed in user-space (https://github.com/facebook/bpfilter)

Misc:

- Support for parellel TC self-tests execution

- Increase MPTCP self-tests coverage

- Updated the bridge documentation, including several so-far
undocumented features

- Convert all the net self-tests to run in unique netns, to avoid
random failures due to conflict and allow concurrent runs

- Add TCP-AO self-tests

- Add kunit tests for both cfg80211 and mac80211

- Autogenerate Netlink families documentation from YAML spec

- Add yml-gen support for fixed headers and recursive nests, the tool
can now generate user-space code for all genetlink families for
which we have specs

- A bunch of additional module descriptions fixes

- Catch incorrect freeing of pages belonging to a page pool

Driver API:

- Rust abstractions for network PHY drivers; do not cover yet the
full C API, but already allow implementing functional PHY drivers
in rust

- Introduce queue and NAPI support in the netdev Netlink interface,
allowing complete access to the device <> NAPIs <> queues
relationship

- Introduce notifications filtering for devlink to allow control
application scale to thousands of instances

- Improve PHY validation, requesting rate matching information for
each ethtool link mode supported by both the PHY and host

- Add support for ethtool symmetric-xor RSS hash

- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
platform

- Expose pin fractional frequency offset value over new DPLL generic
netlink attribute

- Convert older drivers to platform remove callback returning void

- Add support for PHY package MMD read/write

New hardware / drivers:

- Ethernet:
- Octeon CN10K devices
- Broadcom 5760X P7
- Qualcomm SM8550 SoC
- Texas Instrument DP83TG720S PHY

- Bluetooth:
- IMC Networks Bluetooth radio

Removed:

- WiFi:
- libertas 16-bit PCMCIA support
- Atmel at76c50x drivers
- HostAP ISA/PCMCIA style 802.11b driver
- zd1201 802.11b USB dongles
- Orinoco ISA/PCMCIA 802.11b driver
- Aviator/Raytheon driver
- Planet WL3501 driver
- RNDIS USB 802.11b driver

Driver updates:

- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- allow one by one port representors creation and removal
- add temperature and clock information reporting
- add get/set for ethtool's header split ringparam
- add again FW logging
- adds support switchdev hardware packet mirroring
- iavf: implement symmetric-xor RSS hash
- igc: add support for concurrent physical and free-running
timers
- i40e: increase the allowable descriptors
- nVidia/Mellanox:
- Preparation for Socket-Direct multi-dev netdev. That will
allow in future releases combining multiple PFs devices
attached to different NUMA nodes under the same netdev
- Broadcom (bnxt):
- TX completion handling improvements
- add basic ntuple filter support
- reduce MSIX vectors usage for MQPRIO offload
- add VXLAN support, USO offload and TX coalesce completion
for P7
- Marvell Octeon EP:
- xmit-more support
- add PF-VF mailbox support and use it for FW notifications
for VFs
- Wangxun (ngbe/txgbe):
- implement ethtool functions to operate pause param, ring
param, coalesce channel number and msglevel
- Netronome/Corigine (nfp):
- add flow-steering support
- support UDP segmentation offload

- Ethernet NICs embedded, slower, virtual:
- Xilinx AXI: remove duplicate DMA code adopting the dma engine
driver
- stmmac: add support for HW-accelerated VLAN stripping
- TI AM654x sw: add mqprio, frame preemption & coalescing
- gve: add support for non-4k page sizes.
- virtio-net: support dynamic coalescing moderation

- nVidia/Mellanox Ethernet datacenter switches:
- allow firmware upgrade without a reboot
- more flexible support for bridge flooding via the compressed
FID flooding mode

- Ethernet embedded switches:
- Microchip:
- fine-tune flow control and speed configurations in KSZ8xxx
- KSZ88X3: enable setting rmii reference
- Renesas:
- add jumbo frames support
- Marvell:
- 88E6xxx: add "eth-mac" and "rmon" stats support

- Ethernet PHYs:
- aquantia: add firmware load support
- at803x: refactor the driver to simplify adding support for more
chip variants
- NXP C45 TJA11xx: Add MACsec offload support

- Wifi:
- MediaTek (mt76):
- NVMEM EEPROM improvements
- mt7996 Extremely High Throughput (EHT) improvements
- mt7996 Wireless Ethernet Dispatcher (WED) support
- mt7996 36-bit DMA support
- Qualcomm (ath12k):
- support for a single MSI vector
- WCN7850: support AP mode
- Intel (iwlwifi):
- new debugfs file fw_dbg_clear
- allow concurrent P2P operation on DFS channels

- Bluetooth:
- QCA2066: support HFP offload
- ISO: more broadcast-related improvements
- NXP: better recovery in case receiver/transmitter get out of sync"

* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
lan78xx: remove redundant statement in lan78xx_get_eee
lan743x: remove redundant statement in lan743x_ethtool_get_eee
bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
tcp: Revert no longer abort SYN_SENT when receiving some ICMP
Revert "mlx5 updates 2023-12-20"
Revert "net: stmmac: Enable Per DMA Channel interrupt"
ipvlan: Remove usage of the deprecated ida_simple_xx() API
ipvlan: Fix a typo in a comment
net/sched: Remove ipt action tests
net: stmmac: Use interrupt mode INTM=1 for per channel irq
net: stmmac: Add support for TX/RX channel interrupt
net: stmmac: Make MSI interrupt routine generic
dt-bindings: net: snps,dwmac: per channel irq
net: phy: at803x: make read_status more generic
net: phy: at803x: add support for cdt cross short test for qca808x
net: phy: at803x: refactor qca808x cable test get status function
net: phy: at803x: generalize cdt fault length function
net: ethernet: cortina: Drop TSO support
...

show more ...


# 753c8608 01-Dec-2023 Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).

The main changes are:

1) Add initial TX metadata implementation for AF_XDP with support in mlx5
and stmmac drivers. Two types of offloads are supported right now, that
is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
stmmac implementation from Song Yoong Siang.

2) Change BPF verifier logic to validate global subprograms lazily instead
of unconditionally before the main program, so they can be guarded using
BPF CO-RE techniques, from Andrii Nakryiko.

3) Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter, from Jiri Olsa.

4) Use pkg-config in BPF selftests to determine ld flags which is
in particular needed for linking statically, from Akihiko Odaki.

5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
bpf/tests: Remove duplicate JSGT tests
selftests/bpf: Add TX side to xdp_hw_metadata
selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
selftests/bpf: Add TX side to xdp_metadata
selftests/bpf: Add csum helpers
selftests/xsk: Support tx_metadata_len
xsk: Add option to calculate TX checksum in SW
xsk: Validate xsk_tx_metadata flags
xsk: Document tx_metadata_len layout
net: stmmac: Add Tx HWTS support to XDP ZC
net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
tools: ynl: Print xsk-features from the sample
xsk: Add TX timestamp and TX checksum offload support
xsk: Support tx_metadata_len
selftests/bpf: Use pkg-config for libelf
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Choose pkg-config for the target
bpftool: Add support to display uprobe_multi links
selftests/bpf: Add link_info test for uprobe_multi link
selftests/bpf: Use bpf_link__destroy in fill_link_info tests
...
====================

Conflicts:

Documentation/netlink/specs/netdev.yaml:
839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools")
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/

While at it also regen, tree is dirty after:
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.

Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# b5145153 29-Nov-2023 Alexei Starovoitov <ast@kernel.org>

Merge branch 'xsk-tx-metadata'

Stanislav Fomichev says:

====================
xsk: TX metadata

This series implements initial TX metadata (offloads) for AF_XDP.
See patch #2 for the main implementa

Merge branch 'xsk-tx-metadata'

Stanislav Fomichev says:

====================
xsk: TX metadata

This series implements initial TX metadata (offloads) for AF_XDP.
See patch #2 for the main implementation and mlx5/stmmac ones for the
example on how to consume the metadata on the device side.

Starting with two types of offloads:
- request TX timestamp (and write it back into the metadata area)
- request TX checksum offload

Changes since v5:
- preserve xsk_tx_metadata flags across tx and completion by moving
them out of 'request' union (Jesper)
- fix xdp_metadata checksum failure in big endian (Alexei)
- add SPDX to xdp-rx-metadata.rst (Simon)

v5: https://lore.kernel.org/bpf/20231102225837.1141915-1-sdf@google.com/

Performance (mlx5):

I've implemented a small xskgen tool to try to saturate single tx queue:
https://github.com/fomichev/xskgen/tree/master

Here are the performance numbers with some analysis.

1. Baseline. Running with commit eb62e6aef940 ("Merge branch 'bpf:
Support bpf_get_func_ip helper in uprobes'"), nothing from this series:

- with 1400 bytes of payload: 98 gbps, 8 mpps
./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.189130 sec, 98.357623 gbps 8.409509 mpps

- with 200 bytes of payload: 49 gbps, 23 mpps
./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000064 packets 20960134144 bits, took 0.422235 sec, 49.640921 gbps 23.683645 mpps

2. Adding single commit that supports reserving tx_metadata_len
changes nothing numbers-wise.

- baseline for 1400
./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.189247 sec, 98.347946 gbps 8.408682 mpps

- baseline for 200
./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 20960000000 bits, took 0.421248 sec, 49.756913 gbps 23.738985 mpps

3. Adding -M flag causes xskgen to reserve the metadata and fill it, but
doesn't set XDP_TX_METADATA descriptor option.

- new baseline for 1400 (with only filling the metadata)
./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.188767 sec, 98.387657 gbps 8.412077 mpps

- new baseline for 200 (with only filling the metadata)
./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 20960000000 bits, took 0.410213 sec, 51.095407 gbps 24.377579 mpps
(the numbers go sligtly up here, not really sure why, maybe some cache-related
side-effects?

4. Next, I'm running the same test but with the commit that adds actual
general infra to parse XDP_TX_METADATA (but no driver support).
Essentially applying "xsk: add TX timestamp and TX checksum offload support"
from this series. Numbers are the same.

- fill metadata for 1400
./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.188430 sec, 98.415557 gbps 8.414463 mpps

- fill metadata for 200
./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 20960000000 bits, took 0.411559 sec, 50.928299 gbps 24.297853 mpps

- request metadata for 1400
./xskgen -m -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.188723 sec, 98.391299 gbps 8.412389 mpps

- request metadata for 200
./xskgen -m -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000064 packets 20960134144 bits, took 0.411240 sec, 50.968131 gbps 24.316856 mpps

5. Now, for the most interesting part, I'm adding mlx5 driver support.
The mpps for 200 bytes case goes down from 23 mpps to 19 mpps, but
_only_ when I enable the metadata. This looks like a side effect
of me pushing extra metadata pointer via mlx5e_xdpi_fifo_push.
Hence, this part is wrapped into 'if (xp_tx_metadata_enabled)'
to not affect the existing non-metadata use-cases. Since this is not
regressing existing workloads, I'm not spending any time trying to
optimize it more (and leaving it up to mlx owners to purse if
they see any good way to do it).

- same baseline
./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.189434 sec, 98.332484 gbps 8.407360 mpps

./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000128 packets 20960268288 bits, took 0.425254 sec, 49.288821 gbps 23.515659 mpps

- fill metadata for 1400
./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.189528 sec, 98.324714 gbps 8.406696 mpps

- fill metadata for 200
./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000128 packets 20960268288 bits, took 0.519085 sec, 40.379260 gbps 19.264914 mpps

- request metadata for 1400
./xskgen -m -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000000 packets 116960000000 bits, took 1.189329 sec, 98.341165 gbps 8.408102 mpps

- request metadata for 200
./xskgen -m -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1
sent 10000128 packets 20960268288 bits, took 0.519929 sec, 40.313713 gbps 19.233642 mpps

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
====================

Link: https://lore.kernel.org/r/20231127190319.1190813-1-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 11614723 27-Nov-2023 Stanislav Fomichev <sdf@google.com>

xsk: Add option to calculate TX checksum in SW

For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hard

xsk: Add option to calculate TX checksum in SW

For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hardware. I also use this mode
in the selftests.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 48eb03dd 27-Nov-2023 Stanislav Fomichev <sdf@google.com>

xsk: Add TX timestamp and TX checksum offload support

This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags w

xsk: Add TX timestamp and TX checksum offload support

This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).

The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,_complete}
are static inline, we don't incur any extra overhead doing
indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
offloads are implemented; if the callback is implemented - the offload
is supported (used by netlink reporting code)

Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
area upon completion (tx_timestamp field)

XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).

The struct is forward-compatible and can be extended in the future
by appending more fields.

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 341ac980 27-Nov-2023 Stanislav Fomichev <sdf@google.com>

xsk: Support tx_metadata_len

For zerocopy mode, tx_desc->addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is no way currently to populate skb me

xsk: Support tx_metadata_len

For zerocopy mode, tx_desc->addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is no way currently to populate skb metadata.

Introduce new tx_metadata_len umem config option that indicates how many
bytes to treat as metadata. Metadata bytes come prior to tx_desc address
(same as in RX case).

The size of the metadata has mostly the same constraints as XDP:
- less than 256 bytes
- 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte
timestamp in the completion)
- non-zero

This data is not interpreted in any way right now.

Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


Revision tags: v6.6-rc7
# a940daa5 17-Oct-2023 Thomas Gleixner <tglx@linutronix.de>

Merge branch 'linus' into smp/core

Pull in upstream to get the fixes so depending changes can be applied.


Revision tags: v6.6-rc6
# 57390019 11-Oct-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Updating drm-misc-next to the state of Linux v6.6-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.6-rc5
# de801933 03-Oct-2023 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.6-rc4' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


Revision tags: v6.6-rc4, v6.6-rc3
# 6f23fc47 18-Sep-2023 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.6-rc2' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


Revision tags: v6.6-rc2
# a3f9e4bc 15-Sep-2023 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to v6.6-rc1.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c900529f 12-Sep-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Forwarding to v6.6-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.6-rc1
# bd6c11bc 29-Aug-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Increase size limits for to-be-sent skb frag allocat

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations

- Store netdevs in an xarray, to simplify iterating over netdevs

- Refactor nexthop selection for multipath routes

- Improve sched class lifetime handling

- Add backup nexthop ID support for bridge

- Implement drop reasons support in openvswitch

- Several data races annotations and fixes

- Constify the sk parameter of routing functions

- Prepend kernel version to netconsole message

Protocols:

- Implement support for TCP probing the peer being under memory
pressure

- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct

- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor

- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes

- In-kernel support for the TLS alert protocol

- Better support for UDP reuseport with connected sockets

- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size

- Get rid of additional ancillary per MPTCP connection struct socket

- Implement support for BPF-based MPTCP packet schedulers

- Format MPTCP subtests selftests results in TAP

- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation

BPF:

- Multi-buffer support in AF_XDP

- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds

- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it

- Add SO_REUSEPORT support for TC bpf_sk_assign

- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64

- Support defragmenting IPv(4|6) packets in BPF

- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling

- Introduce bpf map element count and enable it for all program types

- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy

- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress

- Add uprobe support for the bpf_get_func_ip helper

- Check skb ownership against full socket

- Support for up to 12 arguments in BPF trampoline

- Extend link_info for kprobe_multi and perf_event links

Netfilter:

- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending

- Allow NLA_POLICY_MASK to be used with BE16/BE32 types

Driver API:

- Page pool optimizations, to improve data locality and cache usage

- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers

- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info

- Extend and use the yaml devlink specs to [re]generate the split ops

- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes

- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool

- Remove phylink legacy mode support

- Support offload LED blinking to phy

- Add devlink port function attributes for IPsec

New hardware / drivers:

- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy

- WiFi:
- MediaTek mt7981 support

- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips

- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925

Drivers:

- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution

- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support

- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers

- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support

- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs

- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support

- Connector:
- support for event filtering"

* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...

show more ...


Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3
# e93165d5 20-Jul-2023 Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-19

We've added 45 non-merge comm

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-19

We've added 45 non-merge commits during the last 3 day(s) which contain
a total of 71 files changed, 7808 insertions(+), 592 deletions(-).

The main changes are:

1) multi-buffer support in AF_XDP, from Maciej Fijalkowski,
Magnus Karlsson, Tirthendu Sarkar.

2) BPF link support for tc BPF programs, from Daniel Borkmann.

3) Enable bpf_map_sum_elem_count kfunc for all program types,
from Anton Protopopov.

4) Add 'owner' field to bpf_rb_node to fix races in shared ownership,
Dave Marchevsky.

5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
bpf, net: Introduce skb_pointer_if_linear().
bpf: sync tools/ uapi header with
selftests/bpf: Add mprog API tests for BPF tcx links
selftests/bpf: Add mprog API tests for BPF tcx opts
bpftool: Extend net dump with tcx progs
libbpf: Add helper macro to clear opts structs
libbpf: Add link-based API for tcx
libbpf: Add opts-based attach/detach/query API for tcx
bpf: Add fd-based tcx multi-prog infra with link support
bpf: Add generic attach/detach/query API for multi-progs
selftests/xsk: reset NIC settings to default after running test suite
selftests/xsk: add test for too many frags
selftests/xsk: add metadata copy test for multi-buff
selftests/xsk: add invalid descriptor test for multi-buffer
selftests/xsk: add unaligned mode test for multi-buffer
selftests/xsk: add basic multi-buffer test
selftests/xsk: transmit and receive multi-buffer packets
xsk: add multi-buffer documentation
i40e: xsk: add TX multi-buffer support
ice: xsk: Tx multi-buffer support
...
====================

Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 3226e313 19-Jul-2023 Alexei Starovoitov <ast@kernel.org>

Merge branch 'xsk-multi-buffer-support'

Maciej Fijalkowski says:

====================
xsk: multi-buffer support

v6->v7:
- rebase...[Alexei]

v5->v6:
- update bpf_xdp_query_opts__last_field in patc

Merge branch 'xsk-multi-buffer-support'

Maciej Fijalkowski says:

====================
xsk: multi-buffer support

v6->v7:
- rebase...[Alexei]

v5->v6:
- update bpf_xdp_query_opts__last_field in patch 10 [Alexei]

v4->v5:
- align options argument size to match options from xdp_desc [Benjamin]
- cleanup skb from xdp_sock on socket termination [Toke]
- introduce new netlink attribute for letting user space know about Tx
frag limit; this substitutes xdp_features flag previously dedicated
for setting ZC multi-buffer support [Toke, Jakub]
- include i40e ZC multi-buffer support
- enable TOO_MANY_FRAGS for ZC on xskxceiver; this is now possible due
to netlink attribute mentioned two bullets above

v3->v4:
-rely on ynl for adding new xdp_features flag [Jakub]
- move xskb_list to xsk_buff_pool

v2->v3:
- Fix issue with the next valid packet getting dropped after an invalid
packet with MAX_SKB_FRAGS + 1 frags [Magnus]
- query NETDEV_XDP_ACT_ZC_SG flag within xskxceiver and act on it
- remove redundant include in xsk.c [kernel test robot]
- s/NETDEV_XDP_ACT_NDO_ZC_SG/NETDEV_XDP_ACT_ZC_SG + kernel doc [Magnus,
Simon]

v1->v2:
- fix spelling issues in commit messages [Simon]
- remove XSK_DESC_MAX_FRAGS, use MAX_SKB_FRAGS instead [Stan, Alexei]
- add documentation patch
- fix build error from kernel test robot on patch 10

This series of patches add multi-buffer support for AF_XDP. XDP and
various NIC drivers already have support for multi-buffer packets. With
this patch set, programs using AF_XDP sockets can now also receive and
transmit multi-buffer packets both in copy as well as zero-copy mode.
ZC multi-buffer implementation is based on ice driver.

Some definitions to put us all on the same page:

* A packet consists of one or more frames

* A descriptor in one of the AF_XDP rings always refers to a single
frame. In the case the packet consists of a single frame, the
descriptor refers to the whole packet.

To represent a packet consisting of multiple frames, we introduce a
new flag called XDP_PKT_CONTD in the options field of the Rx and Tx
descriptors. If it is true (1) the packet continues with the next
descriptor and if it is false (0) it means this is the last descriptor
of the packet. Why the reverse logic of end-of-packet (eop) flag found
in many NICs? Just to preserve compatibility with non-multi-buffer
applications that have this bit set to false for all packets on Rx, and
the apps set the options field to zero for Tx, as anything else will
be treated as an invalid descriptor.

These are the semantics for producing packets onto XSK Tx ring
consisting of multiple frames:

* When an invalid descriptor is found, all the other
descriptors/frames of this packet are marked as invalid and not
completed. The next descriptor is treated as the start of a new
packet, even if this was not the intent (because we cannot guess
the intent). As before, if your program is producing invalid
descriptors you have a bug that must be fixed.

* Zero length descriptors are treated as invalid descriptors.

* For copy mode, the maximum supported number of frames in a packet is
equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all
descriptors accumulated so far are dropped and treated as
invalid. To produce an application that will work on any system
regardless of this config setting, limit the number of frags to 18,
as the minimum value of the config is 17.

* For zero-copy mode, the limit is up to what the NIC HW
supports. User space can discover this via newly introduced
NETDEV_A_DEV_XDP_ZC_MAX_SEGS netlink attribute.

Here is an example Tx path pseudo-code (using libxdp interfaces for
simplicity) ignoring that the umem is finite in size, and that we
eventually will run out of packets to send. Also assumes pkts.addr
points to a valid location in the umem.

void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts,
int batch_size)
{
u32 idx, i, pkt_nb = 0;

xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx);

for (i = 0; i < batch_size;) {
u64 addr = pkts[pkt_nb].addr;
u32 len = pkts[pkt_nb].size;

do {
struct xdp_desc *tx_desc;

tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++);
tx_desc->addr = addr;

if (len > xsk_frame_size) {
tx_desc->len = xsk_frame_size;
tx_desc->options |= XDP_PKT_CONTD;
} else {
tx_desc->len = len;
tx_desc->options = 0;
pkt_nb++;
}
len -= tx_desc->len;
addr += xsk_frame_size;

if (i == batch_size) {
/* Remember len, addr, pkt_nb for next
* iteration. Skipped for simplicity.
*/
break;
}
} while (len);
}

xsk_ring_prod__submit(&xsk->tx, i);
}

On the Rx path in copy mode, the xsk core copies the XDP data into
multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as
detailed before. Zero-copy mode in order to avoid the copies has to
maintain a chain of xdp_buff_xsk structs that represent whole packet.
This is because what actually is redirected is the xdp_buff and we
currently have no equivalent mechanism that is used for copy mode
(embedded skb_shared_info in xdp_buff) to carry the frags. This means
xdp_buff_xsk grows in size but these members are at the end and should
not be touched when data path is not dealing with fragmented packets.
This solution kept us within assumed performance impact, hence we
decided to proceed with it.

When the application gets a descriptor with the
XDP_PKT_CONTD flag set to one, it means that the packet consists of
multiple buffers and it continues with the next buffer in the following
descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it
means that this is the last buffer of the packet. AF_XDP guarantees that
only a complete packet (all frames in the packet) is sent to the
application.

If application reads a batch of descriptors, using for example the libxdp
interfaces, it is not guaranteed that the batch will end with a full
packet. It might end in the middle of a packet and the rest of the
buffers of that packet will arrive at the beginning of the next batch,
since the libxdp interface does not read the whole ring (unless you
have an enormous batch size or a very small ring size).

Here is a simple Rx path pseudo-code example (using libxdp interfaces for
simplicity). Error paths have been excluded for simplicity:

void rx_packets(struct xsk_socket_info *xsk)
{
static bool new_packet = true;
u32 idx_rx = 0, idx_fq = 0;
static char *pkt;

int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);

xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);

for (int i = 0; i < rcvd; i++) {
struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);
char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr);
bool eop = !(desc->options & XDP_PKT_CONTD);

if (new_packet)
pkt = frag;
else
add_frag_to_pkt(pkt, frag);

if (eop)
process_pkt(pkt);

new_packet = eop;

*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr;
}

xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
xsk_ring_cons__release(&xsk->rx, rcvd);
}

We had to introduce a new bind flag (XDP_USE_SG) on the AF_XDP level to
enable multi-buffer support. The reason we need to differentiate between
non multi-buffer and multi-buffer is the behaviour when the kernel gets
a packet that is larger than the frame size. Without multi-buffer, this
packet is dropped and marked in the stats. With multi-buffer on, we want
to split it up into multiple frames instead.

At the start, we thought that riding on the .frags section name of
the XDP program was a good idea. You do not have to introduce yet
another flag and all AF_XDP users must load an XDP program anyway
to get any traffic up to the socket, so why not just say that the XDP
program decides if the AF_XDP socket should get multi-buffer packets
or not? The problem is that we can create an AF_XDP socket that is Tx
only and that works without having to load an XDP program at
all. Another problem is that the XDP program might change during the
execution, so we would have to check this for every single packet.

Here is the observed throughput when compared to a codebase without any
multi-buffer changes and measured with xdpsock for 64B packets.
Apparently ZC Tx takes a hit from explicit zero length descriptors
validation. Overall, in terms of ZC performance, there is a room for
improvement, but for now we think this work is in a good shape in terms
of correctness and functionality. We were targetting for up to 5%
overhead though. Note that ZC performance drops come from core + driver
support being combined, whereas copy mode had already driver support in
place.

Mode rxdrop l2fwd txonly
ice-zc -4% -7% -6%
i40e-zc -7% -6% -7%
drv -1.2% 0% +2%
skb -0.6% -1% +2%

Thank you,
Tirthendu, Magnus and Maciej
====================

Link: https://lore.kernel.org/r/20230719132421.584801-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 81470b5c 19-Jul-2023 Tirthendu Sarkar <tirthendu.sarkar@intel.com>

xsk: introduce XSK_USE_SG bind flag for xsk socket

As of now xsk core drops any xdp_buff with data size greater than the
xsk frame_size as set by the af_xdp application. With multi-buffer
support in

xsk: introduce XSK_USE_SG bind flag for xsk socket

As of now xsk core drops any xdp_buff with data size greater than the
xsk frame_size as set by the af_xdp application. With multi-buffer
support introduced in the next patch xsk core can now split those
buffers into multiple descriptors provided the af_xdp application can
handle them. Such capability of the application needs to be independent
of the xdp_prog's frag support capability since there are cases where
even a single xdp_buffer may need to be split into multiple descriptors
owing to a smaller xsk frame size.

For e.g., with NIC rx_buffer size set to 4kB, a 3kB packet will
constitute of a single buffer and so will be sent as such to AF_XDP layer
irrespective of 'xdp.frags' capability of the XDP program. Now if the xsk
frame size is set to 2kB by the AF_XDP application, then the packet will
need to be split into 2 descriptors if AF_XDP application can handle
multi-buffer, else it needs to be dropped.

Applications can now advertise their frag handling capability to xsk core
so that xsk core can decide if it should drop or split xdp_buffs that
exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag
for the xdp socket.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


12345