Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |
|
#
06d07429 |
| 29-Feb-2024 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync to get the drm_printer changes to drm-intel-next.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
Revision tags: v6.8-rc6, v6.8-rc5 |
|
#
41c177cf |
| 11-Feb-2024 |
Rob Clark <robdclark@chromium.org> |
Merge tag 'drm-misc-next-2024-02-08' into msm-next
Merge the drm-misc tree to uprev MSM CI.
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
Revision tags: v6.8-rc4, v6.8-rc3 |
|
#
4db102dc |
| 29-Jan-2024 |
Maxime Ripard <mripard@kernel.org> |
Merge drm/drm-next into drm-misc-next
Kickstart 6.9 development cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
Revision tags: v6.8-rc2 |
|
#
be3382ec |
| 23-Jan-2024 |
Lucas De Marchi <lucas.demarchi@intel.com> |
Merge drm/drm-next into drm-xe-next
Sync to v6.8-rc1.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
#
03c11eb3 |
| 14-Feb-2024 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch
Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h
Signed-off-by: Ingo Molnar <mingo@k
Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch
Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
42ac0be1 |
| 26-Jan-2024 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into x86/mm, to refresh the branch and pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v6.8-rc1 |
|
#
fe33c0fb |
| 17-Jan-2024 |
Andrew Morton <akpm@linux-foundation.org> |
Merge branch 'master' into mm-hotfixes-stable
|
#
cf79f291 |
| 22-Jan-2024 |
Maxime Ripard <mripard@kernel.org> |
Merge v6.8-rc1 into drm-misc-fixes
Let's kickstart the 6.8 fix cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6 |
|
#
a1c613ae |
| 24-Oct-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Work that needs to land in drm-intel-gt-next depends on two patches only present in drm-intel-next, absence of which is causing a merge conflict:
3b918f4
Merge drm/drm-next into drm-intel-gt-next
Work that needs to land in drm-intel-gt-next depends on two patches only present in drm-intel-next, absence of which is causing a merge conflict:
3b918f4f0c8b ("drm/i915/pxp: Optimize GET_PARAM:PXP_STATUS") ac765b7018f6 ("drm/i915/pxp/mtl: intel_pxp_init_hw needs runtime-pm inside pm-complete")
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
show more ...
|
#
3e7aeb78 |
| 11-Jan-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs reorganization and a significant amount of changes is around self-tests.
Core & protocols:
- Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes
This improves TCP performances with many concurrent connections up to 40%
- Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks
- Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination
- Refactor the TCP bind conflict code, shrinking related socket structs
- Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF
- Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it
- Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations
- Reduce extension header parsing overhead at GRO time
- Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries
- Reorder nftables struct members, to keep data accessed by the datapath first
- Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex)
- More data-race annotations
- Extend the diag interface to dump TCP bound-only sockets
- Conditional notification of events for TC qdisc class and actions
- Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID
- Implement SMCv2.1 virtual ISM device support
- Add support for Batman-avd mulicast packet type
BPF:
- Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. This improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes
- Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y
- Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques
- Support uid/gid options when mounting bpffs
- Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id
- Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext
- Add BPF link_info support for uprobe multi link along with bpftool integration for the latter
- Support for VLAN tag in XDP hints
- Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter)
Misc:
- Support for parellel TC self-tests execution
- Increase MPTCP self-tests coverage
- Updated the bridge documentation, including several so-far undocumented features
- Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs
- Add TCP-AO self-tests
- Add kunit tests for both cfg80211 and mac80211
- Autogenerate Netlink families documentation from YAML spec
- Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs
- A bunch of additional module descriptions fixes
- Catch incorrect freeing of pages belonging to a page pool
Driver API:
- Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust
- Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship
- Introduce notifications filtering for devlink to allow control application scale to thousands of instances
- Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host
- Add support for ethtool symmetric-xor RSS hash
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform
- Expose pin fractional frequency offset value over new DPLL generic netlink attribute
- Convert older drivers to platform remove callback returning void
- Add support for PHY package MMD read/write
New hardware / drivers:
- Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY
- Bluetooth: - IMC Networks Bluetooth radio
Removed:
- WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver
Driver updates:
- Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches: - allow firmware upgrade without a reboot - more flexible support for bridge flooding via the compressed FID flooding mode
- Ethernet embedded switches: - Microchip: - fine-tune flow control and speed configurations in KSZ8xxx - KSZ88X3: enable setting rmii reference - Renesas: - add jumbo frames support - Marvell: - 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs: - aquantia: add firmware load support - at803x: refactor the driver to simplify adding support for more chip variants - NXP C45 TJA11xx: Add MACsec offload support
- Wifi: - MediaTek (mt76): - NVMEM EEPROM improvements - mt7996 Extremely High Throughput (EHT) improvements - mt7996 Wireless Ethernet Dispatcher (WED) support - mt7996 36-bit DMA support - Qualcomm (ath12k): - support for a single MSI vector - WCN7850: support AP mode - Intel (iwlwifi): - new debugfs file fw_dbg_clear - allow concurrent P2P operation on DFS channels
- Bluetooth: - QCA2066: support HFP offload - ISO: more broadcast-related improvements - NXP: better recovery in case receiver/transmitter get out of sync"
* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits) lan78xx: remove redundant statement in lan78xx_get_eee lan743x: remove redundant statement in lan743x_ethtool_get_eee bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer() bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel() bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter() tcp: Revert no longer abort SYN_SENT when receiving some ICMP Revert "mlx5 updates 2023-12-20" Revert "net: stmmac: Enable Per DMA Channel interrupt" ipvlan: Remove usage of the deprecated ida_simple_xx() API ipvlan: Fix a typo in a comment net/sched: Remove ipt action tests net: stmmac: Use interrupt mode INTM=1 for per channel irq net: stmmac: Add support for TX/RX channel interrupt net: stmmac: Make MSI interrupt routine generic dt-bindings: net: snps,dwmac: per channel irq net: phy: at803x: make read_status more generic net: phy: at803x: add support for cdt cross short test for qca808x net: phy: at803x: refactor qca808x cable test get status function net: phy: at803x: generalize cdt fault length function net: ethernet: cortina: Drop TSO support ...
show more ...
|
#
753c8608 |
| 01-Dec-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-11-30
We've added 30 non-merge commits
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-11-30
We've added 30 non-merge commits during the last 7 day(s) which contain a total of 58 files changed, 1598 insertions(+), 154 deletions(-).
The main changes are:
1) Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload, from Stanislav Fomichev with stmmac implementation from Song Yoong Siang.
2) Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques, from Andrii Nakryiko.
3) Add BPF link_info support for uprobe multi link along with bpftool integration for the latter, from Jiri Olsa.
4) Use pkg-config in BPF selftests to determine ld flags which is in particular needed for linking statically, from Akihiko Odaki.
5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18, from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits) bpf/tests: Remove duplicate JSGT tests selftests/bpf: Add TX side to xdp_hw_metadata selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP selftests/bpf: Add TX side to xdp_metadata selftests/bpf: Add csum helpers selftests/xsk: Support tx_metadata_len xsk: Add option to calculate TX checksum in SW xsk: Validate xsk_tx_metadata flags xsk: Document tx_metadata_len layout net: stmmac: Add Tx HWTS support to XDP ZC net/mlx5e: Implement AF_XDP TX timestamp and checksum offload tools: ynl: Print xsk-features from the sample xsk: Add TX timestamp and TX checksum offload support xsk: Support tx_metadata_len selftests/bpf: Use pkg-config for libelf selftests/bpf: Override PKG_CONFIG for static builds selftests/bpf: Choose pkg-config for the target bpftool: Add support to display uprobe_multi links selftests/bpf: Add link_info test for uprobe_multi link selftests/bpf: Use bpf_link__destroy in fill_link_info tests ... ====================
Conflicts:
Documentation/netlink/specs/netdev.yaml: 839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools") 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/
While at it also regen, tree is dirty after: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") looks like code wasn't re-rendered after "render-max" was removed.
Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
b5145153 |
| 29-Nov-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'xsk-tx-metadata'
Stanislav Fomichev says:
==================== xsk: TX metadata
This series implements initial TX metadata (offloads) for AF_XDP. See patch #2 for the main implementa
Merge branch 'xsk-tx-metadata'
Stanislav Fomichev says:
==================== xsk: TX metadata
This series implements initial TX metadata (offloads) for AF_XDP. See patch #2 for the main implementation and mlx5/stmmac ones for the example on how to consume the metadata on the device side.
Starting with two types of offloads: - request TX timestamp (and write it back into the metadata area) - request TX checksum offload
Changes since v5: - preserve xsk_tx_metadata flags across tx and completion by moving them out of 'request' union (Jesper) - fix xdp_metadata checksum failure in big endian (Alexei) - add SPDX to xdp-rx-metadata.rst (Simon)
v5: https://lore.kernel.org/bpf/20231102225837.1141915-1-sdf@google.com/
Performance (mlx5):
I've implemented a small xskgen tool to try to saturate single tx queue: https://github.com/fomichev/xskgen/tree/master
Here are the performance numbers with some analysis.
1. Baseline. Running with commit eb62e6aef940 ("Merge branch 'bpf: Support bpf_get_func_ip helper in uprobes'"), nothing from this series:
- with 1400 bytes of payload: 98 gbps, 8 mpps ./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189130 sec, 98.357623 gbps 8.409509 mpps
- with 200 bytes of payload: 49 gbps, 23 mpps ./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000064 packets 20960134144 bits, took 0.422235 sec, 49.640921 gbps 23.683645 mpps
2. Adding single commit that supports reserving tx_metadata_len changes nothing numbers-wise.
- baseline for 1400 ./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189247 sec, 98.347946 gbps 8.408682 mpps
- baseline for 200 ./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 20960000000 bits, took 0.421248 sec, 49.756913 gbps 23.738985 mpps
3. Adding -M flag causes xskgen to reserve the metadata and fill it, but doesn't set XDP_TX_METADATA descriptor option.
- new baseline for 1400 (with only filling the metadata) ./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.188767 sec, 98.387657 gbps 8.412077 mpps
- new baseline for 200 (with only filling the metadata) ./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 20960000000 bits, took 0.410213 sec, 51.095407 gbps 24.377579 mpps (the numbers go sligtly up here, not really sure why, maybe some cache-related side-effects?
4. Next, I'm running the same test but with the commit that adds actual general infra to parse XDP_TX_METADATA (but no driver support). Essentially applying "xsk: add TX timestamp and TX checksum offload support" from this series. Numbers are the same.
- fill metadata for 1400 ./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.188430 sec, 98.415557 gbps 8.414463 mpps
- fill metadata for 200 ./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 20960000000 bits, took 0.411559 sec, 50.928299 gbps 24.297853 mpps
- request metadata for 1400 ./xskgen -m -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.188723 sec, 98.391299 gbps 8.412389 mpps
- request metadata for 200 ./xskgen -m -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000064 packets 20960134144 bits, took 0.411240 sec, 50.968131 gbps 24.316856 mpps
5. Now, for the most interesting part, I'm adding mlx5 driver support. The mpps for 200 bytes case goes down from 23 mpps to 19 mpps, but _only_ when I enable the metadata. This looks like a side effect of me pushing extra metadata pointer via mlx5e_xdpi_fifo_push. Hence, this part is wrapped into 'if (xp_tx_metadata_enabled)' to not affect the existing non-metadata use-cases. Since this is not regressing existing workloads, I'm not spending any time trying to optimize it more (and leaving it up to mlx owners to purse if they see any good way to do it).
- same baseline ./xskgen -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189434 sec, 98.332484 gbps 8.407360 mpps
./xskgen -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000128 packets 20960268288 bits, took 0.425254 sec, 49.288821 gbps 23.515659 mpps
- fill metadata for 1400 ./xskgen -M -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189528 sec, 98.324714 gbps 8.406696 mpps
- fill metadata for 200 ./xskgen -M -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000128 packets 20960268288 bits, took 0.519085 sec, 40.379260 gbps 19.264914 mpps
- request metadata for 1400 ./xskgen -m -s 1400 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000000 packets 116960000000 bits, took 1.189329 sec, 98.341165 gbps 8.408102 mpps
- request metadata for 200 ./xskgen -m -s 200 -b eth3 10:70:fd:48:10:77 10:70:fd:48:10:87 fe80::1270:fdff:fe48:1077 fe80::1270:fdff:fe48:1087 1 1 sent 10000128 packets 20960268288 bits, took 0.519929 sec, 40.313713 gbps 19.233642 mpps
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> ====================
Link: https://lore.kernel.org/r/20231127190319.1190813-1-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
11614723 |
| 27-Nov-2023 |
Stanislav Fomichev <sdf@google.com> |
xsk: Add option to calculate TX checksum in SW
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM to call skb_checksum_help in transmit path. Might be useful to debugging issues with real hard
xsk: Add option to calculate TX checksum in SW
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM to call skb_checksum_help in transmit path. Might be useful to debugging issues with real hardware. I also use this mode in the selftests.
Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
48eb03dd |
| 27-Nov-2023 |
Stanislav Fomichev <sdf@google.com> |
xsk: Add TX timestamp and TX checksum offload support
This change actually defines the (initial) metadata layout that should be used by AF_XDP userspace (xsk_tx_metadata). The first field is flags w
xsk: Add TX timestamp and TX checksum offload support
This change actually defines the (initial) metadata layout that should be used by AF_XDP userspace (xsk_tx_metadata). The first field is flags which requests appropriate offloads, followed by the offload-specific fields. The supported per-device offloads are exported via netlink (new xsk-flags).
The offloads themselves are still implemented in a bit of a framework-y fashion that's left from my initial kfunc attempt. I'm introducing new xsk_tx_metadata_ops which drivers are supposed to implement. The drivers are also supposed to call xsk_tx_metadata_request/xsk_tx_metadata_complete in the right places. Since xsk_tx_metadata_{request,_complete} are static inline, we don't incur any extra overhead doing indirect calls.
The benefit of this scheme is as follows: - keeps all metadata layout parsing away from driver code - makes it easy to grep and see which drivers implement what - don't need any extra flags to maintain to keep track of what offloads are implemented; if the callback is implemented - the offload is supported (used by netlink reporting code)
Two offloads are defined right now: 1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset 2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata area upon completion (tx_timestamp field)
XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass metadata pointer).
The struct is forward-compatible and can be extended in the future by appending more fields.
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
341ac980 |
| 27-Nov-2023 |
Stanislav Fomichev <sdf@google.com> |
xsk: Support tx_metadata_len
For zerocopy mode, tx_desc->addr can point to an arbitrary offset and carry some TX metadata in the headroom. For copy mode, there is no way currently to populate skb me
xsk: Support tx_metadata_len
For zerocopy mode, tx_desc->addr can point to an arbitrary offset and carry some TX metadata in the headroom. For copy mode, there is no way currently to populate skb metadata.
Introduce new tx_metadata_len umem config option that indicates how many bytes to treat as metadata. Metadata bytes come prior to tx_desc address (same as in RX case).
The size of the metadata has mostly the same constraints as XDP: - less than 256 bytes - 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte timestamp in the completion) - non-zero
This data is not interpreted in any way right now.
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
Revision tags: v6.6-rc7 |
|
#
a940daa5 |
| 17-Oct-2023 |
Thomas Gleixner <tglx@linutronix.de> |
Merge branch 'linus' into smp/core
Pull in upstream to get the fixes so depending changes can be applied.
|
Revision tags: v6.6-rc6 |
|
#
57390019 |
| 11-Oct-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Updating drm-misc-next to the state of Linux v6.6-rc2.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.6-rc5 |
|
#
de801933 |
| 03-Oct-2023 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v6.6-rc4' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v6.6-rc4, v6.6-rc3 |
|
#
6f23fc47 |
| 18-Sep-2023 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v6.6-rc2' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v6.6-rc2 |
|
#
a3f9e4bc |
| 15-Sep-2023 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync to v6.6-rc1.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
c900529f |
| 12-Sep-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Forwarding to v6.6-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.6-rc1 |
|
#
bd6c11bc |
| 29-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Increase size limits for to-be-sent skb frag allocat
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory pressure
- Remove hard coded limitation on IPv6 specific info placement inside the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy
- WiFi: - MediaTek mt7981 support
- Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips
- Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925
Drivers:
- Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution
- Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support
- Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers
- Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support
- Ethernet PHYs: - convert mv88e6xxx to phylink_pcs
- WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support
- Connector: - support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...
show more ...
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3 |
|
#
e93165d5 |
| 20-Jul-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
==================== pull-request: bpf-next 2023-07-19
We've added 45 non-merge comm
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
==================== pull-request: bpf-next 2023-07-19
We've added 45 non-merge commits during the last 3 day(s) which contain a total of 71 files changed, 7808 insertions(+), 592 deletions(-).
The main changes are:
1) multi-buffer support in AF_XDP, from Maciej Fijalkowski, Magnus Karlsson, Tirthendu Sarkar.
2) BPF link support for tc BPF programs, from Daniel Borkmann.
3) Enable bpf_map_sum_elem_count kfunc for all program types, from Anton Protopopov.
4) Add 'owner' field to bpf_rb_node to fix races in shared ownership, Dave Marchevsky.
5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) bpf, net: Introduce skb_pointer_if_linear(). bpf: sync tools/ uapi header with selftests/bpf: Add mprog API tests for BPF tcx links selftests/bpf: Add mprog API tests for BPF tcx opts bpftool: Extend net dump with tcx progs libbpf: Add helper macro to clear opts structs libbpf: Add link-based API for tcx libbpf: Add opts-based attach/detach/query API for tcx bpf: Add fd-based tcx multi-prog infra with link support bpf: Add generic attach/detach/query API for multi-progs selftests/xsk: reset NIC settings to default after running test suite selftests/xsk: add test for too many frags selftests/xsk: add metadata copy test for multi-buff selftests/xsk: add invalid descriptor test for multi-buffer selftests/xsk: add unaligned mode test for multi-buffer selftests/xsk: add basic multi-buffer test selftests/xsk: transmit and receive multi-buffer packets xsk: add multi-buffer documentation i40e: xsk: add TX multi-buffer support ice: xsk: Tx multi-buffer support ... ====================
Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
3226e313 |
| 19-Jul-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'xsk-multi-buffer-support'
Maciej Fijalkowski says:
==================== xsk: multi-buffer support
v6->v7: - rebase...[Alexei]
v5->v6: - update bpf_xdp_query_opts__last_field in patc
Merge branch 'xsk-multi-buffer-support'
Maciej Fijalkowski says:
==================== xsk: multi-buffer support
v6->v7: - rebase...[Alexei]
v5->v6: - update bpf_xdp_query_opts__last_field in patch 10 [Alexei]
v4->v5: - align options argument size to match options from xdp_desc [Benjamin] - cleanup skb from xdp_sock on socket termination [Toke] - introduce new netlink attribute for letting user space know about Tx frag limit; this substitutes xdp_features flag previously dedicated for setting ZC multi-buffer support [Toke, Jakub] - include i40e ZC multi-buffer support - enable TOO_MANY_FRAGS for ZC on xskxceiver; this is now possible due to netlink attribute mentioned two bullets above
v3->v4: -rely on ynl for adding new xdp_features flag [Jakub] - move xskb_list to xsk_buff_pool
v2->v3: - Fix issue with the next valid packet getting dropped after an invalid packet with MAX_SKB_FRAGS + 1 frags [Magnus] - query NETDEV_XDP_ACT_ZC_SG flag within xskxceiver and act on it - remove redundant include in xsk.c [kernel test robot] - s/NETDEV_XDP_ACT_NDO_ZC_SG/NETDEV_XDP_ACT_ZC_SG + kernel doc [Magnus, Simon]
v1->v2: - fix spelling issues in commit messages [Simon] - remove XSK_DESC_MAX_FRAGS, use MAX_SKB_FRAGS instead [Stan, Alexei] - add documentation patch - fix build error from kernel test robot on patch 10
This series of patches add multi-buffer support for AF_XDP. XDP and various NIC drivers already have support for multi-buffer packets. With this patch set, programs using AF_XDP sockets can now also receive and transmit multi-buffer packets both in copy as well as zero-copy mode. ZC multi-buffer implementation is based on ice driver.
Some definitions to put us all on the same page:
* A packet consists of one or more frames
* A descriptor in one of the AF_XDP rings always refers to a single frame. In the case the packet consists of a single frame, the descriptor refers to the whole packet.
To represent a packet consisting of multiple frames, we introduce a new flag called XDP_PKT_CONTD in the options field of the Rx and Tx descriptors. If it is true (1) the packet continues with the next descriptor and if it is false (0) it means this is the last descriptor of the packet. Why the reverse logic of end-of-packet (eop) flag found in many NICs? Just to preserve compatibility with non-multi-buffer applications that have this bit set to false for all packets on Rx, and the apps set the options field to zero for Tx, as anything else will be treated as an invalid descriptor.
These are the semantics for producing packets onto XSK Tx ring consisting of multiple frames:
* When an invalid descriptor is found, all the other descriptors/frames of this packet are marked as invalid and not completed. The next descriptor is treated as the start of a new packet, even if this was not the intent (because we cannot guess the intent). As before, if your program is producing invalid descriptors you have a bug that must be fixed.
* Zero length descriptors are treated as invalid descriptors.
* For copy mode, the maximum supported number of frames in a packet is equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all descriptors accumulated so far are dropped and treated as invalid. To produce an application that will work on any system regardless of this config setting, limit the number of frags to 18, as the minimum value of the config is 17.
* For zero-copy mode, the limit is up to what the NIC HW supports. User space can discover this via newly introduced NETDEV_A_DEV_XDP_ZC_MAX_SEGS netlink attribute.
Here is an example Tx path pseudo-code (using libxdp interfaces for simplicity) ignoring that the umem is finite in size, and that we eventually will run out of packets to send. Also assumes pkts.addr points to a valid location in the umem.
void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts, int batch_size) { u32 idx, i, pkt_nb = 0;
xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx);
for (i = 0; i < batch_size;) { u64 addr = pkts[pkt_nb].addr; u32 len = pkts[pkt_nb].size;
do { struct xdp_desc *tx_desc;
tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++); tx_desc->addr = addr;
if (len > xsk_frame_size) { tx_desc->len = xsk_frame_size; tx_desc->options |= XDP_PKT_CONTD; } else { tx_desc->len = len; tx_desc->options = 0; pkt_nb++; } len -= tx_desc->len; addr += xsk_frame_size;
if (i == batch_size) { /* Remember len, addr, pkt_nb for next * iteration. Skipped for simplicity. */ break; } } while (len); }
xsk_ring_prod__submit(&xsk->tx, i); }
On the Rx path in copy mode, the xsk core copies the XDP data into multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as detailed before. Zero-copy mode in order to avoid the copies has to maintain a chain of xdp_buff_xsk structs that represent whole packet. This is because what actually is redirected is the xdp_buff and we currently have no equivalent mechanism that is used for copy mode (embedded skb_shared_info in xdp_buff) to carry the frags. This means xdp_buff_xsk grows in size but these members are at the end and should not be touched when data path is not dealing with fragmented packets. This solution kept us within assumed performance impact, hence we decided to proceed with it.
When the application gets a descriptor with the XDP_PKT_CONTD flag set to one, it means that the packet consists of multiple buffers and it continues with the next buffer in the following descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it means that this is the last buffer of the packet. AF_XDP guarantees that only a complete packet (all frames in the packet) is sent to the application.
If application reads a batch of descriptors, using for example the libxdp interfaces, it is not guaranteed that the batch will end with a full packet. It might end in the middle of a packet and the rest of the buffers of that packet will arrive at the beginning of the next batch, since the libxdp interface does not read the whole ring (unless you have an enormous batch size or a very small ring size).
Here is a simple Rx path pseudo-code example (using libxdp interfaces for simplicity). Error paths have been excluded for simplicity:
void rx_packets(struct xsk_socket_info *xsk) { static bool new_packet = true; u32 idx_rx = 0, idx_fq = 0; static char *pkt;
int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
for (int i = 0; i < rcvd; i++) { struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr); bool eop = !(desc->options & XDP_PKT_CONTD);
if (new_packet) pkt = frag; else add_frag_to_pkt(pkt, frag);
if (eop) process_pkt(pkt);
new_packet = eop;
*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr; }
xsk_ring_prod__submit(&xsk->umem->fq, rcvd); xsk_ring_cons__release(&xsk->rx, rcvd); }
We had to introduce a new bind flag (XDP_USE_SG) on the AF_XDP level to enable multi-buffer support. The reason we need to differentiate between non multi-buffer and multi-buffer is the behaviour when the kernel gets a packet that is larger than the frame size. Without multi-buffer, this packet is dropped and marked in the stats. With multi-buffer on, we want to split it up into multiple frames instead.
At the start, we thought that riding on the .frags section name of the XDP program was a good idea. You do not have to introduce yet another flag and all AF_XDP users must load an XDP program anyway to get any traffic up to the socket, so why not just say that the XDP program decides if the AF_XDP socket should get multi-buffer packets or not? The problem is that we can create an AF_XDP socket that is Tx only and that works without having to load an XDP program at all. Another problem is that the XDP program might change during the execution, so we would have to check this for every single packet.
Here is the observed throughput when compared to a codebase without any multi-buffer changes and measured with xdpsock for 64B packets. Apparently ZC Tx takes a hit from explicit zero length descriptors validation. Overall, in terms of ZC performance, there is a room for improvement, but for now we think this work is in a good shape in terms of correctness and functionality. We were targetting for up to 5% overhead though. Note that ZC performance drops come from core + driver support being combined, whereas copy mode had already driver support in place.
Mode rxdrop l2fwd txonly ice-zc -4% -7% -6% i40e-zc -7% -6% -7% drv -1.2% 0% +2% skb -0.6% -1% +2%
Thank you, Tirthendu, Magnus and Maciej ====================
Link: https://lore.kernel.org/r/20230719132421.584801-1-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
81470b5c |
| 19-Jul-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
xsk: introduce XSK_USE_SG bind flag for xsk socket
As of now xsk core drops any xdp_buff with data size greater than the xsk frame_size as set by the af_xdp application. With multi-buffer support in
xsk: introduce XSK_USE_SG bind flag for xsk socket
As of now xsk core drops any xdp_buff with data size greater than the xsk frame_size as set by the af_xdp application. With multi-buffer support introduced in the next patch xsk core can now split those buffers into multiple descriptors provided the af_xdp application can handle them. Such capability of the application needs to be independent of the xdp_prog's frag support capability since there are cases where even a single xdp_buffer may need to be split into multiple descriptors owing to a smaller xsk frame size.
For e.g., with NIC rx_buffer size set to 4kB, a 3kB packet will constitute of a single buffer and so will be sent as such to AF_XDP layer irrespective of 'xdp.frags' capability of the XDP program. Now if the xsk frame size is set to 2kB by the AF_XDP application, then the packet will need to be split into 2 descriptors if AF_XDP application can handle multi-buffer, else it needs to be dropped.
Applications can now advertise their frag handling capability to xsk core so that xsk core can decide if it should drop or split xdp_buffs that exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag for the xdp socket.
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|