1.\"
2.\" This file and its contents are supplied under the terms of the
3.\" Common Development and Distribution License ("CDDL"), version 1.0.
4.\" You may only use this file in accordance with the terms of version
5.\" 1.0 of the CDDL.
6.\"
7.\" A full copy of the text of the CDDL should have accompanied this
8.\" source.  A copy of the CDDL is also available via the Internet at
9.\" http://www.illumos.org/license/CDDL.
10.\"
11.\"
12.\" Copyright 2019 Joyent, Inc.
13.\" Copyright 2020 RackTop Systems, Inc.
14.\" Copyright 2023 Oxide Computer Company
15.\"
16.Dd January 30, 2023
17.Dt MAC 9E
18.Os
19.Sh NAME
20.Nm mac ,
21.Nm GLDv3
22.Nd MAC networking device driver overview
23.Sh SYNOPSIS
24.In sys/mac_provider.h
25.In sys/mac_ether.h
26.Sh INTERFACE LEVEL
27illumos DDI specific
28.Sh DESCRIPTION
29The
30.Sy MAC
31framework provides a means for implementing high-performance networking
32device drivers.
33It is the successor to the GLD interfaces and is sometimes referred to as the
34GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both GLDv3 and the MAC framework refer to the same thing, this manual
page uses the term
39.Em MAC framework
40to refer to the device driver interface.
41.Pp
42MAC device drivers are character devices.
43They define the standard
44.Xr _init 9E ,
45.Xr _fini 9E ,
46and
47.Xr _info 9E
48entry points to initialize the module, as well as
49.Xr dev_ops 9S
50and
51.Xr cb_ops 9S
52structures.
53.Pp
54The main interface with MAC is through a series of callbacks defined in
55a
56.Xr mac_callbacks 9S
57structure.
58These callbacks control all the aspects of the device.
They range from sending data and getting and setting properties to controlling
MAC address filters and managing promiscuous mode.
61.Pp
62The MAC framework takes care of many aspects of the device driver's
63management.
64A device that uses the MAC framework does not have to worry about creating
65device nodes or implementing
66.Xr open 9E
67or
68.Xr close 9E
69routines.
70In addition, all of the work to interact with
71.Xr dlpi 4P
72is taken care of automatically and transparently.
73.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
75operations:
76.Bl -enum -offset indent
77.It
78Sending frames
79.It
80Receiving frames
81.It
82Managing device configuration and metadata
83.El
84.Pp
85When sending frames, the MAC framework always calls functions registered
86in the
87.Xr mac_callbacks 9S
88structure to have the driver transmit frames on hardware.
89When receiving frames, the driver will generally receive an interrupt which will
90cause it to check for incoming data and deliver it to the MAC framework.
91.Pp
Device configuration, such as whether auto-negotiation is enabled, the speeds
that the device supports, the MTU (maximum transmission unit), and the
generation of pause frames, is driven by properties.
96The functions to get, set, and obtain information about properties are
97defined through callback functions specified in the
98.Xr mac_callbacks 9S
99structure.
100The full list of properties and a description of the relevant callbacks
101can be found in the
102.Sx PROPERTIES
103section.
104.Pp
105The MAC framework is designed to take advantage of various modern
106features provided by hardware, such as checksumming, segmentation
107offload, and hardware filtering.
108The MAC framework assumes none of these advanced features are present
109and allows device drivers to negotiate them through a capability system.
110Drivers can declare that they support various capabilities by
111implementing the optional
112.Xr mc_getcapab 9E
113entry point.
114Each capability has its associated entry points and structures to fill
115out.
116The capabilities are detailed in the
117.Sx CAPABILITIES
118section.
119.Pp
120The following sections describe the flow of a basic device driver.
121For advanced device drivers, the flow is generally the same.
122The primary distinction is in how frames are sent and received.
123.Ss Initializing MAC Support
124For a device to be used by the MAC framework, it must register with the
125framework and take specific actions during
126.Xr _init 9E ,
127.Xr attach 9E ,
128.Xr detach 9E ,
129and
130.Xr _fini 9E .
131.Pp
132All device drivers have to define a
133.Xr dev_ops 9S
134structure which is pointed to by a
135.Xr modldrv 9S
136structure and the corresponding NULL-terminated
137.Xr modlinkage 9S
138structure.
139The
140.Xr dev_ops 9S
141structure should have a
142.Xr cb_ops 9S
143structure defined for it; however, it does not need to implement any of
144the standard
145.Xr cb_ops 9S
146entry points unless it also exposes a custom set of device nodes not
147otherwise managed by the MAC framework.
148See the
149.Sx Custom Device Nodes
150section for more details.
151.Pp
152Normally, in a driver's
153.Xr _init 9E
154entry point, it passes its
155.Xr modlinkage 9S
156structure directly to
157.Xr mod_install 9F .
158To properly register with MAC, the driver must call
159.Xr mac_init_ops 9F
160before it calls
161.Xr mod_install 9F .
162If for some reason the
163.Xr mod_install 9F
function fails, then the driver must undo the earlier initialization by calling
165.Xr mac_fini_ops 9F .
166.Pp
167Conversely, in the driver's
168.Xr _fini 9E
169routine, it should call
170.Xr mac_fini_ops 9F
171after it successfully calls
172.Xr mod_remove 9F .
173For an example of how to use the
174.Xr mac_init_ops 9F
175and
176.Xr mac_fini_ops 9F
177functions, see the examples section in
178.Xr mac_init_ops 9F .
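.Pp
As a minimal sketch of the ordering described above, the module entry points
of a hypothetical driver named
.Dq example
might look like the following.
The
.Va example_dev_ops
and
.Va example_modlinkage
symbols are assumptions made for illustration and are presumed to be defined
elsewhere in the driver.
.Bd -literal -offset indent
#include <sys/modctl.h>
#include <sys/mac_provider.h>

/* Hypothetical; assumed to be defined elsewhere in the driver. */
extern struct dev_ops example_dev_ops;
extern struct modlinkage example_modlinkage;

int
_init(void)
{
	int ret;

	/* MAC must adjust the dev_ops before the module is installed. */
	mac_init_ops(&example_dev_ops, "example");
	if ((ret = mod_install(&example_modlinkage)) != 0) {
		/* Installation failed, so undo mac_init_ops(). */
		mac_fini_ops(&example_dev_ops);
	}

	return (ret);
}

int
_fini(void)
{
	int ret;

	/* Only tear down MAC's state once removal has succeeded. */
	if ((ret = mod_remove(&example_modlinkage)) == 0) {
		mac_fini_ops(&example_dev_ops);
	}

	return (ret);
}
.Ed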
179.Ss Custom Device Nodes
180A device may want to provide its own minor nodes as simple character or block
181devices backed by the usual
182.Xr cb_ops 9S
183routines.
184The MAC framework allows for this by leaving a portion of the minor
185number space available for private driver use.
186.Xr mac_private_minor 9F
187returns the first minor number a driver may use for its own purposes,
188e.g., to pass to
189.Xr ddi_create_minor_node 9F .
190.Pp
191A driver making use of this ability must provide its own
192.Xr getinfo 9E
193implementation that is aware of any such minor nodes.
194It must also delegate back to the MAC framework as appropriate via either
195calls to
196.Xr mac_getinfo 9F
197or
198.Xr mac_devt_to_instance 9F
199for MAC reserved minor nodes.
It must also take care not to disturb MAC reserved minors.
For example, it must not remove every minor node associated with a device
with a call such as:
.Bd -literal -offset indent
    ddi_remove_minor_node(dip, NULL);
.Ed
205.Ss Registering with MAC
206Every instance of a device should register separately with MAC.
207To register with MAC, a driver must allocate a
208.Xr mac_register 9S
209structure, fill it in, and then call
210.Xr mac_register 9F .
211The
212.Vt mac_register_t
213structure contains information about the device and all of the required
214function pointers that will be used as callbacks by the framework.
215.Pp
216These steps should all be taken during a device's
217.Xr attach 9E
218entry point.
219It is recommended that the driver perform this sequence of steps after the
220device has finished its initialization of the chipset and interrupts, though
221interrupts should not be enabled at that point.
222After it calls
223.Xr mac_register 9F
224it will start receiving callbacks from the MAC framework.
225.Pp
226To allocate the registration structure, the driver should call
227.Xr mac_alloc 9F .
228Device drivers should generally always pass the symbol
229.Dv MAC_VERSION
230as the argument to
231.Xr mac_alloc 9F .
232Upon successful completion, the driver will receive a
233.Vt mac_register_t
234structure which it should fill in.
235The structure and its members are documented in
236.Xr mac_register 9S .
237.Pp
238The
239.Xr mac_callbacks 9S
240structure is not allocated as a part of the
241.Xr mac_register 9S
242structure.
243In general, device drivers declare this statically.
244See the
245.Sx MAC Callbacks
246section for more information on how to fill it out.
247.Pp
248Once the structure has been filled in, the driver should call
249.Xr mac_register 9F
250to register itself with MAC.
251The handle that it uses to register with should be part of the driver's soft
252state.
253It will be used in various other support functions and callbacks.
254.Pp
255If the call is successful, then the device driver
256should enable interrupts and finish any other initialization required.
257If the call to
258.Xr mac_register 9F
259failed, then it should unwind its initialization and should return
260.Dv DDI_FAILURE
261from its
262.Xr attach 9E
263routine.
264.Pp
265The driver does not need to hold onto an allocated
266.Xr mac_register 9S
267structure after it has called the
268.Xr mac_register 9F
269function.
270Whether the
271.Xr mac_register 9F
272function returns successfully or not, the driver may free its
273.Xr mac_register 9S
274structure by calling the
275.Xr mac_free 9F
276function.
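.Pp
The registration sequence can be sketched as a helper that a driver might call
from its
.Xr attach 9E
routine.
This is an illustration under stated assumptions, not a complete
implementation: the
.Vt example_t
soft state, its members, the
.Va example_m_callbacks
structure, and the chosen SDU values are all hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ethernet.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/mac_provider.h>
#include <sys/mac_ether.h>

/* Hypothetical soft state; the member names are illustrative. */
typedef struct example {
	dev_info_t	*ex_dip;
	mac_handle_t	ex_mac_hdl;
	uint8_t		ex_mac_addr[ETHERADDRL];
} example_t;

/* Assumed to be declared statically elsewhere in the driver. */
extern mac_callbacks_t example_m_callbacks;

static int
example_register_mac(example_t *ex)
{
	mac_register_t *regp;
	int ret;

	if ((regp = mac_alloc(MAC_VERSION)) == NULL)
		return (DDI_FAILURE);

	regp->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
	regp->m_driver = ex;
	regp->m_dip = ex->ex_dip;
	regp->m_src_addr = ex->ex_mac_addr;
	regp->m_callbacks = &example_m_callbacks;
	regp->m_min_sdu = 0;
	regp->m_max_sdu = ETHERMTU;

	/* Callbacks may arrive as soon as mac_register() is called. */
	ret = mac_register(regp, &ex->ex_mac_hdl);

	/* The structure may be freed whether or not registration worked. */
	mac_free(regp);

	return (ret == 0 ? DDI_SUCCESS : DDI_FAILURE);
}
.Ed
.Pp
On success the caller would go on to enable interrupts; on failure it would
unwind its earlier initialization and return
.Dv DDI_FAILURE .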
277.Ss MAC Callbacks
278The MAC framework interacts with a device driver through a series of
279callbacks.
280These callbacks are described in their individual manual pages and the
281collection of callbacks is indicated in the
282.Xr mac_callbacks 9S
283manual page.
284This section does not focus on the specific functions, but rather on
285interactions between them and the rest of the device driver framework.
286.Pp
287A device driver should make no assumptions about when the various
288callbacks will be called and whether or not they will be called
289simultaneously.
290For example, a device driver may be asked to transmit data through a call to its
291.Xr mc_tx 9E
292entry point while it is being asked to get a device property through a
293call to its
294.Xr mc_getprop 9E
295entry point.
296As such, while some calls may be serialized to the device, such as setting
297properties, the device driver should always presume that all of its data needs
298to be protected with locks.
While the driver is holding locks, it is safe for it to call the following MAC
routines:
301.Bl -bullet -offset indent -compact
302.It
303.Xr mac_hcksum_get 9F
304.It
305.Xr mac_hcksum_set 9F
306.It
307.Xr mac_lso_get 9F
308.It
309.Xr mac_maxsdu_update 9F
310.It
311.Xr mac_prop_info_set_default_link_flowctrl 9F
312.It
313.Xr mac_prop_info_set_default_str 9F
314.It
315.Xr mac_prop_info_set_default_uint8 9F
316.It
317.Xr mac_prop_info_set_default_uint32 9F
318.It
319.Xr mac_prop_info_set_default_uint64 9F
320.It
321.Xr mac_prop_info_set_perm 9F
322.It
323.Xr mac_prop_info_set_range_uint32 9F
324.El
325.Pp
326Any other MAC related routines should not be called with locks held,
327such as
328.Xr mac_link_update 9F
329or
330.Xr mac_rx 9F .
331Other routines in the DDI may be called while locks are held; however,
332device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, even when it is
legal to do so, as this may cause all other callers that need a given
335lock to back up behind such an operation.
336.Ss Receiving Data
337A device driver will often receive data through the means of an
338interrupt or by being asked to poll for frames.
339When this occurs, zero or more frames, each with optional metadata, may
340be ready for the device driver to consume.
341Often each frame has a corresponding descriptor which has information about
342whether or not there were errors or whether or not the device successfully
343checksummed the packet.
344In addition to the per-packet flow described below, there are certain
345requirements that drivers must adhere to when programming the hardware
346to receive data.
347See the section
348.Sx RECEIVE DESCRIPTOR LAYOUT
349for more information.
350.Pp
351During a single interrupt or poll request, a device driver should process
352a fixed number of frames.
353For each frame the device driver should:
354.Bl -enum -offset indent
355.It
356Ensure that all of the DMA memory for the descriptor ring is synchronized with
357the
358.Xr ddi_dma_sync 9F
359function and check the handle for errors if the device driver has enabled DMA
360error reporting as part of the Fault Management Architecture (FMA).
361If the driver does not rely on DMA, then it may skip this step.
362It is recommended that this is performed once per interrupt or poll for
363the entire region and not on a per-packet basis.
364.It
365First check whether or not the frame has errors.
366If errors were detected, then the frame should not be sent to the operating
367system.
368It is recommended that devices keep kstats (see
369.Xr kstat_create 9F
370for more information) and bump the counter whenever such an error is
371detected.
372If the device distinguishes between the types of errors, then separate kstats
373for each class of error are recommended.
374See the
375.Sx STATISTICS
376section for more information on the various error cases that should be
377considered.
378.It
379Once the frame has been determined to be valid, the device driver should
380transform the frame into a
381.Xr mblk 9S .
382See the section
383.Sx MBLKS AND DMA
384for more information on how to transform and prepare a message block.
385.It
386If the device supports hardware checksumming (see the
387.Sx CAPABILITIES
388section for more information on checksumming), then the device driver
389should set the corresponding checksumming information with a call to
390.Xr mac_hcksum_set 9F .
391.It
392It should then append this new message block to the
393.Em end
394of the message block chain, linking it to the
395.Fa b_next
396pointer.
397It is vitally important that all the frames be chained in the order that they
398were received.
399If the device driver mistakenly reorders frames, then it may cause performance
400impacts in the TCP stack and potentially impact application correctness.
401.El
402.Pp
403Once all the frames have been processed and assembled, the device driver
404should deliver them to the rest of the operating system by calling
405.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures as possible to
the system at once.
408It
409.Em should not
410call
411.Xr mac_rx 9F
412once for every assembled mblk_t.
413.Pp
414The device driver must not hold any locks across the call to
415.Xr mac_rx 9F .
416When this function is called, received data will be pushed through the
417networking stack and some replies may be generated and given to the
418driver to send out.
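.Pp
The receive flow above might be sketched as follows.
The
.Vt example_t
structure, the
.Dv EXAMPLE_RX_LIMIT
budget, and the
.Fn example_rx_next
helper are hypothetical; the helper is assumed to synchronize DMA memory,
discard errored frames, and return each valid frame as a message block.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ksynch.h>
#include <sys/stream.h>
#include <sys/mac_provider.h>

#define	EXAMPLE_RX_LIMIT	256	/* hypothetical per-call budget */

/* Hypothetical soft state; the member names are illustrative. */
typedef struct example {
	kmutex_t	ex_rx_lock;
	mac_handle_t	ex_mac_hdl;
} example_t;

/*
 * Hypothetical helper: returns the next valid frame, or NULL when
 * the ring has no more completed descriptors.
 */
extern mblk_t *example_rx_next(example_t *);

static void
example_rx_intr(example_t *ex)
{
	mblk_t *head = NULL, *mp;
	mblk_t **tailp = &head;
	uint_t count = 0;

	mutex_enter(&ex->ex_rx_lock);
	while (count < EXAMPLE_RX_LIMIT &&
	    (mp = example_rx_next(ex)) != NULL) {
		/* Append at the tail so that arrival order is kept. */
		mp->b_next = NULL;
		*tailp = mp;
		tailp = &mp->b_next;
		count++;
	}
	mutex_exit(&ex->ex_rx_lock);

	/* One call for the whole chain, with no locks held. */
	if (head != NULL)
		mac_rx(ex->ex_mac_hdl, NULL, head);
}
.Ed
.Pp
Keeping a pointer to the tail's
.Fa b_next
member makes each append constant time while preserving the order in which
frames were received.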
419.Pp
420It is not the device driver's responsibility to determine whether or not
421the system can keep up with a driver's delivery rate of frames.
422The rest of the networking stack will handle issues related to keeping up
423appropriately and ensure that kernel memory is not exhausted by packets
424that are not being processed.
425.Pp
426If the device driver has negotiated the
427.Dv MAC_CAPAB_RINGS
428capability
429.Pq discussed in Xr mac_capab_rings 9E
430then it should call
431.Xr mac_rx_ring 9F
432and not
433.Xr mac_rx 9F .
434A given interrupt may correspond to more than one ring that needs to be
435checked.
436The set of rings is likely to span different groups that were registered
437with MAC through the
438.Xr mr_gget 9E
439interface.
440In those cases, the driver should follow the above procedure
441independently for each ring.
442That means it will call
443.Xr mac_rx_ring 9F
once for each ring using the handle that it received when MAC
445called the driver's
446.Xr mr_rget 9E
447entry point.
448When it is looking at the rings, the driver will need to make sure that
449the ring has not had interrupts disabled
450.Pq due to a pending change to polling mode .
451This is discussed in greater detail in the
452.Xr mac_capab_rings 9E
453and
454.Xr mri_poll 9E
455manual pages.
456.Pp
457Finally, the device driver should make sure that any other housekeeping
458activities required for the ring are taken care of such that more data
459can be received.
460.Ss Transmitting Data and Back Pressure
461A device driver will be asked to transmit a message block chain by
having its
463.Xr mc_tx 9E
464entry point called.
465While the driver is processing the message blocks, it may run out of resources.
466For example, a transmit descriptor ring may become full.
467At that point, the device driver should return the remaining unprocessed frames.
468The act of returning frames indicates that the device has asserted flow control.
469Once this has been done, no additional calls will be made to the
470driver's transmit entry point and the back pressure will be propagated
471throughout the rest of the networking stack.
472.Pp
Later, when resources have become available again, for example after an
interrupt indicating that some portion of the transmit ring has been sent,
the device driver must notify the system that it can continue transmission.
477To do this, the driver should call
478.Xr mac_tx_update 9F .
479After that point, the driver will receive calls to its
480.Xr mc_tx 9E
481entry point again.
482As mentioned in the section on callbacks, the device driver should avoid holding
483any particular locks across the call to
484.Xr mac_tx_update 9F .
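.Pp
The back pressure protocol can be sketched as follows.
The
.Vt example_t
structure and the
.Fn example_tx_one
helper are hypothetical, and the bookkeeping is deliberately simplified; a
real driver has to close the window between running out of descriptors and
recording that it is blocked.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ksynch.h>
#include <sys/stream.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state; the member names are illustrative. */
typedef struct example {
	kmutex_t	ex_tx_lock;
	mac_handle_t	ex_mac_hdl;
	boolean_t	ex_tx_blocked;
} example_t;

/*
 * Hypothetical helper: queues one frame to hardware and returns
 * B_TRUE, or returns B_FALSE if no descriptors are available.
 */
extern boolean_t example_tx_one(example_t *, mblk_t *);

static mblk_t *
example_m_tx(void *arg, mblk_t *chain)
{
	example_t *ex = arg;
	mblk_t *mp, *next;

	for (mp = chain; mp != NULL; mp = next) {
		next = mp->b_next;
		mp->b_next = NULL;

		if (!example_tx_one(ex, mp)) {
			/* Out of resources: relink and return the rest. */
			mp->b_next = next;
			mutex_enter(&ex->ex_tx_lock);
			ex->ex_tx_blocked = B_TRUE;
			mutex_exit(&ex->ex_tx_lock);
			return (mp);
		}
	}

	return (NULL);
}

/* Called once a completion interrupt has reclaimed descriptors. */
static void
example_tx_recycled(example_t *ex)
{
	boolean_t notify;

	mutex_enter(&ex->ex_tx_lock);
	notify = ex->ex_tx_blocked;
	ex->ex_tx_blocked = B_FALSE;
	mutex_exit(&ex->ex_tx_lock);

	/* Call into MAC only after the lock has been dropped. */
	if (notify)
		mac_tx_update(ex->ex_mac_hdl);
}
.Ed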
485.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
487important part of a well functioning device and may impact the
488performance of the device.
489Not all devices support interrupt coalescing.
490If interrupt coalescing is supported on the device, it is recommended that
491device driver writers provide private properties for their device to control the
492interrupt coalescing rate.
493This will make it much easier to perform experiments and observe the impact of
494different interrupt rates on the rest of the system.
495.Ss Polling
496Even with interrupt coalescing, when there is a certain incoming packet rate it
497can make more sense to just actively poll the device, asking for more packets
498rather than constantly taking an interrupt.
499When a device driver supports the
500.Xr mac_capab_rings 9E
501capability and therefore polling on receive rings, the MAC framework will ask
502the driver to disable interrupts, with its
503.Xr mi_disable 9E
504entry point, and then subsequently call its polling entry point,
505.Xr mri_poll 9E .
506.Pp
507As long as a device driver implements the needed entry points, then there is
508nothing else that it needs to do to take advantage of polling.
509A driver should not attempt to spin up its own threads, task queues, or
510creatively use timeouts, to try to simulate polling for received packets.
511.Ss MAC Address Filter Management
512The MAC framework will attempt to use as many MAC address filters as a
513device has.
514To program a multicast address filter, the driver's
515.Xr mc_multicst 9E
516entry point will be called.
517If the device driver runs out of filters, it should not take any special action
518and just return the appropriate error as documented in the corresponding manual
519pages for the entry points.
520The framework will ensure that the device is placed in promiscuous mode
521if it needs to.
522.Pp
523If the hardware supports more than one unicast filter then the device
524driver should consider implementing the
525.Dv MAC_CAPAB_RINGS
526capability, which exposes a means for multiple unicast MAC address filters to be
527used by the broader system.
528It is still useful to implement this on hardware which only has a single ring.
529See
530.Xr mac_capab_rings 9E
531for more information.
532.Ss Receive Side Scaling
533Receive side scaling is where a hardware device supports multiple,
534independent queues of frames that can be received.
535Each of these queues is generally associated with an independent
536interrupt and the hardware usually performs some form of hash across the
537queues.
Drivers for hardware which supports this should consider implementing the
.Dv MAC_CAPAB_RINGS
capability; see
.Xr mac_capab_rings 9E
for more information.
543.Ss Link Updates
544It is the responsibility of the device driver to keep track of the
545data link's state.
546Many devices provide a means of receiving an interrupt when the state of the
547link changes.
548When such a change happens, the driver should update its internal data
549structures and then call
550.Xr mac_link_update 9F
551to inform the MAC layer that this has occurred.
552If the device driver does not properly inform the system about link changes,
553then various features like link aggregations and other mechanisms that leverage
554the link state will not work correctly.
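.Pp
A link state change handler might be structured as in the following sketch.
The
.Vt example_t
structure and the
.Fn example_hw_link_up
helper are hypothetical.
The internal state is updated under the driver's lock, but the call to
.Xr mac_link_update 9F
is made only after the lock has been dropped.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ksynch.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state; the member names are illustrative. */
typedef struct example {
	kmutex_t	ex_lock;
	mac_handle_t	ex_mac_hdl;
	link_state_t	ex_link_state;
} example_t;

/* Hypothetical helper: B_TRUE when the hardware reports link up. */
extern boolean_t example_hw_link_up(example_t *);

static void
example_link_intr(example_t *ex)
{
	link_state_t state;
	boolean_t changed;

	mutex_enter(&ex->ex_lock);
	state = example_hw_link_up(ex) ? LINK_STATE_UP : LINK_STATE_DOWN;
	changed = (state != ex->ex_link_state);
	ex->ex_link_state = state;
	mutex_exit(&ex->ex_lock);

	/* mac_link_update() must not be called with locks held. */
	if (changed)
		mac_link_update(ex->ex_mac_hdl, state);
}
.Ed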
555.Ss Link Speed and Auto-negotiation
556Many networking devices support more than one possible speed that they
557can operate at.
558The selection of a speed is often performed through
559.Em auto-negotiation ,
560though some devices allow the user to control what speeds are advertised
561and used.
562.Pp
563Logically, there are two different sets of things that the device driver
564needs to keep track of while it's operating:
565.Bl -enum
566.It
567The supported speeds in hardware.
568.It
569The enabled speeds from the user.
570.El
571.Pp
572By default, when a link first comes up, the device driver should
573generally configure the link to support the common set of speeds and
574perform auto-negotiation.
575.Pp
576A user can control what speeds a device advertises via auto-negotiation
577and whether or not it performs auto-negotiation at all by using a series
578of properties that have
579.Sy _EN_
580in the name.
581These are read/write properties and there is one for each speed supported in the
582operating system.
583For a full list of them, see the
584.Sx PROPERTIES
585section.
586.Pp
587In addition to these properties, there is a corresponding set of
588properties with
589.Sy _ADV_
590in the name.
591These are similar to the
592.Sy _EN_
593family of properties, but they are read-only and indicate what the
594device has actually negotiated.
595While they are generally similar to the
596.Sy _EN_
597family of properties, they may change depending on power settings.
598See the
599.Sy Ethernet Link Properties
600section in
601.Xr dladm 8
602for more information.
603.Pp
604It's worth discussing how these different values get used throughout the
605different entry points.
606The first entry point to consider is the
607.Xr mc_propinfo 9E
608entry point.
For a given speed, the driver should check whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
614This holds for both the
615.Sy _EN_
616and
617.Sy _ADV_
618family of properties.
619.Pp
620The next entry point is
621.Xr mc_getprop 9E .
Here, the driver should first check whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
627.Pp
The last property entry point is
.Xr mc_setprop 9E .
631Here, the same logic applies.
632Before the driver considers whether or not the property is writable, it should
633first check whether or not it's a supported property.
634If it's not, then it should return
635.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable.
If it is and the new value is valid, then it should update the property and
638restart the link's negotiation.
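.Pp
The ordering of these checks might look like the following sketch for a single
speed property.
The
.Vt example_t
structure and the
.Fn example_restart_link
helper are hypothetical, and only
.Dv MAC_PROP_EN_1000FDX_CAP
is handled; a real driver would handle every speed that it supports.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/ksynch.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state; the member names are illustrative. */
typedef struct example {
	kmutex_t	ex_lock;
	boolean_t	ex_hw_1000fdx;	/* hardware supports the speed */
	uint8_t		ex_en_1000fdx;	/* value enabled by the user */
} example_t;

/* Hypothetical helper that restarts link negotiation. */
extern void example_restart_link(example_t *);

static int
example_m_setprop(void *arg, const char *name, mac_prop_id_t num,
    uint_t size, const void *val)
{
	example_t *ex = arg;
	uint8_t en;

	switch (num) {
	case MAC_PROP_EN_1000FDX_CAP:
		/* First, is the speed supported at all? */
		if (!ex->ex_hw_1000fdx)
			return (ENOTSUP);

		/* Next, validate the new value. */
		if (size < sizeof (uint8_t))
			return (EINVAL);
		en = *(const uint8_t *)val;
		if (en > 1)
			return (EINVAL);

		/* Finally, update the property and renegotiate. */
		mutex_enter(&ex->ex_lock);
		ex->ex_en_1000fdx = en;
		example_restart_link(ex);
		mutex_exit(&ex->ex_lock);
		return (0);
	default:
		return (ENOTSUP);
	}
}
.Ed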
639.Pp
640Finally, there is the
641.Xr mc_getstat 9E
642entry point.
643Several of the statistics that are queried relate to auto-negotiation and
644hardware capabilities.
645When a statistic relates to the hardware supporting a given speed, the
646.Sy _EN_
647properties should be ignored.
648The only thing that should be consulted is what the hardware itself supports.
649Otherwise, the statistics should look at what is currently being advertised by
650the device.
651.Ss Unregistering from MAC
652During a driver's
653.Xr detach 9E
654routine, it should unregister the device instance from MAC by calling
655.Xr mac_unregister 9F
with the handle that it obtained from
.Xr mac_register 9F .
657If the call to
658.Xr mac_unregister 9F
659failed, then the device is likely still in use and the driver should
660fail the call to
661.Xr detach 9E .
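.Pp
A minimal sketch of this check follows.
It assumes that the driver stored its hypothetical
.Vt example_t
soft state with
.Xr ddi_set_driver_private 9F
during
.Xr attach 9E ;
the
.Fn example_teardown
helper is likewise hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state; the member names are illustrative. */
typedef struct example {
	mac_handle_t	ex_mac_hdl;
} example_t;

/* Hypothetical: releases interrupts, DMA memory, and soft state. */
extern void example_teardown(example_t *);

static int
example_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	example_t *ex;

	if (cmd != DDI_DETACH)
		return (DDI_FAILURE);

	ex = ddi_get_driver_private(dip);
	if (ex == NULL)
		return (DDI_FAILURE);

	/* A failure here likely means the device is still in use. */
	if (mac_unregister(ex->ex_mac_hdl) != 0)
		return (DDI_FAILURE);

	example_teardown(ex);
	return (DDI_SUCCESS);
}
.Ed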
662.Ss Interacting with Devices
663Administrators always interact with devices through the
664.Xr dladm 8
665command line interface.
666The state of devices such as whether the link is considered up or down,
667various link properties such as the MTU, auto-negotiation state, and
668flow control state, are all exposed.
669It is also the preferred way that these properties are set and configured.
670.Pp
671While device tunables may be presented in a
672.Xr driver.conf 5
673file, it is recommended instead to expose such things through
674.Xr dladm 8
675private properties, whether explicitly documented or not.
676.Sh CAPABILITIES
Capabilities in the MAC framework are optional hardware features that a
device driver can declare support for.
The two stable capabilities that the system currently supports are large send
offload (LSO), often also known as TCP segmentation offload, and the ability
for hardware to calculate and verify the checksums present in IPv4, IPv6, and
protocol headers such as TCP and UDP.
684.Pp
685The MAC framework will query a device for support of a capability
686through the
687.Xr mc_getcapab 9E
688function.
689Each capability has its own constant and may have corresponding data that goes
690along with it and a specific structure that the device is required to fill in.
691Note, the set of capabilities changes over time and there are also private
692capabilities in the system.
693Several of the capabilities are used in the implementation of the MAC framework.
694Others, like
695.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized, and thus both API and binary
697compatibility for them is not guaranteed.
698It is important that the device driver handles unknown capabilities correctly.
699For more information, see
700.Xr mc_getcapab 9E .
701.Pp
702The following capabilities are
703stable and defined in the system:
704.Ss Dv MAC_CAPAB_HCKSUM
705The
706.Dv MAC_CAPAB_HCKSUM
707capability indicates to the system that the device driver supports some
708amount of checksumming.
709The specific data for this capability is a pointer to a
710.Vt uint32_t .
711To indicate no support for any kind of checksumming, the driver should
712either set this value to zero or simply return that it doesn't support
713the capability.
714.Pp
715Note, the values that the driver declares in this capability indicate
716what it can do when it transmits data.
717If the driver can only verify checksums when receiving data, then it should not
718indicate that it supports this capability.
719The following set of flags may be combined through a bitwise inclusive OR:
720.Bl -tag -width Ds
721.It Dv HCKSUM_INET_PARTIAL
722This indicates that the hardware can calculate a partial checksum for
723both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
724checksum be calculated for it.
725The pseudo-header checksum will be available for the mblk_t when calling
726.Xr mac_hcksum_get 9F .
727Note this does not imply that the hardware is capable of calculating
728the partial checksum for other L4 protocols or the IPv4 header checksum.
729That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
731.It Dv HCKSUM_INET_FULL_V4
732This indicates that the hardware will fully calculate the L4 checksum for
733outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
734checksum.
735Note this does not imply that the hardware is capable of calculating the
736checksum for other L4 protocols or the IPv4 header checksum.
737That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
739.It Dv HCKSUM_INET_FULL_V6
740This indicates that the hardware will fully calculate the L4 checksum for
741outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
742checksum.
743Note this does not imply that the hardware is capable of calculating the
744checksum for any other L4 protocols.
745.It Dv HCKSUM_IPHDRCKSUM
746This indicates that the hardware supports calculating the checksum for
747the IPv4 header itself.
748.El
749.Pp
750When in a driver's transmit function, the driver will be processing a
751single frame.
752It should call
753.Xr mac_hcksum_get 9F
754to see what checksum flags are set on it.
755Note that the flags that are set on it are different from the ones described
756above and are documented in its manual page.
757These flags indicate how the driver is expected to program the hardware and what
758checksumming is required.
Not every frame will require hardware checksumming or ask the hardware to
checksum it.
761.Pp
762If a driver supports offloading the receive checksum and verification,
763it should check to see what the hardware indicated was verified.
764The driver should then call
765.Xr mac_hcksum_set 9F .
766The flags used are different from the ones above and are discussed in
767detail in the
768.Xr mac_hcksum_set 9F
769manual page.
770If there is no checksum information available or the driver does not support
771checksumming, then it should simply not call
772.Xr mac_hcksum_set 9F .
773.Pp
774Note that the checksum flags should be set on the first
775mblk_t that makes up a given message.
776In other words, if multiple mblk_t structures are linked together by the
777.Fa b_cont
778member to describe a single frame, then it should only be called on the
779first mblk_t of that set.
780However, each distinct message should have the checksum bits set on it, if
781applicable.
782In other words, each mblk_t that is linked together by the
783.Fa b_next
784pointer may have checksum flags set.
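.Pp
On the receive side, this might be sketched as the following helper, which is
called once for the first mblk_t of each frame.
The
.Vt example_t
structure and the two boolean arguments, which stand in for whatever the
hardware's receive descriptor reports, are hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/stream.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state; the member name is illustrative. */
typedef struct example {
	boolean_t	ex_rx_cksum_enable;	/* from a property */
} example_t;

static void
example_rx_cksum(example_t *ex, mblk_t *mp, boolean_t ip_ok,
    boolean_t l4_ok)
{
	uint32_t flags = 0;

	if (!ex->ex_rx_cksum_enable)
		return;

	if (ip_ok)
		flags |= HCK_IPV4_HDRCKSUM_OK;
	if (l4_ok)
		flags |= HCK_FULLCKSUM_OK;

	/* With nothing verified, simply do not call mac_hcksum_set(). */
	if (flags != 0)
		mac_hcksum_set(mp, 0, 0, 0, 0, flags);
}
.Ed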
785.Pp
786It is recommended that device drivers provide a private property or
787.Xr driver.conf 5
788property to control whether or not checksumming is enabled for both rx
789and tx; however, the default disposition is recommended to be enabled
790for both.
This way, if hardware bugs are found in the checksumming implementation,
checksum offload can be disabled without requiring software updates.
793The transmit property should be checked when determining how to reply to
794.Xr mc_getcapab 9E
795and the receive property should be checked in the context of the receive
796function.
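.Pp
A
.Xr mc_getcapab 9E
implementation that honors such a transmit property might be sketched as
follows.
The
.Vt example_t
structure is hypothetical, and the flags shown are an assumption about what a
given device can do rather than a recommendation.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state; the member name is illustrative. */
typedef struct example {
	boolean_t	ex_tx_cksum_enable;	/* from a property */
} example_t;

static boolean_t
example_m_getcapab(void *arg, mac_capab_t capab, void *cap_data)
{
	example_t *ex = arg;

	switch (capab) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *txflags = cap_data;

		if (!ex->ex_tx_cksum_enable)
			return (B_FALSE);

		/* Assumed hardware abilities, for illustration only. */
		*txflags = HCKSUM_INET_PARTIAL | HCKSUM_IPHDRCKSUM;
		return (B_TRUE);
	}
	default:
		/* Unknown or unsupported capabilities are declined. */
		return (B_FALSE);
	}
}
.Ed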
797.Ss Dv MAC_CAPAB_LSO
798The
799.Dv MAC_CAPAB_LSO
800capability indicates that the driver supports various forms of large
801send offload (LSO).
802The private data is a pointer to a
.Vt mac_capab_lso_t
804structure.
805The system currently supports offloading TCP packets over both IPv4 and
806IPv6.
807This structure has the following members which are used to indicate
808various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
.Ed
814.Pp
815The
816.Fa lso_flags
817member is used to indicate which members are valid and should be
818considered.
819Each flag represents a different form of LSO.
820The member should be set to the bitwise inclusive OR of the following values:
821.Bl -tag -width Dv -offset indent
822.It Dv LSO_TX_BASIC_TCP_IPV4
823This indicates hardware support for performing TCP segmentation
824offloading over IPv4.
825When this flag is set, the
826.Fa lso_basic_tcp_ipv4
827member must be filled in.
828.It Dv LSO_TX_BASIC_TCP_IPV6
829This indicates hardware support for performing TCP segmentation
830offloading over IPv6.
831The IPv6 packet will have no extension headers present.
832When this flag is set, the
833.Fa lso_basic_tcp_ipv6
834member must be filled in.
835.El
836.Pp
837The
838.Fa lso_basic_tcp_ipv4
839member is a structure with the following members:
840.Bd -literal -offset indent
t_uscalar_t	lso_max;
842.Ed
843.Bd -filled -offset indent
844The
845.Fa lso_max
846member should be set to the maximum size of the TCP data
847payload that can be offloaded to the hardware.
848.Ed
849.Pp
850The
851.Fa lso_basic_tcp_ipv6
852member is a structure with the following members:
853.Bd -literal -offset indent
t_uscalar_t	lso_max;
855.Ed
856.Bd -filled -offset indent
857The
858.Fa lso_max
859member should be set to the maximum size of the TCP data
860payload that can be offloaded to the hardware.
861.Ed
862.Pp
As with checksumming, it is recommended that driver writers provide a
means of disabling LSO support, even if it is enabled by default.
This way, if issues with LSO are discovered, they may be worked around
without requiring a driver update.
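.Pp
The following is a minimal sketch, not a definitive implementation, of how
a driver might fill in the
.Ft mac_capab_lso_t
structure while honoring a tunable that disables LSO.
The
.Vt example_t
structure, its members, and the
.Fn example_capab_lso
function are hypothetical.
The type definitions and flag values at the top are stand-ins so that the
sketch compiles outside of the kernel; a real driver obtains them from
.In sys/mac_provider.h .

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Stand-in definitions so that this sketch compiles outside the kernel.
 * The flag values are placeholders and are not copied from the header.
 */
typedef uint32_t t_uscalar_t;
#define	LSO_TX_BASIC_TCP_IPV4	0x01
#define	LSO_TX_BASIC_TCP_IPV6	0x02

typedef struct { t_uscalar_t lso_max; } lso_basic_tcp_ipv4_t;
typedef struct { t_uscalar_t lso_max; } lso_basic_tcp_ipv6_t;
typedef struct {
	t_uscalar_t		lso_flags;
	lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
	lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
} mac_capab_lso_t;

/* Hypothetical per-instance driver state. */
typedef struct {
	bool		ex_lso_enable;	/* private or driver.conf property */
	bool		ex_hw_lso_v6;	/* hardware segments TCP over IPv6 */
	t_uscalar_t	ex_hw_lso_max;	/* largest TCP payload accepted */
} example_t;

/*
 * Fill in the LSO capability.
 * Returns true if the capability is offered and false otherwise.
 */
static bool
example_capab_lso(const example_t *ex, mac_capab_lso_t *lso)
{
	/* The administrator has turned LSO off: do not advertise it. */
	if (!ex->ex_lso_enable)
		return (false);

	memset(lso, 0, sizeof (*lso));
	lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
	lso->lso_basic_tcp_ipv4.lso_max = ex->ex_hw_lso_max;

	/* Only set the IPv6 flag when its member is also filled in. */
	if (ex->ex_hw_lso_v6) {
		lso->lso_flags |= LSO_TX_BASIC_TCP_IPV6;
		lso->lso_basic_tcp_ipv6.lso_max = ex->ex_hw_lso_max;
	}
	return (true);
}
```

A driver would call a function like this from the
.Dv MAC_CAPAB_LSO
case of its
.Xr mc_getcapab 9E
entry point.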
867.Sh EVOLVING CAPABILITIES
868The following capabilities are still evolving in the operating system.
869They are documented such that device driver writers may experiment with
870them.
871However, if such drivers are not present inside the core operating
872system repository, they may be subject to API and ABI breakage.
873.Ss Dv MAC_CAPAB_RINGS
874The
875.Dv MAC_CAPAB_RINGS
876capability is very important for implementing a high-performing device
877driver.
878Networking hardware structures the queues of packets to be sent
879and received into a ring.
880Each entry in this ring has a descriptor, which describes the address
881and options for a packet which is going to
882be transmitted or received.
883While simple networking devices only have a single ring, most high-speed
884networking devices have support for many rings.
885.Pp
886Rings are used for two important purposes.
887The first is receive side scaling (RSS), which is the ability to have
888the hardware hash the contents of a packet based on some of the protocol
889headers, and send it to one of several rings.
890These different rings may each have their own interrupt associated with
891them, allowing the card to receive traffic in parallel.
892Similar logic can be performed when sending traffic, to leverage
893multiple hardware resources, thus increasing capacity.
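.Pp
To make the idea of hashing a packet to a ring concrete, the following toy
sketch maps a flow to a ring index in software.
It is only an illustration: the hashing is done by the hardware, real
devices commonly use a different hash function (often a Toeplitz hash
combined with an indirection table), and the
.Vt example_flow_t
type and both functions are hypothetical.
The hash used here is FNV-1a, chosen only because it is short.

```c
#include <stdint.h>

/* Hypothetical flow description built from protocol headers. */
typedef struct {
	uint32_t	ef_src_ip;
	uint32_t	ef_dst_ip;
	uint16_t	ef_src_port;
	uint16_t	ef_dst_port;
} example_flow_t;

/* Mix one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t
example_mix(uint32_t hash, uint32_t word)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (word >> (i * 8)) & 0xff;
		hash *= 16777619u;
	}
	return (hash);
}

/*
 * Pick a ring for a flow.
 * Packets of the same flow always land on the same ring, which keeps
 * them in order. The caller must pass a non-zero ring count.
 */
static uint32_t
example_flow_to_ring(const example_flow_t *flow, uint32_t nrings)
{
	uint32_t hash = 2166136261u;

	hash = example_mix(hash, flow->ef_src_ip);
	hash = example_mix(hash, flow->ef_dst_ip);
	hash = example_mix(hash, ((uint32_t)flow->ef_src_port << 16) |
	    flow->ef_dst_port);
	return (hash % nrings);
}
```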
894.Pp
895The second use of rings is to group them together and apply filtering
896rules.
897For example, if a packet matches a specific VLAN or MAC address,
898then it can be sent to a specific ring or a specific group of rings.
This is especially useful when there are multiple different virtual NICs
or zones in play, as the operating system will be able to use the
hardware classification features to already know where a given packet
needs to be delivered internally rather than having to determine that
for each packet.
904.Pp
905From the MAC framework's perspective, a driver can have one or more
906groups.
907A group consists of the following:
.Bl -bullet -offset indent
909.It
910One or more hardware rings.
911.It
912One or more MAC address or VLAN filters.
913.El
914.Pp
915The details around how a device driver changes when rings are employed,
916the data structures that a driver must implement, and more are available
917in
918.Xr mac_capab_rings 9E .
919.Ss Dv MAC_CAPAB_TRANSCEIVER
920Many networking devices leverage external transceivers that adhere to
921standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
923The
924.Dv MAC_CAPAB_TRANSCEIVER
925capability provides a means of discovering the number of transceivers,
926their types, and reading the data from a transceiver.
927This allows administrators and users to determine if devices are
928present, if the hardware can use them, and in many cases, detailed
929information about the device ranging from its manufacturer and
930serial numbers to specific information about its health.
931Implementing this capability will lead to the operating system being
932able to discover and display transceivers as part of its fault
933management topology.
934.Pp
935See
936.Xr mac_capab_transceiver 9E
937for more details on the capability structure and the various function
938entry points that come along with it.
939.Ss Dv MAC_CAPAB_LED
940The
941.Dv MAC_CAPAB_LED
942capability provides a means to access and control the LEDs on a network
943interface card.
944This is then made available to the broader operating system and consumed
945by facilities such as the Fault Management Architecture.
946See
947.Xr mac_capab_led 9E
948for more details on the structure and requirements of the capability.
949.Sh PROPERTIES
950Properties in the MAC framework represent aspects of a link.
951These include things like the link's current state and MTU.
952Many of the properties in the system are focused around auto-negotiation and
953controlling what link speeds are advertised.
954Information about properties is covered by three different device entry points.
955The
956.Xr mc_propinfo 9E
957entry point obtains metadata about the property.
958The
959.Xr mc_getprop 9E
960entry point obtains the property.
961The
962.Xr mc_setprop 9E
963entry point updates the property to a new value.
964.Pp
965Many of the properties listed below are read-only.
966Each property indicates whether it's read-only or it's read/write.
967However, driver writers may not implement the ability to set all writable
968properties.
969Many of these depend on the card itself.
970In particular, all properties that relate to auto-negotiation and are read/write
971may not be updated if the hardware in question does not support toggling what
972link speeds are auto-negotiated.
973While copper Ethernet often does not have this restriction, it often exists with
974various fiber standards and phys.
975.Pp
976The following properties are the subset of MAC framework properties that
977driver writers should be aware of and handle.
978While other properties exist in the system, driver writers should always return
979an error when a property not listed below is encountered.
980See
981.Xr mc_getprop 9E
982and
983.Xr mc_setprop 9E
984for more information on how to handle them.
985.Bl -hang -width Ds
986.It Dv MAC_PROP_DUPLEX
987.Bd -filled -compact
988Type:
989.Vt link_duplex_t |
990Permissions:
991.Sy Read-Only
992.Ed
993.Pp
994The
995.Dv MAC_PROP_DUPLEX
property indicates the duplex mode of the link.
A full-duplex link may have traffic flowing in both directions at the same time.
998The
999.Vt link_duplex_t
1000is an enumeration which may be set to any of the following values:
1001.Bl -tag -width Ds
1002.It Dv LINK_DUPLEX_UNKNOWN
1003The current state of the link is unknown.
1004This may be because the link has not negotiated to a specific speed or it is
1005down.
1006.It Dv LINK_DUPLEX_HALF
1007The link is running at half duplex.
1008Communication may travel in only one direction on the link at a given time.
1009.It Dv LINK_DUPLEX_FULL
1010The link is running at full duplex.
1011Communication may travel in both directions on the link simultaneously.
1012.El
1013.It Dv MAC_PROP_SPEED
1014.Bd -filled -compact
1015Type:
1016.Vt uint64_t |
1017Permissions:
1018.Sy Read-Only
1019.Ed
1020.Pp
1021The
1022.Dv MAC_PROP_SPEED
1023property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
1025A link that is running at 40 Gbit/s would store the value 40000000000ULL.
1026.It Dv MAC_PROP_STATUS
1027.Bd -filled -compact
1028Type:
1029.Vt link_state_t |
1030Permissions:
1031.Sy Read-Only
1032.Ed
1033.Pp
1034The
1035.Dv MAC_PROP_STATUS
1036property is used to indicate the current state of the link.
1037It indicates whether the link is up or down.
1038The
1039.Vt link_state_t
1040is an enumeration which may be set to any of the following values:
1041.Bl -tag -width Ds
1042.It Dv LINK_STATE_UNKNOWN
1043The current state of the link is unknown.
1044This may be because the driver's
1045.Xr mc_start 9E
entry point has not been called, so it has not attempted to start the link.
1047.It Dv LINK_STATE_DOWN
1048The link is down.
1049This may be because of a negotiation problem, a cable problem, or some other
1050device specific issue.
1051.It Dv LINK_STATE_UP
1052The link is up.
1053If auto-negotiation is in use, it should have completed.
1054Traffic should be able to flow over the link, barring other issues.
1055.El
1056.It Dv MAC_PROP_AUTONEG
1057.Bd -filled -compact
1058Type:
1059.Vt uint8_t |
1060Permissions:
1061.Sy Read/Write
1062.Ed
1063.Pp
1064The
1065.Dv MAC_PROP_AUTONEG
1066property indicates whether or not the device is currently configured to
1067perform auto-negotiation.
1068A value of
1069.Sy 0
1070indicates that auto-negotiation is disabled.
1071A
1072.Sy non-zero
1073value indicates that auto-negotiation is enabled.
1074Devices should generally default to enabling auto-negotiation.
1075.Pp
1076When getting this property, the device driver should return the current
1077state.
1078When setting this property, if the device supports operating in the requested
1079mode, then the device driver should reset the link to negotiate to the new speed
1080after updating any internal registers.
1081.It Dv MAC_PROP_MTU
1082.Bd -filled -compact
1083Type:
1084.Vt uint32_t |
1085Permissions:
1086.Sy Read/Write
1087.Ed
1088.Pp
1089The
1090.Dv MAC_PROP_MTU
1091property determines the maximum transmission unit (MTU).
1092This indicates the maximum size packet that the device can transmit, ignoring
1093its own headers.
1094For an Ethernet device, this would exclude the size of the Ethernet header and
1095any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, when added
to its margin and header sizes, does not exceed its maximum frame size.
1098.Pp
1099By default, drivers for Ethernet should initialize this value and the
1100MTU to
1101.Sy 1500 .
1102When getting this property, the driver should return its current
1103recorded MTU.
1104When setting this property, the driver should first validate that it is within
1105the device's valid range and then it must call
1106.Xr mac_maxsdu_update 9F .
1107Note that the call may fail.
1108If the call completes successfully, the driver should update the hardware with
1109the new value of the MTU and perform any other work needed to handle it.
1110.Pp
1111If the device does not support changing the MTU after the device's
1112.Xr mc_start 9E
1113entry point has been called, then driver writers should return
1114.Er EBUSY .
1115.It Dv MAC_PROP_FLOWCTRL
1116.Bd -filled -compact
1117Type:
1118.Vt link_flowctrl_t |
1119Permissions:
1120.Sy Read/Write
1121.Ed
1122.Pp
1123The
1124.Dv MAC_PROP_FLOWCTRL
1125property manages the configuration of pause frames as part of Ethernet
1126flow control.
1127Note, this only describes what this device will advertise.
1128What is actually enabled may be different and is subject to the rules of
1129auto-negotiation.
1130The
1131.Vt link_flowctrl_t
1132is an enumeration that may be set to one of the following values:
1133.Bl -tag -width Ds
1134.It Dv LINK_FLOWCTRL_NONE
1135Flow control is disabled.
1136No pause frames should be generated or honored.
1137.It Dv LINK_FLOWCTRL_RX
1138The device can receive pause frames; however, it should not generate
1139them.
1140.It Dv LINK_FLOWCTRL_TX
1141The device can generate pause frames; however, it does not support
1142receiving them.
1143.It Dv LINK_FLOWCTRL_BI
1144The device supports both sending and receiving pause frames.
1145.El
1146.Pp
1147When getting this property, the device driver should return the way that
1148it has configured the device, not what the device has actually
1149negotiated.
1150When setting the property, it should update the hardware and allow the link to
1151potentially perform auto-negotiation again.
1152.It Dv MAC_PROP_EN_FEC_CAP
1153.Bd -filled -compact
1154Type:
1155.Vt link_fec_t |
1156Permissions:
1157.Sy Read/Write
1158.Ed
1159.Pp
1160The
1161.Dv MAC_PROP_EN_FEC_CAP
1162property indicates which Forward Error Correction (FEC) code is advertised
1163by the device.
1164.Pp
1165The
1166.Vt link_fec_t
1167is an enumeration that may be a combination of the following bit values:
1168.Bl -tag -width Ds
1169.It Dv LINK_FEC_NONE
1170No FEC over the link.
1171.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
1173.Dv LINK_FEC_AUTO
1174cannot be set along with any of the other values.
1175This is the default setting the device driver should use.
1176.It Dv LINK_FEC_RS
1177The link may use Reed-Solomon FEC coding.
1178.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
1180.El
1181.Pp
When setting the property, the device driver should update the hardware with
the requested coding or combination of codings.
1184If a particular combination of codings is not supported by the hardware,
1185the device driver should return
1186.Er EINVAL .
1187When retrieving this property, the device driver should return the current
1188value of the property.
1189.It Dv MAC_PROP_ADV_FEC_CAP
1190.Bd -filled -compact
1191Type:
1192.Vt link_fec_t |
1193Permissions:
1194.Sy Read-Only
1195.Ed
1196.Pp
1197The
1198.Dv MAC_PROP_ADV_FEC_CAP
property has the same values as
1200.Dv MAC_PROP_EN_FEC_CAP .
1201The property indicates which Forward Error Correction (FEC) code has been
1202negotiated over the link.
1203.El
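.Pp
The ordering described for setting
.Dv MAC_PROP_MTU
can be sketched as follows.
This is an illustration under stated assumptions rather than a definitive
implementation.
The
.Vt example_t
structure, the MTU limits, and both functions are hypothetical;
.Fn example_maxsdu_update
is a stand-in for
.Xr mac_maxsdu_update 9F
so that the sketch builds outside of the kernel.
Returning
.Er EINVAL
for an out-of-range value is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for an imaginary device. */
#define	EX_MTU_MIN	68
#define	EX_MTU_MAX	9000

typedef struct {
	uint32_t	ex_mtu;		/* currently recorded MTU */
	bool		ex_started;	/* mc_start(9E) has been called */
	bool		ex_mtu_fixed;	/* cannot change MTU once started */
} example_t;

/*
 * Stand-in for mac_maxsdu_update(9F).
 * The real function takes the mac handle of the driver and may fail.
 */
static int
example_maxsdu_update(example_t *ex, uint32_t mtu)
{
	(void) ex;
	(void) mtu;
	return (0);
}

static int
example_set_mtu(example_t *ex, uint32_t mtu)
{
	int ret;

	/* Validate against the range the device supports. */
	if (mtu < EX_MTU_MIN || mtu > EX_MTU_MAX)
		return (EINVAL);

	/* Some hardware cannot change the MTU after it has started. */
	if (ex->ex_started && ex->ex_mtu_fixed)
		return (EBUSY);

	if (mtu == ex->ex_mtu)
		return (0);

	/* Tell the framework first and only touch hardware on success. */
	if ((ret = example_maxsdu_update(ex, mtu)) != 0)
		return (ret);

	ex->ex_mtu = mtu;
	/* A real driver would reprogram the hardware here. */
	return (0);
}
```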
1204.Pp
1205The remaining properties are all about various auto-negotiation link
1206speeds.
1207They fall into two different buckets: properties with
1208.Sy _ADV_
1209in the name and properties with
1210.Sy _EN_
1211in the name.
1212For any given supported speed, there is one of each.
1213The
1214.Sy _EN_
1215set of properties are read/write properties that control what should be
1216advertised by the device.
1217When these are retrieved, they should return the current value of the property.
1218When they are set, they should change how the hardware advertises the specific
1219speed and trigger any kind of link reset and auto-negotiation, if enabled, to
1220occur.
1221.Pp
1222The
1223.Sy _ADV_
1224set of properties are read-only properties.
1225They are meant to reflect what has actually been negotiated.
1226These may be different from the
1227.Sy _EN_
1228family of properties, especially when different power management
1229settings are at play.
1230.Pp
1231See the
1232.Sx Link Speed and Auto-negotiation
1233section for more information.
1234.Pp
1235The properties are ordered in increasing link speed:
1236.Bl -hang -width Ds
1237.It Dv MAC_PROP_ADV_10HDX_CAP
1238.Bd -filled -compact
1239Type:
1240.Vt uint8_t |
1241Permissions:
1242.Sy Read-Only
1243.Ed
1244.Pp
1245The
1246.Dv MAC_PROP_ADV_10HDX_CAP
1247property describes whether or not 10 Mbit/s half-duplex support is
1248advertised.
1249.It Dv MAC_PROP_EN_10HDX_CAP
1250.Bd -filled -compact
1251Type:
1252.Vt uint8_t |
1253Permissions:
1254.Sy Read/Write
1255.Ed
1256.Pp
1257The
1258.Dv MAC_PROP_EN_10HDX_CAP
1259property describes whether or not 10 Mbit/s half-duplex support is
1260enabled.
1261.It Dv MAC_PROP_ADV_10FDX_CAP
1262.Bd -filled -compact
1263Type:
1264.Vt uint8_t |
1265Permissions:
1266.Sy Read-Only
1267.Ed
1268.Pp
1269The
1270.Dv MAC_PROP_ADV_10FDX_CAP
1271property describes whether or not 10 Mbit/s full-duplex support is
1272advertised.
1273.It Dv MAC_PROP_EN_10FDX_CAP
1274.Bd -filled -compact
1275Type:
1276.Vt uint8_t |
1277Permissions:
1278.Sy Read/Write
1279.Ed
1280.Pp
1281The
1282.Dv MAC_PROP_EN_10FDX_CAP
1283property describes whether or not 10 Mbit/s full-duplex support is
1284enabled.
1285.It Dv MAC_PROP_ADV_100HDX_CAP
1286.Bd -filled -compact
1287Type:
1288.Vt uint8_t |
1289Permissions:
1290.Sy Read-Only
1291.Ed
1292.Pp
1293The
1294.Dv MAC_PROP_ADV_100HDX_CAP
1295property describes whether or not 100 Mbit/s half-duplex support is
1296advertised.
1297.It Dv MAC_PROP_EN_100HDX_CAP
1298.Bd -filled -compact
1299Type:
1300.Vt uint8_t |
1301Permissions:
1302.Sy Read/Write
1303.Ed
1304.Pp
1305The
1306.Dv MAC_PROP_EN_100HDX_CAP
1307property describes whether or not 100 Mbit/s half-duplex support is
1308enabled.
1309.It Dv MAC_PROP_ADV_100FDX_CAP
1310.Bd -filled -compact
1311Type:
1312.Vt uint8_t |
1313Permissions:
1314.Sy Read-Only
1315.Ed
1316.Pp
1317The
1318.Dv MAC_PROP_ADV_100FDX_CAP
1319property describes whether or not 100 Mbit/s full-duplex support is
1320advertised.
1321.It Dv MAC_PROP_EN_100FDX_CAP
1322.Bd -filled -compact
1323Type:
1324.Vt uint8_t |
1325Permissions:
1326.Sy Read/Write
1327.Ed
1328.Pp
1329The
1330.Dv MAC_PROP_EN_100FDX_CAP
1331property describes whether or not 100 Mbit/s full-duplex support is
1332enabled.
1333.It Dv MAC_PROP_ADV_100T4_CAP
1334.Bd -filled -compact
1335Type:
1336.Vt uint8_t |
1337Permissions:
1338.Sy Read-Only
1339.Ed
1340.Pp
1341The
1342.Dv MAC_PROP_ADV_100T4_CAP
1343property describes whether or not 100 Mbit/s Ethernet using the
1344100BASE-T4 standard is
1345advertised.
1346.It Dv MAC_PROP_EN_100T4_CAP
1347.Bd -filled -compact
1348Type:
1349.Vt uint8_t |
1350Permissions:
1351.Sy Read/Write
1352.Ed
1353.Pp
1354The
.Dv MAC_PROP_EN_100T4_CAP
1356property describes whether or not 100 Mbit/s Ethernet using the
1357100BASE-T4 standard is
1358enabled.
.It Dv MAC_PROP_ADV_1000HDX_CAP
1360.Bd -filled -compact
1361Type:
1362.Vt uint8_t |
1363Permissions:
1364.Sy Read-Only
1365.Ed
1366.Pp
1367The
1368.Dv MAC_PROP_ADV_1000HDX_CAP
1369property describes whether or not 1 Gbit/s half-duplex support is
1370advertised.
1371.It Dv MAC_PROP_EN_1000HDX_CAP
1372.Bd -filled -compact
1373Type:
1374.Vt uint8_t |
1375Permissions:
1376.Sy Read/Write
1377.Ed
1378.Pp
1379The
1380.Dv MAC_PROP_EN_1000HDX_CAP
1381property describes whether or not 1 Gbit/s half-duplex support is
1382enabled.
1383.It Dv MAC_PROP_ADV_1000FDX_CAP
1384.Bd -filled -compact
1385Type:
1386.Vt uint8_t |
1387Permissions:
1388.Sy Read-Only
1389.Ed
1390.Pp
1391The
1392.Dv MAC_PROP_ADV_1000FDX_CAP
1393property describes whether or not 1 Gbit/s full-duplex support is
1394advertised.
1395.It Dv MAC_PROP_EN_1000FDX_CAP
1396.Bd -filled -compact
1397Type:
1398.Vt uint8_t |
1399Permissions:
1400.Sy Read/Write
1401.Ed
1402.Pp
1403The
1404.Dv MAC_PROP_EN_1000FDX_CAP
1405property describes whether or not 1 Gbit/s full-duplex support is
1406enabled.
1407.It Dv MAC_PROP_ADV_2500FDX_CAP
1408.Bd -filled -compact
1409Type:
1410.Vt uint8_t |
1411Permissions:
1412.Sy Read-Only
1413.Ed
1414.Pp
1415The
1416.Dv MAC_PROP_ADV_2500FDX_CAP
1417property describes whether or not 2.5 Gbit/s full-duplex support is
1418advertised.
1419.It Dv MAC_PROP_EN_2500FDX_CAP
1420.Bd -filled -compact
1421Type:
1422.Vt uint8_t |
1423Permissions:
1424.Sy Read/Write
1425.Ed
1426.Pp
1427The
1428.Dv MAC_PROP_EN_2500FDX_CAP
1429property describes whether or not 2.5 Gbit/s full-duplex support is
1430enabled.
1431.It Dv MAC_PROP_ADV_5000FDX_CAP
1432.Bd -filled -compact
1433Type:
1434.Vt uint8_t |
1435Permissions:
1436.Sy Read-Only
1437.Ed
1438.Pp
1439The
1440.Dv MAC_PROP_ADV_5000FDX_CAP
1441property describes whether or not 5.0 Gbit/s full-duplex support is
1442advertised.
1443.It Dv MAC_PROP_EN_5000FDX_CAP
1444.Bd -filled -compact
1445Type:
1446.Vt uint8_t |
1447Permissions:
1448.Sy Read/Write
1449.Ed
1450.Pp
1451The
1452.Dv MAC_PROP_EN_5000FDX_CAP
1453property describes whether or not 5.0 Gbit/s full-duplex support is
1454enabled.
1455.It Dv MAC_PROP_ADV_10GFDX_CAP
1456.Bd -filled -compact
1457Type:
1458.Vt uint8_t |
1459Permissions:
1460.Sy Read-Only
1461.Ed
1462.Pp
1463The
1464.Dv MAC_PROP_ADV_10GFDX_CAP
1465property describes whether or not 10 Gbit/s full-duplex support is
1466advertised.
1467.It Dv MAC_PROP_EN_10GFDX_CAP
1468.Bd -filled -compact
1469Type:
1470.Vt uint8_t |
1471Permissions:
1472.Sy Read/Write
1473.Ed
1474.Pp
1475The
1476.Dv MAC_PROP_EN_10GFDX_CAP
1477property describes whether or not 10 Gbit/s full-duplex support is
1478enabled.
1479.It Dv MAC_PROP_ADV_40GFDX_CAP
1480.Bd -filled -compact
1481Type:
1482.Vt uint8_t |
1483Permissions:
1484.Sy Read-Only
1485.Ed
1486.Pp
1487The
1488.Dv MAC_PROP_ADV_40GFDX_CAP
1489property describes whether or not 40 Gbit/s full-duplex support is
1490advertised.
1491.It Dv MAC_PROP_EN_40GFDX_CAP
1492.Bd -filled -compact
1493Type:
1494.Vt uint8_t |
1495Permissions:
1496.Sy Read/Write
1497.Ed
1498.Pp
1499The
1500.Dv MAC_PROP_EN_40GFDX_CAP
1501property describes whether or not 40 Gbit/s full-duplex support is
1502enabled.
1503.It Dv MAC_PROP_ADV_100GFDX_CAP
1504.Bd -filled -compact
1505Type:
1506.Vt uint8_t |
1507Permissions:
1508.Sy Read-Only
1509.Ed
1510.Pp
1511The
1512.Dv MAC_PROP_ADV_100GFDX_CAP
1513property describes whether or not 100 Gbit/s full-duplex support is
1514advertised.
1515.It Dv MAC_PROP_EN_100GFDX_CAP
1516.Bd -filled -compact
1517Type:
1518.Vt uint8_t |
1519Permissions:
1520.Sy Read/Write
1521.Ed
1522.Pp
1523The
1524.Dv MAC_PROP_EN_100GFDX_CAP
1525property describes whether or not 100 Gbit/s full-duplex support is
1526enabled.
1527.El
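.Pp
One way to keep the two families of properties apart is to track them in
separate fields, as in the hedged sketch below.
The
.Vt example_t
structure, the
.Dv EX_SPEED_*
bits, and both functions are hypothetical, and only two speeds are shown.
The
.Vt mac_prop_id_t
enumeration here is a stand-in with placeholder values; real drivers use the
definitions from the MAC headers.
Returning
.Er ENOTSUP
for read-only and unknown properties is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in property identifiers with placeholder values. */
typedef enum {
	MAC_PROP_ADV_1000FDX_CAP,
	MAC_PROP_EN_1000FDX_CAP,
	MAC_PROP_ADV_10GFDX_CAP,
	MAC_PROP_EN_10GFDX_CAP
} mac_prop_id_t;

#define	EX_SPEED_1000FDX	0x1
#define	EX_SPEED_10GFDX		0x2

typedef struct {
	uint8_t	ex_en_speeds;	/* what the administrator enabled */
	uint8_t	ex_adv_speeds;	/* what the hardware reports */
} example_t;

static int
example_get_speed_prop(const example_t *ex, mac_prop_id_t id, uint8_t *valp)
{
	switch (id) {
	case MAC_PROP_EN_1000FDX_CAP:
		*valp = (ex->ex_en_speeds & EX_SPEED_1000FDX) != 0;
		break;
	case MAC_PROP_ADV_1000FDX_CAP:
		*valp = (ex->ex_adv_speeds & EX_SPEED_1000FDX) != 0;
		break;
	case MAC_PROP_EN_10GFDX_CAP:
		*valp = (ex->ex_en_speeds & EX_SPEED_10GFDX) != 0;
		break;
	case MAC_PROP_ADV_10GFDX_CAP:
		*valp = (ex->ex_adv_speeds & EX_SPEED_10GFDX) != 0;
		break;
	default:
		return (ENOTSUP);
	}
	return (0);
}

static int
example_set_speed_prop(example_t *ex, mac_prop_id_t id, uint8_t val)
{
	uint8_t bit;

	switch (id) {
	case MAC_PROP_EN_1000FDX_CAP:
		bit = EX_SPEED_1000FDX;
		break;
	case MAC_PROP_EN_10GFDX_CAP:
		bit = EX_SPEED_10GFDX;
		break;
	default:
		/* The _ADV_ properties are read-only. */
		return (ENOTSUP);
	}

	if (val != 0)
		ex->ex_en_speeds |= bit;
	else
		ex->ex_en_speeds &= (uint8_t)~bit;
	/* A real driver would update hardware and restart negotiation. */
	return (0);
}
```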
1528.Ss Private Properties
1529In addition to the defined properties above, drivers are allowed to
1530define private properties.
1531These private properties are device-specific properties.
1532All private properties share the same constant,
1533.Dv MAC_PROP_PRIVATE .
1534Properties are distinguished by a name, which is a character string.
1535The list of such private properties is defined when registering with mac in the
1536.Fa m_priv_props
1537member of the
1538.Xr mac_register 9S
1539structure.
1540.Pp
1541The driver may define whatever semantics it wants for these private
1542properties.
1543They will not be listed when running
1544.Xr dladm 8 ,
1545unless explicitly requested by name.
1546All such properties should start with a leading underscore character and then
1547consist of alphanumeric ASCII characters and additional underscores or hyphens.
1548.Pp
1549Properties of type
1550.Dv MAC_PROP_PRIVATE
1551may show up in all three property related entry points:
1552.Xr mc_propinfo 9E ,
1553.Xr mc_getprop 9E ,
1554and
1555.Xr mc_setprop 9E .
1556Device drivers should tell the different properties apart by using the
1557.Xr strcmp 9F
1558function to compare it to the set of properties that it knows about.
1559When encountering properties that it doesn't know, it should treat them
1560like all other unknown properties.
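.Pp
The name-based dispatch described above might look like the following
sketch.
The property names,
.Vt example_t ,
and
.Fn example_get_priv_prop
are all hypothetical, and the interface is simplified: the real entry
points receive a value buffer and its size rather than a
.Vt uint32_t
pointer.
The userland
.Fn strcmp
from
.In string.h
stands in for
.Xr strcmp 9F ,
and
.Er ENOTSUP
is assumed to be how this driver reports unknown properties.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tunables kept in per-instance driver state. */
typedef struct {
	uint32_t	ex_rx_cksum;
	uint32_t	ex_tx_cksum;
} example_t;

/*
 * Look up a private property by name.
 * Names begin with an underscore, as recommended for private properties.
 */
static int
example_get_priv_prop(const example_t *ex, const char *name, uint32_t *valp)
{
	if (strcmp(name, "_rx_cksum_enable") == 0) {
		*valp = ex->ex_rx_cksum;
		return (0);
	}

	if (strcmp(name, "_tx_cksum_enable") == 0) {
		*valp = ex->ex_tx_cksum;
		return (0);
	}

	/* Treat anything else like any other unknown property. */
	return (ENOTSUP);
}
```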
1561.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
1563are based on various standards for devices to implement.
1564Statistics are retrieved through the
1565.Xr mc_getstat 9E
1566entry point.
Some statistics are required for all devices, while a separate set is
specific to Ethernet devices.
1569Not all devices will support every statistic.
1570In many cases, several device registers will need to be combined to create the
1571proper stat.
1572.Pp
1573In general, if the device is not keeping track of these statistics, then
1574it is recommended that the driver store these values as a
1575.Vt uint64_t
1576to ensure that overflow does not occur.
1577.Pp
1578If a device does not support a specific statistic, then it is fine to
1579return that it is not supported.
The same should be done for unrecognized statistics.
1581See
1582.Xr mc_getstat 9E
1583for more information on the proper way to handle these.
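.Pp
As a rough sketch of these points, the function below keeps counters as
.Vt uint64_t
values, combines two of them to produce a single statistic, and reports
anything else as unsupported.
The
.Vt example_t
structure, its split of received bytes into two counters, and
.Fn example_getstat
are hypothetical.
The statistic identifiers are stand-ins with placeholder values, and
.Er ENOTSUP
is assumed to be the error used for unsupported statistics.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in statistic identifiers with placeholder values. */
enum {
	MAC_STAT_IPACKETS = 1,
	MAC_STAT_RBYTES,
	MAC_STAT_COLLISIONS
};

/* Hypothetical software counters accumulated from device registers. */
typedef struct {
	uint64_t	ex_ipackets;
	uint64_t	ex_rbytes_good;	/* bytes in frames that passed checks */
	uint64_t	ex_rbytes_bad;	/* bytes in frames that failed checks */
} example_t;

static int
example_getstat(const example_t *ex, unsigned int stat, uint64_t *valp)
{
	switch (stat) {
	case MAC_STAT_IPACKETS:
		*valp = ex->ex_ipackets;
		break;
	case MAC_STAT_RBYTES:
		/* Several counters may be needed for one statistic. */
		*valp = ex->ex_rbytes_good + ex->ex_rbytes_bad;
		break;
	default:
		/* Unsupported and unrecognized statistics alike. */
		return (ENOTSUP);
	}
	return (0);
}
```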
1584.Ss General Device Statistics
1585The following statistics are based on MIB-II statistics from both RFC
15861213 and RFC 1573.
1587.Bl -tag -width Ds
1588.It Dv MAC_STAT_IFSPEED
1589The device's current speed in bits per second.
1590.It Dv MAC_STAT_MULTIRCV
1591The total number of received multicast packets.
1592.It Dv MAC_STAT_BRDCSTRCV
1593The total number of received broadcast packets.
1594.It Dv MAC_STAT_MULTIXMT
1595The total number of transmitted multicast packets.
1596.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
1598.It Dv MAC_STAT_NORCVBUF
1599The total number of packets discarded by the hardware due to a lack of
1600receive buffers.
1601.It Dv MAC_STAT_IERRORS
1602The total number of errors detected on input.
1603.It Dv MAC_STAT_UNKNOWNS
1604The total number of received packets that were discarded because they
1605were of an unknown protocol.
1606.It Dv MAC_STAT_NOXMTBUF
1607The total number of outgoing packets dropped due to a lack of transmit
1608buffers.
1609.It Dv MAC_STAT_OERRORS
1610The total number of outgoing packets that resulted in errors.
1611.It Dv MAC_STAT_COLLISIONS
The total number of collisions encountered by the transmitter.
1613.It Dv MAC_STAT_RBYTES
1614The total number of bytes received by the device, regardless of packet
1615type.
1616.It Dv MAC_STAT_IPACKETS
1617The total number of packets received by the device, regardless of packet type.
1618.It Dv MAC_STAT_OBYTES
1619The total number of bytes transmitted by the device, regardless of packet type.
1620.It Dv MAC_STAT_OPACKETS
1621The total number of packets sent by the device, regardless of packet type.
1622.It Dv MAC_STAT_UNDERFLOWS
1623The total number of packets that were smaller than the minimum sized
1624packet for the device and were therefore dropped.
1625.It Dv MAC_STAT_OVERFLOWS
1626The total number of packets that were larger than the maximum sized
1627packet for the device and were therefore dropped.
1628.El
1629.Ss Ethernet Specific Statistics
1630The following statistics are specific to Ethernet devices.
1631They refer to values from RFC 1643 and include various MII/GMII specific stats.
1632Many of these are also defined in IEEE 802.3.
1633.Bl -tag -width Ds
1634.It Dv ETHER_STAT_ADV_CAP_1000FDX
1635Indicates that the device is advertising support for 1 Gbit/s
1636full-duplex operation.
1637.It Dv ETHER_STAT_ADV_CAP_1000HDX
1638Indicates that the device is advertising support for 1 Gbit/s
1639half-duplex operation.
1640.It Dv ETHER_STAT_ADV_CAP_100FDX
1641Indicates that the device is advertising support for 100 Mbit/s
1642full-duplex operation.
1643.It Dv ETHER_STAT_ADV_CAP_100GFDX
1644Indicates that the device is advertising support for 100 Gbit/s
1645full-duplex operation.
1646.It Dv ETHER_STAT_ADV_CAP_100HDX
1647Indicates that the device is advertising support for 100 Mbit/s
1648half-duplex operation.
1649.It Dv ETHER_STAT_ADV_CAP_100T4
1650Indicates that the device is advertising support for 100 Mbit/s
1651100BASE-T4 operation.
1652.It Dv ETHER_STAT_ADV_CAP_10FDX
1653Indicates that the device is advertising support for 10 Mbit/s
1654full-duplex operation.
1655.It Dv ETHER_STAT_ADV_CAP_10GFDX
1656Indicates that the device is advertising support for 10 Gbit/s
1657full-duplex operation.
1658.It Dv ETHER_STAT_ADV_CAP_10HDX
1659Indicates that the device is advertising support for 10 Mbit/s
1660half-duplex operation.
1661.It Dv ETHER_STAT_ADV_CAP_2500FDX
1662Indicates that the device is advertising support for 2.5 Gbit/s
1663full-duplex operation.
1664.It Dv ETHER_STAT_ADV_CAP_40GFDX
1665Indicates that the device is advertising support for 40 Gbit/s
1666full-duplex operation.
1667.It Dv ETHER_STAT_ADV_CAP_5000FDX
1668Indicates that the device is advertising support for 5.0 Gbit/s
1669full-duplex operation.
1670.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE
1671Indicates that the device is advertising support for receiving pause
1672frames.
1673.It Dv ETHER_STAT_ADV_CAP_AUTONEG
1674Indicates that the device is advertising support for auto-negotiation.
1675.It Dv ETHER_STAT_ADV_CAP_PAUSE
1676Indicates that the device is advertising support for generating pause
1677frames.
1678.It Dv ETHER_STAT_ADV_REMFAULT
1679Indicates that the device is advertising support for detecting faults in
1680the remote link peer.
1681.It Dv ETHER_STAT_ALIGN_ERRORS
1682Indicates the number of times an alignment error was generated by the
1683Ethernet device.
1684This is a count of packets that were not an integral number of octets and failed
1685the FCS check.
1686.It Dv ETHER_STAT_CAP_1000FDX
1687Indicates the device supports 1 Gbit/s full-duplex operation.
1688.It Dv ETHER_STAT_CAP_1000HDX
1689Indicates the device supports 1 Gbit/s half-duplex operation.
1690.It Dv ETHER_STAT_CAP_100FDX
1691Indicates the device supports 100 Mbit/s full-duplex operation.
1692.It Dv ETHER_STAT_CAP_100GFDX
1693Indicates the device supports 100 Gbit/s full-duplex operation.
1694.It Dv ETHER_STAT_CAP_100HDX
1695Indicates the device supports 100 Mbit/s half-duplex operation.
1696.It Dv ETHER_STAT_CAP_100T4
1697Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
1698.It Dv ETHER_STAT_CAP_10FDX
1699Indicates the device supports 10 Mbit/s full-duplex operation.
1700.It Dv ETHER_STAT_CAP_10GFDX
1701Indicates the device supports 10 Gbit/s full-duplex operation.
1702.It Dv ETHER_STAT_CAP_10HDX
1703Indicates the device supports 10 Mbit/s half-duplex operation.
1704.It Dv ETHER_STAT_CAP_2500FDX
1705Indicates the device supports 2.5 Gbit/s full-duplex operation.
1706.It Dv ETHER_STAT_CAP_40GFDX
1707Indicates the device supports 40 Gbit/s full-duplex operation.
1708.It Dv ETHER_STAT_CAP_5000FDX
1709Indicates the device supports 5.0 Gbit/s full-duplex operation.
1710.It Dv ETHER_STAT_CAP_ASMPAUSE
1711Indicates that the device supports the ability to receive pause frames.
1712.It Dv ETHER_STAT_CAP_AUTONEG
1713Indicates that the device supports the ability to perform link
1714auto-negotiation.
1715.It Dv ETHER_STAT_CAP_PAUSE
1716Indicates that the device supports the ability to transmit pause frames.
1717.It Dv ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability to detect a remote
fault in a link peer.
1720.It Dv ETHER_STAT_CARRIER_ERRORS
1721Indicates the number of times that the Ethernet carrier sense condition
1722was lost or not asserted.
1723.It Dv ETHER_STAT_DEFER_XMTS
1724Indicates the number of frames for which the device was unable to
1725transmit the frame due to being busy and had to try again.
1726.It Dv ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Dv ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Dv ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Dv ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Dv ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Dv ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Dv ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link.
The values used here should be the same as documented for
.Dv MAC_PROP_DUPLEX .
.It Dv ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Dv ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Dv ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Dv ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Dv ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Dv ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Dv ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Dv ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Dv ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Dv ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred.
The specific conditions for this error are documented in IEEE 802.3.
.It Dv ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Dv ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Dv ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
The following values may be used:
.Bl -tag -width Ds
.It Dv XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Dv XCVR_NONE
There is no transceiver in use by the hardware.
.It Dv XCVR_10
The transceiver supports 10BASE-T operation.
.It Dv XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Dv XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Dv XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Dv XCVR_1000X
The transceiver supports 1000BASE-X operation.
This is used for all fiber transceivers.
.It Dv XCVR_1000T
The transceiver supports 1000BASE-T operation.
This is used for all copper transceivers.
.El
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh RECEIVE DESCRIPTOR LAYOUT
One of the important things that a device driver must do is lay out DMA
memory, generally in a ring of descriptors, into which received Ethernet
frames will be placed.
When performing this, there are a few things that drivers should
generally do:
.Bl -enum -offset indent
.It
Drivers should lay out memory so that the IP header will be 4-byte
aligned.
The IP stack expects that the beginning of an IP header will be at a
4-byte aligned address; however, a DMA allocation will be at a 4-
or 8-byte aligned address by default.
The IP header is at a 14-byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
doesn't change the alignment the IP header is found at.
.Pp
As a solution to this, the driver should program the device to start
placing the received Ethernet frame at two bytes off of the start of the
DMA buffer.
This ensures that the IP header is 4-byte aligned whether or not VLAN
tags are present.
.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that would not be possible, the driver should try to
ensure that there is enough space for the Ethernet header and
any possible layer three and layer four headers
.Pq such as IP, TCP, or UDP
in the initial descriptor.
.It
As discussed in the
.Sx MBLKS AND DMA
section, there are multiple strategies for managing the relationship
between DMA data, receive descriptors, and the operating system
representation of a packet in the
.Xr mblk 9S
structure.
Drivers must limit their resource consumption.
See the
.Sy Considerations
section of
.Sx MBLKS AND DMA
for more on this.
.El
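.Pp
The alignment arithmetic from the first item can be checked with a small
user-space C sketch.
It is only an illustration: the names and constants below are
hypothetical and are not part of the MAC framework, and it assumes the
DMA buffer itself starts at a 4-byte aligned address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for illustration; not MAC framework symbols. */
#define	SKETCH_ETHER_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define	SKETCH_VLAN_TAG_LEN	4	/* each VLAN tag adds 4 bytes */
#define	SKETCH_MAX_VLAN_TAGS	2	/* arbitrary bound for this sketch */

/*
 * Offset of the IP header from the start of the DMA buffer, given the
 * number of pad bytes placed before the frame and the number of VLAN
 * tags in the frame.
 */
static size_t
sketch_ip_hdr_offset(size_t pad, unsigned int nvlan)
{
	assert(nvlan <= SKETCH_MAX_VLAN_TAGS);
	return (pad + SKETCH_ETHER_HDR_LEN +
	    (size_t)nvlan * SKETCH_VLAN_TAG_LEN);
}

/*
 * Non-zero when the IP header lands on a 4-byte boundary, assuming the
 * DMA buffer itself is 4-byte aligned.
 */
static int
sketch_ip_hdr_aligned(size_t pad, unsigned int nvlan)
{
	return (sketch_ip_hdr_offset(pad, nvlan) % 4 == 0);
}
```

With no padding the IP header sits at offset 14, which is only 2-byte
aligned; with a pad of two bytes it sits at offset 16, 20, or 24 for
zero, one, or two VLAN tags.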
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware.
While most devices will rarely fail, it is important that particular
attention is paid to RAS (Reliability, Availability, and Serviceability)
when designing and implementing the device driver.
While everything described in this section is optional, it is highly recommended
that all new device drivers follow these guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
A high rate of invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver is given the
chance to detect and notice transport errors as well as report other errors that
exist.
While a device driver does not need to declare every capability
described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Dv DDI_FM_EREPORT_CAPABLE
capability so that the driver can report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable or uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, e.g. between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record
when it last observed that data was successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that it is stuck and needs to be
reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
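.Pp
One possible shape for such a check is sketched below as plain
user-space C so that the logic can be exercised on its own.
The structure, its members, and the function names are all hypothetical;
a real driver would take its timestamps from a kernel time source and
run the periodic check from a callback scheduled with
.Xr timeout 9F .

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; none of these names come from MAC. */
typedef struct sketch_tx_ring {
	uint64_t	str_last_completion;	/* time of last completion */
	uint32_t	str_outstanding;	/* descriptors given to hardware */
} sketch_tx_ring_t;

/* Called when descriptors are handed to the hardware. */
static void
sketch_tx_post(sketch_tx_ring_t *ring, uint32_t ndesc, uint64_t now)
{
	/* Restart the clock when the ring goes from idle to busy. */
	if (ring->str_outstanding == 0)
		ring->str_last_completion = now;
	ring->str_outstanding += ndesc;
}

/* Called from the transmit completion path. */
static void
sketch_tx_complete(sketch_tx_ring_t *ring, uint32_t ndesc, uint64_t now)
{
	assert(ndesc <= ring->str_outstanding);
	ring->str_outstanding -= ndesc;
	ring->str_last_completion = now;
}

/*
 * Periodic check: the ring is considered stalled only if work is
 * pending and no completion has been seen for longer than the limit.
 */
static bool
sketch_tx_stalled(const sketch_tx_ring_t *ring, uint64_t now, uint64_t limit)
{
	if (ring->str_outstanding == 0)
		return (false);
	return (now - ring->str_last_completion > limit);
}
```

An idle ring is never reported as stalled, which avoids false positives
on links that are up but quiet.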
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting
when they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
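.Pp
A bounded wait can be sketched as follows.
This is a user-space illustration with hypothetical names; the fake
completion routine stands in for reading a device status register, and
the poll count stands in for whatever time budget the hardware
documentation suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion check; a driver would read device status. */
typedef bool (*sketch_done_fn_t)(void *);

/*
 * Fake device for illustration: reports completion once its counter of
 * remaining polls reaches zero.
 */
static bool
sketch_fake_done(void *arg)
{
	unsigned int *remaining = arg;

	if (*remaining == 0)
		return (true);
	(*remaining)--;
	return (false);
}

/*
 * Poll for completion at most max_polls times.
 * Returns 0 on completion or ETIMEDOUT if the device never answered,
 * at which point the caller should abort the command or reset.
 */
static int
sketch_cmd_wait(sketch_done_fn_t done, void *arg, unsigned int max_polls)
{
	unsigned int i;

	assert(done != NULL);
	for (i = 0; i < max_polls; i++) {
		if (done(arg))
			return (0);
		/* A real driver would sleep or delay between polls here. */
	}
	return (ETIMEDOUT);
}
```

The important property is that the loop always terminates, so a device
that never responds leads to an error path rather than a hung thread.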
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error has occurred, for example a transport error, a detected
transmit stall, or an uncorrectable error indicated by the device, then
it is important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not one that the fault management architecture itself
indicated, such as a detected transport error, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_RESTORED .
.El
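.Pp
The ordering of these steps can be modeled in user space as shown below.
This is a sketch under stated assumptions, not driver code: every type
and function name is hypothetical, the counters and the enum stand in
for calls to
.Xr ddi_fm_ereport_post 9F
and
.Xr ddi_fm_service_impact 9F ,
and the only saved state is an MTU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the DDI_SERVICE_* values used by a real driver. */
typedef enum {
	SKETCH_SERVICE_OK,
	SKETCH_SERVICE_LOST,
	SKETCH_SERVICE_RESTORED
} sketch_impact_t;

typedef struct sketch_drv {
	bool		sd_error;	/* step 1: error condition flag */
	uint32_t	sd_mtu;		/* saved copy of programmed state */
	uint32_t	sd_hw_mtu;	/* what the pretend hardware holds */
	uint64_t	sd_tx_dropped;	/* frames accepted and dropped */
	int		sd_ereports;	/* stand-in for posted ereports */
	sketch_impact_t	sd_impact;	/* stand-in for service impact */
} sketch_drv_t;

/* Returns true if the frame reached hardware, false if it was dropped. */
static bool
sketch_tx(sketch_drv_t *drv)
{
	if (drv->sd_error) {
		drv->sd_tx_dropped++;	/* accept the frame, then drop it */
		return (false);
	}
	return (true);
}

/* Operations that write device state fail while the flag is set. */
static int
sketch_set_mtu(sketch_drv_t *drv, uint32_t mtu)
{
	if (drv->sd_error)
		return (EIO);
	drv->sd_mtu = drv->sd_hw_mtu = mtu;
	return (0);
}

/* Steps 1 through 4: flag, ereport, service lost, reset. */
static void
sketch_fault_begin(sketch_drv_t *drv, bool indicated_by_fma)
{
	drv->sd_error = true;
	if (!indicated_by_fma)
		drv->sd_ereports++;
	drv->sd_impact = SKETCH_SERVICE_LOST;
	drv->sd_hw_mtu = 0;		/* the reset wipes device state */
}

/* Steps 5 and 6: restore programmed state, then report restoration. */
static void
sketch_fault_finish(sketch_drv_t *drv)
{
	assert(drv->sd_error);
	drv->sd_hw_mtu = drv->sd_mtu;
	drv->sd_error = false;
	drv->sd_impact = SKETCH_SERVICE_RESTORED;
}
```

Keeping the saved MTU separate from the value held by the pretend
hardware mirrors the need for a driver to retain programmed state in its
own private data across a reset.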
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Dv DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether a reset impacts only the
stalled function or all functions across the device.
.Pp
If the only way to reset a given function is through the device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Fa b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Fa b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for handling these
two different cases: copying and binding.
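.Pp
The two kinds of linkage can be illustrated with a simplified user-space
stand-in for the message block.
The structure below is hypothetical and carries only the members
discussed here; the real structure is described in
.Xr mblk 9S .

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for mblk_t; see mblk(9S) for the real layout. */
typedef struct sketch_mblk {
	struct sketch_mblk	*b_next;	/* next message in a chain */
	struct sketch_mblk	*b_cont;	/* next block of this message */
	unsigned char		*b_rptr;	/* first valid byte */
	unsigned char		*b_wptr;	/* one past the last valid byte */
} sketch_mblk_t;

/* Total bytes in a single message: follow b_cont. */
static size_t
sketch_msg_len(const sketch_mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont) {
		assert(mp->b_wptr >= mp->b_rptr);
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	}
	return (len);
}

/* Number of messages in a chain: follow b_next. */
static size_t
sketch_chain_count(const sketch_mblk_t *mp)
{
	size_t n = 0;

	for (; mp != NULL; mp = mp->b_next)
		n++;
	return (n);
}
```

A frame whose headers and payload live in separate blocks is one message
with two blocks, while a burst of frames is a chain of several messages.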
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving a packet, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Fa b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
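.Pp
The receive side of the copying approach might look roughly like the
following user-space sketch.
Everything here is a hypothetical stand-in: the simplified block type,
the allocator that plays the role of
.Xr allocb 9F ,
the plain counter that plays the role of a kstat, and the failure hook
that exists only so the drop path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for mblk_t with inline storage. */
typedef struct sketch_mblk {
	unsigned char	*b_rptr;	/* first valid byte */
	unsigned char	*b_wptr;	/* one past the last valid byte */
	unsigned char	sm_data[];	/* backing storage for this sketch */
} sketch_mblk_t;

static uint64_t sketch_rx_nomem;		/* stands in for a kstat */
static bool sketch_force_alloc_failure;		/* test hook only */

/* Stand-in for allocb(9F): returns NULL when no memory is available. */
static sketch_mblk_t *
sketch_allocb(size_t len)
{
	sketch_mblk_t *mp;

	if (sketch_force_alloc_failure)
		return (NULL);
	mp = malloc(sizeof (*mp) + len);
	if (mp == NULL)
		return (NULL);
	mp->b_rptr = mp->b_wptr = mp->sm_data;
	return (mp);
}

/* Copy one received frame out of a DMA buffer into a new block. */
static sketch_mblk_t *
sketch_rx_copy(const unsigned char *dma_buf, size_t len)
{
	sketch_mblk_t *mp;

	assert(dma_buf != NULL);
	mp = sketch_allocb(len);
	if (mp == NULL) {
		sketch_rx_nomem++;	/* drop the frame and count it */
		return (NULL);
	}
	memcpy(mp->b_rptr, dma_buf, len);	/* bcopy(9F) in a driver */
	mp->b_wptr += len;			/* mark the bytes as valid */
	return (mp);
}
```

Because the data is copied, the DMA buffer can be handed straight back
to the hardware regardless of how long the stack holds on to the block.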
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try and create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to control the decision as to whether to copy or bind DMA data
through separate private properties for transmitting and receiving.
Each private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
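.Pp
A size-based switch of this kind could be sketched as follows.
The structure and function names are hypothetical, and the sketch
assumes that frames at or above a threshold are bound while smaller
frames are copied; the thresholds would be set from driver private
properties and tuned by measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables, one per direction, set from private properties. */
typedef struct sketch_dma_policy {
	size_t	sdp_tx_bind_thresh;	/* bind tx frames of at least this size */
	size_t	sdp_rx_bind_thresh;	/* bind rx frames of at least this size */
} sketch_dma_policy_t;

/* True when a transmitted frame of len bytes should be DMA bound. */
static bool
sketch_tx_should_bind(const sketch_dma_policy_t *p, size_t len)
{
	assert(p != NULL);
	return (len >= p->sdp_tx_bind_thresh);
}

/* True when a received frame of len bytes should be DMA bound. */
static bool
sketch_rx_should_bind(const sketch_dma_policy_t *p, size_t len)
{
	assert(p != NULL);
	return (len >= p->sdp_rx_bind_thresh);
}
```

Keeping the two thresholds independent lets an operator experiment with
transmit and receive behavior separately.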
.Sh SEE ALSO
.Xr dlpi 4P ,
.Xr driver.conf 5 ,
.Xr ieee802.3 7 ,
.Xr dladm 8 ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mac_capab_led 9E ,
.Xr mac_capab_rings 9E ,
.Xr mac_capab_transceiver 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_devt_to_instance 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_free 9F ,
.Xr mac_getinfo 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_private_minor 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re