Info in my Cache Memory: 2015

Thursday, November 26, 2015

TCP/IP Address Resolution For IP Multicast Addresses

Like most discussions of address resolution, the preceding sections all focus on unicast communication, where a datagram is sent from one source device to one destination device. Whether direct mapping or dynamic resolution is used for resolving a network layer address, it is a relatively simple matter to resolve addresses when there is only one intended recipient of the datagram. TCP/IP uses ARP for its dynamic resolution scheme, which is designed for unicast resolution only.

However, the Internet Protocol also supports multicasting of datagrams, as I explain in the topics on IP multicasting and IP multicast addressing. In this situation, the datagram must be sent to multiple recipients, which complicates matters considerably. We need to establish a relationship of some sort between the IP multicast group address and the addresses of the devices at the data link layer. We could do this by converting the IP multicast datagram to individual unicast transmissions at the data link layer, each using ARP for resolution, but this would be horribly inefficient.

Direct Mapping Technique for IEEE 802 Multicast MAC Addresses

When possible, IP makes use of the multicast addressing and delivery capabilities of the underlying network to deliver multicast datagrams on a physical network. Perhaps surprisingly, even though ARP employs dynamic resolution, multicast address resolution is done using a version of the direct mapping technique. By defining a mapping between IP multicast groups and data link layer multicast groups we enable physical devices to know when to pay attention to multicasted datagrams.

The most commonly used multicast-capable data link addressing scheme is the IEEE 802 addressing system, best known for its use in Ethernet networks. These data link layer addresses have 48 bits, arranged into two blocks of 24. The upper 24 bits are arranged into a block called the organizationally unique identifier (OUI), with different values assigned to individual organizations; the lower 24 bits are then used for specific devices.

The Internet Assigned Numbers Authority (IANA) itself has an OUI that it uses for mapping multicast addresses to IEEE 802 addresses. This OUI is "01:00:5E". To form a mapping for Ethernet, 24 bits are used for this OUI and the 25th (of the 48) is always zero. This leaves 23 bits of the original 48 to encode the multicast address. To do the mapping, the low-order 23 bits of the multicast address are used as the last 23 bits of the Ethernet address starting with "01:00:5E" for sending the multicast message. This process is illustrated in Figure 51.

Figure 51: Mapping of Multicast IP Addresses to IEEE 802 Multicast MAC Addresses
IP multicast addresses consist of the bit string “1110” followed by a 28-bit multicast group address. To create a 48-bit multicast IEEE 802 (Ethernet) address, the top 24 bits are filled in with the IANA’s multicast OUI, 01-00-5E, the 25th bit is zero, and the bottom 23 bits of the multicast group are put into the bottom 23 bits of the MAC address. This leaves 5 bits (shown in pink) that are not mapped to the MAC address, meaning that 32 different IP addresses may have the same mapped multicast MAC address.

Monday, November 2, 2015

Kmalloc: Which Flag to Use When

Which Flag to Use When

Situation	Solution
Process context, can sleep	Use GFP_KERNEL
Process context, cannot sleep	Use GFP_ATOMIC, or perform your allocations with GFP_KERNEL at an earlier or later point when you can sleep
Interrupt handler	Use GFP_ATOMIC
Softirq	Use GFP_ATOMIC
Tasklet	Use GFP_ATOMIC
Need DMA-able memory, can sleep	Use (GFP_DMA | GFP_KERNEL)
Need DMA-able memory, cannot sleep	Use (GFP_DMA | GFP_ATOMIC), or perform your allocation at an earlier point when you can sleep

Tuesday, October 13, 2015

Why Is Data Alignment Required?

Data Alignment:

Every data type in C/C++ has an alignment requirement (in fact, it is mandated by the processor architecture, not by the language). A processor's word length matches the size of its data bus. On a 32-bit machine, the word size is 4 bytes.

Historically, memory is byte addressable and arranged sequentially. If the memory were arranged as a single bank one byte wide, the processor would need to issue 4 memory read cycles to fetch an integer. It is more economical to read all 4 bytes of an integer in one memory cycle. To take advantage of this, the memory is arranged as a group of 4 banks, as shown in the above figure.

The memory addressing is still sequential. If bank 0 occupies address X, then bank 1, bank 2, and bank 3 will be at addresses (X + 1), (X + 2), and (X + 3). If a 4-byte integer is allocated at address X (where X is a multiple of 4), the processor needs only one memory cycle to read the entire integer.

If, on the other hand, the integer is allocated at an address that is not a multiple of 4, it spans two rows of the banks, as shown in the below figure. Such an integer requires two memory read cycles to fetch.

A variable's data alignment deals with the way the data is stored in these banks. For example, the natural alignment of int on a 32-bit machine is 4 bytes. When a data type is naturally aligned, the CPU fetches it in the minimum number of read cycles.

Similarly, the natural alignment of short int is 2 bytes. This means a short int can be stored in the bank 0 – bank 1 pair or the bank 2 – bank 3 pair. A double requires 8 bytes and occupies two rows in the memory banks. Any misalignment of a double will force more than two read cycles to fetch it.

Note that a double variable will be allocated on an 8-byte boundary on a 32-bit machine and requires two memory read cycles. On a 64-bit machine, depending on the number of banks, a double variable will be allocated on an 8-byte boundary and requires only one memory read cycle.

Structure Padding:

In C/C++, structures are used as a data pack. They do not provide any data encapsulation or data hiding features (C++ is an exception because its structs are semantically similar to classes).

Because of the alignment requirements of the various data types, every member of a structure should be naturally aligned. The members of a structure are allocated sequentially, in increasing address order. Let us analyze each struct declared in the above program.

Thursday, October 8, 2015

Quick overview of tasklets

Tasklets

Tasklets are a bottom-half mechanism built on top of softirqs. As already mentioned, they have nothing to do with tasks. Tasklets are similar in nature and work in a similar manner to softirqs; however, they have a simpler interface and relaxed locking rules.

The decision between whether to use softirqs versus tasklets is simple: You usually want to use tasklets. As we saw in the previous section, you can count on one hand the users of softirqs. Softirqs are required only for very high-frequency and highly threaded uses. Tasklets, on the other hand, see much greater use. Tasklets work just fine for the vast majority of cases and they are very easy to use.

Implementation of Tasklets

Because tasklets are implemented on top of softirqs, they are softirqs. As discussed, tasklets are represented by two softirqs: HI_SOFTIRQ and TASKLET_SOFTIRQ. The only real difference in these types is that the HI_SOFTIRQ-based tasklets run prior to the TASKLET_SOFTIRQ tasklets.

The Tasklet Structure

Tasklets are represented by the tasklet_struct structure. Each structure represents a unique tasklet. The structure is declared in <linux/interrupt.h>:

struct tasklet_struct {
        struct tasklet_struct *next;  /* next tasklet in the list */
        unsigned long state;          /* state of the tasklet */
        atomic_t count;               /* reference counter */
        void (*func)(unsigned long);  /* tasklet handler function */
        unsigned long data;           /* argument to the tasklet function */
};

The func member is the tasklet handler (the equivalent of action to a softirq) and it receives data as its sole argument.

The state member is one of zero, TASKLET_STATE_SCHED, or TASKLET_STATE_RUN. TASKLET_STATE_SCHED denotes a tasklet that is scheduled to run and TASKLET_STATE_RUN denotes a tasklet that is running. As an optimization, TASKLET_STATE_RUN is used only on multiprocessor machines because a uniprocessor machine always knows whether the tasklet is running (it is either the currently executing code, or not).

The count field is used as a reference count for the tasklet. If it is nonzero, the tasklet is disabled and cannot run; if it is zero, the tasklet is enabled and can run if marked pending.

Scheduling Tasklets

Scheduled tasklets (the equivalent of raised softirqs)^[5] are stored in two per-processor structures: tasklet_vec (for regular tasklets) and tasklet_hi_vec (for high-priority tasklets). Both of these structures are linked lists of tasklet_struct structures. Each tasklet_struct structure in the list represents a different tasklet.

^[5] Yet another example of the evil naming schemes at work here. Why are softirqs raised but tasklets scheduled? Who knows? Both terms mean to mark that bottom half pending so that it is executed soon.

Tasklets are scheduled via the tasklet_schedule() and tasklet_hi_schedule() functions, which receive a pointer to the tasklet's tasklet_struct as their lone argument. The two functions are very similar (the difference being that one uses TASKLET_SOFTIRQ and one uses HI_SOFTIRQ). Writing and using tasklets is covered in the next section. For now, let's look at the details of tasklet_schedule():

1. Check whether the tasklet's state is TASKLET_STATE_SCHED. If it is, the tasklet is already scheduled to run and the function can immediately return.
2. Save the state of the interrupt system, and then disable local interrupts. This ensures that nothing on this processor will mess with the tasklet code while tasklet_schedule() is manipulating the tasklets.
3. Add the tasklet to be scheduled to the head of the tasklet_vec or tasklet_hi_vec linked list, which is unique to each processor in the system.
4. Raise the TASKLET_SOFTIRQ or HI_SOFTIRQ softirq, so do_softirq() will execute this tasklet in the near future.
5. Restore interrupts to their previous state and return.

At the next earliest convenience, do_softirq() is run as discussed in the previous section. Because most tasklets and softirqs are marked pending in interrupt handlers, do_softirq() most likely runs when the last interrupt returns. Because TASKLET_SOFTIRQ or HI_SOFTIRQ is now raised, do_softirq() executes the associated handlers. These handlers, tasklet_action() and tasklet_hi_action(), are the heart of tasklet processing. Let's look at what they do:

1. Disable local interrupt delivery (there is no need to first save their state because the code here is always called as a softirq handler and interrupts are always enabled) and retrieve the tasklet_vec or tasklet_hi_vec list for this processor.
2. Clear the list for this processor by setting it equal to NULL.
3. Enable local interrupt delivery. Again, there is no need to restore them to their previous state because this function knows that they were always originally enabled.
4. Loop over each pending tasklet in the retrieved list.
5. If this is a multiprocessing machine, check whether the tasklet is running on another processor by checking the TASKLET_STATE_RUN flag. If it is currently running, do not execute it now and skip to the next pending tasklet (recall, only one tasklet of a given type may run concurrently).
6. If the tasklet is not currently running, set the TASKLET_STATE_RUN flag, so another processor will not run it.
7. Check for a zero count value, to ensure that the tasklet is not disabled. If the tasklet is disabled, skip it and go to the next pending tasklet.
8. We now know that the tasklet is not running elsewhere, is marked as running so it will not start running elsewhere, and has a zero count value. Run the tasklet handler.
9. After the tasklet runs, clear the TASKLET_STATE_RUN flag in the tasklet's state field.
10. Repeat for the next pending tasklet, until there are no more scheduled tasklets waiting to run.

The implementation of tasklets is simple, but rather clever. As you saw, all tasklets are multiplexed on top of two softirqs, HI_SOFTIRQ and TASKLET_SOFTIRQ. When a tasklet is scheduled, the kernel raises one of these softirqs. These softirqs, in turn, are handled by special functions that then run any scheduled tasklets. The special functions ensure that only one tasklet of a given type is running at the same time (but other tasklets can run simultaneously). All this complexity is then hidden behind a clean and simple interface.

Using Tasklets

In most cases, tasklets are the preferred mechanism with which to implement your bottom half for a normal hardware device. Tasklets are dynamically created, easy to use, and very quick. Moreover, although their name is mind-numbingly confusing, it grows on you: It is cute.

Declaring Your Tasklet

You can create tasklets statically or dynamically. What option you choose depends on whether you have (or want) a direct or indirect reference to the tasklet. If you are going to statically create the tasklet (and thus have a direct reference to it), use one of two macros in <linux/interrupt.h>:

DECLARE_TASKLET(name, func, data);
DECLARE_TASKLET_DISABLED(name, func, data);

Both these macros statically create a struct tasklet_struct with the given name. When the tasklet is scheduled, the given function func is executed and passed the argument data. The difference between the two macros is the initial reference count. The first macro creates the tasklet with a count of zero, and the tasklet is enabled. The second macro sets count to one, and the tasklet is disabled. Here is an example:

DECLARE_TASKLET(my_tasklet, my_tasklet_handler, dev);

This line is equivalent to

struct tasklet_struct my_tasklet = { NULL, 0, ATOMIC_INIT(0),
                                     my_tasklet_handler, dev };

This creates a tasklet named my_tasklet that is enabled with my_tasklet_handler as its handler. The value of dev is passed to the handler when it is executed.

To initialize a tasklet given an indirect reference (a pointer) to a dynamically created struct tasklet_struct, t, call tasklet_init():

tasklet_init(t, tasklet_handler, dev);  /* dynamically as opposed to statically */

Writing Your Tasklet Handler

The tasklet handler must match the correct prototype:

void tasklet_handler(unsigned long data)

As with softirqs, tasklets cannot sleep. This means you cannot use semaphores or other blocking functions in a tasklet. Tasklets also run with all interrupts enabled, so you must take precautions (for example, disable interrupts and obtain a lock) if your tasklet shares data with an interrupt handler. Unlike softirqs, however, two of the same tasklets never run concurrently, although two different tasklets can run at the same time on two different processors. If your tasklet shares data with another tasklet or softirq, you need to use proper locking (see Chapter 8, "Kernel Synchronization Introduction," and Chapter 9, "Kernel Synchronization Methods").

Scheduling Your Tasklet

To schedule a tasklet for execution, tasklet_schedule() is called and passed a pointer to the relevant tasklet_struct:

tasklet_schedule(&my_tasklet);    /* mark my_tasklet as pending */

After a tasklet is scheduled, it runs once at some time in the near future. If the same tasklet is scheduled again, before it has had a chance to run, it still runs only once. If it is already running, for example on another processor, the tasklet is rescheduled and runs again. As an optimization, a tasklet always runs on the processor that scheduled it, making better use of the processor's cache, you hope.

You can disable a tasklet via a call to tasklet_disable(), which disables the given tasklet. If the tasklet is currently running, the function will not return until it finishes executing. Alternatively, you can use tasklet_disable_nosync(), which disables the given tasklet but does not wait for the tasklet to complete prior to returning. This is usually not safe because you cannot assume the tasklet is not still running. A call to tasklet_enable() enables the tasklet. This function also must be called before a tasklet created with DECLARE_TASKLET_DISABLED() is usable. For example:

tasklet_disable(&my_tasklet);    /* tasklet is now disabled */

/* we can now do stuff knowing that the tasklet cannot run .. */

tasklet_enable(&my_tasklet);     /* tasklet is now enabled */

You can remove a tasklet from the pending queue via tasklet_kill(). This function receives a pointer as a lone argument to the tasklet's tasklet_struct. Removing a scheduled tasklet from the queue is useful when dealing with a tasklet that often reschedules itself. This function first waits for the tasklet to finish executing and then it removes the tasklet from the queue. Nothing stops some other code from rescheduling the tasklet, of course. This function must not be used from interrupt context because it sleeps.

ksoftirqd

Softirq (and thus tasklet) processing is aided by a set of per-processor kernel threads. These kernel threads help in the processing of softirqs when the system is overwhelmed with softirqs.

As already described, the kernel processes softirqs in a number of places, most commonly on return from handling an interrupt. Softirqs might be raised at very high rates (such as during intense network traffic). Further, softirq functions can reactivate themselves. That is, while running, a softirq can raise itself so that it runs again (indeed, the networking subsystem does this). The possibility of a high frequency of softirqs in conjunction with their capability to remark themselves active can result in user-space programs being starved of processor time. Not processing the reactivated softirqs in a timely manner, however, is unacceptable. When softirqs were first designed, this caused a dilemma that needed fixing, and neither obvious solution was a good one. First, let's look at each of the two obvious solutions.

The first solution is simply to keep processing softirqs as they come in and to recheck and reprocess any pending softirqs before returning. This ensures that the kernel processes softirqs in a timely manner and, most importantly, that any reactivated softirqs are also immediately processed. The problem lies in high load environments, in which many softirqs occur, that continually reactivate themselves. The kernel might continually service softirqs without accomplishing much else. User-space is neglected; indeed, nothing but softirqs and interrupt handlers run and, in turn, the system's users get mad. This approach might work fine if the system is never under intense load; if the system experiences even moderate interrupt levels this solution is not acceptable. User-space cannot be starved for significant periods.

The second solution is not to handle reactivated softirqs. On return from interrupt, the kernel merely looks at all pending softirqs and executes them as normal. If any softirqs reactivate themselves, however, they will not run until the next time the kernel handles pending softirqs. This is most likely not until the next interrupt occurs, which can equate to a lengthy amount of time before any new (or reactivated) softirqs are executed. Worse, on an otherwise idle system it is beneficial to process the softirqs right away. Unfortunately, this approach is oblivious to which processes may or may not be runnable. Therefore, although this method prevents starving user-space, it does starve the softirqs, and it does not take good advantage of an idle system.

In designing softirqs, the developers realized that some sort of compromise was needed. The solution ultimately implemented in the kernel is to not immediately process reactivated softirqs. Instead, if the number of softirqs grows excessive, the kernel wakes up a family of kernel threads to handle the load. The kernel threads run with the lowest possible priority (nice value of 19), which ensures they do not run in lieu of anything important. This concession prevents heavy softirq activity from completely starving user-space of processor time. Conversely, it also ensures that "excess" softirqs do run eventually. Finally, this solution has the added property that on an idle system, the softirqs are handled rather quickly (because the kernel threads will schedule immediately).

There is one thread per processor. The threads are each named ksoftirqd/n where n is the processor number. On a two-processor system, you would have ksoftirqd/0 and ksoftirqd/1. Having a thread on each processor ensures an idle processor, if available, is always able to service softirqs. After the threads are initialized, they run a tight loop similar to this:

for (;;) {
        if (!softirq_pending(cpu))
                schedule();

        set_current_state(TASK_RUNNING);

        while (softirq_pending(cpu)) {
                do_softirq();
                if (need_resched())
                        schedule();
        }

        set_current_state(TASK_INTERRUPTIBLE);
}

If any softirqs are pending (as reported by softirq_pending()), ksoftirqd calls do_softirq() to handle them. Note that it does this repeatedly to handle any reactivated softirqs, too. After each iteration, schedule() is called if needed, to allow more important processes to run. After all processing is complete, the kernel thread sets itself TASK_INTERRUPTIBLE and invokes the scheduler to select a new runnable process.

The softirq kernel threads are awakened whenever do_softirq() detects an executed softirq reactivating itself.

Tuesday, August 18, 2015

NAPI : Receive Interrupt Mitigation

When a network driver is written as we have described above, the processor is interrupted for every packet received by your interface. In many cases, that is the desired mode of operation, and it is not a problem. High-bandwidth interfaces, however, can receive thousands of packets per second. With that sort of interrupt load, the overall performance of the system can suffer.

As a way of improving the performance of Linux on high-end systems, the networking subsystem developers have created an alternative interface (called NAPI)^[1] based on polling. "Polling" can be a dirty word among driver developers, who often see polling techniques as inelegant and inefficient. Polling is inefficient, however, only if the interface is polled when there is no work to do. When the system has a high-speed interface handling heavy traffic, there are always more packets to process. There is no need to interrupt the processor in such situations; it is enough that the new packets be collected from the interface every so often.

^[1] NAPI stands for "new API"; the networking hackers are better at creating interfaces than naming them.

Stopping receive interrupts can take a substantial amount of load off the processor. NAPI-compliant drivers can also be told not to feed packets into the kernel if those packets are just dropped in the networking code due to congestion, which can also help performance when that help is needed most. For various reasons, NAPI drivers are also less likely to reorder packets.

Not all devices can operate in the NAPI mode, however. A NAPI-capable interface must be able to store several packets (either on the card itself, or in an in-memory DMA ring). The interface should be capable of disabling interrupts for received packets, while continuing to interrupt for successful transmissions and other events. There are other subtle issues that can make writing a NAPI-compliant driver harder; see Documentation/networking/NAPI_HOWTO.txt in the kernel source tree for the details.

Relatively few drivers implement the NAPI interface. If you are writing a driver for an interface that may generate a huge number of interrupts, however, taking the time to implement NAPI may well prove worthwhile.

The snull driver, when loaded with the use_napi parameter set to a nonzero value, operates in the NAPI mode. At initialization time, we have to set up a couple of extra struct net_device fields:

if (use_napi) {
    dev->poll        = snull_poll;
    dev->weight      = 2;
}

The poll field must be set to your driver's polling function; we look at snull_poll shortly. The weight field describes the relative importance of the interface: how much traffic should be accepted from the interface when resources are tight. There are no strict rules for how the weight parameter should be set; by convention, 10 Mbps Ethernet interfaces set weight to 16, while faster interfaces use 64. You should not set weight to a value greater than the number of packets your interface can store. In snull, we set the weight to two as a way of demonstrating deferred packet reception.

The next step in the creation of a NAPI-compliant driver is to change the interrupt handler. When your interface (which should start with receive interrupts enabled) signals that a packet has arrived, the interrupt handler should not process that packet. Instead, it should disable further receive interrupts and tell the kernel that it is time to start polling the interface. In the snull "interrupt" handler, the code that responds to packet reception interrupts has been changed to the following:

if (statusword & SNULL_RX_INTR) {
    snull_rx_ints(dev, 0);  /* Disable further interrupts */
    netif_rx_schedule(dev);
}

When the interface tells us that a packet is available, the interrupt handler leaves it in the interface; all that needs to happen at this point is a call to netif_rx_schedule, which causes our poll method to be called at some future point.

The poll method has this prototype:

int (*poll)(struct net_device *dev, int *budget);

The snull implementation of the poll method looks like this:

static int snull_poll(struct net_device *dev, int *budget)
{
    int npackets = 0, quota = min(dev->quota, *budget);
    struct sk_buff *skb;
    struct snull_priv *priv = netdev_priv(dev);
    struct snull_packet *pkt;
    
    while (npackets < quota && priv->rx_queue) {
        pkt = snull_dequeue_buf(dev);
        skb = dev_alloc_skb(pkt->datalen + 2);
        if (! skb) {
            if (printk_ratelimit(  ))
                printk(KERN_NOTICE "snull: packet dropped\n");
            priv->stats.rx_dropped++;
            snull_release_buffer(pkt);
            continue;
        }
        memcpy(skb_put(skb, pkt->datalen), pkt->data, pkt->datalen);
        skb->dev = dev;
        skb->protocol = eth_type_trans(skb, dev);
        skb->ip_summed = CHECKSUM_UNNECESSARY; /* don't check it */
        netif_receive_skb(skb);
        
        /* Maintain stats */
        npackets++;
        priv->stats.rx_packets++;
        priv->stats.rx_bytes += pkt->datalen;
        snull_release_buffer(pkt);
    }
    /* If we processed all packets, we're done; tell the kernel and reenable ints */
    *budget -= npackets;
    dev->quota -= npackets;
    if (! priv->rx_queue) {
        netif_rx_complete(dev);
        snull_rx_ints(dev, 1);
        return 0;
    }
    /* We couldn't process everything. */
    return 1;
}

The central part of the function is concerned with the creation of an skb holding the packet; this code is the same as what we saw in snull_rx before. A number of things are different, however:

- The budget parameter provides a maximum number of packets that we are allowed to pass into the kernel. Within the device structure, the quota field gives another maximum; the poll method must respect the lower of the two limits. It should also decrement both dev->quota and *budget by the number of packets actually received. The budget value is a maximum number of packets that the current CPU can receive from all interfaces, while quota is a per-interface value that usually starts out as the weight assigned to the interface at initialization time.
- Packets should be fed to the kernel with netif_receive_skb, rather than netif_rx.
- If the poll method is able to process all of the available packets within the limits given to it, it should re-enable receive interrupts, call netif_rx_complete to turn off polling, and return 0. A return value of 1 indicates that there are packets remaining to be processed.

The networking subsystem guarantees that any given device's poll method will not be called concurrently on more than one processor. Calls to poll can still happen concurrently with calls to your other device methods, however.

With newer kernels (3.10), these functions have changed; see the link below for more information.

http://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/

Friday, August 14, 2015

802.11ac : VHT physical layer frame format

VHT Signal A field

The Signal A field comes first in the frame, and it may take on one of two forms depending on whether the transmission is single-user or multi-user. The figure below depicts the single-user format of the Signal A field. The two parts of the VHT Signal A field, each of which corresponds to an OFDM symbol, are referred to as VHT-SIG-A1 and VHT-SIG-A2. The two halves of the field are shown as Figure (a) and Figure (b), respectively.

VHT Signal A field (single-user format)

VHT Signal B field

The VHT Signal B field is used to set up the data rate, as well as tune in MIMO reception. Like the VHT Signal A field, it is modulated conservatively to assist receivers in determining the data rate of the payload; however, it is modulated using the VHT MCS 0. Although it is modulated with BPSK with a convolutional code of R=1/2, the VHT modulations have slightly more efficiency and hold a few more bits. The VHT Signal B field is designed to be transmitted in a single OFDM symbol, which is why it has slightly different lengths depending on the channel width. Figure 2-8 shows the single-user format of the VHT Signal B field and its dependence on channel width. (Other formats for this field will be discussed in Chapter 4.)

Thursday, May 7, 2015

How to Implement the Bit Maps?

Step 1: Create a bit map array.
      unsigned char bit_map[2];

Step 2: Calculate the bit map array index and the shift index (how many bits to shift).
      If the bit_position given by the user starts from 1:
            bit_map_array_index = (bit_position - 1) / 8;
            shift_Index = (bit_position - 1) % 8;

      If the bit_position given by the user starts from 0:
            bit_map_array_index = bit_position / 8;
            shift_Index = bit_position % 8;

Step 3: Set the bit in the bit map using
      bit_map[bit_map_array_index] |= (1 << shift_Index);

Step 4: Clear the bit in the bit map using
      bit_map[bit_map_array_index] &= ~(1 << shift_Index);

SKB Tinkering - Part 2

What is SKB in Linux kernel? What are SKB operations? Memory Representation of SKB? How to send packet out using skb operations?

Many of you are already familiar with the OSI reference model and the TCP/IP model.

Any networking application relies on the TCP/IP model to process and route packets from one end to the other. So how does the Linux kernel support networking? How does it process packets?

This article will answer the above questions.

A network interface represents a thing which sends and receives packets. This is normally interface code for a physical device like an ethernet card. However some devices are software only such as the loopback device which is used for sending data to yourself.
The network subsystem of the Linux kernel is designed to be completely protocol-independent. This applies to both networking protocols (Internet protocol [IP] versus IPX or other protocols) and hardware protocols (Ethernet versus token ring, etc.).
A header is a set of bytes (err, octets) prepended to a packet as it is passed through the various layers of the networking subsystem. When an application sends a block of data through a TCP socket, the networking subsystem breaks that data up into packets and puts a TCP header, describing where each packet fits within the stream, at the beginning. The lower levels then put an IP header, used to route the packet to its destination, in front of the TCP header. If the packet moves over an Ethernet-like medium, an Ethernet header, interpreted by the hardware, goes in front of the rest.

The two important data structures of the Linux kernel network layer are:

- sk_buff (defined in include/linux/skbuff.h)

- net_device (defined in include/linux/netdevice.h)

Each interface is described by a struct net_device item. (This is covered in another post.)

What is sk_buff:

sk_buff stands for socket buffer. This is the core structure in Linux networking.
skbuffs are the buffers in which the Linux kernel handles network packets. The packet is received by the network card, put into a skbuff and then passed to the network stack, which uses the skbuff all the time.
In other words:

Packets need to be manipulated efficiently as they move through the Linux kernel stack. This manipulation involves:

Adding protocol headers/trailers down the stack.
Removing protocol headers/trailers up the stack.
Concatenating/separating data.
Each protocol should have convenient access to header fields.

In order to perform all of the above, the kernel provides the sk_buff structure.

Structure of sk_buff

An sk_buff structure is created when an application passes data to a socket or when a packet arrives at the network adaptor (dev_alloc_skb() is invoked).

MEMORY Representation of SKB structure:

An SKB has four parts. The memory representation of the skb structure is depicted below.

sk_buff has five fields that describe the buffer (four pointers and a length), as listed below.

head	the start of the packet
data	the start of the packet payload
tail	the end of the packet payload
end	the end of the packet
len	the amount of data of the packet

As shown in the above figure, skb memory usually has four parts:

1. head room : located between skb->head and skb->data; the network protocol headers, such as the TCP, IP and Ethernet headers, are stored here;

2. User data : located between skb->data and skb->tail; usually filled in by the application layer through system calls;

3. tail room : located between skb->tail and skb->end; the kernel uses this part to fill in further data behind the user data;

4. After skb->end, a special structure, struct skb_shared_info, is stored.
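
The layout above can be sketched in userspace C. This is only a model, not kernel code: struct skb_model and the model_headroom() / model_tailroom() helpers are hypothetical names that mimic what the kernel's skb_headroom() and skb_tailroom() compute for a linear buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the buffer fields of struct sk_buff. */
struct skb_model {
    unsigned char *head;   /* start of the allocated buffer       */
    unsigned char *data;   /* start of the data currently in use  */
    unsigned char *tail;   /* end of the data currently in use    */
    unsigned char *end;    /* end of the allocated buffer         */
    unsigned int   len;    /* number of bytes from data to tail   */
};

/* Head room: free bytes between head and data. */
static size_t model_headroom(const struct skb_model *skb)
{
    assert(skb->head <= skb->data);
    return (size_t)(skb->data - skb->head);
}

/* Tail room: free bytes between tail and end. */
static size_t model_tailroom(const struct skb_model *skb)
{
    assert(skb->tail <= skb->end);
    return (size_t)(skb->end - skb->tail);
}
```

In this model, head room + len + tail room always adds up to the size of the allocated buffer; the fourth part (struct skb_shared_info) is left out of the sketch.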

STEPS for sending the packet out using SKB OPERATIONS:

Step 1:   Allocate Memory for the skb

skb = alloc_skb(len, GFP_KERNEL);

Once memory has been allocated for the skb using alloc_skb(), it looks as shown below.

As you can see, the head, data, and tail pointers all point to the beginning of the data buffer, and the end pointer points to the end of it. Note that all of the data area is considered tail room.

The length of this SKB is zero. It isn't very interesting, since it doesn't contain any packet data at all.

Step 2: Reserve Headroom for the protocol headers (Ethernet + IP + TCP headers etc.)

Reserve some space for the protocol headers using skb_reserve(). The head room is usually initialized by calling the skb_reserve() function, as shown in the figure below.

skb_reserve(skb, header_len);

The skb->data and skb->tail pointers are both incremented (advanced) by the specified header length.

For example, when the TCP layer sends a data packet, the head room must be at least tcphdr + iphdr + ethhdr bytes.
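
As a worked example of that sum, here is a minimal sketch that computes a header_len value for TCP over IPv4 over Ethernet. The constants are assumptions for illustration (the minimum header sizes without a VLAN tag), and model_min_headroom() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Assumed minimum header sizes in bytes (no VLAN tag, no options). */
enum {
    MODEL_ETH_HLEN = 14,   /* Ethernet header      */
    MODEL_IP_HLEN  = 20,   /* IPv4 header, minimum */
    MODEL_TCP_HLEN = 20    /* TCP header, minimum  */
};

/*
 * Head room needed for ethhdr + iphdr + tcphdr, plus any IP and TCP
 * options. Option lengths are multiples of 4 and at most 40 bytes each.
 */
static unsigned int model_min_headroom(unsigned int ip_opt_len,
                                       unsigned int tcp_opt_len)
{
    assert(ip_opt_len % 4 == 0 && ip_opt_len <= 40);
    assert(tcp_opt_len % 4 == 0 && tcp_opt_len <= 40);
    return MODEL_ETH_HLEN + MODEL_IP_HLEN + ip_opt_len
         + MODEL_TCP_HLEN + tcp_opt_len;
}
```

With no options this gives 14 + 20 + 20 = 54 bytes, which is the smallest value that could be passed to skb_reserve() under these assumptions.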

Step 3 : Add the User Data (payload) using skb_put()

skb_put() advances the 'skb->tail' pointer by the specified number of bytes (user_data_len) and increments 'skb->len' by the same number of bytes. This routine must not be called on an SKB that has any paged data.

unsigned char *data = skb_put(skb, user_data_len);

memset(data, 0x11, user_data_len);  /* fill the payload with the byte 0x11 */

Step 4:  Add the Headers Using skb_push() in the Headroom

skb_push() will decrement the 'skb->data' pointer by the specified number of bytes. It will also increment 'skb->len' by that number of bytes as well. Make sure that there is enough head room for the push being performed.
For example, push the UDP header to the front of the SKB.

struct udphdr *udp;

udp = (struct udphdr *)skb_push(skb, sizeof(struct udphdr));
udp->source = htons(1025);
udp->dest = htons(2095);
udp->len = htons(sizeof(struct udphdr) + user_data_len);  /* UDP length covers header plus payload */
udp->check = 0;

skb_pull() :

Remove headers from the start of the packet data and return those bytes to the head room using the skb_pull() operation.

It increments (pulls down) skb->data by the specified number of bytes and decrements skb->len by the same amount.

Step 5: Similarly, push the IPv4 header into the sk_buff using the skb_push() operation and send the packet out using the dev_queue_xmit() function.

Please refer to the example code below.
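
As a rough illustration of Steps 1 to 5, here is a minimal userspace sketch. It is not kernel code: struct skb_model and the model_* helpers are hypothetical names that only mimic the pointer arithmetic of alloc_skb(), skb_reserve(), skb_put(), skb_push() and skb_pull() on a linear buffer. Sending the packet with dev_queue_xmit() is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the buffer fields of struct sk_buff. */
struct skb_model {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* Like alloc_skb(): head, data and tail all start at the buffer start. */
static int model_alloc(struct skb_model *skb, unsigned int size)
{
    skb->head = malloc(size);
    if (skb->head == NULL)
        return -1;
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    skb->len = 0;
    return 0;
}

/* Like skb_reserve(): move data and tail forward on an empty buffer. */
static void model_reserve(struct skb_model *skb, unsigned int n)
{
    assert(skb->len == 0 && n <= (size_t)(skb->end - skb->tail));
    skb->data += n;
    skb->tail += n;
}

/* Like skb_put(): grow the data at the tail, return the old tail. */
static unsigned char *model_put(struct skb_model *skb, unsigned int n)
{
    unsigned char *old_tail = skb->tail;

    assert(n <= (size_t)(skb->end - skb->tail));   /* enough tail room */
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Like skb_push(): grow the data at the front, return the new data. */
static unsigned char *model_push(struct skb_model *skb, unsigned int n)
{
    assert(n <= (size_t)(skb->data - skb->head));  /* enough head room */
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Like skb_pull(): drop n bytes from the front, return the new data. */
static unsigned char *model_pull(struct skb_model *skb, unsigned int n)
{
    assert(n <= skb->len);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}

static void model_free(struct skb_model *skb)
{
    free(skb->head);
    skb->head = skb->data = skb->tail = skb->end = NULL;
    skb->len = 0;
}
```

A typical call order in this model is model_alloc(), model_reserve() for all the headers, model_put() for the payload, then one model_push() per header from the innermost protocol outwards. On the receive path, model_pull() undoes each push as the packet moves up the stack.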

Pictorial representation of skb operations.

Some more skb operations

skb_clone(), skb_copy(), skb_trim()

skb_clone() : This function does not copy the packet data. It creates a new struct sk_buff that points to the same region of memory as the original.

In this case the dataref counter in the skb_shared_info structure is incremented, so it becomes 2 after the first clone.

For example:

Packets can be captured on Linux using "tcpdump". In that case the packet is given to the Linux protocol stack and the same packet is given to tcpdump as well (first to the protocol stack and then to tcpdump).

Does the skb need to be copied completely in this case? It does not, because both consumers only read the packet: the network data itself is unchanged, and only the pointers inside struct sk_buff differ. In other words, we only need a second skb that points to the same region of memory. This is what skb_clone() does.

In older kernels, when the skb comes from the fast-clone cache, the clone is stored right after the original, so it can also be accessed as skb + 1.

skb_copy()
When a complete, writable copy of the skb and its data is needed, call skb_copy(). It copies the entire data of the skb and creates another skb.
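
The difference between the two calls can be sketched in userspace C. This is a simplified model under stated assumptions: struct data_model, struct skb_model and the model_* functions are hypothetical, the data area is a fixed 64-byte array, and the dataref field only plays the role of the counter in skb_shared_info.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared data area with a reference counter. */
struct data_model {
    int dataref;
    unsigned char bytes[64];
};

/* Hypothetical descriptor; 'shared' stands in for head/data/tail/end. */
struct skb_model {
    struct data_model *shared;
    unsigned int len;
};

static struct skb_model *model_new(const void *payload, unsigned int len)
{
    struct skb_model *skb = malloc(sizeof(*skb));
    struct data_model *d = malloc(sizeof(*d));

    if (skb == NULL || d == NULL || len > sizeof(d->bytes)) {
        free(skb);
        free(d);
        return NULL;
    }
    d->dataref = 1;
    memcpy(d->bytes, payload, len);
    skb->shared = d;
    skb->len = len;
    return skb;
}

/* Like skb_clone(): new descriptor, same data, counter incremented. */
static struct skb_model *model_clone(const struct skb_model *skb)
{
    struct skb_model *n = malloc(sizeof(*n));

    if (n == NULL)
        return NULL;
    *n = *skb;
    n->shared->dataref++;
    return n;
}

/* Like skb_copy(): new descriptor and a private copy of the data. */
static struct skb_model *model_copy(const struct skb_model *skb)
{
    return model_new(skb->shared->bytes, skb->len);
}

/* Free the descriptor; free the data only when the last user is gone. */
static void model_free(struct skb_model *skb)
{
    if (skb == NULL)
        return;
    if (--skb->shared->dataref == 0)
        free(skb->shared);
    free(skb);
}
```

In this model a clone is cheap but must be treated as read-only, because a write through one descriptor is visible through the other. A copy costs a full memcpy() but can be modified freely.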

skb_trim() : remove end from a buffer

Prototype

void skb_trim(struct sk_buff *skb, unsigned int len);
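
A minimal userspace sketch of what trimming does to a linear buffer, assuming a simplified hypothetical struct skb_model with only the fields that matter here. The len argument is the new total length, not the number of bytes to remove, and a len larger than the current length leaves the buffer unchanged.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields of struct sk_buff used by a trim. */
struct skb_model {
    unsigned char *data;   /* start of the data in use */
    unsigned char *tail;   /* end of the data in use   */
    unsigned int   len;    /* bytes from data to tail  */
};

/* Like skb_trim() on a linear buffer: cut the data down to 'len' bytes. */
static void model_trim(struct skb_model *skb, unsigned int len)
{
    assert(skb->tail == skb->data + skb->len);   /* linear invariant */
    if (skb->len > len) {
        skb->len = len;
        skb->tail = skb->data + len;
    }
}
```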

SKB SUPPORT FUNCTIONS

There are a bunch of skb support functions provided by the sk_buff layer.

allocation / free / copy / clone and expansion functions

struct sk_buff *alloc_skb(unsigned int size, int gfp_mask)

This function allocates a new skb. This is provided by the skb layer to initialize some private data and do memory statistics. The returned buffer has no headroom and a tailroom of /size/ bytes.

void kfree_skb(struct sk_buff *skb)

Decrement the skb's usage count by one and free the skb if no references left.

struct sk_buff *skb_get(struct sk_buff *skb)

Increments the skb's usage count by one and returns a pointer to it.

struct sk_buff *skb_clone(struct sk_buff *skb, int gfp_mask)

This function clones a skb. Both copies share the packet data but have their own struct sk_buff. The new copy is not owned by any socket, reference count is 1.

struct sk_buff *skb_copy(const struct sk_buff *skb, int gfp_mask)

Makes a real copy of the skb, including packet data. This is needed if you wish to modify the packet data. Reference count of the new skb is 1.

struct sk_buff *skb_copy_expand(const struct sk_buff *skb, int new_headroom, int new_tailroom, int gfp_mask)

Make a copy of the skb, including packet data. Additionally the new skb has a headroom of /new_headroom/ bytes size and a tailroom of /new_tailroom/ bytes.

ancillary functions

int skb_cloned(struct sk_buff *skb)

Is the skb a clone?

int skb_shared(struct sk_buff *skb)

Is this skb shared? (is the reference count > 1)?

operations on lists of skb's

struct sk_buff *skb_peek(struct sk_buff_head *list_)

peek a skb from front of the list; does not remove skb from the list

struct sk_buff *skb_peek_tail(struct sk_buff_head *list_)

peek a skb from tail of the list; does not remove skb from the list

__u32 skb_queue_len(struct sk_buff_head *list_)

return the length of the given skb list

void skb_queue_head(struct sk_buff_head *list_, struct sk_buff *newsk)

enqueue a skb at the head of a given list

void skb_queue_tail(struct sk_buff_head *list_, struct sk_buff *newsk)

enqueue a skb at the end of a given list.

int skb_headroom(struct sk_buff *skb)

return the amount of bytes of free space at the head of skb

int skb_tailroom(struct sk_buff *skb)

return the amount of bytes of free space at the end of skb

struct sk_buff *skb_cow(struct sk_buff *skb, int headroom)

if the buffer passed lacks sufficient headroom or is a clone it is copied and additional headroom made available.

struct sk_buff *skb_dequeue(struct sk_buff_head *list_)

skb_dequeue() takes the first buffer from a list (dequeues a skb from the head of the given list). If the list is empty, a NULL pointer is returned. This is used to pull buffers off queues. The buffers are added with the routines skb_queue_head() and skb_queue_tail().

struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list_)

dequeue a skb from the tail of the given list
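
The list operations above behave like a double-ended queue. The following userspace sketch models them with a circular doubly linked list. All names (struct skb_model, struct queue_model, the model_* functions) are hypothetical, the sentinel node is an implementation choice of this sketch, and the locking that a real shared queue needs is left out.

```c
#include <assert.h>
#include <stddef.h>

struct skb_model {
    struct skb_model *next, *prev;
    int id;                          /* payload stand-in */
};

struct queue_model {
    struct skb_model sentinel;       /* list head marker, never returned */
    unsigned int qlen;
};

static void model_queue_init(struct queue_model *q)
{
    q->sentinel.next = q->sentinel.prev = &q->sentinel;
    q->qlen = 0;
}

/* Link n between prev and next. */
static void model_insert(struct queue_model *q, struct skb_model *n,
                         struct skb_model *prev, struct skb_model *next)
{
    assert(n != NULL && prev != NULL && next != NULL);
    n->prev = prev;
    n->next = next;
    prev->next = n;
    next->prev = n;
    q->qlen++;
}

static void model_queue_head(struct queue_model *q, struct skb_model *n)
{
    model_insert(q, n, &q->sentinel, q->sentinel.next);
}

static void model_queue_tail(struct queue_model *q, struct skb_model *n)
{
    model_insert(q, n, q->sentinel.prev, &q->sentinel);
}

/* Peek functions return NULL on an empty queue and do not unlink. */
static struct skb_model *model_peek(struct queue_model *q)
{
    return q->qlen ? q->sentinel.next : NULL;
}

static struct skb_model *model_peek_tail(struct queue_model *q)
{
    return q->qlen ? q->sentinel.prev : NULL;
}

static struct skb_model *model_unlink(struct queue_model *q,
                                      struct skb_model *n)
{
    if (n == NULL)
        return NULL;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;
    q->qlen--;
    return n;
}

static struct skb_model *model_dequeue(struct queue_model *q)
{
    return model_unlink(q, model_peek(q));
}

static struct skb_model *model_dequeue_tail(struct queue_model *q)
{
    return model_unlink(q, model_peek_tail(q));
}
```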

Network device packet flow:

How to identify whether an skb is linear or not:
When all of the data is placed between skb->head and skb->end, the skb is called linear.
skb->data_len == 0 means the skb is linear.

Otherwise, skb->data_len != 0 and the skb is not linear.

In that case the length of the data at skb->data is (skb->len) - (skb->data_len), which covers the head ONLY.

Pseudocode
----------

if (skb->data_len == 0)
{
        /* SKB is linear */
        printk("Skb is linear: skb->len is %u\n", skb->len);
}
else
{
        /* SKB is not linear: this is the length of the linear head only,
         * the same value that skb_headlen(skb) returns. skb->len itself
         * must not be modified here. */
        unsigned int head_len = skb->len - skb->data_len;

        printk("Skb is not linear: the head length is %u\n", head_len);
}

skb->data_len = sum of struct skb_shared_info->frags[0 ... nr_frags - 1].size

+ size of data in struct skb_shared_info->frag_list
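
The formula can be checked with a small userspace model. Everything here is a hypothetical simplification: struct frag_model keeps only a size, MODEL_MAX_FRAGS is an arbitrary cap, and the shared info is embedded in struct skb_model instead of living after the end pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4            /* hypothetical cap for the sketch */

struct skb_model;

struct frag_model {
    unsigned int size;               /* bytes held in this page fragment */
};

struct shinfo_model {
    unsigned int nr_frags;
    struct frag_model frags[MODEL_MAX_FRAGS];
    struct skb_model *frag_list;     /* chain of further skbs, or NULL */
};

struct skb_model {
    unsigned int len;                /* total bytes: head + non-linear part */
    unsigned int data_len;           /* bytes outside the linear head */
    struct skb_model *next;
    struct shinfo_model shinfo;
};

/* Sum of all frags[] sizes plus the data of every skb on frag_list. */
static unsigned int model_data_len(const struct skb_model *skb)
{
    unsigned int total = 0;
    unsigned int i;
    const struct skb_model *f;

    assert(skb->shinfo.nr_frags <= MODEL_MAX_FRAGS);
    for (i = 0; i < skb->shinfo.nr_frags; i++)
        total += skb->shinfo.frags[i].size;
    for (f = skb->shinfo.frag_list; f != NULL; f = f->next)
        total += f->len;
    return total;
}

/* Bytes in the linear head only: len - data_len. */
static unsigned int model_headlen(const struct skb_model *skb)
{
    return skb->len - skb->data_len;
}

static int model_is_nonlinear(const struct skb_model *skb)
{
    return skb->data_len != 0;
}
```

For instance, a 100-byte head with page fragments of 1000 and 500 bytes and one 300-byte skb on frag_list gives data_len = 1800 and len = 1900 in this model.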

When is an SKB Non Linear?

First Model :

The first is the common NIC driver model: the data is stored in different locations in physical pages, and skb_shared_info holds an array of (page, offset, size) entries that record where this data lives.

Second Model:

frag_list is used when assembling IP packet fragments:

Each fragment has its own skb structure, and the fragments are linked through skb->next into a singly linked list. The head of the list is the frag_list in the skb_shared_info of the first skb.

Third Model:

The GSO segmentation model: when a large TCP data packet is cut into several MTU-sized segments, they are also linked together through skb->next: