Embedded Matters: 2009

Tuesday, November 03, 2009

System knowledge of an Embedded Engineer

In an embedded systems company, a project manager was newly appointed, and it seemed that he had many years of experience in embedded systems. He was driving many embedded systems projects and was very proud of his experience. One morning, shortly after he came to the office, he called the system administration department and shouted, "Why did you change my place without asking me? You have put somebody else's computer on my desk." The administrator answered, "Sorry sir, it is your own computer. I just changed the desktop picture. That's it!"

Monday, September 28, 2009

Marvell 88E1111S initialization: How To

Initializing the 88E1111S was a big problem for me, since I could not get the detailed datasheet/user's manual for it. Once I got the hang of it, it was a simple matter. So, let me explain the steps. This procedure is for the 88E1111S with ID 0x01410cc (88E1111-BAB1 is written on the chip).

First, let me define the addresses of the registers that are essential for the initialization.

Registers used
Register	Address
#define MIIM_CONTROL	0x00
#define MIIM_STATUS	0x01
#define MIIM_ANAR	0x04
#define MIIM_GBIT_CONTROL	0x09

The initialization code is as follows:

/* Reset the chip */
write_phy_reg(MIIM_CONTROL, 0x9140);
/* Wait for Reset over */
while(read_phy_reg(MIIM_CONTROL) & 0x8000);
/* Marvell 88E1111S sequence */
write_phy_reg(0x1d, 0x1f);
write_phy_reg(0x1e, 0x200c);
write_phy_reg(0x1d, 0x05);
write_phy_reg(0x1e, 0x0);
write_phy_reg(0x1e, 0x100);
/* Enable the 88e1111 internal RX/TX clock delay */
write_phy_reg(0x14, 0x0cd2);
/* Set the Gigabit control register and
Autonegotiation Advertisement register */
write_phy_reg(MIIM_GBIT_CONTROL, 0xe00);
write_phy_reg(MIIM_ANAR, 0x1e1);
/* Reset Again */
write_phy_reg(MIIM_CONTROL, 0x9140);

/* Wait for the link to come up (Link Status is bit 2 of the status register) */
while(!((status = read_phy_reg(MIIM_STATUS)) & 0x0004));
/* Now, initialization is over. You can parse the status Register to know the Speed and Duplex */
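
Parsing the speed and duplex could look like the sketch below. It assumes the PHY Specific Status Register sits at address 0x11 with the bit layout commonly used for Marvell 88E1011/88E1111 parts in open-source boot loader code; the macro names, the struct and decode_phy_status() are made up for illustration, so please verify the bits against your datasheet.

```c
#include <stdint.h>

/* Assumed layout of the PHY Specific Status Register (address 0x11). */
#define MIIM_PHY_SPEC_STATUS  0x11
#define PHYSTAT_SPEED_MASK    0xc000  /* bits 15:14 */
#define PHYSTAT_SPEED_1000    0x8000
#define PHYSTAT_SPEED_100     0x4000
#define PHYSTAT_SPEED_10      0x0000
#define PHYSTAT_DUPLEX        0x2000  /* 1 = full duplex */
#define PHYSTAT_RESOLVED      0x0800  /* speed and duplex resolved */
#define PHYSTAT_LINK          0x0400  /* real-time link status */

struct phy_link_info {
    int link_up;      /* 1 when the link is up and speed/duplex are resolved */
    int speed_mbps;   /* 10, 100 or 1000; 0 when unknown */
    int full_duplex;  /* 1 = full duplex, 0 = half duplex */
};

/* Decode a value read with read_phy_reg(MIIM_PHY_SPEC_STATUS). */
static struct phy_link_info decode_phy_status(uint16_t status)
{
    struct phy_link_info info = { 0, 0, 0 };

    if (!(status & PHYSTAT_LINK) || !(status & PHYSTAT_RESOLVED))
        return info;  /* link down or negotiation not finished */

    info.link_up = 1;
    info.full_duplex = (status & PHYSTAT_DUPLEX) != 0;

    switch (status & PHYSTAT_SPEED_MASK) {
    case PHYSTAT_SPEED_1000: info.speed_mbps = 1000; break;
    case PHYSTAT_SPEED_100:  info.speed_mbps = 100;  break;
    case PHYSTAT_SPEED_10:   info.speed_mbps = 10;   break;
    default:                 info.speed_mbps = 0;    break; /* reserved */
    }
    return info;
}
```

The decoding is kept separate from the register read so that it can be checked on a PC without the hardware.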

Ping me if you still have problems.

Friday, September 11, 2009

JLPT Level 1 book with English translation

I got the Japanese exam Level 1 book にほんご500問上級 from ASK Publications, which comes with English explanations. It seems to be an easy shortcut for preparing for the Level 1 exam. It also provides problems using practical Japanese with English explanations. So, I feel it is a good book for concentrating on the more essential Japanese. But I don't recommend this book for kanji beginners, since it has not been edited to cover all kanji. The same series provides books for Level 2 and for Levels 3 and 4 too, as にほんご500問中級 and にほんご500問初級 respectively. Gambatte kudasai ne!!

Tuesday, September 08, 2009

How to create a job

Two embedded engineers had gone abroad for an on-site job. After the work was over, both returned to their home country. After some time, one of them got a call from the client company for an extension of his previous work. Even after that work was over, he kept getting work from the client company. The other guy, who was bored of sitting in the parent company, asked him how he alone got jobs so frequently. He answered, "With bugs, engineers create jobs for themselves and for their next generation too".

MPC8313e eTSEC checksum offloading: Programming guidelines

Checksum offloading (checksum generation and verification) with the TSEC and eTSEC controllers of the MPC85xx and MPC83xx family processors is easy, but there is no clue if something goes wrong and the checksum is not generated. Just listing the sequence may help you in troubleshooting the problem.

TSEC checksum generation application notes
1) During initialization, enable the IPCSEN and TUCSEN bits of the Transmit Control Register (TCTRL).
2) In the buffer pointed to by the buffer descriptor, reserve the first 8 bytes for the Frame Control Block (FCB). Fill it by setting the appropriate IP, IP6, TUP, UDP, CIP and CTU bits and the L4OS and L3OS fields. (For a normal IP packet placed right after the FCB, L4OS is 20 and L3OS is 14.)
3) Fill the packet starting from the byte right after the FCB.
4) Set the Length field of the buffer descriptor to (packet data length + 8). (The length of the FCB is added to the actual packet data length.)
5) Set the TOE bit in the flags field of the Transmit Buffer Descriptor and transmit the packet.
6) Make sure the IP, TCP and UDP checksum fields of your packet data are filled with zero.
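
Steps 2 and 4 can be sketched as follows. The bit values and byte positions are an assumption taken from the transmit FCB layout as used by the Linux gianfar driver (flags in byte 0, L4OS in byte 2, L3OS in byte 3); the helper names are made up, so crosscheck with the reference manual of your processor.

```c
#include <stdint.h>
#include <string.h>

#define FCB_LEN    8     /* the FCB occupies the first 8 bytes of the buffer */

/* Assumed bit positions in the first byte of the transmit FCB. */
#define TXFCB_IP   0x40  /* layer 3 is IP */
#define TXFCB_IP6  0x20  /* IP header is IPv6 */
#define TXFCB_TUP  0x10  /* layer 4 is TCP or UDP */
#define TXFCB_UDP  0x08  /* layer 4 is UDP (clear for TCP) */
#define TXFCB_CIP  0x04  /* generate the IP header checksum */
#define TXFCB_CTU  0x02  /* generate the TCP/UDP checksum */

/* Fill the FCB for a plain IPv4/UDP frame with no IP options. */
static void fill_tx_fcb_ipv4_udp(uint8_t *buf)
{
    memset(buf, 0, FCB_LEN);
    buf[0] = TXFCB_IP | TXFCB_TUP | TXFCB_UDP | TXFCB_CIP | TXFCB_CTU;
    buf[2] = 20;  /* L4OS: layer 4 header starts 20 bytes after the IP header start */
    buf[3] = 14;  /* L3OS: IP header starts 14 bytes after the frame start */
}

/* Length to program into the transmit buffer descriptor. */
static uint16_t tx_bd_length(uint16_t packet_len)
{
    return (uint16_t)(packet_len + FCB_LEN);
}
```

The Ethernet frame itself is then copied to buf + FCB_LEN, with its checksum fields set to zero.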

TSEC checksum verification application notes
1) During initialization, enable the IPCSEN, TUCSEN and PRSDEP fields of the Receive Control Register (RCTRL).
2) The first 8 bytes of the received data are the Frame Control Block (FCB). First check whether the IP and TUP bits are set, to make sure the packet falls under the IP, IPv6, TCP or UDP category. Then check CIP and CTU to see whether the checksum has been verified, and then the result bits EIP and ETU. (Fragmented packets, even if they carry TCP or UDP, do not fall under the TCP/UDP category. Those packets have to be sent to the upper layer for checksum verification.)
3) Subtract 8 from the received packet length and use the remaining packet.
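
The check in step 2 might be coded as below. The flag values are again an assumption based on the receive FCB definitions in the Linux gianfar driver (a 16-bit big-endian flags field in the first two bytes); the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions in the 16-bit flags field of the receive FCB. */
#define RXFCB_IP   0x4000  /* IP header found */
#define RXFCB_TUP  0x1000  /* TCP or UDP header found */
#define RXFCB_CIP  0x0800  /* IP checksum was checked */
#define RXFCB_CTU  0x0400  /* TCP/UDP checksum was checked */
#define RXFCB_EIP  0x0200  /* IP checksum error */
#define RXFCB_ETU  0x0100  /* TCP/UDP checksum error */

enum rx_csum_result {
    RX_CSUM_VERIFY_IN_SW,  /* hardware did not (fully) check: verify in software */
    RX_CSUM_OK,            /* IP and TCP/UDP checksums verified by hardware */
    RX_CSUM_BAD            /* hardware reported a checksum error */
};

/* fcb points to the first 8 bytes of the received buffer. */
static enum rx_csum_result check_rx_fcb(const uint8_t *fcb)
{
    uint16_t flags = (uint16_t)((fcb[0] << 8) | fcb[1]);

    if (!(flags & RXFCB_IP) || !(flags & RXFCB_CIP))
        return RX_CSUM_VERIFY_IN_SW;
    if (flags & RXFCB_EIP)
        return RX_CSUM_BAD;
    /* Fragments and other unsupported packets: layer 4 was not checked. */
    if (!(flags & RXFCB_TUP) || !(flags & RXFCB_CTU))
        return RX_CSUM_VERIFY_IN_SW;
    if (flags & RXFCB_ETU)
        return RX_CSUM_BAD;
    return RX_CSUM_OK;
}
```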

For further troubleshooting,

1) Check with the latest errata. For example,

http://www.freescale.com/files/32bit/doc/errata/MPC8313ECE.pdf?fpsp=1&WT_TYPE=Errata&WT_VENDOR=FREESCALE&WT_FILE_FORMAT=pdf&WT_ASSET=Documentation

2) Check the packet structure and data with packet capture software/analyzer.

The above application notes are mainly based on the MPC8313E. So, please crosscheck the settings for other processors.

Sunday, August 30, 2009

Xilinx TEMAC Checksum offload programming sample

Programming the Xilinx TEMAC checksum offload engine was a bit complex, since it calculates only the TCP and UDP checksums on transmission and gives just the raw checksum on reception. So, I have tried to explain the logic behind the TCP/UDP checksum offload in hardware with a sample UDP packet transmission and reception.

For Transmission

Step 1:

In UDP layer, Do not forget to fill Checksum field of UDP header of the packet with zero.
Calculate the Pseudo Checksum and send it to driver.

How to calculate pseudo_csum?

unsigned int pseudo_csum;
unsigned short *iphdr_ptr;
pseudo_csum = *iphdr_ptr ++; /* Source IP Address (First two bytes) */
pseudo_csum += *iphdr_ptr ++; /* Source IP Address (Last two bytes) */
pseudo_csum += *iphdr_ptr ++; /* Destination IP Address (First two bytes) */
pseudo_csum += *iphdr_ptr ++; /* Destination IP Address (Last two bytes) */
pseudo_csum += htons(UDP_PROTOCOL_ID); /* UDP Protocol ID: 0x11 */
pseudo_csum += udp_length; /* UDP Packet Length (Data + Header Length) */
pseudo_csum = (pseudo_csum & 0xffff) + (pseudo_csum >>16);
pseudo_csum += (pseudo_csum >>16);
return (unsigned short)pseudo_csum;
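
For checking the arithmetic on a PC, here is a self-contained variant of the same calculation. It is only a sketch: the function name and the byte-array parameters are assumptions, and it sums the fields as big-endian 16-bit words, so the result is in host order (the fragment above sums in memory order, which gives the same value byte-swapped on a little-endian CPU).

```c
#include <stdint.h>

#define UDP_PROTOCOL_ID 0x11

/* Folded (not complemented) one's complement sum of the UDP pseudo header. */
static uint16_t udp_pseudo_csum(const uint8_t src_ip[4],
                                const uint8_t dst_ip[4],
                                uint16_t udp_length)
{
    uint32_t sum = 0;

    sum += (uint32_t)((src_ip[0] << 8) | src_ip[1]);  /* source IP, first two bytes */
    sum += (uint32_t)((src_ip[2] << 8) | src_ip[3]);  /* source IP, last two bytes */
    sum += (uint32_t)((dst_ip[0] << 8) | dst_ip[1]);  /* destination IP, first two bytes */
    sum += (uint32_t)((dst_ip[2] << 8) | dst_ip[3]);  /* destination IP, last two bytes */
    sum += UDP_PROTOCOL_ID;                           /* zero byte + protocol ID */
    sum += udp_length;                                /* UDP header + data length */

    sum = (sum & 0xffff) + (sum >> 16);               /* fold the carry */
    sum += (sum >> 16);                               /* fold a second carry, if any */
    return (uint16_t)sum;
}
```

For 192.168.0.1 to 192.168.0.2 with a UDP length of 16, the words add up to 0x18174, which folds to 0x8175.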
Step 2:

Set the Transmit buffer descriptor with these extra settings and transmit.

TransmitBD.APP0 = TransmitBD.APP0 | TX_CSCNTRL;
TransmitBD.APP1 = (TX_CSBEGIN << 16) | TX_CSINSERT;
TransmitBD.APP2 = pseudo_csum;

TX_CSCNTRL is 0x01
TX_CSBEGIN is 34 for IPv4/UDP. It is starting of UDP Header (Ethernet Header size (14) + IP Header size (20)).
TX_CSINSERT is 40 for IPv4/UDP. It is Checksum field offset (starting of UDP header(34) + Checksum offset (6)).

For Reception

In reception, the hardware checksum offload engine just sums the IP packet data as 16-bit words. So, subtract the IP header, subtract the UDP checksum, calculate and add the pseudo header, and verify the result against the checksum in the packet.

unsigned short *ip_data = (unsigned short *)(RecieveBD.Buffer + 14);
struct udp_header *udp_hdr = (struct udp_header *)(RecieveBD.Buffer + 14 + 20);
unsigned short packet_csum, hw_csum;
unsigned int temp;
int i;

Take the hardware generated checksum and shift 16 bits left.

temp = RecieveBD.APP3 & 0xffff;
temp = temp << 16;

Subtract the first 12 bytes of IP header (except the Source and Destination IP addresses: pseudo header)

for (i = 0; i < 6; i++) {
temp -= ip_data[i];
}

Add the UDP protocol ID(0x11) (pseudo header).

temp += htons(UDP_PROTOCOL_ID);

Subtract the UDP checksum. And keep it for later validation.

packet_csum = udp_hdr->check_sum;
temp -= packet_csum;

Add the UDP packet length (pseudo header).

temp += udp_hdr->length;
temp = (temp & 0xffff) + (temp >> 16);
temp += (temp >>16);

Compare the result checksum (16 bits)

hw_csum = (unsigned short)temp;
if (hw_csum != 0xffff)
hw_csum = ~hw_csum;
if (hw_csum == packet_csum)
Checksum passed;
else
Checksum failed;

Send the result to the upper layer. Just follow the same algorithm for TCP and IPv6 too. For an optimized solution and pseudo code for checksum verification, read the following post:

http://embeddedknowledge.blogspot.com/2011/07/xilinx-temac-checksum-offload.html

If you have any queries, write them as comments.

Friday, August 28, 2009

Japan Robots

Welcoming Guests and distributing tissue papers.

Walking and shaking hands

Arranging the world famous japanese dish `Sushi`.

" I am ready to prepare Okonomiyaki (Japanese dish)"

Distributing the food. (I wonder how it balances to carry the weight and roll around)

"Hey! Here is the show.."

Don`t add him in robot list. But, his T-shirt was interesting..

Thursday, August 27, 2009

Checksum error: differs by 1 or 2

When you calculate a checksum and the result is off only in the last digits (the value differs by 0x01, 0x02 or 0x03), the problem is that the carry has not been added.
For example, the correct checksum is 0x5e94, but your calculation gives 0x5e96. To solve this problem, check whether the carry has been added.

csum += *(unsigned short *)data;
:
csum = (csum & 0xffff) + (csum >> 16); /* Please add this carry too */
csum = csum + (csum >> 16); /* If the carry is again generated, it has to be added */
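
A complete routine with both carry folds might look like this sketch. The function name is made up, and the data is summed as big-endian 16-bit words so that the result does not depend on the CPU byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement checksum over len bytes, as used by IP, TCP and UDP. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t csum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        csum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        csum += (uint32_t)(data[len - 1] << 8);  /* pad the odd last byte with zero */

    csum = (csum & 0xffff) + (csum >> 16);       /* add the carry */
    csum += (csum >> 16);                        /* add the carry of the carry */
    return (uint16_t)~csum;
}
```

With the bytes 00 01 f2 03 f4 f5 f6 f7 (the example from RFC 1071), the 32-bit sum is 0x2ddf0. Folding the carry gives 0xddf2 and the checksum 0x220d. Without the fold, the low 16 bits would be 0xddf0 and the checksum 0x220f, which is off by 2 in exactly the way described above.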

Wednesday, August 26, 2009

Software design and implementation for TCP/UDP/IP Checksum offloading Interface

Most modern NICs (Network Interface Cards) have gigabit speed capability and come with TCP/UDP/IP checksum offloading support. Embedded operating systems can no longer postpone integrating checksum offloading support into their drivers and protocol stacks.

Looking at a few controllers such as the Xilinx TEMAC and the PowerPC TSEC, it is clear that the extent of support varies widely from one to another and is not standardized. In turn, this complicates the design and implementation of a standardized software interface that can support all kinds of controllers.

**Varying features of checksum offloading**
Supported Layers (TCP/UDP/IP)	Some support only TCP/UDP. Some support IP too.
Support for packets with Options header	Some controllers do not calculate the checksum for packets with options header.
Checksum calculation for UDP/TCP Pseudo Header	Most controllers which do not support the IP layer and the Options header expect the protocol stack to calculate the UDP/TCP pseudo header checksum and seed them with it.
Driver Interface specification	The driver interface, such as the input parameters and output format of the Checksum Offload Engine, varies, though it is mostly done through buffer descriptors.
Support for fragmented packets	Some support fragmented packets and some do not.
Error packets handling	Whether the hardware rejects erroneous packets or leaves them to the software also varies.
VLAN packets support	Some support VLAN packets and some do not.

From the above table, it is clear that the protocol stack cannot fully depend on the hardware engine for checksum calculation. There are packets which are not supported by the Checksum Offload Engine, and they have to go through the software checksum calculation.

In this article, I try to design a standardized software interface for the checksum offloading functionality. The implementation is divided into three modules: Configuration, Outbound flow and Inbound flow.

Configuration

Configuration is about advertising the abilities of the controller's Checksum Offload Engine across the protocol stack. (Let's call it COE. I hesitate to call it TOE (TCP offload engine), since it offloads UDP checksum calculation too.) In other words, it is about initializing the network interface structures with the capabilities of the COE, so that each protocol layer can check whether the COE supports that particular layer and process the packets accordingly. The main capabilities to be advertised are: Which layers are supported (UDP/TCP/IP)? What is the extent of support (partial or full, where partial means the checksum offload controller does not calculate the pseudo header checksum)? Does it support fragments or not?

Interface configuration
COE supports IP?
COE supports TCP?
COE supports UDP?
COE support is full or partial?
COE supports fragmented packets?

How this information is maintained in the network protocol stack is implementation specific. However, this article suggests storing the information as flags in the network interface structure and letting the driver do the initialization. How can this information be used in the protocol stack? While sending and receiving each packet, each layer refers to the above flags to learn the abilities of the COE and processes the packet accordingly. So, the TCP/UDP layer must support three types of processing: 1) Software 2) Partial 3) Full. The IP layer must support two types: 1) Software 2) Full. Each type is explained below.

Software: If the COE does not support the particular layer or the packet type (for example, fragmented packets), the checksum will be calculated as usual by the software routine.

Partial: This is a special case, mainly for the TCP and UDP layers. Some COEs support TCP and UDP checksum calculation, but they require the protocol stack to calculate the pseudo header checksum alone and feed it to them (what a pity!). In this case, the TCP and UDP layers need to calculate only the pseudo header checksum and send it to the driver.

Full: The COE calculates the whole checksum for a particular layer. The software routine does not need to do anything.
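
As a rough sketch, the capability flags and the choice between the three processing types could be written as follows. The struct, enum and function names are hypothetical and do not come from any particular protocol stack.

```c
#include <stdbool.h>

/* Capability flags, filled in by the driver when the interface is initialized. */
struct coe_caps {
    bool ip;         /* COE supports IP? */
    bool tcp;        /* COE supports TCP? */
    bool udp;        /* COE supports UDP? */
    bool full;       /* true = full support, false = partial (no pseudo header) */
    bool fragments;  /* COE supports fragmented packets? */
};

enum csum_mode { CSUM_SOFTWARE, CSUM_PARTIAL, CSUM_FULL };

/* Processing type for the UDP layer; the TCP version would test caps->tcp. */
static enum csum_mode udp_csum_mode(const struct coe_caps *caps, bool is_fragment)
{
    if (!caps->udp)
        return CSUM_SOFTWARE;
    if (is_fragment && !caps->fragments)
        return CSUM_SOFTWARE;
    return caps->full ? CSUM_FULL : CSUM_PARTIAL;
}

/* Processing type for the IP layer: only Software or Full. */
static enum csum_mode ip_csum_mode(const struct coe_caps *caps)
{
    return caps->ip ? CSUM_FULL : CSUM_SOFTWARE;
}
```

A controller that handles only TCP/UDP and needs the pseudo header checksum from the stack would set tcp and udp and leave ip, full and fragments cleared.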

Done! Now the COE abilities are maintained in the protocol stack. Let's see how to process each packet.

Outbound flow and parameters

Generalizing the input parameters for the controllers yields the following table of input parameters that need to be passed to the driver with each packet.

Outbound parameters
Layer3 type = IPv4 or IPv6?	flag
Layer4 type = TCP or UDP?	flag
Layer3 calculation by COE?	flag
Layer4 calculation by COE?	flag
Should calculate Pseudo header?	flag
Byte offset for layer4 start	offset
Checksum offset for layer4	offset
Checksum offset for layer3	offset
Pseudo Header Checksum	data

But some of the parameters can easily be calculated or fixed by the driver rather than being sent all the way with the packet. If that optimization is done, the list becomes as follows:

Optimized outbound parameters
Layer3 type = IPv4 or IPv6?	flag
Layer4 type = TCP or UDP?	flag
Layer3 calculation by COE?	flag
Layer4 calculation by COE?	flag
Pseudo Header Checksum	data

Now, the UDP and IP checksum calculation logic in the protocol stack becomes as follows. All the above parameters are passed to the driver, and the driver sets up the COE using these parameters. That is all. The packet will come out of the controller with the calculated checksum.

UDP

Set UDP Checksum field to 0.
IF (COE Supports UDP? is Yes)
{
    IF (this is fragmented packet AND COE supports fragmented packet? is False)
    {
        Calculate by software
    }
    ELSE
    {
        /* Just send the packet. Let COE calculate the checksum */
        Set Layer4 type = UDP;
        Set Layer4 calculation by COE? = Yes;
        IF (COE support is partial)
        {
            Pseudo header checksum = Calculate just pseudo checksum
        }
    }
}
ELSE
    Calculate by software

IP

IF (COE Supports IP? is No)
{
    Calculate by software
}
ELSE
{
    Set Layer3 type = IP;
    Set Layer3 calculation by COE? = Yes;
}

The TCP logic is the same as for UDP. How all this information is passed to the driver is implementation specific. However, it can be passed along with the packet structure as flags and data bytes, as specified in the above table.
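
A minimal C sketch of the outbound UDP and IP decisions, using the optimized parameter list. All names are illustrative assumptions; whenever a "by COE" flag stays false, the stack is expected to have calculated that checksum in software before handing the packet to the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability flags advertised by the driver (hypothetical layout). */
struct coe_caps {
    bool ip;         /* COE supports IP? */
    bool udp;        /* COE supports UDP? */
    bool full;       /* true = full support, false = partial */
    bool fragments;  /* COE supports fragmented packets? */
};

/* Optimized outbound parameters passed to the driver with each packet. */
struct tx_csum_params {
    bool l3_is_ipv6;       /* Layer3 type: false = IPv4, true = IPv6 */
    bool l4_is_udp;        /* Layer4 type: false = TCP, true = UDP */
    bool l3_by_coe;        /* Layer3 calculation by COE? */
    bool l4_by_coe;        /* Layer4 calculation by COE? */
    uint16_t pseudo_csum;  /* pseudo header checksum, used for partial support only */
};

static struct tx_csum_params udp_tx_params(const struct coe_caps *caps,
                                           bool is_fragment,
                                           uint16_t pseudo_csum)
{
    struct tx_csum_params p = { false, true, false, false, 0 };

    /* UDP: offload unless the layer or the fragment is unsupported. */
    if (caps->udp && !(is_fragment && !caps->fragments)) {
        p.l4_by_coe = true;
        if (!caps->full)
            p.pseudo_csum = pseudo_csum;  /* seed for a partial COE */
    }

    /* IP: either fully offloaded or calculated by software. */
    p.l3_by_coe = caps->ip;
    return p;
}
```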

Inbound flow and parameters

Some controllers give the calculated checksum, and some notify whether the checksum verification passed or failed. And some controllers drop the erroneous packets. So, as a general rule, this article suggests verifying the checksum at the driver level and simply dropping the packets with checksum errors. No parameters are passed from the driver to the upper layer. So, only two types of packets are sent to the upper layer: 1) packets whose checksum has been verified as correct 2) packets unsupported by the COE (for example, fragmented packets). The UDP and TCP layers therefore verify the checksum of the unsupported packets, such as fragments, in software. So the logic becomes as follows:

IP

IF (COE Supports IP? is No)
{
Verify by software
}
/* All received are correct packets, when COE is enabled */

UDP

IF (COE Supports UDP? is No)
{
Verify by software
}
ELSE IF (packet is fragmented AND COE supports fragmented packet? is False)
{
Verify by software
}

The algorithm for TCP is the same as for UDP. That is all. Big job done!!

(Please leave your comments on this article. That will help me to improve this. See you!)

Wednesday, August 19, 2009

Love letter

An embedded programmer wrote a love letter: "Dear! Since the day you were powered on, I have been loving you. Your eyes glitter like LEDs. But why do you emit heat whenever I approach you? Don't worry, I will be your heat sink forever. I always want to hang on your shoulders. Come on! Embed me in your heart. Tell me whatever problem you have, we can debug it. Expecting your positive reply."

He never got a reply, because she had made a reckless run with another guy.

How to

One day an embedded engineer committed suicide. There was no clue why he did so. In his office, a new programmer took over his job and looked through his code. There was a comment in his code.
/* Junk board! You always hang here. Wanna show you really how to hang */

Saturday, August 15, 2009

Embedded systems Coding: Issues and techniques - Part 1

In embedded systems, you may wonder why your code does not work on the board as you expect, even though it is logically correct. Can you imagine what kinds of things may cause such problems in embedded systems? Are you aware of coding issues related to compiler optimization, I/O synchronization, cache write-back, edge-triggered interrupts, etc.? Let me guide you on how to code with some of these issues in mind.

Optimization is unavoidable for improving the speed and reducing the size of the code in embedded systems. But the compiler keeps a strict eye on your code and generates trickier code, which may lead to unexpected results. Look at the following typical case:
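
A polling loop of this kind might look like the sketch below. The function name and the bit value of IO_COMPLETED are assumptions for illustration; on a real board, status_reg would point to the memory-mapped status register of the device.

```c
#define IO_COMPLETED 0x01u  /* assumed position of the completion flag */

/* Poll the status register until the hardware sets IO_COMPLETED. */
static void wait_for_io(unsigned int *status_reg)
{
    while (!(*status_reg & IO_COMPLETED))
        ;  /* read the status register again and again */
}
```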

What is the above code meant to do? It is a polling loop: each time around, it should read the status register and check whether the hardware has updated the status. But the compiler thinks it is being smarter: why read the SAME memory location again and again when its content is going to be the SAME? The compiler does not know that the content is going to be changed by the hardware after some time. So, it dares to generate assembly code which is equivalent to reading the status register once before the loop and then testing that stale copy forever, which may result in an infinite loop in case the IO_COMPLETED flag is not set when the code reads the status register for the first time.
So, what do you have to do? You have to understand that IO mapped memory is different from real memory: its content is subject to change internally by the hardware, without any CPU write. So, you have to teach the compiler to treat it separately.

Declare the IO memory locations as volatile, so that the compiler will not optimize away the read/write operations on those locations. Then the above code will work correctly even with optimization.
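
A minimal sketch of the volatile version. The poll limit is an addition for safety, and the names and the IO_COMPLETED bit value are assumptions; on real hardware the pointer would come from something like `(volatile uint32_t *)0x80000000u`, with the address taken from the board's memory map.

```c
#include <stdint.h>

#define IO_COMPLETED 0x01u  /* assumed position of the completion flag */

/*
 * Poll a memory-mapped status register. Because the pointer target is
 * volatile, the compiler must perform a real read on every iteration.
 * Returns 1 when IO_COMPLETED is seen, 0 if max_polls reads pass first.
 */
static int wait_for_io(volatile uint32_t *status_reg, unsigned int max_polls)
{
    unsigned int polls;

    for (polls = 0; polls < max_polls; polls++) {
        if (*status_reg & IO_COMPLETED)
            return 1;
    }
    return 0;  /* timed out */
}
```

Bounding the loop is a design choice: a device that never answers then shows up as an error code instead of a hung system.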

Wednesday, July 29, 2009

Famous Embedded Systems exhibitions

International-level exhibitions are conducted once a year in the following places. Many companies from all over the world set up stalls to attract new customers and to introduce new technology and products. Lots of people visit to learn about the current trends and leading-edge technologies. Conferences on embedded systems are also conducted.

ESEC Embedded Systems Expo, Japan (http://www.esec.jp/en/)
ET Embedded Technology, Japan (http://www.jasa.or.jp/et/english/index.html)
ESC Embedded System Conference, India (http://esc-india.com/index.html)

"Embedded system" in different languages

Whenever someone asks what kind of job you are doing, it is very easy for application engineers to answer with familiar words: banking, Internet, Windows applications, etc. But if I say `Embedded`, nobody understands except some computer engineers. So, let us first introduce the phrase `Embedded` to them in their own language and let them get hold of it.

Tamil பதிகணினியியல் Padhi kaniniyiyal
Japanese 組み込みシステム（くみこみシステム） Kumikomi shisutemu
Chinese 嵌入式系统 Qian Ru Shi Xi Tong
Korean 임베디드 시스템

Friday, July 24, 2009

Dos and Donts in Embedded system design

In embedded systems, you will mostly have the embedded OS, drivers, interrupt service routines, callbacks, tasks, stacks, local and global variables, mutual exclusion mechanisms, system calls, etc., around you to play with. So, while designing or building a system, you have to be careful to pick the right one and put it in the right place.

I remember when I was new to embedded systems, I was designing USB device firmware on Linux. It was for transferring data from the host and storing it on the storage disk of the target. What I did was this: in the interrupt service routine of the USB firmware, I read the block of data from the USB device endpoint and also called the system call to store it on the storage disk. Of course, it was working. But do you know what happened? The whole Linux system except my firmware stopped working whenever a file was transferred to the target. I hope you see the problem. Later, the code was corrected as follows: the USB firmware running at kernel level was extended with system calls to communicate with a user-level application. A separate user application task was created, which calls one system call to get the block of data from the USB firmware and another system call to write the data to the storage disk. This worked fine, with all other tasks also getting scheduled in the gaps when the USB firmware and the storage disk driver had to wait for the hardware to respond.

Though I knew what an interrupt service routine is and what a system call is, I did not know what would happen if I put one inside the other. So, first, let me give you some do's and don'ts for designing a system.

Do not waste the CPU cycles in busy waiting as follows:

In the above example, the driver writes an IO command to the device and waits for the device to complete it. Instead, you can go to sleep so that other tasks can use the CPU, and the device will interrupt you when the intended job is completed.

Let me modify the code as follows:
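
One way to sleep instead of spinning is sketched here in user space with a POSIX condition variable. This is only an analogy: inside a kernel driver you would use the wait queue, semaphore or event primitive of your OS, and a real interrupt handler must not take a mutex. The function names are made up, and device_isr() stands for whatever context reports the completion.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_done_cond = PTHREAD_COND_INITIALIZER;
static bool io_done;

/* Called from the completion context when the device finishes the command. */
static void device_isr(void)
{
    pthread_mutex_lock(&io_lock);
    io_done = true;
    pthread_cond_signal(&io_done_cond);  /* wake up the sleeping task */
    pthread_mutex_unlock(&io_lock);
}

/* Called by the task after writing the IO command to the device. */
static void wait_for_io(void)
{
    pthread_mutex_lock(&io_lock);
    while (!io_done)                                /* guards against spurious wakeups */
        pthread_cond_wait(&io_done_cond, &io_lock); /* sleeps; the CPU is free for others */
    io_done = false;                                /* ready for the next command */
    pthread_mutex_unlock(&io_lock);
}
```

While the task is blocked in pthread_cond_wait(), the scheduler is free to run other tasks.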

Do not call blocking calls from contexts where scheduling is forbidden

Blocking calls are function calls which may cause the calling entity to sleep. Some contexts run with higher priority than the scheduler, and scheduling is forbidden in them: interrupt handlers, critical sections and higher priority kernel threads, for example. Calling blocking calls from such contexts may lead to unwanted results like deadlocks. It will also keep other tasks waiting unnecessarily. For example, look at the following code. First of all, whether a particular OS allows such a sleep from interrupt context is the main question. Even if it does, sleeping in an interrupt service routine will stop all lower priority interrupts and will stop the whole scheduling, because the interrupt service routine is the highest priority context. So, the whole system will sleep during that delay.

Instead, you can set a flag inside the interrupt service routine, check for the event from a task and call the function dev_bringup() there. So, be aware of blocking calls and use them only in contexts where scheduling can happen. Keeping the code size and the processing very small in higher priority contexts such as interrupt handlers will improve the overall system performance.
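
The flag approach can be sketched as follows. Only the name dev_bringup() comes from the text above; its body here is a stand-in that just counts calls, and dev_isr() and dev_task_poll() are hypothetical names.

```c
#include <signal.h>  /* sig_atomic_t */

static volatile sig_atomic_t dev_event_pending;  /* set in the ISR, cleared in the task */
static int bringup_count;

/* Stand-in for the slow bring-up work, which may sleep. */
static void dev_bringup(void)
{
    bringup_count++;
}

/* Interrupt service routine: just note the event and return quickly. */
static void dev_isr(void)
{
    dev_event_pending = 1;
}

/* Called from task context, where blocking is allowed. */
static void dev_task_poll(void)
{
    if (dev_event_pending) {
        dev_event_pending = 0;
        dev_bringup();
    }
}
```

If the interrupt can fire again between the check and the clear, two events may be merged into one; a counter or an OS event primitive would be needed when each event must be handled separately.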

Do not do your own processing in callbacks

Callbacks are a facility provided by busy tasks to notify the user of events. For example, the following task has to continuously poll many devices and inform the user about events. It calls the function pointer registered by the user. So, the function runs in the device task's context, not the user's. So, the user should not make blocking calls in the callback. It is also better to keep the processing in the callback function very small. Mostly, applications just set a flag in the callback to catch the event and then process the event in their own task context.

Do not use local variables in very large size

Do you know that the memory space for local variables is allocated from the stack? Look at the following example:

In most systems, each task is allocated a specified and limited amount of stack space. Since local variables consume memory from the stack, the stack may not be large enough, which leads to stack overflow and a system crash. So, know the limit before you use it. In the above example, what will happen if the calling task is allocated 300 bytes of stack? The function itself uses more than 256 bytes. The task context might also be saved in the stack space. So, they may overlap and cause damage. Instead, it is better to allocate the buffer dynamically from the heap or from memory pools. Or you can allocate it from global memory, if mutual exclusion is not a major problem.
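
A sketch of the heap-based alternative. The function name, the 256-byte size and the summing are placeholders for whatever processing the buffer is really needed for; on an RTOS, malloc() and free() would typically be replaced by the memory pool API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define WORK_BUF_SIZE 256

/*
 * Copies the data into a work buffer and returns the sum of its bytes,
 * or -1 on error. The buffer comes from the heap, so the function needs
 * only a few bytes of the caller's stack instead of more than 256.
 */
static int process_data(const unsigned char *data, size_t len)
{
    unsigned char *buf;
    int sum = 0;
    size_t i;

    if (len > WORK_BUF_SIZE)
        return -1;

    buf = malloc(WORK_BUF_SIZE);  /* instead of: unsigned char buf[256]; */
    if (buf == NULL)
        return -1;                /* always check: the heap is limited too */

    memcpy(buf, data, len);
    for (i = 0; i < len; i++)
        sum += buf[i];

    free(buf);
    return sum;
}
```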

Do not keep Global resources un-protected

Whenever you add a global variable, global buffer or global structure, the first thing you have to worry about is mutual exclusion. Is it shared between more than one task? While it is being accessed from one context, can that context be pre-empted by another one? If so, it has to be protected with mutual exclusion. When the variable is shared between an interrupt service routine and a task, the interrupt has to be disabled before the task accesses the variable. Since the interrupt service routine is mutually exclusive by nature, there is no need to protect the globals inside the interrupt service routine. When the variable is shared between tasks, you should use the OS primitives for mutual exclusion.

Avoid use of the user system calls in driver or kernel mode

Most operating systems separate user applications from kernel code. Though mutual exclusion might be needed in both user applications and drivers, it is not good to use the user-level primitives in a driver. The OS will have equivalents for kernel mode. So, use those.

Good luck!