Embedded Matters: 2010

Monday, December 13, 2010

QorIQ P2020 Development System (P2020DS) : Problems and solutions

"JTAG Clock Speed Change Error. Please Check your JTAG clock speed", error message. P2020DS board is connected to the CodeWarrior USB TAP debugger and when trying to connect the CodeWarrior, the above error message is shown.

Solution: Update CodeWarrior to the latest version (V8.8.3) and try the sample CodeWarrior project located in the installation folder. For example:
"C:\Program Files\Freescale\CodeWarrior PA V8.8\PowerPC_EABI_Tools\CodeWarriorTRK\Transport\processor\ppc\serial\P2020DS_serial\P2020DS_serial.mcp"

Debugger shows irrelevant or inconsistent code at the reset vector of the boot flash. On the P2020DS, the boot ROM is a 128 MB NAND flash. When the board is reset through the debugger, the PC (Program Counter) points to the instruction at the reset vector (address 0xfffffffc) of the boot page. However, the code at the reset vector appears to be junk data, and it changes each time the board is reset. Even though the board boots normally on power-on reset, the boot code is not visible through the debugger.

Solution: Reset the Boot Page Translation Register (BPTR) of the P2020 to zero through the initialization script of the debugger. For example, add the following to the CodeWarrior USB TAP initialization script.

# Disable Boot Page Translation Register
writemem.l 0xe0000020 0x00000000

For the Lauterbach debugger, add the following:

;Disable Boot Page Translation Register

Data.Set ANC:iobase()+0x00020 %LONG %BE 0x00000000

Unable to program or write the boot sector of the NAND flash. Although the other sectors of the flash can be programmed, the last (boot) sector of the flash cannot be programmed.

Solution: Reset the Boot Page Translation Register (BPTR) of the P2020 to zero through the initialization script of the debugger, as shown for the previous problem.

Friday, September 17, 2010

GATE Questions: Computer Science and Engineering (2003)

1. Consider the following C function.

float f(float x, int y)
{
    float p, s; int i;
    for (s=1, p=1, i=1; i < y; i++) {
        p *= x/i;
        s += p;
    }
    return s;
}
For large values of y, the return value of function f best approximates?

Ans: 1 + x + (x^2/2!) + (x^3/3!) +... = e^x
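The answer can be checked numerically. The sketch below repeats the function from the question and adds a small helper, abs_diff (a name introduced here for illustration), so that the result can be compared against known values of e^x without needing the math library.

```c
/* Function from the question: accumulates 1 + x + x^2/2! + ... */
float f(float x, int y)
{
    float p, s;
    int i;
    for (s = 1, p = 1, i = 1; i < y; i++) {
        p *= x / i;   /* after this line, p holds x^i / i! */
        s += p;       /* add the i-th term of the series */
    }
    return s;
}

/* Hypothetical helper: absolute difference of two floats. */
float abs_diff(float a, float b)
{
    return a > b ? a - b : b - a;
}
```

Calling f(1.0f, 20) should land very close to e (about 2.71828), and f(2.0f, 30) close to e^2 (about 7.38906), within single-precision rounding.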

2. Assume the following C variable declaration
int * A[10], B[10][10];
of the following expressions
1) A[2]    2) A[2][3]    3) B[1]    4) B[2][3]
Which will not give compile-time errors if used as left hand sides of assignment statements in a C program?

Ans: 1), 2) and 4) only
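A short sketch of why expression 3) is the odd one out: B[1] is an array of 10 ints, and an array is not a modifiable lvalue, while the other three expressions name a pointer or an int. The names row and assign_demo are made up for this illustration.

```c
static int *A[10], B[10][10];   /* declarations from the question */
static int row[10];             /* storage for A[2] to point at */

int assign_demo(void)
{
    A[2] = row;         /* 1) OK: A[2] is an int pointer */
    A[2][3] = 7;        /* 2) OK: same as row[3] = 7 */
    /* B[1] = row; */   /* 3) compile-time error: B[1] is an array */
    B[2][3] = 9;        /* 4) OK: B[2][3] is an int */
    return row[3] + B[2][3];
}
```

Uncommenting the line for 3) makes the compiler reject the program, which matches the answer above.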

3. Which of the following assertions is FALSE about the Internet Protocol (IP)?
A) It is possible for a computer to have multiple addresses
B) IP packets from the same source to the same destination can take different routes in the network
C) IP ensures that a packet is discarded if it is unable to reach its destination within a given number of hops
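D) The packet source cannot set the route of the outgoing packets; the route is determined only by the routing tables in routers on the way.

Ans: D) (IP provides source routing options, so the source can specify the route. A computer with several network interfaces can have multiple addresses, so A) is true.)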

4. Which of the following functionalities must be implemented by a transport protocol over and above the network protocol?
A) Recovery from packet losses
B) Detection of duplicate packets
C) Packet delivery in the correct order
D) End to end connectivity

Ans: D)

Monday, July 05, 2010

How to initialize / configure SDRAM

It is very challenging for an embedded software engineer to initialize hardware and bring up a system. Engineers mostly depend on vendor-provided data or on the hardware engineer for the configuration data. Configuring the SDRAM controller in particular is treated as a black box and is thought to be a very big, unknown task. Not exactly. Initializing the SDRAM controller is very simple if you understand how SDRAM operates. Let us do it here.

SDRAM configuration basics

An SDRAM chip is not only memory; it also has a command parser, like FLASH memory, where you have to write a series of commands to ERASE, LOCK and WRITE a block of memory. These commands are just memory writes to the FLASH with specific address bits set. Similarly, SDRAM commands are specific memory writes to the SDRAM with specific address bits set. These commands have to be executed for initialization. Note that these SDRAM commands are common to all SDRAM chips regardless of the vendor. We will see the command sequence later.

But before writing the commands, we must initialize the SDRAM controller with some basic timing parameters, in order to guarantee that those commands are written properly to the SDRAM. Assume you are going to a hospital. Can you meet the doctor directly? No. First you have to register your health insurance card at the reception. Then you wait until your turn comes. Then you consult the doctor for a certain time. Then you come out, get the insurance card back and pay the bill. Similarly, for each SDRAM write and read, the CPU has to set the address, assert some other signals to indicate that it is going to write data, wait for some time, place the data, wait a minimum time for the write to finish, and then release the signals. If you program these delay timings into the SDRAM controller, the controller handles reads and writes on behalf of the CPU. These timings vary depending on the SDRAM chip vendor. So, you have to look up these delays in the datasheet and set the corresponding parameters in the controller.

Two common operations have to be performed before and after each read/write. They are 1) Activate and 2) Precharge (Deactivate). Assume you want to consult more than one doctor in the same hospital. What can you do? There is no need to register the health insurance card for each doctor. If you register once at the reception, you can consult the doctors one after another and finally come out. Similarly, once you activate a single row of memory, you can read many columns in the same row without issuing Precharge and Activate again. Instead, activate a row, read many columns, and precharge before going to another row. So, remember that this is the sequence of signals for each read and write:

1) The controller activates a certain row for access and waits for some time.

2) The column address and the READ or WRITE command are issued. The controller waits for the data.

3) If more columns are accessed in the same row, (2) is repeated. Otherwise, the controller goes to (4).

4) The controller deactivates the row with the Precharge command and waits for some time before going to (1).
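To see why keeping a row open pays off, the sequence above can be turned into a rough cycle count. This is only a sketch under simplifying assumptions: each wait is a fixed number of clock cycles, and burst transfers, pipelining and any minimum row-active time are ignored. The function and parameter names are made up for the illustration.

```c
/* Cycles to access n columns of ONE row: activate once, precharge once. */
unsigned cycles_row_open(unsigned n, unsigned activate_wait,
                         unsigned column_wait, unsigned precharge_wait)
{
    if (n == 0)
        return 0;   /* nothing to access, nothing to activate */
    return activate_wait + n * column_wait + precharge_wait;
}

/* Cycles when the row is activated and precharged again for every column. */
unsigned cycles_row_reopened(unsigned n, unsigned activate_wait,
                             unsigned column_wait, unsigned precharge_wait)
{
    return n * (activate_wait + column_wait + precharge_wait);
}
```

With all three waits set to 3 cycles, four column accesses cost 18 cycles with the row kept open and 36 cycles if the row is reopened each time.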

So far, I have covered the basics of SDRAM access and initialization. Now, let me describe the general initialization sequence.

1) Set the above delay timings in the SDRAM controller by referring to the SDRAM chip datasheet.

2) Issue the general command sequence for SDRAM initialization.

The SDRAM controller does the job of interacting with the SDRAM chip to read and write data. So, you have to teach the controller the time delays described above. Let us see this step first.

The four important delay timings you have to teach the controller are as follows:

1) tRCD

Delay needed for row activation (time taken for health insurance registration). In other words, it is the minimum time the controller has to wait after activating a row before going to the next step (Column Address Strobe). It is called the RAS to CAS Delay: the delay between Row Address Strobe and Column Address Strobe.

2) tCAS

Delay between issuing the column address together with the READ command and the first data appearing on the bus (time spent waiting for the doctor). It means that once the column address is issued (Column Address Strobe), the controller has to wait for tCAS before the data can be transferred. This is also known as CAS latency.

3) tRAS (Row Active Time)

Minimum number of clock cycles for which a row must stay active in order to be accessed. It is the total time required between the Activate command and the Precharge command.

That means tRAS = tRCD + tCAS + (Time Required to wait for Data)

(Total time spent in the hospital, excluding the time taken to get the insurance card back. It includes the consulting time.)

4) tRP (RAS Precharge)

Delay needed between deactivating the row and going to the next step. Once the Precharge command is issued, the controller has to wait for tRP before going to the next step (such as activating the next row). (Time taken to get the health insurance card back.)

These four delay parameters are usually specified in the SDRAM chip datasheet as "tCAS-tRCD-tRP-tRAS", for example, latency values given as 2.5-3-3-8. Sometimes tRAS is not specified. In that case, in practice for DDR SDRAM, it should be set to at least tRCD + tCAS + 2. So, first download the datasheet for the SDRAM chip. Then find the above timing parameters and program the controller with them.

If the timing values are given in ns (nanoseconds), convert them into memory clock cycles, rounding up to the next whole cycle so that a minimum delay is never shortened.

number of cycles = (delay in ns)/(1/memory speed), where 1/(memory speed) is the clock period expressed in ns
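The conversion can be sketched in integer arithmetic, which is convenient in boot code where floating point may not be available yet. The function name and the choice of units (picoseconds and kHz) are assumptions made for this sketch. It rounds up, since the datasheet values are minimum delays.

```c
#include <stdint.h>

/*
 * Convert a minimum delay into clock cycles, rounding up.
 * delay_ps  : delay from the datasheet, in picoseconds (15 ns = 15000)
 * clock_khz : memory clock, in kHz (133 MHz = 133000)
 * ps * kHz equals cycles * 1e9, hence the division by 1e9.
 */
uint32_t delay_to_cycles(uint32_t delay_ps, uint32_t clock_khz)
{
    uint64_t scaled = (uint64_t)delay_ps * clock_khz;
    return (uint32_t)((scaled + 999999999u) / 1000000000u);
}
```

For example, a 15 ns delay at 133 MHz is 1.995 clock periods, so the function returns 2, while an 18 ns delay at 200 MHz is 3.6 periods and becomes 4.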

Let us see the command sequence in the next part of this article, `How to Initialize SDRAM`.

One more important parameter is the refresh cycle. SDRAM needs to be refreshed periodically, at a particular time interval. This parameter varies depending on the chip and is specified in the datasheet. You may need to convert the refresh interval into a number of clock cycles and program it into the controller. The formula is as follows:

Refresh Cycle = tREFI/(1/memory speed)

For example, when tREFI = 7.8us and memory speed is 300 MHz,

the Refresh Cycle is 7.8e-06/(1/300e+06) = 7.8e-06 x 300e+06 = 2340
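The same calculation as a small integer-only sketch. The function name and units (nanoseconds and MHz) are chosen here for illustration. Unlike a minimum delay, the refresh interval is a maximum, so this sketch rounds down to make sure refreshes never come later than the datasheet allows.

```c
#include <stdint.h>

/*
 * Refresh interval in clock cycles, rounded down.
 * trefi_ns  : refresh interval from the datasheet, in ns (7.8 us = 7800)
 * clock_mhz : memory clock, in MHz
 * ns * MHz equals cycles * 1000, hence the division by 1000.
 */
uint32_t refresh_cycles(uint32_t trefi_ns, uint32_t clock_mhz)
{
    return (uint32_t)(((uint64_t)trefi_ns * clock_mhz) / 1000u);
}
```

With tREFI = 7.8 us and a 300 MHz clock this gives 2340, matching the worked example above.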

Enjoy the Hard times!

Monday, February 22, 2010

Monday, February 15, 2010

PPC405GP MMU: Problem in switching between Real and Translation modes

My BSP software for the PowerPC PPC405GP processor was working in Real mode, configured with just one 128 MB region (0x00000000 - 0x07FFFFFF) as cache enabled (ICCR=0x80000000; DCCR=0x80000000; DCWR=0x0; SGR=0x0; SU0R=0x0; SLER=0x0). To create both cacheable and non-cacheable memory regions within the 128 MB SDRAM, it was necessary to implement Translation mode, where TLB entries with smaller pages can be created and different memory attributes can be set for each entry.

Enabling Translation mode involves 1) setting the TLB entries and 2) setting the MSR register bits MSR[IR] and MSR[DR].

How to set the TLB entries?

The following assembly routine takes the index, the TLBHI (tag) portion and the TLBLO (data) portion as input parameters and writes the TLB entry.

    set_tlb_entry:           ; set_tlb_entry(tlb_index, TLBHI, TLBLO)
        sync
        tlbwe r4,r3,0        ; Write TLBHI (tag) from r4 at index r3
        tlbwe r5,r3,1        ; Write TLBLO (data) from r5 at index r3
        isync
        blr                  ; Return to the caller

Using the above routine, I wrote the following entries.

/* TLB Index=0 */
/* EPN=0x00000000;Size=4MB;Valid=1;Endian=0;U0=0 */
/* RPN=0x00000000;EX=1;WR=1;ZSEL=0;W=0;I=0;M=0;G=0 */
/* This entry maps 0x0 physical address to same 0x0 virtual address;
    Enables the cache and grants Execute and Write permissions.
    (SDRAM for execution) */
set_tlb_entry(0, 0x00000340, 0x00000300);

/* TLB Index = 1 */
/* EPN=0x00400000;Size=4MB;Valid=1;Endian=0;U0=0 */
/* RPN=0x00400000;EX=0;WR=1;ZSEL=0;W=0;I=1;M=0;G=0 */
/* This entry maps 0x400000 physical address to same virtual address;
    Disables the cache and grants only Write permission.
    (SDRAM for IO Buffers) */
set_tlb_entry(1, 0x00400340, 0x00400104);
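The two magic numbers passed for each entry can be assembled from their fields. The sketch below is a host-side illustration only. The bit positions and the 4 MB size code are inferred from the values used in the entries above (0x340 for a valid 4 MB page, 0x200 for EX, 0x100 for WR, 0x004 for I) and should be checked against the PPC405 user manual. The macro and function names are made up here.

```c
#include <stdint.h>

#define TLB_SIZE_4MB  6u        /* SIZE code inferred from 0x340 */
#define TLBHI_VALID   0x040u    /* V bit */
#define TLBLO_EX      0x200u    /* execute permission */
#define TLBLO_WR      0x100u    /* write permission */
#define TLBLO_I       0x004u    /* cache inhibited */

/* Tag word: effective page number, page size code and valid bit. */
uint32_t make_tlbhi(uint32_t epn, uint32_t size_code, int valid)
{
    return (epn & 0xFFFFFC00u) | ((size_code & 0x7u) << 7)
         | (valid ? TLBHI_VALID : 0u);
}

/* Data word: real page number plus permission and attribute flags. */
uint32_t make_tlblo(uint32_t rpn, uint32_t flags)
{
    return (rpn & 0xFFFFFC00u) | (flags & 0x3FFu);
}
```

For instance, make_tlbhi(0x00400000, TLB_SIZE_4MB, 1) and make_tlblo(0x00400000, TLBLO_WR | TLBLO_I) reproduce the two words of TLB entry 1.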

(Now, I have the vector table and the execution code starting from address 0x0.) It is time to enable Translation mode by setting the bits MSR[IR] and MSR[DR]. But before doing this, I must invalidate the instruction cache, and flush and invalidate the data cache, because the caches have been used under the Real mode memory configuration and are now going to be used under a different configuration defined by the TLB entries. OK! The caches are flushed and invalidated. Then? I thought the Real mode settings could be reset to ICCR=0 and DCCR=0, because once Translation mode is enabled, the Real mode settings become useless. So, I cleared all the Real mode settings and enabled Translation mode. But when I ran my code, an exception was thrown at one location. Even though I knew the location, I was clueless about why the exception was thrown from that place, because execution passes through that location only once, at initialization, and the initialization went fine. So how was that location being executed after that?

I found one strange thing: when I left the Real mode configuration as it was (cache enabled for 128 MB), the execution went fine. While wondering why the Real mode settings were still needed with Translation mode enabled, I found the problem. When an interrupt occurs (vector 0x500) or a system call is made (vector 0xc00), the MSR[IR] and MSR[DR] bits are cleared, and the processor automatically switches to Real mode. Until these MSR bits are restored, memory is accessed in Real mode. So, it is possible for the same memory area to be accessed in Real mode and in Translation mode with different cache attributes, and the execution goes wrong.

My guess was right! During the system call at 0xc00, the context was stored at memory location 0x5000 while MSR[IR, DR] = 0, and it was stored/restored again while MSR[IR, DR] = 1. As this continues, the data becomes inconsistent and causes undefined behaviour.

So, I set the same memory attributes for the memory page (first TLB entry) that contains address 0x5000 and for the first Real mode region (0x00000000 - 0x07FFFFFF), so that the memory region is accessed with the same attributes in both Real mode and Translation mode.

So, my code sequence is as below:

1) At reset, MSR[IR, DR] is 0, and Real mode works with all caches inhibited.
2) In the boot code, the first 128 MB region is enabled with cache.
3) To switch to Translation mode, first reset Real mode to all cache inhibited.
4) Invalidate the instruction cache.
5) Flush and invalidate the data cache.
6) Write the TLB entries.
7) Set the Real mode registers (ICCR, DCCR, DCWR) with the same attributes as the TLB entry corresponding to the context page.
8) Set MSR[IR] and MSR[DR] to enable Translation mode.
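The rule behind the fix (the same address must be cacheable in both modes, or in neither) can be expressed as a small host-side check. This is a sketch under two assumptions taken from the values in this post: each DCCR/ICCR bit covers one 128 MB region with the most significant bit covering 0x00000000 - 0x07FFFFFF, and bit 0x004 of TLBLO is the cache-inhibit (I) bit. The function names are hypothetical.

```c
#include <stdint.h>

/* Real mode: is the 128 MB region containing addr cacheable? */
int real_mode_cacheable(uint32_t ccr, uint32_t addr)
{
    uint32_t region = addr >> 27;               /* 128 MB = 2^27 bytes */
    return (int)((ccr >> (31u - region)) & 1u); /* MSB is region 0 */
}

/* Translation mode: the page is cacheable when the I bit is clear. */
int tlb_cacheable(uint32_t tlblo)
{
    return (tlblo & 0x004u) == 0;
}

/* Nonzero when both modes agree on cacheability for addr. */
int cache_attrs_match(uint32_t dccr, uint32_t tlblo, uint32_t addr)
{
    return real_mode_cacheable(dccr, addr) == tlb_cacheable(tlblo);
}
```

For the context page at 0x5000, cache_attrs_match(0x80000000, 0x00000300, 0x5000) reports a match, while clearing DCCR to 0 with the same TLB entry reports a mismatch, which is the failing configuration described above.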

This is also applicable to IBM PowerPC 405Fx (PPC405Fx core) and Xilinx PowerPC 405 (PPC405 core) processors.

(For any queries, please write in the feedback column)

Sunday, February 07, 2010

Turbo C compiler Installation on Windows

An automatic Turbo C installer is available at the following link:

http://webcourse.cs.technion.ac.il/234112/Spring2006/ho/WCFiles/TURBOC30.exe

This will install the compiler in the c:/tc folder. You can open your favourite compiler environment by double-clicking c:/tc/bin/TC.exe

If you are using Windows Vista, you have to install the DOSBox DOS emulator to work in full screen mode. I faced a problem when I double-clicked c:/tc/bin/TC.exe. I then used DOSBox and the problem was solved.

(1) Install DOSBox ver 0.72 (1.2 MB, freeware) from the direct link below:

http://prdownloads.sourceforge.net/dosbox/DOSBox0.72-win32-installer.exe?download

(2) Before going into the details, you have to create a folder (any name will do). Here we name it Turbo.

(3) Copy the TC folder into the Turbo folder.
(4) Run DOSBox 0.72 from the icon located on the desktop or from the installation folder.

(5) You are then presented with two windows which look like the command prompt in Windows. One of them has a Z prompt. You can ignore the other window.

(6) Type the following command at the command prompt [Z]:

mount [any drive letter you wish except z] [path to the folder containing Turbo C], then press Enter

(7) For example, type the following command after the Z prompt:

Z: mount d c:\Turbo\ [The folder TC is present inside the folder Turbo]

(8) You should now get a message which says: Drive D is mounted as a local directory c:\Turbo\

(9) Type d: to switch to the d: prompt. Next, follow the commands below:

cd TC [The contents of the folder Turbo are mounted as a virtual drive (here, the D drive)]

cd Bin

TC or Tc.exe [This presents you with the Turbo C++ 3.0 screen]

(10) In Turbo C++, go to Options > Directories and change the TC paths to the mounted drive [D] (i.e. the virtual D: refers to the original c:\Turbo\, so change the paths to something like D:\TC\include and D:\TC\lib respectively).
===========================================================

Points to Note:

(1) To get full screen mode, use the key combination Alt + Enter.

(2) When you exit from DOSBox [precisely, when you unmount the virtual drive where Turbo C++ 3.0 has been mounted], all the files you have saved or changed in Turbo C++ 3.0 will be copied into the source directory (the directory which contains the TC folder).

(3) It is a good idea to back up your files in the source directory prior to running DOSBox 0.72.

(4) For additional help, go through the readme file located in the installation folder or look on the DOSBox forum website.

(5) Don't use shortcut keys to perform operations in TC, because they might also be shortcut keys for DOSBox. E.g. Ctrl+F9 will exit DOSBox rather than run the code.

UPDATE :

You can save yourself some time by having DOSBox automatically mount your folders.

For DOSBox versions older than 0.73, browse to the program installation folder and open the dosbox.conf file in any text editor. For version 0.73, go to the Start Menu and click on "Configuration" and then "Edit Configuration". Then scroll down to the very end and add the lines which you want to execute automatically when DOSBox starts.

Now those commands will be executed automatically when DOSBox starts!

- Reference Tech Guru

Hard and Soft Real time systems

In general, real-time systems have to finish a particular job within a specified time. In other words, real-time systems have time constraints and must meet deadlines. They can be classified as hard and soft real-time systems according to the following factors:

the permissible (tolerance) level of not meeting the deadline
the usefulness of the result of the job obtained after the deadline has expired
the severity of the penalty paid when the deadline is not met

In hard real-time systems, the permissible level of not meeting the deadline is almost zero. In other words, the deadline must be met. A result obtained after the deadline is useless. The penalty paid when the deadline is not met is destruction or failure of the system.

In soft real-time systems, the permissible level of not meeting the deadline is not zero. The usefulness of a result obtained after the deadline is not zero either; it depreciates gradually over a period of time. Even when the deadline is not met, the effect is not physically destructive.

So, hard real-time systems have almost zero flexibility and have to meet the deadline at any cost: meet the deadline or the system fails. The failure of the system carries an extremely high penalty, even loss of human life. A result obtained after the deadline is almost useless. For example, a car engine control system is a hard real-time system because a delayed signal may cause engine failure or damage. Other examples of hard real-time embedded systems include medical systems such as heart pacemakers and industrial process controllers.

Though soft real-time systems also have to meet deadlines, they have flexibility. They can change the flexibility level or set an average value. Though there is no damage when the deadline is not met, depending on the application a miss has its own cost, proportional to the delay. Live audio-video systems are usually soft real-time; violation of constraints results in degraded quality, but the system can continue to operate.
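The difference in the usefulness of a late result can be pictured as a value function over completion time. The sketch below is only an illustration: the linear decay over a grace period is an assumption made here (real soft real-time systems may lose value along any curve), and all names are hypothetical.

```c
enum rt_kind { RT_HARD, RT_SOFT };

/*
 * Usefulness of a result, in percent, given when the job finished.
 * Hard: 100 up to the deadline, 0 afterwards.
 * Soft: 100 up to the deadline, then decreasing linearly to 0
 *       over 'grace' time units (assumed decay shape).
 */
int result_value(enum rt_kind kind, int finish, int deadline, int grace)
{
    int late;

    if (finish <= deadline)
        return 100;             /* deadline met */
    if (kind == RT_HARD || grace <= 0)
        return 0;               /* late hard result is useless */
    late = finish - deadline;
    if (late >= grace)
        return 0;               /* too late even for a soft system */
    return 100 - (late * 100) / grace;
}
```

With a deadline of 10 and a grace period of 20, a job finishing at time 15 is worth 75 for a soft system and 0 for a hard one.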

Friday, February 05, 2010

Processor architectures: Harvard, von Neumann and Modified Harvard architectures

A Harvard architecture has separate data and instruction busses, allowing transfers to be performed simultaneously on both busses. A von Neumann architecture has only one bus, which is used for both data transfers and instruction fetches; therefore data transfers and instruction fetches must be scheduled, as they cannot be performed at the same time.

It is possible to have two separate memory systems for a Harvard architecture. As long as data and instructions can be fed in at the same time, then it doesn't matter whether it comes from a cache or memory. But there are problems with this. Compilers generally embed data (literal pools) within the code, and it is often also necessary to be able to write to the instruction memory space, for example in the case of self modifying code, or, if an ARM debugger is used, to set software breakpoints in memory. If there are two completely separate, isolated memory systems, this is not possible. There must be some kind of bridge between the memory systems to allow this.

Using a simple, unified memory system together with a Harvard architecture is highly inefficient. Unless it is possible to feed data into both busses at the same time, it might be better to use a von Neumann architecture processor.

Use of caches

At higher clock speeds, caches are useful as the memory speed is proportionally slower. Harvard architectures tend to be targeted at higher performance systems, and so caches are nearly always used in such systems.

Von Neumann architectures usually have a single unified cache, which stores both instructions and data. The proportion of each in the cache is variable, which may be a good thing. It would in principle be possible to have separate instruction and data caches, storing data and instructions separately. This probably would not be very useful as it would only be possible to ever access one cache at a time.

Caches for Harvard architectures are very useful. Such a system would have separate caches for each bus. Trying to use a shared cache on a Harvard architecture would be very inefficient since then only one bus can be fed at a time. Having two caches means it is possible to feed both buses simultaneously....exactly what is necessary for a Harvard architecture.

This also makes it possible to have a very simple unified memory system, using the same address space for both instructions and data. This gets around the problem of literal pools and self modifying code. What it does mean, however, is that when starting with empty caches, it is necessary to fetch instructions and data from the single memory system at the same time. Two memory accesses are therefore needed before the core has all the data it needs, and performance will be no better than a von Neumann architecture. However, as the caches fill up, it is much more likely that the instruction or data value has already been cached, and so only one of the two has to be fetched from memory. The other can be supplied directly from the cache with no additional delay. The best performance is achieved when both instructions and data are supplied by the caches, with no need to access external memory at all.

This is the most sensible compromise and the architecture used by ARM's Harvard processor cores. Two separate memory systems can perform better, but would be difficult to implement.

-ARM Information Center

Modified Harvard architecture

A Modified Harvard architecture machine is very much like a Harvard architecture machine, but it relaxes the strict separation between instruction and data while still letting the CPU concurrently access two (or more) memory busses.

The most common modification includes separate instruction and data caches backed by a common address space. While the CPU executes from cache, it acts as a pure Harvard machine. When accessing backing memory, it acts like a von Neumann machine (where code can be moved around like data, a powerful technique). This modification is widespread in modern processors such as the ARM architecture and X86 processors. It is sometimes loosely called a Harvard architecture, overlooking the fact that it is actually "modified".

Another modification provides a pathway between the instruction memory (such as ROM or flash) and the CPU to allow words from the instruction memory to be treated as read-only data. This technique is used in some microcontrollers, including the Atmel AVR. This allows constant data, such as text strings or function tables, to be accessed without first having to be copied into data memory, preserving scarce (and power-hungry) data memory for read/write variables. Special machine language instructions are provided to read data from the instruction memory. (This is distinct from instructions which themselves embed constant data, although for individual constants the two mechanisms can substitute for each other.)

Modern uses of the Harvard architecture

The principal advantage of the pure Harvard architecture - simultaneous access to more than one memory system - has been reduced by modified Harvard processors using modern CPU cache systems. Relatively pure Harvard architecture machines are used mostly in applications where tradeoffs, such as the cost and power savings from omitting caches, outweigh the programming penalties from having distinct code and data address spaces.

Digital signal processors (DSPs) generally execute small, highly-optimized audio or video processing algorithms. They avoid caches because their behavior must be extremely reproducible. The difficulties of coping with multiple address spaces are of secondary concern to speed of execution. As a result, some DSPs have multiple data memories in distinct address spaces to facilitate SIMD and VLIW processing. Texas Instruments TMS320 C55x processors, as one example, have multiple parallel data busses (two write, three read) and one instruction bus.

Microcontrollers are characterized by having small amounts of program (flash memory) and data (SRAM) memory, with no cache, and take advantage of the Harvard architecture to speed processing by concurrent instruction and data access. The separate storage means the program and data memories can have different bit depths, for example using 16-bit wide instructions and 8-bit wide data. It also means that instruction prefetch can be performed in parallel with other activities. Examples include the 8051, the AVR by Atmel Corp, and the PIC by Microchip Technology, Inc.

Even in these cases, it is common to have special instructions to access program memory as data for read-only tables, or for reprogramming.

The von Neumann architecture is a design model for a stored-program digital computer. By contrast, a desk calculator (in principle) is a fixed-program computer. It can do basic mathematics, but it cannot be used as a word processor or a gaming console. Changing the program of a fixed-program machine requires re-wiring, re-structuring, or re-designing the machine.
-Wikipedia