Thursday, May 24, 2012

What is a Hard Real-Time OS?

There is no formal specification for a Hard Real-Time OS kernel, but it can be defined as "a Real-Time OS kernel which can be used to build a Hard Real-Time System". What is a Hard Real-Time System? A Hard Real-Time System is a combination of hardware and software that has been designed to meet all of its deadlines even when the system is loaded to the maximum.

As a simple example, suppose you have a hardware target board on which a TCP application runs at high priority and just downloads data, and a periodic task runs at low priority and blinks the status LED at a 1-second interval. If you can make the status LED stop blinking by downloading a large amount of data, it is not a Hard Real-Time system. If you have designed the system so that the LED never stops blinking even when you download large amounts of data continuously, it is a Hard Real-Time System. It does not matter whether you use an RTOS or not, or whether you use a single core or multiple cores. If an RTOS kernel helps to build such a system, it can be called a Hard Real-Time OS kernel. So, what requirements must a kernel meet to help build such a system? You can say an OS is a Hard Real-Time OS if:

① The execution time of each and every system call of the kernel, the interrupt latency and the task switching time have been defined or measured accurately, so they are predetermined.

② It is a priority-based pre-emptive kernel. The kernel always runs tasks and interrupt service routines according to their priority: at any point in time, it runs the highest-priority task or ISR that is ready at that moment.

③ The system never gets into deadlock, and priority inversion is bounded by mutexes that implement priority inheritance.
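One common way for a priority-based pre-emptive kernel to find "the highest-priority task of the current moment" in bounded time is a ready bitmap. The sketch below is a minimal illustration, not taken from any particular RTOS: it assumes 32 priority levels with 0 as the highest, and the names ready_map, make_ready, make_blocked and highest_ready are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ready bitmap: bit p is set when a task at priority p
 * is ready to run. Priority 0 is the highest, 31 the lowest. */
static uint32_t ready_map;

/* Mark priority level p as ready (e.g. when an ISR posts an event). */
void make_ready(unsigned p)
{
    assert(p < 32);
    ready_map |= (uint32_t)1u << p;
}

/* Mark priority level p as blocked (e.g. when the task starts waiting). */
void make_blocked(unsigned p)
{
    assert(p < 32);
    ready_map &= ~((uint32_t)1u << p);
}

/* Return the highest-priority ready level, or -1 if nothing is ready.
 * The loop runs at most 32 times, so its worst case is bounded and
 * can be measured, in the spirit of requirement 1. */
int highest_ready(void)
{
    for (unsigned p = 0; p < 32; p++) {
        if (ready_map & ((uint32_t)1u << p))
            return (int)p;
    }
    return -1;
}
```

A scheduler would call highest_ready() after every event that can change the ready set and switch tasks if the result differs from the running task. Real kernels often replace the loop with a count-leading-zeros instruction or a lookup table to get a constant-time result.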

The application should also be designed according to the Rate Monotonic Scheduling algorithm (note 1 below) to be Hard Real-Time.


Finally, I can define a Hard Real-Time OS as follows. The requirements are as specified above:

A Real-Time OS kernel which makes it possible to build an application that meets all of its deadlines even when the system is loaded to the maximum is called a Hard Real-Time Operating System.
 
------
1) Rate Monotonic Scheduling Algorithm
It covers two things: priority assignment and total CPU utilization.
 
a) The shorter the cycle period, the higher the job's priority.
 
Assume there is enough time to complete all tasks. It is safest to first finish the job for whoever will come back to you first. So, whichever job has the shortest deadline, finish it first.
 
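b) Overall CPU utilization must stay at or below about 69.3% to guarantee that all deadlines are met, however many tasks there are. This figure is the limit of the bound n(2^(1/n) - 1) for n tasks as n grows to infinity (ln 2 ≈ 0.693). It is a sufficient condition, not a necessary one.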
 
This assumes all tasks are periodic. Ti is the period of task i and Ci is its worst-case execution time, so Ci/Ti is the fraction of CPU time that task i uses. The sum over all tasks is the total utilization U:
 
U = C1/T1 + C2/T2 + ... + Cn/Tn
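Both rules can be checked mechanically. The following is a minimal C sketch, assuming that each task's deadline equals its period and that the tasks are independent and periodic; the struct task type and the rm_* function names are hypothetical. It assigns rate-monotonic priorities by sorting on the period and compares U with the bound n(2^(1/n) - 1).

```c
#include <assert.h>
#include <stdlib.h>

/* A periodic task: period T and worst-case execution time C,
 * both in the same time unit (e.g. milliseconds). */
struct task {
    const char *name;
    double period;    /* Ti */
    double wcet;      /* Ci */
    int priority;     /* assigned by rm_assign_priorities: 0 = highest */
};

static int by_period(const void *a, const void *b)
{
    const struct task *x = a, *y = b;
    return (x->period > y->period) - (x->period < y->period);
}

/* Rule a): the shorter the period, the higher the priority.
 * Note: qsort reorders the array in place. */
void rm_assign_priorities(struct task *t, size_t n)
{
    qsort(t, n, sizeof t[0], by_period);
    for (size_t i = 0; i < n; i++)
        t[i].priority = (int)i;
}

/* Total utilization U = C1/T1 + C2/T2 + ... + Cn/Tn. */
double rm_utilization(const struct task *t, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += t[i].wcet / t[i].period;
    return u;
}

/* Rule b): bound n(2^(1/n) - 1), which tends to ln 2 (about 0.693).
 * 2^(1/n) is found by bisection so the sketch needs no libm. */
double rm_bound(size_t n)
{
    assert(n > 0);
    double lo = 1.0, hi = 2.0;
    for (int i = 0; i < 60; i++) {
        double mid = (lo + hi) / 2.0;
        double p = 1.0;
        for (size_t k = 0; k < n; k++)
            p *= mid;                 /* p = mid^n */
        if (p < 2.0)
            lo = mid;
        else
            hi = mid;
    }
    return (double)n * (lo - 1.0);
}

/* Sufficient (not necessary) test: returns 1 if U <= bound. */
int rm_schedulable(const struct task *t, size_t n)
{
    return rm_utilization(t, n) <= rm_bound(n);
}
```

For example, a hypothetical 10 ms network task using 3 ms, a 50 ms control task using 10 ms and the 1000 ms LED task using 1 ms give U = 0.3 + 0.2 + 0.001 = 0.501, below the three-task bound of about 0.780, so rate-monotonic priorities meet every deadline. A task set above the bound is not necessarily unschedulable; it only needs a more exact analysis, such as response-time analysis.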

Saturday, May 12, 2012

Linux Questions

How is Linux ported? What are the challenges?

build → make config
         make dep   (only needed for 2.4 and older kernels)
         make
Change of memory map → Where?




Kernel Boot flow?

http://milindchoudhary.wordpress.com/2009/03/30/linux-boot-process

The vmlinux image is compressed to create the zImage.

_zimage_start → ELF ENTRY → arch/powerpc/boot/zImage.lds.S
  ordered as text, data, dtb (embedded device tree blob), vmlinux, initrd, bss
  To extract the kernel vmlinux, System.map, .config or initrd from the zImage binary:
    objcopy -j .kernel:vmlinux -O binary zImage vmlinux.gz
    objcopy -j .kernel:System.map -O binary zImage System.map.gz
    objcopy -j .kernel:.config -O binary zImage config.gz
    objcopy -j .kernel:initrd -O binary zImage.initrd initrd.gz
  
_zimage_start → arch/powerpc/boot/crt0.S
  Initializes the stack, BSS and C environment
  call platform_init
  branch to start
platform_init → arch/powerpc/boot/redboot-8xx.c
  initialize serial console
  display clock-frequency
  assign the command line to loader_info
start → arch/powerpc/boot/main.c
  "\n\rzImage starting: loaded at 0x%p (sp: 0x%p)\n\r"
  prepare kernel (prep_kernel())
    unzip _vmlinux_start, _vmlinux_end
      "Allocating 0x%lx bytes for kernel ...\n\r"
      "gunzipping (0x%p <- 0x%p:0x%p)..."
      "done 0x%x bytes\n\r"
  prepare ramdisk (prep_initrd())
    "Attached initrd image at 0x%p-0x%p\n\r"
        or
    "Using loader supplied ramdisk at 0x%lx-0x%lx\n\r"
    "Allocating 0x%lx bytes for initrd ...\n\r"
    "Relocating initrd 0x%lx <- 0x%p (0x%lx bytes)\n\r"
    /* Tell the kernel initrd address via device tree */
    "linux,initrd-start" "linux,initrd-end"

  prepare command line (prep_cmdline())
    "bootargs" "\n\rLinux/PowerPC load: %s"
    /* If possible, edit the command line */
    /* Put the command line back into the devtree for the kernel */
    "bootargs", cmdline
Function pointer _vmlinux_start((unsigned long)initrd.addr, initrd.size, loader_info.promptr);
(Find the initial symbol of vmlinux image. Find vmlinux.lds. Makefile at top of Linux source.)
ENTRY(_stext) → arch/powerpc/kernel/vmlinux.lds
ENTRY(_stext) → arch/powerpc/kernel/head_8xx.S
  to map the first 8M 1:1
  call initial_mmu
  call turn_on_mmu
  /* Decide what sort of machine this is and initialize the MMU */
  bl      machine_init
  bl      MMU_init
  branch to start_kernel
start_kernel → init/main.c
  tick_init();
  boot_cpu_init();
  print linux banner
    "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
     LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
  setup_command_line(command_line);
  printk(KERN_NOTICE "Kernel command line: %s\n", boot_command_line);
  sched_init();
  init_IRQ();
  init_timers();
  sched_clock_init();
  calibrate_delay();
  rest_init();
rest_init → init/main.c
  rcu_scheduler_starting();
  /*
  * We need to spawn init first so that it obtains pid 1, however
  * the init task will end up wanting to create kthreads, which, if
  * we schedule it before we create kthreadd, will OOPS.
  */
  kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
  pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
  schedule();
  /* Call into cpu_idle with preempt disabled */
  preempt_disable();
  cpu_idle();
kernel_init → init/main.c
  wait until kthreadd has been set up
  smp_init();
  sched_init_smp();
  do_basic_setup();
  init_post();
init_post() → init/main.c
  /* shell specified in cmd prompt or defaults */
  run_init_process("/sbin/init");

 
How to list, load and unload modules?
What is mmap used for?
How to allocate physical memory and virtual memory space?
Where is the assembly part of the scheduler?
What are the challenges you face when you port the kernel?
How will you debug a kernel crash?
Now I give you a new CPU and board. How will you port Linux to it? What steps will you take to make Linux work on it?


Linux WiFi stack and Bluetooth stack names?

How to debug the Linux kernel when virtual memory mapping is enabled?

How is uClinux different from other flavors of Linux?


http://www.eetimes.com/electronics-news/4134390/How-uClinux-provides-MMU-less-processors-with-an-alternative

uClinux is for microcontrollers with no MMU.

Differences between uClinux and Linux
・Only a small part of the kernel and user-space software is affected by the lack of an MMU
・lack of both memory protection and of a virtual memory model
  ・One consequence of operating without memory protection is that an invalid
    pointer reference by even an unprivileged process may trigger an address
    error, and potentially corrupt or even shut down the system.
  ・three primary consequences of running Linux without virtual memory
    1) processes which are loaded by the kernel must be able to run
      independently of their position in memory. One way to achieve this is to
      "fix up" address references in a program once it is loaded into RAM. The
      other is to generate code that uses only relative addressing (referred to
      as PIC, or Position Independent Code) - uClinux supports both of these
      methods.
    2) memory allocation and deallocation occurs within a flat memory model.
       Very dynamic memory allocation can result in fragmentation which can
       starve the system. One way to improve the robustness of applications
       that perform dynamic memory allocation is to replace malloc() calls with
       requests from a preallocated buffer pool.
    3) swapping pages in and out of memory is not implemented, since it cannot
       be guaranteed that the pages would be loaded to the same location in
       RAM. In embedded systems it is also unlikely that it would be acceptable
       to suspend an application in order to use more RAM than is physically
       available.
・absence of the fork() and brk() system calls.
  ・Under Linux, fork() is implemented using copy-on-write pages. Without an
    MMU, uClinux cannot completely and reliably clone a process, nor does it
    have access to copy-on-write. uClinux implements vfork() in order to
    compensate for the lack of fork(). When a parent process calls vfork() to
    create a child, both processes share all their memory space including the
    stack. vfork() then suspends the parent's execution until the child process
    either calls exit() or execve(). Note that multitasking is not otherwise
    affected. It does, however, mean that older-style network daemons that make
    extensive use of fork() must be modified. Since child processes run in the
    same address space as their parents, the behaviour of both processes may
    require modification in particular situations.
  ・uClinux has neither an autogrow stack nor brk(), so user space programs
    must use the mmap() system call to allocate memory. For convenience, the
    uClinux C library implements malloc() as a wrapper around mmap(). There is
    a compile-time option to set the stack size of a program.
・Program loaders which support position independent code (PIC) were added.
  A new binary object code format, named 'flat' was created, which supports PIC
  and which has a very compact header. Other program loaders, such as that for
  ELF, were modified to support other formats which, instead of using PIC, use
  absolute references which it is the responsibility of the kernel to 'fix up'
  at run time. Each method has advantages and disadvantages. Traditional PIC is
  quick and compact but has a size restriction on some architectures. For
  example, the 16-bit relative jump in Motorola 68k architectures limits PIC
  programs to 32K. The runtime fix-up technique removes this size restriction,
  but incurs overhead when the program is loaded by the kernel.
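The preallocated buffer pool mentioned in consequence 2) can be as simple as a free list threaded through a static array. This is a minimal single-threaded sketch; the block size, the block count and the pool_* names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool: 16 blocks of 64 bytes, reserved at build time. */
#define POOL_BLOCK_SIZE  64
#define POOL_BLOCK_COUNT 16

union pool_block {
    union pool_block *next;                 /* link while the block is free */
    unsigned char data[POOL_BLOCK_SIZE];    /* payload while in use */
};

static union pool_block pool_storage[POOL_BLOCK_COUNT];
static union pool_block *free_list;

/* Thread every block onto the free list once, at start-up. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* Take one block in O(1). Returns NULL when the pool is exhausted,
 * instead of fragmenting a shared heap. */
void *pool_alloc(void)
{
    union pool_block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Give a block back in O(1). p must come from pool_alloc(). */
void pool_free(void *p)
{
    union pool_block *b = p;
    assert(b >= pool_storage && b < pool_storage + POOL_BLOCK_COUNT);
    b->next = free_list;
    free_list = b;
}
```

Because all blocks have the same size, freeing and reallocating in any order can never fragment the pool. The sketch is not thread-safe; a real system would protect free_list with a lock or by disabling interrupts around the list update.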

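How is a system call processed?

Through a supervisor call (trap) instruction, for example sc on PowerPC or svc on ARM, the CPU switches from user mode into kernel space. The kernel then uses the system call number to dispatch to the matching handler and returns the result to user space.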
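From user space, the entry into the kernel can be made explicit with glibc's generic syscall() wrapper instead of the usual libc function. A minimal sketch, assuming Linux and glibc; raw_getpid() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the PID through the generic syscall() wrapper.
 * glibc places SYS_getpid in the register the architecture's ABI uses
 * for the call number and executes the trap instruction; the CPU
 * switches to kernel mode, and the kernel's system call table
 * dispatches to the getpid handler. */
pid_t raw_getpid(void)
{
    long r = syscall(SYS_getpid);
    assert(r > 0);
    return (pid_t)r;
}
```

raw_getpid() returns the same value as getpid(); it only makes the system call number and the trap path visible.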

Why can ping not flood when it is run with ordinary user rights?
Code for Linux semaphores, spinlocks, interrupt disabling and kernel preemption disabling.
 

Monday, May 07, 2012

How to program an unsupported Flash device using Computex CSIDE

First, refer to the board user manual and check the bus width of the Flash interface. Select that bus width in the configuration of the Flash address space, and select an arbitrary Flash device.

Restart the configuration. CSIDE will show the actual Maker ID and Device ID read from the Flash. Then check in the following links whether the maker and the device are supported. If the maker is not supported, try the configuration of a device with matching IDs.


I had an EON EN29LV3320AB Flash device on my board, but I selected the Spansion MBM29LV320BE in CSIDE and programmed the Flash successfully, because both devices have the same Device ID, 0x22F9.

Sunday, May 06, 2012

STM32F4: Exception occurs when enabling the prefetch queue and caches

Problem:

On an STM32F407IG, when I enable the prefetch queue (PRFTEN bit) and the instruction and data caches (ICEN and DCEN bits) in the Flash access control register (FLASH_ACR) and then run my program, an exception occurs. The exception differs each time: Hard Fault, Bus Fault or Usage Fault. If I enable only the prefetch queue or only the caches, the problem does not occur.

Also, when I execute my program step by step in the debugger, no exception occurs. But when I 'Run' from the reset handler, a HardFault, BusFault or UsageFault occurs.

Solution:

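Check the LATENCY bits. If they are set to three wait states (0x3), increase the value to what the reference manual requires for your HCLK and supply voltage. At 168 MHz with a 2.7 V to 3.6 V supply, that is five wait states (0x5).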

The maximum clock speed has been increased to 168 MHz on the STM32F4, compared with 120 MHz on the STM32F2. So, if you port a program from an STM32F2 to an STM32F4 MCU, the number of wait states has to be increased.

When the caches are enabled, cache fills read from Flash memory faster. So, if there are not enough wait states, the data or instructions that are read can be wrong. The same phenomenon occurs when you 'Run' a program from the debugger instead of single-stepping it: when running, memory is read much faster. So, make sure that enough wait states have been configured.
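The wait-state rule can be written down as a small helper. This is a host-side sketch, assuming an STM32F40x/41x device with a 2.7 V to 3.6 V supply, for which the reference manual allows one additional wait state per 30 MHz of HCLK up to 168 MHz; lower supply voltages need more wait states at the same clock. The ACR_* macros are local stand-ins for the CMSIS FLASH_ACR_* definitions, and flash_wait_states() and flash_acr_value() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* FLASH_ACR bit positions as given in the STM32F4 reference manual
 * (RM0090); check them against the CMSIS FLASH_ACR_* macros in your
 * device header before use. */
#define ACR_LATENCY_MASK 0x7u         /* LATENCY[2:0], STM32F40x/41x */
#define ACR_PRFTEN       (1u << 8)    /* prefetch enable */
#define ACR_ICEN         (1u << 9)    /* instruction cache enable */
#define ACR_DCEN         (1u << 10)   /* data cache enable */

/* Wait states for a 2.7 V to 3.6 V supply: one more wait state for
 * each started 30 MHz of HCLK (0 WS up to 30 MHz, ..., 5 WS up to
 * 168 MHz). */
uint32_t flash_wait_states(uint32_t hclk_hz)
{
    assert(hclk_hz > 0 && hclk_hz <= 168000000u);
    return (hclk_hz - 1u) / 30000000u;
}

/* Value to write to FLASH_ACR before raising HCLK to hclk_hz:
 * the required latency together with prefetch and both caches. */
uint32_t flash_acr_value(uint32_t hclk_hz)
{
    return (flash_wait_states(hclk_hz) & ACR_LATENCY_MASK)
           | ACR_PRFTEN | ACR_ICEN | ACR_DCEN;
}
```

At 120 MHz (the STM32F2 maximum) the helper gives 3 wait states, while at 168 MHz it gives 5. On the target, the reference manual's sequence for raising the clock is to write the new LATENCY value to FLASH_ACR first, read FLASH_ACR back to check that it has taken effect, and only then switch SYSCLK to the faster source.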

Leave your comments to improve this post!