Friday, 18 October 2013

IR Drop Analysis using Redhawk - Overview


IR Drop Analysis using Redhawk:

Redhawk performs several types of power analysis on a circuit.
  • Static Voltage (IR) drop with average cycle currents
  • Dynamic Voltage drop with worst-case switching currents
  • Electromigration (EM) Analysis
  • Critical path and clock tree impacts
Redhawk Capabilities:
 As discussed above Redhawk is used to perform EM, IR and transient analysis on power grid. Redhawk is available in different modes. They are
1. Static Mode
2. Dynamic Mode
3. Transient Mode

1. Static Mode: 
 In static mode, Redhawk can perform EM, IR drop analysis. In this mode the tool analyzes average IR Drop and EM in the design.

2. Dynamic Mode:
In Dynamic mode, Redhawk analyzes peak IR drop in the design during functional mode using Vectorless algorithm or VCD. It can also analyze peak IR drop in the design during scanning mode using Vectorless or VCD.

3. Transient Mode:
In Transient mode, Redhawk can perform power up and power down analysis. It analyses peak rush current during turn-on, power grid turn-on time.

Inputs for Redhawk :

  • . LEF
  • . DEF
  • . LIBS
  • . TECH
  • . ploc
  • . GSR
  • . DSPF/SPEF
  • . <design>.timing
  • APL files ( cell.spcurrent, cell.cdev)
LEF - Library Exchange Format :
This is a industry standard format that has the information related to pin description and boundaries of the block /instances in the design.


DEF - Design Exchange Format :
This contains logical and physical connectivity between different instances and blocks in the design.

LIB - Synopsys liberty format:
This has several electrical and logical properties for a cell like: input and output pin properties, information on distributing power among the different power pins, internal energy of the cell, cell functionality information, etc.

SPEF - Standard Parasitic Exchange Format:
This file contains the parasitic (RC values) associated with each nets in the design.

ploc: 
It contains Pad location information based on Full-chip Floorplan.

Redhawk Tech file:
It contains Resistance information for all interconnect layers, EM limit information for all interconnect layers, Interconnect Stack information.

APL Files:
It Contains Current Profile characterization data for each standard cell, Intrinsic decap characterization data for each standard cell, Piecewise linear cap characterization data for each standard cell.

<design>.timing:
It contains STA Timing Information From Primetime. It is Recommended for Static analysis-provides accurate transition times and instance frequency. It required for Dynamic –provides switching windows.

RedHawk outputs:
• IR voltage drop contour maps
• Electro-migration (EM) analysis
• Power density and average current maps
• Text report files of detailed static power, voltage, and current data
• Warnings and violations reports.

You Might Also Like:


Wednesday, 9 October 2013

Clock Tree Synthesis (CTS) - Overview

Clock Tree Synthesis

Clock Tree Synthesis (CTS) is the process of inserting buffers/inverters along the clock paths of the ASIC design to balance the clock delay to all clock inputs. So in order to balance the skew and minimize insertion delay CTS is performed. We will discuss about skew and insertion delay in upcoming posts. As shown in below figure 1, Before CTS, All clock pins are driven by a single clock source. Here we are discussing CTS overview. What are the checklist before CTS and after CTS?? What are the inputs and outputs for CTS? How CTS effect the design.

                                                 Figure 1. Clock Distribution before CTS

Checklist before CTS:
  • Placement - Completed
  • power ground nets - Prerouted
  • Estimated Congestion - acceptable 
  • Estimated Timing - acceptable (~ 0 ns slack)
  • Estimated Max Tran/Cap - No violations
  • High Fanout Nets
Inputs required for CTS:
  • Detailed Placement Database
  • Target for latency and skew if specified
  • Buffers or Inverters for building the clock tree
  • Clock Tree DRC (Max Tran, Max Cap, Max fanout, Max no of buffer levels)
Output of CTS:
  • Database with properly build clock tree in the design
Checklist after CTS:
  • Skew Report
  • Clock Tree Report
  • Timing Reports for setup and hold
  • Power and Area Report
CTS Goals:
  • Minimizing Clock Skew
  • Minimizing Insertion Delay
  • Minimizing Power Dissipation 
Why clock routes are given more priority than signal nets ?

Clock is propagated after placement because the exact physical location of cells and modules are needed for the clock’s propagation which in turn impacts in dealing with accurate delay and operating frequency and clock is propagated before routing because when compared to signal routes, clock routes are given more priority. This is because; clock is the only signal switches frequently which in acts as source for dynamic power dissipation.

Effects of CTS:
  • Clock Buffers are added
  • Congestion may increase
  • Non-clock cells may have been moved to less ideal locations
  • Can introduce timing and max tran/cap violations
                                                Figure 2. After CTS - Buffer tree is built




Tuesday, 8 October 2013

Placement

Placement
Placement is the process of placing standard cells in the rows created at floorplanning stage. The goal is to minimize the total area and interconnect cost. The quality of routing is highly determined by the placement. Placement becomes very critical in Deep Sub Micron technologies.The inputs for the placement stage are Gate-level Netlist, Floorplanned design, Design libraries (Physical and Logical libraries), Design Constraints, Technology file.


Gate-Level Netlist:

Gate-level netlist contain references to standard cells and macros, which are stored in the logical libraries, as well as other hierarchical logic blocks. Before placing one must ensure that all references can be resolved.

Reference Libraries:
Reference Libraries contain logical and physical information of macros, standard cells used by many other designs. These are referenced by pointers in the design library for memory efficiency. A standard cell library also contains a corresponding abstract view for each layout view.

Placement is the process of finding a suitable physical location for each cell in the design. Placement is performed in two stages: coarse placement and legalization.

Coarse Placement:
During coarse placement, The placement tool determines an approximate location for each cell according to the timing and congestion constraints. The placed cells do not fall on the placement grid and may overlap each other. Large cells, such as RAM and IP blocks, act as placement blockages for smaller, leaf-level cells. Coarse placement is fast and is sufficiently accurate for initial timing and congestion analysis

Legalization:
During legalization, Placement moves the cells to precisely legal locations on the placement grid and eliminates any overlap between cells. The small changes to cell locations cause the lengths of the wire connections to change, possibly causing new timing violations. Such violations can often be fixed by incremental optimization, for example, by re-sizing the driving cells. 

The place_opt command is recommended for performing placement in most situations. This command performs coarse placement, high-fanout net synthesis, physical optimization, and legalization, all in a single operation. In certain applications, you might want to perform placement tasks individually using commands such as create_placementand physopt, for a greater degree of control or to closely monitor the results as they are generated.

In the placement process, placement tool considers possible trade-offs between timing and congestion. Timing considerations bring cells closer together to minimize wire lengths and therefore wire delays. On the other hand, the occurrence of congestion draws cells further apart to provide room for the connections. Congestion cannot be ignored entirely in favor of timing because rerouting wires around congested areas will cause an increase in wire lengths and wire delays, thus defeating the value of close placement.


In the place_opt command, the -congestion option causes the tool to apply more effort to congestion removal, resulting in better routability. However, this option should be used only if congestion is expected to be a problem because it requires more runtime and causes area utilization to be less uniform across the available placement area. If congestion is found to be a problem after placement and optimization, it can be improved incrementally with the refine_placement command. Timing, area, and congestion optimization can also be done incrementally with the psynopt command. 

The -area_recovery option of the place_opt command allows placement tool to recover chip area where there is extra timing slack available. For example, it can resize cells smaller in timing paths where there is a positive timing slack. Placement is typically done before clock tree synthesis, so the clock network is ideal and does not have a clock buffer tree available for accurate clock network timing analysis. To get more accurate timing results, you should use the same commands as those used in synthesis tool to specify non-zero latency, uncertainty, and transition times for the clock network.



References:

1.Synopsys ICC Manual

Power Planning - Power Network Synthesis (PNS)

Power Planning - Power Network Synthesis (PNS)

In ICC Design Planning flow, Power Network Synthesis creates macro power rings, creates the power grid. PNS automates power topology definition, Calculations of the width and number of power straps to meet IR constraints, detailed P/G connections and via placement.

Here I am going to discuss about the Calculations of the width and number of power straps to meet EM IR constraints.Suppose consider core voltage Vdd core = 1.2volts.

Using below mentioned equations we can calculate vertical and horizontal strap width and required number of power straps.

1. Calculation of block currents w.r.t to power:

                     Iblock = Pblock/ Vddcore

                       Where Pblock = Block Power
                                  Vdd core = Core Voltage

2. Calculation the current supply from each side of the block :

                 Itop= Ibottom= {Iblockx [Wblock / (Wblock+Hblock)]} / 2

                 Ileft= Iright= {Iblock x [Hblock/ (Wblock+Hblock)]} / 2

3. Calculation of power-strap width based on EM:

               W strap_vertical( = W strap_top= W strap_bottom) = Itop/ J metal
               
                W strap_horizontal( = W strap_left= W strap_right) = Ileft/ J metal

4. Calculation of strap width based on IR drop dominates:

              Wstrap_vertical ≧ (Itop x Roe x Hblock) / 0.1Vdd
           
              Wstrap_horizontal≧ (Ileftx Roe x Wblock) / 0.1Vdd

5. Partition the power straps into power refreshes:

For better utilization of the routing channels, select a refresh width of (3 routing pitch + minimum metal6 width) = (3 x 0.59 μm + 0.25 μm) = 2.01μm 2 μm in the vertical and the same in the horizontal.

Block A as an example, the number of the Vdd/Vssrefresh is:

          Nrefresh_horizontal= Wstrap_ horizontal/ Wrefresh

          Nrefresh_vertical= Wstrap_vertical / Wrefresh

The spacing of each refresh would be:

         Srefresh_horizontal= Hblock/ Nrefresh_horizontal

         Srefresh_vertical = Wblock/ Nrefresh_vertical

6. Calculate the required number of core power/ground pads:

 If each power/ground pad can sustain 25 mA current, Pcore=630mw

        Npad_core = (Pcore/ Vddcore) / Icore_power_pad

                           = (630/1.2)/25
                           = 21

7. Core Power Estimation :

 The following equation provides a simple method to estimate the dynamic power and leakage power of combinational cells in the core area:

      Pdynamic= Pcore x F x Scomb x Ncomb

                Where,
                Pcomb. is the power per MHz per gate count (nW/MHz/gate)
                F is the working frequency. (Unit = MHz)
                Scomb. is the switching activity of combinational logic

                Ncomb. is the number of gate counts

      Pstatic= Pleakagex Ncomb
               
                   Where,
                    Pleakage is average leakage power of gate
                    Ncomb. is the number of gate counts
Consider
•Gate count of combinational logic is 160K gates
•The working frequency is 27MHz
•Switching activity is 0.2

Then, the dynamic power consumption in the combinational circuit is,
Pdynamic = Pcorex F x Scombx NcombPdynamic
                = 12.35 nW/MHz X 27 X 0.2 * 160K
                = 10.67 mW

The leakage power consumption in the combinational logic is
Pstatic= Pleakagex NcombPstatic
          = 0.756 nW X 160K
          = 0.121 mW

Multi Voltage Design - Power Management Technique

Multi Voltage Design:


Power is primary concern in many segments of today's electronics business. As discussed in earlier posts, Power is two types in IC Design - Dynamic and Static power. Dynamic power comprises of Internal power and switching power where as static power comprises of leakage power. As discussed in earlier post, Internal power (Dynamic) includes short-circuit (Vdd to GND) power as well as power consumed due to switching of internal nets.Switching (Dynamic) power is due to charging and discharging of load capacitance during switching.

We know that, Dynamic power is proportional to C.V^2. f. where

                                         C is Capacitance
                                          f is Switching Frequency
                                         V is Voltage

The dynamic power in designs is growing rapidly because dramatic increases in clock speeds and transistor counts. By using clock gating technique, the dynamic power due to switching can be reduced. But dynamic power varies linearly with frequency and it varies proportional to square of the operating voltage.Therefore, We can reduce the dynamic power significantly by reducing the operating voltage.

Challenges and Requirements for Multi Voltage Design:

Multi-voltage design styles vary with the target application. Figure 1 shows three different design styles used today. The most standard style consists of partitioning the design into independent voltage areas (or islands)that can function at a specific minimum voltage under a given performance constraint. Each voltage area operates at a single voltage: this can be the same as the chip voltage main Vdd or it can be a different voltage. Another commonly used multi-voltage design style consists of a power-down mode where one or more voltage areas may be shut down to conserve power during low-performance operating modes, such
as sleep or hibernation. The most advanced multi-voltage design style, however, is Adaptive Voltage Scaling
(AVS). AVS uses on-chip (or off-chip) monitors to adaptively adjust voltage levels based on operating mode requirements and process and temperature.


To achieve multi-voltage design, a systemic solution is required that:




  • Supports advanced infrastructures, offering required libraries and cells for different multi-voltage design          styles
  • Offers integrated RTL to GDSII implementation with advanced, convergent dynamic and leakage power       optimization for faster time-to-results (TTR) and enhanced quality-of-results (QoR)
  • Ensures timing, SI, power, and power integrity sign-off   


  •  Now Android Application available, Click here to download it

    Monday, 7 October 2013

    Power Gating - Power Management Technique

    Power Gating:

    Power Gating is a low power technique in deep sub micron technologies. Power Gating is performed by shutting down the power for a portion of the design in order to reduce the static(leakage) power in the design. Power Switch (PS) cell is  basic element which is used in power gating technique to shutting down the power for a portion of the design. The PS cell is also known as power management cell. The basic idea of power gating is to separate the VDD or GND power supply from standard cells of a specific design hierarchy.

    Appropriate sized PMOS(Header) or NMOS(Footer) transistors are used as Power Switch (PS) cells. These two NMOS, PMOS cells only differ in the fact that the switches switch different power rails VDD and VSS respectively as shown in below Figure1. The designer turned to use header switches since header switches have less leakage and they are also more easy for implementation.


                                                            Figure 1. Power Gating

    Switch cell has two modes of operation  - ON or OFF
    When switches are in off state, they disconnect the devices inside the block from power source. This reduces the leakage current flow in the devices of the block.

    There are two approaches in Power Gating.
        1. Fine Grain Power Gating
        2. Coarse Grain Power Gating

    In Fine Grain Power Gating Technique, Each standard cell has inbuilt power switch. Where in Coarse Grain technique switches control entire block of standard cells using a large size transistor. Each of these approaches has their various trade-offs. Fine grain is easier to implement in terms of timing analysis, but with significant area overhead resulting in higher fabrication cost.On the other hand, the coarse grain switches require more consideration in terms of timing and wake-up time, but shows grater leakage saving. The coarse grain power gating is common implementation technique nowadays and can reduce leakage current by 30X.

    Power Switches Placement Styles:

    Coarse grain implementation provides multiple placement topologies for the power switches. For example, switches can be placed around the power domain (in a column or ring way) or in an array fashion inside the domain area. Array style is a more common technique as it yields smaller IR-drop and less area. It is also more efficient with respect to Power-Gates control sequence. On the other hand, ring approach can eliminate the user from synthesizing complicated Power-Grid and it also gives better placement results, as it removes fragmentations from placement areas.

    Array style also suits best Flip-Chip designs, where Power is delivered from the Bond pads placed also inside the core, which reduce IR-drop significantly, when compared to ring placement style.

    Low power Cells:

    To facilitate data transfer between multiple Power domains operating at different voltage levels, it is recommended to use level-shifters. Usually both low-to-high and high-to-low level shifters are provided by library vendors.
    Level shifters are used for two main reasons. First of all, when a signal propagates from a low-voltage block to a high-voltage block, a lower voltage at the PMOS gate might result in the gate not being entirely switched off, which can cause abnormal leakage current. Secondly, because signals must transition across voltage domains, levels shifters should be used to ensure that both net transition and net delays are accurately calculated.

    For power domains which share the same operating voltage but some of them may be shut-off, an isolation cell is required on power domain interface. The reason for this is that cells connected to power-off blocks, their inputs become floating which may cause high leakage power. Therefore, isolation cells are necessary to isolate floating inputs. The isolation is performed by setting a default logic value on the output depends on the state of a dedicated control pin. Usually 2 types of isolation cells are provided by the library vendor: clamp0 and clamp1, which differs by the default value, set in isolation state. Desired cell type is chosen according to the functionality on the receiver side.

    Blocks operate at different voltage levels, and some of them can also be turned off, requires both isolation and level-shifting functions at the power domain interface. To simplify implementation, library vendors usually supply a single cell called the enable-level shifter, which is basically a level-shifter that includes an enable signal.

    The recommendation is to place Enable Level Shifters on all outputs of such blocks. Both Isolation cells and Enable Level Shifters are placed on the Always-on area. Figure 2 illustrates Low-Power cells usage between various types of power domains.
                                                       Figure 2. Low power cells usage

    Power Switch Count:

    In order to ensure correct operation under functional mode, we need to make sure no I/R drop is within cell characterization range (usually 10% of Nominal voltage). Since Power switches are in linear state when they are turned ON, they act like a resistor which drops the Voltage based on its resistance, as described in figure 3.

                                                Figure 3. IR Drop through Power Switch
    Minimal number of power switches can be determined from the following data:
    • DC I/V curve (Transistors are in linear state)
    • IR drop limit for the switches
    • Domain power consumption

    One can use the following formula to derive the minimum number of switches required for a
    design when the above data is given as input.


    Additional optimization can be made for leakage/Performance trade-off. While large number of
    switches increases total leakage & area, insufficient number of switches increase IR drop and
    degrades performance.

    References:

    1. Robust Power Gating Implementation using ICC by Ariel Wolf, SNUG Israel 2009.

    Now Android Application available, Click here to download it