Wednesday, 9 October 2020

Clock Tree Synthesis (CTS) - Overview

Clock Tree Synthesis

Clock Tree Synthesis (CTS) is the process of inserting buffers/inverters along the clock paths of an ASIC design to balance the clock delay to all clock inputs. CTS is performed to balance the skew and minimize the insertion delay; we will discuss skew and insertion delay in upcoming posts. As shown in Figure 1 below, before CTS all clock pins are driven by a single clock source. Here we discuss a CTS overview: What are the checklist items before and after CTS? What are the inputs and outputs for CTS? How does CTS affect the design?

                                                 Figure 1. Clock Distribution before CTS

Checklist before CTS:
  • Placement - Completed
  • Power/ground nets - Prerouted
  • Estimated Congestion - Acceptable
  • Estimated Timing - Acceptable (~0 ns slack)
  • Estimated Max Tran/Cap - No violations
  • High Fanout Nets - Synthesized (except clocks)
Inputs required for CTS:
  • Detailed Placement Database
  • Target for latency and skew if specified
  • Buffers or Inverters for building the clock tree
  • Clock Tree DRC (Max Tran, Max Cap, Max fanout, Max no of buffer levels)
Output of CTS:
  • Database with a properly built clock tree in the design
Checklist after CTS:
  • Skew Report
  • Clock Tree Report
  • Timing Reports for setup and hold
  • Power and Area Report
CTS Goals:
  • Minimizing Clock Skew
  • Minimizing Insertion Delay
  • Minimizing Power Dissipation 
Why are clock routes given more priority than signal nets?

The clock is propagated after placement because the exact physical locations of cells and modules are needed for the clock's propagation, which in turn affects the accuracy of delay calculation and the operating frequency. The clock is propagated before routing because clock routes are given more priority than signal routes. This is because the clock is the signal that switches most frequently, which in turn makes it a major source of dynamic power dissipation.

Effects of CTS:
  • Clock Buffers are added
  • Congestion may increase
  • Non-clock cells may have been moved to less ideal locations
  • Can introduce timing and max tran/cap violations
                                                Figure 2. After CTS - Buffer tree is built




Tuesday, 8 October 2020

Placement

Placement
Placement is the process of placing standard cells in the rows created at the floorplanning stage. The goal is to minimize the total area and interconnect cost. The quality of routing is largely determined by the placement, and placement becomes very critical in deep sub-micron technologies. The inputs for the placement stage are the gate-level netlist, floorplanned design, design libraries (physical and logical libraries), design constraints, and technology file.


Gate-Level Netlist:

The gate-level netlist contains references to standard cells and macros, which are stored in the logical libraries, as well as other hierarchical logic blocks. Before placement, one must ensure that all references can be resolved.

Reference Libraries:
Reference libraries contain the logical and physical information of the macros and standard cells used by many other designs. These are referenced by pointers in the design library for memory efficiency. A standard cell library also contains a corresponding abstract view for each layout view.

Placement is the process of finding a suitable physical location for each cell in the design. Placement is performed in two stages: coarse placement and legalization.

Coarse Placement:
During coarse placement, the placement tool determines an approximate location for each cell according to the timing and congestion constraints. The placed cells do not yet fall on the placement grid and may overlap each other. Large cells, such as RAM and IP blocks, act as placement blockages for smaller, leaf-level cells. Coarse placement is fast and is sufficiently accurate for initial timing and congestion analysis.

Legalization:
During legalization, the placement tool moves the cells to legal locations on the placement grid and eliminates any overlap between cells. These small changes to cell locations cause the lengths of the wire connections to change, possibly causing new timing violations. Such violations can often be fixed by incremental optimization, for example by re-sizing the driving cells.

The place_opt command is recommended for performing placement in most situations. This command performs coarse placement, high-fanout net synthesis, physical optimization, and legalization, all in a single operation. In certain applications, you might want to perform placement tasks individually, using commands such as create_placement and physopt, for a greater degree of control or to closely monitor the results as they are generated.

In the placement process, the placement tool considers possible trade-offs between timing and congestion. Timing considerations bring cells closer together to minimize wire lengths and therefore wire delays. On the other hand, congestion draws cells further apart to provide room for the connections. Congestion cannot be ignored entirely in favor of timing, because rerouting wires around congested areas increases wire lengths and wire delays, thus defeating the value of close placement.


In the place_opt command, the -congestion option causes the tool to apply more effort to congestion removal, resulting in better routability. However, this option should be used only if congestion is expected to be a problem because it requires more runtime and causes area utilization to be less uniform across the available placement area. If congestion is found to be a problem after placement and optimization, it can be improved incrementally with the refine_placement command. Timing, area, and congestion optimization can also be done incrementally with the psynopt command. 

The -area_recovery option of the place_opt command allows the placement tool to recover chip area where there is extra timing slack available. For example, it can downsize cells in timing paths that have positive timing slack. Placement is typically done before clock tree synthesis, so the clock network is ideal and a clock buffer tree is not yet available for accurate clock network timing analysis. To get more accurate timing results, you should use the same commands as those used in the synthesis tool to specify non-zero latency, uncertainty, and transition times for the clock network.



References:

1. Synopsys ICC Manual

Power Planning - Power Network Synthesis (PNS)

Power Planning - Power Network Synthesis (PNS)

In the ICC design planning flow, Power Network Synthesis (PNS) creates the macro power rings and the power grid. PNS automates power topology definition, the calculation of the width and number of power straps to meet IR constraints, detailed P/G connections, and via placement.

Here I am going to discuss the calculation of the width and number of power straps to meet EM/IR constraints. Suppose the core voltage is Vdd_core = 1.2 V.

Using the equations below, we can calculate the vertical and horizontal strap widths and the required number of power straps.

1. Calculation of the block current with respect to power:

                     I_block = P_block / Vdd_core

                       where P_block = block power
                                  Vdd_core = core voltage

2. Calculation of the current supplied from each side of the block:

                 I_top = I_bottom = {I_block × [W_block / (W_block + H_block)]} / 2

                 I_left = I_right = {I_block × [H_block / (W_block + H_block)]} / 2

3. Calculation of power-strap width based on EM:

               W_strap_vertical (= W_strap_top = W_strap_bottom) = I_top / J_metal

                W_strap_horizontal (= W_strap_left = W_strap_right) = I_left / J_metal

4. Calculation of strap width when IR drop dominates:

              W_strap_vertical ≥ (I_top × ρ × H_block) / (0.1 × Vdd)

              W_strap_horizontal ≥ (I_left × ρ × W_block) / (0.1 × Vdd)

              where ρ is the sheet resistance of the strap metal
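
As a quick illustration of steps 1-4, here is a small Python sketch. All numeric inputs (block power, block dimensions, allowed current density J_metal, sheet resistance ρ) are hypothetical placeholders rather than values from this post, and taking the larger of the EM-driven and IR-driven widths is an assumption about how the two limits are combined.

    # Hedged sketch of steps 1-4: block current, per-side currents, and strap widths.
    # All numeric inputs are illustrative assumptions, not values from the post.
    P_block = 0.63                      # block power in W (assumed)
    Vdd_core = 1.2                      # core supply voltage in V
    W_block, H_block = 800.0, 600.0     # block width/height in um (assumed)
    J_metal = 1.0e-3                    # allowed EM current per um of strap width, A/um (assumed)
    rho = 0.07                          # sheet resistance of the strap metal, ohm/square (assumed)

    # Step 1: total block current
    I_block = P_block / Vdd_core

    # Step 2: current supplied from each side of the block
    I_top = I_bottom = (I_block * (W_block / (W_block + H_block))) / 2
    I_left = I_right = (I_block * (H_block / (W_block + H_block))) / 2

    # Step 3: strap width required by the electromigration limit
    W_strap_vertical_em = I_top / J_metal
    W_strap_horizontal_em = I_left / J_metal

    # Step 4: strap width required when the IR-drop budget (10% of Vdd) dominates
    W_strap_vertical_ir = (I_top * rho * H_block) / (0.1 * Vdd_core)
    W_strap_horizontal_ir = (I_left * rho * W_block) / (0.1 * Vdd_core)

    # Use the larger of the two requirements in each direction (an assumption)
    W_strap_vertical = max(W_strap_vertical_em, W_strap_vertical_ir)
    W_strap_horizontal = max(W_strap_horizontal_em, W_strap_horizontal_ir)
    print(W_strap_vertical, W_strap_horizontal)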

5. Partition the power straps into power refreshes:

For better utilization of the routing channels, select a refresh width of (3 × routing pitch + minimum metal6 width) = (3 × 0.59 μm + 0.25 μm) ≈ 2 μm in the vertical direction, and the same in the horizontal direction.

Taking Block A as an example, the number of Vdd/Vss refreshes is:

          N_refresh_horizontal = W_strap_horizontal / W_refresh

          N_refresh_vertical = W_strap_vertical / W_refresh

The spacing of each refresh would be:

         S_refresh_horizontal = H_block / N_refresh_horizontal

         S_refresh_vertical = W_block / N_refresh_vertical

6. Calculate the required number of core power/ground pads:

 If each power/ground pad can sustain 25 mA of current, and P_core = 630 mW:

        N_pad_core = (P_core / Vdd_core) / I_core_power_pad

                           = (630 / 1.2) / 25
                           = 21
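
A small Python sketch of steps 5 and 6, continuing the same example; the strap widths and block dimensions fed in are hypothetical carry-overs, while the 25 mA per pad and 630 mW core power mirror the numbers above.

    import math

    # Hedged sketch of steps 5-6: partitioning total strap width into refreshes
    # and estimating the core power/ground pad count.
    W_strap_vertical = 150.0         # total vertical strap width from steps 3-4, um (assumed)
    W_strap_horizontal = 55.0        # total horizontal strap width, um (assumed)
    W_refresh = 2.0                  # chosen refresh width, um (~3 routing pitches + min metal width)
    W_block, H_block = 800.0, 600.0  # block dimensions, um (assumed)

    # Step 5: number of refreshes and their spacing
    N_refresh_vertical = math.ceil(W_strap_vertical / W_refresh)
    N_refresh_horizontal = math.ceil(W_strap_horizontal / W_refresh)
    S_refresh_vertical = W_block / N_refresh_vertical        # spacing of vertical refreshes
    S_refresh_horizontal = H_block / N_refresh_horizontal    # spacing of horizontal refreshes

    # Step 6: core power/ground pad count (25 mA per pad, Pcore = 630 mW as above)
    P_core = 0.630    # W
    Vdd_core = 1.2    # V
    I_pad = 0.025     # A per power/ground pad
    N_pad_core = math.ceil((P_core / Vdd_core) / I_pad)
    print(N_refresh_vertical, N_refresh_horizontal, N_pad_core)   # pad count -> 21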

7. Core Power Estimation :

 The following equation provides a simple method to estimate the dynamic power and leakage power of combinational cells in the core area:

      P_dynamic = P_comb × F × S_comb × N_comb

                Where,
                P_comb is the power per MHz per gate count (nW/MHz/gate)
                F is the working frequency (MHz)
                S_comb is the switching activity of the combinational logic

                N_comb is the number of gate counts

      P_static = P_leakage × N_comb
               
                   Where,
                    P_leakage is the average leakage power per gate
                    N_comb is the number of gate counts
Consider
• Gate count of combinational logic is 160K gates
• The working frequency is 27 MHz
• Switching activity is 0.2

Then, the dynamic power consumption in the combinational circuit is,
P_dynamic = P_comb × F × S_comb × N_comb
                = 12.35 nW/MHz/gate × 27 MHz × 0.2 × 160K
                = 10.67 mW

The leakage power consumption in the combinational logic is
P_static = P_leakage × N_comb
          = 0.756 nW × 160K
          = 0.121 mW
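
The same arithmetic in a short Python sketch, reproducing the post's numbers (160K gates, 27 MHz, switching activity 0.2, 12.35 nW/MHz/gate dynamic, 0.756 nW/gate leakage):

    # Step 7 as code, using the numbers from the example above.
    P_comb = 12.35e-9      # dynamic power per MHz per gate, W/MHz/gate
    F = 27.0               # working frequency, MHz
    S_comb = 0.2           # switching activity of the combinational logic
    N_comb = 160_000       # gate count
    P_leakage = 0.756e-9   # average leakage power per gate, W

    P_dynamic = P_comb * F * S_comb * N_comb    # ~10.67 mW
    P_static = P_leakage * N_comb               # ~0.121 mW
    print(f"P_dynamic = {P_dynamic * 1e3:.2f} mW, P_static = {P_static * 1e3:.3f} mW")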

Multi Voltage Design - Power Management Technique

Multi Voltage Design:


Power is a primary concern in many segments of today's electronics business. As discussed in earlier posts, there are two types of power in IC design - dynamic and static power. Dynamic power comprises internal power and switching power, whereas static power comprises leakage power. Internal (dynamic) power includes short-circuit (Vdd to GND) power as well as power consumed due to switching of internal nets. Switching (dynamic) power is due to charging and discharging of the load capacitance during switching.

We know that dynamic power is proportional to C·V²·f, where

                                         C is the capacitance
                                          f is the switching frequency
                                         V is the supply voltage

The dynamic power in designs is growing rapidly because of dramatic increases in clock speeds and transistor counts. By using the clock gating technique, the dynamic power due to switching can be reduced. But dynamic power varies only linearly with frequency, while it varies with the square of the operating voltage. Therefore, we can reduce the dynamic power significantly by reducing the operating voltage.
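
As a back-of-the-envelope illustration (the voltage values here are assumptions, not from the post), lowering a 1.2 V supply to 1.0 V at the same capacitance and frequency cuts dynamic power by roughly 30%; a tiny Python check:

    # Dynamic power scales with Vdd^2 at fixed C and f (normalized values).
    def dynamic_power(c, v, f):
        return c * v ** 2 * f

    p_high = dynamic_power(1.0, 1.2, 1.0)   # normalized power at 1.2 V
    p_low = dynamic_power(1.0, 1.0, 1.0)    # normalized power at 1.0 V
    print(f"savings: {1 - p_low / p_high:.0%}")   # ~31%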

Challenges and Requirements for Multi Voltage Design:

Multi-voltage design styles vary with the target application. Figure 1 shows three different design styles used today. The most standard style consists of partitioning the design into independent voltage areas (or islands) that can each function at a specific minimum voltage under a given performance constraint. Each voltage area operates at a single voltage: this can be the same as the chip's main Vdd or it can be a different voltage. Another commonly used multi-voltage design style adds a power-down mode, where one or more voltage areas may be shut down to conserve power during low-performance operating modes such as sleep or hibernation. The most advanced multi-voltage design style, however, is Adaptive Voltage Scaling (AVS). AVS uses on-chip (or off-chip) monitors to adaptively adjust voltage levels based on operating mode requirements, process, and temperature.


To achieve multi-voltage design, a systematic solution is required that:

  • Supports advanced infrastructures, offering the required libraries and cells for different multi-voltage design styles
  • Offers integrated RTL-to-GDSII implementation with advanced, convergent dynamic and leakage power optimization for faster time-to-results (TTR) and enhanced quality-of-results (QoR)
  • Ensures timing, SI, power, and power integrity sign-off



    Monday, 7 October 2020

    Power Gating - Power Management Technique

    Power Gating:

    Power Gating is a low power technique used in deep sub-micron technologies. Power gating is performed by shutting down the power to a portion of the design in order to reduce the static (leakage) power in the design. The Power Switch (PS) cell is the basic element used in the power gating technique to shut down the power to a portion of the design. The PS cell is also known as a power management cell. The basic idea of power gating is to separate the VDD or GND power supply from the standard cells of a specific design hierarchy.

    Appropriately sized PMOS (header) or NMOS (footer) transistors are used as Power Switch (PS) cells. These two cell types differ only in the power rail they switch, VDD for headers and VSS for footers, as shown in Figure 1 below. Designers tend to use header switches since they have less leakage and are also easier to implement.


                                                            Figure 1. Power Gating

    A switch cell has two modes of operation - ON or OFF.
    When the switches are in the off state, they disconnect the devices inside the block from the power source. This reduces the leakage current flowing in the devices of the block.

    There are two approaches in Power Gating.
        1. Fine Grain Power Gating
        2. Coarse Grain Power Gating

    In the fine-grain power gating technique, each standard cell has an inbuilt power switch, whereas in the coarse-grain technique, switches control an entire block of standard cells using large transistors. Each of these approaches has its own trade-offs. Fine grain is easier to implement in terms of timing analysis, but has a significant area overhead resulting in higher fabrication cost. On the other hand, coarse-grain switches require more consideration in terms of timing and wake-up time, but show greater leakage savings. Coarse-grain power gating is the common implementation technique nowadays and can reduce leakage current by 30X.

    Power Switches Placement Styles:

    Coarse-grain implementation provides multiple placement topologies for the power switches. For example, switches can be placed around the power domain (in a column or ring fashion) or in an array fashion inside the domain area. The array style is the more common technique as it yields smaller IR drop and less area. It is also more efficient with respect to the power-gate control sequence. On the other hand, the ring approach spares the user from synthesizing a complicated power grid and also gives better placement results, as it removes fragmentation from the placement areas.

    The array style also best suits flip-chip designs, where power is delivered from bond pads placed inside the core as well, which reduces IR drop significantly compared to the ring placement style.

    Low power Cells:

    To facilitate data transfer between multiple power domains operating at different voltage levels, it is recommended to use level shifters. Usually both low-to-high and high-to-low level shifters are provided by library vendors.
    Level shifters are used for two main reasons. First, when a signal propagates from a low-voltage block to a high-voltage block, the lower voltage at the PMOS gate might result in the gate not being entirely switched off, which can cause abnormal leakage current. Second, because signals must transition across voltage domains, level shifters should be used to ensure that both net transitions and net delays are accurately calculated.

    For power domains which share the same operating voltage but where some of them may be shut off, an isolation cell is required on the power domain interface. The reason for this is that the inputs of cells connected to powered-off blocks become floating, which may cause high leakage power. Therefore, isolation cells are necessary to isolate floating inputs. The isolation is performed by setting a default logic value on the output, depending on the state of a dedicated control pin. Usually two types of isolation cells are provided by the library vendor, clamp0 and clamp1, which differ in the default value set in the isolation state. The desired cell type is chosen according to the functionality on the receiver side.

    Blocks that operate at different voltage levels, and that can also be turned off, require both isolation and level-shifting functions at the power domain interface. To simplify implementation, library vendors usually supply a single cell called the enable level shifter, which is basically a level shifter that includes an enable signal.

    The recommendation is to place enable level shifters on all outputs of such blocks. Both isolation cells and enable level shifters are placed in the always-on area. Figure 2 illustrates low-power cell usage between various types of power domains.
                                                       Figure 2. Low power cells usage

    Power Switch Count:

    In order to ensure correct operation in functional mode, we need to make sure the IR drop stays within the cell characterization range (usually 10% of the nominal voltage). Since power switches are in the linear state when they are turned ON, they act like resistors that drop the voltage based on their resistance, as described in Figure 3.

                                                Figure 3. IR Drop through Power Switch
    The minimal number of power switches can be determined from the following data:
    • DC I/V curve (transistors are in the linear state)
    • IR drop limit for the switches
    • Domain power consumption

    One can use a calculation such as the one sketched below to derive the minimum number of switches required for a design when the above data is given as input.
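
    The original post presented the formula as an image, which is not reproduced here. The Python sketch below shows one common way to derive the count from the listed inputs (on-resistance from the DC I/V curve, the IR-drop limit, and the domain current); it is an assumption about the calculation, not necessarily the exact formula from the post, and all numbers are illustrative.

    import math

    # Hedged sketch: minimum power-switch count from the IR-drop budget.
    R_on = 20.0             # switch on-resistance in the linear region, ohm (from the DC I/V curve, assumed)
    Vdd = 1.0               # domain supply voltage, V (assumed)
    ir_drop_limit = 0.01    # allowed drop across the switch network, V (e.g. 1% of Vdd, assumed)
    P_domain = 0.050        # power consumption of the gated domain, W (assumed)

    I_domain = P_domain / Vdd              # total domain current
    I_per_switch = ir_drop_limit / R_on    # current one switch can carry within the drop budget
    N_switches_min = math.ceil(I_domain / I_per_switch)
    print(N_switches_min)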


    Additional optimization can be made for the leakage/performance trade-off. While a large number of switches increases total leakage and area, an insufficient number of switches increases IR drop and degrades performance.

    References:

    1. Robust Power Gating Implementation using ICC by Ariel Wolf, SNUG Israel 2009.


    Sunday, 8 September 2020

    Physical Design (PD) Interview Questions - Floorplanning

        1. What is floorplanning?
       A. Floorplanning is the process of placing blocks/macros in the chip/core area, thereby determining the routing areas between them. Floorplanning determines the size of the die and creates wire tracks for placement of standard cells. It creates power straps and specifies power/ground (PG) connections. It also determines the I/O pin/pad placement.

    In simple words, floorplanning is the process of determining the macro placement, power grid generation, and I/O placement.
       2.       How can you say a floorplan is good?
       A.      A good floorplan should meet the following constraints:
    ·         Minimize the total chip area
    ·         Make Routing phase easy (Routable)
    ·         Improve the performance by reducing signal delays
       3.       What are the inputs for floorplan?
       A.      The following are the inputs for Floorplan
    ·         Synthesized Netlist (.v, .vhdl)
    ·         Design Constraints  (SDC - Synopsys Design Constraints)
    ·         Physical Partitioning Information of the design  
    ·         IO Placement file (optional)
    ·         Macro Placement File (optional)
    ·         Floorplanning control parameters
       4.       What are the outputs of floorplan?
       A.      The following are the outputs for floorplan
    ·         Die/Block Area
    ·         I/Os Placed
    ·         Macros placed
    ·         Power Grid Design
    ·         Power Pre-routing
    ·         Standard cell placement areas
       5. What are the floorplanning control parameters?
       A.      Aspect ratio, core utilization, row/core ratio, width and height are the floorplanning control parameters. For more information, please visit the Floorplanning Control Parameters post.

       6.       What is the Aspect Ratio?
       A.      Please visit the Floorplanning Control Parameters post
        7.       What is core utilization?
       A.      Please visit the Floorplanning Control Parameters post
        8.       What is total chip utilization?
       A.      Please visit the Floorplanning Control Parameters post
        9.       How is macro placement done in floorplanning? or What are the guidelines for macro placement?
       A.      Please visit the Macro Placement post
       10.   What is a blockage? What are the different types of blockages? How are these blockages used in physical design?
       A.      Please visit the Blockages and Halos post
      11.   What is a Halo? How is it useful?
       A.      Please visit the Blockages and Halos post
      12.   What are fly/flight lines? How are these fly/flight lines useful during macro placement?
       A.      Please visit the Macro Placement post
        13.       I have a netlist consisting of 500k gates and I have to estimate the die area and do the floorplanning. How do I go about it?
        A.      There are 2 methods to estimate die area
        Method 1:
          Each cell has its area defined in a specific library. Go through all your cells and multiply each cell count by its corresponding area from your vendor's library. Then apply a density factor - usually for a standard design you should have around 80% density after placement. From this data you can estimate the required die area (a rough sketch of this arithmetic is shown after Method 2 below).
        Method 2:
         Another way of doing it is to load the design in the implementation tool and change the floorplan (x and y coordinates) in such a way that the starting utilization is around 50% to 60%. Again, it depends on the netlist quality and netlist completion status (e.g., the netlist is 75%, 80%, or 90% complete).
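
        A rough Python sketch of Method 1's arithmetic; the cell names, per-cell areas, and instance counts below are hypothetical, not from any particular library.

        cell_counts = {"NAND2X1": 120_000, "DFFX1": 60_000, "INVX2": 90_000}   # instances per cell type (assumed)
        cell_areas_um2 = {"NAND2X1": 1.6, "DFFX1": 7.2, "INVX2": 1.1}          # area per cell from the library (assumed)

        total_cell_area = sum(cell_counts[c] * cell_areas_um2[c] for c in cell_counts)
        target_density = 0.80                                 # ~80% utilization after placement
        estimated_core_area = total_cell_area / target_density
        print(f"Estimated core area: {estimated_core_area / 1e6:.2f} mm^2")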
       14.       How to do floorplanning for multi-Vdd designs?
         A.      First we have to decide on the power domains, then add power rings for each domain and add stripes to supply power to the standard cells.
       15.       How to calculate the power ring width, the power strap width, and the number of power straps using the core power consumption?
        A.      Please see the Power Network Synthesis post above for more details.
       16.       What is core utilization percentage?
       A.   Core utilization percentage indicates the amount of core area used for cell placement. The number is calculated as the ratio of the total cell area (for hard macros and standard cells or soft macro cells) to the core area. A core utilization of 0.8, for example, means that 80% of the core area is used for cell placement and 20% is available for routing.
      17. When core utilization was increased to 90%, macros got placed outside the core area. Does that mean an increase in core utilization decreases the width and height?
       A. If you go with 90% utilization, there may be congestion and routing problems; it means you may not be able to complete routing within that area. Sometimes you can fit the design at 90% utilization, but timing optimization (such as upsizing cells and adding buffers) will increase the area, and at that point you cannot do anything except go back to the floorplan again. So to be on the safer side we fix the utilization at 70 to 80%.
      18. Why do we remove all placed standard cells and then write out the floorplan in DEF format? What is the use of the DEF file?
       A. The DEF written here captures only the floorplan information. So to get an abstract of the floorplan, we do it this way. By saving and loading this file we can restore the floorplan again without having to redo it.

       19. Can area recovery be done by downsizing cells on paths with positive slack?
        A. Yes, area recovery can be done by downsizing cells on paths with positive slack. Deleting unwanted buffers will also help in area recovery.
      20. We can manipulate IR drop by changing the number of power straps. I increased the power straps, which reduced IR drop, but how many power straps can I keep adding to reduce IR drop? How do I calculate the number of straps required? What problems can arise with an increase in the number of straps?
      A. We can use tools to calculate IR drop (e.g., VoltageStorm, RedHawk) if the drop is high, and based on that we can add straps. But if you do projects repeatedly you will come to know how many straps are enough, and in that case you will not need the tools. There is a hand calculation, but it is approximate rather than accurate. Too many straps will create problems in routing and also affect the area, so the result will be routing congestion. To calculate the number of power straps required for a design, see the Power Network Synthesis post above.
       21.       aprPGConnect is used for logical connection of all VDD/VSS nets of all modules. So how do we connect all VDD/VSS to the global VDD/VSS nets before placement?
       A. The aprPGConnect command is used for the logical connection of all VDD/VSS nets of all modules. For the physical connection you can use the axgCreateStandardcellRails command to create the standard cell rails and, through them, connect to the rings or the straps depending upon the power delivery design.
       22.   A design has memory and analog IP. How do we arrange the power and ground lines in the floorplan? Should the digital and analog power lines be separate? Is it important to design the power-ground plan for an ASIC?
       A. Basically you have to make sure to keep the analog and digital rails isolated from one another. All hard macro and memory blocks need to have a vdd/vss pair ring around them. Memories are always on the sides or corners of your chip. Put a pair of vdd/vss rings around your design; this is usually called the core power ring.
         Create a pair of vertical vdd/vss straps every 100 microns. These are called the power straps, and on either side they tap into the core power ring. Put a pair of vdd/vss rings around every analog block, strap these analog rings (using a pair of vdd/vss), and run them to your package vdd/vss rings.
         Keep in mind that every place a digital vdd/vss crosses the analog vdd/vss straps, you need to cut the digital vdd/vss on either side of the analog crossing to isolate the analog from digital noise. You need to dedicate pins on your chip for analog power and ground. Now we come to the most time-consuming part of this: how thick should you make all these rings/straps? The answer is that this is technology dependent. Look into the packaging documentation; it usually has guidelines for how to calculate the thickness of your power rings. Some even have applications that calculate all this for you and make the cuts for the analog/digital crossings.
      23. In my design, the core PG ring and straps were implemented in M6/M7, and the straps in the vertical orientation are on M6. I use the default method to connect the M6 straps to the standard cell rails on M1, and the via stacks from V12, V23, ... up to V56 block the routing of M2 through M6, which increases congestion to some extent. Is there any good method to avoid congestion when adding straps or connecting straps to the standard cell rails?
       A. In Synopsys ICC, there is a command controlling the standard cell utilization under power straps. Using this you can have some channels passing through the stacked vias, between standard cells. This limits the detours caused by these stacked vias and allows more uniform cell placement, resulting in reduced congestion. In SoC Encounter, the setPrerouteAsObs command can be used to control the standard cell density under power straps. But a 100% via connection from M1 to M6 under a wide strap metal will still block other nets' routing.
    24.    How to control via generation when doing the special route for standard cells, for example to reserve gaps between vias for other nets' routing?
       A.      To remove those stacked vias you can:
    1.       Go back to the floorplan step, where the power straps and power/ground preroute vias are dropped. Normally vias are dropped regularly to reduce power and ground resistance, so the maximum number of vias is dropped over the power/ground nets. Therefore you need to check your floorplan scripts; the via drops should come after the horizontal and vertical power strap generation on M6 and M7.
    2.   If the vias to be removed are in specific regions, you can delete them at any step, but before global routing of course, to allow the global router to be aware of resources/obstructions. In this case, as you will increase the power/ground resistance, you should confirm this method's validity with IR drop analysis.
    3.  If IR drop is an issue, another option is placing standard cell placement percentage blockages (Magma has percentage blockages, which are good at reducing congestion). This is the safest method as you do not need to delete those stacked M1-to-M5 vias anymore. However, as you will need to reduce the placement density, this will cost you some unused area.
    25.   How to do a good floorplan and power straps with blocks?
        A.      A good floorplan is made when:
                 - Minimum space is lost between macros/rows,
                 - Macros are placed close to their related logic,
                 - IR drop/electromigration is acceptable,
                 - Routing congestion is minimal.
    26. How to reduce congestion?
       A.  Congestion can be reduced by adding placement blockages and routing blockages during the floorplan. A placement blockage is used to avoid unnecessary cell placement between macros and in other critical areas. A routing blockage is used to tell the global router not to route anything in a particular area. People often change/modify the blockages according to their needs at each stage of the design.
          Normally, routing blockages should be placed before global routing to force the global router to respect them. Most place-and-route tools run the first global routing at the placement step and then update it incrementally, therefore add blockages before placement. Otherwise, if you want to use them after any global/detailed routing is done, you may need to update global routing first (possibly incrementally).
      27.   How to find the reason for congestion in a particular region? How to reduce congestion?
        A.      First analyze the placed, congested database and find the hot spot that is highly congested.
               Case 1: "Congestion in the channel between macros"
               Reason: Not enough tracks are available in the channel to route the macro pins, or the channel is highly congested because of standard cell placement.
               Solution: Increase the channel width between macros, or make sure that soft blockages or hard blockages are properly placed.
               Case 2: "Congestion at macro corners"
               Reason: The corners of a macro are very prone to congestion because they have connectivity from both directions.
               Solution:
                                 1. Place a halo around each macro (5-7 um).
                                 2. Place a hard blockage on the macro corners (corner protection; this hard placement blockage is done after standard cell rail creation, otherwise it will not allow standard cells inside it).
               Case 3: "Congestion in the center of the chip / congestion in a module anywhere in the chip"
               Reason: Congestion in a standard cell region or module is based on the module's local density (local density is very high, 95%-100%). It also depends on the module's nature (highly connected) and on the die area being too small.
               Solution:
                              1.       Module density should be even across the whole chip (of the order of 65-85%).
                              2.       Use density screens/partial blockages to control module density in specific areas.
                              3.       Use cell padding.
                              4.       If congestion is too severe, the chip area should be increased based on the congestion map.
      28.   What are the reasons for the Routing congestion in a design?
       A.      Routing congestion can be due to:
                1. High standard cell density in small area.
                2. Placement of standard cells near macros.
                3. High pin density on one edge of block.
                4. Placing macros in the middle of floorplan.
                5. Bad Floorplan
                6. Placement of complex cells in a design
                 7. During I/O optimization the tool does buffering, so a lot of cells sit in the core area.
       29. What actually happens in power planning? What is the main aim of power planning?
       A. The main aim of power planning is to ensure that all the cells in the design get sufficient power for proper functioning of the design. During power planning, the power rings and power straps are created to distribute power equally across the design.
          Power straps provide a regulated power supply throughout the block or chip. The number of straps depends on the voltage and the current of your design. You must design a power grid that provides equal power from all sides of the block. You can also use early rail analysis to determine the IR drop in your block and lay down sufficient power straps.
       30. How are power straps useful in power planning?
       A. If the chip size is large, the core power ring alone is not able to supply power to the standard cells because of the long distance, particularly for the cells in the center of the chip (or it will give a high IR drop to the farthest cells), so you need power straps. The number of straps depends on the area of your chip.
         31. What is the minimum space between two macros? How can we find the minimum space between macros?
          A. The spacing between two macros = (number of pins of the macros × pitch × 2) / number of available routing layers.
                  For example, the design has 2 macros with 50 pins each, the pitch is 0.50, and 8 metal layers are available.
                    Then the space between the macros = ((50 + 50) × 0.5 × 2) / 8 = 12.5
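
          The same rule of thumb as a tiny Python helper, reproducing the example's numbers (50 pins per macro, pitch 0.5, 8 routing layers):

          def macro_channel_spacing(pins_a, pins_b, pitch, num_routing_layers):
              # spacing = (total pins crossing the channel * pitch * 2) / available routing layers
              return ((pins_a + pins_b) * pitch * 2) / num_routing_layers

          print(macro_channel_spacing(50, 50, 0.5, 8))   # 12.5, in the same units as the pitch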
         32.   What steps need to be taken care of while doing floorplanning?
                ·         Die Size Estimation
                ·         Pin/pad location
                ·         Hard macro placement
                ·         Placement and routing blockage
                ·         Location and area of the soft macros and its pin locations
                ·         Number of power pads and its location.
          Note: For block-level floorplans, the die size and pin placement come from the top level.

      → Fly-line analysis is required before placing the macros.
      → While fixing the location of a pin or pad, always consider the surrounding environment with which the block or chip is interacting. This avoids routing congestion and also benefits effective circuit timing.
      → Provide a sufficient number of power/ground pads on each side of the chip for effective power distribution.
      → In deciding the number of power/ground pads, the power report and the IR drop in the design should also be considered.
      → The orientation of the macros forms an important part of floorplanning.
      → Create a standard cell placement blockage (hard blockage) at the corners of the macros, because these areas are more sensitive to routing congestion.
      → Use a proper aspect ratio (width/height) for the chip.

          For placing block-level pins:
      → First determine the correct layer for the pins.
      → Spread out the pins to reduce congestion.
      → Avoid placing pins in corners where routing access is limited.
      → Use multiple pin layers for less congestion.
      → Never place cells within the perimeter of hard macros.
      → To keep from blocking access to signal pins, avoid placing cells under power straps unless the straps are on metal layers higher than metal2.
      → Use density constraints or placement-blockage arrays to reduce congestion.
      → Avoid creating any blockage that increases congestion.

    Wednesday, 4 September 2020

    Power Management Techniques - Clock Gating


    Clock Gating:
    As discussed in an earlier blog post, there are different techniques for low power design:

    1. Clock Gating
    2. Multi Vt
    3. Multi Vdd

    Here I am going to discuss Clock Gating.

    Clock Gating:
    As discussed earlier, the dynamic power is given by
                                Pdynamic = Af × Cload × Vdd² × f

                                                Where Af = switching activity factor
                                                      Cload = load capacitance
                                                          Vdd = supply voltage
                                                            f = clock frequency

    The clock is the most frequently switching element in the design; it has a high activity factor. Consequently, the clock network ends up consuming a large fraction of the dynamic power. Clock gating reduces the dynamic power by disconnecting the clock from an unused circuit block, limiting the switching activity of the clock. In typical designs a large share, often around 50%, of the dynamic power is due to clock switching. The clock gating technique reduces the dynamic power consumed by limiting the switching activity factor.
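
    A small Python illustration of the saving, using the C·V²·f form above; the capacitance, frequency, and enable duty cycle are assumptions, not values from the post.

    # When the enable is low, the gated clock stops toggling at the flop,
    # so the effective switching activity on its clock pin drops.
    C_clk_pin = 2e-15     # clock-pin load of one flop, F (assumed)
    Vdd = 1.0             # supply voltage, V (assumed)
    f = 500e6             # clock frequency, Hz (assumed)
    enable_duty = 0.25    # fraction of cycles in which the flop actually needs the clock (assumed)

    p_ungated = C_clk_pin * Vdd ** 2 * f                # clock toggles every cycle
    p_gated = C_clk_pin * Vdd ** 2 * f * enable_duty    # clock toggles only when enabled
    print(f"clock-pin dynamic power: {p_ungated * 1e6:.2f} uW -> {p_gated * 1e6:.2f} uW")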

    How Clock Gating works ?






    As shown in Figure 1, two circuits are implemented, one without clock gating and one with clock gating. In Figure 1(a), when the enable is high, the input D is propagated as the input to the next synchronous element (flip-flop), and the new data D is propagated to the output Q on the clock edge. When the enable is low, the recycled data is propagated. In both cases (enable high or low), the clock continues to toggle (switch) at the flip-flop, which dissipates dynamic power.

    As shown in Figure 1(b), the clock to the flip-flop is applied through an AND gate. This clock is called a gated clock, and this technique is the clock gating technique. When the enable is low, the clock at the flip-flop does not toggle (switch) because of the AND gate. In this way, the clock gating technique reduces the switching activity of the clock in order to save power.



    Tuesday, 27 August 2020

    IR Drop Analysis


    What is IR Drop Analysis? How it effects the timing?


    The power supply in the chip is distributed uniformly through metal layers (Vdd and Vss) across the design. These metal layers have a finite amount of resistance. When voltage is applied to these metal wires, current starts flowing through the metal layers and some voltage is dropped due to the resistance of the metal wires and the current. This drop is called IR drop. For example, if a design needs to operate at 2 volts and has a tolerance of 0.4 volts on either side, we need to ensure that the voltage across its power pin (Vdd) and ground pin (Vss) does not fall below 1.6 volts. The acceptable IR drop in this context is 0.4 volts. That means the design can tolerate up to a 0.4 volt drop without affecting the timing and functionality of the design.

    How it effects the timing?
    IR drop is a signal integrity (SI) effect caused by wire resistance and the current drawn from the power (Vdd) and ground (Vss) grids. According to Ohm's law, V = IR. If the wire resistance is too high or the current passing through the metal layers is larger than predicted, an unacceptable voltage drop may occur. Due to this unacceptable voltage drop, the power supply voltage decreases, which means the required power is not reaching the cells across the design. This results in increased noise susceptibility and poor performance.

    The design may have different types of gates with different voltage levels. As the voltage at the gates decreases due to an unacceptable drop in the supply voltage, the gate delays increase non-linearly. This may lead to setup time and hold time violations depending on which paths these gates reside in within the design. As technology nodes shrink, the geometries of the metal layers decrease and the resistance of these wires increases, which leads to a larger drop in the power supply voltage. During clock tree synthesis, buffers and inverters are added along the clock path to balance the skew. Voltage drop on the buffers and inverters of the clock path delays the arrival of the clock signal, which can result in hold violations.


    What are the tools used for IR Drop Analysis? In which stage IR Drop Analysis performed ?

    Various tools are available for IR drop analysis. VoltageStorm from Cadence and RedHawk from Apache are commonly used to show IR drop on a chip. Here we are going to discuss IR drop analysis using RedHawk. IR drop analysis using RedHawk is possible at different stages of the design flow. When changes are inexpensive and do not affect the project's schedule, it is better to use RedHawk for IR drop analysis from the start of the design cycle. It can identify and fix power grid problems in the design. This also reduces the changes required at the sign-off stage, where the final static and dynamic voltage (IR) drop analyses are performed. So RedHawk can be used anywhere in the design flow, starting from the floorplanning stage through the initial and final cell placement stages.
                                                                                                                        

    Wednesday, 21 August 2020

    Basic Terminology in Physical Design


    Design: A circuit that performs one or more logical functions.

    Cell: An instance of a design or library primitive within a design.

    Port: The input or output of a design.

    Pin: The input or output of a cell.

    Net: A wire that connects ports to ports or ports to pins.

    Clock: A timing reference object to describe a waveform for timing analysis.

    Logical Libraries: Logical libraries are libraries which provide


    • Timing and functionality information for all standard cells (like AND, OR, Flipflops)
    • Timing information for Hard Macros (IP, ROM, RAM)
    • Define drive/load design rules ( Max Transition, Max Fanout, Max/Min Capacitance)   
    Physical Libraries: Physical libraries are libraries which contain
    • Physical Information of Standard cells and Macro cells necessary for placement
    • Define placement unit tile 
    Standard Cell: A standard cell is a group of transistors and interconnect structures that provides a boolean logic function (e.g., AND, OR, XOR, XNOR, inverters) or a storage function (flip-flop or latch).

    Macro: Macros are intellectual properties that can be directly used in the design; they need not be designed from scratch. Examples are memories, processor cores, PLLs, etc. A macro can be a hard or a soft macro.


    Target Library: A technology library that Design Compiler maps to during optimization. Along with the link_library and search_path variables, you need to specify the logical library that will be used for mapping/optimization.

    Link Library : The technology library that contains the definition of the cells used in the mapped
    design. In principle should be the same as target_library unless a technology translation is being performed.

    Search Path: If the library variables specify only file names, search_path is used to locate the libraries. By default it points to the current working directory. You specify the UNIX path (relative or absolute) for all files; search_path tells the tool where to look for them.

    Constraints: Constraints are the instructions that the designer can apply during various steps in the VLSI chip implementation, such as logic synthesis, Clock Tree synthesis (CTS), Place & Route, and Static Timing Analysis (STA).
    Constraints are 2 types
    1. Design Rule Constraints
    2. Optimization Constraints  
    Design Rule Constraints:
    • These are implicit constraints.
    • The technology library (.lib) defines them.
    • These constraints are requirements for a design to function correctly, and they apply to any design using the library.
    • You can make these constraints more restrictive than optimization constraints.
    Different types of Design Rule Constraints are
    1. Maximum Transition time
    2. Maximum Fanout
    3. Maximum/Minimum Capacitance
    4. Cell Degradation
    Optimization Constraints:
    • These are explicit constraints; 
    • Designer define them. 
    • Optimization constraints apply to the design on which you are working for the duration of the dc_shell session and represent the design’s goals. 
    • They must be realistic.
    • Optimization Constraints describe the design goals (Area, Timing etc)
    Maximum Transition Time:
    The maximum transition time for a net is the longest time required for its driving pin to change logic values. A violation is typically fixed by buffering the output of the driving gate.

    Maximum Fanout:
    The maximum fanout of an output measures its load-driving capability. Most technology libraries (.lib) place fanout restrictions on driving pins, creating an implicit fanout constraint for every driving pin in designs using that library. Design Compiler models fanout restrictions by associating a fanout_load attribute with each input pin and a max_fanout attribute with each output (driving) pin on a cell.

    Maximum Capacitance:
    The maximum total capacitance that an output pin can drive. The maximum capacitance design rule constraint allows you to control the capacitance of nets  directly. (The design rule constraints max_fanout and max_transition limit the actual capacitance of nets indirectly.)

    Minimum Capacitance:
    The min_capacitance design rule specifies the minimum load a cell can drive. It specifies the lower bound of the range of loads with which a cell has been characterized to operate.

    Optimization Constraints:

    Timing Constraints:
    Timing Constraints are required to communicate the design’s timing intentions to IC Compiler. They should be the same ones used for synthesis with Design Compiler (preferably SDC).

    Synopsys Design Constraints (SDC):
    A format used to specify the design intent, including the timing, power, and area constraints of a design. SDC is Tcl-based. SDC contains 4 types of information:
    1. SDC Version
    2. SDC units
    3. Design Constraints
    4. comments
    SDC version:
    It sets the version. The default version is 1.9.

    SDC units:
    It specifies the units for capacitance, resistance, time, voltage, current and power.

    Design Constraints:
    The following design constraints are specified in SDC:
                     1. system clock definition
                     2. clock delays
                     3. Multi Cycle Paths
                     4. Input & output delays
                     5. Minimum & Maximum path delays
                     6. Input transition and output load capacitance
                     7. False paths

    Clock Tree Synthesis (CTS):
    CTS is the process of inserting buffers/inverters along the clock paths of the design in order to balance the skew and to minimize insertion delay.
    Skew: Skew is the difference in the arrival time of the clock at the clock pins of two consecutive sequential elements.

    Positive skew - If the capture clock arrives later than the launch clock, it is called positive skew.


    Negative skew - If the capture clock arrives earlier than the launch clock, it is called negative skew.
    Local skew - The difference in the arrival of the clock at the clock pins of two consecutive (launch and capture) flops along a path.
    Global skew- It is Defined as the difference between max insertion delay and the min insertion delay of any flops.
    Boundary skew-It is defined as the difference between max insertion delay and the min insertion delay of boundary flops.
    Useful skew-If clock is skewed intentionally to resolve violations, it is called useful skew.
    Latency - Latency is the total delay of the clock, comprising the clock source delay and the clock network delay.
    Source latency- The delay from the clock origin point to the clock definition point in the design.
    Network latency- The delay from the clock definition point to the clock pin of the register.
    Uncertainty - Clock uncertainty is the time difference between the arrivals of clock signals at registers in one clock domain or between domains.
    Jitter- Jitter is the short-term variations of a signal with respect to its ideal position in time. It is the variation of the clock period from edge to edge.

    Setting Operating conditions:

    1. Process Variation:
    Variations in the process parameters, such as impurity concentration densities, oxide thicknesses, and diffusion depths, are caused by non-uniform conditions during the deposition and/or diffusion of the impurities. These introduce variations in the sheet resistances and in transistor parameters such as the threshold voltage. Variations in the dimensions of the devices, mainly resulting from the limited resolution of the photolithographic process, cause (W/L) variations in MOS transistors and mismatches in the emitter areas of bipolar devices.
    2. Supply Voltage Variation
    3. Ambient temperature Variations
    4. It is important to analyze the design for best-case and worst-case scenarios: the best case to find hold-time violations and the worst case to find setup violations.

    Timing Analysis:
    Timing analysis is a method of validating the timing performance of a design  by checking the timing paths for timing violations. 

    Net Delay: Interconnect relationships between a driver pin and its fanout
    In the absence of physical design information, the timing analyzer in Synopsys tools uses statistically generated wire load models to estimate wire lengths in a design. Two important concepts behind wire load models are:
    1. Wire load models provide a fanout-to-length relationship, so by knowing the fanout, one can estimate the length.
    2. Capacitance and resistance per unit length are given, and the estimated length is then translated into estimated R and C values to give an estimated delay.
    Wire load models are area dependent: the larger the area, the greater the R and C values per unit length.
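
    A minimal Python sketch of how such a wire load estimate works; the fanout-to-length table and the per-unit R and C values are hypothetical, not taken from a real library.

    # Wire-load-model style estimate: length from fanout, then R and C from per-unit values.
    fanout_to_length_um = {1: 10.0, 2: 18.0, 3: 25.0, 4: 32.0}   # assumed WLM table
    r_per_um = 0.2e-3    # kohm per um (assumed)
    c_per_um = 0.3e-3    # pF per um (assumed)

    def estimate_net_parasitics(fanout):
        # extrapolate linearly beyond the last table entry
        length = fanout_to_length_um.get(fanout, 32.0 + 7.0 * (fanout - 4))
        return r_per_um * length, c_per_um * length   # (R in kohm, C in pF)

    r_net, c_net = estimate_net_parasitics(3)
    print(r_net, c_net, r_net * c_net)   # kohm * pF = ns, a crude single-segment delay estimate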

    Cell Delay: 
    • Timing relationships between an input pin and an output pin, or between an output pin and another output pin of the same gate. 
    • Cell delay is calculated using non-linear delay models, which are stored in the ‘LM’ view of each cell. 
    • NLDM is highly accurate as it is derived from SPICE characterizations. 
    • The delay is a function of the input transition time of the cell (TInput) [also called slew], the driving strength of the cell (RCell), the wire capacitance (CNet) and the pin capacitance of the receivers (CPin). 
    • A slow input transition time slows the rate at which the cell's transistors can change state (from "on" to "off"), and a large output load (Cnet + Cpin) has the same effect, thereby increasing the "delay" of the logic gate. 

    There is another NLDM table in the library to calculate output transition. Output transition of a cell becomes the input transition of the next cell down the chain.

    CMOS Delay Model:  
       Transition Time = Drive R * Load C
       Cell Delay = f(Input Transition Time, Cnet + Cpin)
       Net Delay = f(Rnet, Cnet + Cpin)
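
    A toy Python version of these relations; the coefficients and the linear stand-in for f(...) are assumptions for illustration only, since real cell delays come from 2-D NLDM table lookups.

    def transition_time(r_drive_kohm, c_load_pf):
        return r_drive_kohm * c_load_pf                  # ns: Transition Time = Drive R * Load C

    def cell_delay(input_transition_ns, c_net_pf, c_pin_pf, r_cell_kohm=1.0, k_slew=0.5):
        # crude stand-in for the NLDM lookup: delay grows with input slew and output load
        return k_slew * input_transition_ns + r_cell_kohm * (c_net_pf + c_pin_pf)

    def net_delay(r_net_kohm, c_net_pf, c_pin_pf):
        return r_net_kohm * (c_net_pf + c_pin_pf)        # single-segment RC estimate

    t_in = transition_time(1.2, 0.05)                    # driver R = 1.2 kohm, load = 0.05 pF
    print(cell_delay(t_in, 0.03, 0.02), net_delay(0.4, 0.03, 0.02))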
