Optimum Reconfiguration of Routing Interconnection Network in APSoC Fabrics

This paper presents an automated algorithm for optimum configuration of the routing interconnection network in Xilinx Zynq-7000 All Programmable System-on-Chip (APSoC) fabrics. A method to configure circuits with optimum routing resources is presented, along with their performance parameters with and without the proposed algorithm. The proposed algorithm enables full control over routing resources so that different interconnection types can be used to create routing-based circuits under test. The algorithm proposes routing techniques through the 2-D array of switch matrices inside the interconnection network and automatically identifies the programmable interconnection points associated with a node. An experimental setup is proposed to measure performance parameters such as slack time and power with and without the algorithm applied to the APSoC routing resources. The proposed setup requires no external equipment or instruments for performance measurement.

I Introduction

Field Programmable Gate Arrays (FPGAs) have attracted a lot of interest in various domains due to their high circuit density and growing performance capability. These semiconductor devices are structured as an array of configurable logic blocks (CLBs) connected via a programmable routing interconnection network [1, 2]. Advances in semiconductor technology have enabled the integration of programmable logic with complex systems on a single silicon die. The resulting Xilinx All Programmable System-on-Chip (APSoC) devices have been used extensively in different applications in recent years [1, 2, 3, 4, 5]. An APSoC combines a programmable logic (PL) part and a processing system (PS) part. The PL part is the FPGA itself and is used to implement digital circuits and systems. The PS part contains a microcontroller. Hence, the functionality of any system implemented in an APSoC can be partitioned between the PL and the PS, and the PS can also take control of the PL. Among the APSoC devices, the Xilinx Zynq-7000, fabricated in the Taiwan Semiconductor Manufacturing Company’s 28 nm technology node, has been widely used for different applications in recent years [6, 7, 8]. Routing resources in Xilinx APSoC fabrics are controlled by SRAM cells called configuration bits [9].

This paper presents an automated algorithm for optimum configuration of the routing resources in the PL of a Zynq-7000 APSoC device. We propose a method to configure circuits with optimum routing resources and report performance validation results with and without the proposed algorithm. The proposed algorithm enables full control over routing resources so that different interconnection types can be used to create optimum routing paths for the implemented circuits. The automated algorithm is implemented in the Xilinx Vivado scripting tool. The algorithm also proposes a technique for traversing switch matrices (SMs) inside the interconnection network and automatically identifies the programmable interconnection points (PIPs) associated with an input or output pin of an SM. Note that the default routing optimizer of the Xilinx Vivado tool does not search for optimum routing paths for a design once the timing constraints are met. Even when timing is violated, the Vivado optimizer only reports the violating paths and, by default, proposes no solution other than increasing the path delays. We also propose an experimental setup to measure performance parameters such as slack time and power with and without the applied algorithm. No external equipment or instruments are required for these measurements.

This paper is structured as follows. Background information is presented in Section II. An overview of the routing resources in the Zynq-7000 APSoC is given in Section III. The proposed algorithm for the routing resources in the Zynq-7000 APSoC is presented in Section IV. The experimental setup and results are discussed in Section V. Conclusions and future work are presented in Section VI.

II Background

This paper focuses on developing an automated algorithm for optimum configuration of the interconnection network in the PL resources of a Zynq-7000 device. This state-of-the-art APSoC device offers specialized modules merged with the PS on a single die. Note that almost 98% of all memory elements in the PL part are configuration bits, of which more than 90% control the routing resources [1, 5].

The routing interconnection network in the PL section is configured through a 2-D array of SMs. Each SM input pin is associated with a set of PIPs; a PIP is a CMOS transistor switch that can be programmed on or off to add or remove interconnects throughout the network.

Recent studies confirm the broad involvement of the routing interconnection network in a variety of applications, such as clock tuning [10], Time-to-Digital Converters (TDCs) [11, 12], Physical Unclonable Functions (PUFs) [13], and True Random Number Generators (TRNGs) [14], and have shown that full control of the routing path between two given points of a circuit is an essential requirement. Note that the placement of circuit elements can be fully controlled by the designer, while routing resources are less controllable.

Fig. 1: Topology of CLB and INT tiles in Xilinx APSoCs. In this example, the logical net NetA, which connects a source flip-flop to a destination LUT, is made of seven nodes, each a SINGLE (1L) interconnect.

To make the proposed routing algorithm easier to understand and its performance easier to validate, routing-based ring oscillators (ROs) are implemented as the circuits under test (CUTs).

III Overview of Routing Resources in Zynq-7000 APSoC

Generic APSoC fabrics consist of some fundamental logic cores linked via an interconnection network. Three types of resources are commonly used in APSoC architectures: logic resources, routing interconnects, and switch matrices [15]. In this paper, we focus on the routing resources and switch matrices that are the roots of the interconnection network.

III-A Logic and Interconnect Tile Resources

Logic resources in APSoCs are linked via an interconnection network composed of different interconnection types. Some interconnects are dedicated to specific logic or functions, and the rest are global. Interconnects in the network run both horizontally and vertically, traversing the gate array from west to east and from north to south, respectively. SMs are used to link the various interconnects and transmit data inside the fabric.

Fig. 1 shows the topology of the CLB and interconnect (INT) tiles in Xilinx 7-Series APSoCs [3, 5]. In this scheme, the planar SM has an injective mapping, where each input node on the right side is connected to only one node on the left side. The INT tile comprises one Wilton SM (WSM), where each input node can be mapped to several output nodes and vice versa [16]. A net, such as NetA in Fig. 1, comprises a list of nodes and represents a logic net. A WSM input node sends the data signal to several outgoing nodes (called downhill nodes), and one of the PIPs connected to an output node is configured to receive the data signal from one of its multiple incoming nodes (called uphill nodes). A PIP specifies a configurable connection between an SM input and an SM output and consists of a programmable CMOS transistor, as depicted in Fig. 1 [17].

III-B Interconnect Types Available in 7-Series FPGAs

Xilinx APSoCs generally provide fifteen types of interconnects for data signal transmission throughout the PL side of the fabric. Note that these interconnects do not apply to the PS side, because the PS is made of multi-core microcontroller cores with their own dedicated interconnection network, which is outside the scope of this paper. Interconnects linking the Wilton SMs are categorized as follows [3, 5]:

  • SINGLE (1L): unidirectional interconnects that span 1 CLB;

  • DOUBLE (2L): unidirectional interconnects that span 1 or 2 CLBs;

  • HQUAD (4L): unidirectional interconnects that span 4 CLBs;

  • VQUAD: unidirectional interconnects that span 6 CLBs;

  • BOUNCEACROSS: unidirectional interconnects that span 1 CLB only vertically;

  • VLONG: bidirectional long interconnects that span 20 CLBs vertically;

  • VLONG12: bidirectional long interconnects that span 12 CLBs vertically;

  • HLONG: bidirectional long interconnects that span 20 CLBs horizontally;

  • GLOBAL: homogeneous and unidirectional interconnects that span 20 CLBs vertically and are dedicated to route specific signals (clock, reset, enable, etc.);

  • BENTQUAD: unidirectional interconnects that bend and span 6 CLBs;

  • PINFEED: short interconnects that link Wilton SM to planar SM (coming into planar SM);

  • OUTBOUND: short interconnects that link planar SM to Wilton SM (outgoing from planar SM); some of them also span 1 CLB;

  • BOUNCEIN: short internal Wilton SM interconnects at some input nodes used to bounce signal;

  • PINBOUNCE: short internal Wilton SM interconnects at some output nodes used to bounce signal;

  • HVCCGNDOUT: GND and VCC interconnects to link Wilton SM nodes to logic ‘0’ or logic ‘1’, respectively.

Interconnect Type    Number of Interconnects Connected to Each Wilton SM
DOUBLE               70
SINGLE               68
BOUNCEACROSS         17
VLONG                3
HLONG                3
PINFEED              42
OUTBOUND             24*
BOUNCEIN             9
PINBOUNCE            16
GLOBAL               12
HQUAD                17
BENTQUAD             34
VQUAD                18
VLONG12              2
HVCCGNDOUT           2
TABLE I: TYPES AND NUMBER OF INTERCONNECTS LINKED TO EACH WILTON SM IN XILINX APSOCS

Table I lists the number of interconnects of each type connected to a Wilton SM [3, 5]. Four of the 24 OUTBOUND interconnects span only one CLB, while the rest link planar SMs to the Wilton SMs (* in Table I). Interconnects of the same type may have different topologies.

A logical net, for example NetA shown in Fig. 1, comprises a list of nodes {Source → CLBLM_M_A → LOGIC_OUTS2 → SW1BEG1 → SW1BEG1 → NN1BEG1 → NN1BEG1 → EE1BEG1 → EE1BEG1 → IMUX7 → IMUX7 → SW1BEG1 → SW1BEG1 → NW1BEG1 → SW1BEG1 → LOGIC_OUTS2 → CLBLM_M_D6 → Destination} connecting a flip-flop source to a LUT destination between two cross CLBs. NetA has seventeen nodes made of SINGLE (1L) interconnects, each of which spans only 1 SM.

Fig. 2: Proposed algorithm for WSMs.
Fig. 3: Proposed algorithm for WSMs.

III-C PIP Notation and Interconnect Coordinates

In Zynq-7000 fabrics, a PIP is usually named after the interconnect it is connected to and the interconnect coordinates. The suffix BEG or END is added to the PIP’s name depending on whether the PIP sits at the beginning or the end of the interconnect. For instance, a PIP in the tile with coordinates X=5 and Y=15 that connects the beginning (BEG) of a SINGLE (1L) interconnect coming from the northwest (NW) tile to the beginning (BEG) of a DOUBLE (2L) interconnect going to the southeast (SE) is identified as:

pip INT_R_X5Y15 NW1BEG0 -> SE2BEG1

The number before BEG indicates the length of the interconnect (2 for the DOUBLE interconnect and 1 for the SINGLE interconnect in this example). The last number is an index assigned automatically by the Vivado tool to distinguish interconnects of the same category.
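The naming convention above can be illustrated with a small Python sketch that splits a PIP string, written exactly in the notation used in this section, into its tile, coordinates, and the two wire names, and then decodes each wire name into direction, length, BEG/END suffix, and index. The function and class names (parse_pip, parse_wire, Pip, Wire) are hypothetical helpers introduced only for this illustration; the regular expressions are an assumption that covers two-letter compass names such as NW1BEG0 and deliberately ignores other names such as NL1BEG_N3.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern for the notation "pip <TILE>_X<x>Y<y> <SRC> -> <DST>".
PIP_RE = re.compile(
    r"^pip\s+(?P<tile>\w+)_X(?P<x>\d+)Y(?P<y>\d+)\s+"
    r"(?P<src>\w+)\s*->\s*(?P<dst>\w+)$"
)
# Assumed pattern for wires named <two compass letters><length><BEG|END><index>.
WIRE_RE = re.compile(r"^(?P<dir>[NSEW]{2})(?P<len>\d+)(?P<end>BEG|END)(?P<idx>\d+)$")

# Only the two lengths named in the text are mapped here.
LENGTH_NAMES = {1: "SINGLE", 2: "DOUBLE"}


@dataclass(frozen=True)
class Wire:
    direction: str  # e.g. "NW"
    length: int     # number before BEG/END
    end: str        # "BEG" or "END"
    index: int      # index assigned by the tool


@dataclass(frozen=True)
class Pip:
    tile: str
    x: int
    y: int
    src: str
    dst: str


def parse_wire(name: str) -> Optional[Wire]:
    """Decode a wire name, or return None if it does not fit the assumed pattern."""
    m = WIRE_RE.match(name)
    if m is None:
        return None
    return Wire(m["dir"], int(m["len"]), m["end"], int(m["idx"]))


def parse_pip(text: str) -> Pip:
    """Split a PIP string into tile type, coordinates, and wire names."""
    m = PIP_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a PIP in the expected notation: {text!r}")
    return Pip(m["tile"], int(m["x"]), int(m["y"]), m["src"], m["dst"])


if __name__ == "__main__":
    pip = parse_pip("pip INT_R_X5Y15 NW1BEG0 -> SE2BEG1")
    for name in (pip.src, pip.dst):
        wire = parse_wire(name)
        kind = LENGTH_NAMES.get(wire.length, "other")
        print(name, kind, wire.direction, wire.end, wire.index)
```

Returning None for names outside the assumed pattern, rather than raising, keeps the helper usable on mixed lists of node names such as those in a routed net.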

IV Proposed Routing Algorithm for Switch Matrices

It is possible to determine all the PIPs associated with each node of NetA in Fig. 1 that are connected to an input or output pin of a WSM. A trivial method is to select an input/output pin of a WSM and manually check the PIP Junction Properties in Vivado. This property identifies the PIPs of only one pin of a WSM at a time. Hence, it is not possible to identify the PIPs associated with several pins of a WSM, or the PIPs in different WSM levels, simultaneously. The manual approach is very time-consuming and error-prone, since mistakes in net selection are likely in designs that include several nets. Automatic determination of all PIPs associated with each node facilitates the analysis of the routing interconnection network in APSoC devices.

PIPs in wsm_level=1:                  PIPs in wsm_level=2:
node LOGIC_OUTS2 connected to         node NN1BEG3 connected to
WW4BEG0, NW2BEG0                      WW4BEG0, LV_L0
WW2BEG0, NR1BEG0                      WR1BEG1, WW2BEG0
WR1BEG1, NN6BEG0                      NL1BEG_N3, WL1BEG2
WN1BEG_N3, NN1BEG3                    NW6BEG0, SW6BEG3
SW6BEG0, NE6BEG0                      NW2BEG0, SW2BEG3
SW2BEG0, NE2BEG0                      NN6BEG0, SS6BEG3
SS6BEG0, IMUX_L8                      NN2BEG0, SS2BEG3
SS2BEG0, IMUX_L40                     NE6BEG0, SR1BEG1
SR1BEG1, IMUX_L32                     LV_L18, ER1BEG_S0
SL1BEG0, IMUX_L24
SE6BEG0, IMUX_L16
SE2BEG0, IMUX_L0
NL1BEG_N3, ER1BEG1
BYP_ALT0, EL1BEG_N3
FAN_ALT0, EE4BEG0
NW6BEG0, EE2BEG0
TABLE II: EXAMPLE OF EXTRACTED PIPS CONNECTED TO LOGIC_OUTS2 AND NN1BEG3 INTERCONNECTS IN WSM1 AND WSM4 PRODUCED BY THE PROPOSED ALGORITHM

An algorithm is proposed for WSMs that automatically extracts all the PIPs in all WSM levels associated with each pin for any logic net as described in Fig. 2. In this pseudo-code, the source and destination logic cells are generated and placed with the FPGA floorplanner in STEP 1. The notion of logic cell here refers to a slice logic element, such as flip-flop or LUT. Then, a net is routed between the source and destination (STEP 2).
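The idea of collecting PIPs level by level along a routed net can be sketched in Python as follows. This is not the pseudo-code of Fig. 2; it is a minimal illustration under stated assumptions: the downhill connectivity is given as a hand-written dictionary (DOWNHILL, hypothetical and truncated to a few entries taken from Table II), whereas in practice it would be queried from the device model in Vivado. The helper names extract_pips_along_net and route_is_connected are also hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical, truncated downhill map: node -> nodes reachable through one PIP.
# The entries are a small subset of the pairs listed in Table II.
DOWNHILL: Dict[str, List[str]] = {
    "LOGIC_OUTS2": ["WW4BEG0", "NW2BEG0", "WW2BEG0", "NR1BEG0", "WN1BEG_N3", "NN1BEG3"],
    "NN1BEG3": ["WW4BEG0", "LV_L0", "WR1BEG1", "WW2BEG0"],
}


def extract_pips_along_net(
    net_nodes: List[str], downhill: Dict[str, List[str]]
) -> Dict[int, List[Tuple[str, str]]]:
    """For each node on the net (one WSM level per node), list every
    (node, downhill node) pair, i.e. every PIP leaving that node."""
    levels: Dict[int, List[Tuple[str, str]]] = {}
    for level, node in enumerate(net_nodes, start=1):
        levels[level] = [(node, nxt) for nxt in downhill.get(node, [])]
    return levels


def route_is_connected(net_nodes: List[str], downhill: Dict[str, List[str]]) -> bool:
    """Check that each node on the net is a downhill node of its predecessor."""
    return all(
        nxt in downhill.get(node, [])
        for node, nxt in zip(net_nodes, net_nodes[1:])
    )


if __name__ == "__main__":
    net = ["LOGIC_OUTS2", "NN1BEG3"]
    for level, pips in extract_pips_along_net(net, DOWNHILL).items():
        print(f"wsm_level={level}")
        for src, dst in pips:
            print(f"  {src} -> {dst}")
    print("connected:", route_is_connected(net, DOWNHILL))
```

Keeping the connectivity in a plain dictionary separates the traversal logic from the tool-specific query, so the same loop applies to any net once its node list and downhill map are available.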

The proposed pseudo-code relies on the Xilinx Design Constraints (XDC) file, which provides full control of the placement and routing of a net in Vivado. Instead of relying on the automated place and route tasks performed by the tool, the designer can specify a group of nodes that the net should pass through. This can be achieved with the Tool Command Language (TCL) scripting available in Vivado. The FIXED_ROUTE property allows the generation of a list of nodes to configure a net. This property should end with a specified name for the net that is going to be routed. For example, NetA in Fig. 1 is configured with the TCL script shown in Fig. 3. The indices associated with each node are pre-defined in Vivado to distinguish interconnects in the same category and cannot be changed. Table II shows a partial result for the extracted PIPs connected to the LOGIC_OUTS2 and NN1BEG3 interconnects in WSM1 and WSM4, as produced by the proposed algorithm.
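As a hedged illustration of the FIXED_ROUTE mechanism described above, the following Python helper assembles one constraint line from an ordered node list and a net name, in the usual form of a set_property call that ends with the net selected through get_nets. It does not reproduce the script of Fig. 3; the helper name fixed_route_constraint is hypothetical, and the node names are taken from the NetA listing in Section III-B, so they may differ from the names Vivado reports for a given device.

```python
from typing import Sequence


def fixed_route_constraint(net_name: str, nodes: Sequence[str]) -> str:
    """Build a single XDC/TCL line that pins a net to an ordered list of nodes.

    The node list goes inside braces and the line ends with the net name,
    following the pattern: set_property FIXED_ROUTE { ... } [get_nets <net>].
    """
    if not nodes:
        raise ValueError("a fixed route needs at least one node")
    route = " ".join(nodes)
    return f"set_property FIXED_ROUTE {{ {route} }} [get_nets {net_name}]"


if __name__ == "__main__":
    # First few nodes of NetA as listed in the text (illustrative only).
    neta_nodes = ["CLBLM_M_A", "LOGIC_OUTS2", "SW1BEG1", "NN1BEG1", "EE1BEG1", "IMUX7"]
    print(fixed_route_constraint("NetA", neta_nodes))
```

Generating the line from a list makes it straightforward to emit one constraint per RO when many nets have to be routed with the same pattern.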

Fig. 4: Routing diagram for three different sets of ROs initially implemented on the Xilinx APSoC “without” the proposed algorithm: (a) diagram of 1L, 2L, 4L and LONG interconnects, (b) diagram of BENTQUAD interconnects, and (c) diagram of BOUNCEACROSS and VQUAD interconnects [3, 5].

Fig. 4 shows the routing diagrams of three different sets of ROs initially implemented on the Xilinx APSoC [3, 5] “without” using the proposed routing optimization algorithm. Fig. 4(a) to Fig. 4(c) show the routing diagrams of the horizontal ROs, the BENTQUAD RO, and the vertical ROs, respectively. In this figure, each Wilton SM is shown as a circle (SM) and each interconnect as an arrow (I). The proposed RO architecture makes use of long routing paths and only two logic components.

V Experiments and Results

In the experiments, different ROs were implemented on the PL side of the Zynq-7000 APSoC, and their routing net delays and frequencies were measured at run time.

Fig. 5: Experimental setup.
RO Type         Frequency (kHz)    Net Delay (ps)    # of Interconnects
1L              48912              398               51
1L              48909              402               52
2L              22541              696               56
2L              22541              696               56
4L              6399               183               60
4L              6398               182               60
LONG            16119              521               27
LONG            16121              516               26
BENTQUAD        23551              611               22
BENTQUAD        23548              615               23
BOUNCEACROSS    29852              489               25
BOUNCEACROSS    29851              490               25
VQUAD           29790              516               21
VQUAD           29789              519               22
TABLE III: PERFORMANCE RESULTS OF THE IMPLEMENTATION OF ROS USING THE PROPOSED ALGORITHM

V-A Setup and Preliminary Implementation

The routing net delay and frequency of each individual RO are measured using the Xilinx Integrated Logic Analyzer (ILA). Fig. 5 shows the schematic of the CUTs implemented on the Zynq-7000 ZC702 APSoC using the different types of interconnects reported in Table I. Note that not all the interconnects reported in Table I are usable for RO implementation, because some of them do not span more than a single WSM (e.g., the PINFEED interconnect). Two ROs per type were implemented, for a total of 14 ROs. Table III shows the performance results of each RO “without” applying the proposed algorithm.
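To make the per-type comparison in Table III easier to read, the following Python sketch groups the two ROs of each type and reports their mean frequency, the corresponding oscillation period, the mean net delay, and the frequency mismatch between the pair. The numbers are copied from Table III; the summary quantities and the function name summarize are assumptions added for illustration and are not reported in the paper. The period follows from T = 1/f, which for f in kHz gives T in ns as 10^6 / f.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (RO type, frequency in kHz, net delay in ps, number of interconnects), from Table III.
TABLE_III: List[Tuple[str, int, int, int]] = [
    ("1L", 48912, 398, 51), ("1L", 48909, 402, 52),
    ("2L", 22541, 696, 56), ("2L", 22541, 696, 56),
    ("4L", 6399, 183, 60), ("4L", 6398, 182, 60),
    ("LONG", 16119, 521, 27), ("LONG", 16121, 516, 26),
    ("BENTQUAD", 23551, 611, 22), ("BENTQUAD", 23548, 615, 23),
    ("BOUNCEACROSS", 29852, 489, 25), ("BOUNCEACROSS", 29851, 490, 25),
    ("VQUAD", 29790, 516, 21), ("VQUAD", 29789, 519, 22),
]


def summarize(rows: List[Tuple[str, int, int, int]]) -> Dict[str, Dict[str, float]]:
    """Group rows by RO type and compute simple per-type statistics."""
    grouped: Dict[str, List[Tuple[int, int, int]]] = {}
    for ro_type, freq_khz, delay_ps, n_int in rows:
        grouped.setdefault(ro_type, []).append((freq_khz, delay_ps, n_int))

    summary: Dict[str, Dict[str, float]] = {}
    for ro_type, values in grouped.items():
        freqs = [v[0] for v in values]
        mean_freq = mean(freqs)
        summary[ro_type] = {
            "mean_freq_khz": mean_freq,
            "period_ns": 1e6 / mean_freq,  # T = 1/f, with f in kHz and T in ns
            "mean_delay_ps": mean(v[1] for v in values),
            "freq_mismatch_khz": max(freqs) - min(freqs),
        }
    return summary


if __name__ == "__main__":
    for ro_type, stats in summarize(TABLE_III).items():
        print(
            f"{ro_type:13s} f={stats['mean_freq_khz']:.1f} kHz "
            f"T={stats['period_ns']:.2f} ns "
            f"delay={stats['mean_delay_ps']:.1f} ps "
            f"mismatch={stats['freq_mismatch_khz']} kHz"
        )
```

The mismatch column gives a quick view of how closely the two ROs of the same type track each other.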

V-B Implementation with the Proposed Algorithm

In the next step, the ROs were implemented again while the algorithm, written as a TCL script, was sourced during the design implementation process. Fig. 6 shows the updated routing diagram of the ROs implemented using the proposed algorithm. Note that the logic cell coordinates have been optimized and the routing topology has also been updated. Table IV shows the new measurements for the updated design using the proposed algorithm, as shown in Fig. 6. The results show an improvement in the parameters, especially in the number of utilized interconnects and the net delay.

Fig. 6: Routing diagram for three different sets of ROs implemented on the Xilinx APSoC “with” the proposed algorithm: (a) diagram of 1L, 2L, 4L and LONG interconnects, (b) diagram of BENTQUAD interconnects, and (c) diagram of BOUNCEACROSS and VQUAD interconnects.

VI Conclusion

This paper has presented a detailed analysis of an optimization algorithm applied to the routing resources in the PL part of a Zynq-7000 APSoC, which includes an SRAM-based FPGA (7Z020-CLG484) available on a ZC702 board. Fourteen ROs have been configured using two logic cells and routing resources of different interconnection types. The frequency and net delay of the ROs have been measured using the ILA and delay measurement scripts in the Vivado tool. The measurements have been performed for two scenarios: implementation of the ROs “without” applying the proposed algorithm, and implementation “with” the algorithm deployed. In the former, the Vivado self-optimizer took care of the placement and routing, while in the latter the proposed algorithm overrode the placement and routing with a new topology.

References

  • [1] F. L. Kastensmidt, et al., "Analysing the Impact of Aging and Voltage Scaling under Neutron-induced Soft Error Rate in SRAM-based FPGAs," in ESREF: European Symposium on Reliability of Electron Devices, Failure Physics and Analysis, 2014.
  • [2] M. Darvishi, et al., "Circuit level modeling of extra combinational delays in SRAM-based FPGAs due to transient ionizing radiation," IEEE Transactions on Nuclear Science, vol. 61, pp. 3535-3542, 2014.
  • [3] M. Darvishi, et al., "On the susceptibility of SRAM-based FPGA routing network to delay changes induced by ionizing radiation," IEEE Transactions on Nuclear Science, vol. 66, pp. 643-654.
  • [4] M. Darvishi, et al., "Delay monitor circuit and delay change measurement due to SEU in SRAM-based FPGA," IEEE Transactions on Nuclear Science, vol. 65, pp. 1153-1160.
  • [5] M. Darvishi, "Characterization of Interconnection Delays in FPGAs Due to Single Event Upsets and Mitigation." PhD diss., École Polytechnique de Montréal, 2018.
  • [6] M. Ebrahimi, A. Evans, M. B. Tahoori, E. Costenaro, D. Alexandrescu, V. Chandra, et al., "Comprehensive analysis of sequential and combinational soft errors in an embedded processor," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, pp. 1586-1599, 2015.
  • [7] A. Pérez, L. Suriano, A. Otero, and E. de la Torre, "Dynamic reconfiguration under RTEMS for fault mitigation and functional adaptation in SRAM-based SoPCs for space systems," in Adaptive Hardware and Systems (AHS), 2017 NASA/ESA Conference on, 2017, pp. 40-47.
  • [8] Á. B. de Oliveira, L. A. Tambara, and F. L. Kastensmidt, "Exploring Performance Overhead Versus Soft Error Detection in Lockstep Dual-Core ARM Cortex-A9 Processor Embedded into Xilinx Zynq APSoC," in International Symposium on Applied Reconfigurable Computing, 2017, pp. 189-201.
  • [9] L. A. Tambara, P. Rech, E. Chielle, J. Tonfat, and F. L. Kastensmidt, "Analyzing the impact of radiation-induced failures in programmable SoCs," IEEE Transactions on Nuclear Science, vol. 63, pp. 2217-2224, 2016.
  • [10] T. Polzer, F. Huemer, and A. Steininger, "A Programmable Delay Line for Metastability Characterization in FPGAs," in Microelectronics (Austrochip), 2016 Austrochip Workshop on, 2016, pp. 51-56.
  • [11] P. Chen, Y.-Y. Hsiao, Y.-S. Chung, W. X. Tsai, and J.-M. Lin, "A 2.5-ps Bin Size and 6.7-ps Resolution FPGA Time-to-Digital Converter Based on Delay Wrapping and Averaging," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, pp. 114-124, 2017.
  • [12] S. Berrima, Y. Blaquière, and Y. Savaria, "A multi-measurements RO-TDC implemented in a Xilinx field programmable gate array," in Circuits and Systems (ISCAS), 2017 IEEE International Symposium on, 2017, pp. 1-4.
  • [13] C. Gu, N. Hanley, and M. O’Neill, "Improved Reliability of FPGA-Based PUF Identification Generator Design," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 10, p. 20, 2017.
  • [14] A. P. Johnson, R. S. Chakraborty, and D. Mukhopadhyay, "An Improved DCM-Based Tunable True Random Number Generator for Xilinx FPGA," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 64, pp. 452-456, 2017.
  • [15] Xilinx. 7-Series FPGAs Configurable Logic Block User Guide (UG474) [Online].
  • [16] B. Taj, "Single Event Upset error detection on routing tracks of Xilinx FPGAs," Master’s Thesis, Computer and Software Department, McMaster University, Canada, 2013.
  • [17] S. Berrima, Y. Blaquière, and Y. Savaria, "Sub-ps resolution programmable delays implemented in a Xilinx FPGA," 2017.