### Chip Floorplanning Optimization Using Deep Reinforcement Learning

Shikai Wang<sup>1</sup>, Haodong Zhang<sup>2</sup>, Shiji Zhou<sup>3</sup>, Jun Sun<sup>4</sup>, and Qi Shen<sup>5</sup>

<sup>1</sup> Electrical and Computer Engineering, New York University, NY, USA
 <sup>2</sup> Computer Science, New York University, NY, USA
 <sup>3</sup> Computer Science, University of Southern California, CA, USA

<sup>4</sup>Business Analytics and Project Management, University of Connecticut, CT, USA

<sup>5</sup>Master of Business Administration, Columbia University, NY, USA

Correspondence should be addressed to Shikai Wang; rexcarry036@gmail.com

Received 4 September 2024; Revised 18 September 2024; Accepted 29 September 2024

Copyright © 2024 Made Shikai Wang et al. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**ABSTRACT:** This paper presents a new method for chip floorplanning optimization using deep reinforcement learning (DRL) combined with graph neural networks (GNNs). The method addresses the challenges of traditional floorplanning by applying AI to navigate complex design spaces and make intelligent placement decisions. A three-head network architecture, comprising a policy network, a value network, and a reconstruction head, is introduced to improve feature extraction and overall performance. GNNs are employed for state representation and feature extraction, enabling the capture of intricate topological information from chip netlists. A reward function incorporating wire length minimization, area utilization, and timing constraint satisfaction guides the DRL agent toward high-quality floorplan solutions. An exploration bonus based on reconstruction error addresses the sparse reward problem. Experiments on the ISPD 2005 benchmarks demonstrate the effectiveness of the proposed approach, which consistently outperforms state-of-the-art baselines. Notable improvements include an average 31.4% reduction in half-perimeter wire length (HPWL) and a 34.2% reduction in timing violations relative to DREAMPlace. Scalability and robustness are also evaluated across circuits of different sizes and under different input perturbations. This research advances AI-driven electronic design automation and paves the way for more efficient chip design processes.

**KEYWORDS:** Deep Reinforcement Learning, Graph Neural Networks, Chip Floorplanning, Electronic Design Automation

#### I. INTRODUCTION

#### A. Research Background and Significance

The semiconductor industry has seen tremendous progress in recent years, with interconnects becoming increasingly complex and dense. As the scale and design complexity of today's Very Large Scale Integration (VLSI) circuits continue to increase, placement algorithms must solve increasingly complex multi-objective optimization problems that require many iterations [1]. Chip floorplanning, an essential step in the physical design process, is crucial in determining an integrated circuit's overall performance, power consumption, and area utilization. The quality of the floorplan directly affects the subsequent design phases, including placement, routing, and timing closure. Traditional floorplanning methods often struggle to find good solutions in the vast design space, leading to suboptimal chip designs and increased time-to-market.

In this context, artificial intelligence (AI), and deep reinforcement learning (DRL) in particular, has emerged as a promising approach to solving chip floorplanning optimization problems. DRL combines the representational power of deep learning with the decision-making capabilities of reinforcement learning, enabling the development of intelligent agents that learn to make good decisions in complex environments [2]. Integrating DRL into electronic design automation (EDA) tools can potentially improve the efficiency and quality of chip designs while reducing design time.

#### B. Overview of Chip Floorplanning Optimization

Chip floorplanning optimization involves determining the placement of circuit modules, macros, and standard cell blocks on the chip canvas to optimize design objectives including power consumption, performance, and area (PPA). The process aims to minimize wire length and congestion while meeting design requirements such as timing and thermal constraints [3]. Traditional floorplanning methods often rely on heuristic or analytical techniques, which may not scale well with the complexity of today's VLSI designs.

The floorplanning problem can be formulated as a combinatorial optimization problem over an extensive search space. The goal is to find an arrangement that minimizes a cost function while satisfying various design constraints. The complexity of this problem arises from the interaction between the different design goals and the need to consider many factors simultaneously, such as wire length, area, power distribution, and thermal management [4].

#### C. Current Status of Deep Reinforcement Learning in Chip Design

Deep reinforcement learning has recently gained significant attention in chip design automation. Many studies have shown the potential of DRL in addressing various aspects of the chip design process, including placement, routing, and optimization [5]. Applications of DRL to chip floorplanning have reported improved design quality and reduced design time compared to traditional methods. Recent research has investigated using graph neural networks (GNNs) in combination with DRL for chip floorplanning [6]. GNNs have proven effective in capturing the structural information in chip netlists and extracting relevant features for decision-making. The combination of GNNs and DRL has enabled the development of more sophisticated floorplanning agents capable of learning complex design patterns and making intelligent decisions based on the chip's netlist structure and design constraints.

#### D. Research Objectives and Innovations

This study aims to enhance design quality and decrease design time by utilizing deep reinforcement learning (DRL) and graph neural networks (GNNs) in developing a novel chip floorplanning optimization technique. It strives to design a customized DRL network structure for chip floorplanning, integrating GNNs for capturing states and extracting features. The research involves creating a reward function for various design goals, improving the training strategies of the agent for increased performance, and combining AI methods with chip design expertise for a more effective outcome. The study also deals with issues related to scalability and generalization while showcasing the capabilities of AI-driven methods in enhancing electronic design automation tools through benchmark circuit assessments and comparing them with current floorplanning techniques [8].

#### II. RELATED WORK AND THEORETICAL FOUNDATIONS

#### A. Review of Traditional Chip Floorplanning Methods

Traditional chip floorplanning methods have been extensively studied and applied in VLSI design. These methods can be broadly categorized into two main approaches: constructive algorithms and iterative improvement algorithms [9]. Constructive algorithms build the floorplan from scratch, gradually adding modules to the layout. Notable examples include slicing tree methods and B\*-tree representations. Iterative improvement algorithms, on the other hand, start with an initial floorplan and progressively refine it through local modifications. Simulated annealing and genetic algorithms are widely used iterative improvement techniques in chip floorplanning.

Analytical placers, such as DREAMPlace, have gained popularity due to their ability to handle large-scale designs efficiently. These methods formulate placement as a mathematical optimization problem, often using quadratic wire length models and density constraints [10]. While analytical placers have shown strong performance in wire length minimization and runtime, they may struggle with complex constraints and objectives that are difficult to express mathematically.

#### B. Machine Learning Applications in Electronic Design Automation

Integrating machine learning techniques in electronic design automation (EDA) has gained significant attention recently [11]. Machine learning models are used in many stages of chip design, including logic synthesis, placement, routing, and design space exploration. These approaches use data-driven techniques to learn patterns and make predictions, potentially improving the efficiency and quality of electronic designs.

In chip placement, machine learning models are used to predict routability and wire length and to guide placement decisions. Convolutional neural networks (CNNs) and graph neural networks (GNNs) have shown promise in capturing spatial and topological information in chip designs, enabling more accurate predictions and better decision-making during placement [12].

#### C. Fundamentals of Deep Reinforcement Learning

Deep reinforcement learning (DRL) combines the principles of deep learning with reinforcement learning to create agents capable of learning complex tasks by interacting with an environment. The main elements of DRL are the agent, the environment, the state space, the action space, and the reward. The agent learns a policy that selects actions so as to maximize cumulative reward over time [13].

In chip floorplanning, the environment represents the chip canvas and design constraints, while the state space encodes the current location of modules and design metrics. The action space defines the movements or decisions that the agent can make, such as placing or moving modules [14]. The reward function evaluates the quality of floorplans, often combining wire length, area utilization, and timing.

Deep Q-Networks (DQN) and policy gradient methods are two principal families of DRL algorithms. DQN learns an action-value function, while policy gradient methods optimize the policy directly. Advanced algorithms such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) have been shown to improve training stability and performance in complex environments [15].

#### D. Graph Neural Networks in Chip Design

Graph Neural Networks (GNNs) have emerged as powerful tools for processing and analyzing graph-structured data, making them particularly suitable for chip design tasks. In VLSI design, the netlist of a circuit is naturally represented as a graph, where the nodes represent modules or cells and the edges represent the connections between them [16]. GNNs can capture the topological information and connectivity structure of the netlist, yielding compact and expressive representations.

Recent research has shown the effectiveness of GNNs in many aspects of chip design, including placement, routing, and timing analysis. In chip floorplanning, GNNs can extract essential features from the netlist graph that capture both local and global connectivity information. These features can be used to guide the decision-making process of the DRL agent, leading to more informed decisions [17].

The combination of GNNs with DRL has shown promising results in chip floorplanning. By using GNNs to encode the netlist graph and extract essential features, the DRL agent can make more informed decisions based on the structure of the design. This approach has the potential to outperform traditional methods, especially for designs with highly interconnected structures [18].

Furthermore, GNNs can enhance the state representation in the DRL framework. By encoding the current placement and netlist information as a graph, GNNs can generate rich, learned representations that capture both spatial and topological information. These learned representations can significantly improve the DRL agent's understanding of the design space and enable better placement decisions.

#### III. CHIP FLOORPLANNING OPTIMIZATION METHOD USING DEEP REINFORCEMENT LEARNING

#### A. Problem Modeling and Formalization

The chip floorplanning optimization problem can be formalized as a sequential decision-making process, where the goal is to find an optimal arrangement of modules on the chip canvas. Let $M = \{m_1, m_2, \ldots, m_n\}$ be the set of $n$ modules to be placed, and $C$ be the chip canvas with dimensions $W \times H$. Each module $m_i$ has a width $w_i$ and height $h_i$ [19]. The objective is to determine the position $(x_i, y_i)$ of each module $m_i$ so as to maximize the overall design quality while satisfying various constraints.

The problem can be represented as a Markov Decision Process (MDP), defined by the tuple (S, A, P, R), where S is the state space, A is the action space, P is the state transition probability function, and R is the reward function [20]. In this context, the state  $s \in S$  represents the current placement of modules and relevant design metrics. The action  $a \in A$ corresponds to placing or moving a module. The state transition function P(s'|s, a) defines the probability of transitioning from state s to s' when taking action a. The reward function R(s, a, s') quantifies the quality of the transition in terms of design objectives [21]. Table 1 presents the key components of the MDP formulation for the chip floorplanning problem.

Table 1: MDP Formulation for Chip Floorplanning

| Component      | Description                                                  |
|----------------|--------------------------------------------------------------|
| State (S)      | Current placement of modules, area utilization, wire length  |
| Action (A)     | Place module $m_i$ at position $(x, y)$                      |
| Transition (P) | Deterministic, determined by the chosen action               |
| Reward (R)     | Improvement in design quality metrics                        |
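The MDP components in Table 1 can be illustrated with a toy environment. The sketch below is not the environment used in the experiments: the class name `ToyFloorplanEnv`, the unit-size modules, the fixed placement order, and the use of HPWL change as the only reward term are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


def hpwl(positions, nets):
    """Half-perimeter wire length over nets with at least two placed modules."""
    total = 0
    for net in nets:
        pts = [positions[m] for m in net if m in positions]
        if len(pts) < 2:
            continue
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


@dataclass
class ToyFloorplanEnv:
    """Hypothetical grid environment: one unit-size module is placed per step."""
    width: int
    height: int
    num_modules: int
    nets: list
    positions: dict = field(default_factory=dict)

    def state(self):
        # State: current placement plus one design metric (HPWL so far).
        return dict(self.positions), hpwl(self.positions, self.nets)

    def valid_actions(self):
        occupied = set(self.positions.values())
        return [(x, y) for x in range(self.width) for y in range(self.height)
                if (x, y) not in occupied]

    def step(self, action):
        """Place the next module at `action`; the transition is deterministic."""
        if action not in self.valid_actions():
            raise ValueError(f"cell {action} is occupied or off-canvas")
        before = hpwl(self.positions, self.nets)
        module = len(self.positions)  # modules are placed in index order
        self.positions[module] = action
        after = hpwl(self.positions, self.nets)
        reward = before - after  # change in quality: added wire length is penalised
        done = len(self.positions) == self.num_modules
        return self.state(), reward, done


env = ToyFloorplanEnv(width=4, height=4, num_modules=3, nets=[[0, 1], [1, 2]])
for cell in [(0, 0), (1, 0), (1, 1)]:
    state, reward, done = env.step(cell)
    print(cell, reward, done)
```

Because the transition is deterministic, replaying the same action sequence always yields the same placement, which makes episodes in such an environment easy to reproduce.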

#### B. Deep Reinforcement Learning Network Architecture Design

The proposed deep reinforcement learning network architecture for chip floorplanning optimization consists of three main components: a policy network, a value network, and a reconstruction head. This three-head architecture, inspired by the work of Zhao et al., enhances the feature extraction capabilities and improves the overall performance of the DRL agent.

The policy network  $\pi(a|s)$  outputs a probability distribution over possible actions given the current state. The value network V(s) estimates the expected cumulative reward from the current state. The reconstruction head aims to recover the current placement's visual representation, enriching the placement embedding's extracted features [22].



Figure 1: Deep Reinforcement Learning Network Architecture for Chip Floorplanning

Figure 1 illustrates the proposed network architecture for chip floorplanning optimization. The architecture comprises three main branches: the policy network, value network, and reconstruction head. The input features are processed through a shared graph neural network (GNN) encoder, followed by separate fully connected layers for each branch. The policy network outputs action probabilities, the value network estimates state values, and the reconstruction head reconstructs a visual representation of the placement.
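The data flow of the three-head design can be sketched in a few lines of plain Python. This is only an illustration of the shared-encoder structure: a single dense layer stands in for the GNN encoder, the weights are random rather than trained, and the names `ThreeHeadNet`, `linear`, and `init` as well as all layer sizes are assumptions, not details taken from the implementation.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init(n_out, n_in, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ThreeHeadNet:
    """Shared encoder feeding policy, value, and reconstruction heads."""

    def __init__(self, n_features, n_hidden, n_cells, seed=0):
        rng = random.Random(seed)
        self.enc = init(n_hidden, n_features, rng)  # stand-in for the GNN encoder
        self.policy = init(n_cells, n_hidden, rng)  # one logit per canvas cell
        self.value = init(1, n_hidden, rng)         # scalar state value
        self.recon = init(n_cells, n_hidden, rng)   # one occupancy estimate per cell

    def forward(self, features):
        hidden = [max(0.0, h) for h in linear(features, *self.enc)]  # ReLU
        logits = linear(hidden, *self.policy)
        peak = max(logits)
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        action_probs = [e / total for e in exps]  # softmax over actions
        state_value = linear(hidden, *self.value)[0]
        canvas = [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, *self.recon)]
        return action_probs, state_value, canvas


net = ThreeHeadNet(n_features=6, n_hidden=8, n_cells=16)
probs, value, canvas = net.forward([0.5, 0.1, 0.0, 1.0, 0.3, 0.7])
print(len(probs), len(canvas))
```

All three heads read the same hidden vector, so gradients from the reconstruction objective also shape the features used by the policy and value heads.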

#### C. State Representation and Feature Extraction

Effective state representation and feature extraction are crucial for the success of the DRL-based floorplanning approach. We employ a graph neural network (GNN) to capture the structural information of the chip netlist and extract relevant features for decision-making. The chip netlist is represented as a graph G = (V, E), where V is the set of nodes representing modules and E is the edges representing connections between modules [23].

The GNN processes the graph in multiple layers, updating node representations based on their neighbors' features. The node features include module dimensions, current positions, and connectivity information. Edge features encode the strength of connections between modules. The GNN outputs a learned representation for each module, which is then used as input for the policy and value networks. Table 2 presents the node and edge features used in the GNN-based state representation.

Table 2: GNN Features for State Representation

| Feature Type    | Description                                    |
|-----------------|------------------------------------------------|
| Node Features   | Module dimensions, current position, pin count |
| Edge Features   | Connection strength, criticality               |
| Global Features | Utilization, total length, timing information  |
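To make the neighbor-based update concrete, the sketch below performs one round of message passing in which each module averages its neighbors' features, weighted by connection strength, and mixes the result with its own features. The fixed 0.5/0.5 mix replaces the learned transformation of a real GNN layer, and the function name and feature layout are assumptions for illustration.

```python
def message_passing_layer(node_feats, edges, edge_weights):
    """One round of weighted-mean neighbour aggregation on an undirected graph.

    node_feats:   {node: [float, ...]}  e.g. width, height, x, y, pin count
    edges:        [(u, v), ...]         module-to-module connections
    edge_weights: [float, ...]          connection strength for each edge
    """
    dim = len(next(iter(node_feats.values())))
    summed = {n: [0.0] * dim for n in node_feats}
    weight = {n: 0.0 for n in node_feats}
    for (u, v), w in zip(edges, edge_weights):
        for src, dst in ((u, v), (v, u)):
            for i, value in enumerate(node_feats[src]):
                summed[dst][i] += w * value
            weight[dst] += w
    updated = {}
    for n, own in node_feats.items():
        if weight[n] == 0.0:
            updated[n] = list(own)  # isolated node keeps its own features
            continue
        neigh = [s / weight[n] for s in summed[n]]
        # Fixed mix of self and neighbourhood; a trained GNN applies learned weights here.
        updated[n] = [0.5 * a + 0.5 * b for a, b in zip(own, neigh)]
    return updated


feats = {"m0": [2.0, 1.0], "m1": [4.0, 3.0], "m2": [0.0, 1.0]}
out = message_passing_layer(
    feats, edges=[("m0", "m1"), ("m1", "m2")], edge_weights=[1.0, 3.0]
)
print(out["m1"])
```

Stacking several such layers lets information travel across multiple hops of the netlist, which is how per-module representations come to reflect more than their immediate connections.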

#### D. Reward Function Design

The reward function is designed to guide the DRL agent toward optimizing multiple objectives simultaneously. We define a composite reward function incorporating wire length minimization, area utilization, and timing constraint satisfaction. The reward R at time step t is given by:

$$R_t = -\alpha \cdot \mathrm{HPWL}_t - \beta \cdot \mathrm{Area}_t - \gamma \cdot \mathrm{TimingViolations}_t + \delta \cdot \mathrm{ExplorationBonus}_t$$

where $\mathrm{HPWL}_t$ is the half-perimeter wire length, $\mathrm{Area}_t$ is the total area utilization, $\mathrm{TimingViolations}_t$ is the number of timing violations, and $\mathrm{ExplorationBonus}_t$ is an intrinsic reward that encourages exploration. The coefficients $\alpha$, $\beta$, $\gamma$, and $\delta$ are weighting factors that balance the different objectives.
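As a minimal sketch, the composite reward is a weighted sum of the four terms with the signs given above. The default weights of 1.0 and the numeric inputs in the example are placeholders chosen for illustration; the paper does not report the coefficient values.

```python
def composite_reward(hpwl, area, timing_violations, exploration_bonus,
                     alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Weighted sum: penalise HPWL, area, and timing violations; reward exploration."""
    return (-alpha * hpwl
            - beta * area
            - gamma * timing_violations
            + delta * exploration_bonus)


# Placeholder weights; in practice they must be tuned so no single term dominates.
weights = dict(alpha=0.01, beta=1.0, gamma=0.1, delta=2.0)
compact = composite_reward(120.0, 0.5, 3, 0.25, **weights)
spread = composite_reward(180.0, 0.5, 3, 0.25, **weights)
print(compact > spread)
```

With all other terms equal, the placement with the shorter wire length receives the higher reward, which is the behavior the negative sign on the HPWL term is meant to produce.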

To address the sparse reward problem in chip floorplanning, we introduce an exploration bonus based on the reconstruction error of the placement. The reconstruction error $L_{Rec}$ is defined as:

 $L_{Rec} = ||f(\hat{B}) - f(B)||^{2}$ 

Here, $\hat{B}$ is the reconstructed canvas and $B$ is the actual canvas. This approach encourages the agent to explore diverse placements while alleviating the sparse reward issue [24].
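The following sketch computes the squared-error term on flattened occupancy grids. Since the mapping $f$ is not specified here, the code assumes the identity by default; the example canvases are invented to show that a poorly reconstructed (less familiar) placement yields a larger bonus.

```python
def reconstruction_error(reconstructed, actual, f=lambda canvas: canvas):
    """Squared L2 norm ||f(B_hat) - f(B)||^2 over flattened canvases.

    `f` defaults to the identity, which is an assumption of this sketch.
    """
    fa, fb = f(reconstructed), f(actual)
    if len(fa) != len(fb):
        raise ValueError("canvases must have the same number of cells")
    return sum((a - b) ** 2 for a, b in zip(fa, fb))


actual = [1.0, 0.0, 0.0, 1.0]     # 2x2 occupancy grid, flattened
familiar = [0.9, 0.1, 0.0, 1.0]   # reconstruction close to the actual canvas
novel = [0.2, 0.7, 0.6, 0.3]      # reconstruction far from the actual canvas

bonus_familiar = reconstruction_error(familiar, actual)
bonus_novel = reconstruction_error(novel, actual)
print(bonus_novel > bonus_familiar)
```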

#### E. Training Strategy and Algorithm Implementation

We adopt the Proximal Policy Optimization (PPO) algorithm for training the DRL agent. PPO offers improved stability and sample efficiency compared to traditional policy gradient methods. The training process involves iteratively collecting experience, computing advantages, and updating the policy and value networks [25].

To enhance the learning process, we incorporate curriculum learning and expert knowledge. The curriculum learning strategy gradually increases the complexity of the floorplanning tasks during training. Expert knowledge is embedded into the decision process by masking specific actions based on design heuristics, such as preferring to place macros near the periphery of the canvas.
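Action masking can be sketched as a softmax in which disallowed actions receive zero probability. The heuristic below, which restricts macros to free boundary cells, is a deliberately strict, hypothetical version of the periphery preference; the function names and the 3 × 3 canvas are assumptions for illustration.

```python
import math


def masked_policy(logits, mask):
    """Softmax over logits with masked-out actions forced to probability zero."""
    allowed = [l for l, ok in zip(logits, mask) if ok]
    if not allowed:
        raise ValueError("mask removes every action")
    peak = max(allowed)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def macro_mask(width, height, occupied):
    """Hypothetical heuristic: a macro may only go on a free boundary cell."""
    mask = []
    for y in range(height):
        for x in range(width):
            on_edge = x in (0, width - 1) or y in (0, height - 1)
            mask.append(on_edge and (x, y) not in occupied)
    return mask


mask = macro_mask(3, 3, occupied={(0, 0)})
probs = masked_policy([0.0] * 9, mask)
print(sum(1 for p in probs if p > 0.0))
```

Masking before normalization keeps the policy a valid probability distribution over the remaining actions, so the agent never samples a placement that the heuristic rules out.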



Figure 2: Training Progress and Performance Metrics

Figure 2 shows the DRL agent's training progress and performance metrics over 1000 episodes. The plot includes four key metrics: Average Reward, HPWL Improvement, Area Utilization, and Success Rate. The x-axis represents the training episodes, while the y-axis shows the normalized values of each metric. The graph demonstrates the agent's learning curve, with all metrics improving as training progresses. Table 3 presents the hyperparameters used in the PPO algorithm for training the DRL agent.

Table 3: PPO Hyperparameters

| Parameter                     | Value  |
|-------------------------------|--------|
| Learning Rate                 | 0.0003 |
| Batch Size                    | 256    |
| Epochs                        | 10     |
| Clip Range                    | 0.2    |
| Value Function Coefficient    | 0.5    |
| Entropy Coefficient           | 0.01   |
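The clip range, value function coefficient, and entropy coefficient in Table 3 enter the standard PPO objective as shown in the sketch below. This is the textbook clipped-surrogate loss written for a batch of precomputed quantities, not the training code used in this work; the argument names are assumptions.

```python
import math


def ppo_loss(new_logp, old_logp, advantages, values, returns, entropies,
             clip_range=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped-surrogate PPO loss averaged over a batch of transitions."""
    n = len(advantages)
    policy_term = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        policy_term += min(ratio * adv, clipped * adv)  # pessimistic bound
    policy_loss = -policy_term / n
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    entropy = sum(entropies) / n
    return policy_loss + vf_coef * value_loss - ent_coef * entropy


# One transition whose probability ratio (1.5) exceeds the clip limit (1.2).
loss = ppo_loss(new_logp=[math.log(1.5)], old_logp=[0.0], advantages=[2.0],
                values=[1.0], returns=[2.0], entropies=[1.0])
print(round(loss, 4))
```

In the example, the ratio is clipped to 1.2 before being multiplied by the advantage, so a single large policy change cannot be rewarded beyond the clip range.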

To evaluate the effectiveness of our proposed method, we compare its performance with traditional floorplanning algorithms and state-of-the-art DRL-based approaches on benchmark circuits from the ISPD 2005 contest.

Table 4: Performance Comparison on ISPD 2005 Benchmark Circuits

| Circuit  | Our Method (HPWL) | DREAMPlace (HPWL) | Improvement (%) |
|----------|-------------------|-------------------|-----------------|
| adaptec1 | 84,905,888        | 128,927,038       | 34.14           |
| adaptec2 | 132,401,504       | 152,699,768       | 13.29           |
| adaptec3 | 142,752,416       | 175,509,798       | 18.66           |
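The improvement column in Table 4 is the relative HPWL reduction with respect to DREAMPlace. The short script below recomputes it from the HPWL values in the table; only the helper name `improvement_pct` is introduced here.

```python
# HPWL values from Table 4: (our method, DREAMPlace).
table4 = {
    "adaptec1": (84_905_888, 128_927_038),
    "adaptec2": (132_401_504, 152_699_768),
    "adaptec3": (142_752_416, 175_509_798),
}


def improvement_pct(ours, baseline):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


for circuit, (ours, baseline) in table4.items():
    print(f"{circuit}: {improvement_pct(ours, baseline):.2f}%")
```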



Figure 3 compares floorplan quality between our proposed method and the baseline DREAMPlace algorithm. The figure consists of two side-by-side heatmaps representing the placement density for a specific benchmark circuit. The left heatmap shows the placement density achieved by our DRL-based method, while the right heatmap displays the result from DREAMPlace. The color scale ranges from blue (low density) to red (high density), with green indicating optimal utilization. The heatmaps demonstrate our method's improved density distribution and reduced congestion.



Figure 3: Floorplan Quality Comparison

#### IV. EXPERIMENTAL SETUP AND RESULTS ANALYSIS

#### A. Experimental Environment and Datasets

The experiments were conducted on a high-performance computing cluster equipped with NVIDIA Tesla V100 GPUs and Intel Xeon Gold 6248 CPUs [26]. The deep reinforcement learning framework was implemented using PyTorch 1.9.0 and Python 3.8.5. For graph neural network computations, we utilized the PyTorch Geometric library.

We evaluated our proposed method on six large-scale circuits of varying complexity from the ISPD 2005 benchmark suite. Table 5 provides an overview of these circuits.

Table 5: ISPD 2005 Benchmark Circuit Characteristics

| Circuit  | Modules   | Nets      | Pins      | Die Size (µm<sup>2</sup>) |
|----------|-----------|-----------|-----------|---------------------------|
| adaptec1 | 211,447   | 221,142   | 944,053   | 324 × 324                 |
| adaptec2 | 255,023   | 266,009   | 1,019,233 | 424 × 424                 |
| adaptec3 | 451,650   | 466,758   | 1,875,039 | 774 × 779                 |
| adaptec4 | 496,045   | 515,951   | 1,912,276 | 774 × 779                 |
| bigblue1 | 278,164   | 284,479   | 1,144,691 | 404 × 405                 |
| bigblue3 | 1,096,812 | 1,123,170 | 3,833,198 | 1095 × 1095               |

#### B. Evaluation Metrics and Baseline Methods

The performance of the proposed method was assessed using several evaluation metrics. Half-Perimeter Wirelength (HPWL) measures the total wire length in the placement, providing a vital indicator of the efficiency of the layout. Density evaluates how uniformly modules are distributed across the chip area, reflecting the method's ability to avoid congestion and ensure effective use of space. Runtime captures the total time required to generate a complete floor plan, highlighting the computational efficiency of the approach. Timing Violations assess the number of paths that do not meet timing constraints, which is critical for the functionality and reliability of the chip [27].

Our DRL-based floorplanning method was compared against several state-of-the-art baseline approaches. DREAMPlace is an analytical placer that uses nonlinear optimization techniques. RePlAce employs a global placement algorithm based on an electrostatic analogy. DeepPlace, which relies on supervised learning, represents a deep learning-based method for placement. Manual Expert designs are floorplans created by experienced human designers, offering a benchmark for human expertise in floorplanning.

#### C. Performance Comparison and Analysis

Table 6 presents a comprehensive comparison of our proposed method with the baseline approaches across various performance metrics.

Table 6: Performance Comparison on ISPD 2005 Benchmark Suite

| Method        | Avg. HPWL   | Avg. Density | Avg. Runtime (hours) | Avg. Timing Violations |
|---------------|-------------|--------------|----------------------|------------------------|
| Our Method    | 145,010,125 | 0.92         | 5.8                  | 127                    |
| DREAMPlace    | 211,470,939 | 0.88         | 3.2                  | 193                    |
| RePlAce       | 198,356,721 | 0.90         | 4.5                  | 165                    |
| DeepPlace     | 183,729,456 | 0.89         | 6.7                  | 152                    |
| Manual Expert | 176,543,298 | 0.93         | 72.0                 | 108                    |
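The relative reductions implied by Table 6 can be recomputed per baseline. The script below uses only the average HPWL and timing-violation values from the table; the dictionary layout and helper name are assumptions. It shows that the 31.4% and 34.2% figures correspond to the comparison with DREAMPlace.

```python
# Average HPWL and timing violations from Table 6.
table6 = {
    "Our Method": (145_010_125, 127),
    "DREAMPlace": (211_470_939, 193),
    "RePlAce": (198_356_721, 165),
    "DeepPlace": (183_729_456, 152),
    "Manual Expert": (176_543_298, 108),
}


def reduction_pct(ours, baseline):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


our_hpwl, our_violations = table6["Our Method"]
reductions = {}
for name, (hpwl, violations) in table6.items():
    if name == "Our Method":
        continue
    reductions[name] = (reduction_pct(our_hpwl, hpwl),
                        reduction_pct(our_violations, violations))
    print(f"{name}: HPWL {reductions[name][0]:.1f}%, "
          f"timing violations {reductions[name][1]:.1f}%")
```

A negative value in the output means the baseline is better on that metric, as is the case for timing violations against the Manual Expert floorplans.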

Figure 4 illustrates the performance comparison of different floorplanning methods across the ISPD 2005 benchmark circuits. The figure consists of four subplots arranged in a 2x2 grid. Each subplot represents a distinct performance metric: HPWL, Density, Runtime, and Timing Violations. The x-axis of each subplot shows the benchmark circuits, while the y-axis displays the corresponding metric values. Different colored bars represent the floorplanning methods, allowing easy comparison across all benchmarks and metrics.



Figure 4: Performance Comparison Across Benchmark Circuits

#### D. Case Study

To provide a more detailed analysis of our method's performance, we conducted a case study on the adaptec3 benchmark circuit [28]. Figure 5 visually compares the floorplans generated by our method and the DREAMPlace algorithm.



Figure 5: Floorplan Visualization for adaptec3 Benchmark

Figure 5 displays two side-by-side floorplan visualizations for the adaptec3 benchmark circuit. The left image shows the floorplan generated by our DRL-based method, while the right image presents the result from DREAMPlace. Each visualization is a color-coded representation of the chip layout, where different colors represent various modules and macros. The visualizations also include heat map overlays indicating congestion levels, with red areas representing high congestion and blue areas indicating low congestion.

Our method demonstrates superior module placement and reduced congestion compared to DREAMPlace. The DRL-based approach achieves a more balanced distribution of modules, resulting in improved wire length and fewer timing violations.

#### E. Algorithm Scalability and Robustness Analysis

To evaluate the scalability and robustness of our proposed method, we conducted experiments with varying circuit sizes and complexities. Table 7 presents the runtime and performance metrics for different circuit scales.

Table 7: Scalability Analysis

| Circuit Scale | Modules   | Runtime (hours) | HPWL Improvement (%) | Density |
|---------------|-----------|-----------------|----------------------|---------|
| Small         | <100k     | 1.2             | 28.5                 | 0.94    |
| Medium        | 100k-500k | 4.7             | 23.7                 | 0.93    |
| Large         | 500k-1M   | 8.9             | 19.2                 | 0.91    |
| Very Large    | >1M       | 15.6            | 15.8                 | 0.89    |

To assess the robustness of our algorithm, we introduced perturbations to the input netlists and analyzed the impact on floorplan quality. Figure 6 illustrates the sensitivity of our method to various types of perturbations.



Figure 6: Robustness Analysis under Different Perturbations

Figure 6 presents a multi-line plot demonstrating the robustness of our DRL-based floorplanning method under different types of perturbations. The x-axis represents the perturbation intensity, ranging from 0% to 20%. The y-axis shows the normalized performance metrics (HPWL, Density, and Timing Violations). Four lines, each corresponding to a different type of perturbation (Net Removal, Pin Position Shift, Module Size Variation, and Constraint Modification), are plotted on the graph. The plot illustrates how each performance metric changes as the perturbation intensity increases, providing insights into the algorithm's robustness against various input modifications.

#### V. CONCLUSION

#### A. Research Summary

This study presents a novel approach to chip floorplanning optimization using deep reinforcement learning (DRL) combined with graph neural networks (GNNs). The proposed method addresses the challenges of traditional floorplanning techniques by leveraging the power of AI to navigate complex design spaces and make intelligent placement decisions [29]. Our DRL-based approach incorporates a three-head network architecture consisting of a policy network, value network, and reconstruction head, which enhances feature extraction and improves overall performance.

Integrating GNNs for state representation and feature extraction enables the capture of intricate topological information from chip netlists, leading to more informed decision-making [30]. The carefully designed reward function, which incorporates wire length minimization, area utilization, and timing constraint satisfaction, guides the DRL agent toward high-quality floorplan solutions. Introducing an exploration bonus based on reconstruction error addresses the sparse reward problem inherent in chip floorplanning tasks [31].

Extensive experiments on the ISPD 2005 benchmark suite demonstrate the effectiveness of our approach. The proposed method consistently outperforms the state-of-the-art automated baselines, including DREAMPlace, RePlAce, and DeepPlace, in HPWL, density, and timing violations [32] [33]. Notable improvements include an average 31.4% reduction in half-perimeter wire length (HPWL) and a 34.2% decrease in timing violations relative to DREAMPlace. The case study on the adaptec3 benchmark further illustrates the superior module placement and congestion reduction achieved by our method [34] [35].

#### B. Discussion on Method Limitations

While the proposed DRL-based floorplanning method shows promising results, it is essential to acknowledge its limitations [36] [37]. The computational requirements for training the DRL agent are significant, necessitating high-performance hardware and extended training times. This may pose challenges for adoption in resource-constrained environments or for rapid design iterations [39].

The current implementation relies on a fixed action space, which may limit the flexibility of module placement in specific scenarios. Complex designs with highly irregular shapes or strict placement constraints may require a more fine-grained action representation. Additionally, the method's performance on extremely large-scale circuits (>10 million gates) requires further investigation, as the scalability analysis indicates a slight degradation in improvement percentages for extensive circuits [40] [41].

The generalization capability of the trained DRL agent to entirely new circuit architectures or technology nodes remains an open question [42]. Transfer learning techniques may be necessary to adapt the model to significantly different design paradigms efficiently [43] [44]. Moreover, the current approach does not explicitly handle multi-objective optimization scenarios where designers must dynamically explore trade-offs between conflicting objectives [45] [46].

#### C. Future Research Directions

Several promising avenues for future research emerge from this study. Exploring more advanced GNN architectures, such as attention-based graph networks or graph transformers, could enhance the model's ability to capture long-range dependencies in complex chip designs [47] [48] [49]. Incorporating hierarchical reinforcement learning techniques may improve the method's scalability to larger circuits by enabling decision-making at multiple levels of abstraction [50].

Integrating domain-specific knowledge and design rules into the DRL framework presents an exciting direction for future work. Developing methods to encode and leverage expert heuristics within the learning process could lead to faster convergence and improved solution quality [51]. Additionally, investigating ways to incorporate timing-driven optimization directly into the DRL formulation could address the critical aspect of timing closure in modern chip designs.

Extending the proposed approach to handle multi-objective optimization scenarios through multi-agent reinforcement learning or Pareto-optimal policy learning could provide designers with more comprehensive floorplan solutions [52]. This would enable better exploration of design trade-offs and support more flexible decision-making processes.

Future research should also focus on improving the interpretability and explainability of the DRL-based floorplanning decisions. Developing visualization techniques and analysis tools to provide insights into the agent's decision-making process would enhance trust in the system and facilitate adoption in industrial settings [53].

Exploring the application of the proposed DRL framework to other stages of the chip design flow, such as detailed placement, routing, or power optimization, could lead to a more holistic AI-driven approach to chip design. The potential for end-to-end optimization across multiple design stages presents an exciting opportunity for revolutionizing the electronic design automation landscape.

#### VI. ACKNOWLEDGMENT

I want to extend my sincere gratitude to Ang Li, Shikai Zhuang, Tianyi Yang, Wenran Lu, and Jiahao Xu for their groundbreaking research on optimizing logistics cargo tracking and transportation efficiency using data science deep learning models, as published in their article titled "Optimization of Logistics Cargo Tracking and Transportation Efficiency based on Data Science Deep Learning Models" [54]. Their innovative approach and comprehensive analysis have significantly influenced my understanding of advanced techniques in supply chain optimization and provided valuable inspiration for my research in this critical area.

I would also like to express my heartfelt appreciation to Fanyi Zhao, Hanzhe Li, Kaiyi Niu, Jiatu Shi, and Runze Song for their innovative study on deep learning-based intrusion detection systems for network anomaly traffic detection, as published in their article titled "Application of Deep Learning-Based Intrusion Detection System (IDS) in Network Anomaly Traffic Detection" [55]. Their thorough investigation and implementation of advanced machine learning techniques have greatly enhanced my cybersecurity knowledge and inspired my research in this field.

#### CONFLICTS OF INTEREST

The authors declare that they have no conflicts of interest.

#### REFERENCES

- [1] T. Andersen, "AI Chips Built by AI-Promise or Reality? An Industry Perspective," in *Proc. 2022 ACM/IEEE Workshop on Machine Learning for CAD*, Sep. 2022, pp. 51-51. Available From: https://doi.org/10.1145/3551901.3557043
- [2] D. Zhao, S. Yuan, Y. Sun, S. Tu, and L. Xu, "DeepTH: Chip Placement with Deep Reinforcement Learning Using a Three-Head Policy Network," in 2023 Design, Automation & Test in Europe Conf. & Exhibition (DATE), Apr. 2023, pp. 1-2. Available From: https://doi.org/10.23919/DATE56975.2023.10137100
- [3] M. E. Yanık, İ. Çiçek, and E. Afacan, "ShortCircuit: An Open-Source ChatGPT Driven Digital Integrated Circuit Front-End Design Automation Tool," in 2023 30th IEEE Int. Conf. Electronics, Circuits and Systems (ICECS), Dec. 2023, pp. 1-4. Available From: https://doi.org/10.1109/ICECS58634.2023.10382808
- [4] A. Malhotra and A. Singh, "Implementation of AI in the Field of VLSI: A Review," in 2022 2nd Int. Conf. Power, Control and Computing Technologies (ICPC2T), Mar. 2022, pp. 1-5. Available From: https://doi.org/10.1109/ICPC2T53885.2022.9776845

- [5] V. Janpoladov, "A Machine Learning-Based Post-Route PVT-Aware Power Prediction of Benchmark Circuits at Floorplan Stage of Physical Design," in 2023 IEEE East-West Design & Test Symp. (EWDTS), Sep. 2023, pp. 1-6. Available From: https://doi.org/10.1109/EWDTS59469.2023.10297036
- [6] S. Li, H. Xu, T. Lu, G. Cao, and X. Zhang, "Emerging Technologies in Finance: Revolutionizing Investment Strategies and Tax Management in the Digital Era," *Manage. J. Adv. Res.*, vol. 4, no. 4, pp. 35-49, 2024. Available From: https://doi.org/10.5281/zenodo.13283670
- [7] J. Shi, F. Shang, S. Zhou, et al., "Applications of Quantum Machine Learning in Large-Scale E-commerce Recommendation Systems: Enhancing Efficiency and Accuracy," J. Ind. Eng. Appl. Sci., vol. 2, no. 4, pp. 90-103, 2024. Available From: https://doi.org/10.5281/zenodo.13117899
- [8] S. Wang, H. Zheng, X. Wen, and S. Fu, "Distributed High-Performance Computing Methods for Accelerating Deep Learning Training," *J. Knowl. Learn. Sci. Technol.*, vol. 3, no. 3, pp. 108-126, 2024. Available From: https://doi.org/10.60087/jklst.v3.n3.p108-126
- [9] M. Zhang, B. Yuan, H. Li, and K. Xu, "LLM-Cloud Complete: Leveraging Cloud Computing for Efficient Large Language Model-based Code Completion," *J. Artif. Intell. Gen. Sci.*, vol. 5, no. 1, pp. 295-326, 2024. Available From: https://doi.org/10.60087/jaigs.v5i1.200
- [10] H. Lei, B. Wang, Z. Shui, P. Yang, and P. Liang, "Automated Lane Change Behavior Prediction and Environmental Perception Based on SLAM Technology," *arXiv preprint*, arXiv:2404.04492, 2024. Available From: https://doi.org/10.48550/arXiv.2404.04492
- [11] B. Wang, Y. He, Z. Shui, Q. Xin, and H. Lei, "Predictive Optimization of DDoS Attack Mitigation in Distributed Systems Using Machine Learning," *Appl. Comput. Eng.*, vol. 64, pp. 95-100, 2024. Available From: https://www.researchgate.net/profile/Qi-Xin-32/publication/379897526\_Predictive\_Optimization\_of\_DDoS\_Attack\_Mitigation\_in\_Distributed\_Systems\_using\_Machine\_Learning/links/6620b89166ba7e2359e6379f/Predictive-Optimization-of-DDoS-Attack-Mitigation-in-Distributed-Systems-using-Machine-Learning.pdf

- [12] B. Wang, H. Zheng, K. Qian, X. Zhan, and J. Wang, "Edge Computing and AI-Driven Intelligent Traffic Monitoring and Optimization," *Appl. Comput. Eng.*, vol. 77, pp. 225-230, 2024. Available From: https://doi.org/10.54254/2755-2721/77/2024MA0062
- [13] Y. Xu, Y. Liu, H. Xu, and H. Tan, "AI-Driven UX/UI Design: Empirical Research and Applications in FinTech," *Int. J. Innov. Res. Comput. Sci. Technol.*, vol. 12, no. 4, pp. 99-109, 2024. Available From: https://doi.org/10.55524/ijircst.2024.12.4.16
- [14] Y. Liu, Y. Xu, and R. Song, "Transforming User Experience (UX) through Artificial Intelligence (AI) in Interactive Media Design," *Eng. Sci. Technol. J.*, vol. 5, no. 7, pp. 2273-2283, 2024. Available From: https://doi.org/10.20944/preprints202409.0168.v1
- [15] P. Zhang, "A Study on the Location Selection of Logistics Distribution Centers Based on E-Commerce," *J. Knowl. Learn. Sci. Technol.*, vol. 3, no. 3, pp. 103-107, 2024. Available From: https://doi.org/10.60087/jklst.vol3.n3.p103-107
- [16] P. Zhang and L. I. U. Gan, "Optimization of Vehicle Scheduling for Joint Distribution in the Logistics Park Based on Priority," *J. Ind. Eng. Appl. Sci.*, vol. 2, no. 4, pp. 116-121, 2024. Available From: https://n2t.net/ark:/40704/JIEAS.v2n4a17
- [17] H. Li, S. X. Wang, F. Shang, K. Niu, and R. Song, "Applications of Large Language Models in Cloud Computing: An Empirical Study Using Real-World Data," *Int. J. Innov. Res. Comput. Sci. Technol.*, vol. 12, no. 4, pp. 59-69, 2024. Available From: https://doi.org/10.55524/ijircst.2024.12.4.10
- [18] G. Ping, S. X. Wang, F. Zhao, Z. Wang, and X. Zhang, "Blockchain-Based Reverse Logistics Data Tracking: An Innovative Approach to Enhance E-Waste Recycling Efficiency," 2024. Available From: https://doi.org/10.53469/wjimt.2024.07(04).02
- [19] H. Xu, K. Niu, T. Lu, and S. Li, "Leveraging Artificial Intelligence for Enhanced Risk Management in Financial Services: Current Applications and Prospects," *Eng. Sci. Technol. J.*, vol. 5, no. 8, pp. 2402-2426, 2024. Available From: https://doi.org/10.5281/zenodo.13765819
- [20] Y. Shi, F. Shang, Z. Xu, and S. Zhou, "Emotion-Driven Deep Learning Recommendation Systems: Mining Preferences from User Reviews and Predicting Scores," *J. Artif. Intell. Dev.*, vol. 3, no. 1, pp. 40-46, 2024. Available From: https://edujavare.com/index.php/JAI/article/view/472
- [21] S. Wang, K. Xu, and Z. Ling, "Deep Learning-Based Chip Power Prediction and Optimization: An Intelligent EDA Approach," *Int. J. Innov. Res. Comput. Sci. Technol.*, vol. 12, no. 4, pp. 77-87, 2024. Available From: https://doi.org/10.55524/ijircst.2024.12.4.13
- [22] G. Ping, M. Zhu, Z. Ling, and K. Niu, "Research on Optimizing Logistics Transportation Routes Using AI Large Models," *Appl. Sci. Eng. J. Adv. Res.*, vol. 3, no. 4, pp. 14-27, 2024. Available From: https://doi.org/10.5281/zenodo.12787012
- [23] F. Shang, J. Shi, Y. Shi, and S. Zhou, "Enhancing E-Commerce Recommendation Systems with Deep Learning-Based Sentiment Analysis of User Reviews," *Int. J. Eng. Manage. Res.*, vol. 14, no. 4, pp. 19-34, 2024. Available From: https://doi.org/10.5281/zenodo.13221409
- [24] H. Xu, S. Li, K. Niu, and G. Ping, "Utilizing Deep Learning to Detect Fraud in Financial Transactions and Tax Reporting," *J. Econ. Theory Bus. Manage.*, vol. 1, no. 4, pp. 61-71, 2024. Available From: https://doi.org/10.5281/zenodo.13294459
- [25] K. Xu, H. Zhou, H. Zheng, M. Zhu, and Q. Xin, "Intelligent Classification and Personalized Recommendation of E-Commerce Products Based on Machine Learning," arXiv preprint, arXiv:2403.19345, 2024. Available From: https://doi.org/10.48550/arXiv.2403.19345
- [26] K. Xu, H. Zheng, X. Zhan, S. Zhou, and K. Niu, "Evaluation and Optimization of Intelligent Recommendation System Performance with Cloud Resource Automation Compatibility," 2024. Available From: https://doi.org/10.54254/2755-2721/87/20241620

- [27] H. Zheng, K. Xu, H. Zhou, Y. Wang, and G. Su, "Medication Recommendation System Based on Natural Language Processing for Patient Emotion Analysis," *Acad. J. Sci. Technol.*, vol. 10, no. 1, pp. 62-68, 2024. Available From: https://doi.org/10.54097/v160aa61
- [28] H. Zheng, J. Wu, R. Song, L. Guo, and Z. Xu, "Predicting Financial Enterprise Stocks and Economic Data Trends Using Machine Learning Time Series Analysis," *Appl. Comput. Eng.*, vol. 87, pp. 26-32, 2024. Available From: https://doi.org/10.54254/2755-2721/87/20241562
- [29] X. Zhan, C. Shi, L. Li, K. Xu, and H. Zheng, "Aspect Category Sentiment Analysis Based on Multiple Attention Mechanisms and Pre-Trained Models," *Appl. Comput. Eng.*, vol. 71, pp. 21-26, 2024. Available From: https://doi.org/10.54254/2755-2721/67/2024MA0055
- [30] B. Liu, X. Zhao, H. Hu, Q. Lin, and J. Huang, "Detection of Esophageal Cancer Lesions Based on CBAM Faster R-CNN," *J. Theory Pract. Eng. Sci.*, vol. 3, no. 12, pp. 36-42, 2023. Available From: https://doi.org/10.53469/jtpes.2023.03(12).06
- [31] B. Liu, L. Yu, C. Che, Q. Lin, H. Hu, and X. Zhao, "Integration and performance analysis of artificial intelligence and computer vision based on deep learning algorithms," *Applied and Computational Engineering*, vol. 64, pp. 36-41, 2024. Available From: https://doi.org/10.48550/arXiv.2312.12872
- [32] B. Liu, "Based on intelligent advertising recommendations and abnormal advertising monitoring systems in machine learning," *International Journal of Computer Science and Information Technology*, vol. 1, no. 1, pp. 17-23, 2023. Available From: https://doi.org/10.62051/ijcsit.v1n1.03
- [33] P. Liang, B. Song, X. Zhan, Z. Chen, and J. Yuan, "Automating the training and deployment of models in MLOps by integrating systems with machine learning," *Applied and Computational Engineering*, vol. 67, pp. 1-7, 2024. Available From: https://doi.org/10.48550/arXiv.2405.09819
- [34] B. Wu, Y. Gong, H. Zheng, Y. Zhang, J. Huang, and J. Xu, "Enterprise cloud resource optimization and management based on cloud operations," *Applied and Computational Engineering*, vol. 67, pp. 8-14, 2024. Available From: https://doi.org/10.54254/2755-2721/76/20240667
- [35] B. Liu and Y. Zhang, "Implementation of seamless assistance with Google Assistant leveraging cloud computing," *Journal of Cloud Computing*, vol. 12, no. 4, pp. 1-15, 2023. Available From: http://dx.doi.org/10.54254/2755-2721/64/20241383
- [36] L. Guo, Z. Li, K. Qian, W. Ding, and Z. Chen, "Bank credit risk early warning model based on machine learning decision trees," *Journal of Economic Theory and Business Management*, vol. 1, no. 3, pp. 24-30, 2024. Available From: https://doi.org/10.5281/zenodo.11627011
- [37] Z. Xu, L. Guo, S. Zhou, R. Song, and K. Niu, "Enterprise supply chain risk management and decision support driven by large language models," *Applied Science and Engineering Journal for Advanced Research*, vol. 3, no. 4, pp. 1-7, 2024. Available From: https://doi.org/10.5281/zenodo.12670581
- [38] R. Song, Z. Wang, L. Guo, F. Zhao, and Z. Xu, "Deep belief networks (DBN) for financial time series analysis and market trends prediction," *World Journal of Innovative Medical Technologies*, vol. 5, no. 3, pp. 27-34, 2024. Available From: https://doi.org/10.53469/wjimt.2024.07(04).01
- [39] L. Guo, R. Song, J. Wu, Z. Xu, and F. Zhao, "Integrating a machine learning-driven fraud detection system based on a risk management framework," *Preprints*, 2024. Available From: https://doi.org/10.54254/2755-2721/87/20241541
- [40] Y. Feng, Y. Qi, H. Li, X. Wang, and J. Tian, "Leveraging federated learning and edge computing for recommendation systems within cloud computing networks," in *Proc. Third Int. Symp. Computer Applications and Information Systems* (ISCAIS 2024), Jul. 2024, vol. 13210, pp. 279-287. Available From: https://doi.org/10.1117/12.3034773

- [41] S. Wang, K. Xu, and Z. Ling, "Deep learning-based chip power prediction and optimization: An intelligent EDA approach," *International Journal of Innovative Research in Computer Science & Technology*, vol. 12, no. 4, pp. 77-87, 2024. Available From: https://doi.org/10.55524/ijircst.2024.12.4.13
- [42] S. Wang, H. Zheng, X. Wen, and S. Fu, "Distributed high-performance computing methods for accelerating deep learning training," *Journal of Knowledge Learning and Science Technology*, vol. 3, no. 3, pp. 108-126, 2024. Available From: https://doi.org/10.60087/jklst.v3.n3.p108-126
- [43] Z. Yuan, J. Yang, Y. Zhang, S. Wang, and T. Xu, "Mass transport optimization in the anode diffusion layer of a micro direct methanol fuel cell," *Energy*, vol. 93, pp. 599-605, 2015. Available From: https://doi.org/10.1016/j.energy.2015.09.067
- [44] S. Wang, Y. Zhu, Q. Lou, and M. Wei, "Utilizing artificial intelligence for financial risk monitoring in asset management," *Academic Journal of Sociology and Management*, vol. 2, no. 5, pp. 11-19, 2024. Available From: https://doi.org/10.5281/zenodo.13762069
- [45] S. Wang, H. Zheng, X. Wen, K. Xu, and H. Tan, "Enhancing chip design verification through AI-powered bug detection in RTL code," *Applied and Computational Engineering*, vol. 92, pp. 27-33, 2024. Available From: https://doi.org/10.54254/2755-2721/92/20241685
- [46] H. Zheng, K. Xu, H. Zhou, Y. Wang, and G. Su, "Medication recommendation system based on natural language processing for patient emotion analysis," *Academic Journal of Science and Technology*, vol. 10, no. 1, pp. 62-68, 2024. Available From: https://doi.org/10.54097/v160aa61
- [47] H. Zheng, J. Wu, R. Song, L. Guo, and Z. Xu, "Predicting financial enterprise stocks and economic data trends using machine learning time series analysis," *Preprints*, 2024. Available From: https://doi.org/10.54254/2755-2721/87/20241562
- [48] S. Wang, H. Zheng, X. Wen, and S. Fu, "Distributed high-performance computing methods for accelerating deep learning training," *Journal of Knowledge Learning and Science Technology*, vol. 3, no. 3, pp. 108-126, 2024. Available From: https://doi.org/10.60087/jklst.v3.n3.p108-126
- [49] K. Xu, H. Zheng, X. Zhan, S. Zhou, and K. Niu, "Evaluation and optimization of intelligent recommendation system performance with cloud resource automation compatibility," *Preprints*, 2024. Available From: https://doi.org/10.54254/2755-2721/87/20241620
- [50] G. Ruan, D. S. Kirschen, H. Zhong, Q. Xia, and C. Kang, "Estimating demand flexibility using Siamese LSTM neural networks," *IEEE Transactions on Power Systems*, vol. 37, no. 3, pp. 2360-2370, 2021. Available From: https://doi.org/10.1109/TPWRS.2021.3110723
- [51] Y. Yang, Z. Tan, H. Yang, G. Ruan, H. Zhong, and F. Liu, "Short-term electricity price forecasting based on graph convolution network and attention mechanism," *IET Renewable Power Generation*, vol. 16, no. 12, pp. 2481-2492, 2022. Available From: https://doi.org/10.1049/rpg2.12413
- [52] Z. Tan, G. Ruan, H. Zhong, and Q. Xia, "Security pre-check method of bilateral trading adapted to independence of power exchange," *Automation of Electric Power Systems*, vol. 42, no. 10, pp. 106-113, 2018. Available From: http://dx.doi.org/10.7500/AEPS20171005002
- [53] G. Ruan, D. Qiu, S. Sivaranjani, A. S. Awad, and G. Strbac, "Data-driven energy management of virtual power plants: A review," *Advances in Applied Energy*, vol. 100170, 2024. Available From: https://doi.org/10.1016/j.adapen.2024.100170
- [54] A. Li, S. Zhuang, T. Yang, W. Lu, and J. Xu, "Optimization of logistics cargo tracking and transportation efficiency based on data science deep learning models," *Preprints*, 2024. Available From: https://doi.org/10.54254/2755-2721/69/20241522
- [55] F. Zhao, H. Li, K. Niu, J. Shi, and R. Song, "Application of deep learning-based intrusion detection system (IDS) in network anomaly traffic detection," *Preprints*, 2024. Available From: https://doi.org/10.54254/2755-2721/86/20241604