Chinese chips
  • Semiconductor giant NXP plans to adjust its production line
    Recently, it was reported that NXP plans to close four 8-inch wafer fabs: one in Nijmegen, the Netherlands, and three in the United States. Nijmegen, NXP's other key Dutch site besides its Eindhoven headquarters, hosts manufacturing, research and development, testing, technology enablement, and support functions, and plays an important role in new-product introduction.

Behind this move, NXP plans to shift production to 12-inch wafer fabs: even before accounting for edge loss, a 12-inch wafer offers 2.25 times the area of an 8-inch wafer, which means lower fixed and manufacturing costs and higher profits. NXP therefore plans to close the four fabs mentioned above over the next 10 years. In addition, the 12-inch wafer fab being built in Singapore by VSMC, NXP's joint venture with Vanguard International Semiconductor (VIS), will begin mass production in 2027, which will help reduce the risk of NXP's capacity build-out. That fab focuses on mixed-signal, power-management, and analog chips on 130nm to 40nm processes and is expected to reach a monthly output of 55,000 wafers by 2029, becoming an important NXP manufacturing hub in the Asia-Pacific region.

NXP's strategic adjustment is not an isolated case but a microcosm of the global semiconductor industry's upgrade. Explosive demand from AI and data centers is pushing the market toward more efficient, lower-cost manufacturing technologies. According to SEMI's statistics, 82 new 12-inch chip facilities and production lines are expected to be built worldwide between 2023 and 2026, and 12-inch fab capacity will rise to 9.6 million wafers per month by 2026.
According to industry data, 12-inch wafers account for about 65% of total semiconductor wafer shipments and 8-inch wafers for about 20%, with the remainder mainly smaller sizes. Dr. Li Wei, Executive Vice President of Shanghai Silicon Industry Group, believes 2024 may mark the turning point at which 8-inch silicon wafers begin to exit the historical stage, because the integrated-circuit industry tends to eliminate outdated capacity and technology during industry adjustments. Industry analysts see NXP's 12-inch transition as the combined result of technological iteration, market demand, and competition. Despite challenges such as equipment cost and process complexity, NXP is gradually building a production system spanning both advanced and mature processes through joint ventures, foundry outsourcing, and other models, while seeking a new balance among technological breakthroughs, cost control, and regional layout.
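The 2.25× figure quoted above is simple geometry: wafer area scales with the square of the diameter, so a 300 mm (12-inch) wafer has (300/200)² = 2.25 times the gross area of a 200 mm (8-inch) wafer, before any edge loss. A quick sketch of the arithmetic:

```python
import math

def wafer_area_mm2(diameter_mm: float) -> float:
    """Gross area of a circular wafer in mm^2."""
    return math.pi * (diameter_mm / 2) ** 2

# 300 mm (12-inch) vs. 200 mm (8-inch): area ratio is (300/200)^2 = 2.25
ratio = wafer_area_mm2(300) / wafer_area_mm2(200)
print(f"12-inch vs 8-inch gross area ratio: {ratio:.2f}x")  # 2.25x
```

The actual gain in usable dies is somewhat higher than 2.25× because edge loss is proportionally smaller on the larger wafer, which is why the article calls 2.25 a floor.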
    - June 11, 2025
  • 17.2 billion yuan! The semiconductor giant just announced
    Qualcomm agreed on Monday to acquire British semiconductor company Alphawave IP Group. Qualcomm put the enterprise value of the transaction at approximately US$2.4 billion (approximately RMB 17.2 billion). Under the terms of the acquisition, each Alphawave shareholder will receive $2.48 in cash per share, and Alphawave's board of directors unanimously recommended that shareholders vote in favor of the plan. After two months of negotiations, Alphawave accepted Qualcomm's $2.4 billion offer. The price is equivalent to 183 pence per share (approximately RMB 16.07), a 96% premium over the company's closing price of 93.50 pence (approximately RMB 9.09) on March 31, the day before Qualcomm announced its acquisition intention. Alphawave focuses on high-speed semiconductor and connectivity technology for data centers and artificial-intelligence applications, designing and licensing semiconductor technology for data centers, networking, and storage. Its serializer/deserializer (SerDes) technology attracted acquisition interest in early April from both Qualcomm and SoftBank's chip-technology provider Arm, although Arm reportedly withdrew after preliminary discussions with Alphawave. SerDes technology is indispensable in AI applications: chatbots like ChatGPT typically require thousands of chips working together to run smoothly, and as one of Broadcom's core competitive advantages, SerDes has been a key factor in winning AI customers such as Google and OpenAI.
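The premium figure can be checked directly from the two share prices quoted above; a quick sketch:

```python
# Premium of Qualcomm's 183p-per-share offer over Alphawave's 93.50p
# closing price on March 31 (the day before the approach was announced).
offer_pence = 183.0
close_pence = 93.50
premium = offer_pence / close_pence - 1
print(f"Premium over undisturbed price: {premium:.1%}")  # 95.7%, reported as ~96%
```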
    - June 09, 2025
  • TI plans to increase prices for some product lines
    TI plans to raise prices on some product lines, effective June 15th. The average increase is over 10%, with some part numbers rising 40-70% or more. The increases are concentrated in three product categories: low-margin products, older part numbers, and parts whose committed purchase volumes have not been met. This is a global price increase, not limited to the China region; in China it mainly affects low-margin products, involving part numbers such as operational amplifiers, interfaces, and ADCs.
    - June 06, 2025
  • Chipanalog CA-PM4644BA four-channel fully integrated multi-phase DCDC micromodule
    In the era of high-density digital integration, the efficiency, flexibility, and reliability of power supply systems have become core challenges. Chipanalog has launched the CA-PM4644BA, a wide-input-voltage four-channel DC/DC buck converter module that provides high-precision power solutions for FPGA, communications, storage, and similar scenarios, with three advantages: multiple outputs, flexible expansion, and ultra-high integration.

01 Product Overview

The CA-PM4644BA is a step-down DC/DC converter with wide input voltage and 4A output on each of its four channels; the channels can also be used in parallel to provide a maximum output current of 16A. The device comes in a BGA77 package and integrates the switch-control circuitry, power MOSFETs, power inductors, decoupling capacitors, and other components, so only a few external parts (input capacitor, output capacitor, feedback resistors) are required to form a complete four-channel step-down regulator. The input voltage range is 4V~15V, and the output voltage can be set anywhere in the 0.6V~5.5V range by changing the external feedback resistor. In typical applications it serves as a load power supply, delivering high-precision rails such as 1.0V, 1.2V, 1.5V, 1.8V, 3.3V, and 5V to digital circuits across a system (FPGA control circuits, motherboards and CPUs, communications and storage), at up to 4A per channel, or with channels paralleled for continuous output of up to 8A (two-phase) or 16A (four-phase).

02 Features

- Multiple outputs, flexible expansion: one device meets multi-scenario needs. Each of the four channels delivers 4A and can independently power a different load (for example, mixed 1.8V/3.3V/5V rails).
- Parallel output up to 16A: channels can be flexibly paralleled for 8A (two-phase) or 16A (four-phase) high-current output, suitable for high-power rails such as CPU/GPU cores.
- Wide voltage coverage: 4V-15V input, 0.6V-5.5V adjustable output, accurately matching digital power-supply requirements.
- Highly integrated design: the BGA77 package (9mm×15mm×5.01mm) integrates the switching circuits, MOSFETs, inductors, and capacitors, so only a small number of external resistors and capacitors are needed, reducing PCB area and enabling high-density designs.
- Efficient and stable: efficiency up to 95% (5V input, 3.3V/1A output), reducing system power consumption and temperature rise; ±1.5% output-voltage accuracy combined with COT control for fast transient response and low ripple; -40℃~125℃ operating range supporting industrial and automotive environments.
- Multiple protections: built-in input overvoltage protection, output overcurrent/overvoltage protection, soft start, and temperature monitoring to protect the system and equipment under abnormal conditions.

03 Typical Application Scenarios

- FPGA/ASIC power: provides 1.0V/1.2V low-voltage, high-precision rails for multi-core processors and logic units.
- Communications base stations and servers: multi-channel power management for 5G base-station BBUs and data-center storage modules.
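As a rough illustration of how the adjustable output described above works: buck regulators of this type typically set the output with a feedback divider against an internal reference, V_out = V_ref × (1 + R_top/R_bot). The sketch below assumes V_ref = 0.6 V (consistent with the 0.6 V minimum output quoted above); the actual feedback network, reference value, and pin names should be taken from the CA-PM4644BA datasheet.

```python
# Hypothetical feedback-divider calculation for a buck module like the
# CA-PM4644BA. Assumes the common relation V_out = V_ref * (1 + R_top/R_bot)
# with an assumed 0.6 V internal reference -- verify against the datasheet.
V_REF = 0.6  # volts (assumed)

def r_top_for_vout(v_out: float, r_bot: float = 10_000.0) -> float:
    """Upper feedback resistor (ohms) for a target output, given the lower one."""
    if not 0.6 <= v_out <= 5.5:
        raise ValueError("output range is 0.6 V to 5.5 V")
    return r_bot * (v_out / V_REF - 1)

# Common digital rails from the article, with a 10 kΩ lower resistor:
for rail in (1.0, 1.2, 1.8, 3.3, 5.0):
    print(f"{rail:.1f} V rail -> R_top = {r_top_for_vout(rail) / 1000:.2f} kΩ")
```

For a 3.3 V rail this gives R_top = 45 kΩ with R_bot = 10 kΩ; in practice the nearest standard resistor values would be chosen.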
    - June 04, 2025
  • Chip giant, heading to India
      In recent years, against the backdrop of a deglobalization wave and geopolitical maneuvering in the global semiconductor industry, India has been rising at an impressive speed as a core coordinate in the strategic layout of international chip giants. From Renesas Electronics announcing 3nm advanced-process R&D in India, to Texas Instruments placing its smallest MCU design team in Bangalore, to Foxconn partnering with HCL to build a semiconductor packaging base, an "India fever" is unfolding across the entire chain of chip design, manufacturing, and packaging.

India's semiconductor scene, lively now

Renesas: 3nm, a strong entry into India

On May 13, 2025, Japanese semiconductor giant Renesas Electronics launched two 3nm chip design centers in Noida and Bangalore, India. This is India's first 3nm chip design project and marks a crucial step in its semiconductor ambitions. The Renesas 3nm design centers focus on automotive-grade and high-performance computing chips, with mass production planned for the second half of 2027. The project has strong support from the Indian government, and more than 270 academic institutions have received EDA software and learning kits for engineer training. Renesas plans to grow its Indian workforce to 1,000 by the end of 2025 and to collaborate with over 250 academic institutions and startups through its "Semiconductor Program" and the Production Linked Incentive (PLI) scheme. On the manufacturing side, Renesas, together with India's CG Power and Thailand's Stars Microelectronics, has invested 76 billion rupees (approximately 920 million US dollars) in Gujarat to build an outsourced packaging and testing plant focused on defense and space chip packaging, and is working with Tata Group's 28nm wafer fab to build a full design, manufacture, and packaging industry chain.
Renesas is focused on end-to-end capability expansion and hopes to obtain a 50% financial subsidy through cooperation with the Indian government, while integrating deeply into India's talent pipeline: India plans to train 85,000 VLSI engineers and support 100 startups within five years, with the goal of making India Renesas' second-largest global R&D base. The Indian Ministry of Electronics and Information Technology calls the project a "major leap" in the country's semiconductor roadmap, which aims for $109 billion in semiconductor output by 2030, about 10% of the global market.

However, implementation faces many challenges. In manufacturing, the precision requirements of 3nm process equipment are extremely high, and only a few companies such as TSMC and Samsung can mass-produce at that node; Renesas plans to outsource fabrication to TSMC, but geopolitical risk could affect the stability of that arrangement. On the supply chain, India's domestic ecosystem is immature, and raw materials and equipment rely on imports, making costs high and unstable. On the technical level, India has a large pool of engineers but little high-end design experience, with capabilities today mostly at mature nodes; 3nm places extreme demands on transistor density and energy-efficiency optimization, and local IP libraries and design toolchains are lacking, so external support is required. Ambition and challenge coexist: the landing of Renesas' 3nm design centers is important progress, but whether India can overcome manufacturing dependence, supply-chain difficulties, and technology gaps will determine whether it truly earns a place in the global semiconductor landscape.
Foxconn and HCL joint venture: building a semiconductor packaging plant in India

On May 14, 2025, the Indian Cabinet approved a joint venture between Foxconn and HCL Group to build a semiconductor packaging plant, with a total investment of 37.06 billion rupees (approximately 435 million US dollars), located near Jewar Airport in Uttar Pradesh and expected to start production in 2027. The project has two phases: the first focuses on packaging and testing, and the second upgrades to a complete manufacturing plant, ultimately reaching a monthly capacity of 20,000 wafers and 36 million display driver chips. In technology and product planning, the initial stage will provide downstream services for chips fabricated overseas, sidestepping India's weaknesses in domestic manufacturing; the second phase will shift to display-driver-chip manufacturing for phones, automobiles, and other fields, forming a vertically integrated chip, module, and finished-device ecosystem with Foxconn's iPhone assembly plants in India. The project is deeply tied to Apple's supply-chain restructuring: Indian-made iPhones currently account for 20% of US imports, and Apple plans to expand Indian capacity to hedge geopolitical risk. Foxconn not only supports Apple's "Made in India" strategy but also cuts import tariffs on electronic components by 20% through localized chip supply; its panel plant with Innolux will likewise work with the packaging plant to localize the display supply chain. This is the sixth semiconductor manufacturing project approved by India under the "Semiconductor Plan," with capital subsidies, land concessions, and tax exemptions from the central government, plus electricity-tax exemptions and skills-training grants from Uttar Pradesh.
Foxconn holds 40% of the shares and HCL Group 60%. The two sides plan a "technology introduction + local operation" model to build automotive-electronics manufacturing capability, with two more wafer fabs and another packaging plant planned for the future. As of May 2025, the project has completed company registration and site surveys, with infrastructure construction expected to start by year-end. HCL Group is in talks with NXP and Tesla on OEM cooperation for automotive display driver chips.

The project nevertheless faces multiple challenges. India lacks accumulated display-driver-chip technology, and although Foxconn brings panel expertise, chip design relies on external IP licensing. The global market is dominated by Samsung and LG, so Foxconn must hit demanding technical targets to enter the mainstream supply chain. Moreover, India can absorb only about 30% of the planned output domestically; the rest depends on exports, where geopolitical risk may affect order stability. Overall, the cooperation is an important attempt at a "differentiated breakthrough" for India's semiconductor industry: if mass production goes smoothly it could build regional advantages, but many bottlenecks in technology and capacity must still be overcome to leap from packaging and testing to independent design and manufacturing.

TSMC to build its first 12-inch wafer fab in India

In September 2024, TSMC signed a contract with India's Tata Electronics to jointly build India's first 12-inch wafer fab in Gujarat, with a total investment of $11 billion and a monthly capacity of 50,000 wafers; mass production is expected to start in 2026. The project is not only a milestone for semiconductor manufacturing in India but also a key piece of TSMC's global layout.
TSMC is responsible for fab design and construction, transfer of mature process technology (28nm and above), and talent training, while Tata Group provides over 90% of the investment and handles operations. Through a "technology licensing + local operation" model, the two sides will build a full design, manufacturing, and packaging ecosystem. The fab focuses on automotive-grade, panel-driver, and high-speed computing logic chips, targeting electric vehicles, AI, and other fields. Tata Electronics has negotiated OEM cooperation with NXP and Tesla and plans two more fabs while also advancing construction of the Assam packaging plant. For TSMC, technology transfer consolidates its influence in mature processes and secures low-cost market access via India's 760-billion-rupee "Semiconductor Plan" subsidies and the Production Linked Incentive scheme. The Indian government provides up to 50% financial subsidy for the project, along with land concessions and tax reductions, and has folded it into the "Self-Reliant India" strategy, aiming to train 50,000 semiconductor professionals and raise self-sufficiency to 50% by 2030. At present, 30% of the fab's infrastructure is complete, 12 mature-process patents have been transferred, the first batch of 500 trainees has begun, and the Tata-NXP OEM cooperation has entered technology verification. The project still faces many challenges. On the market side, mature-process capacity is in global oversupply, and Indian demand may struggle to absorb 50,000 wafers per month, so OEM orders will be needed to balance capacity.
On policy execution, India's previous $10 billion subsidy plan had little effect due to slow approvals and low participation, and it is doubtful whether this round of subsidies can be delivered on time. The TSMC-Tata cooperation is a bold attempt at "leapfrog development" for India's semiconductor industry; its success depends not only on technology transfer but also on the Indian government's sustained delivery of policy, infrastructure, and market cultivation.

Infineon opens an R&D center in India

On March 24, 2025, Infineon officially opened a Global Competence Center (GCC) in GIFT City, Ahmedabad, Gujarat, its fifth R&D base in India. The center plans to hire 500 engineers over the next five years, focusing on chip design, product software development, information technology, supply-chain management, and system application engineering. Infineon already has over 2,500 employees in India, with Bangalore its largest R&D base. Infineon regards India as a global innovation core, targeting sales of over 1 billion euros there by 2030, focusing closely on India's automotive and industrial chip demand and accelerating its layout with the up-to-50% financial subsidies available under the "Semiconductor Plan." It adopts a "local R&D + outsourced manufacturing" model: the R&D side develops next-generation automotive and industrial-control chips while using Indian engineers to reduce cost, and the manufacturing side has wafer-supply agreements with Indian companies CDIL and Kaynes, with the Indian partners handling packaging, testing, and sales to form a design, packaging, and sales chain.
Infineon currently has no plans to build its own wafer fab in India, though the strategy may be adjusted over the long term as the Indian supply chain matures. It is also actively building a local ecosystem, partnering with universities to train semiconductor talent and deepening government-enterprise cooperation through Gujarat's land and tax incentives, with the aim of capturing over 10% of India's $100 billion semiconductor market by 2032. Infineon's India layout is a key outcome of its "global localization" strategy: seizing the window of India's semiconductor boom and, through R&D centers, local cooperation networks, and policy-resource integration, helping India transform into a manufacturing powerhouse.

Micron is building a packaging and testing plant in India

The plant focuses on wafer dicing, packaging, testing, and module production. The first products are expected in the first half of 2025, and at full capacity the plant will create over 5,000 high-tech jobs and become a large-scale memory-chip packaging and testing base for South Asia. The site forms a 50-kilometer industrial cluster with the Tata Electronics wafer fab and the Renesas packaging and testing project, sketching an initial regional design, manufacturing, packaging, and testing loop. The plant uses mature processes of 40nm and above to serve the Indian, Southeast Asian, and Middle Eastern markets, and can cut Micron's Asia-Pacific packaging and testing costs by 15-20%. As the project progresses, Micron is localizing the supply chain: Korean material suppliers are investing alongside the plant, local Indian companies are cooperating in equipment maintenance, chemical supply, and other fields, and the US government is providing key raw-material support.
Although production has been delayed by six months due to India's infrastructure shortcomings, Micron still sees great potential in the Indian market. The project is a product of the Modi government's "Self-Reliant India" strategy and marks a breakthrough for India in chip manufacturing. As India prepares a new multi-billion-dollar round of semiconductor incentives, Micron is evaluating a phase-two expansion and plans to raise monthly test capacity to 150,000 wafers by 2030, covering more advanced technologies. Micron's layout demonstrates India's determination and potential to accelerate its transformation into a new global hub for chip manufacturing through "policy leverage + international cooperation."

Semiconductor giants gather in India

Beyond these projects, many leading global semiconductor companies are accelerating the construction of strategic footholds in India. Chip giants such as NVIDIA and AMD have taken the lead in establishing large research and design centers there, folding India into their global innovation networks to diversify supply-chain risk and stay close to a rapidly growing consumer-electronics market. NXP, a leader in automotive chips, announced it will double its R&D investment in India to over $1 billion in the coming years; it currently has four design centers and 3,000 employees, plans a second R&D unit focused on 5-nanometer automotive chips at the Greater Noida semiconductor park, and aims to grow total headcount to 6,000. Qualcomm, TI, and other companies have established R&D centers and localized teams, participating deeply in India's development of emerging fields such as 5G communications and the Internet of Things.
ADI has formed a strategic alliance with Tata Group to explore building semiconductor plants in India, focused on customized chips for electric vehicles and network infrastructure, a move that marks international players extending from design into manufacturing. These layouts resonate with Indian government policy: by revising its $10 billion semiconductor incentive plan, relaxing technology requirements, and raising subsidy ratios, India has attracted wafer-fab projects including a collaboration between Israel's Tower Semiconductor and the Adani Group. Global semiconductor-equipment giants are also building strategic footholds in India, participating deeply in reshaping its industrial ecosystem and rounding out the supply-chain layout. Japan's DISCO was the first to set up a legal entity in Bangalore and a service network in Ahmedabad; its initial 10-person team will expand with customer demand. Its layout aims to provide equipment installation and technical support to Micron, Tata Electronics, and other Indian fabs and packaging and testing plants, and it is pre-training Indian marketing staff through its Singapore base. Applied Materials positions India as a global hub for research and supply chain, and its $400 million investment plan launched in 2023 is advancing steadily: a Center of Excellence for artificial intelligence and data science in Chennai will focus on AI applications for chip manufacturing and is expected to create 500 high-end positions, with total headcount planned to grow from 8,000 to 10,000.
Applied Materials is also working with 15 suppliers to explore equipment-component manufacturing bases in India, striving to co-locate verification centers with wafer fabs, shorten R&D cycles, improve material-verification efficiency, and help India build competitiveness in mature-process areas. Lam Research is pursuing a supply-chain localization strategy, announcing a $1.2 billion investment in Karnataka in 2024 to work with the state government on local supply capabilities such as precision components and high-purity gas delivery systems. The company is evaluating Indian suppliers for core components of wafer-manufacturing equipment and plans to fold India into its global network of 3,000 suppliers, achieving localized support in key equipment areas such as etch and thin-film deposition to strengthen regional supply-chain resilience and reduce Asia-Pacific supply risk. Tokyo Electron has established deep cooperation with India's Tata Electronics, supplying equipment for its 12-inch wafer fab in Gujarat and building a dedicated training system to help Tata Electronics engineers master advanced process-equipment operation; it plans to establish an equipment delivery and after-sales support system in India by 2026, with a local engineering team serving Tata's manufacturing needs in automotive electronics, AI chips, and other areas. The giants' layouts resonate with India's industrial policies, with central and state governments providing up to 75% of project-cost subsidies to promote coordinated development of equipment makers and wafer fabs. The influx of international capital confirms the strategic value of the Indian market.
Its appeal lies not only in chip demand expected to exceed $100 billion by 2026, making India the fastest-growing semiconductor market in the world, but also in explosive growth in fields such as automotive electronics and 5G communications, which provide broad application scenarios. Although India's semiconductor industry is still constrained by weak infrastructure and limited technological accumulation, it is gradually moving from being a major chip-design outsourcing country into manufacturing through "policy leverage + international cooperation." With deep participation by leading semiconductor companies, India could form differentiated competitiveness in segments such as automotive electronics and industrial control, becoming an important variable in the restructuring of the global semiconductor supply chain.

The story of India's semiconductor industry

In fact, the development of India's semiconductor industry has been full of twists and opportunities, from early technological breakthroughs through policy adjustments to today's influx of global giants, reflecting one country's unremitting exploration of the field. The starting point can be traced to 1984, when the government-funded Semiconductor Complex Limited (SCL) upgraded its process from 5 microns to 0.8 microns over the 1980s, only one generation behind Intel. A major fire in 1989 destroyed the SCL plant, however, and reconstruction took 8 years, causing India to miss the golden period of semiconductor development. India has since made multiple attempts to attract foreign fab investment, repeatedly thwarted by lagging policy and insufficient resources: in 2005 Intel abandoned an investment over policy gaps, and in 2012 incentive plans stalled over capital and water-resource issues.
Not until December 2021 did the Modi government launch the "India Semiconductor Plan," providing 760 billion rupees (approximately 10 billion US dollars) in incentives, and even then the initial response was limited. The real turning point came in June 2023, when the revised plan raised the financial-support ratio to 50%, covered the entire chain of semiconductor manufacturing, packaging, and testing, and relaxed technical requirements, attracting giants such as Micron and Renesas. The adjustment marked India's shift from slogan-style incentives to substantive industrial support. Under this policy push, India's semiconductor industry has made significant progress. Beyond the manufacturers discussed above, almost all of the world's top semiconductor companies, including Intel, Texas Instruments, Nvidia, and Qualcomm, have design and research centers in India, mostly concentrated in Bangalore, Karnataka, in the south. (Image source: ISM) India has also signed multiple cooperation agreements with the United States, Japan, and the European Union to promote technology transfer and supply-chain diversification. Market data show Indian semiconductor consumption growing from $22 billion in 2019 to an expected $64 billion in 2026, a compound annual growth rate of 16%, with automotive, consumer electronics, and wireless communications the main growth areas.

Why semiconductor giants are investing in India

In my view, there are several reasons why international semiconductor giants are rushing to India. Policy and financial support: India offers among the most generous subsidies in the world, with the central government bearing 50% of project costs and state governments adding 20-25%, so enterprises need contribute only 25-30% of the actual investment, directly lowering the entry threshold.
The revised plan also provides special support for subsectors such as packaging and testing and compound semiconductors, further reducing investment risks for enterprises. Image source: India Semiconductor Mission (ISM). Talent reserve and cost advantage: India has 20% of the world's semiconductor design talent; 25 leading companies such as Intel and Qualcomm have established research and development centers in Bangalore, and Synopsys alone has over 5,500 employees there. Every year 100,000 new engineering graduates are added, providing ample manpower for the industry, and labor costs are only one-third of those in developed countries. Intel, Qualcomm, and other companies have established R&D centers in India, utilizing local talent for chip design and software development; equipment giants such as Applied Materials and Lam Research are expected to train tens of thousands of engineers in the next five years through training programs. Geopolitics and supply-chain restructuring: Amid China-US trade frictions and the trend toward global supply-chain diversification, India has become an important choice for companies to diversify their risks. By setting up factories in India, semiconductor giants can avoid geopolitical risks and stay close to rapidly growing local markets such as automotive electronics and 5G equipment. The Memorandum of Understanding on Semiconductor Supply Chain and Innovation Partnership signed between India and the United States further strengthens its position as a "reliable manufacturing center". Market potential and industry synergy: The Indian semiconductor market is expected to reach $110 billion by 2030, and the government is promoting the "Make in India" and "Digital India" plans to stimulate local demand. 
At the same time, India is building a complete industrial chain through local giants and international cooperation, constructing an ecosystem spanning design, manufacturing, and packaging, attracting upstream and downstream enterprises to cluster, and reducing collaboration costs between enterprises. Meanwhile, Apple's production of iPhones in India can also drive demand for matching chips. Infrastructure upgrades: India is building a "semiconductor city" in Gujarat with supporting infrastructure such as electricity and transportation, and has established a semiconductor manufacturing ecosystem fund for park development and logistics-network construction. In addition, the Indian government is promoting the "Digital India" plan, investing in 11,000 kilometers of highways and smart grids to improve supply-chain efficiency.
    - June 02, 2025
  • The United States demands that the three major EDA giants completely cut off their supply to China
    The United States demands that the three major EDA giants completely cut off their supply to China
    According to the Financial Times, the Bureau of Industry and Security (BIS) of the US Department of Commerce has reportedly issued notices to the top three global electronic design automation (EDA) software suppliers - Synopsys, Cadence, and Siemens EDA - requesting that they cease providing services to Chinese customers. Another industry insider confirmed that the three companies did receive notification from BIS, but the specific content is still unclear.   Insiders have revealed that the US government is evaluating a broader policy to restrict the sale of chip design software to China. As part of the action, BIS has recently sent letters to some leading EDA suppliers requesting a suspension of shipments to Chinese customers. In response, a BIS spokesperson stated: "The US Department of Commerce is reviewing exports involving strategic projects in China. In some cases, existing export licenses may be suspended, or additional license requirements may be imposed during the review period." Sassine Ghazi, CEO of Synopsys, stated in a conference call on May 28 that the company had not yet received formal notification from BIS, but he acknowledged reports of the letter, saying: "We cannot speculate on the potential impact of a notification that has not yet been received."   This is not the first time the United States has cut off EDA supply to China. In 2019, after Huawei was placed on the Entity List, Synopsys, Cadence, and Mentor Graphics (now Siemens EDA) were required to suspend software licensing and updates to Huawei.   In August 2022, the US Department of Commerce further tightened export controls on EDA tools used for advanced-process chip design at 3 nanometers and below, aiming to limit China's development in cutting-edge chip design. 
These ongoing measures indicate that cutting off EDA supply to China is a key link in the US semiconductor strategy, with the core goal of curbing China's progress in high-end chip design and manufacturing.  
    - May 30, 2025
  • IBM To Bring Deca's Fan-Out Packaging Technology To North America
    IBM To Bring Deca's Fan-Out Packaging Technology To North America
    IBM has formed an alliance with Deca Technologies to leverage Deca's MFIT technology to enter the fan-out wafer-level packaging (FOWLP) market, with plans to bring up a new production line at its Bromont factory in Canada in the second half of 2026. On the 22nd, the two parties signed a contract to bring Deca's M-Series and Adaptive Patterning technology into the factory, focusing on MFIT to expand the supply chain for high-performance chiplet integration. Global FOWLP production capacity is concentrated in Asia, while North America is expanding local capacity. IBM now focuses on chip design and packaging. This cooperation aims to capture markets such as AI and also reflects the trend toward regionalization of the global semiconductor industry chain.   IBM and Deca Technologies form an alliance in the field of semiconductor packaging   IBM and Deca Technologies have formed an important alliance in the semiconductor packaging field, which will enable IBM to enter the advanced fan-out wafer-level packaging market. According to the plan, IBM expects to establish a new high-volume production line within its existing packaging factory in Bromont, a city in southern Quebec, Canada. At some point in the future, IBM's new production line is expected to produce advanced packages based on Deca's M-Series Fan-out Interposer Technology (MFIT). MFIT enables a new class of complex multi-chip packages. For many years, IBM has provided packaging and testing services at Bromont, both for its internal needs and for external clients. With Deca's announcement, IBM will expand its packaging capabilities and enter the field of fan-out wafer-level packaging (FOWLP). Basically, after a chip is manufactured in the wafer fab, it is assembled into a package. A package is a small casing that protects one or more chips from harsh operating conditions. 
FOWLP is an advanced packaging form that can integrate complex chips into a package. FOWLP and other advanced package types help improve chip performance. Deca's MFIT is an advanced form of FOWLP in which the latest memory devices, processors, and other chips can be integrated in a 2.5D/3D package. Deca CEO Tim Olson described MFIT as a high-density integration platform for AI and other memory-intensive computing applications. (See Figure 1 below.) Fan-out wafer-level packaging (FOWLP) is an enabling technology, but most, if not all, of the global FOWLP production capacity is located in Asia. Companies such as ASE and TSMC produce fan-out packages across Asia. However, some customers may wish to have chips manufactured and packaged in North America. At some point in the future, those customers may have two new fan-out capacity options in North America. IBM is working to become one of them. In addition, SkyWater, a US wafer foundry, is developing fan-out capacity based on Deca's technology at a factory in the United States.   A Brief History of IBM   IBM is an iconic brand in the computer field with a long history. It also has a long, and sometimes painful, history in the semiconductor industry. IBM's origins can be traced back to 1911, when a company called the Computing-Tabulating-Recording Company (CTR) was established. CTR provided record-keeping and measuring systems. In 1924, CTR was renamed International Business Machines. In 1952, IBM launched its first commercial/scientific computer, called the 701 Electronic Data Processing Machine (EDP). The 701 integrated three electronic technologies: vacuum tubes, magnetic drums, and magnetic tape. Four years later, IBM established a new semiconductor research and development team with the goal of finding a technology to replace outdated vacuum tubes in its systems. 
In the 1960s, IBM developed a newer, more advanced alternative: solid-state electronics based on an emerging technology called the integrated circuit (IC). The company then adopted increasingly advanced chip technology across its computer product lines. In 1966, IBM established its Microelectronics division, which became the company's semiconductor arm. At that time, the company was developing chips for its own systems. In the same year, IBM's Robert Dennard invented DRAM, which is still used today as the main memory in personal computers, smartphones, and other products. Another major event occurred in 1993, when IBM entered the commercial semiconductor market. The company manufactured and sold ASICs, processors, and other chips to external customers. In the 1990s, IBM also entered the foundry business, laying the foundation for competition with companies such as TSMC. IBM offered cutting-edge processes and RF technology to foundry customers and produced the chips in its own wafer fabs. However, in the 2010s, IBM's Microelectronics division ran into difficulties. The division struggled in the commercial semiconductor business, losing millions of dollars, and its foundry business also faltered. In 2014, IBM transferred its Microelectronics division (including its wafer fabs and foundry business) to foundry supplier GlobalFoundries (GF); IBM in fact paid GF approximately $1.5 billion to take the division over.   IBM's current semiconductor/packaging work   Time flies. Today, IBM provides not only systems but also hybrid cloud and consulting services. The company is still involved in the semiconductor industry. It designs processors and other chips but no longer produces them in its own wafer fabs, relying instead on contract manufacturers. In addition, IBM has a large semiconductor research and development center in New York. 
In 2015, the company's R&D organization developed a groundbreaking transistor technology called the nanosheet. Nanosheets are essentially next-generation gate-all-around (GAA) transistors. In addition, IBM has been providing packaging and testing services to customers at Bromont for many years. In fact, the Bromont factory is the largest outsourced semiconductor assembly and test (OSAT) facility in North America. The company provides flip-chip packaging and testing services at the factory. IBM is also developing an assembly process for co-packaged optics. IBM has likewise established an important alliance with Rapidus, a wafer foundry startup headquartered in Japan. Rapidus is developing a 2nm process based on IBM's nanosheet transistor technology. Rapidus and IBM are also jointly developing various methods for producing chiplets. Chiplets are essentially small modular dies; they are electrically connected and then combined in one package to form a brand-new, complex chip. Now, IBM is collaborating with Deca to develop fan-out packaging capabilities. According to the IBM website, the company plans to bring up its FOWLP manufacturing capability in the second half of 2026.   What is fan-out?   FOWLP is not a new technology; it has a long development history. FOWLP gained fame in 2016, when Apple used TSMC's fan-out packaging technology in the iPhone 7. In that package, TSMC stacked DRAM chips on top of the application processor. The processor, named the A10, was designed by Apple and manufactured by TSMC on a 16-nanometer process. Apple has also adopted TSMC's fan-out packaging in subsequent smartphones. FOWLP has a wide range of applications. For example, fan-out packaging can integrate multiple chips and components, such as MEMS, filters, crystals, and passive devices. But the uniqueness of fan-out packaging lies in its ability to deliver small packages with a large number of I/Os. 
In many cases, small chips end up in much larger packages, which takes up too much space. According to ASE, in fan-out packaging the package size is roughly the same as the chip itself: "Fan-out packaging can be defined as a package in which any connection is fanned out from the chip surface to support more external I/Os." Taiwan's ASE, the world's largest OSAT manufacturer, runs a fan-out packaging production line based on Deca's M-Series technology. South Korean OSAT manufacturer Nepes is another Deca licensee. On the R&D front, IBM and SkyWater are developing fan-out packaging based on Deca's technology. Last year, SkyWater and Deca announced a $120 million contract with the US Department of Defense. SkyWater expects to produce fan-out packages at its US factory by the end of this year. Meanwhile, Deca has developed multiple versions of its M-Series fan-out technology. Overall, the M-Series helps customers develop single-chip and multi-chip packages, 3D packages, and chiplet designs. Deca has also developed a manufacturing technique called "Adaptive Patterning" for the M-Series, which is used to produce fine-pitch fan-out packages. Deca's M-Series includes a version called MFIT, an advanced technology that covers double-sided routing, dense 3D interconnects, and embedded bridge dies. It enables customers to develop multi-chip packages that integrate high-bandwidth memory (HBM), processors, and other devices. Deca's Olson said: "MFIT adopts M-Series chip-first fan-out technology, combined with embedded bridge technology, to create a high-density interposer for the chips, on which the processor and memory chips are finally mounted. Adaptive Patterning can achieve extremely high density, with pitches below 10 µm."
He added: "MFIT adopts Deca's second-generation technology, which initially uses a 20 µm pitch for embedded components and plans to move progressively to finer pitches. The flip-chip technology used on the interposer for chip-level devices initially matches the current industry-leading pitch, with finer pitches planned. Adaptive Patterning can be extended to finer pitches while maintaining strong manufacturability through design during the manufacturing process." Fan-out is not the only choice in advanced packaging. Other options include 2.5D and 3D packaging as well as chiplet technology. In summary, there are multiple options on the market, and more innovations are coming.  
    - May 28, 2025
  • First "Made in India" chip produced by semiconductor factories in the northeast region.
    First "Made in India" chip produced by semiconductor factories in the northeast region.
    Indian Prime Minister Narendra Modi announced on Friday (May 23) that India will soon get its first "Made in India" chip from semiconductor factories in the northeast region. He said the region is becoming an important destination for both the energy and semiconductor industries. "Nowadays, the Northeast region is playing an increasingly important role in strengthening the Indian semiconductor ecosystem. India will soon obtain its first 'Made in India' chip from semiconductor factories in the Northeast region," Modi said in his inaugural address at the 2025 Northeast Rising Investors Summit. Last August, Tata Group began building a semiconductor factory in Assam with a total investment of 270 billion rupees. The Prime Minister stated that the semiconductor factory has opened up opportunities for the semiconductor industry and other cutting-edge technologies in the region. Modi said the government is making large-scale investments in the hydropower and solar energy sectors across the northeastern states, with projects worth tens of millions of rupees already allocated.   He stated that investors not only have the opportunity to invest in factories and infrastructure in the Northeast region, but also have a golden opportunity to invest in the area's manufacturing industry. He emphasized that significant investment is needed in solar modules, batteries, energy storage, and research and development, as they represent the future. "The more we invest in the future, the less we rely on other countries," he said. The Prime Minister stated that robust roads, good power infrastructure, and logistics networks are the pillars of all industries; where there is seamless connectivity, trade flourishes. In other words, robust infrastructure is the primary condition and foundation for any development. Modi stated that the trade potential of the Northeast region will double in the next decade. 
At present, trade between India and ASEAN is close to 125 billion US dollars. In the next few years, this trade volume is expected to exceed 200 billion US dollars, and the Northeast region will become a solid bridge toward that goal. He stated that the Northeast region will become a trade gateway to ASEAN. Adani Group Chairman Gautam Adani announced in a speech that the group will invest an additional 500 billion rupees in the Northeast region over the next 10 years, on top of the 500 billion rupees it pledged to invest in Assam three months ago.  
    - May 24, 2025
  • Proposal and Working Principle of Gallium Oxide p-NiO Heterojunction Bidirectional Switching Device
    Proposal and Working Principle of Gallium Oxide p-NiO Heterojunction Bidirectional Switching Device
    Proposal and Working Principle of a Gallium Oxide p-NiO Heterojunction Bidirectional Switching Device   The power p-GaN SJ-BDS (gallium nitride superjunction voltage-withstanding bidirectional switching device) concentrates the surge electrical stress of the original lateral PSJ and lateral p-GaN RESURF structures on the line closest to the edge of the polarization structure, which creates reliability issues under overload surges; the large capacitance of the RESURF field plate further aggravates hot-electron injection from overload surges in this region. So Erbao thought it over and decided not to build multiple field-limiting rings of thin p-GaN layers, all connected to the drain to form a uniform voltage divider, while still considering the RESURF superjunction voltage-withstanding structure.   So, why not try building another bidirectional voltage-withstanding switch?   What name should it get? Power p-GaN SJ-BDS, the gallium nitride superjunction voltage-withstanding bidirectional switching device?   A friend left a message asking: Erbao, didn't you share and discuss new gallium oxide device structures at the Nanjing meeting on Saturday? Can this superjunction bidirectional switching device (SJ-BDS) structure be used for gallium oxide devices?   Of course, Erbao wants to give it a try. If one day a second Shuji Nakamura discovers a new buffer-growth technique that can directly grow quasi-single-crystal-quality gallium oxide epitaxial layers on different substrates such as silicon or sapphire wafers, perhaps gallium oxide materials will one day shine in heteroepitaxial lateral high-voltage devices, and even high-voltage integrated ICs, and even replace GaN or silicon carbide devices in many fields.     
The heterojunction bidirectional switching device composed of gallium oxide (Ga₂O₃) and p-type nickel oxide (p-NiO) is a new type of power electronic device. Its working principle combines the characteristics of wide-bandgap semiconductor materials, heterojunction band engineering, and superjunction structure design to achieve high voltage withstand, low loss, and bidirectionally controllable switching. The following is a detailed analysis of its working principle:
**1. Material and structural characteristics**
- Gallium oxide (Ga₂O₃): an ultra-wide-bandgap semiconductor (bandgap about 4.8-4.9 eV) with an extremely high critical breakdown field (about 8 MV/cm), suitable for high-voltage applications. It is naturally n-type, but lacks stable p-type doping, so a p-type material (such as p-NiO) must be introduced via a heterojunction.
- p-type nickel oxide (p-NiO): a p-type transparent conductive oxide that forms a heterojunction with Ga₂O₃, compensating for the missing p-type Ga₂O₃ and providing hole-injection capability. The band alignment at the heterojunction interface is crucial for carrier transport (it may form a type-II band structure and promote charge separation).
- Superjunction structure: composed of alternating p-NiO and n-Ga₂O₃ regions; charge balance optimizes the lateral electric-field distribution, significantly improving the breakdown voltage while reducing the on-resistance.
**2. Bidirectional switching mechanism**
(1) Blocking state (off state)
- Forward and reverse blocking: under voltage of either polarity, the depletion regions at the heterojunction interface and in the superjunction structure spread the electric field uniformly, avoiding local field crowding.
- The charge balance of the superjunction lets the lateral electric field (parallel to the junction) share the burden of the vertical field (perpendicular to the junction), significantly increasing the breakdown voltage (up to several thousand volts).
(2) Conducting state (on state)
- Bidirectional carrier injection: under forward bias (Ga₂O₃ terminal positive), holes from the p-NiO are injected into the Ga₂O₃ and electrons from the Ga₂O₃ are injected into the p-NiO, lowering the heterojunction barrier and forming bipolar conduction. Under reverse bias (Ga₂O₃ terminal negative), the symmetric design of the superjunction structure likewise forms a conduction path at the p-NiO/Ga₂O₃ interface, enabling bidirectional current flow.
- The high doping concentration in the superjunction further reduces the on-resistance (Ron) and improves efficiency.
(3) Switch triggering mechanism
- Voltage-triggered: when the applied voltage exceeds a threshold, avalanche breakdown or tunneling occurs in the depletion region of the heterojunction, causing carrier multiplication and rapid turn-on of the device.
- Field-control effect: active switching control is achieved by modulating the heterojunction barrier height through a gate (if one is designed in) or the structural electric field.
**3. Key advantages**
- High voltage withstand: the superjunction structure and the high breakdown field of Ga₂O₃ work together to support blocking voltages in the thousands-of-volts range.
- Low conduction loss: the bipolar conduction mechanism (electrons and holes conducting together) reduces Ron and improves energy efficiency.
- Bidirectional symmetry: the structural design ensures consistent electrical characteristics in both the forward and reverse directions, making it suitable for AC circuits and bidirectional power control.
- High-temperature stability: the wide-bandgap materials tolerate high temperatures, suiting harsh-environment applications.
**4. Potential applications**
- High-voltage DC/AC converters, e.g. for smart grids and electric-vehicle charging systems.
- Solid-state circuit breakers: fast-responding, highly reliable circuit protection.
- RF power devices: high-frequency, high-power communication systems.
**5. Challenges and research directions**
- Interface optimization: defects at the Ga₂O₃/p-NiO heterojunction interface may impair carrier transport and need to be mitigated through annealing or interface passivation.
- Thermal management: Ga₂O₃ has low thermal conductivity and must be paired with heat-dissipation designs (such as diamond-substrate integration).
- Process compatibility: heteroepitaxial growth and superjunction fabrication are complex, and low-cost mass-production technologies still need to be developed.
**Summary**
Gallium oxide/p-NiO heterojunction bidirectional switching devices achieve high-voltage bidirectional conduction and fast switching through the synergy of heterojunction band engineering and superjunction charge-balance design, and are expected to break through the performance limits of traditional silicon-based devices and advance the next generation of high-power electronic systems.  
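As a rough sanity check on the "thousands of volts" figure above, here is a back-of-the-envelope parallel-plane estimate using Ga₂O₃'s critical field. This is an idealization: the 10 µm drift length is an assumed value, and real lateral superjunction devices deviate from the one-dimensional triangular-field picture.

```latex
% Ideal non-punch-through (triangular field profile) breakdown estimate:
\[
  V_{BR} \approx \tfrac{1}{2}\, E_c\, W_d
\]
% With E_c \approx 8\ \mathrm{MV/cm} for Ga2O3 and an assumed drift length
% W_d = 10\ \mu\mathrm{m} = 10^{-3}\ \mathrm{cm}:
\[
  V_{BR} \approx \tfrac{1}{2} \times \left(8\times10^{6}\ \tfrac{\mathrm{V}}{\mathrm{cm}}\right)
  \times \left(10^{-3}\ \mathrm{cm}\right) = 4000\ \mathrm{V}
\]
```

A drift region only tens of microns long thus lands in the multi-kilovolt range, consistent with the blocking voltages quoted above.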
    - May 21, 2025
  • Teach you how to design RISC-V CPU
    Teach you how to design RISC-V CPU
    In recent years, RISC-V has attracted global attention. This revolutionary ISA has swept the market with its continuous innovation, countless learning and tool resources, and contributions from the engineering community. The biggest charm of RISC-V is that it is an open-source ISA. In this article, I (Mitu Raj, the author of this article; likewise below) will introduce how to design a RISC-V CPU from scratch. We will walk through defining the specification, designing and refining the architecture, identifying and solving challenges, developing the RTL, implementing the CPU, and testing it in simulation and on an FPGA board.   Start with a Name   It is important to name or brand your idea so that you can keep going until you reach your goal! We are going to build a very simple processor, so I came up with a fancy name, "Pequeno", which means "tiny" in Spanish; the full name is Pequeno RISC-V CPU, aka PQR5. RISC-V has many flavors and extensions of the ISA. We will start with the simplest one, RV32I, aka the 32-bit base integer ISA. This ISA is suitable for building 32-bit CPUs that support integer operations. So, the first spec of Pequeno is as follows: Pequeno is a 32-bit RISC-V CPU that supports the RV32I ISA. RV32I has 37 32-bit base instructions that we plan to implement in Pequeno. Therefore, we have to understand each instruction in depth. It took me a while to fully grasp the ISA. In the process, I learned the complete specification and designed my own assembler, pqr5asm, which was verified against some popular RISC-V assemblers. The six-letter word "RISBUJ" summarizes the instruction types in RV32I. The 37 instructions each belong to one of the following categories: R-type: all integer computation instructions on registers. I-type: all integer computation instructions based on registers and immediate values; also includes JALR and the load instructions. S-type: all store instructions. B-type: all branch instructions. 
U-type: special instructions such as LUI and AUIPC. J-type: jump instructions like JAL. There are 32 general-purpose registers in the RISC-V architecture, x0-x31, all 32 bits wide. Among them, x0, also called zero, is a useful special register: it is hardwired to zero, cannot be written, and always reads as zero. So what is it used for? You can use x0 as a dummy destination to dump results you don't want to read, as a zero operand, or to generate NOP instructions to idle the CPU. Integer computation instructions are ALU instructions that operate on registers and/or 12-bit immediate values. Load/store instructions move data between registers and data memory. Jump/branch instructions transfer program control to different locations. Details of each instruction can be found in the RISC-V specification: RISC-V User-Level ISA v2.2. To learn the ISA, the RISC-V specification document is enough. However, for more clarity, you can study the RTL implementations of different open cores. In addition to the 37 basic instructions, I added 13 pseudo/custom instructions to pqr5asm, extending the ISA to 50 instructions. These instructions are derived from the basic instructions and are intended to simplify the assembly programmer's life... For example: the NOP pseudo-instruction, which expands to ADDI x0, x0, 0 and of course does nothing on the CPU! But it is much simpler to write and easier to read in code. Before we start designing the processor architecture, our expectation is to fully understand how each instruction is encoded in 32-bit binary and what it does.   The RISC-V RV32I assembler PQR5ASM that I developed in Python can be found on my GitHub. You can refer to the Assembler Instruction Manual to write sample assembly code. Compile it and see how it converts to 32-bit binary to consolidate/verify your understanding before moving on to the next step.   
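To make the encoding concrete, here is a minimal sketch in Python (the language pqr5asm itself is written in, though this is an illustrative re-implementation, not pqr5asm's actual source) of packing an I-type instruction per the RV32I format:

```python
# Minimal sketch of RV32I I-type instruction encoding, following the format
# in the RISC-V spec: imm[11:0] | rs1 | funct3 | rd | opcode.

def encode_i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack one I-type instruction into a 32-bit word."""
    assert 0 <= rd < 32 and 0 <= rs1 < 32, "registers are x0-x31"
    assert -2048 <= imm < 2048, "12-bit signed immediate"
    imm12 = imm & 0xFFF  # two's-complement, truncated to 12 bits
    return (imm12 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

OP_IMM = 0b0010011  # opcode for register-immediate ALU ops (ADDI, etc.)

# ADDI x5, x6, -1
print(f"{encode_i_type(OP_IMM, rd=5, funct3=0b000, rs1=6, imm=-1):08x}")  # fff30293

# The canonical NOP expands to ADDI x0, x0, 0
print(f"{encode_i_type(OP_IMM, rd=0, funct3=0b000, rs1=0, imm=0):08x}")  # 00000013
```

Note how the NOP encoding 0x00000013 is nothing but the OP-IMM opcode with every other field zero, which is why it makes such a cheap idle instruction.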
Specifications and Architecture   In this chapter, we define the full specifications and architecture of Pequeno. Last time we simply defined it as a 32-bit CPU. Next, we will go into more detail to get a general idea of the architecture we are going to design. We will design a simple single-core CPU that executes one instruction at a time in the order in which instructions are fetched, but still in a pipelined manner. We will not support the RISC-V privileged specification, because we do not currently plan for our core to support an operating system, nor do we plan to support interrupts. The CPU specifications are as follows: 32-bit CPU, single-issue, single-core. Classic five-stage RISC pipeline. Strictly in-order pipeline. Compliant with the RV32I user-level ISA v2.2. Supports all 37 basic instructions. Separate bus interfaces for instruction and data memory access. (Why? More on that later...) Suitable for bare-metal applications; no support for operating systems and interrupts. (More precisely, a limitation!) As mentioned above, we will support the RV32I ISA, so the CPU only supports integer operations. All registers in the CPU are 32 bits. The address and data buses are also 32 bits. The CPU uses the classic little-endian byte-addressed memory space. Each address corresponds to a byte in the CPU address space: 0x00 - byte[7:0], 0x01 - byte[15:8] ... 32-bit words are accessed through 32-bit-aligned addresses, i.e. addresses that are multiples of 4: 0x00 - word 0, 0x04 - word 1... Pequeno is a single-issue CPU, i.e. it fetches only one instruction from memory at a time and issues it for decoding and execution. A pipelined processor with single issue has a maximum IPC = 1 (or minimum/optimal CPI = 1), i.e. the ultimate goal is to execute at a rate of one instruction per clock cycle. This is theoretically the highest performance that can be achieved. 
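The byte-addressing scheme above can be illustrated with a short sketch, with Python's struct module standing in for the CPU's memory interface:

```python
# Sketch of little-endian byte addressing: the least significant byte of a
# 32-bit word sits at the lowest address, and word accesses use addresses
# that are multiples of 4.
import struct

word = 0xDEADBEEF
mem = struct.pack("<I", word)  # "<I" = little-endian unsigned 32-bit
print([hex(b) for b in mem])   # ['0xef', '0xbe', '0xad', '0xde']

def is_word_aligned(addr: int) -> bool:
    """A 32-bit word address must be a multiple of 4."""
    return addr % 4 == 0

print(is_word_aligned(0x04), is_word_aligned(0x06))  # True False
```

The LSB 0xEF landing at the lowest address is exactly the 0x00 - byte[7:0] mapping described above.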
The classic five-stage RISC pipeline is the basic architecture for understanding any other RISC architecture. This is the most ideal and simple choice for our CPU. The architecture of Pequeno is built around this five-stage pipeline. Let's dive into the underlying concepts. For simplicity, we will not support timers, interrupts, and exceptions in the CPU pipeline. Therefore, CSRs and privilege levels do not need to be implemented either. Therefore, the RISC-V privileged ISA is not included in the current implementation of Pequeno. The simplest way to design a CPU is the non-pipelined way. Let's look at several design approaches for non-pipelined RISC CPUs and understand their drawbacks. Let's assume the classic sequence of steps that a CPU follows to execute instructions: fetch, decode, execute, memory access, and write back. The first design approach is to design the CPU as a finite state machine (FSM) with four or five states and perform all operations sequentially. For example:   But this architecture will seriously affect the instruction execution speed. Because it takes multiple clock cycles to execute an instruction. For example, writing to a register takes 3 clock cycles. In case of load/store instructions, memory latency also increases. This is a bad and primitive way to design a CPU. Let's get rid of it completely! The second approach is that the instruction can be fetched from the instruction memory, decoded, and then executed by fully combinatorial logic. Then, the result of the ALU is written back to the register file. The whole process until the write back can be completed in one clock cycle. Such a CPU is called a single-cycle CPU. If the instruction needs to access data memory, read/write latency should be taken into account. 
If the read/write latency is one clock cycle, a store instruction can still execute in one clock cycle like all other instructions, but a load instruction may require an additional clock cycle because the loaded data must be written back to the register file. The PC generation logic must account for this latency. If the data memory read interface is combinational (asynchronous read), the CPU becomes truly single-cycle for all instructions.   The main disadvantage of this architecture is obviously the long combinational critical path from instruction fetch to the memory/register-file write, which limits timing performance. However, this design approach is simple and suitable for low-end microcontrollers where low clock speed, low power, and low area are required. To achieve higher clock speeds and performance, the sequential instruction processing of the CPU can be split up. Each sub-process is assigned to an independent processing unit, and these processing units are cascaded to form a pipeline. All units work in parallel, each operating on a different part of instruction execution, so multiple instructions can be processed in parallel. This technique for achieving instruction-level parallelism is called instruction pipelining, and the execution pipeline forms the core of a pipelined CPU.   The classic five-stage RISC pipeline has five processing units, also called pipeline stages: Instruction Fetch (IF), Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). The working principle of the pipeline can be represented intuitively as follows:   Each clock cycle, each stage processes a different part of a different instruction. If you look closely, you will see that instruction 1 completes only in the 5th cycle. This delay is called the pipeline latency, and it equals the number of pipeline stages.
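The fill latency and throughput just described can be checked with a quick back-of-the-envelope sketch in Python (a model of an ideal, stall-free pipeline, not the RTL itself):

```python
S = 5  # pipeline stages: IF, ID, EX, MEM, WB

def total_cycles(n_instr, stages=S):
    """An ideal pipeline finishes the first instruction after `stages`
    cycles (the pipeline latency), then retires one instruction per cycle."""
    return stages + (n_instr - 1)

def ipc(n_instr, stages=S):
    """Instructions per cycle; approaches 1 as n_instr grows."""
    return n_instr / total_cycles(n_instr, stages)

def speedup(n_instr, stages=S):
    """Throughput gain vs. a non-pipelined CPU that spends `stages`
    cycles per instruction; approaches `stages` for long programs."""
    return (n_instr * stages) / total_cycles(n_instr, stages)
```

For a single instruction, total_cycles(1) is 5, matching the diagram where instruction 1 completes only in cycle 5.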
After this latency, instruction 2 completes in cycle 6, instruction 3 in cycle 7, and so on... In theory, for N instructions on an S-stage pipeline, the throughput (instructions per cycle, IPC) is: IPC = N / (S + N - 1), which approaches 1 as N grows large. Therefore, a pipelined CPU approaches a rate of one instruction executed per clock cycle. This is the maximum IPC possible in a single-issue processor. By splitting the critical path across multiple pipeline stages, the CPU can now also run at a higher clock speed. Mathematically, this gives a pipelined CPU a multiplicative throughput improvement over an equivalent non-pipelined CPU: Speedup = (N x S) / (S + N - 1), which approaches S for large N.   This is called pipeline speedup. In simple terms, a CPU with an S-stage pipeline can approach S times the throughput of its non-pipelined counterpart. Pipelining generally increases area/power consumption, but the performance gain is worth it. The math assumes that the pipeline never stalls, i.e. that data flows from one stage to the next on every clock cycle. But in real CPUs, pipelines can stall for a variety of reasons, mainly structural/control/data dependencies. For example: register X cannot be read by the Nth instruction because the (N-1)th instruction that modifies X has not yet written it back; this is an example of a data hazard in the pipeline. The Pequeno architecture uses a classic five-stage RISC pipeline, and we will implement a strictly in-order pipeline. In an in-order processor, instructions are fetched, decoded, executed, and completed/committed in the order generated by the compiler. If one instruction stalls, the entire pipeline stalls. In an out-of-order processor, instructions are fetched and decoded in the order generated by the compiler, but execution can proceed in a different order. If one instruction stalls, it does not stall subsequent instructions unless they have dependencies; independent instructions can move ahead. Execution still completes/commits in order (this is how most CPUs work today).
This opens the door to a variety of architectural techniques that significantly improve throughput and performance by reducing clock cycles wasted on stalls and minimizing the insertion of bubbles (what are "bubbles"? Read on…).   Out-of-order processors are fairly complex due to dynamic instruction scheduling, but they are now the de facto pipeline architecture in today's high-performance CPUs.   The five pipeline stages are designed as independent units: Fetch Unit (FU), Decode Unit (DU), Execution Unit (EXU), Memory Access Unit (MACCU), and Write Back Unit (WBU).   Fetch Unit (FU): The first stage of the pipeline; interfaces with the instruction memory. The FU fetches instructions from the instruction memory and sends them to the Decode Unit. The FU may contain instruction buffers, initial branch logic, etc. Decode Unit (DU): The second stage of the pipeline, responsible for decoding instructions from the Fetch Unit (FU). The DU also initiates read accesses to the register file. Packets from the DU and the register file are retimed and sent together to the Execution Unit. Execution Unit (EXU): The third stage of the pipeline; validates and executes all decoded instructions from the DU. Invalid/unsupported instructions are not allowed to continue in the pipeline and become "bubbles". The Arithmetic Logic Unit (ALU) handles all integer arithmetic and logical instructions. The Branch Unit handles jump/branch instructions. The Load/Store Unit handles load/store instructions that require memory access. Memory Access Unit (MACCU): The fourth stage of the pipeline; interfaces with the data memory. The MACCU initiates all memory accesses based on instructions from the EXU. The data memory is an address space that may consist of data RAM, memory-mapped I/O peripherals, bridges, interconnects, etc. Write Back Unit (WBU): The fifth and last stage of the pipeline.
Instructions complete execution here. The WBU writes the result data (or load data) from the EXU/MACCU back to the register file. A valid-ready handshake is implemented between the pipeline stages. This is not so obvious at first glance. Each stage registers a data packet and sends it to the next stage. This packet may carry instruction/control/data information to be used by the next stage or subsequent stages, and it is qualified by a valid signal. If the packet is invalid, it is called a bubble in the pipeline. A bubble is nothing more than a "hole" in the pipeline that simply moves forward without performing any actual operation, similar to a NOP instruction. But don't think bubbles are useless! We will see one use for them later when discussing pipeline hazards. The following table defines bubbles in the Pequeno instruction pipeline.   Each stage can also stall the previous stage by asserting a stall signal. Once stalled, a stage retains its data packet until the stall condition disappears. This signal is simply the inverted ready signal. In an in-order processor, a stall at any stage is effectively a global stall, as it eventually stalls the entire pipeline.   The flush signal is used to flush the pipeline. A flush invalidates, at once, all packets registered by the earlier stages, as they are identified as no longer useful.   For example, when the pipeline has fetched and decoded instructions from the wrong path after a jump/branch instruction, and this is only identified in the execute stage, the pipeline should be flushed and the instruction fetched from the correct path!   Although pipelining significantly improves performance, it also increases the complexity of the CPU architecture. CPU pipelining is always accompanied by its evil twin: pipeline hazards! For now, let's assume that we know nothing about pipeline hazards.
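The per-stage packet register with its valid/stall/flush behavior can be modeled in a few lines of Python (a behavioral sketch, not the RTL):

```python
class StageReg:
    """One pipeline stage register. An invalid packet is a 'bubble':
    it moves through the pipeline without doing any real work."""
    def __init__(self):
        self.packet = None
        self.valid = False   # invalid at reset: the stage holds a bubble

    def clock(self, in_packet, in_valid, stall=False, flush=False):
        if flush:
            self.valid = False   # invalidate: turn the packet into a bubble
        elif stall:
            pass                 # hold the current packet (stall = inverted ready)
        else:
            self.packet, self.valid = in_packet, in_valid
```

A stalled stage holds its packet, which in an in-order pipeline eventually back-pressures everything upstream; a flush invalidates the packet in one cycle.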
We didn't consider hazards when designing the architecture.   Dealing with Pipeline Hazards   In this chapter, we will explore pipeline hazards. Last time, we successfully designed a pipeline architecture for the CPU, but we didn't consider the "evil twin" that comes with pipelines. What impact can pipeline hazards have on the architecture? What architectural changes are needed to mitigate them? Let's go ahead and demystify them! Hazards in the CPU instruction pipeline are dependencies that interfere with normal pipeline execution. When a hazard occurs, an instruction cannot execute in its designated clock cycle, because doing so could produce incorrect results or incorrect control flow. Therefore, the pipeline may be forced to stall until the instruction can execute successfully.   In the above example, the CPU executes instructions in the order generated by the compiler. Assume instruction i2 has a dependency on i1; for example, i2 needs to read a register that is still being modified by the previous instruction i1. Therefore, i2 must wait until i1 writes its result back to the register file; otherwise, stale data would be read from the register file during decode and used by the execute stage. To avoid this data inconsistency, i2 is forced to stall for three clock cycles. The bubbles inserted in the pipeline represent the stall or wait state. i2 is decoded only when i1 has completed. Eventually, i2 completes execution in the 10th clock cycle instead of the 7th: a three-clock-cycle delay is introduced by the stall caused by the data dependency. How does this delay affect CPU performance?   Ideally, we expect the CPU to run at full throughput, i.e. CPI = 1. When the pipeline stalls, the CPI increases and the throughput/performance of the CPU decreases. For a non-ideal CPU: CPI = 1 + (average stall cycles per instruction).   Hazards can arise in the pipeline in various ways.
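The throughput cost of stalls described above can be quantified with a simple model (the stall count is a program-wide average, not a per-instruction constant):

```python
def effective_cpi(stall_cycles_per_instr):
    """CPI of a non-ideal single-issue pipeline: the ideal CPI of 1
    plus the average number of stall cycles charged per instruction."""
    return 1.0 + stall_cycles_per_instr

def relative_performance(cpi):
    """Throughput relative to the ideal CPI = 1 machine."""
    return 1.0 / cpi
```

For instance, if one instruction in ten pays the 3-cycle data-dependency stall from the example, CPI = 1 + 3/10 = 1.3, i.e. about 77% of ideal throughput.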
Pipeline hazards can be divided into three categories: structural hazards, control hazards, and data hazards.   Structural hazards occur due to hardware resource conflicts, i.e. when two stages of the pipeline want to access the same resource. For example: two instructions need to access memory in the same clock cycle.   In the above example, the CPU has only one memory for both instructions and data. The fetch stage accesses the memory every clock cycle to fetch the next instruction. Therefore, the instructions in the fetch stage and the memory access stage conflict whenever the instruction in the memory access stage also needs the memory. This forces the CPU to insert stall cycles: the fetch stage must wait until the instruction in the memory access stage releases the resource (memory). Some ways to mitigate structural hazards: Stall the pipeline until the resource is available. Duplicate the resource so that there is no conflict. Pipeline the resource so that the two instructions occupy different stages of the pipelined resource. Let's analyze the different situations that can cause structural hazards in Pequeno's pipeline and how to solve them. We do not intend to use stalling as an option to mitigate structural hazards! In Pequeno's architecture, we apply the solutions above (other than stalling) to mitigate the various structural hazards. Control hazards are caused by jump/branch instructions, the flow-control instructions of the CPU ISA. When control reaches a jump/branch instruction, the CPU must decide which instruction to fetch next. At this point, the CPU should take one of the following actions: fetch the next instruction at PC+4 (branch not taken), or fetch the instruction at the branch target address (branch taken).
The correctness of that decision can only be determined when the execute stage computes the result of the branch instruction. Depending on whether the branch is taken or not, the branch address (the address the CPU should branch to) is determined. If the earlier decision was wrong, all instructions fetched and decoded into the pipeline in the meantime must be discarded, because they should never be executed at all! This is achieved by flushing the pipeline and fetching the instruction at the branch address on the next clock cycle. Flushing invalidates those instructions, converting them into NOPs, or bubbles. This costs a number of clock cycles as a penalty, called the branch penalty. Therefore, control hazards have the worst impact on CPU performance.   In the above example, i10 completes execution in the 10th clock cycle, but it should have completed in the 7th: three clock cycles were lost because the wrong path was fetched after the branch instruction (i5). When the execute stage identifies the wrong path in the 4th clock cycle, the pipeline must be flushed. How does this affect CPU performance? If a program running on the above CPU contains 30% branch instructions, the CPI becomes: CPI = 1 + 0.3 x 3 = 1.9 ≈ 2. CPU performance is reduced by roughly 50%! To mitigate control hazards, we can adopt some strategies in the architecture... If an instruction is identified as a branch instruction, simply stall the pipeline. This decode logic can be implemented in the fetch stage itself. Once the branch instruction is executed and the branch address is resolved, the next instruction can be fetched and the pipeline resumed. Alternatively, add dedicated branch logic, like branch prediction, in the fetch stage. The essence of branch prediction: prediction logic in the fetch stage guesses whether the branch will be taken, and in the next clock cycle we fetch the guessed instruction.
This instruction is either fetched from PC+4 (predicted not taken) or from the branch target address (predicted taken). Now there are two possibilities: If the prediction turns out to be correct in the execute stage, nothing is done and the pipeline continues. If the prediction turns out to be wrong, the pipeline is flushed and the correct instruction is fetched from the branch address resolved in the execute stage; this incurs the branch penalty. As you can see, branch prediction still incurs a branch penalty on a misprediction, so the design goal should be to reduce the probability of misprediction. CPU performance depends heavily on how "good" the prediction algorithm is. Sophisticated techniques like dynamic branch prediction keep instruction history in order to predict correctly with 80% to 90% probability. To mitigate control hazards in Pequeno, we will implement simple branch prediction logic. More details will be revealed in the upcoming section on the design of the fetch unit.   A data hazard occurs when the execution of an instruction has a data dependency on the result of a previous instruction still being processed in the pipeline. Let's walk through the three types of data hazards with examples. Suppose an instruction i1 writes a result to register x, and the next instruction i2 also writes a result to the same register. Any subsequent instruction in program order should read the result of i2 from x; otherwise, data integrity is compromised. This dependency is called an output dependency and can lead to a WAW (Write-After-Write) data hazard.   Suppose an instruction i1 reads register x, and the next instruction i2 writes a result to the same register. Here, i1 should read the old value of register x, not the result of i2. If i2 writes its result to x before i1 reads it, a data hazard results.
This data dependency is called an anti-dependency and can lead to a WAR (Write-After-Read) data hazard.   Suppose an instruction i1 writes a result to register x, and the next instruction i2 reads the same register. Here, i2 should read the value written by i1 to register x, not the previous value. This dependency is called a true dependency and can lead to a RAW (Read-After-Write) data hazard.   This is the most common and dominant type of data hazard in pipelined CPUs. To mitigate data hazards in in-order CPUs, we can use several techniques: Stall the pipeline when a data dependency is detected (see the first figure); the decode stage waits until the previous instruction has executed. Compiler rescheduling: the compiler reorders the code, scheduling the dependent instruction later to avoid the data hazard. The idea is to avoid stalls without affecting the integrity of the program's control flow, but this is not always possible. The compiler can also insert NOP instructions between two instructions with a data dependency, but this causes stalls, which hurt performance.   Data/operand forwarding: this is the prominent architectural solution for mitigating RAW data hazards in in-order CPUs. Let's analyze the CPU pipeline to understand the principle behind this technique. Suppose two adjacent instructions i1 and i2 have a RAW data dependency because both access register x. The CPU should stall instruction i2 until i1 writes its result back to register x. If the CPU had no stall mechanism, i2 would read a stale value of x in the decode stage in the third clock cycle, and in the fourth clock cycle i2 would execute with that wrong value of x.   If you look closely at the pipeline, the result of i1 is already available in the third clock cycle. Of course, it has not been written back to the register file yet, but it is available at the output of the execute stage.
So if we can somehow detect the data dependency and then "forward" that data to the input of the execute stage, the next instruction can use the forwarded data instead of the (stale) data from the decode stage. That way, the data hazard is mitigated! The idea is this:   This is called data/operand forwarding, or data/operand bypassing. We forward the data forward in time so that subsequent dependent instructions in the pipeline can access the bypassed data and execute correctly in the execute stage.   This idea can be extended to different stages. In a 5-stage pipeline executing instructions in the order i1, i2, ... in, data dependencies may exist between: i1 and i2 - requires a bypass from the execute stage output to the decode stage output. i1 and i3 - requires a bypass from the memory access stage output to the decode stage output. i1 and i4 - requires a bypass from the writeback stage to the decode stage output.   The architectural solution for mitigating RAW data hazards originating from any stage of the pipeline is as follows:   Consider the following scenario:   There is a data dependency between two adjacent instructions i1 and i2, where the first instruction is a load. This is a special case of a data hazard: we cannot execute i2 until the data has been loaded into x1. So, can we still mitigate this data hazard with data forwarding? The load data only becomes available in the memory access stage of i1, and it would have to be forwarded to the decode stage of i2 to prevent the hazard. The requirement is as follows:   Assuming the load data is available in the memory access stage in cycle 4, you would need to "forward" this data back to cycle 3, to the decode stage output of i2 (why cycle 3? Because in cycle 4, i2 has already entered the execute stage!). Essentially, you would be forwarding present data into the past, which is impossible unless your CPU can time-travel! This is not data forwarding, but "data backtracking".
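To make the dependency detection concrete, here is a hypothetical Python sketch of forwarding-source selection for one decode-stage source register. It also flags the load-use case just described, which forwarding alone cannot fix. Names like `ex_rd` are illustrative, not Pequeno's actual signal names:

```python
def forward_source(rs, ex_rd, ex_is_load, mem_rd, wb_rd):
    """Pick where the operand for source register `rs` must come from.
    ex_rd / mem_rd / wb_rd are the destination registers of the in-flight
    instructions in EX, MEM, and WB (None if no register write).
    Returns 'EX', 'MEM', 'WB', 'REGFILE', or 'INTERLOCK'."""
    if rs == 0:                   # x0 is constant zero: never forwarded
        return 'REGFILE'
    if rs == ex_rd:
        # Youngest producer wins. A load's data is not yet available in
        # EX: that is the load-use case, requiring a one-cycle bubble.
        return 'INTERLOCK' if ex_is_load else 'EX'
    if rs == mem_rd:
        return 'MEM'
    if rs == wb_rd:
        return 'WB'
    return 'REGFILE'
```

Note the priority order: the match closest to the decode stage wins, because it carries the newest value of the register.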
Data forwarding can only be done forward in time.   This data hazard is called a pipeline interlock. The only way to solve it is to insert a bubble, stalling the pipeline for one clock cycle when the data dependency is detected.   A NOP instruction (aka bubble) is inserted between i1 and i2. This delays i2 by one cycle, so data forwarding can now forward the load data from the memory access stage to the output of the decode stage. So far, we have only discussed how to mitigate RAW data hazards. What about WAW and WAR hazards? An in-order pipelined implementation of the RISC-V architecture is inherently immune to WAW and WAR hazards! All register writebacks are done in the order in which instructions are issued, so the data written back is always overwritten by any subsequent instruction writing to the same register. Therefore, a WAW hazard never occurs! Writeback is the last stage of the pipeline; by the time a younger instruction writes back, any older instruction has already completed execution using the older data. Therefore, a WAR hazard never occurs! To mitigate RAW data hazards in Pequeno, we will implement data forwarding in hardware, together with pipeline interlock protection. More details will be revealed later, when we design the data forwarding logic.   We have now understood and analyzed the various pipeline hazards in the CPU architecture that can cause incorrect instruction execution, and designed solutions and mechanisms to mitigate them. Let's put together the necessary microarchitecture and finally design the architecture of the Pequeno RISC-V CPU to be free of all types of pipeline hazards!   In the following sections, we will dive into the RTL design of each pipeline stage/functional unit. We will discuss the different microarchitectural decisions and challenges during the design phase.   Fetch Unit   From here, we start to dive into the microarchitecture and RTL design! In this chapter, we will build and design the Fetch Unit (FU) of Pequeno.
The Fetch Unit (FU) is the first stage of the CPU pipeline and interfaces with the instruction memory. It fetches instructions from the instruction memory and sends them to the Decode Unit (DU). As discussed in the previous post on the improved architecture of Pequeno, the FU contains branch prediction logic and flush support.   1 Interfaces   Let's define the interfaces of the Fetch Unit:   2 Instruction Access Interfaces   The core function of the FU is instruction access, and the Instruction Access Interface (I/F) serves this purpose. Instructions are stored in the instruction memory (RAM) during execution. Modern CPUs fetch instructions from a cache instead of directly from the instruction memory. The instruction cache (the primary or L1 cache in computer architecture terms) is closer to the CPU and enables faster instruction access by caching frequently accessed instructions and prefetching larger blocks of nearby instructions. Therefore, there is no need to constantly access the slower main memory (RAM), and most instructions can be accessed quickly, directly from the cache. The CPU does not interface directly with the instruction cache/memory; a cache/memory controller sits between them to control memory access.   It is a good idea to define a standard interface so that any standard instruction memory/cache (IMEM) can easily be plugged into our CPU with little or no glue logic. Let's define two interfaces for instruction access. The Request I/F carries requests from the Fetch Unit (FU) to the instruction memory. The Response I/F carries responses from the instruction memory back to the Fetch Unit (FU). We will define simple valid-ready based request and response interfaces, as these are easy to convert to bus protocols such as APB, AXI, etc. if necessary.
Instruction access requires knowing the address of the instruction in memory. The address requested through the Request I/F is simply the PC generated by the FU. In the FU interface, we use a stall signal instead of a ready signal; it behaves as the inverse of ready. A cache controller usually has a stall signal to stall requests from the processor; this signal is cpu_stall. The response from the memory is the fetched instruction, received through the Response I/F. In addition to the fetched instruction, the response should also contain the corresponding PC. The PC is used as an ID to identify the request to which a response belongs; in other words, it indicates the address of the instruction that was fetched. This is important information required by the next stages of the CPU pipeline (how is it used? We will see soon!). Therefore, the fetched instruction and its PC constitute the response packet to the FU. When the internal pipeline is stalled, the CPU may also need to stall responses from the instruction memory; this signal is mem_stall. At this point, let's define instruction packet = {instruction, PC} in the CPU pipeline. 3 PC Generation Logic The core of the FU is the PC generation logic, which drives the Request I/F. Since we are designing a 32-bit CPU, the PC is generated in increments of 4. After reset, a PC is generated every clock cycle. The reset value of the PC can be hard-coded; this is the address from which the CPU fetches and executes instructions after reset, i.e. the address of the first instruction in memory. PC generation is free-running logic that is stalled only by cpu_stall. The free-running PC can be overridden by the Flush I/F and the internal branch prediction logic.
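A behavioral sketch of the PC generation logic just described: a hard-coded reset value (assumed 0x0 here), +4 increments, a cpu_stall hold, and a flush override:

```python
RESET_PC = 0x0000_0000   # assumed hard-coded reset vector

class PCGen:
    """Free-running PC generator: +4 every cycle, stalled only by
    cpu_stall, overridden by a flush (branch_flush / branch_taken)."""
    def __init__(self, reset_pc=RESET_PC):
        self.pc = reset_pc

    def clock(self, cpu_stall=False, flush=False, flush_pc=0):
        issued = self.pc            # address driven on the Request I/F
        if flush:
            self.pc = flush_pc      # redirect the fetch stream
        elif not cpu_stall:
            self.pc += 4            # next sequential instruction
        return issued
```

A flush takes priority over the sequential increment, so the cycle after a flush the Request I/F carries the redirected address.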
The PC generation algorithm is implemented as described above.   4 Instruction Buffers There are two back-to-back instruction buffers inside the FU. Buffer 1 buffers instructions fetched from the instruction memory; it directly interfaces with the Response I/F. Buffer 2 buffers instructions from Buffer 1 and then sends them to the DU through the DU I/F. These two buffers constitute the instruction pipeline inside the FU.   5 Branch Prediction Logic As discussed above, we must add branch prediction logic to the FU to mitigate control hazards. We will implement a simple, static branch prediction algorithm: Always take unconditional jumps. If the branch is a backward jump, predict taken, because the instruction may be part of the loop-exit check of some do-while loop, in which case taking the branch is more likely to be correct. If the branch is a forward jump, predict not taken, because the instruction may be part of the loop-entry check of some for or while loop, where continuing with the next instruction is more likely to be correct; or it may be part of some if-else statement, in which case we always assume the if condition is true and continue with the next instruction. Theoretically, this bet is correct about 50% of the time.   The instruction packet in Buffer 1 is monitored and analyzed by the branch prediction logic, which generates a branch prediction signal: branch_taken. This signal is then registered and travels in sync with the instruction packet sent to the DU. The branch prediction signal is sent to the DU through the DU interface. 6 DU I/F This is the main interface between the fetch unit and the decode unit for sending the payload.
The payload contains the fetched instruction and branch prediction information.   Since this is an interface between two pipeline stages of the CPU, a valid-ready I/F is implemented. The following signals constitute the DU I/F:   Earlier, we discussed the concepts of stall and flush in the CPU pipeline and their importance, as well as the various scenarios in the Pequeno architecture that require a stall or a flush. Therefore, proper stall and flush logic must be integrated into each pipeline stage of the CPU. It is crucial to determine at which stage a stall or flush is required, and which part of the logic in that stage needs to be stalled or flushed. Some initial thoughts before implementing the stall and flush logic: Pipeline stages may be stalled by externally or internally generated conditions. Pipeline stages may be flushed by externally or internally generated conditions. There is no centralized stall or flush generation logic in Pequeno; each stage may have its own stall and flush generation logic. A stage in the pipeline can only be stalled by the next stage. Any stage stalling eventually back-pressures the upstream pipeline and stalls the entire pipeline. Any stage in the downstream pipeline can flush the stages before it. This is called a pipeline flush, because the entire upstream pipeline is flushed at the same time. In Pequeno, a pipeline flush is required only on branch misses in the Execution Unit (EXU).   The stall logic contains logic to generate local and external stalls. The flush logic contains logic to generate local and pipeline flushes. Local stalls are generated internally and used locally to stall the current stage. External stalls are generated internally and sent out to the upstream stage. Both local and external stalls are generated based on internal conditions and on the external stall from the downstream stage.
A local flush is generated internally and flushes the local stage. An external flush, or pipeline flush, is generated internally and sent out to the upstream pipeline, flushing all upstream stages simultaneously. Both local and external flushes are generated based on internal conditions.   Only the DU can stall the FU externally. When the DU asserts stall, the internal instruction pipeline of the FU (Buffer 1 -> Buffer 2) should stall immediately, and since the FU can no longer accept packets from the IMEM, it should also assert mem_stall to the IMEM. Depending on the pipeline/buffer depth in the IMEM, the PC generation logic may also eventually be stalled by cpu_stall from the IMEM, since the IMEM cannot accept any more requests. There are no internal conditions in the FU that cause a local stall. Only the EXU can flush the FU externally. The EXU initiates branch_flush in the CPU instruction pipeline and passes the address of the next instruction to be fetched after the pipeline flush (branch_pc). The FU provides a Flush I/F to accept the external flush. Buffer 1, Buffer 2, and the PC generation logic in the FU are flushed by branch_flush. The signal branch_taken from the branch prediction logic also acts as a local flush for Buffer 1 and the PC generation logic. If a branch is predicted taken: The next instruction should be fetched from the branch-predicted PC, so the PC generation logic is flushed and the next PC = branch_pc. The next instruction in Buffer 1 is flushed and invalidated, i.e. a NOP/bubble is inserted.   Wondering why Buffer 2 is not flushed by branch_taken? Because the branch instruction in Buffer 1 (which is responsible for generating the flush) should be buffered into Buffer 2 in the next clock cycle and allowed to continue execution in the pipeline. This instruction must not be flushed! The instruction memory pipeline should also be flushed appropriately: the IMEM flush, mem_flush, is generated from branch_flush and branch_taken.
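The static prediction rule from section 5 and the IMEM flush generation just described can be sketched in Python (signal names follow the text; the jump/branch classification and offset are assumed to come from a pre-decode of the Buffer 1 instruction packet):

```python
def predict_taken(is_jump, is_branch, branch_offset):
    """Static prediction: unconditional jumps are always taken;
    conditional branches are predicted taken only when they jump
    backward (negative offset), e.g. a do-while loop-exit check."""
    if is_jump:
        return True
    if is_branch:
        return branch_offset < 0
    return False

def imem_flush(branch_flush, branch_taken):
    """mem_flush: the IMEM pipeline is flushed on an EXU branch flush
    or on a predicted-taken branch detected in the FU."""
    return branch_flush or branch_taken
```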
Let's integrate all the microarchitecture designed so far to complete the architecture of the Fetch Unit.   OK, everyone! We have successfully designed the Fetch Unit of Pequeno. In the next part, we will design the Decode Unit (DU) of Pequeno.   Decode Unit   The Decode Unit (DU) is the second stage of the CPU pipeline. It decodes instructions from the Fetch Unit (FU) and sends them to the Execution Unit (EXU). In addition, it decodes the register addresses and sends them to the register file for register read operations. Let's define the interface of the Decode Unit.   The FU interface is the main interface between the fetch unit and the decode unit for receiving the payload. The payload contains the fetched instruction and branch prediction information. This interface was discussed in the previous section.   The EXU interface is the main interface between the decode unit and the execution unit for sending the payload. The payload includes the decoded instruction, branch prediction information, and decoded data.   The following instruction and branch prediction signals make up the EXU I/F:   The decoded data is the important information that the DU extracts from the fetched instruction and sends to the EXU. Let's understand what information the EXU needs to execute an instruction.   Opcode, funct3, funct7: identify the operation that the EXU must perform on the operands. Operands: depending on the opcode, the operands can be register data (rs1, rs2), the register address for writeback (rdt), or a 12-bit/20-bit immediate value. Instruction type: identifies which operands/immediate must be processed. The decoding process can be tricky, but if you understand the ISA and the instruction structure correctly, you can recognize the patterns of the different instruction types. Recognizing the patterns helps in designing the decode logic in the DU. The following information is decoded and sent to the EXU via the EXU I/F.
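As a sketch of that field extraction, here is how opcode, funct3, funct7, and the register addresses fall out of a 32-bit RV32I instruction word (standard RISC-V encoding; the I-type immediate shown is only one of the several immediate formats):

```python
def decode_fields(instr):
    """Slice the fixed fields out of a 32-bit RV32I instruction word."""
    return {
        "opcode": instr & 0x7F,            # bits [6:0]
        "rd":     (instr >> 7)  & 0x1F,    # bits [11:7]
        "funct3": (instr >> 12) & 0x7,     # bits [14:12]
        "rs1":    (instr >> 15) & 0x1F,    # bits [19:15]
        "rs2":    (instr >> 20) & 0x1F,    # bits [24:20]
        "funct7": (instr >> 25) & 0x7F,    # bits [31:25]
    }

def imm_i(instr):
    """Sign-extended 12-bit I-type immediate, taken from bits [31:20]."""
    imm = (instr >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm
```

For example, 0x002081B3 encodes add x3, x1, x2: opcode 0x33, rd = 3, rs1 = 1, rs2 = 2, funct3 = funct7 = 0.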
The EXU will use this information to demultiplex the data to the appropriate execution subunit and execute the instruction.

For R-type instructions, the source registers rs1 and rs2 must be decoded and read; the data read from these registers are the operands. All general-purpose registers are located in the register file outside the DU. The DU uses the register file interface to send the addresses of rs1 and rs2 to the register file for register access. The data read from the register file should be sent to the EXU in the same clock cycle as the payload.

The register file takes one cycle to read a register. The DU also takes one cycle to register the payload sent to the EXU. Therefore, the source register addresses are decoded directly from the FU instruction packet by combinational logic. This keeps two things in sync: 1) the payload from the DU to the EXU, and 2) the read data from the register file to the EXU.

Only the EXU can stall the DU externally. When the EXU asserts the stall, the internal instruction pipeline of the DU should stall immediately, and the DU should also assert a stall to the FU because it can no longer accept packets from the FU. To keep operation synchronous, the register file should be stalled together with the DU, because both sit at the same stage of the CPU's five-stage pipeline. Therefore, the DU forwards the external stall from the EXU to the register file. There is no condition inside the DU that generates a local stall.

Only the EXU can flush the DU externally. The EXU asserts branch_flush in the CPU instruction pipeline and passes the address of the next instruction to be fetched after the pipeline flush (branch_pc). The DU provides a flush interface (Flush I/F) to accept external flushes, and its internal pipeline is flushed by branch_flush. The branch_flush from the EXU should immediately invalidate the DU instruction presented to the EXU, with a latency of 0 clock cycles.
This avoids a potential control hazard in the EXU in the next clock cycle. In the design of the Fetch Unit, we did not invalidate the FU instruction with 0-cycle latency after receiving branch_flush. This is because the DU is flushed in the next clock cycle anyway, so no control hazard arises in the DU, and there is no need to invalidate the FU instruction. The same idea applies to the instructions from IMEM to the FU.

The above flowchart shows how the instruction packets and branch prediction data from the FU are buffered in the DU stage of the instruction pipeline. Only a single level of buffering is used in the DU.

Let's integrate all the microarchitectures designed so far to complete the architecture of the Decode Unit.

We have now completed: Fetch Unit (FU), Decode Unit (DU). In the next section, we will design the register file of Pequeno.

Register File

In a RISC-V CPU, the register file is a key component: a set of general-purpose registers used to store data during execution. The Pequeno CPU has 32 32-bit general-purpose registers (x0–x31). Register x0 is called the zero register. It is hardwired to the constant value 0, providing a useful default that other instructions can use. For example, to initialize another register to 0, just execute mv x1, x0. Registers x1–x31 are general-purpose registers used to hold intermediate data, addresses, and the results of arithmetic or logical operations. In the CPU architecture designed in the previous article, the register file requires two access interfaces.

The read access interface is used to read the registers at the addresses sent by the DU. Some instructions (such as ADD) require two source register operands, rs1 and rs2. Therefore, the read access interface (I/F) needs two read ports to read two registers at the same time.
The read access should be a single-cycle access so that the read data reaches the EXU in the same clock cycle as the payload from the DU. In this way, the read data and the DU payload stay synchronized in the pipeline.

The write access interface is used to write the execution result back to the register at the address sent by the WBU. Only one destination register, rdt, is written at the end of execution, so one write port is sufficient. Write access should also be single-cycle.

Since the DU and the register file need to be synchronized at the same pipeline stage, they should always be stalled together (why? Check the block diagram in the previous section!). For example, if the DU is stalled, the register file must not output new read data to the EXU, because that would corrupt the pipeline. In this case, the register file should be stalled as well. This can be ensured by inverting the stall signal of the DU to generate the read_enable of the register file. When the stall is asserted, read_enable is driven low and the previous data is held at the read data output, effectively stalling the register file.

Since the register file does not send any instruction packets to the EXU, it does not need any flush logic; flushing only needs to be handled inside the DU.

In summary, the register file is designed with two independent read ports and one write port. Both read and write accesses are single-cycle. The read data is registered. The final architecture is as follows:

We have now completed: Fetch Unit (FU), Decode Unit (DU), register file. Please stay tuned for the next part.
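The register file just described can be sketched as a small behavioral model: 32 x 32-bit registers, two read ports, one write port, x0 hardwired to zero, and registered read data that holds its value when read_enable is deasserted (read_enable being simply the inverted DU stall). Class and signal names are illustrative, not Pequeno's actual RTL.

```python
class RegFile:
    """Behavioral sketch of a 2-read-port, 1-write-port register file."""

    def __init__(self):
        self.regs = [0] * 32     # x0..x31; x0 stays 0 by construction
        self.rdata1 = 0          # registered read data, port 1 (rs1)
        self.rdata2 = 0          # registered read data, port 2 (rs2)

    def tick(self, raddr1, raddr2, read_enable, waddr=None, wdata=0):
        """One clock edge: optional writeback, then the registered read."""
        if waddr is not None and waddr != 0:     # writes to x0 are dropped
            self.regs[waddr] = wdata & 0xFFFFFFFF
        if read_enable:          # when stalled, previous read data is held
            self.rdata1 = self.regs[raddr1]
            self.rdata2 = self.regs[raddr2]

rf = RegFile()
rf.tick(0, 0, True, waddr=1, wdata=42)   # cycle 1: writeback x1 = 42
rf.tick(1, 0, True)                      # cycle 2: read x1 and x0
rf.tick(2, 3, False)                     # stalled cycle: read data is held
```

After the third call, rdata1 still shows the value of x1 (42) and rdata2 still shows x0 (0), even though different addresses were presented, which is exactly the hold behavior the stall mechanism relies on.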
    - May 11, 2025
  • RUNIC launches the first voltage-following LDO RS3011-Q1
The voltage-following low-dropout linear regulator (LDO) is a special type of LDO whose output voltage (VOUT) directly tracks, or "follows", its reference input voltage (VREF); that is, VOUT = VREF.

This type of LDO is usually used to power off-board sensors or other off-board modules in automobiles, ensuring the accuracy and stability of the sensor's operating voltage and protecting on-board components from damage caused by a short to ground or to the battery when a cable breaks.

The RS3011-Q1 is built on an automotive-grade process with generous safety margins. It withstands input voltages from -40 V to 45 V and delivers up to 300 mA of output current, meeting the high-reliability requirements of automotive electronics applications.

Its main features are as follows:
• Input voltage range: -40 V to 45 V
• Adjustable output voltage range: 1.5 V to 40 V
• Quiescent current: 80 µA under light load
• Maximum output current: 300 mA
• Maximum tracking error: 4 mV
• High supply ripple rejection: PSRR of 78 dB @ 100 Hz
• Low dropout voltage: 280 mV @ 200 mA
• Reverse-polarity protection
• Output short-to-ground and output short-to-supply protection
• Inductive-load clamp protection on the output pin
• Over-temperature and over-current protection

Typical application circuit: the voltage-follower LDO is designed mainly for protective circuits in automotive electronics, but it can also be used in other power systems that need a real-time adjustable supply. For example, by having a DAC drive the required output voltage directly onto the ADJ pin, a high-precision, real-time adjustable power supply can be built.
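The DAC-controlled adjustable supply idea can be sketched with the follower LDO's ideal DC transfer: the output tracks the reference, limited by the input minus the dropout voltage. This is an illustrative model only; apart from the 280 mV dropout figure quoted above, the function name and values are assumptions, not taken from the RS3011-Q1 datasheet.

```python
def follower_ldo_vout(vref: float, vin: float, dropout: float = 0.28) -> float:
    """Ideal DC output of a voltage-follower LDO.

    The output follows VREF (within the part's tracking-error spec),
    but can never rise above the input minus the dropout voltage.
    """
    return min(vref, vin - dropout)

# A DAC driving the ADJ/VREF pin to 3.3 V from a 12 V input gives ~3.3 V out;
# with only 5 V of input headroom, the output saturates at VIN - dropout.
vout_normal = follower_ldo_vout(vref=3.3, vin=12.0)
vout_limited = follower_ldo_vout(vref=5.0, vin=5.0)
```

Stepping the DAC code therefore steps VOUT directly, which is what makes the topology usable as a programmable supply.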
The RS3011-Q1 uses an internal back-to-back PMOS topology, providing reverse-polarity protection without external diodes; it is particularly suitable for circuits that need reverse-connection and reverse-current protection. The RS3011-Q1 comes in a standard ESOP8 package, fully compatible with common parts on the market, and is offered in versions with and without an enable pin to suit different system designs. Automotive-grade qualification of the RS3011-Q1 is in progress, and samples are in stock. You are welcome to request samples for comparative testing.
    - May 09, 2025
  • How to choose parameters for operational amplifiers?
Here I will record the meanings of the operational-amplifier parameters I come across. Recently, while using a PGA, I found that there was always a rectangular-wave signal in the output even with the PGA input grounded; after 1000x amplification it was very obvious, and I suspected interference from the power supply. At first, 100 µF and 0.1 µF capacitors were added to both the positive and negative supply inputs, but the effect was not significant. We then planned to put a resistor in series with the supply input. Initially we chose a 1 kΩ resistor, but after powering on, the chip could not work at all. Measuring the supply voltage at the chip's pins, we found it was only slightly above 0 V. Checking the quiescent current in the datasheet, I found it was actually 5 mA. The PGA is powered from 5 V, so at its normal operating current the drop across the 1 kΩ resistor alone would reach 5 V. A 50 Ω resistor was therefore used instead, together with the 100 µF and 0.1 µF capacitors, to form a low-pass filter. With this the chip worked normally and the output ripple was also much smaller.

When choosing an operational amplifier, you should know your design requirements and look them up in the op-amp parameter tables. Generally, a design needs to consider:
1. Supply voltage and supply mode;
2. Package;
3. Feedback type: VFA (voltage-feedback amplifier) or CFA (current-feedback amplifier);
4. Bandwidth;
5. Offset voltage and offset current;
6. Temperature drift;
7. Slew rate;
8. Input impedance;
9. Output drive capability;
10. Quiescent power consumption, i.e. the choice of supply current ICC;
11. Noise;
12. Settling time when driving the load; etc.

Offset voltage and input bias current. In precision circuit design, offset voltage is a key factor. Often-overlooked parameters, such as offset-voltage drift over temperature and voltage noise, must also be weighed. A precision amplifier typically needs an offset voltage below 200 µV, input voltage noise below 6 nV/√Hz, and offset-voltage drift over temperature below 1 µV/°C. Low offset voltage matters in high-gain designs, because the amplified offset can drive the output toward the rails and eat into the output swing. Temperature-sensing and strain-measurement circuits are typical applications for precision amplifiers.

Low input bias current is sometimes essential. The amplifier in an optical receiver must have both low offset voltage and low input bias current: the dark current of a photodiode is on the order of pA, so the amplifier's input bias current must be even smaller. CMOS- and JFET-input amplifiers currently offer the lowest input bias currents. Because I am currently collecting data from a photoelectric system, I pay particular attention to offset voltage and bias current; other requirements would bring other parameters to the fore.

1. Input Offset Voltage (VIO)
The input offset voltage is defined as the compensating voltage that must be applied between the two input terminals to bring the output voltage of the op-amp to zero. It reflects the symmetry of the op-amp's internal circuit: the better the symmetry, the smaller the input offset voltage.
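Two of the numbers above can be checked quickly: the series-resistor supply filter from the debugging story, and the output error that a given offset voltage produces in a gain-of-1000 stage. Component values are the ones quoted in the text; this is an illustrative calculation, not a measurement.

```python
import math

# 1) The debugging story: a 5 mA quiescent current through a 1 kΩ series
#    resistor drops the entire 5 V supply, so the chip saw almost 0 V.
i_q = 5e-3                                   # 5 mA quiescent current
drop_1k = i_q * 1_000                        # 5.0 V dropped -> chip starved
drop_50 = i_q * 50                           # 0.25 V dropped -> acceptable
f_corner = 1 / (2 * math.pi * 50 * 100e-6)   # 50 Ω + 100 µF corner ≈ 31.8 Hz

# 2) Offset voltage in a high-gain stage: a 200 µV input offset amplified
#    by the story's gain of 1000 appears as 0.2 V at the output.
v_os_out = 200e-6 * 1000                     # 0.2 V of output offset
```

The 31.8 Hz corner explains why the 50 Ω + 100 µF combination suppressed the supply ripple, and the 0.2 V figure shows why low offset voltage is demanded in high-gain precision designs.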
Input offset voltage is a very important parameter for op-amps, especially for precision op-amps or DC amplification.

2. Input Offset Voltage Drift
The temperature drift (temperature coefficient) of the input offset voltage is defined as the ratio of the change in input offset voltage to the change in temperature over a given temperature range. This parameter supplements the input offset voltage and makes it easy to estimate the temperature-induced drift of an amplifier circuit over its operating range. The input offset voltage drift of general-purpose op-amps is roughly ±10–20 µV/°C, while that of precision op-amps is below ±1 µV/°C.

3. Input Bias Current (IB)
An op-amp also draws an input bias current, IB, the DC current into the bases of the input transistors of the first amplifier stage. This current sets the DC operating point that keeps the amplifier in its linear range. The input bias current is defined as the average of the bias currents at the two input terminals when the output DC voltage is zero. It matters wherever high input impedance is required, such as high-impedance signal amplification and integrator circuits. The input bias current depends on the manufacturing process: for bipolar processes (the standard silicon process mentioned above) it is between ±10 nA and 1 µA; for FET input stages it is generally below 1 nA.
For bipolar op-amps the value varies considerably from part to part but is almost unaffected by temperature; for MOS-input op-amps it is the gate leakage current, which is tiny but strongly temperature-dependent.

4. Input Offset Current
The input offset current is the mismatch between the bias currents of the two differential inputs, defined as the difference between the bias currents at the two input terminals when the output DC voltage is zero. It, too, reflects the symmetry of the op-amp's internal circuit: the better the symmetry, the smaller the offset current. Input offset current is a very important parameter, especially for precision op-amps or DC amplification. It is typically one-hundredth to one-tenth of the input bias current. It has a significant effect on small-signal precision amplification and DC amplification, especially when large external resistors (say 10 kΩ or more) are used; its effect on accuracy may then exceed that of the input offset voltage. The smaller the input offset current, the smaller the midpoint offset in DC amplification and the easier it is to handle, so for precision op-amps it is an extremely important parameter.

5. Input Impedance
(1) Differential input impedance: defined as the ratio of the change in voltage between the two inputs to the corresponding change in input current when the op-amp operates in its linear region. It includes both input resistance and input capacitance; at low frequencies only the input resistance matters.
(2) Common-mode input impedance: defined as the ratio of the change in common-mode input voltage to the corresponding change in input current when the same signal is applied to both inputs of the op-amp. At low frequencies it appears as a common-mode resistance.

6. Voltage Gain
(1) Open-loop voltage gain: the amplification factor of an op-amp without negative feedback (open-loop), denoted AVOL and listed in datasheets as Large-Signal Voltage Gain. Ideally AVOL is infinite; in practice it ranges from thousands to tens of thousands of times, and it may be specified in dB or in V/mV.
(2) Closed-loop gain: as the name suggests, the amplification factor of the op-amp with feedback applied.

7. Output Voltage Swing
The maximum output voltage amplitude the op-amp can deliver, operating in its linear region, into a specified load at the given supply voltage.

8. Input Voltage Range
(1) Differential input voltage range: the maximum differential input voltage is the largest voltage difference allowed between the two input terminals. Exceeding it may damage the op-amp's input stage.
(2) Common-mode input voltage range: the maximum common-mode input voltage is the common-mode level at which, with the op-amp in its linear region, the common-mode rejection deteriorates significantly; it is usually defined as the common-mode input voltage at which the common-mode rejection ratio drops by 6 dB.
The maximum common-mode input voltage limits the common-mode content allowed in the input signal; this must be considered in circuit design whenever interference is present.

9. Common-Mode Rejection Ratio
The common-mode rejection ratio (CMRR) is defined as the ratio of the differential-mode gain to the common-mode gain of an op-amp operating in its linear region. CMRR is an extremely important parameter for suppressing common-mode interference. Because the ratio is typically tens of thousands or more, raw numbers are inconvenient to compare, so it is usually recorded and compared in decibels. Typical op-amps have a CMRR between 80 and 120 dB.

10. Supply Voltage Rejection Ratio
The supply voltage rejection ratio is defined as the ratio of the change in the op-amp's input offset voltage to the change in supply voltage, with the op-amp in its linear region. It reflects how supply variations affect the output. For DC or small-signal analog amplification, therefore, the op-amp's power supply needs careful treatment; an op-amp with a high common-mode rejection ratio can make up for some of the supply rejection. Also note that with dual supplies, the rejection ratios for the positive and negative rails may differ.

11. Quiescent Power Consumption
The quiescent power of an op-amp at a given supply voltage, usually specified with no load.
Related to this is the quiescent current IQ: the current the op-amp consumes with no load. It is the op-amp's minimum current consumption (excluding sleep modes).

12. Slew Rate
The slew rate of an op-amp is defined, under closed-loop conditions, as the rate of change measured at the output when a large signal (such as a step) is applied to the input. Because the input stage is driven into switching during slewing, the feedback loop is inoperative, so the slew rate is independent of the closed-loop gain. Slew rate is an important parameter for large-signal processing. For general-purpose op-amps, SR ≤ 10 V/µs, while high-speed op-amps have SR > 10 V/µs; current high-speed op-amps reach slew rates up to 6000 V/µs. Use this parameter when selecting op-amps for large-signal work.

13. Gain Bandwidth
(1) Gain-bandwidth product (GBP): the product of gain and bandwidth.
(2) Unity-gain bandwidth: the bandwidth at which the op-amp's gain falls to 1. Unity-gain bandwidth and gain-bandwidth product are similar concepts but not identical. Note that for voltage-feedback op-amps the gain-bandwidth product is a constant, but not for current-feedback op-amps, whose bandwidth and gain are not linearly related.

14. Output Impedance
The output impedance is defined as the ratio of the voltage change to the corresponding current change when a signal voltage is applied to the output terminal of an op-amp operating in its linear region.
At low frequencies it reduces to the op-amp's output resistance. This parameter is tested in the open-loop state.

15. Equivalent Input Noise Voltage
The equivalent input noise voltage is defined as the random AC voltage appearing at the output of a well-shielded op-amp with no signal applied; referred back to the input, it is called the op-amp's input noise voltage (sometimes also expressed as a noise current). For broadband noise, the RMS input noise voltage of an ordinary op-amp is around 10–20 µV.
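Two of the parameters above lend themselves to quick worked examples: the slew rate needed for an undistorted large-signal sine (item 12), using the standard relation SR ≥ 2πf·V_peak, and the integrated noise from a flat input noise density (item 15). The specific frequencies, amplitudes, and densities here are illustrative, not from any particular datasheet.

```python
import math

# Slew rate (item 12): an undistorted sine of peak amplitude V_p at
# frequency f requires SR >= 2*pi*f*V_p.
f, v_peak = 100e3, 5.0
sr_needed = 2 * math.pi * f * v_peak / 1e6   # in V/µs; ≈ 3.14 V/µs here

# Equivalent input noise (item 15): a flat (white) density e_n integrates
# over a noise bandwidth BW as v_n(RMS) = e_n * sqrt(BW).
e_n, bw = 10e-9, 100e3                       # 10 nV/√Hz over 100 kHz
v_n_rms = e_n * math.sqrt(bw)                # ≈ 3.16 µV RMS, input-referred
```

A 5 V-peak sine at 100 kHz thus needs about 3.14 V/µs, already uncomfortable for a general-purpose amp with SR ≤ 10 V/µs once margin is added, while the noise example shows how a seemingly small density accumulates over bandwidth.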
    - April 11, 2025