
조현제 Hyunje Jo
I design infinite logical possibilities on top of finite hardware resources.
Starting from a question about natural numbers and infinity in a library, I have grown into an engineer who seeks the fundamental principles behind technology. I find my greatest motivation in the joy of learning and the correspondence between theory and real-world implementation.
Get in touch
You can reach me in a variety of ways.
PGP Public Key
I use the following PGP public key for secure communication.
-----BEGIN PGP PUBLIC KEY BLOCK-----
mDMEaO0QbhYJKwYBBAHaRw8BAQdAdLjOjbCunshoKBEs/eIEBdJbFXMd24mf6qke
94R7YK20FWFsZXBoXzM2MTEgPGFAaGo0Lml0PohyBBMWCAAaBAsJCAcCFQgCFgEC
GQEFgmjtEG4CngECmwMACgkQGUuxXPxxLpgAFQD+JAGV7xNNm8TeqtkT2FET67Ea
+WiWQUJiFV0CGoRdBu4A/1OxX+wwAtgBQcuVnNT3S4d4BVhnlZM1CzHkdqSqym4C
uDgEaO0QbhIKKwYBBAGXVQEFAQEHQHd2XZBmNM4x9H0vcFH7/9aJIoItDYX2YeCi
aGE6f/gGAwEIB4hhBBgWCAAJBYJo7RBuApsMAAoJEBlLsVz8cS6Y6KQA/2l9u7/g
4TbH9pRDEAIRpe+pJQNv6VQqne2H6p3oz1SSAP4+29EBMDQ4DzDxw+06hSLKjZWs
qzh6p+2OWW7rNfwHAw==
=BSh0
-----END PGP PUBLIC KEY BLOCK-----
소원 경비 비밀 무지개 박사 동화책 진로 경찰 소년 다이어트 군인 정원
오직 자동 주먹 강제 독립 간접 변신 찻잔 음식 구별 이렇게 여름
You can use this key to send me encrypted messages or to verify my signatures.
Experience

NPU Architecture & RDMA Design: Designing high-performance NPU interconnects, with a focus on InfiniBand (IB) RDMA logic. Resolving critical bottlenecks such as contention on direct HBM access, and designing Weighted Priority Arbiters to balance bandwidth between Compute Unit and network traffic.
FPGA Emulation & Prototyping (Lead): Led the emulation of the ‘ATOM’ chip (1.55B gates). Overcame extreme routing congestion on Synopsys HAPS-100 (VU19P) and Xilinx U250 clusters by implementing custom SerDes logic and manual SLR partitioning.
Research & Optimization: First author of the ICCAD 2025 paper on CTDM. Developed a resource-efficient FPGA simulation technique using Chain-based Time-Division Multiplexing, which cut LUT usage by 66% on NVIDIA's NVDLA and enabled faster verification.
System DMA & Memory Architecture: Designed programmable System DMAs supporting 4-AXI Master SIMD operations and architected a 32MB On-Chip Memory system including cache coherency logic.
Co-Simulation Environment: Built a seamless VCS-FPGA co-simulation system to bridge the gap between pre-silicon verification and post-silicon validation.
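
The weighted arbitration idea mentioned above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the actual arbiter design: it uses a credit-based weighted scheme (one common approach), and the class name `WeightedArbiter`, the requester names `compute` and `network`, and the 3:1 weights are all hypothetical.

```python
from typing import Dict, List, Optional


class WeightedArbiter:
    """Behavioral model of a credit-based weighted arbiter (illustrative only)."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("every requester needs a positive weight")
        self.weights = dict(weights)   # credits restored on each refill
        self.credits = dict(weights)   # credits left in the current round
        self.order = list(weights)     # declaration order breaks ties

    def grant(self, requests: List[str]) -> Optional[str]:
        """Return the requester granted this cycle, or None if nobody asks."""
        active = [r for r in requests if r in self.weights]
        if not active:
            return None
        # When every active requester has used up its credits, start a new round.
        if all(self.credits[r] == 0 for r in active):
            self.credits = dict(self.weights)
        # Highest remaining credit wins; earlier declaration wins a tie.
        winner = max(
            active,
            key=lambda r: (self.credits[r], -self.order.index(r)),
        )
        self.credits[winner] -= 1
        return winner


# Hypothetical 3:1 bandwidth split between compute and network traffic.
arbiter = WeightedArbiter({"compute": 3, "network": 1})
grants = [arbiter.grant(["compute", "network"]) for _ in range(8)]
```

With both requesters always active, the grants in this model follow the configured 3:1 ratio over each round of four cycles, while a lone requester is never blocked by an idle one.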

Automotive SoC Verification: Conducted rigorous simulation and performance analysis for Automotive SoCs (Lock-step & Split mode architectures) ensuring compliance with safety standards.
ARM Core Optimization: Optimized AMBA Bus interconnects and performed CPU/GPU simulations for Exynos Modems using ARM Cortex (Ananke) and Mali GPU architectures.
Emulator Acceleration: Migrated simulation environments from software-based models to Cadence Palladium accelerators, significantly reducing verification time for the S9 processor GPU (S5E9810).
DFT & Low-level Debugging: Handled DFT (Design for Testability) using Synopsys tools and performed deep-dive, assembly-level debugging of ARM ELF binaries.
Education
BS in Electrical and Electronics Engineering
Korea University
2011 - 2018
Activities: KUCC (Computer Club) C++ Lecturer

High School
Incheon Science High School (ISHS)
2009 - 2010
Achievement: Early Graduation (2 years) | Informatics & Math Olympiad

Publications & Articles
📄 [Paper] CTDM: Resource-Efficient FPGA-Accelerated Simulation of Large-Scale NPU Designs
Role: First Author | Venue: ICCAD 2025
Abstract: This paper proposes a novel approach to accelerating simulation of large Neural Processing Unit (NPU) designs on FPGAs through Chain-based Time-Division Multiplexing (CTDM) and its automatic compiler.
- Key Innovation: CTDM replaces repeated logic patterns with single logic patterns and register chains, leveraging hardware-predefined shift register primitives. This minimizes logic overhead and routing congestion, reducing FPGA resource utilization more effectively than conventional multiplexer-based TDM.
- Scalability & Compatibility: The automated compiler supports various HDLs (Verilog, VHDL, HLS, Chisel) and diverse hardware ranging from single boards to server-grade simulators like Synopsys Zebu. It also introduces a block interleaving technique to hide inter-FPGA link latency.
- Results: When applied to NVIDIA’s NVDLA, CTDM achieved 66% LUT and 82% FF resource reduction, enabling full deployment on a single AMD U250 FPGA. This resulted in a 3,653x acceleration in simulation time compared to CPU-based VCS.
- Real-World Application: Successfully implemented for the verification of a proprietary 4-die 1024 TFLOPS chiplet using 144 FPGAs on Zebu Server 5.
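
The core idea in the Key Innovation item, one shared copy of a repeated logic pattern plus a register chain, can be sketched as a small behavioral model. This is not the paper's compiler or its hardware mapping: the accumulator logic, the 16-bit width, and the function names are hypothetical, and the model assumes the repeated instances do not communicate with each other.

```python
from collections import deque
from typing import List, Sequence

MASK = 0xFFFF  # hypothetical 16-bit state width


def step_logic(state: int, data_in: int) -> int:
    """Hypothetical repeated logic pattern: a wrapping 16-bit accumulator."""
    return (state + data_in) & MASK


def simulate_replicated(
    init: Sequence[int], inputs_per_cycle: Sequence[Sequence[int]]
) -> List[int]:
    """Reference model: N copies of the logic, each with its own register."""
    states = list(init)
    for inputs in inputs_per_cycle:
        states = [step_logic(s, x) for s, x in zip(states, inputs)]
    return states


def simulate_chained(
    init: Sequence[int], inputs_per_cycle: Sequence[Sequence[int]]
) -> List[int]:
    """Time-multiplexed model: one copy of the logic and a register chain.

    Each original clock cycle takes N fast ticks. On every tick the state at
    the head of the chain goes through the single logic copy and the result is
    shifted into the tail, so after N ticks the chain is back in its original
    order with every instance advanced by one cycle.
    """
    chain = deque(init)
    for inputs in inputs_per_cycle:
        for x in inputs:                      # one fast tick per instance
            head = chain.popleft()            # read the head of the chain
            chain.append(step_logic(head, x))  # shift the new state in
    return list(chain)


init_states = [1, 2, 3]
stimulus = [[10, 20, 30], [1, 1, 1]]
reference = simulate_replicated(init_states, stimulus)
multiplexed = simulate_chained(init_states, stimulus)
```

Both models end in the same state, which is the property that lets a single logic copy stand in for N replicas at the cost of N fast ticks per original cycle. In hardware, the `deque` would correspond to a shift-register chain rather than a multiplexer tree.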
📄 [Paper] A Quad-Chiplet AI SoC with Full-Chip Scalable Mesh Over 16Gb/s UCIe-Advanced Die-to-Die Interface for Large-Scale AI Inferencing
Role: Co-author | Venue: ISSCC 2026
- Source: 2026 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA
Abstract: This paper presents a 4nm-based quad-chiplet LLM accelerator achieving 56.8TPS on LLaMA v3.3 70B. The architecture integrates low-latency UCIe-Advanced die-to-die interfaces, unified mixed-precision compute, and HBM3E with advanced power schemes to sustain the bandwidth and thermal stability required for large-scale AI inferencing.
📄 [Paper] ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications
Role: Co-author | Venue: ISSCC 2024
- Source: 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA
Abstract: ATOMUS is a 5nm AI accelerator optimized for latency-critical applications such as high-frequency trading and SLO-based AI services. It delivers 32TFLOPS/128TOPS with outstanding single-stream responsiveness and low TDP, enabling efficient scale-out for both edge and datacenter/cloud platforms.
📰 [Article] Technical Reliability Issues in the Student Council Mobile Voting System
Role: Reporter (Korea University Newspaper) | Investigative Report | [2015.11.23]
Summary: Authored an investigative article critiquing the mobile voting system used by the university student council. The report exposed significant security vulnerabilities and a lack of technical reliability in the system, raising concerns about potential election fraud and the integrity of the digital voting process.