Hi! My name is JhihYang Wu and I am a Master's student at the University of Arizona studying Electrical & Computer Engineering. I was born in Taiwan, grew up in Shanghai, and currently live in the US. I am really interested in anything related to artificial intelligence and hope to one day contribute to the Big AI Dream of building robots as smart as humans. Recently, I have also started getting interested in hardware acceleration and computer graphics. I love making friends, so feel free to contact me about anything!
KewlAI
Re-implementations of cool deep learning algorithms.

GitHub Repository
OptiZona
Optics simulation software coded from scratch using OpenGL and C++. Built for my senior design project with ASML, which involved the use of a Shack–Hartmann wavefront sensor.
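As a bit of background, the Shack–Hartmann principle at the heart of the project boils down to this (a tiny illustrative sketch, not OptiZona code): each lenslet focuses its patch of the incoming wavefront to a spot, and the spot's displacement from its reference position is proportional to the local wavefront slope.

```python
def local_slopes(spot_xy, ref_xy, focal_length):
    """Per-lenslet wavefront slopes from measured vs. reference spot centroids.

    A displaced focal spot means the local wavefront patch is tilted; the
    slope is the centroid displacement divided by the lenslet focal length.
    """
    slope_x = (spot_xy[0] - ref_xy[0]) / focal_length
    slope_y = (spot_xy[1] - ref_xy[1]) / focal_length
    # Integrating these slopes across all lenslets reconstructs the wavefront.
    return slope_x, slope_y
```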
miniRT
Small but powerful ray tracer I built from scratch using just C++ and math. Supports quite a few features for generating realistic images, and renders them in a reasonable amount of time.

GitHub Repository
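To give a flavor of the math involved, here is the classic ray-sphere intersection test that sits at the core of most ray tracers (a Python sketch for readability; the repo itself is C++):

```python
import math

def ray_sphere_intersect(origin, direction, center, radius):
    """Return the nearest hit distance t along the ray, or None on a miss.

    Solves ||origin + t*direction - center||^2 = radius^2 for t,
    i.e. the quadratic a*t^2 + b*t + c = 0.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # ray misses the sphere entirely
    sqrt_disc = math.sqrt(disc)
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t > 1e-6:  # nearest intersection in front of the ray origin
            return t
    return None
```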
minigrad
PyTorch clone coded from scratch in Python. Has an autograd engine so it supports both training and inference.

GitHub Repository
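The core of any autograd engine is surprisingly small. Here is a minimal reverse-mode sketch (illustrative only, not the actual minigrad code): each value remembers how it was computed, and backward() walks the graph in reverse topological order applying the chain rule.

```python
class Value:
    """Scalar value that records its computation graph for backprop."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad       # d(out)/d(self) = 1
            other.grad += out.grad      # d(out)/d(other) = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad  # d(out)/d(self) = other
            other.grad += self.data * out.grad  # d(out)/d(other) = self
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, visited = [], set()
        def visit(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x          # dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```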
C-- Compiler
A compiler coded in C that converts C-- code to MIPS assembly.

(GitHub repo is not public because it is UofA CSC 453 coursework)
HLS Compiler
A high-level synthesis tool coded in C++ that turns high-level C-like code into optimized RTL / Verilog. Uses the Force-Directed Scheduling (FDS) algorithm to minimize the number of resources used while meeting a latency constraint.

(GitHub repo is not public because it is UofA ECE 474A/574A coursework)
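Roughly, FDS works like this (a heavily simplified sketch, not the coursework code; real FDS also adds predecessor/successor forces and keeps a separate distribution graph per resource type): each unscheduled operation spreads a uniform probability over its [ASAP, ALAP] time frame, the per-step sums of those probabilities form a distribution graph, and an operation is fixed at the step whose force, i.e. the change in expected resource usage, is lowest.

```python
def distribution_graph(ops, num_steps):
    """ops: {name: (asap, alap)} with inclusive time frames."""
    dg = [0.0] * num_steps
    for asap, alap in ops.values():
        width = alap - asap + 1
        for t in range(asap, alap + 1):
            dg[t] += 1.0 / width   # probability the op executes at step t
    return dg

def self_force(op_frame, step, dg):
    """Force of fixing an op (frame = (asap, alap)) at a single step."""
    asap, alap = op_frame
    width = alap - asap + 1
    force = 0.0
    for t in range(asap, alap + 1):
        old_p = 1.0 / width
        new_p = 1.0 if t == step else 0.0
        force += dg[t] * (new_p - old_p)  # crowded steps raise the force
    return force

def best_step(op_name, ops, num_steps):
    """Greedy choice: the step in the op's frame with the lowest force."""
    dg = distribution_graph(ops, num_steps)
    asap, alap = ops[op_name]
    return min(range(asap, alap + 1),
               key=lambda t: self_force(ops[op_name], t, dg))

# Hypothetical example: three multiplies competing for steps 0..2.
ops = {"m1": (0, 1), "m2": (0, 2), "m3": (1, 2)}
print(best_step("m2", ops, 3))  # -> 0 (steps 0 and 2 are less crowded than 1)
```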
MIPS CPU on FPGA
Five-stage pipelined datapath for the 32-bit MIPS ISA, coded in Verilog. Runs on a Xilinx Artix-7 FPGA.

(GitHub repo is not public because it is UofA ECE 369A coursework)
BookTank
App that I built with some friends in high school for students and parents to buy and sell school textbooks more effectively. As of September 2022, we have attracted more than 1,900 users and hosted more than 1,100 book listings, more than 600 of which have resulted in successful transactions.
CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation
WACV 2025 GeoCV Workshop
Alex Berian, Daniel Brignac, JhihYang Wu, Natnael Daba, Abhijit Mahalanobis
Geospatial imaging leverages data from diverse sensing modalities, such as EO, SAR, and LiDAR, ranging from ground-level drones to satellite views. These heterogeneous inputs offer significant opportunities for scene understanding but present challenges in interpreting geometry accurately, particularly in the absence of precise ground truth data. To address this, we propose CrossModalityDiffusion, a modular framework designed to generate images across different modalities and viewpoints without prior knowledge of scene geometry. CrossModalityDiffusion employs modality-specific encoders that take multiple input images and produce geometry-aware feature volumes that encode scene structure relative to their input camera positions. The space where the feature volumes are placed acts as a common ground for unifying input modalities. These feature volumes are overlapped and rendered into feature images from novel perspectives using volumetric rendering techniques. The rendered feature images are used as conditioning inputs for a modality-specific diffusion model, enabling the synthesis of novel images for the desired output modality. In this paper, we show that jointly training different modules ensures consistent geometric understanding across all modalities within the framework. We validate CrossModalityDiffusion’s capabilities on the synthetic ShapeNet cars dataset, demonstrating its effectiveness in generating accurate and consistent novel views across multiple imaging modalities and perspectives.
[PDF] [Code]
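The volumetric rendering step in the middle of the pipeline can be sketched as NeRF-style alpha compositing of features along camera rays (an illustrative sketch, not the released code; it assumes unit-length ray segments):

```python
import numpy as np

def render_feature_image(densities, features):
    """Composite per-ray feature samples into a 2D feature image.

    densities: (num_rays, num_samples)           predicted density per sample
    features:  (num_rays, num_samples, feat_dim) feature at each sample
    Returns:   (num_rays, feat_dim) volume-rendered features.
    """
    alpha = 1.0 - np.exp(-np.maximum(densities, 0.0))  # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=1)
    weights = alpha * trans                             # (num_rays, num_samples)
    return (weights[..., None] * features).sum(axis=1)  # composited features
```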
ViewAttention: Pay Attention to Where You Look
ICIP 2025 Workshop on Generative AI for World Simulations and Communications
Alex Berian, JhihYang Wu, Daniel Brignac, Natnael Daba, Abhijit Mahalanobis
Novel view synthesis (NVS) has gained significant attention with the advancement of generative modeling techniques. While earlier methods, such as neural radiance fields (NeRF) and Gaussian splatting, have achieved impressive results, the integration of diffusion models has brought NVS closer to generating photorealistic images. Recently, there has been a shift toward few-shot NVS, where only a sparse set of input views is provided to generate novel views from unseen camera poses. Despite this, existing methods typically assume equal importance for all input views relative to the target view, which can lead to suboptimal results, especially when the input views vary significantly in relevance to the target pose. In this work, we focus on improving few-shot NVS by introducing a camera-weighting mechanism that adjusts the importance of source views based on their relevance to the target view. We propose two types of approaches: deterministic weighting schemes, which account for geometric properties like Euclidean distance and angular differences between the source and target views, and a cross-attention-based learning scheme that enables the model to learn optimal view weighting. We apply these techniques to few-shot NVS scenarios, demonstrating that our camera-weighting methods enhance the quality of synthesized views compared to conventional equal-weighting approaches. Our results highlight the potential of view-weighting strategies to improve accuracy and realism in few-shot NVS.
[Code]
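The deterministic schemes boil down to something like the following (an illustrative sketch, not the released code; the exact way distance and angle are combined, and the temperature tau, are assumptions here):

```python
import numpy as np

def view_weights(src_positions, src_dirs, tgt_position, tgt_dir, tau=1.0):
    """Weight N source views by relevance to the target view.

    src_positions: (N, 3) camera centers; src_dirs: (N, 3) unit view directions.
    Closer positions and smaller angular differences get higher weight.
    """
    dist = np.linalg.norm(src_positions - tgt_position, axis=1)  # Euclidean
    angle = np.arccos(np.clip(src_dirs @ tgt_dir, -1.0, 1.0))    # angular diff
    score = -(dist + angle) / tau   # smaller distance/angle -> higher score
    score -= score.max()            # numerical stability for the softmax
    w = np.exp(score)
    return w / w.sum()              # weights sum to 1 over the source views
```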
CONTACT
Email me at jhihyang_wu å† outlook ∂ø† com