Last update: March 19, 2026
Shaoyi Peng (speng004@ucr.edu)
Kai He (khe004@ucr.edu)
Sheldon Tan (stan@ece.ucr.edu)
Please contact Sheldon Tan with any questions.
Additional information: https://intra.ece.ucr.edu/~stan/project/glu/glu_proj.htm
BSD 3-Clause License
- docs: related documents and publications for GLU
- src: all source code for GLU
- pyglu: Python package (bindings for the GPU solver)
pyglu exposes the GLU solver as a Python library with an API similar to scipy.sparse.linalg.splu / spsolve.
- NVIDIA CUDA Toolkit (`nvcc`); set `CUDA_HOME` if CUDA is not at `/usr/local/cuda`
- GCC / G++
- Python ≥ 3.9, pybind11 ≥ 2.11, NumPy ≥ 1.22
- SciPy ≥ 1.9 (optional, for passing sparse matrices directly)
```bash
pip install pybind11
pip install -e . --no-build-isolation
```

```python
import scipy.sparse as sp
import numpy as np
import pyglu

A = sp.random(1000, 1000, density=0.01, format='csc') + sp.eye(1000)
b = np.ones(1000)

# Factorize once, solve multiple times
lu = pyglu.splu(A)
x1 = lu.solve(b)
x2 = lu.solve(np.random.rand(1000))

# Or solve directly
x = pyglu.spsolve(A, b)

# Enable diagonal perturbation for near-singular matrices
x = pyglu.spsolve(A, b, perturb=True)
```

`splu` accepts any scipy sparse matrix (any format) or a raw `(data, indices, indptr, shape)` tuple in CSC format. The solver uses single-precision (float32) arithmetic on the GPU; inputs and outputs are float64, with conversion at the boundary.
J1. K. He, S. X.-D. Tan, H. Wang and G. Shi, "GPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis", IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), vol. 24, no. 3, pp. 1140-1150, March 2016.
J2. S. Peng and S. X.-D. Tan, "GLU3.0: Fast GPU-based Parallel Sparse LU Factorization for Circuit Simulation", IEEE Design and Test (accepted in Feb 2020); a preprint is available at http://arxiv.org/abs/1908.00204
- CUDA sync bug that could deadlock kernels (`__syncthreads()` on a divergent path). Fixed in numeric.cu, line 270 (kernel `RL_onecol_updateSubmat`).
- GPU resource/error-handling gaps (unchecked CUDA calls; leaked streams, events, and temporary buffers; unsafe `tmpMem` sizing when free memory < 4 GB). Fixed in numeric.cu, line 347 onward (`LUonDevice`).
- Ownership bug in the preprocess failure path (freeing a caller-owned `SNicsLU*`), plus memory-management cleanup issues. Fixed in preprocess.c, line 102 onward.
- CLI parse bug (`-i` missing-value check off-by-one), plus missing cleanup/return in the main flow. Fixed in lu_cmd.cpp, lines 43 and 131.
- Structural diagonal robustness for the symbolic phase (prevents downstream invalid indexing assumptions). Fixed in symbolic.cc, line 39.
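The `-i` off-by-one is a generic CLI pattern worth spelling out: the flag's value lives at the next index, so that index must be bounds-checked before it is read. The sketch below illustrates the guarded check in Python (the actual fix in lu_cmd.cpp is C++; the function name here is hypothetical).

```python
def parse_input_flag(argv):
    """Return the value following '-i', or None if it is absent.

    Guards against the off-by-one: '-i' as the last token has no value,
    so `i + 1 < len(argv)` must hold before indexing argv[i + 1].
    """
    for i, tok in enumerate(argv):
        if tok == '-i':
            if i + 1 < len(argv):   # a value must exist after the flag
                return argv[i + 1]
            return None             # '-i' given without a value
    return None
```

Without the bounds check, `argv[i + 1]` would raise (or, in C++, read past the argument array) whenever `-i` is the final token.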