Stuff I’ve Built (& Why)
I’ve learned so much from free software, and want to give back by making the things I build available to others.
I would characterize my work as falling into one of three categories:
- improving the research & development experience
- educational resources
- software to support computational science
I believe in making complicated things easier for people to digest, and I often feel that software helps achieve that goal.
On this page, I provide some context for why I built these projects.
If you want, you can skip to the software contributions page.
The research I conducted for my doctoral dissertation involved developing new mathematical methods in tandem with writing code to demonstrate their practical utility.
I kept running into certain problems that undermined my productivity, including but not limited to:
- stuff worked on one machine but not another
- small changes would break large functionality
- missing dependencies (data, images) preventing reproducibility
- trusting that my implementations were “correct” before sharing them
These issues led me to adopt tools like containers for setting up a reproducible computational environment, git for tracking versions of my work, and software packaging (including testing) for ensuring the integrity of my contributions.
I wanted to make sure other people wouldn’t run into the same roadblocks that I did.
Furthermore, I noticed that professors who were trying to adopt more computational work into their curriculum were struggling to help students set up an environment on their personal computers that would enable them to do their coursework.
I took the initiative to solve these problems.
I set up JupyterHub at CU Denver and onboarded much of my math department to a browser-based development workflow.
I maintained this deployment for many years, upgrading it on a fork of a Project Jupyter reference deployment. After continuing to use it for years after graduating (including with clients), I eventually published it as a “template” under the ml-starter-packs organization.
Under that GitHub organization, I also began publishing templates while teaching internal workshops on deploying machine-learning models and exposing their functionality through APIs.
I also adopted MLflow and began advocating for its use in tracking experiments during the research and development lifecycle, so I created a containerized deployment template that gives others a complete working example to get started with.
I discuss software related to my research on the estimation page and link to the associated code there; all of my results are available as open-source Python packages published on PyPI.
However, one of the projects I’m most proud of is my demonstration of Continuous Integration & Deployment principles in publishing my doctoral dissertation, which I turned into a template for others to use.
If there’s one theme that unifies my research experience, it’s estimation.
Most of my time was spent creating new methods to estimate the truth from data that is polluted by observational noise in the measurement process.
I developed a new statistical method to estimate model parameters from noisy data (i.e., “fitting a model”).
I used a measure-theoretic framework (developed by my advisor, Dr. Troy Butler) to solve a stochastic inverse problem, one involving the estimation of a true parameter.
What makes this framework interesting is that you can incorporate your assumptions without biasing your conclusions.
This is accomplished by constructing a specialized regularization term from the solution of an associated stochastic forward problem.
To reiterate, this measure-theoretic framework inverts a distribution by incorporating the knowledge gained by first solving a forward problem.
My specific contributions focused on casting a parameter-estimation problem as a distribution-estimation problem in order to leverage the aforementioned measure-theoretic framework.
I named this new method “MUD”, which stands for Maximal Updated Density.
You can read about it below.
I published my dissertation on GitHub and made all the software available as open-source Python packages.
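In the data-consistent framework that MUD builds on, the updated density is the initial density reweighted by the ratio of the observed density to the predicted density: π_up(λ) = π_init(λ) · π_obs(Q(λ)) / π_pred(Q(λ)). Here π_pred is the push-forward of the initial density through the forward map Q, which is the solution of the stochastic forward problem. The MUD point is the parameter value that maximizes π_up.

The sketch below is a minimal, stdlib-only illustration and rests on simplifying assumptions:
- a scalar quantity of interest;
- a uniform initial density, so π_init drops out of the argmax;
- a Gaussian kernel density estimate for π_pred;
- a Gaussian observed density.

It is not the API of the mud package. The function names and the toy problem are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def mud_point(qoi, initial_samples, observed):
    """Sample-based estimate of the maximal updated density (MUD) point.

    qoi: forward map Q from a parameter value to a scalar quantity of interest.
    initial_samples: draws from a *uniform* initial density (an assumption of
        this sketch, so pi_init is constant and does not affect the argmax).
    observed: a NormalDist standing in for the observed density on Q.
    """
    # Stochastic forward problem: push the initial samples through Q ...
    predictions = [qoi(lam) for lam in initial_samples]
    # ... and estimate the predicted (push-forward) density with a KDE,
    # using the normal-reference rule-of-thumb bandwidth 1.06 * std * n^(-1/5).
    n = len(predictions)
    bandwidth = 1.06 * stdev(predictions) * n ** (-0.2)
    predicted = gaussian_kde(predictions, bandwidth)
    # Updated density up to a constant: pi_obs(Q(lam)) / pi_pred(Q(lam)).
    ratios = [observed.pdf(q) / predicted(q) for q in predictions]
    # The MUD point is the sample where the updated density is largest.
    best = max(range(n), key=ratios.__getitem__)
    return initial_samples[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy problem: Q(lam) = 2 * lam with lam ~ Uniform(0, 1), and a
    # noisy observation of Q centered at 1.0 (so the "true" lam is 0.5).
    samples = [rng.uniform(0.0, 1.0) for _ in range(1000)]
    print(mud_point(lambda lam: 2.0 * lam, samples, NormalDist(mu=1.0, sigma=0.05)))
```

With a non-uniform initial density, you would multiply each ratio by π_init(λ) before taking the maximum.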
Tracking Toxins & Hurricanes
The project that actually motivated my research was at Los Alamos National Laboratory in the Environmental & Earth Sciences Group, working on physics-based modeling for environmental remediation.
The team there was using Julia to simulate the spread of hexavalent chromium in the subsurface by incorporating data collected at groundwater wells.
I used Julia to develop some of the initial work for what eventually became my Python package mud, but it was my encounters with other researchers at the lab that led me to the research question I would spend the next few years answering.
How do we incorporate all of the data, not knowing which of it is helpful?
They were solving a parameter-estimation problem (e.g., “where is the toxin and how is it spreading?”), not a distribution-estimation problem (e.g., “what is the variability in toxin concentrations?”).
The framework I was tasked with using didn’t handle parameter-estimation, as it was designed for an entirely different class of problems.
I would spend the next two years prototyping how to “translate” between these two problems (a common theme in pure mathematics), and several subsequent years refining the approach I developed and proving it rigorously.
I had computational evidence that it worked for over a year before I developed the theory to prove it mathematically.
In this way, the computer was my laboratory: I experimented with different hypotheses for how I could structure an algorithm, gained valuable insights from studying what worked, and then pursued an explanation which would start the cycle all over again.
At the end of my time in academia, I was unable to revisit how well the method worked on the original problem I had set out to solve (subsurface contaminant transport), as the research group’s funding had changed and it had moved on to other projects.
However, the approach has since been tested extensively against recent hurricane storm-surge simulations (using real data collected from buoys) and is proving very promising for producing accurate and precise parameter estimates.
My work was funded by the NSF and DOE, and is in the process of being incorporated into real prediction systems to inform evacuation decisions during hurricanes (though I have only an advisory role at this point).
The Oracle (AKA the Monte-Carlo Calculator)
After graduating, I really wanted to build something, drawing on my academic research, that would be useful to a broader audience.
One of the small components of my work involved stochastic forward problems, which I found myself reaching for when simulating various processes under uncertainty.
In addition, I became adept at working with high-dimensional data.
One technique for visualizing data in a moderate number of dimensions is the parallel coordinates plot (or parallel plot).
I combined these two things and built many versions of what eventually became The Oracle.
The Oracle is a web app I built to help me make quick estimates without keeping them in my head (and then share them!).
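The Oracle's internals aren't described here. However, the kind of stochastic forward problem it answers can be sketched with plain Monte Carlo sampling:
1. Draw the uncertain inputs.
2. Push each draw through a model.
3. Summarize the resulting distribution of outputs.

The sketch below is a hypothetical, stdlib-only illustration, not The Oracle's code. The tasks, their triangular estimates, and the function names are made up for this example.

```python
import random
import statistics


def monte_carlo(model, input_samplers, n=10_000, seed=0):
    """Propagate uncertain inputs through `model` by plain Monte Carlo sampling.

    input_samplers: one callable per model input, each taking a random.Random.
    """
    rng = random.Random(seed)
    return [model(*(draw(rng) for draw in input_samplers)) for _ in range(n)]


def summarize(outputs):
    """Mean plus the 5th, 50th, and 95th percentiles of the simulated outputs."""
    cuts = statistics.quantiles(outputs, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "mean": statistics.fmean(outputs),
        "p05": cuts[0],
        "p50": cuts[9],
        "p95": cuts[18],
    }


if __name__ == "__main__":
    # Hypothetical question: "How many hours will these three tasks take?"
    # Each task is a triangular estimate: triangular(low, high, most_likely).
    tasks = [
        lambda rng: rng.triangular(2, 6, 3),
        lambda rng: rng.triangular(1, 4, 2),
        lambda rng: rng.triangular(5, 12, 8),
    ]
    hours = monte_carlo(lambda a, b, c: a + b + c, tasks)
    print(summarize(hours))
```

If you also keep each draw's inputs alongside its output, every (inputs, output) tuple becomes one line in a parallel plot. This is one way to see which inputs drive the extreme outcomes.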
The software library mud, associated with the Python package of the same name, can be found here:
Some information about it:
- Built with Python
- Shipped as the package mud via PyPI
- Automatic documentation built with Sphinx and deployed with Read the Docs
- GitHub Actions for automated testing + package releases
- Code Coverage Reports
- Can be paired with mud-examples to generate all the figures in my dissertation with one command
I combine art, math, and machines to build wonderful things.
Unfortunately, I have done a bad job of documenting these projects.
Please pardon this Work-In-Progress. I promise to fill this page with content shortly.
My priority was first to document all of the other projects I’ve done.
In short, I have been making digital art (which occasionally becomes physical) with the use of randomness since 2013.
I suppose it was a bit of a harbinger for my academic career: I wanted to explore the impact of probability distributions on some (here, visual) outcome through the use of software built and run on computers.
My art projects were actually how I learned to program: they were how I pushed the boundaries of my knowledge in both hardware and software development (it started with scripting in MATLAB…) and how I found a creative outlet for expression.
To date, my biggest artistic “accomplishment” is that I wrote software that enabled the parametric design of the exterior of a hospital in Denver.
The colors of the glass windows were chosen at random through simulation, in proportions that could be changed interactively and would show up immediately in the architect’s CAD software.
This enabled the automation of part ordering and flexibility in the design (choice of colors and their relative occurrences).
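The hospital facade software itself isn't shown here. The core idea described above can still be sketched in a few lines: assign colors at random in adjustable proportions, then count panels per color to drive part ordering. The grid size, color names, weights, and function names below are hypothetical, and the CAD integration is out of scope.

```python
import random
from collections import Counter


def assign_colors(n_rows, n_cols, proportions, seed=None):
    """Randomly color an n_rows x n_cols grid of window panels.

    proportions: mapping of color name -> relative weight (need not sum to 1).
    Changing the weights and re-running is the "interactive" knob here.
    """
    rng = random.Random(seed)
    colors = list(proportions)
    weights = [proportions[c] for c in colors]
    # Each panel's color is an independent weighted draw.
    return [rng.choices(colors, weights=weights, k=n_cols) for _ in range(n_rows)]


def order_quantities(grid):
    """Count panels per color, e.g. to generate a parts order."""
    return Counter(color for row in grid for color in row)


if __name__ == "__main__":
    # Hypothetical facade: 40 rows x 120 columns, roughly 60/30/10 clear/blue/green.
    facade = assign_colors(40, 120, {"clear": 6, "blue": 3, "green": 1}, seed=42)
    print(order_quantities(facade))
```

Fixing the seed makes a given layout reproducible. Changing only the weights lets a designer explore different color mixes without touching the rest of the pipeline.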