About Me

I’m a Master’s student in Health Informatics, a joint program between Northeastern University’s Khoury College of Computer Sciences and Bouvé College of Health Sciences.

My Background 🇿🇼

I grew up in Zimbabwe, where my greatest passion was writing, producing, and performing music; I spent most of my time in my home church and my high school’s music department. In 2013, I was accepted to Drexel University to study psychology, and in my junior year I joined the accelerated BS/MS in Psychology program, mentored by Fengqing Zhang of the Quantitative Psychology & Statistics Lab. There, I learned statistical and machine learning, data mining, and programming in R and Python, and applied these skills across a range of mental and behavioural health studies. In 2017, I interned at Salesforce as a data scientist in People Analytics, working on NLP and text-mining problems aimed at improving employee success. In 2018, I graduated from Drexel with my BSc and MSc in Psychology.

From 2018 to 2022, I worked as a neuroimaging data analyst at the Penn Lifespan Informatics and Neuroimaging Center, where I developed and used a range of software and programming tools to process, curate, and analyse neuroimaging data. My role centred on building robust, reproducible, and scalable data-preprocessing pipelines with technologies like Python, R, Bash, and Docker.

My Data Science Interests 🪚

My goal as a research software engineer is to advance the work of PIs, their labs, and their students by providing high-quality software and engineering support. Science is a team sport, and the lone “data science unicorn” is as much a myth as it is a handicap to good work. My philosophy is that successful labs and teams divide specialised skill sets among talented individuals and integrate them across the data science pipeline. I thrive in roles where I can get my hands dirty tackling the nitty-gritty barriers to getting the science done. I love figuring out things like:

  • Documentation: Knowledge is a resource, and everyone in a lab should have access to the “how-tos.” I pride myself on creating clear, concise, and engaging documentation, from polished system references to best practices and tutorials, using tools such as automated documentation generators (roxygen, pkgdown, Sphinx, Quarto, GitHub Pages).

  • Reproducible Data Pipelines: Nothing’s worse than looking back over several hours of work and being unable to figure out how you ended up where you did. I enjoy building reproducible, well-documented pipelines for data transformation and analysis, using tools like Quarto, Jupyter, targets, pytask, testthat, pytest, DataLad, and CI/CD tools like GitHub Actions (a small sketch of this pattern follows this list).

  • Tooling: Your analysis shouldn’t be hampered because you can’t open an IDE or install a library; nothing gives me more joy than tinkering with a finicky installation so that it “just works” for everyone who tries it later.

  • Accessibility: I don’t believe in “gatekeeping” data science — it’s not true that only “real programmers” use vim to write scripts. It’s okay to use tools and other helpful strategies to make programming more accessible to you, including friendly IDEs, libraries, and problem-solving paradigms. This is why I’m a huge advocate of notebook-driven development with tools like nbdev (Python) and fusen (R).
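
To make that concrete, here’s a minimal, hypothetical sketch of the reproducible-pipeline pattern I have in mind, in Python with pytest: each transformation step lives in a plain function with a test right beside it, so the step can be re-run and verified by anyone later. The clean_ages rule is invented purely for illustration.

```python
# test_clean_ages.py — a toy example: one pipeline step, one test.
import pandas as pd


def clean_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with implausible ages (an invented cleaning rule)."""
    return df[df["age"].between(0, 120)].reset_index(drop=True)


def test_clean_ages_drops_out_of_range_rows():
    # Two plausible ages, two impossible ones.
    raw = pd.DataFrame({"age": [25, -3, 200, 47]})
    cleaned = clean_ages(raw)
    assert cleaned["age"].tolist() == [25, 47]
```

Run pytest locally or in a GitHub Actions workflow on every push, and a broken transformation step surfaces immediately rather than months later.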

When I’m Not Doing Data Stuff… 🏡