Below are all of my projects, each the result of either accomplishing a goal or learning something new in software, web development, statistics, or data.
Optimal Vancouver Brewery Tour
- : R, Google Maps/Places API
- For STAT853 (Stat Computing), our final project was to use a dataset of beer ratings to build a data-driven brewery tour for the Greater Vancouver Area (GVA). My partner and I built functions in R to identify and locate the breweries in the dataset that are in the GVA, developed numerous heuristics to quantify each brewery, and applied Traveling Salesman Problem algorithms to discover optimal routes for various start and end points.
Recommendation Engine for Canadian Stats Graduate Programs
- : R, PDF parsing, LDA/CTM, TF-IDF, text analytics
- For STAT853 (Stat Computing), one project had us parse years of publicly available conference guides that included abstracts from students and professors presenting at SSC. We built a custom PDF parser for the SSC guide format and compiled all of the abstracts into a single corpus. After applying the usual preprocessing steps for a text analytics pipeline (removing stop words, stemming), we used cross validation and LDA to estimate the number of topics in the corpus. After coming up with sensible topic names based on the top n terms in each topic (topics included Bayesian Methods, Financial Analysis, Statistical Computing), we trimmed the document-term matrix using cutoff scores on the TF-IDF value of each word and trained a final LDA model to assign a topic to each abstract. For each school, we generated histograms of its topic distribution and word clouds of the most popular words in the abstracts coming from that school.
NHL Trades and Trust Research
- : Python (Pandas, networkx), R, LaTeX
- A business professor wanted data analysis performed on a compiled dataset of trades and front-office movement between NHL teams. With a deadline of less than two weeks, I wrote software to join the datasets and create new explanatory variables that captured intuitive ideas, tested various log-linear models that handle overdispersion, and compiled all results and conclusions in a LaTeX document.
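As a rough illustration of what checking for overdispersion means, here is a minimal Python sketch. It is not the code from this project (which used Pandas, R, and real trade data). It assumes the simplest possible Poisson model, a single shared mean, and the counts in the usage note are made up. For Poisson data the statistic should be close to 1; values well above 1 suggest overdispersion.

```python
from statistics import mean


def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom.

    Assumes an intercept-only Poisson model, so every observation
    shares the same fitted mean (the sample mean).
    """
    mu = mean(counts)
    n = len(counts)
    chi_square = sum((y - mu) ** 2 / mu for y in counts)
    return chi_square / (n - 1)
```

For example, `pearson_dispersion([0, 0, 0, 8])` is far above 1 because a single large count dominates, while `pearson_dispersion([2, 2, 2, 2])` is 0 because the counts do not vary at all.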
Full Stack Projects
- : http://seecis.com
- This started from my efforts to automate computing plus/minus (+/-) from CIS play-by-play data. All information can be found on the landing page linked above.
NBA Draft Chatter Aggregator
- : http://stevenwu.pythonanywhere.com
- : Python (nltk), Flask, MongoDB, HTML, CSS
- I submitted an (eventually winning) idea for the NSERC CGS-M, and this project was the result of proving out the feasibility of that idea in terms of data collection and organization. The idea was to use the inherent structure of online forums to improve sentiment classification. I decided to use RealGM's NBA Draft forums as the data source for prototyping the data collection and storage.
Applying Evolutionary Algorithms to Neural Network Training
- : MATLAB, LaTeX
- For CMPT726, our final project idea was to try evolutionary algorithms as a replacement for backpropagation in training neural networks. Evolutionary algorithms are known to be good alternatives to descent methods, and there was published work for us to follow. We ended up trying some novel tweaks in using simulated annealing to train the neural network, and we followed up on another paper's suggested future work by implementing genetic algorithms to tune the simulated annealing hyperparameters. Running a series of experiments, we compared the results (in terms of accuracy and runtime) to the baseline of training with stochastic gradient descent, and wrote them up in a 7-page paper following the NIPS style guide.
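To make the idea of training without gradients concrete, here is a minimal Python sketch of simulated annealing over the weights of a tiny network. It is not the project's MATLAB code and includes none of its tweaks. The XOR dataset, the network size, the geometric cooling schedule, and all parameter names are assumptions chosen for illustration.

```python
import math
import random

# Toy dataset (XOR), used only for illustration.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 3
# Each hidden unit has 2 input weights + 1 bias; the output unit has
# N_HIDDEN weights + 1 bias.
N_WEIGHTS = N_HIDDEN * 3 + N_HIDDEN + 1


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    hidden = []
    for h in range(N_HIDDEN):
        w1, w2, bias = weights[3 * h: 3 * h + 3]
        hidden.append(sigmoid(w1 * x[0] + w2 * x[1] + bias))
    out = weights[3 * N_HIDDEN:]
    return sigmoid(sum(w * a for w, a in zip(out, hidden)) + out[-1])


def loss(weights):
    """Mean squared error over the toy dataset."""
    return sum((predict(weights, x) - y) ** 2 for x, y in DATA) / len(DATA)


def anneal(steps=5000, t_start=1.0, t_end=0.001, step_size=0.5, seed=0):
    """Return (best_weights, best_loss, initial_loss)."""
    rng = random.Random(seed)
    current = [rng.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]
    current_loss = loss(current)
    initial_loss = current_loss
    best, best_loss = list(current), current_loss
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = [w + rng.gauss(0.0, step_size) for w in current]
        candidate_loss = loss(candidate)
        delta = candidate_loss - current_loss
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
    return best, best_loss, initial_loss
```

Calling `anneal()` returns the best weights found along with the final and initial loss, so the two losses can be compared directly. No gradient is computed anywhere, which is the point of the comparison against stochastic gradient descent.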
- : https://github.com/stevenwu4/ai-snake
- : Python
- This was for my first fourth-year CS course that required programming an individual project from scratch. We were required to implement the popular snake game (where you steer a snake with your arrow keys to eat food, and each food eaten grows the tail by 1), except the snake had to steer itself using uninformed searches (BFS, DFS) and then A* search. I designed the state space and coded the implementation with a command-line UI, with reporting that allowed the performance of the searches to be compared. The grid dimensions, the number of obstacles that the snake must avoid, the number of iterations before finishing, and the search type are all configurable parameters.
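The difference between an uninformed and an informed search can be sketched as follows. This is a simplified illustration, not the code in the repository. It assumes the state is just the head position on a grid with fixed obstacles (the snake's own body is ignored), and it uses Manhattan distance as the A* heuristic. All function names are hypothetical.

```python
import heapq
from collections import deque

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def neighbors(cell, width, height, blocked):
    x, y = cell
    for dx, dy in MOVES:
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in blocked:
            yield nxt


def rebuild(parent, goal):
    """Walk parent links back from the goal; None if it was never reached."""
    if goal not in parent:
        return None
    path, cell = [], goal
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    return path[::-1]


def bfs(start, goal, width, height, blocked):
    frontier = deque([start])
    parent = {start: None}
    expanded = 0
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        expanded += 1
        for nxt in neighbors(cell, width, height, blocked):
            if nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return rebuild(parent, goal), expanded


def a_star(start, goal, width, height, blocked):
    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    closed = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if cell in closed:
            continue  # stale queue entry
        closed.add(cell)
        for nxt in neighbors(cell, width, height, blocked):
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return rebuild(parent, goal), len(closed)
```

Both functions return the path from head to food plus the number of cells expanded, which is one simple way to report how the searches compare on the same grid.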
- : http://cqads.carleton.ca/blog
- : https://github.com/stevenwu4/CFL
- : Python, MongoDB
- While working for Carleton's CQADS, a colleague interested in sports statistics wanted to get his hands on a CFL dataset. Since none was available, I scraped it for him and used my code as the main resource for blog posts that the CQADS bought for their blog.
- : https://github.com/stevenwu4/astrotweetbot
- : Python, Flask, Twitter API, Openshift
- This was the proud result of my first hackathon, NASA Hackathon 2014, with two teammates (one from University of Waterloo, the other from Carleton University). The specific challenge we addressed was the Alert! Alert! challenge, which asked for a central place for information on sky phenomena to observe. We chose to centralize the information via a Twitter bot that provides a single account to tweet at if you want to observe something cool in the sky from where you are. Not only does this allow for subscription and notification of sky events, but it also posts the best sky/space photo from Reddit every day.
Maple Cryptography Output Parser
- : https://github.com/stevenwu4/maple-crypto-parser
- : Python
- My girlfriend was a research assistant for Dr. Steven Wang over one summer, and was tasked with basically finding patterns in a maze of output. The hypothesis was about factoring cyclotomic polynomials over finite fields, and Maple was used to factor specific polynomials. She was manually surveying each factored polynomial, sorting it, and recording the number of factors for each degree. I decided to write a program that would do all of that work for her, to help save her time and allow her to get to the interesting bits of her research.
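The counting step can be sketched in a few lines of Python. This is an illustration, not the code in the repository. It assumes each factorization arrives as a single line of text such as `(x+1)*(x^2+x+1)^2`, in the variable `x`, and that repeated factors are counted with their multiplicity. The real Maple output format may differ, and the function names are hypothetical.

```python
import re
from collections import Counter


def split_factors(expr):
    """Split a product on '*' characters that sit outside parentheses."""
    factors, current, depth = [], [], 0
    for ch in expr.replace(" ", ""):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "*" and depth == 0:
            factors.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        factors.append("".join(current))
    return factors


def polynomial_degree(poly):
    """Highest power of x appearing in the polynomial text."""
    degree = 0
    for match in re.finditer(r"x(?:\^(\d+))?", poly):
        degree = max(degree, int(match.group(1)) if match.group(1) else 1)
    return degree


def degree_counts(expr):
    """Map each degree to the number of factors with that degree."""
    counts = Counter()
    for factor in split_factors(expr):
        bare = re.fullmatch(r"x(?:\^(\d+))?", factor)
        wrapped = re.fullmatch(r"\((.+)\)(?:\^(\d+))?", factor)
        if bare:
            # A bare x^k is the degree-1 factor x repeated k times.
            counts[1] += int(bare.group(1) or 1)
        elif wrapped:
            counts[polynomial_degree(wrapped.group(1))] += int(wrapped.group(2) or 1)
        else:
            counts[polynomial_degree(factor)] += 1  # e.g. a constant
    return counts
```

For example, `degree_counts("(x+1)*(x^2+x+1)^2*(x^4+x^3+1)")` reports one factor of degree 1, two of degree 2, and one of degree 4, which is the kind of tally that was previously recorded by hand.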