Projects

Data Projects

  • 2019, Traffic incident detection (anomaly detection)
    • Used time series analysis, neural networks and tensors to detect traffic incidents from sensor count data. The data include counts from several sensors at 5-minute granularity, together with each sensor's longitude and latitude, and cover a full year. The counts show clear seasonality.
  • 2019, ASHRAE - Great Energy Predictor III (predictive modeling, time series)
    • Used feature engineering, deep learning, gradient boosting, time series analysis and ensembling to predict energy cost. The data include energy cost time series for 1000+ properties, each with 4 meters. Additional information includes property attributes (area, floors, primary use, etc.) and weather data for different regions (temperature, precipitation, cloud coverage, wind speed, etc.). The dataset has 60M rows.
  • 2019, Autonomous driving object classification (classification)
    • Used KITTI image data to classify objects into pedestrians, cars, trucks, etc. Convolutional neural networks were used to pre-process the images and extract features; artificial neural networks with sparse group lasso regularization were then used to classify the objects.
  • 2018, MRI Alzheimer's disease prediction (classification)
    • Used an artificial neural network with sparse group lasso regularization to classify subjects as healthy or as Alzheimer's disease patients based on their brain MRI images. The data are available from ADNI and were pre-processed with steps such as segmentation, resolution reduction and removal of inactive regions.
  • 2018, Spam email classification (classification)
    • Used a group lasso regularized generalized additive model (GAM) to classify spam emails based on word features (key words, capital letters, etc.).
  • 2017, Cats and dogs recognizer (image classification)
    • Used convolutional neural networks, transfer learning and random forests to classify images as cat or dog.
  • 2017, Prostate cancer classification (classification, gene data)
    • Used gene expression data to classify prostate cancer status.
  • 2016, Alcohol detection (image classification)
    • Used images pulled from Pinterest to build a convolutional neural network classifier that detects whether an image contains an alcoholic drink.
  • 2014, Data analysis for handwritten patterns (Gaussian mixture model, EM algorithm)
    • Used Gaussian mixture models to analyze and generate handwritten digits.
  • 2014, Design of experiments for factors that influence the performance of paper planes (Design of experiments)
    • Used a 2^k factorial design to study how nose shape, paper material, wing shape and fold shape influence the performance of paper planes. The project included making the paper planes, running the experiments, data collection, data analysis and presentation.
  • 2013, Data analysis for the Television, School and Family Smoking Prevention project (Mixed model, random effects)
    • Used linear mixed models and random effects models to analyze school performance in relation to several factors. Statistical tests were also used in this project.
  • 2013, Data analysis for Tianjin's city economy (Regression, clustering, factor analysis)
    • Used regression, clustering, factor analysis and other multivariate analysis methods to analyze the economy and household expenditure in Tianjin, China.
  • 2012, Ship schedule optimization on the Big Long River (Optimization)
    • Used optimization models to schedule the ships' departure times and camping times on the Big Long River.

Theoretical Projects

  • 2019, worked on projection methods, random projections and other dimensionality reduction methods.
  • 2019, worked on the classification consistency of shallow neural networks with sparse group lasso regularization.
  • 2018, worked on the selection consistency of generalized additive models (GAM) with a diverging number of active predictors, and studied the consistency of tuning parameter selection with the generalized information criterion (GIC).
  • 2017, worked on the selection consistency of Bayesian graphical models in which a latent variable follows an Ising prior to aid variable selection.