oblatespheroid.github.io

Oblate Spheroid

Intro

I came into coding from the traditional research world (environmental science). I started out building models in R and Python, later moved into scientific computing with SciPy, and am now dabbling in Julia. These days I focus mostly on machine learning, text mining/natural language processing, and automating application development and deployment.

My T-skills graph

Skill and tool sets

Table 1: Long version

Skill set: Tool set
Data Management/Information Systems
  SQL: PostgreSQL, MySQL, MS SQL, Sybase
  NoSQL: MongoDB, SPARQL, Redis
  Ontologies and Taxonomies: RDF/OWL, Protégé, SPARQL, Apache Jena
Software Development
  Unix environments: shell (bash, zsh), system admin tools
  Rapid prototyping: R Shiny
  General programming languages: Python, Scala, Go, Rust
  Backend development: Django, Flask, Node.js
Analytics and Prediction
  Data munging, cleaning, and processing: NumPy, Pandas, R's tidyverse
  Text mining and Natural Language Processing: Python (NLTK, scikit-learn), R (tm, quanteda)
  Machine Learning: scikit-learn, TensorFlow, PyTorch
  Probability and Inference: hypothesis testing, time series, probability modeling, forecasting, resampling methods, Bayesian methods
  Simulation: Arena, SimPy
  Optimization: linear and integer programming, calculus, numerical methods
  Experimental design
  Visualizations and dashboards: Tableau, R Shiny, ggplot, plotly, Kibana
DevOps
  Version control and Collaboration: Git, GitHub, Jira
  Continuous Integration/Delivery: GoCD, Jenkins
  Configuration/Cluster Management: Ansible, Vagrant, Docker, Kubernetes, Mesos
  Build Tools: Make, Ant, Maven, Gradle
  Monitoring: Elasticsearch, Logstash, Kibana (ELK), Icinga

Table 2: Short version

Data Management
  SQL (PostgreSQL, MySQL)
  NoSQL (MongoDB, RDF/SPARQL)
  Data wrangling and cleaning
  Database/schema design and management
  Ontologies and taxonomies
Software/Web Development
  *nix environments, bash
  Python, R, SAS
  Pipelines (GoCD)
  Shiny for rapid prototyping
  Django web framework
  Agile development process
  Git version control
Advanced Analytics
  Text mining and NLP
  Machine learning
  Classical and Bayesian probability and inference
  Simulation
  Optimization
  Experimental design
  Visualizations and dashboards

Repos

Rust Books: A list of books on the Rust programming language

MOOCs for datascience: A list of massive open online courses (MOOCs) on data science, collected from several sources

Install from requirements: A simple R function that works like Python’s ‘pip install -r requirements.txt’

OReilly Data Show: A copy of the RSS feed for the O'Reilly Data Show podcast, made to get around a firewall issue with the official feed
