oblatespheroid.github.io

Oblate Spheroid

Intro

I came into coding from the traditional research world (environmental science). I started by building models in R and Python, later moved into scientific computing with SciPy, and am now dabbling in Julia. My current focus is machine learning, text mining/natural language processing, and automating application development and deployment.

My T-skills graph

Skill and tool sets

Table 1: Long version

Skill set	Tool set
Data Management/Information Systems
SQL	PostgreSQL, MySQL, MS SQL, Sybase
NoSQL	MongoDB, SPARQL, Redis
Ontologies and Taxonomies	RDF/OWL, Protege, SPARQL, Apache Jena
Software development
Unix environments	shell (bash, zsh), system admin tools
Rapid prototyping	R Shiny
General programming languages	Python, Scala, Go, Rust
Backend development	Django, Flask, Node.js
Analytics and Prediction
Data munging, cleaning, processing	NumPy, Pandas, R's tidyverse
Text mining and Natural Language Processing	Python (NLTK, scikit-learn), R (tm, quanteda)
Machine Learning	scikit-learn, TensorFlow, PyTorch
Probability and Inference	hypothesis testing, time series, probability modeling, forecasting, resampling methods, Bayesian methods
Simulation	Arena, SimPy
Optimization	linear and integer programming, calculus, numerical methods
Experimental design
Visualizations and dashboards	Tableau, R Shiny, ggplot, plotly, Kibana
DevOps
Version control and Collaboration	Git, GitHub, Jira
Continuous Integration/Delivery	GoCD, Jenkins
Configuration/Cluster Management	Ansible, Vagrant, Docker, Kubernetes, Mesos
Build Tools	Make, Ant, Maven, Gradle
Monitoring	Elasticsearch, Logstash, Kibana (ELK), Icinga

Table 2: Short version

Data Management	Software/Web Development	Advanced Analytics
SQL (PostgreSQL, MySQL)	*nix environments, bash	Text mining and NLP
NoSQL (MongoDB, RDF/SPARQL)	Python, R, SAS	Machine learning
Data wrangling and cleaning	Pipelines (GoCD)	Classical and Bayesian probability and inference
Database/schema design and mgmt	Shiny for rapid prototyping	Simulation
Ontologies and taxonomies	Django web framework	Optimization
	Agile development process	Experimental design
	Git version control	Visualizations, dashboards

Repos

Rust Books: List of books on Rust (programming language)

MOOCs for datascience: List of Massive Open Online Courses (MOOCs) related to Data Science from several sources

Install from requirements: A simple R function that works like Python’s ‘pip install -r requirements.txt’

OReilly Data Show: A copy of the RSS feed for the O’Reilly Data Show podcast, made to work around a firewall issue with the official feed
