Every March the SXSW film/music/tech festival invades Austin for a week plus. There's even a job fair, which /used/ to be free and amazing (if mostly out-of-town companies hiring). But this year it required a trade show badge, and badges for the different parts range from $500 to $2000, which is pretty exclusivist if you ask me, but it's ok. What matters is that there are still a ton of free music shows, every day, with local bands and those from as far away as Germany.
Exploring Json Data With Pandas
Flattening Jsons with Pandas You may recall the JSON view showed messy nested dictionaries. To make it more readable, I altered which level of the data it extracts, but I still need to do something to view it as a table. In the next post I'll take a different structuring approach for downloading the bulk data, but for now I just want to look some more at what I have.
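One common way to turn nested dictionaries into a table is `pandas.json_normalize`. The sketch below is only an illustration: the `posts` records and their field names are made up here, and the post itself may well use a different approach.

```python
import pandas as pd

# Hypothetical nested records, loosely shaped like posts from an API.
# These field names are assumptions for the example, not the real data.
posts = [
    {"id": "1", "message": "hello", "from": {"name": "A", "id": "10"}},
    {"id": "2", "message": "world", "from": {"name": "B", "id": "20"}},
]

# json_normalize expands nested dicts into dotted column names,
# e.g. the "from" dict becomes "from.name" and "from.id".
flat = pd.json_normalize(posts)

print(flat)
```

Each nested key gets its own column, so the result can be filtered and sorted like any other DataFrame.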
Extracting Data with Facebook API (II)
Using Facebook-sdk for Python Now that I've got the basic usage down for the Facebook API, I need to access it through a Python script that can gather years' worth of data and also grab the children (comments and replies to comments). I could just use the usual requests library, but there happens to be a lovely Facebook Graph API package, facebook-sdk
Extracting Data with Facebook API
First step in analyzing food allergy data from a Facebook support group
Sulfite Allergy Data Project Overview
Non-technical overview of new project to analyze discussion data from a sulfite allergy support group
Windows File Paths
Another day, another Windows quirk. Today I found an explanation and a trick (read: the proper way) for writing a file path in Python (or probably any language). I am sure this has been the reason behind inconsistent "relative path" success many times in the past year or so of working with data.
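This excerpt does not say which trick is meant, so the following is only a sketch of two standard options: raw strings and `pathlib`. The folder and file names are hypothetical, chosen because `\n` and `\t` are escape sequences in ordinary Python strings.

```python
from pathlib import PureWindowsPath

# In a normal string, "\n" becomes a newline and "\t" becomes a tab,
# so this path is silently corrupted.
broken = "C:\new_data\table.csv"

# A raw string keeps every backslash literally.
raw = r"C:\new_data\table.csv"

# pathlib builds the path from parts, so no backslashes need typing.
# PureWindowsPath applies Windows rules on any operating system.
built = PureWindowsPath("C:/") / "new_data" / "table.csv"

print(repr(broken))
print(raw)
print(built)
```

On a real Windows machine, `pathlib.Path` would be the usual choice; `PureWindowsPath` is used here only so the example behaves the same everywhere.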
The most recent head-against-wall experience:
A Series of Unfortunate Code
I'm trying to go back and add work I did during school. It's not spectacular but I have fond memories.
Spark, Jupyter, AWS? Oh my!
First part of guide to running a cluster on AWS with Spark through Jupyter
Analyzing Astronomical Data in Apache Spark- Discussion
Why? After a seminar-style course on data science, my professor invited us to do our MS project (a non-verbose thesis) using Apache Spark, a new and popular engine for distributed computing. Since my previous degree was in physics, he suggested I look there for data. Having some background in astronomy, I knew there was plenty of free, accessible data available there, and turned my nose in that direction. In my initial research, I found that very little work had been done with Spark in astronomy research, and then found this delightful new Python library, astroML
Astronomical Data in Spark: GMM Models From Prepared Data
Gaussian Mixture Model
This notebook runs two Gaussian mixture model implementations to find clusters in flux space on stellar data: 1) the Spark ML GMM module (on an RDD) and 2) the basic scikit-learn GMM (on a NumPy array)
- Data has been preprocessed in Spark as dataframes and converted to numpy arrays
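For the scikit-learn half, a minimal sketch of fitting a Gaussian mixture to a NumPy array might look like this. The data here is synthetic (two well-separated 2-D blobs standing in for flux measurements), and the number of components is an assumption for the example, not the notebook's actual setting.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for preprocessed flux data: two 2-D clusters.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2))
X = np.vstack([cluster_a, cluster_b])

# Fit a two-component mixture and assign each row to a component.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

print(gmm.means_)
print(np.bincount(labels))
```

With real data the number of components is usually not known in advance; comparing `gmm.bic(X)` across several values of `n_components` is one common way to choose it.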