To avoid system administration difficulties, we will use the latest Python 2.x version. I downloaded the .dmg installer, opened it and ran the .mpkg file. You may prefer to install Python with Homebrew, but I chose not to and I’ve forgotten why.
Next I installed virtualenv to make sure this setup will not interfere with other projects.
$ cd ~/.local/lib
$ git clone https://github.com/pypa/virtualenv.git
$ cd virtualenv
$ python setup.py install
I set up an alias in my ~/.profile file and set an environment variable to avoid some typing. This isn’t very useful, because you’re unlikely to be using …
$ export VIRTUALENV_DISTRIBUTE=true
$ alias venv='virtualenv --distribute'
Now creating a new virtualenv takes less typing.
$ venv ~/venv/data-science New python executable in /Users/mike/venv/data-science/bin/python Installing distribute........................................... ................................................................ ...........................................................done. Installing pip................done.
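This guide uses virtualenv, which was the right tool for Python 2. For comparison only: on Python 3.3 and later, the standard library's venv module does the same job from code. The sketch below is that swapped-in technique, not the author's setup. The helper name make_env and the target path are hypothetical.

```python
import os
import tempfile
import venv


def make_env(env_dir):
    """Create a bare virtual environment at env_dir using the Python 3 stdlib.

    make_env is a hypothetical helper. with_pip=False keeps the sketch fast
    and offline; pass with_pip=True to bootstrap pip into the environment.
    """
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    # venv writes a pyvenv.cfg marker file at the root of every environment.
    return os.path.join(env_dir, "pyvenv.cfg")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "data-science")  # hypothetical path
    print(make_env(target))
```

The isolation works the same way in both tools. Each environment gets its own prefix with a private python executable and site-packages directory, so nothing installed there touches the system Python.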
I activate the new data-science virtual environment.
$ source ~/venv/data-science/bin/activate
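The activate script only adjusts your shell: it prepends the environment's bin directory to PATH, sets VIRTUAL_ENV and changes the prompt. To confirm from inside Python that you are really running the environment's interpreter, a small check like this should work. It is a sketch that assumes either classic virtualenv (which sets sys.real_prefix) or a venv-style environment (where sys.prefix differs from sys.base_prefix).

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment.

    Classic virtualenv (as used in this guide) sets sys.real_prefix.
    Python 3's venv, and newer virtualenv releases on Python 3, instead make
    sys.prefix differ from sys.base_prefix.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtualenv" if in_virtualenv() else "using the system Python")
```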
Next is the tedious process of downloading and installing packages. Unfortunately some of them do not play well with pip and need to be built explicitly from the latest source code. Others simply take a long time to build, and you may want to avoid repeating the build when creating a new Python virtual environment. Still others don’t build well and need to be installed using pip. Good times.
To test that I’ve installed these packages correctly, I first install nose, a testing framework.
(data-science) $ pip install nose
Downloading/unpacking nose
Downloading nose-1.2.1.tar.gz (400Kb): 400Kb downloaded
Running setup.py egg_info for package nose
no previously-included directories found matching 'doc/.build'
Installing collected packages: nose
Running setup.py install for nose
no previously-included directories found matching 'doc/.build'
Installing nosetests script to /Users/mike/venv/data-science/bin
Installing nosetests-2.7 script to /Users/mike/venv/data-science/bin
Successfully installed nose
Cleaning up...
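nose collects module-level functions whose names start with "test" from files whose names match the same pattern. A check of your own can therefore be a plain function with assert statements. The following is a minimal sketch. The file name test_environment.py and the module list are illustrative stand-ins; swap in numpy, scipy and the rest once they are installed.

```python
# test_environment.py (hypothetical file name); run with: nosetests test_environment.py
import importlib

# Stand-in module names; replace with "numpy", "scipy", "matplotlib", ...
REQUIRED_MODULES = ["json", "decimal"]


def test_required_modules_import():
    """Fail with a readable message if any required module cannot be imported."""
    missing = []
    for name in REQUIRED_MODULES:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    assert not missing, "cannot import: %s" % ", ".join(missing)
```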
I won’t bother to type all that command line output from …
Whoops, hold on. Before we build NumPy and SciPy from source, we’ll need up-to-date C and FORTRAN compilers. On OS X that means the Xcode command line tools. You can either install all of Xcode or just the command line tools. Downloading the command line tools requires a (free) Apple developer account. If you have an older Mac, I suggest searching for the command line tools that match your OS version. If you want the full Xcode, then after installing it you will need to download and install the command line tools from the Xcode Preferences dialog: go to Xcode → Preferences, click the Downloads tab, select Command Line Tools, and click Install.
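Apparently Xcode forgot FORTRAN, so we need to use Homebrew (brew) for that. If you don’t have Homebrew yet, install it first with the ruby command shown in the ZeroMQ step further down.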
(data-science)$ brew install gfortran
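Before building NumPy and SciPy, it is worth confirming that the compilers are actually on your PATH. This is a minimal sketch in portable Python 2/3 (on Python 3.3+, shutil.which does the same job). The list of program names is an assumption about what the builds will look for.

```python
import os

# Compilers the NumPy/SciPy builds are likely to need (illustrative list).
COMPILERS = ["gcc", "gfortran"]


def find_on_path(program):
    """Return the full path to an executable named `program` on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for name in COMPILERS:
        print("%-10s %s" % (name, find_on_path(name) or "NOT FOUND"))
```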
We’ll install NumPy and SciPy from the bleeding-edge source.
(data-science)$ cd ~/.local/lib
(data-science)$ git clone https://github.com/numpy/numpy.git
(data-science)$ python ~/.local/lib/numpy/setup.py install
To check whether NumPy is working correctly, we can start Python and run its test suite.
(data-science)$ python
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.test()
Running unit tests for numpy
...
Ran 4779 tests in 25.513s

OK (KNOWNFAIL=5, SKIP=6)
<nose.result.TextTestResult run=4779 errors=0 failures=0>
>>> quit()
There were a few skips and known failures, but that’s OK. Let’s move on to SciPy.
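The full suite is thorough but slow. Sometimes you only need to confirm that a rebuilt NumPy imports and can do basic linear algebra. For that, a tiny smoke test like this sketch is enough. The 2×2 system is arbitrary, and the function returns None if NumPy is not importable.

```python
def numpy_smoke_test():
    """Solve a 2x2 linear system with NumPy and verify the answer.

    Returns the solution as a list of floats, or None if NumPy is missing.
    """
    try:
        import numpy as np
    except ImportError:
        return None
    a = np.array([[2.0, 1.0], [1.0, 3.0]])
    b = np.array([3.0, 5.0])
    x = np.linalg.solve(a, b)  # calls into NumPy's compiled LAPACK routines
    assert np.allclose(a.dot(x), b)
    return [float(v) for v in x]


if __name__ == "__main__":
    print(numpy_smoke_test())  # approximately [0.8, 1.4] when NumPy is installed
```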
(data-science)$ git clone https://github.com/scipy/scipy.git
(data-science)$ python ~/.local/lib/scipy/setup.py install
(data-science)$ python
>>> import scipy
>>> scipy.test()
Running unit tests for scipy
...
Ran 3896 tests in 62.633s

FAILED (KNOWNFAIL=6, SKIP=34, errors=14, failures=72)
<nose.result.TextTestResult run=3896 errors=14 failures=72>
>>> quit()
Fourteen errors and 72 failures out of 3,896 tests is a little more disturbing, but I’ll ignore them for now. Let’s hope for the best.
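Given those failures, a targeted check of the parts you actually use is reassuring. In SciPy builds of this era, scipy.integrate.quad wrapped the Fortran QUADPACK library. This sketch therefore also suggests that the gfortran-compiled code loads. The integrand is an arbitrary example with a known answer of 1/3, and the function returns None if SciPy is not importable.

```python
def scipy_smoke_test():
    """Integrate x**2 over [0, 1] with scipy.integrate.quad.

    Returns the numerical estimate (exact value 1/3), or None if SciPy is missing.
    """
    try:
        from scipy.integrate import quad
    except ImportError:
        return None
    value, abs_error_estimate = quad(lambda x: x ** 2, 0.0, 1.0)
    assert abs(value - 1.0 / 3.0) < 1e-10
    return value


if __name__ == "__main__":
    print(scipy_smoke_test())
```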
For some reason, I have to build Matplotlib before I can install it.
(data-science)$ git clone https://github.com/matplotlib/matplotlib.git
(data-science)$ python ~/.local/lib/matplotlib/setup.py build
(data-science)$ python ~/.local/lib/matplotlib/setup.py install
(data-science)$ python
>>> import matplotlib
>>> matplotlib.test()
....K...K..K......
...Taking.a.nap...
..................
Ran 1211 tests in 385.316s

FAILED (KNOWNFAIL=300, failures=2)
>>> quit()
Alright, more disturbing failures, but mostly working.
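The test run above took over six minutes, so a quicker check that Matplotlib can actually draw is handy. This sketch renders a plot to an in-memory PNG with the non-interactive Agg backend, so it works without a display. The plotted data are arbitrary, and the function returns None if Matplotlib is not importable.

```python
import io

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def matplotlib_smoke_test():
    """Render a tiny plot off-screen and return the PNG bytes (None if missing)."""
    try:
        import matplotlib
    except ImportError:
        return None
    matplotlib.use("Agg")  # non-interactive backend: no window, no GUI toolkit
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)
    return buffer.getvalue()


if __name__ == "__main__":
    data = matplotlib_smoke_test()
    print(data is not None and data.startswith(PNG_SIGNATURE))
```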
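The IPython notebook depends on the ZeroMQ messaging library. If you don’t have Homebrew yet, install it first, then use it to install ZeroMQ.
(data-science)$ ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
(data-science)$ brew install zeromq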
With so many dependencies, I started relying on …
(data-science)$ pip install pyzmq
(data-science)$ pip install tornado
(data-science)$ pip install sympy
(data-science)$ pip install pygments
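pyzmq is a binding to the ZeroMQ library that Homebrew just installed, so it is worth checking that the two actually talk to each other. This minimal sketch assumes a working pyzmq install. It passes one message between a pair of sockets inside the same process; the inproc:// endpoint name is arbitrary. The function returns None if pyzmq is not importable.

```python
def zmq_roundtrip(message=b"ping"):
    """Send `message` across an in-process PAIR socket pair and return what arrives.

    Returns None if pyzmq is not importable.
    """
    try:
        import zmq
    except ImportError:
        return None
    context = zmq.Context()
    receiver = context.socket(zmq.PAIR)
    sender = context.socket(zmq.PAIR)
    try:
        # Bind before connecting; older libzmq releases require this for inproc.
        receiver.bind("inproc://smoke-test")
        sender.connect("inproc://smoke-test")
        sender.send(message)
        return receiver.recv()
    finally:
        # linger=0 discards any unsent messages so term() cannot hang.
        sender.close(linger=0)
        receiver.close(linger=0)
        context.term()


if __name__ == "__main__":
    print(zmq_roundtrip())
```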
Not sure why, but pip doesn’t like readline, or vice versa.
(data-science)$ easy_install readline
(data-science)$ pip install ipython
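With this many packages in play, a one-shot report of what imports, and at which version, makes a broken install easier to spot. This is a sketch. The package list mirrors the import names of the packages installed in this guide (pyzmq imports as zmq). It relies on the common, but not universal, __version__ attribute convention.

```python
import importlib

PACKAGES = ["nose", "numpy", "scipy", "matplotlib", "zmq", "tornado",
            "sympy", "pygments", "readline", "IPython", "pandas", "statsmodels"]


def package_report(names):
    """Map each module name to its __version__ string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # a broken build can fail with more than ImportError
            report[name] = None
            continue
        report[name] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    for name, version in sorted(package_report(PACKAGES).items()):
        print("%-12s %s" % (name, version or "NOT IMPORTABLE"))
```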
Finally we can test IPython.
Mostly passing, good enough.
(data-science)$ pip install pandas
(data-science)$ pip install statsmodels
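As a last sanity check for the analysis stack, this sketch groups some data with pandas and fits an ordinary least squares line with statsmodels. The numbers are made up so the answers are known in advance: the points lie exactly on y = 1 + 2x. The function returns None if pandas or statsmodels is not importable.

```python
def analysis_stack_smoke_test():
    """Return (group means, OLS params), or None if pandas/statsmodels are missing."""
    try:
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
    except ImportError:
        return None

    frame = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})
    means = frame.groupby("group")["value"].mean().to_dict()  # a -> 2.0, b -> 5.0

    x = np.arange(4.0)                           # 0, 1, 2, 3
    y = 1.0 + 2.0 * x                            # exact line: intercept 1, slope 2
    design = sm.add_constant(x, prepend=True)    # column of ones first, then x
    params = sm.OLS(y, design).fit().params      # approximately [1.0, 2.0]
    return means, [float(p) for p in params]


if __name__ == "__main__":
    print(analysis_stack_smoke_test())
```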
Had fun? We’re just about ready to get some work done. Start up an IPython Notebook server from your General Assembly Data Science source folder.
(data-science)$ cd ~/src/data-science/
(data-science)$ ipython notebook --profile=sympy --pylab inline
Click the ‘new notebook’ button. The first time you use IPython Notebook it will create your config files. Then you can install MathJax for IPython so that you won’t be hitting the MathJax server to render your equations.
In [1]: from IPython.external.mathjax import install_mathjax
In [2]: install_mathjax()
Done, for now.
If you find any errors with this guide, please let me know.