Information on Python

Python

The main python website is: python.

The main python docs page is python docs.
(note the tutorial).

People install python two different ways.
There is the standard install into your system and the anaconda/miniconda/conda forge install.
For the standard install, see downloads at python.

Most people start with anaconda.

Python is supported by many packages that provide crucial utilities.
Just about every time you go into python you will use stuff like:
In [1]: import numpy as np  #efficient arrays (vectors, matrices ...)
In [2]: import pandas as pd #data analytic tools

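As a minimal sketch of what those two imports give you (the numbers and the column names x1, x2, yhat are made up for illustration), here is a small matrix-vector product in numpy and the same data held in a pandas Data Frame:

```python
import numpy as np   # efficient arrays (vectors, matrices ...)
import pandas as pd  # data analytic tools

# A small matrix and a vector as numpy arrays (made-up numbers).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([0.5, -1.0])

# Matrix-vector product: one value per row of X.
yhat = X @ b

# The same data as a pandas Data Frame with named columns.
df = pd.DataFrame(X, columns=["x1", "x2"])
df["yhat"] = yhat

print(yhat)
print(df.describe())
```
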
Conda is an open source package management system and environment management system
that runs on Windows, macOS and Linux.
See conda

Anaconda is a python distribution that uses conda.
Anaconda includes all the python packages you need to get started with data science/machine learning.
See anaconda.
Anaconda getting started.
Install page: Anaconda Installers
More info on anaconda installation: Installing Anaconda Distribution

There are a couple of nice things about anaconda:
(i) It bundles up a lot of tools and python packages you will need.
(ii) Using conda, you can maintain and switch between different python environments.
Each environment can be built on a different python version and include different python packages.

The downside to anaconda is that it can take up a fair amount of disk space.
miniconda allows you to install a minimal python/conda setup which you
can then add to as needed.
See miniconda (at conda website).

There is also miniforge: conda-forge/miniforge
miniforge from ChatGPT:
Developer: Community-driven (conda-forge community).
Python Versions: Available for multiple Python versions.
Default Channels:
   Uses the conda-forge channel by default.
   All packages are built from open-source software, avoiding proprietary packages.
Purpose:
   Designed to provide a minimal Conda installation with a focus on open-source.
   Ideal for users who prioritize transparency and community-maintained packages.
Platform Availability:
   In addition to standard platforms, Miniforge offers versions optimized for Apple Silicon (arm64) and other architectures.

With the standard install, people usually use pip (pip3) as the package manager.
You can also use pip inside a conda environment.

Some basic conda

After you install anaconda you will be in the base environment.

To see what environments you have:
conda info --envs

To see what packages are in the current environment:
conda list

To deactivate an environment:
conda deactivate

To activate the environment ENV:
conda activate ENV

On Rob's old intel Mac, a new environment named XGB with anaconda and xgboost was created with:
conda create -n XGB anaconda xgboost

Before you create an environment, get out of (deactivate) any current environment.

DO NOT INSTALL PACKAGES INTO BASE, MAKE A NEW ENVIRONMENT !!!!
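
From inside python you can also check which environment you are actually running in. The sketch below is only an illustration (the helper name describe_environment is made up); it relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated:

```python
import os
import sys

def describe_environment():
    """Return basic facts about the Python environment running this code."""
    return {
        # Name of the active conda environment; None when no conda
        # environment has been activated in this shell.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory the running interpreter is installed under.
        "prefix": sys.prefix,
        # Version of this interpreter as major.minor.micro.
        "python_version": ".".join(str(v) for v in sys.version_info[:3]),
    }

info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If conda_env prints as base, you are in the base environment and should activate another one before installing anything.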

Python Tools

Python tools you may want to have are:
(i) ipython: an enhanced python shell.
(ii) jupyter notebook: A notebook where you can mix text, python code, python output, latex ...
(iii) a development environment such as spyder or pycharm.

Anaconda will get you JupyterLab, which has all these tools and more.
ipython has a lot of enhancements over the basic python shell.
The jupyter notebook has become a standard way to communicate results in data science.

That being said:
Chapter 1 of Python Data Science Handbook, by VanderPlas:
"There are many options for development environments for Python,
and I'm often asked which one I use in my own work.
My answer sometimes surprises people: my preferred environment is
IPython plus a text editor ..."

Another thing to be aware of is Google Colab:
Welcome To Colaboratory
   This is a remarkable free online notebook environment with all the key Machine Learning tools available.

For fun, try this in colab to see the power of gpu: Matrix ops with gpu in torch

Be sure to check out the official help pages for each package (e.g. numpy).
The help tab in Jupyter notebook is also great.

Books

Rob's books: books

Some python links:

A Whirlwind Tour of Python, by Jake VanderPlas

Matloff's tutorial on Python, for those with a strong programming background.

python

conda
   Conda cheat sheet
   conda-cheatsheet

Getting started with conda

ipython

jupyter notebook

Nice short python intro

This package fits many statistical models, giving the standard inferential output:
statsmodels

basic python packages:
scipy (scientific computing)

numpy (efficient arrays, e.g. matrices and vectors)

pandas (data structures for working with data, e.g. Data Frames)
  Nice pandas reference

matplotlib (graphics)

scikit-learn (machine learning)

Note

The scikit-learn webpage is absolutely wonderful!!!

The two basic software platforms for deep learning are tensorflow/keras and pytorch (torch in R). Keras provides a nice interface to tensorflow.
The full tensorflow is like the SAS of deep learning. pytorch and tensorflow/keras do not seem to be in the base anaconda distribution.
You will have to use conda or pip to install them.
It is highly recommended that you create a new environment (i.e. other than the anaconda base) to install into!!

The anaconda base environment has all the tools you need for intro data science/machine learning, but
you will have to do some additional installs if you want neural nets/deep learning locally (as opposed to using colab).
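
To see which of these packages are already available in the environment you are in, something like the following sketch can help. The helper name is_installed is made up for illustration; it uses importlib from the standard library and only checks whether a module can be found, without importing it:

```python
import importlib.util

def is_installed(module_name):
    """Return True if module_name can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None

# Import names (note that scikit-learn is imported as sklearn).
for name in ["numpy", "pandas", "sklearn", "torch", "tensorflow"]:
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```
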

pip

Note that the standard package manager for python is pip (as opposed to using conda),
see for example the python.org documentation here. conda is recommended for working with anaconda.

A good pip reference for all the basic commands: Commands

more pip: A Beginner's Guide to pip

pip cheat sheet: pip cheat sheet

More pip links:
python environments
Creation of virtual environments.
Tutorials
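
With the standard (non-conda) install, virtual environments are usually created with the venv module from the standard library. The sketch below is only an illustration: it builds an environment in a temporary directory (the name demo-env is made up) and checks that it was created. In everyday use you would more likely run venv from the command line and keep the environment around.

```python
import os
import tempfile
import venv

# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "demo-env")  # hypothetical environment name

    # with_pip=False keeps creation fast; pass with_pip=True to have pip
    # installed inside the new environment.
    venv.create(env_dir, with_pip=False)

    # Every venv gets a pyvenv.cfg file describing the base interpreter.
    cfg_path = os.path.join(env_dir, "pyvenv.cfg")
    created = os.path.exists(cfg_path)
    print("environment created:", created)
```
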


Data Science in Python

Hello World, Data Science in Python

simple-for-ipython.py, a very simple little python script with some of the basics

Hello world regression in python (.html)
   Video of simple plots and simple linear regression with sklearn
   Video of Multiple Linear Regression and more on sklearn
   Video of multiple regression using statsmodels and dummies for the categorical color

Hello world regression in python, Jupyter notebook (.ipynb)

Hello world regression in python, html, short version

OOS Loop in Python

Here is a simple example of a loop in python to estimate the out-of-sample root mean square error
for linear regression and the susedcars.csv data set using just x=(mileage,year) for y=price:
  do-cars-oos.py.

What is the oos loop trying to do?
  Out of sample Loss.
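
To give the flavor of an out-of-sample loop, here is a minimal sketch. It is not do-cars-oos.py: the data are simulated stand-ins for (mileage, year, price) rather than susedcars.csv, the split sizes and number of repetitions are arbitrary choices, and the fit uses numpy least squares instead of sklearn to keep the example self-contained. Each pass splits the rows at random, fits on the training rows, and records the root mean square error on the held-out rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the cars data (made-up numbers, NOT susedcars.csv).
n = 500
mileage = rng.uniform(10, 150, size=n)                 # thousands of miles
year = rng.integers(2000, 2014, size=n).astype(float)
price = 20.0 - 0.1 * mileage + 1.5 * (year - 2000) + rng.normal(0, 3, size=n)

X = np.column_stack([np.ones(n), mileage, year])  # intercept + (mileage, year)
y = price

n_rep = 50      # number of random train/test splits (arbitrary choice)
n_train = 375   # 75% of the rows for training, the rest held out
rmse = np.zeros(n_rep)

for i in range(n_rep):
    # Randomly split the rows into train and test sets.
    perm = rng.permutation(n)
    train, test = perm[:n_train], perm[n_train:]

    # Fit linear regression by least squares on the training rows only.
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

    # Predict the held-out rows and record the root mean square error.
    resid = y[test] - X[test] @ beta
    rmse[i] = np.sqrt(np.mean(resid ** 2))

print("mean out-of-sample RMSE:", rmse.mean())
```

Averaging over many random splits gives a more stable estimate of out-of-sample error than a single train/test split.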

Python and R

Note that you can call python from R:
R studio notes on package reticulate