Some general resources and codes for data analysis in Python

Topics:

Python for Data Analysis

Tags:

data analysis

joins

merges

Pandas

Python

time series

Working with Time Series Graphs
http://earthpy.org/pandas-basics.html
http://stackoverflow.com/questions/39403580/create-gantt-chart-with-hlines

Convert Days from Date Time to Numeric
df ['Days'] = (pd.to_timedelta(df['Days'], unit='d') + pd.to_timedelta(1,unit='s')).astype('timedelta64[D]')

Convert to Date
df['Text Date'] = pd.to_datetime(df['Text Date'])
Some Information on Joins and Merges
http://chrisalbon.com/python/pandas_join_merge_dataframe.html
http://pandas.pydata.org/pandas-docs/stable/merging.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.a...
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.m...

SQL Options with Python
http://www.sqlalchemy.org/

Create New Column Based on Criteria
# create the column and set it equal to exclude df['HC_Indicator'] = "Exclude"
# tag each row as RBE if it meets these conditions: Employee Status is Active AND the Staff Class Indicator is equal to Non-Academic Staff, Academic Staff, and Other Teaching AND Standard Hours is greater than or equal to 20 AND Selection Index is equal to 1, and paygroup is not equal to VA, STU, CTS, or ST2 df['HC_Indicator'][(df['Empl Status'].isin(["A"])) & (df['Stf Class Indicatr'].isin(["Non-Academic Staff", "Academic Staff", "Other Teaching"])) & (df['Std Hours'] >= 20) & (df['Selection Index'] == 1) & (df['Paygroup'].isin(["VA", "STU", "CTS","ST2"])==False)] = "RBE"

Basic Filters
# option 1 df = df[df['HC Indicator'] == "Temp/Casual"] df = df[df['Empl Rcd'] == 0]

# option 2 df = df[df['Action'].isin(["HIR", "REH"])]

Advanced Filters
# filter if action is a hire or rehire and if empl rcd is 0 or status is active; note that each condition is separated by parnthesis df = df[(df['Action'].isin(["HIR", "REH"])) & (df['Empl Rcd'].isin([0])) | (df['Status'] == "Active") ]

Merging
# this is the equivalent of doing a vlookup or left join df3 = pd.merge(df1, df2, on='Emplid', how = 'left')

Concat (Stacking Rows)
# this will start with all rows in df1, then dump all df2 under df1, and df3 will go under df2; if there is a new column in not represented, it will add that column in df1 = pd.concat([df1,df2,df3])