So, there comes a time in every young PhD student’s life when they say, “Nah… this ain’t it”. That is, you’ll hear from a ton of people who decide the doctoral program isn’t for them. This could be due to a slew of reasons, including (non-exhaustively) that they’ve lost passion, burned out, found new interests, or gotten really tired of being broke and tired. I experienced what I like to call my PhD Quarter Life Crisis: that weird time between your second and third year where you may feel aimless and simply inadequate. I couldn’t tell if I was doing enough. My progress felt (and continues to feel) very slow. I desperately want to publish something, ANYTHING. But it’s hard to get a project to that point.
So I started applying for jobs, particularly roles that 1) would bring me back home to Texas, closer to friends & family, 2) looked interesting and where I knew I would excel, and 3) would keep me in academia (bonus: a salary doesn’t hurt).
Thus, I present to you a mini-project I did to look up academic roles at public Texas institutions of higher education.
Note: All data used here is publicly available by Texas mandate. The file can be found here.
Other interesting data sources:
- https://texascollegesalaries.com/institutions
- https://salaries.texastribune.org/departments/library-and-archives-commission/
- https://govsalaries.com/state/TX
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', None)
### Again, I was looking at a Data Science Librarian Role (however, at a non-public university!)
### read in data
uni_ = pd.read_csv('TCS_All_Data_4-28-22.csv', thousands=',')
### update hire date to datetime
uni_['hire_date_dt'] = pd.to_datetime(uni_['hire_date'], errors='coerce')
uni_['race'] = uni_.race.str.lower()
### add 2 "inflation" columns
### universities (in Texas at least) often increase wages to keep up with inflation
### I wanted to project 3 years forward
uni_['salary2'] = pd.to_numeric(uni_['salary'])
uni_['salary3'] = (uni_['salary2']*.028)+(uni_['salary2'])
uni_['salary4'] = (uni_['salary3']*.028)+(uni_['salary3'])
### look at some data
display(uni_.shape, uni_.head())
### What schools do we have?
display(sorted(uni_.agency.unique()))
(80009, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UT Austin | Accounting | Soren Aandahl | Lecturer | Part-time | white | Male | 1/16/2021 | 88000.0 | 192000 | 3/31/2022 | 2021-01-16 | 88000.0 | 90464.000 | 92996.992000 |
| 1 | UT Austin | Marketing | Christopher Aarons | Assistant Professor of Instruction | Full-time | white | Male | 9/1/2019 | 64711.0 | 192001 | 3/31/2022 | 2019-09-01 | 64711.0 | 66522.908 | 68385.549424 |
| 2 | UT Austin | Computer Science | Scott J Aaronson | Professor | Full-time | white | Male | 4/28/2016 | 251500.0 | 192002 | 3/31/2022 | 2016-04-28 | 251500.0 | 258542.000 | 265781.176000 |
| 3 | UT Austin | UTeach-Natural Sciences | Vivian Abagiu | Communications Coordinator | Full-time | hispanic or latino | Female | 10/15/2015 | 72000.0 | 192003 | 3/31/2022 | 2015-10-15 | 72000.0 | 74016.000 | 76088.448000 |
| 4 | UT Austin | University of Texas Elementary School | Anna Marie Abalos | Food Preparation/Service Worker | Part-time | hispanic or latino | Female | 8/19/2021 | 31200.0 | 192004 | 3/31/2022 | 2021-08-19 | 31200.0 | 32073.600 | 32971.660800 |
['Austin Community College',
'Lamar University',
'Lone Star Community College',
'Sam Houston State University',
'Sul Ross',
'TAMU Commerce',
'TAMU Prairie View',
'Texas A&M',
'Texas A&M Health Science Center',
'Texas State University',
'Texas Tech',
"Texas Woman's University",
'UNT',
'UT Arlington',
'UT Austin',
'UT Dallas',
'UT Permian Basin',
'UT San Antonio',
'UT Tyler',
'University of Houston',
'University of Houston System']
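The `salary3` and `salary4` columns above apply the same 2.8% raise once and then twice. A minimal sketch of that compounding as a reusable helper, assuming a flat annual rate (the name `project_salary` is hypothetical and not part of the original notebook; the two sample salaries are copied from the first rows of the table above):

```python
import pandas as pd

def project_salary(salary: pd.Series, rate: float = 0.028, years: int = 1) -> pd.Series:
    """Compound a flat annual raise: salary * (1 + rate) ** years."""
    return salary * (1 + rate) ** years

# Two salaries copied from the first rows of the table above
sample = pd.Series([88000.0, 64711.0])
one_year = project_salary(sample, years=1)   # matches the salary3 column
two_years = project_salary(sample, years=2)  # matches the salary4 column
print(one_year.round(3).tolist(), two_years.round(3).tolist())
```

Changing `rate` or `years` gives other scenarios without adding one column per year by hand.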
### Looking for 'lib'rary roles (without much discrimination) and Full-Time
FT_lib = uni_[uni_.job_title.str.contains('Lib') & uni_.employment_time.str.contains('Full')].copy(deep=True)
FT_lib.sort_values(by=['race', 'gender','salary',], inplace=True)
display(FT_lib.shape, FT_lib.head())
(804, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 74223 | Sam Houston State University | Newton Gresham Library | Sarah I. Sellers | Library Associate | Full-Time | american indian or alaskan native | Female | 11/1/2017 | 35040.00 | 184691 | 3/28/2022 | 2017-11-01 | 35040.00 | 36021.12000 | 37029.711360 |
| 406 | Texas State University | University Libraries | Karen E Cowen | Library Assistant IV | Full-time | american indian or alaskan native | Female | 8/21/2000 | 49128.60 | 74505 | 8/20/2021 | 2000-08-21 | 49128.60 | 50504.20080 | 51918.318422 |
| 59769 | Texas Woman's University | Library | Julie Reed Sullivan | Librarian Digital Content | Full-Time | american indian or alaskan native | Female | 2/22/1993 | 61055.00 | 211345 | 3/22/2022 | 1993-02-22 | 61055.00 | 62764.54000 | 64521.947120 |
| 418 | Texas State University | University Libraries | Elizabeth Karen Cruces | Librarian | Full-time | american indian or alaskan native | Female | 3/1/2021 | 62499.96 | 75561 | 8/20/2021 | 2021-03-01 | 62499.96 | 64249.95888 | 66048.957729 |
| 74696 | Sam Houston State University | Newton Gresham Library | Akira Y. Wu | Library Assistant II | Full-Time | asian | Female | 12/1/2020 | 33360.00 | 185163 | 3/28/2022 | 2020-12-01 | 33360.00 | 34294.08000 | 35254.314240 |
### Let's also check out data related roles, full-time
FT_data = uni_[uni_.job_title.str.contains('Data') & uni_.employment_time.str.contains('Full')].copy(deep=True)
FT_data.sort_values(by=['race', 'gender','salary',], inplace=True)
display(FT_data.shape, FT_data.head())
(195, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 74160 | Sam Houston State University | IT Infrastructure and Support | Victorria A. Saldana | Data Center Operations Spec I | Full-Time | asian | Female | 11/1/2021 | 41616.00 | 184628 | 3/28/2022 | 2021-11-01 | 41616.00 | 42781.24800 | 43979.122944 |
| 1122 | Texas A&M | Dean | Xiaoping Li | Data Analyst | Full-Time | asian | Female | 7/2/2018 | 48502.44 | 155323 | 3/1/2022 | 2018-07-02 | 48502.44 | 49860.50832 | 51256.602553 |
| 1181 | Texas A&M | Tamu Libraries | Ethelyn V Mejia | Data Analyst | Full-Time | asian | Female | 6/1/2005 | 51188.16 | 156457 | 3/1/2022 | 2005-06-01 | 51188.16 | 52621.42848 | 54094.828477 |
| 57665 | UT Austin | Governmental Affairs and Initiatives | Susan Yuanyuan Whitman | Database Coordinator | Full-time | asian | Female | 9/2/2008 | 55000.00 | 208164 | 3/31/2022 | 2008-09-02 | 55000.00 | 56540.00000 | 58123.120000 |
| 572 | Texas A&M | Office Of Admissions | Liu Shi | Senior Data Analyst | Full-Time | asian | Female | 9/1/2008 | 60009.96 | 147134 | 3/1/2022 | 2008-09-01 | 60009.96 | 61690.23888 | 63417.565569 |
FT_data.sort_values(by=['salary']).head()
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 58445 | Texas Woman's University | Library | Dolores Aguilar | Data Entry Oper II | Full-Time | hispanic or latino | Female | 12/1/2006 | 24046.00 | 210021 | 3/22/2022 | 2006-12-01 | 24046.00 | 24719.28800 | 25411.428064 |
| 29916 | Sul Ross | Student Support Services Department | Lisa Griffith | Data Tracking Admin Specialist | Full-Time | white | Female | 9/25/2018 | 24293.00 | 62890 | 8/13/2020 | 2018-09-25 | 24293.00 | 24973.20400 | 25672.453712 |
| 30047 | Sul Ross | Trio Talent Search | Stephanie Weintraut | Talent Search Data Spec/Sec | Full-Time | white | Female | 3/16/2020 | 24341.00 | 63214 | 8/13/2020 | 2020-03-16 | 24341.00 | 25022.54800 | 25723.179344 |
| 5369 | UT Tyler | Admissions | Brittany Johnson | Admissions Data Spec I | Full-Time | black or african american | Female | 5/16/2013 | 30514.32 | 27456 | 7/29/2020 | 2013-05-16 | 30514.32 | 31368.72096 | 32247.045147 |
| 19931 | Texas State University | Materials Mgmt & Logistics | Jennifer Ann Mireles | Data Entry Operator | Full-time | white | Female | 5/10/2010 | 31985.76 | 75435 | 8/20/2021 | 2010-05-10 | 31985.76 | 32881.36128 | 33802.039396 |
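The library and data filters above follow the same pattern: match a keyword in `job_title`, then keep rows whose `employment_time` mentions "Full". Here is a rough sketch of that pattern as one function. The helper name `full_time_roles` and the toy rows are hypothetical. Note that it matches case-insensitively and treats missing titles as non-matches, which is an assumption on my part and differs from the case-sensitive `'Lib'` / `'Data'` matches used above (it may pull in a few extra titles).

```python
import pandas as pd

def full_time_roles(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Rows whose job_title contains `keyword` and whose employment_time mentions 'full'."""
    # na=False treats missing titles as non-matches; regex=False matches the keyword literally
    title_match = df["job_title"].str.contains(keyword, case=False, na=False, regex=False)
    full_time = df["employment_time"].str.contains("full", case=False, na=False, regex=False)
    return df[title_match & full_time].copy()

# Toy rows (made up for illustration) using the same column names as uni_
toy = pd.DataFrame({
    "job_title": ["Librarian", "Data Analyst", None, "library assistant II"],
    "employment_time": ["Full-time", "Full-Time", "Full-time", "Part-time"],
    "salary": [62000.0, 48000.0, 40000.0, 20000.0],
})
lib_roles = full_time_roles(toy, "lib")
data_roles = full_time_roles(toy, "data")
print(lib_roles)
```

On the real data this would be called as `full_time_roles(uni_, 'Lib')`, followed by the same `sort_values` call used above.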
### Community Colleges (with community in the name) all roles
CC_ = uni_[uni_.agency.str.contains('Community')]
display(CC_.shape, CC_.head())
(5223, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 174 | Austin Community College | Marketing | Nicholas Sarantakes | Professor | Full-time | white | Male | 8/24/1981 | 142052.00 | 189000 | 3/29/2022 | 1981-08-24 | 142052.00 | 146029.45600 | 150118.280768 |
| 8323 | Austin Community College | Physics | Paul Edward Williams | Professor | Full-time | white | Male | 8/16/2004 | 88776.00 | 189086 | 3/29/2022 | 2004-08-16 | 88776.00 | 91261.72800 | 93817.056384 |
| 11729 | Austin Community College | Biology | Felix S Villarreal | Professor | Full-time | hispanic or latino | Male | 9/1/1992 | 89047.00 | 189087 | 3/29/2022 | 1992-09-01 | 89047.00 | 91540.31600 | 94103.444848 |
| 12866 | Austin Community College | Biology | Sarah L Strong | Professor | Full-time | white | Female | 8/23/1993 | 118117.00 | 189261 | 3/29/2022 | 1993-08-23 | 118117.00 | 121424.27600 | 124824.155728 |
| 74587 | Lone Star Community College | Cisco | Donna Ivey | Dir CISCO Prog | Full-Time | white | Female | 7/1/2010 | 102625.94 | 186427 | 3/24/2022 | 2010-07-01 | 102625.94 | 105499.46632 | 108453.451377 |
### What types of Assistant roles make between 55k-75k at these community colleges?
display(CC_[CC_.job_title.str.contains('Assistant') & (CC_.salary2.between(55000,75000))].sort_values(by=['salary'], ascending=False).head())
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 79903 | Austin Community College | Sonography | Joel Thurman | Professor, Assistant | Full-time | white | Male | 8/9/2021 | 73972.0 | 191346 | 3/29/2022 | 2021-08-09 | 73972.0 | 76043.216 | 78172.426048 |
| 79718 | Austin Community College | Library Services | Jorge Lopez-McKnight | Professor, Assistant | Full-time | hispanic or latino | Male | 10/14/2018 | 73972.0 | 191161 | 3/29/2022 | 2018-10-14 | 73972.0 | 76043.216 | 78172.426048 |
| 77959 | Austin Community College | Sonography | Sherri Lynn | Professor, Assistant | Full-time | white | Female | 8/19/2019 | 73972.0 | 189402 | 3/29/2022 | 2019-08-19 | 73972.0 | 76043.216 | 78172.426048 |
| 79631 | Austin Community College | Library Services | Christina M McCourt | Professor, Assistant | Full-time | hispanic or latino | Female | 10/2/2017 | 72749.0 | 191074 | 3/29/2022 | 2017-10-02 | 72749.0 | 74785.972 | 76879.979216 |
| 78779 | Austin Community College | Emergency Med Svcs Professions | Neia D Hoffman | Professor, Assistant | Full-time | white | Female | 5/23/2016 | 72749.0 | 190222 | 3/29/2022 | 2016-05-23 | 72749.0 | 74785.972 | 76879.979216 |
Here you can see the top 10 ‘Data’-related roles, along with boxplots by race and gender (see the race and gender boxplots below).
### Describe Data
import seaborn as sns
cmap = sns.light_palette("#34A853", as_cmap=True)
### Change this as you see fit. Suggestions: 'Data', 'Assistant', 'Librar*'
Role = 'Data'
num_ = 10
print('Top {} {} Roles: \n\n'.format(num_, Role))
display(uni_[uni_.job_title.str.contains(Role)].sort_values(by=['salary'], ascending=False).head(num_).style.background_gradient(cmap=cmap, subset=['salary']))
print('\n\n','----'*50, '\n\n')
display(uni_[uni_.job_title.str.contains(Role)].groupby(['race', 'gender'])[['salary']].describe().style.background_gradient(cmap=cmap, subset=[('salary', 'mean')]))
print('\n\n','----'*50, '\n\n')
Top 10 Data Roles:
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46963 | Texas A&M | Texas Real Estate Research Center | Gerald A Klassen | Research Data Scientist | Full-Time | white | Male | 11/28/2005 | 223485.600000 | 149201 | 3/1/2022 | 2005-11-28 00:00:00 | 223485.600000 | 229743.196800 | 236176.006310 |
| 19476 | Texas State University | Office of Institutional Research | Tami Lynn Rice | Dir, System Data & Analysis | Full-time | white | Female | 12/1/1997 | 142124.880000 | 74397 | 8/20/2021 | 1997-12-01 00:00:00 | 142124.880000 | 146104.376640 | 150195.299186 |
| 79868 | Austin Community College | Information Technology | David Cantu | Sr. Manager, Data Governance | Full-time | hispanic or latino | Male | 5/17/2021 | 129273.000000 | 191311 | 3/29/2022 | 2021-05-17 00:00:00 | 129273.000000 | 132892.644000 | 136613.638032 |
| 24980 | UT Austin | IQ - Information Quest | Darren S Holm | Senior Database Administrator | Full-time | white | Male | 11/10/2014 | 127335.000000 | 198732 | 3/31/2022 | 2014-11-10 00:00:00 | 127335.000000 | 130900.380000 | 134565.590640 |
| 42959 | Texas A&M | Office Of Institutional Effectiveness | Rajeeb L Das | Senior Data Scientist | Full-Time | hispanic or latino | Male | 12/2/2019 | 126284.160000 | 147034 | 3/1/2022 | 2019-12-02 00:00:00 | 126284.160000 | 129820.116480 | 133455.079741 |
| 71763 | UT Arlington | University Analytics | Lisa Creed | MANAGER, Partnerships & Data | Not Provided | not provided | Not Provided | 9/8/2020 | 123600.000000 | 145604 | 2/15/2022 | 2020-09-08 00:00:00 | 123600.000000 | 127060.800000 | 130618.502400 |
| 32501 | UNT | Data Analytics & Instl Rsrch | Daniel J Hubbard | Director, Data Management | Full-Time | white | Male | 2/22/2017 | 123000.000000 | 179862 | 3/29/2022 | 2017-02-22 00:00:00 | 123000.000000 | 126444.000000 | 129984.432000 |
| 64495 | UT Arlington | OIT Enterprise Data Services | Paul Savoy | Sr Database Administrator | Not Provided | not provided | Not Provided | 12/12/2016 | 121616.000000 | 141732 | 2/15/2022 | 2016-12-12 00:00:00 | 121616.000000 | 125021.248000 | 128521.842944 |
| 1803 | University of Houston | Enterprise Systems | Zeandra Mathura | ES Database Adminstrator 4 | Full-Time | asian | Female | 6/1/2016 | 116857.800000 | 120609 | 2/18/2022 | 2016-06-01 00:00:00 | 116857.800000 | 120129.818400 | 123493.453315 |
| 6885 | University of Houston | Enterprise Systems | Carol Pena | ES Database Adminstrator 4 | Full-Time | hispanic or latino | Female | 2/11/2008 | 114880.440000 | 120659 | 2/18/2022 | 2008-02-11 00:00:00 | 114880.440000 | 118097.092320 | 121403.810905 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| race | gender | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|
| asian | Female | 22.000000 | 79189.720000 | 21915.379994 | 41616.000000 | 60969.120000 | 80713.320000 | 97159.000000 | 116857.800000 |
| asian | Male | 18.000000 | 86061.783889 | 20387.068553 | 52000.000000 | 75340.432500 | 90175.860000 | 102901.000000 | 110560.320000 |
| black or african american | Female | 8.000000 | 66522.936250 | 16022.700192 | 30514.320000 | 63750.000000 | 72318.585000 | 74150.000000 | 81690.000000 |
| black or african american | Male | 3.000000 | 87640.000000 | 19325.716028 | 69770.000000 | 77385.000000 | 85000.000000 | 96575.000000 | 108150.000000 |
| hispanic or latino | Female | 7.000000 | 53637.491429 | 31630.273709 | 24046.000000 | 34779.500000 | 41124.000000 | 62926.500000 | 114880.440000 |
| hispanic or latino | Male | 19.000000 | 70699.188947 | 29236.914059 | 34680.000000 | 50260.020000 | 60000.000000 | 90190.495000 | 129273.000000 |
| native hawaiian or other pacific islander | Female | 1.000000 | 43296.000000 | nan | 43296.000000 | 43296.000000 | 43296.000000 | 43296.000000 | 43296.000000 |
| not provided | Female | 3.000000 | 43594.666667 | 16738.365671 | 32000.000000 | 34000.000000 | 36000.000000 | 49392.000000 | 62784.000000 |
| not provided | Male | 4.000000 | 71264.957500 | 32838.361957 | 42861.000000 | 43677.480000 | 67508.410000 | 95095.887500 | 107182.010000 |
| not provided | Not Provided | 40.000000 | 70903.949500 | 26333.239253 | 26450.000000 | 50600.000000 | 65693.115000 | 93155.000000 | 123600.000000 |
| two or more races | Female | 3.000000 | 61697.586667 | 18196.965722 | 45000.000000 | 52000.000000 | 59000.000000 | 70046.380000 | 81092.760000 |
| two or more races | Male | 2.000000 | 60190.040000 | 11582.465644 | 52000.000000 | 56095.020000 | 60190.040000 | 64285.060000 | 68380.080000 |
| white | Female | 49.000000 | 59937.559388 | 22755.208720 | 24293.000000 | 43703.000000 | 54200.000000 | 71400.000000 | 142124.880000 |
| white | Male | 68.000000 | 74908.674118 | 30083.636671 | 34000.000000 | 51940.500000 | 72003.500000 | 92756.257500 | 223485.600000 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
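Several groups in the summary above have only a handful of people (one group has a single person, hence the `nan` standard deviation), so their means are easy to over-read. One possible way to handle that is to report the median next to the mean and hide groups below a minimum size. This is only a sketch: the `salary_summary` name, the `min_count` cutoff, and the toy rows are all hypothetical choices, not something from the original analysis.

```python
import pandas as pd

def salary_summary(df: pd.DataFrame, min_count: int = 5) -> pd.DataFrame:
    """Count, median, and mean salary per (race, gender), dropping groups smaller than min_count."""
    summary = (
        df.groupby(["race", "gender"])["salary"]
          .agg(count="count", median="median", mean="mean")
    )
    return summary[summary["count"] >= min_count].sort_values("median", ascending=False)

# Toy rows (made up for illustration) with the same column names as uni_
toy = pd.DataFrame({
    "race": ["asian"] * 3 + ["white"] * 4 + ["two or more races"],
    "gender": ["Female"] * 3 + ["Male"] * 4 + ["Male"],
    "salary": [41000.0, 48000.0, 51000.0, 34000.0, 72000.0, 88000.0, 123000.0, 52000.0],
})
summary = salary_summary(toy, min_count=3)
print(summary)
```

On the real data the input would be the same filtered frame used above, `uni_[uni_.job_title.str.contains(Role)]`.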
### Visualize!
boxprops = dict(linewidth=1.5, color='pink')
medianprops = dict(linestyle='-.', linewidth=2.5, color='firebrick')
# Creating boxplot
fig, ax = plt.subplots(figsize =(18, 6))
# Remove top, right, and left borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
uni_[uni_.job_title.str.contains(Role)].boxplot(column='salary', by=['race'], ax=ax, boxprops=boxprops, medianprops=medianprops)
plt.suptitle("Boxplot for {} by Race".format(Role))
plt.show()
### Visualize!
boxprops = dict(linewidth=1.5, color='pink')
medianprops = dict(linestyle='-.', linewidth=2.5, color='green')
fig, ax = plt.subplots(figsize =(18, 6))
# Remove top, right, and left borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
uni_[uni_.job_title.str.contains(Role)].boxplot(column='salary', by=['gender'], ax=ax, boxprops=boxprops, medianprops=medianprops)
plt.suptitle("Boxplot for {} by Gender".format(Role))
plt.show()
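The two plotting cells above differ only in the grouping column and the median line color, so they could be folded into one function. A minimal sketch, assuming the same column names as `uni_` (the `salary_boxplot` name and the toy rows are hypothetical):

```python
import matplotlib.pyplot as plt
import pandas as pd

def salary_boxplot(df: pd.DataFrame, role: str, by: str, median_color: str = "firebrick"):
    """Boxplot of salary for titles containing `role`, grouped by the `by` column."""
    subset = df[df["job_title"].str.contains(role, na=False)]
    fig, ax = plt.subplots(figsize=(18, 6))
    # Remove top, right, and left borders
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    subset.boxplot(
        column="salary", by=by, ax=ax,
        boxprops=dict(linewidth=1.5, color="pink"),
        medianprops=dict(linestyle="-.", linewidth=2.5, color=median_color),
    )
    # Replace the default "Boxplot grouped by ..." title that pandas sets
    fig.suptitle("Boxplot for {} by {}".format(role, by.title()))
    return ax

# Toy rows (made up for illustration) with the same column names as uni_
toy = pd.DataFrame({
    "job_title": ["Data Analyst", "Senior Data Analyst", "Data Scientist",
                  "Database Coordinator", "Librarian"],
    "gender": ["Female", "Female", "Male", "Male", "Female"],
    "salary": [48000.0, 60000.0, 126000.0, 55000.0, 62000.0],
})
ax = salary_boxplot(toy, "Data", "gender", median_color="green")
```

Calling `plt.show()` afterwards displays the figure; with the real data the calls would be `salary_boxplot(uni_, Role, 'race')` and `salary_boxplot(uni_, Role, 'gender', median_color='green')`.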