So, there comes a time in every young PhD student’s life when they say, “Nah… this ain’t it.” That is, you’ll hear about a ton of people who decide the doctoral program isn’t for them. This could be due to a slew of reasons, including (non-exhaustive): they’ve lost passion, burned out, found new interests, or gotten really tired of being broke and tired. I experienced what I like to call my PhD Quarter Life Crisis: that weird time between your second and third year when you may feel aimless and simply inadequate. I couldn’t tell if I was doing enough. My progress felt (and continues to feel) very slow. I desperately wanted to publish something, ANYTHING. But it’s hard to get a project to that point.
So I started applying for jobs, particularly roles that 1) would bring me back home to Texas, closer to friends & family, 2) looked interesting and where I knew I would excel, and 3) would keep me in academia (bonus: a salary doesn’t hurt).
Thus, I present to you a mini-project I did to look up academic roles at public Texas institutions of higher education.
Note: All data used here is publicly available by Texas mandate. The file can be found here.
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', None)
Code
### Again, I was looking at a Data Science Librarian Role (however, at a non-public university!)
### read in data
uni_ = pd.read_csv('TCS_All_Data_4-28-22.csv', thousands=',')
### update hire date to datetime
uni_['hire_date_dt'] = pd.to_datetime(uni_['hire_date'], errors='coerce')
uni_['race'] = uni_.race.str.lower()
### add 2 "inflation" columns
### universities (in Texas at least) often increase wages to keep up with inflation
### I wanted to project 3 years forward
uni_['salary2'] = pd.to_numeric(uni_['salary'])
uni_['salary3'] = (uni_['salary2']*.028)+(uni_['salary2'])
uni_['salary4'] = (uni_['salary3']*.028)+(uni_['salary3'])
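The two "inflation" columns are just compound growth at a flat 2.8% per year, so the same arithmetic can be written as one small helper. This is a minimal sketch, not part of the original notebook: `project_salary` is a hypothetical name, and the 2.8% rate is simply the figure used in the cell above.

```python
def project_salary(salary: float, rate: float = 0.028, years: int = 1) -> float:
    """Compound `salary` forward by `rate` once per year for `years` years."""
    return salary * (1 + rate) ** years


# 88000.0 is one of the salaries in the dataset; one and two raises
# correspond to the salary3 and salary4 columns, respectively.
base = 88000.0
projections = [round(project_salary(base, years=n), 2) for n in range(1, 4)]
print(projections)
```

With `years=1` and `years=2` this reproduces what `salary3` and `salary4` hold; `years=3` would be the extra step needed for a full three-year projection.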
Code
### look at some data
display(uni_.shape, uni_.head())
### What schools do we have?
display(sorted(uni_.agency.unique()))
(80009, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UT Austin | Accounting | Soren Aandahl | Lecturer | Part-time | white | Male | 1/16/2021 | 88000.0 | 192000 | 3/31/2022 | 2021-01-16 | 88000.0 | 90464.000 | 92996.992000 |
| 1 | UT Austin | Marketing | Christopher Aarons | Assistant Professor of Instruction | Full-time | white | Male | 9/1/2019 | 64711.0 | 192001 | 3/31/2022 | 2019-09-01 | 64711.0 | 66522.908 | 68385.549424 |
| 2 | UT Austin | Computer Science | Scott J Aaronson | Professor | Full-time | white | Male | 4/28/2016 | 251500.0 | 192002 | 3/31/2022 | 2016-04-28 | 251500.0 | 258542.000 | 265781.176000 |
| 3 | UT Austin | UTeach-Natural Sciences | Vivian Abagiu | Communications Coordinator | Full-time | hispanic or latino | Female | 10/15/2015 | 72000.0 | 192003 | 3/31/2022 | 2015-10-15 | 72000.0 | 74016.000 | 76088.448000 |
| 4 | UT Austin | University of Texas Elementary School | Anna Marie Abalos | Food Preparation/Service Worker | Part-time | hispanic or latino | Female | 8/19/2021 | 31200.0 | 192004 | 3/31/2022 | 2021-08-19 | 31200.0 | 32073.600 | 32971.660800 |
['Austin Community College',
'Lamar University',
'Lone Star Community College',
'Sam Houston State University',
'Sul Ross',
'TAMU Commerce',
'TAMU Prairie View',
'Texas A&M',
'Texas A&M Health Science Center',
'Texas State University',
'Texas Tech',
"Texas Woman's University",
'UNT',
'UT Arlington',
'UT Austin',
'UT Dallas',
'UT Permian Basin',
'UT San Antonio',
'UT Tyler',
'University of Houston',
'University of Houston System']
We don’t have all the TX schools, but this is great! Let’s dig around.
Code
### Looking for 'lib'rary roles (without much discrimination) and Full-Time
FT_lib = uni_[uni_.job_title.str.contains('Lib') & uni_.employment_time.str.contains('Full')].copy(deep=True)
FT_lib.sort_values(by=['race', 'gender','salary',], inplace=True)
display(FT_lib.shape, FT_lib.head())
(804, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 74223 | Sam Houston State University | Newton Gresham Library | Sarah I. Sellers | Library Associate | Full-Time | american indian or alaskan native | Female | 11/1/2017 | 35040.00 | 184691 | 3/28/2022 | 2017-11-01 | 35040.00 | 36021.12000 | 37029.711360 |
| 406 | Texas State University | University Libraries | Karen E Cowen | Library Assistant IV | Full-time | american indian or alaskan native | Female | 8/21/2000 | 49128.60 | 74505 | 8/20/2021 | 2000-08-21 | 49128.60 | 50504.20080 | 51918.318422 |
| 59769 | Texas Woman's University | Library | Julie Reed Sullivan | Librarian Digital Content | Full-Time | american indian or alaskan native | Female | 2/22/1993 | 61055.00 | 211345 | 3/22/2022 | 1993-02-22 | 61055.00 | 62764.54000 | 64521.947120 |
| 418 | Texas State University | University Libraries | Elizabeth Karen Cruces | Librarian | Full-time | american indian or alaskan native | Female | 3/1/2021 | 62499.96 | 75561 | 8/20/2021 | 2021-03-01 | 62499.96 | 64249.95888 | 66048.957729 |
| 74696 | Sam Houston State University | Newton Gresham Library | Akira Y. Wu | Library Assistant II | Full-Time | asian | Female | 12/1/2020 | 33360.00 | 185163 | 3/28/2022 | 2020-12-01 | 33360.00 | 34294.08000 | 35254.314240 |
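One thing the output hints at: `employment_time` shows up as both `Full-time` and `Full-Time`, and `str.contains` is case-sensitive by default. Matching on `'Full'` and `'Lib'` happens to work here, but a case-insensitive match is a bit more forgiving. Below is a minimal sketch on a few made-up rows (the `toy` frame is hypothetical, not taken from the real file); `case=False` and `na=False` are standard `str.contains` arguments.

```python
import pandas as pd

# Toy rows (hypothetical) mimicking the mixed capitalization in the real data
toy = pd.DataFrame({
    "job_title": ["Librarian", "Head of library services", "Lecturer", None],
    "employment_time": ["Full-time", "Full-Time", "Part-time", "Full-time"],
})

# case=False ignores capitalization; na=False treats missing titles as "no match"
is_lib = toy.job_title.str.contains("lib", case=False, na=False)
is_full = toy.employment_time.str.contains("full", case=False, na=False)

ft_lib_toy = toy[is_lib & is_full]
print(ft_lib_toy)
```

The trade-off is a wider net: a lowercase `"lib"` would also catch unrelated titles containing words such as "calibration", so the result still deserves a quick look by eye.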
Code
### Let's also check out data related roles, full-time
FT_data = uni_[uni_.job_title.str.contains('Data') & uni_.employment_time.str.contains('Full')].copy(deep=True)
FT_data.sort_values(by=['race', 'gender','salary',], inplace=True)
display(FT_data.shape, FT_data.head())
(195, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 74160 | Sam Houston State University | IT Infrastructure and Support | Victorria A. Saldana | Data Center Operations Spec I | Full-Time | asian | Female | 11/1/2021 | 41616.00 | 184628 | 3/28/2022 | 2021-11-01 | 41616.00 | 42781.24800 | 43979.122944 |
| 1122 | Texas A&M | Dean | Xiaoping Li | Data Analyst | Full-Time | asian | Female | 7/2/2018 | 48502.44 | 155323 | 3/1/2022 | 2018-07-02 | 48502.44 | 49860.50832 | 51256.602553 |
| 1181 | Texas A&M | Tamu Libraries | Ethelyn V Mejia | Data Analyst | Full-Time | asian | Female | 6/1/2005 | 51188.16 | 156457 | 3/1/2022 | 2005-06-01 | 51188.16 | 52621.42848 | 54094.828477 |
| 57665 | UT Austin | Governmental Affairs and Initiatives | Susan Yuanyuan Whitman | Database Coordinator | Full-time | asian | Female | 9/2/2008 | 55000.00 | 208164 | 3/31/2022 | 2008-09-02 | 55000.00 | 56540.00000 | 58123.120000 |
| 572 | Texas A&M | Office Of Admissions | Liu Shi | Senior Data Analyst | Full-Time | asian | Female | 9/1/2008 | 60009.96 | 147134 | 3/1/2022 | 2008-09-01 | 60009.96 | 61690.23888 | 63417.565569 |
Code
FT_data.sort_values(by=['salary']).head()
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 58445 | Texas Woman's University | Library | Dolores Aguilar | Data Entry Oper II | Full-Time | hispanic or latino | Female | 12/1/2006 | 24046.00 | 210021 | 3/22/2022 | 2006-12-01 | 24046.00 | 24719.28800 | 25411.428064 |
| 29916 | Sul Ross | Student Support Services Department | Lisa Griffith | Data Tracking Admin Specialist | Full-Time | white | Female | 9/25/2018 | 24293.00 | 62890 | 8/13/2020 | 2018-09-25 | 24293.00 | 24973.20400 | 25672.453712 |
| 30047 | Sul Ross | Trio Talent Search | Stephanie Weintraut | Talent Search Data Spec/Sec | Full-Time | white | Female | 3/16/2020 | 24341.00 | 63214 | 8/13/2020 | 2020-03-16 | 24341.00 | 25022.54800 | 25723.179344 |
| 5369 | UT Tyler | Admissions | Brittany Johnson | Admissions Data Spec I | Full-Time | black or african american | Female | 5/16/2013 | 30514.32 | 27456 | 7/29/2020 | 2013-05-16 | 30514.32 | 31368.72096 | 32247.045147 |
| 19931 | Texas State University | Materials Mgmt & Logistics | Jennifer Ann Mireles | Data Entry Operator | Full-time | white | Female | 5/10/2010 | 31985.76 | 75435 | 8/20/2021 | 2010-05-10 | 31985.76 | 32881.36128 | 33802.039396 |
Code
### Community Colleges (with community in the name) all roles
CC_ = uni_[uni_.agency.str.contains('Community')]
display(CC_.shape, CC_.head())
(5223, 15)
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 174 | Austin Community College | Marketing | Nicholas Sarantakes | Professor | Full-time | white | Male | 8/24/1981 | 142052.00 | 189000 | 3/29/2022 | 1981-08-24 | 142052.00 | 146029.45600 | 150118.280768 |
| 8323 | Austin Community College | Physics | Paul Edward Williams | Professor | Full-time | white | Male | 8/16/2004 | 88776.00 | 189086 | 3/29/2022 | 2004-08-16 | 88776.00 | 91261.72800 | 93817.056384 |
| 11729 | Austin Community College | Biology | Felix S Villarreal | Professor | Full-time | hispanic or latino | Male | 9/1/1992 | 89047.00 | 189087 | 3/29/2022 | 1992-09-01 | 89047.00 | 91540.31600 | 94103.444848 |
| 12866 | Austin Community College | Biology | Sarah L Strong | Professor | Full-time | white | Female | 8/23/1993 | 118117.00 | 189261 | 3/29/2022 | 1993-08-23 | 118117.00 | 121424.27600 | 124824.155728 |
| 74587 | Lone Star Community College | Cisco | Donna Ivey | Dir CISCO Prog | Full-Time | white | Female | 7/1/2010 | 102625.94 | 186427 | 3/24/2022 | 2010-07-01 | 102625.94 | 105499.46632 | 108453.451377 |
Code
### What types of Assistant roles make between 55k-75k at these community colleges?
display(CC_[CC_.job_title.str.contains('Assistant') & (CC_.salary2.between(55000,75000))].sort_values(by=['salary'], ascending=False).head())
| | agency | department | full_name | job_title | employment_time | race | gender | hire_date | salary | id | data_date | hire_date_dt | salary2 | salary3 | salary4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 79903 | Austin Community College | Sonography | Joel Thurman | Professor, Assistant | Full-time | white | Male | 8/9/2021 | 73972.0 | 191346 | 3/29/2022 | 2021-08-09 | 73972.0 | 76043.216 | 78172.426048 |
| 79718 | Austin Community College | Library Services | Jorge Lopez-McKnight | Professor, Assistant | Full-time | hispanic or latino | Male | 10/14/2018 | 73972.0 | 191161 | 3/29/2022 | 2018-10-14 | 73972.0 | 76043.216 | 78172.426048 |
| 77959 | Austin Community College | Sonography | Sherri Lynn | Professor, Assistant | Full-time | white | Female | 8/19/2019 | 73972.0 | 189402 | 3/29/2022 | 2019-08-19 | 73972.0 | 76043.216 | 78172.426048 |
| 79631 | Austin Community College | Library Services | Christina M McCourt | Professor, Assistant | Full-time | hispanic or latino | Female | 10/2/2017 | 72749.0 | 191074 | 3/29/2022 | 2017-10-02 | 72749.0 | 74785.972 | 76879.979216 |
| 78779 | Austin Community College | Emergency Med Svcs Professions | Neia D Hoffman | Professor, Assistant | Full-time | white | Female | 5/23/2016 | 72749.0 | 190222 | 3/29/2022 | 2016-05-23 | 72749.0 | 74785.972 | 76879.979216 |
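A small detail about the salary band used in that filter: `Series.between` includes both endpoints by default, so someone earning exactly 55,000 or exactly 75,000 counts as in range. A quick sketch on made-up numbers (the `salaries` series is hypothetical) shows the boundary behavior.

```python
import pandas as pd

# Hypothetical salaries sitting on and around the 55k-75k boundaries
salaries = pd.Series([54999.0, 55000.0, 73972.0, 75000.0, 75000.01])

# Default is inclusive on both ends
in_band = salaries.between(55000, 75000)
print(in_band.tolist())

# inclusive="neither" drops the two boundary values
strictly_inside = salaries.between(55000, 75000, inclusive="neither")
print(strictly_inside.tolist())
```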
Let’s look at the top salaries for select roles
Here you can see the top 10 ‘Data’ related roles, along with the boxplots by race and gender (see @box_plots_race or @box_plots_gender below).
Code
### Describe Data
import seaborn as sns
cmap = sns.light_palette("#34A853", as_cmap=True)
### Change this as you see fit. Suggestions: 'Data', 'Assistant', 'Librar*'
Role = 'Data'
num_ = 10
print('Top {} {} Roles: \n\n'.format(num_, Role))
display(uni_[uni_.job_title.str.contains(Role)].sort_values(by=['salary'], ascending=False).head(num_).style.background_gradient(cmap=cmap, subset=['salary']))
print('\n\n','----'*50, '\n\n')
display(uni_[uni_.job_title.str.contains(Role)].groupby(['race', 'gender'])[['salary']].describe().style.background_gradient(cmap=cmap, subset=[('salary', 'mean')]))
print('\n\n','----'*50, '\n\n')
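For anyone who wants the grouped summary without the styling, here is a stripped-down sketch of the same idea on a handful of made-up rows (the `toy` frame and its values are hypothetical). It swaps `describe()` for an explicit `agg` so the output keeps only the count, median, and mean; the median is included because a few very high salaries can pull the mean upward.

```python
import pandas as pd

# Hypothetical rows standing in for the filtered 'Data' roles
toy = pd.DataFrame({
    "race": ["asian", "asian", "white", "white", "white"],
    "gender": ["Female", "Female", "Female", "Male", "Male"],
    "salary": [50000.0, 60000.0, 40000.0, 70000.0, 90000.0],
})

# One row per (race, gender) pair, one column per statistic
summary = toy.groupby(["race", "gender"])["salary"].agg(["count", "median", "mean"])
print(summary)
```

Small groups are worth reading cautiously: with only one or two people in a cell, the mean and median say more about those individuals than about the group.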