*These data have already been transformed from the initial TRIP data to evaluate the claims in Maliniak et al. 2013.
*Citation data come from the Web of Knowledge. These values were gathered by an automated script in March 2013 and linked to
*the articles in the TRIP database by the unique combination of values formed by
*the title, journal of publication, issue number, and volume number. The number of
*citations for each article reflects citations from all articles catalogued in the WOK’s
*Social Science Citation Index Expanded, not just those journals from which we
*draw our sample. See Maliniak et al. 2013 for more information.

use "replication_data.dta", clear 
rename id cited_pub_id
merge 1:1 cited_pub_id using "citation_counts_age_wide.dta"
rename cited_pub_id id
drop _merge

merge 1:1 id using "processed_gender_trip.dta", keepusing(authscore*) force

drop _merge

save Maliniak_et_al_replication_data.dta, replace
drop if C__Year>2007
*table1 

estimates clear
*Lower bounds are inclusive so that 1980, 1990, and 2000 fall in their own decades
estpost tabstat sscie if in_analysis==1 & C__Year >= 1980 & C__Year < 1990, by(gender_comp) statistics(mean sd median) listwise 
eststo a
estpost tabstat sscie if in_analysis==1 & C__Year >= 1990 & C__Year < 2000, by(gender_comp) statistics(mean sd median) listwise 
eststo b
estpost tabstat sscie if in_analysis==1 & C__Year >= 2000 & C__Year < 2007, by(gender_comp) statistics(mean sd median) listwise 
eststo c
esttab a b c using table1.rtf, replace cells("mean(fmt(2)) sd(fmt(2)) p50(fmt(0))") nostar label nonotes mtitles("1980s" "1990s" "2000s") nonumbers title("Table 1: Citations by gender and decade.")



*Table 6 (regression models, exported to table6.rtf)

*simple baseline
nbreg sscie_count all_female coed y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo gender

*add age
nbreg sscie_count all_female coed article_age article_age_sq y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo age

*Career Model
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1   y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo career

*Epistemology
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist  y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo epist

*Material
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo material

*Ideational
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material idea  y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo idea

*Paradigm
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material idea para_atheo-para_real  y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo paradigm

*Issue
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material idea para_atheo-para_real  y1980-y2007 issue_american-issue_usfp if in_analysis == 1 & C__Year < 2007, robust
eststo issue

*Methodology
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material idea para_atheo-para_real  y1980-y2007 issue_american-issue_usfp meth_quant-meth_count if in_analysis == 1 & C__Year < 2007, robust
eststo method

*kitchen_sink
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material idea para_atheo-para_real y1980-y2007 issue_american-issue_usfp meth_quant-meth_count AJPS APSR BJPS EJIR IO IS ISQ JCR JOP JPR SS WP if in_analysis == 1 & C__Year < 2007, robust
eststo kitchen_sink

esttab gender age career epist material idea paradigm issue method kitchen_sink using table6.rtf, replace label nodepvars nogaps nonumbers mtitles("Gender" "Time" "Career" "Epistemology" "Material" "Ideational" "Paradigm" "Issue Area" "Methodology" "Kitchen Sink") se(%8.3g) b(%8.3g) starlevels(* 0.10 ** 0.05 *** 0.01) drop(y1* y2* o*) stat(N ll aic r2_p, labels("N" "Log Lik." "AIC" "R2") fmt(%10.7g %10.7g %10.7g %7.2g))

*using a model to predict female citations

nbreg sscie_count article_age article_age_sq tenured tenured_female coauthored R1  y1980-y2007  if in_analysis == 1 & C__Year < 2007&all_male==1, robust
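predict m1, n 
predict m1_xb, xb
predict m1_ci, stdp
*stdp is the standard error of the linear predictor (log scale), so the interval
*is built around xb and then exponentiated to get bounds on the count scale
gen m1_upper = exp(m1_xb+(m1_ci*1.96))
gen m1_lower = exp(m1_xb-(m1_ci*1.96))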

corr m1 sscie_count if  all_female==1
gen diff_citation_count= sscie_count-m1
bysort  gender_comp:  sum diff_citation_count m1 sscie_count if in_analysis==1, detail
ttest  m1=sscie_count if all_male==1&in_analysis==1
ttest  m1=sscie_count if all_female==1&in_analysis==1
ttest  m1=sscie_count if coed==1&in_analysis==1
ttest  diff_citation_count, by( all_female)

label variable m1 "Predicted"
label variable sscie_count "Actual"
label variable diff_ "Diff."  

estimates clear
estpost tabstat m1 sscie diff_ if in_analysis==1 & C__Year < 2007, by(gender_comp) statistics(mean) columns(statistics) listwise 
esttab using table7.rtf, replace main(mean) nostar nogaps unstack label nonotes nomtitles nonumbers title("Table 7: Predicted vs. Actual Citation Counts.")

*Models with weighting, i.e. parametric matching

mlogit gender article_age article_age_sq tenured coauthored R1 positivist material idea para_atheo-para_real issue_american-issue_usfp meth_quant-meth_count  APSR BJPS EJIR IO IS ISQ JCR JOP JPR SS WP if in_analysis == 1 & C__Year < 2007, robust

predict ps1 ps2 ps3
gen dose_w=1/ps1 if  gender==1
replace dose_w=1/ps2 if gender==2
replace dose_w=1/ps3 if gender==3
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1  y1980-y2007  if in_analysis == 1 & C__Year < 2007, robust
eststo m1
nbreg sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1  y1980-y2007  if in_analysis == 1 & C__Year < 2007 [pw= dose_w], robust 
eststo m2

esttab m1 m2 using table7a.rtf, replace label nogaps nonumbers mtitles("Unweighted" "Weighted") se(%8.3g) b(%8.3g) starlevels(* 0.10 ** 0.05 *** 0.01) drop(y1* y2* o*) stat(N ll aic r2_p, labels("N" "Log Lik." "AIC" "R2") fmt(%10.7g %10.7g %10.7g %7.2g)) title("Table 7a: Matching Analysis Confirms the Gap Exists.")

************************
* Is it getting better?
************************

*Interaction graph

 quietly summarize article_age, detail 
 local min=5
 local max=34
 local cen25=r(p25)
 local cen50=r(p50)
 local cen75=r(p75)
 local numparams=20
 local inc=(`max'-`min')/(`numparams'-1)
 local iter=0
 matrix foo2 = 0,0,0,0
 local order=2  


 while `iter'<`numparams' { 
    gen article_agea=article_age-`min'-(`inc'*`iter') 
    summarize article_agea
    gen article_ageaXall_female=article_agea*all_female
    gen article_age_sqa= article_agea^2
    gen article_age_sqaXall_female=article_age_sqa*all_female
    display `min'
    display `inc'
    display `iter'


   nbreg sscie_count article_agea all_female article_ageaXall_female tenured R1 y1980-y2008 if C__Year <= 2007 & C__Year >= 1980 & in_analysis == 1

     matrix betas=e(b)                
     scalar x1coef=betas[1,`order']
     matrix ses=e(V)
     scalar x1se=sqrt(ses[`order',`order'])
     local obs=e(N)                     
     * Two-sided 95% interval: 2.5% in each tail (0.05 would give a 90% interval)
     scalar ci95=invttail(`obs', 0.025)
     local xval = `min'+(`inc'*`iter')
     matrix foo = x1coef-ci95*x1se, x1coef, x1coef+ci95*x1se, `xval'
     matrix foo2 = foo2 \ foo
     drop article_agea*
     drop article_age_sqa*
     local iter=`iter'+1
     }
 
     matrix points=foo2[2..(`numparams'+1),1..4]
  
     svmat points
     twoway (rconnected points1 points3 points4, scheme(lean1) scale(.8) lpattern(-) msymbol(i) lcolor(black)) (histogram article_age if article_age < 35 & article_age >=5 & in_analysis == 1, width(1) yaxis(2) blcolor(gray) bfcolor(none))   (connected points2 points4, msymbol(i) lcolor("red")), ylabel(, labsize(*medium)) ytitle(Coeff. on All Female and 95% CIs)  ytitle(Histogram of Article Age, axis(2)) yline(0, lwidth(medthick)) xtitle(Article Age) xlabel(5(5)34, labsize(small)) title("Coeff. on All Female at Different Levels of Article Age") legend(off) 
*Only for Mac
*     graph export "figure5.pdf", replace

     drop points*
     clear matrix

gen article_ageXall_female = all_female*article_age
nbreg sscie_count article_age all_female article_ageXall_female tenured R1 y1980-y2008 if C__Year <= 2007 & C__Year >= 1980 & in_analysis == 1
eststo getting_better
esttab getting_better using getting_better.rtf, replace nogaps nonumbers mtitles("Full Sample") se(%8.3g) b(%8.3g) starlevels(* 0.10 ** 0.05 *** 0.01) drop(y1* y2* o*) stat(N ll aic r2_p, labels("N" "Log Lik." "AIC" "R2")) sfmt(%8.2g) title("Interaction of article age and all-female authorship.") label

*full sample
nbreg cum_citations6 all_female coed coauthored tenured tenured_female R1 y1980-y2007 if C__Year >=1980 & C__Year <= 2007 & in_analysis == 1
eststo model_all

*1980s sub sample
nbreg cum_citations6 all_female coed coauthored tenured tenured_female R1 y1980-y1989 if C__Year >=1980 & C__Year <= 1989 & in_analysis == 1, robust
eststo model_80

*1990s sub sample
nbreg cum_citations6 all_female coed coauthored tenured tenured_female R1 y1990-y1999 if C__Year >=1990 & C__Year <= 1999 & in_analysis == 1, robust
eststo model_90

*2000s sub sample
nbreg cum_citations6 all_female coed coauthored tenured tenured_female R1  y2000-y2007 if C__Year >=2000 & C__Year <= 2007 & in_analysis == 1, robust
eststo model_00

esttab model_all model_80 model_90 model_00 using table9.rtf, replace nogaps nonumbers mtitles("Full Sample" "1980s" "1990s" "2000s") se(%8.3g) b(%8.3g) starlevels(* 0.10 ** 0.05 *** 0.01) drop(y1* y2* o*) stat(N ll aic r2_p, labels("N" "Log Lik." "AIC" "R2")) sfmt(%8.2g) title("Table 9: Subsample analysis of citation count at six years.") label




*Authority score six years after publication, for articles published 1980-2000
gen auth_six_years = .
forvalues yr = 1980/2000 {
    local score_yr = `yr' + 6
    replace auth_six_years=authscore_`score_yr' if C__Year==`yr'
}

*full sample
reg auth_six_years all_female coed coauthored tenured tenured_female R1 y1980-y2007 if C__Year >=1980 & C__Year < 2007 & in_analysis == 1
eststo model_all

*1980s sub sample
reg auth_six_years all_female coed coauthored tenured tenured_female R1 y1980-y1989 if C__Year >=1980 & C__Year <= 1989 & in_analysis == 1, robust
eststo model_80

*1990s sub sample
reg auth_six_years all_female coed coauthored tenured tenured_female R1 y1990-y1999 if C__Year >=1990 & C__Year <= 1999 & in_analysis == 1, robust
eststo model_90

*2000s sub sample
reg auth_six_years all_female coed coauthored tenured tenured_female R1  y2000-y2007 if C__Year >=2000 & C__Year <= 2007 & in_analysis == 1, robust
eststo model_00

esttab model_all model_80 model_90 model_00 using table10.rtf, replace nogaps nonumbers mtitles("Full Sample" "1980s" "1990s" "2000s") se(%8.3g) b(%8.3g) starlevels(* 0.10 ** 0.05 *** 0.01) drop(y1* y2* o*) stat(N ll aic r2, labels("N" "Log Lik." "AIC" "R2")) sfmt(%5.2g) title("Table 10: Subsample analysis of authority score at six years.") label

label variable sscie_count "Citation Count"

*descriptive stats table
estimates clear
estpost summarize sscie_count all_female coed article_age article_age_sq tenured tenured_female coauthored R1 positivist material idea para_atheo-para_real C__Year issue_american-issue_usfp meth_quant-meth_count AJPS APSR BJPS EJIR IO IS ISQ JCR JOP JPR SS WP if in_analysis==1 & C__Year < 2007,listwise 
esttab using table5.rtf, replace cells("mean(fmt(2)) sd(fmt(2)) min max")  nostar nogaps label nonotes nonumbers nomtitles title("Table 5: Descriptive Statistics")


estimates clear
estpost tabulate Issue_Area gender_comp  
esttab using table2.rtf, replace cell(colpct(fmt(2))) unstack noobs label varlabels(`e(labels)') nonumbers nodepvars nonotes nomtitles title("Table 2: Gender of Authors by Issue Area.")

*table 3
estimates clear
estpost tabulate Paradigm gender_comp   
esttab using table3.rtf, replace cell(colpct(fmt(2))) unstack noobs label varlabels(`e(labels)') nonumbers nodepvars nonotes nomtitles title("Table 3: Gender of Authors by Paradigm.")