Some Thoughts as a Junior Faculty (at JHSPH)

Being a Junior Faculty member, or considering it, leads to a lot of questions. I hope to answer a few of them here. Some of my statements will be specific to Johns Hopkins Bloomberg School of Public Health (JHSPH) and maybe specific to the Department of Biostatistics. Disclaimer: this is the only department I have been in (for PhD and faculty), so not all of these may generalize or apply. All of these opinions are my own and all of this is knowledge that was not taken in confidence.

Do you want to be a faculty?

First and foremost: do you want to do this? I'm not saying you need to be 100% sure about everything and this has been your lifelong dream and you've never thought about anything else. I'm saying, did you like writing papers and doing research, where you needed to be independently motivated for much of the work? I like going down the rabbit hole and finding out where it leads me. Maybe too much at times. That means finding a bug in my code, figuring out if my hypothesis is off the mark, or whether I can tackle this problem in front of me. The independence is a large draw for me.

Overall, I believe the flexibility/independence to work on what you're passionate about is the main draw of academia. That doesn't mean you'll never have to do things you don't like or aren't passionate about. It means that you'll have the opportunity to explore your own ideas if you want, or work on interesting research for which someone else just so happened to write the grant. A lot of the other perks of academia you can find in other industries. Many jobs today are allowing for flexible time schedules, conference travel, up to 20% independent research time, remote work, and other things that were unheard of 25 years ago. That's not a bad thing for academia; it just means those perks are not exclusive to academic faculty.

That independence/flexibility comes at some cost. For one thing, you may be paid below “market rate” in industry or consulting. The main cost I see, though, is that independence can be hard sometimes, at least for me. I don't like being told what to work on all the time (see rabbit hole above), but I do like some structured work that has deliverables. Trying to reorder your priorities fluidly can be a bit draining.

One of the best analogies I've heard about being a junior faculty is that it's your own startup. You're the CEO of your own career. You're finding funding, usually through grants rather than VCs. You're a recruiter, usually of students and other collaborators. You're your own assistant, scheduling meetings, staying on top of your email, booking your own travel (maybe), and running the meetings. And you're the team doing the research, writing the code, and delivering the product (papers/presentations/grants); you're the advertiser of the product (vlogs/blogs/presentations/papers/classes). Over time, these roles change in the percent of time you spend doing each task, but when you start out, you're it. And lastly, you're setting the agenda and vision for your career.

I'm an impostor: I don't have ideas

Many graduating students have the concern that they will not have enough ideas to generate new papers or grants. I'd say that's generally not something you should worry about. No area of research is completely explored, but it may be an issue if you are too narrow in your scope. Almost every paper I have finished has led to at least 3 more questions. Those questions may be about that data set or method or about new data we need collected. Even if your well of ideas dries up temporarily (highly doubtful), if you have energetic collaborators/mentors, they will have enough ideas to lend you. If you're working on something someone else suggested, I recommend that you 1) understand why it's important before starting, 2) make sure you have enough interest/passion in the topic (for those nights where the project has turned into your worst enemy, this passion keeps you from totally throwing it in the garbage), 3) discuss expectations before doing the work with respect to the level of help those suggesting it will provide, and 4) make sure authorship is at least discussed a bit before doing a whole bunch of work. If that doesn't work, go to one conference and see if you don't come back with a handful of ideas.

Soft money vs. Hard money

Soft-money generally refers to salary funding coming from grants or other awards rather than tuition or endowments. Hard money is the opposite and many times the majority of your salary will come from teaching. There are numbers such as “2-1”, “2-2”, “1-1” that refer to the number of classes you teach in a semester for hard money positions. JHSPH is generally a soft money environment. Moreover, we are in a quarter (not semester) system, so the numbers do not mean the same thing. Depending on how much you teach, however, you will be required to cover anywhere from 60-85% of your salary as a tenure-track faculty at a given time on grants or awards. If you're research track, make it 75-100%.

Research Track vs. Tenure-Track

First off, I'm an Assistant Scientist at JHSPH. This means I'm a research-track faculty member. Other institutions have different names for this track and also may have different tracks for research or clinical work, etc. In some departments, research-track faculty members are treated starkly differently than tenure-track members, not just implicitly: some have different voting rights and restrictions on their work and/or mentorship. In Biostatistics at JHSPH, research/scientist-track members have similar voting rights (not completely the same) and are treated very similarly to tenure-track faculty.
For example:
  • You can teach courses.
  • You can have discretionary accounts.
  • You can be the PI on a grant (or co-PI).
  • You usually get competitive offers and can use the AMStat news to guide your salary.
  • Skills related to research, teaching, service, and mentorship are extremely useful.

Some differences worth noting are:
  • You cannot be the primary research advisor to a PhD student. You can be an advisor, just not the primary one. You can be a primary research advisor for a Master's student.
      • This has pros and cons. You can't be the primary mentor, but you can still work with students, and you tend not to have to find funding for them, as that is likely the duty of the primary advisor.
  • You don't have a built-in sabbatical, whereas it's more assumed for tenure track. You could potentially negotiate this.
  • You are usually hired under a project or a direct mentor.
      • This does not imply that you cannot work on your own work, but that initially you don't have to find all of your funding when starting.
  • The search, hiring process, and requirements from the dean are not exactly the same as for tenure-track.
  • Startup packages are not necessarily the same. Again, this could potentially be negotiated.
  • You start working on day 1, compared to some “protected time” with tenure-track faculty.
  • You don't have a “tenure clock”. This can be a double-edged sword.
      • On one hand, you don't have the same timeline pressure.
      • On the other hand, you may need to make a concerted effort to set up meetings with your chair and/or mentor to discuss progress with respect to promotions. Our chair has yearly progress meetings with all faculty, regardless of track.
      • This can also lead to more variable promotion timelines. This can be mitigated by clear communication from the chair and mentor about expectations and previous precedent.
  • You have different expectations for promotion. These can vary wildly from institution to institution. We have similar expectations in many respects at JHSPH, but do not have as many external letters required for the promotion committee.

Mentorship

How do you choose a mentor? Well, find someone you can talk with, that knows stuff about stuff you don't know well, and will agree to make time for you. We have one formal mentor. But most likely, you'll have many mentors. One is likely to be in the department, but you'll likely find mentors that are collaborators. There are some informal setups in our department, which work overall because most people are open to having you schedule a meeting or walk in and ask some questions. If you find a department where that's not the case, try to get something more formal. Generally, someone in a working group you are in may be a good place to start. We also have an informal lunch on the calendar each day where faculty/post-docs may join, which allows you to meet other faculty that you may not directly work with. I have found this immensely helpful to get to know my fellow faculty, or get some advice from senior faculty that have dense schedules I would not feel comfortable sequestering an hour from.

Grants

One of the most asked questions for new junior faculty is about funding. These questions and discussions can be stressful, especially if you have no experience with grants. I had some experience with grants when I was a Master's-level statistician, but never from the viewpoint of a PI.

How do Grants work?

Honestly, I'm still not 100% sure. NIH R Grants have different requirements with respect to page limits (https://grants.nih.gov/grants/how-to-apply-application-guide/format-and-write/page-limits.htm), but they are generally between 6 and 12 pages. That seems like a lot but it isn't. Remember, all the aims of the grant, the introduction, the figures, and the novelty of the grant need to go in there. Don't go over the limit, period.

One thing we do in JHSPH Biostatistics is that the faculty share written grant proposals. Some of the grants have been funded, some have not been funded but were discussed, and some were not discussed. This allows junior faculty who have never been on an NIH panel to see an array of grants. I learned to write papers by reading other papers and applying a similar logical structure. I imagine grants are a similar endeavour. Disclaimer: I've never applied for an NIH grant where I was the main PI and the one who did the lion's share of the writing. But when I do, having examples to draw from can help immensely. I have submitted to internal and other grant mechanisms, but not NIH as a PI.

Study sections and that stuff

I will tackle a few simple questions now. At JHSPH, as it is a school of public health, a lot of grants come from the National Institutes of Health (NIH), at many different institutes or centers in the NIH (called ICs). Many of our faculty (not myself though) have received grants from the National Science Foundation (NSF). These tended to be more theoretical, but not always. There are also a number of internal grants at an institution. For example, I have a DELTA grant (https://provost.jhu.edu/about/digital-initiatives/delta/rfp/), which is an internal JHU grant.

Grants have letters and numbers; the letters generally refer to the type of grant it is (see https://grants.nih.gov/grants/funding/funding_program.htm). Many grants you will apply for will be R grants, where the R stands for research. Junior faculty in particular may target career development awards (K grants, https://researchtraining.nih.gov/programs/career-development). Many faculty target R01 grants, as they are the most common. Junior faculty may also be more likely to target R21 grants, as they are for research in earlier stages.

If you are a postdoctoral fellow, you can apply for a K99/R00 (sometimes called a “kangaroo” grant, https://researchtraining.nih.gov/programs/career-development/K99-R00), which is a “Pathway to Independence Award”. These are similar to R01s in the funding amount usually. They are highly competitive, but the number of eligible applicants is smaller than the number of faculty.

For many sections, there are requests for applications (RFAs) that go out. These are calls for grant proposals that do a specific type of work, tackle a specific subject area, or require specific infrastructure resources. Make sure your proposal fits the mission of the RFA before going forward. In order to do that, you'll want to talk to a program officer. In many respects, these people are similar to project managers in other settings. They have a portfolio of different divisions; this proposal is not their only one. Most program officers (POs) have extensive backgrounds in science, but not always specific to your field or the niche of the RFA. That can cause struggles when discussing some of the importance of your work, but that's a good thing. It's a good thing because the panel reviewing the grant isn't going to be made up of niche people. If the program officer doesn't see how your proposal fits with the RFA, it's highly unlikely the study section will see it either. Also, the program officers look at a number of RFAs other than this specific one, which may allow them to identify other sections or RFAs where your grant may be more appropriate. Don't harass them, but they are your contact to ask questions and you should use them.

Funding: Direct and Indirects

Grants have direct and indirect costs. The direct costs are the monies needed to do the work, such as salary, computing, data collection/analysis, etc. This is generally how you can fund your salary, your work, students, and/or post-docs. The indirects or indirect costs relate to money in the budget that is not directly related to the work (hence indirect), such as money for office space, staff, heating/cooling, electricity, other institutional requirements/support. The “indirect rate” is negotiated by the school and the funding body (see https://www.hopkinsmedicine.org/research/resources/offices-policies/ora/handbook/appendixc.html for some rates).
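As a rough illustration of the arithmetic, here is a minimal Python sketch. The 60% indirect rate and the dollar amounts are made-up numbers for illustration, not an actual negotiated rate, and the sketch assumes indirects are charged as a flat fraction of all direct costs, which is a simplification (real budgets often exclude some cost categories from the base).

```python
def total_budget(direct_costs, indirect_rate):
    """Return (indirect, total) for a simple one-year budget.

    Assumes indirects are a flat fraction of all direct costs,
    which is a simplification of how real budgets are built.
    """
    direct = sum(direct_costs.values())
    indirect = direct * indirect_rate
    return indirect, direct + indirect


# Hypothetical yearly direct costs in dollars (illustrative only).
direct_costs = {
    "salary": 60_000,
    "student_support": 30_000,
    "computing": 10_000,
}
HYPOTHETICAL_INDIRECT_RATE = 0.60  # assumed value, not a real negotiated rate

indirect, total = total_budget(direct_costs, HYPOTHETICAL_INDIRECT_RATE)
print(f"Direct: ${sum(direct_costs.values()):,}")
print(f"Indirect: ${indirect:,.0f}")
print(f"Total requested: ${total:,.0f}")
```

The point of the sketch is only that the total requested from the funder is larger than the money you actually get to spend on the work.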

Write a lot

I recommend the book “How to Write a Lot: A Practical Guide to Productive Academic Writing”. It's not expensive and it's a short book. Note, this will not teach you how to write well or publish. It's specifically on how to write a lot. As an academic, that's the majority of the job. Writing papers, writing grants, writing letters of recommendation (eventually), writing letters of support (“I'd work on this grant for sure”), writing presentations, etc. Writing a lot can help, even if the writing isn't that great to start. The book also recommends a writing accountability group (WAG). We have one with junior faculty in our department, and it has led to grants and papers that would not have existed otherwise. If you don't have one, start one. At JHU, our faculty development office helps create them and facilitate them if you don't have the ability or pull to start one on your own (https://www.hopkinsmedicine.org/fac_development/career_path/wags.html).

How do you recruit students?

First, students need to know who you are. That means attending departmental events and meetings where there are students. We have a tea time every week where students and faculty share tea. We discuss a number of things: life, pets, that week's seminars, other non-statistics human things. We also have a chili cookoff at the beginning of every academic year so that new students can meet the department. We also have a holiday and end of the year celebration. We additionally have joint faculty/student meetings to discuss departmental matters. We have had off-site retreats approximately every 1.5 years to discuss long-term matters of the department and adjustments to our vision and our mission. Our offices are all on the same floor, so students see us in the halls and know where we sit. If you are in a department where that's not the case, try to be somewhere visible some days a week (like a coffee shop in the building where the students are) if possible.

A large resource for recruiting students is teaching the first or second-year courses. These students get to know you and how you work, and you get to know them. They at least know who you are if you are their teacher (hopefully). In some hard money environments, you may have discussions at interviews about “buying out” of teaching. It can be beneficial to discuss this option, but not teaching may put off some departments and may limit your ability to recruit students quickly. That being said, I find teaching incredibly rewarding, but also extremely tiring. I have never taught a lot in one day and not felt like it took a lot out of me. But I've never seen those days as ones where I “got nothing accomplished”, which has happened with strictly research days at times.

Conferences

At JHSPH, I had the tremendous opportunity to attend a lot of conferences. I like to travel and see places, network and meet people, and don't mind public speaking. All of those traits are helpful for going to conferences, but are by no means necessary. Sadly, some programs allow students to go to one conference over the course of their degree. This sometimes conveys the idea that conferences aren't useful or aren't for students. Both are patently false. You don't need to go to 5 conferences a year to be a successful faculty member. Heck, you don't need to go to any. But conferences are great places to meet people in your field, get your name out there (advertiser), and make collaborations and connections for future projects. Oh, and students definitely do go to conferences (maybe a future post-doc?). To fund the travel, hopefully there are funds in the grant for travel and conferences. If not, you may have money in a discretionary account that you had from a startup package or other means. Our department also will pay for one conference a year for all faculty. If none of those options exist, try your hardest to get a travel award from the conference. These are highly competitive, may be only open to students or post-docs, and will likely not cover all the costs incurred at the conference.

Staff and Administration

Lastly, respect the hell out of your good staff and administration members. My mother was a secretary at a university. She had stories about professors who were not the nicest to staff and that stuck with me. If you think it's hard getting a meeting with a collaborator, imagine trying to organize 5 senior faculty from different departments to get on thesis defenses or filling a speaker schedule where no one answers emails. The administrative staff may be the gatekeepers to senior faculty calendars or room schedules. They are the glue that keeps things together at times and the oil that keeps the machine running at others.

The administrative team also usually knows the ins and outs of grant submissions and may be the ones submitting the grant. Respect their time. Do not expect them to reply on weekends or after hours unless absolutely necessary. Our admin staff at JHSPH Biostatistics have made a policy requiring us to notify them a set period of time before we submit a grant, and the faculty agreed. Moreover, most staff and admin have been in the department much longer than you; they know who to talk to, the answers to your questions, and they generally will meet with you and do a Q&A. Most importantly, if you have good people that do not feel respected at their job, they will leave.

Conclusion

Try to find someone who's done well in the environment you're in. That is likely a mentor, but maybe not. Try to have people know who you are. You'll have ideas for research; you'll probably write grants. Ask successful grant writers for copies of their work to use as a starting template. You weren't always an expert on how to analyze data or write papers; it takes practice, help, and usually a template. Like most things, a lot of anxiety and frustration can be mitigated or avoided by having open, frank discussions about expectations, requirements, and getting feedback. Remember, you will likely have to ask for help if you need it, but your department wants you to succeed.

The way people use AI is ruining Reproducible Science Again

The basic premise of this article is this: “Would you accept a paper that did a logistic regression, but did not publish the weights due to intellectual property?”. If you answer yes, then I do not think you will agree with some of the following statements. If so, I thank you for your reviewing service and will let the authors whose papers I review know to send their work to you.

If you answered no, my question to you is, why do we accept this for artificial intelligence (AI) models? Here I'm using AI in the broad sense, including machine learning, deep learning, and neural networks. In many of these cases, the model itself is only useful as an object. For example, for a random forest, the combination of the individual trees is necessary to do prediction. It is extremely difficult (likely impossible) to reduce this to a compact representation that would be useful in a paper to do prediction. In a regression framework, even penalized regression, the model can be shown by a series of weights or beta coefficients. For deep learning models, the number of parameters can explode given the complexity, depth, and representation of the network. When using a convolutional neural network (CNN) to segment or classify images, there can be millions of weights for different areas of an image to get a final result. These weights are impractical to print out in a PDF, text file, or supplemental material, as it would take a researcher hours to reconstruct them into the network. Thus, the model weights should be released if the results are to be reproducible or useful on an external data set. I will concede that a CNN can be represented in a figure to some degree and be reproduced, but many times other processing, normalization, augmentation, or other steps that are not shown are required for reproducibility.
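To make the size gap concrete, here is a minimal Python sketch that counts parameters by hand. The logistic regression with 10 predictors and the toy CNN layer sizes are hypothetical choices for illustration, not a model from any particular paper. The counting rule for a 2D convolution layer (kernel height times kernel width times input channels, plus one bias, per output channel) is the standard one.

```python
def logistic_regression_params(n_predictors):
    """One weight per predictor plus an intercept."""
    return n_predictors + 1


def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel for a 2D convolution."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (n_inputs + 1) * n_outputs


# Hypothetical toy CNN on a 64x64 single-channel image (illustrative sizes).
cnn_layers = [
    conv2d_params(3, 1, 32),
    conv2d_params(3, 32, 64),
    conv2d_params(3, 64, 128),
    dense_params(8 * 8 * 128, 256),  # assumes three 2x2 poolings: 64 -> 8
    dense_params(256, 2),
]

print("Logistic regression:", logistic_regression_params(10))
print("Toy CNN:", sum(cnn_layers))
```

The regression fits in a table of 11 numbers, while even this small toy network has over two million parameters, which is why a file of weights, not a printed table, is the practical unit of release.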

Why is this Happening?

Frameworks such as TensorFlow, Keras, Theano, and PyTorch make deep learning more usable for all researchers. Fitting these models or predicting output (also called inference) can be done on a number of platforms, including mobile, which makes it highly attractive. Moreover, container solutions such as Docker and Singularity allow the entire system on which the model was used to be preserved. So what's the issue? The growing issue with the use of AI, especially in applications to medical data, is that people are not releasing 1) their data, 2) their code, or 3) the model weights.

Release the Data?

Let us tackle the easiest first: the data. Some data was collected without consent to be released, or has protected health information (PHI) that cannot be released under protections such as HIPAA (Health Insurance Portability and Accountability Act). It is completely reasonable for researchers to not be able to release such data. I will say that when they can release the data, many times it is stated that it is “available upon request”, but adherence to this policy is not enforced by many journals once the paper is already published (https://science.sciencemag.org/content/354/6317/1242.1, https://twitter.com/gershbrain/status/1207677210221527045). If authors simply ignore these requests, there are few ramifications. This may be understandable, because there are downsides to the researcher of releasing data: 1) users could find issues (which may be a benefit), 2) it may require maintaining data usage agreements, or 3) many think of this as “intellectual property”, which I will address now.

Release the Code?

Many people, seeing how well AI is working in their application, think that their method could be turned into a commercial product. This may be valid, but must not be used as a shield against reproducible research. Let's turn to releasing the code. If there is no novelty in the framework they used, such as an off-the-shelf VNET, then the code should be released as nothing is “secret”. Even with slight adaptations, unless they are large and completely new, the code should be released. Many ask, if the method is off-the-shelf, why would the code need to be released? The reason is that even when off-the-shelf methods are used, the steps that get the data into the correct form before running them, including data processing and checks, need to be available. Thus, these “ancillary” scripts are actually crucial for research and reproduction. Even if the architecture is completely novel, it will likely be described in detail in the publication, and thus the code potentially could be released. Let's assume, though, that you cannot release the data or the code.
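One lightweight way to make those “ancillary” steps explicit is to ship a small manifest alongside the weights. The Python sketch below is only an illustration: the file names, the preprocessing step names, and the manifest fields are all hypothetical, and it uses only the standard library (`hashlib`, `json`, `tempfile`) rather than any framework's own export format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path):
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(weights_path, preprocessing_steps):
    """Describe the released weights and the steps applied before modeling."""
    return {
        "weights_file": Path(weights_path).name,
        "weights_sha256": sha256_of_file(weights_path),
        "preprocessing": preprocessing_steps,
    }


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for a real weights file (hypothetical name and contents).
    weights_path = Path(tmp_dir) / "model_weights.bin"
    weights_path.write_bytes(b"\x00" * 16)

    # Hypothetical preprocessing steps, in the order they were applied.
    steps = [
        {"step": "resample", "voxel_size_mm": [1, 1, 1]},
        {"step": "intensity_normalize", "method": "z-score"},
    ]
    manifest = build_manifest(weights_path, steps)
    (Path(tmp_dir) / "manifest.json").write_text(json.dumps(manifest, indent=2))
    print(json.dumps(manifest, indent=2))
```

The checksum lets a reader confirm they have the same weights the paper used, and the ordered step list records the processing that a figure of the architecture alone would not show.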

Release the Model?

Lastly, releasing the model. Again, the “model” in this setting can be a complex set of trees or weights, amongst other things. It's uncertain whether PHI can be recovered from these models, which is a valid concern given the data cannot be released. After many discussions, I assert that many don't release the model because it is “proprietary” or has potential “intellectual property” that could be commercialized, which I don't disagree with. My objection is that many applications will not fit the requirements for a patent, as slight changes to an algorithm can classify it as a different algorithm. Using these models in a software-as-a-service (SaaS) framework could potentially be profitable, but it's doubtful this will ever happen. Moreover, there is no time limit on these commercializations. If you claim this can be commercialized, but after 5 years no progress has been made, is it really going to be commercialized, or is it simply an impediment to reproducible and progressive science? If a model fits in the cloud but never comes down, is it really a model at all?

Any Solution?

So what's the answer? I don't know. But here's some help in reviewing.
Personally, I have been putting in boilerplate concerns with a number of medical imaging AI projects, which hopefully you may be able to use:

  • Overall, the other main concerns are 1) the data is not available to determine quality, 2) no software is available to test or apply this methodology to another data set, and 3) the segmentation/detection results were not directly compared to any of the methodology for segmentation previously published.
  • Releasing the code for processing and modeling, including the final model weights, would greatly increase the impact of this paper and is highly encouraged.
  • Are the data released anywhere? Will it be made public? Will the segmentations/classifications?

I've had authors and editors push back on the concerns above, and I have yielded in some cases. I don't think these are 100% necessary for publication, but I would like to know the reasons that I cannot reproduce this analysis or use it to learn how to do better science. Until journals give clearer guidance about these policies (instead of omitting them in many cases), I guess I'll just be ice-skating uphill.