Internship Opportunities

Graduate-Level Natural Language Processing Internship

About the Program

Natural language processing is a multidisciplinary field concerned with the interactions between computers and human (natural) languages. Clinical care produces a vast amount of data that could be used by administrators, researchers, and clinicians to improve the quality of care and advance research. However, the vast majority of this data is contained within the raw text of physician narratives, requiring either manual abstraction or automated extraction to transform it into usable, aggregatable, and relatable data. The complexity of clinical data and medical language poses unique challenges for text processing and information extraction.

Work alongside natural language processing engineers, researchers, clinicians, data managers, and system developers to help architect, develop, and implement computational methods and tools that assist time- and resource-intensive manual processes and enable researchers, clinicians, and physicians to retrieve and use clinical information more efficiently, improving healthcare operations and advancing cancer research.

Examples of Past Projects

Information extraction from pathology reports
Extraction of important data elements, such as diagnosis, tumor size, location, genetic markers, and treatment history, from breast, prostate, and sarcoma pathology reports.

Extraction of Pancreatic Cancer Diagnosis and Staging
Over the course of five months, our graduate-level computational linguistics intern piloted a project to automatically extract diagnosis and staging from clinical notes and pathology reports to build a resource of discrete, queryable data for the pancreatic working group at SCCA.

Examples of Future Projects

A Spectrum of Certainty in Epistemic Terms and Hedging Language
So-called "hedging language" is ubiquitous in clinical narratives (e.g. "imaging is worrisome for ___" or "pathology is consistent with ____"). In a subjective grading, how do these various hedging statements relate to the authors' confidence in the given evidence, and how do the use of these hedging statements and their level of certainty vary among specialties? Also, how does the use of these epistemic/evidential/hedging phrases actually relate to changes in treatment?
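
As a rough starting point for this kind of analysis, hedging cues could be located with a small phrase lexicon. The sketch below is only illustrative: the function name find_hedges, the cue "suggestive of", and every certainty score are placeholder assumptions, not validated clinical judgments. A real project would derive the cues and their grading from annotation.

```python
import re

# Placeholder lexicon. The first and last cues come from the examples above;
# "suggestive of" and ALL scores are illustrative assumptions only.
HEDGE_CUES = {
    "worrisome for": 0.5,
    "suggestive of": 0.6,
    "consistent with": 0.8,
}

# Match a cue followed by the text up to the next ".", ";" or ",".
_CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in HEDGE_CUES) + r")\s+([^.;,]+)",
    re.IGNORECASE,
)


def find_hedges(text):
    """Return (cue, placeholder_score, hedged_finding) tuples in text order."""
    results = []
    for match in _CUE_PATTERN.finditer(text):
        cue = match.group(1).lower()
        finding = match.group(2).strip()
        results.append((cue, HEDGE_CUES[cue], finding))
    return results


if __name__ == "__main__":
    note = ("Imaging is worrisome for metastasis. "
            "Pathology is consistent with adenocarcinoma.")
    for cue, score, finding in find_hedges(note):
        print(cue, score, finding)
```

Counting such matches per specialty would give a first, coarse view of how hedging usage varies, before any subjective grading is collected.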

Simple Temporal Designations for Clinical Timelines
Recreating the clinical timeline of events is a crucial step in outcomes research. For example, answering a question like "For a given patient cohort, what were the outcomes of treatment X versus treatment Y?" means ascertaining medical and social history, diagnosis, complications, progressions, and treatments, and when they all happened. Determining fine-grained temporal relations (for example, with Allen's interval algebra) is an extremely difficult task, even for people with extensive linguistic and/or medical knowledge. However, creating broad temporal bins for clinical events (e.g. remote past, recent past, present, and future) could not only be an easier task but also capture the significant relations necessary to relate clinical events temporally and create a complete timeline of a patient’s clinical story.
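
To make the idea of broad temporal bins concrete, here is a minimal sketch that assigns an already-dated event to one of the four bins relative to a reference date, such as the date of the note. The function name temporal_bin and both window sizes are hypothetical choices for illustration. Where the boundaries should fall, and how to bin events that have no explicit date, are exactly the open questions of the project.

```python
from datetime import date


def temporal_bin(event_date, reference_date,
                 present_window_days=1, recent_window_days=90):
    """Assign an event to a coarse temporal bin relative to a reference date.

    The window sizes are illustrative assumptions, not clinical standards.
    """
    delta_days = (event_date - reference_date).days
    if delta_days > present_window_days:
        return "future"
    if abs(delta_days) <= present_window_days:
        return "present"
    if delta_days >= -recent_window_days:
        return "recent past"
    return "remote past"


if __name__ == "__main__":
    note_date = date(2014, 6, 23)
    # Hypothetical events for a single patient.
    events = {
        "initial diagnosis": date(2010, 1, 15),
        "biopsy": date(2014, 5, 1),
        "clinic visit": date(2014, 6, 23),
        "planned surgery": date(2014, 7, 10),
    }
    for name, when in sorted(events.items(), key=lambda item: item[1]):
        print(name, "->", temporal_bin(when, note_date))
```

Sorting the events by date and attaching a bin to each yields a coarse timeline without having to resolve every pairwise temporal relation.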

Eligibility Requirements

  • Graduate student currently enrolled in, or a recent graduate of, a master's or PhD program in Computational Linguistics or a related field.
  • Completion of some coursework in each of the following areas is required: shallow processing, deep processing, and statistical methods.

This is a paid, non-benefits-eligible internship that starts June 23 and requires a minimum of 20 hours a week for a ten-week period ending on August 29.

Attendance for the duration of the program is required.

How to Apply

The 2014 online application has been closed and we are no longer accepting applications.

Please be sure to include the following:

  • a one page document on research interests (may outline past research experience),
  • a paragraph outlining any and all programming and technical experience, such as programming languages, NLP toolkits, and software you have used (e.g., Python, R, SQL, Java), and
  • your resume.

Late or incomplete applications will not be processed.

Notification will occur by April 15.

Contact Us

Please read this page completely and carefully before contacting us.

For questions about the program or application process, please contact Scott Canavera, Internship Program Manager, at scanaver@fhcrc.org.

Thank you for your interest!