Welcome to HDA

Congratulations on your acceptance to the Health Data Analytics and Machine Learning (HDA) MSc at Imperial College London. Your impressive transcripts, CV, personal statement, and references stood out from well over 700 other applicants in a blind selection process. Each year we are excited to welcome a diverse cohort of exceptional candidates from all over the world. Our program is committed to making this formative year in your life as enjoyable, educational, and memorable as possible.

This pre-course guide will help ensure that you are set up for success. It provides an introduction to the program, key term dates, general advice, pre-course instructions, and answers to some frequently asked questions.

0.1 What Makes HDA Unique

The students. Each cohort is diverse and different. We accept candidates from a wide range of educational backgrounds such as biology, math, medicine, computer science, and more. Some have just graduated, some have substantial professional experience, and others already hold MSc or PhD degrees. Our students have come from a wide range of ages, nationalities, and cultures, and they have typically had vastly different life experiences. The only similarities between our students are that they are exceptional, open-minded, and hardworking. This diversity is a major strength of our program. The students benefit from each other’s perspectives. The instructors get to engage with new ideas each year. Group research projects can achieve holistic and novel solutions.

The multidisciplinary approach. Health data poses the biggest of big data challenges. Attempting to find causal signals of disease from a lifetime’s worth of external exposures (pollution, stress, early life gender identity, etc.) or to identify molecular signatures of heterogeneous disease phenotypes requires more than just advanced analytical techniques. The challenge requires a rigorous and formal scientific approach that leverages transferable high-performance computing and advanced statistical, machine learning, and AI methodology to find the signal in the noise. A core competency in a relevant domain such as molecular biology is necessary to ensure interpretable, actionable, and impactful solutions.

The hands-on application with real data. Real data is messy and real-world problems do not have straightforward solutions. Attempting to solve problems without known solutions is often a new experience for students. Through closely supervised projects in Term 2 and Term 3, you will gain experience in handling unexpected issues, formulating creative solutions, and understanding unexpected results while performing novel, cutting-edge research with real-world clinical data.

These unique circumstances pose their own challenge: how do we design a course that covers a wide range of advanced topics at a level that is rigorous enough for those familiar with the material but accessible enough for those new to it?

0.2 Program Overview

HDA is split into four components. The pre-course material defines the scope of the course and provides conceptual introductions to important core concepts. Term 1 upskills the cohort and establishes the base competency needed for more advanced approaches. Optional short courses in Biology, Python, and Math are available. By the end of Term 1, every student has the same multidisciplinary base skill set in coding, statistics, molecular biology, research, and data science. Term 2 builds on that base to learn and apply cutting-edge analytical methodology. Term 3 pairs each student with a world-leading research team to design, implement, and report a novel solution to a real-world scientific issue. This capstone project offers an opportunity to demonstrate the skill set from the course and your new ability as an analytical expert.

Each course broadly follows the format of a 1.5- to 2-hour theoretical lecture in the morning followed by a 3-hour afternoon practical session that applies the subject of the lecture. There are optional weekly conceptual quizzes and ungraded midterms to help you figure out which subjects need more of your attention.

0.3 Getting Ready For the Program

Consistency is key. HDA requires a sustained effort throughout the entirety of the program. This requires prioritization and a manageable workload. You will be studying several courses at once, have a week of exams, and will be exploring your own research ideas. There will always be more work that you could do, another subject to review, or another line of code to write. Burning yourself out will lead to worse grades, poorer learning outcomes, and less enjoyment than a committed, organized, and achievable approach.

Don’t panic. Don’t feel that you need to be an expert in the subjects you will be taught before you arrive. Many previous HDA cohorts were not provided with pre-course material and still performed quite well. Student feedback, however, was that a gentle conceptual introduction to the core themes of the course would have made new or advanced topics easier to digest.

Please spend 30 minutes to an hour, 5-6 days a week, reviewing the suggested preparatory material. Note that we are not asking you to spend 300 hours, cover a certain amount of material, or adjust your effort depending on when you receive this guide. Try to identify the areas you know least and begin to get familiar with those subjects. Adopt a targeted approach on a small concept to make the most of each study session. For example: don’t sit down to study linear algebra, but do sit down to study vectors (a concept within linear algebra). Establishing the good habit of consistent and manageable effort is just as important as the material you will cover.

0.4 Iterative Project Development

Most software, research projects, and programs adopt iterative project development. This is why software is released in multiple versions. This guide also has versions. Please see the change log for details on what has changed in each version update.

The current version of this guide is v.0.1