“If an employer wants to estimate how many of his employees like him without using anonymous questionnaires, what is he supposed to do?” That was the question a junior asked us while he was introducing the statistics major. “He should prepare a box containing two identical balls numbered 1 and 2, and ask every employee to draw a ball from it. An employee who draws ball 1 answers the question ‘Do you like your boss?’; otherwise he answers the question ‘Do you like blue?’. The employer sees only the answer and does not know which question the employee has answered. If the employer knows what percentage of the general population likes blue, he can calculate what percentage of his employees like him.” That was the first time I recognised the power of statistics, and I found it so interesting that I later chose it as my major.
During my time at Cambridge, I wrote an essay entitled Optimal Allocation in Sequential Multi-armed Clinical Trials with a Binary Response under the supervision of Dr David Robertson and Dr Sofia Villar. After reading the literature, I found that most papers on the topic are based on normal linear models. Clinical trials, however, usually run for a very long time in practice, so I began to take temporal effects into account. I replaced the normal linear model with a mixed effects model, changing the representation of the error term to make it suitable for clinical trials. To compare it with the normal linear model, I calculated the solution and derived some properties of the A_A, E_A and D_A optimal allocations. I still remember the afternoons I spent working through this new model and the joy I felt when I finally solved it. For the first time, I felt that I could make an original contribution to the field. I am still working with my supervisors to turn the essay into a paper for publication.
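The final calculation in the two-question survey from the opening anecdote can be made explicit. The following is only a minimal sketch under stated assumptions: each ball is drawn with probability 1/2, everyone answers truthfully, and the share of people who like blue is known. The function names and the figures used (70% approval, 30% liking blue) are hypothetical and chosen purely for illustration. Since the observed share of "yes" answers is half the boss-approval rate plus half the blue rate, the approval rate is twice the observed share minus the blue rate.

```python
import random


def estimate_boss_approval(yes_fraction, blue_fraction, p_sensitive=0.5):
    """Estimate the share of employees who like their boss.

    Assumes P(yes) = p_sensitive * approval + (1 - p_sensitive) * blue_fraction,
    where p_sensitive is the chance of drawing ball 1 (assumed 1/2 here).
    """
    if not 0 < p_sensitive <= 1:
        raise ValueError("p_sensitive must lie in (0, 1]")
    estimate = (yes_fraction - (1 - p_sensitive) * blue_fraction) / p_sensitive
    # Sampling noise can push the raw estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


def simulate_survey(n_employees, true_approval, blue_fraction, seed=0):
    """Simulate the box-and-balls survey and return the observed 'yes' share."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n_employees):
        if rng.random() < 0.5:
            # Ball 1: the employee answers "Do you like your boss?"
            yes_count += rng.random() < true_approval
        else:
            # Ball 2: the employee answers "Do you like blue?"
            yes_count += rng.random() < blue_fraction
    return yes_count / n_employees


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    observed = simulate_survey(100_000, true_approval=0.7, blue_fraction=0.3)
    print(estimate_boss_approval(observed, blue_fraction=0.3))
```

With a large number of employees the estimate settles close to the true approval rate, even though no individual answer reveals which question was asked.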
I also took part in the Cambridge Mathematics Placement programme, in which researchers from other departments present their projects and mathematics students can join the ones that interest them. My EBI supervisor gave the presentation that attracted me most at the seminar. The project studied how sampling bias can influence results in continuous phylogeography. The basic idea was to divide the dataset by sampling location and use Markov chain Monte Carlo methods to find the posterior density of the ancestor’s location using the whole dataset, a subset, or sequence-free samples, for which we dropped the genetic information but kept the date and location. We found that the posterior densities differed greatly between scenarios, which indicates a strong effect of sampling bias.
After graduation, I worked as a data analyst. I soon found that the job was product-driven and that I was not really interested in building a successful product. I began to miss the days when I was doing research: trying to discover something that nobody else has tried is hard, but the sense of achievement when you finally succeed makes it all worthwhile. That is why I have chosen to return to academia by applying for a PhD position.
During my PhD, I would like to do research on machine learning and high-dimensional statistics. These two fields have changed the way we deal with data and have produced useful results, and I believe they will continue to flourish with the development of artificial intelligence. Among all the applications of machine learning, I am most optimistic about autonomous vehicles. Many companies have reached the high automation level and some are very close to the full automation level. If full automation were achieved, we could reduce the number of vehicles, traffic collisions and air pollution, and very likely return the city to its citizens. At the moment, I am most interested in Gaussian processes.
I am considering combining Gaussian processes with the idea of generative adversarial networks (GANs). GANs are based on neural networks and have proved successful in many areas. My idea is to replace the neural networks with Gaussian processes and employ expectation propagation to estimate the parameters. First, feed multivariate normal noise into a generator with randomly initialised parameters. Second, use the output of the generator and the real data as input to the discriminator, which tries to distinguish the generator’s output from real data. Finally, use expectation propagation in the generator to make its output more like the real data.
Another idea I would like to develop during my PhD is how to address temporal effects in Gaussian processes. One possible approach is to add a random-effect error term to the squared exponential covariance function when two data points come from the same phase. We could then derive the solutions and properties of this model for classification and regression. I think this could be used in clinical trials, which usually span a long time.
I think the Modern Statistics and Statistical Machine Learning CDT is the right choice for me, first, because it fits my research interests: I am interested in modern statistics and machine learning, and the programme offers many opportunities to do research in both fields. Secondly, students can work in groups to explore additional aspects of the lecture material in the teaching modules. In my view, cooperation and communication skills are important in research, so this should prepare me well for the projects I undertake during my PhD. I attended lectures covering statistical learning and modern statistics during my undergraduate and master’s studies, and applied that knowledge in my research projects and essay. With the programme’s training in research methodologies and business skills, I believe I can become a successful researcher in statistics.
By the time I finish the degree, I hope to have a thorough understanding of the field: to have good research taste, to know which work is important and to have a sense of how difficult the problems are. I also hope to have produced some meaningful work that advances the field. After the programme, I would like to work as a researcher in industry, using the knowledge and skills I have learnt to solve problems in statistical learning and artificial intelligence.