2018 is coming soon. At the tail end of 2017, I would like to review the whole year and sum up what I gained both professionally and through self-learning. This year was a turning point in my life: I finished my studies and entered the workplace. Thanks to the rich education I received at Toulouse School of Economics and the trust of my lead, I have been able to make my modest contribution to franprix as a Data Scientist.
At franprix, I play the role of a “full stack” data scientist, which means that when I take on a new project, I do the ETL (Extract-Transform-Load), data cleaning, data visualisation, modelling, algorithm application and analysis, and then present the results to the requesters, all by myself. It is not easy for a junior, but I really enjoy it, and I have learnt a lot of new technology as well as how to communicate with my colleagues and organise my time when I have a new project. I’ll go into detail below.
Projects at franprix
This year, I worked on many interesting subjects. To measure the impact of weather on sales, I built a linear regression model that estimates the impact of weather on each subfamily.
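To give an idea of what “one regression per subfamily” looks like, here is a minimal sketch in Python. It is not the model we used at franprix: the data are made up, temperature is the only weather variable, and the names ols_fit, temperature and sales are purely illustrative.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up daily observations: temperature (°C) and units sold per subfamily.
temperature = [5, 10, 15, 20, 25, 30]
sales = {
    "ice_cream": [12, 22, 32, 42, 52, 62],  # rises with temperature
    "soup": [80, 70, 60, 50, 40, 30],       # falls with temperature
}

# One slope per subfamily: extra units sold per additional degree.
effects = {name: ols_fit(temperature, units)[1] for name, units in sales.items()}

for name, slope in effects.items():
    print(f"{name}: {slope:+.1f} units per degree")
```

The sign and size of each slope tell us which subfamilies sell more on warm days and which sell less. A real model would need more weather variables plus controls for things like the day of the week.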
To help the merchandising team organise the shelves, I used the apriori algorithm to analyse consumers’ preferences, identify the shelves whose products are often sold together, and measure the association between each pair of shelves.
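The pairwise part of this analysis can be sketched in a few lines of Python. This is only the pair-counting step, not the full apriori algorithm (which also prunes and extends larger itemsets), and the baskets, shelf names and the pair_rules function are invented for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3):
    """Compute support and lift for every pair of shelves bought together."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        shelves = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(shelves)
        pair_counts.update(combinations(shelves, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both shelves
        if support < min_support:
            continue
        # lift > 1 means the two shelves appear together more than by chance
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        rules[(a, b)] = {"support": support, "lift": lift}
    return rules

# Hypothetical baskets, each listing the shelves a customer bought from.
baskets = [
    ["beer", "crisps"],
    ["beer", "crisps", "cheese"],
    ["cheese", "wine"],
    ["beer", "crisps"],
    ["cheese", "wine", "bread"],
]
rules = pair_rules(baskets, min_support=0.4)
for pair, stats in rules.items():
    print(pair, stats)
```

Pairs with high support and a lift above 1 are the candidates for being placed near each other.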
Furthermore, we also helped the HR department. As the enterprise grows, more and more shops are spread all over France, and so are its employees. Some employees spend a very long time commuting between home and work. To shorten the commute between home and workplace (shop), the CEO proposed that two employees could exchange workplaces if it saves time for both. The goal of this project was to find the optimal solution for all employees. We combined R and the Google Maps API to create a tool that visualises and simplifies employee switching. Voilà: as in this graph, we can enter an employee ID and click “choice 1”, and the map then displays his home and workplace along with the best switch for him; it also displays the distance and travel time to the new workplace.
Here I have only shown the most interesting projects of this year. We also completed many other projects, such as automating report creation and e-mail sending with Talend, visualising data with Microstrategy, analysing stores’ performance, analysing tourist-area stores, etc.
As data scientists, we should never stop learning new things (languages, technologies, algorithms). This year, I attended a Microstrategy training course and learnt how to make clear and simple visual reports with its tools, and I continued to take online courses. Thanks again to datacamp, I started to learn the basics of Python. I also learnt linear regression, ridge regression, lasso and logistic regression from the books “Introduction to Machine Learning with Python” and “Python Machine Learning Cookbook”.
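The core rule of the project, “a switch is only worth proposing if it saves time for both employees”, can be sketched as follows. The real tool was built in R with travel times from the Google Maps API. Here the travel times are a hard-coded, made-up table, and beneficial_swaps, workplace and travel_minutes are hypothetical names. The sketch only lists the beneficial pairs; it does not solve the global assignment across all employees.

```python
from itertools import combinations

def beneficial_swaps(workplace, travel_minutes):
    """List pairs of employees who would both save time by exchanging shops.

    workplace: employee id -> current shop id
    travel_minutes: employee id -> {shop id -> one-way travel time in minutes}
    """
    swaps = []
    for a, b in combinations(sorted(workplace), 2):
        shop_a, shop_b = workplace[a], workplace[b]
        if shop_a == shop_b:
            continue  # same shop, nothing to exchange
        gain_a = travel_minutes[a][shop_a] - travel_minutes[a][shop_b]
        gain_b = travel_minutes[b][shop_b] - travel_minutes[b][shop_a]
        if gain_a > 0 and gain_b > 0:  # both must save time
            swaps.append((a, b, gain_a + gain_b))
    # Largest combined saving first
    return sorted(swaps, key=lambda swap: swap[2], reverse=True)

# Made-up example data.
workplace = {"E1": "S1", "E2": "S2", "E3": "S3"}
travel_minutes = {
    "E1": {"S1": 60, "S2": 20, "S3": 50},
    "E2": {"S1": 15, "S2": 45, "S3": 40},
    "E3": {"S1": 70, "S2": 30, "S3": 25},
}

swaps = beneficial_swaps(workplace, travel_minutes)
print(swaps)
```

In this toy data only E1 and E2 benefit from switching, because E3 already works at the shop closest to home.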
Besides Python, I learnt a new tool, Git, on YouTube. Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files. Thanks to the videos, I learnt that merge integrates the changes from the chosen branch into the current HEAD branch, and that when there are merge conflicts, you can undo the merge (git merge --abort) and start over. Rebase is an alternative to merge; we should use rebase only locally, on unpublished commits, and never rebase commits that have already been pushed to a remote repository. If you’re interested in the details, you will find more on YouTube and in the Git documentation.
Moreover, I kept learning R this year, not only on datacamp but also by reading a book named “R IN ACTION”. By reading it, I strengthened my basic knowledge of R, such as data structures and simple visualisation, and also broadened my knowledge of machine learning. I learnt more about OLS regression models, using regression diagnostics to assess how well the data fit the statistical assumptions, and methods for modifying the data to meet these assumptions more closely. I looked at how to create time series in R, assess trends and examine seasonal effects, and considered two of the most popular approaches to forecasting: exponential models and ARIMA models. I now understand that logistic regression is a type of generalized linear model used to predict a binary outcome from a set of numeric variables, and that decision trees involve creating a set of binary splits on the predictor variables in order to create a tree that can be used to classify new observations into one of two groups. All these algorithms helped our team on different projects.
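As a small illustration of the exponential models mentioned above, here is a sketch of simple exponential smoothing, the most basic member of that family. It is written in Python rather than R, the weekly_sales figures are invented, and the function name is my own; it handles neither trend nor seasonality.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    alpha in (0, 1]: the closer to 1, the more weight on recent values.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

# Made-up weekly sales figures.
weekly_sales = [100, 120, 80, 100]
smoothed = simple_exponential_smoothing(weekly_sales, alpha=0.5)

# With this model, the forecast for the next period is the last level.
forecast = smoothed[-1]
print(smoothed, forecast)
```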
Apart from taking online courses and reading books, I also joined technical meetups and exchanged ideas and technologies with others.
Besides technical progress, I picked up some tips on communicating with my colleagues. During our first discussion on a new project, I make a habit of asking about their objective, the problems they have met, the project’s deadline and everything else needed for the project. Then I think about its feasibility and likely duration and give the requesters a first response on project planning, so that we can commit to a deadline that matches our capacity.
Since there are lots of projects to analyse, we need to organise our time well to be efficient. Every day before leaving for home, I write a to-do list for the next day in my agenda, so that it reminds me of my tasks as soon as I arrive at the office. How do I make myself efficient? Besides listing tasks in my agenda, I also plan how long each task should take. For this, I follow the Pomodoro Technique with a tool named Pomotodo: I estimate the time each task needs as a number of pomodoros, and I stay focused during each pomodoro. It works pretty well for me.
Sometimes we can’t avoid taking a roundabout route in our work. In these circumstances, I always ask myself: why did I go astray? What led to it? What should I watch out for in future projects? Done persistently, this kind of self-examination helps me move forward faster.
Apart from working in data science, I also had the chance to recruit my own trainees. It was really an amazing experience: I applied for a job one year ago, but this time I sat on the other side of the table and interviewed my trainee candidates. This gave me a better understanding of what a company thinks about, emphasises and expects. Today I’m responsible for 2 trainees and, thankfully, they help me a lot.
As the team grows, we need tools to manage our code and projects. For code management, the first idea that came to my mind was Git, because it’s a version control system suited to coordinating work, and it also lets us review each other’s code, which is good for the team and for ourselves. For security reasons, we decided to install gitlab on our own server. For project management, we needed a tool to assign projects to different people, split a whole project into small parts and track progress. In the end we chose freecamp because it’s simple and clear to use and its functions satisfy all our needs.
Well, besides learning technology, I did sports, because it’s good for health and health is important for learning. This year I took part in and finished the Paris Marathon, my third marathon. Voilà, here is the medal:
Looking back on this year, I learnt many things in many different areas. I hope 2018 will be even better. Happy new year!