
Coursera project catch the pink flamingo
This blog presents the Coursera project "catch the pink flamingo". Here, you can find the data model design like classification, clustering, and the histogram made by splunk.
UCSD Introduction to Big Data Week 1 & 2 review
Since big data becomes more and more important in our life. As a fresh graduate
in Economics and Statistics, I’m eager to learn more knowledge about big data.
Thus, I decide to participate the course Big Data specilization, created by University of California, San Diego, taught by Ilkay
Altintas (Chief Data Science Officer), Amarnath Gupta (Dire...

Analysis of internauts' behavior
Since the internship will be finished in a few days, I’m seeking employment and
received several invitations of interview, one of these company is FABERNOVEL
DATA & MEDIA. After the first telephone interview,
there was a data-analysis test, which is really interesting. Thus, I would like
to share with you.
Data
There are 288,425 observatio...

Impact of different elements on compensation
For the end of master, I’m doing my internship at French Development Agency as a data analyst in Guarantee Division (GAR), which
offers guarantees to banks providing loans to SMEs, thus limiting their risks.
In order to record the business operation of GAR, we have several databases.
Thus, during this internship, my principal mission is accordi...
Apply Google Analytics to website
Nowadays, technique plays an important role in our life, almost all of us search
and get information on website. However, as a data analyst, it’s important to
read, understand and explain the website performance, which can contribute to
business of company. However, building such a framework is always complicated
and taking long time. Fortunatel...

Decision Tree
Applying decision tree model in R's library 'rpart', using rpart function to build a model, showing fancyRpartPlot example, predicting with predict(), pruning with prune().

Training set and Test set
As I mentioned in my blog “What is Machine Learning”, Machine Learning
tasks are typically classified into three broad categories, one of them is
Supervised learning. In the supervised learning setting, it is important to
understand the concept of two sets: Training set and Test set, which are
useful to make good predictions about unseen observa...

What is Machine Learning?
Machine learning explores the study and construction of algorithms that can
learn from and perform predictive analysis on data. Such algorithms operate by
building a model from an example training set of input observations in order to
make data-driven predictions or decisions expressed as outputs, rather than
following strictly static program in...
113 post articles, 15 pages.