Review of 2023

Review of 2023.


The year 2023 almost passed, before the arrival of the year 2024, I prefer to review this year, like what I did each year. This year I enriched my skills in data engineering, data ops and management. I’ll talk about each of them in the following.

Working in retailing (In The Memory)

This year I started different projects not only on data science but also on data engineering and data ops, which is a new domain for me and it completed my skill tree on data.

Head of Data Management projects for some clients

Same as last year, I’m always in charge of data management for some clients: maintaining the data integration process, troubleshooting, designing the workflow for new data integration, designing the retro planning with team consulting, interaction among clients, team consulting and team data. The project became richer and more challenging this year: I created a data audit flow to ensure the data quality, which contains two parts, data engineering checking and business rules checking. Data engineering checking ensures the data’s format, no missing values, columns’ names, etc. Business rules checking verifies if the data make sense in terms of business, such as if the start date is earlier than the end date for each product, if we can find all sold products in the table “product”, or if the brand labels are all good, etc. With the audit results, a report is exported and I discussed it with our consultants, data engineers, data analysts, and with our clients as well.

Moreover, I collaborated with the DevOps team to achieve the migration process on Eldorado, migrated the code base on GitHub and applied configuration files to standardize the codes across clients.

Thanks to these projects, enriched my experiences in data engineering, project management and communication, and let me have my first experience in data ops.

Customer segmentation

This is a pretty interesting project that I accomplished with my intern and my lead. We created a consumer segmentation based on consumers’ purchases for our client several years ago, caused of COVID-19, the purchase behaviours have changed a lot, so it’s time to update the definition for each consumer group.

We took last year’s transaction data, cleaned them and applied PCA and k-means clustering for all loyal customers and all consumer products, got firstly the product groups, then based on them we studied loyal clients’ purchase behaviours and defined various loyal clients groups. The study is industrialized, will be triggered in all quarters and send results to clients automatically.

Update products’ level

This project aims to overhaul the last level of the customer hierarchy at intermarché (level 5) containing the UBC (Unités de Besoin Consommateur in French, consumer need units in English). The project was started by an ancient colleague, I relayed it after his leaving. Based on his work, I accomplished reassigning filtered products to new UBCs. After being verified by consultants, we’ll apply the new level 5 on our platform in 2 steps: integrate them into the database and apply the new table on the platform, For each step, we’ll first test on staging and then put it on production environment when it’s good on staging.

This project is on standby since I’m on maternity leave.


One of the things that I most appreciated for this year is I recruited an intern and finished a great project with her. Thanks to this experience, I enriched the following skills:

  • project management: I learn how to cut a whole project into several parts and assign them to different people according to each member’s strengths, how to estimate the timing for each step, how to organize meetings to follow and ensure the project’s advancement, and how to communicate the project to different teams: consultants, data and our clients.
  • team management: to well understand my intern, keep transparency with her and follow the project, I organized a weekly meeting and several meetings for her one-month and three-month arrival.
  • communication: As I mentioned above, I communicated the project among internal teams (consultant, data), my intern and our clients, it’s a great experience to enhance my communication skills.


Overwatch was built to enable Databricks’ customers, employees, and partners to quickly / easily understand operations within Databricks deployments. As enterprise adoption increases there’s an ever-growing need for strong governance. Overwatch means to enable users to quickly answer questions and then drill down to make effective operational changes. Common examples of operational activities Overwatch assists with are:

  • Cost tracking by various dimensions
  • Governance at various dimensions
  • Optimization Opportunities
  • What If Experiments

Thanks to Overwatch, we got the first understanding of our cost and found most of the clusters that we used on Data Factory pipelines were all-purpose clusters, which spend more than job clusters. Based on this, I switched all-purpose clusters to job clusters by Azure CLI.

Personal new role

This year I have my new role: I become a mother! It’s really a special experience and a new challenge for us! Hope we can be good parents ;)

Hope to see you in 2024!