The year 2023 has almost passed. Before 2024 arrives, I'd like to review this year, as I do every year. This year I enriched my skills in data engineering, data ops and management. I'll talk about each of them in the following.
This year I started different projects, not only in data science but also in data engineering and data ops, which is a new domain for me and completes my skill tree on data.
Same as last year, I'm in charge of data management for some clients: maintaining the data integration process, troubleshooting, designing the workflow for new data integrations, building the retro planning with the consulting team, and coordinating among clients, the consulting team and the data team. The project became richer and more challenging this year: I created a data audit flow to ensure data quality, which contains two parts, data engineering checks and business rules checks. The data engineering checks verify the data’s format, missing values, column names, etc. The business rules checks verify whether the data makes sense in terms of business: for example, is the start date earlier than the end date for each product, can we find all sold products in the table “product”, are the brand labels all valid, etc. With the audit results, a report is exported, and I discussed it with our consultants, data engineers and data analysts, and with our clients as well.
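As an illustration, the business-rules part of such an audit could look like the following minimal pandas sketch (the table and column names here are hypothetical, not the real ones):

import pandas as pd

def business_rules_report(sales: pd.DataFrame, products: pd.DataFrame) -> dict:
    """Collect business-rule check results in a report (illustrative schema)."""
    report = {}
    # the start date must be earlier than the end date for each product
    report["bad_date_ranges"] = int((sales["start_date"] >= sales["end_date"]).sum())
    # every sold product must exist in the "product" table
    report["unknown_products"] = len(set(sales["product_id"]) - set(products["product_id"]))
    # brand labels must belong to the known list of brands
    report["bad_brand_labels"] = int((~sales["brand"].isin(set(products["brand"]))).sum())
    return report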
Moreover, I collaborated with the DevOps team to carry out the migration process on Eldorado: we migrated the code base to GitHub and applied configuration files to standardize the code across clients.
These projects enriched my experience in data engineering, project management and communication, and gave me my first experience in data ops.
This is a pretty interesting project that I accomplished with my intern and my lead. We created a consumer segmentation based on consumers’ purchases for our client several years ago; because of COVID-19, purchase behaviours have changed a lot, so it was time to update the definition of each consumer group.
We took last year’s transaction data, cleaned it, and applied PCA and k-means clustering to all loyal customers and all consumer products. We first obtained the product groups, then based on them we studied loyal clients’ purchase behaviours and defined various loyal client groups. The study is industrialized: it will be triggered every quarter and send results to the client automatically.
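Here is a minimal sketch of the clustering step (the feature matrix is a random stand-in, and the component and cluster counts are illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# stand-in for the real features: one row per loyal customer,
# one column per product-related purchase feature
rng = np.random.default_rng(42)
X = rng.random((1000, 40))

X_scaled = StandardScaler().fit_transform(X)          # normalise the features
X_pca = PCA(n_components=10).fit_transform(X_scaled)  # reduce dimensionality
labels = KMeans(n_clusters=6, random_state=42).fit_predict(X_pca)  # customer groups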
This project aims to overhaul the last level of the customer hierarchy at Intermarché (level 5), which contains the UBCs (Unités de Besoin Consommateur in French, consumer need units in English). The project was started by a former colleague, and I took it over after he left. Based on his work, I finished reassigning the filtered products to new UBCs. After verification by the consultants, we’ll apply the new level 5 on our platform in 2 steps: integrate it into the database, then apply the new table on the platform. For each step, we’ll first test on staging and move to the production environment once everything looks good on staging.
This project is on standby since I’m on maternity leave.
One of the things I most appreciated this year is that I recruited an intern and finished a great project with her. Thanks to this experience, I developed the following skills:
Overwatch was built to enable Databricks’ customers, employees, and partners to quickly and easily understand operations within Databricks deployments. As enterprise adoption increases, there is an ever-growing need for strong governance. Overwatch aims to enable users to quickly answer questions and then drill down to make effective operational changes. Common examples of operational activities Overwatch assists with are:
Thanks to Overwatch, we got a first understanding of our costs and found that most of the clusters we used in Data Factory pipelines were all-purpose clusters, which cost more than job clusters. Based on this, I switched the all-purpose clusters to job clusters via the Azure CLI.
This year I also took on a brand-new role: I became a mother! It’s really a special experience and a new challenge for us! Hope we can be good parents ;)
Hope to see you in 2024!
The composition of the data team can vary depending on the size and needs of an organization. However, some common roles are typically included in a data team. These roles include data engineers, data analysts, data scientists, data architects and data administrators, etc. They work to collect, store, analyze, and interpret data to drive business decisions and improve organizational performance. In this blog, I’ll focus on data engineers, and talk about their role and how they collaborate with others.
Data engineers are responsible for designing, building, testing, and maintaining the infrastructure that supports an organization’s data needs. Their role is critical in ensuring that data is properly collected, stored, and made available to data scientists and other users in a timely and efficient manner. Some of the key responsibilities of data engineers include:
When data engineers design the data-storage structure, they need to communicate with data scientists and data analysts to understand their needs, pain points and use cases, and to ensure that the required data is accessible and available in the appropriate formats. Raw data is often not immediately ready for analysis; once they understand the needs, data engineers clean and transform the data into a format suitable for analysis: they define which tables to create, with which columns and data types, decide what transformation and aggregation are needed, create the data pipeline and define its running frequency. Moreover, they also need to create an automated data quality checking process, which verifies the received data’s format, volume, value consistency, business-related logic, etc.
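For illustration, a minimal format check might look like this (the expected schema is a made-up example):

import pandas as pd

# hypothetical expected schema: column name -> pandas dtype
EXPECTED_SCHEMA = {"product_id": "int64", "store_id": "int64",
                   "quantity": "int64", "turnover": "float64"}

def check_format(df: pd.DataFrame) -> list:
    """Return the list of format problems found in an incoming dataframe."""
    problems = []
    for col in set(EXPECTED_SCHEMA) - set(df.columns):
        problems.append(f"missing column: {col}")
    for col, dtype in EXPECTED_SCHEMA.items():
        if col in df.columns and str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    null_counts = df.isna().sum()
    for col, n in null_counts[null_counts > 0].items():
        problems.append(f"{col}: {n} missing values")
    return problems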
They also work together to establish data quality standards, data validation processes, and data monitoring mechanisms. Data engineers help data scientists and data analysts understand the data lineage, metadata, and data quality issues, ensuring that they can rely on the accuracy and integrity of the data for their analyses. Finally, data engineers gather feedback on data availability, performance, and infrastructure requirements from the data science and analysis processes, and continuously improve the data infrastructure and systems based on this feedback, ensuring that they meet the evolving needs of the data science and analysis work.
Before designing new products or new product features, we need support from the data side, which means ensuring that the data infrastructure and systems align with the goals and requirements of the product or project.
Data engineers work closely with product owners to understand the data requirements for a specific product or project and gather information about the data sources, data formats, data volume, data quality standards and validation rules to meet the product objectives.
Furthermore, product owners often have specific data needs for their products. Data engineers work with them to define the data pipelines that capture, process, and transform the required data. They collaborate on the data flow, transformations, and integration points to ensure that the data is collected and processed correctly. Throughout the development and deployment lifecycle, data engineers collaborate with product owners to gather feedback, identify areas for improvement, and make necessary adjustments to the data infrastructure. They work together to iterate on the data pipelines and systems, aligning them with evolving product requirements.
As a data engineer in a SaaS (Software as a Service) company, I work with data that comes from the client side. Thus, communicating with clients on data integration is one of my missions. We engage in discussions and meetings to gather information about the types of data the clients need, the sources of the data, and the specific use cases or analytics they want to perform. This helps data engineers gain a clear understanding of the client’s data needs. They also work together to determine which data systems, databases, APIs, or external sources are necessary to collect the required data.
Based on the client’s data requirements, data engineers design data integration processes and pipelines. They collaborate with clients to define the data flow, transformations, and any data cleaning or enrichment steps that may be required. Data engineers work with clients to ensure that the data pipelines align with their specific needs and can provide the desired outputs.
Besides, data engineers need to maintain regular communication with clients throughout the project lifecycle. They provide progress updates, discuss any challenges or issues encountered, and seek client feedback to ensure that the data engineering processes align with the client’s expectations and requirements. This helps maintain transparency and fosters effective collaboration.
Data engineers play a very important role on both the data side and the business side by providing the infrastructure necessary to support data-driven decision-making. Without effective data engineering, organizations would struggle to make sense of the vast amounts of data they collect and would miss out on the valuable insights that data can provide.
Collaboration among data engineers, data scientists, and data analysts is crucial for successful data-driven initiatives: by working together, they ensure that data is accessible, reliable, and ready for analysis, enabling meaningful insights and valuable outcomes. Collaboration with product owners ensures that the data infrastructure supports the product’s needs, enabling data-driven decision-making and delivering value to end-users. And by actively engaging with clients, data engineers can design and implement data solutions that meet their specific needs, leading to successful data-driven outcomes.
During the year 2022, our life almost went back to normal: we started to take off the masks and have more contact with colleagues and friends. As in previous years, it’s time to review the whole year. This year, I levelled up not only my technical skills but also my project management and client communication skills. In this blog, I’ll review my year 2022 with the following points:
In The Memory is a retail-tech company that helps retail players make the best use of different internal and external data sources to meet their strategic and operational business challenges. Our products allow distributors and brands to accelerate their decision-making to attract more customers and make the best assortment, merchandising, pricing, and promotional choices in their various physical and online sales channels. We build tailored Augmented Intelligence solutions to meet clients’ priority challenges and serve their strategies by supporting their teams in change management, defining together the best KPIs to meet clients’ challenges, and adapting our solutions to the client’s needs, constraints and processes. Moreover, this year we were labelled happyatwork 2022 (this label rewards companies in which employees are the most committed and motivated), won the LSA RetailTech trophy in the “Data, knowledge and customer personalization” category, and were selected by Business France to represent France at the next Retail’s Big Show in New York in January 2023; we now have nearly 70 colleagues vs. 50 in 2021.
This year, I was promoted to Senior Data Scientist and levelled up my skills, not only technically but also in project management and communication with clients.
This year, I started to be in charge of data management for some clients. Thanks to this project, I improved my skills in multiple aspects: maintaining the data integration process, troubleshooting, designing the workflow for new data integrations, building the retro planning with the consulting team, and coordinating among clients, the consulting team and the data team. All of this allows our clients to integrate their data well into our system, which is the base of our data products; it also allows In The Memory to explore and create more products on the platform.
After more than a year of development, we finally released these 2 promotional modules for our client. Thanks to this project, I enriched my knowledge of the promotion business, learnt how to manage a project from the data point of view, and discussed the project with different teams (consulting and software). This product helps users save lots of time in defining promotion leaflets, and it can also attract other retailers to use these amazing modules.
We created some tools to ensure data reliability. For example, we created the data team’s first API, which checks data format and quality after the user imports input files, ensuring the data is ready to be used to run our calculations and models. I also set up a staging environment for the data team, isolated from the production environment, which allows us to modify or develop new features without worrying about breaking the data in production: it’s a product guarantee and it’s a playground!
It has been 6 years since the graduation ceremony, this year I came back to Toulouse School of Economics again to help In The Memory recruit talents. It was an enriching experience for us to interact with students and teachers throughout the day.
This year I focused on my job and wrote only six blogs: some summarize what I learnt at work, some talk about data science in retail discounts, and some analyse open-source second-hand apartment transaction data.
Since August 2016, I’ve written 116 blogs on various topics: Python, data analysis, data visualisation, machine learning, and data science in retailing; more than 212k users have visited my blogs more than 301k times.
Don’t hesitate if you want to ask questions or write comments, they’re welcome!!
Hope to see you in 2023!
Retail discounts play an important role in increasing turnover, building customer loyalty and attracting customers. To build a good discount strategy, data mining can help retailers create a recommendation engine that recommends products to customers and retailers. In this blog, I’ll share my experience of applying data mining to retail discounts with the following points:
Retail discounting is used to decrease the price of specific products for a set amount of time. In some cases, retailers offer a store-wide discount to move excess inventory and create space for new collections. Retailers usually run discounts to attract new customers, increase sales, and clear out old inventory. Large retailers have an easier time selling low-priced merchandise in high volumes, but this strategy doesn’t always work for small to mid-sized retail boutiques. With discounting, it’s important to keep an eye on your profit margins and break-even point, avoid conditioning customers to wait for a sale, and understand exactly why and when you want to discount products.
A recommendation system is a platform that provides its users with various content based on their preferences and likings. A recommendation system takes the information about the users and their behaviours as inputs. This information can be in the form of the past usage of the product or the ratings that were provided for the product. It then processes this information to predict how much the user would rate or prefer the product. A recommendation system makes use of a variety of machine learning algorithms.
Another important role that a recommendation system plays today is searching for similarities between different products. In the retail domain, the recommendation system searches for products that are similar to the ones you have purchased previously. This is an important method for scenarios that involve a cold start, where the retailer does not have much user data available to generate recommendations. In that case, based on the products that are sold, the engine can recommend products that share a degree of similarity or satisfy the discount rules. There are three types of recommendation engine:
In a content-based recommendation system, the background knowledge of the products and customer information is taken into consideration. Based on the products you have purchased in a retail chain, it provides you with similar suggestions. For example, if you have purchased a product that belongs to the “alcohol” category, the content-based recommendation system will suggest similar products from the same category.
Unlike content-based filtering, which recommends similar products, collaborative filtering provides recommendations based on the similar profiles of its users. One key advantage of collaborative filtering is that it is independent of product knowledge; rather, it relies on the users, with the basic assumption that what the users liked in the past they will also like in the future. For example, if person A purchases the alcohol, snacking and bakery categories and person B purchases the snacking, bakery and ice-cream categories, then A will probably also like ice cream, and B the alcohol category.
There is also a third type of recommendation system that combines both Content and Collaborative techniques.
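As an illustration, here is a minimal user-based collaborative-filtering sketch on a customer × category purchase matrix (all the data is made up):

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# rows: customers, columns: product categories, values: purchase counts (made up)
purchases = pd.DataFrame(
    [[5, 3, 2, 0],
     [0, 4, 3, 5],
     [4, 0, 1, 0]],
    index=["A", "B", "C"],
    columns=["alcohol", "snacking", "bakery", "ice-cream"])

# similarity between customers, based on their purchase profiles
user_sim = pd.DataFrame(cosine_similarity(purchases),
                        index=purchases.index, columns=purchases.index)

# score the categories for customer A with the other customers' purchases,
# weighted by their similarity to A, and keep only what A hasn't bought yet
weights = user_sim.loc["A"].drop("A")
scores = purchases.drop(index="A").T @ weights
print(scores[purchases.loc["A"] == 0].sort_values(ascending=False))

Here the only unbought category for customer A is ice-cream, and it gets a positive score, matching the intuition of the example above.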
There are many use cases in the retailing domain like recommending products that are complementary to the product the shopper has chosen, offering a discount to the potential customers to encourage the purchase or even recommending some new products that might be interesting for customers. Here I’ll talk about two use cases.
Customer relationship management (CRM) refers to the principles, practices, and guidelines that a retailer follows when interacting with its customers. One of the CRM approaches is offering a discount on different products to various profiles of customers, which is an application of the Collaborative Filtering Recommendation System. The discounts target different objectives:
Usually, retailers start to build the promotion plan several months or even a year in advance, since it takes lots of time to negotiate with suppliers, define the category and brand for each leaflet, and design the discount for different products. The promotion plan recommendation system is not that common, but it exists and it’s pretty helpful for retailers: it can recommend products for different leaflets by considering various elements like target turnover, target product count, discount periods, category distribution, brand distribution, etc., and it helps retailers save lots of time.
After implementing a promotion, we need to do some analysis to understand its effects. For example, among all target customers, how many of them benefited from the discount? Thanks to the promotion, how much did turnover increase? Furthermore, the following figure shows the effects of a retailer promotion on the sales of the promoted product (Gedenk 2002, Neslin 2002).
We distinguish between short-term effects, which occur during the promotion, and long-term effects, which involve behaviour that takes place after the promotion. Sales for the promoted brand can increase during the promotion by attracting customers from other stores (store switching), inducing customers to switch brands (brand switching), inducing customers to buy from the promoted category rather than another category (category switching), inducing customers who normally do not use the product category to purchase it (new users), or inducing customers to move their purchases forward in time (purchase acceleration). Purchase acceleration can occur because consumers purchase earlier or because they purchase more than they would have done without the promotion. Consumers can either stockpile the extra quantity for future use or consume it at a faster rate. Total category consumption can also increase owing to category switching or if the promotion attracts new users.
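On the measurement side, the first two questions above can be answered with a simple comparison; here is a sketch (the inputs and their schema are made up):

import pandas as pd

def promotion_report(target_ids: pd.Series, buyer_ids: pd.Series,
                     promo_turnover: float, baseline_turnover: float) -> dict:
    """target_ids / buyer_ids: customer ids; turnovers: totals over comparable periods."""
    share_reached = target_ids.isin(buyer_ids).mean()   # share of targets who bought
    uplift = promo_turnover / baseline_turnover - 1     # relative turnover increase
    return {"share_of_targets_who_bought": share_reached,
            "turnover_uplift": uplift}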
In this blog, we talked about how recommendation engines contribute to retail promotions and how we can follow the performance of a promotion. Since promotion plays an important role in the retail domain, it’s important to build a suitable promotion plan and analyse its effects. In The Memory can be the expert that helps retailers accelerate and improve decision-making, define the promotion plan, optimise category management levers (promotion, assortment, CRM, etc.) and follow business performance.
Recently I delivered a product feature that checks input data quality after users upload their datasets to our platform. To return the checking result to the backend, I created an HTTP API to simplify the communication, and since we use Microsoft Azure, we chose to build it with Azure Functions. In this blog, I’ll talk about this with the following points:
Azure Function is a serverless solution that allows you to write less code, maintain less infrastructure, and save on costs. Instead of worrying about deploying and maintaining servers, the cloud infrastructure provides all the up-to-date resources needed to keep your applications running.
You focus on the pieces of code that matter most to you, and Azure Functions handles the rest.
You can build an Azure function to react to a series of critical events, for example building a web API, responding to database changes, processing IoT data streams, or managing message queues, and with your preferred language (C#, Java, JavaScript, PowerShell, Python, etc.). In this blog, I’ll only talk about building a web API with Python.
The specific prerequisites for Core Tools depend on the features you plan to use:
The recommended folder structure for an Azure Functions project in Python looks like the following example:
<project_root>/
| - .venv/
| - .vscode/
| - my_first_function/
| | - __init__.py
| | - function.json
| | - example.py
| - my_second_function/
| | - __init__.py
| | - function.json
| - shared_code/
| | - __init__.py
| | - my_first_helper_function.py
| | - my_second_helper_function.py
| - tests/
| | - test_my_second_function.py
| - .funcignore
| - host.json
| - local.settings.json
| - requirements.txt
| - Dockerfile
The main project folder can contain the following files:

- requirements.txt: Contains the list of Python packages that the system installs when you’re publishing to Azure.
- host.json: Contains configuration options that affect all functions in a function app instance. This file is published to Azure. Not all options are supported when functions are running locally.
- .vscode/: (Optional) Contains stored Visual Studio Code configurations.
- .venv/: (Optional) Contains a Python virtual environment that’s used for local development.
- Dockerfile: (Optional) Used when you’re publishing your project in a custom container.
- tests/: (Optional) Contains the test cases of your function app.
- .funcignore: (Optional) Declares files that shouldn’t be published to Azure. Usually, this file contains .vscode/ to ignore your editor settings, .venv/ to ignore the local Python virtual environment, tests/ to ignore test cases, and local.settings.json to prevent local app settings from being published.
- local.settings.json: Used to store app settings and connection strings when functions are running locally. This file isn’t published to Azure.
{
"IsEncrypted": false,
"Values": {
"FUNCTIONS_WORKER_RUNTIME": "<language worker>",
"AzureWebJobsStorage": "<connection-string>",
"MyBindingConnection": "<binding-connection-string>",
"AzureWebJobs.HttpExample.Disabled": "true"
},
"Host": {
"LocalHttpPort": 7071,
"CORS": "*",
"CORSCredentials": false
},
"ConnectionStrings": {
"SQLConnectionString": "<sqlclient-connection-string>"
}
}
A few notable settings:

- IsEncrypted: When this setting is set to true, all values are encrypted with a local machine key. Used with func settings commands. The default value is false.
- FUNCTIONS_WORKER_RUNTIME: Indicates the targeted language of the Functions runtime.
- AzureWebJobsStorage: Contains the connection string for an Azure storage account. Required when using triggers other than HTTP.
- Host: Settings in this section customize the Functions host process when you run projects locally. These settings are separate from the host.json settings, which also apply when you run projects in Azure.

You can find more information here.

A function is the primary concept in Azure Functions. A function contains two important pieces - your code, which can be written in a variety of languages, and some config, the function.json file. For compiled languages, this config file is generated automatically from annotations in your code. For scripting languages, you must provide the config file yourself.
The function.json
file defines the function’s trigger, bindings, and other
configuration settings. Every function has one and only one trigger. The runtime
uses this config file to determine the events to monitor and how to pass data
into and return data from function execution. The following is an example
function.json
file.
{
"scriptFile": "__init__.py",
"bindings": [
{
"authLevel": "function",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"get",
"post"
]
},
{
"type": "http",
"direction": "out",
"name": "$return"
}
]
}
The bindings property is where you configure both triggers and bindings. Each binding shares a few common settings, plus some settings that are specific to a particular type of binding. Every binding requires the following settings:

- type: Name of the binding.
- direction: Indicates whether the binding is for receiving data into the function or sending data from the function.
- name: The name that is used for the bound data in the function.

Triggers cause a function to run. A trigger defines how a function is invoked and a function must have exactly one trigger. Triggers have associated data, which is often provided as the payload of the function.
Binding to a function is a way of declaratively connecting another resource to the function; bindings may be connected as input bindings, output bindings, or both. Data from bindings are provided to the function as parameters.
You can mix and match different bindings to suit your needs. Bindings are optional and a function might have one or multiple input and/or output bindings.
Triggers and bindings let you avoid hardcoding access to other services. Your function receives data (for example, the content of a queue message) in function parameters. You send data (for example, to create a queue message) by using the return value of the function.
The HTTP trigger is defined in the function.json file. The name
parameter of
the binding must match the named parameter in the function. The previous
examples use the binding name req
. This parameter is an HttpRequest
object, and an HttpResponse object is returned.
From the HttpRequest
object, you can get request headers, query parameters,
route parameters, and the message body.
Here is an example:
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
headers = {"my-http-header": "some-value"}
name = req.params.get('name')
if not name:
try:
req_body = req.get_json()
except ValueError:
pass
else:
name = req_body.get('name')
if name:
return func.HttpResponse(f"Hello {name}!", headers=headers)
else:
return func.HttpResponse(
"Please pass a name on the query string or in the request body",
headers=headers, status_code=400
)
In this function, the value of the name
query parameter is obtained from the
params
parameter of the HttpRequest
object. The JSON-encoded message body is
read using the get_json
method. Likewise, you can set the status_code
and
headers
information for the response message in the returned HttpResponse
object.
Before running functions locally, you need to have Azure Functions Core Tools installed on your machine.
To run a Functions project, you run the Functions host from the root directory of your project. The host enables triggers for all functions in the project.
To test your functions locally, you start the Functions host and call endpoints on the local server using HTTP requests.
The command below must be run in a virtual environment.
# start the Functions host
# version 2.x
func start
Then we call the following endpoint to locally run HTTP and webhook triggered functions:
http://localhost:{port}/api/{function_name}
The following example is the function MyHttpTrigger
called from a POST request
passing name in the request body:
curl --request POST http://localhost:7071/api/MyHttpTrigger --data '{"name":"Azure Rocks"}'
When you’re ready to publish, make sure that all your publicly available dependencies are listed in the requirements.txt file. This file is at the root of your project directory. You can also find project files and folders that are excluded from publishing, including the virtual environment folder, in the root directory of your project.
Three build actions are supported for publishing your Python project to Azure: remote build, local build, and builds that use custom dependencies.
You can also use Azure Pipelines or GitHub Actions to build your dependencies and publish by using continuous delivery (CD), which is the way that I chose.
In GitHub Actions, a workflow is an automated process that you define in your
GitHub repository. This process tells GitHub how to build and deploy your
function app project on GitHub. A workflow is defined by a YAML (.yml) file in
the /.github/workflows/
path in your repository. This definition contains the
various steps and parameters that make up the workflow:
For the details of each step, you can find information here.
How to go further from here?
In this article, we talked about what we need to create an Azure function, what HTTP triggers and bindings are, how to test the function locally and how to publish it on Azure. Hope it’s useful for you :)
In recent work, I needed to write data into a PostgreSQL database. Before writing real data on staging, I learnt how to do it with Docker. In this blog, I’ll talk about this with the following points:
PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads. PostgreSQL comes with many features aimed to help developers build applications, administrators to protect data integrity and build fault-tolerant environments, and help you manage your data no matter how big or small the dataset. In addition to being free and open source, PostgreSQL is highly extensible. For example, you can define your own data types, build out custom functions, even write code from different programming languages without recompiling your database!
Developing apps today requires so much more than writing code. Multiple languages, frameworks, architectures, and discontinuous interfaces between tools for each lifecycle stage creates enormous complexity. Docker simplifies and accelerates your workflow, while giving developers the freedom to innovate with their choice of tools, application stacks, and deployment environments for each project.
Here, I used the official Docker image of postgres to create a database and tables.
CREATE DATABASE xxx;
We can create a table with CREATE TABLE
and insert value
into it with INSERT INTO table_name (column_name) VALUES (values)
.
CREATE TABLE jsonb_test (
id INT GENERATED ALWAYS AS IDENTITY,
parameters jsonb
);
INSERT INTO jsonb_test ("parameters") VALUES ('{"param1":"value1","param2":22,"param3":[3,33]}');
To delete rows from a table with conditions, or to empty a whole table, we can use
DELETE FROM table_name WHERE xxx;
or DELETE FROM table_name;
. To remove the table entirely, use
DROP TABLE table_name;
.
Here, I used the official Docker image of python3.
Now we will insert a pandas dataframe pdf
into the table jsonb_test
:
from sqlalchemy import create_engine
import pandas as pd
pdf = pd.DataFrame({'parameters':['{"param1":"v1", "param2": 2}']})
# PASSWORD, HOST_NAME and pgdb are placeholders for your own connection settings
eng_pg = create_engine("postgresql://postgres:{pw}@{host}/{dbname}".format(pw=PASSWORD,
host=HOST_NAME,
dbname=pgdb))
pdf.to_sql("jsonb_test", eng_pg, if_exists='append', index=False)
Before inserting the dataframe into PGSQL, we need to create a PGSQL engine with
create_engine
by specifying the host name and password, then insert the
dataframe with to_sql
.
With the check above, we can ensure that we insert the dataframe successfully.
First of all, I downloaded real estate transaction data from the government’s open data site. The dataset contains transaction information from January 2014 to June 2021: “nature_mutation” specifies the nature of the sale, “nombre_pieces_principales” indicates the number of rooms, “valeur_fonciere” gives the sale price, “code_commune”, “nom_commune” and “code_departement” specify the communes and departments, and “surface_reelle_bati” describes the real surface area.
For this analysis, I only took into account second-hand apartments’ transactions with a positive area in Île-de-France.
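As a sketch, the filtering can be done with pandas. The file path is illustrative, and the type_local column with its “Appartement” value is an assumption based on the public DVF schema:

import pandas as pd

idf_departments = ["75", "77", "78", "91", "92", "93", "94", "95"]

df = pd.read_csv("dvf_2014_2021.csv", dtype={"code_departement": str})  # illustrative path
apartments = df[
    (df["nature_mutation"] == "Vente")                  # sales only
    & (df["type_local"] == "Appartement")               # apartments only (assumed column)
    & (df["surface_reelle_bati"] > 0)                   # positive area
    & (df["nombre_pieces_principales"] > 0)             # known number of rooms
    & (df["code_departement"].isin(idf_departments))    # Île-de-France
].copy()

# room-count classes used below: T1..T4, with 5 rooms or more grouped into T5
apartments["class"] = "T" + apartments["nombre_pieces_principales"].clip(upper=5).astype(int).astype(str)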
I classed second-hand apartments into 5 groups by number of rooms: T1 means a one-room apartment of around 25 m2; T2 a two-room apartment of around 40 m2; T3 a three-room apartment of around 60 m2; T4 a four-room apartment of nearly 80 m2; T5 a five-room (or larger) apartment above 100 m2. This donut chart describes the share of each apartment class among the transactions. T2 and T3 hold nearly 60% of transactions, nearly 17% of transactions are T1 apartments, and the other purchases are larger apartments. Let’s go further into the details.
This graph describes the average price per m2 for second-hand apartments with different numbers of rooms in Ile-de-France, between January 2014 and June 2021, using the same T1-T5 classes as above. The average area of each class is similar across departments. T1 and T2 apartments in Paris are smaller than in the other departments; however, second-hand apartments of T4 or larger are larger in Paris than in the other departments.
According to the second graph, although T2 and T3 are much larger than T1, their unit prices are lower than T1’s: the gap to T1 in Paris is 6.7% and 6.3%, respectively; in the other departments the gap is much larger (12% and 23%). Moreover, while the average area of T4 is three times that of T1, its unit price is on average 20% lower than T1’s, except in Paris, where T4’s unit price is only 1% lower than T1’s. Why are T1 apartments so expensive per m2? That might be because there are many students and young workers who need to rent a big enough apartment, which drives investors to invest in T1 apartments and leads to higher demand for T1.
According to this map, we observe that second-hand apartments in the center and west of Paris are more expensive (> 8.5k euros per m2) than in the other districts of Paris (6k - 8.5k euros per m2), and second-hand apartments in Paris are more expensive than in the other departments of IDF. Among the departments other than Paris, second-hand apartments in Hauts-de-Seine are the most expensive (4k - 6k euros per m2).
Furthermore, in light of the stacked bar chart, it’s obvious that there are many more transactions in Paris, although it’s much more expensive than the other departments, whatever the class of apartment. In Paris, 60% of transactions are studios or 2-room apartments; in the other departments, the majority of transactions are 2-room or 3-room apartments. One reason might be that the unit price in Paris is higher: a studio or 2-room apartment can satisfy the needs of people who live alone or as a couple, as well as investors; on the contrary, people who live with family prefer apartments outside Paris, which are larger and less expensive.
This group of scatter plots shows the relationship between second-hand apartments’ prices and their areas. Each point stands for one transaction; points on the red dashed line have a price of exactly 10k euros per m2. Points above the dashed line have a unit price greater than 10k euros; points below it, less than 10k euros per m2.
For transactions in Paris, most of the sold apartments are smaller than 150 m2, with a unit price around 10k euros per m2. However, for transactions in the other departments, most apartments are smaller than 130 m2 and the unit price is lower than 10k euros per m2; in Seine-et-Marne (77), Essonne (91) and Val-d’Oise (95) especially, we can even get a 100 m2 second-hand apartment for only 0.25 million euros, much cheaper than in the other departments.
The line chart describes second-hand apartments’ average price per m2 in Ile-de-France between January 2014 and June 2021. Obviously, the average price in Paris is the highest in Ile-de-France and its growth is the strongest as well: it increased 37% (11.5/8.4 - 1). Besides, the average price in Hauts-de-Seine is the second highest in Ile-de-France, up 33% (7.3/5.5 - 1); the average price in Val-de-Marne increased 31% (5.5/4.2 - 1); the average prices of the other departments in Ile-de-France didn’t change much. Among the 8 departments, second-hand apartments are the most expensive in Paris as of June 2021, at 11.3k euros per m2, 55% higher than the price in Hauts-de-Seine and more than twice the price of second-hand apartments in Val-de-Marne.
The stacked area plot presents the number of second-hand apartment transactions in Ile-de-France over the same period as the line chart. We can easily see that Paris has more second-hand apartment transactions than the other departments of IDF: nearly 30% of all transactions in IDF. After Paris, Hauts-de-Seine and Val-de-Marne have the second and third largest transaction counts in IDF. The peak for Paris was December 2015 (4612 transactions); for Hauts-de-Seine and Val-de-Marne it was July 2019.
Then I used Time Series additive
model to decompose data into a trend
component, a seasonal component, and a residual component. The trend component
captures changes over time, the seasonal component captures cyclical effects
due to the time of year, the residual component captures the influences not
described by the trend and seasonal effects. Thanks to this model, we find that
except for July, there is another transaction peak in March, which we didn’t
find above. In June and August, the transactions arrive at their low points,
that might be because, during the transition period between 2 months, the desire
for purchasing or selling apartments is not that high.
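For reference, here is a minimal decomposition sketch with statsmodels (assuming a monthly average price series; the file name is illustrative):

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# monthly average price per m2, indexed by month (load your own series here)
monthly_price = pd.read_csv("idf_monthly_price.csv",
                            index_col=0, parse_dates=True).squeeze()

result = seasonal_decompose(monthly_price, model="additive", period=12)
result.plot()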
Moreover, I used fbprophet
module to predict the price per m2. The black
points present actual values, the blue line indicates the forecasted values, and
the light blue shaded region is the uncertainty. The uncertainty’s region
increases for the prediction because of the initial uncertainty and it grows
over time. This can be impacted by policy, social elements, or some others.
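A minimal forecasting sketch, reusing the monthly series from the previous snippet (fbprophet expects the columns ds and y):

from fbprophet import Prophet

df = monthly_price.reset_index()
df.columns = ["ds", "y"]

m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=12, freq="M")  # forecast one more year
forecast = m.predict(future)
m.plot(forecast)  # black points: actuals; blue line: forecast; shaded: uncertainty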
According to this analysis, we find that among all transactions of second-hand apartments in Île-de-France, T2 and T3 hold 60% of transactions. Second-hand apartments in the center and west of Paris are more expensive (> 8.5k euros per m2) than in the other arrondissements of Paris (6k - 8.5k euros per m2), and second-hand apartments in Paris are more expensive than in the other departments of IDF. Among the departments other than Paris, second-hand apartments in Hauts-de-Seine are the most expensive (4k - 6k euros per m2).
First of all, I downloaded real estate transaction data from the government’s open data site. The dataset contains transaction information from January 2014 to June 2021: “nature_mutation” specifies the nature of the sale, “nombre_pieces_principales” indicates the number of rooms, “valeur_fonciere” gives the sale price, “code_commune”, “nom_commune” and “code_departement” specify the communes and departments, and “surface_reelle_bati” describes the real surface area.
For this analysis, I only took into account second-hand apartments’ transactions with a positive area in Paris.
I classed second-hand apartments into 5 groups by number of rooms: T1 means a one-room apartment of around 23 m2; T2 a two-room apartment of around 40 m2; T3 a three-room apartment of around 63 m2; T4 a four-room apartment of nearly 93 m2; T5 a five-room apartment with a larger area of about 147 m2. This donut chart describes the share of each apartment class among the transactions. T1 and T2 hold 60% of transactions, 22% of transactions are T3 apartments, and the other purchases are larger apartments. Let’s go further into the details.
This graph describes the average price per m2 for second-hand apartments with different numbers of rooms in Paris, between January 2014 and June 2021. We find that although T2 and T3 are much larger than T1, their unit prices are 6.7% and 6.4% lower than T1’s. Moreover, while the average area of T4 is three times that of T1, its unit price is only 4% more expensive than T1’s; it’s similar for the other classes. Why are T1 apartments so expensive per m2? That might be because there are many students and young workers in Paris who need to rent a big enough apartment, which drives investors to invest in T1 apartments and leads to higher demand for T1.
According to this map, we observe that second-hand apartments in arrondissements 4, 6, 7 and 8 are much more expensive than in the other arrondissements: their average unit price is at least 11,800 euros. On the contrary, second-hand apartments in arrondissements 18, 19 and 20 are much cheaper: their average unit price is less than 8,000 euros. This might be explained by geographical position, number of rooms, the apartment’s state, energy performance, public security, etc. There is more public transport in the city center than in other areas, and its many shopping centers and tourist spots attract plenty of people, which makes the city center more valuable.
Furthermore, according to the stacked bar plot below, it’s obvious that there are many more transactions in the 18th arrondissement than in other areas; nearly 50% of the sold apartments there are 2-room apartments. The Sacré-Cœur Basilica and Montmartre make the 18th arrondissement famous. A real artists’ neighborhood, it is bohemian and cosmopolitan. If you like discovering atypical places and diverse personalities, you will find what you’re looking for: the popular flea market, many schools, and many nightlife venues, such as the cabarets around Pigalle. All this attracts couples to live in the 18th arrondissement.
Moreover, in the 16th arrondissement, the number of T4 transactions is larger than in all the other arrondissements. Paris 16 is eminently residential, as evidenced by its charming buildings with green courtyards and balconies. But it is also a Parisian cultural hotspot, with many museums and places emblematic from both a historical and an intellectual point of view. Moreover, it concentrates many schools and establishments of choice for the education of children and students. All this might explain why there are many more T4 transactions in Paris 16 than in the other arrondissements.
In the 1st, 2nd and 3rd arrondissements, more than one third of the sold apartments are T1 apartments; that might be because there aren’t that many apartments in the center of Paris, and the unit price there is high.
This group of scatter plots shows the relationship between second-hand apartments’ prices and their areas. Each point stands for one transaction; points on the red dashed line have a price of exactly 10k euros per m2. Points above the dashed line have a unit price greater than 10k euros; points below it, less than 10k euros per m2.
For transactions in the downtown area, most apartments are smaller than 50 m2, but their prices vary widely, up to nearly 2 million euros. On the other hand, in the 8th, 16th and 17th arrondissements, many sold apartments also reach more than 2 million euros, but their areas vary widely, up to 200 m2. Finally, there are arrondissements where neither unit price nor area varies that widely: in the 13th, 18th, 19th and 20th arrondissements, most apartments are smaller than 100 m2 and cheaper than 1 million euros, thus less than 10k euros per m2.
This graph describes second-hand apartments’ transaction count and average price per m2 in Paris, between January 2014 and June 2021. The orange line shows the monthly average price per m2; the blue area displays the monthly transaction count. Over 7.5 years, the average price per m2 increased 37% (11.5/8.4 - 1), and from 2017 alone it increased nearly 26% (11.5/9.1 - 1). Moreover, the transaction count reaches its yearly low point in August, which might be because people go on holiday at that time; on the contrary, transactions in July and September are higher than in other months, which suggests that people usually sign the purchase promise in May or July (assuming about 2 months to negotiate the credit between the purchase promise and the purchase agreement), so they can sign the agreement before their holiday or before school starts. Moreover, because of the COVID-19 pandemic, the transaction count dropped 50% in April 2020, then recovered after the first lockdown ended. Impacted by the pandemic, neither the transaction count nor the average price increased much in 2021.
Then I used Time Series additive
model to decompose data into a trend
component, a seasonal component, and a residual component. The trend component
captures changes over time, the seasonal component captures cyclical effects due
to the time of year, the residual component captures the influences not
described by the trend and seasonal effects. Thanks to this model, we find that
except for July, there is another transaction peak in January, which we didn’t
find above. In March and June, the transactions arrive at their low points,
that might be because, during the transition period between 2 months, the desire
for purchasing or selling apartments is not that high.
Moreover, I used fbprophet
module to predict the price per m2. The black
points present actual values, the blue line indicates the forecasted values,
and the light blue shaded region is the uncertainty. The uncertainty’s region
increases for the prediction because of the initial uncertainty and it grows
over time. This can be impacted by policy, social elements, or some others.
According to this analysis, we find that among all transactions of second-hand apartments in Paris, T1 and T2 hold 60% of transactions. Second-hand apartments in arrondissements 4, 6, 7 and 8 are much more expensive than in the other arrondissements, with an average unit price of at least 11,800 euros; on the contrary, second-hand apartments in arrondissements 18, 19 and 20 are much cheaper, with an average unit price below 8,000 euros.
The year 2021 was still a year of battling COVID-19. Thanks to vaccination, our life gradually returned to normal: we started going back to the office more frequently and had the chance to travel as before. This year I continued to go deeper into applying data science to the retail domain and did some data analysis with open-source data as well. In this blog, I’ll review my year 2021 with the following points:
In The Memory is a retail-tech company that helps retail players make the best use of different internal and external data sources to meet their strategic and operational business challenges. Our products allow distributors and brands to accelerate their decision-making to attract more customers and make the best assortment, merchandising, pricing, and promotional choices in their various physical and online sales channels. We build tailored Augmented Intelligence solutions to meet clients’ priority challenges and serve their strategies by supporting their teams in change management, defining together the best KPIs to meet clients’ challenges, and adapting our solutions to the client’s needs, constraints and processes. Moreover, this year we won the “Pépite du Retail 2020” trophy, voted for by LSA Live participants, and were elected best Microsoft 2021 partner in the “France Action Startup Award” category; we now have nearly 50 colleagues vs. 25 in 2020.
This year I accomplished about twenty CRM (Customer Relationship Management) projects; some are for distributors, some for industrialists. With our analysis, we help them achieve 10% more turnover per client. I also developed new features for a module that can extract about 50 KPIs over 1 year at different levels, such as per product/store, per product category/store group, or temporal levels x product/store, like month x product or day x store. The SLA (Service-Level Agreement) of this module is about 2-5 min, and within 2 weeks of release the module had already been used around 1500 times. Moreover, with my colleague, we created a model that estimates a product’s turnover and recommends products for different promotion operations, which will be applied in a new module. Since it’s confidential, I won’t talk about the details here ;)
Furthermore, as the company expands, we updated our information on Welcome to the Jungle. I participated in a video shoot presenting what the data team does in its daily work and how we cooperate with other teams like the consulting team and the dev team.
Working on different retail projects, I gained more knowledge of different indicators. Thanks to the CRM projects, I understand what we should focus on according to clients’ needs and how to segment customers by their purchases. In daily work, I learned how to slice a project and accomplish its different parts with colleagues. The biggest gain came from the sales promotion project: I enriched my knowledge of promotions and understood that different promotions operate with different mechanics and generosities; to define the products for each promotion, we need to reach various objectives, such as the turnover objective, product count, brand type distribution, generosity distribution, etc. Thanks to this project, I had closer contact with business people (category managers, purchasing, promotion, etc.), which let me better understand their needs and pain points, so that we can develop the right product to satisfy those needs or solve the pain points.
Since the beginning of 2020, people all over the world have struggled with the COVID-19 virus, and scientists have been actively looking for solutions. To achieve herd immunity, the most effective method at the moment is arguably vaccination. Various voices about the vaccines have been at the center of public opinion, and the praise and controversy around them have never stopped. And taking a vaccine from theoretical design to clinical trials requires enormous wisdom and effort from scientists.
With open-source datasets, I analyzed the adverse reactions to the Pfizer, CoronaVac, AstraZeneca and Moderna vaccines in different blogs:
Whether for local or systemic reactions, the reactions are most pronounced after the injection of Moderna: 86% of people have local reactions, such as pain, swelling, and redness at the injection site, and nearly 67% have systemic reactions, such as fatigue, chills, joint pain, and muscle pain. Next comes the CoronaVac (Sinovac) vaccine: 62% of people had a local reaction after injection, and 58% a systemic reaction. Among the four vaccines, the Pfizer-BioNTech vaccine caused the fewest adverse reactions: the probabilities of local and systemic reactions after vaccination were 29.5% and 22.4%, respectively.
This year I wrote 22 blogs (including this one) on various topics: retailing, COVID-19, population and employment. Moreover, the traffic of my blog increased by 21.4% compared with 2020. I’m pretty glad if my blogs can help you and solve problems for you.
Besides, I opened a WeChat Official Account, which is like a personal blog based on WeChat. On this platform, I translate some of my English blogs into Chinese and share them with my Chinese friends. I’ve written 11 blogs there and they’ve been read 8500 times.
Don’t hesitate if you want to ask questions or write comments, they’re welcome!!
Hope to see you in 2022!
People who are familiar with openpyxl
know that we can use it to read/write
Excel 2010 xlsx/xlsm/xltx/xltm files. As I presented in this blog,
we can create a workbook, assign values to some cells, apply number formats,
merge cells, etc. However, if we need to create an Excel dashboard like the
following, should we accomplish all the formatting with openpyxl
?
For the question above, we can approach it from another point of view: we can create an Excel template with the fixed formatting, such as the dashboard title and logo and the subtitles, then write values into this template. In this blog, I’ll show you how to do this with the following points:
We have an Excel template named “template.xlsx”, which contains two worksheets “category” and “product”:
The worksheet “category” shows the performance of each category with different indicators like turnover, volume, number of clients, etc.
The worksheet “product” shows the performance of each product with the same indicators.
And what we need to insert into these two worksheets are three pandas dataframes: classic_indicators_df, other_indicators_df and products_detail_df.
With all data preparation, the next target is writing the three dataframes into the template with several steps:
template = openpyxl.load_workbook('./template.xlsx')
We load the template with openpyxl.load_workbook
by indicating the path.
import pandas as pd
out_path = './final_report.xlsx'
writer = pd.ExcelWriter(out_path)
writer.book = template
We set the writer with pandas.ExcelWriter
that allows to write DataFrame
objects into excel sheets, and set the template file as the writer’s workbook.
import openpyxl
from openpyxl.styles.borders import Border, Side

def set_border(ws, cell_range):
    side = Side(border_style='thin', color="FF000000")
    border = Border(left=side, right=side, top=side, bottom=side)
    # apply a thin border to every cell of the given range
    for row in ws[cell_range]:
        for cell in row:
            cell.border = border
Before writing into the template, I create a function set_border()
by using
Side()
and Border()
to set borders for each cell in the given range.
df_sheet_list = [(classic_indicators_df, 'category'),
(other_indicators_df, 'category'),
(products_detail_df, 'product')]
for (df, sht) in df_sheet_list:
templ_sht = template[sht]
writer.sheets = {templ_sht.title:templ_sht}
if df is classic_indicators_df:
classic_indicators_df.to_excel(writer, sheet_name=sht, index=False,
header=False, startrow=13, startcol=2)
set_border(writer.sheets[sht], f"C14:G{14-1+len(df)}")
elif df is other_indicators_df:
other_indicators_df.to_excel(writer, sheet_name=sht, index=False,
header=False, startrow=13, startcol=8)
set_border(writer.sheets[sht], f"I14:M{14-1+len(df)}")
elif df is products_detail_df:
products_detail_df.to_excel(writer, sheet_name=sht, index=False,
header=False, startrow=12, startcol=4)
set_border(writer.sheets[sht], f"E13:K{13-1+len(df)}")
writer.save()
We assign the sheet where we will insert the dataframe to writer.sheets. Then we write the dataframe with .to_excel, specifying the writer we use, the worksheet we want to insert into, and the start row and start column as integers. After all these steps, we save the file with writer.save().
If you are curious about the scripts, you will find them here.