Everything about Data

Demand Forecasting of Brazilian Commodities

Pictured, left to right: soybean, corn, sugar, soybean meal, soybean oil and wheat. Demand forecasting is a technique for estimating the probable demand for a product or service. It is based on an analysis of past demand for that product or service under present market conditions. Demand forecasting should be done on a scientific basis, taking into account the facts and events relevant to the forecast.

An effective approach to analyzing correlation coefficients

Learn how to use the corrplot and corrr packages

Correlation analysis is a key task when you’re exploring any dataset. The main objective is to find linear relationships between features that can help you understand the big picture. Scatterplots are probably the best way to see correlations between variables, but most of the time you’re working with a high-dimensional dataset with a large number of variables. In these situations you run into two major problems: It’s computationally expensive to plot many scatterplots, especially if you have a big dataset.

How to automate exploratory plots?

An awesome package combo: ggplot2 and purrr

When you are plotting different charts during your exploratory data analysis, you sometimes end up doing a lot of repeated coding. Those are the moments when you feel it would be faster to go back to Excel or another tool you are more comfortable with, and that’s fine if you have no time to learn a new technique or adjust parameters in code. What I want to show here is a better way to do your EDA, with less unnecessary coding and more flexibility.

Hypothesis Testing by Computational Methodology - Part 1

Introduction: This is the first of two articles in which we’ll talk about two different approaches to performing hypothesis tests, covering the classical and computational methodologies. At the end I’ll show you an R package (infer) capable of running either of these methods in an easy, flexible, and less error-prone way. In the second article, we’ll go deeper with a hands-on experiment using the infer package; if you already know the package and want to see more code than text, click here.

How to Perform Correlation Analysis in Time Series data using R?

What is correlation analysis? The concept of correlation is the same as for non-time series data: identify and quantify the relationship between two variables. Because time series data is continuous and chronologically ordered, the observations in a series are likely to be correlated with each other to some degree. In the context of time series analysis, measuring and analyzing the correlation between two variables can be understood from two different aspects:

Food Delivery Customer Segmentation

In this project the goal is to segment the customers of a food delivery service. Segmentation lets marketers and product managers identify subsets of the target audience so they can better tailor their strategies. The dataset was provided by Data Science Academy and can be found on my GitHub along with all the code presented here. A brief overview of what the project covers: data preparation, data visualization, variable standardization, clustering with K-Means, choosing the best number of clusters with the Calinski-Harabasz index, bootstrapping to assess cluster consistency, and cluster visualization. Without further ado, let's get to the code!