By Pasquale Scudieri, Luigi Cutolo
Data warehousing and data science are two hot topics in the world of programming, and skills in both are increasingly in demand. It is precisely from this that the idea for this challenge was born.
The goal was twofold: to create an algorithm capable of recognizing the activity label of each time entry and, at the same time, to create a database capable of holding those entries in a single environment that brings together all the different sources currently in use in our community.
To achieve this goal we created a series of algorithms that are briefly described in this paper. We start with the choice of the database to be used, then move on to studying, analyzing, and cleaning the data, and finally build one or more clustering algorithms able to assign a label to each entry.
We found that building a text clustering algorithm is not an easy task when the textual content is this large, but we were very pleased with the result.
This paper follows our line of thought as we developed the project, highlighting our doubts and certainties. Our study showed how difficult it is to develop an application of this kind, but also how useful the work was for our training, which in the end was what we were aiming for.