asyncio is a high-level API for writing asynchronous code, and it has been part of the Python standard library since version 3.4. There were some workarounds for asynchronous programming before, but asyncio supports running asynchronous code at the language level. Compared with the old-style approaches, it removes a lot of code.
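To give a rough idea of what this looks like, here is a minimal sketch that runs two coroutines concurrently. The `fetch` function and its delays are made up for illustration, and I’m assuming Python 3.7 or later so that `asyncio.run` is available (the `async`/`await` syntax itself arrived in 3.5).

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical task: asyncio.sleep stands in for real I/O such as a network call.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and returns the results
    # in the order the coroutines were passed in, not the order they finished.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both tasks wait at the same time, so the whole run takes about as long as the slower one rather than the sum of the two.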
It has been a few years since I started learning about data research. During that time I have had some chances to study and use machine learning algorithms with great ML/DL modules, but I didn’t think deeply about the data engineering needed to store or handle big data, because the data I worked with was not that big.
The first time I heard of Akka was in a blog post about Twitter, describing how they built their service on it to handle massive tweet data in real time. I’ve had some experience developing web servers and am currently interested in concurrency issues, which is what made me look into it.
As I wrote in the previous post, most datasets in the real world are not clean. They are usually messy, contain invalid values (e.g. ‘-1’ appearing in a column that holds people’s ages), and some values are missing. Datasets like these will mostly cause errors or return wrong results if you feed them straight into your elaborate logic.
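As a small illustration of the age example, the sketch below marks invalid ages as missing and then fills the gaps with the median. The helper names, the 0 to 120 valid range, and the choice of median imputation are all my own assumptions for the example, not a rule; the right treatment depends on the dataset.

```python
from statistics import median


def clean_ages(ages, low=0, high=120):
    # Treat anything outside the assumed valid range [low, high] as missing.
    return [a if a is not None and low <= a <= high else None for a in ages]


def impute_median(values):
    # Fill missing entries with the median of the known ones.
    # Note: this raises StatisticsError if every value is missing.
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


raw_ages = [34, -1, 27, None, 45]
cleaned = clean_ages(raw_ages)   # [34, None, 27, None, 45]
filled = impute_median(cleaned)  # [34, 34, 27, 34, 45]
print(cleaned)
print(filled)
```

Keeping the "mark as missing" step separate from the "fill in" step makes it easy to swap the imputation strategy later, or to drop the rows instead.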
Kaggle is a platform for data analytics and predictive modeling competitions. Lots of companies post their raw data there, and researchers compete to produce the best predictions. You can use Python, R, or Julia for your research. I’ll work with Python here.