Experience with data mining, machine learning, distributed systems, and web crawling. Aims to focus on data science and data engineering in future career.
• Developed a Python API (shioaji) for stock/option/futures order placement and account management.
• Collected distributed system logs.
• Monitored the distributed system and sent alerts through a chatbot.
• Analyzed travel data and built a machine learning model, estimated to increase orders (revenue) by 4%.
• Maintained and developed a distributed ETL queuing system running on 20 machines.
• Optimized the ETL system, reducing execution time by more than 50%.
• Developed a new product crawler that increased product volume by 1.5%.
• Built analysis charts for other departments.
Analyzed G7 financial data. Performed model validation and parameter estimation with regression models (SUR, MLE, bootstrapping), and compared single-equation estimators and confidence intervals against system-equation estimators.
Calculus, Linear Algebra, Statistics.
FinMind Open Data API
Open-source financial data: more than 50 datasets, served through an API.
Updated automatically every day with Docker Swarm and a distributed queue system (RabbitMQ and Celery) across 8 cloud machines.
900 stars on GitHub.
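The daily-update pipeline above is a producer/worker queue. A minimal local sketch of that pattern, using Python's stdlib `queue` and `threading` as stand-ins for the RabbitMQ broker and Celery workers (dataset names are illustrative examples):

```python
import queue
import threading

# Local stand-in for the RabbitMQ broker: tasks are dataset names to update.
task_queue = queue.Queue()
results = []
lock = threading.Lock()

def update_dataset(name):
    # Placeholder for the real crawl-and-store step.
    return f"updated {name}"

def worker():
    # Each thread plays the role of one Celery worker machine.
    while True:
        name = task_queue.get()
        if name is None:  # poison pill: shut this worker down
            task_queue.task_done()
            break
        with lock:
            results.append(update_dataset(name))
        task_queue.task_done()

# 8 workers mirror the 8 cloud machines in the real deployment.
threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()

for name in ["TaiwanStockPrice", "TaiwanStockNews", "TaiwanStockDividend"]:
    task_queue.put(name)
for _ in threads:
    task_queue.put(None)

task_queue.join()
for t in threads:
    t.join()
print(sorted(results))
```

In the real setup the broker is a separate RabbitMQ service and each worker is a Celery process on its own machine, so updates survive worker restarts.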
Highly imbalanced data (ratio 1000 : 1), 10 GB dataset, 50% missing values. More than 4,000 variables, but I built models with only 50 features.
Post-competition analysis; ranked in the top 10%.
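One common way to make a 1000 : 1 class ratio workable is random undersampling of the majority class. A toy stdlib sketch (the data is invented, and undersampling here is an illustrative choice, not necessarily the method used in the competition):

```python
import random

random.seed(0)

# Toy stand-in for 1000:1 imbalanced data: label 1 is the rare class.
majority = [(i, 0) for i in range(1000)]
minority = [(1000, 1)]

# Keep every rare row; downsample the common class so the model
# trains on a 10:1 ratio instead of 1000:1.
train = random.sample(majority, 10) + minority

counts = {0: 0, 1: 0}
for _, label in train:
    counts[label] += 1
print(counts)  # {0: 10, 1: 1}
```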
Time-series problem: built models to predict sales 48 days ahead.
Post-competition analysis; ranked in the top 8%.
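Predicting a fixed horizon ahead usually means building lag features and shifting the target forward by the horizon. A stdlib sketch with a toy daily series and a 2-day horizon standing in for the 48-day one:

```python
# Toy daily sales series; real data would come from the competition files.
sales = [10, 12, 9, 14, 13, 15, 11, 16, 14, 17]
HORIZON = 2  # the real task used a 48-day horizon

# Each training row pairs recent observations with the future target.
rows = []
for t in range(1, len(sales) - HORIZON):
    rows.append({
        "lag_1": sales[t - 1],          # yesterday's sales
        "lag_0": sales[t],              # today's sales
        "target": sales[t + HORIZON],   # sales HORIZON days ahead
    })
print(rows[0])  # {'lag_1': 10, 'lag_0': 12, 'target': 14}
```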
Time-series problem with eighty million records: built models to predict inventory demand 2 weeks ahead.
Live competition; ranked in the top 25%.
Predicted which products a consumer will purchase again.
Created a Python package that converts the Taiwan Train verification code (CAPTCHA) to text.
The model is a CNN built with Keras.
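The final step of such a package is decoding the CNN's per-character softmax outputs into text. A sketch of that step only (the digit charset and the two-position example are assumptions; the real model is a Keras CNN):

```python
# Decode per-character softmax outputs from the CNN into text.
# Assumed setup: each CAPTCHA character is a digit, and the network
# emits one probability vector per character position.
CHARSET = "0123456789"

def decode(probs):
    """probs: one probability list per character position."""
    text = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)  # argmax
        text.append(CHARSET[best])
    return "".join(text)

# Toy output for two positions: argmax indices 3 and 7 -> "37".
fake_probs = [
    [0.0, 0.1, 0.0, 0.8, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.9, 0.0, 0.0],
]
print(decode(fake_probs))  # -> "37"
```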
1. Used Docker Swarm and Portainer to manage backend services, including the crawler, backend API, FinMind API, database, and web frontend.
2. Used Traefik on Swarm to manage routing and DNS, covering the backend API, FinMind API, and web frontend.
3. Used Traefik to automatically register SSL certificates through Let's Encrypt.
4. Hosted on Linode Cloud.
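A sketch of how such a stack could be declared for Docker Swarm, using Traefik's Swarm provider and its Let's Encrypt (ACME) resolver; the image names, domain, and email are hypothetical:

```yaml
version: "3.7"
services:
  traefik:
    image: traefik:v2.5
    command:
      - "--providers.docker.swarmMode=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.le.acme.email=admin@example.com"   # hypothetical
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
  api:
    image: finmind/backend-api   # hypothetical image name
    deploy:
      labels:
        - "traefik.http.routers.api.rule=Host(`api.example.com`)"   # hypothetical domain
        - "traefik.http.routers.api.tls.certresolver=le"
```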
1. Includes more than 50 Taiwan stock datasets.
2. 900 stars on GitHub.
3. Developed with Python and FastAPI.
1. Created automated tests and automated deployment for the FinMind team.
2. Used GitLab Runner.
3. CD pipeline automatically publishes the Python package.
4. CD pipeline automatically updates and deploys new service versions.
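A sketch of what such a pipeline could look like in `.gitlab-ci.yml`; the job names and commands are hypothetical:

```yaml
stages: [test, publish, deploy]

test:
  stage: test
  script:
    - pip install -e .[test]
    - pytest

publish:            # CD: publish the Python package on tagged releases
  stage: publish
  only: [tags]
  script:
    - python -m build
    - twine upload dist/*

deploy:             # CD: roll the new version out to the Swarm cluster
  stage: deploy
  only: [tags]
  script:
    - docker stack deploy -c docker-compose.yml finmind
```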
Python - numpy, pandas, sklearn, multiprocessing, joblib.
R - parallel, dplyr, data.table, mice.
Python - xgboost-gpu.
R - xgboost, svm, random forest, knn.
Python - keras (CNN).
R - GLM, GLMNET, NLS, SUR, MLE.
Major: Mathematics and Statistics.
R, Python. Basic English and proficient Chinese.