Hands-On Machine Learning for Algorithmic Trading - Original PDF


Author: Stefan Jansen


Book description:

Historically, algorithmic trading was defined more narrowly as the automation of trade execution to minimize costs, as offered by the sell side. We will take a more comprehensive perspective, since the use of algorithms, and ML in particular, has come to impact a broader range of activities, from idea generation and alpha factor design to asset allocation, position sizing, and the testing and evaluation of strategies. This chapter looks at the bigger picture of how the use of ML has emerged as a critical source of competitive advantage in the investment industry and where it fits into the investment process to enable algorithmic trading strategies. We will be covering the following topics in the chapter:

- How this book is organized and who should read it
- How ML has come to play a strategic role in algorithmic trading
- How to design and execute a trading strategy
- How ML adds value to an algorithmic trading strategy




Algorithmic trading relies on computer programs that execute algorithms to automate some, or all, elements of a trading strategy. Algorithms are a sequence of steps or rules to achieve a goal and can take many forms. In the case of machine learning (ML), algorithms pursue the objective of learning other algorithms, namely rules, to achieve a target based on data, such as minimizing a prediction error. These algorithms encode various activities of a portfolio manager who observes market transactions and analyzes relevant data to decide on placing buy or sell orders. The sequence of orders defines the portfolio holdings that, over time, aim to produce returns that are attractive to the providers of capital, taking into account their appetite for risk.

Ultimately, the goal of active investment management consists in achieving alpha, that is, returns in excess of the benchmark used for evaluation. The fundamental law of active management applies the information ratio (IR) to express the value of active management as the ratio of portfolio returns above the returns of a benchmark, usually an index, to the volatility of those returns. It approximates the information ratio as the product of the information coefficient (IC), which measures the quality of forecasts as their correlation with outcomes, and the breadth of a strategy, expressed as the square root of the number of independent bets. Hence, the key to generating alpha is forecasting. Successful predictions, in turn, require superior information or a superior ability to process public information.

Algorithms facilitate optimization throughout the investment process, from asset allocation to idea generation, trade execution, and risk management. The use of ML for algorithmic trading, in particular, aims for more efficient use of conventional and alternative data, with the goal of producing both better and more actionable forecasts, hence improving the value of active management.
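In symbols, the fundamental law states IR = alpha / sigma(alpha) ≈ IC × sqrt(BR), where BR is the breadth. To make this concrete, here is a minimal sketch in Python, assuming hypothetical forecast and realized-return arrays; it illustrates the relationship and is not code from the book:

    # A minimal sketch, not from the book: the arrays below are hypothetical.
    # It estimates the information coefficient (IC) as the rank correlation
    # between forecasts and realized returns, then applies the fundamental
    # law's approximation IR ≈ IC * sqrt(breadth).
    import numpy as np
    from scipy.stats import spearmanr

    forecasts = np.array([0.02, -0.01, 0.03, 0.005, -0.02, 0.01])
    realized = np.array([0.015, -0.005, 0.02, 0.001, -0.025, 0.012])

    ic, _ = spearmanr(forecasts, realized)   # forecast quality vs. outcomes
    breadth = len(forecasts)                 # number of independent bets
    ir_approx = ic * np.sqrt(breadth)        # IR ≈ IC * sqrt(BR)

    print(f"IC = {ic:.3f}, approximate IR = {ir_approx:.3f}")

Chapter 7, for example, evaluates return predictions with the information coefficient, computed there over cross-sections of many assets rather than a single toy series.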


Author(s): Stefan Jansen

Publisher: Packt, Year: 2018

ISBN: 978-1-78934-641-1


Table of Contents (Chapters 7-16)

Chapter 7: Linear Models
- Linear regression for inference and prediction
- The multiple linear regression model
- How to formulate the model
- How to train the model
- Least squares
- Maximum likelihood estimation
- Gradient descent
- The Gauss-Markov theorem
- How to conduct statistical inference
- How to diagnose and remedy problems
- Goodness of fit
- Heteroskedasticity
- Serial correlation
- Multicollinearity
- How to run linear regression in practice
- OLS with statsmodels
- Stochastic gradient descent with sklearn
- How to build a linear factor model
- From the CAPM to the Fama-French five-factor model
- Obtaining the risk factors
- Fama-MacBeth regression
- Shrinkage methods: regularization for linear regression
- How to hedge against overfitting
- How ridge regression works
- How lasso regression works
- How to use linear regression to predict returns
- Prepare the data
- Universe creation and time horizon
- Target return computation
- Alpha factor selection and transformation
- Data cleaning – missing data
- Data exploration
- Dummy encoding of categorical variables
- Creating forward returns
- Linear OLS regression using statsmodels
- Diagnostic statistics
- Linear OLS regression using sklearn
- Custom time series cross-validation
- Select features and target
- Cross-validating the model
- Test results – information coefficient and RMSE
- Ridge regression using sklearn
- Tuning the regularization parameters using cross-validation
- Cross-validation results and ridge coefficient paths
- Top 10 coefficients
- Lasso regression using sklearn
- Cross-validated information coefficient and Lasso path
- Linear classification
- The logistic regression model
- Objective function
- The logistic function
- Maximum likelihood estimation
- How to conduct inference with statsmodels
- How to use logistic regression for prediction
- How to predict price movements using sklearn
- Summary

Chapter 8: Time Series Models
- Analytical tools for diagnostics and feature extraction
- How to decompose time series patterns
- How to compute rolling window statistics
- Moving averages and exponential smoothing
- How to measure autocorrelation
- How to diagnose and achieve stationarity
- Time series transformations
- How to diagnose and address unit roots
- Unit root tests
- How to apply time series transformations
- Univariate time series models
- How to build autoregressive models
- How to identify the number of lags
- How to diagnose model fit
- How to build moving average models
- How to identify the number of lags
- The relationship between AR and MA models
- How to build ARIMA models and extensions
- How to identify the number of AR and MA terms
- Adding features – ARMAX
- Adding seasonal differencing – SARIMAX
- How to forecast macro fundamentals
- How to use time series models to forecast volatility
- The autoregressive conditional heteroskedasticity (ARCH) model
- Generalizing ARCH – the GARCH model
- Selecting the lag order
- How to build a volatility-forecasting model
- Multivariate time series models
- Systems of equations
- The vector autoregressive (VAR) model
- How to use the VAR model for macro fundamentals forecasts
- Cointegration – time series with a common trend
- Testing for cointegration
- How to use cointegration for a pairs-trading strategy
- Summary

Chapter 9: Bayesian Machine Learning
- How Bayesian machine learning works
- How to update assumptions from empirical evidence
- Exact inference: Maximum a Posteriori estimation
- How to select priors
- How to keep inference simple – conjugate priors
- How to dynamically estimate the probabilities of asset price moves
- Approximate inference: stochastic versus deterministic approaches
- Sampling-based stochastic inference
- Markov chain Monte Carlo sampling
- Gibbs sampling
- Metropolis-Hastings sampling
- Hamiltonian Monte Carlo – going NUTS
- Variational Inference
- Automatic Differentiation Variational Inference (ADVI)
- Probabilistic programming with PyMC3
- Bayesian machine learning with Theano
- The PyMC3 workflow
- Model definition – Bayesian logistic regression
- Visualization and plate notation
- The Generalized Linear Models module
- MAP inference
- Approximate inference – MCMC
- Credible intervals
- Approximate inference – variational Bayes
- Model diagnostics
- Convergence
- Posterior Predictive Checks
- Prediction
- Practical applications
- Bayesian Sharpe ratio and performance comparison
- Model definition
- Performance comparison
- Bayesian time series models
- Stochastic volatility models
- Summary

Chapter 10: Decision Trees and Random Forests
- Decision trees
- How trees learn and apply decision rules
- How to use decision trees in practice
- How to prepare the data
- How to code a custom cross-validation class
- How to build a regression tree
- How to build a classification tree
- How to optimize for node purity
- How to train a classification tree
- How to visualize a decision tree
- How to evaluate decision tree predictions
- Feature importance
- Overfitting and regularization
- How to regularize a decision tree
- Decision tree pruning
- How to tune the hyperparameters
- GridSearchCV for decision trees
- How to inspect the tree structure
- Learning curves
- Strengths and weaknesses of decision trees
- Random forests
- Ensemble models
- How bagging lowers model variance
- Bagged decision trees
- How to build a random forest
- How to train and tune a random forest
- Feature importance for random forests
- Out-of-bag testing
- Pros and cons of random forests
- Summary

Chapter 11: Gradient Boosting Machines
- Adaptive boosting
- The AdaBoost algorithm
- AdaBoost with sklearn
- Gradient boosting machines
- How to train and tune GBM models
- Ensemble size and early stopping
- Shrinkage and learning rate
- Subsampling and stochastic gradient boosting
- How to use gradient boosting with sklearn
- How to tune parameters with GridSearchCV
- Parameter impact on test scores
- How to test on the holdout set
- Fast scalable GBM implementations
- How algorithmic innovations drive performance
- Second-order loss function approximation
- Simplified split-finding algorithms
- Depth-wise versus leaf-wise growth
- GPU-based training
- DART – dropout for trees
- Treatment of categorical features
- Additional features and optimizations
- How to use XGBoost, LightGBM, and CatBoost
- How to create binary data formats
- How to tune hyperparameters
- Objectives and loss functions
- Learning parameters
- Regularization
- Randomized grid search
- How to evaluate the results
- Cross-validation results across models
- How to interpret GBM results
- Feature importance
- Partial dependence plots
- SHapley Additive exPlanations
- How to summarize SHAP values by feature
- How to use force plots to explain a prediction
- How to analyze feature interaction
- Summary

Chapter 12: Unsupervised Learning
- Dimensionality reduction
- Linear and non-linear algorithms
- The curse of dimensionality
- Linear dimensionality reduction
- Principal Component Analysis
- Visualizing PCA in 2D
- The assumptions made by PCA
- How the PCA algorithm works
- PCA based on the covariance matrix
- PCA using Singular Value Decomposition
- PCA with sklearn
- Independent Component Analysis
- ICA assumptions
- The ICA algorithm
- ICA with sklearn
- PCA for algorithmic trading
- Data-driven risk factors
- Eigen portfolios
- Manifold learning
- t-SNE
- UMAP
- Clustering
- k-Means clustering
- Evaluating cluster quality
- Hierarchical clustering
- Visualization – dendrograms
- Density-based clustering
- DBSCAN
- Hierarchical DBSCAN
- Gaussian mixture models
- The expectation-maximization algorithm
- Hierarchical risk parity
- Summary

Chapter 13: Working with Text Data
- How to extract features from text data
- Challenges of NLP
- The NLP workflow
- Parsing and tokenizing text data
- Linguistic annotation
- Semantic annotation
- Labeling
- Use cases
- From text to tokens – the NLP pipeline
- NLP pipeline with spaCy and textacy
- Parsing, tokenizing, and annotating a sentence
- Batch-processing documents
- Sentence boundary detection
- Named entity recognition
- N-grams
- spaCy's streaming API
- Multi-language NLP
- NLP with TextBlob
- Stemming
- Sentiment polarity and subjectivity
- From tokens to numbers – the document-term matrix
- The BoW model
- Measuring the similarity of documents
- Document-term matrix with sklearn
- Using CountVectorizer
- Visualizing vocabulary distribution
- Finding the most similar documents
- TfidfTransformer and TfidfVectorizer
- The effect of smoothing
- How to summarize news articles using TfidfVectorizer
- Text preprocessing – review
- Text classification and sentiment analysis
- The Naive Bayes classifier
- Bayes' theorem refresher
- The conditional independence assumption
- News article classification
- Training and evaluating multinomial Naive Bayes classifier
- Sentiment analysis
- Twitter data
- Multinomial Naive Bayes
- Comparison with TextBlob sentiment scores
- Business reviews – the Yelp dataset challenge
- Benchmark accuracy
- Multinomial Naive Bayes model
- One-versus-all logistic regression
- Combining text and numerical features
- Multinomial logistic regression
- Gradient-boosting machine
- Summary

Chapter 14: Topic Modeling
- Learning latent topics: goals and approaches
- From linear algebra to hierarchical probabilistic models
- Latent semantic indexing
- How to implement LSI using sklearn
- Pros and cons
- Probabilistic latent semantic analysis
- How to implement pLSA using sklearn
- Latent Dirichlet allocation
- How LDA works
- The Dirichlet distribution
- The generative model
- Reverse-engineering the process
- How to evaluate LDA topics
- Perplexity
- Topic coherence
- How to implement LDA using sklearn
- How to visualize LDA results using pyLDAvis
- How to implement LDA using gensim
- Topic modeling for earnings calls
- Data preprocessing
- Model training and evaluation
- Running experiments
- Topic modeling for Yelp business reviews
- Summary

Chapter 15: Word Embeddings
- How word embeddings encode semantics
- How neural language models learn usage in context
- The Word2vec model – learn embeddings at scale
- Model objective – simplifying the softmax
- Automatic phrase detection
- How to evaluate embeddings – vector arithmetic and analogies
- How to use pre-trained word vectors
- GloVe – global vectors for word representation
- How to train your own word vector embeddings
- The Skip-Gram architecture in Keras
- Noise-contrastive estimation
- The model components
- Visualizing embeddings using TensorBoard
- Word vectors from SEC filings using gensim
- Preprocessing
- Automatic phrase detection
- Model training
- Model evaluation
- Performance impact of parameter settings
- Sentiment analysis with Doc2vec
- Training Doc2vec on Yelp sentiment data
- Create input data
- Bonus – Word2vec for translation
- Summary

Chapter 16: Next Steps
- Key takeaways and lessons learned
- Data is the single most important ingredient
- Quality control
- Data integration
- Domain expertise helps unlock value in data
- Feature engineering and alpha factor research
- ML is a toolkit for solving problems with data
- Model diagnostics help speed up optimization
- Making do without a free lunch
- Managing the bias-variance trade-off
- Define targeted model objectives
- The optimization verification test
- Beware of backtest overfitting
- How to gain insights from black-box models
- ML for trading in practice
- Data management technologies
- Database systems
- Big Data technologies – Hadoop and Spark
- ML tools
- Online trading platforms
- Quantopian
- QuantConnect
- QuantRocket
- Conclusion

Other Books You May Enjoy
Index

Product support

1. If you have any problem with payment, please contact support via Telegram.

2. When purchasing products, please check the product number and title carefully.

3. You can open the files in various applications (there is no code or lock on the files).

4. After purchase, the product can be downloaded from the product page and is also sent to your email.

5. If any problem arises during the purchase process, please get in touch.