EBookClubs

Read Books & Download eBooks Full Online

EBookClubs

Read Books & Download eBooks Full Online

Book MYSQL FOR JAVA GUI  Database  Cryptography  and Image Processing

Download or read book MYSQL FOR JAVA GUI Database Cryptography and Image Processing written by Vivian Siahaan and published by SPARTA PUBLISHING. This book was released on 2019-08-01 with total page 475 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, you will learn how to build from scratch a criminal records management database system using Java / MySQL. All Java code for digital image processing in this book is Native Java. Intentionally not to rely on external libraries, so that readers know in detail the process of extracting digital images from scratch in Java. There are only three external libraries used in this book: Connector / J to facilitate Java to MySQL connections, JCalendar to display calendar controls, and JFreeChart to display graphics. Digital image techniques to extract image features used in this book are grascaling, sharpening, invertering, blurring, dilation, erosion, closing, opening, vertical prewitt, horizontal prewitt, Laplacian, horizontal sobel, and vertical sobel. For readers, you can develop it to store other advanced image features based on descriptors such as SIFT and others for developing descriptor based matching. In the first chapter, you will be shown the number of devices needed to be downloaded and installed. You need to know how to add external libraries to the NetBeans environment. These tools are needed so that you can run the Java scripts. In the second chapter, you will learn the basics of cryptography using Java. Here, you will learn how to write a Java program to count Hash, MAC (Message Authentication Code), store keys in a KeyStore, generate PrivateKey and PublicKey, encrypt / decrypt data, and generate and verify digital prints. In the third chapter, you will learn how to create and store salt passwords and verify them. You will create a Login table. In this case, you will see how to create a Java GUI using NetBeans to implement it. In addition to the Login table, in this chapter you will also create a Client table. In the case of the Client table, you will learn how to generate and save public and private keys into a database. You will also learn how to encrypt / decrypt data and save the results into a database. In the fourth chapter, you will create an Account table. This account table has the following ten fields: account_id (primary key), client_id (primarykey), account_number, account_date, account_type, plain_balance, cipher_balance, decipher_balance, digital_signature, and signature_verification. In this case, you will learn how to implement generating and verifying digital prints and storing the results into a database. In the fifth chapter, You create a table with the name of the Account, which has ten columns: account_id (primary key), client_id (primarykey), account_number, account_date, account_type, plain_balance, cipher_balance, decipher_balance, digital_signature, and signature_verification. In the sixth chapter, you will create a Client_Data table, which has the following seven fields: client_data_id (primary key), account_id (primary_key), birth_date, address, mother_name, telephone, and photo_path. In the seventh chapter, you will be taught to create Java GUI to view, edit, insert, and delete Suspect table data. This table has eleven columns: suspect_id (primary key), suspect_name, birth_date, case_date, report_date, suspect_ status, arrest_date, mother_name, address, telephone, and photo. In the eighth chapter, you will be taught how to create Crime database and its tables. In nineth chapter, you will be taught how to extract image features, utilizing BufferedImage class, in Java GUI. In the tenth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Feature_Extraction table data. This table has eight columns: feature_id (primary key), suspect_id (foreign key), feature1, feature2, feature3, feature4, feature5, and feature6. All six fields (except keys) will have a BLOB data type, so that the image of the feature will be directly saved into this table. In the eleventh chapter, you will add two tables: Police_Station and Investigator. These two tables will later be joined to Suspect table through another table, File_Case, which will be built in the seventh chapter. The Police_Station has six columns: police_station_id (primary key), location, city, province, telephone, and photo. The Investigator has eight columns: investigator_id (primary key), investigator_name, rank, birth_date, gender, address, telephone, and photo. Here, you will design a Java GUI to display, edit, fill, and delete data in both tables. In the twelfth chapter, you will add two tables: Victim and File_Case. The File_Case table will connect four other tables: Suspect, Police_Station, Investigator and Victim. The Victim table has nine columns: victim_id (primary key), victim_name, crime_type, birth_date, crime_date, gender, address, telephone, and photo. The File_Case has seven columns: file_case_id (primary key), suspect_id (foreign key), police_station_id (foreign key), investigator_id (foreign key), victim_id (foreign key), status, and description. Here, you will also design a Java GUI to display, edit, fill, and delete data in both tables.

Book POSTGRESQL FOR JAVA GUI  Database and Image Processing

Download or read book POSTGRESQL FOR JAVA GUI Database and Image Processing written by Vivian Siahaan and published by SPARTA PUBLISHING. This book was released on 2019-08-27 with total page 340 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, you will learn how to build from scratch a criminal records management database system using Java/PostgreSQL. All Java code for digital image processing in this book is Native Java. Intentionally not to rely on external libraries, so that readers know in detail the process of extracting digital images from scratch in Java. There are only three external libraries used in this book: Connector / J to facilitate Java to MySQL connections, JCalendar to display calendar controls, and JFreeChart to display graphics. Digital image techniques to extract image features used in this book are grascaling, sharpening, invertering, blurring, dilation, erosion, closing, opening, vertical prewitt, horizontal prewitt, Laplacian, horizontal sobel, and vertical sobel. For readers, you can develop it to store other advanced image features based on descriptors such as SIFT and others for developing descriptor based matching. In the first chapter, you will learn: How to install NetBeans, JDK 11, and the PostgreSQL connector; How to integrate external libraries into projects; How the basic PostgreSQL commands are used; How to query statements to create databases, create tables, fill tables, and manipulate table contents is done.In the first chapter, you will learn: How to install NetBeans, JDK 11, and the PostgreSQL connector; How to integrate external libraries into projects; How the basic PostgreSQL commands are used; How to query statements to create databases, create tables, fill tables, and manipulate table contents is done. In the second chapter, you will learn querying data from the postgresql using jdbc including establishing a database connection, creating a statement object, executing the query, processing the resultset object, querying data using a statement that returns multiple rows, querying data using a statement that has parameters, inserting data into a table using jdbc, updating data in postgresql database using jdbc, calling postgresql stored function using jdbc, deleting data from a postgresql table using jdbc, and postgresql jdbc transaction. In third chapter, you will be taught how to extract image features, utilizing BufferedImage class, in Java GUI. In the fourth chapter, you will be taught how to create Crime database and its tables. In the fifth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Suspect table data. This table has eleven columns: suspect_id (primary key), suspect_name, birth_date, case_date, report_date, suspect_ status, arrest_date, mother_name, address, telephone, and photo. In the sixth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Feature_Extraction table data. This table has eight columns: feature_id (primary key), suspect_id (foreign key), feature1, feature2, feature3, feature4, feature5, and feature6. All six fields (except keys) will have a BLOB data type, so that the image of the feature will be directly saved into this table. In the seventh chapter, you will add two tables: Police_Station and Investigator. These two tables will later be joined to Suspect table through another table, File_Case, which will be built in the seventh chapter. The Police_Station has six columns: police_station_id (primary key), location, city, province, telephone, and photo. The Investigator has eight columns: investigator_id (primary key), investigator_name, rank, birth_date, gender, address, telephone, and photo. Here, you will design a Java GUI to display, edit, fill, and delete data in both tables. In the eigthth chapter, you will add two tables: Victim and File_Case. The File_Case table will connect four other tables: Suspect, Police_Station, Investigator and Victim. The Victim table has nine columns: victim_id (primary key), victim_name, crime_type, birth_date, crime_date, gender, address, telephone, and photo. The File_Case has seven columns: file_case_id (primary key), suspect_id (foreign key), police_station_id (foreign key), investigator_id (foreign key), victim_id (foreign key), status, and description. Here, you will also design a Java GUI to display, edit, fill, and delete data in both tables. Finally, this book is hopefully useful for you.

Book MariaDB with Java GUI for Cryptography and Image Processing

Download or read book MariaDB with Java GUI for Cryptography and Image Processing written by Vivian Siahaan and published by SPARTA PUBLISHING. This book was released on 2019-09-02 with total page 465 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book is Java/MariaDB version of our previous books which used Java/MySQL and Java/PostgreSQL. In this book, you will learn how to build from scratch a criminal records management database system and simple bank database system using Java/MariaDB. All Java code for digital image processing in this book is Native Java. Intentionally not to rely on external libraries, so that readers know in detail the process of extracting digital images from scratch in Java. There are only three external libraries used in this book: Connector/J to facilitate Java to MariaDB connections, JCalendar to display calendar controls, and JFreeChart to display graphics. Digital image techniques to extract image features used in this book are grascaling, sharpening, invertering, blurring, dilation, erosion, closing, opening, vertical prewitt, horizontal prewitt, Laplacian, horizontal sobel, and vertical sobel. For readers, you can develop it to store other advanced image features based on descriptors such as SIFT and others for developing descriptor based matching. In the first chapter, you will learn the basics of cryptography using Java. Here, you will learn how to write a Java program to count Hash, MAC (Message Authentication Code), store keys in a KeyStore, generate PrivateKey and PublicKey, encrypt / decrypt data, and generate and verify digital prints. In the second chapter, you will learn how to create and store salt passwords and verify them. You will create a Login table. In this case, you will see how to create a Java GUI using NetBeans to implement it. In addition to the Login table, in this chapter you will also create a Client table. In the case of the Client table, you will learn how to generate and save public and private keys into a database. You will also learn how to encrypt / decrypt data and save the results into a database. In the third chapter, you will create an Account table. This account table has the following ten fields: account_id (primary key), client_id (primarykey), account_number, account_date, account_type, plain_balance, cipher_balance, decipher_balance, digital_signature, and signature_verification. In this case, you will learn how to implement generating and verifying digital prints and storing the results into a database. In the fourth chapter, You create a table with the name of the Account, which has ten columns: account_id (primary key), client_id (primarykey), account_number, account_date, account_type, plain_balance, cipher_balance, decipher_balance, digital_signature, and signature_verification. In the fifth chapter, you will create a Client_Data table, which has the following seven fields: client_data_id (primary key), account_id (primary_key), birth_date, address, mother_name, telephone, and photo_path. In the sixth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Suspect table data. This table has eleven columns: suspect_id (primary key), suspect_name, birth_date, case_date, report_date, suspect_ status, arrest_date, mother_name, address, telephone, and photo. In the seventh chapter, you will be taught how to create Crime database and its tables. In nineth chapter, you will be taught how to extract image features, utilizing BufferedImage class, in Java GUI. In the eighth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Feature_Extraction table data. This table has eight columns: feature_id (primary key), suspect_id (foreign key), feature1, feature2, feature3, feature4, feature5, and feature6. All six fields (except keys) will have a BLOB data type, so that the image of the feature will be directly saved into this table. In the nineth chapter, you will add two tables: Police_Station and Investigator. These two tables will later be joined to Suspect table through another table, File_Case, which will be built in the seventh chapter. The Police_Station has six columns: police_station_id (primary key), location, city, province, telephone, and photo. The Investigator has eight columns: investigator_id (primary key), investigator_name, rank, birth_date, gender, address, telephone, and photo. Here, you will design a Java GUI to display, edit, fill, and delete data in both tables. In the eleventh chapter, you will add two tables: Victim and File_Case. The File_Case table will connect four other tables: Suspect, Police_Station, Investigator and Victim. The Victim table has nine columns: victim_id (primary key), victim_name, crime_type, birth_date, crime_date, gender, address, telephone, and photo. The File_Case has seven columns: file_case_id (primary key), suspect_id (foreign key), police_station_id (foreign key), investigator_id (foreign key), victim_id (foreign key), status, and description. Here, you will also design a Java GUI to display, edit, fill, and delete data in both tables. Finally, this book is hopefully useful for you.

Book JAVA GUI WITH MYSQL  Database and Image Processing

Download or read book JAVA GUI WITH MYSQL Database and Image Processing written by Vivian Siahaan and published by SPARTA PUBLISHING. This book was released on 2019-08-26 with total page 325 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, you will learn how to build from scratch a criminal records management database system using Java / MySQL. All Java code for digital image processing in this book is Native Java. Intentionally not to rely on external libraries, so that readers know in detail the process of extracting digital images from scratch in Java. There are only three external libraries used in this book: Connector / J to facilitate Java to MySQL connections, JCalendar to display calendar controls, and JFreeChart to display graphics. Digital image techniques to extract image features used in this book are grascaling, sharpening, invertering, blurring, dilation, erosion, closing, opening, vertical prewitt, horizontal prewitt, Laplacian, horizontal sobel, and vertical sobel. For readers, you can develop it to store other advanced image features based on descriptors such as SIFT and others for developing descriptor based matching. In the first chapter, you will be shown the number of devices needed to be downloaded and installed. You need to know how to add external libraries to the NetBeans environment. These tools are needed so that you can run the Java scripts. In the second chapter, you will be taught how to create Crime database and its tables. In third chapter, you will be taught how to extract image features, utilizing BufferedImage class, in Java GUI. In the fourth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Suspect table data. This table has eleven columns: suspect_id (primary key), suspect_name, birth_date, case_date, report_date, suspect_ status, arrest_date, mother_name, address, telephone, and photo. In the fifth chapter, you will be taught to create Java GUI to view, edit, insert, and delete Feature_Extraction table data. This table has eight columns: feature_id (primary key), suspect_id (foreign key), feature1, feature2, feature3, feature4, feature5, and feature6. All six fields (except keys) will have a BLOB data type, so that the image of the feature will be directly saved into this table. In the sixth chapter, you will add two tables: Police_Station and Investigator. These two tables will later be joined to Suspect table through another table, File_Case, which will be built in the seventh chapter. The Police_Station has six columns: police_station_id (primary key), location, city, province, telephone, and photo. The Investigator has eight columns: investigator_id (primary key), investigator_name, rank, birth_date, gender, address, telephone, and photo. Here, you will design a Java GUI to display, edit, fill, and delete data in both tables. In the seventh chapter, you will add two tables: Victim and File_Case. The File_Case table will connect four other tables: Suspect, Police_Station, Investigator and Victim. The Victim table has nine columns: victim_id (primary key), victim_name, crime_type, birth_date, crime_date, gender, address, telephone, and photo. The File_Case has seven columns: file_case_id (primary key), suspect_id (foreign key), police_station_id (foreign key), investigator_id (foreign key), victim_id (foreign key), status, and description. Here, you will also design a Java GUI to display, edit, fill, and delete data in both tables. Finally, this book is hopefully useful for you.

Book LEARN FROM SCRATCH SIGNAL AND IMAGE PROCESSING WITH PYTHON GUI

Download or read book LEARN FROM SCRATCH SIGNAL AND IMAGE PROCESSING WITH PYTHON GUI written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2023-06-14 with total page 372 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, you will learn how to use OpenCV, NumPy library and other libraries to perform signal processing, image processing, object detection, and feature extraction with Python GUI (PyQt). You will learn how to filter signals, detect edges and segments, and denoise images with PyQt. You will also learn how to detect objects (face, eye, and mouth) using Haar Cascades and how to detect features on images using Harris Corner Detection, Shi-Tomasi Corner Detector, Scale-Invariant Feature Transform (SIFT), and Features from Accelerated Segment Test (FAST). In Chapter 1, you will learn: Tutorial Steps To Create A Simple GUI Application, Tutorial Steps to Use Radio Button, Tutorial Steps to Group Radio Buttons, Tutorial Steps to Use CheckBox Widget, Tutorial Steps to Use Two CheckBox Groups, Tutorial Steps to Understand Signals and Slots, Tutorial Steps to Convert Data Types, Tutorial Steps to Use Spin Box Widget, Tutorial Steps to Use ScrollBar and Slider, Tutorial Steps to Use List Widget, Tutorial Steps to Select Multiple List Items in One List Widget and Display It in Another List Widget, Tutorial Steps to Insert Item into List Widget, Tutorial Steps to Use Operations on Widget List, Tutorial Steps to Use Combo Box, Tutorial Steps to Use Calendar Widget and Date Edit, and Tutorial Steps to Use Table Widget. In Chapter 2, you will learn: Tutorial Steps To Create A Simple Line Graph, Tutorial Steps To Create A Simple Line Graph in Python GUI, Tutorial Steps To Create A Simple Line Graph in Python GUI: Part 2, Tutorial Steps To Create Two or More Graphs in the Same Axis, Tutorial Steps To Create Two Axes in One Canvas, Tutorial Steps To Use Two Widgets, Tutorial Steps To Use Two Widgets, Each of Which Has Two Axes, Tutorial Steps To Use Axes With Certain Opacity Levels, Tutorial Steps To Choose Line Color From Combo Box, Tutorial Steps To Calculate Fast Fourier Transform, Tutorial Steps To Create GUI For FFT, Tutorial Steps To Create GUI For FFT With Some Other Input Signals, Tutorial Steps To Create GUI For Noisy Signal, Tutorial Steps To Create GUI For Noisy Signal Filtering, and Tutorial Steps To Create GUI For Wav Signal Filtering. In Chapter 3, you will learn: Tutorial Steps To Convert RGB Image Into Grayscale, Tutorial Steps To Convert RGB Image Into YUV Image, Tutorial Steps To Convert RGB Image Into HSV Image, Tutorial Steps To Filter Image, Tutorial Steps To Display Image Histogram, Tutorial Steps To Display Filtered Image Histogram, Tutorial Steps To Filter Image With CheckBoxes, Tutorial Steps To Implement Image Thresholding, and Tutorial Steps To Implement Adaptive Image Thresholding. In Chapter 4, you will learn: Tutorial Steps To Generate And Display Noisy Image, Tutorial Steps To Implement Edge Detection On Image, Tutorial Steps To Implement Image Segmentation Using Multiple Thresholding and K-Means Algorithm, and Tutorial Steps To Implement Image Denoising. In Chapter 5, you will learn: Tutorial Steps To Detect Face, Eye, and Mouth Using Haar Cascades, Tutorial Steps To Detect Face Using Haar Cascades with PyQt, Tutorial Steps To Detect Eye, and Mouth Using Haar Cascades with PyQt, and Tutorial Steps To Extract Detected Objects. In Chapter 6, you will learn: Tutorial Steps To Detect Image Features Using Harris Corner Detection, Tutorial Steps To Detect Image Features Using Shi-Tomasi Corner Detection, Tutorial Steps To Detect Features Using Scale-Invariant Feature Transform (SIFT), and Tutorial Steps To Detect Features Using Features from Accelerated Segment Test (FAST). You can download the XML files from https://viviansiahaan.blogspot.com/2023/06/learn-from-scratch-signal-and-image.html.

Book SIX BOOKS IN ONE  Classification  Prediction  and Sentiment Analysis Using Machine Learning and Deep Learning with Python GUI

Download or read book SIX BOOKS IN ONE Classification Prediction and Sentiment Analysis Using Machine Learning and Deep Learning with Python GUI written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-04-11 with total page 1165 pages. Available in PDF, EPUB and Kindle. Book excerpt: Book 1: BANK LOAN STATUS CLASSIFICATION AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project consists of more than 100,000 customers mentioning their loan status, current loan amount, monthly debt, etc. There are 19 features in the dataset. The dataset attributes are as follows: Loan ID, Customer ID, Loan Status, Current Loan Amount, Term, Credit Score, Annual Income, Years in current job, Home Ownership, Purpose, Monthly Debt, Years of Credit History, Months since last delinquent, Number of Open Accounts, Number of Credit Problems, Current Credit Balance, Maximum Open Credit, Bankruptcies, and Tax Liens. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, and XGB classifier. Three feature scaling used in machine learning are raw, minmax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 2: OPINION MINING AND PREDICTION USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI Opinion mining (sometimes known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. This dataset was created for the Paper 'From Group to Individual Labels using Deep Features', Kotzias et. al,. KDD 2015. It contains sentences labelled with a positive or negative sentiment. Score is either 1 (for positive) or 0 (for negative). The sentences come from three different websites/fields: imdb.com, amazon.com, and yelp.com. For each website, there exist 500 positive and 500 negative sentences. Those were selected randomly for larger datasets of reviews. Amazon: contains reviews and scores for products sold on amazon.com in the cell phones and accessories category, and is part of the dataset collected by McAuley and Leskovec. Scores are on an integer scale from 1 to 5. Reviews considered with a score of 4 and 5 to be positive, and scores of 1 and 2 to be negative. The data is randomly partitioned into two halves of 50%, one for training and one for testing, with 35,000 documents in each set. IMDb: refers to the IMDb movie review sentiment dataset originally introduced by Maas et al. as a benchmark for sentiment analysis. This dataset contains a total of 100,000 movie reviews posted on imdb.com. There are 50,000 unlabeled reviews and the remaining 50,000 are divided into a set of 25,000 reviews for training and 25,000 reviews for testing. Each of the labeled reviews has a binary sentiment label, either positive or negative. Yelp: refers to the dataset from the Yelp dataset challenge from which we extracted the restaurant reviews. Scores are on an integer scale from 1 to 5. Reviews considered with scores 4 and 5 to be positive, and 1 and 2 to be negative. The data is randomly generated a 50-50 training and testing split, which led to approximately 300,000 documents for each set. Sentences: for each of the datasets above, labels are extracted and manually 1000 sentences are manually labeled from the test set, with 50% positive sentiment and 50% negative sentiment. These sentences are only used to evaluate our instance-level classifier for each dataset3. They are not used for model training, to maintain consistency with our overall goal of learning at a group level and predicting at the instance level. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, and XGB classifier. Three feature scaling used in machine learning are raw, minmax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 3: EMOTION PREDICTION FROM TEXT USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI In the dataset used in this project, there are two columns, Text and Emotion. Quite self-explanatory. The Emotion column has various categories ranging from happiness to sadness to love and fear. You will build and implement machine learning and deep learning models which can identify what words denote what emotion. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, and XGB classifier. Three feature scaling used in machine learning are raw, minmax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 4: HATE SPEECH DETECTION AND SENTIMENT ANALYSIS USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI The objective of this task is to detect hate speech in tweets. For the sake of simplicity, a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets. Formally, given a training sample of tweets and labels, where label '1' denotes the tweet is racist/sexist and label '0' denotes the tweet is not racist/sexist, the objective is to predict the labels on the test dataset. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, XGB classifier, LSTM, and CNN. Three feature scaling used in machine learning are raw, minmax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 5: TRAVEL REVIEW RATING CLASSIFICATION AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine): Travel Review Ratings Data Set. This dataset is populated by capturing user ratings from Google reviews. Reviews on attractions from 24 categories across Europe are considered. Google user rating ranges from 1 to 5 and average user rating per category is calculated. The attributes in the dataset are as follows: Attribute 1 : Unique user id; Attribute 2 : Average ratings on churches; Attribute 3 : Average ratings on resorts; Attribute 4 : Average ratings on beaches; Attribute 5 : Average ratings on parks; Attribute 6 : Average ratings on theatres; Attribute 7 : Average ratings on museums; Attribute 8 : Average ratings on malls; Attribute 9 : Average ratings on zoo; Attribute 10 : Average ratings on restaurants; Attribute 11 : Average ratings on pubs/bars; Attribute 12 : Average ratings on local services; Attribute 13 : Average ratings on burger/pizza shops; Attribute 14 : Average ratings on hotels/other lodgings; Attribute 15 : Average ratings on juice bars; Attribute 16 : Average ratings on art galleries; Attribute 17 : Average ratings on dance clubs; Attribute 18 : Average ratings on swimming pools; Attribute 19 : Average ratings on gyms; Attribute 20 : Average ratings on bakeries; Attribute 21 : Average ratings on beauty & spas; Attribute 22 : Average ratings on cafes; Attribute 23 : Average ratings on view points; Attribute 24 : Average ratings on monuments; and Attribute 25 : Average ratings on gardens. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. Three feature scaling used in machine learning are raw, minmax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 6: ONLINE RETAIL CLUSTERING AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project is a transnational dataset which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers. You will be using the online retail transnational dataset to build a RFM clustering and choose the best set of customers which the company should target. In this project, you will perform Cohort analysis and RFM analysis. You will also perform clustering using K-Means to get 5 clusters. The machine learning models used in this project to predict clusters as target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP. Finally, you will plot boundary decision, distribution of features, feature importance, cross validation score, and predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.

Book Step by Step Tutorial IMAGE CLASSIFICATION Using Scikit Learn  Keras  And TensorFlow with PYTHON GUI

Download or read book Step by Step Tutorial IMAGE CLASSIFICATION Using Scikit Learn Keras And TensorFlow with PYTHON GUI written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2023-06-21 with total page 211 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, implement deep learning-based image classification on classifying monkey species, recognizing rock, paper, and scissor, and classify airplane, car, and ship using TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries. In chapter 1, you will learn how to use TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries to perform how to classify monkey species using 10 Monkey Species dataset provided by Kaggle (https://www.kaggle.com/slothkong/10-monkey-species/download). Here's an overview of the steps involved in classifying monkey species using the 10 Monkey Species dataset: Dataset Preparation: Download the 10 Monkey Species dataset from Kaggle and extract the files. The dataset should consist of separate folders for each monkey species, with corresponding images.; Load and Preprocess Images: Use libraries such as OpenCV to load the images from the dataset. Resize the images to a consistent size (e.g., 224x224 pixels) to ensure uniformity.; Split the Dataset: Divide the dataset into training and testing sets. Typically, an 80:20 or 70:30 split is used, where the larger portion is used for training and the smaller portion for testing the model's performance.; Label Encoding: Encode the categorical labels (monkey species) into numeric form. This step is necessary to train a machine learning model, as most algorithms expect numerical inputs.; Feature Extraction: Extract meaningful features from the images using techniques like deep learning or image processing algorithms. This step helps in representing the images in a format that the machine learning model can understand.; Model Training: Use libraries like TensorFlow and Keras to train a machine learning model on the preprocessed data. Choose an appropriate model architecture, in this case, MobileNetV2.; Model Evaluation: Evaluate the trained model on the testing set to assess its performance. Metrics like accuracy, precision, recall, and F1-score can be used to evaluate the model's classification performance.; Predictions: Use the trained model to make predictions on new, unseen images. Pass the images through the trained model and obtain the predicted labels for the monkey species. In chapter 2, you will learn how to use TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries to perform how to recognize rock, paper, and scissor using dataset provided by Kaggle (https://www.kaggle.com/sanikamal/rock-paper-scissors-dataset/download). Here's the outline of the steps: Step 1: Dataset Preparation: Download the rock-paper-scissors dataset from Kaggle by visiting the provided link and clicking on the "Download" button. Save the dataset to a local directory on your machine. Extract the downloaded dataset to a suitable location. This will create a folder containing the images for rock, paper, and scissors.; Step 2: Data Preprocessing: Import the required libraries: TensorFlow, Keras, NumPy, OpenCV, and Pandas. Load the dataset using OpenCV: Iterate through the image files in the dataset directory and use OpenCV's cv2.imread() function to load each image. You can specify the image's file extension (e.g., PNG) and directory path. Preprocess the images: Resize the loaded images to a consistent size using OpenCV's cv2.resize() function. You may choose a specific width and height suitable for your model. Prepare the labels: Create a list or array to store the corresponding labels for each image (rock, paper, or scissors). This can be done based on the file naming convention or by mapping images to their respective labels using a dictionary.; Step 3: Model Training: Create a convolutional neural network (CNN) model using Keras: Define a CNN architecture using Keras' Sequential model or functional API. This typically consists of convolutional layers, pooling layers, and dense layers. Compile the model: Specify the loss function (e.g., categorical cross-entropy) and optimizer (e.g., Adam) using Keras' compile() function. You can also define additional metrics to evaluate the model's performance. Train the model: Use Keras' fit() function to train the model on the preprocessed dataset. Specify the training data, labels, batch size, number of epochs, and validation data if available. This will optimize the model's weights based on the provided dataset. Save the trained model: Once the model training is complete, you can save the trained model to disk using Keras' save() or save_weights() function. This allows you to load the model later for predictions or further training.; Step 4: Model Evaluation: Evaluate the trained model: Use Keras' evaluate() function to assess the model's performance on a separate testing dataset. Provide the testing data and labels to calculate metrics such as accuracy, precision, recall, and F1 score. This will help you understand how well the model generalizes to new, unseen data. Analyze the model's performance: Interpret the evaluation metrics and analyze any potential areas of improvement. You can also visualize the confusion matrix or classification report to gain more insights into the model's predictions.; Step 5: Prediction: Use the trained model for predictions: Load the saved model using Keras' load_model() function. Then, pass new, unseen images through the model to obtain predictions. Preprocess these images in the same way as the training images (resize, normalize, etc.). Visualize and interpret predictions: Display the predicted labels alongside the corresponding images to see how well the model performs. You can use libraries like Matplotlib or OpenCV to show the images and their predicted labels. Additionally, you can calculate the accuracy of the model's predictions on the new dataset. In chapter 3, you will learn how to use TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries to perform how to classify airplane, car, and ship using Multiclass-image-dataset-airplane-car-ship dataset provided by Kaggle (https://www.kaggle.com/abtabm/multiclassimagedatasetairplanecar). Here are the outline steps: Import the required libraries: TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy. Load and preprocess the dataset: Read the images from the dataset folder. Resize the images to a fixed size. Store the images and corresponding labels.; Split the dataset into training and testing sets: Split the data and labels into training and testing sets using a specified ratio.; Encode the labels: Convert the categorical labels into numerical format. Perform one-hot encoding on the labels.; Build MobileNetV2 model using Keras: Create a sequential model. Add convolutional layers with activation functions. Add pooling layers for downsampling. Flatten the output and add dense layers. Set the output layer with softmax activation.; Compile and train the model: Compile the model with an optimizer and loss function. Train the model using the training data and labels. Specify the number of epochs and batch size.; Evaluate the model: Evaluate the trained model using the testing data and labels. Calculate the accuracy of the model.; Make predictions on new images: Load and preprocess a new image. Use the trained model to predict the label of the new image. Convert the predicted label from numerical format to categorical.

Book Hands On Guide To IMAGE CLASSIFICATION Using Scikit Learn  Keras  And TensorFlow with PYTHON GUI

Download or read book Hands On Guide To IMAGE CLASSIFICATION Using Scikit Learn Keras And TensorFlow with PYTHON GUI written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2023-06-20 with total page 210 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, implement deep learning on detecting face mask, classifying weather, and recognizing flower using TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries. In chapter 1, you will learn how to use TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries to perform detecting face mask using Face Mask Detection Dataset provided by Kaggle (https://www.kaggle.com/omkargurav/face-mask-dataset/download). Here's an overview of the steps involved in detecting face masks using the Face Mask Detection Dataset: Import the necessary libraries: Import the required libraries like TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, and NumPy.; Load and preprocess the dataset: Load the dataset and perform any necessary preprocessing steps, such as resizing images and converting labels into numeric representations.; Split the dataset: Split the dataset into training and testing sets using the train_test_split function from Scikit-Learn. This will allow us to evaluate the model's performance on unseen data.; Data augmentation (optional): Apply data augmentation techniques to artificially increase the size and diversity of the training set. Techniques like rotation, zooming, and flipping can help improve the model's generalization.; Build the model: Create a Convolutional Neural Network (CNN) model using TensorFlow and Keras. Design the architecture of the model, including the number and type of layers.; Compile the model: Compile the model by specifying the loss function, optimizer, and evaluation metrics. This prepares the model for training. Train the model: Train the model on the training dataset. Adjust the hyperparameters, such as the learning rate and number of epochs, to achieve optimal performance.; Evaluate the model: Evaluate the trained model on the testing dataset to assess its performance. Calculate metrics such as accuracy, precision, recall, and F1 score.; Make predictions: Use the trained model to make predictions on new images or video streams. Apply the face mask detection algorithm to identify whether a person is wearing a mask or not.; Visualize the results: Visualize the predictions by overlaying bounding boxes or markers on the images or video frames to indicate the presence or absence of face masks. In chapter 2, you will learn how to use TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries to perform how to classify weather using Multi-class Weather Dataset provided by Kaggle (https://www.kaggle.com/pratik2901/multiclass-weather-dataset/download). To classify weather using the Multi-class Weather Dataset from Kaggle, you can follow these general steps: Load the dataset: Use libraries like Pandas or NumPy to load the dataset into memory. Explore the dataset to understand its structure and the available features.; Preprocess the data: Perform necessary preprocessing steps such as data cleaning, handling missing values, and feature engineering. This may include resizing images (if the dataset contains images) or encoding categorical variables.; Split the data: Split the dataset into training and testing sets. The training set will be used to train the model, and the testing set will be used for evaluating its performance.; Build a model: Utilize TensorFlow and Keras to define a suitable model architecture for weather classification. The choice of model depends on the type of data you have. For image data, convolutional neural networks (CNNs) often work well.; Train the model: Train the model using the training data. Use appropriate training techniques like gradient descent and backpropagation to optimize the model's weights.; Evaluate the model: Evaluate the trained model's performance using the testing data. Calculate metrics such as accuracy, precision, recall, or F1-score to assess how well the model performs.; Fine-tune the model: If the model's performance is not satisfactory, you can experiment with different hyperparameters, architectures, or regularization techniques to improve its performance. This process is called model tuning.; Make predictions: Once you are satisfied with the model's performance, you can use it to make predictions on new, unseen data. Provide the necessary input (e.g., an image or weather features) to the trained model, and it will predict the corresponding weather class. In chapter 3, you will learn how to use TensorFlow, Keras, Scikit-Learn, OpenCV, Pandas, NumPy and other libraries to perform how to recognize flower using Flowers Recognition dataset provided by Kaggle (https://www.kaggle.com/alxmamaev/flowers-recognition/download). Here are the general steps involved in recognizing flowers: Data Preparation: Download the Flowers Recognition dataset from Kaggle and extract the contents. Import the required libraries and define the dataset path and image dimensions.; Loading and Preprocessing the Data: Load the images and their corresponding labels from the dataset. Resize the images to a specific dimension. Perform label encoding on the flower labels and split the data into training and testing sets. Normalize the pixel values of the images.; Building the Model: Define the architecture of your model using TensorFlow's Keras API. You can choose from various neural network architectures such as CNNs, ResNet, or InceptionNet. The model architecture should be designed to handle image inputs and output the predicted flower class..; Compiling and Training the Model: Compile the model by specifying the loss function, optimizer, and evaluation metrics. Common choices include categorical cross-entropy loss and the Adam optimizer. Train the model using the training set and validate it using the testing set. Adjust the hyperparameters, such as the learning rate and number of epochs, to improve performance.; Model Evaluation: Evaluate the trained model on the testing set to measure its performance. Calculate metrics such as accuracy, precision, recall, and F1-score to assess how well the model is recognizing flower classes.; Prediction: Use the trained model to predict the flower class for new images. Load and preprocess the new images in a similar way to the training data. Pass the preprocessed images through the trained model and obtain the predicted flower class labels.; Further Improvements: If the model's performance is not satisfactory, consider experimenting with different architectures, hyperparameters, or techniques such as data augmentation or transfer learning. Fine-tuning the model or using ensembles of models can also improve accuracy.

Book OPTICAL FLOW ANALYSIS AND MOTION ESTIMATION IN DIGITAL VIDEO WITH PYTHON AND TKINTER

Download or read book OPTICAL FLOW ANALYSIS AND MOTION ESTIMATION IN DIGITAL VIDEO WITH PYTHON AND TKINTER written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2024-04-11 with total page 181 pages. Available in PDF, EPUB and Kindle. Book excerpt: The first project, the GUI motion analysis tool gui_motion_analysis_fsbm.py, employs the Full Search Block Matching (FSBM) algorithm to analyze motion in videos. It imports essential libraries like tkinter, PIL, imageio, cv2, and numpy for GUI creation, image manipulation, video reading, computer vision tasks, and numerical computations. The script organizes its functionalities within the VideoFSBMOpticalFlow class, managing GUI elements through methods like create_widgets() for layout management, open_video() for video selection, and toggle_play_pause() for video playback control. It employs the FSBM algorithm for optical flow estimation, utilizing methods like full_search_block_matching() for motion vector calculation and show_optical_flow() for displaying motion patterns. Ultimately, by combining user-friendly controls with powerful analytical capabilities, the script facilitates efficient motion analysis in videos. The second project gui_motion_analysis_fsbm_dsa.py aims to provide a comprehensive solution for optical flow analysis through a user-friendly graphical interface. Leveraging the Full Search Block Matching (FSBM) algorithm with the Diamond Search Algorithm (DSA) optimization, it enables users to estimate motion patterns within video sequences efficiently. By integrating these algorithms into a GUI environment built with Tkinter, the script facilitates intuitive exploration and analysis of motion dynamics in various applications such as object tracking, video compression, and robotics. Key features include video file input, playback control, parameter adjustment, zooming capabilities, and optical flow visualization. Users can interactively analyze videos frame by frame, adjust algorithm parameters to tailor performance, and zoom in on specific regions of interest for detailed examination. Error handling mechanisms ensure robustness, while support for multiple instances enables simultaneous analysis of multiple videos. In essence, the project empowers users to gain insights into motion behaviors within video content, enhancing their ability to make informed decisions in diverse fields reliant on optical flow analysis. The third project "Optical Flow Analysis with Three-Step Search (TSS)" is dedicated to offering a user-friendly graphical interface for motion analysis in video sequences through the application of the Three-Step Search (TSS) algorithm. Optical flow analysis, pivotal in computer vision, facilitates tasks like video surveillance and object tracking. The implementation of TSS within the GUI environment allows users to efficiently estimate motion, empowering them with tools for detailed exploration and understanding of motion dynamics. Through its intuitive graphical interface, the project enables users to interactively engage with video content, from opening and previewing video files to controlling playback and navigating frames. Furthermore, it facilitates parameter customization, allowing users to fine-tune settings such as zoom scale and block size for tailored optical flow analysis. By overlaying visualizations of motion vectors on video frames, users gain insights into motion patterns, fostering deeper comprehension and analysis. Additionally, the project promotes community collaboration, serving as an educational resource and a platform for benchmarking different optical flow algorithms, ultimately advancing the field of computer vision technology. The fourth project gui_motion_analysis_bgds.py is developed with the primary objective of providing a user-friendly graphical interface (GUI) application for analyzing optical flow within video sequences, utilizing the Block-based Gradient Descent Search (BGDS) algorithm. Its purpose is to facilitate comprehensive exploration and understanding of motion patterns in video data, catering to diverse domains such as computer vision, video surveillance, and human-computer interaction. By offering intuitive controls and interactive functionalities, the application empowers users to delve into the intricacies of motion dynamics, aiding in research, education, and practical applications. Through the GUI interface, users can seamlessly open and analyze video files, spanning formats like MP4, AVI, or MKV, thus enabling thorough examination of motion behaviors within different contexts. The application supports essential features such as video playback control, zoom adjustment, frame navigation, and parameter customization. Leveraging the BGDS algorithm, motion vectors are computed at the block level, furnishing users with detailed insights into motion characteristics across successive frames. Additionally, the GUI facilitates real-time visualization of computed optical flow fields alongside original video frames, enhancing users' ability to interpret and analyze motion information effectively. With support for multiple instances and configurable parameters, the application caters to a broad spectrum of users, serving as a versatile tool for motion analysis endeavors in various professional and academic endeavors. The fifth project gui_motion_analysis_hbm2.py serves as a comprehensive graphical user interface (GUI) application tailored for optical flow analysis in video files. Leveraging the Tkinter library, it provides a user-friendly platform for scrutinizing the apparent motion of objects between consecutive frames, essential for various applications like object tracking and video compression. The algorithm of choice for optical flow analysis is the Hierarchical Block Matching (HBM) technique enhanced with the Three-Step Search (TSS) optimization, renowned for its effectiveness in motion estimation tasks. Primarily, the GUI layout encompasses a video display panel alongside control buttons facilitating actions such as video file opening, playback control, frame navigation, and parameter specification for optical flow analysis. Users can seamlessly open supported video files (e.g., MP4, AVI, MKV) and adjust parameters like zoom scale, step size, block size, and search range to tailor the analysis according to their needs. Through interactive features like zooming, panning, and dragging to manipulate the optical flow visualization, users gain insights into motion patterns with ease. Furthermore, the application supports additional functionalities such as time-based navigation, parallel analysis through multiple instances, ensuring a versatile and user-centric approach to optical flow analysis. The sixth project object_tracking_fsbm.py is designed to showcase object tracking capabilities using the Full Search Block Matching Algorithm (FSBM) within a user-friendly graphical interface (GUI) developed with Tkinter. By integrating this algorithm with a robust GUI, the project aims to offer a practical demonstration of object tracking techniques commonly utilized in computer vision applications. Upon execution, the script initializes a Tkinter window and sets up essential widgets for video display, playback control, and parameter adjustment. Users can seamlessly open video files in various formats and navigate through frames with intuitive controls, facilitating efficient analysis and tracking of objects. Leveraging the FSBM algorithm, object tracking is achieved by comparing pixel blocks between consecutive frames to estimate motion vectors, enabling real-time visualization of object movements within the video stream. The GUI provides interactive features like bounding box initialization, parameter adjustment, and zoom functionality, empowering users to fine-tune the tracking process and analyze objects with precision. Overall, the project serves as a comprehensive platform for object tracking, combining algorithmic prowess with an intuitive interface for effective analysis and visualization of object motion in video streams. The seventh project showcases an object tracking application seamlessly integrated with a graphical user interface (GUI) developed using Tkinter. Users can effortlessly interact with video files of various formats (MP4, AVI, MKV, WMV) through intuitive controls such as play, pause, and stop for video playback, as well as frame-by-frame navigation. The GUI further enhances user experience by providing zoom functionality for detailed examination of video content, contributing to a comprehensive and user-friendly environment. Central to the application is the implementation of the Diamond Search Algorithm (DSA) for object tracking, enabling the calculation of motion vectors between consecutive frames. These motion vectors facilitate the dynamic adjustment of a bounding box around the tracked object, offering visual feedback to users. Leveraging event handling mechanisms like mouse wheel scrolling and button press-and-drag, along with error handling for smooth operation, the project demonstrates the practical fusion of computer vision techniques with GUI development, exemplifying the real-world application of algorithms like DSA in object tracking scenarios. The eight project aims to provide an interactive graphical user interface (GUI) application for object tracking, employing the Three-Step Search (TSS) algorithm for motion estimation. The ObjectTrackingFSBM_TSS class defines the GUI layout, featuring essential widgets for video display, control buttons, and parameter inputs for block size and search range. Users can effortlessly interact with the application, from opening video files to controlling video playback and adjusting tracking parameters, facilitating seamless exploration of object motion within video sequences. Central to the application's functionality are the full_search_block_matching_tss() and track_object() methods, responsible for implementing the TSS algorithm and object tracking process, respectively. The full_search_block_matching_tss() method iterates over blocks in consecutive frames, utilizing TSS to calculate motion vectors. These vectors are then used in the track_object() method to update the bounding box around the object of interest, enabling real-time tracking. The GUI dynamically displays video frames and updates the bounding box position, providing users with a comprehensive tool for interactive object tracking and motion analysis. The ninth project encapsulates an object tracking application utilizing the Block-based Gradient Descent Search (BGDS) algorithm, providing users with a user-friendly interface developed using the Tkinter library for GUI and OpenCV for video processing. Upon initialization, the class orchestrates the setup of GUI components, offering intuitive controls for video manipulation and parameter configuration to enhance the object tracking process. Users can seamlessly open video files, control video playback, and adjust algorithm parameters such as block size, search range, iteration limit, and learning rate, empowering them with comprehensive tools for efficient motion estimation. The application's core functionality lies in the block_based_gradient_descent_search() method, implementing the BGDS algorithm for motion estimation by iteratively optimizing motion vectors over blocks in consecutive frames. Leveraging these vectors, the track_object() method dynamically tracks objects within a bounding box, computing mean motion vectors to update bounding box coordinates in real-time. Additionally, interactive features enable users to define bounding boxes around objects of interest through mouse events, facilitating seamless object tracking visualization. Overall, the ObjectTracking_BGDS class offers a versatile and user-friendly platform for object tracking, showcasing the practical application of the BGDS algorithm in real-world scenarios with enhanced ease of use and efficiency.

Book SQLITE QUERIES  ANALYSIS  AND VISUALIZATION WITH PYTHON

Download or read book SQLITE QUERIES ANALYSIS AND VISUALIZATION WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-06-01 with total page 48 pages. Available in PDF, EPUB and Kindle. Book excerpt: Sakila for SQLite is a part of the sakila-sample-database-ports project intended to provide ported versions of the original MySQL database for other database systems, including: Oracle, SQL Server, SQLite, Interbase/Firebird, and Microsoft Access. Sakila for SQLite is a port of the Sakila example database available for MySQL, which was originally developed by Mike Hillyer of the MySQL AB documentation team. The project is designed to help database administrators to decide which database to use for development of new products. In this project, you will: read sqlite database and every table in it; read every actor in actor table, read every film in films table; plot case distribution of film release year, film rating, rental duration, and categorize film length; plot rating variable against rental_duration variable in stacked bar plots; plot length variable against rental_duration variable in stacked bar plots; read payment table; plot case distribution of Year, Day, Month, Week, and Quarter of payment; plot which year, month, week, days of week, and quarter have most payment amount; read film list by joining five tables: category, film_category, film_actor, film, and actor; plot case distribution of top 10 and bottom 10 actors; plot which film title have least and most sales; plot which actor have least and most sales; plot which film category have least and most sales; plot case distribution of top 10 and bottom 10 overdue costumers; plot which customer have least and most overdue days; plot which store have most sales; plot average payment amount by month with mean and EWM; and plot payment amount over June 2005.

Book HOUSEHOLD ELECTRIC POWER CONSUMPTION  ANALYSIS  CLUSTERING  AND PREDICTION WITH PYTHON

Download or read book HOUSEHOLD ELECTRIC POWER CONSUMPTION ANALYSIS CLUSTERING AND PREDICTION WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-03-03 with total page 150 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this project, you will perform analysis, clustering, and prediction on household electric power consumption with python. The dataset used in this project contains 2075259 measurements gathered between December 2006 and November 2010 (47 months). Following are the attributes in the dataset: date: Date in format dd/mm/yyyy; time: time in format hh:mm:ss; globalactivepower: household global minute-averaged active power (in kilowatt); globalreactivepower: household global minute-averaged reactive power (in kilowatt); voltage: minute-averaged voltage (in volt); global_intensity: household global minute-averaged current intensity (in ampere); submetering1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered); submetering2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light; and submetering3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner. In this project, you will perform clustering using KMeans to get 5 clusters. The machine learning models used in this project to perform regression on total number of purchase and to predict clusters as target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP. Finally, you will plot boundary decision, distribution of features, feature importance, cross validation score, and predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.

Book DATAFRAME MANIPULATION  THEORY AND APPLICATIONS WITH PYTHON AND TKINTER

Download or read book DATAFRAME MANIPULATION THEORY AND APPLICATIONS WITH PYTHON AND TKINTER written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2024-08-13 with total page 431 pages. Available in PDF, EPUB and Kindle. Book excerpt: A DataFrame is a fundamental data structure in pandas, a powerful Python library for data manipulation and analysis, designed to handle two-dimensional, labeled data akin to a spreadsheet or SQL table. It simplifies working with tabular data by supporting various operations like filtering, sorting, grouping, and aggregating. DataFrames are easily created from lists, dictionaries, or NumPy arrays and offer flexible data handling, including managing missing values and performing input/output operations with different file formats. Key features include hierarchical indexing for multi-level grouping, time series functionality, and integration with libraries such as NumPy and Matplotlib. DataFrame manipulation encompasses filtering, sorting, merging, grouping, pivoting, and reshaping data, while also allowing custom functions, handling missing data, and managing data types. Mastering these techniques is crucial for efficient data analysis, ensuring clean, transformed data ready for deeper insights and decision-making. In chapter 2, in the first project, we filter a DataFrame named employee_data, which includes columns like 'Name', 'Department', 'Age', 'Salary', and 'Years_Worked', to find employees in the 'Engineering' department with a salary exceeding $70,000. We create the DataFrame using sample data and apply boolean indexing to achieve this. The boolean masks employee_data['Department'] == 'Engineering' and employee_data['Salary'] > 70000 identify rows meeting each condition. Combining these masks with the & operator filters the DataFrame to include only those rows where both conditions are met, resulting in a subset of employees who fit the criteria. The final output displays this filtered DataFrame. In second project, we filter a DataFrame named sales_data, which includes columns such as 'Product', 'Category', 'Quantity Sold', 'Unit Price', and 'Total Revenue', to find products in the 'Electronics' category with quantities sold exceeding 100. We use boolean indexing to achieve this: sales_data['Category'] == 'Electronics' creates a mask for rows in the 'Electronics' category, while sales_data['Quantity_Sold'] > 100 identifies rows where quantities sold are above 100. By combining these masks with the & operator, we filter the DataFrame to include only rows meeting both conditions. The final output displays this filtered subset of products. In third project, we filter a DataFrame named movie_data, which includes columns such as 'Title', 'Genre', 'Release Year', 'Rating', and 'Box Office Earnings', to find movies released after 2010 with a rating above 8. We use boolean indexing where movie_data['Release_Year'] > 2010 creates a mask for movies released after 2010, and movie_data['Rating'] > 8 identifies movies with ratings higher than 8. By combining these masks with the & operator, we filter the DataFrame to include only the rows meeting both conditions. The final output displays the subset of movies that fit these criteria. The fourth project demonstrates a Tkinter-based GUI application for filtering a sales dataset using Python libraries Tkinter, Pandas, and PandasTable. The application allows users to interact with a table displaying sales data, applying filters based on product category and quantity sold. The filter_data() function updates the table to show only items from the selected category with quantities exceeding the specified value, while the refresh_data() function resets the table to display the original dataset. The GUI includes input fields for category selection and quantity entry, along with buttons for filtering and refreshing. The sales data is initially presented in a PandasTable with a toolbar and status bar. Users interact with the interface, which updates and displays filtered data or the full dataset as needed. The fifth project features a Tkinter GUI application that lets users filter a movie dataset by minimum release year and rating using Python libraries Tkinter, Pandas, and PandasTable. The filter_data() function updates the displayed table based on user inputs, while the refresh_data() function resets it to show the original dataset. The GUI includes fields for entering minimum release year and rating, buttons for filtering and refreshing, and a PandasTable for displaying the data. The application allows for interactive data filtering and visualization, with the table initially populated with sample movie data. In the sixth project, a retail store manager uses a DataFrame containing sales data to identify products that are both popular and profitable. By applying logical operators to filter the DataFrame, the goal is to isolate products that have sold more than 100 units and generated revenue exceeding $5000. This filtering is achieved using the Pandas library in Python, where the & operator combines conditions to select the relevant rows. The resulting DataFrame, which includes only products meeting both criteria, provides insights for decision-making and analysis in retail management. The seventh project involves creating a Tkinter-based GUI application to manage and visualize sales data. The GUI displays data in a table and a bar graph, allowing users to filter products based on minimum quantity sold and total revenue. The application uses pandas for data manipulation, pandastable for table display, and matplotlib for the bar graph. The GUI consists of an input frame for user filters and a display frame for showing the table and graph side by side. Users can update the table and graph by clicking "Filter Data" or reset them to the original data with the "Refresh" button, providing an interactive way to analyze sales performance. In chapter three, the first project demonstrates how to sort synthetic financial data for analysis. The code imports libraries, sets random seeds for reproducibility, and generates data for businesses including revenue and expenses. It then creates a DataFrame with this data, sorts it by monthly revenue in descending order, and saves the sorted DataFrame to an Excel file. This process aids in organizing and analyzing financial data, making it easier to identify top-performing businesses. The second project creates a Tkinter GUI to view and interact with synthetic financial data, displaying monthly revenue and expenses for various businesses. It generates random data, stores it in a DataFrame, and sets up a GUI with two tabs: one for sorting by revenue and another for expenses. Each tab features a table to display the data and a matplotlib plot for visual representation. The GUI allows users to sort and view data dynamically, with alternating row colors for readability and embedded plots for better analysis. The third project generates synthetic unemployment data for 10 regions over 5 years, sets random seeds for reproducibility, and creates a DataFrame with the data. It then sorts the DataFrame alphabetically by region and saves it to an Excel file named "synthetic_unemployment_data.xlsx". Finally, the script prints a confirmation message indicating that the data has been successfully saved. The fourth project generates synthetic unemployment data for 25 regions over a 5-year period and creates a Tkinter GUI for interactive data exploration. The data, organized into a DataFrame and saved to an Excel file, is displayed in a tabbed interface with two views: one sorted by unemployment rate and another by year. Each tab features scrollable tables and corresponding bar charts for visual analysis. The UnemploymentDataGUI class manages the interface, updating tables and graphs dynamically to allow users to explore regional and yearly unemployment variations effectively. The fifth project demonstrates how to concatenate dataframes with synthetic temperature data for various countries. Initially, we generate temperature data for countries like the USA and Canada for each month. Next, we create an additional dataframe with temperature data for other countries such as the UK and Germany. We then concatenate the original and additional dataframes into a single dataframe and save the combined data to an Excel file named combined_temperatures.xlsx. The steps involve generating synthetic data, creating additional dataframes, concatenating them, and exporting the result to Excel. The sixth project demonstrates how to build a Tkinter application to visualize synthetic temperature data. The app features a tabbed interface with tabs for displaying raw data, temperature graphs, and filters. It uses alternating row colors for better readability and includes functionality for filtering data by country and month. Users can view and analyze temperature data across different countries through tables and graphical representations, and apply or reset filters as needed. The seventh project demonstrates how to perform an inner join on two synthetic dataframes: one containing housing details and the other containing owner information. First, synthetic data is generated for houses and their owners. The dataframes are then merged on the common key, HouseID, using an inner join to include only rows with matching keys. Finally, the combined data is saved to an Excel file named combined_housing_data.xlsx. The result is an Excel file that contains details about houses along with their respective owners. The eight project provides an interactive platform for managing and visualizing synthetic housing data. Users can view comprehensive tables, apply filters for location and house type, and analyze house price distributions with Matplotlib plots. The application includes tabs for displaying data, filtering results, and generating visualizations, with functionalities to reset filters, save filtered data to Excel, and ensure a user-friendly experience with alternating row colors in tables and dynamic updates. To demonstrate an outer join on DataFrames with synthetic medical data, in ninth project, we create two DataFrames: one for patient information and another for medical records. We then perform an outer join to ensure all patients and records are included, even if some records don't have corresponding patient data. The code generates synthetic data, performs the outer join using pd.merge() on the PatientID column, and saves the result to an Excel file named outer_join_medical_data.xlsx. This approach provides a comprehensive dataset with complete patient and medical record information. The tenth project involves creating a Tkinter-based desktop application to visualize and interact with synthetic medical data. The application uses an outer join to merge patient and medical record datasets, displaying the comprehensive result in a user-friendly table. Users can filter data by patient ID and condition, view distribution graphs of medical conditions, and save filtered results to an Excel file. The GUI, leveraging Tkinter and Matplotlib, includes tabs for data display, filtering, and graph visualization, providing a robust tool for exploring medical datasets. In chapther four, the first project demonstrates creating and manipulating a synthetic insurance dataset. Using numpy and pandas, the script generates random data including columns for Policyholder, Age, State, Coverage_Type, and Premium. It groups this data by State and Coverage_Type to show basic data segmentation, then saves the dataset to an Excel file for further analysis. The code provides a practical framework for simulating and analyzing insurance data by illustrating the process of data creation, grouping, and storage. The second project demonstrates a Tkinter GUI application designed for analyzing a synthetic insurance dataset. The GUI displays 1,000 records of policyholder data in a scrollable table using the Treeview widget, with options to filter by state and coverage type. Users can save filtered data to an Excel file and generate a bar plot of policy distribution by state, integrated into the Tkinter window using Matplotlib. This application provides interactive tools for data exploration, filtering, exporting, and visualization in a user-friendly interface. The third project focuses on creating, analyzing, and aggregating a large synthetic sales dataset with 10,000 records. This dataset includes salespersons, regions, products, sales amounts, and timestamps, simulating a detailed sales environment. The core task involves grouping the data by region, product, and salesperson to calculate total sales and transaction counts. This aggregated data is saved to an Excel file, providing insights into sales performance and trends, which helps businesses optimize their sales strategies and make informed decisions. The fourth project develops a Tkinter GUI for analyzing synthetic sales data, allowing users to explore raw and aggregated data interactively. The application includes a dual-view setup with raw and aggregated data tables, filtering options for region, product, and salesperson, and visualization features for generating plots. Users can apply filters, view data summaries, save results to Excel, and visualize sales trends by region. The GUI is designed to provide a comprehensive tool for data analysis, visualization, and reporting. The dataset includes 10,000 records with attributes such as salesperson, region, product, sales amount, and date, and is grouped by region, product, and salesperson to aggregate sales data. The fifth project demonstrates how to create and analyze a synthetic transportation dataset. The code generates a large dataset simulating vehicle and route data, including distances traveled and durations. It groups the data by vehicle and route, calculating total and average distances and durations, and then saves these aggregated results to an Excel file. This approach allows for detailed examination of transportation patterns and performance metrics, facilitating reporting and decision-making. The sixth project outlines a Tkinter GUI project for analyzing synthetic transportation data using Python. This GUI, combining Tkinter and Matplotlib, provides a user-friendly interface to inspect and visualize large datasets involving vehicle routes, distances, and durations. It features interactive tables for raw and aggregated data, filter options for vehicle, route, and date, and integrates various plots like histograms and bar charts for data visualization. Users can apply filters, view dynamic updates, and save filtered data to Excel. The goal is to facilitate comprehensive data analysis and enhance decision-making through an intuitive, interactive tool. In chapter five, the first project involves generating and analyzing a synthetic dataset representing gold production across countries, years, and regions. The dataset, created with attributes like country, year, region, and production quantities, simulates complex real-world data for detailed analysis. By using the pivot_table method, the data is transformed to aggregate gold production metrics by country and region over different years, revealing trends and patterns. The results are saved as both original and pivoted datasets in Excel files for easy access and further analysis, aiding in decision-making related to mining and resource management. The second project creates an interactive Tkinter GUI to visualize and interact with a large synthetic dataset on gold production, including details on countries, regions, mines, and yearly production. Using pandas and numpy to generate the dataset, the GUI features multiple tabs for viewing the original data, pivoted data, and various summary statistics, alongside graphical visualizations of gold production trends across countries, regions, and years. The application integrates matplotlib for embedding charts within the Tkinter interface, making it a comprehensive tool for exploring and analyzing the data effectively. The third project demonstrates how to create a synthetic dataset simulating stock prices for multiple companies over 10,000 days, using random number generation to simulate stock prices for AAPL, GOOG, AMZN, MSFT, TSLA, and META. The dataset, initially in a wide format with separate columns for each company's stock prices, is then reshaped to a long format using pd.melt(). This long format, where each row represents a single date, stock, and its price, is often better suited for data analysis and visualization. Finally, both the original and unpivoted DataFrames are saved to separate Excel files for further use. The fourth project involves developing a visually engaging Tkinter GUI to analyze and visualize a synthetic stock dataset. The application handles stock price data for multiple companies, offering users both the original and unpivoted DataFrames, along with summary statistics and graphical representations. The GUI includes tabs for viewing raw and transformed data, statistical summaries, and interactive graphs, utilizing Tkinter's advanced widgets for a polished user experience. Data is saved to Excel files, and Matplotlib charts are integrated for clear data visualization, making the tool useful for both casual and advanced analysis of stock market trends. In chapter six, the first project demonstrates creating a large synthetic road traffic dataset with 10,000 rows using randomization techniques. Fields include Date, Time, Location, Vehicle_Count, Average_Speed, and Incident. Random NaN values are introduced into 10% of the dataset to simulate missing data. The dataset is then cleaned by removing rows with any missing values using dropna(), and the resulting cleaned DataFrame is saved to 'cleaned_large_road_traffic_data.xlsx' for further analysis. The second project creates a Tkinter-based GUI to analyze and visualize a synthetic road traffic dataset. It generates a dataset with 10,000 rows, including fields like date, time, location, vehicle count, average speed, and incidents. Random missing values are introduced and then removed by dropping rows with any NaNs. The GUI features four tabs: one for the original dataset, one for the cleaned dataset, one for summary statistics, and one for distribution graphs. Users can explore data tables with Tkinter's Treeview widget and view visualizations such as histograms and bar charts using Matplotlib, providing a comprehensive tool for data analysis. The third project generates a large synthetic electricity dataset to simulate real-world patterns in electricity consumption, temperature, and pricing. Missing values are introduced and then handled by filling gaps with regional averages for consumption, forward-filling temperature data, and using overall means for pricing. The cleaned dataset is saved to an Excel file, offering a valuable resource for testing data processing methods and developing data analysis algorithms in a controlled environment. The fourth project demonstrates a Tkinter GUI for handling missing data in a synthetic electricity dataset. The application offers a multi-tab interface to analyze electricity consumption data, including features for displaying the original and cleaned DataFrames, summary statistics, distribution graphs, and time-series plots. Users can view raw and processed data, explore statistical summaries, and visualize distributions and trends in electricity consumption, temperature, and pricing over time. The GUI integrates data generation, cleaning, and visualization techniques, providing a comprehensive tool for electricity data analysis.

Book MARKETING ANALYSIS AND PREDICTION USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON

Download or read book MARKETING ANALYSIS AND PREDICTION USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-02-12 with total page 182 pages. Available in PDF, EPUB and Kindle. Book excerpt: This data set was provided to students for their final project in order to test their statistical analysis skills as part of a MSc. in Business Analytics. It can be utilized for EDA, Statistical Analysis, and Visualizations. Following are the features in the dataset: ID = Customer's unique identifier; Year_Birth = Customer's birth year; Education = Customer's education level; Marital_Status = Customer's marital status; Income = Customer's yearly household income; Kidhome = Number of children in customer's household; Teenhome = Number of teenagers in customer's household; Dt_Customer = Date of customer's enrollment with the company; Recency = Number of days since customer's last purchase; MntWines = Amount spent on wine in the last 2 years; MntFruits = Amount spent on fruits in the last 2 years; MntMeatProducts = Amount spent on meat in the last 2 years; MntFishProducts = Amount spent on fish in the last 2 years; MntSweetProducts = Amount spent on sweets in the last 2 years; MntGoldProds = Amount spent on gold in the last 2 years; NumDealsPurchases = Number of purchases made with a discount; NumWebPurchases = Number of purchases made through the company's web site; NumCatalogPurchases = Number of purchases made using a catalogue; NumStorePurchases = Number of purchases made directly in stores; NumWebVisitsMonth = Number of visits to company's web site in the last month; AcceptedCmp3 = 1 if customer accepted the offer in the 3rd campaign, 0 otherwise; AcceptedCmp4 = 1 if customer accepted the offer in the 4th campaign, 0 otherwise; AcceptedCmp5 = 1 if customer accepted the offer in the 5th campaign, 0 otherwise; AcceptedCmp1 = 1 if customer accepted the offer in the 1st campaign, 0 otherwise; AcceptedCmp2 = 1 if customer accepted the offer in the 2nd campaign, 0 otherwise; Response = 1 if customer accepted the offer in the last campaign, 0 otherwise; Complain = 1 if customer complained in the last 2 years, 0 otherwise; and Country = Customer's location. The machine and deep learning models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, MLP classifier, and CNN 1D. Finally, you will plot boundary decision, ROC, distribution of features, feature importance, cross validation score, and predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.

Book WIND POWER ANALYSIS AND FORECASTING USING MACHINE LEARNING WITH PYTHON

Download or read book WIND POWER ANALYSIS AND FORECASTING USING MACHINE LEARNING WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2023-07-09 with total page 229 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this project on wind power analysis and forecasting using machine learning with Python, we started by exploring the dataset. We examined the available features and the target variable, which is the active power generated by wind turbines. The dataset likely contained information about various meteorological parameters and the corresponding active power measurements. To begin our analysis, we focused on the regression task of predicting the active power using regression algorithms. We split the dataset into training and testing sets and preprocessed the data by handling missing values and performing feature scaling. The preprocessing step ensured that the data was suitable for training machine learning models. Next, we trained several regression models on the preprocessed data. We utilized algorithms such as Linear Regression, Decision Tree Regression, Random Forest Regression, and Gradient Boosting Regression. Each model was trained on the training set and evaluated on the testing set using performance metrics like mean squared error (MSE) and R-squared score. After obtaining regression models for active power prediction, we shifted our focus to predicting categorized active power using machine learning models. This involved converting the continuous active power values into discrete categories or classes. We defined categories based on certain thresholds or ranges of active power values. For the categorized active power prediction task, we employed classification algorithms. Similar to the regression task, we split the dataset, preprocessed the data, and trained various classification models. Common classification algorithms used were Logistic Regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Gradient Boosting, Extreme Gradient Boosting, Multi-Layer Perceptron, and Light Gradient Boosting models. During the training and evaluation of classification models, we used performance metrics like accuracy, precision, recall, and F1-score to assess the models' predictive capabilities. Additionally, we analyzed the classification reports to gain insights into the models' performance for each category. Throughout the process, we paid attention to feature scaling techniques such as normalization and standardization. These techniques were applied to ensure that the features were on a similar scale and to prevent any bias or dominance of certain features during model training. The results of predicting categorized active power using machine learning models were highly encouraging. The models demonstrated exceptional accuracy and exhibited strong classification performance across all categories. The findings from this analysis have significant implications for wind power forecasting and monitoring systems, allowing for more effective categorization and management of wind power generation based on predicted active power levels. To summarize, the wind power analysis and forecasting session involved dataset exploration, active power regression using regression algorithms, and predicting categorized active power using various machine learning models. The regression task aimed to predict continuous active power values, while the classification task aimed to predict discrete categories of active power. Preprocessing, training, evaluation, and performance analysis were key steps throughout the session. The selected models, algorithms, and performance metrics varied depending on the specific task at hand. Overall, the project provided a comprehensive overview of applying machine learning techniques to analyze and forecast wind power generation.

Book GOLD PRICE ANALYSIS AND FORECASTING USING MACHINE LEARNING WITH PYTHON

Download or read book GOLD PRICE ANALYSIS AND FORECASTING USING MACHINE LEARNING WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-05-23 with total page 157 pages. Available in PDF, EPUB and Kindle. Book excerpt: The challenge of this project is to accurately predict the future adjusted closing price of Gold ETF across a given period of time in the future. The problem is a regression problem, because the output value which is the adjusted closing price in this project is continuous value. Data for this study is collected from November 18th 2011 to January 1st 2019 from various sources. The data has 1718 rows in total and 80 columns in total. Data for attributes, such as Oil Price, Standard and Poor’s (S&P) 500 index, Dow Jones Index US Bond rates (10 years), Euro USD exchange rates, prices of precious metals Silver and Platinum and other metals such as Palladium and Rhodium, prices of US Dollar Index, Eldorado Gold Corporation and Gold Miners ETF were gathered. The dataset has 1718 rows in total and 80 columns in total. Data for attributes, such as Oil Price, Standard and Poor’s (S&P) 500 index, Dow Jones Index US Bond rates (10 years), Euro USD exchange rates, prices of precious metals Silver and Platinum and other metals such as Palladium and Rhodium, prices of US Dollar Index, Eldorado Gold Corporation and Gold Miners ETF were gathered. To perform forecasting based on regression adjusted closing price of gold, you will use: Linear Regression, Random Forest regression, Decision Tree regression, Support Vector Machine regression, Naïve Bayes regression, K-Nearest Neighbor regression, Adaboost regression, Gradient Boosting regression, Extreme Gradient Boosting regression, Light Gradient Boosting regression, Catboost regression, and MLP regression. The machine learning models used predict gold daily returns as target variable are K-Nearest Neighbor classifier, Random Forest classifier, Naive Bayes classifier, Logistic Regression classifier, Decision Tree classifier, Support Vector Machine classifier, LGBM classifier, Gradient Boosting classifier, XGB classifier, MLP classifier, and Extra Trees classifier. Finally, you will plot boundary decision, distribution of features, feature importance, predicted values versus true values, confusion matrix, learning curve, performance of the model, and scalability of the model.

Book HOUSE PRICE  ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON

Download or read book HOUSE PRICE ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-02-20 with total page 126 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dataset used in this project is taken from the second chapter of Aurélien Géron's recent book 'Hands-On Machine learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables and sits at an optimal size between being to toyish and too cumbersome. The data contains information from the 1990 California census. Although it may not help you with predicting current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people about the basics of machine learning. The data pertains to the houses found in a given California district and some summary stats about them based on the 1990 census data. Be warned the data aren't cleaned so there are some preprocessing steps required! The columns are as follows: longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, median_house_value, and ocean_proximity. The machine learning models used in this project used to perform regression on median_house_value and to predict it as target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. Finally, you will plot boundary decision, distribution of features, feature importance, cross validation score, and predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.

Book THREE PROJECTS  SQL SERVER AND PYTHON GUI FOR DATA ANALYSIS

Download or read book THREE PROJECTS SQL SERVER AND PYTHON GUI FOR DATA ANALYSIS written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-11-08 with total page 1344 pages. Available in PDF, EPUB and Kindle. Book excerpt: PROJECT 1: FULL SOURCE CODE: SQL SERVER FOR STUDENTS AND DATA SCIENTISTS WITH PYTHON GUI In this project, we provide you with the SQL SERVER version of SQLite sample database named chinook. The chinook sample database is a good database for practicing with SQL, especially PostgreSQL. The detailed description of the database can be found on: https://www.sqlitetutorial.net/sqlite-sample-database/. The sample database consists of 11 tables: The employee table stores employees data such as employee id, last name, first name, etc. It also has a field named ReportsTo to specify who reports to whom; customers table stores customers data; invoices & invoice_items tables: these two tables store invoice data. The invoice table stores invoice header data and the invoice_items table stores the invoice line items data; The artist table stores artists data. It is a simple table that contains only the artist id and name; The album table stores data about a list of tracks. Each album belongs to one artist. However, one artist may have multiple albums; The media_type table stores media types such as MPEG audio and AAC audio files; genre table stores music types such as rock, jazz, metal, etc; The track table stores the data of songs. Each track belongs to one album; playlist & playlist_track tables: The playlist table store data about playlists. Each playlist contains a list of tracks. Each track may belong to multiple playlists. The relationship between the playlist table and track table is many-to-many. The playlist_track table is used to reflect this relationship. In this project, you will write Python script to create every table and insert rows of data into each of them. You will develop GUI with PyQt5 to each table in the database. You will also create GUI to plot: case distribution of order date by year, quarter, month, week, and day; the distribution of amount by year, quarter, month, week, day, and hour; the bottom/top 10 sales by employee, the bottom/top 10 sales by customer, the bottom/top 10 sales by customer, the bottom/top 10 sales by artist, the bottom/top 10 sales by genre, the bottom/top 10 sales by play list, the bottom/top 10 sales by customer city, the bottom/top 10 sales by customer city, the bottom/top 10 sales by customer city, the payment amount by month with mean and EWM, the average payment amount by every month, and amount payment in all years. PROJECT 2: FULL SOURCE CODE: SQL SERVER FOR DATA ANALYTICS AND VISUALIZATION WITH PYTHON GUI This book uses SQL SERVER version of MySQL-based Sakila sample database. It is a fictitious database designed to represent a DVD rental store. The tables of the database include film, film_category, actor, customer, rental, payment and inventory among others. The Sakila sample database is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth. Detailed information about the database can be found on website: https://dev.mysql.com/doc/index-other.html. In this project, you will develop GUI using PyQt5 to: read SQL SERVER database and every table in it; read every actor in actor table, read every film in films table; plot case distribution of film release year, film rating, rental duration, and categorize film length; plot rating variable against rental_duration variable in stacked bar plots; plot length variable against rental_duration variable in stacked bar plots; read payment table; plot case distribution of Year, Day, Month, Week, and Quarter of payment; plot which year, month, week, days of week, and quarter have most payment amount; read film list by joining five tables: category, film_category, film_actor, film, and actor; plot case distribution of top 10 and bottom 10 actors; plot which film title have least and most sales; plot which actor have least and most sales; plot which film category have least and most sales; plot case distribution of top 10 and bottom 10 overdue customers; plot which customer have least and most overdue days; plot which store have most sales; plot average payment amount by month with mean and EWM; and plot payment amount over June 2005. PROJECT 3: ZERO TO MASTERY: THE COMPLETE GUIDE TO LEARNING SQL SERVER AND DATA SCIENCE WITH PYTHON GUI In this project, we provide you with a SQL SERVER version of an Oracle sample database named OT which is based on a global fictitious company that sells computer hardware including storage, motherboard, RAM, video card, and CPU. The company maintains the product information such as name, description standard cost, list price, and product line. It also tracks the inventory information for all products including warehouses where products are available. Because the company operates globally, it has warehouses in various locations around the world. The company records all customer information including name, address, and website. Each customer has at least one contact person with detailed information including name, email, and phone. The company also places a credit limit on each customer to limit the amount that customer can owe. Whenever a customer issues a purchase order, a sales order is created in the database with the pending status. When the company ships the order, the order status becomes shipped. In case the customer cancels an order, the order status becomes canceled. In addition to the sales information, the employee data is recorded with some basic information such as name, email, phone, job title, manager, and hire date. In this project, you will write Python script to create every table and insert rows of data into each of them. You will develop GUI with PyQt5 to each table in the database. You will also create GUI to plot: case distribution of order date by year, quarter, month, week, and day; the distribution of amount by year, quarter, month, week, day, and hour; the distribution of bottom 10 sales by product, top 10 sales by product, bottom 10 sales by customer, top 10 sales by customer, bottom 10 sales by category, top 10 sales by category, bottom 10 sales by status, top 10 sales by status, bottom 10 sales by customer city, top 10 sales by customer city, bottom 10 sales by customer state, top 10 sales by customer state, average amount by month with mean and EWM, average amount by every month, amount feature over June 2016, amount feature over 2017, and amount payment in all years.