EBookClubs

Read Books & Download eBooks Full Online

Book COMPANY BANKRUPTCY ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI

Download or read book COMPANY BANKRUPTCY ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2023-08-25 with a total of 335 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this comprehensive project, titled "Company Bankruptcy Analysis and Prediction Using Machine Learning with Python GUI," we embarked on a journey to explore, analyze, and predict the bankruptcy status of companies. The project began with an exploration of the dataset: importing it with Pandas and cleaning the column names by removing leading spaces and replacing the remaining spaces with underscores for consistency. To understand the dataset's characteristics, we examined the distributions of the categorized features, which revealed the underlying patterns in the data and showed how attributes are distributed across the classes, informing feature selection and engineering. Moving on to the heart of the project, predicting company bankruptcy, we employed a range of machine learning models and used grid search to tune their hyperparameters. The models included Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosting, AdaBoost, Extreme Gradient Boosting, Light Gradient Boosting, and Multi-Layer Perceptron (MLP), each evaluated using accuracy, precision, recall, and F1-score. Turning to deep learning, we implemented an Artificial Neural Network (ANN): a feed-forward neural network with hidden layers, dropout, and activation functions. We evaluated the ANN using accuracy, precision, recall, and F1-score to understand its classification performance.
Our journey into deep learning continued with Long Short-Term Memory (LSTM) networks, which are well suited to sequence data. We structured the LSTM model with multiple layers and dropout and evaluated it using accuracy, precision, recall, and F1-score. Next, we explored Feed-Forward Neural Networks (FNN), building a multi-layered architecture with varied dropout rates and activation functions and assessing its classification capabilities with the same metrics as the previous models. Recurrent Neural Networks (RNN) added another dimension to the analysis: we built an RNN model on the data arranged as sequences and examined its accuracy, precision, recall, and F1-score to assess how well it captures sequential patterns in the bankruptcy data. To evaluate the models comprehensively, we used precision, recall, F1-score, and accuracy. These metrics measure not only overall performance but also how well each model identifies bankrupt and non-bankrupt cases. The project also included a Python GUI built with PyQt, which lets users enter data for prediction and view the results through an intuitive interface, making the models more accessible and easier to use. In conclusion, the "Company Bankruptcy Analysis and Prediction Using Machine Learning with Python GUI" project encompassed data exploration, analysis of categorized feature distributions, model selection, performance evaluation using diverse metrics, and the creation of an interactive GUI. It combines analytical rigor, machine learning expertise, and user-centric design to provide a comprehensive solution for predicting company bankruptcy.
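The column clean-up described in this excerpt (removing leading spaces, then replacing spaces with underscores) can be sketched in plain Python. This is a minimal illustration, not the book's code: the helper name clean_column_name is hypothetical, and the sample headers only imitate the style of the Taiwan bankruptcy attributes listed further down this page. With Pandas, the same transformation is commonly written as df.columns = df.columns.str.strip().str.replace(" ", "_"). Note that strip() also removes trailing spaces, which goes slightly beyond the "leading spaces" the excerpt mentions.

```python
def clean_column_name(name: str) -> str:
    """Strip surrounding whitespace, then replace the remaining spaces with underscores."""
    return name.strip().replace(" ", "_")


# Hypothetical headers in the style of the Taiwan bankruptcy data (note the leading spaces).
raw_columns = [
    "Bankrupt?",
    " ROA(C) before interest and depreciation before interest",
    " Net Income Flag",
]
cleaned = [clean_column_name(c) for c in raw_columns]
print(cleaned)
# ['Bankrupt?', 'ROA(C)_before_interest_and_depreciation_before_interest', 'Net_Income_Flag']
```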

Book 5 FIVE DATA SCIENCE PROJECTS FOR ANALYSIS, CLASSIFICATION, PREDICTION, AND SENTIMENT ANALYSIS WITH PYTHON GUI

Download or read book 5 FIVE DATA SCIENCE PROJECTS FOR ANALYSIS, CLASSIFICATION, PREDICTION, AND SENTIMENT ANALYSIS WITH PYTHON GUI written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-04-29 with a total of 979 pages. Available in PDF, EPUB and Kindle. Book excerpt: PROJECT 1: SUPERMARKET SALES ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project reflects the growth of supermarkets facing strong market competition in the most populated cities. It contains the historical sales of a supermarket company, recorded at 3 different branches over 3 months, and predictive data analytics methods are easy to apply to it. The attributes in the dataset are as follows: Invoice id: Computer-generated sales slip invoice identification number; Branch: Branch of the supercenter (3 branches, identified by A, B, and C); City: Location of the supercenters; Customer type: Type of customer, recorded as Members for customers using a member card and Normal for customers without one; Gender: Gender of the customer; Product line: General item categorization groups (Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel); Unit price: Price of each product in $; Quantity: Number of products purchased by the customer; Tax: 5% tax fee on the customer's purchase; Total: Total price including tax; Date: Date of purchase (records available from January 2019 to March 2019); Time: Purchase time (10am to 9pm); Payment: Payment method used by the customer (3 methods available: Cash, Credit card, and Ewallet); COGS: Cost of goods sold; Gross margin percentage: Gross margin percentage; Gross income: Gross income; and Rating: Customer stratification rating of their overall shopping experience (on a scale of 1 to 10). In this project, you will predict the rating using machine learning.
The machine learning models used in this project to predict clusters as the target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP. Finally, you will plot decision boundaries, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy. PROJECT 2: DETECTING CYBERBULLYING TWEETS USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI As social media usage becomes increasingly prevalent in every age group, a vast majority of citizens rely on this essential medium for day-to-day communication. Social media's ubiquity means that cyberbullying can affect anyone, at any time and anywhere, and the relative anonymity of the internet makes such personal attacks more difficult to stop than traditional bullying. On April 15th, 2020, UNICEF issued a warning about the increased risk of cyberbullying during the COVID-19 pandemic, driven by widespread school closures, increased screen time, and decreased face-to-face social interaction. The statistics on cyberbullying are alarming: 36.5% of middle and high school students have felt cyberbullied and 87% have observed cyberbullying, with effects ranging from decreased academic performance to depression to suicidal thoughts. With this in mind, the dataset contains more than 47,000 tweets labelled according to the class of cyberbullying: Age; Ethnicity; Gender; Religion; Other type of cyberbullying; and Not cyberbullying. The data has been balanced to contain about 8,000 tweets of each class. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, AdaBoost, LGBM classifier, Gradient Boosting, XGB classifier, LSTM, and CNN.
Three feature-scaling options are used: raw (no scaling), min-max scaling, and standard scaling. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy.
PROJECT 3: HIGHER EDUCATION STUDENT ACADEMIC PERFORMANCE ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project was collected in 2019 from students of the Faculty of Engineering and the Faculty of Educational Sciences. The purpose is to predict students' end-of-term performance using ML techniques. The attributes in the dataset are as follows: Student ID; Student Age (1: 18-21, 2: 22-25, 3: above 26); Sex (1: female, 2: male); Graduated high-school type: (1: private, 2: state, 3: other); Scholarship type: (1: None, 2: 25%, 3: 50%, 4: 75%, 5: Full); Additional work: (1: Yes, 2: No); Regular artistic or sports activity: (1: Yes, 2: No); Do you have a partner: (1: Yes, 2: No); Total salary if available (1: USD 135-200, 2: USD 201-270, 3: USD 271-340, 4: USD 341-410, 5: above 410); Transportation to the university: (1: Bus, 2: Private car/taxi, 3: bicycle, 4: Other); Accommodation type in Cyprus: (1: rental, 2: dormitory, 3: with family, 4: Other); Mother's education: (1: primary school, 2: secondary school, 3: high school, 4: university, 5: MSc., 6: Ph.D.); Father's education: (1: primary school, 2: secondary school, 3: high school, 4: university, 5: MSc., 6: Ph.D.); Number of sisters/brothers (if available): (1: 1, 2: 2, 3: 3, 4: 4, 5: 5 or above); Parental status: (1: married, 2: divorced, 3: died - one of them or both); Mother's occupation: (1: retired, 2: housewife, 3: government officer, 4: private sector employee, 5: self-employment, 6: other); Father's occupation: (1: retired, 2: government officer, 3: private sector employee, 4: self-employment, 5: other); Weekly study hours: (1: None, 2: <5 hours, 3: 6-10 hours, 4: 11-20 hours, 5: more than 20 hours); Reading frequency (non-scientific books/journals): (1: None, 2: Sometimes, 3: Often); Reading frequency (scientific books/journals): (1: None, 2: Sometimes, 3: Often); Attendance to the seminars/conferences related to the department: (1: Yes, 2: No); Impact of your projects/activities on your success: (1: positive, 2: negative, 3: neutral); Attendance to classes (1: always, 2: sometimes, 3: never); Preparation to midterm exams 1: (1: alone, 2: with friends, 3: not applicable); Preparation to midterm exams 2: (1: closest date to the exam, 2: regularly during the semester, 3: never); Taking notes in classes: (1: never, 2: sometimes, 3: always); Listening in classes: (1: never, 2: sometimes, 3: always); Discussion improves my interest and success in the course: (1: never, 2: sometimes, 3: always); Flip-classroom: (1: not useful, 2: useful, 3: not applicable); Cumulative grade point average in the last semester (/4.00): (1: <2.00, 2: 2.00-2.49, 3: 2.50-2.99, 4: 3.00-3.49, 5: above 3.49); Expected Cumulative grade point average in the graduation (/4.00): (1: <2.00, 2: 2.00-2.49, 3: 2.50-2.99, 4: 3.00-3.49, 5: above 3.49); Course ID; and OUTPUT: Grade (0: Fail, 1: DD, 2: DC, 3: CC, 4: CB, 5: BB, 6: BA, 7: AA). The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, AdaBoost, LGBM classifier, Gradient Boosting, and XGB classifier. Three feature-scaling options are used: raw (no scaling), min-max scaling, and standard scaling. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy.
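The three scaling options named above (raw, min-max, and standard) can be illustrated with a small standard-library sketch. It assumes a single numeric column, and min_max_scale and standard_scale are hypothetical helper names rather than code from the book. scikit-learn's MinMaxScaler (default range 0 to 1) and StandardScaler apply the same formulas column by column, StandardScaler using the population standard deviation; "raw" simply means the features are passed through unchanged.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant column: map everything to 0 instead of dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical ordinal codes, e.g. a "weekly study hours" column coded 1 to 5.
codes = [1, 2, 3, 4, 5]
print(min_max_scale(codes))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_scale(codes))
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training split only and then reused for the test split, so that no test information leaks into training.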
PROJECT 4: COMPANY BANKRUPTCY ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset was collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. The attributes in the dataset are as follows: Y - Bankrupt?: Class label; X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C); X2 - ROA(A) before interest and % after tax: Return On Total Assets(A); X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B); X4 - Operating Gross Margin: Gross Profit/Net Sales; X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales; X6 - Operating Profit Rate: Operating Income/Net Sales; X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales; X8 - After-tax net Interest Rate: Net Income/Net Sales; X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio; X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales; X11 - Operating Expense Rate: Operating Expenses/Net Sales; X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales; X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities; X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity; X15 - Tax rate (A): Effective Tax Rate; X16 - Net Value Per Share (B): Book Value Per Share(B); X17 - Net Value Per Share (A): Book Value Per Share(A); X18 - Net Value Per Share (C): Book Value Per Share(C); X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income; X20 - Cash Flow Per Share; X21 - Revenue Per Share (Yuan ¥): Sales Per Share; X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share; X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share; X24 - Realized Sales Gross Profit Growth Rate; X25 - Operating Profit Growth Rate: Operating Income Growth; X26 - After-tax Net Profit Growth Rate: Net Income Growth;
X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth; X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth; X29 - Total Asset Growth Rate: Total Asset Growth; X30 - Net Value Growth Rate: Total Equity Growth; X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth; X32 - Cash Reinvestment %: Cash Reinvestment Ratio; X33 - Current Ratio; X34 - Quick Ratio: Acid Test; X35 - Interest Expense Ratio: Interest Expenses/Total Revenue; X36 - Total debt/Total net worth: Total Liability/Equity Ratio; X37 - Debt ratio %: Liability/Total Assets; X38 - Net worth/Assets: Equity/Total Assets; X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets; X40 - Borrowing dependency: Cost of Interest-bearing Debt; X41 - Contingent liabilities/Net worth: Contingent Liability/Equity; X42 - Operating profit/Paid-in capital: Operating Income/Capital; X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital; X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity; X45 - Total Asset Turnover; X46 - Accounts Receivable Turnover; X47 - Average Collection Days: Days Receivable Outstanding; X48 - Inventory Turnover Rate (times); X49 - Fixed Assets Turnover Frequency; X50 - Net Worth Turnover Rate (times): Equity Turnover; X51 - Revenue per person: Sales Per Employee; X52 - Operating profit per person: Operation Income Per Employee; X53 - Allocation rate per person: Fixed Assets Per Employee; X54 - Working Capital to Total Assets; X55 - Quick Assets/Total Assets; X56 - Current Assets/Total Assets; X57 - Cash/Total Assets; X58 - Quick Assets/Current Liability; X59 - Cash/Current Liability; X60 - Current Liability to Assets; X61 - Operating Funds to Liability; X62 - Inventory/Working Capital; X63 - Inventory/Current Liability; X64 - Current Liabilities/Liability; X65 - Working Capital/Equity; X66 - Current Liabilities/Equity;
X67 - Long-term Liability to Current Assets; X68 - Retained Earnings to Total Assets; X69 - Total income/Total expense; X70 - Total expense/Assets; X71 - Current Asset Turnover Rate: Current Assets to Sales; X72 - Quick Asset Turnover Rate: Quick Assets to Sales; X73 - Working capital Turnover Rate: Working Capital to Sales; X74 - Cash Turnover Rate: Cash to Sales; X75 - Cash Flow to Sales; X76 - Fixed Assets to Assets; X77 - Current Liability to Liability; X78 - Current Liability to Equity; X79 - Equity to Long-term Liability; X80 - Cash Flow to Total Assets; X81 - Cash Flow to Liability; X82 - CFO to Assets; X83 - Cash Flow to Equity; X84 - Current Liability to Current Assets; X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise; X86 - Net Income to Total Assets; X87 - Total assets to GNP price; X88 - No-credit Interval; X89 - Gross Profit to Sales; X90 - Net Income to Stockholder's Equity; X91 - Liability to Equity; X92 - Degree of Financial Leverage (DFL); X93 - Interest Coverage Ratio (Interest expense to EBIT); X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise; and X95 - Equity to Liability. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, AdaBoost, LGBM classifier, Gradient Boosting, and XGB classifier. Three feature-scaling options are used: raw (no scaling), min-max scaling, and standard scaling. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. PROJECT 5: DATA SCIENCE FOR RAIN CLASSIFICATION AND PREDICTION WITH PYTHON GUI This dataset contains about 10 years of daily weather observations from many locations across Australia. RainTomorrow is the target variable to predict.
You will determine whether it will rain on the next day. This column is Yes if the rainfall for that day was 1 mm or more. Observations were drawn from numerous weather stations. The daily observations are available from http://www.bom.gov.au/climate/data. The dataset contains 23 attributes, some of which are as follows: DATE - The date of observation; LOCATION - The common name of the location of the weather station; MINTEMP - The minimum temperature in degrees Celsius; MAXTEMP - The maximum temperature in degrees Celsius; RAINFALL - The amount of rainfall recorded for the day in mm; EVAPORATION - The so-called Class A pan evaporation (mm) in the 24 hours to 9am; SUNSHINE - The number of hours of bright sunshine in the day; WINDGUSTDIR - The direction of the strongest wind gust in the 24 hours to midnight; WINDGUSTSPEED - The speed (km/h) of the strongest wind gust in the 24 hours to midnight; and WINDDIR9AM - Direction of the wind at 9am. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, AdaBoost, LGBM classifier, Gradient Boosting, and XGB classifier. Three feature-scaling options are used: raw (no scaling), min-max scaling, and standard scaling. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy.
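The RainTomorrow target described above can be read as a one-day shift of a "did it rain at least 1 mm" flag. The sketch below is a minimal illustration under stated assumptions, not necessarily how the published dataset was built: it takes one station's rainfall readings already sorted by date with no missing days, applies the 1 mm threshold from the description, and leaves the final day unlabelled because the following day is not in the data. The function name rain_labels is hypothetical.

```python
def rain_labels(daily_rainfall_mm, threshold_mm=1.0):
    """Derive same-day and next-day rain labels for one station's consecutive records.

    Assumes the readings are sorted by date with no gaps; the last day has no
    following day in the data, so its next-day label is None.
    """
    rain_today = ["Yes" if mm >= threshold_mm else "No" for mm in daily_rainfall_mm]
    rain_tomorrow = rain_today[1:] + [None]  # shift the same-day flag back by one day
    return rain_today, rain_tomorrow


today, tomorrow = rain_labels([0.0, 1.0, 0.6, 12.4])
print(today)     # ['No', 'Yes', 'No', 'Yes']
print(tomorrow)  # ['Yes', 'No', 'Yes', None]
```

With data from many stations, the shift has to be applied per location after sorting by date; otherwise one station's last day would borrow the next station's first reading.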

Book Bankruptcy Prediction through Soft Computing based Deep Learning Technique

Download or read book Bankruptcy Prediction through Soft Computing based Deep Learning Technique written by Arindam Chaudhuri and published by Springer. This book was released on 2017-12-01 with a total of 109 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book proposes complex hierarchical deep architectures (HDA) for predicting bankruptcy, a topical issue for business and corporate institutions that in the past has been tackled using statistical, market-based and machine-intelligence prediction models. The HDA are formed through fuzzy rough tensor deep stacking networks (FRTDSN) with structured, hierarchical rough Bayesian (HRB) models. FRTDSN is formalized through TDSN and fuzzy rough sets, and HRB is formed by incorporating probabilistic rough sets into a structured hierarchical Bayesian model. FRTDSN is then integrated with HRB to form the compound FRTDSN-HRB model, with HRB enhancing the prediction accuracy of the combined model. The experimental datasets are drawn from Korean construction companies and American and European non-financial companies, and the research focuses on the impact of the choice of cut-off points, sampling procedures, and the business cycle on the accuracy of bankruptcy prediction models. The book also highlights that misclassification can result in erroneous predictions that impose prohibitive costs on investors and the economy, and shows that the choice of cut-off point and sampling procedure affects the rankings of the various models. It also suggests that empirical cut-off points estimated from training samples result in the lowest misclassification costs for all the models. The book confirms that FRTDSN-HRB achieves superior performance compared to other statistical and soft-computing models. The experimental results are given in terms of several important statistical parameters covering different business cycles and sub-cycles for the datasets considered, and are of immense benefit to researchers working in this area.
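The finding that empirical cut-off points estimated from training samples give the lowest misclassification costs can be illustrated with a generic threshold search. This is a hedged sketch, not the FRTDSN-HRB model: it assumes some model already outputs a bankruptcy score per firm, and the scores, labels, cost values, and function names below are hypothetical. Every observed score is tried as a cut-off, and the one with the lowest total cost on the training sample is kept.

```python
def misclassification_cost(scores, labels, cutoff, cost_fn, cost_fp):
    """Total cost when firms with score >= cutoff are predicted bankrupt (label 1).

    cost_fn: cost of a missed bankruptcy (bankrupt firm predicted healthy)
    cost_fp: cost of a false alarm (healthy firm predicted bankrupt)
    """
    cost = 0.0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= cutoff else 0
        if y == 1 and predicted == 0:
            cost += cost_fn
        elif y == 0 and predicted == 1:
            cost += cost_fp
    return cost


def empirical_cutoff(scores, labels, cost_fn, cost_fp):
    """Try every observed score as a cut-off and return the cheapest one."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf = never predict bankrupt
    return min(candidates,
               key=lambda c: misclassification_cost(scores, labels, c, cost_fn, cost_fp))


# Hypothetical model scores on a training sample (1 = bankrupt, 0 = healthy).
train_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
best = empirical_cutoff(train_scores, train_labels, cost_fn=5.0, cost_fp=1.0)
print(best)  # 0.35
```

With a missed bankruptcy costed five times higher than a false alarm, the search settles on 0.35 for this toy sample: the highest cut-off that still flags every bankrupt firm, at the price of one false alarm. Changing the cost ratio moves the chosen cut-off, which is one way the choice of cut-off can reorder model rankings.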

Book ANALYSIS AND PREDICTION PROJECTS USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON

Download or read book ANALYSIS AND PREDICTION PROJECTS USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON written by Vivian Siahaan and published by BALIGE PUBLISHING. This book was released on 2022-02-17 with a total of 860 pages. Available in PDF, EPUB and Kindle. Book excerpt: PROJECT 1: DEFAULT LOAN PREDICTION BASED ON CUSTOMER BEHAVIOR Using Machine Learning and Deep Learning with Python In finance, default is failure to meet the legal obligations (or conditions) of a loan, for example when a home buyer fails to make a mortgage payment, or when a corporation or government fails to pay a bond which has reached maturity. A national or sovereign default is the failure or refusal of a government to repay its national debt. The dataset used in this project belongs to a Hackathon organized by "Univ.AI". All values were provided at the time of the loan application. The features in the dataset are: Income, Age, Experience, Married/Single, House_Ownership, Car_Ownership, Profession, CITY, STATE, CURRENT_JOB_YRS, CURRENT_HOUSE_YRS, and Risk_Flag. The Risk_Flag indicates whether there has been a default in the past. The machine learning models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, AdaBoost, LGBM classifier, Gradient Boosting, XGB classifier, MLP classifier, and CNN 1D. Finally, you will plot decision boundaries, ROC, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy. PROJECT 2: AIRLINE PASSENGER SATISFACTION Analysis and Prediction Using Machine Learning and Deep Learning with Python The dataset used in this project contains an airline passenger satisfaction survey.
Here, you will determine which factors are most strongly correlated with a satisfied (or dissatisfied) passenger and predict passenger satisfaction. The features in the dataset are: Gender: Gender of the passengers (Female, Male); Customer Type: The customer type (Loyal customer, disloyal customer); Age: The actual age of the passengers; Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel); Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus); Flight distance: The flight distance of this journey; Inflight wifi service: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5); Departure/Arrival time convenient: Satisfaction level with the convenience of the departure/arrival time; Ease of Online booking: Satisfaction level of online booking; Gate location: Satisfaction level of Gate location; Food and drink: Satisfaction level of Food and drink; Online boarding: Satisfaction level of online boarding; Seat comfort: Satisfaction level of Seat comfort; Inflight entertainment: Satisfaction level of inflight entertainment; On-board service: Satisfaction level of On-board service; Leg room service: Satisfaction level of Leg room service; Baggage handling: Satisfaction level of baggage handling; Check-in service: Satisfaction level of Check-in service; Inflight service: Satisfaction level of inflight service; Cleanliness: Satisfaction level of Cleanliness; Departure Delay in Minutes: Minutes of delay at departure; Arrival Delay in Minutes: Minutes of delay at arrival; and Satisfaction: Airline satisfaction level (Satisfaction, neutral or dissatisfaction). The machine learning models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, MLP classifier, and CNN 1D.
Finally, you will plot decision boundaries, ROC, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy. PROJECT 3: CREDIT CARD CHURNING CUSTOMER ANALYSIS AND PREDICTION USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON The dataset used in this project contains more than 10,000 customers, with attributes such as age, salary, marital status, credit card limit, and credit card category. There are 20 features in the dataset. Only 16.07% of the customers have churned, and this class imbalance makes it harder to train a model to predict churning customers. The features in the dataset are: 'Attrition_Flag', 'Customer_Age', 'Gender', 'Dependent_count', 'Education_Level', 'Marital_Status', 'Income_Category', 'Card_Category', 'Months_on_book', 'Total_Relationship_Count', 'Months_Inactive_12_mon', 'Contacts_Count_12_mon', 'Credit_Limit', 'Total_Revolving_Bal', 'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt', 'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', and 'Avg_Utilization_Ratio'. The target variable is 'Attrition_Flag'. The machine learning models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, MLP classifier, and CNN 1D. Finally, you will plot decision boundaries, ROC, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy. PROJECT 4: MARKETING ANALYSIS AND PREDICTION USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON This dataset was provided to students for their final project to test their statistical analysis skills as part of an MSc in Business Analytics.
It can be used for EDA, statistical analysis, and visualization. The features in the dataset are: ID = Customer's unique identifier; Year_Birth = Customer's birth year; Education = Customer's education level; Marital_Status = Customer's marital status; Income = Customer's yearly household income; Kidhome = Number of children in customer's household; Teenhome = Number of teenagers in customer's household; Dt_Customer = Date of customer's enrollment with the company; Recency = Number of days since customer's last purchase; MntWines = Amount spent on wine in the last 2 years; MntFruits = Amount spent on fruits in the last 2 years; MntMeatProducts = Amount spent on meat in the last 2 years; MntFishProducts = Amount spent on fish in the last 2 years; MntSweetProducts = Amount spent on sweets in the last 2 years; MntGoldProds = Amount spent on gold in the last 2 years; NumDealsPurchases = Number of purchases made with a discount; NumWebPurchases = Number of purchases made through the company's web site; NumCatalogPurchases = Number of purchases made using a catalogue; NumStorePurchases = Number of purchases made directly in stores; NumWebVisitsMonth = Number of visits to company's web site in the last month; AcceptedCmp3 = 1 if customer accepted the offer in the 3rd campaign, 0 otherwise; AcceptedCmp4 = 1 if customer accepted the offer in the 4th campaign, 0 otherwise; AcceptedCmp5 = 1 if customer accepted the offer in the 5th campaign, 0 otherwise; AcceptedCmp1 = 1 if customer accepted the offer in the 1st campaign, 0 otherwise; AcceptedCmp2 = 1 if customer accepted the offer in the 2nd campaign, 0 otherwise; Response = 1 if customer accepted the offer in the last campaign, 0 otherwise; Complain = 1 if customer complained in the last 2 years, 0 otherwise; and Country = Customer's location.
The machine and deep learning models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, MLP classifier, and CNN 1D. Finally, you will plot decision boundaries, ROC, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy. PROJECT 5: METEOROLOGICAL DATA ANALYSIS AND PREDICTION USING MACHINE LEARNING WITH PYTHON Meteorological phenomena are described and quantified by the variables of Earth's atmosphere (temperature, air pressure, water vapour, and mass flow), by the variations and interactions of these variables, and by how they change over time. Different spatial scales are used to describe and predict weather on local, regional, and global levels. The dataset used in this project consists of meteorological data with 96,453 data points and 11 attributes (columns). The columns in the dataset are: Formatted Date; Summary; Precip Type; Temperature (C); Apparent Temperature (C); Humidity; Wind Speed (km/h); Wind Bearing (degrees); Visibility (km); Pressure (millibars); and Daily Summary. The machine learning models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. Finally, you will plot decision boundaries, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.
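For the credit card churn project, where only 16.07% of customers churned, one common way to keep a model from ignoring the minority class is to weight each class inversely to its frequency. The sketch below reproduces the formula scikit-learn uses for class_weight="balanced", namely n_samples / (n_classes * count). The helper name, the label strings, and the sample sizes are assumptions for illustration, and the book itself may handle the imbalance differently.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]), as in class_weight="balanced"."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical 1,000-customer sample with a 16% churn rate.
labels = ["Attrited Customer"] * 160 + ["Existing Customer"] * 840
weights = balanced_class_weights(labels)
print(weights)
```

Here each churned customer counts 3.125 times and each retained customer about 0.595 times, so both classes contribute the same total weight (500) to the training loss.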

Book Predicting Corporate Bankruptcy with Machine Learning Algorithms

Download or read book Predicting Corporate Bankruptcy with Machine Learning Algorithms written by Grzegorz Sobczak. This book was released in 2017. Available in PDF, EPUB and Kindle. Book excerpt: This thesis compares the accuracy of different machine learning methods in bankruptcy prediction based on financial statement data. The two categories of methods applied are support vector machines and boosting. The study uses two independent sources of data, one of which was used earlier in a successful study of boosting. The other dataset was created from similar data from another country, with fewer features due to data availability. While support vector machines could be applied to both datasets with similar, moderate success, the strong performance of boosting algorithms could not be replicated on the new data. Furthermore, the thesis discusses how the two groups of methods differ in their treatment of type I and type II errors. While both groups of methods show potential and might provide valuable insight for a financial analyst, their value for ad-hoc analysis is limited. Despite good performance in machine learning studies, algorithms can perform poorly in similar applications due to data issues. A significant part of the boosting algorithms' performance loss could be explained by the number of predictors available in the data.
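Type I and type II errors, like the accuracy, precision, recall, and F1-score used throughout the projects above, all derive from the confusion matrix. A minimal sketch follows, with bankrupt firms as the positive class (label 1). It assumes the convention common in bankruptcy studies, where a type I error is a bankrupt firm classified as healthy and a type II error is a healthy firm classified as bankrupt; some texts use the opposite labelling, so check the convention of any study you compare against. The function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN with `positive` (bankrupt) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def bankruptcy_report(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Assumed convention: type I = bankrupt firm missed, type II = false alarm.
        "type_I_rate": fn / (tp + fn) if tp + fn else 0.0,
        "type_II_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(bankruptcy_report(y_true, y_pred))
```

On this toy sample the report gives an accuracy of 0.8, precision, recall, and F1 of 0.75 each, a type I rate of 0.25, and a type II rate of 1/6. Under this convention the type I rate is simply 1 minus the recall on the bankrupt class, which is why bankruptcy studies often look beyond accuracy alone.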

Book Corporate Bankruptcy Prediction

Download or read book Corporate Bankruptcy Prediction written by Błażej Prusak and published by MDPI. This book was released on 2020-06-16 and has 202 pages. Available in PDF, EPUB and Kindle. Book excerpt: Bankruptcy prediction is one of the most important research areas in corporate finance. Bankruptcies are an indispensable element of the functioning of the market economy and, at the same time, generate significant losses for stakeholders. Hence, this book was compiled to collect the results of research on the latest trends in predicting the bankruptcy of enterprises. It presents models developed for different countries using both traditional and more advanced methods. Problems connected with predicting bankruptcy during periods of prosperity and recession, the selection of appropriate explanatory variables, and the dynamization of models are discussed. The reliability of financial data and the validity of the audit are also referenced. Thus, I hope that this book will inspire you to undertake new research in the field of forecasting bankruptcy risk.

Book Comparing Classification Models for Bankruptcy Prediction

Download or read book Comparing Classification Models for Bankruptcy Prediction written by Arben Hasanaj. This book was released in 2019. Available in PDF, EPUB and Kindle. Book excerpt: This study adds to the large body of literature that aims to predict corporate bankruptcy. It does so by evaluating two established machine learning methods that have shown promising results in earlier studies, namely Support Vector Machines (SVM) and Artificial Neural Networks, specifically Multi-Layer Perceptrons (MLP). Furthermore, Bagging and AdaBoost ensemble variants of these models, which have been proposed to improve prediction performance, are tested. The unique features of the present study are the sampled companies and the sample size: it focuses on unlisted, smaller companies from Western Europe and comprises 46,857 firm-year observations from 2013 to 2017, of which 7,095, or around 15%, represent observations preceding bankruptcy of the respective firm. The variables used are mainly financial ratios that have shown predictive value before, general firm characteristics, and three variables proposed for the special case of unlisted SMEs (age of company, country of domicile, and GDP growth rate). In this setting, the MLP models clearly outperform the SVM models, and the ensemble variants are also generally able to increase prediction performance. Nevertheless, the achieved performance level is not deemed good enough for practical implementation of the models as they are. Based on the findings, the author suggests investing in the collection of high-quality samples, considering different model architectures (Decision Trees or heterogeneous ensembles), and scrutinizing the role of variables that capture the business conditions for firms.

Book Statistical Techniques for Bankruptcy Prediction

Download or read book Statistical Techniques for Bankruptcy Prediction written by Volodymyr Perederiy and published by GRIN Verlag. This book was released on 2015-05-22 and has 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: Master's Thesis from the year 2005 in the subject Business economics - Accounting and Taxes, grade: 1,0, European University Viadrina Frankfurt (Oder), course: International Business Administration, language: English, abstract: Over the past three decades, bankruptcy prediction has become a matter of ever-rising academic interest and intensive research. This is due to the academic appeal of the problem, combined with its importance in practical applications. The practical importance of bankruptcy prediction models has recently grown even further with the “Basle-II” regulations, which were elaborated by the Basle Committee on Banking Supervision to enhance the stability of the international financial system. These regulations oblige financial institutions and banks to estimate the probability of default of their obligors. There is some fundamental economic theory on which to base bankruptcy prediction models, but it typically relies on the stock market prices of the companies under consideration. These prices, however, are only available for large, publicly listed companies. Models for private firms are therefore empirical in nature and have to rely on rigorous statistical analysis of all available information for such firms. In 95% of cases, this information is limited to accounting information from the financial statements. Large databases of financial statements (e.g., Compustat in the USA) are maintained and often available for research purposes. Accounting information is particularly important for bankruptcy prediction models in emerging markets.
This is because the capital markets in these countries are often underdeveloped and illiquid and do not deliver sufficient stock market data, even for public/listed companies, for structural models to be applied. The accounting information is normally summarized in so-called financial ratios. Such ratios (e.g., the leverage ratio, calculated as Debt to Total Assets of a company) have a long tradition in accounting analysis. Many of these ratios are believed to reflect the financial health of a company and to be related to bankruptcy. However, these beliefs are often very vague (e.g., leverage above 70% might provoke bankruptcy) and subjective. Quantitative bankruptcy prediction models objectify these beliefs by applying statistical techniques to the accounting data. [...]
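The leverage ratio described in the excerpt is a simple quotient. The sketch below computes it and applies the vague 70% rule of thumb quoted above, purely to illustrate the kind of subjective threshold that quantitative models aim to replace; the function names and balance-sheet figures are hypothetical.

```python
def leverage_ratio(total_debt: float, total_assets: float) -> float:
    """Leverage ratio = Debt / Total Assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return total_debt / total_assets


def exceeds_rule_of_thumb(ratio: float, threshold: float = 0.70) -> bool:
    """Illustrative only: the 'leverage above 70%' belief cited in the text."""
    return ratio > threshold


# Hypothetical balance-sheet figures (same currency unit).
ratio = leverage_ratio(total_debt=750_000, total_assets=1_000_000)
print(ratio, exceeds_rule_of_thumb(ratio))  # 0.75 True
```

A statistical model such as logistic regression would instead estimate from historical data how strongly this ratio, together with others, is associated with subsequent bankruptcy, rather than relying on a fixed cut-off.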

Book An Analysis of Features Predicting Bankruptcy of Newly Formed Japanese Small and Medium Sized Firms Using Machine Learning Techniques

Download or read book An Analysis of Features Predicting Bankruptcy of Newly Formed Japanese Small and Medium Sized Firms Using Machine Learning Techniques written by Hong Xu. This book was released in 2018. Available in PDF, EPUB and Kindle. Book excerpt: The study analyses the bankruptcy of newly formed Japanese small and medium-sized entities (SMEs) by comparing their specific features with those of newly formed SMEs that survived. It applies Resource-based Firm Theory and other theories to identify the bankruptcy features and uses logit regression to validate them. It then builds bankruptcy prediction models with machine learning techniques based on those features and evaluates the models' performance.

Book Using Machine Learning to Predict Success Or Failure in Chapter 13 Bankruptcy Cases

Download or read book Using Machine Learning to Predict Success Or Failure in Chapter 13 Bankruptcy Cases written by Warren Agin. This book was released in 2018 and has 63 pages. Available in PDF, EPUB and Kindle. Book excerpt: Obtaining a chapter 13 bankruptcy discharge is notoriously difficult. Past empirical studies conclude that only one-third of chapter 13 debtors complete their obligations under their plans and obtain a chapter 13 discharge. Many cases end up dismissed or converted to a case under chapter 7. New data recently made available by the Federal Judicial Center shows that in recent years only about 39% of chapter 13 filers successfully obtained their chapter 13 discharges. These are low numbers. In this project I examined a public case-level database made available in 2017 by the US Federal Judicial Center, based on information collected by the Administrative Office of the United States Courts. The project examines the extent and quality of this data, and the steps needed to use it for advanced statistical analysis and the application of machine learning models. This project goes beyond such descriptive statistics. Using machine learning algorithms (so-called artificial intelligence), it describes a model that can predict, using data from the Federal Judicial Center's Integrated Database, whether a debtor will obtain a chapter 13 discharge based only on information provided in the initial petition and summary of schedules. The model predicts case results with 70% accuracy overall, and for about 25% of cases it can predict results with more than 90% accuracy. When case predictions are cross-referenced against actual case results, the model can assign to specific cases a highly accurate probability of success. The model uses a random forest (an ensemble of decision trees) to achieve its results, although similar results were also obtained using a neural network.
The model, relevant scripts, related files, and instructions for use are available online through GitHub at /warrenagin/Ch13Learner.
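The excerpt does not say how the roughly 25% of high-accuracy cases were identified. One common approach, assumed here, is to keep only cases whose predicted probability is far from 0.5 and to measure accuracy on that subset. The sketch below takes predicted discharge probabilities from any classifier (for example, a random forest's class probabilities) as input; the function name, thresholds, and data are hypothetical.

```python
def selective_accuracy(probs, y_true, threshold):
    """Return (coverage, accuracy) on the cases where the model is confident.

    probs: predicted probability of discharge (label 1) for each case.
    A case counts as confident if max(p, 1 - p) >= threshold.
    """
    kept = [(p, y) for p, y in zip(probs, y_true) if max(p, 1 - p) >= threshold]
    if not kept:
        return 0.0, float("nan")  # no confident cases at this threshold
    # Predict discharge when p >= 0.5, then count matches with the true outcome.
    correct = sum(1 for p, y in kept if (1 if p >= 0.5 else 0) == y)
    return len(kept) / len(probs), correct / len(kept)


# Hypothetical predicted probabilities and true outcomes (1 = discharge obtained).
probs = [0.95, 0.92, 0.08, 0.55, 0.60, 0.45, 0.05, 0.70]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
print(selective_accuracy(probs, y_true, threshold=0.5))  # (1.0, 0.625)
print(selective_accuracy(probs, y_true, threshold=0.9))  # (0.5, 1.0)
```

Raising the threshold trades coverage for accuracy, which mirrors the kind of pattern the excerpt describes: moderate accuracy over all cases and higher accuracy on a smaller, more confident subset.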

Book The Prediction of Corporate Bankruptcy

Download or read book The Prediction of Corporate Bankruptcy written by Edward I. Altman and published by Facsimiles-Garl. This book was released in 1988 and has 192 pages. Available in PDF, EPUB and Kindle.

Book Models for Predicting Business Bankruptcies and Their Application to Banking and to Financial Regulation

Download or read book Models for Predicting Business Bankruptcies and Their Application to Banking and to Financial Regulation written by James Ming Chen. This book was released in 2019 and has 33 pages. Available in PDF, EPUB and Kindle. Book excerpt: Models for predicting business bankruptcies have evolved rapidly. Machine learning is displacing traditional statistical methodologies. Three distinct techniques for approaching the classification problem in bankruptcy prediction have emerged: single classifiers, hybrid classifiers, and classifier ensembles. Methodological heterogeneity, through the introduction and integration of machine learning algorithms (especially support vector machines, decision trees, and genetic algorithms), has improved the accuracy of bankruptcy prediction models. Improved natural language processing has enabled machine learning to combine textual analysis of corporate filings with evaluation of numerical data. Greater accuracy supports banks' external processes by minimizing credit risk and facilitating regulatory compliance.

Book Financial Statement Analysis and the Prediction of Financial Distress

Download or read book Financial Statement Analysis and the Prediction of Financial Distress written by William H. Beaver and published by Now Publishers Inc. This book was released in 2011 and has 89 pages. Available in PDF, EPUB and Kindle. Book excerpt: Financial Statement Analysis and the Prediction of Financial Distress discusses the evolution of three main streams within the financial distress prediction literature: the set of dependent and explanatory variables used, the statistical methods of estimation, and the modeling of financial distress. Section 1 discusses concepts of financial distress. Section 2 discusses theories regarding the use of financial ratios as predictors of financial distress. Section 3 contains a brief review of the literature. Section 4 discusses the use of market price-based models of financial distress. Section 5 develops the statistical methods for empirical estimation of the probability of financial distress. Section 6 discusses the major empirical findings with respect to prediction of financial distress. Section 7 briefly summarizes some of the more relevant literature with respect to bond ratings. Section 8 presents some suggestions for future research, and Section 9 presents concluding remarks.

Book Genetic Algorithms in Search, Optimization, and Machine Learning

Download or read book Genetic Algorithms in Search, Optimization, and Machine Learning written by David Edward Goldberg and published by Addison-Wesley Professional. This book was released in 1989 and has 436 pages. Available in PDF, EPUB and Kindle. Book excerpt: A gentle introduction to genetic algorithms. Genetic algorithms revisited: mathematical foundations. Computer implementation of a genetic algorithm. Some applications of genetic algorithms. Advanced operators and techniques in genetic search. Introduction to genetics-based machine learning. Applications of genetics-based machine learning. A look back, a glance ahead. A review of combinatorics and elementary probability. Pascal with random number generation for FORTRAN, BASIC, and COBOL programmers. A simple genetic algorithm (SGA) in Pascal. A simple classifier system (SCS) in Pascal. Partition coefficient transforms for problem-coding analysis.

Book Predictive Analytics and Data Mining

Download or read book Predictive Analytics and Data Mining written by Vijay Kotu and published by Morgan Kaufmann. This book was released on 2014-11-27 and has 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: Put Predictive Analytics into Action. Learn the basics of Predictive Analysis and Data Mining through an easy-to-understand conceptual framework and immediately practice the concepts learned using the open-source RapidMiner tool. Whether you are brand new to Data Mining or working on your tenth project, this book will show you how to analyze data and uncover hidden patterns and relationships to aid important decisions and predictions. Data Mining has become an essential tool for any enterprise that collects, stores, and processes data as part of its operations. This book is ideal for business users, data analysts, business analysts, business intelligence and data warehousing professionals, and anyone who wants to learn Data Mining. You'll be able to:
1. Gain the necessary knowledge of different data mining techniques, so that you can select the right technique for a given data problem and create a general-purpose analytics process.
2. Get up and running fast with more than two dozen commonly used powerful algorithms for predictive analytics using practical use cases.
3. Implement a simple step-by-step process for predicting an outcome or discovering hidden relationships from the data using RapidMiner, an open-source, GUI-based data mining tool.
Predictive analytics and Data Mining techniques covered: Exploratory Data Analysis, Visualization, Decision trees, Rule induction, k-Nearest Neighbors, Naïve Bayesian, Artificial Neural Networks, Support Vector machines, Ensemble models, Bagging, Boosting, Random Forests, Linear regression, Logistic regression, Association analysis using Apriori and FP Growth, K-Means clustering, Density based clustering, Self Organizing Maps, Text Mining, Time series forecasting, Anomaly detection and Feature selection.
Implementation files can be downloaded from the book companion site at www.LearnPredictiveAnalytics.com.
- Demystifies data mining concepts with easy-to-understand language
- Shows how to get up and running fast with 20 commonly used powerful techniques for predictive analysis
- Explains the process of using open-source RapidMiner tools
- Discusses a simple 5-step process for implementing algorithms that can be used for performing predictive analytics
- Includes practical use cases and examples

Book Python for Data Analysis

Download or read book Python for Data Analysis written by Wes McKinney and published by O'Reilly Media, Inc. This book was released on 2017-09-25 and has 553 pages. Available in PDF, EPUB and Kindle. Book excerpt: Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.
- Use the IPython shell and Jupyter notebook for exploratory computing
- Learn basic and advanced features in NumPy (Numerical Python)
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples

Book Conformal Prediction for Reliable Machine Learning

Download or read book Conformal Prediction for Reliable Machine Learning written by Vineeth Balasubramanian and published by Newnes. This book was released on 2014-04-23 and has 323 pages. Available in PDF, EPUB and Kindle. Book excerpt: The conformal predictions framework is a recent development in machine learning that can associate a reliable measure of confidence with a prediction in any real-world pattern recognition application, including risk-sensitive applications such as medical diagnosis, face recognition, and financial risk prediction. Conformal Predictions for Reliable Machine Learning: Theory, Adaptations and Applications captures the basic theory of the framework, demonstrates how to apply it to real-world problems, and presents several adaptations, including active learning, change detection, and anomaly detection. As practitioners and researchers around the world apply and adapt the framework, this edited volume brings together these bodies of work, providing a springboard for further research as well as a handbook for application to real-world problems.
- Understand the theoretical foundations of this important framework, which can provide a reliable measure of confidence with predictions in machine learning
- Be able to apply this framework to real-world problems in different machine learning settings, including classification, regression, and clustering
- Learn effective ways of adapting the framework to newer problem settings, such as active learning, model selection, or change detection
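To make the idea concrete, here is a minimal sketch of one common variant, split (inductive) conformal classification, using the nonconformity score 1 - p(y), where p(y) is a classifier's predicted probability for label y. This is a standard textbook construction, not necessarily the one used in the book; the function names and calibration data are hypothetical. With a held-out calibration set of n examples and a miscoverage level alpha, the threshold is the ceil((n + 1)(1 - alpha))-th smallest calibration score, and under exchangeability the resulting prediction sets contain the true label with probability at least 1 - alpha.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal calibration with nonconformity score 1 - p(true label)."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # too few calibration points: include every label
    return scores[k - 1]


def prediction_set(probs, qhat):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration data: class probabilities from some already-trained
# classifier, plus the true labels (0 = healthy, 1 = distressed).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8],
             [0.95, 0.05], [0.4, 0.6], [0.7, 0.3], [0.1, 0.9]]
cal_labels = [0, 0, 1, 0, 1, 0, 1, 1, 1]

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(round(qhat, 6))                      # 0.4
print(prediction_set([0.65, 0.35], qhat))  # [0]
print(prediction_set([0.3, 0.7], qhat))    # [1]
```

With only nine calibration points the guarantee is loose; in practice a larger calibration set and a smaller alpha (such as 0.1) would be more typical. A set with several labels signals ambiguity, and an empty set signals that no label looks typical of the calibration data.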