“Mining the Impact of Social Media on High-Frequency Financial Data”
co-authored with Ray Hashemi, Jeffrey Young, and Chanchal Tamrakar, Proceedings of the International Conference on Computational Science and Computational Intelligence, forthcoming.
Abstract: Establishing the relationship between stock price changes of a Fortune 500 company and events (political, social, and/or business) is a multi-dimensional, complex problem. However, such events change the social mood, which manifests itself in social media communications. Therefore, we collected time-series high-frequency financial (HFF) data and published time-series tweets about the same company for six months of 2019. The first five months of data were used to (a) mine the impactful tweets (nuggets) affecting minute-by-minute stock price changes, (b) discover and validate the nuggets profile, (c) predict future impactful tweets prior to their effects on the stock price, using the HFF data and tweets for the sixth month as a test set, and (d) maintain an up-to-date nuggets profile. The results revealed successful detection of tweet nuggets with a certainty factor close to 80%. Such prediction may greatly affect market analytics decisions.
co-authored with Majid Asadi, Nader Ebrahimi, and Ehsan Soofi, Journal of the Iranian Statistical Society, 20(1), 27-59. DOI: 10.52547/jirss.20.1.27.
Abstract: In recent years, we have studied information properties of various types of mixtures of probability distributions and introduced a new type, which includes previously known mixtures as special cases. These studies are disseminated across different fields: reliability engineering, econometrics, operations research, probability, information theory, and data mining. This paper presents a holistic view of these studies and provides further insights and examples. We note that the insightful probabilistic formulation of the mixing parameters stipulated by Behboodian (1972) is required for a representation of the well-known information measure of the arithmetic mixture. Applications of this information measure presented in this paper include lifetime modeling, system reliability, measuring uncertainty and disagreement of forecasters, probability modeling with partial information, and information loss of kernel estimation. Probabilistic formulations of the mixing weights for various types of mixtures provide the Bayes-Fisher information and the Bayes risk of the mean residual function.
co-authored with Keagan Galbraithy, Ray Hashemi, and Jason Beck, Proceedings of the International Conference on Data Science, forthcoming.
Abstract: The number of days that a home stays on the housing market (Days-On-Market—DOM) provides crucial information at both the micro level (behavior associated with the buyer’s/seller’s decision) and the macro level (risk associated with real estate investments and identification of housing bubbles). Housing data has a mixture of simple and complex attributes. A complex attribute, in contrast with a simple attribute, has an array of values for a real estate property, which creates a major challenge in predicting DOM. The goals of this research effort are: (a) providing for complex attributes in DOM prediction, (b) analyzing, designing, and implementing a DOM prediction package using Naïve Bayes and Linear Regression separately, and (c) establishing the superiority and robustness of the underlying models.
co-authored with Majid Asadi, Nader Ebrahimi, and Ehsan Soofi, Statistical Analysis and Data Mining: The American Statistical Association Data Science Journal, 2020, 13, 405-418. DOI: 10.1002/sam.11464.
Abstract: Big data enables reliable estimation of continuous probability density, cumulative distribution, survival, hazard rate, and mean residual functions (MRFs). We illustrate that the plot of the MRF provides the best resolution for distinguishing between distributions. At each point, the MRF gives the mean excess of the data beyond the threshold. The graph of the empirical MRF, called here the MR plot, provides an effective visualization tool. A variety of theoretical and data-driven examples illustrate that MR plots of big data preserve the shape of the MRF and that complex models require bigger data. The MRF is an optimal predictor of the excess of the random variable. With a suitable prior, the expected MRF gives the Bayes risk in the form of the entropy functional of the survival function, called here the survival entropy. We show that the survival entropy is dominated by the standard deviation (SD) and that equality between the two measures characterizes the exponential distribution. The empirical survival entropy provides a data concentration statistic which is strongly consistent, easy to compute, and less sensitive than the SD to heavy-tailed data. An application uses the New York City Taxi database with millions of trip times to illustrate the MR plot as a powerful tool for distinguishing distributions.
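The empirical MRF behind the MR plot can be sketched in a few lines of Python (an illustrative implementation, not the paper’s code): at each threshold t, it averages the excess of the sample beyond t. For exponential data the MRF is flat at the mean, which is the memoryless benchmark the abstract alludes to.

```python
import numpy as np

def empirical_mrf(x, thresholds):
    """Empirical mean residual function: for each threshold t,
    the mean excess of the sample beyond t, i.e. E[X - t | X > t]."""
    x = np.asarray(x, dtype=float)
    out = []
    for t in thresholds:
        excess = x[x > t] - t
        out.append(excess.mean() if excess.size else np.nan)
    return np.array(out)

# Exponential data: the theoretical MRF is constant at the mean (the scale).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=100_000)
ts = np.linspace(0, 4, 9)
mrf = empirical_mrf(sample, ts)
# Plotting ts against mrf gives the MR plot; a flat line near 2
# identifies the exponential shape.
```

Distributions with increasing or decreasing MRFs produce visibly sloped MR plots, which is what makes the plot useful for distinguishing distributions.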
co-authored with Ray Hashemi, Azita Bahrami, and Jeffrey Young, Proceedings of the International Conference on Computational Science and Computational Intelligence, 382-387.
Abstract: A multi-recurrent neural network (RNN) hybrid system made up of three RNNs is introduced to predict the stock prices for 10 different companies (five selected from the Dow Jones Industrial Average and five from the Standard and Poor’s 500). The daily historical data used to train and test the system are collected for the period of October 15, 2013 to March 5, 2019. For each company, the system provides two separate predictions of the daily stock price by using (1) historical stock prices and (2) historical trends along with the historical daily net changes in stock price. The two predictions are mediated to select one as the final output of the hybrid system. For each company, the accuracy of the system was tested for the prediction of the most recent 98 consecutive days using the forecast accuracy measure of the Mean Squared Error (MSE). The results revealed that for every company the difference between the predicted and actual stock price is not statistically different from zero, which is the ideal (error-free) forecast.
co-authored with Kundan Kishor and Suyong Song, Journal of Economic Dynamics and Control, 2018, 90, 76-97. DOI: 10.1016/j.jedc.2018.01.045.
Abstract: This paper estimates the treatment effect of inflation targeting on macroeconomic variables using a semiparametric single index method by taking into account the model misspecification of parametric propensity scores. Our study uses a broader set of preconditions for inflation targeting and macroeconomic outcome variables than the existing literature. The results suggest no significant difference in the inflation level and inflation volatility between targeters and non-targeters after the adoption of inflation targeting. We find that inflation targeting reduces the sacrifice ratio and interest rate volatility in developed economies, and that it enhances fiscal discipline in both industrial and developing countries.
Abstract: The stochastic error distance (SED) introduced by Diebold and Shin (2017) ranks forecast models by the divergence between the error distributions of the actual and perfect forecast models. The basic SED is defined by the variation distance and provides a representation of the mean absolute error, but by basing ranking on the entire error distribution and divergence, the SED moves beyond the traditional forecast evaluations. First, we establish connections between ranking forecast models by the SED, error entropy, and some partial orderings of distributions. Then, we introduce the notion of excess error for forecast errors of magnitudes larger than a tolerance threshold and give the SED representation of the mean excess error (MEE). As a function of the threshold, the MEE is a local risk measure. With the distribution of the absolute error as a prior for the threshold, its Bayes risk is the entropy functional of the survival function, which is a known measure in information theory and reliability. Notions and results are illustrated using various distributions for the error. The empirical versions of the SED, the MEE, and its Bayes risk are compared with the mean squared error in ranking regression and autoregressive integrated moving average models for forecasting bond risk premia.
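The empirical mean excess error described above can be sketched directly (a minimal illustration of the notion; the paper’s exact normalization may differ): at tolerance c, average how far the absolute forecast error exceeds c, given that it does.

```python
import numpy as np

def mean_excess_error(errors, c):
    """Mean excess error at tolerance c: the average amount by which
    the absolute forecast error exceeds c, among errors that exceed c.
    Sketch of the notion in the abstract, not the paper's code."""
    a = np.abs(np.asarray(errors, dtype=float))
    exceed = a[a > c]
    return float((exceed - c).mean()) if exceed.size else 0.0

# Rank two hypothetical forecast models by MEE at tolerance 1.0.
rng = np.random.default_rng(1)
err_a = rng.normal(0, 1, 50_000)   # tighter error distribution
err_b = rng.normal(0, 2, 50_000)   # heavier errors
mee_a = mean_excess_error(err_a, 1.0)
mee_b = mean_excess_error(err_b, 1.0)
# The model with the smaller MEE (model A here) ranks higher.
```

Sweeping c traces out the MEE as a local risk measure, which is how the abstract uses it before averaging over a prior on the threshold.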
“Examining the Success of the Central Banks in Inflation Targeting Countries: The Dynamics of the Inflation Gap and Institutional Characteristics”
co-authored with Kundan Kishor, Studies in Nonlinear Dynamics and Econometrics, 2018, 22(1). DOI: 10.1515/snde-2016-0085.
Abstract: This paper analyzes the performance of the central banks in inflation targeting (IT) countries by examining their success in achieving their explicit inflation targets. For this purpose, we decompose the inflation gap, the difference between actual inflation and the inflation target, into predictable and unpredictable components. We argue that the central banks are successful if the predictable component diminishes over time. The predictable component of the inflation gap is measured by the conditional mean of a parsimonious time-varying autoregressive model. Our results reveal considerable heterogeneity in the success of these IT countries in achieving their targets at the start of this policy regime. Our findings suggest that the central banks of the IT-adopting countries started targeting inflation implicitly before becoming explicit inflation targeters. The panel data analysis suggests that the relative success of these countries in reducing the gap is influenced by their institutional characteristics, particularly fiscal discipline and macroeconomic performance.
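The decomposition idea can be illustrated with a simple rolling AR(1) as a stand-in for the paper’s time-varying autoregressive model (an assumption for illustration, not the authors’ specification): the one-step-ahead conditional mean is the predictable component of the gap, and success means its magnitude shrinks over time.

```python
import numpy as np

def rolling_ar1_forecast(gap, window=20):
    """One-step-ahead conditional mean of the inflation gap from an
    AR(1) (no intercept) refit on a rolling window -- a simple
    stand-in for a time-varying autoregressive model."""
    gap = np.asarray(gap, dtype=float)
    preds = np.full(gap.shape, np.nan)
    for t in range(window, len(gap)):
        y = gap[t - window + 1:t]          # current values
        x = gap[t - window:t - 1]          # lagged values
        beta = np.dot(x, y) / np.dot(x, x) # OLS slope
        preds[t] = beta * gap[t - 1]       # predictable component
    return preds

# Simulated gap whose persistence fades over time: the predictable
# component should diminish, the abstract's criterion for success.
rng = np.random.default_rng(2)
n = 300
phi = np.linspace(0.9, 0.0, n)
gap = np.zeros(n)
for t in range(1, n):
    gap[t] = phi[t] * gap[t - 1] + 0.1 * rng.standard_normal()
pred = rolling_ar1_forecast(gap)
early = np.nanmean(np.abs(pred[20:120]))
late = np.nanmean(np.abs(pred[200:300]))
# early > late: the predictable component has shrunk.
```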
co-authored with Ray Hashemi, Azita Bahrami, Jeffrey Young, and Rosina Campbell. Proceedings of the International Conference on Advances in Information Mining and Management, 2018, 39-45. ISBN: 978-1-61208-654-5.
Abstract: The European Monetary Union (EMU) is the result of an economic integration of European Union member states into a unified economic system. The literature is divided on whether EMU members benefit from this monetary unification. Considering costs and benefits, a fiscal authority may ask whether joining the EMU is a good decision. We introduce and develop a decision support system to answer this question using a historical dataset of twelve Macroeconomic Outcomes (MOs) obtained for 31 European countries over 18 years (1999-2016). The system meets the three-pronged goal of: (1) identifying highly relevant MOs for a given year, yi, using the data from years y1 to yi; (2) deriving a “join/not-join” decision along with its certainty factor using the relevant MOs for yi; and (3) examining the accuracy of the derived decision using the data from yi+1 to y18. The performance analysis of the system reveals that (a) the number of relevant MOs has declined nonlinearly over time, (b) the relevant MOs and decisions changed significantly before and after the European debt crisis, and (c) the decisions derived by the system have 79% accuracy.
co-authored with Ray Hashemi, Azita Bahrami, and Jeffrey Young, Proceedings of the International Conference on Computational Science and Computational Intelligence, 2017, 350-356. DOI: 10.1109/CSCI.2017.59.
Abstract: Standard and Poor’s ranks S&P 500 components based on a weighting scheme and identifies a set of top companies. The weighting scheme relies only on an individual company’s market value and ignores the impact of collective market values on the index. We introduce a ranking methodology based on entropy, which results in a new set of top components, and then compare its predictive power with respect to the index. For this comparison, we develop a method based on Markov Chain and Hidden Markov Chain models. The results reveal that the set of top companies identified by the entropy approach provides a more accurate prediction of the S&P 500 index.
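The flavor of an entropy-based ranking can be sketched as follows (illustrative only, not the paper’s methodology): normalize market values into index weights and rank components by their Shannon-entropy contribution, -p·log(p), rather than by raw weight. Because -p·log(p) peaks at p = 1/e rather than at p = 1, the two rankings can disagree at the top.

```python
import numpy as np

def entropy_contributions(market_values):
    """Normalize market values into index weights p_i and return each
    component's Shannon-entropy contribution -p_i * log(p_i).
    Illustrative sketch; the paper's ranking may differ."""
    p = np.asarray(market_values, dtype=float)
    p = p / p.sum()
    return -p * np.log(p)

# Hypothetical market values for five components.
mv = np.array([500.0, 300.0, 120.0, 60.0, 20.0])
contrib = entropy_contributions(mv)
rank_by_weight = np.argsort(-mv)        # cap-weight ranking: 0 first
rank_by_entropy = np.argsort(-contrib)  # entropy ranking: 1 first
# The largest company by weight (index 0, p = 0.5) is not the top
# entropy contributor; index 1 (p = 0.3) is, since -p*log(p) peaks
# near p = 1/e.
```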
Journal of Economic Literature, 54(4), 2016, 1551-1580. DOI: 10.1257/jel.54.4.1551.
Option Valuation with Maximum Entropy Densities: Accounting for Higher-Order Moments
Abstract: Entropy pricing applies notions of information theory to derive the theoretical value of options. This paper elaborates on the maximum entropy formulation of option pricing given risk-neutral moment constraints computed directly from the observed option prices. A generalization of Shannon entropy, called Rényi entropy, is considered to account for extreme values. The solution to this maximum entropy problem provides a class of heavy-tailed distributions. Monte Carlo simulation and empirical evidence suggest that forecast accuracy improves when higher-order risk-neutral moment constraints are used.
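For a discrete distribution, the Rényi entropy used here is H_α(p) = log(Σ p_i^α) / (1 − α), which recovers Shannon entropy as α → 1. A small self-contained check (not tied to the paper’s option data):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha for a discrete distribution p.
    Recovers Shannon entropy in the limit alpha -> 1."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))        # Shannon limit
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

p = np.array([0.5, 0.25, 0.125, 0.125])
h_shannon = renyi_entropy(p, 1.0)     # = 1.5 * ln(2) + ... ≈ 1.213 nats
h_near1 = renyi_entropy(p, 1.0001)    # should be very close to Shannon
h_order2 = renyi_entropy(p, 2.0)      # collision entropy, smaller
```

Orders α > 1 down-weight rare outcomes less sharply than Shannon entropy in the maximization, which is the mechanism by which the Rényi formulation yields heavier-tailed maximum entropy densities.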
An Information Framework for Measuring Perception Alignment in Financial Markets
co-authored with Viktoria Dalko and Hyeeun Shim.
Abstract: At the onset of the COVID-19 pandemic, the CBOE volatility index reached heights last experienced during the 2008 financial crisis. The consensus is that the World Health Organization’s announcement of the pandemic contributed to the high level of volatility. The question arises whether we have a potentially robust measure to quantify the degree to which investors’ perceptions about future asset returns were suddenly aligned due to a WHO announcement. This paper provides an information framework for measuring the degree of perception alignment, based on the perception alignment hypothesis. We provide simulation examples, illustrate empirical evidence of financial market manipulation, and estimate the loss of information due to those cases of perception alignment.
Estimating hedonic models with endogenous marketing time using quantile regression without excluded instruments
co-authored with Jason Beck and Suyong Song.
Abstract: Hedonic modeling has been used to examine the impacts of housing characteristics on selling prices. Digressing from conventional hedonic modeling, we propose a control function approach in quantile regression models to account for heterogeneous effects of endogenous marketing time. Our approach utilizes conditional heteroscedasticity of structural errors in the triangular model as an identification strategy without excluded instruments. We document substantial heterogeneous effects of marketing time across the conditional distribution of housing prices, which show a U-shaped relationship; the marketing time impact is substantially larger for lower and higher quantiles of selling prices than for median selling price.
Does Membership of the EMU Matter for Economic and Financial outcomes?
co-authored with Kundan Kishor and Suyong Song.
Abstract: We examine treatment effects of joining the European Monetary Union (EMU) on macroeconomic and financial outcomes in member countries. Specifically, we apply propensity score analysis to mitigate the self-selection bias associated with the non-random nature of joining the union. The findings suggest that the average treatment effect on the treated (ATT) of the EMU is associated with declines in the volatility of inflation, real GDP growth, and bond yields. Splitting the sample into the pre-crisis (1990-2008) and post-crisis (2009-2019) periods and excluding Portugal, Ireland, Greece, and Spain (PIGS) from the sample show divergent patterns of ATTs on bond yields and the debt-GDP ratio. The results suggest that the fiscal situation in the non-PIGS member states worsened in the pre-crisis period. We also find that PIGS benefited from EMU membership in terms of lower bond yields in the pre-crisis period.
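The logic of propensity score analysis for the ATT can be sketched on simulated data (an illustration of the general technique, not the paper’s data or estimator): when selection into treatment depends on a covariate, the naive treated-minus-control difference is biased, while matching treated units to controls with similar propensity scores recovers the true effect.

```python
import numpy as np

def att_by_matching(score, treated, outcome):
    """ATT by one-nearest-neighbor matching on the propensity score:
    each treated unit is paired with the control whose score is
    closest, and ATT is the mean treated-minus-matched difference."""
    score = np.asarray(score, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    ctrl_idx = np.where(~treated)[0]
    diffs = []
    for i in np.where(treated)[0]:
        j = ctrl_idx[np.argmin(np.abs(score[ctrl_idx] - score[i]))]
        diffs.append(outcome[i] - outcome[j])
    return float(np.mean(diffs))

# Simulated example: selection depends on covariate x; the true
# treatment effect is -1.0 (think: lower volatility for joiners).
# We match on the true propensity score for simplicity; in practice
# it is estimated (e.g., by a logit of treatment on covariates).
rng = np.random.default_rng(3)
n = 4000
x = rng.normal(0, 1, n)
p = 1 / (1 + np.exp(-x))            # true propensity score
d = rng.uniform(size=n) < p         # non-random selection into treatment
y = 2.0 * x - 1.0 * d + rng.normal(0, 0.5, n)
naive = y[d].mean() - y[~d].mean()  # biased upward by selection on x
att = att_by_matching(p, d, y)      # close to the true effect of -1.0
```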
Estimating loss from extreme climate events within a real options approach
co-authored with Ruth Dittrich.
Abstract: Sea level rise is a major consequence of climate change. This paper studies climate change uncertainty through an information theory framework and examines the current cost of extreme sea level rise within a real options analysis. We first propose an approach to estimate the risk-neutral density of change in global mean sea level and then use the estimated density to compute the expected overall cost from sea level rise. The proposed framework accounts for extreme sea level rise in computing the theoretical option value.
A probabilistic view to capture automation impacts
co-authored with Mariana Saenz.
Abstract: This paper examines the effects of automation on the number of transactions, sales, and cost in the foodservice industry. First, a big data tool is applied to distinguish distributions of transactions during different times of the day. Then the automation impacts on transactions are studied using a probabilistic approach in which the best fitting theoretical probability density is used in constructing simulated sampling distributions. Next, the effects of automation on sales and cost are examined through simulated forecasts and sampling distributions. The simulation studies illustrate how automation increases efficiency and improves forecasting accuracy.