Decision-making using data

  • Fundamental to the utility of the data scientist as an individual and of the data science function as an institution within a private or public organization
  • Arguments here are economic in nature -- many assume rational action on the part of an agent. Is this realistic?
    • For humans -- probably not
    • For organizations -- depends on the organization's makeup and incentives
  • We will endeavor to keep out of theoretical economics as much as possible but sometimes that's unavoidable (particularly when analyzing incentives, which are at the core of how the human world functions)

Decision-making using data

  • Why did we create models of the world using probabilistic methods? This wasn't (just) for fun... Taking a step back, why did we create models of the world at all?
  • Unless we are purely motivated by science (nothing wrong with this!) we want to do something with these models. In other words, we are going to use these models to take actions and / or make decisions. Therefore we need to develop a notion of good / bad actions and decisions.
  • Usually we introduce a utility or value function $V(a)$ that we want to maximize. The values $a \in \mathcal A$ are the decisions we make or actions we take, and $\mathcal A$ is the (often quite abstract) space of all possible decisions / actions.
    • Important to note that $V(a)$ is NOT necessarily valued in monetary units, though that's common in some contexts.
  • Here is an important quasi-philosophical point: we usually do not have utility functions defined over observable states of the world $x$ unless we can control them with our actions, e.g. unless $x_t = f(a_{t-1})$.

Decision-making using data

  • Where do statistical models come in? Suppose we have a joint density $p(x, z)$ over latent variables $z$ and observable variables $x$. We've performed inference so that we know (or at least have estimated) the posterior $p(z | x)$.
  • Now we need to make some decision $a$. We'll introduce a decision-making model $f_\theta(z)$ that generates actions $a$ as $a(z) = f_\theta(z)$.
    • This model can take many, many different forms -- this depends on whether $a$ and $z$ are continuous or discrete, what functional restrictions are put on $V$, etc. We will talk about this in some depth.
  • Because we don't usually know $z$ for sure, we need to estimate the best action to take over all possible appropriately-weighted $z$. Often we will maximize the expected utility of the action: $$ \max_\theta E_{z\sim p(z|x)}[V(f_\theta(z))] = \max_\theta \int\ dz\ V(f_\theta(z))\ p(z|x) $$
  • We can see how important the accuracy, and often precision, of the estimation of $p(z|x)$ is...
  • Estimating this last integral can be done using Monte Carlo methods: $$ \max_\theta \int\ dz\ V(f_\theta(z))\ p(z|x) \approx \max_\theta \frac{1}{N}\sum_n V(f_\theta(z_n)) $$ where each $z_n \sim p(z |x)$. This brings us back to our gradient-based optimization: when $V \circ f_\theta$ is differentiable w.r.t. $\theta$, we can do what comes naturally and estimate the optimal parameters via gradient ascent: $$ \begin{aligned} \theta_{k} &= \theta_{k - 1} + \gamma \frac{1}{N}\sum_n \nabla_\theta V(f_\theta(z_n))\\ &= \theta_{k - 1} + \gamma \frac{1}{N}\sum_n J_{f_\theta}(z_n)^\top\ \nabla V(f_\theta(z_n)) \end{aligned} $$ where, by the chain rule, $J_{f_\theta}(z_n)$ is the Jacobian of $f_\theta$ w.r.t. $\theta$ evaluated at $z_n$.
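  • As a toy illustration (every choice here -- the posterior, the utility, and the decision model -- is hypothetical): take posterior $z \sim N(1, 0.5^2)$, linear decision model $f_\theta(z) = \theta z$, and concave utility $V(a) = a - a^2/2$. Then $E[V(\theta z)] = \theta E[z] - \tfrac{\theta^2}{2} E[z^2]$ is maximized at $\theta^* = E[z]/E[z^2] = 1/1.25 = 0.8$, and the Monte Carlo gradient ascent above can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior p(z|x): N(1, 0.5^2), sampled once up front.
N = 10_000
z = rng.normal(loc=1.0, scale=0.5, size=N)

# Utility V(a) = a - a^2/2 (concave, max at a = 1) and linear decision
# model f_theta(z) = theta * z, so d/dtheta V(f_theta(z)) = (1 - theta*z) * z.
gamma = 0.1      # step size
theta = 0.0      # initial parameter
for _ in range(200):
    grad = np.mean((1.0 - theta * z) * z)  # MC estimate of the gradient
    theta += gamma * grad                  # gradient *ascent* step

print(theta)  # close to the closed-form optimum 0.8
```

    The Monte Carlo estimate of the fixed point, $\bar z / \overline{z^2}$, matches the analytic $\theta^*$ up to sampling noise.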
  • However, oftentimes $f_\theta$ is not differentiable w.r.t. $\theta$. E.g., suppose that $\theta$ is a set of integers that parameterize a linear function of the posterior $z$, which is itself a vector of integers. We need other methods to solve optimization problems of this nature.
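  • A minimal sketch of one such gradient-free approach (all specifics hypothetical): when $\theta$ ranges over a small set of integers, we can simply score each candidate by its Monte Carlo expected utility and pick the best.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(1.0, 0.1, 5_000)   # samples from a hypothetical posterior

def expected_utility(theta):
    # V(a) = -(a - 3)^2 with f_theta(z) = theta * z:
    # utility peaks when theta * z is near 3, i.e. theta near 3 since E[z] = 1.
    a = theta * z
    return np.mean(-(a - 3.0) ** 2)

# theta is constrained to integers, so grid-search the candidates directly.
candidates = range(-5, 6)
best = max(candidates, key=expected_utility)
print(best)  # prints 3
```

    For larger discrete spaces, exhaustive search gives way to methods like random search, the cross-entropy method, or integer programming -- but the pattern (score candidates by Monte Carlo expected utility) is the same.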
  • Examples of decision functions -- from finance:
  • We have estimated the returns of a set of $n$ assets for the next time period. Construct an optimal portfolio weighting of the assets that maximizes expected alpha ($\alpha$) and minimizes expected deviation from market beta ($\beta$). Denote our portfolio's returns by $r_p$ and the appropriate market index's returns by $r_m$.
  • Write $r_p = \alpha + r_m \beta + \varepsilon$ and take expectations, using $E[\varepsilon] = 0$. Substituting the OLS solution for beta, $\beta = \text{Cov}(r_p, r_m) / \text{Var}(r_m)$, the problem to solve is $$ \max \alpha \text{ subject to } |\beta - 1| < \text{beta tolerance} $$
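  • The OLS step can be spelled out: take the covariance of both sides of $r_p = \alpha + r_m \beta + \varepsilon$ with $r_m$, and use the OLS orthogonality condition $\text{Cov}(\varepsilon, r_m) = 0$: $$ \text{Cov}(r_p, r_m) = \beta\, \text{Var}(r_m) + \text{Cov}(\varepsilon, r_m) = \beta\, \text{Var}(r_m) \quad\Rightarrow\quad \beta = \frac{\text{Cov}(r_p, r_m)}{\text{Var}(r_m)} $$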
  • Our decision model $a = f_\theta(z)$ outputs a vector $a$ that lies on the probability simplex in $\mathbb R^n$ -- i.e., $\sum_i a_i = 1$ and each $a_i \geq 0$, so $a$ is a probability vector. In other words, given that we have $D$ dollars to allocate, we will allocate $a_i D$ dollars to the $i$-th asset.
  • Substituting all of our notation and using our usual trick of including the constraint in the objective function via a penalty weight $\lambda$, the objective function to be minimized is $$ \min_\theta E_{z_{t + 1}\sim p(z_{t + 1} | x_t,\ldots)} \left[ -\left( f_\theta(z_{t + 1}) - r_m \frac{\text{Cov}(f_\theta(z_{t + 1}), r_m)}{ \text{Var}(r_m)} \right) + \lambda \left| \frac{\text{Cov}(f_\theta(z_{t + 1}), r_m)}{ \text{Var}(r_m)} - 1 \right|\right] $$ The leading minus sign is because minimizing $-\alpha$ maximizes $\alpha$; we also abuse notation slightly by writing $f_\theta(z_{t + 1})$ for the portfolio return the weights imply.
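  • A sketch of evaluating this penalized objective by Monte Carlo (everything here is hypothetical: the scenario-generating distributions, the softmax parameterization of the simplex, and the choice of $\lambda$). Rows of `r` are posterior samples of next-period asset returns; `r_m` is the market index.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior samples of next-period returns: N scenarios, n assets.
N, n = 5_000, 3
r = rng.normal([0.02, 0.05, 0.01], [0.10, 0.20, 0.05], size=(N, n))
r_m = r.mean(axis=1) + rng.normal(0.0, 0.02, N)  # stand-in market index

def objective(theta, lam=10.0):
    w = np.exp(theta) / np.exp(theta).sum()    # softmax: weights on the simplex
    r_p = r @ w                                # portfolio return per scenario
    beta = np.cov(r_p, r_m, ddof=0)[0, 1] / np.var(r_m)
    alpha = r_p.mean() - r_m.mean() * beta
    return -alpha + lam * abs(beta - 1.0)      # minimize -alpha + beta penalty

theta = np.zeros(n)   # equal weights to start
print(objective(theta))
```

    Parameterizing the weights via a softmax keeps any unconstrained $\theta \in \mathbb R^n$ on the simplex, so the objective can then be handed to any black-box or gradient-based optimizer.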
  • Side note: this is a terrifically hard problem. The most difficult bit is what we've assumed away -- that we have an accurate and precise model for the returns for a set of $n$ assets over the next time period.
  • Examples of decision functions -- from statistics:
  • Problems in statistics can also be formulated this way, and often are in the framework of empirical risk minimization. Your statistical estimator is $f_\theta(z)$ and you're trying to minimize the expected value of a risk function $R(y, \hat y)$ where $\hat y = f_\theta(z)$ -- you're solving $$ \min_\theta \int\ dz\ R(y, f_\theta(z))\ p(z|x) $$
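  • For concreteness, a hypothetical instance: squared-error risk $R(y, \hat y) = (y - \hat y)^2$ with a linear estimator $f_\theta(z) = \theta^\top z$, minimized over samples -- which reduces to ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: y = z @ theta_true + noise.
theta_true = np.array([2.0, -1.0])
z = rng.normal(size=(500, 2))
y = z @ theta_true + rng.normal(0.0, 0.1, 500)

# Empirical risk minimization with squared loss: minimize the mean of
# (y - z @ theta)^2 over theta by gradient descent.
theta = np.zeros(2)
gamma = 0.1
for _ in range(500):
    grad = 2.0 * z.T @ (z @ theta - y) / len(y)  # gradient of empirical risk
    theta -= gamma * grad

print(theta)  # close to theta_true
```

    The minimizer of the empirical risk recovers the data-generating coefficients up to noise, exactly as the ERM framing promises.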
  • Two examples -- one classic (from Elements of Statistical Learning, I think) and another that's relevant to some of my interests.