- Fundamental to the utility of the data scientist as a person and data science function as an institution within a private or public organization
- Arguments here are economic in nature -- many assume rational actions taken on the part of an agent. Is this realistic?
- For humans -- probably not
- For organizations -- depends on the organization's makeup and incentives

- We will endeavor to keep out of theoretical economics as much as possible, but sometimes that's unavoidable (particularly when analyzing *incentives*, which are at the core of how the human world functions)

- Why did we create models of the world using probabilistic methods? This wasn't (just) for fun... Taking a step back, why did we create models of the world at all?
- Unless we are purely motivated by science (nothing wrong with this!) we want to *do* something with these models. In other words, we are going to use these models to *take actions* and / or *make decisions*. Therefore we need to develop a notion of good / bad actions and decisions.
- Usually we introduce a *utility* or *value* function $V(a)$ that we want to maximize. The values $a \in \mathcal A$ are the decisions we make or actions we take, and $\mathcal A$ is the (often quite abstract) space of all possible decisions / actions.
- Important to note that $V(a)$ is NOT necessarily valued in monetary units, though that's common in some contexts.

- Here is an important quasi-philosophical point: we usually do *not* have utility functions defined over observable states of the world $x$ unless we can control them with our actions, e.g. unless $x_t = f(a_{t-1})$.

- Where do statistical models come in? Suppose we have a joint density $p(x, z)$ over latent variables $z$ and observable variables $x$. We've performed inference so that we know (or at least, we've estimated) the posterior $p(z | x)$.
- Now we need to make some decision $a$. We'll introduce a decision-making model $f_\theta(z)$ that generates actions $a$ as $a(z) = f_\theta(z)$.
- This model can take many, many different forms -- this depends on whether $a$ and $z$ are continuous or discrete, what functional restrictions are put on $V$, etc. We will talk about this in some depth.

- Because we don't usually know $z$ for sure, we need to estimate the best action to take over all possible $z$, each weighted by its posterior probability. Often we will consider the action that is optimal in *expectation*: $$ \max_a E_{z\sim p(z|x)}[V(a(z))] = \max_\theta \int\ dz\ V(f_\theta(z))\ p(z|x) $$
- We can see how important the accuracy, and often the precision, of our estimate of $p(z|x)$ is...
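
A minimal sketch of the expectation above when $z$ is discrete, so the integral becomes a sum. The posterior probabilities, the utility `V`, the policy family `f(theta, z) = theta * z`, and the candidate values of `theta` are all hypothetical choices made for illustration; none of them come from the notes.

```python
# Hypothetical discrete posterior p(z | x): latent value -> probability.
posterior = {1: 0.2, 2: 0.5, 3: 0.3}


def V(a):
    """Hypothetical utility of taking action a (peaks at a = 4)."""
    return -(a - 4.0) ** 2


def f(theta, z):
    """Hypothetical decision model a = f_theta(z)."""
    return theta * z


def expected_utility(theta):
    """Sum over z of V(f_theta(z)) * p(z | x)."""
    return sum(p * V(f(theta, z)) for z, p in posterior.items())


# Hypothetical finite set of candidate parameters to compare.
candidates = [0, 1, 2, 3]
best_theta = max(candidates, key=expected_utility)

for theta in candidates:
    print(theta, expected_utility(theta))
print("best theta:", best_theta)
```

Under these assumed numbers the best candidate is `theta = 2`. Changing the posterior weights changes the ranking, which is the sense in which the quality of $p(z|x)$ feeds directly into the decision.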

- Estimating this last integral can be done using Monte Carlo methods: $$ \max_\theta \int\ dz\ V(f_\theta(z))\ p(z|x) \approx \max_\theta \frac{1}{N}\sum_n V(f_\theta(z_n)) $$ where each $z_n \sim p(z |x)$. This brings us back to our gradient-based optimization: when $V$ and $f_\theta$ are differentiable w.r.t. their arguments, we can do what comes naturally and estimate the optimal parameters via gradient ascent (ascent rather than descent, because we are maximizing): $$ \begin{aligned} \theta_{k} &= \theta_{k - 1} + \gamma \frac{1}{N}\sum_n \nabla_\theta V(f_\theta(z_n))\\ &= \theta_{k - 1} + \gamma \frac{1}{N}\sum_n J_{f_\theta}V\ \nabla_\theta f_\theta(z_n) \end{aligned} $$ where $\gamma > 0$ is the step size and all gradients are evaluated at $\theta = \theta_{k-1}$.
- However, oftentimes $f_\theta$ is *not* differentiable w.r.t. $\theta$. E.g., suppose that $\theta$ is a set of integers that parameterize a linear function of the posterior $z$, which is itself a vector of integers. We need other methods to solve optimization problems of this nature.
- Linear and integer programming
- Constraint / SAT solvers
- Global optimization techniques (e.g., Metropolis-Hastings algorithm...we will revisit this shortly...)
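
The two cases above can be sketched side by side: Monte Carlo gradient ascent for a differentiable $f_\theta$, and a brute-force search over integer $\theta$ as a simple stand-in for the discrete methods listed. Everything specific here is an assumption made for illustration: the Gaussian draws standing in for posterior samples $z_n$, the utility `V`, the linear model `f`, the step size, and the integer candidate range.

```python
import random


def V(a):
    """Hypothetical concave utility, maximized at a = 3."""
    return -(a - 3.0) ** 2


def dV(a):
    """Derivative of V with respect to the action a."""
    return -2.0 * (a - 3.0)


def f(theta, z):
    """Hypothetical decision model a = theta * z."""
    return theta * z


def mc_objective(theta, samples):
    """Monte Carlo estimate of E[V(f_theta(z))]."""
    return sum(V(f(theta, z)) for z in samples) / len(samples)


def mc_gradient(theta, samples):
    """Chain rule: dV/dtheta = V'(f_theta(z)) * df/dtheta, and df/dtheta = z."""
    return sum(dV(f(theta, z)) * z for z in samples) / len(samples)


def gradient_ascent(samples, theta0=0.0, gamma=0.05, steps=500):
    """theta_k = theta_{k-1} + gamma * (Monte Carlo gradient at theta_{k-1})."""
    theta = theta0
    for _ in range(steps):
        theta += gamma * mc_gradient(theta, samples)
    return theta


def integer_search(samples, candidates):
    """Exhaustive search when theta is restricted to a finite set of integers."""
    return max(candidates, key=lambda t: mc_objective(t, samples))


rng = random.Random(0)
# Stand-in for draws z_n ~ p(z | x); a real posterior sampler would go here.
samples = [rng.gauss(2.0, 0.5) for _ in range(2000)]

theta_hat = gradient_ascent(samples)
theta_int = integer_search(samples, range(-5, 6))
print("continuous theta:", theta_hat)
print("integer theta:", theta_int)
```

The same fixed set of samples is reused at every step, so the objective being climbed is deterministic. Exhaustive search only works for tiny candidate sets, which is why the methods in the list above are needed once the search space grows.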

- Examples of decision functions -- from finance:
- We have estimated the returns of a set of $n$ assets for the next time period. Construct an optimal portfolio weighting of the assets that maximizes expected alpha and minimizes the expected deviation of the portfolio's beta from the market beta of 1. Denote our portfolio's returns by $r_p$ and the appropriate market index's returns by $r_m$.
- Write $r_p = \alpha + r_m \beta + \varepsilon$ and take expectations, using $E[\varepsilon] = 0$. Substituting the OLS solution for beta, $\beta = \text{Cov}(r_p, r_m) / \text{Var}(r_m)$, the problem to solve is $$ \max \alpha \text{ subject to } |\beta - 1| < \text{beta tolerance} $$
- Our decision model $a = f_\theta$ outputs a vector $a$ that lies on the probability simplex in $\mathbb{R}^n$ (the standard $(n-1)$-simplex) -- i.e., $\sum_i a_i = 1$ and each $a_i \geq 0$, so $a$ is a probability vector. In other words, given that we have $D$ dollars to allocate, we will allocate $a_i D$ dollars to the $i$-th asset.
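- Substituting all of our notation and using our usual trick of including the constraint in the objective function as a penalty, the problem becomes $$ \min_\theta E_{z_{t + 1}\sim p(z_{t + 1} | x_t,...)} \left[ -\left( f_\theta(z_t) - r_m \frac{\text{Cov}(f_\theta(z_t), r_m)}{ \text{Var}(r_m)} \right) + \lambda \left| \frac{\text{Cov}(f_\theta(z_t), r_m)}{ \text{Var}(r_m)} - 1 \right|\right] $$ Here $f_\theta(z_t)$ stands in for the portfolio return $r_p$ produced by the weights $a = f_\theta$, so the term in parentheses is $\alpha = r_p - r_m \beta$. It enters with a minus sign because maximizing $\alpha$ is equivalent to minimizing $-\alpha$, and $\lambda > 0$ sets how heavily deviations of $\beta$ from 1 are penalized.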
- Side note: this is a terrifically hard problem. The most difficult bit is what we've assumed away -- that we have an accurate and precise model for the returns for a set of $n$ assets over the next time period.
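
A rough sketch of the portfolio problem above, under heavy assumptions. Simulated return scenarios stand in for posterior draws of next-period returns, a softmax maps unconstrained parameters $\theta$ onto the simplex, and a simple random search minimizes $-\alpha + \lambda|\beta - 1|$, i.e. maximizes alpha with a penalty on beta drifting from 1. The "true" alphas and betas, the noise levels, `lam`, and the search settings are all hypothetical; this is not a recommended trading procedure.

```python
import math
import random


def softmax(theta):
    """Map unconstrained parameters to weights with a_i >= 0 and sum a_i = 1."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def alpha_beta(weights, asset_returns, market_returns):
    """OLS beta = Cov(r_p, r_m) / Var(r_m); alpha = E[r_p] - beta * E[r_m]."""
    r_p = [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]
    beta = cov(r_p, market_returns) / cov(market_returns, market_returns)
    alpha = mean(r_p) - beta * mean(market_returns)
    return alpha, beta


def penalized_loss(theta, asset_returns, market_returns, lam=0.05):
    """Objective to minimize: -alpha + lam * |beta - 1|."""
    alpha, beta = alpha_beta(softmax(theta), asset_returns, market_returns)
    return -alpha + lam * abs(beta - 1.0)


rng = random.Random(1)
# Hypothetical per-asset alphas and betas used only to simulate scenarios.
true_alpha = [0.001, 0.003, -0.001]
true_beta = [0.6, 1.4, 1.0]
market = [rng.gauss(0.005, 0.04) for _ in range(500)]
assets = [
    [a + b * r_m + rng.gauss(0.0, 0.01) for a, b in zip(true_alpha, true_beta)]
    for r_m in market
]

# Random search: start from equal weights, keep any perturbation that improves.
best_theta = [0.0, 0.0, 0.0]
best_loss = penalized_loss(best_theta, assets, market)
for _ in range(300):
    candidate = [t + rng.gauss(0.0, 0.5) for t in best_theta]
    loss = penalized_loss(candidate, assets, market)
    if loss < best_loss:
        best_theta, best_loss = candidate, loss

best_weights = softmax(best_theta)
print("weights:", best_weights)
print("alpha, beta:", alpha_beta(best_weights, assets, market))
```

The softmax is one convenient way to keep $a$ on the simplex without explicit constraints. Random search is used here only because the absolute-value penalty is not differentiable at $\beta = 1$.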

- Examples of decision functions -- from statistics:
- Problems in statistics can also be formulated this way, and often are in the framework of empirical risk minimization. Your statistical estimator is $f_\theta(z)$ and you're trying to minimize the expected value of a *risk function* $R(y, \hat y)$ where $\hat y = f_\theta(z)$ -- you're solving $$ \min_\theta \int\ dz\ R(y, f_\theta(z))\ p(z|x) $$
- Two examples -- one classic (from Elements of Statistical Learning, I think) and another that's relevant to some of my interests.