Probabilistic programming uses tools of programming languages to encode expressive probabilistic models of data generating processes and to perform statistical inference – inferring latent or unobserved quantities – using those models. Many models of behavior encode an a priori unbounded number of latent random variables. Such models are known as open universe models. For example, in a measurement and signature intelligence (MASINT) application, a generative model might encode the possibility of observing arbitrarily many emitters of radiation; corresponding inference problems might be to infer, based on observed evidence, how many emitters of radiation were observed, how strong the 85th percentile radiation emitter was, or the most likely direction from which the emitters originated.
Performing statistical inference over open-universe models is a challenging task. The task is made harder in an environment that imposes severe constraints on available computation. Today, a distributed network of low-powered sensor nodes is rarely, if ever, equipped with technology that enables statistical inference on the sensors themselves – let alone technology that enables reasoning over models containing arbitrarily many emitters of radiation! One reason is that such sensors are often built using very small microcontrollers that operate without many of the creature comforts of modern computing environments, such as an operating system and heap memory. Enabling complex statistical inference on such sensors could transform intelligence collection, predictive maintenance, and other fields of vital importance to defense, intelligence, and commercial operations of many types (e.g., oil and gas) in austere environments.
In this short document I discuss a few related topics that point the way toward realizing this goal.
We can (almost!) segment probabilistic programs as follows:

- Closed universe (CU) programs, in which the set of latent random variables – and hence the memory needed to represent an execution of the program – is fixed and known at compile time.
- Weak open universe (WOU) programs, in which the number of latent random variables is not known at compile time, but becomes known at runtime once finitely many previously sampled values have been drawn.
- Strong open universe (SOU) programs, in which the number of latent random variables is unbounded: whether another random variable is sampled depends on the values sampled so far, as with a stochastic stopping rule.
Examples of each of these types of programs are easy to create. In what follows, we display example probabilistic programs written in fmcs. A simple example of a CU program is a normal model:
double norm(record_t<DTypes<Normal, Gamma>>& r, double obs) {
    auto loc = sample(r, "loc", Normal(0, 5), rng);
    auto scale = sample(r, "scale", Gamma(2, 2), rng);
    return observe(r, "value", Normal(loc, scale), obs);
}
A path of random length is a simple example of a WOU program:
unsigned path(record_t<DTypes<Poisson, Normal>>& r, unsigned mean) {
    auto num = sample(r, "num", Poisson(mean), rng);
    double loc = 0.0, scale = 1.0;
    for (unsigned t = 0; t != num; ++t) {
        loc = sample(r, "pl/" + std::to_string(t), Normal(loc, scale), rng);
    }
    return num;
}
A very simple – and instructive – SOU program is the generative form of a Geometric distribution:
unsigned geom(record_t<DTypes<Categorical>>& r, double p) {
    unsigned n = 0, s = 0;
    while (s < 1) {
        s = sample(r, "choice/" + std::to_string(n), Categorical({p, 1.0 - p}), rng);
        ++n;
    }
    return n;
}
CU, WOU, and SOU programs do not cover the space of probabilistic programs. For example, imagine a program that chooses to execute geom with probability q or to execute path with probability 1 − q. We say that a program has the q-strong open universe (q-SOU) property if it can be determined at compile time that, with probability 1 − q, the subprogram it executes does not exhibit the SOU property.
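For concreteness, here is a minimal sketch of such a program, written in the same fmcs style as the examples above. It assumes that geom and path have been written against a shared record type (they are declared against narrower record types above), so treat it as illustrative rather than as compiling fmcs code:

unsigned mixed(record_t<DTypes<Categorical, Poisson, Normal>>& r, double q) {
    // With probability q take the SOU branch, otherwise the WOU branch.
    auto branch = sample(r, "branch", Categorical({q, 1.0 - q}), rng);
    if (branch == 0) {
        return geom(r, 0.5);  // SOU subprogram, executed with probability q
    }
    return path(r, 3);        // WOU subprogram, executed with probability 1 - q
}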
How likely is it for a SOU program to terminate? Obviously this question is unanswerable for an arbitrary program; however, there are simple probabilistic arguments that are applicable to many examples. Suppose that, at each address in a SOU program, the probability that the value sampled at that address terminates the program is bounded below by a constant p. The probability that the program has terminated by the time values have been sampled at n addresses is then bounded below by \(1 - (1 - p)^n\), approaching 1 at an exponential rate as n becomes large. Indeed, for there to be a non-vanishing probability of non-termination at every n, the probability of termination at the n-th sampled address must tail off extremely steeply. To make this precise, drop the constant lower bound and let p(k) denote the probability that the value sampled at the k-th address terminates the program. Supposing that Q ∈ (0, 1) is the limiting probability of termination of the SOU program, a similar argument gives the limiting probability of termination as
\[Q = 1 - \prod_{k=1}^\infty (1-p(k)),\]
by which we mean
\[
\sum_{k=1}^\infty \log (1 - p(k)) = \log(1-Q),
\]
where we are making the assumption that the infinite series converges (which requires, in particular, that p(k) < 1 for all k).
Taylor expanding the right-hand side as \(\log(1 - Q) = -\sum_{k=1}^\infty Q^k / k\) and matching the k-th terms of the two series, we find that
\[
p(k) = 1-\exp(-Q^k/k),
\]
which should be true in the limit (though, of course, this scaling behavior may be violated for any k less than some fixed constant).
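As a quick sanity check of this scaling (our own illustration, not part of fmcs or opencyan), the following standalone snippet sums \(\log(1 - p(k)) = -Q^k / k\) over the first fifty addresses for Q = 0.9 and prints the probability of termination within those addresses, which is already very close to Q:

#include <cmath>
#include <cstdio>

int main() {
    const double Q = 0.9;
    double log_survive = 0.0;  // running sum of log(1 - p(k)) = -Q^k / k
    for (unsigned k = 1; k <= 50; ++k) {
        log_survive += -std::pow(Q, k) / k;
    }
    // 1 - exp(sum) is the probability of terminating within the first 50 addresses.
    std::printf("P(terminate within 50 addresses) = %f (Q = %f)\n",
                1.0 - std::exp(log_survive), Q);
    return 0;
}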
From the analysis just presented we can guess that most q-SOU and even SOU programs that we are likely to write are extraordinarily likely to terminate rapidly. From a practical perspective, this means that we may not need to resort to effectively unbounded heap memory in order to sample from – and thus perform inference over – such programs. A practical implementation will require our solving a few related problems, all of which boil down to a fundamental question: how can we maintain a finite-size representation of a program state that could grow without bound?
We outline a methodology for performing inference over and querying SOU programs using only automatic and static allocation. This is feasible – even without arena-based coroutines – while imposing a mild constraint on the form of the query. The methodology is based on further decomposing the inference process into three parts:

- Collection: as the program executes, the values (and log probabilities) sampled at a bounded set of addresses of interest are recorded into a bounded-size collection.
- View updating: after each complete execution of the program, the collection is folded into a bounded-size running summary of the posterior view.
- Emission: whenever an estimate is needed, the view of the posterior is computed from that summary alone.
This decomposition extends the two-part decomposition introduced in fmcs. The process for drawing one sample of the view of the posterior of the program is displayed in the diagram below.
There are two restrictions placed on this process to guarantee operation using stack memory only:

- The posterior query must be computable through a running summary whose size is bounded above at compile time.
- Addresses must have a fixed size known at compile time, so that recording and comparing them never requires dynamic allocation.
In other words, the space of posterior queries is limited to those that can be computed using a sequence of query graphs \(\{G_n\}_{1 \le n \le N}\) with \(\max_n |G_n|\) bounded above at compile time, and with the following constraint. Suppose there is a mapping \(f\) such that \(V_N = f(\{G_n\}_{1 \le n \le N})\) for any \(N\), where \(V\) is the desired view of the posterior and \(V_N\) is the \(N\)-th empirical estimate of it. Then there must exist mappings \(g\) and \(h\) such that \(V_n = h(U_n)\), where \(U_n = g(U_{n-1}, G_n)\) and with \(\max_n |U_n|\) bounded above at compile time.
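As a concrete worked instance of this constraint (our own example), consider estimating the posterior mean of a scalar value recorded at a single address via likelihood weighting. If the n-th execution of the program yields the value \(x_n\) at that address with likelihood weight \(w_n\), take \(G_n = (x_n, w_n)\) and \(U_n = (S_n, W_n)\) with
\[
U_n = g(U_{n-1}, G_n) = (S_{n-1} + w_n x_n,\; W_{n-1} + w_n),
\qquad
V_n = h(U_n) = \frac{S_n}{W_n}.
\]
Both \(G_n\) and \(U_n\) have sizes fixed at compile time no matter how many samples are drawn; this is exactly the pattern that the Mean view below implements.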
opencyan
To demonstrate that performing inference over SOU programs using only static and automatic allocation is feasible, we implemented opencyan, a proof-of-concept probabilistic programming language. While definitely not intended for production use, it is fully functional and concise (299 lines of C++17 code). Interested readers can view the entire library and integration tests at the link in the previous sentence. We overview several key features of the library here.
The implementation of collection and view updating simultaneously addresses the finite-size query representation and address size restriction. Collector structs are parameterized over the address type – delegating to the user the responsibility of ensuring that addresses do not dynamically allocate – and over implementations of the collection strategy. Collector structs, by themselves, default to only enabling storage of the log probabilities of the latent random variables log p(z) and of the observed random variables log p(x|z):
template<template<typename> class Impl, typename Address>
struct Collector {
    Impl<Address> impl;
    double log_latent;
    double log_likelihood;
    Collector(Impl<Address> impl) : impl(impl), log_latent(0.0), log_likelihood(0.0) {}
    void operator()(std::optional<Address> a, Value v, Dist d) {
        impl(a, v, d);
    }
    void clear() {
        log_latent = 0.0;
        log_likelihood = 0.0;
        impl.clear();
    }
};
What is probably the simplest Impl of a Collector simply records the value at a single address:
template<typename Address>
struct GrabOne {
    Address addr;
    MaybeValue val;
    GrabOne(Address address) : addr(address), val(std::nullopt) {}
    void operator()(std::optional<Address> a, Value v, Dist d) {
        if (a.has_value()) {
            if (a.value() == addr) {
                val = v;
            }
        }
    }
    void clear() {
        val = std::nullopt;
    }
};
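As a brief usage sketch (assuming, for illustration, that addresses are plain unsigned integers, which keeps them free of dynamic allocation):

// A collector whose Impl records the value sampled at address 3.
GrabOne<unsigned> grab(3u);
Collector<GrabOne, unsigned> coll(grab);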
At this point, you may be wondering why the Impl handles a std::optional<Address> rather than an Address. opencyan treats all but a finite number of random variables as untraced randomness; when passing the result of a sampled random variable into an Impl, it may be that such randomness does not have an address. Concomitantly, there are two related sample methods – one for dealing with randomness that is addressable and one for dealing with randomness that is not addressable:
template<template<typename> class Impl, typename Address>
Value sample(Collector<Impl, Address>& coll, Address a, Dist d, rngstate * rng, MaybeValue o = std::nullopt) {
    if (!o.has_value()) {
        Value v = std::visit(
            [&rng](auto the_dist){ return Value(the_dist.sample(rng)); },
            d
        );
        coll.log_latent += std::visit(
            [&v](auto the_dist){ return the_dist.logprob(v); },
            d
        );
        coll(a, v, d);
        return v;
    } else {
        coll.log_likelihood += std::visit(
            [&o](auto the_dist){ return the_dist.logprob(o.value()); },
            d
        );
        coll(a, o.value(), d);
        return o.value();
    }
}
template<template<typename> class Impl, typename Address>
Value sample(Collector<Impl, Address>& coll, Dist d, rngstate * rng) {
    Value v = std::visit(
        [&rng](auto the_dist){ return Value(the_dist.sample(rng)); },
        d
    );
    coll.log_latent += std::visit(
        [&v](auto the_dist){ return the_dist.logprob(v); },
        d
    );
    coll(std::nullopt, v, d);
    return v;
}
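To make the calling convention concrete, here is the geometric program from earlier sketched against this collector-based API. The way Value wraps sampled values and the exact form of the Categorical alternative of Dist are assumptions on our part, so consult the opencyan source for the authoritative version:

// Geometric program, opencyan-style sketch: each addressed draw goes through
// the collector, which accumulates log p(z) as a side effect.
template<template<typename> class Impl>
unsigned geom(Collector<Impl, unsigned>& coll, double p, rngstate * rng) {
    unsigned n = 0, s = 0;
    while (s < 1) {
        Value v = sample(coll, n, Categorical({p, 1.0 - p}), rng);
        s = std::get<unsigned>(v);  // assumes Categorical draws are stored as unsigned
        ++n;
    }
    return n;
}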
This design is very similar to that of fmcs; compare the syntax of the first implementation of sample here with that of the top-level definition of sampling in fmcs:
template<typename D, typename RNG, typename... Ts>
DSType<D> sample(record_t<DTypes<Ts...>>& r, std::string address, D dist, RNG& rng) {
    if (r.interp.empty()) return _sample_with_node_interp<D, RNG, Ts...>(r, address, dist, rng); // \todo when is this empty?
    return std::visit(
        overloaded(
            [&r, &address, &dist, &rng](RecordStandard) { return _sample_with_node_interp<D, RNG, Ts...>(r, address, dist, rng); },
            [&r, &address, &dist, &rng](auto) { return _sample_with_record_interp<D, RNG, Ts...>(r, address, dist, rng); }
        ),
        r.interp.front()
    );
}
Constructing a view in opencyan is also similar in syntax and semantics to doing so in fmcs. An instance of a struct receives a collection of variables extracted from an instance of the program along with a possibly unnormalized posterior weight of the collection. After every sample from the program, it computes \(U_n = g(U_{n-1}, G_n)\); then, when .emit() is called, it computes and returns \(V_n = h(U_n)\). A very simple example of this computes the online mean of a single address:
template<template<typename> class CollImpl, typename Address, typename> struct Mean;

template<template<typename> class CollImpl, typename Address>
struct Mean<CollImpl, Address, ViewParams<>> {
    Address addr;
    double value;
    double weight;
    Mean(Address addr) : addr(addr), value(0.0), weight(0.0) {}
    void operator()(Collector<CollImpl, Address>& c, unsigned n) {
        if (addr == c.impl.addr) {
            if (c.impl.val.has_value()) {
                value += std::exp(c.log_likelihood) * std::get<double>(c.impl.val.value());
                weight += std::exp(c.log_likelihood);
            }
        }
    }
    double emit() {
        return value / weight;
    }
};
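A brief usage sketch shows how these pieces line up with the mappings g and h from before (run_program is a hypothetical stand-in for one forward execution of the user's program; see the source and integration tests for the real inference loop):

// Track the posterior mean of the value sampled at address 0.
Collector<GrabOne, unsigned> coll(GrabOne<unsigned>(0u));
Mean<GrabOne, unsigned, ViewParams<>> mean(0u);
// For each of the N forward executions:
//   coll.clear();
//   run_program(coll);              // hypothetical user program
//   mean(coll, n);                  // U_n = g(U_{n-1}, G_n)
// double estimate = mean.emit();    // V_N = h(U_N)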
A slightly more complicated example creates an online histogram of the maximum value of the addresses sampled from a SOU program:
template<template<typename> class CollImpl, typename AddressType, unsigned N>
struct AddressHist<CollImpl, AddressType, Dimension<N>> {
    std::array<AddressType, N> bins;
    std::array<unsigned, N> values;
    AddressHist(std::array<AddressType, N> bins) : bins(bins) {
        for (unsigned ix = 0; ix != N; ++ix) values[ix] = 0;
    }
    void operator()(Collector<CollImpl, AddressType>& c, unsigned n) {
        AddressType left = bins[0];
        AddressType right = bins[N - 1];
        AddressType mid;
        AddressType val = c.impl.addr;
        while (left < right) {
            mid = (left + right) / 2;
            if (bins[mid] < val) {
                left += 1;
            } else {
                right = mid;
            }
        }
        values[left] += 1;
    }
    std::pair<std::array<AddressType, N>, std::array<unsigned, N>> emit() {
        return std::pair(bins, values);
    }
};
For more details on how all these pieces work together, as well as the definition of the (very simple) inference method, you should look at the source code.
There are numerous shortcomings of the library. Aside from the fact that it's in no way a robust, production-ready implementation (e.g., it is awfully trusting of users of a raw pointer), its core flaw is that it doesn't easily enable implementation of inference methods that do anything other than sample the program forward. While this does enable importance sampling via likelihood weighting – which is the inference method we implemented – that's not very sample efficient. A very useful extension of this library would implement the ability to halt and resume execution of probabilistic programs, thereby enabling other forms of importance sampling, variational inference, and Markov chain Monte Carlo methods. This is possible in C++ while only automatically allocating memory by overloading operator new in the coroutine's promise type, or while using only statically allocated memory by overloading operator new to draw from a statically allocated memory arena. Another version of this library could eschew C++ altogether and re-implement the above design in Rust, whose coroutines are state machines with compile-time-known size that need not live on the heap.
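To illustrate the coroutine route, here is a minimal C++20 sketch of the general mechanism (not code from opencyan; it ignores alignment of individual frames, arena exhaustion, and reclamation entirely): the frame of any coroutine whose promise type defines operator new is carved out of a statically allocated arena rather than the heap.

#include <coroutine>
#include <cstddef>

alignas(std::max_align_t) static std::byte arena[4096];  // statically allocated storage
static std::size_t arena_offset = 0;                      // bump-pointer allocator state

struct task {
    struct promise_type {
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
        // Coroutine frames for this promise type come from the arena, so a
        // halt-and-resume probabilistic program need never touch the heap.
        static void* operator new(std::size_t n) {
            void* p = arena + arena_offset;
            arena_offset += n;  // no bounds or alignment handling in this sketch
            return p;
        }
        static void operator delete(void*, std::size_t) { /* never reclaimed in this sketch */ }
    };
};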