<h1>Thread-safe Environment Variable Mutation: Working Draft 2022-15</h1>
<p>Zack Weinberg, <a href="https://www.owlfolio.org/">Owl’s Portfolio</a>. Published 2022-11-15, updated 2022-11-27.</p>
<p>This is a draft proposal for changes to the POSIX specification for
environment variables (including both the various C library functions
for access to environment variables, and the underlying data structure).
The goal is to make it possible for multithreaded programs to modify
<q>the environment</q> (the set of environment variables, with their
values) safely.</p>
<h2 id="background">Background</h2>
<blockquote>
<p>This proposal was inspired by the lengthy discussion of
thread-related limitations of the environment variable API here: <a class="uri" href="https://internals.rust-lang.org/t/synchronized-ffi-access-to-posix-environment-variable-functions/15475">https://internals.rust-lang.org/t/synchronized-ffi-access-to-posix-environment-variable-functions/15475</a>.
An earlier version was posted at <a class="uri" href="https://research.owlfolio.org/scratchpad/threadsafe-env-v0.md">https://research.owlfolio.org/scratchpad/threadsafe-env-v0.md</a>
almost a year ago.</p>
</blockquote>
<p><q>The environment</q> is a set of key-value pairs (key and value are
both strings) supplied to each Unix process by its parent (via
<code>execve</code>). These typically contain small pieces of
information related to the user’s session and its configuration, such as
the preferred UI language and the search path for command-line programs.
The C library provides functions for looking up the value of a key
(<code>getenv</code>), establishing a new key-value pair or changing the
value associated with an existing key (<code>putenv</code>,
<code>setenv</code>), deleting a key (<code>unsetenv</code>), and
clearing the environment entirely (<code>clearenv</code>). All of these
functions existed, in some form, long before the addition of threading
to the POSIX standards, and therefore thread safety was not a concern in
their design.</p>
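<p>As a point of reference, the sketch below exercises the existing functions in a single-threaded setting, using only calls that POSIX specifies (<code>clearenv</code> is left out because it is a common extension rather than part of POSIX). The helper name <code>env_demo</code> and the variable <code>GREETING</code> are invented for illustration.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: walks one variable through its lifecycle.
   Returns 0 if every step behaved as expected, -1 otherwise. */
static int env_demo(void)
{
    /* Create, or overwrite because the third argument is nonzero. */
    if (setenv("GREETING", "hello", 1) != 0)
        return -1;
    const char *v = getenv("GREETING");   /* points into the live environment */
    if (v == NULL || strcmp(v, "hello") != 0)
        return -1;

    /* putenv installs the caller's string itself, not a copy, so the
       string must stay valid while it is part of the environment. */
    static char entry[] = "GREETING=goodbye";
    if (putenv(entry) != 0)
        return -1;
    v = getenv("GREETING");
    if (v == NULL || strcmp(v, "goodbye") != 0)
        return -1;

    /* Delete the variable. */
    if (unsetenv("GREETING") != 0)
        return -1;
    return getenv("GREETING") == NULL ? 0 : -1;
}
```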
<p>Modern C libraries include internal locking, sufficient to prevent
the global data structure that holds the environment from being
corrupted by concurrent operations, as long as all accesses go via the
above functions. However, several race conditions still exist for a
multithreaded <em>application</em> that modifies the environment. The
most important of these is that <code>getenv</code> returns a pointer to
a C-string which is part of the live data structure. A call to
<code>putenv</code>, <code>setenv</code>, <code>unsetenv</code>, or
<code>clearenv</code> from another thread may modify or deallocate that
string, racing with <em>the application’s use of its contents.</em> Some
C libraries provide a <code>getenv_r</code> which addresses this race by
copying the string that <code>getenv</code> would return into a
caller-supplied buffer before releasing the internal lock.
Unfortunately, the only way the application can know how big to make the
buffer is by guessing and enlarging the buffer if the call fails.</p>
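<p>To make the buffer-sizing problem concrete, here is a sketch of the retry loop an application ends up writing. Because <code>getenv_r</code> is not universally available, the block defines a hypothetical stand-in, <code>getenv_r_standin</code>, assumed to follow NetBSD-style semantics (return -1 and set <code>errno</code> to <code>ERANGE</code> when the buffer is too small). Unlike a real <code>getenv_r</code>, the stand-in does not hold any lock while copying, so it only illustrates the calling pattern.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for getenv_r: copy NAME's value into BUF.
   Returns 0 on success; -1 with errno = ENOENT if NAME is unset, or
   errno = ERANGE if BUF is too small.  No locking is done here. */
static int getenv_r_standin(const char *name, char *buf, size_t len)
{
    const char *v = getenv(name);
    if (v == NULL) {
        errno = ENOENT;
        return -1;
    }
    size_t need = strlen(v) + 1;
    if (need > len) {
        errno = ERANGE;
        return -1;
    }
    memcpy(buf, v, need);
    return 0;
}

/* The guess-and-grow loop: start with a guess and double the buffer
   on ERANGE.  Returns a malloc'd copy of the value, or NULL if the
   variable is unset or memory runs out. */
static char *getenv_copy(const char *name)
{
    size_t len = 16;                      /* arbitrary initial guess */
    for (;;) {
        char *buf = malloc(len);
        if (buf == NULL)
            return NULL;
        if (getenv_r_standin(name, buf, len) == 0)
            return buf;
        int err = errno;                  /* save before free() */
        free(buf);
        if (err != ERANGE || len > SIZE_MAX / 2)
            return NULL;
        len *= 2;
    }
}
```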
<p>Another important hole in the thread-safety of the existing API is
the global variable <code>environ</code>, which holds a pointer to the
actual underlying data structure. This variable is accessible to
applications—it has to be, because normal usage is to supply it as the
third argument to <code>execve</code>—but the associated lock object is
<em>not</em> accessible, so any <em>use</em> of this variable in a
multithreaded program (e.g. to iterate over the entire environment)
could race with changes to the environment by another thread. (Note that
one typically calls <code>execve</code> in a child process that has just
been created by <code>fork</code>, which duplicates only the calling
thread and makes the entire address space copy-on-write; <em>in this
context</em>, using <code>environ</code> as the third argument to
<code>execve</code> is safe.)</p>
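<p>The one context the paragraph above calls safe can be sketched as follows: fork, then pass <code>environ</code> to <code>execve</code> in the child. The helper <code>run_with_environ</code> is hypothetical, and the sketch assumes <code>/bin/sh</code> exists.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run "/bin/sh -c SCRIPT" with the current environment and return its
   exit status, or -1 on error.  In the child, only the forking thread
   exists, so nothing else can be modifying the environment there. */
static int run_with_environ(const char *script)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const args[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", args, environ);
        _exit(127);                       /* execve returns only on failure */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```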
<p>(As you might expect of an API that dates all the way back to Unix
Version 7, the data pointed to by <code>environ</code> is the simplest
structure that could possibly work: an unsorted array of pointers to C
strings, which are expected to be in the format <code>VAR=VALUE</code>,
with a null pointer for an end-of-list sentinel.)</p>
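<p>Looking a name up in that structure amounts to a linear scan that compares the <code>VAR=</code> prefix of each entry. A sketch, with the hypothetical function name <code>envp_lookup</code>; it takes the array as a parameter so it works on <code>environ</code> or on any array with the same layout.</p>

```c
#include <stddef.h>
#include <string.h>

/* Look NAME up in an environ-style array: an unsorted, NULL-terminated
   array of "VAR=VALUE" strings.  Returns a pointer to VALUE inside the
   first matching entry, or NULL if there is none. */
static const char *envp_lookup(char *const *envp, const char *name)
{
    size_t n = strlen(name);
    for (; *envp != NULL; envp++) {
        /* Require "NAME=" exactly, so "PATH" does not match "PATHEXT=...". */
        if (strncmp(*envp, name, n) == 0 && (*envp)[n] == '=')
            return *envp + n + 1;
    }
    return NULL;
}
```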
<p>The current state of affairs is that C library maintainers have
declared that any process that currently hosts more than one thread must
treat the environment as read-only or else risk catastrophic
malfunctions (e.g. corruption of the <code>malloc</code> arena). This
might not <em>seem</em> to be a serious problem, since the most common
reason to <em>want</em> to modify the environment is to tweak settings
at startup time. However, there is no way for one module of a large
program (perhaps assembled from many libraries, maintained by different
groups of people) to know whether some other module has already started
some threads.</p>
<p>Runtimes for languages more managed than C (e.g. Java) sometimes
choose to copy the entire environment into a data structure they control
on startup. This allows them to provide thread safety for all operations
on environment variables from within the language, but it means that
changes are not visible to any code running in the same process that is
not written in the language (e.g. for Java, third-party libraries used
via the <q>native code</q> interface) and also incurs extra startup
costs.</p>
<p>The goal of this proposal is to lay out a combination of new C
library functions, and changes to existing functions and rules, that
will enable programs to read and write environment variables in a
thread-safe manner, accommodating everything application programmers
might reasonably want to do.</p>
<h2 id="design-constraints">Design Constraints</h2>
<p>I have written this proposal with the overarching goal of minimizing
required changes to application code. In particular, programs that
use <code>getenv</code> to access individual environment variables, but
never modify environment variables, should continue to work unmodified,
and programs that use <code>environ</code> <em>solely</em> as an
argument to <code>exec</code> and/or <code>spawn</code> functions
(<code>execve</code>, <code>execle</code>, <code>posix_spawnp</code>,
etc.) should also continue to work unmodified. To the maximum extent
possible, <em>single-threaded</em> programs should also continue to work
unmodified, no matter what they do to the environment, or how they do
it.</p>
<p>If possible, there should be no new startup costs for programs that
do not use the environment at all, whether single- or multithreaded.</p>
<h2 id="changes-to-existing-apis">Changes to Existing APIs</h2>
<p>We can get most of the way to a thread-safe environment by making the
following changes to the specifications of existing APIs:</p>
<ol type="1">
<li><p>Require <code>getenv</code>, <code>setenv</code>,
<code>putenv</code>, <code>unsetenv</code>, and <code>clearenv</code> to
be thread-safe, codifying the internal locks that already exist. They
remain async-signal unsafe.</p></li>
<li><p>Declare it to be thread-<em>unsafe</em>, but otherwise
legitimate, to inspect the data pointed to by <code>environ</code>. That
is, a program that directly accesses this data, but does not modify it,
has well-defined behavior if and only if there is only one thread in the
process at the time, or the program supplies its own locking which makes
inspection mutually exclusive with calls to functions that modify the
environment.</p>
<p>Corollary: <code>execve(program, args, environ)</code> is safe on the
child side of <code>fork</code>, but not necessarily otherwise.</p></li>
<li><p>Require the implementation to ensure that strings returned by
<code>getenv</code> will remain allocated for the lifetime of the
process, and will not change after <code>getenv</code> returns.</p></li>
<li><p>Forbid the application to modify the <code>environ</code> global,
or any of the data it transitively points to, by any means other than
the documented set of environment variable access APIs (including any
implementation extensions). Violation of this rule causes the program to
have undefined behavior, as the C standard uses that term. (POSIX
already forbids direct modification of the data pointed to; the new
requirement is to not modify the pointer itself.)</p>
<p>This is the opposite side of the coin from change #3; without it,
there is no way for the implementation to guarantee that strings
returned by <code>getenv</code> will remain valid and immutable for the
lifetime of the process.</p>
<p>As a matter of quality of implementation (QoI),
<em>single-threaded</em> programs (programs that <em>never</em> create a
second thread) that alter environment data directly should continue to
work unmodified.</p>
<p>Corollary: The application is now forbidden to modify or deallocate
any string it has passed to <code>putenv</code>. (I’m calling this out
because the current specification of <code>putenv</code> explicitly says
that modifications are <em>allowed</em>.)</p>
<p>Corollary: The third argument to <code>main</code> is equal to
<code>environ</code> upon program startup, therefore modification of the
data it points to is also forbidden. (I’m calling this out because it
might not be obvious that both pointers point to the same
data.)</p></li>
</ol>
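<p>Change 2 allows direct inspection of <code>environ</code> in a multithreaded program only under locking that the program supplies itself. A minimal sketch, assuming a hypothetical application-wide mutex (<code>app_env_lock</code>) that every module honors by modifying the environment only through wrappers such as <code>app_setenv</code>; all three names are invented for illustration.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

extern char **environ;

/* Hypothetical application-wide lock.  It only helps if every call that
   modifies the environment, in every module, goes through a wrapper
   that takes it. */
static pthread_mutex_t app_env_lock = PTHREAD_MUTEX_INITIALIZER;

static int app_setenv(const char *name, const char *value)
{
    pthread_mutex_lock(&app_env_lock);
    int r = setenv(name, value, 1);
    pthread_mutex_unlock(&app_env_lock);
    return r;
}

/* Inspect environ directly by counting its entries.  Holding the same
   lock makes this mutually exclusive with app_setenv.  environ may be
   NULL on some C libraries after the environment is cleared. */
static size_t app_count_env(void)
{
    size_t n = 0;
    pthread_mutex_lock(&app_env_lock);
    for (char **p = environ; p != NULL && *p != NULL; p++)
        n++;
    pthread_mutex_unlock(&app_env_lock);
    return n;
}
```

<p>The sketch is only as strong as its least disciplined caller: a single library calling <code>setenv</code> directly, bypassing the wrapper, reintroduces the race.</p>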
<h2 id="new-apis">New APIs</h2>
<p>The changes described in the previous section have two major
limitations.</p>
<p>First, any string returned by <code>getenv</code> must now remain
allocated and immutable for the lifetime of the process, which means
that this loop</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a> <span class="cf">for</span> <span class="op">(</span><span class="dt">int</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span> i <span class="op"><</span> <span class="dv">1000</span><span class="op">;</span> i<span class="op">++)</span> <span class="op">{</span></span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a> <span class="dt">char</span> buf<span class="op">[</span><span class="dv">5</span><span class="op">];</span></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a> snprintf<span class="op">(</span>buf<span class="op">,</span> <span class="kw">sizeof</span> buf<span class="op">,</span> <span class="st">"%d"</span><span class="op">,</span> i<span class="op">);</span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a> setenv<span class="op">(</span><span class="st">"VARIABLE"</span><span class="op">,</span> buf<span class="op">,</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a> use<span class="op">(</span>getenv<span class="op">(</span><span class="st">"VARIABLE"</span><span class="op">));</span></span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p>must leak 999 copies of a string of the form
<code>VARIABLE=nnn</code>. This is a rare usage pattern <em>except</em>
for shells, where the difference between an environment variable and a
shell-language variable may be blurred or nonexistent. Shells may well
choose to manage their own data structure for variables, but it would be
nice to give them the <em>option</em> of using the C library’s built-ins
without suffering an unavoidable memory leak. Also, language runtimes
that need to copy strings returned by <code>getenv</code> for their own
reasons (e.g. to convert from the locale’s encoding to UTF-8, or because
their notion of a <q>string</q> has to have its lifetime managed by a
garbage collector) should not have to keep the other copy around
forever.</p>
<p>Second, there is no thread-safe way to iterate over all the
environment variables, and the new restrictions on <code>environ</code>
mean there is no longer any supported way to replace the entire set of
variables atomically.</p>
<p>These restrictions can only be lifted by introducing new APIs. I
believe the minimal set of additions is one opaque type and six
functions, as described below. Their names are tentative.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="co">/** Retrieve the value of an environment variable named NAME.</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a><span class="co"> * If the variable is found, returns a pointer to a string of the</span></span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a><span class="co"> * form NAME=value. (Caller must skip over the `NAME=` part to get</span></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a><span class="co"> * at the value.) If not found, returns NULL.</span></span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a><span class="co"> * Unlike getenv(), the string returned by this function is *not*</span></span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a><span class="co"> * permanently allocated. However, it will remain allocated at least</span></span>
<span id="cb2-9"><a aria-hidden="true" href="#cb2-9" tabindex="-1"></a><span class="co"> * until it is passed to env_release(), putenv(), or (transitively)</span></span>
<span id="cb2-10"><a aria-hidden="true" href="#cb2-10" tabindex="-1"></a><span class="co"> * env_replace_all(). Caller may not modify the string.</span></span>
<span id="cb2-11"><a aria-hidden="true" href="#cb2-11" tabindex="-1"></a><span class="co"> */</span></span>
<span id="cb2-12"><a aria-hidden="true" href="#cb2-12" tabindex="-1"></a><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>env_lookup<span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>name<span class="op">);</span></span>
<span id="cb2-13"><a aria-hidden="true" href="#cb2-13" tabindex="-1"></a></span>
<span id="cb2-14"><a aria-hidden="true" href="#cb2-14" tabindex="-1"></a><span class="co">/** Release a reference to an environment string.</span></span>
<span id="cb2-15"><a aria-hidden="true" href="#cb2-15" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb2-16"><a aria-hidden="true" href="#cb2-16" tabindex="-1"></a><span class="co"> * It is incorrect to use this function on a string that was not</span></span>
<span id="cb2-17"><a aria-hidden="true" href="#cb2-17" tabindex="-1"></a><span class="co"> * returned by either env_lookup() or env_next() (see below).</span></span>
<span id="cb2-18"><a aria-hidden="true" href="#cb2-18" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb2-19"><a aria-hidden="true" href="#cb2-19" tabindex="-1"></a><span class="co"> * It is also incorrect to use this function more than once per time a</span></span>
<span id="cb2-20"><a aria-hidden="true" href="#cb2-20" tabindex="-1"></a><span class="co"> * string was returned (that is, for any given char* there must be</span></span>
<span id="cb2-21"><a aria-hidden="true" href="#cb2-21" tabindex="-1"></a><span class="co"> * exactly one call to env_release() per env_lookup()/env_next()).</span></span>
<span id="cb2-22"><a aria-hidden="true" href="#cb2-22" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb2-23"><a aria-hidden="true" href="#cb2-23" tabindex="-1"></a><span class="co"> * If you pass a string that was returned by env_lookup() or</span></span>
<span id="cb2-24"><a aria-hidden="true" href="#cb2-24" tabindex="-1"></a><span class="co"> * env_next() to putenv() or (transitively) env_replace_all(),</span></span>
<span id="cb2-25"><a aria-hidden="true" href="#cb2-25" tabindex="-1"></a><span class="co"> * that implicitly causes a call to env_release() for that string.</span></span>
<span id="cb2-26"><a aria-hidden="true" href="#cb2-26" tabindex="-1"></a><span class="co"> */</span></span>
<span id="cb2-27"><a aria-hidden="true" href="#cb2-27" tabindex="-1"></a><span class="dt">void</span> env_release<span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>var<span class="op">);</span></span></code></pre></div>
<p>These functions replace <code>getenv</code> for programs that
frequently modify the environment. This version of the loop shown
earlier</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a> <span class="cf">for</span> <span class="op">(</span><span class="dt">int</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span> i <span class="op"><</span> <span class="dv">1000</span><span class="op">;</span> i<span class="op">++)</span> <span class="op">{</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a> <span class="dt">char</span> buf<span class="op">[</span><span class="dv">5</span><span class="op">];</span></span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a> snprintf<span class="op">(</span>buf<span class="op">,</span> <span class="kw">sizeof</span> buf<span class="op">,</span> <span class="st">"%d"</span><span class="op">,</span> i<span class="op">);</span></span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a> setenv<span class="op">(</span><span class="st">"VARIABLE"</span><span class="op">,</span> buf<span class="op">,</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb3-5"><a aria-hidden="true" href="#cb3-5" tabindex="-1"></a></span>
<span id="cb3-6"><a aria-hidden="true" href="#cb3-6" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>v <span class="op">=</span> env_lookup<span class="op">(</span><span class="st">"VARIABLE"</span><span class="op">);</span></span>
<span id="cb3-7"><a aria-hidden="true" href="#cb3-7" tabindex="-1"></a> use<span class="op">(</span>v <span class="op">+</span> <span class="kw">sizeof</span> <span class="st">"VARIABLE="</span> <span class="op">-</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb3-8"><a aria-hidden="true" href="#cb3-8" tabindex="-1"></a> env_release<span class="op">(</span>v<span class="op">);</span></span>
<span id="cb3-9"><a aria-hidden="true" href="#cb3-9" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p>does not leak memory. This is also useful for language runtimes that
need to copy strings returned by <code>getenv</code> for their own
reasons; for example, Rust’s <code>std::env::var_os</code> could be
implemented like this:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode rust"><code class="sourceCode rust"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a><span class="kw">use</span> <span class="pp">std::ffi::</span><span class="op">{</span>CStr<span class="op">,</span> CString<span class="op">,</span> OsStr<span class="op">,</span> OsString<span class="op">};</span></span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a><span class="kw">use</span> <span class="pp">std::os::unix::ffi::</span><span class="op">{</span>OsStrExt<span class="op">,</span> OsStringExt<span class="op">};</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a></span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a><span class="kw">pub</span> <span class="kw">fn</span> var_os<span class="op"><</span>K<span class="op">:</span> <span class="bu">AsRef</span><span class="op"><</span>OsStr<span class="op">>></span>(key<span class="op">:</span> K) <span class="op">-></span> <span class="dt">Option</span><span class="op"><</span>OsString<span class="op">></span> <span class="op">{</span></span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a>    <span class="kw">let</span> kraw <span class="op">=</span> key<span class="op">.</span>as_ref()<span class="op">.</span>as_bytes()<span class="op">;</span></span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a>    <span class="kw">let</span> klen <span class="op">=</span> kraw<span class="op">.</span>len()<span class="op">;</span></span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a>    <span class="kw">let</span> k <span class="op">=</span> <span class="pp">CString::</span>new(kraw)<span class="op">.</span>ok()<span class="op">?;</span></span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a>    <span class="kw">unsafe</span> <span class="op">{</span></span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a>        <span class="co">// env_lookup and env_release are the proposed functions;</span></span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a>        <span class="co">// no C library (or libc crate) provides them today.</span></span>
<span id="cb4-11"><a aria-hidden="true" href="#cb4-11" tabindex="-1"></a>        <span class="kw">let</span> s <span class="op">=</span> <span class="pp">libc::</span>env_lookup(k<span class="op">.</span>as_ptr())<span class="op">;</span></span>
<span id="cb4-12"><a aria-hidden="true" href="#cb4-12" tabindex="-1"></a>        <span class="cf">if</span> s<span class="op">.</span>is_null() <span class="op">{</span></span>
<span id="cb4-13"><a aria-hidden="true" href="#cb4-13" tabindex="-1"></a>            <span class="cf">return</span> <span class="cn">None</span><span class="op">;</span></span>
<span id="cb4-14"><a aria-hidden="true" href="#cb4-14" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb4-15"><a aria-hidden="true" href="#cb4-15" tabindex="-1"></a>        <span class="co">// skip the "NAME=" prefix, then copy the value (to_vec copies)</span></span>
<span id="cb4-16"><a aria-hidden="true" href="#cb4-16" tabindex="-1"></a>        <span class="kw">let</span> v <span class="op">=</span> <span class="pp">OsString::</span>from_vec(</span>
<span id="cb4-17"><a aria-hidden="true" href="#cb4-17" tabindex="-1"></a>            <span class="pp">CStr::</span>from_ptr(s<span class="op">.</span>add(klen <span class="op">+</span> <span class="dv">1</span>))<span class="op">.</span>to_bytes()<span class="op">.</span>to_vec())<span class="op">;</span></span>
<span id="cb4-18"><a aria-hidden="true" href="#cb4-18" tabindex="-1"></a>        <span class="pp">libc::</span>env_release(s)<span class="op">;</span></span>
<span id="cb4-19"><a aria-hidden="true" href="#cb4-19" tabindex="-1"></a>        <span class="cn">Some</span>(v)</span>
<span id="cb4-20"><a aria-hidden="true" href="#cb4-20" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb4-21"><a aria-hidden="true" href="#cb4-21" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>Strings returned by <code>env_lookup</code> can also be released by
passing them to <code>putenv</code>; this facilitates <em>temporary</em>
modifications to the environment.</p>
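<div class="sourceCode" id="cb5"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>old_TZ <span class="op">=</span> env_lookup<span class="op">(</span><span class="st">"TZ"</span><span class="op">);</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a> setenv<span class="op">(</span><span class="st">"TZ"</span><span class="op">,</span> <span class="st">"Pacific/Samoa"</span><span class="op">,</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a> do_something_with_localtime<span class="op">();</span></span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>old_TZ<span class="op">)</span></span>
<span id="cb5-5"><a aria-hidden="true" href="#cb5-5" tabindex="-1"></a>     putenv<span class="op">((</span><span class="dt">char</span> <span class="op">*)</span>old_TZ<span class="op">);</span>  <span class="co">/* restores TZ and releases old_TZ */</span></span>
<span id="cb5-6"><a aria-hidden="true" href="#cb5-6" tabindex="-1"></a> <span class="cf">else</span></span>
<span id="cb5-7"><a aria-hidden="true" href="#cb5-7" tabindex="-1"></a>     unsetenv<span class="op">(</span><span class="st">"TZ"</span><span class="op">);</span>        <span class="co">/* TZ was not set before */</span></span></code></pre></div>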
<p>Of course the <em>effect</em> of this is global, and therefore it may
not be a sensible thing to do in a multithreaded program.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a><span class="co">/** Replace the entire environment, atomically.</span></span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb6-3"><a aria-hidden="true" href="#cb6-3" tabindex="-1"></a><span class="co"> * The `envp` argument must be an array of `VAR=value` strings,</span></span>
<span id="cb6-4"><a aria-hidden="true" href="#cb6-4" tabindex="-1"></a><span class="co"> * terminated by a NULL pointer, same as the third argument to</span></span>
<span id="cb6-5"><a aria-hidden="true" href="#cb6-5" tabindex="-1"></a><span class="co"> * `execve`. Additionally, `envp` must point to memory allocated by</span></span>
<span id="cb6-6"><a aria-hidden="true" href="#cb6-6" tabindex="-1"></a><span class="co"> * `malloc`, and each of the `VAR=value` strings must be either a</span></span>
<span id="cb6-7"><a aria-hidden="true" href="#cb6-7" tabindex="-1"></a><span class="co"> * string previously returned by `env_lookup` or `env_next` and not</span></span>
<span id="cb6-8"><a aria-hidden="true" href="#cb6-8" tabindex="-1"></a><span class="co"> * yet passed to `env_release`, or a fresh allocation made by</span></span>
<span id="cb6-9"><a aria-hidden="true" href="#cb6-9" tabindex="-1"></a><span class="co"> * `malloc`. The C library takes ownership of all the allocations</span></span>
<span id="cb6-10"><a aria-hidden="true" href="#cb6-10" tabindex="-1"></a><span class="co"> * reachable from `envp`, and performs the equivalent of `env_release`</span></span>
<span id="cb6-11"><a aria-hidden="true" href="#cb6-11" tabindex="-1"></a><span class="co"> * for all strings brought over from the old environment.</span></span>
<span id="cb6-12"><a aria-hidden="true" href="#cb6-12" tabindex="-1"></a><span class="co"> */</span></span>
<span id="cb6-13"><a aria-hidden="true" href="#cb6-13" tabindex="-1"></a><span class="dt">void</span> env_replace_all<span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">**</span>envp<span class="op">);</span></span></code></pre></div>
<p>This function replaces direct assignment to <code>environ</code>, which change #4 above forbids.</p>
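<p>A sketch of preparing an argument that meets <code>env_replace_all</code>’s ownership rules: the array and every string are allocated with <code>malloc</code> (<code>strdup</code> allocates with <code>malloc</code>). The helper <code>build_envp</code> is hypothetical, and since no C library implements <code>env_replace_all</code> yet, the call to it appears only in a comment.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Build a NULL-terminated envp array in which the array itself and
   every "VAR=value" string come from malloc.  On allocation failure,
   frees whatever was allocated and returns NULL. */
static char **build_envp(const char *const *pairs, size_t n)
{
    char **envp = malloc((n + 1) * sizeof *envp);
    if (envp == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        envp[i] = strdup(pairs[i]);
        if (envp[i] == NULL) {
            while (i > 0)
                free(envp[--i]);
            free(envp);
            return NULL;
        }
    }
    envp[n] = NULL;                       /* end-of-list sentinel */
    return envp;
}

/* Intended use, with the proposed (not yet existing) function:
 *
 *     static const char *const pairs[] = { "PATH=/usr/bin:/bin", "LANG=C" };
 *     char **envp = build_envp(pairs, 2);
 *     if (envp)
 *         env_replace_all((const char **)envp);   // library now owns envp
 */
```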
<div class="sourceCode" id="cb7"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a><span class="co">/** Iterator over the environment. */</span></span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a><span class="kw">typedef</span> <span class="kw">struct</span> ENV_ITER ENV_ITER<span class="op">;</span> <span class="co">/* opaque; contents unspecified */</span></span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a></span>
<span id="cb7-4"><a aria-hidden="true" href="#cb7-4" tabindex="-1"></a><span class="co">/** Begin an iteration over environment variables.</span></span>
<span id="cb7-5"><a aria-hidden="true" href="#cb7-5" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb7-6"><a aria-hidden="true" href="#cb7-6" tabindex="-1"></a><span class="co"> * This function behaves as-if it takes an atomic snapshot of the</span></span>
<span id="cb7-7"><a aria-hidden="true" href="#cb7-7" tabindex="-1"></a><span class="co"> * environment. That is, concurrent modifications to the environment</span></span>
<span id="cb7-8"><a aria-hidden="true" href="#cb7-8" tabindex="-1"></a><span class="co"> * during an iteration will *not* be visible through the iteration.</span></span>
<span id="cb7-9"><a aria-hidden="true" href="#cb7-9" tabindex="-1"></a><span class="co"> * It is unspecified whether such modifications are visible to</span></span>
<span id="cb7-10"><a aria-hidden="true" href="#cb7-10" tabindex="-1"></a><span class="co"> * getenv() or env_lookup() on *any* thread.</span></span>
<span id="cb7-11"><a aria-hidden="true" href="#cb7-11" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb7-12"><a aria-hidden="true" href="#cb7-12" tabindex="-1"></a><span class="co"> * It is implementation-defined whether the object returned by this</span></span>
<span id="cb7-13"><a aria-hidden="true" href="#cb7-13" tabindex="-1"></a><span class="co"> * function can be used from a different thread than the one that</span></span>
<span id="cb7-14"><a aria-hidden="true" href="#cb7-14" tabindex="-1"></a><span class="co"> * called this function.</span></span>
<span id="cb7-15"><a aria-hidden="true" href="#cb7-15" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb7-16"><a aria-hidden="true" href="#cb7-16" tabindex="-1"></a><span class="co"> * A thread that is iterating over all environment variables may not</span></span>
<span id="cb7-17"><a aria-hidden="true" href="#cb7-17" tabindex="-1"></a><span class="co"> * call any function that modifies the environment. Violations of</span></span>
<span id="cb7-18"><a aria-hidden="true" href="#cb7-18" tabindex="-1"></a><span class="co"> * this rule do not cause undefined behavior but may cause deadlock.</span></span>
<span id="cb7-19"><a aria-hidden="true" href="#cb7-19" tabindex="-1"></a><span class="co"> */</span></span>
<span id="cb7-20"><a aria-hidden="true" href="#cb7-20" tabindex="-1"></a>ENV_ITER <span class="op">*</span>env_iter<span class="op">(</span><span class="dt">void</span><span class="op">);</span></span>
<span id="cb7-21"><a aria-hidden="true" href="#cb7-21" tabindex="-1"></a></span>
<span id="cb7-22"><a aria-hidden="true" href="#cb7-22" tabindex="-1"></a><span class="co">/** Each time this function is called, it returns a string of the form</span></span>
<span id="cb7-23"><a aria-hidden="true" href="#cb7-23" tabindex="-1"></a><span class="co"> * VAR=value, representing one environment variable, and advances the</span></span>
<span id="cb7-24"><a aria-hidden="true" href="#cb7-24" tabindex="-1"></a><span class="co"> * iteration to the next one. It returns NULL when all of the</span></span>
<span id="cb7-25"><a aria-hidden="true" href="#cb7-25" tabindex="-1"></a><span class="co"> * environment variables have been returned.</span></span>
<span id="cb7-26"><a aria-hidden="true" href="#cb7-26" tabindex="-1"></a><span class="co"> *</span></span>
<span id="cb7-27"><a aria-hidden="true" href="#cb7-27" tabindex="-1"></a><span class="co"> * Each of the strings returned by this function must be released</span></span>
<span id="cb7-28"><a aria-hidden="true" href="#cb7-28" tabindex="-1"></a><span class="co"> * (by passing it to `env_release`, `putenv`, or `env_replace_all`)</span></span>
<span id="cb7-29"><a aria-hidden="true" href="#cb7-29" tabindex="-1"></a><span class="co"> * when the caller is done with it. Note that calls to `putenv` and/or</span></span>
<span id="cb7-30"><a aria-hidden="true" href="#cb7-30" tabindex="-1"></a><span class="co"> * `env_replace_all` must be deferred until after the iteration is complete.</span></span>
<span id="cb7-31"><a aria-hidden="true" href="#cb7-31" tabindex="-1"></a><span class="co"> */</span></span>
<span id="cb7-32"><a aria-hidden="true" href="#cb7-32" tabindex="-1"></a><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>env_next<span class="op">(</span>ENV_ITER <span class="op">*</span>iter<span class="op">);</span></span>
<span id="cb7-33"><a aria-hidden="true" href="#cb7-33" tabindex="-1"></a></span>
<span id="cb7-34"><a aria-hidden="true" href="#cb7-34" tabindex="-1"></a><span class="co">/** End an iteration over environment variables.</span></span>
<span id="cb7-35"><a aria-hidden="true" href="#cb7-35" tabindex="-1"></a><span class="co"> * The iterator object need not have been advanced over all of the variables.</span></span>
<span id="cb7-36"><a aria-hidden="true" href="#cb7-36" tabindex="-1"></a><span class="co"> */</span></span>
<span id="cb7-37"><a aria-hidden="true" href="#cb7-37" tabindex="-1"></a><span class="dt">void</span> env_iter_close<span class="op">(</span>ENV_ITER <span class="op">*</span>iter<span class="op">);</span></span></code></pre></div>
<p>These functions provide a thread-safe way to iterate over the
environment. They are modeled on <code>opendir</code>/<code>readdir</code>/<code>closedir</code>.</p>
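<p>The deadlock warning on <code>env_iter</code> suggests one way to get the snapshot behavior: hold a read lock from <code>env_iter</code> until <code>env_iter_close</code>, so that writers wait until the iteration ends. The sketch below models that idea with a hypothetical <code>pthread_rwlock_t</code> standing in for the C library’s internal lock. All of its names are invented, it assumes every modification goes through <code>sketch_setenv</code>, and it hands out <code>malloc</code>’d copies (released with <code>free</code>) in place of reference-counted strings.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical stand-in for the C library's internal lock.  The sketch
   is only correct if every modification goes through sketch_setenv. */
static pthread_rwlock_t env_rwlock = PTHREAD_RWLOCK_INITIALIZER;

static int sketch_setenv(const char *name, const char *value)
{
    pthread_rwlock_wrlock(&env_rwlock);   /* writers are exclusive */
    int r = setenv(name, value, 1);
    pthread_rwlock_unlock(&env_rwlock);
    return r;
}

typedef struct { char **pos; } SKETCH_ITER;

/* Like opendir: the read lock is held until sketch_iter_close, so no
   writer can change the environment mid-iteration. */
static SKETCH_ITER *sketch_iter(void)
{
    SKETCH_ITER *it = malloc(sizeof *it);
    if (it == NULL)
        return NULL;
    pthread_rwlock_rdlock(&env_rwlock);
    it->pos = environ;
    return it;
}

/* Like readdir: returns a malloc'd copy of the next "VAR=value" string
   (the caller frees it, standing in for env_release), or NULL at the
   end of the list or if strdup fails. */
static char *sketch_next(SKETCH_ITER *it)
{
    if (it->pos == NULL || *it->pos == NULL)
        return NULL;
    return strdup(*it->pos++);
}

/* Like closedir.  Calling sketch_setenv on this thread between
   sketch_iter and sketch_iter_close can deadlock, matching the
   warning in the env_iter comment. */
static void sketch_iter_close(SKETCH_ITER *it)
{
    pthread_rwlock_unlock(&env_rwlock);
    free(it);
}
```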
<h2 id="implementation-notes">Implementation notes</h2>
<p>This section describes one possible way to implement the proposed
changes.</p>
<p>The C library internally maintains a reader-writer lock that protects
the array pointed to by <code>environ</code>. All of the above functions
take this lock in the appropriate mode: shared for functions that only
read the environment, exclusive for functions that modify it.</p>
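<p>A minimal sketch of that locking discipline, written as wrappers around the existing <code>getenv</code> and <code>setenv</code>. The names <code>env_lock</code>, <code>locked_getenv_copy</code> and <code>locked_setenv</code> are hypothetical. Because this code lives outside the C library, it only protects callers that go through it; the proposal puts the lock inside libc precisely so that every caller is covered.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the C library's internal environ lock. */
static pthread_rwlock_t env_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Read path: take the lock shared, so many readers can proceed at once.
   The value is copied before unlocking, because after the lock is
   released a writer may replace (and, in general, free) the string.
   The caller frees the result. */
char *locked_getenv_copy(const char *name)
{
    char *copy = NULL;

    pthread_rwlock_rdlock(&env_lock);
    const char *val = getenv(name);
    if (val != NULL)
        copy = strdup(val);
    pthread_rwlock_unlock(&env_lock);
    return copy;
}

/* Write path: take the lock exclusive, so no reader can observe the
   environ array while it is being modified. */
int locked_setenv(const char *name, const char *value, int overwrite)
{
    pthread_rwlock_wrlock(&env_lock);
    int rc = setenv(name, value, overwrite);
    pthread_rwlock_unlock(&env_lock);
    return rc;
}
```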
<p>The C library also maintains a table of ancillary data for each
environment variable. It is protected by the same lock as the
<code>environ</code> array, and created on the first call to any
environment-variable access function. The ancillary data includes,
perhaps among other things, a reference count which is incremented by
<code>env_lookup</code> and <code>env_next</code> and decremented by
<code>env_release</code>. This reference count has a special sentinel
value (probably represented as <code>(T)-1</code> for some type T) which
means <em>either</em> that the <code>VAR=value</code> string for that
variable has not been changed from what it was on program startup (and
therefore the string is in the <q>information block</q> created by
<code>execve</code>, rather than the <code>malloc</code> heap)
<em>or</em> that the variable has been read by a call to
<code>getenv</code> and therefore the <code>VAR=value</code> string can
no longer be deallocated.</p>
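<p>One way the reference count and its sentinel might look, as a sketch under the assumptions above. <code>REFS_PINNED</code> plays the role of <code>(T)-1</code> with <code>T</code> taken to be <code>size_t</code>; the struct and function names are hypothetical, and the lock is assumed to be held by the caller.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Sentinel: the string is still in the execve information block, or
   has been handed out by getenv. Either way it must never be freed. */
typedef size_t env_refcount;
#define REFS_PINNED ((env_refcount)-1)

struct env_ancillary {
    const char *str;     /* the VAR=value string this record describes */
    env_refcount refs;   /* live references, or REFS_PINNED */
};

/* Called by env_lookup and env_next. */
void anc_acquire(struct env_ancillary *a)
{
    if (a->refs != REFS_PINNED)
        a->refs++;
}

/* Called by env_release. Returns true if the string may now be freed. */
bool anc_release(struct env_ancillary *a)
{
    if (a->refs == REFS_PINNED)
        return false;
    return --a->refs == 0;
}

/* Called by getenv: its result has no matching release call, so the
   string becomes permanently pinned. */
void anc_pin(struct env_ancillary *a)
{
    a->refs = REFS_PINNED;
}
```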
<p>The ancillary table maintains entries for strings that have been
removed from the environment, until their reference counts drop to zero.
This means it has to be indexed by the address of the
<code>VAR=value</code> string, rather than by offset in the
<code>environ</code> array.</p>
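<p>To illustrate why the key must be the string's address, here is a sketch of a small open-addressed table keyed by pointer. The fixed size, the hash, and all names are illustrative assumptions; a real table would grow, and it would also need deletion (with tombstones, under linear probing), which is omitted here.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define ANC_SLOTS 64   /* power of two; hypothetical fixed capacity */

struct anc_entry {
    const char *key;   /* address of the VAR=value string; NULL = empty */
    size_t refs;
};

static struct anc_entry anc_table[ANC_SLOTS];

/* Hash the pointer value itself, not the string contents: two
   distinct strings with equal text need distinct records. */
static size_t anc_hash(const char *p)
{
    uintptr_t v = (uintptr_t)p;
    return (size_t)((v >> 4) ^ (v >> 12)) & (ANC_SLOTS - 1);
}

/* Find the record for STR, creating an empty one if absent.
   Returns NULL if the table is full. */
struct anc_entry *anc_find_or_insert(const char *str)
{
    size_t i = anc_hash(str);
    for (size_t n = 0; n < ANC_SLOTS; n++, i = (i + 1) & (ANC_SLOTS - 1)) {
        if (anc_table[i].key == str)
            return &anc_table[i];
        if (anc_table[i].key == NULL) {
            anc_table[i].key = str;
            anc_table[i].refs = 0;
            return &anc_table[i];
        }
    }
    return NULL;
}
```

<p>Because nothing here depends on a string's position in <code>environ</code>, a record can outlive the string's removal from the array.</p>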
<p><code>setenv</code> and <code>putenv</code> can change an existing
<code>VAR=val</code> string, if and only if its reference count is 1
(i.e. the string is in the <code>malloc</code> heap <em>and</em> the
only live reference is from the <code>environ</code> array). Otherwise
<code>setenv</code> creates a new <code>VAR=val</code> string on the
heap and replaces the old entry in <code>environ</code> with it, and
<code>putenv</code> swaps in the string it was given.</p>
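<p>A sketch of the <code>setenv</code> side of that rule, assuming a hypothetical <code>struct env_slot</code> that pairs an entry in the library's private <code>environ</code> copy with its string's reference count. With a count of 1, the string may be reallocated and rewritten. Otherwise, a new string is built and swapped in, and the old one is left untouched for its other holders.</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REFS_PINNED ((size_t)-1)

/* Hypothetical: one environ entry plus its string's reference count. */
struct env_slot {
    char *str;     /* "VAR=value" */
    size_t refs;   /* 1 = heap string, only environ refers to it */
};

/* Returns 0 on success, -1 on allocation failure. */
int slot_set_value(struct env_slot *slot, const char *name, const char *value)
{
    size_t len = strlen(name) + 1 + strlen(value) + 1;

    if (slot->refs == 1) {
        /* Sole owner: nobody else can observe the change. */
        char *p = realloc(slot->str, len);
        if (p == NULL)
            return -1;
        snprintf(p, len, "%s=%s", name, value);
        slot->str = p;
        return 0;
    }

    /* Pinned or shared: environ drops its reference to the old string
       (its ancillary count would be decremented here), and a fresh
       heap string takes its place. */
    char *p = malloc(len);
    if (p == NULL)
        return -1;
    snprintf(p, len, "%s=%s", name, value);
    slot->str = p;
    slot->refs = 1;
    return 0;
}
```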
<p><code>environ</code> itself is copied into <code>malloc</code> space
on the first call to any function that modifies environment variables,
and possibly also by any function that <em>reads</em> environment
variables (e.g. so that it can be sorted).</p>
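<p>A minimal sketch of that copy, as the C library might do it internally on first modification. Only the pointer array moves to the heap; the <code>VAR=value</code> strings stay where they are. <code>env_make_private</code> and <code>environ_is_private</code> are hypothetical names, and the lock is assumed to be held exclusively.</p>

```c
#include <stdlib.h>

extern char **environ;

/* Hypothetical flag: nonzero once environ points at our own array. */
static int environ_is_private = 0;

/* Copy the environ pointer array into malloc space so that later
   additions and removals can resize it with realloc. Idempotent.
   Returns 0 on success, -1 on allocation failure. */
int env_make_private(void)
{
    if (environ_is_private)
        return 0;

    size_t n = 0;
    if (environ != NULL)
        while (environ[n] != NULL)
            n++;

    char **copy = malloc((n + 1) * sizeof *copy);
    if (copy == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        copy[i] = environ[i];   /* strings are shared, not duplicated */
    copy[n] = NULL;

    environ = copy;
    environ_is_private = 1;
    return 0;
}
```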
<p><code>env_iter</code> <em>may</em> take a read lock and hold it until
the matching call to <code>env_iter_close</code>. An implementation that
does this will be subject to all the dire caveats listed in the spec for
<code>env_iter</code>:</p>
<ul>
<li>The <code>ENV_ITER</code> object is tied to the thread that called
<code>env_iter</code>.</li>
<li>Modifications to the environment, concurrent with an iteration, are
not visible to <em>any</em> thread until the iteration finishes (because
the modification call will block until it can acquire a write
lock).</li>
<li>An attempt to modify the environment, from a thread that is
iterating over it, will deadlock.</li>
</ul>
<p>An alternative implementation is for <code>env_iter</code> to copy
the <code>environ</code> array to private storage (inside the
<code>ENV_ITER</code> object), and increment all the reference counts on
individual strings, before returning. This design would permit it to
return <em>without</em> holding a read lock, and thus avoid all of the
above issues, but it will have higher overhead, particularly in the case
where iterations are frequently cancelled before completion.</p>
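<p>A sketch of the snapshot design. Code outside libc cannot reach the ancillary reference counts, so this version duplicates each string instead of incrementing its count. That substitution changes the cost, not the caller-visible behavior: no lock is held between calls, and later changes to the environment do not disturb an iteration in progress. The <code>snap_*</code> names are hypothetical.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

extern char **environ;

struct snap_iter {
    char **items;   /* private copies of the VAR=value strings */
    size_t count;
    size_t next;
};

/* In a real C library this copy loop would run under the read lock. */
int snap_open(struct snap_iter *it)
{
    size_t n = 0;
    while (environ != NULL && environ[n] != NULL)
        n++;

    it->items = malloc((n ? n : 1) * sizeof *it->items);
    if (it->items == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        it->items[i] = strdup(environ[i]);
        if (it->items[i] == NULL) {
            while (i > 0)
                free(it->items[--i]);
            free(it->items);
            return -1;
        }
    }
    it->count = n;
    it->next = 0;
    return 0;
}

/* No lock needed: the iterator owns everything it hands out. */
const char *snap_next(struct snap_iter *it)
{
    return it->next < it->count ? it->items[it->next++] : NULL;
}

/* Safe to call whether or not the iteration ran to completion. */
void snap_close(struct snap_iter *it)
{
    for (size_t i = 0; i < it->count; i++)
        free(it->items[i]);
    free(it->items);
}
```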
<p>Optionally, as a safety measure, have the OS kernel make the
<q>information block</q> created by <code>execve</code> be read-only.
This means that several of the scenarios described as undefined
behavior, above, will lead to prompt memory protection exceptions as
long as the variable being tampered with has not been changed since
process startup. Note that this would also affect the command line
arguments and the ELF auxiliary vector, with potential negative fallout;
in particular, GNU <code>getopt</code> may permute the elements of
<code>argv</code>, which would crash, and it would no longer be possible
to erase the value of the <code>AT_RANDOM</code> auxv entry after using
it. (Probably <code>AT_RANDOM</code> should just be removed, since
everyone has <code>getrandom(2)</code> or equivalent now. I’m not sure
what to do about <code>getopt</code>.)</p>Corrected UTF-82022-11-14T09:55:02-05:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2022-11-14:/development/corrected-utf-8/<p>UTF-8 is decent and all but it contains some design errors, partly
because its original designers just messed up, and partly because of ISO
and Unicode Consortium internal politics. We’re probably going to be
using it forever so it would be good to correct these design errors
before they get any more entrenched than they already have.</p>
<p>Corrected UTF-8 is <em>almost</em> the same as UTF-8. We make only
three changes: overlength encodings become <em>impossible</em> instead
of just forbidden; the C1 controls and the Unicode <q>surrogate
characters</q> are not encoded; and the artificial upper limit on the
code space is removed.</p>
<p>The key words <q>MUST,</q> <q>MUST NOT,</q> <q>REQUIRED,</q>
<q>SHALL,</q> <q>SHALL NOT,</q> <q>SHOULD,</q> <q>SHOULD NOT,</q>
<q>RECOMMENDED,</q> <q>MAY,</q> and <q>OPTIONAL</q> in this document are
to be interpreted as described in <a href="https://datatracker.ietf.org/doc/html/rfc2119">RFC 2119</a>.</p>
<h2 id="eliminating-overlength-encodings">Eliminating overlength
encodings</h2>
<p>The possibility of overlength encodings is the design error in UTF-8
that’s just a plain old mistake. As originally specified, the codepoint
U+002F (SOLIDUS, <code>/</code>) could be encoded as the one-byte
sequence <code>2F</code>, or the two-byte sequence <code>C0 AF</code>,
or the three-byte sequence <code>E0 80 AF</code>, etc. <a href="https://capec.mitre.org/data/definitions/80.html">This led to
security holes</a> and so the specification was revised to say that a
UTF-8 encoder must produce the shortest possible sequence that can
represent a codepoint, and a decoder must reject any byte sequence
that’s longer than it needs to be.</p>
<p>Corrected UTF-8 instead adds offsets to the codepoints encoded by all
sequences of at least two bytes, so that every possible sequence is the
<em>unique</em> encoding of a single codepoint. For example, a two-byte
sequence, 110xxxxx 10yyyyyy, encodes the codepoint 0000 0xxx xxyy yyyy
<em>plus 160</em>; therefore, <code>C0 AF</code> becomes the unique
encoding of U+00CF (LATIN CAPITAL LETTER I WITH DIAERESIS,
<code>Ï</code>).</p>
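<p>The two-byte rule as a small C function. It extracts the 11 payload bits exactly as traditional UTF-8 does, then adds 160. <code>cutf8_decode2</code> is a hypothetical name, and the function assumes the caller has already checked the <code>110xxxxx 10yyyyyy</code> bit patterns.</p>

```c
#include <stdint.h>

/* Decode a Corrected UTF-8 two-byte sequence 110xxxxx 10yyyyyy. */
uint32_t cutf8_decode2(uint8_t lead, uint8_t cont)
{
    uint32_t raw = ((uint32_t)(lead & 0x1F) << 6) | (cont & 0x3F);
    return raw + 160;   /* C0 80 -> U+00A0, DF BF -> U+089F */
}
```

<p>For example, <code>cutf8_decode2(0xC0, 0xAF)</code> yields <code>0xCF</code>, matching the text above.</p>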
<h2 id="not-encoding-c1-controls-or-surrogates">Not encoding C1 controls
or surrogates</h2>
<p>The C1 control character range (U+0080 through U+009F) is included in
Unicode <a href="https://www.unicode.org/versions/Unicode12.0.0/ch23.pdf#page=3">primarily
for backward compatibility with ISO/IEC 2022</a>, an older character
encoding standard in which the <em>byte</em> ranges <code>00</code>
through <code>1F</code> and <code>7F</code> through <code>9F</code> are
reserved for control characters.</p>
<p>It is never appropriate to use the C1 controls in text meant for
interchange, as they are very likely to be misinterpreted according to one of
the DOS code pages that defined bytes <code>80</code> through
<code>9F</code> as graphic characters. Corrected UTF-8 skips over them
entirely; this is why the offset for two-byte sequences is 160 rather
than 128. (I would <em>like</em> to discard almost all of the C0
controls as well—preserving only U+0000 and U+000A—but that would break
ASCII compatibility, which is a step too far.) If there is a need to
represent U+0080 through U+009F, perhaps for round-tripping historical
documents, they can be mapped to some convenient private-use
codepoints.</p>
<p>Similarly, the only reason the surrogate space (U+D800 through
U+DFFF) exists is to support UTF-16. These codepoints will never appear
in well-formed Unicode text, and the current generation of the UTF spec
actually forbids the three-byte sequences <code>ED A0 80</code> through
<code>ED BF BF</code> from being emitted or accepted at all, rather like the
overlength sequences. In Corrected UTF-8, we skip this range just like
we do for the C1 controls. (This unfortunately does mean that the
three-byte sequences are split into two ranges with two different
offsets.) Again, programs that need to represent actual surrogates
(perhaps for the same reasons that motivated the creation of <a href="https://simonsapin.github.io/wtf-8/">WTF-8</a>) can map them into
private-use space.</p>
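<p>The split three-byte range can be sketched as follows. The boundary <code>0xCF60</code> is <code>0xD800 - 2208</code>, the raw payload that would otherwise land on the first surrogate. Above it, the offset grows by the 2048 skipped surrogates to 4256. The function name is hypothetical, and byte-pattern validation is left to the caller.</p>

```c
#include <stdint.h>

/* Decode a Corrected UTF-8 three-byte sequence
   1110xxxx 10yyyyyy 10zzzzzz. */
uint32_t cutf8_decode3(uint8_t b0, uint8_t b1, uint8_t b2)
{
    uint32_t raw = ((uint32_t)(b0 & 0x0F) << 12)
                 | ((uint32_t)(b1 & 0x3F) << 6)
                 | (b2 & 0x3F);
    /* Below the gap: U+08A0..U+D7FF. Above it: U+E000..U+1109F. */
    return raw < 0xCF60 ? raw + 2208 : raw + 4256;
}
```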
<h2 id="removing-the-artificial-upper-limit">Removing the artificial
upper limit</h2>
<p>The original design of UTF-8 (as <q>FSS-UTF,</q> by Pike and
Thompson; standardized in 1996 by <a href="https://datatracker.ietf.org/doc/html/rfc2044">RFC 2044</a>) could
encode codepoints up to U+7FFF FFFF. In 2003 the IETF changed the
specification (via <a href="https://datatracker.ietf.org/doc/html/rfc3629">RFC 3629</a>) to
disallow encoding any codepoint beyond U+10 FFFF. This was purely
because of internal ISO and Unicode Consortium politics; they rejected
the possibility of a future in which codepoints would exist that
UTF-<em>16</em> could not represent. UTF-16 is now obsolete, so there is
no longer any reason to stick to this upper limit, and at the present
rate of codepoint allocation, the space below U+10 FFFF will be
exhausted in something like 600 years (less if private-use space is not
reclaimed). Text encodings are forever; the time to avoid running out of
space is now, not 550 years from now.</p>
<p>Corrected UTF-8 reverts to the original definition of four-, five-,
and six-byte sequences from RFC 2044; after taking the offsets into
account, the highest encodable code point is U+8421 109F. The encoding
schema could be extended still further by use of the lead bytes
<code>FE</code> and <code>FF</code>, which RFC 2044 leaves undefined.
<code>FE</code> would begin a seven-byte sequence, and <code>FF</code>
would indicate that the unary count of tail bytes extends into the next
byte. <code>1111 1111 110x xxxx</code> would be the first <em>two</em>
bytes of an eight-byte sequence, <code>1111 1111 1110 xxxx</code> would
begin a <em>nine</em>-byte sequence, and so on; in this way the encoding
schema would not have any upper limit at all.</p>
<p>We are leaving that extension for the future, because the original
rationale for not using bytes <code>FE</code> and <code>FF</code>
(avoiding conflicts with UTF-16 byte order marks and Telnet IAC bytes)
is still <em>somewhat</em> relevant, even though both UTF-16 and Telnet
are obsolete. However, to preserve the <em>possibility</em> of longer
byte sequences being used in the future, Corrected UTF-8 decoders MUST
treat sequences beginning with <code>FE</code> or <code>FF</code> as
<q>reserved for future use</q> and as extending until the next
recognized lead byte, rather than as <q>invalid.</q></p>
<h2 id="putting-it-all-together">Putting it all together</h2>
<p>Here is a complete table of byte sequences up to 6 bytes long, with
their offsets and the codepoint ranges they encode. Byte and codepoint
values are shown in hexadecimal, offsets in decimal.</p>
<table style="width:100%;">
<colgroup>
<col style="width: 52%"/>
<col style="width: 13%"/>
<col style="width: 33%"/>
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Byte Sequence Range</th>
<th style="text-align: right;">Offset</th>
<th style="text-align: left;">Codepoint Range</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">00 … 7F</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">0000 0000 … 0000 007F</td>
</tr>
<tr class="even">
<td style="text-align: left;">C0 80 … DF BF</td>
<td style="text-align: right;">160</td>
<td style="text-align: left;">0000 00A0 … 0000 089F</td>
</tr>
<tr class="odd">
<td style="text-align: left;">E0 80 80 … EC BD 9F</td>
<td style="text-align: right;">2 208</td>
<td style="text-align: left;">0000 08A0 … 0000 D7FF</td>
</tr>
<tr class="even">
<td style="text-align: left;">EC BD A0 … EF BF BF</td>
<td style="text-align: right;">4 256</td>
<td style="text-align: left;">0000 E000 … 0001 109F</td>
</tr>
<tr class="odd">
<td style="text-align: left;">F0 80 80 80 … F7 BF BF BF</td>
<td style="text-align: right;">69 792</td>
<td style="text-align: left;">0001 10A0 … 0021 109F</td>
</tr>
<tr class="even">
<td style="text-align: left;">F8 80 80 80 80 … FB BF BF BF BF</td>
<td style="text-align: right;">2 166 944</td>
<td style="text-align: left;">0021 10A0 … 0421 109F</td>
</tr>
<tr class="odd">
<td style="text-align: left;">FC 80 80 80 80 80 … FD BF BF BF BF BF</td>
<td style="text-align: right;">69 275 808</td>
<td style="text-align: left;">0421 10A0 … 8421 109F</td>
</tr>
</tbody>
</table>
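<p>A table-driven decoder for sequences up to six bytes, written as a sketch from the offsets in the table. Every well-formed sequence names a distinct codepoint, so no overlength check is needed. The handling of <code>FE</code>/<code>FF</code> follows one literal reading of the <q>reserved for future use</q> rule: a reserved sequence runs until the next byte in <code>00</code>…<code>7F</code> or <code>C0</code>…<code>FD</code>. All names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets from the table, indexed by sequence length. For three-byte
   sequences this is the lower offset; the surrogate gap adds 2048. */
static const uint32_t cutf8_offset[7] = {
    0, 0, 160, 2208, 69792, 2166944, 69275808
};

enum { CUTF8_INVALID = -1, CUTF8_RESERVED = -2 };

/* Decode one codepoint from BUF (LEN bytes available) into *CP.
   Returns the sequence length 1..6, CUTF8_INVALID for a malformed or
   truncated sequence, or CUTF8_RESERVED for an FE/FF lead byte. */
int cutf8_decode(const uint8_t *buf, size_t len, uint32_t *cp)
{
    if (len == 0)
        return CUTF8_INVALID;

    uint8_t b0 = buf[0];
    int n;
    uint32_t raw;

    if (b0 < 0x80) {
        *cp = b0;                      /* 00..7F: offset 0 */
        return 1;
    } else if (b0 < 0xC0) {
        return CUTF8_INVALID;          /* stray continuation byte */
    } else if (b0 < 0xE0) {
        n = 2; raw = b0 & 0x1F;
    } else if (b0 < 0xF0) {
        n = 3; raw = b0 & 0x0F;
    } else if (b0 < 0xF8) {
        n = 4; raw = b0 & 0x07;
    } else if (b0 < 0xFC) {
        n = 5; raw = b0 & 0x03;
    } else if (b0 < 0xFE) {
        n = 6; raw = b0 & 0x01;
    } else {
        return CUTF8_RESERVED;         /* FE, FF */
    }

    if (len < (size_t)n)
        return CUTF8_INVALID;
    for (int i = 1; i < n; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return CUTF8_INVALID;
        raw = (raw << 6) | (buf[i] & 0x3F);
    }

    uint32_t off = cutf8_offset[n];
    if (n == 3 && raw >= 0xCF60)
        off += 2048;                   /* skip U+D800..U+DFFF */
    *cp = raw + off;
    return n;
}

/* Length of a reserved sequence starting at BUF[0] (an FE or FF byte):
   continuation bytes and further FE/FF bytes are swallowed until a
   recognized lead byte appears. */
size_t cutf8_reserved_len(const uint8_t *buf, size_t len)
{
    size_t i = 1;
    while (i < len && ((buf[i] >= 0x80 && buf[i] < 0xC0) || buf[i] >= 0xFE))
        i++;
    return i;
}
```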
<p>The eight-byte sequence EF B7 9D ED B2 AE 00 0A is defined as the
<q>magic number</q> signaling text using Corrected UTF-8. It SHOULD be
present at the beginning of any <em>file</em> encoded in Corrected
UTF-8, but need not be prepended to strings whose encoding is known by
other means. Like <q>byte order marks</q> in UTF-16, when it appears at
the beginning of a file, it should not be considered part of the
text.</p>
<p>This byte sequence is the Corrected encoding of the four-codepoint
sequence U+10E7D U+ED4E U+0000 U+000A. If interpreted as traditional
UTF-8, it instead encodes U+FDDD U+DCAE U+0000 U+000A, which is
forbidden on two counts: U+FDDD is a noncharacter and U+DCAE is a
surrogate (and an unpaired one at that). U+10E7D is <a href="http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3087-1.pdf">RUMI</a>
FRACTION ONE THIRD, and U+ED4E is the private use character assigned by
the <a href="http://www.kreativekorp.com/ucsur/">Under-ConScript Unicode
Registry</a> to <a href="https://norbertlindenberg.com/2018/03/niji-script/index.html">NIJI</a>
CONSONANT CH; these choices are largely arbitrary.</p>
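<p>The claim about the magic number can be checked mechanically with an encoder built from the same table. This is a sketch: <code>cutf8_encode</code> is a hypothetical name, and it returns 0 for values Corrected UTF-8 does not encode (C1 controls, surrogates, anything above U+8421 109F). Encoding U+10E7D U+ED4E U+0000 U+000A should produce exactly <code>EF B7 9D ED B2 AE 00 0A</code>.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Encode CP into OUT (room for 6 bytes). Returns bytes written, or 0. */
size_t cutf8_encode(uint32_t cp, uint8_t out[6])
{
    static const struct {
        uint32_t lo, hi, off;
        int n;
        uint8_t lead;
    } r[] = {
        { 0x0,       0x7F,       0,        1, 0x00 },
        { 0xA0,      0x89F,      160,      2, 0xC0 },
        { 0x8A0,     0xD7FF,     2208,     3, 0xE0 },
        { 0xE000,    0x1109F,    4256,     3, 0xE0 },
        { 0x110A0,   0x21109F,   69792,    4, 0xF0 },
        { 0x2110A0,  0x421109F,  2166944,  5, 0xF8 },
        { 0x42110A0, 0x8421109F, 69275808, 6, 0xFC },
    };

    for (size_t i = 0; i < sizeof r / sizeof r[0]; i++) {
        if (cp < r[i].lo || cp > r[i].hi)
            continue;
        uint32_t raw = cp - r[i].off;
        int n = r[i].n;
        if (n == 1) {
            out[0] = (uint8_t)raw;
            return 1;
        }
        /* Fill continuation bytes from the end, six bits at a time. */
        for (int k = n - 1; k > 0; k--) {
            out[k] = (uint8_t)(0x80 | (raw & 0x3F));
            raw >>= 6;
        }
        out[0] = (uint8_t)(r[i].lead | raw);
        return (size_t)n;
    }
    return 0;   /* C1 control, surrogate, or out of range */
}
```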
<h2 id="other-legacy-control-characters">Other legacy control
characters</h2>
<p>As mentioned above, the major reason why the C0 controls are still
encodable in Corrected UTF-8 is to preserve compatibility with ASCII,
which is still important. However, these characters are also largely
obsolete; the only one that should appear in a normal text file is
U+000A. The others’ functions are, nowadays, better handled by
binary-safe transport protocols and markup languages, or else they’re
simply redundant.</p>
<p>Because of the common use of the lone byte <code>00</code> as a
string terminator, U+0000 MUST NOT appear in a Corrected UTF-8 document
except as part of the <q>magic number</q> defined above. Corrected UTF-8
documents SHOULD conform to the Unix definition of a text file, which
means that U+000A is used by itself as a line terminator (NOT a line
separator; the last character in the file should be U+000A) and U+000D
and U+2028 SHOULD NOT appear. The other C0 controls, and additionally
U+2029 PARAGRAPH SEPARATOR, also SHOULD NOT appear.</p>I Didn’t Learn Unix By Reading All The Manpages2022-10-13T21:34:10-04:002022-12-01T10:17:57-05:00Zack Weinbergtag:www.owlfolio.org,2022-10-13:/research/i-didnt-learn-unix-by-reading-all-the-manpages/<blockquote>
<p>Originally drafted as <a href="https://hackers.town/@zwol/108936234680866181">a thread on
hackers.town</a>, after <a href="https://floss.social/@abbienormal">Abbie Normal</a> asked me to
expand on a side comment in <a href="https://hackers.town/@zwol/108861581410003388">a discussion of
documentation</a>.</p>
</blockquote>
<p>There’s a story old Unix beards tell about how they learned Unix.
<q>We just read all the manpages,</q> they say, <q>that’s how well
written they are, you don’t need to read anything else or take any
classes. Maybe also pick up a copy of <a href="https://en.wikipedia.org/wiki/The_C_Programming_Language">K&R</a>
if you’re a little iffy on C.</q></p>
<p>I consider myself an old Unix beard, even though I don’t have a beard
and I only got into the game in the days of SunOS 4.1, and until quite
recently I thought this was how <em>I</em> learned Unix. I <em>did</em>
read all the manpages, without any formal coursework, and trained myself
up as a programmer to the point where I could get a job in the industry.
It took three years of self-study and experimentation, consuming nearly
all my free time, and in retrospect I wouldn’t recommend the experience,
but, y’know, it worked out, right?</p>
<p>But the thing is, this story completely neglects all the things I’d
already learned about computers and programming <em>before</em> I got to
college.</p>
<p>As a kid I read every single book in the house, indiscriminately, no
matter how boring it would seem to an adult—including a bunch of old
computer science textbooks that we had for some reason. And I spent a
bunch of time tinkering around with computer programming, mostly
<em>not</em> on Unix and <em>not</em> in C, but still. When I came to
the manpages I had the beginnings of a conceptual structure for
understanding systems programming in my brain already.</p>
<p>I realized this only because of the experience I’ve had over the past
two years in <em>teaching</em> computer science—specifically, CMU’s
<q><a href="https://www.cs.cmu.edu/~213/">Introduction to Computer
Systems</a></q> course, which, at first glance, <em>seems</em> to have a
lot of overlap with the content I used to think I learned from the
manpages. But as I repeat the lessons for each new batch of
undergraduates, and especially as I spend time helping them with the
specific things they get stuck on, I’ve come to realize that what I’m
actually teaching is the stuff I <em>already knew</em> when I
<em>started</em> reading the manpages. And, also, that I would not have
been able to get much out of the manpages if I hadn’t already known that
stuff.</p>
<p>So, okay, why does it <em>matter</em> that I didn’t learn my trade
the way I thought I did? Because, first, holding up <q>the manpages</q>
as ideal documentation is a mistake. They are pretty darn good
<em>reference</em> documentation within their domain, and that’s why
they appeal so much to experts: if you <em>already know</em> most of
what there is to know about a standard C library function, or a Unix
shell command, and you just need a bit of a reminder on how to do one
specific thing, the manpages will not let you down. Reference
documentation for other languages and tools is often frustrating by
comparison. But, if you <em>don’t</em> already know, if you don’t have
the concepts and the mental models, reference documentation is <em>not
what you want</em>. Instead you need a guide, or a textbook.</p>
<p>(I <em>did</em> read lots and lots of guides and textbooks, before,
during, and after those three years of self-study and experimentation.
I’d expect that <em>most</em> of the people who call themselves Unix
beards had done the same. In retrospect, I got so much more out of <q><a href="https://www.forth.com/starting-forth/">Starting FORTH</a></q> and
<q><a href="https://www.powells.com/book/tcpip-network-administration-9780596002978">TCP/IP
Network Administration</a></q> and the <a href="https://www.softwarepreservation.org/projects/LISP/book/Weismann_LISP1.5_Primer_1967.pdf">Lisp
1.5 Primer</a> and some 1975-vintage algorithms textbook whose name I
can’t remember (all the examples were in Pascal) than I ever got out of
the manpages.)</p>
<p>The people who insist that the manpages are all you need will
sometimes dismiss guide-type documentation as tedious to work through;
they’d rather learn things from a reference, they say, because that way
they can jump around in it and look for the specific bits that are
relevant to them right now. And that’s fine—if they’re right that the
stuff they’re skipping over <em>isn’t</em> relevant to them. But it also
has negative practical consequences.</p>
<p>If you are in the habit of reading only the bits of the
documentation, whatever documentation you have, that you think are
relevant right now, you’re liable to come away with a mental model
that’s only vaguely accurate; possibly dangerously <em>in</em>accurate
in places. (I think this is a lot of why the user commentary at the
bottom of the online PHP documentation is so full of bad advice.) And,
if you are only interested in reference documentation yourself, you’re
probably not going to try to <em>write</em> guides for the software you
yourself write. (This is how we get monstrosities like the Git
documentation, that are <em>only</em> of any use to someone who already
knows how it works and just needs a bit of a reminder.)</p>
<p>Furthermore—this isn’t just about bad documentation. When experts
repeat inaccurate stories about how <em>we</em> learned to code, we’re
setting the next generation of hackers up to <em>fail</em> to learn to
code. The <a href="http://www.catb.org/jargon/html/">Jargon File</a>,
which records how the generation of programmers <em>before</em> mine
thought they learned to code, holds up the experience of devoting all of
your free time to learning computers as <em>necessary</em>:</p>
<blockquote>
<p><a href="http://www.catb.org/jargon/html/L/larval-stage.html">larval
stage</a> <em>n.</em> Describes a period of monomaniacal concentration
on coding apparently passed through by all fledgling hackers…the ordeal
seems to be necessary to produce really wizardly (as opposed to merely
competent) programmers</p>
</blockquote>
<p>This may have <em>seemed</em> to be true when it was written
(although I believe I smell a variation of the <a href="https://en.wikipedia.org/wiki/Escalation_of_commitment">sunk cost
fallacy</a> at work) but it doesn’t leave any room for people who don’t
learn things by monomaniacally concentrating on them. (This is not the
only place where the Jargon File’s authors failed to imagine how people
whose brains worked differently than theirs could be any good at
computers.) I’m pretty confident that someone who practices programming
strictly as a hobby, less than ten hours a week, will eventually get
just as good at it as one of these <q>fledgling hackers</q> who doesn’t
do anything else with their spare time. And, if we <em>tell</em> people
that, they won’t get put off the subject by the unappetizing prospect of
not getting to hang out with friends on the weekends.</p>Strengths, weaknesses, opportunities, and threats facing the GNU Autotools2021-01-19T16:05:00-05:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2021-01-19:/development/autoconf-swot/<p>I’ve been a contributor to GNU projects for many years, notably both
GCC and GNU libc, and recently I led the effort to make the first
release of Autoconf since 2012 (<a href="https://lists.gnu.org/archive/html/autoconf/2020-12/msg00002.html">release
announcement for Autoconf 2.70</a>). For background and context, see <a href="https://lwn.net/Articles/834682/">the LWN article my colleague
Sumana Harihareswara of Changeset Consulting wrote</a>.</p>
<p>Autoconf not having made a release in eight years is a symptom of a
deeper problem. Many GNU projects, including all of the other components
of the Autotools (<a href="https://www.gnu.org/software/automake/">Automake</a>, <a href="https://www.gnu.org/software/libtool/">Libtool</a>, <a href="https://www.gnu.org/software/gnulib/">Gnulib</a>, etc.) and the
software they depend upon (<a href="https://www.gnu.org/software/m4/">GNU M4</a>, <a href="https://www.gnu.org/software/make/">GNU Make</a>, etc.) have seen
a steady decline in both contributor enthusiasm and user base over the
past decade. I include myself in the group of declining enthusiasts; I
would not have done the work leading up to the Autoconf 2.70 release if
I had not been paid to do it. (I would like to say thank you to the
project funders: Bloomberg, Keith Bostic, and the GNU Toolchain Fund of
the FSF.)</p>
<p>The Autotools are in particularly bad shape due to the decline in
contributor enthusiasm. Preparation for the Autoconf 2.70 release took
almost twice as long as anticipated; I made five beta releases between
July and December 2020, and merged 157 patches, most of them bugfixes.
On more than one occasion I was asked why I was going to the
trouble—isn’t Autoconf (and the rest of the tools by implication)
thoroughly obsolete? Why doesn’t everyone switch to something newer,
like <a href="https://cmake.org/">CMake</a> or <a href="https://mesonbuild.com/">Meson</a>? (See the comments on Sumana’s
LWN article for examples.)</p>
<p>I personally don’t think that the Autotools are obsolete, or even all
that much more difficult to work with than some of the alternatives, but
it <em>is</em> a fair question. Should development of the Autotools
continue? If they are to continue, we need to find people who have the
time and the inclination (and perhaps also the funding) to maintain them
steadily, rather than in six-month release sprints every eight years. We
also need a proper roadmap for where further development should take
these projects. As a starting point for the conversation about whether
the projects should continue, and what the roadmap should be, I was
inspired by Sumana’s book in progress on open source project management
(<a href="https://changeset.nyc/resources/getting-unstuck-sampler-offer.html">sample
chapters are available from her website</a>) to write up a <q>strengths,
weaknesses, opportunities, and threats</q> analysis of the Autotools.</p>
<p>This inventory can help us figure out how to build on new
opportunities, using the Autotools’ substantial strengths, and where to
invest to guard against threats and shore up current weaknesses.</p>
<p>Followup discussion should go to <a href="https://lists.gnu.org/mailman/listinfo/autoconf">the Autoconf
mailing list</a>.</p>
<h2 id="strengths">Strengths</h2>
<p>In summary: as the category leader for decades, the Autotools benefit
from their architectural approach, interoperability, edge case coverage,
standards adherence, user trust, and existing install base.</p>
<ul>
<li>Autoconf’s feature-based approach to compiled-code portability
scales better than lists of system quirks.</li>
<li>The Autotools carry 30+ years’ worth of embedded knowledge about
portability traps for C programs and shell-based build scripting on Unix
(and to a lesser extent Windows and others), including variants of Unix
that no other comparable configuration tool supports.</li>
<li>Autoconf and Automake support cross-compilation better than
competing build systems.</li>
<li>Autoconf and Automake support software written in multiple languages
better than some competing build systems (but see below).</li>
<li>Autoconf is very extensible, and there are lots of third-party
<q>macros</q> available.</li>
<li>Tarball releases produced by Autotools have fewer build dependencies
than tarball releases produced by competing tools.</li>
<li>Tarball releases produced by Autotools have a predictable,
standardized (literally; it’s a key aspect of the <q><a href="https://www.gnu.org/prep/standards/html_node/index.html">GNU
Coding Standards</a></q>) interface for setting build-time options,
building them, testing them, and installing them.</li>
<li>Automake tries very hard to generate Makefiles that will work with
<em>any</em> Make implementation, not just GNU make, and not even just
(GNU or BSD) make.</li>
<li>The Autotools have excellent reference-level documentation (better
than CMake’s and Meson’s).</li>
<li>As they are GNU projects, users can have confidence that Autotools
are and will always remain Free Software.</li>
<li>Relatedly, users can trust that architectural decisions are not
driven by the needs of particular large corporations.</li>
<li>There is a large installed base, and switching to a competing build
system is a lot of work.</li>
</ul>
<h2 id="weaknesses">Weaknesses</h2>
<p>In summary: Autoconf’s core function is to solve a problem that
software developers, working primarily in C, had in the 1990s/early
2000s (during the Unix wars). System programming interfaces have become
much more standardized since then, and the shell environment, much less
buggy. Developers of new code, today, looking at existing configure
scripts and documentation, cannot easily determine which of the
portability traps Autoconf knows about are still relevant to them.
Similarly, maintainers of older programs have a hard time knowing which
of their existing portability checks are still necessary. And weak
coordination with other Autotools compounds the issue.</p>
<h3 id="autoconf">Autoconf</h3>
<ul>
<li>Autoconf (and the rest of the Autotools) are written in a
combination of four old and difficult programming languages: Bourne
shell, the portable subset of Make, Perl, and M4. Competing build
systems tend to use newer, more ergonomic languages, which both makes it
easier for them to get things done, and makes it easier for them to
attract new developers.</li>
<li>All the supported languages except C and C++ are second-class
citizens.</li>
<li>The set of languages that are supported has no particular rationale.
Several new and increasingly popular compiled-code languages (e.g. Swift
and Rust) are not supported, while oddities like Erlang are.</li>
<li>Much of that 30 years’ worth of embedded knowledge about portability
traps is obsolete. There’s no systematic policy for deciding when some
problem is too obsolete to worry about anymore.</li>
<li>Support for <em>newer</em> platforms, C standard editions, etc. is
weaker than support for older things.</li>
<li>Autoconf’s extensibility is unsystematic; many of those third-party
macros reach into its guts, and do things that create awkward
compatibility constraints on core development. Many existing
<code>configure.ac</code> files do the same.</li>
<li>The code quality of third-party macros varies widely; bad
third-party macros reflect poorly on Autoconf proper.</li>
<li>Some of the ancillary tools distributed with Autoconf don’t work
well; most importantly, autoupdate (which is <em>supposed</em> to patch
a <code>configure.ac</code> to bring it in line with current Autoconf’s
recommendations) is so limited and unreliable that it might be better
not to have it at all.</li>
<li>Feature gaps in GNU M4 hold back development of Autoconf.</li>
</ul>
<h3 id="the-autotools-as-a-whole">The Autotools as a whole</h3>
<ul>
<li>There are few active developers and no continuing funders.</li>
<li>GNU project status discourages new contributors because of the
paperwork requirements and the perceived lack of executive-level
leadership.</li>
<li>There is no continuous integration and no culture of code review.
Test suites exist but are not comprehensive enough (and at the same time
they’re very slow).</li>
<li>Bugs, feature requests, and submitted patches are not tracked
systematically. (This is partially dependent on <a href="https://www.fsf.org/blogs/sysadmin/the-fsf-tech-team-doing-more-for-free-software">FSF/GNU
infrastructure improvements</a> which are indefinitely delayed.)</li>
<li>There’s a history of releases breaking compatibility, and thus
people are hesitant to upgrade. At the same time, Linux distributions
actively want to force-upgrade everything they ship to ensure
architecture support, leading to upstream/downstream friction.</li>
<li>Guide-level documentation is superficial and outdated.</li>
<li>Building an Autotools-based project directly from its VCS checkout
is often significantly harder than building it from a tarball release,
and may involve tracking down and installing any number of unusual
tools.</li>
<li>The Autotools depend on other GNU software that is not actively
maintained, most importantly GNU M4, and to a lesser extent GNU
Make.</li>
<li>Coordination among the Autotools is weak, even though the tools are
tightly coupled to each other. Portions of each codebase exist solely
for interoperability with other tools in the toolchain. This leads to
overlapping maintainer and reviewer responsibilities, multiplies slow
code review and inconvenient copyright-assignment processes, and causes
confusion and dropped balls. For instance, there is code
shared among Autoconf, Automake, and/or Gnulib by copying files between
source repositories; changes to these files are extra inconvenient. The
lack of coordination also makes it harder for tool maintainers to
deprecate old functionality, or to decouple interfaces to make things
more extensible; maintainers do not negotiate policies with each other
to help. For instance, Autoconf has trouble knowing when it is safe to
remove internal kludges that old versions of Automake depend on, and
certain shell commands (e.g. aclocal) are distributed with one package
but abstractly belong to another.</li>
<li>Division of labor among the Autotools, and the sources of
third-party macros, is ad-hoc and unclear. (Which macros should be part
of Autoconf proper? Which should be part of Gnulib? Which should be part
of the Autoconf Macro Archive? Which should be shipped with Automake?
Which tools should autoreconf know how to run? Etc.)</li>
<li>Automake and Libtool are not nearly as extensible as Autoconf
is.</li>
<li>Unlike several competitors, Automake <em>only</em> works with Make,
not with newer build drivers (e.g. Ninja).</li>
<li>Because Automake tries to generate Makefiles that will work with
<em>any</em> Make implementation, the Makefiles it generates are much
more complicated and slow than they would be if they took advantage of
GNU and/or BSD extensions.</li>
<li>Libtool is notoriously slow, brittle, and difficult to modify (even
worse than Autoconf proper). This is partially due to technical debt and
partially due to maintaining support for completely obsolete platforms
(e.g. old versions of AIX).</li>
<li>Libtool has opinions about the proper way to manage shared libraries
that Linux distributions actively disagree with, forcing them to kludge
around its code during package builds.</li>
<li>Alternatives to Libtool have all failed to gain traction, largely
because Automake only supports building shared libraries using Libtool
or an <em>exact</em> drop-in replacement.</li>
</ul>
<h2 id="opportunities">Opportunities</h2>
<p>Because of its extensible architecture, install base, and wellspring
of user trust, Autotools can react to these industry changes and thus
spur increases in usage, investment, and developer contribution.</p>
<ul>
<li>Renewed interest in Autotools due to the Autoconf 2.70 release.</li>
<li>Renewed interest in systems programming, due to the new generation of
systems programming languages (Go, Rust, D, Swift(?), Dart(?), etc.), may
create an opportunity for a build system that handles them well,
particularly if it handles polyglot projects well (see below).</li>
<li>Cross-compilation is experiencing new appeal because of the
increasing popularity of ARM and RISC-V CPUs, and of small devices (too
small to compile their own code) based on these chips.</li>
<li>The Free software ecosystem as a whole would benefit from a
reconciliation between the traditional model of software distribution
(compiled code with stable interfaces, released as tarballs at regular
intervals, installed once on any given computer and depended on as
shared libraries and/or binaries) and the newer <q>depend directly on
VCS checkouts and bundle everything</q> model described below. Autotools
contributors have the experience and knowledge to lead this effort.</li>
<li>Funding may be available for projects targeting the weaknesses
listed above.</li>
</ul>
<h2 id="threats">Threats</h2>
<p>These threats may lead to a further decrease in Autotools developer
contribution, funding, and momentum.</p>
<ul>
<li>Increasing mindshare of competing projects (CMake, Meson, <a href="https://gn.googlesource.com/gn">Generate-Ninja</a>, …).</li>
<li>Increasing mindshare of programming languages that come with a build
system that works out of the box, as long as you only use that one
language in your project. (These systems typically cannot handle a
polyglot project at all, hence the above opportunity for a third-party
system that handles polyglot projects well.)</li>
<li>Increasing preference for building software from VCS checkouts
(perhaps at a specific tag, perhaps not) rather than via tarballs.</li>
<li>Increasing mindshare of the software distribution model popularized
by Node.js, Ruby, etc., in which each application bundles <em>all</em> of
its dependencies. While this is considered a profoundly bad idea by
Linux distribution maintainers in particular (because it makes it much
harder to find and patch a buggy dependency) and makes it harder for
end-users to modify the software (because out-of-date dependencies may
be very different from what their own documentation—describing the
latest version—says), it is significantly more convenient for upstream
developers. Competing build systems handle this model much better than
Autoconf does.</li>
</ul>
<hr/>
<p>Thanks to <a href="https://changeset.nyc/">Sumana Harihareswara</a>
for inspiration and editing.</p>
<p>Followup discussion should go to <a href="https://lists.gnu.org/mailman/listinfo/autoconf">the Autoconf
mailing list</a>.</p>Open beta for ICLab TagTeam2018-03-19T11:39:00-04:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2018-03-19:/research/open-beta-for-iclab-tagteam/<p>I’m pleased to announce the open beta test of <a href="https://iclab.org/">ICLab</a>’s clearinghouse for data about
censored websites. This site will aggregate manual and automated test
reports, facilitate more efficient use of automated test resources, and
help policy analysts draw conclusions about <em>what</em> gets censored
in particular countries.</p>
<p><b>[EDIT 19 Jan 2021:</b> <em>The clearinghouse had to be taken down
almost immediately because no one had time to maintain it. Someday the
project it is part of may be continued. Read on for details on what we
had and what we aspired to.</em><b>]</b></p>
<p>… Well, that’s the aspiration, anyway. Right now what we have is a
slightly reskinned instance of the <a href="https://cyber.harvard.edu/">Berkman Center</a>’s <a href="https://github.com/berkmancenter/tagteam/">TagTeam</a> software,
loaded up with a set of sites reported as censored in leaks and so on
(mostly about five years old) and the automated topic analysis I
described in my <a href="https://research.owlfolio.org/pubs/2017-topics-controversy.pdf">PETS
paper last year</a>, and taking one ongoing input feed, from <a href="https://cyber.harvard.edu/research/herdict">Herdict</a>. I said it
was a beta test. <code>:-)</code></p>
<p>If any of the above sounds interesting to you, there are a bunch of
ways you can help:</p>
<ul>
<li><p>The most important thing I need right now is additional
inputs:</p>
<ul>
<li>Ongoing, manually curated reports of censored websites in a specific
country (e.g. Engelli Web, rublacklist.net).</li>
<li>Ongoing crowdsourced reports of inaccessible websites (like
Herdict).</li>
<li>Recent, credible one-time leaks of the actual blacklist used in some
country, or shipped with some specific commercial <q>filtering</q>
software.</li>
<li>Control groups: relatively low-volume feeds of long-tail material
that <em>isn’t</em> particularly likely to get censored. (We already
have the tall head.)</li>
</ul>
<p>The optimal format for a continuously updated data source is an RSS
feed that can be directly added to TagTeam as an <q>input.</q> If that’s
not available, the next best thing is a screen-scraper that takes the
existing website or whatever and converts it to an RSS feed (we already
have infrastructure for this; send a pull request to <a class="uri" href="https://bitbucket.org/elwoz/iclab-topic-pipeline">https://bitbucket.org/elwoz/iclab-topic-pipeline</a>, adding
a program to the <q>input-feeds</q> directory, and I’ll take it from
there).</p>
<p>The optimal format for a one-time source is whatever you have; I’m
going to have to write a custom import script for it regardless.
<code>:-/</code></p></li>
<li><p>The second most helpful thing would be manual verification of the
topic labels assigned by my old analysis. <q>Simply</q> create an
account on the site, and then go through the sites that already have a
topic:something tag and add more tags indicating whether that is
accurate. Please get in touch with me first so we can coordinate
efforts.</p>
<p>This task does not require a lot of technical skill, but it does need
a lot of time and patience, and a strong stomach for the nasty
underbelly of the Internet, ranging from garden-variety pornography all
the way up to active advocacy for genocide. Fluency in diverse natural
languages will also be helpful; the top five after English are Chinese,
Japanese, Russian, Arabic, and Persian. Finally, many sites have been
taken over by spam and/or malware, so you’ll want to use a disposable
and locked-down browser instance.</p></li>
<li><p>General poking at the site, kicking the tires, finding things
that don’t work and telling me about them is also very helpful. (I
already know about the missing documentation.)</p></li>
<li><p>If you have any experience hacking Ruby on Rails, I need all the
help I can get upstreaming my changes to TagTeam and developing further
extensions that we’re going to need.</p></li>
<li><p>If you have any nonzero level of skill with web, graphic, and/or
UI design, I also need help improving the presentation of the
site.</p></li>
<li><p>Anyone who runs ongoing, automated monitoring for censorship, on
any scale from one city to the whole world, is invited to get in touch
to talk about how my data might help you do it better.</p></li>
<li><p>If you have ideas for interesting <em>uses</em> for a large
collection of possibly-censored websites with extracted text and topic
labels, or interesting analyses we could run on it, please also get in
touch.</p></li>
</ul>
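<p>For anyone writing one of those screen-scrapers, here is a minimal sketch of the RSS-generating half, using only the Python standard library. It is an illustration under assumptions, not the interface the <q>input-feeds</q> programs actually use. The function name <code>build_rss</code> and the shape of the <code>items</code> records (title, link, and a timezone-aware publication date) are hypothetical. The scraping half depends entirely on the site being scraped.</p>

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def build_rss(channel_title, channel_link, description, items):
    """Return an RSS 2.0 document (as a str) listing the given items.

    Hypothetical helper: each item is assumed to be a dict with 'title',
    'link' (the reported URL), and 'published' (a timezone-aware datetime).
    These field names are illustrative, not an ICLab format.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # A stable guid lets an aggregator recognize items it has seen.
        ET.SubElement(node, "guid", isPermaLink="true").text = item["link"]
        # RSS dates use the RFC 822 style that format_datetime produces.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")
```

<p>A scraper would call <code>build_rss</code> with whatever entries it extracted and write the result to a file or to standard output. Using the reported URL as the <code>guid</code> means that re-running the scraper does not make old reports look new.</p>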
<p>Please note that account creation is manual right now—after filling
out the sign-up form, email me at <a class="email" href="mailto:zackw@cmu.edu">zackw@cmu.edu</a> and tell me the handle you picked plus a
little about who you are and what you propose to do with the
account.</p>
<p>Reproduction and dissemination of this announcement is
encouraged.</p>A simple ritual for laying to rest domestic ghosts2017-10-31T13:36:37-04:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2017-10-31:/possibly-useful/domestic-ghosts/<blockquote>
<p>In honor of the feast of All Souls, I thought I might put on a
costume, as it were, and write a blog post as if I were an old English
cunning man and you, my readers, came to me for advice on supernatural
problems, rather than computational ones.</p>
</blockquote>
<p>So your house is haunted. You don’t know who the ghosts were in life,
and you’re maybe a bit scared to find out, but you would like to gently
encourage them to let go of their troubles and move on. I have for you a
simple ritual involving a little of the old rune-magic.</p>
<p>You will need these materials:</p>
<ul>
<li>A small piece of paper. A 3x5” card is plenty big enough. The paper
does not need to be anything special, but ideally it would be unlined,
heavy, and an unremarkable color, the sort of thing you would write a
formal invitation on.</li>
<li>A pen, marker, or writing brush whose ink is not soluble in water or
alcohol. The ink should be black or dark blue.</li>
<li>A small glass (no more than one mouthful of liquid) of whatever you
would drink at a memorial wake for a close friend. If you’re not sure
what would be appropriate, red wine is a safe choice. If you never drink
alcohol, fruit juice will serve. Do not, however, use plain water or any
variety of milk.</li>
<li>Something with which to attach the paper to the inside wall above
your front door.</li>
</ul>
<p>Draw the following seven symbols on one side of the paper, as big as
you can. It would be a good idea to use a pencil and a ruler to divide
up the space so you don’t start too big and then have to cram the last
few in at the end; it’s important that the first three and the last
three have equal significance. Leave a little extra space on either side
of the middle symbol. Erase any pencil marks after you’re done.</p>
<figure>
<img alt="ᚦᛟᛁᛃᛖᛞᚹ" src="/possibly-useful/ghosts/runes.png"/>
</figure>
<p>These are Elder Futhark runes. They don’t add up to a word in Icelandic
or anything like that; we are using them for the meanings of the
individual letters, which are:</p>
<ul>
<li><strong>ᚦ</strong> THORN, a curse, a misfortune, a hammer; ghosts
are a curse to the living, but they are also themselves cursed, unable
to leave the world where they no longer belong. Draw this one
backward—with the point to the left—to make it the symbol of what we
seek to bring to an end.</li>
<li><strong>ᛟ</strong> OTHILA, inheritance, fate; the cause of the
curse; whatever debts the ghost may owe the world, or the world owe the
ghost, that keeps it here.</li>
<li><strong>ᛁ</strong> ISA, ice, the nature of the curse: something that
cannot change when it should.</li>
<li><strong>ᛃ</strong> JERA, years, time, change, the cycle of life.
This is the pivotal rune of the working. All things, both good and ill,
shall pass away in their time, and new things shall come in their place.
No joyful thing can endure forever, but neither can any curse.</li>
<li><strong>ᛖ</strong> EHWAZ, horse, motion. What is frozen shall move
again; what is trapped shall be released.</li>
<li><strong>ᛞ</strong> DAGAZ, daylight, dawn, emergence. The end of a
curse, the lifting of a burden, the forgiveness of a debt.</li>
<li><strong>ᚹ</strong> WUNJO, joy, blessing, liberty. The negative of
THORN, and the intended outcome of the working.</li>
</ul>
<p>At twilight, take the drink and the paper and stand just inside the
open doorway of your house, facing outward. It is better to do this on
the first night after a new moon, but if there is some urgency to the
matter you don’t need to wait. Dip one finger into the drink and trace
the first of the runes with it; repeat for each rune. (This is called
<em>staining</em> the runes. The tradition here originally calls for
blood, but for an amateur working involving the dead, that would be an
unwise choice.)</p>
<p>Next, attach the paper to the inside wall above the doorway, with the
runes facing the wall (that is, blank side visible, runed side toward
the outside of the house). Take care not to mar the rune lines with a
pinhole or a glob of sticky goo or anything like that.</p>
<p>Step to one side of the doorway, raise the glass and recite a short
prayer for the dead, from whichever religion you feel the most emotional
connection with. If you have no emotional connection to any religion at
all, make up your own brief benediction. I can’t help you with that,
because it has to be emotionally meaningful for <em>you</em>. Drink the
drink, all in one go. Close the door.</p>
<p>Leave the paper in place for a complete lunar cycle (29 days). At
weekly intervals during this time, repeat the part of the ritual where
you stand to one side of the open door at twilight, say a prayer for the
dead, and drink. At the end of the cycle, again at twilight, take the
paper down and carry it through the door. Standing just outside, hold
the paper at one corner with tongs, set fire to it, and recite the
prayer for the dead while it burns.</p>
<p>If the paper falls down by itself during the lunar cycle, that means
the ghosts have already departed. Leave it where it fell until the next
twilight and then burn it as above; you don’t need to continue with the
ritual after that.</p>
<p>Come back if it doesn’t work.</p>Call for Volunteers: Active Geolocation2016-07-08T08:24:26-04:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2016-07-08:/research/ageo-call-for-volunteers/<p>For the past few months I’ve been working on a research study of
<q>active geolocation</q> algorithms. These attempt to determine where
in the world a computer is, by measuring how long it takes network
messages from that computer to reach other computers in known
locations.</p>
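<p>To make the idea concrete: a round-trip time gives an upper bound on distance, because no message can travel faster than light does in the medium. The sketch below is a simplified illustration of that basic constraint, not the algorithms under study. It assumes signals cross the network at about two-thirds of the vacuum speed of light, a common rule of thumb for optical fiber. All function names are hypothetical.</p>

```python
import math

# Assumption: signals travel through optical fiber at roughly two-thirds
# of the vacuum speed of light, i.e. about 200 km per millisecond.
SPEED_OF_LIGHT_KM_PER_MS = 299_792.458 / 1000
FIBER_KM_PER_MS = SPEED_OF_LIGHT_KM_PER_MS * 2 / 3


def max_distance_km(rtt_ms):
    """Upper bound on the distance to a landmark, given a round-trip time.

    Only half of the round trip is spent going one way, so the target can
    be no farther away than a signal could travel in rtt_ms / 2.
    """
    if rtt_ms < 0:
        raise ValueError("round-trip time cannot be negative")
    return (rtt_ms / 2) * FIBER_KM_PER_MS


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def feasible(candidate, measurements):
    """True if a candidate (lat, lon) is consistent with every measurement.

    measurements: iterable of (lat, lon, rtt_ms) for landmark hosts whose
    locations are known.
    """
    lat, lon = candidate
    return all(
        haversine_km(lat, lon, m_lat, m_lon) <= max_distance_km(rtt)
        for m_lat, m_lon, rtt in measurements
    )
```

<p>For example, a 10 ms round trip to a landmark puts the target within roughly 1,000 km of it. Real packets take indirect routes and wait in queues, so the true distance is usually well under this idealized bound. Estimating that overhead is one of the ways active geolocation algorithms differ from each other.</p>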
<p>In order to test some of these algorithms thoroughly, I need
volunteers who are willing to run my measurement software on their
computers, and tell me where they are. I’m especially interested in data
reported from computers outside Europe and North America, but
data from anywhere is useful. Currently, running the software takes a
fair bit of technical skill—if you’re not comfortable with the Unix
command line, please wait for the friendlier web-based version which is
in development.</p>
<p>If you’re interested, please go to <a href="https://research.owlfolio.org/active-geo/">https://research.owlfolio.org/active-geo/</a>
for further instructions.</p>
<p>(For legal reasons, you must be at least 18 years old to
volunteer.)</p>
<p>(Reproduction and dissemination of this call for volunteers is
encouraged.)</p>Using GPG2 with a read-only .gnupg directory2016-04-18T11:46:14-04:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2016-04-18:/possibly-useful/readonly-gpg2/<p>Another bulletin funded by the I Just Blew An Entire Morning On This
Foundation:</p>
<p>Suppose you want to encrypt and sign files using <a href="https://www.gnupg.org/"><code>gpg</code></a>, but without giving
it ownership or write access to its own keystore. For instance, this
might be necessary if the <code>gpg</code> process is going to be run
from a low-privilege CGI user and you don’t have root privileges on the
webserver. This is relatively straightforward with the <q>classic</q>
version 1 (although it prints an error message that’s harmless but
impossible to suppress), but version 2 made some architectural changes
that make it harder, and its documentation does not mention the
necessary tricks. Below the fold, how you do it.</p>
<p>First, create the signing key, and import and countersign the public
key to which files will be encrypted. I’m going to assume you already
know how to do that. Using the same version of <code>gpg</code> that the
low-privilege user will use, encrypt and sign a test message; this is
important not only to make sure that everything works in a <q>normal</q>
configuration, but because it may create files you need later. After
doing this, the contents of your <code>.gnupg</code> directory should
look like this:</p>
<pre><code>$ find .gnupg -ls
7013415 19 drwx------ 3 you you 10 Apr 18 14:24 .gnupg
7013416 19 -rw------- 1 you you 9186 Apr 18 11:58 .gnupg/gpg.conf
7013684 3 -rw------- 1 you you 600 Apr 18 13:11 .gnupg/random_seed
7013425 4 -rw------- 1 you you 1360 Apr 18 12:08 .gnupg/trustdb.gpg
7014778 3 drwx------ 2 you you 4 Apr 18 12:38 .gnupg/private-keys-v1.d
7014785 4 -rw------- 1 you you 1376 Apr 18 12:38 .gnupg/private-keys-v1.d/BADC0FFEE0DDF00DBADC0FFEE0DDF00DBADC0FFE.key
7014781 4 -rw------- 1 you you 978 Apr 18 13:11 .gnupg/private-keys-v1.d/B01DFACECAB005EB01DFACECAB005EB01DFACECA.key
7013788 9 -rw------- 1 you you 3740 Apr 18 12:08 .gnupg/pubring.gpg~
7013797 9 -rw------- 1 you you 3740 Apr 18 12:08 .gnupg/pubring.gpg
7013777 6 -rw------- 1 you you 2509 Apr 18 12:07 .gnupg/secring.gpg
7014786 1 -rw-rw-r-- 1 you you 0 Apr 18 12:38 .gnupg/.gpg-v21-migrated</code></pre>
<p>(Sizes and inode numbers and so on will vary, of course.) Copy this
directory over to where the low-privilege user can get at it. Delete
<code>pubring.gpg~</code>, <code>random_seed</code>, and
<code>gpg.conf</code> from the copy. Adjust permissions and group
ownership so that the low-privilege user can read, but not write,
everything:</p>
<pre><code>$ find gnupg-web -ls
7015000 19 drwxr-x--- 3 you web 10 Apr 18 14:24 gnupg-web
7015001 4 -rw-r----- 1 you web 1360 Apr 18 12:08 gnupg-web/trustdb.gpg
7015002 3 drwxr-x--- 2 you web 4 Apr 18 12:38 gnupg-web/private-keys-v1.d
7015003 4 -rw-r----- 1 you web 1376 Apr 18 12:38 gnupg-web/private-keys-v1.d/BADC0FFEE0DDF00DBADC0FFEE0DDF00DBADC0FFE.key
7015004 4 -rw-r----- 1 you web 978 Apr 18 13:11 gnupg-web/private-keys-v1.d/B01DFACECAB005EB01DFACECAB005EB01DFACECA.key
7015006 9 -rw-r----- 1 you web 3740 Apr 18 12:08 gnupg-web/pubring.gpg
7015007 6 -rw-r----- 1 you web 2509 Apr 18 12:07 gnupg-web/secring.gpg
7015008 1 -rw-r----- 1 you web 0 Apr 18 12:38 gnupg-web/.gpg-v21-migrated</code></pre>
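<p>Getting these permissions right by hand is fiddly, so it can help to check them mechanically. Here is a small Python sketch, not part of the original procedure, that walks the copied directory and reports problems. It flags anything the group cannot read or enter, anything group- or world-writable, and optionally anything not owned by the expected group. The function name and the <code>writable</code> allowance (for a subdirectory you intend the low-privilege user to write to) are hypothetical. It looks only at mode bits, not at ACLs or at which user actually runs <code>gpg</code>.</p>

```python
import os
import stat


def iter_paths(root):
    """Yield root itself, then every directory and file beneath it."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def permission_problems(root, writable=(), gid=None):
    """Yield (path, problem) pairs for entries under root that the owning
    group could not read, or could modify.

    writable: paths relative to root that may be group-writable.
    gid: if given, every entry must belong to this group.
    """
    allowed = {os.path.normpath(os.path.join(root, w)) for w in writable}
    for path in iter_paths(root):
        st = os.lstat(path)
        mode = st.st_mode
        if stat.S_ISLNK(mode):
            # Symlinks always show mode 777; flag them rather than guess.
            yield path, "unexpected symbolic link"
            continue
        if gid is not None and st.st_gid != gid:
            yield path, "wrong group"
        if not mode & stat.S_IRGRP:
            yield path, "group cannot read"
        if stat.S_ISDIR(mode) and not mode & stat.S_IXGRP:
            yield path, "group cannot enter directory"
        if mode & stat.S_IWGRP and os.path.normpath(path) not in allowed:
            yield path, "group can write"
        if mode & stat.S_IWOTH:
            yield path, "anyone can write"
```

<p>At this stage, <code>list(permission_problems("gnupg-web"))</code> should come back empty. Once the writable agent subdirectory described below exists, pass <code>writable=["agent"]</code> so that it is not reported.</p>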
<p>Now you need to create a new <code>gpg.conf</code> file in this
directory, with a bunch of special options:</p>
<pre><code>$ cat > gnupg-web/gpg.conf <<!
# modern sane defaults
charset utf-8
openpgp
# fully noninteractive mode
quiet
batch
no-tty
no-greeting
# uncomment the next line if necessary
#no-secmem-warning
# no implicit writes to the keystore directory
lock-never
no-auto-check-trustdb
no-random-seed-file
!
$ chgrp web gnupg-web/gpg.conf
$ chmod 640 gnupg-web/gpg.conf</code></pre>
<p>This setup plus an appropriate command-line invocation is sufficient
to make GPG version 1 happy:</p>
<pre><code>$ echo test message |
gpg1 --homedir "$(pwd)/gnupg-web" --no-permission-warning \
--encrypt --sign --recipient ABAD1DEA > test.gpg</code></pre>
<p>but if you try it with version 2 you will get an error message:</p>
<pre><code>gpg: can't connect to the agent: IPC connect call failed</code></pre>
<p>Agent, you say? Why do we need an agent? Isn’t that just to avoid
having to type one’s passphrase all the time? Apparently version 2 is
doing a modest amount of <a href="https://en.wikipedia.org/wiki/Privilege_separation">privilege
separation</a>, and <em>always</em> uses an agent internally to handle
secret keys. That’s the architectural change I mentioned earlier. And
this means it wants to create a socket in the keystore directory, which
it can’t. (50 demerits for not reporting the error properly. This kind
of error message should <em>always</em> name the system call that failed
(it’s not <code>connect</code> in this case!), all the filename(s)
involved, and the decoded system error code.)</p>
<p>To fix this, you can’t just create the socket yourself and give it
appropriate permissions, because <code>gpg</code> expects to be able to
delete and recreate it. You need to move the socket to a writable
directory … but how do we do that? The manpage smugly informs us that
the command-line option that <em>used</em> to move the agent socket
around no longer does anything. There is still a way, but it isn’t
documented anywhere I can find; I got the trick from <a href="https://michaelheap.com/gpg-cant-connect-to-the-agent-ipc-connect-call-failed/">this
blog post</a> and he doesn’t say where he found it. Create a
subdirectory that the low-privilege user <em>can</em> write to, and
place a <q>redirection file</q> where gpg expects to find the
socket:</p>
<pre><code>$ mkdir gnupg-web/agent
$ chmod 775 gnupg-web/agent
$ chgrp web gnupg-web/agent
$ printf '%%Assuan%%\nsocket=%s/gnupg-web/agent/S.gpg-agent\n' \
"$(pwd)" > gnupg-web/S.gpg-agent
$ chmod 640 gnupg-web/S.gpg-agent
$ chgrp web gnupg-web/S.gpg-agent</code></pre>
<p>Having done all this, your <code>gnupg-web</code> directory should
look like this:</p>
<pre><code>$ find gnupg-web -ls
7015000 19 drwxr-x--- 3 you web 10 Apr 18 14:24 gnupg-web
7015001 4 -rw-r----- 1 you web 1360 Apr 18 12:08 gnupg-web/trustdb.gpg
7015002 3 drwxr-x--- 2 you web 4 Apr 18 12:38 gnupg-web/private-keys-v1.d
7015003 4 -rw-r----- 1 you web 1376 Apr 18 12:38 gnupg-web/private-keys-v1.d/BADC0FFEE0DDF00DBADC0FFEE0DDF00DBADC0FFE.key
7015004 4 -rw-r----- 1 you web 978 Apr 18 13:11 gnupg-web/private-keys-v1.d/B01DFACECAB005EB01DFACECAB005EB01DFACECA.key
7015006 9 -rw-r----- 1 you web 3740 Apr 18 12:08 gnupg-web/pubring.gpg
7015007 6 -rw-r----- 1 you web 2509 Apr 18 12:07 gnupg-web/secring.gpg
7015008 1 -rw-r----- 1 you web 0 Apr 18 12:38 gnupg-web/.gpg-v21-migrated
7015009 2 -rw-r----- 1 you web 124 Apr 18 13:22 gnupg-web/gpg.conf
7018010 1 drwxrwxr-x 2 you web 2 Apr 18 13:23 gnupg-web/agent
7018011 2 -rw-r----- 1 you web 68 Apr 18 13:24 gnupg-web/S.gpg-agent</code></pre>
<p>and the command-line invocation shown above, with your version 2 <code>gpg</code> binary in place of <code>gpg1</code>, should work when executed
as the low-privilege user.</p>2016 Hugo Award nominations2016-04-02T16:36:29-04:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2016-04-02:/fiction/2016-hugo-noms/<p>Let’s talk about something more fun, shall we? These were my
nominations for the <a href="https://www.thehugoawards.org/">2016 Hugo
Awards</a>. The final ballot will be announced on April 26. Hugo
nominations, unlike final ballots, are not ranked. I’d be happy to see
any of these things win their categories.</p>
<p>I read a lot of good stuff at novel length this year, but not enough
shorter fiction to fill all five nomination slots per category.
Something to work harder on next year, I suppose. (It didn’t help that I
spent most of January and February in paper crunch mode.) I don’t even
try to nominate outside the fiction categories.</p>
<p>Links go to the full text of the work and to authors’ websites when
possible, otherwise to Goodreads pages.</p>
<h2 id="novel">Novel</h2>
<ul>
<li><p><a href="https://zencho.org/">Zen Cho</a>, <em><a href="https://www.goodreads.com/book/show/23943137-sorcerer-to-the-crown">Sorcerer
to the Crown</a></em>. If you, like me, have been wishing for a sequel
to <em>Jonathan Strange & Mr Norrell</em> since the day you finished
reading it, you will like this book.</p></li>
<li><p><a href="https://www.naominovik.com/">Naomi Novik</a>, <em><a href="https://www.goodreads.com/book/show/22544764-uprooted">Uprooted</a></em>.
Polish folktale crossed with supernatural horror. Online reviews tend to
be all about the characters (whom they either love or hate) but the
really compelling aspect of this one, IMNSHO, is the evil magic
forest.</p></li>
<li><p><a href="https://www.goodreads.com/author/show/4280.Kazuo_Ishiguro">Kazuo
Ishiguro</a>, <em><a href="https://www.goodreads.com/book/show/22522805-the-buried-giant">The
Buried Giant</a></em>. It takes some doing to achieve a new take on the
Matter of Britain nowadays; Ishiguro has pulled it off.</p></li>
<li><p><a href="https://www.goodreads.com/author/show/41194.Judith_Tarr">Judith
Tarr</a>, <em><a href="https://www.goodreads.com/book/show/24290807-forgotten-suns">Forgotten
Suns</a></em>. Three words: space opera <em>archaeology</em>. Why
haven’t people done more of that? Yeah, <em>Stargate</em>, but it was
almost never central to the plot.</p></li>
<li><p><a href="http://www.jowaltonbooks.com/">Jo Walton</a>, <em><a href="https://www.goodreads.com/book/show/22055276-the-just-city">The
Just City</a></em>. Pallas Athene decides to create the allegorical city
from Plato’s <em>Republic</em> in real life, basically to see what
happens.</p></li>
</ul>
<h2 id="novella">Novella</h2>
<ul>
<li><p><a href="https://www.usmanmalik.org/">Usman Malik</a>, <em><a href="https://www.tor.com/2015/04/22/the-pauper-prince-and-the-eucalyptus-jinn-usman-malik/">The
Pauper Prince and the Eucalyptus Jinn</a></em>. Mughal folktale
transposed to the modern world: another <q>why haven’t people done more
of that?</q></p></li>
<li><p><a href="http://www.warrenellis.com/">Warren Ellis</a>, <em><a href="https://www.goodreads.com/book/show/26069714-elektrograd">ELEKTROGRAD:
RUSTED BLOOD</a></em>. And a third <q>there should be more of this</q>
category: steampunk detective noir.</p></li>
<li><p><a href="https://nnedi.com/">Nnedi Okorafor</a>, <em><a href="https://www.goodreads.com/book/show/25667918-binti">Binti</a></em>.
Another unusual space operatic setting. Might not be to everyone’s
taste, particularly if you dislike fish-out-of-water POV
protagonists.</p></li>
</ul>
<h2 id="novelette">Novelette</h2>
<ul>
<li><a href="http://roselemberg.net/">Rose Lemberg</a>, <q><a href="http://www.beneath-ceaseless-skies.com/stories/grandmother-nai-leylits-cloth-of-winds/">Grandmother-nai-Leylit’s
Cloth of Winds</a>.</q> Somewhat clash of cultures, somewhat travelogue,
somewhat coming of age, and almost certainly nothing like anything
you’ve read before.</li>
</ul>
<h2 id="short-story">Short Story</h2>
<ul>
<li><p><a href="https://zencho.org/">Zen Cho</a>, <q><a href="http://www.kaleidotrope.net/archives/spring-2015/monkey-king-faerie-queen-by-zen-cho/">Monkey
King, Faerie Queen</a>.</q> Sun Wukong visits the court of the Queen of
Air and Darkness, and it goes just about as you would expect.</p></li>
<li><p><a href="http://danieljoseolder.net/">Daniel José Older</a>,
<q><a href="https://www.tor.com/2015/01/06/kia-and-gio-daniel-jose-older/">Kia
and Gio</a>.</q> Magical realism set in New York City.</p></li>
<li><p><a href="https://www.marissalingen.com/">Marissa Lingen</a>,
<q><a href="https://www.nature.com/articles/526286a">The Many Media
Hypothesis</a>.</q> What if you were Facebook <q>friends</q> with all
the alternate possible versions of yourself?</p></li>
<li><p><a href="https://unlikelyexplanations.com/">Laura Pearlman</a>,
<q><a href="https://flashfictiononline.com/main/article/i-am-graalnak-of-the-vroon-empire-destroyer-of-galaxies-supreme-overlord-of-the-planet-earth-ask-me-anything/">I
am Graalnak of the Vroon Empire, Destroyer of Galaxies, Supreme Overlord
of the Planet Earth. Ask Me Anything.</a></q> Alien invasion as
Internet-based farce. It made me laugh.</p></li>
</ul>
<h2 id="graphic-story">Graphic Story</h2>
<ul>
<li><p>Sydney Padua, <em><a href="http://sydneypadua.com/2dgoggles/the-book/">The Thrilling
Adventures of Lovelace and Babbage</a></em>. <q>What if Ada Lovelace and
Charles Babbage had successfully constructed an Analytical Engine?</q>
is <em>not</em> new territory (William Gibson and Bruce Sterling did it
back in 1990, and it’s often implied background for steampunk
Victoriana), but doing it as a
humorous graphic novel which is also a detailed work of historical
research, with footnotes and references and everything: that deserves
recognition.</p></li>
<li><p><q>Abbadon,</q> <em><a href="https://killsixbilliondemons.com/">Kill Six Billion
Demons</a></em>. This is worth reading just for the art, and for the
incredibly vast world that has been built. The plot is <em>set up</em>
like your standard <q>everygirl rescues love interest in distress,
taking several levels in badass along the way,</q> but I doubt that’s
where it’s going.</p></li>
<li><p>Ru Xu, <em><a href="https://www.saintforrent.com/">Saint for
Rent</a></em>. So often you see time travel stories where the time
travel is just a way to put people <em>into</em> the interesting
historical or futuristic situations, and not actually used to its full
power. This is not like that.</p></li>
<li><p>Pascalle Lepas, <em><a href="https://www.wildelifecomic.com/">Wilde Life</a></em>. <q>Oscar
rented an old house off craigslist, then things got weird…</q> Creepy
rural <a href="https://the-toast.net/series/chris-kimball/">Vermont</a>
and creepy rural <a href="http://www.welcometonightvale.com/">Arizona</a> are both
well-traveled paths, but how often do you see cheerful-yet-creepy rural
<em>Oklahoma?</em></p></li>
<li><p>Dave Kellett, <em><a href="http://www.drivecomic.com/">Drive</a></em>. Relatively
straightforward space opera, but with lots of fun detail, and it manages
to remain tongue-in-cheek while also running a deadly serious
plot.</p></li>
</ul>Do not do business with Northwest Talent Search2016-04-02T12:02:44-04:002022-11-27T19:47:00-05:00Zack Weinbergtag:www.owlfolio.org,2016-04-02:/possibly-useful/northwest-talent/<p>A depressing number of computer industry recruiters cannot be
bothered to read the very first paragraph of the <q><a href="/contact/">contact information</a></q> page of this very website,
or else they think they are ~special snowflakes~ and it does not apply
to them. For reference, this paragraph reads</p>
<blockquote>
<p>I AM NOT LOOKING FOR A JOB. DO NOT CONTACT ME WITH ANY SORT OF JOB
OFFER.</p>
</blockquote>
<p>I get unwanted solicitations about once a month, and I reply with a
polite but acerbic note about how they should’ve noticed the paragraph
in ALL CAPS that says don’t contact me, and <em>usually</em> that’s the
end of it. Not this time.</p>
<p>Last week I got one from an outfit calling itself <q>Northwest Talent
Search, Inc.</q> (They don’t have a website.) The message does just
about everything wrong:</p>
<blockquote>
<p>Hello Zack</p>
<p>I am working with one of the fastest growing startups in the world on
a Aspiring Software Engineering Manager search. They just landed a major
partnership with a fortune 500 company. If you have an interest in
joining a world class team and an incredible opportunity what would be a
good time for a phone call and a good number to reach you at?</p>
<p>Thanks<br/> [REDACTED]<br/> [REDACTED]@northwesttalentsearch.com</p>
</blockquote>
<p>Besides ignoring the request not to contact me: Why would anyone not
want to know <em>which</em> startup, <em>which</em> megacorp, and at
least the executive summary of the concrete job description? If you’re
going to cold-contact people with job offers, these things should
<em>always</em> appear in the initial message. And anyone who has done
their due diligence on <em>me</em> should know that I’m not the right
candidate for any sort of engineering management position, and that I’m
allergic to startups. So I was less polite than I usually am when
replying:</p>
<blockquote>
<p>Thank you for your interest, however:</p>
<ol type="1">
<li><p>I have made it abundantly clear, both on my personal website, and
everywhere recruiters typically trawl for interesting people, that I am
not looking for a job and do not want to be cold-contacted with job
offers.</p></li>
<li><p>I have neither any interest nor any qualifications for an
engineering management position, and I do not understand how you could
possibly have gotten the impression that I might be an appropriate
candidate for such.</p></li>
<li><p>As a matter of basic courtesy, <em>in your initial message</em>
you should have stated the name of the company you are recruiting for
and given a couple sentences’ description of what business they are in
and what the job responsibilities are.</p></li>
</ol>
<p>Never contact me again. Do not even reply to this message.</p>
</blockquote>
<p>Now, if that had been the end of it, you wouldn’t be reading this
post. <em>Today</em> I received <em>this:</em></p>
<blockquote>
<p>Hello Zack</p>
<p>I am working with one of the fastest growing startups in the world on
a Backend Engineer search. They just landed a major partnership with a
fortune 500 company. If you have an interest in joining a world class
team and an incredible opportunity what would be a good time for a phone
call and a good number to reach you at?</p>
<p>Thanks<br/> [SAME PERSON]<br/> [SAME
EMAIL]@northwesttalentsearch.com</p>
</blockquote>
<p>The <em>only change</em> is the job title. <q>Backend Engineer</q> is
<em>less</em> wrong than <q>Aspiring Software Engineering Manager,</q>
but it’s still wrong. And sending another instance of what is evidently
a form letter, after having been told not to contact me again, is both
disrespectful and unprofessional.</p>
<p>Hence what I dearly hope will be my final reply to them, and this
post.</p>
<blockquote>
<p>You sent me a message last week which was word-for-word identical but
for the job title. In my reply, I made it plain that I was not
interested and I did not want to hear from you ever again.</p>
<p>Your continued solicitations are unprofessional, as are the vagueness
of your cold-contact messages (as explained in the previous reply) and
your clear lack of research on <em>me</em> prior to contact.</p>
<p>I have directed [MY MAIL CLIENT] to treat all further messages from
anyone at your company as spam, and I have filed an abuse report with
[YOUR BULKMAIL SERVICE]. I will also be publishing all of our
communications on my website as a warning to others not to do business
with your company.</p>
<p>My previous reply, for reference: [etc]</p>
</blockquote>
<p>If you’re a company looking to hire: Don’t do business with these
clowns, there are people who will do much better by you.</p>
<p>If you’re also getting these: I strongly suspect you don’t want any
of the jobs they are soliciting for.</p>