I recently spent several days improving the OCaml FreeTDS C bindings for work, and I thought it might be useful to share the problems I ran into and how to solve them.

I tried to order things so the most likely issues are listed first, but if you're trying to debug some C binding crashes, I recommend just reading the whole thing.

This post assumes you're already familiar with the official documentation, in particular the "Interfacing C with OCaml" chapter of the OCaml manual.

value is just an integer

When debugging crashes in C bindings, one of the first things you should check is whether you're using Val_int and Int_val correctly and consistently. Unfortunately, value is just a typedef for a pointer-sized integer (intnat), and C converts between integer types silently, so the compiler won't warn you if you mix these two macros up.

The way to remember which is which is that they're shorthand for "value of int" (Val_int) and "int of value" (Int_val).

Also note that the same problem exists for Bool_val and Val_bool because booleans are also integers in C.

Bad

value why_is_c_so_bad(value unit) {
    CAMLparam1(unit);
    // error, Int_val(5) is 5 >> 1 == 2, an even number OCaml treats as a pointer
    CAMLreturn(Int_val(5));
}

Good

value why_is_c_so_bad(value unit) {
    CAMLparam1(unit);
    CAMLreturn(Val_int(5));
}
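The following standalone sketch in plain C shows how OCaml tags integers, which makes the mix-up concrete. The TOY_* macros and toy_is_pointer are hypothetical. They mirror Val_long/Long_val, Val_bool and Is_block from caml/mlvalues.h (Val_int/Int_val are the same with casts to int) so the example compiles without the OCaml headers. Real bindings should use the real macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of OCaml's tagged integers, NOT the real macros.
   In caml/mlvalues.h, value is a typedef for intnat. */
typedef intptr_t toy_value;

/* Like Val_long: shift left and set the low bit to mark "this is an int". */
#define TOY_VAL_INT(x)  ((toy_value)(((uintptr_t)(x) << 1)) + 1)
/* Like Long_val: drop the tag bit again. */
#define TOY_INT_VAL(v)  ((long)((v) >> 1))
/* Like Val_bool/Bool_val: booleans are just tagged 0 or 1. */
#define TOY_VAL_BOOL(b) TOY_VAL_INT((b) != 0)
#define TOY_BOOL_VAL(v) TOY_INT_VAL(v)

/* Like Is_block: a value whose low bit is 0 is treated as a pointer. */
static int toy_is_pointer(toy_value v) { return (v & 1) == 0; }
```

The model shows two consequences. Int_val(5) yields 2, an even number that the runtime would treat as a pointer. Val_false is 1 rather than 0, so testing a raw value with `if (vflag)` is always true, and you need `Bool_val(vflag)` instead.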

Follow the rules for CAMLparam and CAMLlocal

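Whenever you allocate an OCaml value, you give the garbage collector a chance to run. To ensure that the values you're using don't get collected or moved out from under you, you have to use the macros CAMLparam, CAMLlocal, and CAMLreturn. As far as I can tell, these are roughly the equivalent of the ref and unref functions most C libraries have. CAMLparam and CAMLlocal register your variables as GC roots, so the collector keeps their values alive and updates the variables when it moves those values. CAMLreturn unregisters them. The runtime also cleans up these registrations for you if your C code raises an OCaml exception.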

There are cases where you can avoid this (if you're sure you're not allocating), but this is extremely annoying to debug if you get it wrong, since your code will work 99% of the time. As fun as it is to track down heisenbugs, save yourself the trouble and just use those macros every time you have a variable of type value.

Technically OK

value simplified_dbexec(value vdbconn, value vsql) {
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    RETCODE ret = dbsqlexec(dbconn, String_val(vsql));
    return Val_int(ret);
}

Bad

value simplified_dbexec(value vdbconn, value vsql) {
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);

    RETCODE ret = dbsqlexec(dbconn, String_val(vsql));
    if (ret == FAIL) {
        // Throw `exception Example_exception of string * string`
        value exn = caml_alloc_small(3, 0);
        Field(exn, 0) = EXAMPLE_EXCEPTION_TAG;
        Field(exn, 1) = caml_copy_string("simplified_dbexec");
        // error, vsql isn't registered, so it could have been moved (or
        //        collected) while allocating exn or copying the first string
        // also error, exn isn't registered either, and direct Field
        //        assignment after an allocation bypasses caml_modify
        Field(exn, 2) = vsql;
        caml_raise(exn);
    }

    return Val_int(ret);
}

Good

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    RETCODE ret = dbsqlexec(dbconn, String_val(vsql));
    CAMLreturn(Val_int(ret));
}

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    CAMLlocal1(exn);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);

    RETCODE ret = dbsqlexec(dbconn, String_val(vsql));
    if (ret == FAIL) {
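        // caml_alloc (unlike caml_alloc_small) initializes every field, so
        // the caml_copy_string allocation below can't see garbage fields
        exn = caml_alloc(3, 0);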
        Store_field(exn, 0, EXAMPLE_EXCEPTION_TAG);
        Store_field(exn, 1, caml_copy_string("simplified_dbexec"));
        Store_field(exn, 2, vsql);
        caml_raise(exn);
    }

    CAMLreturn(Val_int(ret));
}

Don't hold onto string data

The garbage collector is allowed to move OCaml data around in memory whenever it runs. This doesn't matter for immediate values like ints. For strings, though, the pointer returned by String_val is only valid until the next allocation, or until anything else that can run the GC, such as a callback into OCaml or releasing the runtime lock.

The easiest way to stay safe is to never hold onto the result of String_val. Call it right where you use the pointer.

Technically OK

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    const char* sql = String_val(vsql);
    RETCODE ret = dbsqlexec(dbconn, sql);
    CAMLreturn(Val_int(ret));
}

Bad

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    CAMLlocal1(exn);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    const char* sql = String_val(vsql);

    RETCODE ret = dbsqlexec(dbconn, sql);
    if (ret == FAIL) {
        exn = caml_alloc(3, 0);
        Store_field(exn, 0, EXAMPLE_EXCEPTION_TAG);
        Store_field(exn, 1, caml_copy_string("simplified_dbexec"));
        // error, vsql's data could have been moved by the two previous
        //        allocations, so sql may now point at garbage
        Store_field(exn, 2, caml_copy_string(sql));
        caml_raise(exn);
    }

    CAMLreturn(Val_int(ret));
}

Good

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    RETCODE ret = dbsqlexec(dbconn, String_val(vsql));
    CAMLreturn(Val_int(ret));
}
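The two rules above share a single mechanism. A collection can move a value, and only registered variables get updated to point at the new copy. The following toy model in plain C is not the OCaml runtime. It uses a two-space heap that moves every live string on each allocation, and a root table stands in for CAMLparam/CAMLlocal. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy two-space heap: NOT the OCaml runtime, just a copying collector that
   runs on every allocation. */
#define HEAP_SIZE 1024
#define MAX_ROOTS 16

static char space_a[HEAP_SIZE], space_b[HEAP_SIZE];
static char *from_space = space_a, *to_space = space_b;
static size_t used = 0;

/* Addresses of registered variables, like the list CAMLparam/CAMLlocal build. */
static char **roots[MAX_ROOTS];
static int n_roots = 0;

static void toy_register_root(char **var) {
    assert(n_roots < MAX_ROOTS);
    roots[n_roots++] = var;
}

/* Like CAMLreturn: forget the last n registered variables. */
static void toy_unregister_roots(int n) {
    assert(n <= n_roots);
    n_roots -= n;
}

/* Copy every rooted string into the other space and update the variable
   that refers to it. Anything not rooted is left behind and wiped. */
static void toy_collect(void) {
    size_t new_used = 0;
    for (int i = 0; i < n_roots; i++) {
        size_t len = strlen(*roots[i]) + 1;
        assert(new_used + len <= HEAP_SIZE);
        memcpy(to_space + new_used, *roots[i], len);
        *roots[i] = to_space + new_used;
        new_used += len;
    }
    memset(from_space, 0, HEAP_SIZE);   /* old copies are gone */
    char *tmp = from_space;
    from_space = to_space;
    to_space = tmp;
    used = new_used;
}

/* Like caml_copy_string, except it pessimistically collects on every call. */
static char *toy_copy_string(const char *s) {
    toy_collect();
    size_t len = strlen(s) + 1;
    assert(used + len <= HEAP_SIZE);
    char *p = from_space + used;
    memcpy(p, s, len);
    used += len;
    return p;
}
```

Suppose you run `char *vsql = toy_copy_string("SELECT 1"); toy_register_root(&vsql); const char *sql = vsql;` and then make one more toy_copy_string call. Afterwards, vsql points at the moved copy, while sql (the String_val stand-in) reads an empty string. In the real runtime a stale pointer may keep reading the old bytes for a while, which helps explain why these bugs only show up occasionally. The model wipes the old space so the failure is deterministic.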

Take the GC rules very seriously if releasing the runtime lock

When you release the runtime lock, other threads can run OCaml code, so the garbage collector can run at any time and move memory around at will. You definitely still need to follow the rule above about using the CAMLparam and CAMLlocal macros correctly. In addition, you can't touch any OCaml value's data until the runtime lock is re-acquired.

We can't hold onto pointers into string data while the GC might be running. Any string data we'll use while the runtime lock is released therefore has to be copied into C-managed memory first (for example with caml_stat_strdup) and freed afterwards.

Also keep in mind that you shouldn't touch the list of local roots while the lock is released, since that can seriously confuse the garbage collector. Don't call CAMLparam, CAMLlocal, or CAMLreturn unless you're holding the runtime lock. Sometimes you need to release the runtime lock before returning. In that case you have two options. You can call CAMLdrop (which unregisters the locals without returning) while you still hold the lock. Or you can split your code into two functions: an inner function that uses the CAML* macros, and an outer function that acquires the runtime lock, calls the inner function, and releases the lock again.

Bad

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    caml_release_runtime_lock();
    // error, vsql's data could be moved at any time
    // also unsafe: DBPROCESS_VALUE(vdbconn) reads OCaml value data without
    //        the lock; extract the pointer before releasing it
    RETCODE ret = dbsqlexec(DBPROCESS_VALUE(vdbconn), String_val(vsql));
    caml_acquire_runtime_lock();
    CAMLreturn(Val_int(ret));
}

void error_handler(int error_code)
{
    caml_acquire_runtime_lock();

    CAMLparam0();
    CAMLlocal1(verror_code);
    verror_code = Val_int(error_code);
    caml_callback(*handler, verror_code);

    caml_release_runtime_lock();

    // error, messing with the OCaml locals list when we don't have the
    // runtime lock
    CAMLreturn0;
}

Still bad

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    const char* sql = String_val(vsql);

    caml_release_runtime_lock();
    // error, sql could be moved at any time
    RETCODE ret = dbsqlexec(dbconn, sql);
    caml_acquire_runtime_lock();

    CAMLreturn(Val_int(ret));
}

Good

value simplified_dbexec(value vdbconn, value vsql) {
    CAMLparam2(vdbconn, vsql);
    DBPROCESS* dbconn = DBPROCESS_VALUE(vdbconn);
    char* sql = caml_stat_strdup(String_val(vsql));

    caml_release_runtime_lock();
    RETCODE ret = dbsqlexec(dbconn, sql);
    caml_acquire_runtime_lock();

    caml_stat_free(sql);
    CAMLreturn(Val_int(ret));
}

void error_handler_with_lock(int error_code)
{
    CAMLparam0();
    CAMLlocal1(verror_code);
    verror_code = Val_int(error_code);
    caml_callback(*handler, verror_code);
    CAMLreturn0;
}

void error_handler(int error_code)
{
    caml_acquire_runtime_lock();
    error_handler_with_lock(error_code);
    caml_release_runtime_lock();
}

// alternative with CAMLdrop
void error_handler(int error_code)
{
    caml_acquire_runtime_lock();
    CAMLparam0();
    CAMLlocal1(verror_code);
    verror_code = Val_int(error_code);
    caml_callback(*handler, verror_code);
    CAMLdrop;
    caml_release_runtime_lock();
}

Be careful about throwing OCaml exceptions from C

If you call caml_raise or any of its friends, you will immediately throw away the existing C stack. Any OCaml variables registered with CAMLparam or CAMLlocal will be handled correctly, but you need to be very careful about the current state of the C library you're calling into.

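For example, you need to make sure to free any memory that OCaml doesn't manage. Also, if you're inside a callback that a C library function called, just don't raise. The library's stack frames are skipped along with yours, so its own cleanup never runs. You could try to convince yourself that the C function doesn't need to do any cleanup, but really, don't.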

You may still have to report errors from a callback. For example, the only way to do error handling in FreeTDS's dblib is through callbacks. In that case, you can save the error in the callback and raise the exception once the C library call has returned.
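The following plain-C model shows why the callback should record the error instead of raising. fake_library_exec and saved_error are hypothetical stand-ins for a library like dblib and for the binding's error storage. longjmp plays the role of caml_raise, since both leave the callback without returning and so skip the library's own cleanup.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a C library (like dblib) that reports errors
   through a callback and has to clean up before it returns. */
typedef void (*error_cb)(int code, const char *msg);

static int cleanups_run = 0;

static int fake_library_exec(error_cb on_error) {
    assert(on_error != NULL);
    /* ...the library sets up internal state here... */
    on_error(42, "connection failed");
    cleanups_run++;   /* ...and tears it down here, if control comes back */
    return -1;        /* failure */
}

/* BAD: leaving the callback non-locally (longjmp here, caml_raise in a real
   binding) skips the library's cleanup. */
static jmp_buf raise_target;

static void raising_handler(int code, const char *msg) {
    (void)code;
    (void)msg;
    longjmp(raise_target, 1);
}

/* GOOD: the callback only records the first error. Because it only touches
   C memory, it doesn't even need the OCaml runtime lock. */
typedef struct {
    int pending;
    int code;
    char msg[128];
} saved_error;

static saved_error last_error;

static void recording_handler(int code, const char *msg) {
    if (last_error.pending)
        return;   /* keep the first error */
    last_error.pending = 1;
    last_error.code = code;
    snprintf(last_error.msg, sizeof last_error.msg, "%s", msg);
}

/* Run the library call, then hand back any saved error. A real binding
   would re-acquire the runtime lock at this point and raise an OCaml
   exception built from *out (e.g. with caml_raise_with_string). */
static int exec_and_collect_error(saved_error *out) {
    memset(&last_error, 0, sizeof last_error);
    int ret = fake_library_exec(recording_handler);
    *out = last_error;
    return ret;
}
```

If you call fake_library_exec(raising_handler) under a setjmp, cleanups_run stays at 0. exec_and_collect_error, by contrast, lets the cleanup run and then returns the saved code and message. The global last_error keeps the sketch short. A real binding would more likely store errors per connection, for example through dblib's dbsetuserdata.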

Properly saving OCaml values

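If you need to keep an OCaml value around after your C function returns (for example, in a C struct), register it with caml_register_global_root and unregister it with caml_remove_global_root. When doing this, keep in mind that you're registering the address of a value variable, not its contents. That variable needs to hold a valid value (Val_unit works as a placeholder) for the entire time it's registered!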

Bad

typedef struct My_thing {
    value example;
    int have_example;
} My_thing;

static My_thing* my_thing_new(void) {
    My_thing* thing = caml_stat_alloc(sizeof(My_thing));
    thing->have_example = FALSE;
    // error, thing->example is invalid
    return thing;
}

static void my_thing_free_example(My_thing* thing) {
    if (thing->have_example) {
        // error, we never registered this
        caml_remove_global_root(&(thing->example));
        thing->have_example = FALSE;
    }
}

static void my_thing_set_example(My_thing* thing, value example) {
    CAMLparam1(example);

    my_thing_free_example(thing);

    // error, this registers the address of the stack parameter, which is
    //        invalid once we return; it should be &(thing->example)
    caml_register_global_root(&example);
    thing->example = example;
    thing->have_example = TRUE;

    CAMLreturn0;
}

static void my_thing_free(My_thing* thing)
{
    my_thing_free_example(thing);
    caml_stat_free(thing);
}

Good

typedef struct My_thing {
    value example;
} My_thing;

static My_thing* my_thing_new(void) {
    My_thing* thing = caml_stat_alloc(sizeof(My_thing));
    thing->example = Val_unit;
    caml_register_global_root(&(thing->example));
    return thing;
}

static void my_thing_set_example(My_thing* thing, value example) {
    CAMLparam1(example);
    thing->example = example;
    CAMLreturn0;
}

static void my_thing_free(My_thing* thing) {
    caml_remove_global_root(&(thing->example));
    caml_stat_free(thing);
}

And that's all I have.

Figuring out what was wrong in some cases was pretty confusing, so I hope this helps save someone else some time while writing or debugging C bindings. Thanks to the members of the OCaml Discord for helping me find several of these problems!