[C/C++] Running Code Before main

冒-_-泡

Edited on 2023-05-04 00:52

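In C and C++, main is the logical entry point of a program. Being the entry point does not mean it is the first code to run, though, and that holds even in the purely syntactic sense. Sometimes we need to do some work before main. A typical case is the program initializing itself, usually on behalf of libraries or modules, meaning work that main cannot see or does not need to know about.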

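The need is common enough that many languages provide a dedicated mechanism for it, such as static initializer blocks in Java or init functions in Go. C and C++ have no feature designed specifically for this. We have to rely on certain language features or compiler extensions, and in practice this runs into a few problems.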

The simplest approach is to use the initialization of a global variable:

```cpp
~/test/cpp_test$ cat a.cpp
#include <stdio.h>

static void f()
{
    printf("f\n");
}

static int _x = (f(), 0); // comma expression; simply making f return int would also work

int main()
{
    printf("main start\n");
}
~/test/cpp_test$ g++ -o a a.cpp && ./a
f
main start
~/test/cpp_test$
```

C++ can make this a little shorter with a lambda, which removes the need for an explicit function definition:

```cpp
#include <stdio.h>

static int _x = [] () -> int {
    printf("f\n");
    return 0;
}();

int main()
{
    printf("main start\n");
}
```

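You could also define a global variable of a custom class and do the work in its constructor. All of these approaches share one caveat. The language does not fully specify the order in which global variables are initialized and destroyed. It only guarantees that destruction happens in the reverse order of construction. Global objects within a single translation unit are constructed in the order they are defined. A C++ program, however, usually consists of many translation units, and the order across them is unspecified, so do not rely on it.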

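One annoyance with the approaches above is that they use up a variable name. If that bothers you, some compilers offer an extension for this. In GNU C, for example, you can give a function the constructor attribute, which makes it run before main: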

```cpp
~/test/cpp_test$ cat a.cpp
#include <stdio.h>

static __attribute__((constructor)) void f()
{
    printf("f\n");
}

int main()
{
    printf("main start\n");
}
~/test/cpp_test$ g++ -o a a.cpp && ./a
f
main start
~/test/cpp_test$
```

GNU C likewise provides a destructor attribute for running a function after main. Both attributes also accept an optional priority. For details, see:

https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Common-Function-Attributes.html#index-constructor-function-attribute

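However, just as with global variables, there is an ordering problem here too. In GNU C, constructor functions can be ordered relative to each other with priorities. Functions with the same priority have no guaranteed order, though. Their order relative to the initialization of global variables is also unspecified. For example: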

```cpp
~/test/cpp_test$ cat a.cpp
#include <stdio.h>
#include <map>

static std::map<int, int> m;

static __attribute__((constructor)) void f()
{
    m[1] = 2;
}

int main()
{
    printf("main start\n");
    printf("%d\n", m[1]);
}
~/test/cpp_test$ g++ -o a a.cpp && ./a
Segmentation fault (core dumped)
~/test/cpp_test$
```

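In my environment, the global m has not yet been constructed when the constructor function f runs in this example. Its memory has only been zero-filled, and all-zero bytes are not a valid std::map object, so operating on it directly crashes. Changing the code as follows makes it work: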

```cpp
#include <stdio.h>
#include <map>

static std::map<int, int> *m;

static __attribute__((constructor)) void f()
{
    if (!m)
    {
        m = new std::map<int, int>;
    }
    (*m)[1] = 2;
}

int main()
{
    printf("main start\n");
    printf("%d\n", (*m)[1]);
}
```

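Note that this works only because m is already initialized before f runs and is not initialized again afterwards. Here m is a pointer with no initializer, so it gets static (zero) initialization. You can think of that as happening when the program is loaded. Globals with an initial value get it from the data segment, and globals without one are zeroed in the .bss section. So when f runs, m is already a valid null pointer and f can do its work as shown. There is also no risk that a later initialization of m will overwrite the value f assigned.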

What if m has an initializer that runs code?

```cpp
~/test/cpp_test$ cat a.cpp
#include <stdio.h>
#include <map>

static std::map<int, int> *g()
{
    printf("g\n");
    return new std::map<int, int>;
}

static std::map<int, int> *m = g();

static __attribute__((constructor)) void f()
{
    printf("f\n");
    if (!m)
    {
        m = new std::map<int, int>;
    }
    (*m)[1] = 2;
}

int main()
{
    printf("main start\n");
    printf("%d\n", (*m)[1]);
}
~/test/cpp_test$ g++ -o a a.cpp && ./a
f
g
main start
0
~/test/cpp_test$
```

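In my environment, f runs before m is assigned. The map f created, holding m[1] = 2, is therefore replaced by the new empty map that g returns, and main prints 0. Other environments may use a different order, so be very careful if you write code like this. For example, it would be better to check whether m has already been assigned before calling g, using a global variable as a flag.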

There is actually a better way, though:

```cpp
~/test/cpp_test$ cat a.cpp
#include <stdio.h>
#include <map>

static std::map<int, int> &GetM()
{
    static std::map<int, int> m;
    return m;
}

static __attribute__((constructor)) void f()
{
    printf("f\n");
    GetM()[1] = 2;
}

int main()
{
    printf("main start\n");
    printf("%d\n", GetM()[1]);
}
~/test/cpp_test$ g++ -o a a.cpp && ./a
f
main start
2
~/test/cpp_test$
```

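This relies on a C++ language mechanism. Like a global, m has static storage duration. Because it is a static local variable inside a function, it is guaranteed to be initialized the first time the function is called, and only once. Since C++11 that initialization is also thread-safe. This is much safer.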

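The examples so far are all in a single .cpp file, that is, a single translation unit. A typical C or C++ project, however, consists of many translation units (also called source modules and the like). What happens if main and the initialization code are in separate files?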

```cpp
~/test/cpp_test$ cat a.cpp
#include <stdio.h>

int main()
{
    printf("main start\n");
}
~/test/cpp_test$ cat b.cpp
#include <stdio.h>

static __attribute__((constructor)) void AUTO_INIT()
{
    printf("AUTO_INIT\n");
}
~/test/cpp_test$ g++ -o a a.cpp b.cpp && ./a
AUTO_INIT
main start
~/test/cpp_test$
```

The result is the same with the usual workflow of compiling each file to a .o first and then linking:

```shell
~/test/cpp_test$ g++ -c a.cpp
~/test/cpp_test$ g++ -c b.cpp
~/test/cpp_test$ g++ -o a a.o b.o && ./a
AUTO_INIT
main start
~/test/cpp_test$
```

However, if you conclude that the technique is now ready for a real project, you will very likely fall into a trap:

```shell
~/test/cpp_test$ g++ -c a.cpp b.cpp
~/test/cpp_test$ ar -crsP libb.a b.o
~/test/cpp_test$ g++ -o a a.o libb.a && ./a
main start
~/test/cpp_test$
```

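Linking a and b together directly works fine, whether from the .cpp sources or from the .o object files. Once b.o is packed into a static library, though, it stops working. AUTO_INIT in b is not executed, and nm shows that it was never linked in: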

```shell
~/test/cpp_test$ nm libb.a | grep AUTO_INIT
0000000000000000 t _ZL9AUTO_INITv
~/test/cpp_test$ nm b.o | grep AUTO_INIT
0000000000000000 t _ZL9AUTO_INITv
~/test/cpp_test$ nm a | grep AUTO_INIT
~/test/cpp_test$
```

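You might suspect that GNU C's constructor mechanism is to blame. The global-variable initialization approach from the beginning of the article behaves exactly the same way, though: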

```cpp
~/test/cpp_test$ cat b.cpp
#include <stdio.h>

static void AUTO_INIT()
{
    printf("AUTO_INIT\n");
}

int _x = (AUTO_INIT(), 0);
~/test/cpp_test$ g++ -c a.cpp b.cpp
~/test/cpp_test$ rm libb.a && ar -crsP libb.a b.o
~/test/cpp_test$ g++ -o a a.o libb.a && ./a
main start
~/test/cpp_test$
```

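The cause lies in the linker. The command is g++, but as is well known, g++ hands the link step to a separate linker, ld by default here. When the linker searches a static library, it only extracts the .o files needed to resolve symbols that are currently undefined. a.cpp uses nothing from b.cpp, so b.o is not linked in. If a.cpp did use some element of b.cpp (a global variable, a function, etc.), b.o would be linked in even if that element had nothing to do with the initialization code. AUTO_INIT would then come along "for free" and take effect.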

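Strictly from a "linking the code" point of view, ld's behavior here is questionable. A constructor function is explicitly marked to run before main, so in effect it is an implicit dependency of the program's startup code. But constructor is a GNU C compiler extension, and ld does not consider it its business (and, implementation-wise, could not handle it well anyway). The global variable _x approach from above is similar. The global's initializer calls AUTO_INIT and may therefore have side effects, but ld ignores that case too. It goes purely by explicit symbol dependencies and links in as few .o files as possible.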

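Sometimes we still need b.o from libb.a to be linked in even though no other module depends on it explicitly. This typically happens with code that must initialize silently at startup and lives in its own separately maintained file, such as code generated automatically by the build system. In that case we can change the linker's behavior by passing ld an option that forces it to link in every object in the static library: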

```shell
~/test/cpp_test$ g++ -o a a.o \
> -Wl,--whole-archive libb.a -Wl,--no-whole-archive
~/test/cpp_test$ ./a
AUTO_INIT
main start
~/test/cpp_test$
```

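Here -Wl passes --whole-archive through to ld. This switches ld into a mode where every .o in the .a archives that follow is linked in unconditionally, so AUTO_INIT in b.o gets executed. Note that --no-whole-archive is needed after libb.a to switch the mode off again, so that it applies only to libb.a. Otherwise the default libraries that g++ appends to the end of the link line (libgcc.a, for instance) would also be pulled in whole, producing a flood of link errors.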

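Linking in a whole static library naturally increases the size of the executable. In most cases this is not a big problem, and languages such as Go behave this way by default too.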

cv23445740
