Why my rust program cannot merge string literals?

こういう文字列の末尾が共通となるようなRustのコードがある。

fn main() {
    print!("foobar\n");
    print!("bar\n");
}

こうなってしまう。

$ rustc hoge.rs
$ readelf -p .rodata hoge| grep bar
  [     0]  invalid args/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rsfoobar\n
            bar\n

これは当然ふたつめのbar\nが不要なので消えていてほしい。

なぜこうなってしまうのか。

まず文字列リテラルがnull終端でないのが気になる。少しコードを変更してみよう。

fn main() {
    print!("dummy\n\0");
    print!("foobar\n\0");
    print!("bar\n\0");
}

出力はちょっとましになったが、相変わらず期待した結果にならない。

$ readelf -p .rodata hoge| grep bar
  [    5e]  foobar\n
  [    66]  bar\n

リンカの問題だろうか。lldを試してみよう。

$ rustc -C link-arg=-fuse-ld=lld hoge.rs
$ readelf -p .comment hoge

String dump of section '.comment':
  [     0]  GCC: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  [    2b]  rustc version 1.75.0 (82e1608df 2023-12-21)
  [    58]  Linker: Ubuntu LLD 14.0.0

$ readelf -p .rodata hoge| grep bar
  [    70]  foobar\n
  [    98]  thread 'extern "bar\n

なんだか変な並びになってわかりにくい。ちょっと元のコードをいじってわかりやすくする。あまりよいコードの変更ではないが、とりあえず。

e$ cat hoge.rs
fn main() {
    print!("\0foobar\n\0");
    print!("\0bar\n\0");
}
$ rustc -C link-arg=-fuse-ld=lld hoge.rs
$ readelf -p .rodata hoge| grep bar
  [    58]  foobar\n
  [    61]  bar\n

実はLLDはoptimizeフラグを明示的にいじってやらないとtail-optimized stringsの最適化をやってくれない。

lld/OutputSections.cpp

static MergeSyntheticSection *createMergeSynthetic(StringRef name,
                                                   uint32_t type,
                                                   uint64_t flags,
                                                   uint32_t addralign) {
  if ((flags & SHF_STRINGS) && config->optimize >= 2)
    return make<MergeTailSection>(name, type, flags, addralign);
  return make<MergeNoTailSection>(name, type, flags, addralign);
}

これを引数に与えてみよう。

$ rustc -C link-arg=-fuse-ld=lld -C link-arg=-O2 hoge.rs
$ readelf -p .rodata hoge| grep bar
  [    58]  foobar\n
  [    61]  bar\n

残念、効果がないみたいだ。

さて、LLDのコードをみてみるとSHF_STRINGSというフラグの有無をみている。こいつはそのセクションに含まれているものが文字列であることを示す。また、実はmergeが行われるためにはSHF_MERGEというフラグでmergeableであることを示していなければならない。これもついでにみていこう。

これらはリンク前のセクションについている必要がある。そのためrustcでリンク前のオブジェクトファイルを生成して、文字列リテラルが含まれているセクションについて確認していく。

$ rustc --emit obj hoge.rs
$ readelf -WS hoge.o
There are 46 section headers, starting at offset 0x1260:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
(省略)
  [25] .rodata..L__unnamed_2 PROGBITS        0000000000000000 000278 000000 00   A  0   0  8
  [26] .rodata..L__unnamed_7 PROGBITS        0000000000000000 000278 00000c 00   A  0   0  1
  [29] .rodata..L__unnamed_8 PROGBITS        0000000000000000 000298 00004b 00   A  0   0  1
  [32] .rodata..L__unnamed_9 PROGBITS        0000000000000000 000300 000009 00   A  0   0  1
  [35] .rodata..L__unnamed_10 PROGBITS        0000000000000000 000320 000006 00   A  0   0  1

$ readelf -p .rodata..L__unnamed_9 hoge.o

String dump of section '.rodata..L__unnamed_9':
  [     1]  foobar\n

$ readelf -p .rodata..L__unnamed_10 hoge.o

String dump of section '.rodata..L__unnamed_10':
  [     1]  bar\n

フラグを表すのはFlgである。どうやら文字列リテラルを含んでいるセクションのフラグにはSHF_STRINGSとSHF_MERGEはついてないということがわかった。

実はここでsh_entsizeの値についても確認しておく必要がある。というのも、リンカはここの値が0の場合mergeableと判定してくれないからだ。

lld:ELF/InputFiles.cpp

template <class ELFT>
bool ObjFile<ELFT>::shouldMerge(const Elf_Shdr &sec, StringRef name) {
  // On a regular link we don't merge sections if -O0 (default is -O1). This
  // sometimes makes the linker significantly faster, although the output will
  // be bigger.
  //
  // Doing the same for -r would create a problem as it would combine sections
  // with different sh_entsize. One option would be to just copy every SHF_MERGE
  // section as is to the output. While this would produce a valid ELF file with
  // usable SHF_MERGE sections, tools like (llvm-)?dwarfdump get confused when
  // they see two .debug_str. We could have separate logic for combining
  // SHF_MERGE sections based both on their name and sh_entsize, but that seems
  // to be more trouble than it is worth. Instead, we just use the regular (-O1)
  // logic for -r.
  if (config->optimize == 0 && !config->relocatable)
    return false;

  // A mergeable section with size 0 is useless because they don't have
  // any data to merge. A mergeable string section with size 0 can be
  // argued as invalid because it doesn't end with a null character.
  // We'll avoid a mess by handling them as if they were non-mergeable.
  if (sec.sh_size == 0)
    return false;

  // Check for sh_entsize. The ELF spec is not clear about the zero
  // sh_entsize. It says that "the member [sh_entsize] contains 0 if
  // the section does not hold a table of fixed-size entries". We know
  // that Rust 1.13 produces a string mergeable section with a zero
  // sh_entsize. Here we just accept it rather than being picky about it.
  uint64_t entSize = sec.sh_entsize;
  if (entSize == 0)
    return false;

sh_entsizeはreadelf -Sの出力のESに対応する。各文字列リテラルを含むセクションはここの値が0になっていることがわかる。

なぜそのようなオブジェクトファイルを出力しているのか？というのはrustcのソースコードを読みにいかねばわからない。

まずrustcが生成するLLVM IRを読んで、文字列リテラルがどのような実装になっているかの雰囲気をつかもう。

$ rustc --emit llvm-ir hoge.rs

以下は関係ありそうな箇所を抜粋したもの。

(省略..)
@alloc_33edafb4635cba7fe9b343830eb4d654 = private unnamed_addr constant <{ [9 x i8] }> <{ [9 x i8] c"\00foobar\0A\00" }>, align 1
@alloc_0dc7dcbb24079b5067fcd16f8d1d6fab = private unnamed_addr constant <{ ptr, [8 x i8] }> <{ ptr @alloc_33edafb4635cba7fe9b343830eb4d654, [8 x i8] c"\09\00\00\00\00\00\00\00" }>, align 8
@alloc_82ad759880ed30affa0db4ee70a74cbc = private unnamed_addr constant <{ [6 x i8] }> <{ [6 x i8] c"\00bar\0A\00" }>, align 1
@alloc_170c2fea5b9cc064c6e9c015b7a41c15 = private unnamed_addr constant <{ ptr, [8 x i8] }> <{ ptr @alloc_82ad759880ed30affa0db4ee70a74cbc, [8 x i8] c"\06\00\00\00\00\00\00\00" }>, align 8

ここで、たとえば\0foobar\n\0は@alloc_33edafb4635cba7fe9b343830eb4d654 = private unnamed_addr constant <{ [9 x i8] }> <{ [9 x i8] c"\00foobar\0A\00" }>, align 1のように表現されていることがわかる。

private unnamed_addr constantとはなんぞや？

まずprivateはわかりやすいが、これはlinkage typesで同一モジュール内からのアクセスのみが許可されていることを表す。

unnamed_addrはちょっとわかりにくいが、これはアドレスは重要ではなく内容のみが重要であることを表す。またconstantもこのセクションで説明されているとおりで、まあ定数となる(変更されない)くらいで解釈して問題ない。

次に<{ [9 x i8] }> <{ [9 x i8] c"\00foobar\0A\00" }>, align 1をみてみよう。まず<{ [9 x i8] }>についてだが、LLVMの型定義をみると、これは要素にarray typeをひとつ含んだpacked struct typeということになる。packedなstruct typeであるので、末尾にalign 1も付与されている。

次に<{ [9 x i8] c"\00foobar\0A\00" }>の意味だが、これはLLVMでの定数を表す表現で、中に文字列定数を一つ格納するstructの定数となる。

と、上記記述をみてみると、Rustの文字列リテラルは文字列定数をひとつ格納するstruct型の定数としてLLVMのGlobal Variableで表現されているようだ。なのでllvm::GlobalVariableを生成している箇所を追っていこう。

Topic: LDCの場合

Rust側のコードを覗く前に、ここまでちゃんと生成できてない例しかみてないので、そもそもちゃんと動くの？前提が正しいのか？みたいな話がある。そこで別のLLVMベースのコンパイラ・LDCでみてみる。

まずはそもそもLLDで文字列リテラルのmergeがちゃんと行われているのか？を実験してみよう。

$ cat app.d
import std.stdio;

void main()
{
    writeln("foobar");
    writeln("bar");
}
$ ldc2 --linker=lld app.d
$ readelf -p .rodata app| grep bar
  [   a5b]  foobar
  [   ea6]  bar
$ ldc2 --linker=lld -L-O2 app.d
$ readelf -p .rodata app| grep bar
  [   392]  foobar

リンカオプションに最適化フラグを与えるとちゃんとmergeしてくれていることがわかる。

次にセクションのフラグ情報をみてみる。

$ ldc2 --output-o app.d
$ readelf -WS app.o | grep \.rodata
  [50] .rodata.str1.1    PROGBITS        0000000000000000 0013df 00000b 01 AMS  0   0  1
  [51] .rodata.str1.16   PROGBITS        0000000000000000 0013f0 000169 01 AMS  0   0 16
$ readelf -p .rodata.str1.1 app.o

String dump of section '.rodata.str1.1':
  [     0]  foobar
  [     7]  bar

.rodata.str1.1セクションのsh_entsizeが1に、フラグがAMSになっている(=mergeとstringsがセットされている)ことが確認できた。

これで期待どおりにバイナリを吐いている、ということはわかったが、LDCはできてRustができてないのはなんで？となる。

生成するLLVM IRをみてみよう。

$ ldc2 --output-ll app.d
$ cat app.ll
(省略)
@.str = private unnamed_addr constant [7 x i8] c"foobar\00" ; [#uses = 1]
@.str.1 = private unnamed_addr constant [4 x i8] c"bar\00" ; [#uses = 1]

private unnamed_addr constantだから、Rust同様にLLVMのGlobalVariableな実装になっている。

LDCのコードは比較的追いやすい。

まず、このGlobalVariableを生成している箇所はgen/irstate.cppのgetCachedStringLiteralImpl関数にある。

ldc:gen/irstate.cpp

template <typename F>
LLGlobalVariable *
getCachedStringLiteralImpl(llvm::Module &module,
                           llvm::StringMap<LLGlobalVariable *> &cache,
                           llvm::StringRef key, F initFactory) {
  auto iter = cache.find(key);
  if (iter != cache.end()) {
    return iter->second;
  }

  LLConstant *constant = initFactory();

  auto gvar =
      new LLGlobalVariable(module, constant->getType(), true,
                           LLGlobalValue::PrivateLinkage, constant, ".str");
  gvar->setUnnamedAddr(LLGlobalValue::UnnamedAddr::Global);

  cache[key] = gvar;

  return gvar;
}

GlobalVariableのコンストラクタの定義は以下のようになっている。

llvm:include/llvm/IR/GlobalVariable.h

/// GlobalVariable ctor - This creates a global and inserts it before the
  /// specified other global.
  GlobalVariable(Module &M, Type *Ty, bool isConstant, LinkageTypes Linkage,
                 Constant *Initializer, const Twine &Name = "",
                 GlobalVariable *InsertBefore = nullptr,
                 ThreadLocalMode = NotThreadLocal,
                 std::optional<unsigned> AddressSpace = std::nullopt,
                 bool isExternallyInitialized = false);

Type *TyとConstant *Initializerが重要そうだ。なのでinitFactoryにわたってくる関数をみてみる。D言語の文字列リテラルの場合はgen/llvmhelper.cppのbuildStringLiteralConstant関数が渡されてくる。

ldc:gen/llvmhelper.cpp

llvm::Constant *buildStringLiteralConstant(StringExp *se,
                                           uint64_t bufferLength) {
  const auto stringLength = se->numberOfCodeUnits();
  assert(bufferLength >= stringLength);

  if (se->sz == 1 && bufferLength <= stringLength + 1) {
    const DString data = se->peekString();
    const bool nullTerminate = (bufferLength == stringLength + 1);
    return llvm::ConstantDataArray::getString(
        gIR->context(), {data.ptr, data.length}, nullTerminate);
  }

  Type *dtype = se->type->toBasetype();
  Type *cty = dtype->nextOf()->toBasetype();
  LLType *ct = DtoMemType(cty);
  LLArrayType *at = LLArrayType::get(ct, bufferLength);

  std::vector<LLConstant *> vals;
  vals.reserve(bufferLength);
  for (uint64_t i = 0; i < stringLength; ++i) {
    vals.push_back(LLConstantInt::get(ct, se->getCodeUnit(i), false));
  }
  const auto nullChar = LLConstantInt::get(ct, 0, false);
  for (uint64_t i = stringLength; i < bufferLength; ++i) {
    vals.push_back(nullChar);
  }
  return LLConstantArray::get(at, vals);
}

文字列リテラルが直接渡されてくる今回の場合は以下のケースだけ着目すればよい。

    const DString data = se->peekString();
    const bool nullTerminate = (bufferLength == stringLength + 1);
    return llvm::ConstantDataArray::getString(
        gIR->context(), {data.ptr, data.length}, nullTerminate);

なのでここではllvm::ConstantDataArrayが返ってきてる。

constant->getType()が返す型はこのArray型の要素となるので、この場合はint8_tになる。

よってこのセクションは、

要素の型がint8_tなConstantDataArray型のconstantなGlobalVariable
内部のデータはnull終端な文字列リテラル

で構成されている。

rustcコードリーディング

rustcが直接LLVMのGlobalVariableを生成しているコードは二箇所ある。この2つの関数をrustc_codegen_llvm内でいろいろな箇所から呼んでいる。

rust:compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp

extern "C" LLVMValueRef
LLVMRustGetOrInsertGlobal(LLVMModuleRef M, const char *Name, size_t NameLen, LLVMTypeRef Ty) {
  Module *Mod = unwrap(M);
  StringRef NameRef(Name, NameLen);

  // We don't use Module::getOrInsertGlobal because that returns a Constant*,
  // which may either be the real GlobalVariable*, or a constant bitcast of it
  // if our type doesn't match the original declaration. We always want the
  // GlobalVariable* so we can access linkage, visibility, etc.
  GlobalVariable *GV = Mod->getGlobalVariable(NameRef, true);
  if (!GV)
    GV = new GlobalVariable(*Mod, unwrap(Ty), false,
                            GlobalValue::ExternalLinkage, nullptr, NameRef);
  return wrap(GV);
}

extern "C" LLVMValueRef
LLVMRustInsertPrivateGlobal(LLVMModuleRef M, LLVMTypeRef Ty) {
  return wrap(new GlobalVariable(*unwrap(M),
                                 unwrap(Ty),
                                 false,
                                 GlobalValue::PrivateLinkage,
                                 nullptr));
}

先にも述べた通り、これらの関数はいろいろな箇所から呼ばれており、逐一探すのは骨が折れる。なのでちょっとアドホックなやり方で特定しよう。

rustcが生成するLLVM IRの出力をちょっと振り返ってみよう。

(省略..)
@alloc_33edafb4635cba7fe9b343830eb4d654 = private unnamed_addr constant <{ [9 x i8] }> <{ [9 x i8] c"\00foobar\0A\00" }>, align 1
@alloc_0dc7dcbb24079b5067fcd16f8d1d6fab = private unnamed_addr constant <{ ptr, [8 x i8] }> <{ ptr @alloc_33edafb4635cba7fe9b343830eb4d654, [8 x i8] c"\09\00\00\00\00\00\00\00" }>, align 8
@alloc_82ad759880ed30affa0db4ee70a74cbc = private unnamed_addr constant <{ [6 x i8] }> <{ [6 x i8] c"\00bar\0A\00" }>, align 1
@alloc_170c2fea5b9cc064c6e9c015b7a41c15 = private unnamed_addr constant <{ ptr, [8 x i8] }> <{ ptr @alloc_82ad759880ed30affa0db4ee70a74cbc, [8 x i8] c"\06\00\00\00\00\00\00\00" }>, align 8

ここで各GlobalVariableの定義は、alloc_xxxxx...のような命名規則になっていることがわかる。この命名規則を利用しているのは、rustc_codegen_llvm/src/common.rsのscalar_to_backend関数である。

rust:compiler/rustc_codegen_llvm/src/common.rs

    fn scalar_to_backend(&self, cv: Scalar, layout: abi::Scalar, llty: &'ll Type) -> &'ll Value {
        let bitsize = if layout.is_bool() { 1 } else { layout.size(self).bits() };
        match cv {
            Scalar::Int(int) => { (省略...) }
            Scalar::Ptr(ptr, _size) => {
                let (prov, offset) = ptr.into_parts();
                let (base_addr, base_addr_space) = match self.tcx.global_alloc(prov.alloc_id()) {
                    GlobalAlloc::Memory(alloc) => {
                        let init = const_alloc_to_llvm(self, alloc);
                        let alloc = alloc.inner();
                        let value = match alloc.mutability {
                            Mutability::Mut => self.static_addr_of_mut(init, alloc.align, None),
                            _ => self.static_addr_of(init, alloc.align, None),
                        };
                        if !self.sess().fewer_names() && llvm::get_value_name(value).is_empty() {
                            let hash = self.tcx.with_stable_hashing_context(|mut hcx| {
                                let mut hasher = StableHasher::new();
                                alloc.hash_stable(&mut hcx, &mut hasher);
                                hasher.finish::<Hash128>()
                            });
                            llvm::set_value_name(value, format!("alloc_{hash:032x}").as_bytes());
                        }
                        (value, AddressSpace::DATA)
                    }

llvm::GlobalVariableを生成しているのはstatic_addr_of関数内になる。 (実際の呼び出し階層はstatic_addr_of->static_addr_of_mut->define_private_global(rustc_codegen_llvm/src/declare.rs)->LLVMRustInsertPrivateGlobal(RustWrapper.cpp)のようになっている)

なんでstatic_addr_ofのほうかとわかるのかというと、llvm::LLVMSetGlobalConstant(gv, True);を呼んでるのがこちらのパスだけなので、LLVM IRの内容と照らし合わせるとこちらになるとわかる。

ここまででどのパスでGlobalVariableの生成するかはわかった ~~が型情報をコードから追うのはなかなか大変そうだ。~~ (追記: 別にそんなことはなかった)

ここの型を決める箇所は以下のコードで常にconst_structを呼び出しており、これはConstStruct型となる。

rust:compiler/rustc_codegen_llvm/src/consts.rs

pub fn const_alloc_to_llvm<'ll>(cx: &CodegenCx<'ll, '_>, alloc: ConstAllocation<'_>) -> &'ll Value {
    let alloc = alloc.inner();
    let mut llvals = Vec::with_capacity(alloc.provenance().ptrs().len() + 1);
(省略...)
    cx.const_struct(&llvals, true)
}

ここで再びLLVM IRに目を戻してみる。

@alloc_33edafb4635cba7fe9b343830eb4d654 = private unnamed_addr constant <{ [9 x i8] }> <{ [9 x i8] c"\00foobar\0A\00" }>, align 1

ここでLDCでの定義を思い返してみると、ここはarray type(正確にはArrayDataType)だった。これでだんだんみえてきた。ここの型がArray-likeでなければSHF_MERGE/SHF_STRINGSは付与されない、ということだろう。

rustcの中でこれらのフラグをつけているわけではないことがわかったので、ここでLLVMのコード生成側に返ってみよう。

型情報によってmergeable判定を決めている箇所は、LLVMのTargetLoweringObjectFile.cppのgetKindForGlobal関数となっている。

llvm:lib/Target/TargetLoweringObjectFile.cpp

SectionKind TargetLoweringObjectFile::getKindForGlobal(const GlobalObject *GO,
                                                       const TargetMachine &TM){
(省略...)
      // If initializer is a null-terminated string, put it in a "cstring"
      // section of the right width.
      if (ArrayType *ATy = dyn_cast<ArrayType>(C->getType())) {
        if (IntegerType *ITy =
              dyn_cast<IntegerType>(ATy->getElementType())) {
          if ((ITy->getBitWidth() == 8 || ITy->getBitWidth() == 16 ||
               ITy->getBitWidth() == 32) &&
              IsNullTerminatedString(C)) {
            if (ITy->getBitWidth() == 8)
              return SectionKind::getMergeable1ByteCString();
            if (ITy->getBitWidth() == 16)
              return SectionKind::getMergeable2ByteCString();

            assert(ITy->getBitWidth() == 32 && "Unknown width");
            return SectionKind::getMergeable4ByteCString();
          }
        }

SectionKind::Mergeable1ByteCStringやSectionKind::Mergeable4ByteCStringを返す条件として、セクションの型が整数型のArrayTypeにキャスト可能な型であること・セクション内の文字列がnull終端となっていることなどが求められている。

これらのSectionKindはSHF_MERGEとSHF_STRINGSに影響することが以下の関数でわかる。

llvm:lib/CodeGen/TargetLoweringObjectFileImpl.cpp

static unsigned getELFSectionFlags(SectionKind K) {
  unsigned Flags = 0;
(省略...)
  if (K.isMergeableCString() || K.isMergeableConst())
    Flags |= ELF::SHF_MERGE;

  if (K.isMergeableCString())
    Flags |= ELF::SHF_STRINGS;

  return Flags;
}

まとめ

結論としては、Rustは文字列リテラルをLLVMの世界では要素として文字列定数をひとつ持つpacked struct型の定数をグローバル変数として表現するが、LLVMはArray型の定数でかつnull終端である文字列定数しかMerge可能な文字列と判定できないため、文字列リテラルを格納しているセクションにmergeableかつstringsであることを表すフラグを付与できないということになる。

今回Rustが意図してこのような文字列リテラルの実装になっているかまでは調べきることはできなかったが、機会があればまた調査したい。

kubo39's blog

ただの雑記です。

Why my rust program cannot merge string literals?

Topic: LDCの場合

rustcコードリーディング

まとめ