去年に続き今年もGCCが開催されるそうです。去年は卒論で教員に圧力をかけられ参加できませんでしたが、今年は卒論も無いので応募することにしました。 GCCの応募には応募課題をする必要があるのですが、この記事ではその課題に対する私の回答を晒します。

やってみた感想としては、出された3つの課題はどれも勉強になって面白かったです。ただ、問題の意図が分からなかったり、「簡潔に」と言いながらソースコードを載せる必要があったりで回答に困る部分もありました。ということで「これくらい書いたら受かりました」という情報を後世に残しておきます。

所要時間

3日。1日1つやりました。人にもよると思いますが、1日フルに使えばすべて終わる程度の量で、万が一解けなくても過程を書けば良いようです。

概要

課題1

AFL, laf-intel, angr(, Driller)を、バグのあるサンプルバイナリに対して回して、それぞれの利点と欠点をまとめるという課題です。

正直これが一番つらかったです。fuzzerは使ったことが無かったので何をすれば正解なのかよく分かりませんでした。 angrの方は使い慣れていたのですぐに終わりましたが、AFLも1つしかクラッシュを見つけてくれないまま提出しました。 Drillerに関しては動かし方すら分からなかったので触れずに提出しました。（これは必須ではないので。）

課題2

破損したAPFSからファイルを復元してね、という課題です。

APFSは触ったことが無かったので非常に勉強になり、一番面白かったです。参考資料も付いており、APFSの構造を勉強しながら課題を進められました。結果としては、superblockが破損しているのでバックアップ領域を使って修正し、ファイルの方は断片化してオフセットが書き換えられているので適当に直す、という内容でした。

課題3

C++製っぽい簡易VMプログラムを解析し、VMコードの逆アセンブラまで作ろうという課題です。 VMはVMと言えないほど単純なもので、逆アセンブラは簡単に実装できます。あとはVMのバイトコードを解析してフラグを見つけるだけなのですが、解がたくさんあって結局どれが正解か分かりませんでした。どれも入力として正しいので正解なのですが、フラグっぽい形式のものが無かったので不安で、結局全部の解を求めるスクリプトを提出しました。

内容

晒す。日本語の問には日本語で答え、英語の問には英語で答えましたが、課題に関しては言語指定は無いようです。

(1) What is your cyber security experience?

I got interested in cyber security about 5 years ago. The start was Security Mini Camp held in Hokkaido, in which I learned digital forensics. In the session we analysed a disk/memory dump of a computer infected with a malware. I kept learning cyber secuity after that too since I heard the word "CTF" in the Security Mini Camp. As I've played CTFs, I acquired many skills, not only forensics but also of other fields such as cryptoraphy, reverse engineering, web/binary exploitation. I focus on studying binary exploitation and malware analysis in these days. I'm a college student but as a job experience I've been creating an anti-virus for Linux in my part time job. So, my cyber security experience is suitable for the upcoming GCC as far as I read the program overview.

(2) What is your English proficiency? Please write in English.

You'll understand that my English skill is enough if you take a look over my blog. I've post many writeups/articles in English on my blog. Also, I've been to some foreign countries or teamed up with foreigners for CTF and my English is good not only at reading/writing but also at listening/speaking.

(3) セキュリティ・キャンプに参加したことがある人は、参加した大会名と、クラス/トラック/受講講義等を教えてください。 (ない場合は「なし」と記載。)

セキュリティ・ミニキャンプ in 北海道 2014
セキュリティ・ミニキャンプ in 沖縄 2016
セキュリティキャンプ 2017 - 集中コースX（x86 OSの自作）

(4) 公開してる技術系の活動や資料(ブログや、Twitter、GitHub、Slideshare、Speaker Deckなど)がありましたらURL等を記載してください。 (学校の課題で作ったものか、自発的に作ったものかがわかるように列挙してください。) 専攻でやっていることについては、概要と、その分野での客観的にわかる成果などを記載してください。

Blog: https://ptr-yudai.hatenablog.com/
Twitter: https://twitter.com/ptrYudai
Bitbucket: https://bitbucket.org/ptr-yudai/
Github: https://github.com/ptr-yudai

(5) gcc_challenge1 : Fuzzing See the folder "gcc_challenge1_Fuzzing"

[Answer/解答：簡潔に。]

１．Find as many crash inputs as possible of simple_linter.c by using AFL and laf-intel. Describe the difference between them based on your observation.

AFL found 1 unique crash by running it for 10 minutes. The input is \xde\xad\xdf\xd5. AFL+laf-intel also found 1 unique crash in 10 minutes. This input is \xde\xad\x20\x22, which is the same pattern as that of AFL only. However, there's a difference in the number of crashes they found. While AFL found 25 crashes, AFL+laf-intel found 173 crashes. Since the exec speed is almost same, using laf-intel may increase the number of crashes it finds.

２．Write your angr script which can find all crash inputs of simple_linter.c.

The following Python script could find all 3 unique crashes with multiple patterns for 3rd crash within a minute.

import angr
import claripy
from logging import getLogger, WARN
getLogger("angr").setLevel(WARN + 1)

filepath = "input.dat"

p = angr.Project("./a.out", load_options={"auto_load_libs": False})
state = p.factory.entry_state(args=[p.filename, filepath])

input_file = angr.storage.SimFile(filepath)
state.posix.fs = {filepath: input_file}

class CrashHandler(angr.SimProcedure):
    def run(self, crash_id):
        print("===== Crash =====")
        print(crash_id)
        print(self.state.posix.dump_file_by_path(filepath))
        print("=================")

p.hook_symbol("crash", CrashHandler())
simgr = p.factory.simulation_manager(state)
simgr.explore()

３．Does symbolic execution always work better than fuzzing in the context of vulnerability finding? Please explain the advantage/disadvantages between AFL, angr and Driller.

I don't think it's always better though symbolic execution seems much better than fuzzing according to the result of the previous tasks. I don't know much about fuzzing tools but as far as I used AFL for this task, I found the following pros and cons.

(1) AFL

pros:

Can find a bug in the deep part of the program.
Faster than symbolic execution.

cons:

Hard to reach to the path which requires the input to be a specific value. (like a normal compare: input == 0xdeadbeef)

(2) angr

pros:

May find the desired input faster than AFL because it guesses the input based on SAT solver.
Can analyse the program part by part as it can start the execution anywhere.

cons:

Execution speed is slower than AFL. Especially it gets slower when we try to analyse the program deeper as the number of required paths increases drastically.

(3) Driller

pros:

Can reach to the deep part of the program.
Can pass the condition which AFL can't pass and symbolic execution can do.

cons:

Slower than AFL (?)

Angr might be the best tool in this task as the target is small and thus angr could find all the crashes fast. Also, the comparison in the program makes it hard for AFL to find the crash path as it requires numerous attempts to find the proper input. If we need to check a project with large lines of code, for example, it may be better to run AFL rather than use angr. As far as I did a search, Driller takes advantages of fuzzer and symbolic execution. So, practically Driller seems a good option for fuzzing.

(6) gcc_challenge2 : IR See the folder "gcc_challenge2_IR"

[Answer/解答：簡潔に。]

問1. challenge.rawをfuse-apfsでマウントできるようにするために修正し、以下の内容を答えてください。

１．修正箇所のオフセットと修正内容。

以下の2箇所を変更する。

オフセット 0x20 から連続する 8 バイトを 4e 58 53 42 00 10 00 00 に修正する。
オフセット 0xa0 から連続する 4 バイトを 32 01 00 00 に修正する。

２．そのように修正するとマウントできるようになる理由。

hexdumpでchallenge.rawのContainer Superblockを見ると、nx_magicが破壊されている。また、nx_block_sizeやnx_omap_oidもゼロ埋めされているため、Object Mapが参照できない状態になっている。そこでContainer Superblockのバックアップを探すためにnx_xp_desc_blocksとnx_xp_desc_baseを見ると、checkpoint descriptor areaが8ブロック存在し、それがブロック1から始まることが分かる。このデータが改竄されていない仮定でブロックを参照する。nx_block_sizeが分からないが、0x1000バイトごとにObjectのヘッダと思われるデータ（o_chksumなど）が存在することから、nx_block_sizeは0x1000と考えるのが妥当である。1*0x1000から8ブロックを確認すると、次の4つのnx_superblock_tが確認できる。

offset=0x2000, o_xid=0x3d
offset=0x4000, o_xid=0x3e
offset=0x6000, o_xid=0x3b
offset=0x8000, o_xid=0x3c

最新のバックアップはo_xidが0x3eのものなので、これを利用する。（また、これは先頭のContainer Superblockのo_xidとも等しい。）offsetが0x4000のContainer Superblockを見ると、nx_block_sizeは0x1000で、nx_omap_oidは0x132である。したがって、先頭ブロックのnx_magicを修正するとともに、これらの情報を書き込めばapfs-fuseでマウントできるようになる。

問2. challenge.rawに含まれるボリュームには、sample.rawに含まれるものと同じPDFファイルが保存されています。しかし、問1を解決してマウントしても、そのままでは開くことができないファイルが1つあります。当該ファイルを開けるようにchallenge.rawのデータを修正し、以下の内容を答えてください。なお、challenge.rawに含まれるファイルに関するデータが破損している場合などは、sample.rawに含まれる同名のファイルに関するデータを参考に修正してください。ファイルが適切に復元できているかどうかは、当該ファイルのhashを計算し、sample.rawの同名のファイルのhashと比較することで確認できます。

１．修正箇所のオフセットと修正内容。

以下の4箇所を変更する。

オフセット 0x12d7b0 から連続する 2 バイトを c3 4a に修正する。
オフセット 0x12d798 から連続する 2 バイトを 8b 46 に修正する。
オフセット 0x12d780 から連続する 2 バイトを c9 52 に修正する。
オフセット 0x12d000 から連続する 8 バイトを 8f 72 e6 91 dd 53 a4 9c に修正する。

２．そのように修正するとファイルを開くことができるようになる理由。

破損しているファイルは3つの箇所に断片化して存在するが、そのアドレスを表すphys_block_numがいずれも改竄されている。したがって、sample.rawの正しいファイルをもとに断片化したデータを探し、見つけたオフセットから3つのphys_block_numを正しい値に直せばファイルが正しくマウントされる。なお、変更したブロックのチェックサムを正しい値に変更するため0x12d000の8バイトも修正した。

３．回答に至る調査の過程(簡潔に)。

破損しているファイルはIIR/iir_vol40.pdfで、中身は0x00で埋められている。hexdumpでこのファイルまで辿ってみる。問1で修正したnx_omap_oidからom_tree_oidを参照すると0x133である。このオブジェクトを見ると、btn_flagsは7で、keyを1つ持つことが分かる。Valueを見ると、ov_paddrは0x131なのでそこを辿る。apfs_omap_oidは0x12bなので、0x12b000のオブジェクトのom_tree_oidを見ると0x12cである。このBTREEを見ると、n_flagsは7でkeyは3つ存在することが分かる。対応するvalueを見ると、paddrはそれぞれ0x129, 0x12a, 0x12dであることが分かる。確認したところ3つ目の0x12dに破損したファイルが含まれることが分かるのでここを見る。n_flagsは2で、keyは0x34個存在する。iir_vol40.pdfはTOC上で22番目にあたり、対応するvalueを見ると(オフセット0x12dc9a)file_idは0x29である。idが0x29のj_inode_val_tは0x12d7d8に存在する。また、idが0x29のj_file_extent_key_tは0x12d529, 0x12d539, 0x12d540に存在する。対応するj_file_extent_val_tはそれぞれ0x12d7a8, 0x12d790, 0x12d778に存在する。これらを見ると、いずれもphys_block_numが0になっている。これら3つのファイルサイズを足すと0x1f7000 + 0x317000 + 0x4e9000 = 0x9f7000であり、先程参照したj_inode_val_tからファイルサイズが0x9f6bf1であったことを考えても妥当である。なお、それぞれのlogical_addrは0, 0x4e9000, 0x800000である。sample.rawに含まれていた正しいiir_vol40.pdfがフラグメントしていると考え、断片化したデータをchallenge.rawから探すことでphys_block_numを修正する。iir_vol40.pdfのデータをもとに断片化したデータを探すと、それぞれ次の箇所に存在することが分かった。

オフセット0x000000の内容：0x04ac3000
オフセット0x4e9000の内容：0x0468b000
オフセット0x800000の内容：0x052c9000

したがって、0x12d7a8, 0x12d790, 0x12d778にあった3つのj_file_extent_val_tのphys_block_numをそれぞれ0x52c9, 0x468b, 0x4ac3にすれば良い。また、オブジェクトの内容を変更したためチェックサムも変更する必要がある。これらを次のPython スクリプトで（問1も含めて）一括して変更する。

import fletcherNbit
import struct

def patch(fin, fout, patchList):
    offset = 0
    block_size = 0x1000
    while True:
        block = fin.read(block_size)
        if block == b'': break
        updateChecksum = False
        for (addr, data, update) in patchList:
            if offset < addr < offset + block_size:
                local_ofs = addr - offset
                block = block[:local_ofs] + data + block[local_ofs+len(data):]
                updateChecksum |= update
                print("[+] Patched {}: {}".format(hex(addr), data))
        if updateChecksum:
            cs = fletcherNbit.Fletcher64()
            cs.update(block[8:])
            hashval = int(cs.cb_hexdigest(), 16)
            hash_lo, hash_hi = hashval & 0xffffffff, hashval >> 32
            checksum = struct.pack('<I', hash_hi) + struct.pack('<I', hash_lo)
            block = checksum + block[8:]
            print("[+] Updated checksum: {}".format(hex(hashval)))
        fout.write(block)
        offset += block_size
    return

if __name__ == '__main__':
    patchList = [
        (0x20, b'\x4e\x58\x53\x42\x00\x10\x00\x00', False),
        (0xa0, b'\x32\x01\x00\x00', False),
        (0x12d7b0, b'\xc3\x4a', True),
        (0x12d798, b'\x8b\x46', True),
        (0x12d780, b'\xc9\x52', True),
    ]

    fin = open("challenge.raw", "rb")
    fout = open("patched-challenge.raw", "wb")
    
    patch(fin, fout, patchList)
    
    fout.close()
    fin.close()

修正したディスクダンプをapfs-fuseでマウントすると、破損していたPDFファイルが読めるようになった上、sample.rawとハッシュ値も同じになった。

(7) gcc_challenge3 : Reversing See the folder "gcc_challenge3_Reversing"

[Answer/解答：簡潔に。]

１．Submit a solver which extracts the flag within the vmgcc program. Literally, this program employs a custom VM architecture.

I don't understand what "the flag within the vmgcc" means. I regarded it as an input with which the program prints "Success!" but there are hundreds of input patterns accepted by the binary. So, I wrote a solver to find all possible solutions which consist of alphanumeric characters. Here is the Python code.

from z3 import *

flag = [BitVec('flag{:02X}'.format(i), 32) for i in range(4)]
s = Solver()

for i in range(4):
    for j in range(4):
        c = (flag[i] >> j * 8) & 0xff
        s.add(Or( # alphanumeric
            And(0x30 <= c, c < 0x3a),
            And(0x41 <= c, c < 0x40 + 26),
            And(0x61 <= c, c < 0x60 + 26)
        ))
s.add(
    ((flag[0] & 0x44434241) ^ 0x44404041) ^\
    ((flag[1] & 0x48474645) ^ 0x48470441) ^\
    ((flag[2] & 0x4c4b4a49) ^ 0x4c404a49) ^\
    ((flag[3] & 0x504f4e4d) ^ 0x4043424d) == 0
)

while True:
    r = s.check()
    if r == sat:
        m = s.model()
        answer = ['?' for i in range(4)]
        for d in m.decls():
            answer[int(d.name()[4:], 16)] = hex(m[d].as_long())[2:]
        print(bytes.fromhex(''.join(answer)))
        s.add(Not(And([flag[int(d.name()[4:], 16)] == m[d] for d in m.decls()])))
    else:
        print(r)
        break

The first output is JLLLLL0LHl80ah40 and it's accepted by the binary.

２．Submit a disassembler for the keycheck bytecode. Analyzing vmgcc is required in advance.

This is my disassembler written in Python.

def disasm(code):
    def u32(b):
        return b[3]|(b[2]<<8)|(b[1]<<16)|(b[0]<<24)
    def n2reg(n):
        assert 0 <= n < 6
        return 'r{}'.format(n)
    output = []
    pc = 0
    while True:
        inst = code[pc]
        if inst == 0x1a:
            # mov reg, val
            imm = u32(code[pc+2:pc+6])
            dst = code[pc+1]
            output.append('mov {}, 0x{:08x}'.format(n2reg(dst), imm))
            pc += 6
        elif inst == 0x1b:
            # mov reg, reg
            src = code[pc+2]
            dst = code[pc+1]
            output.append('mov {}, {}'.format(n2reg(dst), n2reg(src)))
            pc += 3
        elif inst == 0x0a:
            # and reg, reg
            src = code[pc+2]
            dst = code[pc+1]
            output.append('and {}, {}'.format(n2reg(dst), n2reg(src)))
            pc += 3
        elif inst == 0x0b:
            # xor reg, reg
            src = code[pc+2]
            dst = code[pc+1]
            output.append('xor {}, {}'.format(n2reg(dst), n2reg(src)))
            pc += 3
        elif inst == 0xff:
            # ret
            output.append('ret')
            break
        else:
            raise AssertionError("Illegal Instruction at 0x{:X}".format(pc))
    return output

if __name__ == '__main__':
    with open("keycheck", "rb") as f:
        code = f.read()
    print('\n'.join(disasm(code)))

おわりに

せっかくなら海外に行きたかった感もありますが、受かったからには東京でも全力で受講します。というか他に通った人間を観測していないので不安です......

CTFするぞ

CTF以外のことも書くよ

GCC Tokyo(Global Cybersecurity Camp v2.0) 応募課題晒し

所要時間

概要

課題1

課題2

課題3

内容

おわりに