Weather (Google Capture the Flag 2022) Writeup

TCPサーバの接続情報とファイルが与えられた。
このファイルを7-Zipで開くと、以下のファイルが得られた。

I was given a file and the connection information for a TCP server.
Opening the file with 7-Zip yielded the following files:

Device Datasheet Snippets.pdf (以下、「PDF」) はシステムの構造についての説明、 firmware.c はサーバのプログラムのソースコードのようであった。

Device Datasheet Snippets.pdf (hereafter "the PDF") appeared to describe the structure of the system, and firmware.c appeared to be the source code of the server program.

デバイスへのアクセス Accessing the devices

PDFより、今回のシステムでは、プロセッサと各種センサおよびEEPROMがI2Cで繋がっていることがわかる。
このEEPROMには、プロセッサ用のプログラムが格納されていそうである。
なお、センサのI2CアドレスはPDFに載っているが、EEPROMのI2Cアドレスは載っていない。

firmware.c を読むと、

The PDF shows that, in this challenge's system, the processor is connected to the sensors and the EEPROM via I2C.
The EEPROM presumably holds the program for the processor.
Note that the PDF lists the I2C addresses of the sensors but not that of the EEPROM.

Reading firmware.c, I found that we can read from an I2C device by giving "r" followed by the address to read from and the number of bytes to read, in this order, like this:

のように、「r」に続いて読み込むアドレスと読み込むバイト数を順に指定することで、I2Cデバイスからの読み込みを行えることがわかる。
また、

Also, we can write to an I2C device by giving "w" followed by the address to write to, the number of bytes to write, and the data to write, in this order, like this:

のように、「w」に続いて書き込むアドレス、書き込むバイト数、書き込むデータを順に指定することで、I2Cデバイスへの書き込みを行えることがわかる。

ここで、指定したアドレスは、port_to_int8 関数によって数値に変換される。
この関数では、まず is_port_allowed 関数でアドレスを表す文字列をチェックし、チェックを通過した場合のみ str_to_uint8 関数で文字列を数値に変換して返す。

is_port_allowed 関数では、指定した文字列が許可リストにある文字列のいずれかで始まるかどうかをチェックしている。
また、str_to_uint8 関数では、オーバーフローやラップアラウンドのチェックをしない単純な処理で十進文字列を8ビットの数値に変換している。
したがって、許可リストにある文字列 (例えば 101) で始め、その後の数字列をうまく選ぶことで、0～127の任意のアドレスを指定することができる。

例えば、101248000 は 101 で始まっているため、アドレスの指定として利用可能である。
さらに、256で割った余りが0なので、末尾の 000 の部分をアクセスしたいアドレスに置き換えることで、簡単に任意のアドレスを指定できる。
これを利用し、0～127の全I2Cアドレスへのアクセスを試みる以下のプログラムを作成・実行した。

Note that the specified address is converted to an integer by the function port_to_int8.
This function first checks the string that represents the address using the function is_port_allowed, and converts the string to an integer using the function str_to_uint8 only if it passes the check.

The function is_port_allowed checks whether the string starts with any of the strings in the allow list.
The function str_to_uint8, in turn, naively converts a decimal string to an 8-bit integer without checking for overflow or wraparound.
Therefore, we can specify any address from 0 to 127 by starting with a string in the allow list (101, for example) and choosing the digits that follow appropriately.
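
The wraparound can be checked with a small sketch. It only emulates the conversion as described above and is not the code from firmware.c; the names str_to_uint8_sketch and port_string are hypothetical, and the prefix 101248 is just one convenient choice that starts with 101 and leaves the low 8 bits at zero.

```python
def str_to_uint8_sketch(s: str) -> int:
    """Emulate a decimal-to-8-bit conversion that silently wraps around."""
    value = 0
    for ch in s:
        # Keep only the low 8 bits after every digit, as a uint8 would.
        value = (value * 10 + (ord(ch) - ord("0"))) & 0xFF
    return value


def port_string(address: int, prefix: str = "101248") -> str:
    """Build a port string that starts with 101 and wraps to `address`."""
    return f"{prefix}{address:03d}"


if __name__ == "__main__":
    print(str_to_uint8_sketch("101248000"))  # 0
    for address in (0, 33, 127):
        s = port_string(address)
        print(s, "->", str_to_uint8_sketch(s))
```

Because 101248000 is a multiple of 256, appending a three-digit address to 101248 yields a string whose wrapped value is exactly that address.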

For example, 101248000 is usable as an address because it starts with 101.
Moreover, since this value is divisible by 256, we can easily specify an arbitrary address by replacing the trailing 000 with the desired address.
Using this value, I wrote and ran this program, which tries to access every I2C address from 0 to 127.

この結果、センサのI2CアドレスとしてPDFに載っているアドレスに加えて、アドレス33へのアクセスが有効であることがわかった。
したがって、このアドレス33がEEPROMのI2Cアドレスであると推測できた。

アドレス33は、101 で始まり、256で割った余りが33になる 101153 とも表すことができる。
101248033 よりも短いため、今後はこれを用いる。

As a result, I found that I2C address 33 responds, in addition to the sensor addresses listed in the PDF.
I therefore guessed that address 33 is the I2C address of the EEPROM.

Address 33 can also be specified as 101153, which starts with 101 and leaves a remainder of 33 when divided by 256.
Since this is shorter than 101248033, I'll use it from here on.
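
One way to find such a short form is a brute-force search over the digits appended to the allowed prefix. The following is a minimal sketch under the assumption that only the prefix 101 is considered; shortest_port_string is a hypothetical helper, not part of the challenge files.

```python
from itertools import count


def shortest_port_string(address: int, prefix: str = "101") -> str:
    """Return the shortest decimal string starting with `prefix` whose value
    is congruent to `address` modulo 256 (smallest value among equal lengths)."""
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    for extra_digits in count(0):
        for suffix in range(10 ** extra_digits):
            tail = str(suffix).zfill(extra_digits) if extra_digits else ""
            candidate = prefix + tail
            if int(candidate) % 256 == address:
                return candidate


if __name__ == "__main__":
    print(shortest_port_string(33))  # 101153
```

Three appended digits cover 1000 consecutive values, which is more than 256, so the search always finishes by then.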

EEPROMのダンプ Dumping the EEPROM

PDFより、今回使用されているEEPROMは以下の手順でI2Cから読み出すことができることがわかる。

page index を書き込むことにより、読み出すページを選択する。
データを読み出す。最大64バイトを読み出すことができる。

1ページの大きさは64バイトである。
page index の指定方法は明示されていないようであるが、試した結果何番目のページを読み出したいかを0-originの1バイトで指定すればよさそうだった。
また、今回使用されているEEPROMは CTF-55930D であり、これは64ページあることがわかる。

これらを踏まえ、EEPROM全体の内容を読み出す以下のプログラムを作成・実行した。

According to the PDF, the EEPROM used in this challenge can be read via I2C as follows:

Select the page to read by writing the "page index".
Read the data. At most 64 bytes can be read.

The size of a page is 64 bytes.
How to specify the "page index" did not seem to be stated explicitly, but some experiments suggested that sending the zero-based index of the page to read as a single byte works.
The PDF also says that the EEPROM used in this challenge is the CTF-55930D, which has 64 pages.

Based on this information, I wrote and ran this program to read the entire contents of the EEPROM.
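
The read procedure can be sketched as a generator of command lines; this is an illustration, not necessarily the program that was actually used. It assumes that the "r" and "w" commands take space-separated decimal values, which is an assumption based on the description of the commands and not a confirmed syntax. Sending the commands over TCP and parsing the replies are left out because the response format is not covered here.

```python
EEPROM_PORT = "101153"  # wraps around to I2C address 33
PAGE_SIZE = 64          # bytes per page
PAGE_COUNT = 64         # pages in the CTF-55930D


def dump_commands(port: str = EEPROM_PORT) -> list[str]:
    """Return the command lines that select and read every EEPROM page.

    ASSUMPTION: space-separated decimal arguments for "w" and "r".
    """
    commands = []
    for page in range(PAGE_COUNT):
        commands.append(f"w {port} 1 {page}")     # write 1 byte: the page index
        commands.append(f"r {port} {PAGE_SIZE}")  # read the selected page
    return commands


if __name__ == "__main__":
    for line in dump_commands()[:4]:
        print(line)
    print(hex(PAGE_SIZE * PAGE_COUNT))  # total size: 0x1000
```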

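読み出した結果、EEPROMのデータは以下のようになっているようだった。

0x000～0x889 : 謎のデータ (機械語？)
0x88A～0xA01 : 文字列データ
0xA02～0xFFF : 0xFFの連続

The dump showed that the EEPROM data appeared to be laid out as follows:

0x000 to 0x889 : Unknown data (machine code?)
0x88A to 0xA01 : String data
0xA02 to 0xFFF : A run of 0xFF bytes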

EEPROMの解析 Analyzing the EEPROM

EEPROMの内容を読み出すことができたので、これを解析し、flagの取得に繋げたい。
文字列はEEPROMの内容と firmware.c の内容を紐づける特徴となりそうなので、まずはこれを足がかりとした解析を試みた。

例えば、firmware.c 中の関数 i2c_status_to_error では、以下のように文字列が連続で使われている。

Now that I could read out the contents of the EEPROM, the next step was to analyze them and find a way to get the flag.
I decided to start from the strings, because they looked useful for determining which part of the EEPROM corresponds to which part of firmware.c.

For example, the function i2c_status_to_error in firmware.c uses several strings in a row:

EEPROMから読み出したデータに strings --radix=x コマンドをかけた結果のうち、これらの文字列に対応する部分は以下のようになった。

This is the part of the output of the strings --radix=x command, run on the EEPROM data, that corresponds to these strings:

これらのアドレスをEEPROMのデータから探したところ、このあたりにビッグエンディアンで入っていた。

I searched the EEPROM data for these addresses and found them, stored in big-endian, in this area:
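
A search of this kind can be sketched as follows. The helper find_be16 is hypothetical, and the bytes in the demo are a made-up fragment for illustration, not data copied from the real dump.

```python
import struct


def find_be16(data: bytes, value: int) -> list[int]:
    """Return every offset at which `value` appears as a big-endian 16-bit word."""
    needle = struct.pack(">H", value)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits


if __name__ == "__main__":
    # Made-up fragment: one byte, then a 16-bit value 0x088A, then more bytes.
    image = bytes([0x00, 0x90, 0x08, 0x8A, 0x75, 0xF0, 0x80, 0x22])
    print(find_be16(image, 0x088A))  # [2]
```

In practice each offset printed by strings would be passed as `value`, and the hits would then be inspected by hand.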

この部分のデータをよく見ると、0x100～0x122において、90 XX XX 75 F0 80 22 というデータが繰り返されていることがわかる。
(XX XX はそれぞれの文字列のアドレスをビッグエンディアンで表したものである)
さらに、0x0EC～0x0FFのデータは BF YY ZZ 80 WW というパターンの繰り返しになっており、WW の次のバイトからWWバイト後が90 XX XXになっていることがわかった。
これらを i2c_status_to_error 関数の処理と照らし合わせ、これらのデータは以下の意味を持つ機械語であると推測した。

90 XX XX : 即値 XX XX をどこかにロードする
75 F0 80 22 : 関数から戻る
BF YY ZZ : 何かの値がYYでない場合、実行を ZZ バイト飛ばす
80 WW : 無条件で実行を WW バイト飛ばす

ここで、プロセッサの名前が CTF-8051μC であり、「8051」が入っていることから、8051の命令セットが使われているのではないかと考えた。
8051の命令セットについては、例えば以下のページに情報がある。

Looking carefully at the data in this area, I found that the pattern 90 XX XX 75 F0 80 22 is repeated from 0x100 to 0x122.
(XX XX is the address of each string in big-endian.)
Moreover, I found the pattern BF YY ZZ 80 WW repeated from 0x0EC to 0x0FF, and that a 90 XX XX sequence begins WW bytes after the byte following WW.
Comparing these findings with what the function i2c_status_to_error does, I guessed that these patterns are machine code with the following meanings:

90 XX XX : Load the immediate value XX XX somewhere.
75 F0 80 22 : Return from the function.
BF YY ZZ : If some value is not YY, skip the next ZZ bytes.
80 WW : Unconditionally skip the next WW bytes.

Since the name of the processor is CTF-8051μC, which contains "8051", I guessed that the 8051 instruction set is used.
Information about the 8051 instruction set can be found on, for example, these pages:

これらのページを参照すると、75 F0 80 22 は 75 F0 80 (値0x80をアドレス0xF0にストアする) と 22 (RET) に分割でき、
推測した機械語の意味と合っていそうであることがわかる。
そこで、以下の逆アセンブラを利用し、EEPROMのデータ全体の逆アセンブルを行った。

According to these pages, 75 F0 80 22 can be split into 75 F0 80 (store the value 0x80 to address 0xF0) and 22 (RET),
which matches the guessed meaning of the machine code.
I therefore disassembled the entire EEPROM data using this disassembler:

その結果、全体を自然に逆アセンブルすることができたので、8051の命令セットが使われていると仮定して進めることにした。

As a result, the whole data disassembled cleanly, so I decided to proceed on the assumption that the 8051 instruction set is used.
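
To illustrate how the byte patterns observed earlier map onto 8051 instructions, here is a minimal decoder sketch. It covers only the handful of opcodes that appear in this writeup, using the standard 8051 encodings, and is in no way a replacement for a full disassembler. The function name decode and the output format are hypothetical.

```python
# opcode -> (mnemonic template, instruction length in bytes)
OPCODES = {
    0x00: ("NOP", 1),
    0x02: ("LJMP {addr16:#06x}", 3),
    0x22: ("RET", 1),
    0x75: ("MOV {b1:#04x}, #{b2:#04x}", 3),               # MOV direct, #imm
    0x80: ("SJMP {rel_target:#06x}", 2),
    0x90: ("MOV DPTR, #{addr16:#06x}", 3),
    0xBF: ("CJNE R7, #{b1:#04x}, {rel_target:#06x}", 3),
}


def decode(code: bytes, base: int = 0) -> list[str]:
    """Decode `code` (loaded at address `base`) into one text line per instruction."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            lines.append(f"{base + pc:04x}: .db {op:#04x}")  # unknown byte
            pc += 1
            continue
        template, length = OPCODES[op]
        operands = code[pc + 1:pc + length]
        if len(operands) < length - 1:
            break  # truncated instruction at the end of the buffer
        b1 = operands[0] if length > 1 else 0
        b2 = operands[1] if length > 2 else 0
        rel = operands[-1] if operands else 0
        rel = rel - 256 if rel >= 128 else rel  # relative offsets are signed
        fields = {
            "b1": b1,
            "b2": b2,
            "addr16": (b1 << 8) | b2,
            # Relative jumps count from the address of the next instruction.
            "rel_target": base + pc + length + rel,
        }
        lines.append(f"{base + pc:04x}: " + template.format(**fields))
        pc += length
    return lines


if __name__ == "__main__":
    for line in decode(bytes.fromhex("90088a75f08022")):
        print(line)
```

The relative-jump handling matches the earlier observation that the jump lands a given number of bytes after the byte following the offset.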

FlagROMの読み出し Reading the FlagROM

使われている命令セットが推測できたので、残る仕事はFlagROMの内容を読み出すことである。
PDFによれば、FlagROMは256バイトのROMであり、アドレス用のレジスタにセットした位置のデータがデータ用のレジスタから読めるようである。

I2C経由でEEPROMの内容を書き換えることで、実行するプログラムを書き換えることができるはずである。
ただし、ビットを1から0にすることはできるが、0から1にすることはできないようである。
幸い、EEPROMの後半に0xFFが連続している部分があるので、ここには任意のプログラムを書くことができる。
そこで、既存のプログラムをうまく書き換え、この部分に書いたプログラムを実行する方法を考えることにした。

逆アセンブル結果を調べた結果、出力する文字列のアドレスをセットしてから呼び出されていることなどから、
X0123 が serial_print 関数に相当しそうであることがわかった。
さらに、その冒頭部分は以下のようになっていた。

Now that the instruction set had been identified, the remaining task was to read out the FlagROM.
According to the PDF, the FlagROM is a 256-byte ROM, and the byte at the position set in the address register can be read from the data register.

It should be possible to change the program being executed by modifying the contents of the EEPROM via I2C.
Note, however, that bits can be changed from 1 to 0 but apparently not from 0 to 1.
Fortunately, the latter part of the EEPROM contains a run of 0xFF bytes, so arbitrary programs can be written there.
I therefore decided to look for a way to modify the existing program so that it executes a program written to that region.

Examining the disassembly, I found that X0123 likely corresponds to the function serial_print.
One reason is that it is called after the address of a string to print has been set.
This is the beginning of that routine:

このうち 83 af f0 という部分は、適切にビットを1から0にすることで、02 0a 40 にすることができる。
これはアドレス0xA40に実行を移すLJMP命令であり、0xFFが連続している部分に実行を移すことができる。

次に、このアドレス0xA40に書き込む、FlagROMの内容を出力するプログラムを作成した。
serial_print 関数のプログラムを参考に以下のプログラムを書き、手動で機械語に変換した。

The bytes 83 af f0 in this code can be changed to 02 0a 40 by changing the appropriate bits from 1 to 0.
This is an LJMP instruction that jumps to address 0xA40, so it transfers execution into the region filled with 0xFF.
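
Whether a given patch is possible under the "1 to 0 only" rule can be checked mechanically: every bit set in the new byte must already be set in the old byte. The following sketch uses a hypothetical helper clear_mask, which also returns the bits that would have to be cleared.

```python
def clear_mask(old: bytes, new: bytes) -> bytes:
    """Return, per byte, the bits to clear to turn `old` into `new`.

    Raises ValueError if the change would need any bit to go from 0 to 1.
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    for o, n in zip(old, new):
        if n & ~o & 0xFF:
            raise ValueError(f"cannot turn {o:#04x} into {n:#04x} by clearing bits")
    return bytes(o & ~n & 0xFF for o, n in zip(old, new))


if __name__ == "__main__":
    # 83 af f0 -> 02 0a 40 (LJMP 0x0A40)
    print(clear_mask(bytes.fromhex("83aff0"), bytes.fromhex("020a40")).hex())  # 81a5b0
```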

Next, I created a program that prints the contents of the FlagROM, to be written at address 0xA40.
Using the code of the function serial_print as a reference, I wrote this program and converted it to machine code by hand.

PDFより、EEPROMにI2C経由で以下のデータを書き込むことで、EEPROMに書き込みを行うことができることがわかる。

書き込むページ (64バイト) を表すPageIndex
WriteKey (A5 5A A5 5A)
0にするビットを表すバイト列 (0にするビットを1で表す)

これに基づき、作成したプログラムをCyberChefで書き込み用のバイト列に変換した。

According to the PDF, we can write to the EEPROM by sending it the following data via I2C:

PageIndex, which specifies the page (64 bytes) to write
WriteKey (A5 5A A5 5A)
A sequence of bytes specifying the bits to clear (a 1 bit means that the corresponding bit should be changed to 0)

Based on this, I used CyberChef to convert the program I wrote into the byte sequence for writing.
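
The same conversion can be sketched in Python instead of CyberChef. This sketch assumes that the target bytes are currently 0xFF (as in the unused region), so the bits to clear are simply the inverted program bytes, and that a mask for the full 64-byte page is sent, with zero meaning "leave unchanged". The required mask length is an assumption. The helper eeprom_write_payload is hypothetical, and the three program bytes in the demo are placeholders, not the actual FlagROM-dumping program.

```python
WRITE_KEY = bytes.fromhex("a55aa55a")
PAGE_SIZE = 64


def eeprom_write_payload(address: int, program: bytes) -> list[int]:
    """Build PageIndex + WriteKey + clear mask for writing `program` at `address`.

    ASSUMPTIONS: the target bytes are currently 0xFF, and a full-page mask is sent.
    """
    page_index, offset = divmod(address, PAGE_SIZE)
    if offset + len(program) > PAGE_SIZE:
        raise ValueError("program must fit within a single page")
    mask = bytearray(PAGE_SIZE)  # 0x00 = clear nothing
    for i, byte in enumerate(program):
        mask[offset + i] = ~byte & 0xFF  # clear every bit that must become 0
    return [page_index, *WRITE_KEY, *mask]


if __name__ == "__main__":
    payload = eeprom_write_payload(0xA40, bytes.fromhex("020a40"))  # placeholder bytes
    print(len(payload), " ".join(str(b) for b in payload[:8]))
```

Since 0xA40 is 41 times 64, a program placed there starts exactly at the beginning of page 41.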

以下が、これを用いて作成した、このプログラムを0xA40から書き込むコマンドである。

This is the command, built from the result of this conversion, that writes the program starting at 0xA40.

さらに、以下が serial_print 関数の冒頭部分を0xA40にジャンプするように書き換えるコマンドである。
83 af f0 の部分を書き換えるだけでなく、その前の ae を 00 (NOP) に書き換えることで、LJMP命令を実行させている。

This is the command that modifies the beginning of the function serial_print so that it jumps to 0xA40.
Besides modifying the bytes 83 af f0, it also changes the preceding ae to 00 (NOP) so that the LJMP instruction gets executed.

これらのコマンドを以下のように続けて実行することで、flagが得られた。