Git hack happens when site maintainers use Git to manage the source code of a website but forget to remove the .git directory from the deployed site. By accessing http://your.target.site/.git/, you can easily find Git information and even the source code of the target website. Sometimes you will get a 403 when you visit that URL, but that is only because directory listing is restricted. In this case, you can still access and download individual files if you know their exact URLs.
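To illustrate the point about exact URLs, here is a minimal sketch that builds direct links to a few well-known files under .git/ and checks whether a response body looks like a real HEAD file. The file list and the function names `candidate_urls` and `looks_like_head` are assumptions made for this example; they are not taken from any existing tool.

```python
from urllib.parse import urljoin

# Well-known files inside .git; this list is illustrative, not exhaustive.
KNOWN_GIT_FILES = ["HEAD", "config", "index", "refs/heads/master"]


def candidate_urls(base_url):
    """Build direct URLs to well-known files under an exposed .git/ directory."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [urljoin(base_url, name) for name in KNOWN_GIT_FILES]


def looks_like_head(content):
    """Return True if the bytes look like the content of a Git HEAD file."""
    text = content.strip()
    if text.startswith(b"ref: "):
        return True
    # A detached HEAD holds a bare 40-character hex hash instead of a ref.
    return len(text) == 40 and all(c in b"0123456789abcdef" for c in text)
```

Fetching each URL (for example with `urllib.request.urlopen`) and passing the body to `looks_like_head` is one way to tell an exposed repository from a generic error page, even when the directory itself returns 403.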
Why I wrote this script
Recently I took part in a CTF hosted by Wuhan University. One of the web challenges was about Git hack.
I tried to solve that challenge with a widely used GitHack script, which was supposed to download the source code. However, no files were downloaded, even though all of the .git files were listed on the page. I quickly analyzed the script and found the reason.
That script works as follows:
- Retrieve the index file and extract file names and SHA1;
- Download the files with the SHA1;
- Save the downloaded files into organized directories.
Everything seems fine. However, a fatal problem can occur in step 1: if there are no files in the current state of the repository, the index file will have no entries! The script then has nothing to download, even though older commits may still contain files.
The index is a single, large, binary file in <baseOfRepo>/.git/index, which lists all files in the current branch, their sha1 checksums, time stamps and the file name — it is not another directory with a copy of files in it. [1]
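To make the failure mode concrete, here is a minimal sketch of reading the entries of an index file. It assumes index format version 2 and names shorter than 4095 bytes; `parse_index` is a name chosen for this example and is not the code of the script discussed above. When the header reports zero entries, the result is an empty list, which is exactly the case where an index-based downloader finds nothing.

```python
import struct


def parse_index(data):
    """Return (path, sha1_hex) pairs from a version 2 Git index file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version != 2:
        raise ValueError("this sketch only handles index version 2")
    entries = []
    offset = 12
    for _ in range(count):
        # 40 bytes of stat data come first, then the SHA-1 and the flags.
        sha1 = data[offset + 40:offset + 60]
        (flags,) = struct.unpack(">H", data[offset + 60:offset + 62])
        name_len = flags & 0x0FFF
        name = data[offset + 62:offset + 62 + name_len]
        entries.append((name.decode("utf-8", "replace"), sha1.hex()))
        # Each entry is NUL-padded to a multiple of 8 bytes.
        offset += (62 + name_len + 8) // 8 * 8
    return entries
```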
After realizing this problem, I decided to write a new script that uses a different method.
How my script works
First off, we need to know what we need in the .git directory to restore files.
HEAD                  # HEAD pointer ref
refs
 \_ heads
     \_ master        # real pointer to the newest commit
objects
 \_ *                 # all files compressed with zlib
- HEAD: reference to the pointer. (ref: refs/heads/master)
- refs/heads/master: real pointer. (f43003bce4f11d9b2532b5fac0a0006126f14e2a)
- objects: each object is stored here under a path derived from its hash, in the format ./hash[0:2]/hash[2:]. Objects are compressed with zlib.
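The layout above can be sketched in a few lines. This is a minimal example, assuming loose objects only (packed objects are stored differently); `object_path` and `decode_object` are hypothetical helper names used for illustration.

```python
import zlib


def object_path(sha1_hex):
    """Map a 40-character hash to its path relative to the .git directory."""
    return "objects/{}/{}".format(sha1_hex[:2], sha1_hex[2:])


def decode_object(compressed):
    """Decompress a loose object and split it into (type, body)."""
    raw = zlib.decompress(compressed)
    # The header has the form b"<type> <size>" and is terminated by a NUL byte.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("object size does not match its header")
    return obj_type.decode("ascii"), body
```

For the hash shown above, `object_path` gives `objects/f4/3003bce4f11d9b2532b5fac0a0006126f14e2a`, which is the file to request under .git/.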
Secondly, there are three types of objects in the objects directory that matter for restoring files.
Commit
It specifies the Git log information and the hashes of the tree node and parent node. Example shown below.
commit: b'commit 210\x00tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 parent 912133f7bd228e96757a531ef52d1777de10ca8a author Alice <[email protected]> 1554748133 +0800 committer Alice <[email protected]> 1554748133 +0800
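As a rough sketch of how the tree and parent hashes can be pulled out of a commit, the function below reads the header lines of a decompressed commit body (the part after the `commit <size>\x00` prefix). `parse_commit` is a name made up for this example.

```python
def parse_commit(body):
    """Extract the tree hash and parent hashes from a decompressed commit body."""
    tree = None
    parents = []
    # Header lines end at the first blank line; the commit message follows.
    headers = body.split(b"\n\n", 1)[0]
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            # A merge commit has more than one parent line.
            parents.append(value.decode("ascii"))
    return tree, parents
```

A root commit has no parent line, so the returned list is empty and the traversal stops there.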
Tree
It specifies the tree structure. The branches of a tree node can be either a blob or another tree. Each entry in the tree contains a file name and the hash of the corresponding blob or subtree. The graph[2] describes the structure. An example is also provided below.
tree: b'tree 0\x00' # or b"tree 32\x00100644 flag\x00'0\xedavmI\x048\xad\x80\xe0\x7f\xec\x85|\x83\xfbB\xb3"
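The tree body is binary: each entry is a mode, a space, the file name, a NUL byte, and then the 20 raw bytes of the hash. A minimal sketch of a parser follows; `parse_tree` is a name chosen for this example, and the input is the body after the `tree <size>\x00` prefix.

```python
def parse_tree(body):
    """Return (mode, name, sha1_hex) tuples from a decompressed tree body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8", "replace")
        # The hash is stored as 20 raw bytes, not as hex text.
        sha1 = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, sha1))
        pos = nul + 21
    return entries
```

Applied to the second example above, this yields a single entry with mode `100644` and name `flag`. Subtrees appear with mode `40000`.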
Blob
It specifies the content of a certain file. Example here.
blob: b'blob 37\x00WHUCTF{xxxxxxxxxxxxxxxxxxxxxxxx}\n'
My script first retrieves the newest commit hash from refs/heads/master, and then traverses all commits recursively through their parent hashes. In each round, the script locates the tree of the commit, parses it, and then finds and extracts the blobs. Finally, all files are saved into organized directories.
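The traversal described above can be sketched as follows. This is not the code of githack.py; it is a simplified illustration that assumes every object is already available in an in-memory dict `store` mapping a hash to its zlib-compressed bytes, whereas a real tool would download each object on demand and handle missing ones. The names `make_object`, `load`, `walk_tree`, and `restore` are hypothetical.

```python
import hashlib
import zlib


def make_object(store, obj_type, body):
    """Demo helper: add an object to the store and return its hash."""
    raw = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\x00" + body
    sha1_hex = hashlib.sha1(raw).hexdigest()
    store[sha1_hex] = zlib.compress(raw)
    return sha1_hex


def load(store, sha1_hex):
    """Decompress an object and return (type, body)."""
    raw = zlib.decompress(store[sha1_hex])
    header, _, body = raw.partition(b"\x00")
    return header.split(b" ")[0].decode("ascii"), body


def walk_tree(store, tree_sha, prefix, files):
    """Collect every blob under a tree into files as {path: content}."""
    _, body = load(store, tree_sha)
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space]
        name = body[space + 1:nul].decode("utf-8", "replace")
        sha = body[nul + 1:nul + 21].hex()
        pos = nul + 21
        if mode == b"40000":
            # Mode 40000 marks a subtree, so recurse into it.
            walk_tree(store, sha, prefix + name + "/", files)
        else:
            # Everything else is treated as a blob (submodules are not handled).
            files[prefix + name] = load(store, sha)[1]


def restore(store, commit_sha, seen=None):
    """Return {commit_sha: {path: content}} for a commit and all its ancestors."""
    seen = {} if seen is None else seen
    if commit_sha in seen:
        return seen
    _, body = load(store, commit_sha)
    files = {}
    seen[commit_sha] = files
    for line in body.split(b"\n\n", 1)[0].split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"tree":
            walk_tree(store, value.decode("ascii"), "", files)
        elif key == b"parent":
            restore(store, value.decode("ascii"), seen)
    return seen
```

Because the walk starts from the commit rather than from the index, files that were deleted in the newest commit are still recovered from its ancestors.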
Script
You can download the files without installing Git.
Project: https://github.com/hazzel-cn/GitHack
Usage:
git clone https://github.com/hazzel-cn/GitHack.git
cd GitHack
python3 githack.py http://you.target/.git/
Screenshot
githack.py
gitclone.py
References
- [1] https://stackoverflow.com/questions/3689838/whats-the-difference-between-head-working-tree-and-index-in-git/3690796
- [2] https://www.jianshu.com/p/8659c9ae00cb
- [3] https://www.jianshu.com/p/6d93d6153070