How to Check If a File Is Binary in Git


Introduction

In this lab, you will learn how to determine if a file is considered binary by Git. We will explore two methods: using the git diff --numstat command to observe how Git summarizes changes, and utilizing the standard Linux file command to identify the file type. By the end of this lab, you will understand how Git distinguishes between text and binary files and how to check this distinction yourself.



Use git diff --numstat to Check

In this step, we will explore how to use git diff --numstat to understand the changes between different versions of your files. This command provides a summary of changes, showing the number of lines added and deleted for each file.

First, let's make sure we are in our project directory. Open your terminal and navigate to the my-time-machine directory:

cd ~/project/my-time-machine

Now, let's make a change to our message.txt file. We will add a new line to it:

echo "Hello, Past Me" >> message.txt

The >> operator appends the text to the existing file, rather than overwriting it.
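The difference between the two redirection operators can be checked on a scratch file. This is a minimal sketch; the temporary file and variable names are only for illustration and are not part of the lab's repository.

```shell
# Demonstrate '>' (overwrite) versus '>>' (append) on a scratch file.
demo_file=$(mktemp)

echo "first line" > "$demo_file"      # '>' truncates the file, then writes
echo "second line" >> "$demo_file"    # '>>' keeps existing content and appends
lines_after_append=$(( $(wc -l < "$demo_file") ))

echo "replacement" > "$demo_file"     # '>' again discards both earlier lines
lines_after_overwrite=$(( $(wc -l < "$demo_file") ))

echo "after append: $lines_after_append, after overwrite: $lines_after_overwrite"
rm -f "$demo_file"
```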

Let's check the status of our repository again:

git status

You should see that message.txt has been modified:

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   message.txt

no changes added to commit (use "git add" and/or "git commit -a")

Now, let's use git diff --numstat to see the summary of the changes we made:

git diff --numstat

The output should look something like this:

1       0       message.txt

This output tells us that in message.txt, 1 line was added and 0 lines were deleted. This is a concise way to see the overall impact of your changes across multiple files.

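Understanding git diff --numstat is useful when you want a quick overview of how much a file has changed without seeing the exact content of the changes. It's particularly helpful when reviewing changes made by others or when you want to see the scale of modifications in your own work. It also reveals which files Git considers binary: Git cannot count lines in a binary file, so numstat prints a - in place of both the added and deleted counts for that file.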
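To see how numstat reports a binary file next to a text file, the following sketch builds a throwaway repository so the lab's my-time-machine repository stays untouched. The temporary directory, the sample file contents, and the "Demo" identity are assumptions made for illustration only.

```shell
# Build a throwaway repository in a temporary directory.
repo=$(mktemp -d)
git -C "$repo" init -q

# One text file and one file containing a NUL byte (Git treats the latter as binary).
echo "Hello" > "$repo/message.txt"
printf 'abc\0def' > "$repo/binary_file"
git -C "$repo" add message.txt binary_file
git -C "$repo" -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "Add sample files"

# Modify both files in the working tree.
echo "Hello, Past Me" >> "$repo/message.txt"
printf 'x' >> "$repo/binary_file"

# Text files report line counts; binary files report "-" in both columns.
numstat_output=$(git -C "$repo" diff --numstat)
echo "$numstat_output"

rm -rf "$repo"
```

In the printed summary, the message.txt row carries numeric counts while the binary_file row shows - in both count columns.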

Run file Command on File

In this step, we will learn about the file command, a useful tool in Linux that tells you the type of a file. This is important because Git handles text files and binary files differently.

First, make sure you are in your project directory:

cd ~/project/my-time-machine

Now, let's use the file command on our message.txt file:

file message.txt

You should see output similar to this:

message.txt: ASCII text

This tells us that message.txt is a text file. Git is designed to work very well with text files because it can easily track line-by-line changes.

What about other types of files? Let's create a simple binary file. We can use the head command to take the first bytes of a system file and redirect them to a new file in our project. For example, let's create a small "binary" file from the /bin/ls executable:

head -c 1024 /bin/ls > binary_file

This command takes the first 1024 bytes of the /bin/ls file and saves it as binary_file in your current directory.

Now, let's use the file command on this new file:

file binary_file

The output will be different, indicating it's a binary file. It might look something like this:

binary_file: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=..., for GNU/Linux 3.2.0, stripped

This output confirms that binary_file is not a simple text file. Git treats binary files differently because it cannot meaningfully compare them line by line. It still stores and versions them like any other file, but it cannot show a readable diff for them or automatically merge conflicting changes to them.

Understanding the difference between text and binary files is crucial when working with Git, especially when dealing with files like images, compiled programs, or compressed archives. Git's powerful diffing and merging capabilities are primarily designed for text files.
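When the check needs to be scripted, the descriptive output of file is awkward to parse. One option is the --mime-encoding option of file, which prints only the detected encoding. The sketch below assumes a file implementation that supports -b and --mime-encoding (as the common Linux one does); the sample files and variable names are hypothetical.

```shell
# Create one text sample and one sample containing non-text bytes.
text_sample=$(mktemp)
binary_sample=$(mktemp)
echo "Hello, Future Me" > "$text_sample"
printf 'abc\0\001\002def\n' > "$binary_sample"

# -b omits the filename; --mime-encoding prints only the detected encoding.
text_encoding=$(file -b --mime-encoding "$text_sample")
binary_encoding=$(file -b --mime-encoding "$binary_sample")

echo "text sample:   $text_encoding"
echo "binary sample: $binary_encoding"

# Branch on the result: file reports "binary" for data it does not recognize as text.
if [ "$binary_encoding" = "binary" ]; then
  echo "binary sample is not text"
fi

rm -f "$text_sample" "$binary_sample"
```

Keep in mind that file applies its own heuristics, independent of Git. The two can disagree, for example on UTF-16 text, which file may recognize as text while Git treats it as binary because it contains NUL bytes.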

Test Text vs Binary Files

In this step, we will see how Git handles changes in text files versus binary files. This will highlight why Git's diffing capabilities are primarily designed for text.

First, ensure you are in your project directory:

cd ~/project/my-time-machine

We already have our message.txt (text file) and binary_file. Let's make another change to message.txt:

echo "Another line for the future" >> message.txt

Now, let's add both files to the staging area and commit them. First, add the files:

git add message.txt binary_file

Check the status to confirm both files are staged:

git status

You should see both files listed under "Changes to be committed":

On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   binary_file
        modified:   message.txt

Now, let's commit these changes:

git commit -m "Add binary file and update message"

You will see output confirming the commit, including changes to both files:

[master ...] Add binary file and update message
 2 files changed, 2 insertions(+)
 create mode 100644 binary_file

Now, let's make a small change to the binary_file. We can append a single byte to it:

echo -n "a" >> binary_file

The -n flag prevents echo from adding a newline character.
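The behavior of echo -n varies between shells, so printf is a more portable way to append bytes without a trailing newline. A minimal sketch on a scratch file (the file and variable names are illustrative only):

```shell
# Append exactly one byte to a scratch file using printf instead of echo -n.
scratch=$(mktemp)
printf 'abc' > "$scratch"
size_before=$(( $(wc -c < "$scratch") ))

printf 'a' >> "$scratch"    # printf adds no newline unless the format contains \n
size_after=$(( $(wc -c < "$scratch") ))

echo "file grew by $(( size_after - size_before )) byte(s)"
rm -f "$scratch"
```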

Check the status again:

git status

Git will show that binary_file has been modified:

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   binary_file

no changes added to commit (use "git add" and/or "git commit -a")

Now, let's try to see the difference using git diff:

git diff

Instead of showing line-by-line changes, Git will likely tell you that the binary file differs:

diff --git a/binary_file b/binary_file
index ... ...
Binary files a/binary_file and b/binary_file differ

This output clearly shows that Git doesn't attempt to show the detailed changes within the binary file. It simply states that the files are different. This is a key difference in how Git handles text versus binary files. For text files, Git can show you exactly which lines were added, removed, or modified. For binary files, it can only tell you that a change occurred.

This step demonstrates why Git's powerful diffing and merging tools are most effective with text-based content, which is common in source code and configuration files.
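How does Git decide that a file is binary? When no attributes say otherwise, its built-in check looks for a NUL byte near the start of the content (the first 8000 bytes). The function below is only an approximation of that heuristic written for illustration: looks_binary_to_git is a hypothetical name, not a Git command, and real Git also honors settings in .gitattributes that this sketch ignores.

```shell
# Hypothetical helper: exit 0 if the first 8000 bytes of a file contain a NUL byte.
looks_binary_to_git() {
  total=$(( $(head -c 8000 "$1" | wc -c) ))
  without_nul=$(( $(head -c 8000 "$1" | tr -d '\000' | wc -c) ))
  # If removing NUL bytes shrank the data, at least one NUL was present.
  [ "$total" -ne "$without_nul" ]
}

text_sample=$(mktemp)
binary_sample=$(mktemp)
echo "Hello, Future Me" > "$text_sample"
printf 'abc\0def' > "$binary_sample"

if looks_binary_to_git "$text_sample"; then text_result=binary; else text_result=text; fi
if looks_binary_to_git "$binary_sample"; then binary_result=binary; else binary_result=text; fi

echo "text sample:   $text_result"
echo "binary sample: $binary_result"
rm -f "$text_sample" "$binary_sample"
```

For an authoritative answer, ask Git itself (for example with git diff --numstat) rather than relying on a reimplementation like this one.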

Summary

In this lab, we learned how to determine if a file is binary in Git. First, we used the git diff --numstat command to summarize changes between file versions as counts of added and deleted lines. For a binary file, Git cannot count lines and prints - in place of both counts, which makes numstat a direct indicator of how Git classifies a file.

Second, we used the file command, a standard Linux utility, to identify a file's type directly and distinguish text from binary formats. Finally, we committed both kinds of file and saw that git diff shows line-by-line changes for text files but only reports that binary files differ.
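The numstat check can also be applied to a file that is already committed, without modifying it, by diffing Git's empty tree against HEAD so that every committed file appears as newly added. The sketch below is one way to wrap that in a function: git_sees_binary is a hypothetical helper name, and the throwaway repository, sample files, and "Demo" identity are assumptions for illustration.

```shell
# Hypothetical helper: exit 0 if Git treats a committed file as binary.
# $1 = repository directory, $2 = path of a file present in HEAD.
git_sees_binary() {
  # Compute the ID of the empty tree instead of hard-coding it.
  empty_tree=$(git -C "$1" hash-object -t tree /dev/null)
  # Diffing the empty tree against HEAD makes every committed file look newly added.
  stat_line=$(git -C "$1" diff --numstat "$empty_tree" HEAD -- "$2")
  # numstat fields are tab-separated; the first field is "-" for binary files.
  [ "$(printf '%s\n' "$stat_line" | cut -f1)" = "-" ]
}

# Throwaway repository with one text file and one binary file.
repo=$(mktemp -d)
git -C "$repo" init -q
echo "Hello" > "$repo/message.txt"
printf 'abc\0def' > "$repo/binary_file"
git -C "$repo" add message.txt binary_file
git -C "$repo" -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "Add sample files"

if git_sees_binary "$repo" message.txt; then text_check=binary; else text_check=text; fi
if git_sees_binary "$repo" binary_file; then binary_check=binary; else binary_check=text; fi

echo "message.txt: $text_check"
echo "binary_file: $binary_check"
rm -rf "$repo"
```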