Using YARA to attribute malware

If you’ve performed malware research, you’ve likely observed samples that are very similar in functionality, yet have different hashes, file sizes, etc.

Looking at such samples at the assembly level, you may have noticed that the “differing” malware contains functions and strings that are exactly the same.

Given enough analysis time, researchers can attribute samples to certain malware families. However, in-depth analysis (such as reverse engineering) can be a lengthy process, which is why tools are developed to streamline it and help researchers identify samples rapidly when possible.

Even still, many of these tools have their own flaws; for example, some malware can bypass a sandbox altogether.

Fortunately, there are other tools we can fine-tune to help researchers identify malware quickly and easily. One of these tools is known as YARA.


YARA is an open source tool for identifying malware using a variety of techniques. YARA is quite flexible, and is of great value in emergency response situations where both tools and time may be limited, or just when hunting for malware that’s similar to something you’ve already seen.

YARA can be installed on both Windows and Linux systems, although you’ll have to build it from source code if you choose the latter option. In addition, there is a yara-python extension available if you want to use YARA in Python scripts.

Once installed, you can test it out by simply typing “yara” in a command prompt or terminal window.

yara_cmd

How to write rules:

Writing rules for YARA couldn’t be easier, really. Although YARA offers a lot of options for writing rules, to get started you only need to understand a few things. Observe the example rule below.

rule zaccess_3 {
    meta:
        author = "josh"
        description = "ZeroAccess Trojan, WaesColaweExport found"
    strings:
        $WaesColaweExport = { 55 8B EC 5? 0F B6 [5] 8A [5] 8? [1-2] 99 0F B6 [1] F7 [1] B? [4] 8? [2] 8? [2] 66 (8B|A1) [4-5] 66 2B [1] 0F B7 [1] (35|83 F0) [1-4] C1 E8 [1-4] 8B E5 5D C2 }
        $interface = "jjjinterface"
    condition:
        all of them
}

I made this rule after finding several ZeroAccess Trojans that all had the same exported function called “WaesColawe”. In this rule, I’m looking for byte patterns found within the function, as well as a specific text string containing “jjjinterface” that’s used for a Windows API call.

The above rule contains two important keywords: strings and condition. Strings are the unique values to search for, while condition specifies your detection criteria. In this rule, both the byte pattern for the exported function and the text string must be found for a detection to occur. Most rules you create will have these two keywords, although there are some exceptions.

You will also notice the use of the meta keyword. Metadata isn’t necessary to create a rule, but is nice to have if a rule needs supporting information. The metadata can be printed alongside matches using the “-m” option at the command line.

Finally, notice how some of the bytes in the WaesColawe exported function are not fixed values but use question mark (?), pipe (|), and bracket ([]) characters. These represent wildcards, alternatives (logical OR), and byte skips (jumps), respectively.
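To make these three notations concrete, here is a minimal Python sketch that translates a small subset of YARA hex-string syntax into a regular expression over bytes. This is only an illustration of what the notation means, not how YARA matches internally. The helper name hex_to_regex and the sample bytes are made up for this example, and only full bytes, nibble wildcards, alternatives, and [n] or [n-m] jumps are handled.

```python
import re

def hex_to_regex(hex_string):
    """Translate a subset of YARA hex-string syntax into a bytes regex.

    Hypothetical helper for illustration only; it is not part of YARA.
    """
    out = []
    # Tokens: jumps like [5] or [1-2], group punctuation, or a byte such as 8B, 5?, ??
    for tok in re.findall(r"\[\d+(?:-\d+)?\]|[()|]|[0-9A-Fa-f?]{2}", hex_string):
        if tok[0] == "[":
            # Jump: skip n (or n to m) arbitrary bytes
            out.append(".{" + tok[1:-1].replace("-", ",") + "}")
        elif tok == "(":
            out.append("(?:")
        elif tok in (")", "|"):
            out.append(tok)
        elif tok == "??":
            # Full-byte wildcard
            out.append(".")
        elif "?" in tok:
            # Nibble wildcard: list the 16 byte values that fit
            candidates = [tok.replace("?", d) for d in "0123456789ABCDEF"]
            out.append("[" + "".join("\\x" + c for c in candidates) + "]")
        else:
            out.append("\\x" + tok)
    # DOTALL so that "." also matches the newline byte 0x0A
    return re.compile("".join(out).encode("ascii"), re.DOTALL)

# A shortened pattern in the style of the rule above, and made-up sample bytes
pattern = hex_to_regex("55 8B EC 5? 0F B6 [5] 8A [5] 66 (8B|A1)")
sample = bytes.fromhex("558BEC51 0FB6 0511223344 8A 0D55667788 66A1")
print(pattern.search(sample) is not None)  # True
```

Here 5? accepts any byte from 50 to 5F, each [5] skips five bytes regardless of their value, and (8B|A1) accepts either byte.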

There is, of course, much more that you can do when creating YARA rules. I would recommend reading the official documentation if you are trying to master writing these rules.

Practical Use Cases

YARA can be used in a variety of ways. In fact, there are many research tools that are incorporating YARA into their malware analysis packages.

Virus Total Intelligence allows users to upload their own YARA rules in order to track down samples.

VTI

Whenever new files are uploaded to Virus Total, they are automatically scanned with a user’s own YARA rules. This feature is known as ‘Hunting’ on Virus Total.

vthunting
Using Virus Total Intelligence Hunting to find ZeroAccess and Solarbot malware

YARA is also used in another popular malware analysis tool known as Cuckoo Sandbox, a free, open-source, Python-based sandbox that runs on GNU/Linux systems.

cuckoo

As the documentation describes, setting up Cuckoo is a “delicate” process. However, once it’s up and running it’s pretty good at providing automated malware analysis.

Like any sandbox, though, Cuckoo has some limitations. For example, you will likely run into issues when dealing with malware with VM detection.

When Cuckoo is configured and set up correctly, you can see which YARA rules matched in the sandbox report.

sandbox_rpt

These are just a few examples of how YARA is being used. Of course, there are many more.

What to avoid:

When writing YARA rules, there are certain things you will want to avoid in order to write reliable, effective rules.

In general, it’s always best to find static components within a binary for your rules. By that, I mean something that is common across different samples.

Sometimes this is a text string, but oftentimes more reliable detection is found in byte patterns (called hex strings in YARA). Other times, it’s a combination of both types of strings, coupled with a certain entrypoint value, or maybe a PE section hash. The possibilities are vast, and I would encourage any researcher to explore what YARA has to offer when developing a “tight” signature.

However, one must be careful that the detection criteria (the condition) are not only static but also unique to the malware sample or family involved. For example, some types of malware statically link code libraries to enhance their functionality. It’s important not to base a rule on functions within these libraries, as this could easily lead to false positives.
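One way to catch this early is to run a candidate pattern over a set of known-clean files before relying on it. The following is a minimal sketch in Python, where a regular expression stands in for a YARA hex string; in practice you would run YARA itself over the clean set. The function name false_positives, the candidate pattern, and the clean_samples folder are all assumptions made for this example.

```python
import re
from pathlib import Path

def false_positives(pattern, clean_dir):
    """Return the known-clean files that a candidate byte pattern matches."""
    hits = []
    for path in sorted(Path(clean_dir).rglob("*")):
        # Reads each file fully into memory; fine for a sketch, not for huge files
        if path.is_file() and pattern.search(path.read_bytes()):
            hits.append(path)
    return hits

# Hypothetical bad candidate: push ebp; mov ebp, esp; sub esp, imm8.
# This is a generic function prologue found in many benign programs.
candidate = re.compile(rb"\x55\x8b\xec\x83\xec.", re.DOTALL)

# "clean_samples" is an assumed folder of known-good binaries:
# print(false_positives(candidate, "clean_samples"))
```

Any file this returns is a sign that the pattern is too generic to attribute a sample to a family.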

Another important thing to remember is to exclude bytes that encode addresses or offsets when writing hex strings. For example, consider two snippets of assembly that are both detected by the example rule provided earlier.

bytecomp1

bytecomp2

If you look closely, you can see that the lines beginning with 0F B6 and 8A differ between the two snippets. The instructions are the same, but their operand bytes encode different virtual addresses. You can account for this by using byte skips (brackets []) in YARA.
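As a rough illustration in Python, the two byte sequences below are made up for this example (they are not the bytes from the screenshots). Both contain the same two instructions, but each references different absolute addresses. A regular expression standing in for the hex string { 0F B6 [5] 8A [5] } matches both, while a pattern that hard-codes the first sample's addresses misses the second.

```python
import re

# Hypothetical bytes: movzx eax, byte ptr [addr] followed by mov cl, byte ptr [addr]
sample_a = bytes.fromhex("0FB605 00304000 8A0D 04304000")
sample_b = bytes.fromhex("0FB605 10504100 8A0D 14504100")

# Stand-in for { 0F B6 [5] 8A [5] }: opcodes fixed, operand bytes skipped
skipping = re.compile(rb"\x0f\xb6.{5}\x8a.{5}", re.DOTALL)

# Brittle alternative: sample A copied verbatim, addresses included
verbatim = re.compile(re.escape(sample_a))

for name, data in (("A", sample_a), ("B", sample_b)):
    print(name, bool(skipping.search(data)), bool(verbatim.search(data)))
# A True True
# B True False
```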

Closing Thoughts

YARA is a free tool that allows researchers to create custom rules to identify malware. By searching for unique values found in byte patterns (hex strings) and textual data (text strings), researchers can quickly recognize and attribute malware.

YARA has many more capabilities than what has been described here. To understand how to use YARA to its full potential, I recommend first reading the User’s Manual.

A special thanks goes out to Victor Manuel Alvarez, the creator of YARA. Victor currently works for Virus Total as a Software Engineer.

_________________________________________________________________

Joshua Cannell is a Malware Intelligence Analyst at Malwarebytes where he performs research and in-depth analysis on current malware threats. He has over 5 years of experience working with US defense intelligence agencies where he analyzed malware and developed defense strategies through reverse engineering techniques. His articles on the Unpacked blog feature the latest news in malware as well as full-length technical analysis.  Follow him on Twitter @joshcannell
