Threat intelligence is a popular topic in security circles these days. Many organizations now use a threat feed that comes bundled with another security product, such as McAfee’s GTI or IBM’s X-Force feeds. Many products, notably SIEMs, have added support for integration with specific threat intelligence feeds or more generic imports via STIX/TAXII. Many organizations now hope to take advantage of the large number of open source and free intelligence feeds available, and some are even investing in commercial intelligence feeds.
However, as many organizations quickly discover, without effective management of the threat intelligence lifecycle, making effective use of this valuable information is nearly impossible. Today, an organization has two choices for managing threat intelligence: deploy a threat intelligence management platform, or build a manual in-house management program. The steps required to set up a manual threat intelligence lifecycle program are outlined below for those who prefer this approach.
Effective threat intelligence management consists of six main functions or processes:
- Threat intelligence source selection
- Threat intelligence capture
- Threat intelligence processing
- Actioning threat intelligence
- Threat intelligence analysis
- Threat intelligence maintenance
Each of these presents multiple challenges and requires that particular skill sets be present or contracted. First, we will look at threat intelligence source selection and threat intelligence capture.
Source selection is actually not the first step in setting up a manual threat intelligence program. Before any threat intelligence can be made useful, you must first have something against which to compare it. This will usually be a log management system or SIEM technology, collecting logs or other key information from security devices in your environment.
Without this critical foundation, there is no way to correlate what is happening in your environment against the intelligence you are collecting, and therefore no way to know when you are communicating with any of the malicious indicators you have identified. Choose carefully, as the limitations of the chosen solution may reduce your options when it comes time to integrate your threat intelligence.
Assuming you have an adequate solution in place, you are ready to select the intelligence sources from which you wish to collect. You can choose from free/open source feeds or you may purchase a feed from one of the several dozen vendors in the market today. There are well over a hundred free or open source intelligence feeds available.
Many of these feeds get their indicators from the same sources and report on the same indicators, creating large areas of overlap and duplication of data. This is an important consideration, as too much overlap can negatively impact the later stages of the threat intelligence management process. There are dozens of paid feeds available as well. Each has its own areas of focus, and costs vary widely. Although the quality of paid feeds is high, the cost of subscribing to multiple feeds can add up quickly.
Careful attention should be paid to contract negotiations with feed vendors so that you are absolutely clear about which of their feeds you will have access to and which you will not. Another important consideration is the methods supported for ingesting those feeds. A flexible API (Application Programming Interface) is an advantage here, since you will be integrating each of these sources in-house.
Capturing Threat Intelligence
Once you have settled on the sources you wish to collect, a method of collection must be established. If you have identified many sources, you will likely be forced to support several different methods of collection. In some cases, delivery will be automated, such as via TAXII; in others, intelligence will arrive by email in a format that must be converted, such as CSV, PDF, XML, or even free text. Some websites publish threat intelligence in HTML or XML formats, from which users may either capture it manually or script an automated method to scrape the site at a predetermined interval. STIX and TAXII are widely supported standards for formatting and delivery, but support is by no means universal. An API may be available for some feeds. This is certainly the case for most commercial feeds, but it may or may not be the case with open source or free intelligence feeds.
The APIs themselves will generally require reviewing reference documentation to understand how to access them, how to request and/or retrieve data, as well as limitations on use such as rate limits. Leveraging APIs to ingest feeds can be fairly straightforward but does require scripting or some other mechanism to actually pull the data and do something with it. Additional care and feeding may be required over time as APIs do change as features are deprecated or added, and tweaks are made for improved efficiency. Major overhauls of APIs are not unheard of and may break a lot of automation if previous APIs are deprecated. Monitoring API sources for updates is an important part of keeping feed collection running smoothly.
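As a rough illustration, a collection routine for a paginated feed API might look like the sketch below. The page shape (an `indicators` list and a `next` cursor) and the rate limit are assumptions invented here; a real vendor's API documentation dictates both.

```python
import time

# Hypothetical rate limit; a real vendor's API documentation will state its own.
RATE_LIMIT_SECONDS = 1

def collect_indicators(fetch_page, delay=RATE_LIMIT_SECONDS):
    """Walk a paginated feed, sleeping between requests to respect the rate limit.

    fetch_page(cursor) is assumed to return a dict like
    {"indicators": [...], "next": <cursor or None>}; the exact shape
    depends on the vendor's API.
    """
    indicators, cursor = [], None
    while True:
        page = fetch_page(cursor)
        indicators.extend(page["indicators"])
        cursor = page.get("next")
        if cursor is None:
            return indicators
        time.sleep(delay)
```

Isolating the HTTP call behind `fetch_page` also makes it easier to adapt the loop when the vendor changes or deprecates an API, which, as noted above, happens regularly.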
You should automate as much of the threat intelligence collection process as possible. This can be done mostly via scripting but may require some additional effort around collecting via email or web scraping. Putting in this effort pays off over time, as manual collection consumes time few teams have to spare. It also frequently takes analysts away from their primary duties while they focus on the mechanics of manual collection. Source selection itself may end up being limited by an inability to regularly capture the available data without manual collection.
Once captured, threat intelligence data must be processed. Processing includes several steps:
- Storage of indicators
- Update, expiration, and removal of old indicators
- Scoring and weighting of intelligence
- Enrichment of indicators for context
- Association of indicators with actors, campaigns, incidents, TTPs, etc.
- Tracking and maintenance of an actor alias list
If you have chosen more than a very few feeds, you will likely encounter a variety of formats. If you’re lucky, it will be something structured specifically for intelligence, like STIX, OpenIOC or CybOX. Others will use XML or JSON, which are also structured, but not specifically created for threat intelligence information. The rest will arrive as unstructured text in a variety of file formats: CSV, plain text, PDF, Word document, or almost anything else. You will need the expertise to normalize these disparate feeds by parsing out the required data.
This could require a sophisticated understanding of RegEx and/or JSON/XML. Expect to create a different parser for each of your unstructured sources of intelligence. You will also need to store the parsed information in a database for later use. To give you a sense of scale for this initial repository, remember that, today, collecting a large number of feeds could result in as many as 10 million indicators per day or more. That number will only increase with time.
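For example, a parser for one hypothetical unstructured feed (a pipe-delimited text format invented here for illustration) might normalize each line into a common indicator schema like this:

```python
import re

# Hypothetical line format from one unstructured feed:
#   2015-05-01 | 203.0.113.7 | brute-force ssh
LINE_PATTERN = re.compile(
    r"^(?P<seen>\d{4}-\d{2}-\d{2})\s*\|\s*(?P<value>\S+)\s*\|\s*(?P<tag>.+)$"
)

IP_PATTERN = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

def parse_line(line, source):
    """Normalize one feed line into a common schema, or return None if unparseable."""
    match = LINE_PATTERN.match(line.strip())
    if not match:
        return None
    value = match.group("value")
    return {
        "value": value,
        "type": "ip" if IP_PATTERN.fullmatch(value) else "domain",
        "tag": match.group("tag").strip(),
        "first_seen": match.group("seen"),
        "source": source,
    }
```

Expect to write one such parser per unstructured source; the payoff is that everything downstream (de-duplication, scoring, enrichment) only ever sees the common schema.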
Plan accordingly. However, before storage, you should de-duplicate your collected indicators to reduce the overall processing load. It is important that care is taken in the preceding normalization step, as incorrectly normalized events will not be identical and therefore will not be de-duplicated, resulting in unnecessary load and copies in later stages of the process. These duplications could even lead to duplicate alerts and investigations.
One way to handle this would be to check the database for the existence of a particular piece of data before adding it as a new entry. If it is already there, adding a tag to the existing entry to note it was also found in another source is useful for context. Once you have normalized, cleansed, and stored your chosen indicators, you must do necessary maintenance on previously collected indicators. The reason for this is that indicators change over time. Sometimes they change types, such as going from a scanning IP in March 2014 to a brute force IP in May of 2015. You need to not only capture and reflect these changes over time, but also “expire” indicators after some period of time.
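The check-before-insert approach described above can be sketched roughly as follows; the dict here stands in for a real database table with a unique index on indicator type and value:

```python
def upsert_indicator(store, indicator):
    """Insert a new indicator, or tag an existing one with the additional source.

    `store` maps (type, value) keys to stored records; in practice this
    would be a database lookup against a unique index.
    """
    key = (indicator["type"], indicator["value"])
    existing = store.get(key)
    if existing is None:
        record = {k: v for k, v in indicator.items() if k != "source"}
        record["sources"] = [indicator["source"]]   # track every feed that reported it
        store[key] = record
        return "inserted"
    if indicator["source"] not in existing["sources"]:
        existing["sources"].append(indicator["source"])  # context: seen in another feed too
    return "deduplicated"
```

Recording the list of reporting sources rather than discarding duplicates outright preserves the corroboration context analysts will want later.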
This can be an arbitrary time frame set globally, say 30, 60 or 90 days, or it can be set individually by indicator type. Be aware, though, that failing to expire indicators promptly will result in increased false positives, as can expiring them too quickly. It is a balance that must be struck, monitored and adjusted as needed. Next, you will want to score and/or weight your intelligence in some fashion. Both give you the ability to prioritize certain indicators or sources, allowing you to focus your attention on those first among the millions of indicators consumed each day. Do you trust one feed more than another? Give it a higher weight. Use that weight in your evaluation rules to prefer information from this source.
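A per-type expiration check might look like the following sketch; the windows shown are placeholder values to be tuned and monitored, not recommendations:

```python
from datetime import datetime, timedelta

# Hypothetical per-type expiration windows (days); tune and adjust over time.
EXPIRY_DAYS = {"ip": 30, "domain": 60, "hash": 180}
DEFAULT_EXPIRY_DAYS = 90

def is_expired(indicator, now=None):
    """True if the indicator's last sighting is older than its type's window."""
    now = now or datetime.utcnow()
    window = timedelta(days=EXPIRY_DAYS.get(indicator["type"], DEFAULT_EXPIRY_DAYS))
    last_seen = datetime.strptime(indicator["last_seen"], "%Y-%m-%d")
    return now - last_seen > window
```

Running such a check on a schedule, and re-setting `last_seen` whenever an indicator is reported again, keeps the active set from growing without bound.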
Do you consider one type of indicator more threatening than another? Most do, but you will need to define the types yourself, decide how you will classify them, and then incorporate these values and weights into your evaluation of what to present to your analysts. Scoring and weighting are the first enrichments you will perform on your intelligence data. Since you want to maximize the number of events, incidents, and alerts your analysts can triage each day, you may choose to enrich your indicators for further context. Beyond scoring and weighting, enrichment can mean many things: GeoIP data, WHOIS records, or reports from sites like VirusTotal or SHODAN. Basically, anything that will help your analysts make a decision in the shortest amount of time should be considered at this step.
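One simple, entirely illustrative way to combine feed trust and indicator-type weight into a single priority score:

```python
# Hypothetical weights; define your own based on how much you trust each
# feed and how threatening you consider each indicator type.
FEED_WEIGHTS = {"paid-feed": 1.0, "open-feed": 0.6}
TYPE_WEIGHTS = {"hash": 1.0, "domain": 0.8, "ip": 0.5}

def score(indicator):
    """Combine source trust and type severity into a 0-10 priority score."""
    feed = max(FEED_WEIGHTS.get(s, 0.5) for s in indicator["sources"])
    kind = TYPE_WEIGHTS.get(indicator["type"], 0.5)
    return round(10 * feed * kind, 1)
```

Whatever scheme you choose, the point is the same: evaluation rules downstream can sort or threshold on the score so analysts see the highest-priority indicators first.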
Enrichment challenges include the possible costs of commercial enrichment sources, the coding or scripting necessary to integrate with your indicator database, and the maintenance of the mechanisms that enable that integration. Each new source of context increases the size of an indicator, so planning should account for the increased storage requirements. Advanced enrichments might include associations with actors, campaigns or incidents, and tracking of actor aliases. These further enable analysts to gather all relevant information on an indicator in one place, requiring less manual research and enabling timelier decision-making.
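An enrichment step with pluggable lookup sources might be sketched like this; the lookup callables stand in for real GeoIP, WHOIS, or VirusTotal integrations, which each have their own APIs and costs:

```python
def enrich_indicator(indicator, lookups):
    """Attach context from each enrichment source to the indicator.

    `lookups` maps an enrichment name to a callable taking the indicator
    value. Failures are recorded rather than raised, so one flaky
    enrichment source does not stall the whole pipeline.
    """
    context = {}
    for name, lookup in lookups.items():
        try:
            context[name] = lookup(indicator["value"])
        except Exception as err:
            context[name] = {"error": str(err)}
    return dict(indicator, context=context)
```

Keeping enrichment sources behind a uniform interface also makes it straightforward to add, remove, or replace them as budgets and needs change.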
Actioning Threat Intelligence
Although a database of indicators and contextual information is useful, it is not enough. Once a storehouse of normalized, vetted, enriched information has been created, organizations must devise a means to put it to use. In order to confer real-time, let alone proactive, benefit, the collected intelligence must be fed to some other security technology already in place. Most often, this is the SIEM or log management solution, but it can include other technologies as well.
For example, firewalls that support it could be given a list of IPs, domains or URLs to block automatically. Similarly, web proxies could be given web or domain information to do the same for user web traffic. IDS/IPS is another possible integration point, and some might opt to deliver MD5 or SHA hashes to endpoint protection solutions to enhance the lists of malware for which they monitor. After identifying the technology or technologies you wish to integrate, the normalized intelligence data must be extracted and forwarded to that destination. To do this, you will need to determine which fields are useful to each technology, create queries to retrieve that information, reformat it into something the destination will understand, and then create a mechanism to forward it to the device(s) involved. For example, if you are using ArcSight, you will need to send it in CEF format.
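As a rough sketch, rendering a stored indicator as a CEF line might look like the following. The vendor/product fields and the severity mapping are invented for illustration; the CEF specification and your SIEM's parser dictate the real field choices.

```python
def to_cef(indicator):
    """Render a stored indicator as a CEF line for a SIEM such as ArcSight.

    CEF header: CEF:Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|
    followed by key=value extension fields. Values here are illustrative.
    """
    severity = min(int(indicator.get("score", 5)), 10)   # clamp to CEF's 0-10 range
    header = "CEF:0|InHouse|ThreatFeed|1.0|{tag}|Threat indicator|{sev}".format(
        tag=indicator["tag"], sev=severity)
    extension = "src={value} msg=Seen in {sources}".format(
        value=indicator["value"], sources=",".join(indicator["sources"]))
    return header + "|" + extension
```

A production forwarder would also need to escape pipes and equals signs per the CEF escaping rules, which this sketch omits for brevity.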
For each additional integration, you will need to repeat this process with another forwarder, each of which needs to be maintained over time. Once the information has arrived at its destination, you must create “content” that takes advantage of it. In the case of firewalls, it could be as simple as creating a block list and writing a rule to reference it. In the case of SIEMs, it might include custom parsers, lists, rules, alerts, dashboards and reports. Each integration will require its own content, and just as with every other component in the process, this content must be maintained and updated over time. The final task in this stage is to automate the indicator import and expiration process. Import is obvious, but expiration is equally important to avoid overloading the integrated technologies with lists that grow ever larger over time. Without automation, you will have to establish and manage a manual import and expiration process.
Everything we have discussed to this point is meant to deliver the right information to your analysts. The intelligence still has to be analyzed. An analyst workflow process must be created including incident escalation and response processes.
This workflow must provide a repeatable process to analyse the output of the integrations you have created in the previous steps. For example, if the SIEM determines that a server is communicating with a known botnet command and control domain, your analyst must be notified in some fashion (on screen prompt, email, SMS, IM, etc.). The analyst must then evaluate the collected information and decide if this is, in fact, happening, and then take appropriate action based on that decision. If the analyst determines that the notification is not correct, they should document their findings for future reference and move on to the next analysis. If the analyst verifies that the notification is correct, they should begin a formal set of incident response steps.
In addition to providing analysts a workflow, you must also provide them with the tools they need to gather information on incidents they analyse. This is where enrichment of the sort discussed in the processing step can be useful. Analysts use sites like SHODAN, Web of Trust, VirusTotal and more to gather additional information about indicators they see. If these sources of information can be integrated into your threat intelligence management platform, you can remove the need to go seek it out manually and save precious time when making an escalation decision.
One final tool you may wish to provide to analysts is the ability to do indicator expansion in their research. This refers to looking at indicators that are in some way related to the indicators seen in the local environment, and running a secondary search to see if any of those have also been present. Many organizations struggle with this due to short retention periods for gathered log data. An analyst can only go as far back as the data they have to work with.
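Indicator expansion can be sketched as a secondary lookup over the indicator store. Here, relatedness is approximated by shared actor associations, which is only one possible linkage; shared campaigns, incidents, or infrastructure would work the same way:

```python
def expand_indicators(indicator, store):
    """Return values of other stored indicators tied to the same actors.

    The resulting list is what an analyst would feed into a secondary
    search of retained log data.
    """
    actors = set(indicator.get("actors", []))
    related = []
    for other in store.values():
        if other["value"] == indicator["value"]:
            continue                                  # skip the indicator itself
        if actors & set(other.get("actors", [])):     # any shared actor
            related.append(other["value"])
    return related
```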
If a threat has been found inside the network, the next steps are to find additional details about it to determine, as best as possible, where the infection came from, how it happened, and when it happened. Traditional incident response practice kicks in here to ensure the threat is isolated and remediation steps are handled accordingly.
Threat Intelligence Maintenance
So far in this series we have covered everything from defining the sources of external intelligence to collecting, processing, and analyzing threat intelligence. Once an analyst has made a decision, the output of that decision must be captured and stored, preferably within the system. Both “threat” and “non-threat” determinations need to be documented appropriately. If something was determined to be a threat, additional output could include notes, reports, recommendations or other documentation. It can also include additional information gathered about the indicators themselves.
All this should be easily available for future reference. Indicators must be maintained over time as well. A method of incorporating new information about existing indicators, while retaining the previous information, is required. Today, an IP may be actively engaged in brute force attacks; next week it might be cleaned up and re-imaged. That same IP might be clean for two years before being compromised again and put into service as a botnet command and control IP. Analysts need to be able to see these changes over time in order to avoid confusion in analysis.
Additionally, if integration content, such as SIEM alert rules, is based on categories or other elements that change over time, automated monitoring may fail to detect new threats and may identify threats incorrectly.
Threat intelligence can offer concrete benefits to organizations, making security analysts more efficient and effective, but only if that intelligence has been managed correctly. Poorly managed threat intelligence can lead to incorrect decisions that may have lasting consequences for the business or organization. I have attempted to lay out the steps necessary to create a manual threat intelligence management process.
As you can see, it is a complex undertaking, which may require a lot of resources. Some organizations have the necessary skill in-house to develop such a program. Many do not. Given the level of effort required around developing and maintaining a manual threat intelligence program, even those capable of building their own may opt for a commercial threat intelligence management platform.
There are a lot of moving parts and all require a level of diligence to be done appropriately. It is important that you do an honest assessment of your own organization before starting a project to develop a manual threat intelligence management program.
Chris Black, senior sales engineer, Anomali.
Published under license from ITProPortal.com, a Future plc Publication. All rights reserved.