Some points to analyze before sending logs to Sumo Logic

So, you want to start ingesting logs into Sumo Logic?
First, you need to decide how to name the metadata attached to the logs you collect, such as the Collector name and the Source name. This post follows Sumo Logic's official documentation on metadata naming. The focus is the Source Category, which is used to optimize log searches and which I believe matters most in terms of design.

Common naming convention

Settle on a naming convention first. You can use underscores, hyphens, periods, and slashes as separators within a name. Sumo also recommends using slashes, especially in Source Category names.
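A convention is easier to keep when it can be checked automatically. The following is a minimal sketch of such a check, not an official Sumo Logic rule: the function name is_consistent_name is hypothetical, and the restriction to alphanumeric segments is an assumption of this example. Only the four separators come from the text above.

```python
import re

# Assumption of this sketch: a name is a run of alphanumeric segments joined by
# exactly one separator at a time (underscore, hyphen, period, or slash).
_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[_\-./][A-Za-z0-9]+)*")


def is_consistent_name(name: str) -> bool:
    """Return True when the whole name follows the convention above."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["IT/Network/Firewall/Cisco/ASA", "Service//Prod", "my app"]:
        print(candidate, is_consistent_name(candidate))
```

A check like this could run in whatever tooling you use to provision Collectors and Sources, so that names with spaces or doubled separators are caught before any logs are ingested.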

Naming Source Categories

The Source Category is set when you configure a data source. It is defined as a field and is mainly used to speed up queries by limiting the search scope. According to the official Sumo documentation, naming Source Categories well requires considering the following three design elements:

  1. Define the search scope.
  2. Determine the data index and partition.
  3. Data retrieval by role-based access control (RBAC).

Let's take a closer look at each element:

1. Define the search scope

Defining the scope makes flexible searches possible. It lets you categorize the collected logs into logical units, whether from a business perspective or from an IT or security operations perspective. Sumo Logic recommends forming a Source Category as a hierarchy of metadata tags, in "general-to-specific" order, separated by forward slashes (/), like this:

  • IT/Network/Firewall/Cisco/ASA
  • IT/Network/AP/Aruba
  • Service/Prod/App/StoreFront
  • Service/Dev/App/Shipment
  • Security/PrismaCloud
  • Sandbox/Leo/linuxtest
  • Debug/CustomApp/LeoApp

With this hierarchy, you can find the logs you want within a narrow search scope, for example with a wildcard such as _sourceCategory=IT/Network/*.
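To see how a general-to-specific hierarchy narrows a search, here is a small Python sketch. It only approximates wildcard scoping locally with the standard fnmatch module, so Sumo Logic's own matching rules (case sensitivity, for instance) may differ. The helper names build_source_category and in_scope are made up for this example.

```python
from fnmatch import fnmatchcase


def build_source_category(*parts: str) -> str:
    """Join hierarchy levels, general to specific, with forward slashes."""
    cleaned = [p.strip().strip("/") for p in parts]
    if not cleaned or not all(cleaned):
        raise ValueError("every hierarchy level needs a non-empty name")
    return "/".join(cleaned)


def in_scope(source_category: str, scope: str) -> bool:
    """Local approximation: does the category fall inside a wildcard scope?"""
    return fnmatchcase(source_category, scope)


# Categories taken from the list above.
categories = [
    "IT/Network/Firewall/Cisco/ASA",
    "IT/Network/AP/Aruba",
    "Service/Prod/App/StoreFront",
    "Service/Dev/App/Shipment",
]

# Everything under IT/Network, regardless of device type or vendor.
network_logs = [c for c in categories if in_scope(c, "IT/Network/*")]

if __name__ == "__main__":
    print(build_source_category("Security", "PrismaCloud"))
    print(network_logs)
```

Because the general levels come first, a single trailing wildcard is enough to select a whole branch of the hierarchy.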

2. Determine the data index and partition

To speed up searches, you can divide your data into logical areas called Partitions and search by index. This streamlines data retrieval by narrowing the range of data scanned during a search. You can configure Partitions at any time from the Sumo console under Manage Data > Logs > Partitions, where you set the Partition name and Routing Expression. When the Routing Expression specifies a _sourceCategory, the matching logs are indexed in that Partition.

[Figure: Partition settings]

[Figure: Partition routing]

Sumo recommends keeping the number of partitions to 20 or fewer, to limit index fragmentation and keep data management simple.
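Before creating Partitions in the console, it can help to dry-run the routing plan against your Source Categories. The sketch below does that locally in Python. The partition names and patterns are hypothetical, and fnmatch is only a stand-in for how Sumo evaluates a Routing Expression.

```python
from fnmatch import fnmatchcase

# Hypothetical plan: partition name -> _sourceCategory pattern for its Routing Expression.
PARTITIONS = {
    "network": "IT/Network/*",
    "service_prod": "Service/Prod/*",
    "security": "Security/*",
}

MAX_RECOMMENDED_PARTITIONS = 20  # the recommendation quoted above


def routing_expression(pattern: str) -> str:
    """Text to enter as the Routing Expression for a pattern."""
    return f"_sourceCategory={pattern}"


def matching_partitions(source_category: str) -> list:
    """List every planned partition whose pattern matches the category."""
    return [
        name
        for name, pattern in PARTITIONS.items()
        if fnmatchcase(source_category, pattern)
    ]


if __name__ == "__main__":
    assert len(PARTITIONS) <= MAX_RECOMMENDED_PARTITIONS
    for category in ["IT/Network/AP/Aruba", "Sandbox/Leo/linuxtest"]:
        print(category, "->", matching_partitions(category))
```

Returning a list rather than a single name makes two planning problems visible: a category matched by more than one pattern, and a category matched by none.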

3. Data retrieval by role-based access control (RBAC)

You can use the Source Category to control access to your data. Depending on the Sumo user role, access can be scoped by environment, geographical location, or organizational unit.

Data access control is configured with a Search Filter on the role: Administration > Users and Roles > Roles.

If you want to limit access to data for each group, company, or department, build that classification into the Source Category:

Azure/[ACCOUNT_ID]/Prod/AD
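As an illustration of how that classification can feed a role's Search Filter, the Python sketch below generates one filter string per account. The function name role_search_filter and the account ID acct-001 are placeholders invented for this example. The Azure/<account>/<environment> layout is the one shown above.

```python
def role_search_filter(account_id: str, environment: str = "Prod") -> str:
    """Build a Search Filter limiting a role to one account and environment."""
    for label, value in (("account_id", account_id), ("environment", environment)):
        # A slash or wildcard inside a level would silently widen or shift the scope.
        if not value or "/" in value or "*" in value:
            raise ValueError(f"{label} must be a single, literal hierarchy level")
    return f"_sourceCategory=Azure/{account_id}/{environment}/*"


if __name__ == "__main__":
    # "acct-001" is a made-up account ID used only for illustration.
    print(role_search_filter("acct-001"))
    print(role_search_filter("acct-001", environment="Dev"))
```

Rejecting slashes and wildcards in the inputs is a deliberate choice: an access filter should never end up broader than the account it was generated for.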

Other metadata name design

Collector's name

The Collector name is the value entered when the Collector is activated. For an Installed Collector, you set the name during installation on the host. For a Hosted Collector, you specify it when setting up the Collector in the Sumo console.

Host name

The source host is set automatically: Sumo picks up the actual hostname at the OS level of the device from which the log was collected. You can override the source host with any value in the Source settings. There is a 128-character limit.

Source hosts defined this way can be used in searches as follows:

  • _sourceHost=*USWest*
  • _sourceHost=*MongoDB*
  • _sourceHost=*FW*

Source name

The source name is the log file path specified when configuring the Source. Keep in mind that all of the metadata above can be used as search conditions, but they should come after the Source Category condition.

The order is of paramount importance!
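One way to make that ordering hard to get wrong is to assemble the search scope in code. The Python sketch below is a hypothetical helper (build_query is not part of any Sumo Logic client library) that always emits the _sourceCategory condition first and appends the other metadata conditions after it.

```python
from typing import Optional


def build_query(
    source_category: str,
    source_host: Optional[str] = None,
    source_name: Optional[str] = None,
    keywords: str = "",
) -> str:
    """Assemble a search scope with _sourceCategory as the first condition."""
    if not source_category:
        raise ValueError("a Source Category is required to scope the search")
    terms = [f"_sourceCategory={source_category}"]
    if source_host:
        terms.append(f"_sourceHost={source_host}")
    if source_name:
        terms.append(f"_sourceName={source_name}")
    if keywords:
        terms.append(keywords)
    return " AND ".join(terms)


if __name__ == "__main__":
    print(build_query("Service/Prod/*", source_host="*USWest*", keywords="error"))
```

Making the Source Category a required positional argument means a query cannot be built without a scope, and the other fields can never come before it.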

Conclusion

By thinking through and designing your metadata ahead of time, you can write flexible queries and speed up log searches.