Dependabot
Like bricks in a wall: Dependabot is a software supply chain security tool. If you have a broken brick and don’t fix it, your entire structure may degrade over time.
Software Supply Chain
The software supply chain consists of the 3rd-party frameworks, tools, libraries, etc. that a project depends on. For example:
Spring Boot (Broadcom, formerly VMware)
you can find this in the Maven (Apache Software Foundation) or Gradle (Gradle Inc.) configs
more 3rd-party components like Thymeleaf will be pulled in transitively
Enterprise Java Beans (Oracle)
Fiori (SAP)
…
the JSON or XML parser
Typically, even a microservice can have 150+ dependencies. Important components (like Log4J) aren’t always obvious, because they are often pulled in transitively.
Dependabot will resolve the entire dependency tree, look up known vulnerabilities in popular catalogs, and produce a report.
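The three steps above (resolve the tree, look up known vulnerabilities, report) can be sketched in a few lines. This is only a toy illustration, not Dependabot's implementation: the dependency map, the advisory catalog and the function names `resolve_tree` and `report` are hypothetical.

```python
from collections import deque

# Hypothetical direct-dependency map: package -> packages it depends on.
DEPENDENCIES = {
    "my-service": ["spring-boot", "json-parser"],
    "spring-boot": ["thymeleaf", "log4j"],
    "thymeleaf": [],
    "log4j": [],
    "json-parser": [],
}

# Hypothetical advisory catalog: package -> severity of a known vulnerability.
ADVISORIES = {"log4j": "critical"}


def resolve_tree(root, dependencies):
    """Return every package reachable from root (breadth-first), excluding root."""
    seen = []
    queue = deque(dependencies.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.append(package)
        # transitive dependencies are queued as well
        queue.extend(dependencies.get(package, []))
    return seen


def report(root, dependencies, advisories):
    """Map each vulnerable package in the resolved tree to its severity."""
    return {p: advisories[p] for p in resolve_tree(root, dependencies) if p in advisories}


findings = report("my-service", DEPENDENCIES, ADVISORIES)
```

Note that `log4j` is flagged although `my-service` never declares it directly, which is the situation described above for non-obvious components.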
Features
Dependabot works with Gradle (pom.xml can be generated) and Maven, as well as NPM, Go, etc. The support is expansive.
Dependabot cannot resolve the imports to actual modules, meaning that it won’t know whether the scanned software project really uses a vulnerable function. Veracode’s SourceClear can do that.
The reporting depends on GitHub.
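The difference between "a dependency is declared" and "its code is actually used" can be illustrated with Python's standard `ast` module. This is a rough sketch of the idea only; it is not how Dependabot or SourceClear work internally, and `SOURCE`, `imported_modules` and `called_names` are hypothetical names.

```python
import ast

# Hypothetical project source: yaml is imported but never called.
SOURCE = '''
import yaml
from json import loads

config = loads("{}")
'''


def imported_modules(source):
    """Collect the top-level module names imported by the given source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def called_names(source):
    """Collect the names of plainly called functions, e.g. loads(...)."""
    return {
        node.func.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


print(imported_modules(SOURCE))
print(called_names(SOURCE))
```

An import-level view reports both `yaml` and `json`, while the call-level view only sees `loads`. A scanner working on the manifest alone cannot make this distinction.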
GraphQL API
GitHub services have a REST and a GraphQL API.
Sadly, for GitHub security features, the two are not developed consistently.
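To make the difference between the two API styles concrete, here is a minimal sketch that only builds the request descriptions and sends nothing. The URLs are the public GitHub.com endpoints; the function names and the returned dict layout are assumptions made for illustration.

```python
import json

# Public GitHub.com endpoints; GitHub Enterprise Server uses its own host name.
REST_BASE = "https://api.github.com"
GRAPHQL_URL = "https://api.github.com/graphql"


def auth_headers(token):
    """Header dict of the kind that can be passed to requests as headers=..."""
    return {"Authorization": f"Bearer {token}"}


def rest_repo_request(org, repo):
    """REST: the URL addresses the resource, one endpoint per resource."""
    return {"method": "GET", "url": f"{REST_BASE}/repos/{org}/{repo}"}


def graphql_repo_request(query):
    """GraphQL: always a POST to one endpoint, the query text selects the fields."""
    return {"method": "POST", "url": GRAPHQL_URL, "body": json.dumps({"query": query})}


rest = rest_repo_request("my-org", "my-repo")
gql = graphql_repo_request("{ viewer { login } }")
```

With REST, a feature is only reachable if a dedicated endpoint exists for it; with GraphQL, it is only reachable if the schema exposes the field. This is why gaps can differ between the two.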
Python
Assuming you have:
Pandas (*_df are DataFrame objects in the following), which accepts nested JSON data
requests (library) is being used
Based on that, the following Python 3 code exemplifies how to generate a Software Bill Of Materials (SBOM) style findings report for a repository with GitHub Advanced Security enabled.
The set_score function is explained separately. It generates basic metrics.
import sys
from string import Template

import pandas as pd
import requests


def get_sbom_issues_score(hed=None, graphql_url="", verbose=False, repo="", org=""):
    """
    Ask GitHub Sec API for data about the Dependabot findings and analyze it
    :param hed: dict, auth data (HTTP headers)
    :param graphql_url: GraphQL endpoint
    :param verbose: boolean, flag
    :param repo: string, repository name
    :param org: string, org name
    :return: sbom_score (int), sbom_severity_list (DataFrame statistical object)
    """
    # this is the GraphQL query for the API
    query_template_sbom = """
    {
      repository(name: "$repo", owner: "$org") {
        vulnerabilityAlerts(first: 100) {
          nodes {
            createdAt
            dismissedAt
            state
            dismissReason
            securityVulnerability {
              package {
                name
              }
              severity
              advisory {
                description
              }
            }
          }
        }
      }
    }
    """
    query_template_depbot_enabled = """
    {
      repository(name: "$repo", owner: "$org") {
        id
        hasVulnerabilityAlertsEnabled
      }
    }
    """
    print("Dependabot Repo: " + repo, file=sys.stdout)

    # prevent escaping the literal context of the graphql template:
    # quotes or backslashes in the names could break out of the string literal
    if any(char in repo + org for char in ("'", '"', "\\")):
        raise ValueError("repo and org must not contain quotes or backslashes")
    sbom_query = Template(query_template_sbom).substitute({'repo': repo, 'org': org})
    dbot_enabled_query = Template(query_template_depbot_enabled).substitute({'repo': repo, 'org': org})

    dbot_enabled_status = requests.post(graphql_url, headers=hed, json={'query': dbot_enabled_query})
    parsed_dbot_status_rply = dbot_enabled_status.json()["data"]
    dbot_status_df = pd.json_normalize(parsed_dbot_status_rply)
    dbot_status = dbot_status_df["repository.hasVulnerabilityAlertsEnabled"].iloc[0]

    response_dp = requests.post(graphql_url, headers=hed, json={'query': sbom_query})

    # handle disabled state first: a disabled repo would otherwise look like "no findings"
    if not dbot_status or response_dp.status_code == 403:
        status = {"Status": "Disabled"}
        status_df = pd.DataFrame([status])
        print("Dependabot Status: " + "disabled for repo", file=sys.stdout)
        print()
        return 100, status_df

    parsed_dp = response_dp.json()["data"]
    df_deps = pd.json_normalize(parsed_dp)
    # we need to rename the columns because dots in table headers cannot get handled correctly
    cols = df_deps.columns.map(lambda x: x.replace('.', '_') if isinstance(x, str) else x)
    df_deps.columns = cols
    # a sub-section of the flattened JSON gets extracted
    sub_json = df_deps['repository_vulnerabilityAlerts_nodes'][0]

    # needed in case there are 0 issues and the HTTP status code is ok
    if len(sub_json) == 0 and response_dp.status_code == 200:
        status = {"Status": "No findings"}
        status_df = pd.DataFrame([status])
        print("Dependabot Status: " + "no findings for repo", file=sys.stdout)
        return 0, status_df

    if len(sub_json) > 0 and response_dp.status_code == 200:
        print("Dependabot Status: " + "processing findings for repo", file=sys.stdout)
        # data with the findings needs to be re-framed
        dependabot_data = pd.DataFrame(sub_json)
        # data needs to be flattened again
        dependabot_issues = pd.json_normalize(pd.DataFrame.from_records(sub_json)["securityVulnerability"])
        # since the data is flattened and framed from JSON we need to normalize the types
        dependabotDf = pd.concat([dependabot_data["state"], dependabot_issues], axis=1)
        dependabotDf["state"] = dependabotDf["state"].astype(str)
        dependabotDf["severity"] = dependabotDf["severity"].str.lower()
        # column renamed again for this dataframe
        cols = dependabotDf.columns.map(lambda x: x.replace('.', '_') if isinstance(x, str) else x)
        dependabotDf.columns = cols
        # keep only findings that are still open (not dismissed or fixed)
        dependabot_severity_open_list = dependabotDf[dependabotDf['state'] == 'OPEN']
        if verbose:
            print(dependabot_severity_open_list)
            print("Software Components Issue List (open)")
        dependabot_severity_list = dependabot_severity_open_list["severity"].value_counts()
        if verbose:
            print(dependabot_severity_list)
            print("Software Components Severity Score (open)")
        # set_score is defined elsewhere (explained separately)
        sbom_score = set_score(severity_df=dependabot_severity_list)
        # for better table style
        dependabot_severity_list = dependabot_severity_list.reset_index()
        dependabot_severity_list.columns = ['Risk', 'Dependency Findings Reported']
        return sbom_score, dependabot_severity_list
An equivalent REST endpoint doesn’t seem to exist (last checked Feb 7, 2023).
This works the same way for the GH Cloud and the on-premises Server variants.
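The query in the function above asks for `first: 100` alerts and places the repository and org names directly into the query text. A possible refinement, sketched here under assumptions, is to pass the names as GraphQL variables and to follow the connection's cursor until all alerts are read. The names `ALERTS_QUERY`, `build_payload` and `collect_alerts` are hypothetical, and `post` stands for any callable that sends the payload and returns the parsed JSON reply.

```python
ALERTS_QUERY = """
query($org: String!, $repo: String!, $cursor: String) {
  repository(owner: $org, name: $repo) {
    vulnerabilityAlerts(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { state securityVulnerability { severity package { name } } }
    }
  }
}
"""


def build_payload(org, repo, cursor=None):
    """Build the JSON body; values travel as variables, not as query text."""
    return {"query": ALERTS_QUERY, "variables": {"org": org, "repo": repo, "cursor": cursor}}


def collect_alerts(post, org, repo):
    """Follow the cursor until the API reports that there is no further page.

    `post` takes the payload and returns the parsed JSON reply, for example:
    lambda payload: requests.post(graphql_url, headers=hed, json=payload).json()
    """
    nodes, cursor = [], None
    while True:
        reply = post(build_payload(org, repo, cursor))
        alerts = reply["data"]["repository"]["vulnerabilityAlerts"]
        nodes.extend(alerts["nodes"])
        if not alerts["pageInfo"]["hasNextPage"]:
            return nodes
        cursor = alerts["pageInfo"]["endCursor"]
```

Because the names are sent as variables, quotes in `repo` or `org` cannot change the query text, so no manual escaping check is needed for them. The returned list has the same shape as the `nodes` list processed in the function above.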
Use these results in an SSDLC
Secure Software Development Lifecycle
Here is how basic metrics can be generated: