Before the Threat Hunt: Enriching the Section 1260H Software List with AI

TL;DR

The U.S. Department of Defense publishes a list, under Section 1260H, of companies it has identified as Chinese military companies operating in the United States. I wanted to turn that roster of corporate names into something a defender can use: an inventory of the software those companies publish, and a way to find it on a Windows machine. This is enrichment work. It is the step before a threat hunt, not the hunt itself. I ran it twice. The first pass, with Microsoft Copilot, produced an analysis that looked right and a dataset that was empty. The second pass, with Claude Code, produced a dataset that was disciplined and mostly unverifiable. What shipped is smaller and honest: a sourced catalog of companies and their software, and a simple PowerShell script that gives you a place to start. The other cost was time. The fast start turned into a long slog dragging the work back to something trustworthy, and the lesson I am keeping is about managing that time, not the tools. The result is on GitHub.

What I was trying to build

Section 1260H of the FY21 National Defense Authorization Act (Public Law 116-283) requires the DoD to identify Chinese military companies operating in the United States. The current notice is dated June 8, 2026. It names 80 parent entities, 108 named subsidiaries, and 10 companies that have been removed, or delisted, over time.

One point matters before anything else. A 1260H designation is a corporate-affiliation flag. It is not a finding that a company’s software is malicious, and it is not a sanction. The teeth are in defense procurement law, not in your endpoint. So the right posture for a defender is recognition and inventory, not blocking. If you find this software, you report it for human review and you have a conversation with the system owner. You do not auto-block lawful software that a user may have installed on purpose.

That framing set the goal. I did not want indicators of compromise. I wanted to answer a plainer question: of these 80 companies, which ones publish software that can land on a Windows system, what is that software called, and how would I notice it is there. Get that right and you have the raw material for a hunt later. Get it wrong and you have a spreadsheet that wastes everyone’s afternoon.

The first pass: Copilot looked right and came up empty

I started in Microsoft Copilot, working through the list interactively. We built the company and subsidiary lists, classified each vendor, and walked through which ones ship Windows software. The analysis on screen seemed solid, and honestly most of it was. The summary it produced named the right vendors. It sorted them sensibly into surveillance, consumer, networking, and the large set that ships no Windows software at all. It even listed correct example artifacts, names like iVMS-4200.exe and HCNetSDK.dll for Hikvision and SmartPSS.exe for Dahua. As a conversation, it was a competent first pass.

To make the indicator searches easier to manage, we normalized the work into several cross-referenced CSV files: one for companies, one for aliases, one for parent and subsidiary relationships, one for software, one for indicators, and mapping files to tie them together. I summarized and packaged it for Claude Code to verify and enrich.

Then Claude Code opened the files, and the floor gave out. The structured dataset contained the cross-reference scaffolding and none of the findings. The company file was a list of Company_1 through Company_190. The software file was Software_1 through Software_100, with the category field set to the literal word “category.” The indicator file was proc_1 through proc_150. The relationships were mechanical, the confidence scores were a flat constant, and every table joined cleanly to every other table while holding no real content.

A second bundle was a little more alarming. It held 13 real indicators, names like iVMS-4200.exe and HCNetSDK.dll, the genuine article. But the source column was a bare label like “Hikvision docs” rather than a citation, the confidence was a uniform top score, and the attribution file mapped every single indicator back to UNKNOWN for both software and company. One entry, a driver named boe_display.sys, appears to be invented outright.

I am still not entirely sure how the live analysis failed to land in the saved files. What I can say is what I observed. The model produced a convincing shape of a dataset (the right columns, plausible row counts, clean joins between tables) and then filled it with sequential placeholders instead of the work we had just done. The normalization into several tidy linked files made the result look more rigorous, not less, which is exactly what hid the problem. Nothing in the file structure told you the keys pointed at nothing.

The lesson landed hard. A complete-looking, well-organized artifact is not data. The conversation feeling right is not the deliverable being right. You have to open the file and confirm that the findings, the citations, and the attribution actually made it in. I quarantined the entire Copilot output as untrusted and kept it only as a record of the failure.
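
Part of that "open the file" check can be made mechanical. The sketch below is a minimal Python illustration, not something taken from the project: it scans each CSV column for values that are only a prefix plus a counter (the Company_1, Software_1, proc_1 pattern) or that simply repeat the column name. The function name `placeholder_ratio`, the regular expression, and the sample rows are all assumptions made up for the example.

```python
import csv
import io
import re

# Assumption: a "placeholder" looks like Company_1, Software_42, or proc_7,
# or is a cell that just repeats its own column name (category == "category").
PLACEHOLDER = re.compile(r"^[A-Za-z]+_\d+$")


def placeholder_ratio(csv_text: str) -> dict:
    """Return, per column, the fraction of rows that look like placeholders."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    ratios = {}
    for column in rows[0].keys():
        hits = 0
        for row in rows:
            value = (row[column] or "").strip()
            if PLACEHOLDER.match(value) or value.lower() == column.lower():
                hits += 1
        ratios[column] = hits / len(rows)
    return ratios


# Hypothetical samples: one mirrors the hollow software file, one holds real names.
hollow = "name,category\nSoftware_1,category\nSoftware_2,category\n"
real = "name,category\niVMS-4200,surveillance\nSmartPSS,surveillance\n"

print(placeholder_ratio(hollow))  # every column scores 1.0
print(placeholder_ratio(real))    # every column scores 0.0
```

A column that scores near 1.0 is scaffolding, however cleanly it joins to the other tables. A low score proves nothing by itself; it only means the values are worth reading by eye.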

The second pass: Claude Code went deep and hit a wall

I rebuilt from the source notice under one rule: verified, not derived. No software name, file name, registry key, or service name goes in unless it traces to a real, fetched source. Nothing gets guessed.

Claude Code built a real verification pipeline to enforce that. It checked vendor software against the winget package repository, cross-referenced naming against the NIST National Vulnerability Database, used VirusTotal to confirm code-signing attribution, and did read-only checks that a vendor’s own servers actually serve a named installer. Each claim carried a confidence tier, and only the top tiers were allowed to ship. The discipline worked. The pipeline caught and rejected its own weak spots, including a version-pinned installer name and a malformed vendor path that earlier passes would have shipped.
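
The gating idea, that a claim ships only if it has a fetched source and a high enough tier, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline itself: the tier names, the `Claim` fields, the `shippable` function, and the placeholder URL are hypothetical, and the sample tiers are assigned only to exercise the filter.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical tier names; the pipeline's actual labels are not given here.
    UNVERIFIED = 0
    THIRD_PARTY = 1
    VENDOR_OFFICIAL = 2


@dataclass(frozen=True)
class Claim:
    value: str
    tier: Tier
    source_url: str  # empty string means nothing was actually fetched


def shippable(claims, minimum=Tier.THIRD_PARTY):
    """Keep only claims with a fetched source and a tier at or above the minimum."""
    return [c for c in claims if c.source_url and c.tier >= minimum]


# Hypothetical claims; the URL is a placeholder, not a real citation.
claims = [
    Claim("WeChat installer", Tier.VENDOR_OFFICIAL, "https://vendor.example/download"),
    Claim("iVMS-4200.exe", Tier.VENDOR_OFFICIAL, ""),  # high tier asserted, no citation
    Claim("boe_display.sys", Tier.UNVERIFIED, ""),
]

print([c.value for c in shippable(claims)])  # ['WeChat installer']
```

The second sample row is the interesting one. A top score with no fetched source is exactly the shape of the quarantined bundle, so the filter checks the source first and the tier second.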

It was rigorous, and it ran into a wall that no amount of rigor gets you past. You cannot confirm most install-time artifacts from public sources without installing the software, and I was not going to install surveillance and device-management tools from designated vendors to harvest file paths. The names of executables, the DLLs, the service names, the registry paths, the durable stuff a hunt actually keys on, mostly live inside the installer, on the disk, after setup runs. Public pages will tell you a product exists. They rarely tell you what it drops.

So the deep, careful pass produced very few verifiable host-level indicators. Two cleared the bar: the WeChat installer, verified as served from Tencent’s own content network, and a QQ uninstall registry key, corroborated through package metadata. Two. Everything I had imagined exporting into tidy STIX and YAML indicator files had almost nothing real and verifiable to carry. Building those exports would have produced impressive, well-formed files that told a defender very little. Going too specific turned out to be its own failure mode, the same family of mistake as going unsourced. It just wears a nicer suit.

What I actually shipped

I stopped trying to manufacture a detection feed that the evidence would not support, and I shipped the two things that are real and useful.

The first is the catalog. All 80 parent entities are assessed and given a disposition. Nineteen of them publish installable Windows software. That comes to 59 products, each recorded with a fetched source URL and a confidence tier: 40 confirmed on a vendor-official source, 11 confirmed on a reputable third party, and 8 named but not yet confirmed for Windows. The rest of the 80 are hardware and firmware vendors, server and Linux platforms, or companies that ship no installable software at all. The vendors that do matter for endpoint recognition are the names you would expect, including Hikvision, Dahua, Huawei, DJI, Tencent, TP-Link, and Qihoo 360.

The second is a starting point for the hunt, not the hunt. Find-DesignatedVendorSoftware.ps1 reads the Windows uninstall registry and flags entries whose publisher or product name matches a designated vendor. It is deliberately simple and deliberately honest about its limits. It only sees software registered in Add or Remove Programs with a recognizable vendor string. It will miss portable apps, SDK components shipped as DLLs inside other software, renamed binaries, and vendors whose publisher string is localized or unexpected. Treat a clean result as “not found by this method,” never as “not present.” Pair it with the catalog and a real conversation with the people who own the systems.
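
To make the matching step and its blind spots concrete, here is a minimal Python sketch of the same idea. It is not the PowerShell script and makes no claim about how that script is written. It assumes each entry is a dictionary with `DisplayName` and `Publisher` keys, which are the value names Windows stores under `HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall`. The term list, the `flag_entries` function, and the sample entries are hypothetical.

```python
# Vendor terms drawn from the catalog summary above; the real script's list may differ.
VENDOR_TERMS = ["hikvision", "dahua", "huawei", "dji", "tencent", "tp-link", "qihoo"]


def flag_entries(entries, terms=VENDOR_TERMS):
    """Return (entry, matched_term) pairs where publisher or product name contains a term."""
    flagged = []
    for entry in entries:
        # Join both fields into one lowercase string; missing values count as empty.
        haystack = " ".join(
            str(entry.get(field) or "") for field in ("Publisher", "DisplayName")
        ).lower()
        for term in terms:
            if term in haystack:
                flagged.append((entry, term))
                break  # one match is enough to send the entry for review
    return flagged


# Hypothetical entries shaped like uninstall registry values.
sample = [
    {"DisplayName": "iVMS-4200", "Publisher": "Hikvision"},
    {"DisplayName": "7-Zip", "Publisher": "Igor Pavlov"},
    {"DisplayName": "WeChat", "Publisher": None},
]

for entry, term in flag_entries(sample):
    print(f"REVIEW: {entry['DisplayName']} (matched '{term}')")
```

Only the first entry is flagged. The WeChat row slips through because nothing in it contains a listed vendor term, which is the "unexpected publisher string" gap in miniature. The output is a review list for a human, not a block list.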

What this taught me about using AI for enrichment

Two failure modes sit on either side of the useful middle. Copilot handed me something complete, organized, and hollow. Claude Code handed me something precise, careful, and impossible to verify. Neither looks like the cartoon version of an AI mistake, the obvious howler you can laugh off. They look like good work until you check.

There is a lesson here that is about me, not the tools. The analysis was solid and the AI setup is strong. What I manage badly is the clock. AI gets you most of the way fast, and then the last stretch, the part where you verify, reconcile, and rebuild, quietly eats the time you thought you saved. This is a known pattern. Addy Osmani calls it the 70 percent problem: the model carries you to roughly 70 percent quickly, and the final 30 percent, the edge cases and the verification, is as slow as it ever was. I felt that twice on this project. The fix is not a better prompt, it is a time budget. Decide up front how long the research and the coding get, watch for the point where another hour will not make the deliverable any more true, and then call it and ship the smaller honest thing. I am still building that discipline. Knowing when to stop is a skill, and you get it by doing work like this and paying attention to where the time went.

What AI did genuinely well was the enrichment itself. It worked across 80 companies far faster than I would by hand, drafted the structure, classified vendors sensibly, and surfaced real leads worth verifying. That is real value, and I will keep using it for exactly that. What it could not do was conjure verifiable indicators that do not exist in public sources, and it will happily produce something that looks like it did. The judgment about what counts as evidence, how specific to get, and when to stop and ship something smaller stayed where it belongs, with a person.

That is the same discipline behind report, do not block. It is the same reason the AI work we are doing for defense at Cutaway Security runs through controlled accounts with a human reviewing the output against the real environment. AI is a strong enrichment engine. It’s not a source of truth, and the moment you treat its tidy output as truth, it will hand you a beautifully formatted empty box.

Go get it

The catalog, the dataset, the PowerShell script, and a methodology document that spells out what the work does and does not claim are all in the repository: github.com/cutaway-security/threat_hunt_china_entities. It is best-effort and built from public sources only. Use the catalog to recognize this software, use the script as a first sweep, and remember that a 1260H designation is an affiliation flag, not a verdict on the software. Report what you find for human review.

Go forth and do good things,

Don C. Weber
