Data Classification for SMB file shares
Automate data classification by integrating MineOS with your SMB file server
How It Works
MineOS connects to your Windows SMB file server over your existing network and runs a statistically representative smart-sample scan across your shared drives. Rather than scanning every file, MineOS:
- Samples proportionally: the scanned sample mirrors the on-disk file-type mix, so results reflect the true composition of your shares as closely as possible within the scan limits described below.
- Classifies PII, PHI, PCI, and other sensitive data found in the sampled files of supported types, and produces a report of the data types discovered and the folders where they reside.
Note: This integration performs data classification only — it does not fulfill DSR deletion or copy requests on SMB file shares.
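To illustrate what "proportional to the on-disk file-type mix" means, the sketch below splits a fixed sample size across file types using the largest-remainder method. This is an assumption for illustration only: MineOS does not document its sampling algorithm, and the function name allocate_sample is hypothetical.

```python
import math
from typing import Dict


def allocate_sample(type_counts: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Hypothetical sketch: split sample_size across file types in proportion
    to how many files of each type exist on disk (largest-remainder method)."""
    total = sum(type_counts.values())
    if total == 0 or sample_size <= 0:
        return {ext: 0 for ext in type_counts}
    sample_size = min(sample_size, total)  # cannot sample more files than exist

    # Exact (fractional) share of the sample each file type deserves.
    quotas = {ext: sample_size * n / total for ext, n in type_counts.items()}
    # Start from the whole-number part of each quota.
    alloc = {ext: math.floor(q) for ext, q in quotas.items()}

    # Hand the slots lost to rounding to the types with the largest remainders.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(type_counts, key=lambda ext: quotas[ext] - alloc[ext], reverse=True)
    for ext in by_remainder[:leftover]:
        alloc[ext] += 1
    return alloc
```

For example, a share holding 600 PDFs, 300 Word files, and 100 spreadsheets with a sample size of 100 gets `{".pdf": 60, ".docx": 30, ".xlsx": 10}`. The largest-remainder step guarantees the allocations always add up to the requested sample size.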
Before You Start
- Your SMB environment must use Active Directory or WORKGROUP with NTLM authentication.
- Before setup, create a dedicated read-only user on your file server for MineOS; reusing personal credentials is not recommended. To set up the integration in MineOS, you will need the server hostname, the Active Directory domain, and this user's username and password.
- MineOS must be able to reach your file server over the network. If your server is on a private or on-premise network, a VPN tunnel must be configured first. If it is publicly accessible, MineOS IPs must be whitelisted in your firewall. See the VPN Tunneling guide and IP Whitelisting guide.
- SMB port TCP 445 (DirectTCPTransport) must be open between MineOS and your file server.
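Before setup, you can check whether TCP 445 on the file server accepts connections from a given machine. The standard-library Python sketch below does that; the function name smb_port_reachable is hypothetical. A successful result only shows reachability from the machine you run it on (for example, a host on the same VPN path), not from MineOS's own IPs.

```python
import socket


def smb_port_reachable(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Call it with your server name, for example `smb_port_reachable("fileserver01.corp.example.com")`. A False result usually points to a firewall rule, a missing VPN route, or a wrong hostname.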
Setting Up
Optional: Preview Mode Setup
MineOS can optionally show sample PII values alongside classification results. Sample values are stored encrypted and are automatically deleted after 7 days. Once enabled, Preview mode applies to ALL scans you run afterward, not only SMB scans, so if you want previews, enable Preview mode BEFORE setting up the SMB scan.
To set up preview mode follow these steps:
- Enable Preview Mode: Navigate to the main Settings page by clicking on the company name at the bottom-left corner of the screen → Settings → Classifications tab → Data Classifier section → enable the Preview mode toggle
- Set a password: store it securely, because MineOS cannot recover it


SMB Integration Setup
- Log in to your MineOS account. On the left sidebar, click Inventory → Data Sources. Click Add Data Source and search for SMB. Click Add.

- Open the newly added SMB data source from the Inventory list → navigate to the Integrations tab → Request Handling section → Select Integration.

- In MineOS, fill in the connection fields:
  - Password: the service account's password
  - Server: the server hostname or IP address (e.g., fileserver01.corp.example.com)
  - Domain: your Active Directory domain or WORKGROUP (e.g., corp.example.com)
  - Username: the dedicated service account username
  - [Optional] Share name filter: a regular expression; only shares whose names match it are scanned. Leave it blank to scan all eligible shares the service account can access.

- Mark the Scan this source using Data Classifier checkbox
- Click Test your integration to verify MineOS can connect to the file server
- Click Test & Save
You're done! MineOS has been connected to your SMB file server and will begin the data classification scan.
Which Shares Are Scanned
The SMB integration scans only regular, accessible shares. Administrative shares (e.g., C$, ADMIN$, IPC$, print$) and any other share whose name ends in $ are skipped automatically, including hidden data shares such as Finance$ and HR$. Only standard visible shares are included.
| Share example | What it is | Scanned? |
|---|---|---|
| C$, D$, E$ | Whole disk drive — Windows auto-shares one per drive for remote admin | ✗ Blocked (server admin) |
| ADMIN$ | The Windows system folder (C:\Windows), for remote management | ✗ Blocked (server admin) |
| IPC$ | Not a folder — a channel for programs to talk (named pipes / RPC) | ✗ Blocked (not a file share) |
| print$ | Shared printer driver files that client PCs download | ✗ Blocked (printer queue/$) |
| FAX$ | Working files for the Windows fax service | ✗ Blocked (name ends in $) |
| Finance$, HR$ | A normal data folder an admin hid with "$" — holds real files | ✗ Blocked (name ends in $) |
| Finance, HR, Documents | Ordinary visible folders where people keep work documents | ✓ Scanned, unless excluded by the share name filter |
| Users, home dirs | A person's private home / Documents folder | ✓ Scanned, unless excluded by the share name filter |
| smbuser (Samba phantom) | Listed as a share but won't open when connected to | Skipped gracefully |
Important: The trailing $ rule is a fallback for Samba servers that don't set the administrative-share flag, so the scanner skips every share whose name ends in $. As a result, legitimately hidden data folders (such as Finance$ or HR$) are also skipped. If you need these folders scanned, rename the share to remove the trailing $ and ensure the service account has Read access to it.
Note: Inaccessible "phantom" shares that appear in the share list but cannot be opened are skipped gracefully — the rest of the scan continues unaffected.
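The name-based rules above can be modeled with a few lines of Python. This is a hypothetical sketch, not MineOS code: the function name is illustrative, it ignores the server-side admin flag and phantom shares, and it assumes the share name filter matches anywhere in the name (Python `re.search` semantics), which the integration does not document.

```python
import re
from typing import Optional


def is_share_scanned(share_name: str, name_filter: Optional[str] = None) -> bool:
    """Hypothetical model of the name-based share eligibility rules."""
    # Fallback rule: every share whose name ends in "$" is skipped. This covers
    # C$, ADMIN$, IPC$, print$ and also hidden data shares such as Finance$.
    if share_name.endswith("$"):
        return False
    # No filter configured: every remaining accessible share is eligible.
    if not name_filter:
        return True
    # Assumption: the filter may match anywhere in the share name.
    return re.search(name_filter, share_name) is not None
```

Under this assumption, a filter of `HR` would also match a share named `HR-Archive`; anchoring it as `^(Finance|HR)$` limits the scan to exactly those two shares. Note that inside the regular expression, `$` is an end-of-name anchor and has nothing to do with the trailing $ in hidden share names.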
What MineOS Scans
MineOS scans files whose extension belongs to one of the following supported categories. All other file types are silently skipped.
| Category | Supported formats |
|---|---|
| Documents | .pdf, .asc, .brf, .c, .cc, .cpp, .cxx, .c++, .dart, .eml, .go, .h, .hh, .hpp, .hxx, .h++, .hs, .html, .htm, .shtml, .shtm, .xhtml, .lhs, .ini, .java, .js, .ocaml, .md, .mkd, .markdown, .m, .ml, .mli, .pl, .pm, .php, .phtml, .pht, .py, .pyw, .rb, .rbw, .rs, .rc, .scala, .sh, .sql, .tex, .txt, .text, .vcard, .vcs, .wml, .xsl, .xsd, .docx, .dotx, .docm, .dotm |
| Structured data | .avro, .csv, .tsv, .json, .xml, .yml, .yaml |
| Images (via OCR) | .bmp, .gif, .jpg, .jpeg, .jpe, .png |
| Spreadsheets | .xlsx, .xlsm, .xltx, .xltm |
| Slideshows | .pptx, .pptm, .potx, .potm |
| Parquet | .parquet |
Note: Legacy Office formats (.doc, .xls, .ppt), OpenDocument text (.odt), .rtf, archives (.zip), media files (.mp4, .mp3), and any other unlisted formats are not supported and are silently skipped. For Microsoft Office files, only the modern Office Open XML formats listed above are scanned.
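To check in advance which files on a share fall into a supported category, the extension table can be turned into a lookup. The sketch below is an illustration under assumptions: the function name category_for is hypothetical, and it assumes extension matching is case-insensitive (Windows file names are), which the table does not state.

```python
import ntpath
from typing import Optional

# Supported extensions per category, copied from the table above.
_SUPPORTED = {
    "Documents": (
        "pdf asc brf c cc cpp cxx c++ dart eml go h hh hpp hxx h++ hs html htm "
        "shtml shtm xhtml lhs ini java js ocaml md mkd markdown m ml mli pl pm "
        "php phtml pht py pyw rb rbw rs rc scala sh sql tex txt text vcard vcs "
        "wml xsl xsd docx dotx docm dotm"
    ),
    "Structured data": "avro csv tsv json xml yml yaml",
    "Images (via OCR)": "bmp gif jpg jpeg jpe png",
    "Spreadsheets": "xlsx xlsm xltx xltm",
    "Slideshows": "pptx pptm potx potm",
    "Parquet": "parquet",
}

# Flatten into a ".ext" -> category map for constant-time lookups.
EXT_TO_CATEGORY = {
    "." + ext: category
    for category, exts in _SUPPORTED.items()
    for ext in exts.split()
}


def category_for(path: str) -> Optional[str]:
    """Return the supported category for a file path, or None if it would be skipped."""
    # ntpath accepts both "\" and "/" separators, matching Windows share paths.
    # Assumption: matching ignores case, so "REPORT.PDF" counts as ".pdf".
    ext = ntpath.splitext(path)[1].lower()
    return EXT_TO_CATEGORY.get(ext)
```

For example, `category_for(r"Finance\Q3\report.PDF")` returns `"Documents"`, while `category_for("legacy.doc")` returns `None` because only the Office Open XML `.docx` family is listed.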
Understanding Scan Limits
MineOS uses smart sampling to keep scans fast and representative on large file servers. The following limits apply per scan run:
| Limit | Value |
|---|---|
| Max file size downloaded for scanning | 10 MB |
| Max files enumerated (walked) per share | 1,000,000 |
| Parallel connections per share | Up to 10 |
| Max shares processed per run | 2,000 |
Files larger than 10 MB are counted in the inventory totals but are not downloaded or scanned for content. This keeps results proportionally accurate without inflating scan time.
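The sketch below shows how the per-share file cap and the 10 MB download limit could combine when one share's file listing is turned into inventory totals plus a download list. It is hypothetical: the names plan_share and SharePlan are illustrative, "10 MB" is assumed to mean 10 MiB (10 × 1024 × 1024 bytes), and sampling, parallel connections, and the 2,000-share cap are not modeled.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_FILES_PER_SHARE = 1_000_000


@dataclass
class SharePlan:
    files_enumerated: int = 0  # every walked file counts toward inventory totals
    bytes_enumerated: int = 0
    too_large: int = 0  # counted in totals, content never downloaded
    to_download: List[str] = field(default_factory=list)


def plan_share(
    entries: Iterable[Tuple[str, int]],
    max_files: int = MAX_FILES_PER_SHARE,
    max_bytes: int = MAX_FILE_BYTES,
) -> SharePlan:
    """Walk (path, size) entries for one share and decide what could be downloaded."""
    plan = SharePlan()
    for path, size in entries:
        if plan.files_enumerated >= max_files:
            break  # stop walking this share once the enumeration cap is reached
        plan.files_enumerated += 1
        plan.bytes_enumerated += size
        if size > max_bytes:
            plan.too_large += 1
        else:
            plan.to_download.append(path)
    return plan
```

Keeping oversized files in `files_enumerated` and `bytes_enumerated` is what lets the inventory totals stay proportionally accurate even though their content is never fetched.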
Need Help?
If you need any help with this integration, please contact your Customer Success Manager. They will be happy to assist you!