Chapter 1.3 Structuring and Manipulating Archaeological Data

Objective: Build and manipulate basic datasets in Python using appropriate data structures

In this chapter, you’ll learn how to create, organize, and clean archaeological datasets using the core Python structures you’ve already explored — tuples, lists, and dictionaries. You’ll also be introduced to reading data from CSV and JSON files and preparing it for reuse or visualization.

We’ll stay at Tell Logika, our fictional excavation site, where this week’s task is to digitize and filter artifact data related to tool adoption. You’ll act as the data analyst preparing site-level data on the percentage of stone, wood, and metal tools found at various trenches.

1.3.1 What Does “Structuring Data” Mean?

Structuring data means:

  • Transforming raw information into usable formats
  • Organizing data into rows, columns, keys, and values
  • Creating predictable, searchable, and analyzable structures

Unstructured data, such as unlabeled field notes or inconsistent spreadsheets, is hard to work with. Structuring it gives shape to the information, allowing you to sort, filter, and analyze it computationally.

1.3.2 Mock Dataset: Lists of Tuples

Let’s say you’ve recorded tool adoption estimates at six trenches at Tell Logika:

tool_adoption = [
    ("TL01", 75),
    ("TL02", 40),
    ("TL03", 85),
    ("TL04", 20),
    ("TL05", 95),
    ("TL06", 35)
]

Each number represents the estimated percentage of metal tools versus other materials (stone or wood). This format mirrors how rows of a spreadsheet might be structured.
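Each record is a (trench ID, percentage) tuple, so a loop can unpack both values at once:

for trench, metal_pct in tool_adoption:
    print(f"{trench}: {metal_pct}% metal tools")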

1.3.3 Dictionaries for Fast Lookups

The same records can be stored as a dictionary, with trench IDs as keys:

tool_dict = {
    "TL01": 75,
    "TL02": 40,
    "TL03": 85
}

Dictionaries are great when you need to look up data quickly — like checking the tool adoption for trench TL03.
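For example, looking up TL03 is a single expression; .get() returns None instead of raising a KeyError when a trench is missing:

print(tool_dict["TL03"])       # 85
print(tool_dict.get("TL09"))   # None: TL09 is not in the dictionary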

1.3.4 Reading Data from a CSV File

import csv

with open("tool_adoption.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

This script prints each row of a basic CSV file. Each row arrives as a list of strings, e.g. ["TL01", "75"], so numeric columns need converting before analysis. You could then build a tuple or dictionary from each row, as in the sketch below.
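As a minimal sketch, assuming tool_adoption.csv has no header row and exactly two columns (trench ID, percentage), here is one way to build a dictionary from the file. Note that csv yields strings, so the percentage needs int():

import csv

tool_dict = {}
with open("tool_adoption.csv") as f:
    reader = csv.reader(f)
    for trench, percent in reader:        # unpack the two columns of each row
        tool_dict[trench] = int(percent)  # convert "75" to 75

print(tool_dict)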

1.3.5 Reading JSON into Python

JSON (JavaScript Object Notation) is a flexible format used to store structured data, especially when it contains nesting or multiple layers (e.g., categories within sites).

{
  "TL01": {"stone": 10, "wood": 15, "metal": 75},
  "TL02": {"stone": 30, "wood": 30, "metal": 40}
}

Here the whole file is a single JSON object, which Python reads as a dictionary whose values are themselves dictionaries. This nesting makes JSON more expressive than a CSV file, which is flat and row-based.

To read JSON in Python:

import json

with open("tool_types.json") as f:
    data = json.load(f)
    print(data["TL01"]["metal"])

This will print the percentage of metal tools in trench TL01. JSON is powerful but may be unfamiliar to many archaeologists — that’s okay! CSV is usually your starting point.

1.3.6 Cleaning and Filtering

# Filter trenches with > 50% metal tool usage
filtered = {k: v for k, v in tool_dict.items() if v > 50}
print(filtered)
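# With the three-trench tool_dict above, this prints {'TL01': 75, 'TL03': 85}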

You may want to isolate trenches that show early or dominant adoption of metal tools. Filtering is essential for narrowing your data focus.

1.3.7 Organizing Data for Reuse

To write the filtered results to a JSON file:

import json

with open("filtered_tools.json", "w") as f:
    json.dump(filtered, f)

Saving data ensures you or your collaborators can reload it later for further analysis or visualization.
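To confirm the round trip, you can reload the file in a later session (the name reloaded is just a suggestion):

import json

with open("filtered_tools.json") as f:
    reloaded = json.load(f)

print(reloaded)  # the same filtered dictionary, ready for further work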

1.3.8 Coming Up: Visualizing Data

In Chapter 1.4, you’ll visualize this dataset using bar charts and scatterplots. You’ll explore questions like: Are metal tools more common in later trenches? Does geography influence tool type distribution?

  • Creating bar charts and scatterplots
  • Interpreting visual patterns
  • Correlating archaeological data with material transitions

1.3.9 Quick Review

  • ✅ Use tuples and lists to mock small datasets
  • ✅ Use dictionaries for quick access and lookups
  • ✅ Use csv and json for importing/exporting data
  • ✅ Filter and save cleaned data for reuse

🔍 Activity: Structuring and Filtering Tool Data from Tell Logika

Scenario: You are the site data analyst at Tell Logika, where your team has recorded the percentage of metal tools found in six excavation trenches. You've been asked to keep only those trenches that show significant metal tool use, defined by your supervisor as greater than 50%. Your task is to build the dataset, apply this filter, and save the results for further analysis.

Instructions:

  1. Create a new folder: ComputationalArchaeology
  2. Create a subfolder: chapter1
  3. Inside it, create a Jupyter Notebook: TellLogika_Tools.ipynb

Step-by-step Python Code:

# --- CREATE DATA ---

# List of tuples (Trench ID, Metal Tool %)
tools = [
    ("TL01", 75),
    ("TL02", 40),
    ("TL03", 85),
    ("TL04", 20),
    ("TL05", 95),
    ("TL06", 35)
]

# Convert to dictionary
tool_dict = {site: percent for site, percent in tools}
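# Equivalently: tool_dict = dict(tools), since dict() accepts (key, value) pairs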

print("Original Dataset:")
for site, score in tool_dict.items():
    print(f"{site}: {score}% metal tools")

print("\n---\n")

# --- FILTERING ---

# Keep only trenches above 50% metal tool usage
filtered = {k: v for k, v in tool_dict.items() if v > 50}

print("Filtered Dataset (> 50% metal tools):")
for site, score in filtered.items():
    print(f"{site}: {score}%")

# --- SAVING TO JSON ---

import json

with open("filtered_tool_sites.json", "w") as f:
    json.dump(filtered, f)

print("Filtered data saved as 'filtered_tool_sites.json'")

What This Teaches You:

  • ✅ How to represent tool-related data with tuples and dictionaries
  • ✅ How to apply filters for analytical insight
  • ✅ How to save structured data for future use

Tips for further exploration:

  • Change the filter to select trenches below 30% metal usage: if v < 30
  • Add new trenches with different values and observe how the output changes
  • Try sorting the trenches by percentage before filtering (optional challenge; see the sketch below)
  • Modify the output format to include a summary count of high-metal trenches (also shown in the sketch below)
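
For the last two tips, here is one possible sketch using the six-trench dataset above (the name ranked is just a suggestion):

# Rank trenches by metal tool percentage, highest first
ranked = sorted(tools, key=lambda pair: pair[1], reverse=True)

# Filter after ranking, so the result keeps the sorted order
filtered = {site: pct for site, pct in ranked if pct > 50}

# Summary count of high-metal trenches
print(f"{len(filtered)} of {len(tools)} trenches exceed 50% metal tools")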