python

Fiona

How variables handle mutable objects in Python

Question 1

import pandas as pd
from pyproj import Geod

# Assumption: geod is a pyproj Geod on the WGS84 ellipsoid; the original snippet does not define it
geod = Geod(ellps="WGS84")

def calculate_area(geom):
    return abs(geod.geometry_area_perimeter(geom)[0])

def area_interval_gcs(gdf):
    # Compute the area of each geometry, in square meters
    gdf['area_m2'] = gdf['geometry'].apply(calculate_area)
    # Define the intervals
    bins = [0, 25, 50, 100, 150, 200, 250, float('inf')]
    labels = ['0-25 m²',  '25-50 m²', '50-100 m²', '100-150 m²', '150-200 m²', '200-250 m²', '>250 m²']
    # Assign each area to an interval
    gdf['area_interval'] = pd.cut(gdf['area_m2'], bins=bins, labels=labels, include_lowest=True)
    # Count the geometries in each interval
    interval_counts = gdf['area_interval'].value_counts().sort_index()
    # Compute the percentage of each interval
    interval_percentage = (interval_counts / interval_counts.sum()) * 100
    # Combine counts and percentages
    result = pd.DataFrame({
        'Count': interval_counts,
        'Percentage': interval_percentage
    })
    # Return the statistics and the updated GeoDataFrame
    return result, gdf

interval_stats, gdf_update = area_interval_gcs(gdf)
print(interval_stats)

Answer 1

In Python, understanding how variables handle mutable objects (like lists, dicts, and DataFrames) is key to understanding how changes propagate. Here’s a more in-depth explanation:

1. Variable Assignment and References

In Python, when you assign a variable to a mutable object (such as a list, dict, or DataFrame), the variable doesn’t hold the actual object itself, but rather a reference to the object. This means that multiple variables can reference the same object in memory.

Example:
a = [1, 2, 3]  # List is mutable
b = a  # b now references the same list object as a

b.append(4)  # Modify the list through b

print(a)  # Output: [1, 2, 3, 4]

In this example, modifying b also modifies a because they both refer to the same list in memory.

2. Mutable vs. Immutable Objects

Immutable objects such as integers, strings, and tuples cannot be changed in place. Any operation that appears to modify one creates a new object and rebinds the variable to it, leaving the original untouched.

Example (Immutable Objects):
x = 5  # Integer is immutable
y = x
y += 1  # y now references a new object (6), x remains unchanged

print(x)  # Output: 5
print(y)  # Output: 6

In this case, modifying y does not affect x because integers are immutable. A new object is created instead.
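One way to see the difference is to compare object identities with the is operator. The sketch below uses only built-ins, and the variable names are purely illustrative.

```python
# Rebinding: an operation on an immutable int produces a new object
x = 5
y = x
same_before = y is x      # True: both names refer to the same int object
y += 1                    # builds a new int and rebinds y
same_after = y is x       # False: y now refers to a different object

# In-place mutation: a list keeps its identity when it is extended
a = [1, 2, 3]
b = a
b += [4]                  # list += extends in place, so a sees the change
still_same = b is a       # True

print(same_before, same_after, still_same)  # True False True
print(a)                                    # [1, 2, 3, 4]
```

The augmented assignment looks identical in both halves, yet it rebinds the name for the int and mutates the shared object for the list.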

3. Passing Mutable Objects to Functions

When you pass a mutable object (like a GeoDataFrame) to a function, the function receives a reference to the original object. This means that changes made to the object within the function affect the original object outside the function unless you explicitly create a copy of it.

Example (Passing a Mutable Object):
def modify_list(my_list):
    my_list.append(4)  # Modify the list in place

lst = [1, 2, 3]
modify_list(lst)

print(lst)  # Output: [1, 2, 3, 4]

Here, lst is modified within the modify_list function because the list is mutable, and the function modifies the original object.

4. Creating Copies to Prevent Unintended Changes

If you don’t want changes to propagate to the original object, you need to create a copy. For a DataFrame or GeoDataFrame, call .copy(); by default (deep=True) this copies the data and the index, so changes to the copy do not affect the original.

Example (Using copy()):
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df_copy = df.copy()  # Create a copy of the DataFrame

df_copy['A'] = df_copy['A'] + 10  # Modify the copy

print(df)  # Original DataFrame remains unchanged
# Output: 
#    A
# 0  1
# 1  2
# 2  3

print(df_copy)  # The copy is changed
# Output: 
#     A
# 0  11
# 1  12
# 2  13

With a plain assignment (df_copy = df) instead of .copy(), both names would refer to the same object, and changes made through one would be visible through the other.

5. Deep Copy vs. Shallow Copy

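A shallow copy creates a new outer container but shares the nested objects with the original. A deep copy also copies the nested objects recursively, so the two structures are fully independent.

Example (Shallow Copy):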
import copy

a = [[1, 2], [3, 4]]
b = a.copy()  # Shallow copy

b[0].append(5)

print(a)  # Output: [[1, 2, 5], [3, 4]] - Inner list is modified

To avoid this, you can use a deep copy:

b = copy.deepcopy(a)
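To make the contrast concrete, the following sketch copies the same nested list both ways and then mutates an inner list of the original. The names shallow and deep are illustrative only.

```python
import copy

a = [[1, 2], [3, 4]]
shallow = copy.copy(a)       # new outer list, same inner lists
deep = copy.deepcopy(a)      # new outer list and new inner lists

a[0].append(5)               # mutate an inner list of the original

print(shallow[0])  # [1, 2, 5]: the shallow copy shares the inner list
print(deep[0])     # [1, 2]: the deep copy is unaffected
print(shallow[0] is a[0], deep[0] is a[0])  # True False
```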

6. GeoDataFrame Example (Copy vs. Reference)

Consider the following GeoDataFrame example, where the original data is modified inside the function because the function receives a reference to the same object:

import geopandas as gpd

def modify_gdf(gdf):
    gdf['new_column'] = gdf['geometry'].area

gdf = gpd.read_file('example.geojson')
modify_gdf(gdf)

print(gdf.head())  # 'new_column' is now in the original GeoDataFrame

To prevent this, use .copy():

def modify_gdf(gdf):
    gdf = gdf.copy()  # Now changes are made only to this copy
    gdf['new_column'] = gdf['geometry'].area
    return gdf

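Key Takeaways:

  • Variables hold references to objects; a plain assignment (b = a) never copies.
  • Mutating a mutable object in place is visible through every variable that refers to it, including function arguments.
  • Immutable objects cannot be changed in place; operations on them produce new objects.
  • Use .copy() (or copy.deepcopy() for nested structures) when a function should not modify its input.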

By keeping these behaviors in mind, you can control how your functions interact with mutable data and prevent unwanted side effects!

Question 2

def calculate_area(geom):
    return abs(geod.geometry_area_perimeter(geom)[0])

def area_interval_gcs(gdf):
    # Compute the area of each geometry, in square meters
    gdf['area_m2'] = gdf['geometry'].apply(calculate_area)
    # Define the intervals
    bins = [0, 25, 50, 100, 150, 200, 250, float('inf')]
    labels = ['0-25 m²',  '25-50 m²', '50-100 m²', '100-150 m²', '150-200 m²', '200-250 m²', '>250 m²']
    # Assign each area to an interval
    gdf['area_interval'] = pd.cut(gdf['area_m2'], bins=bins, labels=labels, include_lowest=True)
    # Count the geometries in each interval
    interval_counts = gdf['area_interval'].value_counts().sort_index()
    # Compute the percentage of each interval
    interval_percentage = (interval_counts / interval_counts.sum()) * 100
    # Combine counts and percentages
    result = pd.DataFrame({
        'Count': interval_counts,
        'Percentage': interval_percentage
    })
    # Return the statistics and the updated GeoDataFrame
    return result, gdf

interval_stats, gdf_update = area_interval_gcs(gdf)
print(interval_stats)

gdf_update = gdf_update[gdf_update['area_m2'] >= 100]

Answer 2

Great question! Let’s dive into why this line only changes gdf_update and not the original gdf:

gdf_update = gdf_update[gdf_update['area_m2'] >= 100]

Key Concept: Variable Assignment Creates a New Reference

When you execute this line of code:

gdf_update = gdf_update[gdf_update['area_m2'] >= 100]

You are re-assigning the gdf_update variable to a new subset of the original gdf_update DataFrame where the area_m2 values are greater than or equal to 100.

Here’s what happens step by step:

  1. Initial Reference: area_interval_gcs modifies the GeoDataFrame it receives in place and returns that same object, so after the call both gdf and gdf_update refer to the same underlying GeoDataFrame.

  2. New DataFrame Creation: When you run this line:
    gdf_update = gdf_update[gdf_update['area_m2'] >= 100]

    A new DataFrame is created. It contains only the rows where area_m2 >= 100. This new DataFrame is assigned to the variable gdf_update.

  3. Variable Reassignment: The variable gdf_update now holds a reference to this new filtered DataFrame. This assignment does not modify the original gdf GeoDataFrame because the assignment only affects gdf_update’s reference.

  4. Original Object Unaffected: Since gdf still points to the original GeoDataFrame, and gdf_update now points to a new object (the filtered DataFrame), any subsequent changes to gdf_update will not affect gdf.

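In summary: gdf still refers to the original GeoDataFrame (including the area_m2 and area_interval columns that the function added in place), while gdf_update refers to the new, filtered GeoDataFrame.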

Why Doesn’t gdf Change Automatically?

Because Python doesn’t automatically link a new assignment (like gdf_update = ...) to other variables that previously referred to the same object. After reassigning gdf_update, only gdf_update refers to the filtered DataFrame, while gdf remains as it was.

This distinction allows you to freely modify subsets of your data (as gdf_update) without impacting the original data (in gdf).
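The same behavior can be reproduced with plain lists, which avoids any dependence on pandas. This is a minimal sketch: add_value and the sample numbers are hypothetical stand-ins for area_interval_gcs and the GeoDataFrame.

```python
def add_value(rows):
    # Mutates the list it receives and returns the same object,
    # much like area_interval_gcs does with its GeoDataFrame
    rows.append(250)
    return rows

data = [40, 120, 90]
data_update = add_value(data)
shared = data_update is data          # True: one object, two names

# Filtering builds a new list; only the name data_update is rebound
data_update = [v for v in data_update if v >= 100]

print(shared)        # True
print(data)          # [40, 120, 90, 250]
print(data_update)   # [120, 250]
```

The in-place change made inside the function shows up in data, while the later filter affects only data_update.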

Question 3

Why do we not need to return path in the following use case? (DFS with Path Tracking, NeetCode: Lowest Common Ancestor in Binary Search Tree)

```python
def find_path(node, target, path):
    if not node:
        return False

    path.append(node)  # <- modifies the shared list

    if node == target:
        return True

    if find_path(node.left, target, path) or find_path(node.right, target, path):
        return True

    path.pop()  # <- backtrack
    return False
```

Answer 3

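Lists are mutable in Python. Mutability means the function can modify the original list directly, and those changes persist outside the function. Every recursive call receives a reference to the same list object, so when find_path returns True the caller's path already holds the nodes from the root to the target.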
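A small runnable version makes this visible. The TreeNode class and the sample tree are assumptions added for illustration; find_path follows the question's logic, using an identity check for the target.

```python
class TreeNode:
    # Hypothetical minimal node type for this sketch
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def find_path(node, target, path):
    if not node:
        return False
    path.append(node)          # mutate the caller's list in place
    if node is target:
        return True
    if find_path(node.left, target, path) or find_path(node.right, target, path):
        return True
    path.pop()                 # backtrack: this node is not on the path
    return False

#       6
#      / \
#     2   8
#    / \
#   0   4
target = TreeNode(4)
root = TreeNode(6, TreeNode(2, TreeNode(0), target), TreeNode(8))

path = []
found = find_path(root, target, path)   # the return value is only a bool
print(found)                     # True
print([n.val for n in path])     # [6, 2, 4]
```

The caller never receives path back from the function, yet the list it created now contains the full route.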

Python Variable Assignment and Reassignment

1. Create main_process.py

Example Structure of main_process.py

import geopandas as gpd

def process_geojson(input_file, output_file):
    # Load the GeoJSON file
    gdf = gpd.read_file(input_file)
    
    # Perform processing (calculations, transformations, etc.)
    
    # Save the result to a TSV file
    gdf.to_csv(output_file, sep='\t', index=False)

# Optionally include a main block for testing
if __name__ == "__main__":
    process_geojson('example.geojson', 'output.tsv')

2. Create batch_processor.py

Example Structure of batch_processor.py

import os
from main_process import process_geojson  # Import the processing function

# Define directories
input_dir = '/path/to/your/geojson/files/'  # Directory with GeoJSON files
output_dir = '/path/to/save/processed/files/'  # Directory to save TSV files

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Process each GeoJSON file in the input directory
for file_name in os.listdir(input_dir):
    if file_name.endswith('.geojson'):
        input_file = os.path.join(input_dir, file_name)
        output_file = os.path.join(output_dir, file_name.replace('.geojson', '_processed.tsv'))
        
        # Call the process_geojson function
        process_geojson(input_file, output_file)

        print(f"Finished processing: {input_file}")

Summary Steps

  1. Define processing logic in main_process.py:
    • Write a function to process a single GeoJSON file.
    • Include the necessary logic to handle the GeoDataFrame and save it as a TSV file.
  2. Set up batch processing in batch_processor.py:
    • Import the processing function.
    • Set up input and output directories.
    • Loop through the files, process each GeoJSON, and save the results.
  3. Run batch_processor.py:
    • This script will process all GeoJSON files in the specified directory and save the results in the output directory.

By organizing your code in this way, you can maintain a clean separation of concerns while making it easy to process multiple files efficiently.

Shapely Geometry Attributes

geometry.coords

The geometry.coords attribute provides direct access to the coordinate sequence of a Shapely geometry, such as a LineString or Point. It returns a coordinate sequence that can be converted with list() into the (x, y) tuples that define the geometry’s shape.

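For example:

  • Point: list(point.coords) gives a single (x, y) tuple.
  • LineString: list(line.coords) gives every vertex in order.
  • Polygon: use polygon.exterior.coords for the outer ring.
  • Multi-part geometries: iterate over geometry.geoms and read the coordinates of each part.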

In short, geometry.coords is used to get all points for simpler geometries, but for polygons, it’s often necessary to use exterior.coords or work with each part individually in multi-part geometries.
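A short sketch of these access patterns, assuming Shapely is installed. The geometries and variable names are made up for illustration.

```python
from shapely.geometry import LineString, MultiPolygon, Point, Polygon

point = Point(1, 2)
line = LineString([(0, 0), (1, 1), (2, 0)])
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
multi = MultiPolygon([square, Polygon([(2, 2), (3, 2), (3, 3)])])

point_xy = list(point.coords)            # one (x, y) tuple
line_xy = list(line.coords)              # every vertex, in order
ring_xy = list(square.exterior.coords)   # outer ring; the first point is repeated at the end

# Multi-part geometry: visit each part through .geoms
part_rings = [list(part.exterior.coords) for part in multi.geoms]

print(point_xy)
print(line_xy)
print(len(ring_xy), len(part_rings))
```

The polygon ring comes back closed, so a square defined by four vertices yields five coordinate pairs.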

geometry.geoms

Python Language Basics

for loop

sequences = [0, 1, 2, 3, 4, 5]
for i in sequences:
    print(i)

# output (one number per line): 0 1 2 3 4 5

for i in range(10):
    print(i, end=" ")

print()  # newline
for i in range(20, 2, -2):
    print(i, end=" ")

# output:
# 0 1 2 3 4 5 6 7 8 9
# 20 18 16 14 12 10 8 6 4

# Example with two arguments
for i in range(-1, 5):
    print(i, end=", ") # prints: -1, 0, 1, 2, 3, 4, 

list

In Python, list[-1] returns the last element of the list.
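Negative indices count from the end of the list, so the same idea extends to other positions and to slices. A brief sketch with an illustrative list:

```python
letters = ['a', 'b', 'c', 'd']

last = letters[-1]          # 'd'
second_last = letters[-2]   # 'c'
tail = letters[-2:]         # ['c', 'd']

print(last, second_last, tail)
```

Indexing an empty list with [-1] raises IndexError, so check that the list is non-empty first when that can happen.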

length of a list

my_list = [1, 2, 3, 4, 5]
print(len(my_list))

# Output:
# 5

Split String in List

string = 'geeksforgeeks'
lst = []

for letter in string:
    lst.append(letter)

print(lst)

# output: ['g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's']

Remove List Items

thislist = ["apple", "banana", "cherry"]
thislist.pop(1)
print(thislist)

# output: ['apple', 'cherry']
# https://www.w3schools.com/python/python_lists_remove.asp

Check if a list is empty or not

# Python code to check for empty list
lis1 = []
if not lis1:  # an empty list is falsy
    print("Empty List")
else:
    print("The list is not empty")

list of object

# NeetCode: Meeting Schedule
from typing import List

"""
Definition of Interval:
class Interval(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
"""

class Solution:
    def canAttendMeetings(self, intervals: List[Interval]) -> bool:
        intervals.sort(key=lambda i: i.start)

        # Compare each meeting with the next one; stop before the last index
        for i in range(len(intervals) - 1):
            i1 = intervals[i]
            i2 = intervals[i + 1]

            if i1.end > i2.start:
                return False

        return True

Check whether a list is empty (using len())

mylist = list() # or mylist = []
print(type(mylist))
print(len(mylist))
if len(mylist) == 0:
    print('mylist is empty')
# <class 'list'>
# 0
# mylist is empty
# https://shengyu7697.github.io/python-check-list-empty/

List append()

# NeetCode: Two Sum
from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        Ans = []
        for i in range(len(nums)):
            for j in range(i + 1, len(nums)):
                if (nums[i] + nums[j]) == target:
                    Ans.append(i)
                    Ans.append(j)
                    return Ans
        return []  # no pair found; keeps the return type a list

enumerate()

seq = ['one', 'two', 'three']
for i, element in enumerate(seq):
    print(i, element)
# 0 one
# 1 two
# 2 three

pop()

break, continue, pass

count = 0
for string in 'content':
    count += 1
    if string == 't':
        break
    print(string)

print('\nLoop finished')
print('The loop ran %d times' % count)
# output
# c
# o
# n

# Loop finished
# The loop ran 4 times
# https://medium.com/%40chiayinchen/1-%E5%88%86%E9%90%98%E6%90%9E%E6%87%82-python-%E8%BF%B4%E5%9C%88%E6%8E%A7%E5%88%B6-break-continue-pass-be290cd1f9d8
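The example above covers break only. For comparison, here is a minimal sketch of continue and pass on the same string; the list names are illustrative.

```python
kept = []
for ch in 'content':
    if ch == 't':
        continue          # skip the rest of this iteration and move on
    kept.append(ch)

seen = []
for ch in 'content':
    if ch == 't':
        pass              # does nothing; execution simply continues below
    seen.append(ch)

print(''.join(kept))   # conen
print(''.join(seen))   # content
```

break leaves the loop entirely, continue jumps to the next iteration, and pass is only a placeholder statement.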

Dictionary + HashMap + get()

Dictionary

HashMap

# NeetCode: Valid Anagram
class Solution:
    def isAnagram(self, s: str, t: str) -> bool:
        if len(s) != len(t):
            return False

        countS = {}  #!!!!!
        countT = {}  #!!!!!

        for i in range(len(s)):
            countS[s[i]] = countS.get(s[i], 0) + 1  #!!!!!
            countT[t[i]] = countT.get(t[i], 0) + 1  #!!!!!

        return countS == countT

# NeetCode: Two Sum
class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        prevmap = {}
        
        for i, element in enumerate(nums):
            diff = target - element
            if diff in prevmap:   #!!!!!!!
                return [prevmap[diff], i]
            prevmap[element] = i

get()

tinydict = {'Name': 'Runoob', 'Age': 27}

print ("Age : ", tinydict.get('Age'))

# 'Sex' is not set and no default is given, so this prints None
print ("Sex : ", tinydict.get('Sex'))

# 'Salary' is not set, so this prints the default value 0.0
print ('Salary: ', tinydict.get('Salary', 0.0))

# output
# Age :  27
# Sex :  None
# Salary:  0.0

>>> runoob = {}
>>> print('URL: ', runoob.get('url'))     # returns None
URL:  None

>>> print(runoob['url'])     # raises KeyError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'url'
>>>

Remove Whitespace from String

def remove(string):
  return string.replace(" ", "")
# Driver Program
string = ' g e e k '
print(remove(string))

# geek

Remove all characters other than alphabets from string

# Python program to remove all the 
# characters other than alphabets
 
# Function to remove special characters 
# and store it in another variable
def removeSpecialCharacter(s):
    t = ""
    for i in s:
         
        # Store only valid characters
        if (i >= 'A' and i <= 'Z') or (i >= 'a' and i <= 'z'):
            t += i
    print(t)
 
# Driver code
s = "$Gee*k;s..fo, r'Ge^eks?"
removeSpecialCharacter(s)

# output: GeeksforGeeks

String handling: lower() upper() capitalize() title()
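A quick sketch of what each of these four methods returns, using an arbitrary sample string. Strings are immutable, so every call returns a new string and leaves s unchanged.

```python
s = "hello WORLD from python"

lowered = s.lower()            # every letter lowercase
uppered = s.upper()            # every letter uppercase
capitalized = s.capitalize()   # first character uppercase, the rest lowercase
titled = s.title()             # first letter of each word uppercase

print(lowered)       # hello world from python
print(uppered)       # HELLO WORLD FROM PYTHON
print(capitalized)   # Hello world from python
print(titled)        # Hello World From Python
```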

ignores all non-alphanumeric characters

# NeetCode: Is Palindrome (helper method of the Solution class)
def removeNonAlpha(self, s: str) -> str:
    t = ''

    for i in s:
        if (i >= 'A' and i <= 'Z') or (i >= 'a' and i <= 'z') or (i >= '0' and i <= '9'):
            t = t + i
    return t

Linked List

Check whether the end of the linked list has been reached

while list1 is not None and list2 is not None:  # the tail is not a node; it is simply None
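The loop condition above is typical of merging two sorted linked lists. The sketch below assumes a minimal ListNode class and two helper functions, all of them hypothetical names introduced for illustration.

```python
class ListNode:
    # Hypothetical minimal singly linked list node
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def to_list(head):
    # Walk the list until the reference becomes None (past the last node)
    values = []
    node = head
    while node is not None:
        values.append(node.val)
        node = node.next
    return values

def merge_sorted(list1, list2):
    # Merge two sorted linked lists; the loop ends when either one runs out
    dummy = ListNode(0)
    tail = dummy
    while list1 is not None and list2 is not None:
        if list1.val <= list2.val:
            tail.next = list1
            list1 = list1.next
        else:
            tail.next = list2
            list2 = list2.next
        tail = tail.next
    # Attach whatever remains of the list that did not run out
    tail.next = list1 if list1 is not None else list2
    return dummy.next

a = ListNode(1, ListNode(3, ListNode(5)))
b = ListNode(2, ListNode(4))
merged = to_list(merge_sorted(a, b))
print(merged)  # [1, 2, 3, 4, 5]
```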

sort()

Sorting List of Tuples by a Specific Element

# Original list of tuples
people = [("Alice", 25), ("Bob", 30), ("Charlie", 22), ("David", 28)]
 
# Sorting by the second element of each tuple (age)
people.sort(key=lambda x: x[1])
 
# Displaying the sorted list
print("Sorted by Age:", people)

Tuples

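A tuple is a Python data structure that resembles a list but differs in some important ways. Tuples are immutable, which means that once a tuple is created, its elements cannot be modified, added, or removed.

Characteristics of tuples

  1. Immutable: once a tuple is created, its elements cannot be changed.
  2. Ordered: elements are stored in the order in which they were given.
  3. Can hold mixed types: a tuple can contain different kinds of data, such as integers, floats, strings, lists, or even other tuples.
  4. Written with parentheses: tuples use parentheses (), whereas lists use square brackets [].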
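A minimal sketch of these properties, using an illustrative tuple. It also shows one subtlety: the tuple itself cannot be reassigned, but a mutable object stored inside it can still be modified.

```python
point = (3, 4, "label", [1, 2])

x, y, name, tags = point      # tuples support unpacking
print(point[0], len(point))   # 3 4

error = None
try:
    point[0] = 10             # item assignment is not allowed
except TypeError as exc:
    error = type(exc).__name__
print(error)                  # TypeError

# The tuple is fixed, but the list it contains is still mutable
point[3].append(3)
print(point)                  # (3, 4, 'label', [1, 2, 3])
```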

Python Class, Object, and Self


🧱 What is a class?

📦 What is an object?


🔧 __init__ method (constructor)

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Create a Person object
p1 = Person("Alice", 25)

# Access its attributes
print(p1.name)  # Output: Alice
print(p1.age)   # Output: 25

🙋‍♀️ What is self?

class Person:
    def __init__(self, name, age):
        self.name = name     # instance variable
        self.age = age       # instance variable

    def greet(self):
        print(f"Hi, I'm {self.name} and I'm {self.age} years old.")

# Create a Person object
p1 = Person("Alice", 25)

# Call the greet method
p1.greet()

# Output
# Hi, I'm Alice and I'm 25 years old.

🧠 Typical Structure of a Python Class

class ClassName:
    def __init__(self, arg1, arg2):  # add more parameters as needed
        self.attr1 = arg1
        self.attr2 = arg2
        # Other instance-specific data

    def method1(self):
        # Action using self.attr1, self.attr2
        pass

    def method2(self, param):
        # Action using external parameter and internal state
        pass

✅ Key Concepts

Concept | Meaning
class | Defines a custom data type or blueprint
object | Instance of a class, created using ClassName()
__init__ | Initializes object with default or passed-in values
self | Refers to the current object; used to access its attributes/methods
self.attribute | Stores data in the object (e.g., self.name = name)
Method call | obj.method() automatically passes self

❓Can a class work without __init__?
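Yes. A class that does not define __init__ falls back to the default one inherited from object, which takes no extra arguments. A minimal sketch, with a hypothetical Counter class:

```python
class Counter:
    count = 0                 # class attribute, used as a shared default

    def increment(self):
        # The first assignment creates an instance attribute on self
        self.count = self.count + 1
        return self.count

c1 = Counter()                # no arguments: the inherited object.__init__ runs
c2 = Counter()
c1.increment()
c1.increment()

print(c1.count, c2.count, Counter.count)  # 2 0 0
```

Defining __init__ is still the usual choice when each object needs its own starting values passed in at creation time.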


Usage Example

class BankAccount:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def show_balance(self):
        print(f"{self.owner}'s balance is ${self.balance}")

    def deposit(self, amount):
        self.balance += amount
        print(f"Deposited ${amount}. New balance is ${self.balance}")

# Create an object (instance)
acc1 = BankAccount("Alice", 1000)

# Call methods
acc1.show_balance()       # Output: Alice's balance is $1000
acc1.deposit(500)         # Output: Deposited $500. New balance is $1500
acc1.show_balance()       # Output: Alice's balance is $1500

# Print variables directly
print(acc1.owner)    # Output: Alice
print(acc1.balance)  # Output: 1500

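List comprehension

📌 What is a list comprehension?

A list comprehension is Python's syntactic sugar for building a list quickly.
It performs iteration, filtering, and transformation in a single line.

🧱 Syntax

[expression for variable in iterable if condition]

📌 Example: converting OSM nodes into geometry points

[Point([n.lon, n.lat]) for n in w.nodes]

✅ Explanation:

Equivalent traditional loop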

points = []
for n in w.nodes:
    points.append(Point([n.lon, n.lat]))

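🎯 Common use cases

Use | Example
Filtering | [x for x in lst if x > 0] → list of positive numbers
Transformation | [x**2 for x in lst] → squares
Extracting attributes | [user.name for user in users]
Geometry conversion | [Point([n.lon, n.lat]) for n in way.nodes]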
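The filtering and transformation patterns can be tried with plain numbers. This sketch uses an illustrative list and also shows the explicit loop that a combined filter-and-transform comprehension replaces.

```python
lst = [-2, -1, 0, 1, 2, 3]

positives = [x for x in lst if x > 0]                # filter
squares = [x ** 2 for x in lst]                      # transform
even_squares = [x ** 2 for x in lst if x % 2 == 0]   # filter, then transform

# The same result as even_squares, written as an explicit loop
loop_result = []
for x in lst:
    if x % 2 == 0:
        loop_result.append(x ** 2)

print(positives)     # [1, 2, 3]
print(squares)       # [4, 1, 0, 1, 4, 9]
print(even_squares)  # [4, 0, 4]
```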

🧠 Tips

✅ Advantages

⚠️ Caveats
