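// Get the number of available CPU cores
int cpuCores = Runtime.getRuntime().availableProcessors();

// Thread pool for CPU-bound tasks
ThreadPoolExecutor cpuIntensiveExecutor = new ThreadPoolExecutor(
    cpuCores + 1,           // core pool size
    cpuCores + 1,           // maximum pool size
    0L,                     // keep-alive time for idle threads
    TimeUnit.MILLISECONDS,  // time unit
    new LinkedBlockingQueue<>()  // task queue (unbounded here; see 2.1.2)
);

// Thread pool for IO-bound tasks (assuming IO wait time is 3x the CPU processing time)
ThreadPoolExecutor ioIntensiveExecutor = new ThreadPoolExecutor(
    cpuCores * 4,           // core pool size
    cpuCores * 4,           // maximum pool size
    0L,                     // keep-alive time for idle threads
    TimeUnit.MILLISECONDS,  // time unit
    new LinkedBlockingQueue<>()  // task queue (unbounded here; see 2.1.2)
);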
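
The IO-bound sizing above follows the common rule of thumb threads = cores × (1 + wait time / compute time). The following is a minimal sketch of that arithmetic; `PoolSizing` and `poolSize` are hypothetical names introduced only for illustration, and the ratio is an estimate that has to be measured for the real workload.

```java
final class PoolSizing {
    private PoolSizing() {}

    /**
     * Rule-of-thumb pool size: cores * (1 + waitTime / computeTime).
     * waitToComputeRatio is an assumption supplied by the caller.
     */
    static int poolSize(int cpuCores, double waitToComputeRatio) {
        if (cpuCores <= 0 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("cores must be > 0 and ratio >= 0");
        }
        return (int) Math.ceil(cpuCores * (1 + waitToComputeRatio));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("IO-bound (ratio 3): " + poolSize(cores, 3.0));
        System.out.println("CPU-bound (ratio 0): " + poolSize(cores, 0.0));
    }
}
```

With a ratio of 3 this yields the `cpuCores * 4` used above. The extra `+ 1` thread in the CPU-bound pool is a conventional allowance for occasional stalls and is not part of this formula.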

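2.1.2 Choosing a Task Queue

Choose the task queue according to the characteristics of the workload:

Bounded queue: prevents resource exhaustion, but tasks may be rejected once both the queue and the pool are full
Unbounded queue: can absorb a large backlog of tasks, but may grow until the JVM runs out of memory; maximumPoolSize also has no effect because the queue never fills

Common task queues include:

ArrayBlockingQueue: a bounded blocking queue backed by an array
LinkedBlockingQueue: a blocking queue backed by a linked list; bounded if a capacity is given, otherwise effectively unbounded (Integer.MAX_VALUE)
SynchronousQueue: a blocking queue that stores no elements; each insert must wait for a corresponding remove
PriorityBlockingQueue: an unbounded blocking queue ordered by priority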
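
To make the bounded-queue trade-off concrete, here is a small sketch in which a pool with one thread and a queue of capacity one rejects further work while its only worker is busy. The class and method names (`BoundedQueueDemo`, `submitAndCountRejections`) are hypothetical, and `AbortPolicy` is chosen only so that the rejections are visible.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedQueueDemo {
    /** Submits blocking tasks and returns how many were rejected. */
    static int submitAndCountRejections(int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1),              // room for one waiting task
            new ThreadPoolExecutor.AbortPolicy());    // throw when saturated
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task blocks until released, keeping the worker busy
                    executor.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + submitAndCountRejections(5));
    }
}
```

Of five submissions, one runs, one waits in the queue, and the remaining three are rejected. Swapping in `CallerRunsPolicy` would instead run the overflow on the submitting thread, which slows the producer down rather than dropping work.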

2.2 Thread Pool Monitoring and Tuning

Regularly monitoring the state of a thread pool is essential for tuning it:

ThreadPoolExecutor executor = new ThreadPoolExecutor(
    5, 10, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100)
);

// Monitor the thread pool state every 5 seconds
ScheduledExecutorService monitor = Executors.newScheduledThreadPool(1);
monitor.scheduleAtFixedRate(() -> {
    System.out.println("Pool size: " + executor.getPoolSize());
    System.out.println("Active threads: " + executor.getActiveCount());
    System.out.println("Completed tasks: " + executor.getCompletedTaskCount());
    System.out.println("Tasks waiting in queue: " + executor.getQueue().size());
    System.out.println("====================================");
}, 0, 5, TimeUnit.SECONDS);

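2.3 Thread Pool Tuning Best Practices

Choose thread pool parameters that match the task type
Use bounded queues to prevent resource exhaustion
Choose an appropriate rejection policy
Monitor thread pool state regularly
Avoid creating thread pools with the Executors factory methods (several of them use an unbounded queue or an unbounded thread count)
Name thread pools to make debugging and monitoring easier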
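
As an illustration of the naming advice, here is a minimal sketch of a `ThreadFactory` that gives every worker a recognizable name. `NamedThreadFactory` and the `order-pool` prefix are hypothetical; the pool parameters are placeholders rather than recommendations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger sequence = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // Names look like "order-pool-1", "order-pool-2", ...
        return new Thread(task, prefix + "-" + sequence.getAndIncrement());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            2, 4, 60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(100),                 // bounded queue
            new NamedThreadFactory("order-pool"),          // named workers
            new ThreadPoolExecutor.CallerRunsPolicy());    // explicit rejection policy
        executor.execute(() ->
            System.out.println("running on " + Thread.currentThread().getName()));
        executor.shutdown();
    }
}
```

The names then show up in thread dumps and log lines, which makes it much easier to tell which pool a stuck or busy thread belongs to.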

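3. Lock Contention Optimization

Lock contention is one of the main bottlenecks for concurrent performance; reducing it can significantly improve throughput.

3.1 Reducing Lock Granularity

Reducing lock granularity is one effective way to cut contention, because threads that touch unrelated data no longer wait for each other:

// Coarse-grained lock (not recommended): one lock guards two independent maps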
public class CoarseGrainedLockExample {
    private final Object lock = new Object();
    private Map<String, Object> map1 = new HashMap<>();
    private Map<String, Object> map2 = new HashMap<>();
    
    public void put1(String key, Object value) {
        synchronized (lock) {
            map1.put(key, value);
        }
    }
    
    public void put2(String key, Object value) {
        synchronized (lock) {
            map2.put(key, value);
        }
    }
}

// Fine-grained locks (recommended): each map has its own lock
public class FineGrainedLockExample {
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();
    private Map<String, Object> map1 = new HashMap<>();
    private Map<String, Object> map2 = new HashMap<>();
    
    public void put1(String key, Object value) {
        synchronized (lock1) {
            map1.put(key, value);
        }
    }
    
    public void put2(String key, Object value) {
        synchronized (lock2) {
            map2.put(key, value);
        }
    }
}

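3.2 Using Read-Write Locks

In read-mostly scenarios, a read-write lock can improve concurrency because multiple readers may hold the read lock at the same time: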

public class ReadWriteLockExample {
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
    private Map<String, Object> data = new HashMap<>();
    
    // Reads take the read lock
    public Object get(String key) {
        rwLock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            rwLock.readLock().unlock();
        }
    }
    
    // Writes take the exclusive write lock
    public void put(String key, Object value) {
        rwLock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}

3.3 Lock Separation

Lock separation splits a single lock into several locks, each protecting a different resource:

// Lock separation modeled on the two-lock design of LinkedBlockingQueue
public class LockSeparationExample {
    private Node head;   // guarded by headLock; always points to a dummy node
    private Node tail;   // guarded by tailLock
    private final Object headLock = new Object();
    private final Object tailLock = new Object();
    
    public LockSeparationExample() {
        Node dummy = new Node(null);
        head = dummy;
        tail = dummy;
    }
    
    // Enqueue only needs the tail lock
    public void enqueue(Object item) {
        Node newNode = new Node(item);
        synchronized (tailLock) {
            tail.next = newNode;
            tail = newNode;
        }
    }
    
    // Dequeue only needs the head lock
    public Object dequeue() {
        synchronized (headLock) {
            Node first = head.next;
            if (first == null) {
                return null;
            }
            Object item = first.item;
            first.item = null;
            // The dequeued node becomes the new dummy, so tail never has to be
            // touched here and the two locks are never nested
            head = first;
            return item;
        }
    }
    
    private static class Node {
        Object item;
        // volatile: written under tailLock but read under headLock
        volatile Node next;
        
        Node(Object item) {
            this.item = item;
        }
    }
}

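3.4 Lock-Free Programming

Lock-free programming achieves thread safety without traditional locks, relying instead on atomic operations such as CAS (compare-and-swap):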

public class LockFreeStack<T> {
    private AtomicReference<Node<T>> top = new AtomicReference<>();
    
    public void push(T item) {
        Node<T> newNode = new Node<>(item);
        Node<T> oldTop;
        
        do {
            oldTop = top.get();
            newNode.next = oldTop;
        } while (!top.compareAndSet(oldTop, newNode));
    }
    
    public T pop() {
        Node<T> oldTop;
        Node<T> newTop;
        
        do {
            oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
            newTop = oldTop.next;
        } while (!top.compareAndSet(oldTop, newTop));
        
        return oldTop.item;
    }
    
    private static class Node<T> {
        T item;
        Node<T> next;
        
        Node(T item) {
            this.item = item;
        }
    }
}

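4. Memory Access Optimization

Optimizing memory access reduces access latency and improves concurrent performance.

4.1 Reducing False Sharing

False sharing occurs when multiple threads access different variables that happen to sit in the same cache line, so every write invalidates the line for the other cores and increases cache-coherence traffic:

// False sharing example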
public class FalseSharingExample {
    private static class Counter {
        public volatile long count = 0;
    }
    
    private static final int THREAD_COUNT = 4;
    private static final long ITERATIONS = 100000000;
    
    public static void main(String[] args) throws InterruptedException {
        Counter[] counters = new Counter[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            counters[i] = new Counter();
        }
        
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            final int index = i;
            threads[i] = new Thread(() -> {
                for (long j = 0; j < ITERATIONS; j++) {
                    counters[index].count++;
                }
            });
        }
        
        long startTime = System.currentTimeMillis();
        for (Thread thread : threads) {
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        long endTime = System.currentTimeMillis();
        
        System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
    }
}

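// Example that avoids false sharing
public class PaddingExample {
    private static class PaddedCounter {
        public volatile long count = 0;
        // Pad the object toward a 64-byte cache line to avoid false sharing.
        // Field layout is decided by the JVM, so manual padding is best-effort;
        // @Contended is the JVM-supported alternative (application classes
        // need -XX:-RestrictContended for it to take effect).
        public volatile long p1, p2, p3, p4, p5, p6, p7;
    }
    
    // Test code similar to FalseSharingExample
}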

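4.2 Memory Alignment

Field alignment affects object size and memory access efficiency. The sizes in the comments below count field data and padding only, as in a layout that follows declaration order; HotSpot adds an object header and usually reorders fields itself, so declaration order rarely changes the real object size. Measure actual layouts with a tool such as JOL before relying on them:

// Poorly ordered fields
public class MisalignedClass {
    private byte a;      // 1 byte
    private long b;      // 8 bytes
    private byte c;      // 1 byte
    // In declaration order: 1 + 7 (padding) + 8 + 1 + 7 (padding) = 24 bytes
}

// Well ordered fields
public class AlignedClass {
    private long b;      // 8 bytes
    private byte a;      // 1 byte
    private byte c;      // 1 byte
    // In declaration order: 8 + 1 + 1 + 6 (padding) = 16 bytes
}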

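4.3 Using Local Variables

Local variables live on the thread's own stack (or in registers), are fast to access, and are never contended by other threads:

// Not recommended: touches the shared volatile field on every iteration
public class SharedVariableExample {
    private volatile int count = 0;
    
    public void increment() {
        for (int i = 0; i < 1000; i++) {
            count++;   // volatile read + write each time; not atomic
        }
    }
}

// Recommended: accumulate in a local variable and touch the shared field once
public class LocalVariableExample {
    private volatile int count = 0;
    
    public void increment() {
        int localCount = 0;
        for (int i = 0; i < 1000; i++) {
            localCount++;
        }
        // Still not atomic: safe only with a single writer or external synchronization
        count += localCount;
    }
}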
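
Because `count++` and `count += localCount` on a volatile field are not atomic, the examples above lose updates when several threads call `increment()` concurrently. The following sketch keeps the local-accumulation idea but publishes the batch with a single atomic add; `BatchedCounter` and `runWithThreads` are hypothetical names used only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class BatchedCounter {
    private final AtomicInteger count = new AtomicInteger();

    void increment() {
        int localCount = 0;
        for (int i = 0; i < 1000; i++) {
            localCount++;               // thread-private, no contention
        }
        count.addAndGet(localCount);    // one atomic update per batch
    }

    int get() {
        return count.get();
    }

    /** Runs increment() once on each of the given number of threads. */
    static int runWithThreads(int threadCount) {
        BatchedCounter counter = new BatchedCounter();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(counter::increment);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("total: " + runWithThreads(4));
    }
}
```

Each thread touches shared state once instead of a thousand times, and no increments are lost.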

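5. Choosing and Tuning Concurrent Collections

The java.util.concurrent package provides a range of concurrent collections; choosing the right one is essential for performance.

5.1 Choosing a Concurrent Collection

Pick the collection that matches the usage scenario:

Collection	Use case	Mechanism	Performance characteristics
ConcurrentHashMap	Highly concurrent reads and writes	Segment locks (Java 7) / CAS plus per-bin synchronized (Java 8+)	Lock-free reads; writes lock only a segment or bin
CopyOnWriteArrayList	Read-mostly	Copy-on-write	Lock-free reads; expensive writes
ConcurrentLinkedQueue	Highly concurrent queue	CAS	Lock-free; performs well under high concurrency
LinkedBlockingQueue	Optionally bounded queue	ReentrantLock + Condition (separate put and take locks)	Suits the producer-consumer pattern
ArrayBlockingQueue	Bounded queue	ReentrantLock + Condition (single lock)	Array-backed; stable performance
SynchronousQueue	Unbuffered handoff	CAS	Suits direct handoff between threads

5.2 Tuning ConcurrentHashMap

ConcurrentHashMap is one of the most widely used concurrent collections. Tuning it mainly involves the following:

Initial capacity: set it according to the expected number of elements to avoid resizing
Load factor: defaults to 0.75 and rarely needs to be changed
Concurrency level: controlled the number of segment locks before Java 8; since Java 8 it is kept only for compatibility and serves as a sizing hint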

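// Give ConcurrentHashMap a sensible initial capacity.
// Since Java 8 the constructor sizes the table to hold this many elements
// without resizing, so the HashMap-style division by 0.75 is not needed.
int expectedSize = 10000;
ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>(expectedSize);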

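5.3 Avoiding Overuse of Concurrent Collections

Concurrent collections are thread-safe, but they cost more than their non-concurrent counterparts. When data is confined to a single thread or is never accessed concurrently, use the ordinary collections:

// Single-threaded use: a plain HashMap
Map<String, Object> singleThreadMap = new HashMap<>();

// Multi-threaded use: ConcurrentHashMap
Map<String, Object> concurrentMap = new ConcurrentHashMap<>();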

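6. Atomic Operations and Lock-Free Programming

Atomic operations and lock-free programming reduce lock contention and improve concurrent performance.

6.1 Using the Atomic Classes

The java.util.concurrent.atomic package provides several Atomic classes for performing atomic operations:

// AtomicInteger example
AtomicInteger atomicInt = new AtomicInteger(0);

// Atomic increment
int newValue = atomicInt.incrementAndGet();

// Atomic compare-and-set: succeeds only if the current value is 1
boolean success = atomicInt.compareAndSet(1, 10);

// AtomicReference example
AtomicReference<User> atomicUser = new AtomicReference<>(new User("Zhang San", 20));

// Atomically replace the user; the function may be retried under contention,
// so it must be free of side effects
atomicUser.updateAndGet(user -> {
    User newUser = new User(user.getName(), user.getAge());
    newUser.setAge(user.getAge() + 1);
    return newUser;
});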

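6.2 Choosing Between LongAdder and AtomicLong

For write-heavy counters under high contention, LongAdder usually outperforms AtomicLong because it spreads updates across several internal cells. The trade-off is that sum() is not an atomic snapshot while updates are in progress:

// Use LongAdder for highly concurrent counting
LongAdder counter = new LongAdder();

// Concurrent increment
counter.increment();

// Read the current value
long currentValue = counter.sum();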
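
A minimal sketch of the counting pattern with several writer threads follows. `LongAdderDemo` and `countWithThreads` are hypothetical names; the example checks correctness only and makes no claim about measured speed.

```java
import java.util.concurrent.atomic.LongAdder;

final class LongAdderDemo {
    /** Increments a shared LongAdder from several threads and returns the total. */
    static long countWithThreads(int threadCount, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        // All writers have finished, so sum() is exact at this point
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println("total: " + countWithThreads(4, 10_000));
    }
}
```

If the counter value must be read and acted on atomically, for example to generate sequence numbers, AtomicLong remains the better fit.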

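6.3 Lock-Free Data Structures

Lock-free data structures use atomic operations such as CAS to achieve thread safety and avoid lock contention:

// Lock-free linked list (append-only)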
public class LockFreeLinkedList<T> {
    private static class Node<T> {
        T value;
        AtomicReference<Node<T>> next;
        
        Node(T value) {
            this.value = value;
            this.next = new AtomicReference<>(null);
        }
    }
    
    private final Node<T> head = new Node<>(null);
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);
    
    public void add(T value) {
        Node<T> newNode = new Node<>(value);
        Node<T> oldTail;
        
        while (true) {
            oldTail = tail.get();
            Node<T> next = oldTail.next.get();
            
            // Check that tail still points to oldTail
            if (oldTail == tail.get()) {
                // Check whether oldTail is really the last node
                if (next == null) {
                    // Try to link newNode after oldTail
                    if (oldTail.next.compareAndSet(null, newNode)) {
                        // Try to swing tail to newNode; failure is fine,
                        // another thread will have advanced it
                        tail.compareAndSet(oldTail, newNode);
                        return;
                    }
                } else {
                    // tail is lagging behind; help advance it to next
                    tail.compareAndSet(oldTail, next);
                }
            }
        }
    }
    
    public boolean contains(T value) {
        Node<T> current = head.next.get();
        while (current != null) {
            if (Objects.equals(current.value, value)) {
                return true;
            }
            current = current.next.get();
        }
        return false;
    }
}

7. Asynchronous Programming and Non-Blocking IO

Asynchronous programming and non-blocking IO can increase system throughput and reduce response time.

7.1 Using CompletableFuture

CompletableFuture offers a rich API for asynchronous programming:

// Run a task asynchronously
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
    // Simulate a slow operation
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
    }
    return "async result";
});

// Handle the asynchronous result
future.thenAccept(result -> {
    System.out.println("Result received: " + result);
});

// Combine multiple CompletableFutures
CompletableFuture<String> future1 = CompletableFuture.supplyAsync(() -> "Hello");
CompletableFuture<String> future2 = CompletableFuture.supplyAsync(() -> "World");

CompletableFuture<String> combinedFuture = future1
    .thenCombine(future2, (result1, result2) -> result1 + " " + result2)
    .thenApply(String::toUpperCase);

String result = combinedFuture.join();
System.out.println(result); // prints: HELLO WORLD
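
The snippets above run on the common ForkJoinPool and do not handle failures. As a sketch of two common refinements, the example below passes an explicit executor to `supplyAsync` and recovers from an exception with `exceptionally`. `AsyncWithFallback`, `fetchOrFallback`, and the pool sizes are hypothetical choices made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class AsyncWithFallback {
    /** Runs a task on a dedicated pool and falls back to a default on failure. */
    static String fetchOrFallback(boolean fail) {
        ExecutorService executor = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(16));
        try {
            return CompletableFuture
                .supplyAsync(() -> {
                    if (fail) {
                        throw new IllegalStateException("simulated failure");
                    }
                    return "async result";
                }, executor)                       // run on our own pool
                .thenApply(String::toUpperCase)    // skipped if the task failed
                .exceptionally(ex -> "FALLBACK")   // recover from any failure
                .join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchOrFallback(false));
        System.out.println(fetchOrFallback(true));
    }
}
```

A dedicated executor keeps blocking or slow tasks from starving other users of the common pool, and `exceptionally` keeps a failure from surfacing as a `CompletionException` at `join()`.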

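7.2 Non-Blocking IO

Non-blocking and asynchronous IO can improve the performance of IO-bound applications because no thread is tied up while an operation is in flight:

// Use an NIO.2 asynchronous file channel
AsynchronousFileChannel fileChannel = AsynchronousFileChannel.open(
    Paths.get("data.txt"),
    StandardOpenOption.READ,
    StandardOpenOption.WRITE,
    StandardOpenOption.CREATE
);

// Write asynchronously, then wait for completion so the read below sees the data
ByteBuffer buffer = ByteBuffer.wrap("Hello, NIO2!".getBytes(StandardCharsets.UTF_8));
Future<Integer> writeFuture = fileChannel.write(buffer, 0);
writeFuture.get(); // may throw InterruptedException or ExecutionException

// Read asynchronously; the handler runs when the read completes
ByteBuffer readBuffer = ByteBuffer.allocate(1024);
fileChannel.read(readBuffer, 0, readBuffer, new CompletionHandler<Integer, ByteBuffer>() {
    @Override
    public void completed(Integer result, ByteBuffer attachment) {
        attachment.flip();
        byte[] data = new byte[attachment.remaining()];
        attachment.get(data);
        System.out.println("Data read: " + new String(data, StandardCharsets.UTF_8));
    }
    
    @Override
    public void failed(Throwable exc, ByteBuffer attachment) {
        System.err.println("Read failed: " + exc.getMessage());
    }
});
// Close fileChannel once all pending operations have completed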

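8. Performance Monitoring and Analysis

Monitoring and analysis are essential to concurrency tuning: they reveal where the bottlenecks are so that optimization can target them.

8.1 Using JVM Tools

Commonly used JVM monitoring tools include:

jps: lists Java process IDs
jstat: monitors JVM statistics such as GC and class loading
jstack: prints Java thread stacks
jmap: generates Java heap dumps
jhat: analyzes Java heap dumps (removed in JDK 9; VisualVM or Eclipse MAT can be used instead)
VisualVM: a visual JVM monitoring tool

8.2 Thread Dump Analysis

Thread dumps help uncover deadlocks, lock contention, and similar problems:

# Generate a thread dump with lock information
jstack -l <pid> > thread_dump.txt

# Check for deadlocks (a normal dump already reports them; -F is only meant for hung processes)
jstack -l <pid> | grep -A 10 "Found one Java-level deadlock"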
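
Deadlocks can also be detected from inside the application through the standard `java.lang.management` API. The following is a minimal sketch; the `DeadlockCheck` class and its method names are hypothetical, and in practice the check would typically be run periodically from a monitoring thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class DeadlockCheck {
    /** Returns the number of threads currently involved in a deadlock. */
    static int countDeadlockedThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock exists
        return ids == null ? 0 : ids.length;
    }

    /** Prints stack and lock details for every deadlocked thread. */
    static void printDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlock detected");
            return;
        }
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            if (info != null) {   // a thread may have exited in the meantime
                System.out.print(info);
            }
        }
    }

    public static void main(String[] args) {
        printDeadlocks();
    }
}
```

`findDeadlockedThreads()` covers both monitor locks and `java.util.concurrent` ownable synchronizers such as ReentrantLock.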

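8.3 Profiling Tools

Commonly used profiling tools include:

Java Flight Recorder (JFR): the profiling tool built into the JVM
Async Profiler: a low-overhead Java profiler
YourKit: a commercial Java profiler

8.4 Custom Performance Monitoring

Custom monitoring can be used to track the performance of a concurrent program:

// Custom performance monitoring utility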
public class PerformanceMonitor {
    private final ConcurrentHashMap<String, AtomicLong> metrics = new ConcurrentHashMap<>();
    
    // Record the duration of an operation (accumulated per operation name)
    public void recordTime(String operation, long duration) {
        AtomicLong counter = metrics.computeIfAbsent(operation, k -> new AtomicLong());
        counter.addAndGet(duration);
    }
    
    // Get the total time spent in an operation
    public long getTotalTime(String operation) {
        AtomicLong counter = metrics.get(operation);
        return counter != null ? counter.get() : 0;
    }
    
    // Print all collected metrics
    public void printMetrics() {
        System.out.println("Performance metrics:");
        metrics.forEach((operation, counter) -> {
            System.out.printf("%s: %dms%n", operation, counter.get());
        });
    }
}

// Usage example
PerformanceMonitor monitor = new PerformanceMonitor();

long startTime = System.currentTimeMillis();
// Perform the operation being measured
long endTime = System.currentTimeMillis();

monitor.recordTime("operation1", endTime - startTime);